Complexity and Disorder of 1/fα Noises

The complexity and the disorder of a 1/fα noise time series are quantified by entropy of entropy (EoE) and average entropy (AE), respectively. The resulting EoE vs. AE plot of a series of 1/fα noises with various values of α exhibits a distinct inverted U curve. For the 1/fα noises, we have shown that α decreases monotonically as AE increases, which indicates that α is also a measure of disorder. Furthermore, a 1/fα noise and a cardiac interbeat (RR) interval series are considered equivalent when they have the same AE. Accordingly, we have found that the 1/fα noises for α around 1.5 are equivalent to the RR interval series of healthy subjects. The pink noise at α = 1 is equivalent to atrial fibrillation (AF) RR interval series, while the white noise at α = 0 is more disordered than AF RR interval series. These results, based on AE, are different from the previous ones based on spectral analysis. The testing macro-average F-score is 0.93 when classifying the RR interval series of three groups using AE-based α, while it is 0.73 when using spectral-analysis-based α.


Introduction
It is widely considered that the complexity is relatively high for a complex system intermediate between extreme order and disorder. This results in an inverted U relation between the complexity and the disorder for complex physiologic signals or complex simulated 1/ f α noises [1][2][3][4][5][6][7][8][9]. Exploring the resemblance between the two kinds of signals would provide the underlying mechanism of the physiologic signals of health and disease [10][11][12][13][14].
The power spectral density (PSD) of a 1/fα noise is inversely proportional to the frequency f raised to the power α. The relation between the complexity and the exponent α of 1/fα noises was studied by Zhang [5] and Saito et al. [6]. They separately proposed two different measures of complexity for 1/fα noises, which result in different inverted U curves of complexity vs. α. However, both methods require more than 10^6 data points to obtain reliable results. It is thus not practical to apply the two measures of complexity to physiological signals of limited data length.
Kobayashi and Musha first found that the PSDs of some RR interval series exhibit a 1/f tendency for frequencies below 0.02 Hz [10]. Later, many other studies found consistent results for the RR interval series of healthy and atrial fibrillation (AF) subjects [11][12][13]. Moreover, some PSDs over higher frequencies were found to be influenced by the autonomic nervous system and breathing activity. Thus, the PSDs of healthy subjects contained fluctuations [14], while those of the AF patients showed a white-noise-like flat spectrum [12,13].
On the other hand, entropy of entropy (EoE) and average entropy (AE) have been shown to be reliable measures of the complexity and the disorder, respectively, of RR interval series with only 500 data points. EoE is used to differentiate healthy subjects from patients who suffer from heart disease [7]. AE is used to differentiate the three groups of subjects, namely congestive heart failure (CHF), atrial fibrillation (AF), and healthy groups [8]. Furthermore, the expected inverted U curves of EoE vs. AE have been realized for real physiological signals such as heart rate signals and static standing postural stability [7][8][9].
In this paper, we first calculate the EoE and the AE of a series of 1/fα noises with various values of α. We then present a plot of EoE vs. AE and a plot of α vs. AE for the 1/fα noises. For comparison, two similar plots for a set of RR interval series are presented. A cardiac RR interval series and a 1/fα noise series are considered equivalent when they have the same AE. We find that 1/fα noises with different values of α correspond to the CHF, AF, and healthy RR interval series.

Simulation of 1/ f α Noises
We use the fractional differencing method to simulate 1/fα noises [6,15]. The series of a 1/fα noise $\{z_i(\alpha)\}$ was obtained by convolving the uniform white noise $\{I_i(\alpha)\}$ between −1 and 1 with an impulse response function $\{h_i(\alpha)\}$, as follows:

$$z_i(\alpha) = \sum_{n=0}^{i-1} h_n(\alpha)\, I_{i-n}(\alpha),$$

where α is the exponent of the 1/fα noise and i is the index of the element $z_i(\alpha)$. In addition, the impulse response function $h_n(\alpha)$ is given, in the standard form of the fractional differencing method, by

$$h_0(\alpha) = 1$$

and

$$h_n(\alpha) = \frac{n - 1 + \alpha/2}{n}\, h_{n-1}(\alpha), \quad n \geq 1.$$

Then, a new series $\{x_i(\alpha)\}$ was constructed as normalized 1/fα noise, scaled to the range from 0 to 1, as follows:

$$x_i(\alpha) = \frac{z_i(\alpha) - \min_k z_k(\alpha)}{\max_k z_k(\alpha) - \min_k z_k(\alpha)}.$$

In this study, we simulated a series of normalized 1/fα noises, each with a length of 50,000 data points. The α in the series of $\{x_i(\alpha)\}$ was set to range from 0 to 3 at an interval of ∆α = 0.01. In total, there are 301 values of α. For each α, 30 simulations of $\{x_i(\alpha)\}$ were generated. The EoE and AE values of every $\{x_i(\alpha)\}$ were evaluated and then averaged separately over the 30 simulations. Consequently, a pair of averaged EoE and AE values was obtained for each α. In total, 9030 (= 301 × 30) normalized 1/fα noises were analyzed.
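The simulation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the recursion used for the impulse response (h_0 = 1, h_n = h_{n-1}(n − 1 + α/2)/n) is the standard fractional differencing form and is assumed here, as is the min-max normalization to the range from 0 to 1. The function name `simulate_noise` and the `seed` parameter are hypothetical.

```python
import random

def simulate_noise(alpha, n, seed=None):
    """Normalized 1/f^alpha noise of length n via fractional differencing (sketch)."""
    rng = random.Random(seed)
    # Uniform white noise between -1 and 1, as described in the text.
    white = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Assumed impulse response: h_0 = 1, h_k = h_{k-1} * (k - 1 + alpha/2) / k.
    h = [1.0]
    for k in range(1, n):
        h.append(h[-1] * (k - 1 + alpha / 2.0) / k)
    # Causal convolution of the white noise with h (O(n^2), fine for short demos).
    z = [sum(h[k] * white[i - k] for k in range(i + 1)) for i in range(n)]
    # Assumed min-max normalization to the range [0, 1].
    lo, hi = min(z), max(z)
    return [(v - lo) / (hi - lo) for v in z]
```

With this recursion, α = 0 leaves the white noise unchanged (h_n = 0 for n ≥ 1) and α = 2 yields its running sum (h_n = 1 for all n), which matches the usual white and brown limits. For series as long as 50,000 points, an FFT-based convolution would be the more practical choice.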

Description of Heart Rate Data
The cardiac RR interval time series used in this study were extracted from the following three databases on PhysioNet [16], namely (i) the Beth Israel Deaconess Medical Center (BIDMC) Congestive Heart Failure Database (CHFDB), (ii) the Massachusetts Institute of Technology (MIT)-Beth Israel Hospital (BIH) Normal Sinus Rhythm Database (NSRDB), which is a database of healthy subjects, and (iii) the Long Term AF Database (LTAFDB). For convenience, the three databases are abbreviated as CHFDB, NSRDB, and LTAFDB for the purposes of this paper. Note that CHFDB, NSRDB, and LTAFDB originally consist of 15, 18, and 83 24-hour-long electrocardiography (ECG) raw records, respectively. The sampling rate of CHFDB was 250 Hz, while that of the NSRDB and LTAFDB was 128 Hz. Moreover, each of the RR interval series extracted from the long-term ECG raw records contains more than 50,000 data points.
To remove the outlier data points among the RR interval series extracted from the online database, which might be due to noise or detection errors, the following two-step process was applied to each of the long-term RR interval series. First, for each individual data point in an RR interval series, we calculated the mean (MEAN) of the 10 data points before and the 10 data points after the data point itself. The data point under test was removed if it fell outside the range MEAN ± 0.5 MEAN. Second, for each of the remaining data points that passed the test in step one, we repeated step one with a new mean (MEAN′) and a new standard deviation (SD′). The data point under test was removed if it fell outside the range MEAN′ ± 2.5 SD′.
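The two-step filter can be sketched as follows. The text does not say how points near the ends of a series are handled or which standard deviation estimator is used, so this sketch assumes that whatever neighbors are available are used and that the population standard deviation is taken. The names `neighbors` and `remove_outliers` are hypothetical.

```python
from statistics import mean, pstdev

def neighbors(series, i, half_width=10):
    """Up to half_width points on each side of index i, excluding the point itself."""
    return series[max(0, i - half_width):i] + series[i + 1:i + 1 + half_width]

def remove_outliers(rr, half_width=10, rel_tol=0.5, n_sd=2.5):
    """Two-step outlier removal for an RR interval series with at least two points."""
    # Step one: drop points outside MEAN +/- rel_tol * MEAN of their neighbors.
    step1 = []
    for i, v in enumerate(rr):
        m = mean(neighbors(rr, i, half_width))
        if abs(v - m) <= rel_tol * m:
            step1.append(v)
    # Step two: on the survivors, drop points outside MEAN' +/- n_sd * SD'.
    step2 = []
    for i, v in enumerate(step1):
        nb = neighbors(step1, i, half_width)
        if abs(v - mean(nb)) <= n_sd * pstdev(nb):
            step2.append(v)
    return step2
```

In this sketch both passes compute the neighborhood statistics from the series as it stood at the start of the pass, so a rejected point still contributes to the means of its neighbors within that pass.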
Then, for each of the 15 and 18 long-term ECG signals from the CHFDB and the NSRDB, we extracted the first 50,000 data points. As for each of the 83 long-term ECG signals from the LTAFDB, we first extracted all the data segments during AF episodes, according to the annotation in PhysioNet. Among them, we adopted 26 segments whose lengths all exceeded 50,000 data points individually. As a result, there were 59 RR interval series in total, each with 50,000 data points. We then extracted the first 500 or 50,000 data points as short or long time series for the following analysis, respectively.

Entropy of Entropy (EoE) and Average Entropy (AE) Analyses
The algorithms of both the EoE [7] and AE [8] methods consist of three steps in analyzing a time series $\{x_i\} = \{x_1, \ldots, x_N\}$ of length N. The first and second steps are the same for the two methods and construct a Shannon entropy sequence that represents the time series $\{x_i\}$. First, the time series is divided into consecutive non-overlapping windows of equal length τ. Each window is in the form of

$$w_j^{(\tau)} = \{x_{(j-1)\tau+1}, \ldots, x_{(j-1)\tau+\tau}\},$$

where j is the window index ranging from 1 to N/τ and τ corresponds to the scale factor of EoE and AE.
Second, the Shannon entropy value of each window $w_j^{(\tau)}$ is derived as follows. Suppose that $x_{max}$ and $x_{min}$ are the maximum and minimum of all data collected in this study, respectively. The range from $x_{min}$ to $x_{max}$ is divided into $s_1$ slices of equal width $\Delta s_1 = (x_{max} - x_{min})/s_1$. The probability $p_{jk}$ for a data point $x_i$ over window $w_j^{(\tau)}$ to occur in slice k is thus obtained in the form of

$$p_{jk} = \frac{\text{number of } x_i \text{ over } w_j^{(\tau)} \text{ in slice } k}{\tau},$$

where k is the state index from 1 to $s_1$. Subsequently, the Shannon entropy value $y_j^{(\tau)}$ of each window $w_j^{(\tau)}$ is given by

$$y_j^{(\tau)} = -\sum_{k=1}^{s_1} p_{jk} \ln p_{jk}.$$

In this respect, the Shannon entropy value $y_j^{(\tau)}$ serves as the representative of window $w_j^{(\tau)}$. Repeating the same process for every window results in a representative Shannon entropy sequence

$$\{y_j^{(\tau)}\} = \{y_1^{(\tau)}, \ldots, y_{N/\tau}^{(\tau)}\}.$$

Third, the AE value of $\{x_i\}$ is defined as the average of the Shannon entropy sequence $\{y_j^{(\tau)}\}$:

$$AE(\tau) = \frac{1}{N/\tau} \sum_{j=1}^{N/\tau} y_j^{(\tau)}.$$

On the other hand, the EoE value of $\{x_i\}$ is derived as follows. The elements of $\{y_j^{(\tau)}\}$ are distributed over a finite number of levels, and the maximum number of possible levels $s_2(\tau)$ depends upon the time scale τ. For example, $s_2(1) = 1$, $s_2(2) = 2$, $s_2(3) = 3$, $s_2(4) = 5$, $s_2(5) = 7$, and $s_2(6) = 11$; these values coincide with the number of integer partitions of τ, that is, the number of distinct ways in which τ data points can be grouped into slices. Then, the probability $p_l$ for a representative $y_j^{(\tau)}$ over the sequence $\{y_j^{(\tau)}\}$ to occur in level l is obtained in the form of

$$p_l = \frac{\text{number of } y_j^{(\tau)} \text{ over } \{y_j^{(\tau)}\} \text{ in level } l}{N/\tau},$$

where l is the level index ranging from 1 to $s_2(\tau)$. Thus, the EoE value of the original time series $\{x_i\}$ is defined as the Shannon entropy value of the Shannon entropy sequence $\{y_j^{(\tau)}\}$ and is given by

$$EoE(\tau) = -\sum_{l=1}^{s_2(\tau)} p_l \ln p_l.$$

In this study, $x_{max}$ = 1.6, $x_{min}$ = 0.3, τ = 14, and $s_1$ = 55 are used for the RR interval series analysis, as suggested in our previous study [8]. Similarly, $x_{max}$ = 1.0, $x_{min}$ = 0.0, τ = 14, and $s_1$ = 55 are set for the normalized 1/fα noise analysis.
Figure 1 demonstrates three 1/fα noises and their representative Shannon entropy sequences for AE and EoE analyses. Figure 1a shows the three 1/fα noises $\{x_i\}$ with the same length N of 80 data points but different α values of 2.0, 1.5, and 1.0.
In this case, all 1/fα noises were analyzed at τ = 5. The 80 data points of each noise were divided equally into 16 (= N/τ) windows, each containing 5 data points and marked by a red frame. Then, the Shannon entropy value of every window was calculated individually. Figure 1b shows the representative Shannon entropy sequences y
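The three steps of the AE and EoE analyses can be sketched as below. This is an illustrative reading of the description above, not the authors' code. It assumes the natural logarithm, folds a value equal to the upper bound into the last slice, and treats each distinct entropy value as one level. All function names are hypothetical.

```python
import math
from collections import Counter

def shannon(counts, total):
    """Shannon entropy (natural log) of a list of occupation counts."""
    return -sum(c / total * math.log(c / total) for c in counts)

def entropy_sequence(x, tau, x_min, x_max, s1):
    """Steps one and two: one Shannon entropy value per non-overlapping window."""
    width = (x_max - x_min) / s1
    seq = []
    for j in range(len(x) // tau):
        window = x[j * tau:(j + 1) * tau]
        # Slice index of each point; the top edge x_max goes into the last slice.
        slices = Counter(min(int((v - x_min) / width), s1 - 1) for v in window)
        # Sorting the counts makes equal partitions give bit-identical entropies.
        seq.append(shannon(sorted(slices.values()), tau))
    return seq

def average_entropy(x, tau, x_min, x_max, s1):
    """Step three for AE: mean of the Shannon entropy sequence."""
    seq = entropy_sequence(x, tau, x_min, x_max, s1)
    return sum(seq) / len(seq)

def entropy_of_entropy(x, tau, x_min, x_max, s1):
    """Step three for EoE: Shannon entropy of the levels of the entropy sequence."""
    seq = entropy_sequence(x, tau, x_min, x_max, s1)
    levels = Counter(seq)  # each distinct entropy value is one level
    return shannon(sorted(levels.values()), len(seq))
```

For instance, a window whose five points all fall in different slices has entropy ln 5, while a window confined to a single slice has entropy 0. A series made of one window of each kind therefore has AE = (ln 5)/2 and EoE = ln 2.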

AE-Based Equivalent 1/ f α Noise of an RR Interval Series
In this study, both the AE and the EoE analyses were applied to the 301 simulated 1/fα noises with α from 0 to 3 at ∆α = 0.01 and to the 59 cardiac RR interval time series from 15 CHF, 26 AF, and 18 healthy subjects. We will demonstrate in the results section the resemblance between the 301 1/fα noises and the 59 RR interval series in terms of their inverted U curves of EoE (complexity) vs. AE (disorder). This allows us to consider a 1/fα noise and an RR interval series as effectively equivalent when they have the same AE value. Accordingly, each RR interval series can be associated with an equivalent $\alpha_{AE}$. Then, the corresponding 1/fα noise is referred to as the AE-based equivalent 1/fα noise of the RR interval time series.
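Since the AE of the simulated noises varies monotonically with α, assigning an equivalent α to an RR interval series amounts to inverting a tabulated curve. The text does not state how the lookup is performed, so the sketch below assumes linear interpolation between neighboring simulated points. The function name `equivalent_alpha` and the format of `curve` (a list of `(alpha, mean AE)` pairs) are hypothetical.

```python
def equivalent_alpha(ae, curve):
    """Interpolate alpha at a given AE from (alpha, mean AE) pairs of simulated noises."""
    # Sort by AE so that neighboring table entries bracket the query value.
    pts = sorted((a_e, alpha) for alpha, a_e in curve)
    if not pts[0][0] <= ae <= pts[-1][0]:
        raise ValueError("AE lies outside the simulated range")
    for (ae0, al0), (ae1, al1) in zip(pts, pts[1:]):
        if ae0 <= ae <= ae1:
            if ae1 == ae0:
                return al0
            # Linear interpolation between the two bracketing table entries.
            return al0 + (al1 - al0) * (ae - ae0) / (ae1 - ae0)
```

As a toy example with made-up numbers, the table `[(3.0, 0.0), (2.0, 1.0), (1.0, 1.5), (0.0, 2.0)]` maps an AE of 1.25 to α = 1.5. These values are for illustration only and are not taken from the simulations.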

Spectrum-Based Equivalent 1/ f α Noise of an RR Interval Series
The power spectral density of an RR interval time series resembles the power spectral density p(f) of a theoretical 1/fα noise, given by $p(f) = c/f^{\alpha}$, where c is a proportionality constant. The value of α can be derived from the slope of the $\log_{10} p(f)$ vs. $\log_{10} f$ plot. Consequently, using the least squares fitting method, we obtain the optimal value $\alpha_{sp}$ that yields the best fit of the theoretical power spectral density p(f) to the power spectral density of a measured RR interval time series. The corresponding 1/fα noise is referred to as the spectrum-based equivalent 1/fα noise of the RR interval time series.
In this study, we applied the piecewise cubic Hermite interpolating polynomial method to transform an original RR interval time series vs. beat number into a new RR interval time series vs. time at a sampling frequency of 1 Hz. The power spectral density of the new RR interval series was obtained by using Welch's periodogram method with Hamming windows, a window length of 2048 points, and an overlap of 50% between consecutive windows. Figure 2 illustrates the power spectral densities of three typical RR interval series from a CHF, a healthy, and an AF subject on a $\log_{10}$-$\log_{10}$ scale. The red line indicates the slope of each power spectral density vs. frequency plot for f ≤ 0.02 Hz, as suggested by Kobayashi and Musha [10], which was obtained by using the least squares fitting method. The negative of the slope of the red line gives the exponent $\alpha_{sp}$ of the equivalent 1/fα noise of the RR interval series.
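The final fitting step can be sketched as an ordinary least squares line through the low-frequency part of the spectrum on a log-log scale. The sketch assumes that the frequencies and PSD values have already been computed (for example with a Welch estimator) and covers only the slope fit. The function name `spectral_alpha` and its parameters are hypothetical.

```python
import math

def spectral_alpha(freqs, psd, f_max=0.02):
    """Estimate alpha as minus the least squares slope of log10(psd) vs. log10(freq)."""
    # Keep only positive frequencies at or below f_max with positive power.
    pts = [(math.log10(f), math.log10(p))
           for f, p in zip(freqs, psd) if 0 < f <= f_max and p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two spectral points at or below f_max")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    return -slope
```

Applied to an exact power law such as p(f) = 3/f^1.2 sampled on a frequency grid with 1/2048 Hz spacing, the function returns 1.2 up to rounding error. On measured spectra the estimate depends on how many spectral points fall below the 0.02 Hz cutoff.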

Classification of AF, Healthy, and CHF Subjects and Performance Indices
For each RR interval series, we obtain two kinds of equivalent α, namely an AE-based equivalent $\alpha_{AE}$ and a spectrum-based equivalent $\alpha_{sp}$, as described in the sections on the AE-based and the spectrum-based equivalent 1/fα noise (Sections 2.5 and 2.6), respectively.
Quadratic discriminant analysis (QDA) was applied to classify the equivalent α of the RR interval series of the three groups [17]. In the training phase, we calculated the mean and the standard deviation of all the α associated with each group. This allowed the fitting of a Gaussian distribution to the distribution of α in each group. Moreover, three different Gaussian curves, corresponding to the three groups, were obtained. Consequently, the two intersection points of the three Gaussian curves determined the two thresholds for the classification of the equivalent α into three groups.
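The threshold between two neighboring groups can be sketched as the point where their fitted Gaussian curves intersect. Equating the two log-densities gives a quadratic equation in α. The sketch assumes that the curves are unweighted (equal priors), which the text does not state explicitly, and that the two means differ. The function names are hypothetical, and the group ordering in `classify` follows the reported result that AF subjects have the smallest and CHF subjects the largest equivalent α.

```python
import math

def gaussian_threshold(m1, s1, m2, s2):
    """Point between the means where N(m1, s1) and N(m2, s2) have equal density."""
    # Equating the two log-densities gives a*x^2 + b*x + c = 0.
    a = 1 / (2 * s1**2) - 1 / (2 * s2**2)
    b = m2 / s2**2 - m1 / s1**2
    c = m1**2 / (2 * s1**2) - m2**2 / (2 * s2**2) + math.log(s1 / s2)
    if abs(a) < 1e-12:  # equal spreads: a single crossing at the midpoint
        return -c / b
    disc = math.sqrt(b * b - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    lo, hi = sorted((m1, m2))
    inside = [r for r in roots if lo <= r <= hi]
    # Fall back to the root nearest the midpoint if neither lies between the means.
    return inside[0] if inside else min(roots, key=lambda r: abs(r - (m1 + m2) / 2))

def classify(alpha, t_af_healthy, t_healthy_chf):
    """Assign a group from an equivalent alpha and the two thresholds."""
    if alpha < t_af_healthy:
        return "AF"
    if alpha < t_healthy_chf:
        return "healthy"
    return "CHF"
```

When the two groups have equal standard deviations, the quadratic term vanishes and the threshold reduces to the midpoint of the two means.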
In the testing phase, we adopted the following three commonly used indices under leave-one-out cross validation (LOOCV) of multi-class classification problems [18]:

macro-average recall:
$$Recall_M = \frac{1}{3} \sum_{i=1}^{3} \frac{TP_i}{TP_i + FN_i},$$

macro-average precision:
$$Precision_M = \frac{1}{3} \sum_{i=1}^{3} \frac{TP_i}{TP_i + FP_i},$$

macro-average F score:
$$F_M = \frac{2 \cdot Precision_M \cdot Recall_M}{Precision_M + Recall_M},$$

where i, ranging from 1 to 3, indicates one of the three groups of CHF, healthy, or AF patients. $TP_i$, $TN_i$, $FP_i$, and $FN_i$ represent true positive, true negative, false positive, and false negative, respectively. Finally, the 95% confidence interval (CI) for $Recall_M$, $Precision_M$, and $F_M$ was obtained by using the bootstrap method with resampling carried out 10,000 times.
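The three indices can be computed directly from a confusion matrix, as in the sketch below. It assumes the common convention in which the macro-average F score is formed from the macro-averaged precision and recall. The function name `macro_metrics` and the row/column layout of `confusion` are hypothetical.

```python
def macro_metrics(confusion):
    """Macro-average recall, precision, and F score.

    confusion[i][j] is the number of class-i series predicted as class j.
    """
    k = len(confusion)
    recalls, precisions = [], []
    for i in range(k):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                          # class i, predicted otherwise
        fp = sum(confusion[r][i] for r in range(k)) - tp     # other classes, predicted i
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
    recall_m = sum(recalls) / k
    precision_m = sum(precisions) / k
    total = precision_m + recall_m
    f_m = 2 * precision_m * recall_m / total if total else 0.0
    return recall_m, precision_m, f_m
```

For a toy matrix `[[8, 2, 0], [1, 9, 0], [0, 0, 10]]`, the per-class recalls are 0.8, 0.9, and 1.0, so the macro-average recall is 0.9. The matrix is illustrative only and is not taken from Table 1.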

Figure 3a illustrates the EoE (τ = 5) vs. AE (τ = 14) of the 15 CHF, the 18 healthy, and the 26 AF sets of short RR interval series (500 data points), as well as those of the simulated 1/fα noises with α ranging from zero to three at ∆α = 0.01. Figure 3b illustrates the EoE (τ = 14) vs. AE (τ = 14) of the same data, but with a long data size (50,000 data points). For each α, 30 simulations of normalized 1/fα noises were generated. The values of EoE and AE of every normalized 1/fα noise were separately evaluated to obtain their averages. Consequently, a pair of averaged EoE and AE values was obtained for each α. From the inverted U curve of the EoE vs. AE plot of the RR interval series, it can be seen that EoE is a good measure of complexity to separate the healthy from the pathologic (CHF and AF). AE is a good measure of disorder to differentiate the three groups of RR interval series. On the other hand, the EoE vs. AE plot of the 1/fα noises exhibits a distinct inverted U curve, which resembles that of the RR interval series but with lower EoE. The EoE value reaches its maximum at α around 1.50, which corresponds to an AE value of 1.45. In this respect, there appears to be a one-to-one relation between AE and α.
The solid curve in Figure 4 illustrates the relation between AE (τ = 14) and α for the 301 simulated 1/ f α noises for α between zero and three at ∆α = 0.01. It can be seen that α decreases monotonically as AE increases, which implies that α can be a measure of disorder like AE.
A 1/fα noise and an RR interval series are considered equivalent when they have the same AE. Thus, we may assign an AE-based equivalent $\alpha_{AE}$ to an RR interval series. It can be seen that the values of the equivalent $\alpha_{AE}$ of the healthy subjects range between 1.32 and 1.71, while those of the CHF and the AF subjects are larger than 1.71 and smaller than 1.32, respectively.
Table 1 lists the confusion matrices of LOOCV in differentiating the 59 RR interval series into CHF, healthy, and AF groups using (a) $\alpha_{AE}$ and (b) $\alpha_{sp}$. Note that the classification performances using $\alpha_{AE}$ with short and long data sets are the same. Table 2 lists the $Recall_M$, $Precision_M$, and $F_M$ for the same analysis. The 95% confidence interval (CI) for the indices was obtained by using the bootstrap method with resampling carried out 10,000 times. In comparison, the $\alpha_{AE}$ of an RR interval series exhibits a better classification performance than $\alpha_{sp}$.

Discussion
We have shown that the brown noise at α = 2 resembles the CHF RR interval series. The pink noise at α = 1 resembles the AF RR interval series, while the white noise at α = 0 is more disordered than the AF RR interval series. These results, based on AE, are different from previous studies [10][11][12][13][19].
Previous studies explored the long-range correlation properties [19] of the cardiac RR interval series over the low-frequency domain [10][11][12][13]. Thus, 1/f fluctuations were found in healthy heart rates. However, the analysis requires relatively long data sets to quantify the long-term properties.
On the other hand, AE analysis provides the short-term stability of a short time series. This allows us to explore a reliable correspondence via AE between the simulated 1/fα noises and the real RR interval series. Figure 6 demonstrates the similarity in the waveforms between three very short RR interval time series and their equivalent 1/fα noises. Figure 6a illustrates three simulated 1/fα noises with the same length of 80 points and α at 2.0, 1.5, and 1.0. Figure 6b illustrates three typical RR interval series with a length of 80 points from a CHF, a healthy, and an AF subject. For the three pairs of RR interval time series and 1/fα noise of different α in Figure 6a,b, the corresponding AE values at τ = 5 are 0.29, 1.00, and 1.41 from top to bottom, respectively. It can be seen that the two waveforms in each of the three pairs look very similar.

Conclusions
We calculated the EoE and AE of 1/ f α noise time series with 50,000 data points for 0 ≤ α ≤ 3. The plot of EoE vs. AE exhibits a distinct inverted U curve. The feature of the inverted U curve remains valid with 80 short data points. We have also shown that α decreases monotonically as AE increases, which indicates that α is also a measure of disorder.
We plotted the EoE vs. AE of 59 cardiac RR interval series, which resembles the inverted U curve of 1/fα noises but with larger EoE. A 1/fα noise and a cardiac RR interval series are considered equivalent when they have the same AE. Accordingly, we found that the 1/fα noise resembles a healthy RR interval series when α is around 1.5. The brown noise at α = 2 resembles the CHF RR interval series. The pink noise at α = 1 resembles the AF RR interval series, while the white noise at α = 0 is more disordered than the AF RR interval series. These results, based on AE, are different from the previous ones based on spectral analysis. We have shown that our AE-based results are superior in terms of their accuracy in differentiating between CHF, AF, and healthy groups.
There are two main problems that are worth further exploration. The first problem lies in the clarification of why the equivalent α values of RR interval series obtained from AE and spectral analyses are so different. The second problem is that the EoE values of the 1/ f α noises are slightly lower than those of their corresponding AE-based equivalent RR interval series. This implies that differences between the two signals still exist. We believe that exploring a new model for a different simulated signal to fit the cardiac RR interval series better would solve the above two problems.

Conflicts of Interest:
The authors declare no conflicts of interest.