The Systematic Bias of Entropy Calculation in the Multi-Scale Entropy Algorithm

Entropy indicates the irregularity or randomness of a dynamic system. Over the decades, entropy calculated at different scales of the system through subsampling or coarse graining has been used as a surrogate measure of system complexity. One popular multi-scale entropy analysis is multi-scale sample entropy (MSE), which calculates entropy through the sample entropy (SampEn) formula at each time scale. SampEn is defined by the “logarithmic likelihood” that a small section of the data (within a window of length m) that “matches” other sections will still “match” them when the window length increases by one. A “match” is defined by a threshold of r times the standard deviation of the entire time series. A problem with the current MSE algorithm is that the SampEn calculations at all scales use the same matching threshold, defined from the original time series, even though the data standard deviation actually changes with the subsampling scale. Using a fixed threshold therefore introduces a systematic bias into the calculation results. The purpose of this paper is to present this systematic bias mathematically and to provide methods for correcting it. Our work will help the large MSE user community avoid introducing this bias into their multi-scale SampEn calculations.


Introduction
Complexity is an important property of complex systems such as living organisms, the Internet, and traffic networks. Measuring system complexity has long been of great interest in many research fields. Since complexity remains elusive to define, a few approximate metrics have been used to quantify it. One widely used measure is entropy, which quantifies irregularity or randomness. Complexity and entropy, however, diverge once complexity reaches its peak: before the peak, complexity increases with entropy, but after the peak complexity decreases as entropy continues to grow. To provide an approximate solution to this dilemma, many empirical measures have been proposed. A popular one is the multi-scale entropy (MSE) proposed by Costa et al. [1]. MSE is based on sample entropy (SampEn) [2,3], an extension of the well-known approximate entropy (ApEn) [3,4] that removes the self-matching-induced bias. SampEn has gained popularity in applications such as neurophysiological data analysis [5] and functional MRI data analysis [6,7] because of its relative insensitivity to data length [2,8]. Because a complex signal often presents self-similarity when observed at different time scales, Costa et al. first applied SampEn to the same signal at different time scales after coarse graining. When applied to Gaussian noise and 1/f noise, it was observed that the SampEn of Gaussian noise decreases with the subsampling scale while it stays at roughly the same level over most scales of a 1/f process. Since a 1/f process is known to have higher complexity (defined by its higher self-similarity) than Gaussian noise, the diverging MSE curves of 1/f noise and Gaussian noise appear to support that MSE may provide an approximate approach to measuring system complexity. Since its introduction, MSE has been widely used in many different applications, as reflected by thousands of paper citations [1,9].
While MSE and its variants have been shown to be effective for differentiating system states in simulations and real data, the algorithm introduces a bias by using the same threshold for identifying repeated transient states at all time scales. Nikulin and Brismar [10] first observed that MSE measures not purely entropy but both entropy and variation at different scales. We claim here that the changing variation captured by MSE is mainly caused by an incomplete scaling during the coarse-graining process, and that the subsequent variance-change-induced entropy change should be considered a systematic bias to be removed.
The rest of this report is organized as follows. Section 2 provides the background: to better understand the series of entropy formulations, we introduce Shannon entropy, ApEn, SampEn, and MSE. Section 3 describes the bias caused by the coarse-graining process and the one-threshold-for-all-scales MSE algorithm; both a mathematical solution and a practical solution are provided to correct the bias. The final section concludes the paper.

Entropy and MSE
This section provides a brief history about the evolution of entropy and approximate entropy measures.
Hartley and Nyquist first used the logarithm to quantify information [11,12]. Shannon then proposed the concept of Shannon entropy as a measure of information through the sum of the logarithmically weighted probabilities [13]. Denoting a discrete random variable by X and its probability mass function by p(x), the Shannon entropy of X is formulated as

H(X) = -\sum_x p(x) \log p(x) = -E[\log p(X)].

In an analogous manner, Shannon defined the entropy of a continuous distribution with probability density function (pdf) p(x) by

H(X) = -\int p(x) \log p(x)\, dx = -E[\log p(X)],

where E represents the expectation operator. Without loss of generality, in this paper we use natural logarithms to calculate entropy. When entropy is calculated via a logarithm to base b, it can be obtained as H_b(X) = \frac{1}{\log b} H(X).

Shannon entropy was then extended into the Kolmogorov-Sinai (K-S) entropy [14] for characterizing a dynamic system. Assume that the F-dimensional phase space is partitioned into a collection of cells of size r^F and that the state of the system is measured at constant time intervals δ. Let p(c_1, ..., c_n) be the joint probability that the state of the system x(t = δ) is in cell c_1, x(t = 2δ) is in cell c_2, ..., and x(t = nδ) is in cell c_n. The K-S entropy is defined as

H_{KS} = -\lim_{\delta \to 0} \lim_{r \to 0} \lim_{n \to \infty} \frac{1}{n\delta} \sum_{c_1,...,c_n} p(c_1,...,c_n) \log p(c_1,...,c_n).

K-S entropy depends on several parameters and is not easy to estimate. To solve this problem, Grassberger and Procaccia [15] proposed the K_2 entropy as a lower bound of the K-S entropy. Given a time series U = {u_1, u_2, ..., u_N} with length N, define a sequence of m-dimensional embedding vectors v_i = (u_i, u_{i+1}, ..., u_{i+m-1}) and the correlation integral

C_m(r) = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i,j} \theta(r - \|v_i - v_j\|),

where θ(·) is the Heaviside step function. The K_2 entropy is defined as

K_2 = \lim_{r \to 0} \lim_{m \to \infty} \log \frac{C_m(r)}{C_{m+1}(r)}.

By incorporating the embedding-vector-based phase space reconstruction idea proposed by Takens [16] and replacing the Euclidean metric with the Chebyshev metric \|v_i - v_j\| = \max_{h=0}^{m-1} |u_{i+h} - u_{j+h}|, Eckmann and Ruelle [17] proposed an estimate of the K-S entropy through the so-called E-R entropy:

\Phi^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \log C_i^m(r), \qquad H_{ER} = \lim_{r \to 0} \lim_{m \to \infty} \lim_{N \to \infty} \left[\Phi^m(r) - \Phi^{m+1}(r)\right],

where C_i^m(r) is the fraction of embedding vectors v_j within Chebyshev distance r of v_i, and the delay is often set to δ = 1.
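As a concrete illustration of the discrete Shannon entropy formula above, the sum can be computed directly (a minimal sketch; the function name `shannon_entropy` is ours, not from the paper):

```python
import math

def shannon_entropy(probs, base=math.e):
    """H(X) = -sum_x p(x) * log p(x); zero-probability terms contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries log(2) nats (i.e., 1 bit) of information,
# while a deterministic outcome carries none.
fair_coin_bits = shannon_entropy([0.5, 0.5], base=2)
deterministic = shannon_entropy([1.0])
```

The `base` argument reflects the H_b(X) = (1/log b) H(X) conversion stated above: changing the logarithm base only rescales the entropy.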
The E-R entropy has been useful in classifying low-dimensional chaotic systems, but it becomes infinite for a process with superimposed noise of any magnitude [18]. Pincus [4] then reduced the E-R entropy to the now well-known ApEn, which depends on a given embedding window length m and a distance cutoff r for the Heaviside function:

ApEn(U; m, r) = \Phi^m(r) - \Phi^{m+1}(r), \qquad ApEn(m, r) = \lim_{N \to \infty} ApEn(U; m, r),

where N is the length of the discrete signal U.
SampEn was proposed by Richman and Moorman [19] as a modification of ApEn that avoids the bias induced by counting the self-match of each embedding vector. Specifically, denoting by B(r) the number of pairs of distinct m-length embedding vectors within Chebyshev distance r of each other and by A(r) the corresponding number of (m+1)-length pairs, SampEn is formulated as

SampEn(U; m, r) = -\log \frac{A(r)}{B(r)},

and, for fixed m and r, SampEn(m, r) = \lim_{N \to \infty} SampEn(U; m, r), where N is the length of the discrete signal U.
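A direct O(N²) implementation of this ratio can be sketched as follows. The helper names are ours, and the strict-inequality match convention (`< r` rather than `≤ r`) varies across published implementations:

```python
import numpy as np

def sampen(u, m=2, r=0.2):
    """Sample entropy: -log(A/B), where B counts pairs of distinct m-length
    templates within Chebyshev distance r and A counts the corresponding
    (m+1)-length pairs. Self-matches (i == j) are excluded by construction.
    r is in the units of u (typically a multiple of its SD)."""
    u = np.asarray(u, dtype=float)
    N = len(u)

    def count_pairs(mm):
        # All overlapping templates of length mm (N - mm of them, so every
        # m-template has a corresponding (m+1)-template).
        t = np.array([u[i:i + mm] for i in range(N - mm)])
        c = 0
        for i in range(len(t) - 1):
            d = np.max(np.abs(t[i + 1:] - t[i]), axis=1)  # Chebyshev distance
            c += int(np.sum(d < r))
        return c

    B = count_pairs(m)
    A = count_pairs(m + 1)
    return -np.log(A / B)
```

For unit-variance white noise with r = 0.2 × SD, this returns values near the theoretical −log Pr(|ξ − η| < r) ≈ 2.2 derived later in the paper.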
The coarse-graining multi-scale entropy-based complexity measurement can be traced back to the work by Zhang [20] and Fogedby [21]. In [1,22], Costa et al. calculated entropy at each coarse-grained scale using SampEn and named this procedure the MSE. As noted by Nikulin and Brismar [10], a problem of the MSE algorithm is the use of the same matching criterion r for all scales, which introduces a systematic bias into SampEn.

The Systematic Bias of Entropy Calculation in MSE
In MSE [1,22], the embedding vector matching threshold r is defined by the standard deviation of the original signal. With this fixed threshold, the entropy of a Gaussian signal decreases with the scale used to downsample the original signal. By contrast, the entropy of a 1/f signal remains unchanged as the scale increases. As a 1/f signal is known to have high complexity while Gaussian noise has very low complexity, the monotonic MSE decay trend, or the sum of MSE over scales, has been proposed as a metric for quantifying signal complexity.
However, the moving-average-based coarse-graining process automatically scales down the subsampled signal at each time scale. Without correction, this additional multiplicative scaling propagates into the standard deviation of the signal assessed at each time scale and artificially changes the sample entropy. This bias can easily be seen from the coarse graining of Gaussian noise.
Denote a Gaussian variable and its observations by X = {x_1, x_2, ..., x_N}, where N indicates the length of the time series. The coarse-graining or moving-averaging process is

y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x_i, \qquad 1 \le j \le N/\tau,

where τ > 0 is the coarse-graining level, or the so-called "scale". Given the mutual independence of the individual samples of X, the moving average of these samples can be considered an average of independent random variables rather than of observations of a particular random variable.
In other words, we can rewrite the coarse-grained value as

Y^{(\tau)} = \frac{1}{\tau}(X_1 + X_2 + \cdots + X_\tau),

where each X_i is an independent copy of X. For Gaussian noise X, each X_i is Gaussian too and can be fully characterized by the same mean µ and standard deviation (SD) σ. Through a simple mathematical operation, we get SD(Y^{(\tau)}) = \sigma/\sqrt{\tau}. Because SD(Y^{(\tau)}) monotonically decreases with τ, if we do not adjust the matching threshold, the number of matched embedding vectors will increase with τ, resulting in a decreasing SampEn.
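This 1/√τ shrinkage of the SD is easy to verify numerically (a sketch; `coarse_grain` is our helper name):

```python
import numpy as np

def coarse_grain(x, tau):
    """Non-overlapping block averages: y_j = mean of x[(j-1)*tau : j*tau]."""
    n = len(x) // tau
    return np.asarray(x[:n * tau], dtype=float).reshape(n, tau).mean(axis=1)

rng = np.random.default_rng(42)
x = rng.standard_normal(200_000)  # Gaussian noise, sigma = 1
for tau in (1, 2, 5, 10):
    # SD(Y^(tau)) * sqrt(tau) should stay close to sigma = 1 for every tau.
    print(tau, np.std(coarse_grain(x, tau)) * np.sqrt(tau))
```

Each printed value stays near 1, confirming SD(Y^(τ)) = σ/√τ for i.i.d. Gaussian samples.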
The entropy of a Gaussian-distributed variable can be calculated through Shannon entropy:

H(X) = \frac{1}{2} \log(2\pi e \sigma^2).

For simplicity of description, we normalize the random variable to have µ = 0 and σ = 1. Considering the scale-dependent SD derived above, the Shannon entropy of the Gaussian variable at scale τ is

H(Y^{(\tau)}) = \frac{1}{2} \log\!\left(\frac{2\pi e}{\tau}\right) = \frac{1}{2} \log(2\pi e) - \frac{1}{2} \log \tau.

This equation clearly demonstrates the nonlinear but monotonically decreasing relationship of entropy with respect to the scale τ.
Below, we provide a mathematical derivation of the dependence of MSE on the signal subsampling scale. Given the m-dimensional embedding vectors Z_m(i) = (Y_i, Y_{i+1}, ..., Y_{i+m-1}), sample entropy can be expressed as [22]

SampEn(Y; m, r) = -\log \frac{\Pr(\|Z_{m+1}(i) - Z_{m+1}(j)\| < r)}{\Pr(\|Z_m(i) - Z_m(j)\| < r)},

where \|\cdot\| is the Chebyshev distance. For m = 1, we have

\|Z_2(i) - Z_2(j)\| < r \iff |Y_i - Y_j| < r \ \text{and}\ |Y_{i+1} - Y_{j+1}| < r.

Thus, based on the i.i.d. condition of the Y_j,

\Pr(\|Z_2(i) - Z_2(j)\| < r) = \Pr(|Y_i - Y_j| < r)\,\Pr(|Y_{i+1} - Y_{j+1}| < r),

and we can draw the conclusion that

SampEn(Y; 1, r) = -\log \Pr(|Y_{i+1} - Y_{j+1}| < r).

If m ≥ 2, we can similarly get

\Pr(\|Z_{m+1}(i) - Z_{m+1}(j)\| < r) = \Pr(\|Z_m(i) - Z_m(j)\| < r)\,\Pr(|Y_{i+m} - Y_{j+m}| < r),

and therefore

SampEn(Y; m, r) = -\log \Pr(|Y_{i+m} - Y_{j+m}| < r),

given the mutual independence of the Y_j. It should be noted that this conclusion does not require identical distributions; the condition of independence is sufficient. For simplicity of description, we re-denote Y_{j+m} and Y_{i+m} by two general, normally distributed but independent random variables ξ and η whose means are 0 and SDs are 1. Their joint probability density function (PDF) is

f(\xi, \eta) = \frac{1}{2\pi} e^{-(\xi^2 + \eta^2)/2}.

We can then get

\Pr(|\xi - \eta| < r) = \iint_{|\xi - \eta| < r} f(\xi, \eta)\, d\xi\, d\eta.

Similar to the Shannon entropy calculation, after normalizing the random variable to µ = 0 and σ = 1, the scale-dependent SD of the coarse-grained signal is SD(Y^{(\tau)}) = 1/\sqrt{\tau}, and therefore

\Pr(|Y^{(\tau)}_{i+m} - Y^{(\tau)}_{j+m}| < r) = \Pr(|\xi - \eta| < r\sqrt{\tau}).

Since the integration region |\xi - \eta| < r\sqrt{\tau} grows with τ, the above integral monotonically increases with τ. Accordingly, the negative-logarithm-based sample entropy SampEn(Y^{(\tau)}; m, r) monotonically decreases with τ. This is consistent with the aforementioned Shannon entropy-based description of the MSE bias.
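For standard normal ξ and η, the difference ξ − η is N(0, 2), so the double integral above has the closed form Pr(|ξ − η| < a) = erf(a/2). A small numerical sketch of the resulting bias follows (the function names are ours):

```python
import math

def match_prob(r, tau):
    """Pr(|Y_i - Y_j| < r) for coarse-grained unit-variance Gaussian noise:
    Y_i - Y_j ~ N(0, 2/tau), equivalently Pr(|xi - eta| < r * sqrt(tau))
    with standard normal xi, eta, which equals erf(r * sqrt(tau) / 2)."""
    return math.erf(r * math.sqrt(tau) / 2.0)

def theoretical_sampen(r, tau):
    """-log of the single-coordinate matching probability: the theoretical
    SampEn of coarse-grained Gaussian noise under a FIXED threshold r."""
    return -math.log(match_prob(r, tau))

# With r fixed (e.g., 0.15), the matching probability grows with tau,
# so the uncorrected SampEn decays even though the signal stays pure noise.
```

Evaluating `theoretical_sampen(0.15, tau)` for increasing τ reproduces the monotonic decay derived above.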
The systematic bias in MSE can be corrected by using a scale-adaptive matching threshold. One approach is to use SD^{(\tau)} = SD^{(1)}/\sqrt{\tau} for scale τ during the SampEn(Y^{(\tau)}; m, r) calculation. This works well for Gaussian signals but may not be effective for other signals if they have extra scale-dependent SD behavior beyond that induced by the subsampling scale; finding the theoretical scale-dependent SD equation may not be trivial either. Instead, the SD can be calculated directly from the data after each coarse graining. This approach has been proposed in [10].
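The data-driven correction can be sketched as follows: recompute the SD, and hence the threshold r, from each coarse-grained series instead of reusing the scale-1 value. All names here are ours, and the SampEn routine is a plain O(N²) sketch:

```python
import numpy as np

def coarse_grain(x, tau):
    n = len(x) // tau
    return np.asarray(x[:n * tau], dtype=float).reshape(n, tau).mean(axis=1)

def sampen(u, m, r):
    u = np.asarray(u, dtype=float)
    N = len(u)
    def count(mm):
        t = np.array([u[i:i + mm] for i in range(N - mm)])
        return sum(int(np.sum(np.max(np.abs(t[i + 1:] - t[i]), axis=1) < r))
                   for i in range(len(t) - 1))
    return -np.log(count(m + 1) / count(m))

def mse(x, scales, m=2, r_factor=0.15, corrected=True):
    """Multi-scale SampEn. corrected=True recomputes the threshold from the
    SD of each coarse-grained series (the data-adaptive bias correction);
    corrected=False reuses r_factor * SD of the original signal everywhere."""
    r_fixed = r_factor * np.std(x)
    curve = []
    for tau in scales:
        y = coarse_grain(x, tau)
        r = r_factor * np.std(y) if corrected else r_fixed
        curve.append(sampen(y, m, r))
    return curve
```

For Gaussian noise, the corrected curve is roughly flat across scales, while the uncorrected one decays with τ, mirroring the theoretical derivation above.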
To demonstrate the systematic bias of MSE and the effectiveness of the correction method, we used three synthetic time series with known entropy differences: Gaussian noise, 1/f noise, and a random walk. The length of each time series was N = 2 × 10^4. MSE with and without bias correction was performed. Figure 1 shows the results of MSE with and without bias correction for the three time series (Figure 1a). The parameters used for the SampEn calculation were m = 2 and r = 0.15 × SD. Without bias correction, MSE produced a monotonically decaying SampEn for Gaussian noise as the scale increases. By contrast, the SampEn of Gaussian noise stays at the same level across scales after bias correction. The SD bias showed only minor effects on the SampEn calculation for both the 1/f noise and the random walk; correcting the bias did not dramatically change their SampEn at different scales.

Discussion and Conclusions
We provided a full mathematical derivation of the systematic bias in MSE introduced by the coarse-graining process. We then used synthetic data to demonstrate the bias and its correction using a dynamic SD calculation. Bias correction for the Gaussian data MSE calculation works exactly as described by the theoretical derivations given in this paper. The systematic bias does not appear to be a big issue for temporally correlated processes such as 1/f noise and the random walk. This is because the variance of a temporally correlated process does not change much with the subsampling process as long as the sampling rate is still higher than the maximal frequency. According to [23], both 1/f noise and the random walk can be considered special cases of the autoregressive integrated moving average (ARIMA) model. As we derive in Appendix A, an ARIMA model is still an ARIMA model after coarse graining, given the condition that the residuals at different time points are independently and identically distributed (i.i.d.) Gaussian noise. In other words, for such processes the moving-averaging process will not substantially change the signal variance and will not change SampEn.
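As a quick numerical check of this point (a sketch; the `coarse_grain` helper is ours), block averaging a random walk leaves its overall SD nearly untouched, unlike the 1/√τ shrinkage seen for white noise:

```python
import numpy as np

def coarse_grain(x, tau):
    n = len(x) // tau
    return np.asarray(x[:n * tau], dtype=float).reshape(n, tau).mean(axis=1)

rng = np.random.default_rng(0)
noise = rng.standard_normal(20_000)
walk = np.cumsum(noise)  # random walk: strongly temporally correlated

ratio_walk = np.std(coarse_grain(walk, 5)) / np.std(walk)
ratio_noise = np.std(coarse_grain(noise, 5)) / np.std(noise)
# ratio_walk stays near 1; ratio_noise drops toward 1/sqrt(5)
```

This is why the fixed-threshold bias is minor for the correlated signals in Figure 1 but pronounced for Gaussian noise.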
While we only showed results based on one particular set of SampEn calculation parameters, we include additional figures in Appendix B showing that the bias and the bias correction hold for other parameters. We did not show the effects of bias correction on real data, but the results shown for the synthetic data should generalize to real applications, since both the mathematical derivations and the correction process are independent of any specific data and are general to any dynamic system.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:
MSE - multi-scale sample entropy
SampEn - sample entropy
ApEn - approximate entropy
ARIMA - autoregressive integrated moving average

Appendix A. The Coarse-Grained ARIMA Process
Assume that a time series {X_i} can be modeled by an ARIMA(p, d, q) process:

\left(1 - \sum_{k=1}^{p} \phi_k B^k\right)(1 - B)^d X_i = \left(1 + \sum_{k=1}^{q} \theta_k B^k\right) \varepsilon_i,

where B is the backshift operator and the residuals ε_i are i.i.d. Gaussian. Let

Y^{(\tau)}_j = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} X_i, \qquad \varepsilon^{(\tau)}_j = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} \varepsilon_i,

where τ is the scale. Since the ε_i are i.i.d. Gaussian, the block averages {ε^{(τ)}_j} are also i.i.d. Gaussian, and averaging both sides of the ARIMA equation over each block of length τ yields

\left(1 - \sum_{k=1}^{p} \phi_k B^k\right)(1 - B)^d Y^{(\tau)}_j = \left(1 + \sum_{k=1}^{q} \theta_k B^k\right) \varepsilon^{(\tau)}_j.

This proves that, for any scale τ, {Y^{(τ)}_j} is also an ARIMA(p, d, q) process.

Appendix B. Numerical Results on Different m and r
The following Figures A1-A8 show additional MSE calculation results for different SampEn parameters m and r, with and without bias correction. N denotes the length of the time series at scale 1. These figures confirm the systematic bias of the original MSE algorithm for different m and r, and show that the data-adaptive correction successfully removed the bias for all assessed signals.