Abstract
Sample entropy, an approximation of the Kolmogorov entropy, was proposed to characterize the complexity of a time series. It is essentially defined as $-\ln(A/B)$, where $B$ denotes the number of matched template pairs with length $m$ and $A$ denotes the number of matched template pairs with length $m+1$, for a predetermined positive integer $m$. It has been widely used to analyze physiological signals. Because computing sample entropy is time consuming, the box-assisted, bucket-assisted, x-sort, assisted sliding box, and kd-tree-based algorithms were proposed to accelerate its computation. These algorithms require $O(N^2)$ or $O(N^{2-\frac{1}{m+1}})$ computational complexity, where $N$ is the length of the time series analyzed; when $N$ is large, their computational costs are substantial. We propose a fast Monte-Carlo-based algorithm to estimate sample entropy, whose computational cost is independent of $N$ (the length of the time series) and whose estimate converges to the exact sample entropy as the number of repeated experiments becomes large. The convergence rate of the algorithm is also established. Numerical experiments are performed for electrocardiogram time series, electroencephalogram time series, cardiac inter-beat time series, mechanical vibration signals (MVS), meteorological data (MD), and noise. Numerical results show that the proposed algorithm can gain 100–1000 times speedup compared to the kd-tree and assisted sliding box algorithms while providing satisfactory approximation accuracy.
1. Introduction
Kolmogorov entropy is a well-suited measure of the complexity of dynamical systems containing noise. Approximate entropy (AppEn), proposed by Pincus [1], is an approximation of the Kolmogorov entropy. To overcome the bias of AppEn caused by self-matching, Richman and Moorman proposed sample entropy (SampEn) [2] in 2000. SampEn is essentially defined as $-\ln(A/B)$, where $B$ denotes the number of matched template pairs with length $m$ and $A$ denotes the number of matched template pairs with length $m+1$. SampEn has prevailed in many areas, such as cyber-physical systems, mechanical systems, health monitoring, disease diagnosis, and control. Based on AppEn and SampEn, multiscale entropy [3] and hierarchical entropy [4] were developed for measuring the complexity of physiological time series at multiple time scales. Since low-frequency filters are involved, multiscale entropy can weaken the influence of meaningless structures such as noise on complexity measurement. By adding the sample entropy of the high-frequency components of the time series, hierarchical entropy provides more comprehensive and accurate information and improves the ability to distinguish different time series. Multiscale entropy, hierarchical entropy, and their variants have been applied to various fields beyond physiological time series analysis, such as fault identification [5,6] and feature extraction [7].
Computing SampEn requires counting the number of similar templates of a time series, that is, counting the number of matched template pairs. Direct computation of SampEn therefore requires $O(N^2)$ operations, where $N$ is the length of the time series analyzed. To accelerate the computation of SampEn, kd-tree-based algorithms were proposed, which reduce the time complexity to $O(N^{2-\frac{1}{m+1}})$, where $m$ is the template (also called pattern) length [8,9]. In addition, box-assisted [10,11], bucket-assisted [12], lightweight [13], and assisted sliding box (SBOX) [14] algorithms were developed. However, the worst-case complexity of all these algorithms remains $O(N^2)$. Recently, an algorithm proposed in [15] for computing approximate values of AppEn and SampEn, without theoretical error analysis, still requires $O(N^2)$ computational costs in the worst scenario, even though it requires substantially fewer operations in certain best cases. Developing fast algorithms for estimating SampEn is thus still of great interest.
The goal of this study is to develop a Monte-Carlo-based algorithm for calculating SampEn. The most costly step in computing SampEn is to compute the ratio $A/B$ of matched templates of length $m+1$ over those of length $m$. Noting that $B$ (resp. $A$), after normalization, is the probability that two templates of length $m$ (resp. $m+1$) are matched, the ratio $A/B$ can be regarded as a conditional probability. From this viewpoint, we can approximate this conditional probability of the original data set by that of a data set randomly down-sampled from the original one. Specifically, we randomly select $N_0$ templates of length $m$ and $N_0$ templates of length $m+1$ from the original time series. We then count the number $B_k$ (resp. $A_k$) of matched pairs among the selected templates of length $m$ (resp. $m+1$). We repeat this process $N_1$ times and compute the mean $\tilde{B}$ (resp. $\tilde{A}$) of $B_k$ (resp. $A_k$). Then, we use $-\ln(\tilde{A}/\tilde{B})$ to approximate $-\ln(A/B)$ for the time series to measure its complexity. We establish the computational complexity and convergence rate of the proposed algorithm. We then study its performance by comparing it with the kd-tree-based algorithm and the SBOX method on electrocardiogram (ECG) time series, electroencephalogram (EEG) time series, cardiac inter-beat (RR) time series, mechanical vibration signals (MVS), meteorological data (MD), and noise. Numerical results show that the proposed algorithm gains more than 100 times speedup compared to the SBOX algorithm (the most recent algorithm in the literature to the best of our knowledge) for moderately long time series, and more than 1000 times speedup for longer ones. Compared to the kd-tree algorithm, the proposed algorithm likewise achieves up to 1000 times speedup for long time series.
This article is organized in five sections. The proposed Monte-Carlo-based algorithm for estimating sample entropy is described in Section 2. Section 3 presents the main results of the analysis of the approximation accuracy of the proposed algorithm; the proofs are given in Appendix A. Numerical results are presented in Section 4, and concluding remarks are made in Section 5.
2. Sample Entropy via Monte Carlo Sampling
In this section, we describe a Monte-Carlo-based algorithm for estimating the sample entropy of a time series.
We first recall the definition of sample entropy. For all $k \in \mathbb{N}$, let $\mathbb{N}_k := \{0, 1, \dots, k-1\}$. The distance of two real vectors $u := (u_0, \dots, u_{k-1})$ and $v := (v_0, \dots, v_{k-1})$ of length $k$ is defined by
$$d(u, v) := \max_{i \in \mathbb{N}_k} |u_i - v_i|.$$
We let $x := (x_0, x_1, \dots, x_{N+m-1})$ be a time series of length $N + m$. For $i \in \mathbb{N}_N$, we let $x_i^m := (x_i, x_{i+1}, \dots, x_{i+m-1})$. We define a set $X$ of $N$ vectors by $X := \{x_i^m : i \in \mathbb{N}_N\}$, where $x_i^m$ is called a template of length $m$ for the time series $x$. We also define a set $Y$ of $N$ vectors by $Y := \{x_i^{m+1} : i \in \mathbb{N}_N\}$, where $x_i^{m+1}$ is called a template of length $m+1$ for $x$. To avoid confusion, we call the elements in $X$ and $Y$ the templates for the time series $x$. We denote by $|E|$ the cardinality of a set $E$. We use $B_i$, $i \in \mathbb{N}_N$, to denote the cardinality of the set consisting of templates $x_j^m$ satisfying $d(x_i^m, x_j^m) \le r$ and $j \ne i$, that is,
$$B_i := \left|\left\{ j \in \mathbb{N}_N : d(x_i^m, x_j^m) \le r,\ j \ne i \right\}\right|.$$
Likewise, for $i \in \mathbb{N}_N$, we let
$$A_i := \left|\left\{ j \in \mathbb{N}_N : d(x_i^{m+1}, x_j^{m+1}) \le r,\ j \ne i \right\}\right|.$$
Letting
$$A := \sum_{i \in \mathbb{N}_N} A_i \quad \text{and} \quad B := \sum_{i \in \mathbb{N}_N} B_i,$$
we define the sample entropy of the time series $x$ by
$$\mathrm{SampEn}(x, m, r) := -\ln\frac{A}{B}.$$
The definition of sample entropy yields the direct algorithm, which explicitly utilizes two nested loops, where the inner one computes $A_i$ and $B_i$ and the outer one accumulates $A$ and $B$. The counting subroutine, Algorithm 1, will be called repeatedly in the Monte-Carlo-based algorithm to be described later.
| Algorithm 1 Direct method for range counting |
Require: Sequence $x$, a subset $S = \{s_1, \dots, s_L\}$ of template indices, template length $m$, and threshold $r$.
1: procedure DirectRangeCounting($x$, $S$, $m$, $r$)
2:  Set $\mathrm{count} \leftarrow 0$
3:  Set $L \leftarrow |S|$
4:  for $i \leftarrow 1$ to $L$ do
5:   Set $u \leftarrow x_{s_i}^m$
6:   for $j \leftarrow 1$ to $L$ do
7:    Set $v \leftarrow x_{s_j}^m$
8:    if $j \ne i$ and $d(u, v) \le r$ then
9:     $\mathrm{count} \leftarrow \mathrm{count} + 1$
10: return $\mathrm{count}$
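For concreteness, the following Python sketch mirrors the counting step above under the Chebyshev distance. The function name, the NumPy dependency, and the choice to count each unordered pair once (the factor of two cancels in the ratio $A/B$) are our own illustrative choices, not part of the original algorithm specification.

```python
import numpy as np

def direct_range_counting(x, indices, m, r):
    """Count template pairs of length m, drawn from the starting positions in
    `indices`, whose Chebyshev distance is at most r (self-matches excluded).
    Each unordered pair is counted once; the ratio A/B is unaffected."""
    x = np.asarray(x, dtype=float)
    count = 0
    L = len(indices)
    for a in range(L):
        u = x[indices[a]:indices[a] + m]
        for b in range(a + 1, L):
            v = x[indices[b]:indices[b] + m]
            if np.max(np.abs(u - v)) <= r:
                count += 1
    return count
```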
The definition of sample entropy shows that it measures the predictability of data. Precisely, the ratio $A/B$ in the definition measures the conditional probability that, given that the distance of two templates $x_i^m$ and $x_j^m$ is less than or equal to $r$, the distance of their corresponding $(m+1)$-th components is also less than or equal to $r$. From this perspective, we can approximate this conditional probability of the original data set by computing it on a data set randomly down-sampled from the original one. To describe this method precisely, we define the following notation.
We choose a positive integer $N_0 \le N$, randomly draw $N_0$ numbers from $\mathbb{N}_N$ without replacement, and form an $N_0$-dimensional vector. All such vectors form a subset $\Omega$ of the product space
$$\mathbb{N}_N^{N_0} := \underbrace{\mathbb{N}_N \times \cdots \times \mathbb{N}_N}_{N_0},$$
that is,
$$\Omega := \left\{ \omega := (\omega_0, \dots, \omega_{N_0 - 1}) \in \mathbb{N}_N^{N_0} : \omega_i \ne \omega_j \text{ for } i \ne j \right\}.$$
Suppose that $\mathcal{F}$ is the power set of $\Omega$ (the set of all subsets of $\Omega$, including the empty set and $\Omega$ itself). We let $P$ be the uniform probability measure satisfying $P(\{\omega\}) = 1/|\Omega|$ for all $\omega \in \Omega$ and define the probability space $(\Omega, \mathcal{F}, P)$. The definition of $\Omega$ implies $|\Omega| = N!/(N - N_0)!$, and thus the probability measure satisfies $P(\{\omega\}) = (N - N_0)!/N!$ for all $\omega \in \Omega$. The definition of $\mathcal{F}$ means that all events that may occur in the sample space $\Omega$ are considered in the probability space $(\Omega, \mathcal{F}, P)$. We randomly select $N_0$ templates of length $m$ and $N_0$ templates of length $m+1$ from the original time series. For the $k$-th selection $\omega \in \Omega$, we count the number $B_k$ (resp. $A_k$) of matched pairs among the selected templates of length $m$ (resp. $m+1$). That is,
$$B_k(\omega) := \left|\left\{ (i, j) \in \mathbb{N}_{N_0} \times \mathbb{N}_{N_0} : d(x_{\omega_i}^m, x_{\omega_j}^m) \le r,\ i \ne j \right\}\right|$$
and
$$A_k(\omega) := \left|\left\{ (i, j) \in \mathbb{N}_{N_0} \times \mathbb{N}_{N_0} : d(x_{\omega_i}^{m+1}, x_{\omega_j}^{m+1}) \le r,\ i \ne j \right\}\right|.$$
We repeat this process $N_1$ times.
Note that $A_k$ and $B_k$ are random variables on the probability space $(\Omega, \mathcal{F}, P)$. Let $\tilde{A}$ and $\tilde{B}$ be the averages of the random variables $A_k$ and $B_k$, respectively, over the $N_1$ repeated processes. That is,
$$\tilde{A} := \frac{1}{N_1} \sum_{k \in \mathbb{N}_{N_1}} A_k(\omega_k) \quad \text{and} \quad \tilde{B} := \frac{1}{N_1} \sum_{k \in \mathbb{N}_{N_1}} B_k(\omega_k),$$
where $\{\omega_k : k \in \mathbb{N}_{N_1}\}$ is a subset of $\Omega$. With $\tilde{A}$ and $\tilde{B}$, we can estimate the sample entropy by computing $-\ln(\tilde{A}/\tilde{B})$. We summarize the procedure for computing $-\ln(\tilde{A}/\tilde{B})$ in Algorithm 2 and call it the Monte-Carlo-based algorithm for evaluating sample entropy (MCSampEn). In MCSampEn, the index vectors $\omega_k$, $k \in \mathbb{N}_{N_1}$, are selected by the Hidden Shuffle algorithm proposed in [16].
| Algorithm 2 Monte-Carlo-based algorithm for evaluating sample entropy |
Require: Sequence $x$, template length $m$, tolerance $r$, sample size $N_0$, number of experiments $N_1$, probability space $(\Omega, \mathcal{F}, P)$.
1: procedure MCSampEn($x$, $m$, $r$, $N_0$, $N_1$)
2:  Set $\tilde{A} \leftarrow 0$ and $\tilde{B} \leftarrow 0$
3:  for $k \leftarrow 1$ to $N_1$ do
4:   Select $\omega_k \in \Omega$ randomly, with uniform distribution
5:   Compute $B_k$ by calling DirectRangeCounting($x$, $\omega_k$, $m$, $r$)
6:   Compute $A_k$ by calling DirectRangeCounting($x$, $\omega_k$, $m+1$, $r$)
7:   $\tilde{B} \leftarrow \tilde{B} + B_k / N_1$
8:   $\tilde{A} \leftarrow \tilde{A} + A_k / N_1$
9:  $s \leftarrow -\ln(\tilde{A} / \tilde{B})$
10: return $s$
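The following Python sketch assembles Algorithm 2 from the counting helper above. It is a minimal illustration rather than the authors' implementation: uniform sampling without replacement via NumPy stands in for the Hidden Shuffle sampler of [16], and the function and parameter names (mc_sampen, n0, n1) are ours.

```python
import numpy as np

def mc_sampen(x, m, r, n0, n1, rng=None):
    """Monte-Carlo estimate of sample entropy: average the matched-pair
    counts A_k, B_k over n1 independent draws of n0 template indices and
    return -log(A_tilde / B_tilde)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n_start = len(x) - m   # starting indices valid for templates of length m+1
    a_tilde, b_tilde = 0.0, 0.0
    for _ in range(n1):
        # Stand-in for the Hidden Shuffle sampler: uniform, without replacement.
        idx = rng.choice(n_start, size=n0, replace=False)
        b_tilde += direct_range_counting(x, idx, m, r) / n1
        a_tilde += direct_range_counting(x, idx, m + 1, r) / n1
    return -np.log(a_tilde / b_tilde)
```

In practice one might call, for example, mc_sampen(signal, m=2, r=0.15 * np.std(signal), n0=1024, n1=32), following common SampEn parameter conventions; these particular values are illustrative rather than taken from the paper.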
We next estimate the computational complexity of MCSampEn measured by the number of arithmetic operations. To this end, we recall Theorem 3.5 of [16] which gives the number of arithmetic operations used in the Hidden Shuffle algorithm.
Theorem 1.
The Hidden Shuffle algorithm generates a random sample of size $N_0$ sequentially from a population of size $N$ with $O(N_0)$ arithmetic operations in total.
Theorem 2.
The total number of arithmetic operations needed in Algorithm 2 is $O(N_1(N_0 + m N_0^2)) = O(m N_1 N_0^2)$.
Proof.
For each $k \in \mathbb{N}_{N_1}$, according to Theorem 1, the number of arithmetic operations needed for selecting $\omega_k$ on line 4 of Algorithm 2 is $O(N_0)$. Moreover, from Algorithm 1 we can see that for each $k$, the number of arithmetic operations needed for computing $B_k$ and $A_k$ on lines 5 and 6 is $O(m N_0^2)$. Thus, by counting the number of arithmetic operations needed for lines 7, 8, and 9 of Algorithm 2, we obtain the desired result. □
Theorem 2 indicates that the computational complexity of MCSampEn is controlled by setting appropriate sampling parameters $N_0$ and $N_1$. When $N_0$ and $N_1$ are fixed, the computational complexity of MCSampEn is independent of the length $N$ of the time series $x$. Alternatively, we can select $N_0$ and $N_1$ depending on $N$ to balance the error and the computational complexity of MCSampEn; for example, with $N_0$ growing on the order of $\sqrt{N}$ and $N_1$ growing on the order of $\log N$, the computational complexity is $O(m N \log N)$.
Noting that MCSampEn provides an approximation of the sample entropy, and not the exact value, convergence of MCSampEn is an important issue. We will discuss this in Section 3.
3. Error Analysis
In this section, we analyze the error of MCSampEn. Specifically, we will establish an approximation rate of MCSampEn in the sense of almost sure convergence.
A sequence $\{V_n\}_{n \in \mathbb{N}}$ of random variables in a probability space $(\Omega, \mathcal{F}, P)$ is said to converge almost surely to a random variable $V$, denoted by
$$V_n \xrightarrow{\ \mathrm{a.s.}\ } V, \quad n \to \infty,$$
if there exists a set $\Omega_0 \in \mathcal{F}$ with $P(\Omega_0) = 1$ such that for all $\omega \in \Omega_0$,
$$\lim_{n \to \infty} V_n(\omega) = V(\omega).$$
It is known (see [17]) that $V_n$ converges almost surely to $V$ if and only if for all $\epsilon > 0$,
$$\lim_{n \to \infty} P\left( \sup_{k \ge n} |V_k - V| > \epsilon \right) = 0.$$
Furthermore, we can describe the convergence rate of $V_n$ by the declining rate of the sequence $P\left( \sup_{k \ge n} |V_k - V| > \epsilon \right)$ for all $\epsilon > 0$. If for $\alpha > 0$,
$$P\left( \sup_{k \ge n} |V_k - V| > \epsilon \right) = O(n^{-\alpha}), \quad n \to \infty,$$
we say $V_n$ converges to $V$ almost surely with rate $n^{-\alpha}$.
To establish the approximation error of MCSampEn, we first derive two theoretical results for the expectations and variances of $A_k$ and $B_k$. Then, by combining these results with the almost sure convergence rate in [18] and the local smoothness of logarithm functions, we obtain the approximation rate of $-\ln(\tilde{A}/\tilde{B})$ in the sense of almost sure convergence, which is the main theoretical result of this paper. We state these results below and postpone their proofs to Appendix A.
The expectations of $A_k$ and $B_k$ are given in the following theorem.
Theorem 3.
It holds that for all with ,
and
The next theorem presents the variances of $A_k$ and $B_k$.
Theorem 4.
It holds that for all with ,
and
where
Moreover, there is .
Based on Theorems 3 and 4, we can obtain by the Kolmogorov strong law of large numbers and the continuous mapping theorem. However, in practice it is desirable to quantify the approximation rate in the sense of almost sure convergence, so that we can estimate the error between and . To this end, we define , and . Let and . For all and , we also let
With the notation defined above, we present below the main theoretical result of this paper, which gives the rate at which $-\ln(\tilde{A}/\tilde{B})$ approximates $\mathrm{SampEn}(x, m, r)$ in the sense of almost sure convergence.
Theorem 5.
Let and with . If , then there exist constants and (depending only on β) such that for all and , such that
The proofs of Theorems 3–5 are included in Appendix A. Note that Theorem 5 indicates that $-\ln(\tilde{A}/\tilde{B})$ approximates the sample entropy in the sense of almost sure convergence of order 1.
4. Experiments
We present numerical experiments to show the accuracy and computational complexity of the proposed algorithm MCSampEn.
As sample entropy has been prevalently used in a large number of areas, we consider several series with a variety of statistical features, including the electrocardiogram (ECG) series, RR interval series, electroencephalogram (EEG) series, mechanical vibration signals (MVS), meteorological data (MD), and noise. The ECG and EEG data can be downloaded from PhysioNet, a website offering access to recorded physiologic signals (PhysioBank) and related open-source toolkits (PhysioToolkit) [19]. The MVS data can be found in [20] and the website of the Case Western Reserve University Bearing Data Center [21]. The MD data can be downloaded from the website of the Royal Netherlands Meteorological Institute [22]. The databases used in this paper include:
- Long-Term AF Database (ltafdb) [23]. This database includes 84 long-term ECG recordings of subjects with paroxysmal or sustained atrial fibrillation (AF). Each record contains two simultaneously recorded ECG signals digitized at 128 Hz with 12-bit resolution over a 20 mV range; record durations vary but are typically 24 to 25 h.
- Long-Term ST Database (ltstdb) [24]. This database contains 86 lengthy ECG recordings of 80 human subjects, chosen to exhibit a variety of events of ST segment changes, including ischemic ST episodes, axis-related non-ischemic ST episodes, episodes of slow ST level drift, and episodes containing mixtures of these phenomena.
- MIT-BIH Long-Term ECG Database (ltecg) [19]. This database contains 7 long-term ECG recordings (14 to 22 h each), with manually reviewed beat annotations.
- BIDMC Congestive Heart Failure Database (chfdb) [25]. This database includes long-term ECG recordings from 15 subjects (11 men, aged 22 to 71, and 4 women, aged 54 to 63) with severe congestive heart failure (NYHA class 3–4).
- MGH/MF Waveform Database (mghdb) [26]. The Massachusetts General Hospital/ Marquette Foundation (MGH/MF) Waveform Database is a comprehensive collection of electronic recordings of hemodynamic and electrocardiographic waveforms of stable and unstable patients in critical care units, operating rooms, and cardiac catheterization laboratories. Note that only the ECG records were considered in our experiments.
- RR Interval Time Series (RR). The RR interval time series are derived from healthy subjects (RR/Health), and subjects with heart failure (RR/CHF) and atrial fibrillation (RR/AF).
- CHB-MIT Scalp EEG Database (chbmit) [27]. This database contains EEG records of pediatric subjects with intractable seizures. The records were collected from 22 subjects, monitored for up to several days.
- Gearbox Database (gearbox) [20]. The gearbox dataset was introduced in [20] and was published on https://github.com/cathysiyu/Mechanical-datasets (accessed on 27 March 2022).
- Rolling Bearing Database (RB) [21]. This database, a standard reference for rolling bearing fault diagnosis, is provided by the Case Western Reserve University Bearing Data Center [21].
- Meteorological Database (MD) [22]. The meteorological database used in this section records hourly weather data from the past 70 years in the Netherlands.
As each database consists of multiple records from different subjects, we select one record randomly from each database. Specifically, we choose record “00” from ltafdb, “s20011” from ltstdb, “14046” from ltecg, “chf01” from chfdb, “mgh001” from mghdb, “chb07_01” from chbmit, “Miss_30_2” from gearbox, “XE110_DE_Time” from RB, and “380_t” from MD. Moreover, a noise signal, an artificial signal, is studied to increase diversity. The time series considered in this section are illustrated in Figure 1, where all samples are normalized to have a standard deviation of 1, since the parameter threshold r is proportional to the standard deviation of the records, and thus the overall scale of the records is immaterial.
Figure 1.
Samples of the dataset records.
4.1. Approximation Accuracy
In the experiments presented in this subsection, we examine the approximation accuracy of the MCSampEn algorithm. We fix the template length $m$ and the tolerance $r$, and vary the sampling size $N_0$ and the number of experiments $N_1$ to study the approximation accuracy of the proposed algorithm. In this experiment, records whose lengths exceed a prescribed maximum are truncated to that length; otherwise, the entire records are used. Since the index vectors $\omega_k$ in the MCSampEn algorithm are selected randomly, the outcome of the algorithm depends on the selected values of $\omega_k$. To remove the effect of this randomness, for every specified pair $(N_0, N_1)$ we run the algorithm 50 times and calculate the mean errors (MeanErr) and the root mean squared errors (RMeanSqErr) of the 50 outcomes.
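As an illustration of how these statistics can be computed, the snippet below runs the earlier mc_sampen sketch repeatedly against a reference value; the helper name and the reading of MeanErr as a mean absolute error are our assumptions.

```python
import numpy as np

def error_statistics(x, m, r, n0, n1, ground_truth, runs=50, seed=0):
    """Run the Monte Carlo estimator `runs` times and report the mean error
    and the root mean squared error against the exact sample entropy."""
    rng = np.random.default_rng(seed)
    estimates = np.array([mc_sampen(x, m, r, n0, n1, rng) for _ in range(runs)])
    errors = estimates - ground_truth
    mean_err = np.mean(np.abs(errors))          # assumed: mean absolute error
    rmean_sq_err = np.sqrt(np.mean(errors ** 2))
    return mean_err, rmean_sq_err
```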
In our first experiment, we consider the series “mghdb/mgh001” with the parameters $m$ and $r$ fixed as above, and show in Figure 2 the mean errors and the root mean squared errors of the MCSampEn outputs as surfaces in the $N_0$–$N_1$ coordinate system. Images (a) and (c) of Figure 2 show the values of MeanErr, and images (b) and (d) of Figure 2 show the values of RMeanSqErr. Figure 2 clearly demonstrates that both the mean errors and the root mean squared errors of the MCSampEn outputs converge to 0 as $N_0$ or $N_1$ increases to infinity. This is consistent with our theoretical analysis in the previous section.
Figure 2.
The values of MeanErr and RMeanSqErr for time series “mghdb/mgh001” with respect to the sample size $N_0$ and the number of experiments $N_1$, for fixed parameters $m$ and $r$. (a,c) MeanErr; (b,d) RMeanSqErr.
In the second experiment, we consider all series illustrated in Figure 1 and show numerical results in Figures 3 and 4. Images (a), (c), and (e) of Figure 3 show the values of MeanErr, and images (b), (d), and (f) of Figure 3 show the values of RMeanSqErr, as functions of $N_1$ with $N_0$ fixed. Images (a), (c), and (e) of Figure 4 show the values of MeanErr, and images (b), (d), and (f) of Figure 4 show the values of RMeanSqErr, as functions of $N_0$ with $N_1$ fixed. Figure 3 indicates that the outputs of the MCSampEn algorithm converge as $N_1$ increases. We can also see from Figure 3 that for sufficiently large $N_0$ and $N_1$, both MeanErr and RMeanSqErr are small for all tested time series; in other words, the MCSampEn algorithm can effectively estimate sample entropy with moderate values of $N_0$ and $N_1$. From Figure 4, we can also observe that the outputs of the MCSampEn algorithm converge as $N_0$ increases. This is consistent with the theoretical results established in Section 3.
Figure 3.
The values of MeanErr and RMeanSqErr with respect to $N_1$, with $N_0$, $m$, and $r$ fixed.
Figure 4.
The values of MeanErr and RMeanSqErr with respect to $N_0$, with $N_1$, $m$, and $r$ fixed.
We next explain how the randomness of a time series affects the accuracy of the MCSampEn algorithm by applying the algorithm to the stochastic process $\mathrm{MIX}(p)$, which has been widely applied in studies of sample entropy [1,2,28]. The process $\mathrm{MIX}(p)$ is defined as follows. Let
$$x_j := \sqrt{2}\,\sin\!\left(\frac{2\pi j}{12}\right) \quad \text{for all } j \in \mathbb{N}.$$
Let $\{y_j\}_{j \in \mathbb{N}}$ be a family of independent identically distributed (i.i.d.) real random variables with uniform probability density on the interval $[-\sqrt{3}, \sqrt{3}]$. Note that $\{x_j\}$ and $\{y_j\}$ are sequences with contrary properties: the former is a completely regular sine sequence, and the latter is completely random. Let $p \in [0, 1]$, and let $\{z_j\}_{j \in \mathbb{N}}$ be a family of i.i.d. random variables satisfying $z_j = 1$ with probability $p$ and $z_j = 0$ with probability $1 - p$. Then, the process $\mathrm{MIX}(p)$ is defined as
$$\mathrm{MIX}(p)_j := (1 - z_j)\, x_j + z_j\, y_j.$$
It is not hard to see that the parameter $p$ controls the ratio of the sine sequence and the random noise in the process, and that increasing $p$ makes the process more random. When $p = 0$, the process is a deterministic sine sequence; when $p = 1$, the process becomes completely unpredictable uniform noise. This feature makes $\mathrm{MIX}(p)$ an ideal series for studying how randomness affects the accuracy of the MCSampEn algorithm.
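A short generator for this process, following the classical construction of Pincus [1] as read above (the $\sqrt{2}$ and $\sqrt{3}$ scalings give both components unit variance), might look as follows; the function name is illustrative.

```python
import numpy as np

def mix_process(p, n, rng=None):
    """Generate n samples of MIX(p): keep the deterministic sine value with
    probability 1 - p, replace it with uniform noise with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    j = np.arange(n)
    x = np.sqrt(2.0) * np.sin(2.0 * np.pi * j / 12.0)      # regular component
    y = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)   # random component
    z = rng.random(n) < p                                   # Bernoulli(p) switches
    return np.where(z, y, x)
```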
Here, we apply MCSampEn to $\mathrm{MIX}(p)$ and show the resulting errors versus $p$ in Figure 5. From Figure 5, we observe that the errors grow only slowly, and roughly linearly, for small values of $p$, and grow noticeably faster once $p$ becomes large. Therefore, we believe that when the randomness of a time series is weak, the error of the MCSampEn algorithm is small; as the randomness of the time series increases, the error of MCSampEn grows.
Figure 5.
The error of MCSampEn with respect to $p$, with the parameters $m$, $r$, $N$, $N_0$, and $N_1$ fixed.
4.2. Time Complexity
In the experiments presented in this subsection, we compare the computing time of the MCSampEn algorithm with that of the kd-tree algorithm [8] and SBOX algorithm [14], under the condition that the value of sample entropy computed by the MCSampEn algorithm is very close to the ground truth value. The computational time experiments are performed on a desktop computer running Windows 11, with an Intel(R) Core(TM) i5-9500 CPU, and 32GB RAM. The implementations of the kd-tree-based algorithm and the MCSampEn algorithm are available on the website https://github.com/phreer/fast_sampen_impl.git (accessed on 30 March 2022). As for the SBOX method, we utilize the implementation given by the original author, published on website https://sites.google.com/view/yhw-personal-homepage (accessed on 25 October 2021). To demonstrate the validity of the MCSampEn algorithm, we also show both the sample entropy estimated by MCSampEn and the corresponding ground truth.
As discussed above, the time complexity of the MCSampEn algorithm depends on the parameters $N_0$ and $N_1$. In this subsection, we discuss two strategies for choosing $N_0$ and $N_1$:
- S1
- Choose $N_0$ and $N_1$ to be independent of $N$, fixing them at constant values.
- S2
- Choose $N_0$ and $N_1$ depending on $N$.
An intuitive explanation of the second strategy is as follows. We would like to choose $N_0$ and $N_1$ such that the overall time complexity of executing the algorithm remains low. For this purpose, we expect $N_0$ to grow slowly with $N$ and $N_1$ to grow logarithmically in $N$. However, when $N$ is not large enough, a lack of sampled templates can seriously impair the accuracy of the algorithm. To overcome this problem, we set a lower bound of 1024 on $N_0$, which is a good trade-off between accuracy and time complexity. The experimental results in this subsection show that this strategy can produce satisfactory output even when $N$ is small.
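The rule below illustrates one way to realize the two strategies; the $\sqrt{N}$ growth for $N_0$, the $\log_2 N$ growth for $N_1$, and the fixed values used for S1 are our assumptions, with only the lower bound of 1024 taken from the text above.

```python
import math

def choose_parameters(n, strategy="S2"):
    """Illustrative selection of the MCSampEn sampling parameters.
    S1: N0 and N1 fixed, independent of the series length n.
    S2: N0 grows with n (here ~ sqrt(n)) but never below 1024,
        and N1 grows logarithmically in n."""
    if strategy == "S1":
        return 1024, 50                 # example constants, not the paper's values
    n0 = max(1024, math.isqrt(n))
    n1 = max(1, int(math.log2(n)))
    return n0, n1
```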
The results on the signals “ltafdb/00”, “noise”, “chbmit/chb07_01”, and “ltecg/14046” are shown in Figure 6, where the first strategy is adopted with fixed $N_0$ and $N_1$, and the results for the two parameter settings are marked in red and blue, respectively. In the left column of Figure 6, the computation times consumed by the kd-tree, SBOX, and MCSampEn algorithms are plotted, respectively, with the dashed lines marked “x”, the dash-dot lines marked “+”, and the solid lines marked “o”. From the results shown in the left column of Figure 6, we find that MCSampEn is faster than the SBOX algorithm once $N$ is sufficiently large, and for the longest records of “chbmit/chb07_01” and “ltecg/14046”, MCSampEn is nearly 1000 times faster than the SBOX algorithm. Compared to the kd-tree algorithm, the MCSampEn algorithm can still achieve up to hundreds of times acceleration for large $N$. In addition, the time complexity of the MCSampEn algorithm is nearly constant with respect to $m$ and is much smaller than that of the kd-tree and SBOX algorithms when $N$ is large enough. Meanwhile, the computational time (shown in the left column of Figure 6) is hardly affected by the time series length $N$.
Figure 6.
The left column shows the computational time versus the data length $N$ for different signals. In the right column, the deviations of the MCSampEn outputs are presented by error bars “I”: the larger the deviation, the longer the error bar. In this figure, the first strategy (fixed $N_0$ and $N_1$) is used. (a) Time for “ltafdb/00”. (b) Sample entropy for “ltafdb/00”. (c) Time for noise. (d) Sample entropy for noise. (e) Time for “chbmit/chb07_01”. (f) Sample entropy for “chbmit/chb07_01”. (g) Time for “ltecg/14046”. (h) Sample entropy for “ltecg/14046”.
The right column of Figure 6 shows the average of 50 outputs of the MCSampEn algorithm for each time series under the fixed-parameter setting, where the red and blue solid lines plot the averages for the two parameter settings. The corresponding ground-truth values are plotted by the red and blue dashed lines, respectively. Meanwhile, in the right column of Figure 6, we use error bars “I” to represent the deviations of the outputs: the larger the deviation, the longer the error bar. From the lengths of the error bars, we can see that the deviations are small compared to the ground truth; in particular, on the time series “ltafdb/00”, “chbmit/chb07_01”, and “ltecg/14046”, they are negligible compared to the ground-truth values. These results imply that, with the first strategy, the sample entropy estimated by the MCSampEn algorithm effectively approximates the ground-truth value.
The results of the second strategy are shown in Figure 7, where $N_0$ and $N_1$ grow with $N$ as described above, and the results for the two parameter settings are marked in red and blue, respectively. The left column of Figure 7 shows the computation times consumed by the kd-tree, SBOX, and MCSampEn algorithms, presented by the dashed lines marked “x”, the dash-dot lines marked “+”, and the solid lines marked “o”, respectively. From the left column of Figure 7, we can see that with the second strategy the computational time of the MCSampEn algorithm is much less than that of the kd-tree and SBOX algorithms, as expected from the complexity bound of Algorithm 2. Furthermore, we observe that MCSampEn achieves a speedup of more than 100 compared to the SBOX algorithm over the tested range of $N$, and it is over 1000 times faster for the largest $N$. Compared to the kd-tree algorithm, the MCSampEn algorithm can still obtain up to 1000 times acceleration for the largest $N$.
Figure 7.
The left column shows the computational time versus the data length $N$ for different signals. The right column shows the deviations of the MCSampEn outputs by error bars “I”: the larger the deviation, the longer the error bar. In this figure, the second strategy ($N_0$ and $N_1$ depending on $N$) is used. (a) Time for “ltafdb/00”. (b) Sample entropy for “ltafdb/00”. (c) Time for noise. (d) Sample entropy for noise. (e) Time for “chbmit/chb07_01”. (f) Sample entropy for “chbmit/chb07_01”. (g) Time for “ltecg/14046”. (h) Sample entropy for “ltecg/14046”.
In the right column of Figure 7, we plot the average of 50 outputs of the MCSampEn algorithm for each time series by the red and blue solid lines for the two parameter settings, and the corresponding ground-truth values by the red and blue dashed lines, respectively. As in Figure 6, we use the error bars “I” to represent the deviations of the outputs. Comparing with the error bars in Figure 6, we can see that the deviations in this experiment are larger than those shown in Figure 6; however, they are still small relative to the ground-truth values. Moreover, we can observe that the length of the error bars decreases as $N$ increases, which means that we obtain a better approximation of sample entropy as the time series length increases.
To reveal the effect of randomness on the speedup, we compare the time taken by the kd-tree and MCSampEn algorithms to compute the sample entropy of the time series $\mathrm{MIX}(p)$ for a range of $p$. The experimental results are shown in Figure 8, where the results for the two parameter settings are marked in red and blue, and the computation times consumed by the kd-tree and MCSampEn algorithms are plotted, respectively, with the dashed lines marked “x” and the solid lines marked “o”. In this experiment, $N_0$ and $N_1$ are chosen so that the relative error stays below a prescribed level. From Figure 8, we can see that when the value of $p$ is small, the MCSampEn algorithm achieves a 300 to 1000 times speedup compared with the kd-tree algorithm. When the value of $p$ is large, our algorithm still obtains roughly a tenfold speedup relative to the kd-tree algorithm.
Figure 8.
The computational time with respect to $p$, where the parameters $N$, $m$, $r$, $N_0$, and $N_1$ are selected such that the relative error stays below a prescribed level.
From the experiments in this subsection, we observe that the MCSampEn algorithm achieves a high speedup when applied to different types of signals. In fact, compared with the kd-tree algorithm, the MCSampEn algorithm achieves high accuracy and more than 300 times acceleration when the time series has little randomness. When the randomness of the time series is high, our algorithm still obtains a speedup of nearly 10 times.
4.3. Memory Usage
In order to show the performance of the MCSampEn algorithm more comprehensively, we also compare the memory usage of the kd-tree and MCSampEn algorithms. The memory usage on the signal “ltstdb/s20011” is shown in Figure 9, where panels (a) and (b) correspond to the two parameter settings considered. In this figure, the memory usage of the kd-tree algorithm is plotted by the blue dash-dot lines marked “x”. The memory usage of the MCSampEn algorithm with the first and second strategies is plotted by the green dashed lines marked “+” and the red dotted lines marked “o”, respectively; the first strategy uses fixed $N_0$ and $N_1$, and the second strategy lets them depend on $N$. We also present the memory required to store the data itself by the black solid lines marked “□”.
Figure 9.
Memory usage versus the data length $N$. (a,b) Memory usage for the two parameter settings considered.
From the results shown in Figure 9, it can be seen that for the largest data sizes the memory required by the kd-tree algorithm is almost 36 times that required by the MCSampEn algorithm. This is because the kd-tree algorithm needs a large memory space to store the kd-tree. Meanwhile, the experimental results in Figure 9 also show that, over the tested range of data lengths, the amount of memory required by the MCSampEn algorithm is only about 15 MB more than the amount of memory required to store the data. This is because the MCSampEn algorithm requires additional memory only for storing templates and for executing the subroutines that generate random numbers.
Because the MCSampEn algorithm is based on Monte Carlo sampling and the law of large numbers, it is easily parallelizable. Therefore, combined with distributed storage techniques, the idea of the MCSampEn algorithm can be used to compute the sample entropy of large-scale data (for example, data larger than 1 TB). Parallel algorithms for computing the sample entropy of large-scale data will be our future work.
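As a rough illustration of this parallelizability, the sketch below distributes the $N_1$ independent experiments of the earlier mc_sampen sketch over worker processes and averages the returned counts; it is only a single-machine prototype under our assumed helper names, not the distributed implementation envisioned above.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial
import numpy as np

def _one_experiment(seed, x, m, r, n0):
    """Run one Monte Carlo experiment and return its (A_k, B_k) counts."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x) - m, size=n0, replace=False)
    return (direct_range_counting(x, idx, m + 1, r),
            direct_range_counting(x, idx, m, r))

def mc_sampen_parallel(x, m, r, n0, n1, workers=4):
    """Average the counts from n1 experiments executed in parallel processes."""
    work = partial(_one_experiment, x=np.asarray(x, dtype=float), m=m, r=r, n0=n0)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(work, range(n1)))
    a_tilde = np.mean([a for a, _ in counts])
    b_tilde = np.mean([b for _, b in counts])
    return -np.log(a_tilde / b_tilde)
```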
5. Conclusions
In this paper, we propose a Monte-Carlo-based algorithm, called MCSampEn, to estimate sample entropy and prove that the outputs of MCSampEn approximate the sample entropy in the sense of almost sure convergence of order 1. We provide two strategies for selecting the sampling parameters $N_0$ and $N_1$ that appear in MCSampEn. The experimental results show that we can flexibly select the parameters $N_0$ and $N_1$ to balance the computational complexity and the error. From the experimental results, we observe that the computational time consumed by the proposed algorithm is significantly shorter than that of the kd-tree and SBOX algorithms, with negligible loss of accuracy. Meanwhile, the computational complexity of the MCSampEn method is hardly affected by the time series length $N$. We also study how the randomness of the time series affects the accuracy and computation time of the MCSampEn algorithm by applying the algorithm to the stochastic process $\mathrm{MIX}(p)$. The results indicate that the proposed algorithm performs especially well for time series with little randomness.
Author Contributions
Conceptualization, Y.J.; methodology, Y.J. and W.L.; software, W.L.; validation, Y.J. and W.L.; formal analysis, Y.J. and W.L.; investigation, Y.J.; writing—original draft preparation, W.L.; writing—review and editing, Y.J. and Y.X.; visualization, W.L.; supervision, Y.J.; project administration, Y.J.; funding acquisition, Y.J. All authors have read and agreed to the published version of the manuscript.
Funding
W. Liu and Y. Jiang are supported in part by the Key Area Research and Development Program of Guangdong Province, China (No. 2021B0101190003); the Natural Science Foundation of Guangdong Province, China (No.2022A1515010831); and Science and Technology Program of Guangzhou, China (No. 201804020053). Yuesheng Xu was supported in part by US National Science Foundation under grant DMS-1912958.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data used are included in the article.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
In this appendix, we provide proofs of Theorems 3–5, where Theorems 3 and 4 describe the expectations and variances of $A_k$ and $B_k$, and Theorem 5 presents the convergence rate of $-\ln(\tilde{A}/\tilde{B})$.
Note that the only difference between the definitions of $A_k$ and $B_k$ is the template length. Without loss of generality, we discuss the expectation (2) and variance (4); Equations (1) and (3) can be obtained in a similar way.
To analyze the expectation of , we define the following notation. For all , we define random variable on the probability space by
For all , the definition of indicates that is the number of elements in that satisfy and . From the definitions of and , we have that for all ,
For , we say random variable V follows the hypergeometric distribution if and only if the probability of
See Section 5.3 of [29] for more details about the hypergeometric distribution. For all , let , which is the index set of elements of Y satisfying . From the definition of , we have that . For the purpose of analyzing the expectation of , we recall the expectation of the hypergeometric distribution (see Theorem 5.3.2 in [29]) and prove a technical lemma as follows.
Theorem A1.
For , the expectation of the hypergeometric distribution is
Lemma A1.
Let with . For any fixed and , the conditional probability distribution of given is the hypergeometric distribution . Moreover, for all , the expectation of random variable is
Proof.
Let and . From the definition of , we can see that for all with , . On the other hand, since for all with ,
from the definitions of and , we have that . Thus, we can see that for all with , . This means that for or ,
Meanwhile, it can be checked that for all with and , if and only if vector contains k components belonging to , and components belonging to . Note that there are ways of drawing k elements from set , and ways of drawing elements from set . Thus, by noting that each element in is a permutation formed by extracting numbers from , we have that for all ,
Note that , and the elements in are of equal probability. Hence, dividing the right term of (A3) by , we obtain
This indicates that the conditional probability distribution of given is the hypergeometric distribution (see [29]).
Since the conditional probability distribution of given is the hypergeometric distribution , from Theorem A1 we have for any and , . Thus, by noting and for all , from the law of total expectation we obtain (A2). □
The proof for Theorem 3 is shown as follows.
Next we consider the variance of . Since , the variance of can be obtained by summing the covariances , . This motivates us to compute these covariances. As a preparation, we establish two auxiliary lemmas. For all with , we define and .
Lemma A2.
It holds that
Proof.
Note that is not necessarily empty for . For , we define new sets so that they are mutually disjoint and have the same cardinality as . In this way, the formula (A5) will be proved by establishing a set identity and counting cardinalities. To this end, we define , for each with , and , for each . From the definition of , we have that and if . Thus,
Likewise, the definition of ensures that and if . Thus, by noting that ,
Combining Equations (A6) and (A7), we see that it suffices to prove
For all with , and , the definitions of and ensure
In other words, there are , and . Thus, for all with , and , there has . Thus, we obtain
On the other hand, for all and , we know (A9) holds and from the definitions of and . This means that and . Hence, we obtain that
From (A10) and (A11) we obtain (A8), which leads to the desired result (A5). □
For with , we define random variable on the probability space by
From the definition of , we can see that . Thus, in order to compute the covariance , we next show the values of for and with .
Lemma A3.
It holds that for with , and with ,
Moreover, for all and with , it holds that
Proof.
We first prove (A13). Let
and for all , we define
We prove (A13) by counting the cardinality of . To this end, we identify as the union of disjoint subsets of . From the definition of and , we know for all and that and . At the same time, note that for and , the numbers in set are distinct. Thus, for all and , it holds that , , and . Namely,
On the other hand, it is easy to check that
Thus, for all , can be rewritten as
For , we define Then, we can rewrite as
Since if , from (A15) we can see that
Note that for all and ,
and the two sets on the right-hand side of the above equation are disjoint. Thus, it holds that for all and ,
Substituting (A17) into (A16) leads to
By direct computation, noting , we obtain from the above equation that
Note that and . We then have that
and
Substituting (A5) and the above equations into (A18), we obtain that
By noting that and we obtain (A13) from (A19).
With the help of Lemma A3, we can calculate in the following lemma.
Lemma A4.
If with , then for all with ,
and for all ,
Proof.
We first prove (A21). Let with . From the decomposition , we obtain for all with that
We further rewrite the right-hand side of the above equation to obtain
We next compute the terms on the right hand side of (A23) one by one. Since for all , and ,
from Equation (A13) of Lemma A3, we know the first term in the right-hand side of (A23) satisfies
Likewise, by noting that , from Equation (A14) of Lemma A3, we obtain the second, third, and fourth terms on the right-hand side of (A23),
Note that for all with , it holds that and . Thus, the last term on the right-hand side of (A23) satisfies
Substituting (A24), (A25), and (A26) into (A23) leads to (A21).
Now, we are ready to discuss the variance of .
The proof for Theorem 4 is shown as follows.
Proof.
To analyze the almost sure convergence rate of $-\ln(\tilde{A}/\tilde{B})$, we require Theorem 2 of [18], which is recalled as follows.
Theorem A2.
Let be a sequence of independent and identically distributed random variables in probability space with expectation μ, and . If and , then for all and , there are constants and (depending only on β) such that for all ,
where is defined by (6).
Combining Theorems 3, 4, and A2 leads to the almost sure convergence of and in the next lemma.
Lemma A5.
Let and with . Then, there are constants and (depending only on β) such that for all and ,
and
where is defined by (6).
Proof.
We next consider the almost sure convergence rate of . To this end, we introduce the following lemma.
Lemma A6.
Let with . If and , then for all and ,
and
Proof.
Note that for all and , when
it holds that Hence, when (A35) holds, there is
By noting that and , from (A36), we know that when (A35) holds, there has that is,
Note that when , for all and , inequality (A37) always holds. Thus, we know that for all and , when (A35) holds, inequality (A37) holds. Then, replacing a, b, and by , and , we know for all and , when
there has
Let be the set of the events satisfying (A38), and be the set of the events satisfying (A39). From (A38) and (A39), we know that . Thus, we can obtain (A34) (see Theorem 1.5.4 in [29]). Similarly, we can obtain (A33). □
Combining Lemmas A5 and A6, we obtain the almost sure convergence rate of
in Theorem 5.
The proof of Theorem 5 is provided as follows.
Proof.
Note that for all ,
Thus, we know that for all and , if
then
or
Let be the set of the events satisfying (A40), be the set of the events satisfying (A41), and be the set of events satisfying (A42). Then, from the above inequalities, we have . Hence, we have (see Theorems 1.5.4 and 1.5.7 in [29]), that is,
Substituting (A34) and (A33) into the above inequality, from Lemma A5 and the definitions of and , we obtain the desired result (7). □
References
- Pincus, S.M. Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. USA 1991, 88, 2297–2301.
- Richman, J.S.; Moorman, J.R. Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol. 2000, 278, 2039–2049.
- Costa, M.; Goldberger, A.L.; Peng, C.-K. Multiscale entropy analysis of complex physiologic time series. Phys. Rev. Lett. 2002, 89, 068102.
- Jiang, Y.; Peng, C.-K.; Xu, Y. Hierarchical entropy analysis for biological signals. J. Comp. Appl. Math. 2011, 236, 728–742.
- Li, Y.; Li, G.; Yang, Y.; Liang, X.; Xu, M. A fault diagnosis scheme for planetary gearboxes using adaptive multi-scale morphology filter and modified hierarchical permutation entropy. Mech. Syst. Signal Proc. 2017, 105, 319–337.
- Yang, C.; Jia, M. Hierarchical multiscale permutation entropy-based feature extraction and fuzzy support tensor machine with pinball loss for bearing fault identification. Mech. Syst. Signal Proc. 2021, 149, 107182.
- Li, W.; Shen, X.; Li, Y. A comparative study of multiscale sample entropy and hierarchical entropy and its application in feature extraction for ship-radiated noise. Entropy 2019, 21, 793.
- Jiang, Y.; Mao, D.; Xu, Y. A fast algorithm for computing sample entropy. Adv. Adapt. Data Anal. 2011, 3, 167–186.
- Mao, D. Biological Time Series Classification via Reproducing Kernels and Sample Entropy. Ph.D. Dissertation, Syracuse University, Syracuse, NY, USA, August 2008.
- Grassberger, P. An optimized box-assisted algorithm for fractal dimensions. Phys. Lett. A 1990, 148, 63–68.
- Theiler, J. Efficient algorithm for estimating the correlation dimension from a set of discrete points. Phys. Rev. A Gen. Phys. 1987, 36, 4456–4462.
- Manis, G. Fast computation of approximate entropy. Comput. Meth. Prog. Biomed. 2008, 91, 48–54.
- Manis, G.; Aktaruzzaman, M.; Sassi, R. Low computational cost for sample entropy. Entropy 2018, 20, 61.
- Wang, Y.H.; Chen, I.Y.; Chiueh, H.; Liang, S.F. A low-cost implementation of sample entropy in wearable embedded systems: An example of online analysis for sleep EEG. IEEE Trans. Instrum. Meas. 2021, 70, 9312616.
- Tomčala, J. New fast ApEn and SampEn entropy algorithms implementation and their application to supercomputer power consumption. Entropy 2020, 22, 863.
- Shekelyan, M.; Cormode, G. Sequential random sampling revisited: Hidden shuffle method. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, Virtually Held, 13–15 April 2021; pp. 3628–3636.
- Karr, A.F. Probability; Springer: New York, NY, USA, 1993.
- Luzia, N. A simple proof of the strong law of large numbers with rates. Bull. Aust. Math. Soc. 2018, 97, 513–517.
- Goldberger, A.L.; Amaral, L.A.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.-K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 2000, 101, 215–220.
- Shao, S.; McAleer, S.; Yan, R.; Baldi, P. Highly accurate machine fault diagnosis using deep transfer learning. IEEE Trans. Ind. Inform. 2019, 15, 2446–2455.
- Case Western Reserve University Bearing Data Center. Available online: https://engineering.case.edu/bearingdatacenter (accessed on 27 March 2022).
- Royal Netherlands Meteorological Institute. Available online: https://www.knmi.nl/nederland-nu/klimatologie/uurgegevens (accessed on 27 March 2022).
- Petrutiu, S.; Sahakian, A.V.; Swiryn, S. Abrupt changes in fibrillatory wave characteristics at the termination of paroxysmal atrial fibrillation in humans. Europace 2007, 9, 466–470.
- Jager, F.; Taddei, A.; Moody, G.B.; Emdin, M.; Antolič, G.; Dorn, R.; Smrdel, A.; Marchesi, C.; Mark, R.G. Long-term ST database: A reference for the development and evaluation of automated ischaemia detectors and for the study of the dynamics of myocardial ischaemia. Med. Biol. Eng. Comput. 2003, 41, 172–182.
- Baim, D.S.; Colucci, W.S.; Monrad, E.S.; Smith, H.S.; Wright, R.F.; Lanoue, A.; Gauthier, D.F.; Ransil, B.J.; Grossman, W.; Braunwald, E. Survival of patients with severe congestive heart failure treated with oral milrinone. J. Am. Coll. Cardiol. 1986, 7, 661–670.
- Welch, J.; Ford, P.; Teplick, R.; Rubsamen, R. The Massachusetts General Hospital-Marquette Foundation hemodynamic and electrocardiographic database—Comprehensive collection of critical care waveforms. Clin. Monit. 1991, 7, 96–97.
- Shoeb, A.H. Application of Machine Learning to Epileptic Seizure Onset Detection and Treatment. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, September 2009.
- Silva, L.E.V.; Filho, A.C.S.S.; Fazan, V.P.S.; Felipe, J.C.; Junior, L.O.M. Two-dimensional sample entropy: Assessing image texture through irregularity. Biomed. Phys. Eng. Expr. 2016, 2, 045002.
- DeGroot, M.H.; Schervish, M.J. Probability and Statistics, 4th ed.; Pearson Education: New York, NY, USA, 2012.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).