A Super Fast Algorithm for Estimating Sample Entropy

Sample entropy, an approximation of the Kolmogorov entropy, was proposed to characterize the complexity of a time series. It is essentially defined as −log(B/A), where A denotes the number of matched template pairs of length m and B denotes the number of matched template pairs of length m+1, for a predetermined positive integer m. It has been widely used to analyze physiological signals. Since computing sample entropy is time consuming, the box-assisted, bucket-assisted, x-sort, assisted sliding box, and kd-tree-based algorithms were proposed to accelerate its computation. These algorithms require O(N^2) or O(N^{2−1/(m+1)}) computational complexity, where N is the length of the time series analyzed; when N is large, their computational costs become prohibitive. We propose a super fast Monte-Carlo-based algorithm to estimate sample entropy, with computational cost independent of N (the length of the time series) and with the estimate converging to the exact sample entropy as the number of repeated experiments becomes large. The convergence rate of the algorithm is also established. Numerical experiments are performed on electrocardiogram time series, electroencephalogram time series, cardiac inter-beat time series, mechanical vibration signals (MVS), meteorological data (MD), and 1/f noise. Numerical results show that the proposed algorithm can gain a 100–1000 times speedup compared to the kd-tree and assisted sliding box algorithms while providing satisfactory approximation accuracy.


Introduction
Kolmogorov entropy is a well-suited measure for the complexity of dynamical systems containing noise. Approximate entropy (AppEn), proposed by Pincus [1], is an approximation of the Kolmogorov entropy. To overcome the bias of AppEn caused by self-matching, Richman and Moorman proposed sample entropy (SampEn) [2] in 2000. SampEn is essentially defined as −log(B/A), where A denotes the number of matched template pairs of length m and B denotes the number of matched template pairs of length m+1. SampEn has prevailed in many areas, such as cyber-physical systems, mechanical systems, health monitoring, disease diagnosis, and control. Based on AppEn and SampEn, multiscale entropy [3] and hierarchical entropy [4] were developed to measure the complexity of physiological time series at multiple time scales. Since low-frequency filters are involved, multiscale entropy can weaken the influence of meaningless structures, such as noise, on complexity measurement. By adding the sample entropy of the high-frequency components of the time series, hierarchical entropy provides more comprehensive and accurate information and improves the ability to distinguish different time series. Multiscale entropy, hierarchical entropy, and their variants have been applied to various fields beyond physiological time series analysis, such as fault identification [5,6] and feature extraction [7].
Computing SampEn requires counting the number of similar templates of a time series. In other words, it requires counting the number of matched template pairs for a given time series. Clearly, direct computation of SampEn requires computational complexity of O(N^2), where N is the length of the time series analyzed. To accelerate the computation of SampEn, kd-tree-based algorithms were proposed, which reduce the time complexity to O(N^{2−1/(m+1)}), where m is the template (also called pattern) length [8,9]. In addition, box-assisted [10,11], bucket-assisted [12], lightweight [13], and assisted sliding box (SBOX) [14] algorithms were developed; however, the complexity of all these algorithms remains O(N^2). Recently, an algorithm proposed in [15] for computing approximate values of AppEn and SampEn, without theoretical error analysis, still requires O(N^2) computational cost in the worst case, even though it needs only O(N) operations in certain best cases. Developing fast algorithms for estimating SampEn is therefore still of great interest.
The goal of this study is to develop a Monte-Carlo-based algorithm for calculating SampEn. The most costly step in computing SampEn is to compute the ratio B/A of the matched template counts at lengths m+1 and m. Noting that A/(N(N−1)) (resp. B/(N(N−1))) is proportional to the probability that templates of length m (resp. m+1) are matched, the ratio B/A can be regarded as a conditional probability. From this viewpoint, we can approximate this conditional probability of the original data set by that of a data set randomly down-sampled from the original one. Specifically, we randomly select N_0 templates of length m and N_0 templates of length m+1 from the original time series. We then count the number Ã (resp. B̃) of matched pairs among the selected templates of length m (resp. m+1). We repeat this process N_1 times and compute the means Ā_{N_1} and B̄_{N_1} of Ã and B̃, respectively. Then, we use −log(B̄_{N_1}/Ā_{N_1}) to approximate −log(B/A) for the time series to measure its complexity. We establish the computational complexity and convergence rate of the proposed algorithm. We then study the performance of the proposed algorithm by comparing it with the kd-tree-based algorithm and the SBOX method on electrocardiogram (ECG) time series, electroencephalogram (EEG) time series, cardiac inter-beat (RR) time series, mechanical vibration signals (MVS), meteorological data (MD), and 1/f noise. Numerical results show that the proposed algorithm gains more than a 100 times speedup compared to the SBOX algorithm (the most recent algorithm in the literature, to the best of our knowledge) for time series of length 2^16–2^18, and more than a 1000 times speedup for time series of length 2^19–2^20. Compared to the kd-tree algorithm, the proposed algorithm can again achieve up to a 1000 times speedup for a time series of length 2^20.
This article is organized into five sections. The proposed Monte-Carlo-based algorithm for estimating sample entropy is described in Section 2. Section 3 includes the main results of the analysis of the approximation accuracy of the proposed algorithm, with proofs given in Appendix A. Numerical results are presented in Section 4, and concluding remarks are made in Section 5.

Sample Entropy via Monte Carlo Sampling
In this section, we describe a Monte-Carlo-based algorithm for estimating the sample entropy of a time series.
We first recall the definition of sample entropy. For all k ∈ N, let Z_k := {1, 2, . . . , k}. The distance of two real vectors a := [a_l : l ∈ Z_k] and b := [b_l : l ∈ Z_k] is measured by the Chebyshev distance ρ(a, b) := max{|a_l − b_l| : l ∈ Z_k}. We let u := (u_i ∈ R : i ∈ Z_n) be a time series of length n ∈ N. For m ∈ N, we let N := n − m. We define a set X of N vectors by X := {x_i : i ∈ Z_N}, where x_i := [u_{i+l−1} : l ∈ Z_m] is called a template of length m for the time series u. We also define a set Y of N vectors by Y := {y_i : i ∈ Z_N}, where y_i := [u_{i+l−1} : l ∈ Z_{m+1}] is called a template of length m+1 for u. To avoid confusion, we call the elements in X and Y the templates for the time series u. We denote by #E the cardinality of a set E. For a given tolerance r > 0, we use A_i, i ∈ Z_N, to denote the cardinality of the set consisting of templates x ∈ X \ {x_i} satisfying ρ(x_i, x) ≤ r, that is, A_i := #{x ∈ X \ {x_i} : ρ(x_i, x) ≤ r}. Likewise, for i ∈ Z_N, we let B_i := #{y ∈ Y \ {y_i} : ρ(y_i, y) ≤ r}. The quantities A := (1/2) Σ_{i∈Z_N} A_i and B := (1/2) Σ_{i∈Z_N} B_i count the matched template pairs of lengths m and m+1, respectively, and the sample entropy of u is defined as −log(B/A). The definition of sample entropy yields the direct algorithm, which explicitly utilizes two nested loops, where the inner one computes A_i and B_i, and the outer one accumulates A and B. Algorithm 1 will be called repeatedly in the Monte-Carlo-based algorithm to be described later.
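To make the definition concrete, the following minimal Python sketch computes sample entropy directly from the definition above (Chebyshev distance, matched pairs at lengths m and m+1). The function and variable names are our own illustration, not the paper's implementation.

```python
import math

def sample_entropy(u, m, r):
    """Directly compute SampEn = -log(B/A) from the definition.

    A counts matched template pairs of length m, B of length m + 1,
    over the N = n - m templates, using the Chebyshev distance.
    """
    n = len(u)
    N = n - m  # number of templates of each length
    A = B = 0
    for i in range(N):
        for j in range(i + 1, N):
            # Chebyshev distance between the length-m templates at i and j
            if max(abs(u[i + k] - u[j + k]) for k in range(m)) <= r:
                A += 1
                # the length-(m+1) templates match iff the extra points match
                if abs(u[i + m] - u[j + m]) <= r:
                    B += 1
    return -math.log(B / A)  # fails if A or B is 0, as does the definition
```

This direct method performs the two nested loops described above and hence costs O(N^2) operations, which is the bottleneck the Monte Carlo approach avoids.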

Algorithm 1 Direct method for range counting
Require: Sequence u := (u_i : i ∈ Z_{N+m}), subset s ⊂ Z_N, template length m, and threshold r.
1: procedure DirectRangeCounting(u, s, m, r)
2:   Set count = 0,
3:   Set L = #s,
4:   for i = 1 to L do
5:     Set a = [u_{s_i+l−1} : l ∈ Z_m],
6:     for j = i + 1 to L do
7:       Set b = [u_{s_j+l−1} : l ∈ Z_m],
8:       if ρ(a, b) ≤ r then
9:         count = count + 1,
10:  return count

The definition of sample entropy shows that sample entropy measures the predictability of data. More precisely, in the definition of sample entropy, B/A measures the conditional probability that, given that the distance of two templates a and b of length m is less than or equal to r, the distance of their corresponding (m+1)-th components is also less than or equal to r. From this perspective, we can approximate this conditional probability of the original data set by computing it on a data set randomly down-sampled from the original one. To describe this method precisely, we define the following notation.
We choose a positive integer N_0, randomly draw N_0 numbers from Z_N without replacement, and form an N_0-dimensional vector. All such vectors form a subset Ω of the product space Z_N^{N_0}. Suppose that F is the power set of Ω (the set of all subsets of Ω, including the empty set and Ω itself). We let P be the uniform probability measure satisfying P(s) = 1/(#Ω) for all s ∈ Ω and define the probability space {Ω, F, P}. The definition of Ω implies #Ω = N!/(N − N_0)!, and thus the probability measure satisfies P(s) = (N − N_0)!/N! for all s ∈ Ω. The definition of F means that all events that may occur in the sample space Ω are considered in the probability space {Ω, F, P}. We randomly select N_0 templates of length m and N_0 templates of length m+1 from the original time series. We then count the number Ã (resp. B̃) of matched pairs among the selected templates of length m (resp. m+1). That is, for s ∈ Ω,

Ã(s) := #{(i, j) ∈ Z_{N_0} × Z_{N_0} : i < j and ρ(x_{s_i}, x_{s_j}) ≤ r},
B̃(s) := #{(i, j) ∈ Z_{N_0} × Z_{N_0} : i < j and ρ(y_{s_i}, y_{s_j}) ≤ r}.

We repeat this process N_1 times. Note that Ã and B̃ are random variables on the probability space {Ω, F, P}. Let Ā_{N_1} and B̄_{N_1} be the averages of the random variables Ã and B̃, respectively, over the N_1 repeated processes. That is,

Ā_{N_1} := (1/N_1) Σ_{k∈Z_{N_1}} Ã(s_k) and B̄_{N_1} := (1/N_1) Σ_{k∈Z_{N_1}} B̃(s_k),

where {s_k : k ∈ Z_{N_1}} is a subset of Ω. With Ā_{N_1} and B̄_{N_1}, we can estimate the sample entropy −log(B/A) by computing −log(B̄_{N_1}/Ā_{N_1}). We summarize the procedure for computing −log(B̄_{N_1}/Ā_{N_1}) in Algorithm 2 and call it the Monte-Carlo-based algorithm for evaluating sample entropy (MCSampEn). In MCSampEn, the samples s_k, k ∈ Z_{N_1}, are selected by the Hidden Shuffle algorithm proposed in [16].

Algorithm 2 Monte-Carlo-based algorithm for evaluating sample entropy
Require: Sequence u = (u_i : i ∈ Z_{N+m}), template length m, tolerance r ∈ R, sample size N_0, number of experiments N_1, and probability space {Ω, F, P}.
1: procedure MCSampEn(u, m, r, N_0, N_1)
2:   Set Ā = 0 and B̄ = 0,
3:   for k = 1 to N_1 do
4:     Select s_k ∈ Ω, randomly, with uniform distribution,
5:     Compute Ã(s_k) by calling DirectRangeCounting(u, s_k, m, r),
6:     Compute B̃(s_k) by calling DirectRangeCounting(u, s_k, m + 1, r),
7:     Update Ā = Ā + Ã(s_k) and B̄ = B̄ + B̃(s_k),
8:   Set Ā = Ā/N_1 and B̄ = B̄/N_1,
9:   Set entropy = −log(B̄/Ā),
10:  return entropy

We next estimate the computational complexity of MCSampEn, measured by the number of arithmetic operations. To this end, we recall Theorem 3.5 of [16], which gives the number of arithmetic operations used in the Hidden Shuffle algorithm.

Theorem 1 ([16]). The Hidden Shuffle algorithm draws N_0 samples from Z_N without replacement using O(N_0) arithmetic operations.

Theorem 2. The computational complexity of Algorithm 2 is O(N_1 N_0^2).

Proof. For each k ∈ Z_{N_1}, according to Theorem 1, the number of arithmetic operations needed for selecting s_k on line 4 of Algorithm 2 is O(N_0). Moreover, from Algorithm 1 we can see that for each k ∈ Z_{N_1}, the number of arithmetic operations needed for computing Ã(s_k) and B̃(s_k) on lines 5 and 6 is O(N_0^2). Thus, by counting the number of arithmetic operations needed for lines 7, 8, and 9 of Algorithm 2, we obtain the desired result.
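As a rough illustration (not the authors' implementation), the following Python sketch mirrors Algorithm 2: it draws N_0 template start indices without replacement, counts matched pairs at lengths m and m+1 among the sampled templates, repeats N_1 times, and plugs the averages into −log(B̄/Ā). We use Python's `random.sample` in place of the Hidden Shuffle algorithm of [16]; only the index-sampling routine differs, the estimator is unchanged.

```python
import math
import random

def mc_sampen(u, m, r, n0, n1, seed=0):
    """Monte Carlo estimate of sample entropy in the spirit of Algorithm 2."""
    rng = random.Random(seed)
    N = len(u) - m  # number of templates
    a_sum = b_sum = 0
    for _ in range(n1):
        # uniform sampling of n0 start indices without replacement
        s = rng.sample(range(N), n0)
        for idx, i in enumerate(s):
            for j in s[idx + 1:]:
                # matched pair at length m (Chebyshev distance)
                if max(abs(u[i + k] - u[j + k]) for k in range(m)) <= r:
                    a_sum += 1
                    # matched pair at length m + 1
                    if abs(u[i + m] - u[j + m]) <= r:
                        b_sum += 1
    if a_sum == 0 or b_sum == 0:
        raise ValueError("no matched templates sampled; increase n0 or n1")
    # -log(B̄/Ā): the 1/n1 factors in the averages cancel in the ratio
    return -math.log(b_sum / a_sum)
```

Each experiment costs O(n0^2) comparisons and there are n1 of them, matching the O(N_1 N_0^2) complexity established above; the cost does not depend on the series length N.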
Theorem 2 indicates that the computational complexity of MCSampEn is controlled by setting appropriate sampling parameters N_0 and N_1. When N_0 and N_1 are fixed, the computational complexity of MCSampEn is independent of the length N of the time series u. Meanwhile, we can also select N_0 and N_1 depending on N to balance the error and computational complexity of MCSampEn. For example, we can set N_0 := max{1024, ⌊√N⌋} and N_1 := min{5 + ⌊log_2 N⌋, ⌊N/N_0⌋}, where ⌊a⌋ denotes the greatest integer no bigger than a ∈ R. In this case, the computational complexity is O(N log_2 N).
Noting that MCSampEn provides an approximation of the sample entropy, and not the exact value, convergence of MCSampEn is an important issue. We will discuss this in Section 3.

Error Analysis
In this section, we analyze the error of MCSampEn. Specifically, we will establish an approximation rate of MCSampEn in the sense of almost sure convergence.
A sequence {V_k : k ∈ N} of random variables on the probability space {Ω, F, P} is said to converge almost surely to a random variable V on {Ω, F, P}, denoted by V_k → V a.s., if there exists a set N ∈ F with P(N) = 0 such that for all ω ∈ Ω \ N, lim_{k→∞} V_k(ω) = V(ω). It is known (see [17]) that {V_k : k ∈ N} converges almost surely to V if and only if for all ε > 0, lim_{k→∞} P(sup_{i≥k} |V_i − V| > ε) = 0. Furthermore, we can describe the convergence rate of {V_i : i ∈ N} by the rate at which P(sup_{i≥k} |V_i − V| > ε) declines: if for all ε > 0 this probability is O(k^{−α}) for some α > 0, we say {V_i : i ∈ N} converges to V almost surely with rate α.
To establish the approximation error of MCSampEn, we first derive two theoretical results for the expectations and variances of Ã/(N_0(N_0−1)) and B̃/(N_0(N_0−1)). Then, by combining these results with the almost sure convergence rate results in [18] and the local smoothness of the logarithm function, we obtain the approximation rate of {−log(B̄_{N_1}/Ā_{N_1}) : N_1 ∈ N} in the sense of almost sure convergence, which is the main theoretical result of this paper. We state these results below and postpone their proofs to Appendix A.
Theorem 3. It holds that for all N_0 ∈ Z_N with N_0 > 1,

E[Ã/(N_0(N_0−1))] = A/(N(N−1)) and E[B̃/(N_0(N_0−1))] = B/(N(N−1)).
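Theorem 3 says the normalized Monte Carlo counts are unbiased. Because Ã depends on the drawn vector s only through its set of entries, the identity can be checked exactly on a tiny series by enumerating all N_0-element subsets of Z_N. The Python sketch below (with a toy series of our own choosing) does so with exact rational arithmetic.

```python
import itertools
from fractions import Fraction

def count_matches(u, idx, m, r):
    """Number of matched template pairs of length m among start indices idx."""
    c = 0
    for a, b in itertools.combinations(idx, 2):
        if max(abs(u[a + k] - u[b + k]) for k in range(m)) <= r:
            c += 1
    return c

u = [0.0, 0.3, 0.1, 0.9, 0.2, 0.85, 0.15, 0.5]   # toy series (arbitrary)
m, r, n0 = 2, 0.2, 3
N = len(u) - m                                    # 6 templates of length m
A = count_matches(u, range(N), m, r)              # exact pair count over Z_N

# average à over every n0-subset of Z_N (sampling order does not affect Ã)
subsets = list(itertools.combinations(range(N), n0))
mean_A_tilde = Fraction(sum(count_matches(u, s, m, r) for s in subsets),
                        len(subsets))

# Theorem 3: E[Ã] / (n0 (n0 - 1)) equals A / (N (N - 1)) exactly
assert mean_A_tilde / (n0 * (n0 - 1)) == Fraction(A, N * (N - 1))
```

The check succeeds for any choice of series, since each unordered pair of indices is included in exactly C(N−2, N_0−2) of the C(N, N_0) subsets, which is precisely the counting argument behind the theorem.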

Theorem 4. It holds that for all N_0 ∈ Z_N with N_0 > 1,

Var(Ã/(N_0(N_0−1))) = O(1/N_0) and Var(B̃/(N_0(N_0−1))) = O(1/N_0),

where the constants hidden in the O-notation depend on the matched-pair counts A and B of the time series.

Theorems 3 and 4, combined with the Kolmogorov strong law of large numbers and the continuous mapping theorem, imply that −log(B̄_k/Ā_k) → −log(B/A) almost surely as k → ∞. However, in practice it is desirable to quantify the approximation rate in the sense of almost sure convergence, so that we can estimate the error between −log(B̄_k/Ā_k) and −log(B/A). With the notation defined above, we present below the main theoretical result of this paper, which gives the rate at which {−log(B̄_k/Ā_k) : k ∈ N} approximates −log(B/A) in the sense of almost sure convergence.
Theorem 5. Let β > 1 and N_0 ∈ Z_N with N_0 > 3. If A, B > 0, then there exist constants D_β and D̃_β (depending only on β) such that for all 0 < ε < 1 there is an integer n_{ε,β} such that for all N_1 > n_{ε,β},

P( sup_{k ≥ N_1} |log(B̄_k/Ā_k) − log(B/A)| > ε ) ≤ (D_β + D̃_β) ε^{−2} N_1^{−1}.

The proofs of Theorems 3–5 are included in Appendix A. Note that Theorem 5 indicates that −log(B̄_k/Ā_k) approximates −log(B/A) in the sense of almost sure convergence of order 1.

Experiments
We present numerical experiments to show the accuracy and computational complexity of the proposed algorithm MCSampEn.
As sample entropy has been prevalently used in a large number of areas, we consider several series with a variety of statistical features, including the electrocardiogram (ECG) series, RR interval series, electroencephalogram (EEG) series, mechanical vibration signals (MVS), meteorological data (MD), and 1/f noise. The ECG and EEG data can be downloaded from PhysioNet, a website offering access to recorded physiologic signals (PhysioBank) and related open-source toolkits (PhysioToolkit) [19]. The MVS data can be found in [20] and on the website of the Case Western Reserve University Bearing Data Center [21]. The MD data can be downloaded from the website of the Royal Netherlands Meteorological Institute [22]. The databases used in this paper include:

Long-Term AF Database (ltafdb) [23]. This database includes 84 long-term ECG recordings of subjects with paroxysmal or sustained atrial fibrillation (AF). Each record contains two simultaneously recorded ECG signals digitized at 128 Hz with 12-bit resolution over a 20 mV range; record durations vary but are typically 24 to 25 h.

Meteorological Database (MD) [22]. The meteorological database used in this section records the hourly weather data of the past 70 years in the Netherlands.

Long-Term ST Database (ltstdb) [24].
As each database consists of multiple records from different subjects, we select one record randomly from each database. Specifically, we choose record "00" from ltafdb, "s20011" from ltstdb, "14046" from ltdb, "chf01" from chfdb, "mgh001" from mghdb, "chb07_01" from chbmit, "Miss_30_2" from gearbox, "XE110_DE_Time" from RB, and "380_t" from MD. Moreover, a 1/f noise signal, an artificial signal, is studied to increase diversity. The time series considered in this section are illustrated in Figure 1, where all samples are normalized to have a standard deviation of 1; since the parameter threshold r is proportional to the standard deviation of the records, the absolute amplitude range of the records is irrelevant.

Approximation Accuracy
In the experiments presented in this subsection, we examine the approximation accuracy of the MCSampEn algorithm. Specifically, we set r := 0.15 and m := 4, 5, and vary the sampling size N_0 and the number N_1 of experiments. In this experiment, records with lengths exceeding 10^6 are truncated to length 10^6; otherwise, the entire records are used. Since in the MCSampEn algorithm the samples s_k ∈ Ω are selected randomly, the outcome of the algorithm depends on the selected values of s_k. To average out the effect of this randomness, for every specified pair (N_0, N_1), we run the algorithm 50 times and calculate the mean errors (MeanErr) and the root mean squared errors (RMeanSqErr) of the 50 outcomes.
In our first experiment, we consider the series "mghdb/mgh001", vary the parameters N_0 and N_1, and show in Figure 2 the mean errors and the root mean squared errors of the MCSampEn outputs as surfaces in the N_0–N_1 coordinate system. Images (a), (c), and (e) of Figure 2 show the values of MeanErr, and images (b), (d), and (f) of Figure 2 show the values of RMeanSqErr. Figure 2 clearly demonstrates that both the mean errors and the root mean squared errors of the MCSampEn outputs converge to 0 as N_0 or N_1 increases. This is consistent with our theoretical analysis in the previous section.
In the second experiment, we consider all series illustrated in Figure 1 and show numerical results in Figures 3 and 4. Images (a), (c), and (e) of Figure 3 show the values of MeanErr, and images (b), (d), and (f) of Figure 3 show the values of RMeanSqErr, with N_0 ∈ {200i : i ∈ Z_20} and fixed N_1 = 250. Images (a), (c), and (e) of Figure 4 show the values of MeanErr, and images (b), (d), and (f) of Figure 4 show the values of RMeanSqErr, with N_0 = 4000 and N_1 ∈ {10i : i ∈ Z_25}. Figure 3 indicates that the outputs of the MCSampEn algorithm converge as N_0 increases. We can also see from Figure 3 that when N_0 ≥ 1500, N_1 = 150, and m = 4, both MeanErr and RMeanSqErr are less than 1 × 10^{−2} for all tested time series. In other words, the MCSampEn algorithm can effectively estimate sample entropy when N_0 ≥ 1500, N_1 = 150, and m = 4. From Figure 4, we can also observe that the outputs of the MCSampEn algorithm converge as N_1 increases. This is consistent with the theoretical results established in Section 3.

We next explain how the randomness of a time series affects the accuracy of the MCSampEn algorithm by applying the algorithm to the stochastic process MIX(p), which has been widely used in studies of sample entropy [1,2,28]. The MIX(p) process is defined as follows. Let x_j := α^{−1/2} sin(2πj/12) for all j ∈ Z_N, where α := (Σ_{j=1}^{12} sin^2(2πj/12))/12. Let {y_j : j ∈ Z_N} be a family of independent identically distributed (i.i.d.) real random variables with uniform probability density on the interval [−√3, √3]. Note that {x_j : j ∈ Z_N} and {y_j : j ∈ Z_N} are sequences with contrary properties: the former is a completely regular sine sequence, and the latter is completely random. Let p ∈ [0, 1], and let {z_j : j ∈ Z_N} be a family of i.i.d. random variables satisfying z_j = 1 with probability p and z_j = 0 with probability 1 − p. Then, the MIX(p) process is defined as {m_j := (1 − z_j)x_j + z_j y_j : j ∈ Z_N}.
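For reproducibility, a MIX(p) realization can be generated as in the following Python sketch of the definition above (the generator structure and seeding are our own choices; both the sine component and the noise are scaled to unit variance, matching the definition).

```python
import math
import random

def mix_process(p, N, seed=0):
    """Generate one realization of MIX(p): a unit-variance sine sequence whose
    samples are independently replaced, with probability p, by uniform noise
    on [-sqrt(3), sqrt(3)] (which also has unit variance)."""
    rng = random.Random(seed)
    # alpha normalizes the sine samples to unit variance (alpha = 1/2 here)
    alpha = sum(math.sin(2 * math.pi * j / 12) ** 2 for j in range(1, 13)) / 12
    out = []
    for j in range(1, N + 1):
        x = math.sin(2 * math.pi * j / 12) / math.sqrt(alpha)  # regular part
        y = rng.uniform(-math.sqrt(3), math.sqrt(3))           # random part
        z = 1 if rng.random() < p else 0                       # Bernoulli(p)
        out.append(y if z else x)
    return out
```

With p = 0 the output is the deterministic sine sequence, and with p = 1 it is pure uniform noise, which is what makes MIX(p) a convenient dial for the randomness of the series.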
The parameter p controls the ratio of sine sequence to random noise in the MIX(p) process, and increasing p makes the MIX(p) process more random. When p = 0, the MIX(p) process is a deterministic sine sequence; when p = 1, it becomes completely unpredictable uniform noise. This feature makes it an ideal series for studying how randomness affects the accuracy of the MCSampEn algorithm. Here, we apply MCSampEn to MIX(p) for p ∈ {0.05i : i ∈ Z_19} and show the results of RMeanSqErr versus p in Figure 5. From Figure 5, we can observe that the values of RMeanSqErr increase linearly with a very small growth rate when p ≤ 0.5. When p > 0.5, the values of RMeanSqErr grow significantly faster than when p ≤ 0.5. Therefore, we conclude that when the randomness of a time series is weak, the error of the MCSampEn algorithm is small; as the randomness of the time series increases, the error of MCSampEn grows.

Time Complexity
In the experiments presented in this subsection, we compare the computing time of the MCSampEn algorithm with that of the kd-tree algorithm [8] and the SBOX algorithm [14], under the condition that the value of sample entropy computed by the MCSampEn algorithm is very close to the ground truth value. The computational time experiments are performed on a desktop computer running Windows 11, with an Intel(R) Core(TM) i5-9500 CPU and 32 GB RAM. The implementations of the kd-tree-based algorithm and the MCSampEn algorithm are available at https://github.com/phreer/fast_sampen_impl.git (accessed on 30 March 2022). For the SBOX method, we use the implementation provided by the original author, published at https://sites.google.com/view/yhwpersonal-homepage (accessed on 25 October 2021). To demonstrate the validity of the MCSampEn algorithm, we also show both the sample entropy estimated by MCSampEn and the corresponding ground truth.
As we have discussed above, the time complexity of the MCSampEn algorithm depends on the parameters N_0 and N_1. In this subsection, we discuss two strategies for choosing N_0 and N_1:

S1 Choose N_0 and N_1 independent of N, for example, N_0 = 2 × 10^3 and N_1 = 150.

S2 Choose N_0 and N_1 depending on N, namely N_0 = max{1024, ⌊√N⌋} and N_1 = min{5 + ⌊log_2 N⌋, ⌊N/N_0⌋}.
An intuitive explanation of the second strategy is as follows. We would like to choose N_0 and N_1 such that the overall time complexity of the algorithm is O(N log N). For this purpose, we expect N_0 to grow like √N and N_1 to grow logarithmically in N. However, when N is not large enough, the lack of sampled templates can seriously impair the accuracy of the algorithm. To overcome this problem, we set a lower bound of 1024 on N_0, which offers a good trade-off between accuracy and time complexity. The experimental results in this subsection show that this strategy produces satisfactory output even when N is small.
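The two strategies can be written down compactly; the helper below is our own naming, following the formulas in the text.

```python
import math

def choose_parameters(N, strategy=2):
    """Pick sampling parameters (N0, N1) per the two strategies in the text.

    Strategy 1 ignores N; strategy 2 scales N0 ~ sqrt(N) and N1 ~ log2(N),
    with a floor of 1024 on N0 so that short series are not undersampled.
    """
    if strategy == 1:
        return 2 * 10 ** 3, 150
    n0 = max(1024, math.isqrt(N))          # N0 = max{1024, floor(sqrt(N))}
    n1 = min(5 + int(math.log2(N)), N // n0)  # N1 = min{5 + floor(log2 N), floor(N/N0)}
    return n0, n1
```

For example, at N = 2^20 strategy 2 gives N_0 = 1024 and N_1 = 25, so the total work N_1·N_0^2 grows like N log N, matching the complexity discussion above.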
The results on the signals "ltafdb/00", "1/f noise", "chbmit/chb07_01", and "ltecg/14046" are shown in Figure 6, where the first strategy is adopted by setting N_0 = 2 × 10^3 and N_1 = 150; the results for m = 4 are marked in red, and the results for m = 5 in blue. In the left column of Figure 6, the computation times consumed by the kd-tree, SBOX, and MCSampEn algorithms are plotted, respectively, with the dashed lines marked "x", the dash-dot lines marked "+", and the solid lines marked "o". From the results shown in the left column of Figure 6, we find that MCSampEn is faster than the SBOX algorithm when N is greater than 2^15. We also see that when the time series "chbmit/chb07_01" and "ltecg/14046" have length N = 2^20, MCSampEn is nearly 1000 times faster than the SBOX algorithm. Compared to the kd-tree algorithm, the MCSampEn algorithm still achieves up to hundreds of times acceleration when N = 2^20. In addition, the computation time of the MCSampEn algorithm is close to constant with respect to m, and is much smaller than that of the kd-tree and SBOX algorithms when N is large enough. Meanwhile, the computation time (shown in the left column of Figure 6) is hardly affected by the time series length N.
The right column of Figure 6 shows the average of 50 outputs of the MCSampEn algorithm for different time series under the settings N_0 = 2 × 10^3 and N_1 = 150, where the red solid lines plot the average for m = 4, and the blue solid lines plot the average for m = 5. In the right column of Figure 6, the ground truth values for m = 4 and m = 5 are plotted by the red and blue dashed lines, respectively. Meanwhile, in the right column of Figure 6, we use error bars "I" to represent the values of RMeanSqErr: the larger the value of RMeanSqErr, the longer the error bar. From the lengths of the error bars, we can see that the values of RMeanSqErr are small compared to the ground truth. Especially for the time series "ltafdb/00", "chbmit/chb07_01", and "ltecg/14046", the values of RMeanSqErr are negligible compared to the ground truth values. These results imply that when N_0 = 2 × 10^3 and N_1 = 150, the sample entropy estimated by the MCSampEn algorithm effectively approximates the ground truth value.

The results of the second strategy are shown in Figure 7, where N_0 = max{1024, ⌊√N⌋} and N_1 = min{5 + ⌊log_2 N⌋, ⌊N/N_0⌋}. The results for m = 4 are marked in red, and the results for m = 5 in blue. The left column of Figure 7 shows the computation times consumed by the kd-tree, SBOX, and MCSampEn algorithms, presented by the dashed lines marked "x", the dash-dot lines marked "+", and the solid lines marked "o", respectively. From the left column of Figure 7, we can see that with the second strategy, the computation time of the MCSampEn algorithm is much less than that of the kd-tree and SBOX algorithms, since the computational complexity of Algorithm 2 is O(N log N). Furthermore, we observe that MCSampEn achieves a speedup of more than 100 over the SBOX algorithm when N goes from 2^16 to 2^18, and it is over 1000 times faster when N = 2^20.
Compared to the kd-tree algorithm, the MCSampEn algorithm still obtains up to 1000 times acceleration when N = 2^20. In the right column of Figure 7, we plot the average of 50 outputs of the MCSampEn algorithm for different time series by the red and blue solid lines for m = 4 and m = 5, respectively. At the same time, the ground truth values for m = 4 and m = 5 are plotted by the red and blue dashed lines, respectively. As in Figure 6, we use error bars "I" to represent the values of RMeanSqErr. Compared with the error bars in Figure 6, the values of RMeanSqErr in this experiment are larger than those shown in Figure 6. However, the values of RMeanSqErr are still small relative to the ground truth values. Moreover, we can observe that the length of the error bars decreases as N increases. This means that we obtain a better approximation of sample entropy as the time series length increases.
To reveal the effect of randomness on the speedup, we compare the time taken by the kd-tree and MCSampEn algorithms to compute the sample entropy of the time series MIX(p) for p ∈ {0.05i : i ∈ Z_19}. The experimental results are shown in Figure 8, where the results for m = 4 are marked in red, and the results for m = 5 in blue. The computation times consumed by the kd-tree and MCSampEn algorithms are plotted, respectively, with the dashed lines marked "x" and the solid lines marked "o". In this experiment, we set N = 2^20 and r = 0.15. We also let N_0 = 1000 + 3000p and N_1 = 80 + 70p to ensure that the relative error RMeanSqErr/SampEn is no greater than 0.02. From Figure 8, we can see that when the value of p is less than 0.2, the MCSampEn algorithm achieves a 300 to 1000 times speedup compared with the kd-tree algorithm. When the value of p is greater than 0.8, our algorithm still obtains about a 10 times speedup relative to the kd-tree algorithm. From the experiments in this subsection, we observe that the MCSampEn algorithm achieves a high speedup when applied to different types of signals. In fact, compared with the kd-tree algorithm, the MCSampEn algorithm achieves high accuracy and more than 300 times acceleration when the time series has little randomness. When the randomness of the time series is high, our algorithm still obtains a speedup of nearly 10 times.

Memory Usage
In order to show the performance of the MCSampEn algorithm more comprehensively, we also compare the memory usage of the kd-tree and MCSampEn algorithms. The memory usage on the signal "ltstdb/s20011" is shown in Figure 9, where the memory usage for m = 4 and m = 5 is shown in Figure 9a,b, respectively. In this figure, the memory usage of the kd-tree algorithm is plotted by the blue dash-dot lines marked "x". The memory usage of the MCSampEn algorithm with the first and second strategies is plotted by the green dashed lines marked "+" and the red dotted lines marked "o", respectively. In Figure 9, the first strategy is adopted by setting N_0 = 2048 and N_1 = 150, and the second strategy by N_0 = max{1024, ⌊√N⌋} and N_1 = min{5 + ⌊log_2 N⌋, ⌊N/N_0⌋}. We also plot the memory usage for storing the data itself with the black solid lines.
From the results shown in Figure 9, it can be seen that when the size of the data is 2^20, the memory required by the kd-tree algorithm is almost 36 times that required by the MCSampEn algorithm. This is because the kd-tree algorithm requires a large memory space to store the kd-tree. Meanwhile, the experimental results in Figure 9 also show that the amount of memory required by the MCSampEn algorithm is only about 15 MB more than the amount of memory required to store the data when the length of the data is between 2^14 and 2^24. This is because the MCSampEn algorithm requires additional memory only for storing N_0 templates and for executing the subroutines that generate random numbers. Because the MCSampEn algorithm is based on Monte Carlo sampling and the law of large numbers, it is easily parallelizable. Therefore, combined with distributed storage techniques, the idea of the MCSampEn algorithm can be used to compute the sample entropy of large-scale data (for example, data larger than 1 TB). Parallel algorithms for computing the sample entropy of large-scale data will be our future work.
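To illustrate the parallelization remark, the sketch below splits the N_1 independent experiments across a worker pool and merges the per-experiment counts. It is a toy illustration of the idea only: a thread pool is used for brevity, while actual speedups for these CPU-bound loops would require process-based or distributed workers.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def _one_experiment(args):
    """Run a single sampling experiment: return (Ã, B̃) for one index draw."""
    u, m, r, n0, seed = args
    rng = random.Random(seed)
    s = rng.sample(range(len(u) - m), n0)  # n0 start indices, no replacement
    a = b = 0
    for idx, i in enumerate(s):
        for j in s[idx + 1:]:
            if max(abs(u[i + k] - u[j + k]) for k in range(m)) <= r:
                a += 1
                if abs(u[i + m] - u[j + m]) <= r:
                    b += 1
    return a, b

def parallel_mc_sampen(u, m, r, n0, n1, workers=4):
    """Distribute the n1 independent experiments over a worker pool and
    combine the counts into -log(B̄/Ā); the 1/n1 factors cancel."""
    tasks = [(u, m, r, n0, seed) for seed in range(n1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(_one_experiment, tasks))
    a_sum = sum(a for a, _ in counts)
    b_sum = sum(b for _, b in counts)
    return -math.log(b_sum / a_sum)
```

Because the experiments share no state, the same structure carries over directly to process pools or to workers on separate machines holding shards of a large data set.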

Conclusions
In this paper, we propose a Monte-Carlo-based algorithm called MCSampEn to estimate sample entropy and prove that the outputs of MCSampEn approximate sample entropy in the sense of almost sure convergence of order 1. We provide two strategies for selecting the sampling parameters N_0 and N_1 that appear in MCSampEn. The experimental results show that we can flexibly select the parameters N_0 and N_1 to balance the computational complexity and the error. From the experimental results, we observe that the computation time consumed by the proposed algorithm is significantly shorter than that of the kd-tree and SBOX algorithms, with negligible loss of accuracy. Meanwhile, the computational complexity of our MCSampEn method is hardly affected by the time series length N. We also study how the randomness of the time series affects the accuracy and computation time of the MCSampEn algorithm by applying the algorithm to the stochastic process MIX(p). The results indicate that the proposed algorithm performs especially well for time series with less randomness.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A

To analyze the expectation of B̃/(N_0(N_0−1)), we define the following notation. For all j ∈ Z_{N_0}, we define the random variable B̃_j on the probability space {Ω, F, P} by

B̃_j(s) := #{i ∈ Z_{N_0} : i ≠ j and ρ(y_{s_i}, y_{s_j}) ≤ r}, s ∈ Ω. (A1)

For all j ∈ Z_{N_0}, the definition of B̃_j indicates that B̃_j(s) is the number of elements in {y_{s_i} : i ∈ Z_{N_0}} that satisfy ρ(y_{s_i}, y_{s_j}) ≤ r and i ≠ j. From the definitions of B̃ and B̃_j, we have that for all s ∈ Ω, B̃(s) = (1/2) Σ_{j∈Z_{N_0}} B̃_j(s). For p, q, l ∈ N, we say a random variable V follows the hypergeometric distribution H(p, q, l) if and only if the probability of V = k is

P(V = k) = C(q, k) C(p − q, l − k) / C(p, l),

where C(a, b) denotes the binomial coefficient. See Section 5.3 of [29] for more details about the hypergeometric distribution. For all l ∈ Z_N, let 𝓑_l := {i ∈ Z_N : i ≠ l and ρ(y_i, y_l) ≤ r}, which is the index set of elements of Y satisfying ρ(y_i, y_l) ≤ r, and let B_l := #𝓑_l. For the purpose of analyzing the expectation of B̃/(N_0(N_0−1)), we recall the expectation of the hypergeometric distribution H(p, q, l) (see Theorem 5.3.2 in [29]) and prove a technical lemma as follows.
Theorem A1. For p, q, l ∈ N, the expectation of the hypergeometric distribution H(p, q, l) is ql/p.
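As a quick numerical sanity check of Theorem A1 (ours, not part of the paper), the expectation of H(p, q, l) can be computed by direct enumeration of the probability mass function and compared against ql/p:

```python
from math import comb

def hypergeom_mean(p, q, l):
    """Expectation of H(p, q, l): draw l items without replacement from a
    population of p items of which q are 'marked'; count marked draws."""
    total = comb(p, l)
    return sum(k * comb(q, k) * comb(p - q, l - k) / total
               for k in range(max(0, l - (p - q)), min(q, l) + 1))

# Theorem A1: the mean equals q * l / p
assert abs(hypergeom_mean(10, 4, 3) - 4 * 3 / 10) < 1e-12
```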
Lemma A1. Let N_0 ∈ Z_N with N_0 > 1. For any fixed j ∈ Z_{N_0} and l ∈ Z_N, the conditional probability distribution of B̃_j given s_j = l is the hypergeometric distribution H(N − 1, B_l, N_0 − 1). Moreover, for all j ∈ Z_{N_0}, the expectation of the random variable B̃_j is

    E[B̃_j] = 2B(N_0 − 1) / (N(N − 1)).  (A2)

Proof. Let j ∈ Z_{N_0} and l ∈ Z_N. From the definition of B̃_j, we can see that for all s ∈ Ω with s_j = l, B̃_j(s) ≤ min{B_l, N_0 − 1}. On the other hand, for all s ∈ Ω with s_j = l, the definitions of B̃_j and 𝓑_l give N_0 − B̃_j ≤ N − B_l. Thus, for all s ∈ Ω with s_j = l,

    max{0, N_0 − N + B_l} ≤ B̃_j ≤ min{N_0 − 1, B_l}.

This means that for k < max{0, N_0 − N + B_l} or k > min{N_0 − 1, B_l}, #{s ∈ Ω : B̃_j(s) = k and s_j = l} = 0.
Meanwhile, it can be checked that for all s ∈ Ω with s_j = l and max{0, N_0 − N + B_l} ≤ k ≤ min{N_0 − 1, B_l}, B̃_j(s) = k if and only if the vector s contains k components belonging to 𝓑_l and N_0 − 1 − k components belonging to Z_N \ (𝓑_l ∪ {l}). Note that there are C(B_l, k) ways of drawing k elements from the set 𝓑_l, and C(N − 1 − B_l, N_0 − 1 − k) ways of drawing N_0 − 1 − k elements from the set Z_N \ (𝓑_l ∪ {l}). Thus, by noting that each element of Ω is a permutation formed by extracting N_0 numbers from Z_N, we have that for all max{0, N_0 − N + B_l} ≤ k ≤ min{N_0 − 1, B_l},

    #{s ∈ Ω : B̃_j(s) = k and s_j = l} = C(B_l, k) C(N − 1 − B_l, N_0 − 1 − k) (N_0 − 1)!.  (A3)

Note that #{s ∈ Ω : s_j = l} = (N − 1)!/(N − N_0)!, and the elements of {s ∈ Ω : s_j = l} are equally probable. Hence, dividing the right-hand side of (A3) by (N − 1)!/(N − N_0)! gives

    P(B̃_j = k | s_j = l) = C(B_l, k) C(N − 1 − B_l, N_0 − 1 − k) / C(N − 1, N_0 − 1)

for max{0, N_0 − N + B_l} ≤ k ≤ min{N_0 − 1, B_l}, and P(B̃_j = k | s_j = l) = 0 otherwise.
This indicates that the conditional probability distribution of B̃_j given s_j = l is the hypergeometric distribution H(N − 1, B_l, N_0 − 1) (see [29]). From Theorem A1 we then have, for any j ∈ Z_{N_0} and l ∈ Z_N, E[B̃_j | s_j = l] = B_l(N_0 − 1)/(N − 1). Thus, by noting that ∑_{l∈Z_N} B_l = 2B and P(s_j = l) = 1/N for all l ∈ Z_N, from the law of total expectation we obtain (A2).
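For completeness, the law-of-total-expectation step behind (A2) can be written out; this is our reconstruction of a routine computation:

```latex
\mathbb{E}\bigl[\tilde B_j\bigr]
  = \sum_{l \in \mathbb{Z}_N} \mathbb{P}(s_j = l)\,
      \mathbb{E}\bigl[\tilde B_j \mid s_j = l\bigr]
  = \frac{1}{N} \sum_{l \in \mathbb{Z}_N} \frac{B_l (N_0 - 1)}{N - 1}
  = \frac{N_0 - 1}{N (N - 1)} \cdot 2B
  = \frac{2B (N_0 - 1)}{N (N - 1)}.
```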

The proof of Theorem 3 is as follows.
Proof. From the definitions of B̃ and B̃_j, we know

    E[ B̃/(N_0(N_0 − 1)) ] = (1/(2N_0(N_0 − 1))) ∑_{j∈Z_{N_0}} E[B̃_j].  (A4)

Substituting (A2) into (A4) leads to (2).
Next, we consider the variance of B̃/(N_0(N_0 − 1)). Since B̃ = (1/2) ∑_{j∈Z_{N_0}} B̃_j, the variance of B̃/(N_0(N_0 − 1)) can be obtained by summing the covariances E[B̃_{j_1} B̃_{j_2}], j_1, j_2 ∈ Z_{N_0}. This motivates us to compute these covariances. As a preparation, we establish two auxiliary lemmas. For all k, l ∈ Z_N with k ≠ l, we define 𝓑_{kl} := 𝓑_k ∩ 𝓑_l and B_{kl} := #𝓑_{kl}.
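To make the reduction to covariances explicit (our expansion of the remark above), squaring B̃ = (1/2) ∑_j B̃_j and taking expectations gives

```latex
\mathbb{E}\bigl[\tilde B^2\bigr]
  = \frac{1}{4} \sum_{j_1 \in \mathbb{Z}_{N_0}} \sum_{j_2 \in \mathbb{Z}_{N_0}}
      \mathbb{E}\bigl[\tilde B_{j_1} \tilde B_{j_2}\bigr]
  = \frac{1}{4} \Bigl( \sum_{j_1 \ne j_2}
      \mathbb{E}\bigl[\tilde B_{j_1} \tilde B_{j_2}\bigr]
      + \sum_{j \in \mathbb{Z}_{N_0}} \mathbb{E}\bigl[\tilde B_j^2\bigr] \Bigr),
```

so the variance of B̃/(N_0(N_0 − 1)) is determined by the mixed moments E[B̃_{j_1} B̃_{j_2}] and the second moments E[B̃_j²] computed below.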

Lemma A2. It holds that

    ∑_{k, l ∈ Z_N, k ≠ l} B_{kl} = ∑_{p ∈ Z_N} B_p (B_p − 1).  (A5)
Proof. Note that 𝓑_{kl} ∩ 𝓑_{k′l′} is not necessarily empty for (k, l) ≠ (k′, l′). For the sets 𝓑_{kl}, we therefore define new sets Π_{kl} that are mutually disjoint and have the same cardinality as 𝓑_{kl}. In this way, formula (A5) will be proved by establishing a set identity and counting cardinalities. To this end, we define Π_{kl} := {(p, k, l) : p ∈ 𝓑_{kl}} for each k, l ∈ Z_N with k ≠ l, and Π_p := {(p, k, l) : k, l ∈ 𝓑_p with k ≠ l} for each p ∈ Z_N. From the definition of Π_{kl}, we have that #Π_{kl} = B_{kl} and Π_{kl} ∩ Π_{k′l′} = ∅ if (k, l) ≠ (k′, l′). Thus,

    ∑_{k≠l} B_{kl} = #( ⋃_{k≠l} Π_{kl} ).  (A6)

Likewise, the definition of Π_p ensures that #Π_p = B_p(B_p − 1) and Π_p ∩ Π_{p′} = ∅ if p ≠ p′. Thus, by noting that 2B = ∑_{p∈Z_N} B_p,

    ∑_{p∈Z_N} B_p(B_p − 1) = #( ⋃_{p∈Z_N} Π_p ).  (A7)

Combining Equations (A6) and (A7), we see that it suffices to prove

    ⋃_{k≠l} Π_{kl} = ⋃_{p∈Z_N} Π_p.  (A8)

For all k, l ∈ Z_N with k ≠ l and (p, k, l) ∈ Π_{kl}, the definitions of Π_{kl} and 𝓑_{kl} ensure that

    p ≠ k, p ≠ l, ρ(y_p, y_k) ≤ r, and ρ(y_p, y_l) ≤ r.  (A9)

In other words, k ∈ 𝓑_p, l ∈ 𝓑_p, and k ≠ l. Thus, for all k, l ∈ Z_N with k ≠ l and (p, k, l) ∈ Π_{kl}, we have (p, k, l) ∈ Π_p, and hence

    ⋃_{k≠l} Π_{kl} ⊆ ⋃_{p∈Z_N} Π_p.  (A10)

On the other hand, for all p ∈ Z_N and (p, k, l) ∈ Π_p, we know from the definitions of Π_p and 𝓑_p that (A9) holds and k ≠ l. This means that (p, k, l) ∈ Π_{kl} with k ≠ l. Hence,

    ⋃_{p∈Z_N} Π_p ⊆ ⋃_{k≠l} Π_{kl}.  (A11)

From (A10) and (A11) we obtain (A8), which leads to the desired result (A5). □
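Lemma A2's counting identity ∑_{k≠l} B_{kl} = ∑_p B_p(B_p − 1) is easy to verify numerically. Below is a brute-force check (ours) on random scalar data, with the absolute difference standing in for the template metric ρ:

```python
import random

def neighbour_sets(y, r):
    """B_l = { i != l : |y_i - y_l| <= r }, a scalar stand-in for rho."""
    n = len(y)
    return [{i for i in range(n) if i != l and abs(y[i] - y[l]) <= r}
            for l in range(n)]

rng = random.Random(0)
y = [rng.random() for _ in range(12)]
B = neighbour_sets(y, 0.3)
n = len(y)
# left-hand side: sum of pairwise intersection sizes over ordered pairs k != l
lhs = sum(len(B[k] & B[l]) for k in range(n) for l in range(n) if k != l)
# right-hand side: sum of B_p * (B_p - 1)
rhs = sum(len(B[p]) * (len(B[p]) - 1) for p in range(n))
assert lhs == rhs
```

The identity holds because the metric is symmetric: p ∈ 𝓑_k is equivalent to k ∈ 𝓑_p, so both sides count the triples (p, k, l) with k, l ∈ 𝓑_p and k ≠ l.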
For i, j ∈ Z_{N_0} with i ≠ j, we define the random variable Z_{ij} on the probability space (Ω, F, P) by

    Z_{ij}(s) := 1 if ρ(y_{s_i}, y_{s_j}) ≤ r, and Z_{ij}(s) := 0 otherwise, s ∈ Ω.  (A12)

From the definition of B̃_j, we can see that B̃_j = ∑_{i∈Z_{N_0}\{j}} Z_{ij}. Thus, in order to compute the covariance E[B̃_{j_1} B̃_{j_2}], we next determine the values of P(Z_{i_1 j_1} = 1, Z_{i_2 j_2} = 1) for j_1, j_2 ∈ Z_{N_0} and i_1 ∈ Z_{N_0}\{j_1}, i_2 ∈ Z_{N_0}\{j_2}.

Lemma A3. For all j_1, j_2 ∈ Z_{N_0} with j_1 ≠ j_2, and i_1 ∈ Z_{N_0}\{j_1}, i_2 ∈ Z_{N_0}\{j_2} such that i_1, j_1, i_2, j_2 are distinct, it holds that

    P(Z_{i_1 j_1} = 1, Z_{i_2 j_2} = 1) = (4B² + 4B − 4 ∑_{l∈Z_N} B_l²) / (N(N−1)(N−2)(N−3)).  (A13)

Moreover, for all j ∈ Z_{N_0} and i, i′ ∈ Z_{N_0}\{j} with i ≠ i′, it holds that

    P(Z_{ij} = 1, Z_{i′j} = 1) = (∑_{l∈Z_N} B_l² − 2B) / (N(N−1)(N−2)).  (A14)

Proof. We first prove (A13). Let L_{N_0} := {(i_1, j_1, i_2, j_2) : j_1, j_2 ∈ Z_{N_0} with j_1 ≠ j_2, and i_1 ∈ Z_{N_0}\{j_1}, i_2 ∈ Z_{N_0}\{j_2} with i_1, j_1, i_2, j_2 distinct}, and for all (i_1, j_1, i_2, j_2) ∈ L_{N_0}, define

    Ω_{i_1 j_1, i_2 j_2} := {s ∈ Ω : Z_{i_1 j_1}(s) = 1 and Z_{i_2 j_2}(s) = 1}.
We prove (A13) by counting the cardinality of Ω_{i_1 j_1, i_2 j_2}, identifying it as a union of disjoint subsets. From the definitions of Z_{i_1 j_1} and Z_{i_2 j_2}, we know for all (i_1, j_1, i_2, j_2) ∈ L_{N_0} and s ∈ Ω_{i_1 j_1, i_2 j_2} that s_{i_1} ∈ 𝓑_{s_{j_1}} and s_{i_2} ∈ 𝓑_{s_{j_2}}. At the same time, the numbers in the set {s_{j_1}, s_{j_2}, s_{i_1}, s_{i_2}} are distinct. Thus, s_{j_1} ≠ s_{j_2}, s_{i_1} ∈ 𝓑_{s_{j_1}} \ {s_{j_2}}, and s_{i_2} ∈ 𝓑_{s_{j_2}} \ {s_{i_1}, s_{j_1}}; the converse inclusion is easy to check. Hence Ω_{i_1 j_1, i_2 j_2} = {s ∈ Ω : s_{j_1} ≠ s_{j_2}, s_{i_1} ∈ 𝓑_{s_{j_1}} \ {s_{j_2}}, s_{i_2} ∈ 𝓑_{s_{j_2}} \ {s_{i_1}, s_{j_1}}}. For k ≠ l, we define Ω^{k,l}_{i_1 j_1, i_2 j_2} := {s ∈ Ω_{i_1 j_1, i_2 j_2} : s_{j_1} = k, s_{j_2} = l}. Then we can rewrite Ω_{i_1 j_1, i_2 j_2} as

    Ω_{i_1 j_1, i_2 j_2} = ⋃_{k≠l} Ω^{k,l}_{i_1 j_1, i_2 j_2}.  (A15)

Since the sets Ω^{k,l}_{i_1 j_1, i_2 j_2} are disjoint,

    #Ω_{i_1 j_1, i_2 j_2} = ∑_{k≠l} #Ω^{k,l}_{i_1 j_1, i_2 j_2}.  (A16)

Note that for all k ∈ Z_N and l ∈ Z_N \ {k},

    Ω^{k,l}_{i_1 j_1, i_2 j_2} = {s ∈ Ω : s_{j_1} = k, s_{j_2} = l, s_{i_1} ∈ 𝓑_k \ (𝓑_{kl} ∪ {l}), s_{i_2} ∈ 𝓑_l \ {k}}
                                ∪ {s ∈ Ω : s_{j_1} = k, s_{j_2} = l, s_{i_1} ∈ 𝓑_{kl}, s_{i_2} ∈ 𝓑_l \ {s_{i_1}, k}},

and the two sets on the right-hand side of the above equation are disjoint. Thus, writing (by abuse of notation) Z_{kl} := 1 if ρ(y_k, y_l) ≤ r and Z_{kl} := 0 otherwise for k, l ∈ Z_N, it holds that for all k ∈ Z_N and l ∈ Z_N \ {k},

    #Ω^{k,l}_{i_1 j_1, i_2 j_2} = [ (B_k − B_{kl} − Z_{kl})(B_l − Z_{kl}) + B_{kl}(B_l − 1 − Z_{kl}) ] (N−4)!/(N−N_0)!.  (A17)

Substituting (A17) into (A16) leads to

    #Ω_{i_1 j_1, i_2 j_2} = ((N−4)!/(N−N_0)!) ∑_{k≠l} [ (B_k − B_{kl} − Z_{kl})(B_l − Z_{kl}) + B_{kl}(B_l − 1 − Z_{kl}) ].  (A18)

By direct computation, noting that Z_{kl}² = Z_{kl}, the summand equals (B_k − Z_{kl})(B_l − Z_{kl}) − B_{kl} = B_k B_l − Z_{kl}(B_k + B_l) + Z_{kl} − B_{kl}. Note that ∑_{k∈Z_N\{l}} Z_{kl} = B_l and ∑_{l∈Z_N\{k}} B_l = 2B − B_k. We then have

    ∑_{k≠l} B_k B_l = 4B² − ∑_{l∈Z_N} B_l²,  ∑_{k≠l} Z_{kl}(B_k + B_l) = 2 ∑_{l∈Z_N} B_l²,  ∑_{k≠l} Z_{kl} = 2B.

Substituting (A5) and the above equations into (A18), we obtain

    #Ω_{i_1 j_1, i_2 j_2} = ((N−4)!/(N−N_0)!) (4B² + 4B − 4 ∑_{l∈Z_N} B_l²).  (A19)

By noting that #Ω = N!/(N−N_0)! and that all elements of Ω are equally probable, dividing (A19) by #Ω yields (A13).
We now turn to proving (A14). Let j ∈ Z_{N_0} and i, i′ ∈ Z_{N_0}\{j} with i ≠ i′. Note that for all l ∈ Z_N,

    P(Z_{ij} = 1, Z_{i′j} = 1 | s_j = l) = B_l(B_l − 1) / ((N−1)(N−2)),

since, given s_j = l, the pair (s_i, s_{i′}) is a uniformly random ordered pair of distinct elements of Z_N \ {l}, and both components belong to 𝓑_l in B_l(B_l − 1) of the (N−1)(N−2) equally likely cases. Thus, it holds that

    P(Z_{ij} = 1, Z_{i′j} = 1) = (1/N) ∑_{l∈Z_N} B_l(B_l − 1) / ((N−1)(N−2)).  (A20)

Since ∑_{l∈Z_N} B_l = 2B, from (A20) we obtain (A14). □
With the help of Lemma A3, we can calculate E[B̃_{j_1} B̃_{j_2}] in the following lemma.
Lemma A4. If N_0 ∈ Z_N with N_0 > 3, then for all j_1, j_2 ∈ Z_{N_0} with j_1 ≠ j_2,

    E[B̃_{j_1} B̃_{j_2}] = (N_0−2)(N_0−3) (4B² + 4B − 4∑_{l∈Z_N} B_l²)/(N(N−1)(N−2)(N−3)) + 3(N_0−2)(∑_{l∈Z_N} B_l² − 2B)/(N(N−1)(N−2)) + 2B/(N(N−1)),  (A21)

and for all j ∈ Z_{N_0},

    E[B̃_j²] = (N_0−1)(N_0−2)(∑_{l∈Z_N} B_l² − 2B)/(N(N−1)(N−2)) + 2B(N_0−1)/(N(N−1)).  (A22)

Proof. We first prove (A21). Let j_1, j_2 ∈ Z_{N_0} with j_1 ≠ j_2. From the decomposition B̃_j = ∑_{i∈Z_{N_0}\{j}} Z_{ij}, we obtain

    E[B̃_{j_1} B̃_{j_2}] = ∑_{i∈Z_{N_0}\{j_1}} ∑_{i′∈Z_{N_0}\{j_2}} E[Z_{i j_1} Z_{i′ j_2}].

We further rewrite the right-hand side of the above equation by splitting the index pairs (i, i′) into five groups:

    E[B̃_{j_1} B̃_{j_2}] = ∑_{i,i′∉{j_1,j_2}, i≠i′} E[Z_{i j_1} Z_{i′ j_2}] + ∑_{i′∉{j_1,j_2}} E[Z_{j_2 j_1} Z_{i′ j_2}] + ∑_{i∉{j_1,j_2}} E[Z_{i j_1} Z_{j_1 j_2}] + ∑_{i∉{j_1,j_2}} E[Z_{i j_1} Z_{i j_2}] + E[Z_{j_2 j_1} Z_{j_1 j_2}].  (A23)

We next compute the terms on the right-hand side of (A23) one by one. In the first term the four indices i, j_1, i′, j_2 are distinct, so from Equation (A13) of Lemma A3 it equals

    (N_0−2)(N_0−3) (4B² + 4B − 4∑_{l∈Z_N} B_l²)/(N(N−1)(N−2)(N−3)).  (A24)

Likewise, by noting that Z_{ij} = Z_{ji}, Equation (A14) of Lemma A3 shows that the second, third, and fourth terms on the right-hand side of (A23) each equal

    (N_0−2)(∑_{l∈Z_N} B_l² − 2B)/(N(N−1)(N−2)).  (A25)

Note that for all i, j ∈ Z_{N_0} with i ≠ j, it holds that Z_{ij} = Z_{ji} and Z_{ij}² = Z_{ij}. Thus, the last term on the right-hand side of (A23) satisfies

    E[Z_{j_2 j_1} Z_{j_1 j_2}] = E[Z_{j_1 j_2}] = 2B/(N(N−1)).  (A26)

Substituting (A24), (A25), and (A26) into (A23) leads to (A21). It remains to prove (A22). Since for all j ∈ Z_{N_0}, B̃_j = ∑_{i∈Z_{N_0}\{j}} Z_{ij} = ∑_{i∈Z_{N_0}\{j}} Z_{ij}², we have

    E[B̃_j²] = ∑_{i∈Z_{N_0}\{j}} E[Z_{ij}] + ∑_{i,i′∈Z_{N_0}\{j}, i≠i′} E[Z_{ij} Z_{i′j}].  (A27)

Note that for all j ∈ Z_{N_0}, ∑_{i∈Z_{N_0}\{j}} E[Z_{ij}] = E[B̃_j]. Thus, substituting (A2) and (A14) into (A27), we obtain (A22). □
Now we are ready to discuss the variance of B̃/(N_0(N_0 − 1)). The proof of Theorem 4 is as follows.

Proof. To prove this theorem, we compute E[B̃²]. Noting that B̃ = (1/2) ∑_{j∈Z_{N_0}} B̃_j, we have

    E[B̃²] = (1/4) ( ∑_{j_1,j_2∈Z_{N_0}, j_1≠j_2} E[B̃_{j_1} B̃_{j_2}] + ∑_{j∈Z_{N_0}} E[B̃_j²] ).  (A28)

Substituting (A21) and (A22) into (A28) and conducting some computation leads to (A29); from (A29) and the definition of C_{N_0} in (5), we obtain (4). The estimate of C_{N_0} can then be checked directly. □

To analyze the almost sure convergence rate of {−log(B̃_k/Ã_k) : k ∈ N}, we require Theorem 2 of [18], which is recalled as follows.
Combining Theorems 3, 4, and A2 leads to the almost sure convergence of B̃/(N_0(N_0 − 1)).

Lemma A5. Let β > 1 and N_0 ∈ Z_N with N_0 > 3. Then there are constants D_β and D̃_β (depending only on β) such that for all 1 ≥ ε > 0 and N_1 > n_{ε,β}, the two tail probabilities P(sup ⋯) and P(sup ⋯) satisfy the stated bounds, where n_{ε,β} is defined by (6).

We next consider the almost sure convergence rate of {−log(B̃_k/Ã_k) : k ∈ N}. To this end, we introduce the following lemma.

Lemma A6. Let N_0 ∈ Z_N with N_0 > 3. If A > 0 and B > 0, then for all N_1 ∈ N and 1 > ε > 0, the stated deviation inequality holds.
Combining Lemmas A5 and A6, we obtain the almost sure convergence rate of {−log(B̃_k/Ã_k) : k ∈ N} in Theorem 5. The proof of Theorem 5 is as follows.
Proof. Note that for all N_1 ∈ N, ⋯. Thus, we know that for all N_1 ∈ N and 1 > ε > 0, if ⋯. Let F_1 be the set of events satisfying (A40), F_2 the set of events satisfying (A41), and F_3 the set of events satisfying (A42). Then, from the above inequalities, we have F_1 ⊂ F_2 ∪ F_3, and hence P(F_1) ≤ P(F_2) + P(F_3) (see Theorems 1.5.4 and 1.5.7 in [29]). Substituting (A34) and (A33) into this inequality, from Lemma A5 and the definitions of γ_A and γ_B, we obtain the desired result (7). □
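Although the displayed events (A40)–(A42) are not reproduced here, the inclusion F_1 ⊂ F_2 ∪ F_3 is the standard split of a log-ratio deviation into two one-sided deviations. One plausible form of the step, written in our own notation for the averaged estimates Ā and B̄:

```latex
-\log\frac{\bar B}{\bar A} + \log\frac{B}{A}
  = \log\frac{\bar A}{A} - \log\frac{\bar B}{B},
\qquad\text{hence}\qquad
\Bigl| -\log\tfrac{\bar B}{\bar A} + \log\tfrac{B}{A} \Bigr| \ge \epsilon
\;\Longrightarrow\;
\Bigl| \log\tfrac{\bar A}{A} \Bigr| \ge \tfrac{\epsilon}{2}
\;\text{ or }\;
\Bigl| \log\tfrac{\bar B}{B} \Bigr| \ge \tfrac{\epsilon}{2}.
```

By subadditivity, the probability of the event on the left is then at most the sum of the probabilities of the two events on the right, which are the quantities bounded by Lemma A5.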