Centered and Averaged Fuzzy Entropy to Improve Fuzzy Entropy Precision

Several entropy measures are now widely used to analyze real-world time series. Among them, we can cite approximate entropy, sample entropy and fuzzy entropy (FuzzyEn), the latter one being probably the most efficient among the three. However, FuzzyEn precision depends on the number of samples in the data under study. The longer the signal, the better it is. Nevertheless, long signals are often difficult to obtain in real applications. This is why we herein propose a new FuzzyEn that presents better precision than the standard FuzzyEn. This is performed by increasing the number of samples used in the computation of the entropy measure, without changing the length of the time series. Thus, for the comparisons of the patterns, the mean value is no longer a constraint. Moreover, translated patterns are not the only ones considered: reflected, inversed, and glide-reflected patterns are also taken into account. The new measure (so-called centered and averaged FuzzyEn) is applied to synthetic and biomedical signals. The results show that the centered and averaged FuzzyEn leads to more precise results than the standard FuzzyEn: the relative percentile range is reduced compared to the standard sample entropy and fuzzy entropy measures. The centered and averaged FuzzyEn could now be used in other applications to compare its performances to those of other already-existing entropy measures.


Introduction
Approximate entropy (ApEn) and sample entropy (SampEn) algorithms are now widely used to quantify the irregularity of experimental time series [1,2]. They both rely on the evaluation of vectors' similarity. However, in both ApEn and SampEn, the vectors' similarity is based on the Heaviside function, a function that has rigid boundaries. Thus, the contributions of samples inside the boundary are treated equally, but the samples outside the boundary are left out. However, in the real world, boundaries between classes may be ambiguous: it is often difficult to determine if an input pattern belongs totally to a class. To overcome this lack of reality in ApEn and SampEn algorithms, Chen et al. proposed the fuzzy entropy (FuzzyEn) algorithm [3]. In the latter case, the vectors' similarity is defined by the soft and continuous boundaries of a fuzzy function. Since its introduction, it has been reported that FuzzyEn leads to better performance than ApEn or SampEn [4][5][6]. FuzzyEn presents a stronger relative consistency and shows less dependence on data length than ApEn and SampEn [3].
Nevertheless, the number of samples in a signal still plays a role in the precision of FuzzyEn: the shorter the signal, the lower the number of vectors, and thus, the lower the precision of FuzzyEn (i.e., the larger the standard deviation). Therefore, longer signals yield more precise entropy values. In practical situations (real data), this may be a challenge. Indeed, it is often difficult to obtain long recordings, particularly in the biomedical field where patients may have difficulty staying still or cooperating. This is why we herein propose a new fuzzy entropy measure that presents better precision than the traditional FuzzyEn measure. This is performed by increasing the number of samples used in the computation, without changing the length of the time series.
The paper is organized as follows. The original algorithm of FuzzyEn is first detailed in Section 2; then the new entropy measure is described. The synthetic and biomedical data (fetal heart rate time series) used in our work are introduced in Section 3. In Section 4, we first present, analyze, and discuss the results obtained with the synthetic data. We then describe and interpret the results obtained with the biomedical time series. We finally end with the conclusion.

Standard Fuzzy Entropy and the New Entropy Measure
In this section, we recall the FuzzyEn concept based on the use of a membership function. For this purpose, the generalized Gaussian membership function is used since it allows the derivation of both the rectangular function used in the calculation of SampEn and the standard Gaussian function used in the calculation of FuzzyEn.

Fuzzy Entropy Algorithm
For a given discrete time series $X = \{x(1), x(2), \ldots, x(N)\}$ of length N, the algorithm to compute FuzzyEn relies on the following steps [1]. For each vector $X_m(i)$, compute the similarity degree $D^m_{ij}$ of its neighboring vector $X_m(j)$ using a similarity function as:

$$D^m_{ij} = \mu_p\left(d[X_m(i), X_m(j)]\right),$$

where the membership function $\mu_p$, reported in Figure 1 and given in Equation (2), is defined $\forall d \geq 0$, and where the distance function d is the maximum absolute difference:

$$d[X_m(i), X_m(j)] = \max_{0 \leq k \leq m-1} |x(i+k) - x(j+k)|.$$

For p = 2, we have the Gaussian function, and for p = ∞, we have the rectangular function.
As for ApEn and SampEn, the statistical stability of the FuzzyEn estimation depends on the length N of the time series as reported in Equation (7). To decrease this length-dependency, several strategies can be proposed.
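As an illustration, the computation described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the reference implementation: the generalized Gaussian membership function is assumed to take the form $\mu_p(d) = \exp(-(d/r)^p)$, and FuzzyEn is assumed to be obtained as $\ln \varphi^m - \ln \varphi^{m+1}$, where $\varphi^m$ is the mean similarity degree over all pairs of distinct m-patterns (the usual sample-entropy-like construction). The function names are hypothetical.

```python
import numpy as np

def membership(d, r, p=2.0):
    """Assumed generalized Gaussian membership function exp(-(d/r)**p)."""
    d = np.asarray(d, dtype=float)
    if np.isinf(p):
        # Rectangular limit: 1 inside the tolerance r, 0 outside.
        return (d <= r).astype(float)
    return np.exp(-(d / r) ** p)

def mean_similarity(x, dim, n_vec, r, p):
    """Mean similarity degree over all ordered pairs (i != j) of patterns."""
    pats = np.array([x[i:i + dim] for i in range(n_vec)])
    total = 0.0
    for i in range(n_vec):
        # Maximum absolute difference between pattern i and every pattern.
        d = np.max(np.abs(pats - pats[i]), axis=1)
        sim = membership(d, r, p)
        total += sim.sum() - sim[i]  # drop the self-match (d = 0)
    return total / (n_vec * (n_vec - 1))

def fuzzy_en(x, m=2, r_factor=0.1, p=2.0):
    """FuzzyEn sketch: ln(phi^m) - ln(phi^(m+1)), with r = r_factor * std."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n_vec = len(x) - m  # same number of patterns for m and m + 1
    phi_m = mean_similarity(x, m, n_vec, r, p)
    phi_m1 = mean_similarity(x, m + 1, n_vec, r, p)
    return float(np.log(phi_m) - np.log(phi_m1))
```

With these assumptions, an irregular signal such as white noise is expected to give a larger value than a regular one such as a sine wave, and setting `p=float("inf")` switches to the rectangular (SampEn-like) similarity.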

New Approaches
As mentioned above, for a fixed number of samples N in the time series, a way to improve the statistical stability of the entropy measurement consists in artificially increasing the number of similar m-patterns taken into account in the entropy calculation. To do so, three different approaches are proposed.
The first approach is inspired by [3,7], in which the interest of centering each m-pattern has been shown. In this case, instead of limiting the search to m-patterns with the same mean value, any pattern can be taken into account, so the number of similar patterns drastically increases. Therefore, in the first approach, a centered m-pattern $Xc_m(j)$ is compared to a reference centered m-pattern $Xc_m(i)$, where $Xc_m(i) = \{x(i), x(i+1), \ldots, x(i+m-1)\} - \bar{x}_m(i)$ and $\bar{x}_m(i) = \frac{1}{m}\sum_{k=0}^{m-1} x(i+k)$ is the mean value of the pattern. The similarity degree is calculated through a similarity function:

$$Dc^m_{ij} = \mu_p\left(d[Xc_m(i), Xc_m(j)]\right),$$

with the same membership function as the one reported in Equation (2). The centered fuzzy entropy $\mathrm{FuzzyEn}_c$ is thus defined as the FuzzyEn obtained when these centered similarity degrees replace the standard ones.

The second approach is inspired by [8], where transformed patterns are compared to reference patterns. Thus, in the second approach, a transformed m-pattern $\Gamma_k[X_m(j)]$ is compared to a reference m-pattern $X_m(i)$. The similarity degree is calculated with the same membership function as the one reported in Equation (2):

$$D^m_{k,ij} = \mu_p\left(d[X_m(i), \Gamma_k[X_m(j)]]\right).$$

Four types of $\Gamma_k[X_m(j)]$ operations with $k \in \{T, R, I, G\}$ are evaluated: a translation (k = T); a reflection at the position n (k = R); an inversion at the position n (k = I); and a glide reflection (k = G).

At first sight, any type of operation could be used. However, from our point of view, only isometries (translation T, reflection R, inversion I and glide reflection G) are suitable. This statement is supported by the recent work reported in [8], where the concept of symmetry was brought back to the fore in the study of time series. Indeed, in [8], it was shown that the concept of recurrences could be generalized by taking into account the symmetry properties of m-patterns.
As entropy can be derived from the recurrence concept (see the recurrence plot [9] as defined in [8]), four new kinds of entropy ($\mathrm{ApEn}_T$, $\mathrm{ApEn}_R$, $\mathrm{ApEn}_I$, $\mathrm{ApEn}_G$, or $\mathrm{SampEn}_T$, $\mathrm{SampEn}_R$, $\mathrm{SampEn}_I$, $\mathrm{SampEn}_G$, or $\mathrm{FuzzyEn}_T$, $\mathrm{FuzzyEn}_R$, $\mathrm{FuzzyEn}_I$, $\mathrm{FuzzyEn}_G$) can be proposed. Finally, as our ultimate goal is to increase the precision of FuzzyEn, it is more appropriate here to calculate the mean value of the four new fuzzy entropies. In this case, the averaged fuzzy entropy $\mathrm{FuzzyEn}_a$ is defined as:

$$\mathrm{FuzzyEn}_a = \frac{1}{4} \sum_{k \in \{T, R, I, G\}} \mathrm{FuzzyEn}_k,$$

with $\mathrm{FuzzyEn}_k$ the fuzzy entropy obtained when the reference m-pattern $X_m(i)$ is compared to the transformed m-pattern $\Gamma_k[X_m(j)]$.

The last approach compares a centered m-pattern $Xc_m(i)$ to a transformed centered m-pattern $\Gamma_k[Xc_m(j)]$. In this case, the centered and averaged fuzzy entropy $\mathrm{FuzzyEn}_{ca}$ is defined as:

$$\mathrm{FuzzyEn}_{ca} = \frac{1}{4} \sum_{k \in \{T, R, I, G\}} \mathrm{FuzzyEn}_{c,k},$$

with $\mathrm{FuzzyEn}_{c,k}$ the fuzzy entropy obtained when $Xc_m(i)$ is compared to $\Gamma_k[Xc_m(j)]$.

The novelty of our method therefore relies on two main points: (i) the mean value of the patterns is no longer a constraint in the computation as the patterns are centered; (ii) not only translated patterns, but also reflected, inversed, and glide-reflected patterns are taken into account (in the standard sample and fuzzy entropy measures, only translated patterns are considered). Therefore, for a given number of samples N in the time series, we managed to increase the number of similar m-patterns taken into account in the entropy calculation. In what follows, the new entropy measure will be applied to synthetic $1/f^{\beta}$ time series and biomedical datasets. Its precision will be compared to that of the standard FuzzyEn.
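To make the centering and averaging steps concrete, a minimal NumPy sketch of a centered and averaged fuzzy entropy is given below. Several choices are assumptions made for illustration only: the membership function is taken as $\exp(-(d/r)^p)$; each entropy is computed as $\ln \varphi^m - \ln \varphi^{m+1}$; and the four operations are applied to each centered pattern vector as identity (T), time reversal (R), time reversal with sign change (I), and sign change (G). The exact definitions used in [8] may differ, and all names are hypothetical.

```python
import numpy as np

# Assumed pattern-wise operations on an array of centered patterns (one per row).
TRANSFORMS = {
    "T": lambda v: v,            # translation: pattern compared as is
    "R": lambda v: v[:, ::-1],   # reflection: time reversal
    "I": lambda v: -v[:, ::-1],  # inversion: time reversal and sign change
    "G": lambda v: -v,           # glide reflection: sign change
}

def centered_patterns(x, dim, n_vec):
    """Patterns of length dim with their own mean value removed."""
    pats = np.array([x[i:i + dim] for i in range(n_vec)])
    return pats - pats.mean(axis=1, keepdims=True)

def mean_similarity(ref, cand, r, p):
    """Mean similarity between reference i and candidate j, for i != j."""
    n = len(ref)
    total = 0.0
    for i in range(n):
        d = np.max(np.abs(cand - ref[i]), axis=1)
        sim = np.exp(-(d / r) ** p)
        total += sim.sum() - sim[i]  # simplification: always skip j == i
    return total / (n * (n - 1))

def fuzzy_en_ca(x, m=2, r_factor=0.1, p=2.0):
    """Mean of the four centered entropies (one per assumed operation)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n_vec = len(x) - m
    values = []
    for transform in TRANSFORMS.values():
        phis = []
        for dim in (m, m + 1):
            pats = centered_patterns(x, dim, n_vec)
            phis.append(mean_similarity(pats, transform(pats), r, p))
        values.append(np.log(phis[0]) - np.log(phis[1]))
    return float(np.mean(values))
```

Because every reference pattern is compared against four versions of every other pattern, and because centering removes the mean-value constraint, many more pairs contribute non-negligible similarity than in the standard computation, which is the mechanism the text relies on.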

Synthetic Signals
In order to analyze the new fuzzy entropy measures and to compare their performance with that of the standard FuzzyEn, we used $1/f^{\beta}$ time series with different β values: β varied from −1 to 2 in steps of 0.2. For β > 0, the $1/f^{\beta}$ signals are persistent processes with long-term correlations [10], whereas for β < 0, they are anti-persistent processes with short-term anti-correlations [10]. From a theoretical point of view, the higher the value of β, the stronger the correlations in the time series and, therefore, the larger the number of similar samples used in the computation of FuzzyEn. For each β value, 50 time series were simulated.
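The text does not state how the $1/f^{\beta}$ series were generated. One common option, sketched below as an assumption, is to shape the spectrum of white Gaussian noise so that its power follows $1/f^{\beta}$. The function name `power_law_noise` and the normalization to zero mean and unit variance are illustrative choices.

```python
import numpy as np

def power_law_noise(n, beta, rng=None):
    """Gaussian noise whose power spectrum is shaped to follow 1/f**beta."""
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)  # the DC component is dropped
    # Amplitude scales as f**(-beta/2), so power scales as f**(-beta).
    scale[1:] = freqs[1:] ** (-beta / 2.0)
    shaped = np.fft.irfft(spectrum * scale, n=n)
    return (shaped - shaped.mean()) / shaped.std()

# beta from -1 to 2 in steps of 0.2, i.e., 16 values.
betas = np.round(np.linspace(-1.0, 2.0, 16), 1)
```

Under this construction, β = 0 returns (normalized) white noise, positive β concentrates power at low frequencies, and negative β concentrates it at high frequencies.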

Biomedical Data
The new descriptors mentioned above were also applied to biomedical data and more precisely to fetal heart rate (FHR) time series. The latter were acquired using a homemade pulse Doppler system co-developed with Altaïs Technologies (Tours, France). This Doppler fetal monitor transmits ultrasound waves of 2.25 MHz for an acoustic power limited to 1 mW/cm 2 (for more details, see [11]). It was developed to measure both the FHR and fetal movements (pseudo-breathing, limb movements).
The study was approved by the Ethics Committee of the Clinical Investigation Centre for Innovative Technology of Tours (CIC-IT 806 CHRU of Tours). Before acquisition, the consent of each parent was obtained. All parents were over eighteen years of age, and pregnancies were single. After locating the fetal heart with an echographic scanner, 18 Doppler recordings of 30 min each were acquired at CHRU Bretonneau Tours, France. This corresponds to approximately 3600 heart beats for each recording. In order to constitute homogeneous groups without spurious data, gestations complicated by other kinds of disorders (hypertension, diabetes) were discarded. Two groups of fetuses were selected: normal and those with severe intra-uterine growth retardation (IUGR). The severe IUGR group included nine fetuses delivered prematurely by cesarean section. The normal group included nine fetuses without disorders, delivered at term by spontaneous labor. For this clinical protocol, the gestational ages of fetuses ranged from 30 to 34 weeks.
In what follows, the full 30 min of data were processed, as well as segments of 10 min and 20 min. Our goal was thus to compare the results obtained as the data length decreases. Moreover, in order to compare the results obtained between normal and IUGR groups, a Mann-Whitney test was used. A p-value strictly less than 0.05 was considered statistically significant.
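For two groups of nine values each, a two-sided Mann-Whitney p-value can be obtained exactly by enumerating all ways of splitting the pooled ranks (48,620 splits for nine versus nine). The sketch below uses only the Python standard library; whether the exact or the asymptotic form of the test was used here is not stated, so the exact form is an assumption, and the function names are hypothetical.

```python
from itertools import combinations

def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def mann_whitney_exact(a, b):
    """Return (U statistic of group a, exact two-sided p-value)."""
    n1, n2 = len(a), len(b)
    ranks = midranks(list(a) + list(b))
    center = n1 * (n1 + n2 + 1) / 2.0  # expected rank sum of a under H0
    rank_sum_a = sum(ranks[:n1])
    observed = abs(rank_sum_a - center)
    total = extreme = 0
    # Enumerate every possible assignment of n1 pooled ranks to group a.
    for idx in combinations(range(n1 + n2), n1):
        total += 1
        if abs(sum(ranks[i] for i in idx) - center) >= observed - 1e-12:
            extreme += 1
    u_stat = rank_sum_a - n1 * (n1 + 1) / 2.0
    return u_stat, extreme / total
```

For larger groups, full enumeration becomes impractical, and a library routine such as SciPy's `scipy.stats.mannwhitneyu` would be the more natural choice.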

Results and Discussion
In all that follows, the value of r is set at 0.1 × the standard deviation of the time series.

Results for the Synthetic Signals
In order to validate our hypothesis (that is, the greater the number of similar m-patterns taken into account in the computation, the more precise the entropy measure), we started by counting the number of similar m-patterns from 50 synthetic time series.
From $1/f^{\beta}$ noises generated with N = 5000 samples and with β ranging from −1 to 2, the median of the mean number MN of similar 3-patterns and the median of the mean number $MN_{ca}$ of centered and averaged similar 3-patterns were evaluated and are reported in Table 1. As expected, the higher the value of β, the higher the sample correlation in the time series and the higher the number of similar 3-patterns. Indeed, from Table 1, when β increases from 0 to 2, MN goes from 1 to 162. When symmetrical properties and the centering operation are taken into account, $MN_{ca}$ goes from 21 to 9278 for β ranging from 0 to 2. From this, it can be claimed that the averaging and the centering operations increase the number of similar patterns. Furthermore, whatever the m-value, we obtain rising trends as β increases (data not shown).
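The effect of centering on the number of similar patterns can be illustrated with a short NumPy sketch. It assumes that two m-patterns are counted as similar when their maximum absolute difference does not exceed r (the rectangular criterion), and that the mean number is the average, over reference patterns, of the number of other patterns similar to it; the exact counting convention behind Table 1 is not given in the text, and the function name is hypothetical.

```python
import numpy as np

def mean_similar_patterns(x, m=3, r_factor=0.1, center=False):
    """Average number of other m-patterns lying within r of each pattern."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    pats = np.array([x[i:i + m] for i in range(len(x) - m + 1)])
    if center:
        # Remove each pattern's own mean value before comparing.
        pats = pats - pats.mean(axis=1, keepdims=True)
    counts = []
    for i in range(len(pats)):
        d = np.max(np.abs(pats - pats[i]), axis=1)
        counts.append(np.count_nonzero(d <= r) - 1)  # exclude the self-match
    return float(np.mean(counts))
```

On a linear ramp, for instance, non-centered patterns only match their immediate neighbors, whereas all centered patterns coincide, so every pattern matches every other one.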
In order to evaluate the performance of our new approaches, for a fixed m-value and for 50 $1/f^{\beta}$ time series per β value, the entropy measures were computed. To quantitatively evaluate the gain brought by our new approaches in comparison with FuzzyEn, two kinds of statistics have been evaluated: percentile ranges and relative percentile ranges, the latter being derived from the former. The global results are presented in Tables A1-A3 reported in the Appendix and are shown in Figure 4. We observe from the tables that SampEn leads to worse results than FuzzyEn, as already shown by others. Moreover, we observe that the new approach leads to a reduced percentile range compared to the standard fuzzy entropy measure. Its precision is therefore better than that of the other entropy measures. However, our approach also has a limitation: the gain provided by the method depends on the signal properties and differs with the β value.
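The exact percentiles entering these statistics are not recoverable from the text. As a purely illustrative sketch, the code below assumes a percentile range defined as the difference between an upper and a lower percentile of the 50 entropy values obtained for a given β (25th and 75th by default), and a relative percentile range defined as that difference divided by the median. Both definitions and both function names are hypothetical.

```python
import numpy as np

def percentile_range(values, low=25.0, high=75.0):
    """Difference between the high and low percentiles of the values."""
    lo_val, hi_val = np.percentile(values, [low, high])
    return float(hi_val - lo_val)

def relative_percentile_range(values, low=25.0, high=75.0):
    """Percentile range normalized by the median (assumed definition)."""
    return percentile_range(values, low, high) / float(np.median(values))
```

A smaller relative percentile range for one entropy measure than for another, computed over the same set of realizations, would indicate a more precise estimate in the sense used here.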

Results for the Fetal Heart Rate Time Series
The results obtained from FHR time series for m = 2 are presented in Figure 5 for data lengths of 10 min, 20 min, and 30 min. For the three data lengths, we observe that the normal fetuses show a significantly higher entropy value than the pathological fetuses. This is true for the two entropy measures: $\mathrm{FuzzyEn}_{ca}$ and the standard FuzzyEn. This means that FHR time series are more irregular for the normal fetuses than for the pathological ones. We also observe that the p-value between the two groups decreases as the data length increases. Therefore, the longer the data, the better the separation between the two groups. Moreover, we note that, whatever the length studied, the p-value is lower for $\mathrm{FuzzyEn}_{ca}$ than for the standard FuzzyEn. Our new entropy measure is therefore better suited to this classification purpose than the standard FuzzyEn. Other data may now be processed; see, e.g., [12][13][14].

Conclusions
A new entropy measure, FuzzyEn ca , is proposed to improve the precision of the standard FuzzyEn. The new measure relies on centering and averaging approaches that lead to a larger number of similar patterns used in the computation of the entropy algorithm. This is performed by removing the constraint of the mean value in the comparison of the patterns. Moreover, translated patterns are not the only ones considered: reflected, inversed, and glide-reflected patterns are also taken into account. The results obtained on 1/ f β time series reveal that FuzzyEn ca shows a greater precision than FuzzyEn. Moreover, when applied to FHR time series acquired from normal and pathological fetuses, FuzzyEn ca leads to a better discrimination between the two groups than the standard FuzzyEn. These findings could allow one to obtain entropy-based relevant information by processing shorter datasets (we could obtain the same precision as the standard FuzzyEn, but with less data). This is particularly interesting for the biomedical field. FuzzyEn ca now has to be applied to other datasets, and its performance has to be compared to those of other already-existing entropy measures.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Results reported in Tables A1-A3 show performances that differ with the β values. We observe that the higher the β value, the lower the gain obtained in terms of relative percentile range. This is probably due to the level of correlation between samples in the time series.

Table A3. Same as Table A1, but for m = 4. "-" means that an undefined value is obtained due to the absence of similar m-patterns in the time series.