Entropy and Compression Capture Different Complexity Features: The Case of Fetal Heart Rate

Entropy and compression have been used to distinguish fetuses at risk of hypoxia from their healthy counterparts through the analysis of Fetal Heart Rate (FHR). The low correlation observed between these two approaches suggests that they capture different complexity features. This study aims at characterizing the complexity of FHR features captured by entropy and compression, using international guidelines as reference. Single and multi-scale approaches were considered in the computation of entropy and compression. The following physiologic-based features were considered: FHR baseline; percentage of abnormal long (%abLTV) and short (%abSTV) term variability; average short term variability; and number of accelerations and decelerations. All of the features were computed on a set of 68 intrapartum FHR tracings from fetuses born normal, mildly acidemic, or moderately-to-severely acidemic. The correlation between entropy/compression features and the physiologic-based features was assessed. Compression was correlated with accelerations and decelerations, but neither accelerations nor decelerations were significantly correlated with entropies. The %abSTV was significantly correlated with entropies (ranging between −0.54 and −0.62), and to a higher extent with compression (ranging between −0.80 and −0.94). Distinction between groups was clearer in the lower scales using entropy and in the higher scales using compression. Entropy and compression are complementary complexity measures.


Introduction
In developed countries, clinical decisions during labor are strongly based on Fetal Heart Rate (FHR) monitoring [1,2], and cardiotocography is the tool that is routinely used for recording FHR and uterine contractions. FHR is generally assessed in beats per minute to evaluate fetal wellbeing, allowing an obstetrician to intervene and prevent potentially irreversible fetal brain damage or death. Despite the importance of FHR monitoring, poor reproducibility of visual analysis of cardiotocograms has been reported [1,3], and consequently computerized FHR analysis and new signal processing and pattern recognition techniques have been developed [4][5][6]. In this setting, complexity analysis of FHR recordings remains one of the most challenging tasks. Actually, FHR during labor seems to be part of a complex system, where, most of the time, individual agents behave in unpredictable ways, and whose actions are connected, inducing changes to one another [7]. In cases like this, a high degree of uncertainty is known to be present, leading to poor interrater agreement. As a result, uncritical adherence to conventional guidelines might become more harmful than beneficial [7] and other approaches may be more appropriate, such as nonlinear models and scan of patterns [8].
Complexity is a property of systems that quantifies the amount of structured information and may be assessed using both entropy and compression. Approximate entropy (ApEn) is a measure of complexity, introduced by Pincus, used to quantify the amount of regularity and the unpredictability of fluctuations in time series [9]. Later, Sample Entropy (SampEn) was presented by Richman and Moorman with the same goal as ApEn, to assess biological time series [10]. In the particular case of FHR analysis, ApEn and SampEn are the most used measures of complexity, and have been used in the detection of different pathologies. On the other hand, the Kolmogorov complexity of an object is the length of the shortest computer program that can output it. Although Kolmogorov complexity is a non-computable measure, compressors do a very good job of approximating it. This approach has led to positive results in very different subjects, such as literature [11], music [12], and computer virus and Internet traffic analysis [13]. Despite the successful application of compressors to FHR tracings in pathology detection, they have been used only to a limited extent in the analysis of biological signals to date [6,14].
Although both entropy and compression were able to distinguish fetuses at risk of hypoxia from their healthy counterparts through the analysis of the FHR signal, their low correlation suggests that these measures capture different features [15]. Henrique's work also suggested further research in order to study how physiological features are captured by entropy and compression.
The small computational time that is associated with both measures, particularly with compressors, is convenient if their inclusion in existing FHR monitoring systems is justified. Hopefully, the information on fetal complexity obtained from the FHR signal may provide important auxiliary information to clinicians in fetal assessment, supporting clinical decisions. However, as entropy and compression seem to capture different features, it is important to study which features are captured by these measures. Following the suggestion of previous work [15], this study aims at characterizing the FHR complexity features that are captured by entropy and by compression, having international clinical guidelines as reference, and at exploring the multiscale approach for entropy and compression.

Materials and Methods
Sixty-eight intrapartum FHR tracings, consecutively selected from a pre-existing database of term singleton gestations, with at most 60 min of tracing, were analyzed according to FIGO (The International Federation of Gynecology and Obstetrics) guidelines [16] using Omniview-SisPorto, version 4.0.9 [17]. The following FHR features, from the last 60 min of tracings, were registered: FHR baseline, which is the mean level of the most horizontal and least oscillatory FHR segments; percentage of abnormal short term variability (%abSTV), i.e., subsequent FHR signals differing by < 1 bpm; percentage of abnormal long term variability (%abLTV), i.e., FHR signals with a difference between minimum and maximum values in the surrounding 1-min window of < 5 bpm; mean short term variability (mean STV); number of accelerations (Acc) per minute, i.e., abrupt increases in FHR above the baseline, of more than 15 bpm in amplitude, and lasting more than 15 s but less than 10 min; and number of decelerations (Dec) per minute, i.e., decreases in the FHR below the baseline, of more than 15 bpm in amplitude, and lasting more than 15 s.
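As an illustration of the variability definitions above, the following minimal Python sketch computes %abSTV, mean STV, and %abLTV from a list of FHR samples. It is not the SisPorto implementation: the function names, the 2 Hz sampling rate, the use of the absolute difference between consecutive samples as STV, and the centered 1-min window are all assumptions made for the example.

```python
from typing import Sequence

FS_HZ = 2  # assumed sampling rate (samples per second)


def successive_differences(fhr: Sequence[float]) -> list:
    """Absolute differences between consecutive FHR samples, in bpm."""
    if len(fhr) < 2:
        raise ValueError("need at least two samples")
    return [abs(b - a) for a, b in zip(fhr, fhr[1:])]


def pct_abnormal_stv(fhr: Sequence[float], threshold_bpm: float = 1.0) -> float:
    """Percentage of consecutive sample pairs differing by less than the threshold."""
    diffs = successive_differences(fhr)
    return 100.0 * sum(d < threshold_bpm for d in diffs) / len(diffs)


def mean_stv(fhr: Sequence[float]) -> float:
    """Mean absolute difference between consecutive samples (assumed STV definition)."""
    diffs = successive_differences(fhr)
    return sum(diffs) / len(diffs)


def pct_abnormal_ltv(
    fhr: Sequence[float],
    window_s: int = 60,
    threshold_bpm: float = 5.0,
    fs_hz: int = FS_HZ,
) -> float:
    """Percentage of samples whose surrounding window has max - min below the threshold.

    The window is centered on each sample and truncated at the signal edges
    (an assumption; other edge conventions are possible).
    """
    n = len(fhr)
    if n == 0:
        raise ValueError("empty signal")
    half = (window_s * fs_hz) // 2
    abnormal = 0
    for i in range(n):
        window = fhr[max(0, i - half):min(n, i + half + 1)]
        if max(window) - min(window) < threshold_bpm:
            abnormal += 1
    return 100.0 * abnormal / n
```

With these definitions, a perfectly flat signal yields 100% for both abnormal-variability percentages, whereas a signal oscillating by 2 bpm between consecutive samples yields 0% abnormal STV but still 100% abnormal LTV, since its range within any window stays below 5 bpm.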
Newborn umbilical artery blood (UAB) pH was used as an active measure of fetal oxygenation. A low UAB pH indicates the presence of acidemia occurring during labor and delivery, which carries a higher risk of perinatal death or neurological injury from hypoxia. Of the 68 cases, 48 were fetuses delivered with UAB pH in the normal range, pH ≥ 7.20 (N); 10 were mildly acidemic fetuses (MA), delivered with UAB pH between 7.10 and 7.20; and 10 were moderate-to-severe acidemic fetuses (MSA), with UAB pH ≤ 7.10.
All of the tracings were resampled at a frequency of 2 Hz after pre-processing, based on an algorithm described in previous studies [18]. A more detailed description of the data is presented elsewhere [5].
Spearman Correlation Coefficients were used to compare each complexity measure with different parameters. For entropy, two measures were used: Approximate Entropy (ApEn) and Sample Entropy (SampEn), each with three different tolerances (0.10, 0.15, and 0.2). For compression, six different compressors were used, namely brotli, bzip2, gzip, paq8l, ppmd, and lzma. For the first five, the lowest and highest levels of compression were tested; for lzma, only one level of compression was available. Compression was measured as the compression rate, that is, the compressed size of the trace divided by the original size of the same trace.
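The compression rate defined above can be sketched with the compressors available in the Python standard library. This is only an illustration under stated assumptions: the text serialization of the trace (one value per line, two decimals) is hypothetical, `zlib` implements DEFLATE, the algorithm underlying gzip, rather than the gzip tool itself, and brotli, paq8l, and ppmd are omitted because they are not part of the standard library. The two demo traces are synthetic, not study data.

```python
import bz2
import lzma
import random
import zlib


def serialize(trace):
    """Assumed encoding: one value per line, two decimals, ASCII text."""
    return "\n".join(f"{v:.2f}" for v in trace).encode("ascii")


def compression_rate(raw: bytes, compressed: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(compressed) / len(raw)


def compression_rates(trace):
    """Compression rate of a trace for several compressors and levels."""
    raw = serialize(trace)
    return {
        "deflate_1": compression_rate(raw, zlib.compress(raw, 1)),  # fastest
        "deflate_9": compression_rate(raw, zlib.compress(raw, 9)),  # strongest
        "bzip2_1": compression_rate(raw, bz2.compress(raw, 1)),
        "bzip2_9": compression_rate(raw, bz2.compress(raw, 9)),
        "lzma": compression_rate(raw, lzma.compress(raw)),  # default preset only
    }


# Synthetic demo traces (hypothetical data): a flat signal and a noisy one.
rng = random.Random(0)
flat_trace = [140.0] * 2000
noisy_trace = [120.0 + 40.0 * rng.random() for _ in range(2000)]

flat_rates = compression_rates(flat_trace)
noisy_rates = compression_rates(noisy_trace)
```

In this toy comparison the flat trace compresses to a small fraction of its original size, while the noisy trace yields a higher rate with every compressor, which is the sense in which a higher compression rate indicates a more complex signal.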
Regarding the physiological features that were captured by SisPorto, Spearman Correlation Coefficients between the percentage of abnormal short term variability (%abSTV), mean value of the STV, baseline, percentage of abnormal long term variability (%abLTV), number of Accelerations (Acc) per 10 min, and Decelerations (Dec) per 10 min were computed.
Spearman Correlation Coefficients were also computed between the physiological features and compressors, and between the physiological features and entropies.
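For reference, a Spearman correlation coefficient is the Pearson correlation of the ranks of the two variables, with tied values receiving the average of their ranks. The sketch below implements this definition in plain Python; the function names are illustrative, and the significance test reported in this study is not reproduced here.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average rank of their group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the computation, any strictly monotonic relationship gives a coefficient of 1 or −1 regardless of its shape, which suits features measured on very different scales such as compression rates and variability percentages.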
Multiscale analysis was also performed to study the behavior of entropy and compression, up to scale 20, in the three different groups (N, MA, and MSA). For entropy, a tolerance of 0.2 was used for both Approximate Entropy and Sample Entropy; for compression, paq8l, brotli, and gzip were used at the maximum level of compression.
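A multiscale analysis of this kind can be sketched as follows: the series is coarse-grained by averaging non-overlapping windows of increasing length, and the complexity measure is then computed at each scale. The sketch below uses Sample Entropy. Several choices are assumptions not stated in the text: the embedding dimension m = 2, the tolerance expressed as 0.2 times the standard deviation of the original series and kept fixed across scales, and the "less than or equal" matching rule. The quadratic-time loop is meant for clarity, not for full-length tracings.

```python
from math import log
from statistics import pstdev


def coarse_grain(series, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(series) // scale
    return [sum(series[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def sample_entropy(series, m, r):
    """SampEn = -ln(A / B), where B and A count template pairs (i < j) that
    match within tolerance r for lengths m and m + 1 (Chebyshev distance)."""
    n = len(series)
    if n <= m + 1:
        raise ValueError("series too short for this embedding dimension")

    def count_matches(length):
        # The same n - m starting points are used for both template lengths.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(series[i + k] - series[j + k]) <= r
                       for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined in theory; reported as infinity here
    return -log(a / b)


def multiscale_sample_entropy(series, max_scale=20, m=2, tolerance=0.2):
    """Sample Entropy of the coarse-grained series at scales 1..max_scale."""
    r = tolerance * pstdev(series)  # assumed: fixed from the original series
    return [sample_entropy(coarse_grain(series, s), m, r)
            for s in range(1, max_scale + 1)]
```

A perfectly periodic series gives a Sample Entropy of zero at every scale, since each match of length m extends to a match of length m + 1. For compression, the same `coarse_grain` step would simply be followed by a compression rate instead of `sample_entropy`.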

Results
The 68 tracings were acquired in singleton pregnancies and were 32 to 60 min long (mean = 55, standard deviation = 7) with gestational age, for groups MSA (mean = 39.8, standard deviation = 1.3), MA (39.4, 1.6), and N (39. 3,2). Median time between the end of the tracings and the delivery was 0.0 min for all of the groups. The mean (standard deviation) Apgar score at the first minute was 6.2 (2.7) in the MSA group, 8.5 (0.7) in the MA group, and 8.8 (0.7) in the N group.
Correlations between Approximate and Sample entropies with different tolerances were almost perfect, as the lowest Spearman Correlation Coefficient was 0.918. All were significant at the 0.01 level (2-tailed).
As for compression, six different compressors were applied, five of them in two versions: one at the lowest level of compression (faster) and another at the highest level of compression (slower). All of the Spearman Correlation Coefficients were over 0.819, and all correlations were significant at the 0.01 level (2-tailed).
Table 1 shows the correlations between different compressors (for distinct levels of compression) and entropies (with distinct tolerances). Paq8l_8 had the highest correlation with entropies, with Spearman's coefficients over 0.549 (p < 0.01). The lowest correlation coefficients were found between bzip2 and entropies, with values under 0.143, none of them statistically significant.
Table 2 describes the correlations between the physiological features captured by SisPorto, using the FIGO guidelines. The highest correlations were found between %abSTV and mean STV (r = −0.796), and between %abSTV and %abLTV (r = 0.706). Note that %abSTV was the only variable significantly correlated with all others.
Both entropy and compression captured complexity in the tracings, as seen in Tables 3 and 4, but compression was correlated with more physiologic features. There were significant correlations between compressors and the number of accelerations and decelerations per minute, but neither was correlated with entropies. Although the correlation between entropies and %abSTV was significant, its magnitude was only moderate when compared with that of compression (Tables 3 and 4).
A multiscale analysis of the tracings using either entropy or compression was also performed. Using Approximate Entropy with a tolerance of 0.2, it was possible to distinguish MSA tracings from the other two groups up to scale 7. With Sample Entropy (0.2), the distinction was shown up to scale 4 (Figures 1 and 2).
Regarding compression, the opposite happens. Using all of the compressors, it was possible to distinguish groups N and MA in lower scales. Actually, paq8l distinguished them in all scales used. In all compressors, as scale increases, group N starts to diverge from group MSA, with bzip2 and gzip getting statistically significant results for scales higher than 2 and 7, respectively (Figures 3-6).

Discussion
In this study, the different entropies and tolerances used in FHR analysis were highly correlated, as were the different compressors used. This suggests that different entropy and compressor estimators are internally valid, i.e., despite some discrepancies in results, they seem to absorb similar information from the tracings.
Despite the high correlation within similar complexity measures, the use of different values for certain parameters, such as the threshold r when using ApEn, might lead to different performances in the characterization of different fetal behavioral patterns or acute and chronic conditions [20]. Besides, different complexity measures seem to capture different information, since the correlation between entropies and compressors is low: bzip2 is essentially uncorrelated with entropies, ppmd and gzip show some non-significant correlations, and paq8l is moderately correlated. It is still not totally clear why this happens, but some characteristics of each compression algorithm may provide some answers. For example, in order to optimize the compression process, bzip2 performs a (reversible) block sort and, consequently, does not take advantage of the initial structure (related with entropy).
As to the main objective of this study, comparing the two different complexity measures, entropy and compression, with the FIGO international guidelines for fetal monitoring, we observed a good correlation between percentage of abnormal short term variability and entropy. However, compression seemed to be the measure that captured most of the information from abnormal short term variability. Higher values of entropy and compression indicate the presence of more complex structures, and these structures were characteristic of a healthier status. Tracings with a lower percentage of abnormal STV were typical of healthy babies, which explains the negative correlation between the variables (compression/entropy and abnormal short term variability).
There is no uniform definition of STV. The one used in this work was defined in SisPorto as subsequent FHR signals differing by <1 bpm, closely following the FIGO guidelines, but other definitions exist in the literature, with no concrete agreement on which one should be used [21]. It would be interesting to see how compression behaves with these other variants of STV. If it behaves similarly, compression could be used as a universal measure to capture STV. Moreover, compression seems to capture more features from tracings, since the numbers of accelerations and decelerations per minute were correlated with compression but not with entropies.
Actually, it is interesting to notice that the accelerations and decelerations did not correlate with any entropy, but correlated with all compressors. As in Baumert's study [22], our results suggest that accelerations and decelerations are not purely random but follow some deterministic structures that can be exploited by compression algorithms. The use of different parameters in the entropies, considering for instance variants of the threshold such as fuzzy functions, might allow for capturing accelerations and decelerations.
With multiscale analysis, it was observed that entropy and compression captured distinct clinical information from the tracings. Entropy distinguishes MSA tracings from the other two groups based on some features that compression cannot capture, whereas compression distinguishes groups N and MA, possibly by using all SisPorto features. Entropy is probably correlated with some other features of the physiological data (such as its dynamics) that are not captured by SisPorto. As in the study by Voss [23], these results show that several nonlinear indices should be combined in order to improve the performance of FHR analysis. Moreover, the groups (MSA, MA and N) can be distinguished at lower scales with entropy and at higher scales with compression. Compression is better correlated with the physiological features captured by the SisPorto system, and, in fact, these physiological features also do not distinguish groups at scale one. The ability of entropy to distinguish groups at lower scales probably reflects the dynamics mentioned above. Notice that, in higher scales, the averaging of the points can explain the absence of these dynamics, lowering the ability to distinguish the classes.
One limitation of this work is the low number of cases, particularly in the acidemic groups. This is due in particular to the low prevalence of these cases, which limits the collection of datasets with a larger sample size. It would be interesting, as future work with a bigger dataset, to build a predictive model and evaluate how much better entropy/compression plus SisPorto features would perform compared to SisPorto features alone.
Our results reinforce the idea that entropy and compression are complementary complexity measures. More research in this area should be done, regarding higher scale values (with long duration tracings), and the possibility of building a model combining both measures.