More than 50 years ago, it was realized that the variation of the heart rate, i.e., heart rate variability (HRV), could be used as a marker of cardiovascular health status [1]. This discovery was originally applied to fetal medicine, where reductions in fetal HRV during labor identified babies who were in distress [1]. However, when HRV measures were applied to post-myocardial infarction (MI) patient populations, it became clear that the presence of decreased HRV could be important for risk-stratifying such patients [2].
HRV reflects the net effect of many physiological factors modulating the normal rhythm of the heart and ideally providing cardiac output that is matched to the needs of the body on a beat by beat basis. The data on which HRV analysis is based have traditionally been derived from the ambulatory electrocardiogram (ECG), i.e., the electrical signature of the cardiac cycles from which normal-to-normal interbeat intervals can be identified and measured.
An increasing number of measures are being developed and applied to quantify HRV in the time and/or frequency domain [3]. Sassi et al. provided a critical review of newer methods (e.g., long-range correlation and fractal analysis, short-term complexity, entropy), highlighting their contribution to the technical understanding of HRV and their ability to quantify complex regulation mechanisms of the heart rate not covered by traditional methods. In addition, they addressed the rather limited success of these newer methods in clinical applications.
More specifically, some of the traditional HRV measures [4] achieve good results in clinical settings (e.g., Total Power, Ultra and Very Low Frequency Power, Low Frequency/High Frequency ratio [6]), others work only in specific subgroups (e.g., the standard deviation of normal-to-normal (NN) intervals (SDNN) or the standard deviation of the 5-minute averages of NN intervals (SDANN) [6]), and some are very limited in assisting in the diagnosis of patients (e.g., the average of NN intervals (AVGNN) or the root mean square of successive differences of NN intervals (rMSSD) [7]). Therefore, in the quest to comprehensively quantify all aspects of HRV, one has to try new measures and learn more about their potential and optimal settings.
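For illustration, the time-domain measures named above can be computed from an NN-interval series as follows. This is a minimal sketch, not the study's implementation; the time-based 5-minute epoching used for SDANN is one common convention.

```python
import numpy as np

def time_domain_hrv(nn_ms, epoch_ms=5 * 60 * 1000):
    """Basic time-domain HRV measures from NN intervals given in milliseconds."""
    nn = np.asarray(nn_ms, dtype=float)
    avgnn = nn.mean()                            # AVGNN: mean NN interval
    sdnn = nn.std(ddof=1)                        # SDNN: SD of all NN intervals
    rmssd = np.sqrt(np.mean(np.diff(nn) ** 2))   # rMSSD: RMS of successive differences

    # SDANN: SD of the mean NN interval in consecutive 5-minute epochs,
    # with epochs delimited by cumulative elapsed time, not beat count.
    t = np.cumsum(nn)
    epoch_idx = (t // epoch_ms).astype(int)
    epoch_means = [nn[epoch_idx == k].mean() for k in np.unique(epoch_idx)]
    sdann = np.std(epoch_means, ddof=1) if len(epoch_means) > 1 else float("nan")

    return {"AVGNN": avgnn, "SDNN": sdnn, "rMSSD": rmssd, "SDANN": sdann}
```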
Machine learning, which combines computer science, data science, statistics, and artificial intelligence, can potentially help to overcome the aforementioned limitations. It is growing rapidly, in terms of both new learning algorithms and theory, and this growth is fueled by the increasing amount of available online data and the availability of low-cost computing power. The adoption of data-intensive machine-learning methods can lead to more evidence-based decision-making in the biomedical domain [8]. However, one is confronted with probabilistic, uncertain, unknown, incomplete, heterogeneous, noisy, and dirty data sets, which increase the risk of modeling artifacts. Another problem is that most machine-learning approaches assume homogeneity in time, although most physiologic processes do not fulfill this requirement.
Entropy measures represent a family of newer methods to quantify the variability of the heart rate [9]. They are a promising approach due to their ability to discover certain patterns and shifts in the "apparent ensemble amount of randomness" of a stochastic process [10], and to measure the randomness as well as the predictability of processes [11]. In recent years, a wide variety of entropy measures has been developed. Amongst the most widely used, especially in HRV analysis, are approximate entropy (ApEn), sample entropy (SampEn), and fuzzy entropy (FuzzyEn). Some of those have been further developed, e.g., fuzzy measure entropy (FuzzyMEn) and corrected approximate entropy (CApEn).
In general, entropy measures quantify the likelihood that two similar runs of patterns of a certain length remain similar when this length is increased by one point [12]. Their specific behavior is determined by a number of parameters, and the selection of these is crucial for the results.
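This definition maps directly onto sample entropy, sketched below with the template comparisons written out; the parameter values m = 2 and r = 0.2σ are common defaults used for illustration, not a statement about this study's settings.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy with template length m and tolerance r (as a multiple
    of the series' standard deviation).

    SampEn = -ln(A / B), where B counts template pairs of length m whose
    Chebyshev distance is <= r*sigma, and A counts the pairs that still
    match when the templates are extended by one point. Self-matches are
    excluded by only comparing each template with later ones.
    """
    x = np.asarray(x, dtype=float)
    tol = r * x.std()

    def count_matches(length):
        templates = np.array([x[i:i + length] for i in range(len(x) - m)])
        count = 0
        for i in range(len(templates)):
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(d <= tol)
        return count

    b = count_matches(m)       # similar runs of length m
    a = count_matches(m + 1)   # still similar after one more point
    return -np.log(a / b) if a > 0 and b > 0 else float("inf")
```

A regular signal (many runs stay similar) yields a low value; white noise yields a high one.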
In recent years, there have been several reports about the selection of parameters for various entropy measures (e.g., [17]). The results of these studies agree on a couple of parameters, but different results have been published for some others. Furthermore, to our knowledge, all these HRV studies have performed their parameter selection on cross-sectional data, in order to differentiate between pathological and non-pathological HRV. However, to the best of our knowledge, the predictive value (i.e., the ability to predict mortality) of entropy measures utilizing the published parameter sets has not been investigated using outcome data. Therefore, Holzinger et al. recently raised the question of how to select parameters for entropy measures. To date, this problem has not been solved, especially for ApEn, SampEn, FuzzyEn, and FuzzyMEn [9].
We therefore set out to assess the effect of parameter selection on the predictive value of entropy measures in a study of high-risk post-myocardial infarction patients with multiple recordings who were followed for survival. Thus, this study has two primary aims. The first is to investigate the predictive value of HRV for mortality using recently published parameter sets. The second, should these existing parameter sets fail to deliver significant results, is to investigate additional possible parameter combinations. In this work, we address the question raised by Holzinger et al. [9]; therefore, we focus on ApEn, SampEn, FuzzyEn, and FuzzyMEn. Furthermore, this work is based on the findings of previous publications focused on the same entropy measures [17].
ApEn is known to be biased by self-matches, and several corrections have been proposed. SampEn is one of those corrections, and the corrected approximate entropy (CApEn) is another. To enhance the comparability of the results of this work, CApEn was added to the list of methods under investigation.
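The self-match bias can be seen in a minimal ApEn sketch: each template is compared against every template, including itself, so even a pure-noise series is nudged toward apparent regularity. This is an illustrative implementation, not the study's code.

```python
import numpy as np

def apen(x, m=2, r=0.2):
    """Approximate entropy with template length m and tolerance r*sigma.

    Unlike SampEn, the match count for template i includes the self-match
    (distance 0 always passes the tolerance test), which biases ApEn
    toward regularity -- the bias that SampEn and CApEn correct.
    """
    x = np.asarray(x, dtype=float)
    tol = r * x.std()

    def phi(length):
        n = len(x) - length + 1
        templates = np.array([x[i:i + length] for i in range(n)])
        # Fraction of templates (self included) within tolerance of template i.
        c = np.array([
            np.mean(np.max(np.abs(templates - templates[i]), axis=1) <= tol)
            for i in range(n)
        ])
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)
```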
An interesting finding was that the predictive value of the entropy measures differs between data at baseline and data after treatment. This holds true for parameter sets from the literature and all of the different parameter variations, whether the threshold parameter r was chosen as a multiple of the standard deviation σ or as a multiple of the approximation proposed by Chon et al. Analysis of baseline data did not lead to significant results for any of the subgroups or any of the parameter combinations. In contrast, entropy measures applied to the after-treatment data proved to have significant predictive value, especially for all patients and for patients without CABG. This finding was unexpected, but can probably be explained by the fact that the encainide/flecainide phase of the Cardiac Arrhythmia Suppression Trial (CAST) was stopped early because of excess mortality, while the moricizine arm of the trial, which was not stopped early, had no effect on mortality [36]. Thus, these measures likely reflect the overall negative effect of treatment on outcomes in the trial. Another possible explanation could be the fact that the subjects for the trial were initially selected as being at elevated risk of mortality based on frequent ventricular premature contractions (VPCs).
Another important finding was that different approaches for choosing the threshold parameter r yield different results. In the literature, there are three common ways of choosing the threshold value r. The first approach uses a constant r. Choosing such a constant value was questioned, especially for fast dynamic series, in [38] and [27]. An approach to overcoming these shortcomings is to choose r as a multiple of the standard deviation σ of the time series, as suggested by Pincus. A threshold value of r = 0.2σ is most commonly reported in the literature [13]. Another approach for choosing r is to maximize the entropy values. A replacement for the computationally expensive maximization is an approximation suggested by Chon et al. The formula was derived from non-physiologic data, and it was shown in [39] that it does not outperform a constant r. The authors cited in [15] suggested choosing the r that maximizes the entropy value.
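The three strategies can be summarized in a short sketch. The constant, the search grid, and the helper `_apen` are illustrative assumptions, not the study's implementation; the grid search stands in for the costly maximization that closed-form approximations try to replace.

```python
import numpy as np

def _apen(x, m, tol):
    """Minimal approximate entropy, used here only to score candidate tolerances."""
    def phi(length):
        n = len(x) - length + 1
        t = np.array([x[i:i + length] for i in range(n)])
        c = np.array([np.mean(np.max(np.abs(t - t[i]), axis=1) <= tol)
                      for i in range(n)])
        return np.mean(np.log(c))
    return phi(m) - phi(m + 1)

def candidate_tolerances(x, m=2):
    """Three common ways of picking the tolerance r:
      1. a fixed constant in raw signal units,
      2. a multiple of the series' standard deviation (Pincus),
      3. the r maximizing the entropy over a grid of multiples of sigma.
    """
    x = np.asarray(x, dtype=float)
    r_const = 0.2                                  # choice 1 (illustrative value)
    r_sd = 0.2 * x.std()                           # choice 2: r = 0.2 * sigma
    grid = np.linspace(0.05, 1.0, 20) * x.std()    # choice 3: brute-force search
    r_max = max(grid, key=lambda r: _apen(x, m, r))
    return r_const, r_sd, r_max
```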
In our study, we determined that a constant choice of r and the parameter sets suggested by Zhao et al. without additional parameter variation did not result in significant predictive values. Thus, we were unable to confirm the finding of [39] that the approximation of Chon et al. does not outperform a constant value. The results presented in Figure 2 and Figure 3 do not support the superiority of choosing the threshold parameter as a multiple of the standard deviation σ over choosing it as a multiple of the approximation of Chon et al. For after-treatment data, one can see a tendency toward the former approach, since all five entropy measures show good to acceptable performance, but the latter seems to be appropriate as well, at least for some of the measures. In general, the entropy measures behave more stably with respect to changing multiples of the approximation, whereas the approach depending on the standard deviation shows more unstable behavior for certain measures. The results of this study are thus consistent with other research on the choice of r.
Varying the threshold parameter r showed that, in general, the measures based on the Heaviside function are more sensitive to changes in r than the fuzzy measures. This phenomenon can be found for both choices of r, in all subgroups, at baseline and after treatment (Figure 2 and Figure 3). These findings are in line with those of previous studies and reflect the idea of using a fuzzy membership function instead of the Heaviside function to overcome the sensitivity of these measures with respect to the threshold parameter [14].
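The difference in sensitivity is easy to see numerically: nudging r under Heaviside matching flips some pairs from full match to no match, whereas the fuzzy membership degrades smoothly. The distance grid and the shift of r from 0.20 to 0.21 below are illustrative values.

```python
import numpy as np

d = np.linspace(0, 1, 201)   # Chebyshev distances between template pairs

def heaviside(d, r):
    """Crisp matching: a pair either counts fully or not at all."""
    return (d <= r).astype(float)

def fuzzy(d, r, n=2):
    """Fuzzy membership as used by FuzzyEn: graded similarity exp(-(d/r)^n)."""
    return np.exp(-(d / r) ** n)

# Worst-case change in any single pair's match degree when r moves 0.20 -> 0.21:
crisp_jump = np.max(np.abs(heaviside(d, 0.21) - heaviside(d, 0.20)))  # 1.0
fuzzy_jump = np.max(np.abs(fuzzy(d, 0.21) - fuzzy(d, 0.20)))          # a few percent
```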
The choice of the weighting factors for the fuzzy entropy measures has been less investigated and reported in the literature. Chen et al. examined the weighting factor n for test signals, describing that for a larger n, closer data points are weighted more strongly. Liu et al. used fixed weighting factors for heart rate variability analysis; their choices were given without any motivation. Our previous findings in [17] had suggested choosing the weighting factors according to [24].
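The effect of n can be verified with a quick numerical check on the fuzzy membership exp(-(d/r)^n); the tolerance and the two distances below are arbitrary illustrative values, one well inside the tolerance and one outside.

```python
import numpy as np

r = 0.2                      # tolerance (illustrative)
d_close, d_far = 0.05, 0.30  # one pair inside r, one outside

for n in (1, 2, 5):
    w_close = np.exp(-(d_close / r) ** n)
    w_far = np.exp(-(d_far / r) ** n)
    # As n grows, the close pair's weight approaches 1 and the far pair's
    # weight approaches 0: closer data points are weighted more strongly,
    # and the membership function edges toward a crisp threshold at r.
    print(n, round(w_close, 3), round(w_far, 3))
```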
In the current study, as stated in Section 3.2.2, the variation of the weighting factors does not lead to significant changes in the results. Thus, it can be assumed that the choice of the weighting factors is less critical than the choice of the threshold parameter in this population. These results, together with the variation of r, support previous research and are in line with the choices of n suggested in [17]. Nevertheless, this aspect of parameter selection needs to be investigated in more detail in future research.
One unanticipated finding was that the predictive power of the entropy measures could not be shown for the smallest subgroup, i.e., patients without coronary artery bypass grafting (CABG) and without diabetes mellitus, for data after treatment, even when it was apparent in the two other groups. This result has not previously been described. Stein et al. reported that the inclusion of patients with diabetes mellitus or CABG decreases the predictive power of traditional heart rate variability measures after myocardial infarction [34]. A possible explanation for our finding might be that, as both of these groups tend to have lower HRV, entropy measures capture a feature in patients with low HRV that is not captured by traditional HRV measures. Alternatively, since the primary difference between the smallest subgroup and the others is the absence of diabetes, a relation between diabetes and entropy affecting mortality might be suspected. Nevertheless, this finding needs further attention.
Another interesting finding was that none of the tested entropy measures, regardless of the parameter sets used, was able to predict mortality from baseline data. This is especially surprising, as several traditional HRV measures proved to be reliable predictors in the same dataset in previous studies [6]. Compared to the time-domain HRV parameters reported by Stein et al. (see Table 5), one can see, for some entropy measures and parameter combinations, trends similar to SDNN and Ln SDANN for baseline data (Figure 3A–C) when varying r as a multiple of σ: no significance for all data (A), approaching the significance level for patients w/o CABG (B), and (borderline) significance for all patients w/o CABG and DM (C). The other time-domain parameters did not predict mortality for baseline data in univariate analysis either [7]. On the contrary, all of the entropy measures under investigation proved able to predict mortality after treatment. It can therefore be assumed that these complexity measures detect adverse effects of the treatments used in the CAST. Sensitivity and specificity were in a similar range as reported for the frequency-domain parameters by Stein et al.
Finally, a number of important limitations of this study need to be considered. First, this study is limited to a relatively small part of the entire 24 h recordings, which was necessary to decrease computation time. Second, the template length m was fixed for all calculations, as the number of possible parameter combinations would otherwise increase dramatically; according to the literature, the chosen template length is a reasonable choice. Furthermore, parameters were iterated consecutively rather than simultaneously to reduce the dimension of the parameter space. In addition, the study did not evaluate the dependency of the entropy measures on age and gender as reported in the literature [41]. Finally, one has to keep in mind that the Cardiac Arrhythmia Suppression Trial tested three antiarrhythmic drugs, which led to adverse effects in some patients. Therefore, the after-treatment results of this study cannot be generalized to patients with different treatment or without treatment.