The Effect of Threshold Values and Weighting Factors on the Association between Entropy Measures and Mortality after Myocardial Infarction in the Cardiac Arrhythmia Suppression Trial (CAST)

Heart rate variability (HRV) is a non-invasive measurement, based on the intervals between normal heart beats, that characterizes cardiac autonomic function. Decreased HRV is associated with increased risk of cardiovascular events. Characterizing HRV using only moment statistics fails to capture abnormalities in regulatory function that are important aspects of disease risk. Thus, entropy measures are a promising approach to quantify HRV for risk stratification. The purpose of this study was to investigate this potential for approximate, corrected approximate, sample, fuzzy, and fuzzy measure entropy and their dependency on the parameter selection. Recently published parameter sets and further parameter combinations were investigated. Heart rate data were obtained from the "Cardiac Arrhythmia Suppression Trial (CAST) RR Interval Sub-Study Database" (Physionet). Corresponding outcomes and clinical data were provided by one of the investigators. The use of previously reported parameter sets on the pre-treatment data did not significantly add to the identification of patients at risk for cardiovascular death on follow-up. After arrhythmia suppression treatment, several parameter sets predicted outcomes for all patients and patients without coronary artery bypass grafting (CABG). The strongest results were seen using the threshold parameter as a multiple of the data's standard deviation (r = 0.2 · σ). Approximate and sample entropy provided significant hazard ratios for patients without CABG and without diabetes for an entropy-maximizing threshold approximation. Additional parameter combinations did not improve the results for pre-treatment data. The results of this study illustrate the influence of parameter selection on entropy measures' potential for cardiovascular risk stratification and support the potential use of entropy measures in future studies.


Introduction
More than 50 years ago, it was realized that the variation of the heart rate, i.e., heart rate variability (HRV), could be used as a marker of cardiovascular health status [1]. This discovery was originally applied to fetal medicine where reductions in fetal HRV during labor identified babies who were in distress [1]. However, when HRV measures were applied to post-myocardial infarction (MI) patient populations, it became clear that the presence of decreased HRV could be important for risk-stratifying such patients [2].
HRV reflects the net effect of many physiological factors modulating the normal rhythm of the heart and ideally providing cardiac output that is matched to the needs of the body on a beat-by-beat basis. The data on which HRV analysis is based have traditionally been derived from the ambulatory electrocardiogram (ECG), i.e., the electrical signature of the cardiac cycles from which normal-to-normal interbeat intervals can be identified and measured.
An increasing number of measures are being developed and applied to quantify HRV in the time and/or frequency domain [3,4]. Sassi et al. [5] provided a critical review of newer methods (e.g., long-range correlation and fractal analysis, short-term complexity, entropy) highlighting their contribution to the technical understanding of HRV and their ability to quantify the complex regulation mechanism of the heart rate not covered by traditional methods. In addition, they addressed the rather limited success of these newer methods in clinical applications.
More specifically, some of the traditional HRV measures [4] achieve good results in clinical settings (e.g., Total Power, Ultra and Very Low Frequency Power, Low Frequency/High Frequency ratio, [6]), others work only on specific subgroups (e.g., the standard deviation of normal-to-normal (NN) intervals (SDNN) or the standard deviation of the 5-minute average of NN intervals (SDANN), [6]), and some are very limited in assisting in the diagnosis of patients (e.g., the average of NN intervals (AVGNN), or the root mean square successive difference of NN intervals (rMSSD), [7]). Therefore, in the quest to comprehensively quantify all aspects of HRV, one has to try new measures and learn more about their potential and optimal settings.
Machine learning, which combines computer and data science, statistics, and artificial intelligence, can potentially help to overcome the aforementioned limitations. It is rapidly growing, in terms of new learning algorithms and theory, and this growth is fueled by the increasing amount of available online data and the availability of low-cost computing power. The adoption of data-intensive machine-learning methods can lead to more evidence-based decision-making in the biomedical domain [8]. However, one is confronted with probabilistic, uncertain, unknown, incomplete, heterogeneous, noisy, and dirty data sets, which increase the possibility of modeling artifacts. Another problem is that most machine learning approaches assume homogeneity in time, although most physiologic processes do not fulfill this requirement.
Entropy measures represent a family of new methods to quantify the variability of the heart rate [9]. Use of entropy measures is a promising approach, due to their ability to discover certain patterns and shifts in the "apparent ensemble amount of randomness" of a stochastic process [10], and to measure randomness as well as the predictability of processes [11]. In recent years, a huge variety of entropy measures has been developed. Amongst the most widely used, especially in HRV analysis, are the approximate entropy (ApEn) [12], sample entropy (SampEn) [13], and fuzzy entropy (FuzzyEn) [14]. Some of those have been further developed, e.g., fuzzy measure entropy (FuzzyMEn) [15] and corrected ApEn (CApEn) [16].
In general, entropy measures quantify the likelihood that two similar runs of patterns of a certain length remain similar after increasing this length by one point [12]. Their specific behavior is determined by a number of parameters and the selection of these is crucial for the results.
In recent years, there have been several reports about the selection of parameters for various entropy measures (e.g., [17,18]). The results of these studies are in agreement on a couple of parameters, but there are different results published for some others. Furthermore, to our knowledge, all these HRV studies have performed their parameter selection on cross-sectional data, in order to differentiate between pathological and non-pathological HRV. However, to the best of our knowledge, the predictive value (i.e., the ability to predict mortality) of entropy measures utilizing the published parameter sets has not been investigated using outcome data. Therefore, Holzinger et al. recently raised the question of how to select parameters for entropy measures. To date, this problem has not been solved, especially for ApEn, SampEn, FuzzyEn and FuzzyMEn [9].
We therefore set out to assess the effect of parameter selection on the predictive value of entropy measures in a study of high-risk post-myocardial infarction patients with multiple recordings who were followed for survival over a period of 362 ± 243 days. Thus, there are two primary aims of this study. The first is to investigate the predictive value of HRV for mortality using recently published parameter sets. Secondly, if these existing parameter sets fail to deliver significant results, our aim is to investigate additional possible parameter combinations. In this work, we are addressing the question raised by Holzinger et al. [9]; therefore, we focus on ApEn, SampEn, FuzzyEn, and FuzzyMEn. Furthermore, this work is based on the findings of previous publications focused on the same entropy measures [17,19]. Moreover, ApEn is known to be biased by self-matches, and several corrections have been proposed. SampEn is one of those corrections, and CApEn is another. To enhance comparability of the results of this work, CApEn was added to the list of methods under investigation.

Data
All heart rate data used in this study were taken from Physionet [20], a free-access, on-line archive of physiological signals. Physionet guarantees that all data have been fully de-identified (anonymized), and may be used without further institutional review board approval.
All RR interval data were taken from the "Cardiac Arrhythmia Suppression Trial (CAST) RR Interval Sub-Study Database" [7]. This subset was selected based on the availability of usable qualifying and suppression tapes to allow for evaluating the predictive value of HRV parameters with treatment [7]. CAST was designed to analyze the effects of suppression of ventricular arrhythmia by anti-arrhythmic drugs after MI on survival [21,22]. Corresponding outcome and clinical data were provided by one of the investigators and co-authors, Phyllis K. Stein. Baseline (pre-treatment) interbeat (RR) interval data and data after treatment were used. In total, 760 pre-treatment and 740 after-treatment recordings with corresponding follow-up data were available and used. Patient baseline data can be found in Table 1.
Table 1. Patient baseline data and number of records before and after treatment (data are stated as mean ± SD or median and 95% confidence interval (CI)). Abbreviations: SD = standard deviation; CI = confidence interval; MI = myocardial infarction; CABG = coronary artery bypass grafting; DM = diabetes mellitus.

Entropy Measures
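Entropy measures quantify the logarithmic likelihood of two similar sequences of length m (template length) to remain similar after increasing the length to m + 1 [12]. Thereby, similarity is defined either using a rectangular function of radius r (threshold parameter) or a fuzzy membership function of radius r and exponent n (weighting factor). Finally, some entropy measures distinguish between local and global similarity, therefore requiring local and global radii and exponents (r_L, r_F, n_L, and n_F). The total length of the analyzed series is commonly denoted as N.
• The approximate entropy ApEn as introduced by Pincus et al. [12] is calculated as
$$\mathrm{ApEn}(m, r, N) = \Phi^{m}(r) - \Phi^{m+1}(r), \qquad \Phi^{m}(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln C_i^{m}(r),$$
where $C_i^{m}(r)$ is the number of templates $x_j^{m}$ found within the distance r of the template $x_i^{m} := [x(i), \ldots, x(i+m-1)]$, divided by N − m + 1.
• Since ApEn is biased by self-matches, the corrected approximate entropy CApEn was introduced by Porta et al. [16]. The correction is obtained by replacing the ratio [...]
• Another modification of ApEn in order to correct its bias by self-matches is the sample entropy SampEn as introduced by Richman and Moorman [13]. It is defined as
$$\mathrm{SampEn}(m, r, N) = -\ln \frac{\sum_{i=1}^{N-m} C_i^{m+1}(r)}{\sum_{i=1}^{N-m} C_i^{m}(r)};$$
however, this time, $C_i^{m}(r)$ does not count self-matches.
• To soften the effects of a hard threshold r, Chen et al. [14] replaced it with the fuzzy membership function
$$\mu(x, r, n) = \exp\left(-0.69 \left(\frac{x}{r}\right)^{n}\right).$$
The factor of 0.69 was incorporated to get a value of 0.5 for x/r = 1, which is important for comparisons between rectangular and fuzzy membership functions. Finally, with
$$\phi^{m}(r, n) = \frac{1}{N-m} \sum_{i=1}^{N-m} \frac{1}{N-m-1} \sum_{j=1,\, j \neq i}^{N-m} \mu\left(d\left(\bar{x}_i^{m}, \bar{x}_j^{m}\right), r, n\right),$$
where d is the Chebyshev distance and $\bar{x}_i^{m}$ denotes the template $x_i^{m}$ after removal of its own mean, the fuzzy entropy FuzzyEn is defined as:
$$\mathrm{FuzzyEn}(m, r, n, N) = \ln \phi^{m}(r, n) - \ln \phi^{m+1}(r, n).$$
• The fuzzy measure entropy FuzzyMEn, proposed by Liu et al. [15], introduces a distinction between local and global similarity based on FuzzyEn:
$$\mathrm{FuzzyMEn}(m, r_L, r_F, n_L, n_F, N) = \ln \phi_L^{m}(r_L, n_L) - \ln \phi_L^{m+1}(r_L, n_L) + \ln \phi_F^{m}(r_F, n_F) - \ln \phi_F^{m+1}(r_F, n_F),$$
where $\phi_L$ is computed on templates with their local (template) mean removed and $\phi_F$ on templates with the global mean of the series removed.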
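The definitions above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Matlab implementation used in the study: the function names `apen`, `sampen`, and `fuzzyen` are hypothetical, the O(N²) loops are written for readability rather than speed, and FuzzyEn is assumed to remove each template's own mean before the Chebyshev distance is taken.

```python
import math

def _chebyshev(a, b):
    """Maximum absolute coordinate difference between two templates."""
    return max(abs(u - v) for u, v in zip(a, b))

def apen(x, m=2, r=0.2):
    """Approximate entropy; self-matches are counted, so log(0) cannot occur."""
    n = len(x)

    def phi(k):
        t = [x[i:i + k] for i in range(n - k + 1)]
        logs = 0.0
        for a in t:
            c = sum(1 for b in t if _chebyshev(a, b) <= r) / len(t)
            logs += math.log(c)
        return logs / len(t)

    return phi(m) - phi(m + 1)

def sampen(x, m=2, r=0.2):
    """Sample entropy; self-matches are excluded (undefined if no matches exist)."""
    n = len(x)

    def matches(k):
        t = [x[i:i + k] for i in range(n - m)]  # same template count for m and m + 1
        return sum(1 for i in range(len(t)) for j in range(len(t))
                   if i != j and _chebyshev(t[i], t[j]) <= r)

    return -math.log(matches(m + 1) / matches(m))

def fuzzyen(x, m=2, r=0.2, n_exp=2):
    """Fuzzy entropy with membership exp(-0.69 * (d / r) ** n_exp)."""
    n = len(x)

    def phi(k):
        t = []
        for i in range(n - m):
            seg = x[i:i + k]
            mean = sum(seg) / k
            t.append([v - mean for v in seg])  # assumption: local mean removed
        total = 0.0
        for i in range(len(t)):
            for j in range(len(t)):
                if i != j:
                    total += math.exp(-0.69 * (_chebyshev(t[i], t[j]) / r) ** n_exp)
        return total / (len(t) * (len(t) - 1))

    return math.log(phi(m)) - math.log(phi(m + 1))
```

Here `r` is an absolute threshold; to use r = 0.2 · σ, multiply the factor by the standard deviation of the series before calling the function.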

Application of Entropy Measures to CAST Data
NN interval data were downloaded from Physionet [20] using the Waveform Database library (WFDB) for Matlab (version 0.9.10) [20,23]. For this purpose, the wrapper function "ann2rr.m" was extended to use the full functionality of the function "ann2rr" of the WFDB Software Package (version 0.9.10) [20], including the options "-p N" and "-P N" to restrict the output to NN intervals. There was no pre-processing step necessary, since downloaded NN intervals were already free of obvious artifacts. NN intervals at 6 p.m. (chosen to avoid transition effects between day and night) were extracted for all subjects to decrease computation time and to avoid daytime-dependent variations. Depending on the parameter set, data length N was either 1000 or 1200, as listed in Table 2.
Unfortunately, we could not perform more robust randomized testing strategies (i.e., the permutation of the original data and choosing various time intervals) due to the high computational complexity of the entropy measures, which was already increased by two orders of magnitude due to parameter variation. A randomized testing strategy would have multiplied the computation time further by at least a thousand times.
Concerning the choice of the parameters of the entropy measures, recent studies are in agreement on a couple of parameters, such as template and data lengths (m and N), but there are discrepancies regarding the threshold parameters r (r_L, r_F) for all entropy measures and the selection of the weighting factors n (n_L, n_F) for fuzzy entropies [17,18].
Therefore, the parameters were first set according to those in the literature to test whether entropy measures have a predictive value with the suggested parameter sets [17,18]. The parameters used in this analysis can be found in Table 2. Second, a parameter iteration and selection process for r (r_L and r_F) and n (n_L and n_F), respectively, was attempted to test whether the modified parameter sets improved the ability of the measures to risk stratify patients. Consistent with published recommendations, template and data length were set to m = 2 and N = 1200, respectively [17,18]. The other parameters were iterated in the following ranges: (1) r ∈ [0.10 · σ, 0.45 · σ], where σ is the standard deviation of the signal, and r ∈ [0.25 · r_Chon, 3.00 · r_Chon], where r_Chon is an approximation for the r value maximizing entropy values [24], and (2) n ∈ [1.0, 5.0].
The template length is usually set to m = 2 based on the recommendations of Pincus and Goldberger for ApEn [25], Yentes et al. for SampEn [26], and Porta et al. for CApEn [16]. These recommendations have been confirmed by other studies, e.g., [27]. Zhao et al. showed that SampEn performed similarly with m = 2 and m = 3, but the latter choice is not suitable for FuzzyMEn [18]. A false nearest neighbor method is sometimes used as well, but its suitability for ApEn for human heart rate variability data could not be shown [24]. In [14,15], the template length is set to m = 2 for FuzzyEn and FuzzyMEn as well.
There have been reports that these measurements are sensitive to data length N [3,11,12]. The influence of data length on SampEn was not large for N > 200 [13,28], but was for N < 100 [26], indicating a stabilization of the behavior in the dataset that was tested [18]. Zhao et al. reported only small changes for SampEn and FuzzyMEn from N = 300 to N = 1000 [18]. In our previous study, we suggested choosing N > 200 or even N > 1000 depending on the threshold value r [17].
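The two threshold ranges described above could be enumerated as in the following sketch. Several details are assumptions, not taken from the study: the function name `threshold_grid` is hypothetical, the step of 0.05 for the σ multiples is a guess, the r_Chon multiples are simply those that appear in the Results, the sample standard deviation is used for σ, and `r_chon` is passed in by the caller instead of being computed from the approximation in [24].

```python
import statistics

def threshold_grid(nn_intervals, r_chon, sigma_step=0.05,
                   chon_multiples=(0.25, 0.50, 1.00, 1.25, 1.50, 2.00, 3.00)):
    """Return (factor, absolute r) pairs for both ways of scaling the threshold."""
    sigma = statistics.stdev(nn_intervals)  # assumption: sample standard deviation
    n_steps = round((0.45 - 0.10) / sigma_step) + 1
    sigma_factors = [round(0.10 + k * sigma_step, 2) for k in range(n_steps)]
    return {
        "sigma": [(f, f * sigma) for f in sigma_factors],
        "chon": [(f, f * r_chon) for f in chon_multiples],
    }
```

Each absolute r value from the grid would then be passed to the entropy measure under test, with m = 2 and N = 1200 held fixed.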

Statistical Analysis
The predictive value of each entropy measure was determined using a univariate Cox proportional hazards regression. It is a method to determine the influence of predictor variables X_i (in this work: entropy measures) on survival times by fitting the coefficients b_i of the model
$$h(t) = h_0(t) \cdot \exp\left(b_1 X_1 + \ldots + b_k X_k\right)$$
to the observations, where the hazard rate h(t) and the baseline hazard rate function h_0(t) are estimated from the data [29,30]. Finally, the particular hazard ratios are calculated as exp(b_i). Hazard ratios quantify the increase (or decrease) of the instantaneous risk at any particular time for every unit increase in the associated predictor variable. For example, a hazard ratio of 2 for a certain entropy measure means that the risk of dying is doubled for every unit increase in the entropy measure.
The univariate Cox proportional hazards regression is based on the null hypothesis that the predictor variable does not have any influence on mortality, i.e., that b = 0 and therefore the hazard ratio equals 1. The p-value represents the probability of a certain or more extreme observation under the assumption of a true null hypothesis. In this work, significance levels were set at the 5% level, i.e., the null hypothesis is rejected if p ≤ 0.05.
Data were transformed by using the two-parametric Box-Cox transformation [31,32] to remove skewness after replacing negative values by 0 and adding an offset of λ_2 = 0.001 to fulfill the requirement of positive inputs. The transformation parameter λ is given where necessary.
Results for the parameter iteration and selection process are presented by their p-values and, in contrast to the literature, not by their corresponding hazard ratios and 95% confidence intervals, since the focus of the study was to assess the effect of parameter selection on the predictive value of entropy measures and not to compare these hazard ratios. Furthermore, this approach was chosen for reasons of clarity and comprehensibility, because some confidence intervals differ by orders of magnitude and their visual comparison would be meaningless.
Additionally, cut-off values and their corresponding sensitivity and specificity were determined for entropies with significant predictive values. For this purpose, survival curves were compared for various cut-off values using the log-rank test, i.e., a rank-order statistic to compare survival distributions of two samples [33], and the cut-off value maximizing the separation was chosen, i.e., the cut-off value with the lowest p-value. Results are stated as sensitivity and specificity with their corresponding results from the log-rank test comparing lower and higher risk groups (χ² statistic and the associated p-value).
The analyses were performed on the whole data set as well as in subsets of patients without coronary artery bypass grafting (CABG), and without CABG and diabetes as reported in [7,34]. Baseline data and data after treatment were analyzed independently. All computations were performed using Matlab (R2014a) and R (Version 3.2.3).
We used statistical tests based on the same null hypothesis (i.e., the predictor variable does not influence mortality), the same subject groups, the same endpoints and only slight variations of the analysis method. Thus, interaction of the observed results is not only possible, but highly probable. However, p-value adjustments, such as the commonly used Bonferroni correction, assume uncorrelated endpoints and are therefore considered inappropriate for the tasks in this work [35]. Besides, the aim of this work is not to test whether there is a difference between groups, but to investigate the ability of the measures to risk stratify patients.
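To make the relation between the fitted coefficient b and the hazard ratio exp(b) concrete, the sketch below fits a univariate Cox model by Newton-Raphson on the partial likelihood. It is a pure-Python illustration only: the study used Matlab and R, the function name `cox_univariate` is hypothetical, tied event times are handled with the Breslow approximation (an assumption), and no standard errors or p-values are computed.

```python
import math

def cox_univariate(times, events, x, iters=50, tol=1e-10):
    """Fit h(t) = h0(t) * exp(b * x); return (b, hazard ratio exp(b)).

    events[i] is 1 for an observed death and 0 for a censored subject.
    Assumes x is not constant and does not perfectly order the deaths.
    """
    b = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored subjects enter only through the risk sets
            # Risk set: every subject still under observation at time t_i.
            risk = [(math.exp(b * x[j]), x[j])
                    for j in range(len(x)) if times[j] >= t_i]
            s0 = sum(w for w, _ in risk)
            s1 = sum(w * v for w, v in risk)
            s2 = sum(w * v * v for w, v in risk)
            score += x[i] - s1 / s0              # first derivative of log-likelihood
            info += s2 / s0 - (s1 / s0) ** 2     # negative second derivative
        step = score / info
        b += step
        if abs(step) < tol:
            break
    return b, math.exp(b)
```

A hazard ratio above 1 returned by this function means that larger values of the predictor go along with earlier deaths in the sample; negating the predictor flips the sign of b and inverts the hazard ratio.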
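The preprocessing step described above can be sketched as follows, using the standard two-parameter Box-Cox form y = ((x + λ_2)^λ_1 − 1)/λ_1 for λ_1 ≠ 0 and y = ln(x + λ_2) for λ_1 = 0. The function name `box_cox_two_param` is hypothetical, and how the study estimated λ_1 is not stated here, so it is left as an input.

```python
import math

def box_cox_two_param(values, lam1, lam2=0.001):
    """Clip negatives to 0, add the offset lam2, then apply Box-Cox with lam1."""
    out = []
    for v in values:
        shifted = max(v, 0.0) + lam2  # strictly positive input for the transform
        if lam1 == 0:
            out.append(math.log(shifted))
        else:
            out.append((shifted ** lam1 - 1.0) / lam1)
    return out
```

Because the transform is monotone, it changes the scale on which a hazard ratio is expressed but not the ordering of patients by entropy.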
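The cut-off search described above might look like the following sketch. It is an illustration under assumptions: the names `logrank_chi2` and `best_cutoff` are hypothetical, patients with entropy below the cut-off are treated as the higher-risk group (as in the ApEn example in the Results), the candidate cut-offs are supplied by the caller, and "lowest p-value" is implemented as the largest χ² statistic, which is equivalent for one degree of freedom. Specificity is computed among patients who were censored, which simplifies the handling of follow-up.

```python
def logrank_chi2(times, events, group):
    """Two-sample log-rank chi-square statistic (1 degree of freedom)."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, d in zip(times, events) if d}):
        at_risk = [g for tt, g in zip(times, group) if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)
        dead = [g for tt, d, g in zip(times, events, group) if tt == t and d]
        d, d1 = len(dead), sum(dead)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0

def best_cutoff(times, events, entropy, candidates):
    """Return (cutoff, chi2, sensitivity, specificity) with the largest chi2."""
    best = None
    for c in candidates:
        group = [e < c for e in entropy]  # True = below cut-off = higher risk
        if all(group) or not any(group):
            continue  # both groups must be non-empty
        chi2 = logrank_chi2(times, events, group)
        if best is None or chi2 > best[1]:
            deaths = [e for e, d in zip(entropy, events) if d]
            alive = [e for e, d in zip(entropy, events) if not d]
            sens = sum(e < c for e in deaths) / len(deaths)
            spec = sum(e >= c for e in alive) / len(alive)
            best = (c, chi2, sens, spec)
    return best
```

Selecting the cut-off that maximizes the separation on the same data that is then used to report sensitivity and specificity tends to give optimistic values, which is worth keeping in mind when reading the reported figures.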

Predictive Value with Standard Parameter Sets
The results for baseline data from a univariate Cox proportional hazards regression using parameters from the literature (see Table 2) are shown in Table 3. No combination of parameter sets or entropy measures from the pre-treatment baseline provided significant hazard ratios for all patients or for subgroups, respectively. Furthermore, the transformation parameter λ of the Box-Cox transformation is stated in Table 3 to indicate the skewness of the data before transformation and to allow an interpretation of the hazard ratio by inversely transforming the data.
Results shown in Table 4 indicate that the situation is different for post-treatment data. This table is quite revealing in several ways. First, unlike in Table 3, ApEn and SampEn each had significant predictive value for all groups if r = r_Chon is chosen. CApEn reached very similar results. Second, several further parameter sets are significant predictors of outcome both for all patients and for patients without CABG, especially FuzzyEn and FuzzyMEn with r = 0.2 · σ. Third, using a constant r does not lead to any significant risk stratification. Therefore, the use of r = 0.2 · σ seems especially promising for FuzzyEn and FuzzyMEn, while r = r_Chon seems better suited for ApEn, CApEn, and SampEn. In the following, sensitivity and specificity are presented as an example for ApEn (r = r_Chon) for post-treatment data. ApEn < 0.011 had a sensitivity of 77% and a specificity of 42% for mortality for all patients (χ² = 7.8, p = 0.005), a sensitivity of 76% and a specificity of 45% for mortality for patients without CABG (χ² = 9.5, p = 0.002) and a sensitivity of 69% and a specificity of 49% for mortality for patients without CABG and DM (χ² = 5.2, p = 0.023). In Figure 1, survival curves are presented separated according to risk groups.

Variation of the Threshold Value
Variation of the threshold value r as a multiple of r_Chon, i.e., an approximation of the threshold value maximizing entropy values [24], does not improve the predictive value of the studied entropy measures for baseline data, as can be seen in Figure 2A-C. The significant predictive values of ApEn and SampEn for data after treatment are confirmed for 1.00, 1.25 and 1.50 times r_Chon with p-values between 0.002 and 0.049 for all subgroups in Figure 2D-F. In the subgroup of patients without CABG (Figure 2E), ApEn and SampEn provided significant predictive values for all threshold values except 3.00 · r_Chon. CApEn performed similarly to ApEn and SampEn in the range of 1.00 · r_Chon to 1.50 · r_Chon. FuzzyEn and FuzzyMEn did not provide any significant results.
Variations of the threshold value r as a multiple of σ are displayed in Figure 3 and, for baseline data (Figure 3A-C), yielded similar results to the variation as a multiple of r_Chon, with one exception, where ApEn had borderline significance (p = 0.056) for the subgroup without CABG and without diabetes mellitus (Figure 3C). In contrast to the results displayed in Figure 2, all entropy measures were statistically significantly associated with outcome for data after treatment for at least one r value for all patients and patients without CABG (Figure 3D,E). Furthermore, FuzzyEn and FuzzyMEn outperformed the other three entropy measures by consistently reaching significant results over the whole range of r for the same groups. However, in contrast to Figure 2, no entropy measure reached significance for data after treatment for patients without CABG and without diabetes mellitus (Figure 3F).
Independent variation of r_L and r_F as a multiple of r_Chon did not improve the results for FuzzyMEn, except for the subgroup without CABG for data after treatment (Figure 4E). In this subgroup, the following duplets reached p-values in the range from 0.034 to 0.044: (r_L, r_F) ∈ {(1.50, 0.25), (2.00, 0.25), (3.00, 0.25), (3.00, 0.50), (3.00, 1.00), (3.00, 1.25)} · r_Chon. Furthermore, regardless of whether statistical significance was achieved, r_L = 3.00 and r_F = 0.25 appear to be the most suitable selections.
Independent variation of r_L and r_F as a multiple of σ again confirmed the results of the variation with r_L = r_F for all data (Figure 5). Again, significance could not be reached for the pre-treatment groups. For post-treatment data, the choice of r_L and r_F is not critical, since all combinations lead to significant results for all patients and the subgroup of patients without CABG (Figure 5D,E).

Variation of the Weighting Factor
Results for variation of the weighting factor(s) for FuzzyEn and FuzzyMEn are shown in the online supplement Figures S1-S4. Overall, it can be stated that the variation of the weighting factor(s) does not lead to significant changes in the results. Significance (p < 0.05) is reached for all n = n_L and n_F values for all patients and patients without CABG after treatment but not for baseline data and patients without CABG and diabetes mellitus, if r = r_L = r_F = 0.2 · σ. In contrast, using r = r_L = r_F = r_Chon and variation of the weighting factor(s) does not improve the predictive value of FuzzyEn and FuzzyMEn for almost all cases. Only FuzzyMEn achieved better results with very high n_F and n_L for all patients after treatment and for patients without CABG after treatment, when r = r_L = r_F = r_Chon is used. However, for the same subgroups, r = r_L = r_F = 0.2 · σ led to better results with any n_F and n_L anyway.

Discussion
An interesting finding was that the predictive value of the entropy measures is different for data at baseline and data after treatment. This holds true for parameter sets from the literature and all of the different parameter variations, as well as for the threshold parameter r as a multiple of the standard deviation σ or as a multiple of r_Chon. Analysis of baseline data did not lead to significant results for any of the subgroups or any of the parameter combinations. In contrast, entropy measures applied to the after-treatment data proved to have significant predictive value, especially for all patients and patients without CABG. This finding was unexpected but can probably be explained by the fact that the encainide/flecainide phase of the Cardiac Arrhythmia Suppression Trial (CAST) was stopped early because of excess mortality and the moricizine arm of the trial, which was not stopped early, had no effect on mortality [36,37]. Thus, these measures likely reflect the overall negative effect of treatment on outcomes in the trial. Another possible explanation could be the fact that the subjects for the trial were initially selected as being at elevated risk of mortality based on frequent ventricular premature contractions (VPCs).
Another important finding was that different approaches for choosing the threshold parameter r yield different results. In the literature, there are three common ways of choosing the threshold value r. The first approach uses constant r values, e.g., r ∈ {0.10, 0.15} for SampEn and m = 2 or m = 3, or r ∈ {0.10, 0.15, 0.20, 0.25} for FuzzyEn and m = 1 or m = 2 [18]. Choosing such a constant value was questioned, especially for fast dynamic series, in [38] and [27]. An approach to overcoming these shortcomings is to choose r as a multiple of the standard deviation σ of the time series as suggested by Pincus, i.e., r ∈ [0.1 · σ, 0.25 · σ] [12]. A threshold value of r = 0.2 · σ is most commonly reported in the literature [13,15,17,38-40]. Another approach for choosing r is to maximize entropy values. A replacement for the computationally expensive maximization is an approximation r_Chon suggested by Chon et al. [24]. The formula was derived from non-physiologic data and it was shown in [39] that it does not outperform a constant r. The authors cited in [15,17] suggested choosing r_L = r_F for FuzzyMEn.
In our study, we determined that a constant choice and parameter sets suggested by Zhao et al. [18] without additional parameter variation did not result in significant predictive values. Thus, we were unable to confirm the finding in [39] that r_Chon does not outperform a constant value. The results presented in Figures 2 and 3 do not support superiority of the choice of the threshold parameter as a multiple of the standard deviation σ or as a multiple of r_Chon. For after-treatment data, one can see a tendency toward the former approach, since all five entropy measures show good to acceptable performance, but the latter seems to be appropriate as well, especially for ApEn, CApEn, and SampEn. In general, one can see that the entropy measures behave more stably with respect to changing multiples of r_Chon compared to the approach depending on the standard deviation, which shows more unstable behavior especially for ApEn, CApEn, and SampEn. The results of this study are consistent with other research which suggests setting r_L = r_F for FuzzyMEn [15,17].
Varying the threshold parameter r showed that, in general, ApEn, CApEn, and SampEn are more sensitive to changes in r than FuzzyEn and FuzzyMEn. This phenomenon can be found for both choices of r, in all subgroups, at baseline and after treatment (Figures 2 and 3). These findings are in line with those of previous studies and reflect the idea of using a fuzzy membership function instead of the Heaviside function to overcome the sensitivity of ApEn, CApEn, and SampEn with respect to the threshold parameter [14].
The choice of the weighting factors for FuzzyEn and FuzzyMEn has been less investigated and reported in the literature. Chen et al. [14] used n = 2 and r = 0.2 · σ for test signals, noting that for a larger n, closer data points are weighted more strongly. Liu et al. [15] used the weighting factors n_L = 3, n_F = 2 and r_L = r_F = 0.2 · σ for heart rate variability analysis. Their choices for n_L and n_F were given without any motivation. Our previous findings in [17] had suggested the choice of n = n_L = 2 and n_F = 1 for r_Chon according to [24] or n = n_L = 1 and n_F = 3 for r = 0.2 · σ.
In the current study, as stated in the section "Variation of the Weighting Factor", the variation of the weighting factors does not lead to significant changes in the results. Thus, it can be assumed that the choice of the weighting factors is less critical than the choice of the threshold parameter in this population. These results and the variation of r support previous research and are in line with the choices of n as suggested in [17]. Nevertheless, this aspect of parameter selection needs to be investigated in more detail in future research.
One unanticipated finding was that the predictive power of the entropy measures could not be shown for the smallest subgroup, i.e., patients without coronary artery bypass grafting and without diabetes mellitus, for data after treatment, even when apparent in the two other groups. This result has not previously been described and reported. Stein et al. reported that an inclusion of patients with diabetes mellitus or CABG decreases the predictive power of traditional heart rate variability after myocardial infarction [34]. A possible explanation for this might be, as both of these groups tend to have lower HRV, that entropy measures capture a feature in patients with low HRV that is not captured by traditional HRV measures. Alternatively, since the primary difference between the smallest subgroup and the others is the absence of diabetes, a relation between diabetes and entropy affecting mortality might be suspected. Nevertheless, this finding needs further attention.
Another interesting finding was that none of the tested entropy measures, regardless of the parameter sets used, were able to predict mortality for baseline data. This is especially surprising as several traditional HRV measures proved to be reliable predictors in the same dataset in previous studies [6,7,34]. Compared to the time-domain HRV parameters reported by Stein et al. [7] (see Table 5), one can see for some entropy measures and parameter combinations trends similar to SDNN and Ln SDANN for baseline data (Figure 3A-C) when varying r as a multiple of σ, i.e., no significance for all data (A), approaching the significance level for patients w/o CABG (B) and (borderline) significance for all patients w/o CABG and DM (C). The other time-domain parameters did not predict mortality for baseline data in univariate analysis either [7]. On the contrary, all of the entropy measures under investigation proved to be able to predict mortality after treatment. It therefore can be assumed that these complexity measures detect adverse effects of the treatments used in the CAST. Sensitivity and specificity were in a similar range as reported for frequency-domain parameters by Stein et al. [34].
Finally, a number of important limitations of this study need to be considered. First, this study is limited to a relatively small part of the entire 24 h recordings, which was necessary to decrease computation time. Second, the template length was fixed to m = 2 for all calculations, as the number of possible parameter combinations would increase dramatically otherwise. According to the literature, m = 2 seems to be a reasonable choice. Furthermore, parameters were iterated consecutively and not simultaneously to reduce the dimension of the parameter space. In addition, the study did not evaluate the dependency of the entropy measures on age and gender as reported in the literature [41,42]. Finally, one has to keep in mind that the Cardiac Arrhythmia Suppression Trial tested three antiarrhythmic drugs which led to adverse effects in some patients. Therefore, the results of this study for data after treatment cannot be generalized to patients with different or without treatment.

Figure 1. Kaplan-Meier survival curves according to risk groups based on ApEn for post-treatment data (see parameter set No. 1 in Table 2); for all patients (A), for all patients w/o CABG (B) and w/o CABG and DM (C).

Figure 2. Significance of predictive values of entropy measures for different choices of r (multiples of r_Chon); parameters: m = 2, N = 1200, n = n_L = 2, n_F = 1; HRV data at baseline (A,B,C) and after treatment (D,E,F); for all patients (A,D), for all patients w/o CABG (B,E) and w/o CABG and DM (C,F). p = 0.05 marks the threshold of statistical significance.

Figure 3. Significance of predictive values of entropy measures for different choices of r (multiples of σ); parameters: m = 2, N = 1200, n = n_L = 1, n_F = 3; HRV data at baseline (A,B,C) and after treatment (D,E,F); for all patients (A,D), for all patients w/o CABG (B,E) and w/o CABG and DM (C,F). p = 0.05 marks the threshold of statistical significance.

Figure 4. Significance of predictive values of FuzzyMEn for different choices of r_L and r_F (multiples of r_Chon); parameters: m = 2, N = 1200, n = n_L = 2, n_F = 1; HRV data at baseline (A,B,C) and after treatment (D,E,F); for all patients (A,D), for all patients w/o CABG (B,E) and w/o CABG and DM (C,F).

Table 2. Parameter sets based on literature for template length m, data length N, the weighting factor(s) n, n_L and n_F, and the threshold parameter(s) r, r_L and r_F.