Multiscale Entropy Analysis with Low-Dimensional Exhaustive Search for Detecting Heart Failure

Abstract: Multiscale entropy (MSE) is widely used to analyze heartbeat signals. Even though cardiologists do not currently use MSE to diagnose heart failure, these studies are important and have potential clinical applications. In previous studies, discrimination between older congestive heart failure (CHF) patients and healthy individuals using MSE has remained controversial. Few studies have addressed this discrimination using only MSE with machine learning for automatic multidimensional analysis, and the reported testing accuracies are less than 86%. In this study, we determined the optimal MSE scales for discrimination by using a low-dimensional exhaustive search along with three classifiers: linear discriminant analysis (LDA), support vector machine (SVM), and k-nearest neighbor (KNN). In younger people (<55 years), the results showed an accuracy of up to 95.5% with two optimal MSE scales (2D) and up to 97.7% with four optimal MSE scales (4D) in discriminating between young CHF and healthy participants. In older people (≥55 years), the discrimination accuracy reached 90.1% using LDA in 2D, SVM in 3D (three optimal MSE scales), and KNN in 5D (five optimal MSE scales). LDA with a 3D exhaustive search also achieved 94.4% accuracy in older people. Therefore, the results indicate that MSE analysis can differentiate between CHF and healthy individuals of any age.


Introduction
Heart disease is common and is associated with high mortality, as the end phase of many cardiovascular diseases is heart failure [1,2]. Heart failure in the elderly has a prevalence greater than 10% [3] and is a frequent cause of hospitalization [4]; the annual mortality rate of heart failure is approximately 10% [5,6]. In addition, 30% to 40% of patients die within one year of being diagnosed with heart failure [7], and conditional four-year survival is less than 50% [8]. Physiologically, heart failure is associated with dysfunction of the autonomic nervous system [9,10]. Autonomic activity is traditionally evaluated by heart rate variability (HRV) [11-13].
HRV is the physiological phenomenon of variation in the time interval between two consecutive heartbeats. The beat-to-beat interval is usually determined by the time interval between two consecutive R-peaks observed in an electrocardiogram (ECG), which is known as an RR interval. RR intervals between normal heartbeats, which exclude missed beat detections and premature beats, are referred to as NN intervals. RR interval and NN interval time series are the two common heart rate signals for evaluating HRV.
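As an illustration of how RR and NN interval series are formed, a minimal sketch follows. The actual artifact-removal procedure used in this study follows reference [47] and is not reproduced here; the plausibility thresholds below (`low`, `high`, `max_ratio`) are hypothetical values chosen only for demonstration.

```python
import numpy as np

def rr_intervals(r_peak_times):
    """RR intervals: time differences (s) between consecutive R peaks."""
    return np.diff(np.asarray(r_peak_times, dtype=float))

def nn_intervals(rr, low=0.3, high=2.0, max_ratio=0.2):
    """Keep intervals that are physiologically plausible and change by at
    most max_ratio from the previous interval (a crude way to drop
    premature beats and missed detections; thresholds are illustrative)."""
    rr = np.asarray(rr, dtype=float)
    keep = (rr > low) & (rr < high)
    ratio = np.abs(np.diff(rr)) / rr[:-1]
    keep[1:] &= ratio <= max_ratio
    return rr[keep]
```

For example, an ectopic beat that shortens one interval to 0.5 s and lengthens the next to 1.2 s would be removed by the `max_ratio` check, leaving only the regular ~0.8 s intervals.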
HRV is analyzed from three perspectives: time-domain, frequency-domain, and nonlinear methods. All of these methods, however, characterize HRV on a single temporal scale, which can miss features that are only observable at other temporal scales.
Figure 1 shows the flowchart of the analytical procedure used in this study. To discriminate between younger/older normal sinus rhythm (NSR) and congestive heart failure (CHF) participants, their preprocessed NN interval time series were analyzed by the MSE algorithm with r = 0.10/0.15/0.20 and scales from 1 to 20 to extract 20 features. These 20 features were then evaluated by a low-dimensional exhaustive search to determine the optimal combinations of two, three, four, or five features. In the 2D exhaustive search, for example, an unexamined combination of two features was selected and used for discrimination with the LDA/SVM/KNN classifier under leave-one-out cross validation. This step was repeated until all combinations of two features had been evaluated. Afterward, combinations that led to overtraining were excluded, and the optimal combinations of two features, those yielding the highest testing accuracy, were determined from the rest.

Multiscale Entropy
MSE was used to extract features of heart rate signals (NN interval time series) in this study. MSE is the sample entropy of a coarse-grained time series [14,15]. Conceptually, sample entropy evaluates the difference between the number of repeating patterns of length m and the number of repeating patterns of length m + 1 [13]. The coarse-graining process resamples the original time series at a temporal scale τ in two steps. First, the original time series is divided into non-overlapping segments, each of length τ. Second, each segment is averaged, and these means form the coarse-grained time series at scale τ. In this study, MSE was calculated with an m of 2 [51] on scales ranging from 1 to 20.
To count the number of repeating patterns, the similarity criterion for two data points was defined as r × SD, with SD being the standard deviation of the original time series. The similarity factor r usually varies from 0.10 to 0.20 [15] and has three suggested values, 0.10 [16,47,49,52], 0.15 [14-16,47,51,52], and 0.20 [25,53]; these three r values were thus employed in this study.
The number of data points required for a reliable MSE evaluation depends upon the value of m: it must be at least 10^m, and preferably at least 30^m [51]. For this study, 10,000, 20,000, and 40,000 data points were considered.
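The coarse-graining and sample entropy steps described above can be sketched as follows. This is a simplified illustration, not the authors' implementation; as stated in the text, the tolerance is fixed at r times the SD of the original (scale-1) series and reused at every scale.

```python
import numpy as np

def coarse_grain(x, tau):
    """Average consecutive, non-overlapping windows of length tau."""
    n = len(x) // tau
    return x[:n * tau].reshape(n, tau).mean(axis=1)

def sample_entropy(x, m, tol):
    """Sample entropy: -ln(A/B), where B and A count pairs of similar
    templates of length m and m + 1 (Chebyshev distance <= tol)."""
    def count(length):
        t = np.array([x[i:i + length] for i in range(len(x) - length)])
        c = 0
        for i in range(len(t) - 1):
            # Distance from template i to all later templates.
            d = np.max(np.abs(t[i + 1:] - t[i]), axis=1)
            c += int(np.sum(d <= tol))
        return c
    b, a = count(m), count(m + 1)
    return np.inf if a == 0 or b == 0 else -np.log(a / b)

def mse(x, scales=range(1, 21), m=2, r=0.15):
    """Multiscale entropy: sample entropy of each coarse-grained series,
    with tolerance r * SD of the original series."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()
    return [sample_entropy(coarse_grain(x, tau), m, tol) for tau in scales]
```

A regular signal (e.g., a sine wave) yields a lower sample entropy than white noise at scale 1, since its templates repeat far more often.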


Data
The NN interval time series were taken from the following databases on PhysioNet [54]: nsrdb, MIT-BIH Normal Sinus Rhythm Database, five men aged 26 to 45 years and 13 women aged 20 to 50 years; nsr2db, Normal Sinus Rhythm RR Interval Database, 30 men aged 29 to 76 years and 24 women aged 58 to 73 years; chfdb, BIDMC Congestive Heart Failure Database, 11 men aged 22 to 71 years and four women aged 54 to 63 years; and chf2db, Congestive Heart Failure RR Interval Database, 29 patients aged 34 to 79 years including eight men, two women, and 19 patients of unknown gender.
These NN interval time series were categorized into four groups as follows: the young NSR group, 26 healthy participants younger than 55 years from nsrdb and nsr2db; the old NSR group, 46 healthy participants aged 55 years or older from nsr2db; the young CHF group, 18 patients younger than 55 years from chfdb and chf2db; and the old CHF group, 25 patients aged 55 years or older from chfdb and chf2db. One patient of unknown age in chfdb was excluded. All NN interval time series were preprocessed to remove artifacts [47]. Three data sample sizes, 10,000, 20,000, and 40,000 points, were used as described above.

Machine Learning
Three machine learning classifiers, LDA, SVM, and KNN, were applied to automatically discriminate between young CHF and healthy participants and between old CHF and healthy participants. The SVM used a radial basis function (RBF) kernel. For the KNN classifier, the optimized number of neighbors was three. The inputs to the three classifiers were two, three, four, or five of the 20 MSE features extracted from the preprocessed NN interval time series, corresponding to the 2D, 3D, 4D, and 5D exhaustive searches, respectively.
The optimal MSE scales were determined for each classifier by means of an exhaustive search in two, three, four, or five dimensions (denoted 2D, 3D, 4D, and 5D, respectively) among the 20 features (20 MSE scales). Each combination was evaluated via leave-one-out cross validation, which is suited to small sample sizes. To reduce overtraining effects, an overtraining tolerance of 5% was imposed, defined as the absolute difference between the training and testing accuracies.
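The search procedure above can be sketched as follows. This is a minimal illustration assuming scikit-learn: the feature matrix holds the 20 MSE values per participant, the LDA classifier is shown, and the 5% overtraining tolerance is applied as described. The SVM and KNN variants would substitute `sklearn.svm.SVC(kernel='rbf')` or `KNeighborsClassifier(n_neighbors=3)` for the estimator.

```python
from itertools import combinations

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def exhaustive_search(features, labels, dim=2, tolerance=0.05):
    """Evaluate every `dim`-scale combination of MSE features and return
    the highest leave-one-out testing accuracy plus the combinations that
    achieve it, excluding combinations whose training/testing accuracy
    gap exceeds the overtraining tolerance."""
    best_acc, best_combos = 0.0, []
    for combo in combinations(range(features.shape[1]), dim):
        x = features[:, list(combo)]
        clf = LinearDiscriminantAnalysis()
        # Testing accuracy via leave-one-out cross validation.
        pred = cross_val_predict(clf, x, labels, cv=LeaveOneOut())
        test_acc = np.mean(pred == labels)
        # Training accuracy on the full set, for the overtraining check.
        train_acc = clf.fit(x, labels).score(x, labels)
        if abs(train_acc - test_acc) > tolerance:
            continue  # overtrained; exclude this combination
        if test_acc > best_acc:
            best_acc, best_combos = test_acc, [combo]
        elif test_acc == best_acc:
            best_combos.append(combo)
    return best_acc, best_combos
```

For 20 features, a 2D search evaluates C(20, 2) = 190 combinations; even the 5D search (C(20, 5) = 15,504) remains tractable with these small cohorts, which is what makes the exhaustive approach practical in low dimensions.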

Performance Metrics
The performance of MSE analysis with machine learning in low dimensions was evaluated using five traditional indices: accuracy (Acc), sensitivity (Sen), specificity (Spe), positive predictive value (PPV), and negative predictive value (NPV). According to [55,56], the correct classification rate for class k, CCR_k, is defined as

CCR_k = T_k / N_k,

where T_k is the number of correct identifications of class k and N_k is the total number of participants of class k. Taking CHF as the positive class and NSR as the negative class, Acc, Sen, and Spe can be separately defined by

Acc = (T_CHF + T_NSR) / (N_CHF + N_NSR), Sen = CCR_CHF, Spe = CCR_NSR,

and PPV and NPV are separately determined by

PPV = T_CHF / (T_CHF + (N_NSR - T_NSR)), NPV = T_NSR / (T_NSR + (N_CHF - T_CHF)).

Results
Figure 2 shows that the average MSE values at different scales in the NN interval time series of young NSR participants (black circles) deviated from those of the other three groups (old NSR, young CHF, and old CHF participants). This suggests that discrimination between NSR and CHF is relatively easy in younger individuals (<55 years) and becomes more difficult with age (≥55 years). For a fixed r value and scale τ, MSE values with different data sample sizes are similar to one another. Conversely, for the same data sample size and scale τ, MSE values decrease as r increases.

Table 1 shows the discrimination accuracies between young NSR and CHF participants using LDA, SVM, and KNN. Among the 20 MSE scales, the optimal combination of two, three, four, or five scales was determined by means of an exhaustive search. LDA appeared to be superior to SVM and KNN, with an accuracy of 95.5% using only two MSE scales (2D), an r of 0.10, and a data sample size of 40,000. For the same data sample size and classifier, accuracy reached 97.7% with four MSE scales (4D) and an r of 0.10 or 0.20.

Table 1. Discrimination accuracy between young NSR and CHF participants using linear discriminant analysis (LDA), support vector machine (SVM), and k-nearest neighbor (KNN). Note: SD, standard deviation; 2D, two optimal MSE scales; 3D, three optimal MSE scales; 4D, four optimal MSE scales; 5D, five optimal MSE scales.

For data sample sizes of 40,000, 20,000, and 10,000, the highest accuracies were 97.7%, 90.9%, and 93.2%, respectively. These accuracies were achieved using an r of 0.20 and a 4D exhaustive search with the LDA classifier, as indicated in the footnotes of Table 1. The 4D combinations are listed in Table 2, along with their Acc, Sen, Spe, PPV, and NPV. The combination of MSE scales {4, 5, 6, 9} yielded 100% Spe and PPV for a data sample size of 40,000.

Table 2. List of the combinations of four optimal MSE (r = 0.20) scales for discrimination between young NSR and CHF participants using LDA, as well as their corresponding accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. Note: Acc, accuracy; Sen, sensitivity; Spe, specificity; PPV, positive predictive value; NPV, negative predictive value.
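The five performance indices can be computed directly from the per-class correct counts T_k and N_k. A minimal sketch, taking CHF as the positive class (the counts in the usage example are illustrative, not from this study):

```python
def performance_metrics(t_chf, n_chf, t_nsr, n_nsr):
    """Acc, Sen, Spe, PPV, and NPV from per-class correct counts:
    t_k correct identifications out of n_k participants of class k."""
    fp = n_nsr - t_nsr  # NSR participants misclassified as CHF
    fn = n_chf - t_chf  # CHF participants misclassified as NSR
    return {
        "Acc": (t_chf + t_nsr) / (n_chf + n_nsr),
        "Sen": t_chf / n_chf,  # CCR of the CHF (positive) class
        "Spe": t_nsr / n_nsr,  # CCR of the NSR (negative) class
        "PPV": t_chf / (t_chf + fp),
        "NPV": t_nsr / (t_nsr + fn),
    }
```

For instance, with 18 of 20 CHF patients and 38 of 40 NSR participants correctly identified, Acc is 56/60, Sen is 0.90, and Spe is 0.95.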
Table 3 shows the discrimination accuracies between NSR and CHF in older individuals under various conditions. Even though the average MSE values from the NN interval time series of older NSR and CHF participants did not appear different from one another (Figure 2), accuracy reached 90.1% using LDA in 2D, SVM in 3D, and KNN in 5D. Overtraining occurred under eight conditions, labeled NA in Table 3, and was most marked for KNN. This finding suggests that the KNN classifier combined with MSE analysis is not appropriate for differentiating between older CHF and NSR participants in few dimensions.

Table 3. Discrimination accuracy between older NSR and CHF participants using LDA, SVM, and KNN.

On average, LDA showed better accuracy than SVM or KNN. LDA accuracies were 94.4%, 90.1%, and 93.0% for data sample sizes of 40,000, 20,000, and 10,000, respectively. These highest accuracies were obtained with an r of 0.15 and a 3D exhaustive search. Combinations of the three optimal MSE scales (3D) and their performances are listed in Table 4. Moreover, the combinations {3, 6, 12} for 40,000 points, {2, 6, 9} for 20,000 points, and {1, 9, 18} for 10,000 points were associated with 100% Spe and PPV.

Table 4. List of the combinations of three optimal MSE (r = 0.15) scales for discrimination between older NSR and CHF participants using LDA, as well as the corresponding accuracy, sensitivity, specificity, positive predictive value, and negative predictive value.

Discussion
Applying MSE analysis to differentiate between older CHF and healthy participants has been controversial. Some reports have indicated that MSE scales 5 and 6 differ significantly between the CHF and NSR groups [47,49]. Others have reported that older CHF and NSR participants cannot be differentiated by MSE alone [43-46]. Our study shows that MSE scale 6 is important, as it appears frequently in the optimal combinations (Tables 2 and 4); however, a single MSE scale is not sufficient for discrimination. This study indicates that a combination of three or four MSE scales (Tables 2 and 4) is preferable.
Regarding MSE analysis, one difficulty is the pre-determination of the similarity factor r. Three r values, 0.10 [16,47,49,52], 0.15 [14-16,47,51,52], and 0.20 [25,53], have been widely used. The results of this study suggest that age is critical to the choice of r: according to Tables 1 and 3, we suggest 0.15 for discrimination in older participants and 0.20 in younger participants. This should be confirmed with larger datasets in the future.
Many methods have been used for feature selection when preprocessing multidimensional HRV metrics, such as sequential forward feature selection [48,57], sequential backward feature selection [42], and genetic algorithms [39]. Until now, exhaustive search has been avoided because it is extremely time-consuming in higher dimensions. By restricting it to low dimensions, we made it practical, and it demonstrated excellent utility in MSE analysis for discriminating between older CHF and healthy participants.

Conclusions
Machine learning has been widely used to analyze multidimensional HRV metrics for detecting heart failure, but the use of MSE to discriminate between heart failure and healthy hearts in older people remains controversial, and no study has reported a testing accuracy greater than 86% using MSE alone. This study illustrates that a low-dimensional exhaustive search improves the discrimination between CHF and NSR individuals using MSE analysis. Accuracies greater than 90% were obtained within a 5% overtraining tolerance. For discrimination in younger individuals (<55 years), MSE analysis with an r of 0.20 combined with LDA and a 4D exhaustive search is suggested; this yielded accuracies of 97.7%, 90.9%, and 93.2% for data sample sizes of 40,000, 20,000, and 10,000, respectively. For discrimination in older participants (≥55 years), MSE analysis with an r of 0.15 along with LDA and a 3D exhaustive search is suggested; the corresponding accuracies were 94.4%, 90.1%, and 93.0%. These results indicate that MSE analysis can extract important features hidden in the heart rate signals of healthy and heart failure participants, and that three or four features are sufficient for discrimination with a testing accuracy greater than 90%. Therefore, MSE analysis combined with machine learning and a low-dimensional exhaustive search can differentiate between CHF and healthy individuals of any age.
In practice, cardiologists do not currently use HRV metrics to diagnose CHF. Previous research and this work aim to improve HRV analysis for clinical application. With respect to MSE analysis, this work uses a low-dimensional exhaustive search to determine the optimal three and four features, yielding higher testing accuracy than previous studies. The analysis algorithm and the optimal features identified here can be applied to diagnostic engines for further clinical research and to support cardiologists.