1. Introduction
Congestive heart failure (CHF) is a complex clinical syndrome. Advanced heart diseases can cause abnormal changes in cardiac structure or function, which result in ventricular systolic or diastolic dysfunction and can eventually lead to CHF. The main manifestations of CHF are breathlessness, ankle swelling, fatigue and peripheral edema [1]. CHF is a major component of the global burden of chronic cardiovascular disease and is the final stage in the development of various heart diseases. CHF has high morbidity and high mortality. In 2016, the European Society of Cardiology (ESC) reported that 26 million patients worldwide suffer from CHF and that 3.6 million patients are newly diagnosed each year. Seventeen to forty-five percent of CHF patients die within the first year, and the majority die within five years [2]. CHF poses a serious threat to human health and is a major social problem. Prompt diagnosis and treatment can improve survival in patients with CHF. Therefore, it is important to detect CHF early and to assess its severity accurately.
Once CHF has been diagnosed, clinical experts need to assess its severity promptly, because this assessment determines the most appropriate treatment. Various guidelines are available for assessing the severity of CHF, and the most widely used is the New York Heart Association (NYHA) functional classification system [3]. Based on the patient's exercise capacity and symptoms, the NYHA functional classification system divides CHF into four classes: NYHA class I, NYHA class II, NYHA class III and NYHA class IV, with severity increasing from class I to class IV. However, the NYHA classification relies mainly on patients' own reports and on doctors' experience, so the results are subjective and prone to inter-observer variability. It is therefore important to find a convenient, effective and objective method to help doctors assess the severity of CHF.
The electrocardiogram (ECG) is a non-invasive examination that is low-cost, effective and fast. Although the morphology of ECG signals differs among the four classes of CHF, it is difficult for doctors to identify these subtle differences with the naked eye, and visually examining recorded ECG signals is time-consuming. With the development of computer science, many researchers have begun to analyze ECG signals computationally to classify CHF. Bhurane et al. [4] extracted five different features from short ECG segments and employed a quadratic support vector machine (SVM) to distinguish CHF subjects from normal subjects. Using 10-fold cross-validation across four datasets, they obtained an accuracy of 99.66%, a sensitivity of 99.82% and a specificity of 99.28%. Acharya et al. [5] developed an 11-layer deep convolutional neural network (CNN) to detect CHF using two-second ECG segments, achieving an accuracy, sensitivity and specificity of 98.97%, 98.87% and 99.01%, respectively.
Heart rate variability (HRV) refers to the variation in the intervals between consecutive heartbeats. Changes in HRV can indicate current disease or warn of impending cardiac disease [6], so analyzing HRV signals provides useful information for classifying CHF. Melillo et al. [7] extracted 13 features from long-term HRV signals and used a classification and regression tree (CART) to separate mild CHF (NYHA class I and II) from severe CHF (NYHA class III and IV), achieving an accuracy of 85.4%, a sensitivity of 93.3% and a specificity of 63.6%. Shahbazi et al. [8] applied generalized discriminant analysis to reduce the number of features extracted from long-term HRV signals and used a k-nearest-neighbor (KNN) classifier to discriminate between mild CHF (NYHA class I and II) and severe CHF (NYHA class III and IV). Qu et al. [9] extracted HRV features and used two classifiers to classify CHF into three classes: the SVM achieved an accuracy, sensitivity and specificity of 84.0%, 71.2% and 83.4%, respectively, and the CART achieved 81.4%, 66.5% and 81.6%, respectively. Hua et al. [10] extracted 34 features from long-term HRV signals and applied the sequential forward selection algorithm to reduce the feature dimension. They then used an SVM with five selected features to distinguish CHF patients from normal subjects, and a KNN classifier with four selected features to classify CHF into four levels. Chen et al. [11] extracted 54 classical measures and 126 dynamic indices from short-term HRV, selected features by backward elimination, and applied a multi-stage classification approach to classify CHF into four levels: no risk (normal people), mild risk (NYHA class I and II), moderate risk (NYHA class III) and severe risk (NYHA class III–IV).
The above studies show that computer-aided diagnosis systems play an important role in classifying CHF. Although they all attain good results, much work remains. Most of these studies classify CHF into two or three classes, and few classify it into four. Moreover, except for Acharya et al. [5], all of these researchers extracted features manually from ECG or HRV signals and then used machine learning approaches to classify them. Such methods are time-consuming, and their results depend heavily on the extracted features. Because deep learning [12] models automatically extract and select features from data and classify them, they can often obtain better classification results than traditional machine learning approaches, and they take less time to produce those results. Hence, in this work, we propose a novel deep learning model, the multi-scale residual network (ResNet), which automatically classifies CHF into four classes according to the NYHA functional classification system.
The rest of this paper is organized as follows. Section 2 describes the dataset and methods, including data pre-processing, the architecture of the proposed multi-scale ResNet, and the training and testing of the deep learning models. Section 3 presents the experimental results and compares our method with others. Section 4 discusses our work. Finally, Section 5 summarizes our work and outlines future work.
3. Results
The deep learning models were trained on a personal computer with an Intel Core i5-5200 (2.2 GHz) processor, an NVIDIA GeForce 920M graphics processing unit (GPU) and 8 GB of RAM. The models were implemented in Python using Keras with TensorFlow as the backend.
To identify the best architecture for the proposed multi-scale ResNet, we also designed a multi-scale ResNet-18, which contains only the first four MSRBs, and a multi-scale ResNet-26, which contains only the first six MSRBs. Apart from the number of convolutional layers, all other parameters are the same as those of the multi-scale ResNet-34. All three multi-scale ResNets were trained and tested on our collected dataset.
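The model names and block counts above are consistent with a simple layer-counting rule, sketched below as a sanity check. The rule itself (four weighted layers per MSRB plus two extra layers, which would give the multi-scale ResNet-34 eight MSRBs) is an inference and an assumption, not something this section states; the actual MSRB design is described in Section 2.

```python
# ASSUMPTION (not stated in this section): the depth in each model name counts
# weighted layers as 4 per MSRB plus 2 extra layers (stem conv + output layer).
LAYERS_PER_MSRB = 4  # assumed
EXTRA_LAYERS = 2     # assumed: one stem convolution and one output dense layer


def depth_from_msrbs(num_msrbs: int) -> int:
    """Total weighted-layer depth implied by the assumed counting rule."""
    return LAYERS_PER_MSRB * num_msrbs + EXTRA_LAYERS


def msrbs_from_depth(total_layers: int) -> int:
    """Invert the rule: how many MSRBs a model of the given depth would contain."""
    blocks, remainder = divmod(total_layers - EXTRA_LAYERS, LAYERS_PER_MSRB)
    if remainder != 0 or blocks < 1:
        raise ValueError(f"{total_layers} layers does not fit the assumed rule")
    return blocks


print("multi-scale ResNet-18:", depth_from_msrbs(4), "layers")  # first four MSRBs
print("multi-scale ResNet-26:", depth_from_msrbs(6), "layers")  # first six MSRBs
print("multi-scale ResNet-34:", msrbs_from_depth(34), "MSRBs")
```

The rule reproduces 18 and 26 from four and six blocks; it only checks naming consistency and says nothing about what happens inside each block.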
Figure 6 shows the validation results of the three multi-scale ResNets. As Figure 6 shows, the validation results of all three models stabilize from the 17th epoch onward, and the multi-scale ResNet-34 achieves the highest validation accuracy for both sets. During training, we saved the model with the highest validation accuracy and then evaluated the saved models on the test set. The evaluation metrics are the positive predictive value (PPV), sensitivity (Sen) and accuracy (Acc), defined in Equations (4)–(6):

$$\mathrm{PPV} = \frac{TP}{TP + FP} \tag{4}$$

$$\mathrm{Sen} = \frac{TP}{TP + FN} \tag{5}$$

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$
where, taking NYHA class I as an example, class I is treated as the positive class and all other NYHA classes as the negative class. TP is the number of NYHA class I patients correctly classified as class I; TN is the number of patients from other classes correctly classified as not belonging to class I; FP is the number of patients from other classes incorrectly classified as class I; and FN is the number of NYHA class I patients incorrectly classified as another class. The metrics for the remaining classes are computed in the same one-vs-rest manner.
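The one-vs-rest definitions of TP, TN, FP and FN above can be sketched as follows, computing each class's PPV, Sen and Acc (Equations (4)–(6)) from a four-class confusion matrix. Two points are assumptions, since this section does not state them: that the reported "average" values are unweighted means over the four classes, and that the single reported accuracy is the overall fraction of correct predictions. The confusion-matrix values are invented for illustration.

```python
from typing import Dict, List


def one_vs_rest_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Per-class PPV, Sen and Acc from a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    ppv: List[float] = []
    sen: List[float] = []
    acc: List[float] = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true class k, predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted as k, true class is another
        tn = total - tp - fn - fp                  # neither true nor predicted class k
        ppv.append(tp / (tp + fp) if tp + fp else 0.0)  # Equation (4)
        sen.append(tp / (tp + fn) if tp + fn else 0.0)  # Equation (5)
        acc.append((tp + tn) / total)                   # Equation (6)
    return {
        "ppv": ppv,
        "sen": sen,
        "acc_per_class": acc,
        "avg_ppv": sum(ppv) / n,  # assumed: unweighted (macro) mean over classes
        "avg_sen": sum(sen) / n,
        "overall_acc": sum(cm[k][k] for k in range(n)) / total,
    }


# Hypothetical confusion matrix; rows and columns are NYHA class I-IV.
example_cm = [
    [8, 2, 0, 0],
    [1, 7, 2, 0],
    [0, 1, 9, 0],
    [0, 0, 1, 9],
]
metrics = one_vs_rest_metrics(example_cm)
print(metrics["overall_acc"])  # 0.825
```

Note that the per-class Acc of Equation (6) counts every correctly rejected sample as a TN, so it is usually higher than the overall accuracy (0.925 vs 0.825 for class I here).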
Table 4 and Table 5 show the test results of the proposed multi-scale ResNet-34 for the two sets. For both sets, the highest positive predictive value is obtained for NYHA class III (94.86% in Table 4 and 94.72% in Table 5), and the highest sensitivity is obtained for NYHA class IV (94.94% in Table 4 and 95.47% in Table 5).
The proposed multi-scale ResNet-34 achieves the highest average positive predictive value of 94.16%, average sensitivity of 93.79% and accuracy of 94.29% for Set II. Furthermore, an average positive predictive value of 93.49%, an average sensitivity of 93.44% and an accuracy of 93.60% are obtained for Set I.
We also compared the proposed models with other deep learning models on our collected dataset. The 11-layer CNN model [5] and the ResNet-34 model [22] were trained and tested using the same methods. The overall average performance of all the models is tabulated in Table 6.
As Table 6 shows, all the deep learning models perform well in classifying CHF, and the proposed multi-scale ResNet-34 obtains the highest average positive predictive value, average sensitivity and accuracy for both sets. Moreover, all five deep learning models perform better on Set II (five-second ECG segments) than on Set I (two-second ECG segments).
We also assessed the performance of doctors on the test set. Two doctors with three years of clinical experience were asked to classify the heart failure patients based on the patients' ECG signals and clinical records. Their average performance is shown in Table 7. The proposed method outperforms the doctors in terms of PPV and accuracy.
4. Discussion
Traditional methods [7,8,9,10,11] for classifying CHF first filter the noise, then manually extract and select features, and finally apply a classifier to the selected features. Designing handcrafted pre-processing and feature extraction takes considerable time, and the results depend heavily on the selected features. Deep learning models, by contrast, learn directly from the raw data and merge feature extraction and classification into a single step, which reduces design effort and can improve classification results. In this work, we proposed a novel deep learning model, the multi-scale ResNet-34, which automatically classifies CHF into four classes according to the NYHA functional classification system. The proposed model does not require manual feature extraction or selection: after simple pre-processing, raw ECG segments are fed directly into the model for classification. Compared with the ResNet-34, the proposed model extracts features at multiple scales and can therefore learn more information from the data, which likely contributes to its better performance.
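To make the idea of multi-scale feature extraction concrete, here is a toy single-channel sketch in NumPy. It is a hypothetical illustration, not the MSRB defined in Section 2: parallel branches filter the same input with kernels of different lengths (different temporal receptive fields), the branch outputs are fused by averaging (an assumed fusion rule), and an identity shortcut is added before the ReLU. In the real model the kernels would be learned; here they are fixed moving averages.

```python
from typing import Sequence

import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def multi_scale_residual_block(x: np.ndarray, kernels: Sequence[np.ndarray]) -> np.ndarray:
    """Toy single-channel multi-scale residual block (hypothetical structure)."""
    if any(len(k) > len(x) for k in kernels):
        raise ValueError("each kernel must be no longer than the input")
    # Each branch filters the same input with a kernel of a different length,
    # i.e. a different temporal receptive field ("scale").
    branches = [np.convolve(x, k, mode="same") for k in kernels]
    fused = np.mean(branches, axis=0)  # assumed fusion rule: average the branches
    return relu(fused + x)             # identity shortcut, then ReLU


rng = np.random.default_rng(0)
segment = rng.standard_normal(500)  # hypothetical: 2 s of ECG sampled at 250 Hz
kernels = [np.ones(n) / n for n in (3, 9, 27)]  # moving averages at three scales
out = multi_scale_residual_block(segment, kernels)
print(out.shape)  # (500,)
```

Averaging keeps the output length equal to the input so the shortcut can be added directly; a Keras implementation would more typically concatenate branch feature maps along the channel axis and use a 1×1 convolution to match channel counts.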
Clinical experts normally analyze a short-duration ECG record rather than a single ECG beat to classify CHF. Therefore, in this work, we divided the ECG signals into segments of two different lengths to analyze the effect of ECG length on the classification of CHF. The experimental results demonstrate that five-second ECG segments yield better model performance than two-second segments. Because longer segments contain more information, deep learning models can extract more features from them, which improves model performance.
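Cutting recordings into fixed-length segments can be sketched as below. The sampling rate is a hypothetical placeholder (the real value is given in Section 2), and both the non-overlapping windows and the discarding of a trailing remainder shorter than one window are assumptions rather than the paper's stated procedure.

```python
from typing import List, Sequence


def segment_ecg(signal: Sequence[float], fs: int, seconds: float) -> List[List[float]]:
    """Split one recording into consecutive, non-overlapping windows.

    fs is the sampling rate in Hz; a trailing remainder shorter than one
    window is discarded (an assumption, not the paper's stated rule).
    """
    win = int(round(fs * seconds))
    if win <= 0:
        raise ValueError("window length must be at least one sample")
    n_windows = len(signal) // win
    return [list(signal[i * win:(i + 1) * win]) for i in range(n_windows)]


FS = 250  # hypothetical sampling rate
recording = [0.0] * (60 * FS)  # placeholder for a 60 s single-lead recording
two_s = segment_ecg(recording, FS, 2.0)
five_s = segment_ecg(recording, FS, 5.0)
print(len(two_s), len(two_s[0]))    # 30 500
print(len(five_s), len(five_s[0]))  # 12 1250
```

The same recording yields 30 two-second segments but only 12 five-second segments, so choosing a longer window trades training-set size for more information per segment.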
In addition, in clinical practice, computer-aided diagnosis systems are developed using confirmed cases and are then used to diagnose new, unseen cases. Hence, we used an inter-patient paradigm to divide the dataset, so that the training, validation and test sets contain data from different patients. The reported results therefore better reflect performance in clinical practice.
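An inter-patient split can be sketched by grouping segments by patient and assigning whole patients to subsets, so that no patient contributes segments to more than one subset. The 70/15/15 patient fractions, the seed and the patient identifiers below are hypothetical; this section does not give the actual split ratios.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def inter_patient_split(
    patient_ids: Sequence[str],
    fractions: Tuple[float, float, float] = (0.7, 0.15, 0.15),
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Split segment indices so every patient lands in exactly one subset.

    patient_ids[i] is the patient that segment i came from. The fractions are
    applied to patients, not segments; the test subset takes the remainder.
    """
    by_patient: Dict[str, List[int]] = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)  # reproducible patient-level shuffle
    n_train = int(round(fractions[0] * len(patients)))
    n_val = int(round(fractions[1] * len(patients)))
    groups = (
        patients[:n_train],
        patients[n_train:n_train + n_val],
        patients[n_train + n_val:],
    )
    train_part, val_part, test_part = ([i for p in g for i in by_patient[p]] for g in groups)
    return train_part, val_part, test_part


ids = [f"P{p:02d}" for p in range(20) for _ in range(5)]  # 20 patients x 5 segments
train_idx, val_idx, test_idx = inter_patient_split(ids)
print(len(train_idx), len(val_idx), len(test_idx))  # 70 15 15
```

This sketch does not stratify by NYHA class; with few patients per class, a real split would also need to check that all four classes appear in each subset.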
The main highlights of our work are as follows:
(1) We propose a multi-scale ResNet-34 that automatically classifies CHF into four classes according to the NYHA functional classification system.
(2) The proposed method automatically extracts features at different scales from the data and requires little pre-processing.
(3) The effect of ECG length on the classification of CHF is analyzed.
(4) An inter-patient paradigm is used to divide the dataset; hence, the reported results are realistic for the clinical environment.
The drawbacks of our work are as follows:
(1) Our method requires a large amount of training data.
(2) Hyperparameter optimization to attain better model performance takes a long time.
5. Conclusions
In this work, we proposed a novel multi-scale ResNet-34 to classify CHF using ECG segments. The proposed model automatically extracts features at different scales from the data and requires little pre-processing. In addition, we analyzed the effect of two ECG segment lengths on the classification of CHF. The proposed model achieved an average positive predictive value of 93.49%, an average sensitivity of 93.44% and an accuracy of 93.60% for two-second ECG segments, and an average positive predictive value of 94.16%, an average sensitivity of 93.79% and an accuracy of 94.29% for five-second ECG segments. We also compared the proposed model with other deep learning models on our collected dataset, and the multi-scale ResNet-34 achieved the best performance for both segment lengths. Furthermore, we compared our method with doctors, and our method outperformed them in terms of PPV and accuracy. Because an inter-patient paradigm was used to divide the dataset, the reported results are realistic for the clinical environment. Hence, the proposed method can serve as an auxiliary tool to help doctors classify CHF in clinical practice, and it could be extended to analyze other time-series signals. This work used only ECG signals to classify CHF, whereas in clinical practice doctors also analyze additional physiological indicators. In the future, we will combine physiological indicators with ECG signals to develop a more effective method for classifying CHF.