Classification of Congestive Heart Failure from ECG Segments with a Multi-Scale Residual Network

Abstract: Congestive heart failure (CHF) poses a serious threat to human health. Once the diagnosis of CHF is established, clinical experts need to assess the severity of CHF in a timely manner. Electrocardiogram (ECG) signals have been shown to be useful for assessing the severity of CHF. However, since the ECG perturbations are subtle, it is difficult for doctors to detect the differences between ECGs. In order to help doctors make an accurate diagnosis, we propose a novel multi-scale residual network (ResNet) to automatically classify CHF into four classifications according to the New York Heart Association (NYHA) functional classification system. Furthermore, in order to make the reported results more realistic, we used an inter-patient paradigm to divide the dataset, and segmented the ECG signals into two different intervals. The experimental results show that the proposed multi-scale ResNet-34 achieved an average positive predictive value, sensitivity and accuracy of 93.49%, 93.44% and 93.60%, respectively, for two-second ECG segments, and of 94.16%, 93.79% and 94.29%, respectively, for five-second ECG segments. The proposed method can be used as an auxiliary tool to help doctors classify CHF.


Introduction
Congestive heart failure (CHF) is a complex clinical syndrome. Some advanced heart diseases can lead to abnormal changes in cardiac structure or function, which causes ventricular systolic or diastolic dysfunction and can eventually lead to CHF. The main manifestations of CHF are breathlessness, ankle swelling, fatigue and peripheral edema [1]. CHF is a major component of the global burden of chronic cardiovascular disease, and is the final stage of the development of various heart diseases. CHF has high morbidity and high mortality. In 2016, the European Society of Cardiology (ESC) indicated that there were 26 million patients suffering from CHF worldwide, while 3.6 million patients were newly diagnosed annually. Seventeen to forty-five percent of CHF patients die within the first year, and the remainder die within five years [2]. CHF poses a serious threat to human health and is a major social problem. Prompt diagnosis and treatment can improve survival in patients with CHF. Therefore, it is important to detect CHF early and accurately assess its severity.
Once the diagnosis of CHF is established, clinical experts need to assess the severity of CHF in a timely manner, since this assessment allows them to determine the most appropriate treatment to be followed. Nowadays, there are various guidelines around the world for doctors to assess the severity of CHF, and the most widely used one is the New York Heart Association (NYHA) functional classification system.
The rest of this paper is organized as follows. The dataset and methods, including data pre-processing, the architecture of the proposed multi-scale ResNet, and the training and testing of the deep learning models, are described in Section 2. In Section 3, the experimental results are presented and compared with other methods. We discuss our work in Section 4. Finally, we summarize our work and highlight future work in Section 5.

Data Used
The data used in this work were obtained from Shanxi Bethune Hospital. We have collected a dataset of 764 ECG records from 764 patients suffering from CHF. The collected data are all fully de-identified. The ECG signals are sampled at a frequency of 250 Hz, and are collected from lead II. Each ECG signal is 5 min long. Moreover, according to the NYHA functional classification system, clinical experts have annotated each patient with a CHF classification. Figure 1 shows typical ECG segments of the four different NYHA classifications. Table 1 shows an overview of the data used in this work.

Pre-Processing
Normally, there are two paradigms to divide the dataset. One is called intra-patient paradigm and the other is called inter-patient paradigm [13]. If using the intra-patient paradigm to divide the dataset, training and test set may contain ECG segments from the same patient. During the training process, the classification model has learned the specialties of particular patient. If the test set also contains the ECG segments from the same patient, the reported results will be biased. In contrast, using an inter-patient paradigm to divide the dataset means that all ECG segments from one subject are either in training or are a test set. Hence, the reported results are more realistic in clinical practice. In this work, we use an inter-patient paradigm to divide the dataset.
First, we randomly select 20% of the CHF patients as the test set according to the distribution of patients. Then, we use 80% of the remaining data as the training set and 20% of the remaining data as the validation set. Figure 2 details the data distribution for the training set, validation set and test set. We use the training set to train the deep learning models. According to the validation set results, we save the model with the highest validation accuracy. The model performance is evaluated using the test set.
After dividing the patients, we segment the ECG signal of each patient into 2 s (Set I) and 5 s (Set II) segments without R-peak detection or denoising. Dividing the dataset according to the distribution of patients ensures that the ECG segments in the training and test sets come from different patients. Table 2 details the distribution of the ECG segments used in this work. There are a total of 114,600 ECG segments in Set I and 45,840 ECG segments in Set II. Finally, in order to facilitate subsequent data processing and to speed up training convergence, we apply min-max normalization (Equation (1)) to normalize the raw ECG segments. The amplitude of each ECG segment is normalized to the range [0, 1]:

x' = (x − min(x)) / (max(x) − min(x) + ε),  (1)

where x' represents the normalized ECG segment, min(x) and max(x) are the minimum and maximum amplitudes of the raw ECG segment, respectively, and ε is set to 0.0001 to prevent having a denominator equal to 0.
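As a minimal illustration, the min-max normalization of Equation (1) can be sketched in Python as follows; the function name and the toy segment are our own for illustration, not part of the original implementation.

```python
# Sketch of Equation (1): scale each ECG segment to the range [0, 1].
EPS = 0.0001  # prevents a zero denominator, as in the paper

def normalize_segment(x):
    """Min-max normalize one ECG segment (a list of amplitude samples)."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo + EPS) for v in x]
```

Each segment is normalized independently, so differences in absolute amplitude between recordings do not affect the model input.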

ResNet
The depth of a CNN affects the model performance. Therefore, various CNN architectures, such as AlexNet [14] and VGGNet [15], improve their performance by making the architectures as deep as possible. However, as the depth of a CNN increases, the model performance tends to saturate and can even degrade rapidly. As a result, deep networks become more difficult to train. To solve this problem, He et al. [16] proposed the ResNet architecture. Figure 3 shows the architecture of the ResNet. First, the data are input into a convolutional layer, followed by a max pooling layer that reduces the size of the feature map. The main body of the ResNet consists of four blocks, and each block consists of several repeating residual blocks. Except for the number of filters in the convolutional layers, the architectures of the four blocks are similar. In particular, the convolutional layers have the same number of filters within the same block. In addition, each block halves the feature map of the inputs while the number of filters in the convolutional layers is doubled. After the last block, there is a global average pooling layer and a fully connected layer. The outputs of the ResNet are obtained from the fully connected layer.
Figure 4a shows the architecture of the residual block proposed by He et al. [16]. The output of the residual block can be expressed as Equation (2):

y = ReLU(f(x) + x),  (2)

where y represents the output of the residual block, ReLU represents the rectified linear unit activation [14] and x represents the input of the residual block. The function f(·) contains two convolutional layers, two batch normalization layers [17] and a ReLU activation. The residual block uses a shortcut connection to add the input of the residual block to the output of the second batch normalization layer, which allows information to propagate well. Benefiting from the architecture of the residual block, the ResNet can be trained effectively even in very deep neural networks.
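The residual computation of Equation (2) can be sketched for a one-dimensional signal as follows; the convolution helper and the kernels are illustrative assumptions (batch normalization is omitted for brevity), not the authors' trained network.

```python
import numpy as np

def conv1d_same(x, kernel):
    """1-D convolution with zero padding so the output length matches x."""
    k = len(kernel)
    pad = k // 2  # assumes an odd-length kernel
    xp = np.pad(x, (pad, pad))
    return np.array([np.dot(xp[i:i + k], kernel) for i in range(len(x))])

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, k1, k2):
    """Equation (2): y = ReLU(f(x) + x), with f as two convolutions."""
    h = relu(conv1d_same(x, k1))
    h = conv1d_same(h, k2)
    return relu(h + x)  # shortcut connection: input added before the final ReLU
```

Because the shortcut adds x directly to f(x), the gradient can flow through the identity path even when f is hard to optimize, which is what allows very deep ResNets to train.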

Multi-Scale Residual Block
Based on the residual block, we propose a novel multi-scale residual block (MSRB). Unlike the original residual block (Figure 4a), the input of our proposed MSRB is sent to two channels with different convolution kernel sizes for feature extraction. The architecture of each channel is the same as the original residual block. Different scale features can be extracted from the two channels. The output of the proposed architecture is a combination of the two channels with their output filter banks concatenated into a single output vector. Additionally, in the proposed architecture, zero-padding is necessary to maintain the size of the feature map. Figure 4b shows the architecture of our proposed MSRB. The output of the MSRB can be expressed as Equation (3).

y = [ReLU(f1(x) + x), ReLU(f2(x) + x)],  (3)

where f1(·) and f2(·) represent the architectures of the two channels, respectively, and [·, ·] denotes concatenation. The other parameters of Equation (3) are the same as those of Equation (2).

Figure 5 illustrates the architecture of our proposed multi-scale ResNet-34. The input of our proposed model is the ECG segments. After the input layer, there is a convolutional layer, a batch normalization layer, a ReLU activation and a max pooling layer. The main body of our proposed model consists of eight MSRBs. Except for the convolution kernel size and the number of filters in the convolutional layers, the architectures of all the MSRBs are the same. After every two MSRBs, there is a max pooling layer to halve the length of the feature map. After the last MSRB, we apply a global average pooling layer, a dropout layer [18] and a fully connected layer. Finally, a Softmax layer classifies CHF into four classifications according to the NYHA functional classification system. Table 3 details the parameters of the proposed model.
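The two-channel structure of the MSRB in Equation (3) can be sketched as follows; the kernels are illustrative assumptions, and batch normalization is again omitted for brevity.

```python
import numpy as np

def conv1d_same(x, kernel):
    """1-D convolution with zero padding so the output length matches x."""
    k = len(kernel)
    pad = k // 2  # assumes an odd-length kernel
    xp = np.pad(x, (pad, pad))
    return np.array([np.dot(xp[i:i + k], kernel) for i in range(len(x))])

def residual_channel(x, kernel):
    """One channel of the MSRB: conv, ReLU, conv, then the shortcut."""
    h = np.maximum(conv1d_same(x, kernel), 0.0)
    h = conv1d_same(h, kernel)
    return np.maximum(h + x, 0.0)

def msrb(x, small_kernel, large_kernel):
    """Equation (3): concatenate the outputs of the two channels."""
    y1 = residual_channel(x, small_kernel)  # fine-scale features
    y2 = residual_channel(x, large_kernel)  # coarse-scale features
    return np.concatenate([y1, y2])
```

The two kernel sizes let the block respond to both short, sharp ECG features and slower morphological trends, and the concatenation passes both feature scales to the next block.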

Training and Testing
Since there is no pre-training dataset as large as the ImageNet dataset [19], we train the proposed model from scratch. He initialization [20] is used to initialize the weights of each convolutional layer. A batch of 64 ECG segments is input into the proposed model at a time. We use the cross-entropy loss function to evaluate the model loss. We apply the Adam optimizer [21] with a learning rate of 0.001. The learning rate is multiplied by 0.1 if the validation loss does not improve for five consecutive epochs, and training stops if the validation loss does not improve for ten consecutive epochs.
After each training epoch, the validation results are evaluated using the trained model. According to the validation results, we save the model with the highest validation accuracy. Finally, the model performance is evaluated on the test set using the saved model.
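The training schedule described above can be made concrete with a small sketch of the plateau logic; the class and its names are hypothetical (in a Keras implementation this would correspond to reduce-on-plateau and early-stopping callbacks).

```python
# Sketch of the schedule: multiply the learning rate by 0.1 after five
# epochs without validation-loss improvement, and stop training after ten.
class PlateauSchedule:
    def __init__(self, lr=0.001, factor=0.1, lr_patience=5, stop_patience=10):
        self.lr = lr
        self.factor = factor
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.best = float("inf")
        self.wait = 0  # epochs since the last improvement

    def step(self, val_loss):
        """Update after one epoch; returns False when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
            return True
        self.wait += 1
        if self.wait % self.lr_patience == 0:
            self.lr *= self.factor  # reduce on plateau
        return self.wait < self.stop_patience
```

The training loop would call step() once per epoch with the current validation loss and break out of the loop when it returns False.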

Results
The deep learning models were trained on a personal computer with an Intel Core i5-5200 (2.2 GHz) processor, an NVIDIA GeForce 920M Graphics Processing Unit (GPU) and 8 GB of RAM. The deep learning models were developed using the Keras deep learning framework with TensorFlow as the backend. All the code was written in Python.
To find the best architecture of the proposed multi-scale ResNet, we designed a multi-scale ResNet-18 with the first four MSRBs and a multi-scale ResNet-26 with the first six MSRBs. Except for the number of convolutional layers, the other parameters are the same as those of the multi-scale ResNet-34. We trained and tested all the multi-scale ResNets using our collected dataset. Figure 6 shows the validation results of the three proposed multi-scale ResNets. It can be observed from Figure 6 that, starting from the 17th epoch, the validation results of the three models tend to be stable. In addition, the multi-scale ResNet-34 achieves the highest validation accuracy for both sets. During the training process, we saved the models with the highest validation accuracy. Then, we used the test set to evaluate the performance of the saved models. The evaluation metrics are positive predictive value (PPV), sensitivity (Sen) and accuracy (Acc), which are defined in Equations (4)-(6):

PPV = TP / (TP + FP),  (4)
Sen = TP / (TP + FN),  (5)
Acc = (TP + TN) / (TP + TN + FP + FN),  (6)
where, taking NYHA class I as an example, we view it as the positive case and the other NYHA classes as the negative case. TP represents the number of patients with NYHA class I that are correctly classified as NYHA class I. TN represents the number of patients without NYHA class I that are correctly classified as other classes. FP represents the number of patients without NYHA class I that are falsely classified as NYHA class I. FN represents the number of patients with NYHA class I that are falsely classified as other classes. The metrics for the other classes are computed in the same way. Tables 4 and 5 show the test results of our proposed multi-scale ResNet-34 for the two sets. We also compared our proposed models to other deep learning models using our collected dataset. The same methods were used to train and test the 11-layer CNN model [5] and the ResNet-34 model [22], respectively. The overall average performance of the models is tabulated in Table 6.
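The one-vs-rest evaluation of Equations (4)-(6) can be sketched as follows; the function and the example labels are illustrative, not the evaluation code used in this work.

```python
# Sketch of Equations (4)-(6): treat one NYHA class as positive, the rest
# as negative, and compute PPV, sensitivity and accuracy from the counts.
def class_metrics(y_true, y_pred, positive):
    """Return (PPV, Sen, Acc) for one class against all others."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sen = tp / (tp + fn) if tp + fn else 0.0
    acc = (tp + tn) / len(y_true)
    return ppv, sen, acc
```

Averaging the per-class values over the four NYHA classes yields the average PPV and sensitivity reported in the tables.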
It can be noted from Table 6 that all the deep learning models obtain good performance in classifying CHF. The highest average positive predictive value, average sensitivity and accuracy for both sets are obtained by our proposed multi-scale ResNet-34. Moreover, for all five deep learning models, Set II (five-second ECG segments) achieves better model performance than Set I (two-second ECG segments).
We also assessed doctor performance on the test set. We asked two doctors with three years of clinical experience to classify the heart failure patients based on the patients' ECG signals and clinical records. The average performance of the two doctors is shown in Table 7. It can be noted that our proposed method outperforms the doctors in terms of PPV and accuracy.

Discussion
Traditional methods [7][8][9][10][11] for classifying CHF have to first filter the noise, then manually extract and select features, and finally use different classifiers to classify the selected features. It takes considerable time for researchers to design handcrafted pre-processing and feature extraction, and the experimental results are sensitive to the selected features. Deep learning models can automatically learn from the raw data and merge the feature extraction and feature classification processes into one step. Therefore, deep learning models take less time to classify CHF, and the classification results are improved. In this work, we proposed a novel deep learning model called multi-scale ResNet-34, which can automatically classify CHF into four classifications according to the NYHA functional classification system. It is unnecessary for the proposed model to manually extract and select features. Furthermore, the raw ECG segments can be directly input into the proposed model for classification after simple pre-processing. Compared to the ResNet-34, our proposed model can extract multi-scale features, so the model can learn more information from the data. Hence, the proposed model achieves better performance.
Normally, clinical experts have to analyze a short-duration ECG record, rather than a single ECG beat, to classify CHF. Therefore, in this work, we segmented the ECG signals into two different intervals to analyze the effect of ECG length on the classification of CHF. The experimental results demonstrate that classifying CHF from five-second ECG segments achieves better model performance. Since long-duration ECG segments contain more information, deep learning models can extract more features from them, and the model performance is therefore improved.
In addition, in clinical practice, the computer-aided diagnosis systems are developed on the confirmed cases, and are used to diagnose the probable cases. Hence, we use an inter-patient paradigm to divide the dataset in this work so that the data in the training set, validation set and test set come from different patients. The reported results are more realistic in clinical practice.
The main highlights of our work are as follows: (1) We creatively propose a multi-scale ResNet-34 to automatically classify CHF into four classifications according to the NYHA functional classification system. (2) The proposed method can automatically extract different scale features from data and requires little pre-processing. (3) The effect of ECG length on the classification of CHF is analyzed in this work. (4) An inter-patient paradigm is used to divide the dataset. Hence, the reported results are realistic in the clinical environment.
The drawbacks of our work are as follows: (1) Our methods require a large amount of data for training. (2) In order to attain better model performance, hyperparameter optimization takes a long time.

Conclusions
In this work, we have proposed a novel multi-scale ResNet-34 to classify CHF using ECG segments. The proposed model can automatically extract different scale features from the data and requires little pre-processing. In addition, we analyzed the effects of two different intervals of ECG segments on the classification of CHF. Our proposed model achieved an average positive predictive value of 93.49%, an average sensitivity of 93.44% and an accuracy of 93.60% for two-second ECG segments. Moreover, we achieved an average positive predictive value of 94.16%, an average sensitivity of 93.79% and an accuracy of 94.29% for five-second ECG segments. Furthermore, we compared our proposed models with other deep learning models using our collected dataset. The experimental results showed that our proposed multi-scale ResNet-34 achieved the highest model performance for both sets. We also compared our method with the assessments made by doctors in hospital; considering the PPV and accuracy, our proposed method outperforms the doctors. Since this work used an inter-patient paradigm to divide the dataset, the reported results are realistic in the clinical environment. Hence, the proposed method can be used as an auxiliary tool to help doctors classify CHF in clinical practice. Furthermore, the proposed method can be extended to analyze other time-series signals. Our work only used ECG signals to classify CHF. However, in clinical practice, doctors also need to analyze additional physiological indicators to classify CHF. In the future, we will combine some physiological indicators with ECG signals to develop a more efficient method to classify CHF.