k-Labelsets Method for Multi-Label ECG Signal Classification Based on SE-ResNet

Abstract: Cardiovascular diseases are the leading cause of death globally. The ECG is the most commonly used tool for diagnosing cardiovascular diseases, and there have recently been a number of attempts to use deep learning to analyze ECGs. In this study, we propose a method for performing multi-label classification on standard ECG data (12-lead with a duration of 10 s). We used ResNet, which performs residual learning, as the base model for classification, and we tried to improve performance with SE-ResNet, which adds squeeze and excitation blocks to the plain ResNet. The experiments showed that the squeeze and excitation blocks yielded an overall performance improvement. In addition, the random k-labelsets (RAKEL) algorithm was applied to improve performance in the multi-label classification problem. As a result, the model that applied soft voting through the RAKEL algorithm to SE-ResNet-34 showed the best performance, and its average performance over the number of label divisions k was 0.99%, 88.49%, 92.43%, 90.54%, and 93.40% in exact match, accuracy, F1-score, precision, and recall, respectively.


Introduction
Cardiovascular diseases (CVDs), a generic term for disorders of the heart or blood vessels, are the leading cause of mortality and morbidity worldwide. According to the World Health Organization (WHO), approximately 17.9 million people died from CVDs in 2019, accounting for 32% of global deaths [1]. In particular, about 80% of sudden cardiac deaths are the result of ventricular arrhythmias [2,3]. Arrhythmias occur when the electrical signals that control the heart's rhythm are out of sequence. In other words, an arrhythmia is an abnormality in the rhythm of the heart, which can be slow, fast, or irregular [3]. Arrhythmias are accompanied by various symptoms and carry risks ranging from mild fluttering to death. Because of the high mortality rates of CVDs, early detection and accurate identification of arrhythmias are essential for patient treatment [4]. The electrocardiogram (ECG), which records the electrical activity of the heart, is the most commonly used tool to detect arrhythmias due to its low cost and non-invasive nature.
The standard ECG refers to the 12-lead ECG with a short duration of 10 s, which can provide sufficient information for the diagnosis of various diseases [5]. Therefore, a method that allows for an accurate interpretation of the ECG is required. However, diagnosing arrhythmias from ECG records is a time-consuming process that requires an experienced physician. Furthermore, subtle changes in the ECG may go undetected. To overcome these problems, computer-aided diagnosis (CAD) algorithms have been used to automate the diagnosis of arrhythmias. Traditional CAD methods rely on manually engineered features, and feature extraction is the most important step for classification [6]. Kernel-based [7][8][9], wavelet transform [10][11][12], and Fourier transform [13,14] methods were used to perform feature extraction, such as signal preprocessing and waveform detection. This feature extraction step is generally designed from experience and requires specific expertise [6]. However, recent advances in deep learning make it possible to perform these tasks more efficiently than traditional methods, without the need for manual feature extraction.
Deep learning can be used to classify patterns and to extract features that reveal meaningful hidden information in data. In ECG analysis, deep learning methods have demonstrated better classification performance than traditional methods when trained with sufficient data [6]. Many studies have used deep learning to analyze ECGs [15]. However, most of them were designed for single-lead ECG, which leaves open the possibility of exploiting more information from the raw ECG data [16]. There are, of course, investigations that performed the desired tasks with 12-lead (or multi-lead) ECG as input [17,18]. Their common purpose is to predict and classify a single label, yet the PTB-XL ECG dataset [19] used in this study has a total of 5 superclasses, and since a record can belong to several classes simultaneously, multi-label prediction is required. We used the k-labelsets method [20] to solve this problem, and a convolutional neural network (CNN)-based model was used as the deep learning architecture for classification.
The CNN is a deep learning architecture designed to learn topological information and patterns appearing in adjacent spaces, and it has demonstrated impressive performance in computer vision [21][22][23][24], natural language processing [24,25], medical fields [15,26], and many other tasks. The CNN has been continuously evolving since LeNet [27] and has become the model of first choice in image-related tasks. Over the past decade, since the advent of the first large-scale CNN, AlexNet [21], various CNN-based models, such as VGGNet [28] and GoogLeNet [29], have shown good classification performance. However, as models became deeper, the gradient vanishing problem caused performance to deteriorate and prevented training from proceeding properly. ResNet [30] overcomes this problem through residual learning using skip connections.
In summary, we propose a method to solve the multi-label classification problem of ECG signals based on SE-ResNet, which applies the squeeze and excitation method [31] to ResNet. The squeeze and excitation method consists of two steps. In the squeeze step, the entire information corresponding to each channel is expressed as a channel descriptor. Then, in the excitation step, the relative importance of each channel is calculated from the information obtained in the previous step and used as a weight. Using the squeeze and excitation method, we recalibrate the feature information in ResNet and use this model for multi-label classification. The remainder of this paper is organized as follows: In Section 2, we explain the dataset and methods used in our experiment. We describe the experimental results in Section 3, and in Section 4, we present our evaluation of the findings.

Dataset
The PTB-XL ECG dataset [19] is composed of open data that can be obtained from the PhysioNet [32] site managed by the MIT Laboratory for Computational Physiology. The dataset includes waveform data collected from October 1989 to June 1996 with a Schiller AG device and was created so that individual patients cannot be identified. Waveform data are provided in waveform database (WFDB) format at 100 Hz and 500 Hz, and the 100 Hz data were used in this study. The structured data consist of 21,837 clinical 12-lead ECGs of 18,885 patients. In total, 71 ECG descriptions conforming to SCP-ECG standards, prepared by two cardiologists, have been documented, along with demographic items, such as age, gender, and height. One ECG record was excluded because the measurement was interrupted in the middle, so a total of 21,836 records were used. Therefore, the shape of the input data we used is 21,836 × 12 × 1000. Diagnoses of the ECG data are classified into five superclasses: normal ECG (NORM), myocardial infarction (MI), ST/T change (STTC), conduction disturbance (CD), and hypertrophy (HYP). Each superclass is further divided into subclasses, and the description of each class can be found in Table 1 [19]. The entire dataset was split in a 60:20:20 ratio and used as training (13,101), validation (4367), and test sets (4368), respectively.
The purpose of a neural network classifier is to find a function that maps the input x to the label y. This approach causes gradient vanishing problems when layers become deeper. In other words, as the layers get deeper, the gradient gradually decreases as the backpropagation process returns to the input layer. Therefore, the weights are not updated in the layers close to the input layer, making it difficult to find the desired mapping function. The advent of ResNet [30] made it possible to deal with these issues.
The gradient vanishing problem was solved by adding a shortcut that skips over intermediate layers, rather than connecting the layers only consecutively. With the shortcut connection, the block computes the residual, that is, how much the output has changed from the previous value, which ensures that the gradient delivered is at least 1. Figure 1 shows the structure of the residual block in the ResNet architecture. It can be seen that the output X̃ is obtained by adding the input feature map X of size C × H × W to the feature map obtained by passing X through the residual block, where C, H, and W denote the number of channels, the height, and the width of the feature map X, respectively. Additionally, the details and variants of the network structure according to the number of layers used in this work can be found in Table 2.
The squeeze and excitation network (SENet) [31] is designed to learn the importance of convolved features during CNN training. The goal of SENet is to recalibrate the interaction between channels in the feature representation of a CNN through a squeeze phase and an excitation phase. Based on the obtained information, new weights are given to each channel to improve performance. In other words, the squeeze and excitation block (SE block) overcomes the limitation of a traditional CNN, which learns only information corresponding to a local receptive field, by collecting and delivering information on all fields. Networks can be flexibly expanded with additional SE blocks, and performance improvements can be achieved with little additional computation.
This method consists of two phases: squeeze and excitation. The squeeze step extracts the entire information corresponding to each channel. When a feature map X with C × H × W dimensions becomes a feature map U of size C × H × W through a convolution layer, the size of the feature map corresponding to one channel in U is H × W. The feature map for each channel is squeezed into a 1 × 1 feature map using a channel descriptor function, such as global average pooling (GAP) [33]. In this phase, a scalar value representing global information about the channel is created. The squeeze process is represented in Equation (1):

$$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) \quad (1)$$

where u_c(i, j) denotes the feature map corresponding to channel c after X has passed through the convolution layer. F_sq represents the channel descriptor function, for which GAP was used in this study.
In the excitation step, the channel-wise dependencies are considered by using the descriptor for each channel obtained through the squeeze step. This task can be achieved through fully-connected (FC) layers and non-linear functions. The excitation step is shown in Equation (2), where z is the value obtained by the squeeze step, W_i are the FC layers, and σ is the sigmoid function. Due to the sigmoid, the output of the excitation step has a value between 0 and 1, and it can be used as a weight for calibration. The new weight s obtained by excitation is multiplied by the existing feature map U. Figure 2 shows how the SE-ResNet used in this study works and represents the structure of the squeeze and excitation phases in the SE block.
Figure 2. The SE-ResNet structure is configured by adding the SE block to the ResNet used in this study.
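The squeeze, excitation, and shortcut steps described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: it assumes a 1D feature map of shape C × T (channels × time steps), bias-free FC layers, a ReLU between the two FC layers, and a reduction ratio r, none of which are specified in the text. All function and variable names are hypothetical.

```python
import math
import random


def squeeze(U):
    """Global average pooling: one scalar descriptor per channel."""
    return [sum(channel) / len(channel) for channel in U]


def dense(W, x):
    """Bias-free fully connected layer: matrix W (list of rows) times vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def excitation(z, W1, W2):
    """Channel weights s = sigmoid(W2 * relu(W1 * z)), each in (0, 1)."""
    hidden = [max(0.0, h) for h in dense(W1, z)]  # ReLU is an assumption
    return [1.0 / (1.0 + math.exp(-a)) for a in dense(W2, hidden)]


def se_residual_block(X, U, W1, W2):
    """Rescale each channel of U by its weight s_c, then add the shortcut X."""
    s = excitation(squeeze(U), W1, W2)
    return [[x + s_c * u for x, u in zip(x_row, u_row)]
            for x_row, u_row, s_c in zip(X, U, s)]


# Toy example: C = 4 channels, T = 8 time steps, reduction ratio r = 2.
random.seed(0)
C, T, r = 4, 8, 2
X = [[random.uniform(-1, 1) for _ in range(T)] for _ in range(C)]
U = [[random.uniform(-1, 1) for _ in range(T)] for _ in range(C)]  # stands in for conv(X)
W1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
W2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]

s = excitation(squeeze(U), W1, W2)      # one weight per channel
out = se_residual_block(X, U, W1, W2)   # same shape as X
```

Because every weight in s lies strictly between 0 and 1, the SE block can only attenuate a channel of U relative to the others before the shortcut addition; it never changes the shape of the feature map.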

Proposed Model Architecture
The architecture of the model proposed in this study is shown in Figure 3 and is based on the RAKEL algorithm [20]. In the RAKEL algorithm, let L be the set of l labels in a multi-label classification problem. In the ECG signal classification problem in this study, l = 5. The RAKEL method creates m k-subsets of L, denoted L_1, L_2, ..., L_m. The original RAKEL model randomly selects m subsets, but since this study uses all subsets, $m = \binom{5}{k}$. For each subset, the original labels of every sample are translated into a new label based on the labels in that subset, resulting in one new label for each sample. Then, using the samples with new labels, a single-label classifier is constructed. In this study, the SE-ResNet-34 model, which showed the best performance among the 8 ResNet and SE-ResNet models, was used as the single-label classifier. All m SE-ResNet-34 models developed in this way were put into the RAKEL model. The label set for a query sample is determined by combining the outputs of the m classifiers. The outputs are combined by soft voting, which sums all output values and passes the sum through the sigmoid activation function.
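The label translation and soft voting described above can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the authors' code: the text does not say how each classifier's output is mapped back to individual labels before summation, so the example assumes every classifier emits one signed score per label in its subset. The function names and the score format are hypothetical.

```python
import math
from itertools import combinations

LABELS = ["NORM", "MI", "STTC", "CD", "HYP"]


def all_k_labelsets(n_labels, k):
    """Every k-subset of label indices; there are C(n_labels, k) of them."""
    return list(combinations(range(n_labels), k))


def to_powerset_class(y, labelset):
    """Encode the bits of y restricted to `labelset` as one class index."""
    return sum(y[label] << pos for pos, label in enumerate(labelset))


def from_powerset_class(cls, labelset):
    """Decode a class index back into {label: 0 or 1} for the labelset."""
    return {label: (cls >> pos) & 1 for pos, label in enumerate(labelset)}


def soft_vote(per_label_scores, n_labels, threshold=0.5):
    """Sum each label's scores over all classifiers, apply a sigmoid, threshold.

    `per_label_scores` holds one {label: score} dict per classifier
    (an assumed output format, not taken from the paper).
    """
    totals = [0.0] * n_labels
    for scores in per_label_scores:
        for label, score in scores.items():
            totals[label] += score
    probs = [1.0 / (1.0 + math.exp(-t)) for t in totals]
    return [int(p >= threshold) for p in probs], probs


k = 2
labelsets = all_k_labelsets(len(LABELS), k)      # m = C(5, 2) = 10 subsets
y = [0, 1, 1, 0, 0]                              # a sample labeled MI and STTC
new_labels = [to_powerset_class(y, ls) for ls in labelsets]

# Simulate perfect classifiers: +1 for a label judged present, -1 otherwise.
per_label = [{label: 2 * bit - 1
              for label, bit in from_powerset_class(c, ls).items()}
             for c, ls in zip(new_labels, labelsets)]
pred, probs = soft_vote(per_label, len(LABELS))  # pred recovers y here
```

With k = 2 and five labels, each label appears in four of the ten subsets, so every label receives four votes before the sigmoid is applied.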

Evaluation Metrics
Since the superclass classification problem of the PTB-XL ECG dataset is multi-label classification, we used the following five evaluation metrics to evaluate the model performance.
• The F1-score: The harmonic mean of precision and recall, which gives an overall assessment of the quality of a classifier's predictions.
• Exact match (subset accuracy): The ratio of samples for which all labels are predicted correctly. It has the disadvantage of ignoring partially correct predictions, because every label must match. For the same reason, it takes the dependency between labels into account.
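The two metrics defined above can be computed as in the following Python sketch. It is illustrative only: the text does not state how precision, recall, and F1-score are averaged over labels, so micro-averaging (pooling all sample-label pairs) is assumed here, and the toy label vectors are invented for the example.

```python
def exact_match(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted correctly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def micro_precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 pooled over all (sample, label) pairs.

    Micro-averaging is an assumption; the averaging used in the paper
    is not stated.
    """
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data over the five superclasses (NORM, MI, STTC, CD, HYP).
y_true = [[1, 0, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 1, 1], [0, 1, 0, 0, 0]]
y_pred = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 1, 1], [0, 1, 1, 0, 0]]

em = exact_match(y_true, y_pred)
precision, recall, f1 = micro_precision_recall_f1(y_true, y_pred)
```

In this toy example, two of the four samples contain a single wrong label, so exact match drops to 0.5 while the pooled F1-score stays at 5/6, which illustrates why exact match is the strictest of the metrics.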
$$\text{Exact match} = \frac{1}{N} \sum_{i=1}^{N} I(\hat{y}_i = y_i)$$

where N, ŷ_i, and y_i represent the total number of samples in the test dataset, the classifier's prediction, and the true label, respectively, and I(·) is the indicator function.
Table 3 shows the classification results of ResNet, SE-ResNet, and the proposed model with k-labelsets. For the ResNet and SE-ResNet models, the results using 18, 34, 50, and 101 layers are presented for comparison. For plain ResNet, the 18-layer model gave the best exact match, F1-score, and recall at 60.67%, 92.06%, and 93.81%, respectively, and the 34-layer model gave the best accuracy at 88.02%. Additionally, the ResNet-50 model showed the highest precision at 91.06%. Meanwhile, for SE-ResNet, the 34-layer model gave the highest exact match, accuracy, and F1-score at 61.42%, 88.48%, and 92.40%. Furthermore, in order to compare the overall performance of the models, the performance of each model family is also presented as an average value. When comparing the averages, the results with the squeeze-and-excitation network were better than those of plain ResNet in all metrics. However, the increase was less than 1% in all metrics except exact match. In particular, SE-ResNet using 34 layers showed the best performance in all metrics except precision and recall. Therefore, in this study, the k-labelsets method was applied to the SE-ResNet-34 model, and the results were calculated for k = 1, 2, 3, 4. When k = 2, F1-score and recall were the highest at 92.84% and 94.87%, and when k = 4, exact match, accuracy, and precision showed the best results at 63.76%, 89.16%, and 91.62%. In order to compare the results of all models, the highest value for each metric is shown in bold. Comparing the averages of plain ResNet, SE-ResNet, and the proposed models, the proposed model showed the best performance in all measures. Figure 4 shows the confusion matrices generated by the proposed model with 4-labelsets, which showed the best accuracy.

Multi-Label Classification Results Based on Subclass
Table 4 shows the multi-label classification results of the proposed model with k-labelsets based on the subclasses. When classifying using subclasses, the number of subclass labels is 23, so training the model on all k-labelsets would take too long. Therefore, as in the original RAKEL algorithm, we randomly selected 23 k-labelsets and observed the model performance while changing the value of k. As with the superclass results, the highest value of each metric is shown in bold to compare the results of all models. Comparing the performance of the proposed model with k-labelsets based on the superclass and the subclass, it can be seen that the performance on the subclass is greatly degraded, with exact match values of 88.94% and 62.50%, respectively. However, since the number of labels in the subclass setting is 23, this seems reasonable given how difficult it is to match all the labels exactly. Looking at the performance of the model according to k, exact match was the best at 58.01% when k = 22, and accuracy was the highest at 83.82% at k = 12, k = 17, and k = 22. For F1-score, the performance of the proposed model was the highest at 90.15% when k = 12 and k = 17. Additionally, precision showed the best performance with 89.60% when k = 22, and recall was the highest with 90.90% when k = 3.
Table 4. Classification performance for the proposed model with k-labelsets based on subclass (The proposed model with k-labelsets uses SE-ResNet-34 as a classifier).

Model | Exact Match | Accuracy | F1-Score | Precision | Recall
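The reason for sampling labelsets in the subclass experiment can be made concrete by counting the k-subsets of 23 labels. The Python sketch below does this and draws 23 distinct random k-labelsets without enumerating every subset. It is a hypothetical illustration: the function name, the seed, and the rejection-sampling strategy are assumptions, not details taken from the paper.

```python
import math
import random


def random_k_labelsets(n_labels, k, m, seed=0):
    """Draw m distinct k-subsets of label indices by rejection sampling."""
    rng = random.Random(seed)
    m = min(m, math.comb(n_labels, k))  # cannot draw more subsets than exist
    chosen = set()
    while len(chosen) < m:
        chosen.add(tuple(sorted(rng.sample(range(n_labels), k))))
    return sorted(chosen)


n_sub = 23  # number of subclass labels
for k in (3, 12, 17, 22):
    # Number of classifiers that using all k-labelsets would require.
    print(k, math.comb(n_sub, k))

labelsets = random_k_labelsets(n_sub, k=12, m=23)
```

For k = 12 there are 1,352,078 possible labelsets, each of which would need its own classifier, whereas for k = 22 there are only 23, so the 23 sampled labelsets then coincide with the full set.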

Comparison between Existing Studies and the Proposed Model
In this subsection, we compare our performance with that of existing methods. Strodthoff et al. [34] performed various deep learning tasks using the PTB-XL ECG dataset. For classification tasks, CNN-based methods and ensemble methods demonstrated superior performance. They obtained a classification AUC and F1-score of 93.4% and 82.5% for the superclass, and 93.0% and 76.6% for the subclass, respectively. Zhang et al. [35] performed classification for the NORMAL, AF, I-AVC, LBBB, and PAC classes in the PTB-XL ECG dataset. The inception-ResNet-v2 model [36] was used, and classification was performed by converting the ECG signal into a 2-dimensional texture image. Classification using these 5 classes yielded an F1-score of 88.62%. Zhaowei et al. [37] performed classification using 6 public ECG datasets. Classification was performed on 27 classes using SE-ResNet, and 0.885 was achieved in the challenge score, the evaluation criterion of the PhysioNet/Computing in Cardiology Challenge 2020 [38]. Although various ECG classification results exist, it is difficult to compare them directly because the labeling methods and evaluation metrics differ and because the purpose of classification differs as well. Nevertheless, our method was found to be superior in terms of F1-score.

Discussion
This paper has proposed a model that applies the k-labelsets methodology to SE-ResNet-34 to solve the multi-label ECG signal classification problem. Whereas the original RAKEL algorithm randomly selects k-subsets, this study used all k-subsets and observed the performance of the model while changing the value of k. In addition, the proposed model uses a soft voting method that sums the outputs of the classifiers and passes the sum through a sigmoid activation, unlike the original algorithm, which uses majority voting. SE-ResNet-34 was used as the classifier for the k-labelsets methodology because it showed the best performance among the plain ResNet and SE-ResNet models of different depths. In both the plain ResNet and the SE-ResNet models, using a deeper model did not yield better performance. Although the shortcut connection in the residual block is designed so that performance does not deteriorate as the model becomes deeper, the deeper models did not perform well on the ECG classification problem. CNNs are suitable for image classification problems, but ECG data contain a very small amount of data compared to images, so greater depth does not seem to improve performance. The comparison of plain ResNet and SE-ResNet showed that overall performance improved when the squeeze and excitation network was used, but the performance measures improved by less than 1% except for exact match. Among the plain ResNet and SE-ResNet models, the SE-ResNet model using 34 layers showed the best performance, and the k-labelsets method was applied to this model. In the multi-label classification problem based on the superclass, the proposed model applying the k-labelsets methodology showed good overall performance regardless of the value of k, and showed the best exact match, accuracy, and precision of 63.76%, 89.16%, and 91.62%, respectively, when k = 4.
Furthermore, when k = 2, it showed the best F1-score and recall of 92.84% and 94.87%. The performance of the proposed model in multi-label classification based on subclasses was also examined. The model with k = 22 showed the highest values in exact match, accuracy, and precision, and was the model with the best overall performance. Increasing the value of k helped the model take the values of other labels into account, since all labels must be matched correctly for exact match to be high. On the other hand, accuracy and F1-score were also highest when k = 12 and k = 17. Since these metrics do not need to consider the values of other labels, their values increase and then decrease as k increases. However, since this performance is not high enough to make a diagnosis, a follow-up study is needed to improve diagnostic accuracy by using not only ECG data but also demographic data, such as gender, height, and age. In addition, it is also necessary to analyze whether there are any differences between classifying 100 Hz ECG data and classifying 500 Hz ECG data.

Institutional Review Board Statement: The PTB-XL ECG dataset was approved by the Institutional Ethics Committee for publication of anonymous data in a public access database.

Informed Consent Statement: Not applicable.
Data Availability Statement: The dataset is managed by the MIT Laboratory for Computational Physiology and is accessible at the following site: https://physionet.org/content/ptb-xl/1.0.0/ (accessed on 25 July 2021).