Electrocardiogram Heartbeat Classification for Arrhythmias and Myocardial Infarction

An electrocardiogram (ECG) is a basic, quick test for evaluating cardiac disorders and is crucial for remote patient monitoring equipment. Accurate ECG signal classification is critical for the real-time measurement, analysis, archiving, and transmission of clinical data. Numerous studies have focused on accurate heartbeat classification, and deep neural networks have been suggested for better accuracy and simplicity. We investigated a new model for ECG heartbeat classification and found that it surpasses state-of-the-art models, achieving remarkable accuracy scores of 98.5% on the PhysioNet MIT-BIH dataset and 98.28% on the PTB database. Furthermore, our model achieves an impressive F1-score of approximately 86.71%, outperforming other models, such as MINA, CRNN, and EXpertRF, on the PhysioNet Challenge 2017 dataset.


Introduction
Cardiovascular diseases (CVDs) have surpassed cancer as the number one killer globally, claiming approximately 17.3 million lives yearly [1]. In the United States, CVDs accounted for 874,613 deaths in 2019, as reported by the American Heart Association. An ECG is a commonly used diagnostic tool in healthcare settings for assessing cardiovascular function [2,3]. Figure 1 describes the construction of an ECG signal, which is important for interpreting the results of the test. The novel approach suggested in this study includes the use of evolving normalization, a residual block, gradient clipping, and a normalized gradient, which make it more generalizable and computationally efficient for classifying arrhythmias. The evolving normalization is fed into a residual block, resulting in a significant improvement in the performance of classifying various ECG signals. The effectiveness of our method is assessed using three publicly accessible ECG datasets. The results of the experiments demonstrate that the proposed model significantly enhances the classification of ECG heartbeats.
The primary advancements of this research can be summarized as follows:
• The examination of a selection of current neural network methods for the identification of electrocardiogram signals;
• The introduction of a new model for classifying electrocardiogram (ECG) signals;
• The evaluation of the efficiency of the proposed model on three published ECG datasets.

Related Work
Deep learning is a state-of-the-art method for extracting features, predicting, detecting, making decisions, and classifying different classes using a set of datasets. One of the major advantages of deep learning methods for ECG classification is that they can learn complex relationships between the ECG signal and various cardiovascular conditions. For instance, deep neural networks can automatically learn features of ECG signals, such as their shape, frequency, and amplitude, that indicate specific heart conditions, thereby improving upon traditional ECG classification methods. Another advantage of deep learning for ECG classification is its ability to handle large and noisy datasets, which are common in healthcare. Deep neural networks can effectively learn from ECG signals with various noises and artifacts and generalize well to unseen data. Due to their high efficiency, many studies have proposed using deep learning models for ECG classification. This study reviews three types of neural networks for ECG classification: convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory (LSTM) networks. However, interpreting the results from these deep learning methods depends on various factors, such as the hardware platform, the model's architecture, and compiler optimization, which can directly impact training the model.

Convolutional Neural Network (CNN)
Convolutional neural networks have gained widespread use in the field of computer vision [37,38]. A CNN consists of three important layers: the input layer, multiple hidden layers, and the output layer. The hidden layers are made up of three layer categories: convolutional, pooling, and fully connected layers. The convolutional layers in a CNN identify features within the input data, while the pooling layers decrease the dimensions of the feature map. The fully connected layers classify these features. Acharya et al. [31] used 13 layers to classify electrocardiogram (ECG) signals into three categories: normal, preictal, and seizure, but their experiment used only a small dataset. Panda et al. [39] applied a CNN to process ECG signals and showed that their method could only classify two classes. Another study, based on VGG-Net, proposed a 2D CNN that can classify eight classes [40]. That study used optimization techniques, such as K-fold cross-validation, data augmentation, and regularization, to solve the overfitting problem and improve the classification performance. Anwar et al. [41] presented a framework with four steps: preprocessing, heartbeat segmentation, feature extraction, and heartbeat classification. They used the discrete wavelet transform (DWT), RR interval, and Teager energy operator (TEO) to extract features before the neural network classifier. Fatma et al. [42] suggested a deep learning model for classifying five-class electrocardiogram (ECG) datasets. Ullah et al. [43] used a deep CNN with a pretrained ResNet-18 to identify premature ventricular contraction (PVC) on the MIT-BIH dataset and the Institute of Cardiological Technics (INCART) dataset. Naz et al. [44] proposed using a deep CNN with a pretrained AlexNet, VGG19, and Inception-v3 for the diagnosis of VTA ECG signals.

Recurrent Neural Network (RNN)
RNNs are widely used in natural language processing [45]. Similar to convolutional neural networks (CNNs), the structure of RNNs includes an input layer, a series of hidden layers, and an output layer. However, the hidden layers in RNNs are composed of recurrent layers and fully connected layers, as opposed to convolutional and pooling layers in CNNs. RNNs utilize recurrent layers to identify temporal patterns in the input data by processing the input data in a sequential manner. The fully connected layers then categorize the features extracted from the recurrent layers. Al-Zaiti et al. [33] also introduced a DNN to analyze ECG signals. This early deep learning approach for ECG signal classification was found to be effective on the training dataset, as evidenced by its high performance metrics. However, when tested on an independent dataset, the model demonstrated poor generalizability, indicating that it may not perform well on unseen data. Xiong et al. [46] developed RhythmNet to recognize ECGs of different rhythms. RhythmNet can help distinguish between atrial fibrillation (AF) and normal rhythms. Additionally, the model utilizes a combination of convolutional and recurrent layers to identify patterns and make predictions in the ECG signal. Specifically, it includes sixteen convolutional layers with a filter size of 15 × 1, three recurrent layers, and two fully connected layers. This design allows for efficient feature extraction from the ECG signal. However, it is worth highlighting that the model demonstrated an efficient performance on the training dataset but had poor generalizability on the testing dataset. The authors applied softmax activation to calculate the probability of each class. In their experiment, the margin of error between normal rhythm and AF was limited, implying that RhythmNet was adept at distinguishing between these two categories.

Long Short-Term Memory (LSTM)
LSTM has shown promising results in various applications, including disease prediction and ECG signal classification [47]. Studies such as Yildirim et al. [34], Saadatnejad et al. [35], and Hou et al. [48] have applied LSTM in different ways to improve the efficiency and accuracy of ECG signal classification. Yildirim et al. [34] proposed the use of a new input layer, called the wavelet sequence (WS), which improved the efficiency of the LSTM network. Saadatnejad et al. [35] presented a lightweight model using wavelet transform and multiple LSTM layers, while Hou et al. [48] introduced a deep learning model with an encoder and a decoder, which used LSTM networks to extract high-level features and classify the signals into different categories.
Overall, the previous studies have shown that recent 2DCNN methods have promising classification performances. However, these methods require the transformation of ECG sequence data to the two-dimensional domain and are very computationally intensive, as seen in VGG, ResNet, and Inception-v3. On the other hand, RNN-LSTM has also demonstrated good performance, but the method focuses heavily on the temporal domain, which can lead to overfitting and poorer results on test data. In this work, we aim to improve the accuracy of ECG signal classification by developing a novel 1DCNN method that incorporates evolving normalization-activation (EVO), squeeze-and-excitation (SE), and gradient clipping (GC) components. Our approach outperforms existing techniques, achieving a significant improvement in classification accuracy for several datasets. By optimizing these components, we were able to effectively address the limitations of the previous 2DCNN and RNN-LSTM methods and achieve state-of-the-art results for ECG signal classification tasks.

Dataset
It is worth noting that the PhysioNet MIT-BIH Arrhythmia dataset [49] and the PTB Diagnostic ECG dataset [50] are widely used in the field of ECG signal analysis and have been used in many previous studies for algorithm development, testing, and comparison. These datasets are publicly available and free to download, making them an accessible resource for researchers and developers. Additionally, the annotations provided with these datasets are considered to be reliable and accurate, as they were determined by multiple experts in the field. However, it is important to note that the datasets have their limitations, including the small size of the MIT-BIH dataset and the limited scope of diagnoses in the PTB dataset. As such, caution should be taken when generalizing findings from these datasets to larger and more diverse populations.
The MIT-BIH Arrhythmia dataset is a set of ECG recordings obtained from 47 participants, comprising 48 two-channel, half-hour ambulatory recordings. The data were recorded at 360 samples per second per channel, using an 11-bit resolution over a 10 mV range. The dataset also includes reference annotations for each beat, which were determined by two or more cardiologists, and any discrepancies were resolved. The dataset includes approximately 110,000 annotations in total, with the corresponding beat labels given in Table 1.
The PTB Diagnostics dataset is composed of ECG signals obtained from 290 individuals, including 148 diagnosed with myocardial infarction (MI) and 52 healthy controls, as well as individuals diagnosed with various other conditions. Each record includes ECG measurements taken from 12 different leads, all sampled at a frequency of 1000 Hz. In this particular study, only ECG Lead II was utilized, and the focus was the classification of the MI and healthy control groups. For our study, we also utilized ECG data from the PhysioNet Challenge 2017 dataset [25]. This dataset contains a total of 8528 ECG recordings sampled by the AliveCor device at 300 Hz, with durations ranging from 9 s to over 60 s. Of these recordings, 738 were from patients with AF, while the remaining 7790 were from non-AF candidates. The number of recordings for each group is summarized in Table 2. We chose this dataset because of its large size and the availability of annotated labels, which were determined by experts and verified by a second reader.

Pre-Processing Data
The authors in [51] detailed the pre-processing steps, and we follow their steps to extract beats from an ECG signal. The pre-processing steps are described in Figure 2. The numbers of normal and abnormal images in the PTB Diagnostics dataset are summarized in Figure 8: there are more than 10,000 normal images but fewer than 4000 abnormal images. We used a finite impulse response (FIR) bandpass filter [52], a time-frequency transform plane, to transform a single-channel ECG signal into a four-channel ECG signal. Then, for each channel, we used sliding-window segmentation to divide each sequence x(i) ∈ R^n into segments of equal length, as shown in Table 3.
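The filtering and segmentation steps above can be sketched in SciPy. The cutoff band, tap count, and window/stride values below are illustrative assumptions; the paper specifies only an FIR bandpass filter and equal-length sliding windows.

```python
import numpy as np
from scipy.signal import firwin, lfilter

def bandpass_fir(signal, fs=360.0, low=0.5, high=40.0, numtaps=101):
    """Apply an FIR bandpass filter to a 1-D ECG signal.

    The 0.5-40 Hz band and the tap count are illustrative choices,
    not values reported in the paper.
    """
    taps = firwin(numtaps, [low, high], pass_zero=False, fs=fs)
    return lfilter(taps, 1.0, signal)

def sliding_windows(signal, length, step):
    """Segment a sequence into equal-length (possibly overlapping) windows."""
    n = (len(signal) - length) // step + 1
    return np.stack([signal[i * step: i * step + length] for i in range(n)])

# Example: filter a noisy synthetic signal and segment it.
fs = 360.0
t = np.arange(0, 10, 1 / fs)                              # 10 s of samples
ecg_like = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size)
filtered = bandpass_fir(ecg_like, fs=fs)
frames = sliding_windows(filtered, length=720, step=180)  # 2 s windows, 0.5 s stride
```

In practice this would be applied once per channel of the four-channel representation before the segments are fed to the network.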

Proposed Method
The proposed model's architecture is detailed in Figure 9, which highlights several critical components, including the use of Convolution 1D (Conv1D) to expand the number of channels in the input image, evolving normalization-activation layers (Evo_norm) to normalize the output from the Conv1D block, and residual blocks to analyze the input features comprehensively. The model starts by resizing the input image to 187 × 1 and feeding it into the Conv1D block to expand the number of channels. The output from the Conv1D block is then passed through an Evo_norm block, followed by four residual blocks. The output from these blocks is then processed using the flatten and dense tools, which ultimately produce the final output.
Evo_norm has potential benefits, such as non-centered normalization, mixed variances, and tensor-to-tensor processing. It is a collection of normalization-activation layers combined into a single computation graph. Evo_norm is divided into two series: B (batch-dependent) and S (individual samples). These are explained in detail in a paper by Liu et al. [53]. To clarify the explanation, the residual block used in the proposed model consists of two pathways: the first pathway involves the max pooling and Conv1D layers to extract features from the input, while the second pathway further refines these features using Evo_norm, Dropout, Conv1D, and SE_Block, as shown in Figure 10. The SE_Block is a squeeze-and-excitation block that adaptively recalibrates the channel-wise feature responses. The outputs from the two pathways are then multiplied element-wise to produce the final output of the residual block. By using this approach, the residual block is able to effectively capture and refine the important features in the input signal. The squeeze-and-excitation for Conv1D blocks (SE) [54] is one of the modules used to improve the model's performance. The intricacies of the squeeze-and-excitation mechanism are shown in Figure 11. The main idea is that the squeeze operation is used to obtain a single value for each channel of input features, while the excitation operation on the output of the squeeze operation is used to obtain per-channel weights. In addition, this study proposes gradient clipping and a normalized gradient to enable faster convergence compared to traditional gradient descent with a fixed stepsize [55]. The experimental results for two popular ECG datasets demonstrate the benefits of the proposed model, which achieves an improved performance in distinguishing between different classes of ECG signals.
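The squeeze-and-excitation recalibration described above can be sketched in plain NumPy: a global average pool produces one descriptor per channel, a two-layer bottleneck with a sigmoid produces per-channel weights, and the input is rescaled channel-wise. The shapes and random weights below are illustrative; the authors' model uses trained Conv1D-based SE blocks with an SE ratio of 0.25 (a reduction factor of 4).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block_1d(x, w1, w2):
    """Squeeze-and-excitation for a 1-D feature map x of shape (length, channels).

    w1: (channels, channels // r) reduction weights; w2: (channels // r, channels)
    expansion weights, where r is the reduction factor implied by the SE ratio.
    This plain-NumPy formulation is a sketch, not the authors' implementation.
    """
    squeeze = x.mean(axis=0)                              # one value per channel
    excite = sigmoid(np.maximum(squeeze @ w1, 0.0) @ w2)  # per-channel weights in (0, 1)
    return x * excite                                     # recalibrate each channel

rng = np.random.default_rng(0)
length, channels, r = 187, 32, 4                          # r = 4 matches an SE ratio of 0.25
x = rng.standard_normal((length, channels))
w1 = rng.standard_normal((channels, channels // r))
w2 = rng.standard_normal((channels // r, channels))
y = se_block_1d(x, w1, w2)
```

Because the excitation weights lie in (0, 1), the block can only attenuate channels, letting the network emphasize informative channels relative to the rest.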

Evaluation Metrics
Following AAMI's suggestion, we used accuracy, precision, and recall to evaluate the model's efficiency.
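These three metrics can be computed directly from a confusion matrix; the small NumPy helper and the toy matrix below are illustrative, not the paper's evaluation code.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Overall accuracy and per-class precision/recall from a confusion
    matrix cm, where cm[i, j] counts true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                     # correct predictions per class
    precision = tp / cm.sum(axis=0)      # column sums = predicted counts
    recall = tp / cm.sum(axis=1)         # row sums = true counts
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall

# Toy two-class example.
cm = np.array([[90, 10],
               [5, 95]])
acc, prec, rec = metrics_from_confusion(cm)
```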

Loss Function
For the PhysioNet MIT-BIH dataset, we applied the traditional loss, namely the categorical cross-entropy loss. For the PhysioNet PTB dataset, we used the binary focal loss. The categorical cross-entropy is computed as

CE = −log( e^{s_p} / Σ_{j=1}^{C} e^{s_j} ),

where s_p is the score of the positive class, s_j is the score of class j, and C is the number of classes. The binary focal loss is calculated as

FL(p_t) = −α (1 − p_t)^γ log(p_t),

where p_t is the class probability estimate from the model, α is a weighting factor, and γ is a focusing parameter.
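A minimal NumPy sketch of both losses (the standard formulations, not the authors' code) for a single example:

```python
import numpy as np

def categorical_cross_entropy(scores, positive_index):
    """CE = -log(exp(s_p) / sum_j exp(s_j)), with s_p the positive-class score."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()                 # stabilize the softmax
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return -np.log(probs[positive_index])

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """FL = -alpha * (1 - p_t)^gamma * log(p_t), p_t = p if y == 1 else 1 - p."""
    p_t = p if y == 1 else 1.0 - p
    return -alpha * (1.0 - p_t) ** gamma * np.log(p_t)
```

The focal loss's (1 − p_t)^γ factor down-weights easy, well-classified examples, which is why it suits the imbalanced PTB data better than plain cross-entropy.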

The PhysioNet MIT-BIH Dataset
For the PhysioNet MIT-BIH dataset, we performed training with five-fold cross-validation, setting up each fold with the same details. The chosen optimizer is Adam. The initial learning rate of 1 × 10^−3 is multiplied by 0.1 at the 20th and 40th epochs, and training runs for 50 epochs.
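The step schedule above can be expressed as a small function, e.g., for use with a Keras LearningRateScheduler callback (the function name and callback pairing are illustrative; the rate 1 × 10^−3 and the ×0.1 drops at epochs 20 and 40 are from the text):

```python
def step_decay(epoch, base_lr=1e-3, drop=0.1, milestones=(20, 40)):
    """Multiply the base learning rate by `drop` at each milestone epoch,
    matching the MIT-BIH training schedule described above."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= drop
    return lr
```

The PTB runs would use the same function with milestones=(40, 120) over 150 epochs.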

The PhysioNet PTB Dataset
We utilized 10-fold cross-validation for training on the PhysioNet PTB dataset, setting up each fold with the same details. The optimizer is Adam. The initial learning rate of 1 × 10^−3 is multiplied by 0.1 at the 40th and 120th epochs, and training runs for 150 epochs.

The PhysioNet Challenge 2017 Dataset
This dataset was divided into three sets: a training set (75%), a validation set (10%), and a test set (15%), which were used to train and evaluate the model. To address the class imbalance, we converted each ECG recording into sliding frames of length 3000, using strides of 50 and 500 samples for recordings from AF and non-AF ECGs, respectively. The model was trained for 25 epochs with a batch size of 64 and a learning rate of 0.001, with an early stopping decay of 0.9. We used Adam optimization and binary cross-entropy as the loss function. To evaluate the model, we tested it five times and report the mean values with one standard deviation.
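The class-dependent framing above can be sketched as follows: a smaller stride for AF recordings yields many more frames per minority-class recording. The helper name and the zero-filled example recording are illustrative; the frame length (3000) and strides (50 and 500) are from the text.

```python
import numpy as np

def frame_recording(signal, frame_len=3000, step=500):
    """Cut an ECG recording into overlapping frames of length `frame_len`,
    advancing by `step` samples between frames."""
    n = (len(signal) - frame_len) // step + 1
    if n <= 0:
        return np.empty((0, frame_len))
    return np.stack([signal[i * step: i * step + frame_len] for i in range(n)])

# A 30 s recording sampled at 300 Hz has 9000 samples.
recording = np.zeros(9000)
af_frames = frame_recording(recording, step=50)       # minority (AF) stride
non_af_frames = frame_recording(recording, step=500)  # majority (non-AF) stride
```

For this 30 s example the AF stride produces 121 frames versus 13 for the non-AF stride, a roughly tenfold oversampling of the minority class.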

Hyperparameters
The model's hyperparameters were chosen based on suggestions from the reference papers. The SE ratio was set to 0.25, and the focal loss hyperparameters were set to γ = 2 and α = 0.25. During training, we applied oversampling to both the training and validation data to address the class imbalance. Additionally, we clipped the gradient based on its norm, with a threshold of 0.001. For the test data, which were independent and unseen, we analyzed the results and found them to be promising.
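Norm-based gradient clipping rescales a gradient whose L2 norm exceeds the threshold so that its norm equals the threshold. A minimal NumPy sketch, using the 0.001 threshold from the text (in Keras this roughly corresponds to passing clipnorm to the optimizer):

```python
import numpy as np

def clip_by_norm(grad, threshold=0.001):
    """Rescale `grad` so its L2 norm does not exceed `threshold`;
    gradients already within the threshold pass through unchanged."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad
```

Dividing by the norm also yields the normalized-gradient direction, so clipping and gradient normalization share the same rescaling step.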

Independent Testing Set
Using a five-fold cross-validation technique on the training set means that the data are divided into five equal parts or "folds", and the model is trained and validated five times, with a different fold used for validation each time. This allows for a more comprehensive evaluation of the model's performance, as each data point is used for validation exactly once. Similarly, employing a 10-fold cross-validation strategy on the PhysioNet PTB dataset means that the data are divided into ten equal parts, and the model is trained and validated ten times, with a different fold used for validation each time. This strategy helps ensure that the model's performance is not tied to the particular split of data used for training and validation. The use of the "sklearn.model_selection.StratifiedKFold" function ensures that the class distribution is preserved in each fold of the dataset, which is important for a fair evaluation of the model's performance. Figure 12 shows how the data are divided into the training, testing, and validation sets.
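The stratified splitting described above can be reproduced with the named scikit-learn utility. The toy 80/20 class imbalance and placeholder beat length below are illustrative, not values from the datasets.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels with an 80/20 class imbalance; StratifiedKFold keeps this
# ratio in every validation fold.
y = np.array([0] * 80 + [1] * 20)
X = np.zeros((100, 187))  # placeholder beats of length 187

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_ratios = []
for train_idx, val_idx in skf.split(X, y):
    fold_ratios.append(y[val_idx].mean())  # minority fraction per fold
```

With a plain (unstratified) KFold, a fold could by chance contain few or no minority-class beats, which would distort the per-fold metrics.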

Results
The experimental results are examined on two ECG datasets, namely, the PhysioNet MIT-BIH Arrhythmia and PhysioNet PTB Myocardial Infarction. Table 4 shows the contribution of each component to the performance of the proposed method. The results indicate that the Evo_norm block plays a significant role in enhancing the classification accuracy of the three datasets. The addition of the SE block also improves the performance on these three datasets. Finally, gradient clipping also shows a positive effect on the performance of the model. Overall, the combination of all three components results in the best performance on the three datasets. It can be seen that the proposed model with all three components achieved the highest performance with an accuracy of 98.56%, followed by the model without GC with an accuracy of 98.42%, the model without SE with an accuracy of 98.36%, and the model without EVO with an accuracy of 95.50%. This indicates that each component has a contribution to the performance of the model, with EVO having a higher impact than SE and GC. The combination of all three components significantly improved the efficiency of the model in this study.

Comparative Results
In an effort to facilitate comparison, our study was benchmarked against the work presented in [51]. The results of our study revealed an average accuracy of 98.5%, which is significantly higher than the accuracy scores obtained in previous studies, such as [51,56,57], as shown in Table 5. These results provide strong evidence that our approach advances arrhythmia classification.

Confusion Matrix
In a prior study by Kachuee et al. [51], the authors presented the confusion matrix of the resampled testing data. Our study, however, uses the actual testing set to evaluate the performance of the model, and the results indicate an improvement in performance. As shown in Figure 13, the confusion matrix of the MIT-BIH Arrhythmia classification for the five heartbeat classes (N, S, V, F, and Q) is presented. Despite evaluating the performance on the actual testing set, instead of just the resampled testing data as in previous studies, our proposed model still demonstrates superior performance. The model demonstrates an accuracy of 99% for the N and Q classes, 97% for the V class, 90% for the S class, and 88% for the F class. As mentioned above, the MIT-BIH Arrhythmia dataset exhibits a class imbalance, yet our proposed model achieved high accuracy both where there were ample input images (i.e., class N) and where there were limited input images (i.e., class Q). This suggests that our proposed model is effective in classifying ECG signals from different heartbeat classes. In this study, we conducted an empirical comparison of our proposed model against other previous studies, as presented in Table 6. The evaluation of the results was conducted using three common performance metrics, namely accuracy, precision, and recall. The results showed that our proposed model exhibited a superior classification performance, achieving an accuracy of 98.28%. Furthermore, our model performed exceptionally well in terms of recall, with a score of 97.72%, which was the highest among all the studies compared. One of the most notable achievements of our model was its precision, which was measured at 99.90%, meaning that only 0.10% of the model's positive classifications were mistaken, making it the best-performing model among the previous studies.

Confusion Matrix
The results of our model's classification performance on the PhysioNet PTB Myocardial Infarction data are depicted in Figure 14. The figure represents the confusion matrix for the two classes, normal and abnormal, that were evaluated. The results indicate that our model achieved high accuracy in classifying the normal class, with 99.8% of the normal images being correctly classified and only a small percentage, 0.2%, being misclassified. Additionally, our model also showed good performance in recognizing the abnormal class, with 97.7% of the images in that class being correctly classified. The remaining images in the abnormal class were misclassified by the model during the prediction process. Table 7 compares the performance of our proposed model with the existing models in the literature. Our model achieved an F1-score of approximately 86.71%, outperforming other models, such as MINA, CRNN, and EXpertRF, which achieved F1-scores of 83.42%, 82.62%, and 81.80%, respectively. These results demonstrate the superiority of our proposed model over previous studies.

Conclusions
In this study, we present a novel approach for ECG heartbeat classification. Our proposed model outperforms the existing state-of-the-art techniques, achieving superior accuracy results on both the PhysioNet MIT-BIH and PTB datasets. The high performance of our model is attributed to the combination of the Convolution 1D (Conv1D), evolving normalization-activation layers (Evo_norm), and the residual block module, with accuracy rates of 98.5% and 98.28%, respectively, on these datasets. Furthermore, we evaluated the model's efficiency on the PhysioNet Challenge 2017 dataset and achieved an F1-score of approximately 86.71%, outperforming the existing models. Overall, our proposed model is a valuable contribution to ECG heartbeat classification.
In future work, we aim to evaluate the effectiveness of our model on additional datasets and explore optimizing the model's architecture with fewer parameters. We believe that our model has the potential to be useful not only for ECG heartbeat classification but also for general classification tasks in wearable device applications.

Conflicts of Interest:
The authors declare no conflict of interest.