A Novel Deep-Learning-Based Framework for the Classification of Cardiac Arrhythmia

Cardiovascular diseases (CVDs) are the primary cause of death worldwide. Every year, many people die due to heart attacks. The electrocardiogram (ECG) signal plays a vital role in diagnosing CVDs. ECG signals provide information about the heartbeat and can be used to detect cardiac arrhythmia. In this article, a novel deep-learning-based approach is proposed to classify ECG signals as normal or as one of sixteen arrhythmia classes. The ECG signal is preprocessed and converted into a 2D signal using the continuous wavelet transform (CWT). The time-frequency domain representation of the CWT is given to a deep convolutional neural network (D-CNN) with an attention block to extract the spatial feature vector (SFV). The attention block is proposed to capture global features. For dimensionality reduction in the SFV, a novel clump of features (CoF) framework is proposed, in which k-means clustering is applied to obtain the reduced feature vector (RFV); the RFV is then given to the classifier to identify the arrhythmia class. The proposed framework achieves 99.84% accuracy with 100% sensitivity and 99.6% specificity, outperforming state-of-the-art techniques in terms of accuracy, F1-score, and sensitivity.


Introduction
Cardiovascular diseases (CVDs) are the primary cause of death every year. According to the World Health Organization (WHO), approximately 17.9 million people died due to CVDs in 2019, meaning that CVDs caused 32% of global deaths. Of these deaths, 85% were due to heart attacks [1]. A heart attack is caused by the blockage of one or more coronary arteries, whereas other CVDs, including cardiac arrhythmia, cause the remaining 15% of these deaths. Cardiac arrhythmia is an irregular heartbeat. ECGs are used to monitor heartbeats. The ECG signal is recorded through electrocardiography, in which 12 leads are placed on the body. For a correct recording of the ECG, the placement of the leads is very important [2]. The correct positions of the leads and the components of the ECG signal are shown in Figure 1.
A normal heartbeat has a regular rhythm in the ECG plot, whereas arrhythmia heartbeats have irregularities. In this article, sixteen classes of cardiac arrhythmia are considered. These classes are atrial premature beats (APBs), atrial flutter (AFL), atrial fibrillation (AFIB), supraventricular tachyarrhythmias (SVTA), Wolff-Parkinson-White (WPW), premature ventricular contraction (PVC), idioventricular rhythm (IVR), ventricular bigeminy, ventricular trigeminy, left bundle branch block (LBBBB), ventricular tachycardia (VT), ventricular flutter (VFL), second-degree heart block (SDHB), the fusion of ventricular and regular beat, right bundle branch block (RBBBB) and short PR [3]. Not all cardiac arrhythmias are harmful, but a few are, such as AFIB and VFL, which can cause heart attacks. Thus, it is necessary to identify the arrhythmia class to avoid potential damage.
The classification of ECG beats into arrhythmic cardiac classes has been performed using several conventional and deep-learning-based algorithms. ECG is a one-dimensional (1D) signal; therefore, the anomalies can be detected using machine learning approaches. Prasad et al. [4] used k-nearest neighbors (kNNs) to classify arrhythmia and achieved a 97.65% accuracy. The framework only considered the nonlinear local features and 1D signals. Similarly, the authors of [5] used kNNs to detect arrhythmia and achieved a sensitivity of 97.22% for the detection. The method only considered the handcrafted features of the 1D signals. With the advancements in deep learning, many deep learning architectures have been proposed to classify arrhythmias. In [6], the authors proposed a multilayer perceptron (MLP) and convolutional neural network (CNN)-based architecture for the classification of arrhythmia using 1D ECG signals. They achieved an accuracy of 88.7% for the MLP and 83.5% for CNN-based framework. Similarly, ref. [7] converted an ECG signal to a 2D signal and used a CNN-based approach to classify arrhythmia. They only considered eight classes and achieved a classification accuracy of 99.11%.
Inspired by this, we propose a novel deep-learning-based framework for classifying cardiac arrhythmia using attention-block-based CNN and the clump of features (CoF). The ECG signal is segmented into beats and converted into a 2D time-frequency domain signal using CWT. The features are extracted using attention-based CNN. After feature extraction, feature reduction is achieved with the help of the CoF model; then, classification is performed using classifiers. The performance of several classifiers is compared.
The rest of the article is organized as follows: Section 2 presents related work; the proposed methodology is explained in Section 3; Section 4 shows the experimental results; and in Section 5, conclusions are drawn.

Related Work
ECG signals are widely used for the detection and diagnosis of heart diseases. The signals can also be used for the classification of arrhythmia. These signals are one-dimensional and can be transformed into two-dimensional signals to classify the arrhythmia beats. ECG-based arrhythmia classification can be performed using handcrafted and deep temporal features. In [8], the authors proposed a gray-level co-occurrence matrix (GLCM) and gray-level run-length matrix (GLRLM)-based model for the classification of arrhythmia. They used six machine learning classifiers and achieved 90.42% accuracy for the 1D signal. In this framework, only handcrafted features are considered.
Similarly, in [9], a multi-resolution representation-based deep neural network model is proposed to classify arrhythmia. This model achieved an F1-score of 0.9238. The model is only applicable for the 1D signal and only considers handcrafted features.
In [10], an attention-based model is proposed for the classification of arrhythmia. The authors used the PhysioNet public dataset and achieved 92.8% accuracy, but only the 1D ECG signal was considered for classification. In [11], the authors calculated RR intervals to detect arrhythmia. The approach uses conventional methods and achieves an accuracy of 99.98%. They also only considered the ECG in 1D.
In [12], the authors converted the ECG signal into a 2D spectrogram and proposed a framework based on the spectrogram for arrhythmia classification. They achieved an accuracy of 99.02% for the 2D-based CNN model. They only considered seven classes of arrhythmia. In [13], the authors demonstrated a support vector machine (SVM)-based framework for classifying arrhythmia using a 1D ECG signal. They considered 17 classes of arrhythmia and achieved 97.3% accuracy. The authors of [14] used YOLO to detect arrhythmia, and they considered only four classes of arrhythmia. Similarly, the authors of [15] proposed a CNN-based framework for classifying cardiac arrhythmia using the short-time Fourier transform (STFT) as the input signal and achieved 99.00% accuracy, but they only considered four arrhythmia classes.
Furthermore, in [16], SVM was used to classify arrhythmia using the MIT-BIH database. The authors used selective bands and achieved 97.06% accuracy. Similarly, the authors of [17] used SVM to classify arrhythmia into four classes and achieved an accuracy of 83%. Moreover, in [18], the authors demonstrated cardiac arrhythmia classification using fog computing, employing an LSTM network for the classification. Majahad et al. [19] demonstrated the conversion of the ECG into a time-frequency domain representation to classify arrhythmia. They used bagging methods and achieved 99% accuracy.

Proposed Methodology
The proposed framework consists of four major blocks. In the first block, the ECG signal is segmented into beats and then converted into a 2D time-frequency representation using CWT. These 2D signals are given to the second block, where both local and global features are extracted using the novel ArrhythmiaNet architecture. The output of ArrhythmiaNet is the SFV. The SFV is fed into the third block, where the k-means clustering algorithm is applied and the features are reduced to obtain the RFV. This RFV is used to train the classifier in the fourth block, and the outputs of the classifier are the arrhythmia classes. The proposed framework is illustrated in Figure 2.

Preprocessing
We used the MIT-BIH dataset [30] for this research. The dataset consists of ECG signals of the standard and cardiac arrhythmia classes, and there are a total of 16 classes of cardiac arrhythmia. Table 1 shows the number of samples in each class. Each sample has 1000 fragments. All the ECG signals are segmented into beats using a timeframe window. The timeframe is kept constant for all ECG signals. Each ECG signal of ten seconds is divided into ten beats of one second. Figure 3 shows the ECG signal and its segmented beats.
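The fixed-window beat segmentation described above can be sketched in a few lines. This is a minimal sketch: the 360 Hz sampling rate (the MIT-BIH default) and the helper name are assumptions for illustration.

```python
import numpy as np

def segment_beats(ecg, fs=360, window_s=1.0):
    """Split an ECG record into fixed-length, non-overlapping windows.

    ecg: 1D array of samples; fs: sampling rate in Hz (assumed 360 Hz,
    the MIT-BIH rate); window_s: window length in seconds. Trailing
    samples that do not fill a full window are discarded.
    """
    win = int(fs * window_s)
    n_beats = len(ecg) // win
    return ecg[:n_beats * win].reshape(n_beats, win)

# A 10-second record yields ten 1-second beats, as in Figure 3.
record = np.random.randn(10 * 360)
beats = segment_beats(record)
print(beats.shape)  # (10, 360)
```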
After segmentation, the segmented beats are converted into a 2D time-frequency domain representation using the CWT method. Traditional arrhythmia detection models extract features from the 1D ECG signal, which leads to poor classification results and low accuracy. We use CWT to transform the 1D signal into a 2D signal for better signal analysis and feature extraction.

Continuous Wavelet Transform (CWT)
CWT expresses the signals in the form of wavelet functions. These wavelet functions are localized in the time domain as well as the frequency domain. By allowing the translation and scaling parameters of the wavelets to change continuously, CWT yields a comprehensive representation of the 1D signal. The time-frequency domain representation ECG_ω(α, β) of a continuous ECG beat ecg_beat(t) can be expressed as:

ECG_ω(α, β) = (1/√α) ∫ ecg_beat(t) Ψ*((t − β)/α) dt (1)

Here, α is the scaling parameter and should be greater than 0; β is the translation parameter; t is the instant of time; Ψ(t) is the continuous mother wavelet, whose translated and scaled versions analyze the beat ecg_beat(t); and Ψ* denotes its complex conjugate. The integral of an admissible wavelet must be zero. Under this admissibility condition, ecg_beat(t) can be recovered using the inverse wavelet transform, as shown in Equation (2):

ecg_beat(t) = (1/C_Ψ) ∫ ∫ ECG_ω(α, β) (1/√α) Ψ((t − β)/α) dβ dα/α² (2)

where C_Ψ is the wavelet admissibility constant, whose value satisfies 0 < C_Ψ < ∞, and can be written as shown in Equation (3):

C_Ψ = ∫ |Ψ̂(ω)|² / |ω| dω (3)

where Ψ̂(ω) is the Fourier transform of Ψ(t). The instantaneous wavelet at time t is defined in terms of a window ω(t). We derive the time-frequency representation of the beats by using a filter bank in CWT. All the 2D representations are resized to 256 × 256 and divided into training and testing using tenfold cross-validation.
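For illustration, a bare-bones scalogram along the lines of Equation (1) can be computed with plain numpy. The real Morlet mother wavelet and the scale range are assumptions for this sketch; libraries such as PyWavelets provide optimized implementations.

```python
import numpy as np

def cwt_scalogram(beat, scales):
    """Minimal CWT via direct convolution with a real Morlet wavelet.

    Returns a (len(scales), len(beat)) time-frequency map: one row per
    scale alpha, with translation beta realized by the convolution.
    """
    out = np.empty((len(scales), len(beat)))
    for i, a in enumerate(scales):
        # Mother wavelet sampled over +-4 scale widths.
        t = np.arange(-4 * a, 4 * a + 1)
        psi = np.exp(-((t / a) ** 2) / 2) * np.cos(5 * t / a)  # real Morlet
        psi /= np.sqrt(a)                                      # 1/sqrt(alpha) factor
        out[i] = np.convolve(beat, psi, mode="same")
    return out

beat = np.sin(2 * np.pi * 7 * np.linspace(0, 1, 360))  # synthetic 1 s beat
tfmap = cwt_scalogram(beat, scales=np.arange(1, 33))
print(tfmap.shape)  # (32, 360)
```

Resizing the resulting 2D map to 256 × 256 would follow as a separate image-processing step.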

Train Test Dataset Split
After preprocessing, the dataset is divided into training and testing sets. The training set contains 70% of the preprocessed data, whereas the testing set contains the remaining 30%.
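A minimal sketch of the 70/30 shuffle split described above; the seed and array shapes are illustrative.

```python
import numpy as np

def train_test_split(features, labels, train_frac=0.7, seed=0):
    """Shuffle and split into 70% training / 30% testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(features))
    cut = int(train_frac * len(features))
    tr, te = idx[:cut], idx[cut:]
    return features[tr], labels[tr], features[te], labels[te]

X = np.random.randn(1000, 8)                  # toy feature rows
y = np.random.randint(0, 17, size=1000)       # 17 heartbeat classes
Xtr, ytr, Xte, yte = train_test_split(X, y)
print(len(Xtr), len(Xte))  # 700 300
```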

Features Extraction Using ArrhythmiaNet
After preprocessing, the features of the 2D representations are extracted with the help of the novel ArrhythmiaNet with an attention block. ArrhythmiaNet consists of three 2D convolutional layers with the rectified linear unit (ReLU) as the activation function, one max-pooling layer, an attention block, one flatten layer, and one final fully connected layer. The output of the fully connected layer is called the spatial feature vector (SFV). The attention block is explained in Section 3.3.1; its output is multiplied with the output of the max-pooling layer of the main ArrhythmiaNet. The purpose of the attention block is to capture global features, as explained in [31]. The architecture of ArrhythmiaNet with the attention block is shown in Figure 4. The SFV contains 2,097,152 features in total. After feature extraction, these features are reduced to 4096 using the clump of features (CoF) module.

Attention Block
In the attention block, a self-attention mechanism is followed. In particular, the self-attention block calculates the response at one position in the feature map as a weighted sum of the features from all positions. As a result, the weights are calculated at only a small computational cost. Figure 5 shows the attention block. In the attention block, the input feature map, denoted by x, is transformed into three feature spaces, f, g and h. To derive these feature spaces, the input is first passed through a 1 × 1 convolution and then multiplied by the trained weight matrices W using the following equations:

f(x) = W_f x + b_f, g(x) = W_g x + b_g, h(x) = W_h x + b_h
where W denotes the weights and b denotes the bias parameters. Then, the transpose f(x_i)^T is multiplied by g(x_j), and the softmax function is applied to the result to obtain the attention map. The attention map is then multiplied by h(x_i), and the weighted value vectors are summed to derive the attention feature matrix. The attention mechanism used is similar to the self-attention mechanism used in [32]; however, we have not used linear regularizers.
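The f/g/h computation above can be sketched in numpy. Since a 1 × 1 convolution acts as a per-position linear map, the feature map is flattened to N spatial positions × C channels, and the projections become matrix multiplications; the weight matrices here are random stand-ins, not trained parameters.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wf, Wg, Wh):
    """x: (N, C) feature map flattened to N positions with C channels.

    f, g, h are per-position linear maps (the effect of 1x1 convolutions);
    the response at each position is a weighted sum over all positions.
    """
    f, g, h = x @ Wf, x @ Wg, x @ Wh       # (N, C') each
    attn = softmax(f @ g.T, axis=-1)       # (N, N) attention map
    return attn @ h                        # attended feature matrix

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 32))          # e.g. an 8x8 map with 32 channels
Wf, Wg, Wh = (rng.standard_normal((32, 16)) for _ in range(3))
out = self_attention(x, Wf, Wg, Wh)
print(out.shape)  # (64, 16)
```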

Feature Reduction Using CoF
The reduced feature vector (RFV) is obtained using the CoF method, as explained in [33]. In CoF, k-means clustering is applied to the SFV to derive a vocabulary of features. k-means clustering is a highly unstable algorithm; therefore, choosing the optimal number of clusters k is crucial. The optimal value can be obtained with the help of the elbow method, in which the sum of the squared errors is plotted against different values of k and the value at the elbow point is selected. From the elbow plot, we obtained k = 10. We further validated the clusters using silhouette analysis (SA). The pseudocode of SA is presented in Algorithm 1.

Algorithm 1: Silhouette analysis (SA)
1. Find the mean distance a(i) from sample i to all features in the same cluster.
2. Find the mean distance b(i) from sample i to all features in the closest cluster.
3. Find the coefficient: S(i) = (b(i) − a(i)) / max(a(i), b(i))

If S = 0, the sample is very close to the neighboring clusters; if S = 1, the sample is far away from the neighboring clusters; if S = −1, the sample is assigned to the wrong cluster. After clustering, the histogram of the clustered vocabularies is drawn to derive the RFV. The RFV contains the most important features. In feature reduction, the training and testing features are treated separately because the inter-patient paradigm is stricter than the intra-patient paradigm. The CoF method is illustrated in Figure 6. After feature reduction, the RFV is given to the classifier for the classification of features into cardiac arrhythmia classes.
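The three SA steps can be written directly as code. This is a sketch with toy 2D points; the real input would be the clustered SFV features.

```python
import numpy as np

def silhouette_coefficient(X, labels, i):
    """Silhouette S(i) = (b - a) / max(a, b) for sample i, where a is the
    mean intra-cluster distance and b the mean distance to the closest
    other cluster. S near 1: well separated; near 0: on the cluster
    boundary; negative: likely assigned to the wrong cluster."""
    d = np.linalg.norm(X - X[i], axis=1)
    own = labels[i]
    a = d[(labels == own) & (np.arange(len(X)) != i)].mean()   # step 1
    b = min(d[labels == c].mean() for c in set(labels) - {own})  # step 2
    return (b - a) / max(a, b)                                 # step 3

X = np.array([[0.0, 0], [0.1, 0], [5, 5], [5.1, 5]])
labels = np.array([0, 0, 1, 1])
print(round(silhouette_coefficient(X, labels, 0), 3))  # close to 1: good cluster
```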

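Putting the CoF pipeline together, a minimal sketch of the vocabulary clustering and assignment histogram follows. It uses a plain k-means on toy data; a production version would use an optimized clustering library, and the histogram normalization is an assumption of this sketch.

```python
import numpy as np

def kmeans(X, k=10, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)  # squared distances
        lab = d.argmin(1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)                  # update centroids
    return C, lab

def cof_reduce(sfv_rows, k=10):
    """Clump of features: cluster SFV entries into a k-word vocabulary,
    then histogram the cluster assignments to obtain a compact RFV."""
    _, lab = kmeans(sfv_rows, k)
    rfv = np.bincount(lab, minlength=k).astype(float)
    return rfv / rfv.sum()            # normalized histogram = reduced vector

rng = np.random.default_rng(1)
sfv = rng.standard_normal((500, 6))   # toy stand-in for the SFV
rfv = cof_reduce(sfv, k=10)
print(rfv.shape)  # (10,)
```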

Classification
We used a support vector machine (SVM) for classification [34,35]. An SVM classifies objects by constructing a separating hyperplane defined by support vectors. The margin between the support vectors indicates the performance of the classifier: the greater the margin, the better the performance, and vice versa. We trained our SVM using the RFV of the training dataset, and the RFV of the test dataset was used to perform the classification.
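A sketch of this final classification stage with scikit-learn's SVC. The synthetic two-class data stand in for the RFVs; the real pipeline classifies the 17 heartbeat classes, and the RBF kernel choice here is illustrative.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy stand-ins for 10-dimensional RFVs: two well-separated classes.
X_train = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(4, 1, (50, 10))])
y_train = np.array([0] * 50 + [1] * 50)
X_test = np.vstack([rng.normal(0, 1, (10, 10)), rng.normal(4, 1, (10, 10))])
y_test = np.array([0] * 10 + [1] * 10)

clf = SVC(kernel="rbf")          # kernel choices are compared in Figure 7
clf.fit(X_train, y_train)        # train on the training RFVs
acc = (clf.predict(X_test) == y_test).mean()  # evaluate on the test RFVs
print(acc)
```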
The upcoming section explains the experimental results of the proposed framework.

Experimental Results
The proposed framework is evaluated in terms of accuracy, sensitivity, specificity, F1-score, and classification error. The accuracy of the model was calculated using Equation (9):

Accuracy = (TP + TN) / (TP + TN + FP + FN) (9)

where TP and TN denote true positives and true negatives, respectively, whereas FP and FN denote false positives and false negatives, respectively. The accuracy of the system shows its ability to correctly classify arrhythmia classes. Specificity is the ratio of correctly predicted normal beats in the case of binary classification and can be calculated using Equation (10):

Specificity = TN / (TN + FP) (10)

The ratio of correctly predicted abnormal (cardiac arrhythmia) beats is called the sensitivity of the model and can be determined using Equation (11):

Sensitivity = TP / (TP + FN) (11)

The F1-score is the mean of the specificity and sensitivity; the formula to calculate the F1-score is given in Equation (12). Here, SE and SP denote sensitivity and specificity, respectively. We also considered Cohen's kappa for the evaluation of our model, which can be calculated using Equation (13):

κ = (p_o − p_e) / (1 − p_e) (13)

where p_o is the observed agreement and p_e is the agreement expected by chance.
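The metric definitions above, together with Cohen's kappa for a binary confusion matrix, can be checked with a short script; the counts used below are illustrative, not results from the paper.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity as in Equations (9)-(11)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall on arrhythmia beats
    specificity = tn / (tn + fp)   # recall on normal beats
    return accuracy, sensitivity, specificity

def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), Equation (13)."""
    n = tp + tn + fp + fn
    p_o = (tp + tn) / n                                              # observed
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2   # by chance
    return (p_o - p_e) / (1 - p_e)

acc, se, sp = metrics(tp=95, tn=90, fp=5, fn=10)
kappa = cohen_kappa(tp=95, tn=90, fp=5, fn=10)
print(acc, se, sp, kappa)  # approximately 0.925, 0.905, 0.947, 0.85
```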
The optimal simulation parameters for training the 36 million parameters of ArrhythmiaNet are shown in Table 2.

The loss function used for training was cross-entropy.
The performance of ArrhythmiaNet was compared with the pre-trained D-CNNs, and is shown in Table 3. From Table 3, the performance of the proposed ArrhythmiaNet based framework is better than the other pre-trained D-CNNs. We have also compared the performance of the proposed ArrhythmiaNet with the PhysioNet dataset. The performance of the different kernels of SVM and kNN for the MIT-BIH and PhysioNet datasets is shown in Figure 7.
We have also compared the accuracy of ArrhythmiaNet with the existing models. This comparison is presented in Table 4.
From Table 4, all the CNN-based models which used ECG as 2D signals considered a maximum of eight classes of cardiac arrhythmia, and the maximum accuracy achieved by [7] was 99.11%. We have considered seventeen classes of arrhythmia and achieved 99.84%, which outperforms all the existing methods.
We also evaluated the classification accuracy of both classifiers on the MIT-BIH dataset for different numbers of clusters in the CoF module; the highest accuracy was achieved with 10 clusters. The classification accuracies of SVM and kNN for different clusters are shown in Figures 8 and 9, respectively. The confusion matrix of ArrhythmiaNet using the SVM classifier is shown in Figure 10.

Ablation Study
We performed an ablation study to show the impact of the attention block added in ArrhythmiaNet. Initially, we removed the block entirely and calculated the performance metrics of the network; then, we added the block and calculated the performance metrics again. The accuracy of ArrhythmiaNet without the attention block was 90.71%, which is significantly lower than that of the other D-CNN models. However, when the attention block was added, more detailed features were extracted, and the accuracy of the network was 99.84%, which outperformed the other D-CNN models. Figure 11 shows the confusion matrix of ArrhythmiaNet without the attention block.


Limitations and Future Directions
The proposed framework is dependent on the attention block, and the accuracy of the model decreases if the attention block is removed. The performance of the model also varies with the selection of different features. The inter-patient paradigm is another factor that affects the performance of the model. These limitations could be addressed by the use of class tokens and multi-head attention, similar to vision transformers. In the future, we plan to use a hybrid vision transformer model to categorize the different classes of cardiac arrhythmia and to resolve these limitations.
Section 5 presents a brief conclusion of this research.

Conclusions
Cardiovascular diseases are the primary cause of death, and most of these deaths are due to heart attacks. A normal heartbeat has a regular rhythm in the ECG plot, whereas an abnormal heartbeat shows an irregular ECG plot; these irregular heartbeats are called cardiac arrhythmias, and some of them can cause heart attacks. The timely classification of cardiac arrhythmias can therefore avoid potential damage. In this article, we propose the novel attention-based ArrhythmiaNet with a CoF module to categorize seventeen classes of heartbeats. The proposed method achieved 99.84% accuracy with an SVM classifier. The sensitivity of the proposed technique was 100%, and the F1-score was 99%. The classification accuracy of ArrhythmiaNet with a kNN classifier was 98.64%, which is inferior to that of SVM. We have also compared the proposed framework with existing methods, and the experimental results verify that ArrhythmiaNet outperforms all the existing techniques in terms of accuracy.