Article

Intra- and Interpatient ECG Heartbeat Classification Based on Multimodal Convolutional Neural Networks with an Adaptive Attention Mechanism

by Ítalo Flexa Di Paolo 1,2,* and Adriana Rosa Garcez Castro 1
1 Postgraduate Program in Electrical Engineering, Federal University of Pará, Belém 66075110, PA, Brazil
2 Center for Natural Sciences and Technology, Department of Computer Systems and Infrastructure, Pará State University, Ananindeua 67125118, PA, Brazil
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(20), 9307; https://doi.org/10.3390/app14209307
Submission received: 14 August 2024 / Revised: 4 October 2024 / Accepted: 8 October 2024 / Published: 12 October 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Electrocardiography (ECG) is a noninvasive technology that is widely used for recording heartbeats and diagnosing cardiac arrhythmias. However, interpreting ECG signals is challenging and may require substantial time from medical specialists. The evolution of technology and artificial intelligence has led to advances in the study and development of automatic arrhythmia classification systems to aid in medical diagnoses. Within this context, this paper introduces a framework for classifying cardiac arrhythmias on the basis of a multimodal convolutional neural network (CNN) with an adaptive attention mechanism. ECG signal segments are transformed into images via the Hilbert space-filling curve (HSFC) and recurrence plot (RP) techniques. The framework is developed and evaluated using the MIT-BIH public database in alignment with AAMI guidelines (ANSI/AAMI EC57). The evaluations accounted for interpatient and intrapatient paradigms, considering variations in the input structure related to the number of ECG leads (lead MLII and V1 + MLII). The results indicate that the framework is competitive with those in state-of-the-art studies, particularly for two ECG leads. The accuracy, precision, sensitivity, specificity and F1 score are 98.48%, 94.15%, 80.23%, 96.34% and 81.91%, respectively, for the interpatient paradigm and 99.70%, 98.01%, 97.26%, 99.28% and 97.64%, respectively, for the intrapatient paradigm.

1. Introduction

According to the World Health Organization (WHO) [1], approximately 17.9 million people die from heart disease worldwide each year, accounting for 32% of all global deaths. In 2020, Ref. [2] estimated that 19.05 million deaths were from heart disease globally, representing an increase of 18.71% compared with 2010.
The diagnosis of heart disease can be performed through a combination of clinical evaluation, imaging tests (such as echocardiography, electrocardiography and cardiac magnetic resonance imaging) and functional assessments (such as exercise stress testing). Among these tests, an electrocardiogram (ECG) is the most commonly used test for evaluating cardiovascular health.
An ECG is a noninvasive technology that records cardiac signals. Electrodes are placed on the patient’s skin in different areas to detect electrical signal variations during contraction and relaxation of the heart. Figure 1 shows the signal of a normal heartbeat, which consists of the P wave, QRS complex and T and U waves, indicators that reflect the heart’s electrical activities, including repolarization and depolarization of the atrium and ventricle.
The signal obtained from the modified lead II (MLII) ECG lead, which uses electrodes placed on the right arm and left leg, is the most commonly used signal in clinical practice for medical diagnosis. This lead provides a clear and consistent view of the electrical activity of the heart, particularly its rhythm, which is important for identifying both normal and abnormal heart function in patients. Among the most commonly diagnosed heart diseases is cardiac arrhythmia, which is characterized by the occurrence of irregular heartbeats. In practice, diagnosis requires a specialist to perform an extensive review of the patient’s records, which often span more than 24 h.
As an alternative approach to aid in the diagnosis of cardiac arrhythmias, techniques for automatic classification based on machine learning have been introduced, including techniques based on deep learning, such as convolutional neural networks (CNNs), which have been widely used in the field [3,4,5,6,7,8,9]. CNNs, which are known for their ability to extract relevant characteristics from cardiac signals for use in classification, can be used both for the direct analysis of the temporal ECG signal [5,6,7] and for the analysis of ECG signals transformed into images [3,4,8,9].
In recent studies, multimodal CNNs have been introduced for medical diagnosis, demonstrating promising results [10,11,12,13,14]. Ref. [4] introduced a model with more than one sequence of convolutional layers to extract features from ECG signals and then applied a fusion process to obtain the features used in the classification process via the support vector machine (SVM) for cardiac arrhythmia diagnosis. Ref. [15] introduced a multimodal CNN and generated a 2D heatmap from the characteristics extracted from the ECG signal through the wavelet transform as one of the inputs. Ref. [16] used a multimodal CNN with input information from different ECG leads to classify arrhythmias. In [9], a strategy was implemented featuring a multimodal CNN using different images to represent ECG signals.
Recently, CNNs have been combined with attention mechanisms [9,17,18,19,20]. These mechanisms, which were initially applied with success in the field of natural language processing, are based on the idea that humans tend to focus their attention on certain parts of a visual space to acquire information rather than viewing the image as a whole. When introduced into CNNs, attention mechanisms allow the network to attend selectively to specific features extracted from images, thus increasing classification performance. In the case of arrhythmia classification, several studies have been conducted, such as that by [21], where the authors introduced a bidirectional recurrent neural network based on hybrid hierarchical attention with a dilated CNN method for arrhythmia classification, achieving good results. Ref. [9] proposed an attention mechanism in the frequency channel to modify the feature map dynamically in CNNs.
Two approaches have been followed by the scientific community for the development of automatic arrhythmia classifiers, namely, the interpatient approach [15,22,23] and the intrapatient approach [4,9,21,24]. In the intrapatient approach, classifier training is performed with segments of ECG signals from a group of patients, and the tests are performed with different segments from the same group of patients. This setup allows the classifier to be evaluated for its ability to generalize across different conditions within the same group of patients. In the case of the interpatient approach, training is performed while considering one group of patients, and different groups of patients are used for the tests. This approach is the most suitable for the development of classifiers for use in medical clinics to support diagnoses.
The Association for the Advancement of Medical Instrumentation (AAMI) is an organization that establishes guidelines and standards for automatic arrhythmia classification. The ANSI/AAMI EC57 guidelines [25] define five main classes of arrhythmia for classification: normal (N), supraventricular ectopic beat (S), ventricular ectopic beat (V), fusion beat (F) and unknown beat (Q). The AAMI recommends the development and evaluation of classifiers using five public databases and considers the importance of high-quality and representative data. The Massachusetts Institute of Technology—Beth Israel Hospital Arrhythmia Database (MIT-BIH) is the most widely used database for the development of classifiers presented in the literature [4,9,15,21,22,23,24].
Recognizing the importance of advancing the study and development of automatic classifiers for cardiac arrhythmias, this article presents a classifier structure based on a multimodal CNN with an adaptive attention mechanism. Two techniques are used to generate images from the ECG signals that serve as classifier inputs, namely, the Hilbert space-filling curve (HSFC) and the recurrence plot (RP). The proposed structure was developed and evaluated using the MIT-BIH public database, considering both the interpatient and intrapatient approaches and variations in the classifier input related to the number of ECG leads (lead MLII and MLII + V1).
The main contributions of this work are as follows:
  • Presentation of the results from the proposed structure using interpatient and intrapatient approaches.
  • Evaluation and recommendation of the HSFC technique as an additional method for transforming ECG signals into images for use in trained CNNs to achieve arrhythmia classification.
  • Evaluation and improvement of arrhythmia classification results by incorporating a multimodal CNN which integrates an attention module based on convolution with an adaptive kernel.
  • Demonstration of the possibility of improving the results of arrhythmia classification via ECG data obtained from lead V1 (electrode positioned at a specific point on the chest) together with data from the MLII lead (mostly used in clinical practice and in studies in the literature).
The remainder of this article is organized as follows. Section 2 presents the methodology adopted to develop the proposed structure for arrhythmia classification. Section 3 presents the experiments performed with the framework using the MIT-BIH signal database, considering the interpatient and intrapatient approaches. Section 4 presents the obtained results, Section 5 presents a discussion of the obtained results, and, finally, Section 6 presents the conclusions.

2. Methodology

2.1. Overview

Figure 2 presents an overview of the proposed structure for classifying cardiac arrhythmias on the basis of a multimodal CNN with an adaptive attention mechanism (CNN-AM). The structure consists of 3 main components: an input stage, where temporal ECG signals are transformed into images; multimodal convolutional layers; and a classifier with an adaptive attention mechanism.
Next, each component of the CNN-AM is described, highlighting its design, its functionality in the structure and its interactions with the other components.

2.2. Input with Transformation of ECG Signals into Images

The input component receives a segment of the ECG signal and transforms it into two images, which are then passed to two sequences of convolutional layers. Two techniques are used for this process. During the definition stage, five techniques for transforming temporal signals into images were explored, namely, the Gramian angular field (GAF), RP, Markov transition field (MTF), heatmap (HM) and HSFC. Among these techniques, all except the HSFC have been used in the literature for transforming ECG signals into images and yield good results for arrhythmia classifiers based on CNNs [4,9,26,27,28]. The HSFC has been successfully applied in other areas [29,30], and, for this reason, it was also evaluated at this stage.
On the basis of the metrics adopted in this study to evaluate the arrhythmia classifier during the experimental phase, the best results were obtained using the HSFC and RP techniques, which were then selected for transforming ECG signals into input images.
The HSFC is a technique that enables the mapping of a time series (1D) into an image (2D matrix). During this process, the spatial proximity between samples in the time series is preserved. This feature facilitates spatial visualization, as the position of each element in the matrix reflects its original temporal order. The pixel intensity values align with the series samples, making it easier to detect abrupt changes in the temporal ECG signal in the generated images. In the HSFC, the time series is mapped into 2D space along a Hilbert curve, with each sample of the time series (indexed by i) being assigned a corresponding (x, y) coordinate on the curve. The value at the (x, y) coordinate is the corresponding value of the time series, from which the color scale of the image is formed. More details on the HSFC can be found in [31,32,33,34,35].
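To make the mapping concrete, the sketch below (in Python, the language used to implement the framework) converts a 1D ECG segment into an HSFC image via the standard Hilbert index-to-coordinate algorithm; the function names, the order-5 grid and the resampling step are illustrative choices, not the exact implementation used in this work.

```python
import numpy as np

def hilbert_d2xy(order, d):
    """Convert a 1D Hilbert-curve index d into (x, y) coordinates
    on a 2^order x 2^order grid (standard d2xy algorithm)."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def ecg_to_hsfc(segment, order=5):
    """Place an ECG segment along the Hilbert curve of a 32 x 32 grid;
    the pixel intensity at each curve position equals the sample value."""
    n = 1 << order
    # resample the segment to fill the whole curve (illustrative choice)
    resampled = np.interp(np.linspace(0, len(segment) - 1, n * n),
                          np.arange(len(segment)), segment)
    img = np.zeros((n, n))
    for d, value in enumerate(resampled):
        x, y = hilbert_d2xy(order, d)
        img[y, x] = value
    return img   # then rendered with a rainbow colormap and resized to 224 x 224
```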
The RP is a technique that represents the recurrence of patterns in a temporal signal, highlighting relationships between points in the series and revealing changes in the dynamics of the signal [36,37].
For a time signal represented by N points, $x = [x_1\; x_2\; x_3\; \dots\; x_N]$, the RP is mathematically represented as a square matrix, $N \times N$, where each element of the matrix is given by:

$$R_{ij}(\varepsilon) = \Theta\left(\varepsilon - \lVert x_i - x_j \rVert\right)$$

where $i, j = 1, \dots, N$; $\varepsilon$ is the threshold or tolerance limit that determines when two points in the series are considered recurrent; and $\Theta$ is the Heaviside function, which returns 1 (recurrent points) if $\varepsilon - \lVert x_i - x_j \rVert \geq 0$ and 0 otherwise. The norm $\lVert \cdot \rVert$ may be the minimum, maximum or Euclidean norm. In this case, the RP generates a black-and-white image.
In our study, we adopted a modified version of the RP [38] that generates a color image where each element of the matrix is obtained through:
$$R_{ij} = \lVert x_i - x_j \rVert$$

where $i, j = 1, \dots, N$ and $\lVert \cdot \rVert$ is the Euclidean norm.
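For a scalar ECG series, this unthresholded variant reduces to a matrix of absolute differences, as in the minimal sketch below; the rendering comment indicates one plausible way to obtain the color image described above.

```python
import numpy as np

def recurrence_plot(segment):
    """Unthresholded RP: R[i, j] = ||x_i - x_j||, which for a scalar
    series is the absolute difference between samples. Rendering this
    matrix with a colormap yields a color image instead of the
    black-and-white image of the thresholded Heaviside version."""
    x = np.asarray(segment, dtype=float)
    return np.abs(x[:, None] - x[None, :])

# e.g., matplotlib's 'rainbow' colormap could render the matrix:
# plt.imsave('rp.png', recurrence_plot(segment), cmap='rainbow')
```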
In addition to evaluating the RP and HSFC as techniques for the input component, other tests were performed to verify the influence of the color scale of the generated images on the classification results; the “Rainbow” color scale was among those tested.
Figure 3 presents examples of ECG signal segments transformed into images via the RP and HSFC techniques, rendered with the RGB rainbow color scale, for normal, supraventricular and ventricular cases.
In analyzing Figure 3, and considering the images generated by the RP technique, we can identify image regions that represent the temporal signal in the vicinity of the peaks, with the image being very sensitive to subtle changes in the frequency or amplitude of the temporal signal. Amplitude variations appear as differences in the color intensity of the pixels in the recurrent regions: high amplitudes appear as red areas, whereas low amplitudes result in bluish areas. Frequency variations are visualized through the spacing of the lines parallel to the main diagonal, where high frequencies result in closely spaced lines and low frequencies result in more widely spaced lines. The RP offers a clear view of repetitive structures and global patterns, highlighting regularities and irregularities in the heart’s rhythm.

The images generated by the HSFC, in turn, provide a global view of the temporal signal segment, highlighting patterns and structures on wider time scales that are less affected by rapid variations occurring at small intervals in the time domain (high frequency in a short time). Amplitude variations appear as differences in intensity between different regions of the image (red represents the peaks on the rainbow color scale), whereas frequency variations are observed as changes in the density and repetition of the color patterns in space: the same colors in sequence represent similar frequencies, and color variations represent frequency variations.

2.3. Multimodal Convolutional Layers

The multimodal convolutional layer component captures the spatial features of the images generated by the input component. This layer consists of two sequences of convolutional layers in parallel, based on the AlexNet structure [39], with some modifications being used to better highlight the edges of the input images. These edges, particularly in images generated by the HSFC and RP from the ECG signal segments, may contain important information, as shown in Figure 3. The structures of denser networks, such as VGG and ResNet, were tested, and the best results were obtained with AlexNet.
Table 1 presents information regarding the structure of each sequence of convolutional layers. Each sequence takes an image in RGB format (three channels) with a resolution of 224 × 224 as input and produces a 6 × 6 matrix with 256 channels at its output.
Each sequence includes a component called “features” for extracting high-level features from the input image. An RGB image (three channels) with a resolution of 224 × 224 is received, and a convolution with 64 filters of size 11 × 11, a stride of 4 and a padding of 6 is applied; the reflective padding mode is used to highlight the edges. This step extracts low-level features from the image. Next, a rectified linear unit (ReLU) activation function introduces nonlinearity, followed by max pooling with a 3 × 3 kernel and a stride of 2, which halves the spatial resolution (width and height) of the output: the pooling window selects the maximum value in each 3 × 3 region and slides with a step of 2 pixels, maintaining the 64 input channels. A second convolution layer with 192 filters of size 5 × 5 and a padding of 2 follows, again with reflective padding to maintain the edge highlights, followed by a ReLU activation function and a max pooling operation with a 3 × 3 kernel and a stride of 2. The third, fourth and fifth convolution layers have 384, 256 and 256 filters, respectively, each using a 3 × 3 kernel with a padding of 1 and followed by a ReLU activation function. The “features” component ends with another max pooling layer with a 3 × 3 kernel and a stride of 2. It is followed by an adaptive AvgPool2D layer that resizes the 256-channel feature maps via an adaptive averaging operation, creating a compact representation of the data in a 6 × 6 matrix with a depth of 256 channels.
Before being passed to the classifier component with the attention mechanism, the output of each sequence of convolutional layers with dimensions of 256 × 6 × 6 passes through a flattened layer for conversion to vectors f1 and f2 with dimensions of 1 × 9216.
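One of the two parallel sequences can be sketched in PyTorch (the library used in this work) as follows; the layer hyperparameters follow the description above and Table 1, while the exact published configuration may differ in details.

```python
import torch.nn as nn

# One of the two parallel convolutional sequences (AlexNet-based, with
# reflective padding to emphasize the edges of the input images).
features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=6, padding_mode='reflect'),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2, padding_mode='reflect'),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)
avgpool = nn.AdaptiveAvgPool2d((6, 6))   # 256 x 6 x 6 output per sequence
flatten = nn.Flatten()                   # 1 x 9216 feature vector f1 (or f2)
```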

2.4. Classifier with an Attention Mechanism

The classifier component with the adaptive attention mechanism, as shown in Figure 4, is the final decision-making component of the CNN-AM structure. The feature vectors $f_1$ and $f_2$ extracted from the convolutional layers are passed to two sequences of fully connected layers, FC1 and FC2, which serve as inputs. Each FC sequence is composed of a layer with 4096 neurons with a ReLU activation function, followed by a dropout of 0.5 for regularization and a layer with 1024 neurons with a ReLU activation function. The two sequences of fully connected layers map the vectors $f_1$ and $f_2$ (with dimensions of 1 × 9216) into a new reduced feature space, $f_1'$ and $f_2'$, with dimensions of 1 × 1024. The numbers of neurons in the fully connected layers were defined during the experimental phase through end-to-end training of the structure, consistently aiming to achieve the best performance in the evaluation metrics of the classifier.
The feature vectors $f_1'$ and $f_2'$ are normalized to the interval [−1, 1] to reduce unstable gradients in backpropagation, and they are then merged as follows:

$$L = \mathrm{Avg}(f_1', f_2')$$

where $\mathrm{Avg}$ represents the elementwise mean of the two vectors.
The resulting 1 × 1024 vector L is a global representation of the features extracted through the sequences of convolutional and fully connected layers. This vector is the input to the adaptive attention mechanism, which determines the relative importance of each element in L by calculating a weight vector.
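A minimal sketch of this normalization and fusion step is given below; max-absolute scaling is assumed as the [−1, 1] normalization, since the exact normalization operator is not specified here.

```python
import torch

def fuse(f1p, f2p):
    """Normalize each reduced feature vector to [-1, 1] and average them.
    Max-absolute scaling is one reasonable choice for the normalization."""
    def to_unit(v):
        return v / v.abs().max(dim=1, keepdim=True).values.clamp(min=1e-8)
    return 0.5 * (to_unit(f1p) + to_unit(f2p))   # L, shape (batch, 1024)
```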
The adaptive attention mechanism proposed here, as shown in Figure 4, was inspired by the studies of [4,40]. The authors of these studies introduced the multistage gated average fusion (MGAF) technique, which uses an attention mechanism based on convolving the feature vector with a fixed kernel (high-boost filter). This strategy is similar to that used in the study by [20].
In the proposed attention mechanism, a vector H is obtained by convolving the vector L with a kernel of dimension 1 × 3 according to:
$$H[i] = (L * \mathrm{kernel})[i] = \sum_{k=1}^{3} L[i + k - 2] \cdot \mathrm{kernel}[k], \quad i = 0, \dots, 1023$$

where the index i ranges from 0 to 1023 to traverse the positions of vector L, and the dimensions of L and the kernel are 1 × 1024 and 1 × 3, respectively; i + k − 2 indexes the input vector displaced according to the kernel position k. For H to maintain 1024 positions, zero padding was included at the beginning of the vector.
The convolution captures spatial relationships in the L vector, with the kernel weights being adjusted during the end-to-end learning process of the CNN-AM.
The vector of attention weights W is obtained by applying a hyperbolic tangent function to the vector H as:
$$W = \tanh(H)$$
The output vector of the attention mechanism is obtained as:
$$F = W \odot L$$

where $\odot$ is the elementwise product and F is a 1 × 1024-dimensional vector that represents the global characteristics, with their relative importance being determined by the vector of weights W.
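The mechanism can be sketched in PyTorch as follows; the learnable 1 × 3 kernel replaces the fixed high-boost filter of the MGAF, and symmetric zero padding is used here to keep 1024 positions (an implementation detail of this sketch, which pads both ends rather than only the beginning of the vector).

```python
import torch
import torch.nn as nn

class AdaptiveAttention(nn.Module):
    """Convolution of L with a learnable 1x3 kernel, tanh weighting
    and elementwise gating of the fused feature vector."""
    def __init__(self):
        super().__init__()
        # learnable kernel adjusted during end-to-end training
        self.conv = nn.Conv1d(1, 1, kernel_size=3, padding=1, bias=False)

    def forward(self, l):                    # l: (batch, 1024)
        h = self.conv(l.unsqueeze(1))        # H = L * kernel
        w = torch.tanh(h).squeeze(1)         # W = tanh(H)
        return w * l                         # F = W ⊙ L
```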
The output of the attention mechanism, vector F, is passed through a ReLU activation function to introduce nonlinearity into the model, producing the vector C. The vector D is then generated through a LogSoftmax function, which empirically demonstrated improvements in classification performance during the CNN-AM training phase:

$$D_i = \mathrm{LogSoftmax}(C_i) = \log\left(\frac{e^{C_i}}{\sum_{j=1}^{1024} e^{C_j}}\right)$$

where i indexes the i-th element of the vector C; the exponential of each element $C_i$ is divided by the sum of the exponentials of all the elements in C, and the logarithm transforms the original values into a log-probability distribution.
Finally, the resulting vector D of features, with dimensions of 1 × 1024, is presented as input for FC3, which is composed of 3 output neurons. Each neuron applies the softmax function as the activation function to generate the probability of each of the 3 possible classes of CNN-AM output, namely normal (N), supraventricular arrhythmia (S) and ventricular arrhythmia (V).
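As a direct transcription of the description above, the tail of the classifier can be sketched as below; in practice, the final softmax is often folded into the loss function, so this composition should be read as schematic rather than as the exact published head.

```python
import torch.nn as nn

# Tail of the classifier: C = ReLU(F), D = LogSoftmax(C), then FC3 with
# a softmax over the three output classes (N, S, V).
classifier_head = nn.Sequential(
    nn.ReLU(),
    nn.LogSoftmax(dim=1),
    nn.Linear(1024, 3),
    nn.Softmax(dim=1),
)
```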

3. Experiment Setup

3.1. Database of ECG Signals

The AAMI is an organization that establishes guidelines and standards for the automatic classification of ECG signals (ANSI/AAMI, 2020) [25]. It has developed a set of recommendations known as the AAMI Guidelines for the Classification of Electrocardiograms (ANSI/AAMI EC57), which provides guidance on terminology, performance criteria and evaluation metrics for automated cardiac arrhythmia classification systems.
The AAMI guidelines define five main classes of arrhythmias widely used in the automatic classification of ECG signals: normal (N), supraventricular (S), ventricular (V), fusion of normal and ventricular (F) and unknown (Q). These guidelines are important, as they standardize the terminology and approach, allowing for a comparison of results between different automatic classification systems that have been proposed in the literature. This standardization also contributes to the development of more reliable systems for cardiac arrhythmia classification.
The ANSI/AAMI standard [25] emphasizes the importance of high-quality data for the development and evaluation of arrhythmia classification algorithms and recommends the use of certain public databases. Among the recommended databases is the MIT-BIH dataset (2005), which was used in this study to develop and test the proposed structure, considering both the interpatient and intrapatient approaches. The MIT-BIH database has been widely used for the development and validation of automated classification models of ECG signals and is an important reference for the scientific community in the performance evaluation and comparison of different methods.
This database contains 48 recordings of ECG signals from 47 patients that are categorized into various classes that represent different types of arrhythmias and cardiac conditions. The patients range in age from 32 to 89 years, and the data are recorded at a sampling frequency of 360 Hz, totaling approximately 30 min of ECG signal measurements per patient. Table 2 presents the organization of the annotations and classes in the MIT-BIH database.
For the ECG signal leads, the MIT-BIH database [41] contains signals obtained from the MLII and V leads, with the MLII lead being the most commonly used for the development of the arrhythmia classifiers presented in the state-of-the-art studies, which were also used in our work.
Regarding the signals obtained from the V lead, the results presented by [15,16] indicate that using the V and MLII leads together for classification can lead to improved results. The signals from the V lead are obtained from electrodes placed on the chest, which record the action potentials in the ventricular muscles, and this may favor the classification of class V (ventricular) arrhythmias [42]. In this case, two segments of ECG signals are used as inputs for the CNN-AM structure (a data segment from the MLII lead and a data segment from the V lead), and, for each ECG segment, two images (HSFC and RP) are generated, which are then used as inputs for the four multimodal convolutional layer sequences.

Preprocessing of the ECG Signal

All original records from the MIT-BIH database were segmented into 2 s intervals, with the segmentation being based on the observation of the time required to capture at least two RR intervals per segment.
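A minimal sketch of this segmentation, assuming consecutive, non-overlapping windows, is shown below.

```python
import numpy as np

FS = 360                 # MIT-BIH sampling frequency in Hz
SEG_LEN = 2 * FS         # 2 s segments = 720 samples

def segment_record(signal):
    """Split a full ECG record into consecutive 2 s segments
    (non-overlapping windows, an assumption of this sketch)."""
    n_seg = len(signal) // SEG_LEN
    return np.asarray(signal[:n_seg * SEG_LEN]).reshape(n_seg, SEG_LEN)
```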
After segmentation, the signals were cleaned using a technique based on visibility graphs, as described in a study by [43]. This technique was designed to highlight the R peaks and reduce signal noise. After the signals were cleaned, the baseline, which may have been generated by factors such as breathing or movement during patient data collection, was removed.
Figure 5 shows an example with three segments obtained from a patient record after signal cleaning and baseline removal.

3.2. Training and Testing Database

3.2.1. Interpatient Database

The main reference for the division of the database into training and testing sets for the interpatient approach was the study by [44], which proposed dividing the MIT-BIH database into two sets, DS1 (training) and DS2 (testing). This division is widely used in the development of classification systems proposed in the literature, thus facilitating a comparison of results. However, some studies revealed problems with this division and proposed improvements that were also adopted in our study. Refs. [45,46] reported imbalances between the classes of the two sets, leading the authors to suggest adding two records from the same patient to the two sets, DS1 and DS2, an approach also adopted by [23,47]. Ref. [44] reported concerns regarding the use of the unbalanced record 232 in DS2, which contains more than 75% of class S heartbeats and incorrect rhythm annotations in some segments. To maintain the integrity and reliability of the study results, the authors decided to remove record 232 from DS2 to avoid distortions in the evaluation results of the algorithms.
Table 3 shows the organization of the training and test data for the interpatient paradigm used in this study, where only classes N, S and V were selected for classification. Refs. [22,23,48,49,50,51] indicated that, as observed in practice, the F class is difficult to classify owing to the complex nature of the QRS complexes in ECG signals: its morphology varies significantly because the fusion of normal heartbeats and premature ventricular heartbeats creates labeling ambiguities for this class of arrhythmia.

3.2.2. Intrapatient Database

The DS1 and DS2 sets used in the interpatient approach were merged, and then new databases were created at random, regardless of the patient, with 80% training data and 20% test data, proportionally balanced for each class, as shown in Table 4.

3.2.3. Data Augmentation

The data augmentation method is an effective technique for expanding training datasets by generating new data from the original data, thereby balancing the data distribution through the generation of synthetic data for the minority classes. The strategy aims to create variations of the existing data, allowing the neural network to generalize better to new data and thus improve its classification ability. Several methodologies can be employed, such as adding noise to the temporal signals and, in the case of images, applying rotation, mirroring and translation, among others. In this study, the synthetic minority oversampling technique (SMOTE) [52] and a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) [53,54] were used to increase the amount of data for the temporal segments of the ECG. SMOTE was used to augment the data for class V, and the WGAN-GP technique was used for class S. Empirically, from the tests performed, it was observed that this combination provided better classification results, with the use of the WGAN-GP specifically to augment the training data of the S class providing greater sensitivity for this class, which is more difficult to classify.
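As an illustration, class V can be oversampled with the SMOTE implementation from imbalanced-learn; the data and target count below are placeholders, and the WGAN-GP used for class S requires training a separate generator, which is omitted here.

```python
import numpy as np
from imblearn.over_sampling import SMOTE

# X: (n_beats, 720) array of ECG segments, y: labels in {'N', 'S', 'V'}
X = np.random.randn(1000, 720)                       # placeholder data
y = np.array(['N'] * 900 + ['S'] * 50 + ['V'] * 50)  # placeholder labels

smote = SMOTE(sampling_strategy={'V': 300}, random_state=0)  # hypothetical target
X_res, y_res = smote.fit_resample(X, y)              # class V oversampled to 300
```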
Table 5 presents the new organization of the database for training and testing for the interpatient approach with data augmentation, and Table 6 presents the new organization of the database for the intrapatient approach with data augmentation.

3.3. Classifier Evaluation Metrics

The ANSI/AAMI EC57:2012 R2020 standard [25] specifies the indicators that should be used to evaluate the performance of arrhythmia classifiers, including accuracy, precision (or positive predictivity), sensitivity (or recall) and specificity, which can be calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

where TP indicates true positives, TN indicates true negatives, FP indicates false positives and FN indicates false negatives.
Additionally, as frequently observed in the literature and recommended for the evaluation of highly unbalanced databases, the harmonic mean of precision and sensitivity, called the F1 score, can be used:

$$F1\ \mathrm{score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
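The sketch below computes these indicators per class from a multiclass confusion matrix, in the one-vs-rest manner implied by the definitions above.

```python
import numpy as np

def per_class_metrics(cm):
    """Compute accuracy, precision, recall, specificity and F1 per class
    from a confusion matrix where cm[i, j] counts beats of true class i
    predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    metrics = {}
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        fn = cm[k].sum() - tp            # missed beats of class k
        fp = cm[:, k].sum() - tp         # other classes predicted as k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[k] = {
            'accuracy': (tp + tn) / total,
            'precision': precision,
            'recall': recall,
            'specificity': tn / (tn + fp) if tn + fp else 0.0,
            'f1': 2 * precision * recall / (precision + recall)
                  if precision + recall else 0.0,
        }
    return metrics
```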
Each of these indicators contributes to a meaningful interpretation of the automatic ECG signal classification results:
  • Accuracy: Accuracy is the percentage of correct diagnoses of cardiac arrhythmias (or normal cases). Importantly, in unbalanced databases, such as the MIT-BIH, the majority of cases are the normal class, requiring other indicators to better evaluate the classifier.
  • Precision: Precision is important in scenarios where the cost of false positives is high (for example, an incorrect diagnosis of arrhythmia). High precision means that the model produces fewer false positives.
  • Recall or Sensitivity: Recall is essential in situations where it is important to minimize false negatives, i.e., ensure that arrhythmias are detected, even if there are some false positives.
  • Specificity: Specificity is important in evaluating false positives, indicating whether the model erroneously classifies a healthy patient as having an arrhythmia.
  • F1 score: The F1 score is a metric that balances the ability of the model to correctly identify arrhythmias (recall) with the lowest possible number of false positives (precision). A high F1 score indicates a good balance between detecting arrhythmias and avoiding false positives.

3.4. CNN-AM Training

To perform CNN-AM training, an important preliminary step is normalization of the images, performed during the transformation of ECG signals into images. This step is important due to the extensive variation in pixel intensity and color distribution in the images, which are factors that can result in challenges during the training phase, such as slow convergence and model instability. In addition to promoting stability during training, normalization reduces the sensitivity of multimodal convolutional layers to small fluctuations in pixel values, improving the model’s ability to generalize to data not seen during training.
The normalization adopted in this study includes two distinct steps. Initially, various resolutions and pixel densities were explored; regardless of the time series sample size, the resulting images must maintain a square shape with custom dimensions. Among the resolutions commonly found in the literature, such as 64 × 64, 224 × 224 and 227 × 227 pixels, we chose to adopt the approach of [39], derived from the AlexNet network, which uses images of 224 × 224 pixels at 300 dpi. This size was chosen because it offers a resolution that does not hide crucial details between the classes while avoiding excessively large images and high hardware requirements.
The next step involved calculating the mean and standard deviation (std) of each channel for the images (three for RGB and one for grayscale). The reason for using the mean and std of the images for normalization was to adjust the scale of the pixel values of the images so that they are close to a normal (Gaussian) distribution with a mean of zero and a standard deviation of one, forcing the pixel values into a narrower range, which may provide more stable gradient propagation and faster convergence during training [55,56]. The identification of these normalization parameters was performed with images from the training database, and, with the parameters identified, the images in both the training and test databases were normalized.
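With torchvision, the two normalization steps can be expressed as follows; the mean and std values shown are placeholders for the per-channel statistics estimated on the training images.

```python
from torchvision import transforms

train_mean = [0.5, 0.5, 0.5]    # placeholders: per-channel statistics
train_std = [0.25, 0.25, 0.25]  # computed on the training images only

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),              # square images, as in AlexNet
    transforms.ToTensor(),                      # pixel values scaled to [0, 1]
    transforms.Normalize(train_mean, train_std) # approx. zero mean, unit std
])
```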
All components of the proposed CNN-AM were implemented in Python 3.9.15 using the PyTorch 1.12 and CUDA 11.6 libraries. The hardware setup included an Intel Core i7 12th-generation processor with 14 cores operating at 4.7 GHz, 32 GB of 4800 MHz RAM and an NVIDIA GeForce RTX 3070 Ti GPU with 8 GB of GDDR6 memory. Additionally, a Google Colab Pro+ environment was used, configured with an Intel Xeon 12-core 2.2 GHz processor, 83 GB of RAM and a 40 GB NVIDIA A100 GPU.
During the training phase, the minimized cost function was based on the hinge loss, which was introduced by [57] and adapted by [58] for multiclass classification as follows:
$$\mathrm{Hinge\ Loss} = \max\left(0,\ 1 - \hat{y}_y + \max_{i \neq y}(\hat{y}_i)\right)$$

where $y \in \{0, 1, 2\}$ is the desired output for the 3-class case, $\hat{y} \in \mathbb{R}^3$ is the predicted output per class, $\hat{y}_y$ is the element of $\hat{y}$ corresponding to class $y$, and $\max_{i \neq y}(\hat{y}_i)$ is the maximum value among the elements of $\hat{y}$ that do not correspond to class $y$.
The hinge loss is commonly used in binary classification problems, especially in linear machine learning contexts, such as in support vector machines (SVMs). The main idea is to penalize incorrect model predictions, where the penalty increases as the distance between the correct prediction and the model prediction increases. The loss is zero when the model prediction is correct and positive when the model prediction is incorrect. For our case, which is a multiclass problem (three classes), the MulticlassHingeLoss function from the TorchMetrics library, which is the hinge loss function adapted for the case of multiple classes, was used.
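Its use can be sketched as follows; the scores and target shown are illustrative.

```python
import torch
from torchmetrics.classification import MulticlassHingeLoss

hinge = MulticlassHingeLoss(num_classes=3)
y_hat = torch.tensor([[2.1, 0.3, -1.0]])  # predicted scores for one beat (N, S, V)
y = torch.tensor([0])                     # desired class: N
loss = hinge(y_hat, y)   # max(0, 1 - y_hat_y + max_{i != y} y_hat_i) = 0 here
```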
The adaptive moment estimation (Adam) optimizer was used [59] with an initial learning rate of 4 × 10−6, a batch size of 128 images, a maximum of 300 epochs and a learning rate decay of 50% every 10 training epochs without error reduction.
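A schematic version of this training configuration is shown below; the model and the loss computation are placeholders standing in for the CNN-AM and the hinge loss described above.

```python
import torch
import torch.nn as nn

model = nn.Linear(9216, 3)   # placeholder standing in for the CNN-AM
optimizer = torch.optim.Adam(model.parameters(), lr=4e-6)
# halve the learning rate after 10 epochs without error reduction
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=10)

for epoch in range(300):                 # maximum of 300 epochs
    optimizer.zero_grad()
    loss = model(torch.randn(128, 9216)).square().mean()  # dummy batch/loss
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())          # decay when the error stops decreasing
```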
The experiments were performed on the CNN-AM for arrhythmia classification via the MIT-BIH database, considering the interpatient and intrapatient approaches and CNN-AM inputs consisting of signals from only the MLII lead or signals from the MLII and V leads. The training was performed with a focus on automatic end-to-end classification, with the performance of the structure being evaluated using the metrics obtained on the test base to define the best model for each approach. As an objective way of defining the best model, the F1 score was adopted as the main reference metric for both approaches, since it is an important metric relating recall and precision, especially for unbalanced classes such as those in the base used in this study. The sensitivity for class S was also adopted as a metric for choosing the best models for the interpatient approach, since, owing to the difficulty in classifying its records, this class is an important focus, as already presented and discussed in the literature.

4. Results

In this section, the results obtained in the experiments performed with the CNN-AM proposed for the classification of arrhythmias using the MIT-BIH database are presented, considering the interpatient and intrapatient approaches.

4.1. Interpatient Approach Results

The CNN-AM training and tests for the interpatient approach were performed while considering two situations: input signals from only the MLII lead and input signals from the MLII and V leads.
Table 7 presents the global indicators obtained on the DS2 test database for the two models trained with data from the MLII signals. The classifier focusing on the F1 score (MLII-F1) achieved the best overall results, while the best sensitivity for the S class was achieved by the model focusing on that class (MLII-S).
Table 8 presents the global indicators, considering the DS2 test database, for the case of CNN-AM training with data from the MLII and V leads. The best overall result was achieved by the model focusing on the F1 score (MLII + V-F1), and the best sensitivity for the S class was achieved by the model focusing on that class (MLII + V-S).
Table 9 presents the indicators by class for the DS2 test database, considering the MLII lead, for the best CNN-AM results.
Table 10 presents the indicators by class for the DS2 test database, considering the MLII and V leads, for the best CNN-AM results.

4.2. Intrapatient Approach Results

The CNN-AM training and tests for the intrapatient approach were performed while considering two situations: input signals from only the MLII lead and input signals from the MLII and V leads.
The best classifiers for this approach were obtained from the analysis of the highest global F1 score because, for this type of approach, the test database contains ECG signal segments from the same patients as those in the training database, resulting in a process evaluation bias. This bias occurs because the classifiers may learn the specific characteristics of individual heartbeats during the training phase, leading to significant values in all metrics per class for the test database.
Table 11 presents the global and per-class indicators for the test database, considering the MLII and MLII + V leads.

5. Discussion

5.1. Interpatient Approach

Considering the results presented in Table 7 and Table 8 for the MLII lead, focusing on the classifier based on the F1 score (MLII-F1), the model reached 97.29% overall accuracy and a 75.96% F1 score. However, it encountered challenges when classifying the S class, with 41.1% sensitivity and 28.0% precision, while reaching 98.3% sensitivity and 98.9% precision for the N class and 95.5% sensitivity and 96.2% precision for the V class. The difficulty in learning the S class may arise from the fact that this class includes several subtypes of supraventricular arrhythmias with distinct characteristics, which can make learning and classification more difficult, considering the reduced number of patterns of this class available for training [60].
When analyzing the classifier performance in terms of the sensitivity of the S class (MLII-S), the best model obtained a sensitivity of 77.5% and a precision of 7.6%, with an overall accuracy of 83.89% and an F1 score of 61.86%.
Considering the results of the classifiers with data from the MLII and V leads presented in Table 9 and Table 10, for the classifier focusing on the F1 score, there is clearly an increase in all the metrics compared with the results without the V lead, with gains of 1.19%, 19.74%, 1.90%, 0.92% and 5.95% in accuracy, precision, sensitivity/recall, specificity and F1 score, respectively. However, despite the increase in the metrics for the S class, the model still has difficulty learning this class, for the same reasons already indicated for the classifier with only the MLII lead.
Now, considering the classifier focusing on the sensitivity of the S class, there was a 12.6% increase in the precision for this class compared with that achieved by the model using data from only one lead, but the sensitivity was reduced to 69.3%. In addition, the F1 score difference between these two scenarios was significant, with an increase of 13.39% for this indicator, especially because of the increased accuracy and specificity for the N and V classes with the two-lead approach.

5.2. Intrapatient Approach

The performance of the model proposed for the intrapatient approach is presented in Table 11, which shows results that can be considered promising for this type of approach. Significant values can be observed for all the metrics, especially accuracy and precision. There was less difficulty in classifying the S class in this approach because the test data come from the same patients used for training, reaching recalls of 89.2% for one lead and 92.9% for two leads, with accuracies of 92.7% and 94.7%, respectively.
The difference between the one- and two-lead classifiers can also be observed, with increases for the two-lead classifier of 0.11%, 0.84%, 1.29%, 0.24% and 1.05% in accuracy, precision, sensitivity/recall, specificity and F1 score, respectively.

5.3. Comparison of Results with the State-of-the-Art Methods

In the context of comparing the results of this study with those of state-of-the-art methods, a search for related studies that also used MIT-BIH data was performed while considering the interpatient and intrapatient approaches.
With respect to the performance indicators for comparison, it is necessary to analyze each paradigm separately. For the interpatient paradigm, researchers usually do not rely only on global indicators for comparison purposes; the comparison of indicators by class is equally important. In this context, Table 12 and Table 13 were prepared for the interpatient approach, and the same divisions (DS1 and DS2) were adopted for all studies, albeit with some differences in the organization methodology that do not interfere with the purpose of general comparison.
Considering the indicators by class presented in Table 13, the difficulty in classifying the S class is widely recognized in all studies, and a trend can be observed whereby greater overall accuracy implies a lower sensitivity for the N class, as in [23,48], with values of 91.3% and 40.3%, respectively. There is also wide variation in the focus of the studies, which should be considered for comparison: studies such as those by [15,23] effectively explore improving the accuracy for the S class, whereas others, such as those by [22,50,61], focus more on general indicators.
The study by [51] is noteworthy in this context because it advances methodologies for segmentation and for the extraction of features from 1D data, revealing characteristics that proved decisive in yielding some of the best current metrics for this paradigm: a 92.27% overall F1 score and, for the S class, 83.3% accuracy and 83.5% precision, without significantly reducing the sensitivity of the N class.
Our results indicate that the proposed CNN-AM, especially in the two-lead scenario with the higher F1 score (MLII + V-F1), achieves the highest overall accuracy and precision, with values of 98.48% and 94.15%, respectively, and a high F1 score of 81.91%. In the scenario with one lead and the higher F1 score (MLII-F1), the results are close to the values of correlated studies, with better performance in terms of the general indicators of precision and F1 score, demonstrating good model performance.
With respect to the S class and the difficulties in its classification, a strong fluctuation in the metric values obtained across all the studies can be observed, and our model with one lead yielded a poor precision of 7.6% for this class, even with a sensitivity of 77.5%. The performance for the S class was superior only with two leads, with a sensitivity of 43.3% and a precision of 56.3% in the scenario focusing on the highest overall F1 score and 69.3% and 20.2%, respectively, in the scenario focusing on the greater sensitivity of the S class. In this context, the model with two leads and the higher F1 score is the most competitive in relation to the results presented in the comparison studies.
In the case of the intrapatient paradigm, most studies in the literature only present results regarding the global indicators; thus, Table 14 was prepared. This table includes related studies that also used the MIT-BIH database but with different divisions of the training and test databases, which makes a direct comparative analysis of the results impossible.
Table 14 indicates that the intrapatient paradigm is promising, with competitive results for all the metrics, especially those obtained for the proposed classifier with two leads.

5.4. CNN-MA Component Analysis: Ablation Study

To evaluate the contributions of certain individual components within the proposed CNN-AM structure, we conducted a study in which new classifiers were trained by removing certain model components. Table 15, Table 16, Table 17 and Table 18 present the global metrics and the metrics by class for the interpatient and intrapatient approaches, respectively, for each new model and for our CNN-AM for comparison, considering the metrics obtained with the test database and the MLII and MLII + V leads. The models are as follows:
  • AlexNet with ECG signal-to-image transformation with RP (M1);
  • AlexNet with ECG signal-to-image transformation with the HSFC (M2);
  • CNN-AM without the attention module, for one lead (MLII) (M3);
  • CNN-AM with the attention module, for one lead (MLII) (M4);
  • CNN-AM without the attention module, for two leads (MLII + V) (M5);
  • CNN-AM with the attention module, for two leads (MLII + V) (M6).
Considering the results for the interpatient and intrapatient cases, compared with the M1 and M2 models, which use only AlexNet with the RP or HSFC, the M4 (MLII-F1) and M6 (MLII + V-F1) models achieved superior global results, with improvements in all the metrics, indicating that the proposed structure, with its multimodal convolutional layer sequences and attention mechanism, improves on the model based on conventional AlexNet despite the increase in computational complexity.
For the models in which the attention mechanism was removed from the CNN-AM structure (M3 and M5), in the interpatient approach with the MLII + V leads, the results achieved by the CNN-AM were slightly higher. Although the one-lead M3 model showed superior performance in terms of the global indicators, the sensitivity for class S increased for the M4 model with the attention module, which is important for the interpatient case. For the intrapatient case, the M5 model yielded higher global indicators, whereas the M4 model with the attention module was superior in the case with one MLII lead.
In all cases, the inclusion of the attention mechanism resulted in greater sensitivity to the S class. This result is a good indication of the importance of the attention mechanism in the structure, since, for the interpatient approach, which is the most realistic approach in relation to clinical practice, the classification for this type of arrhythmia has been challenging, with studies being presented in the literature specifically aimed at improving the model performance of this class [15,23,61,64].

6. Conclusions

In this article, we introduced a structure based on a multimodal CNN with an attention module for the end-to-end classification of cardiac arrhythmias. The model was evaluated on the MIT-BIH database, considering both the interpatient and intrapatient paradigms. Two multimodal CNN structures were tested, one considering two input images generated with different techniques, namely, the RP and the HSFC, from ECG signals with only the MLII lead and another with the MLII and V leads. The convolutional layers of the structure were based on AlexNet, with a modification to highlight the edges of the images generated from the ECG signals, which, as observed, concentrate a significant amount of information; this adaptation resulted in improvements in the classification results. The feature vectors extracted by the sequences of convolutional layers, after being processed by fully connected layers, were merged and passed to an attention mechanism that highlighted the most relevant merged features for the classification process. The presence of the attention mechanism increased the global indicators and the sensitivity for the S class, and its introduction led to different results between the inter- and intrapatient paradigms. The best results achieved by the proposed structure, considering the overall accuracy and F1 score, were 98.48% and 81.91%, respectively, for the interpatient paradigm and 99.59% and 96.59%, respectively, for the intrapatient paradigm. These results can be considered promising and competitive with those already presented in state-of-the-art studies; they also demonstrate the importance of using the signals of the MLII and V leads together to improve the classification results, especially for the S class, which, as noted, is difficult to learn owing to its signal characteristics. In the future, we intend to improve the proposed structure to obtain better results, especially considering the indicators for the S class under the interpatient approach.

Author Contributions

All the authors contributed to the conception and design of the study. Material preparation, data collection and analysis were performed by Í.F.D.P. and A.R.G.C. The first draft of the manuscript was written by Í.F.D.P. and A.R.G.C., who read and approved the final manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

Financial support was received from the Government of Pará State through Pará State University, Brazil, Resolution No. 3837/2022 of the University Council, according to electronic protocol 2022/8566.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The MIT-BIH database used here is publicly available.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. WHO—World Health Organization. Cardiovascular Diseases 2024. Available online: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds) (accessed on 7 October 2024).
  2. Tsao, C.W.; Aday, A.W.; Almarzooq, Z.I.; Anderson, C.A.M.; Arora, P.; Avery, C.L.; Baker-Smith, C.M.; Beaton, A.Z.; Boehme, A.K.; Buxton, A.E.; et al. Heart Disease and Stroke Statistics—2023 Update: A Report From the American Heart Association. Circulation 2023, 147, 8. [Google Scholar] [CrossRef]
  3. Izci, E.; Ozdemir, M.A.; Egirmenci, M.; Akan, A. Cardiac arrhythmia detection from 2D ECG images by using deep learning technique. In Proceedings of the Medical Technologies Congress (TIPTEKNO), İzmir, Turkey, 3–5 October 2019. [Google Scholar] [CrossRef]
  4. Ahmad, Z.; Tabassum, A.; Guan, L.; Khan, N.M. ECG Heartbeat Classification Using Multimodal Fusion. IEEE Access 2021, 9, 100615–100626. [Google Scholar] [CrossRef]
  5. Fradi, M.; Khriji, L.; Machhout, M.; Hossen, A. Automatic heart disease class detection using convolutional neural network architecture-based various optimizers-networks. IET Smart Cities 2021, 3, 3–15. [Google Scholar] [CrossRef]
  6. Ahmed, A.A.; Ali, W.; Abdullah, T.A.A.; Malebary, S.J. Classifying Cardiac Arrhythmia from ECG Signal Using 1D CNN Deep Learning Model. Mathematics 2023, 11, 3. [Google Scholar] [CrossRef]
  7. Rawal, V.; Prajapati, P.; Darji, A. Hardware implementation of 1D-CNN architecture for ECG arrhythmia classification. Biomed. Signal Process. Control 2023, 85, 104865. [Google Scholar] [CrossRef]
  8. Mewada, H. 2D-wavelet encoded deep CNN for image-based ECG classification. Multimed. Tools Appl. 2023, 82, 20553–20569. [Google Scholar] [CrossRef]
  9. Zhou, F.; Fang, D. Multimodal ECG heartbeat classification method based on a convolutional neural network embedded with FCA. Sci. Rep. 2024, 14, 8804. [Google Scholar] [CrossRef] [PubMed]
  10. Asfand-e-yar, M.; Hashir, Q.; Shah, A.A.; Malik, H.A.N.; Alourani, A.; Khalil, W. Multimodal CNN-DDI: Using multimodal CNN for drug to drug interaction associated events. Sci. Rep. 2024, 14, 4076. [Google Scholar] [CrossRef]
  11. Jiang, W.; Zhang, Y.; Han, H.; Huang, Z.; Li, Q.; Mu, J. Mobile Traffic Prediction in Consumer Applications: A Multimodal Deep Learning Approach. IEEE Trans. Consum. Electron. 2024, 70, 3425–3435. [Google Scholar] [CrossRef]
  12. Tanioka, S.; Aydin, O.U.; Hilbert, A.; Ishida, F.; Tsuda, K.; Araki, T.; Nakatsuka, Y.; Yago, T.; Kishimoto, T.; Ikezawa, M.; et al. Prediction of hematoma expansion in spontaneous intracerebral hemorrhage using a multimodal neural network. Sci. Rep. 2024, 14, 16465. [Google Scholar] [CrossRef]
  13. Wajid, M.A.; Zafar, A.; Terashima-Marín, H.; Wajid, M.S. Neutrosophic-CNN-based image and text fusion for multimodal classification. J. Intell. Fuzzy Syst. 2023, 45, 1039–1055. [Google Scholar] [CrossRef]
  14. Wang, D.; Gan, J.; Mao, J.; Chen, F.; Yu, L. Forecasting power demand in China with a CNN-LSTM model including multimodal information. Energy 2023, 263 Part E, 126012. [Google Scholar] [CrossRef]
  15. Wang, T.; Lu, C.; Sun, Y.; Yang, M.; Liu, C.; Ou, C. Automatic ECG Classification Using Continuous Wavelet Transform and Convolutional Neural Network. Entropy 2021, 23, 119. [Google Scholar] [CrossRef] [PubMed]
  16. Zhang, F.; Li, M.; Song, L.; Wu, L.; Baiyang, W. Multi-classification method of arrhythmia based on multi-scale residual neural network and multi-channel data fusion. Front. Physiol. 2023, 14, 1253907. [Google Scholar] [CrossRef]
  17. Toğaçar, M.; Ergen, B.; Cömert, Z. BrainMRNet: Brain Tumor Detection using Magnetic Resonance Images with a Novel Convolutional Neural Network Model. Med. Hypotheses 2020, 134, 109531. [Google Scholar] [CrossRef]
  18. Liu, M.; Yang, J. Image Classification of Brain tumor based on Channel Attention Mechanism. J. Phys. Conf. Ser. 2021, 2035, 012029. [Google Scholar] [CrossRef]
  19. Jun, W.; Zheng, L. Brain Tumor Classification Based on Attention Guided Deep Learning Model. Int. J. Comput. Intell. Syst. 2022, 15, 35. [Google Scholar] [CrossRef]
  20. Tang, C.; Li, B.; Sun, J.; Wang, S.-H.; Zhang, Y.-D. GAM-SpCaNet: Gradient awareness minimization-based spinal convolution attention network for brain tumor classification. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 560–575. [Google Scholar] [CrossRef] [PubMed]
  21. Islam, S.; Hasan, K.F.; Sultana, S.; Uddin, S.; Lio’, P.; Quinn, J.M.W.; Moni, M.A. HARDC: A novel ECG-based heartbeat classification method to detect arrhythmia using hierarchical attention based dual structured RNN with dilated CNN. Neural Netw. 2023, 162, 271–287. [Google Scholar] [CrossRef] [PubMed]
  22. Garcia, G.; Moreira, G.; Menotti, D.; Luz, E. Inter-Patient ECG Heartbeat Classification with Temporal VCG Optimized by PSO. Sci. Rep. 2017, 7, 10543. [Google Scholar] [CrossRef]
  23. Dias, F.M.; Monteiro, H.L.M.; Cabral, T.W.; Naji, R.; Kuehni, M.; Luz, E.J.S. Arrhythmia classification from single-lead ECG signals using the inter-patient paradigm. Comput. Methods Programs Biomed. 2021, 202, 105948. [Google Scholar] [CrossRef] [PubMed]
  24. He, R.; Liu, Y.; Wang, K.; Zhao, N.; Yuan, Y.; Li, Q. Automatic detection of QRS complexes using dual channels based on U-Net and bidirectional long short-term memory. IEEE J. Biomed. Health Inform. 2021, 25, 4. [Google Scholar] [CrossRef]
  25. ANSI/AAMI EC57:2012 (R2020). Testing and Reporting Performance Results of Cardiac Rhythm and ST Segment Measurement Algorithms; AAMI, 2020. Available online: https://webstore.ansi.org/Standards/AAMI/ANSIAAMIEC572012R2020 (accessed on 7 October 2024).
  26. Mathunjwa, B.M.; Lin, Y.-T.; Lin, C.-H.; Abbod, M.F.; Shieh, J.-S. ECG arrhythmia classification by using a recurrence plot and convolutional neural network. Biomed. Signal Process. Control 2021, 64, 102262. [Google Scholar] [CrossRef]
  27. Farag, M.M. A Self-Contained STFT CNN for ECG Classification and Arrhythmia Detection at the Edge. IEEE Access 2022, 10, 94469–94486. [Google Scholar] [CrossRef]
  28. Adib, E.; Fernandez, A.S.; Afghah, F.; Prevost, J.J. Synthetic ECG Signal Generation Using Probabilistic Diffusion Models. IEEE Access 2023, 11, 75818–75828. [Google Scholar] [CrossRef]
29. Borrell, R.; Cajas, J.C.; Mira, D.; Taha, A.; Koric, S.; Vázquez, M.; Houzeaux, G. Parallel mesh partitioning based on space filling curves. Comput. Fluids 2018, 173, 15. [Google Scholar] [CrossRef]
  30. Liu, H.; Zhang, W. Spatial and temporal variation and convergence in the efficiency of high-standard farmland construction: Evidence in China. J. Clean. Prod. 2024, 452, 142200. [Google Scholar] [CrossRef]
31. Hilbert, D. Ueber die stetige Abbildung einer Linie auf ein Flächenstück [On the continuous mapping of a line onto a surface patch]. Math. Ann. 1891, 38, 459–460. [Google Scholar] [CrossRef]
  32. Feng, C.; Shu, S.; Wang, J.; Li, Z. The parallel generation of 2-D Hilbert Space-filling Curve on GPU. In Proceedings of the 5th International Conference on BioMedical Engineering and Informatics, Chongqing, China, 16–18 October 2012. [Google Scholar] [CrossRef]
  33. Skilling, J. Programming the Hilbert curve. AIP Conf. Proc. 2004, 707, 381–387. [Google Scholar] [CrossRef]
  34. Wang, Z.; Oates, T. Spatially Encoding Temporal Correlations to Classify Temporal Data Using Convolutional Neural Networks. arXiv 2015, arXiv:1509.07481. [Google Scholar] [CrossRef]
  35. Earl, D. Script to Plot 1D Data in 2D Using the Hilbert Curve. Honestly a Pretty Terrible Visualization Technique for Conveying Information, but It Looks Cool 2013. Santa Cruz, CA, USA. Available online: https://github.com/dentearl/simpleHilbertCurve (accessed on 7 October 2024).
  36. Eckmann, J.-P.; Kamphorst, S.O.; Ruelle, D. Recurrence Plots of Dynamical Systems. Europhys. Lett. 1987, 4, 9. [Google Scholar] [CrossRef]
  37. Casdagli, M.C. Recurrence plots revisited. Phys. D Nonlinear Phenom. 1997, 108, 12–44. [Google Scholar] [CrossRef]
  38. Faria, F.A.; Almeida, J.; Alberton, B.; Morellato, L.P.C.; Torres, R.S. Fusion of time series representations for plant recognition in phenology studies. Pattern Recognit. Lett. 2016, 83, 205–214. [Google Scholar] [CrossRef]
  39. Krizhevsky, A. One weird trick for parallelizing convolutional neural networks. arXiv 2014, arXiv:1404.5997. [Google Scholar] [CrossRef]
  40. Ahmad, Z.; Khan, N. CNN-Based Multistage Gated Average Fusion (MGAF) for Human Action Recognition Using Depth and Inertial Sensors. IEEE Sens. J. 2021, 21, 3. [Google Scholar] [CrossRef]
  41. Moody, G.; Mark, R. MIT-BIH Arrhythmia Database; Version 1.0.0; PhysioNet: MIT Laboratory for Computational Physiology: Cambridge, MA, USA, 2005. [Google Scholar] [CrossRef]
  42. Luz, E.J.S.; Schwartz, W.R.; Cámara-Chávez, G.; Menotti, D. ECG-based heartbeat classification for arrhythmia detection: A survey. Comput. Methods Programs Biomed. 2016, 127, 144–164. [Google Scholar] [CrossRef]
  43. Emrich, J.; Koka, T.; Wirth, S.; Muma, M. Accelerated Sample-Accurate R-Peak Detectors Based on Visibility Graphs. In Proceedings of the 31st European Signal Processing Conference (EUSIPCO), Helsinki, Finland, 4–8 September 2023. [Google Scholar] [CrossRef]
44. de Chazal, P.; O’Dwyer, M.; Reilly, R.B. Automatic classification of heartbeats using ECG morphology and heartbeat interval features. IEEE Trans. Biomed. Eng. 2004, 51, 7. [Google Scholar] [CrossRef]
  45. Mar, T.; Zaunseder, S.; Martínez, J.P.; Llamedo, M.; Poll, R. Optimization of ECG classification by means of feature selection. IEEE Trans. Biomed. Eng. 2011, 58, 8. [Google Scholar] [CrossRef]
  46. Llamedo, M.; Martínez, J.P. Heartbeat Classification Using Feature Selection Driven by Database Generalization Criteria. IEEE Trans. Biomed. Eng. 2011, 58, 3. [Google Scholar] [CrossRef]
  47. Luz, E.; Menotti, D. How the choice of samples for building arrhythmia classifiers impact their performances. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, 30 August–3 September 2011. [Google Scholar] [CrossRef]
  48. Soria, M.L.; Martínez, J.P. Analysis of multidomain features for ECG classification. In Proceedings of the 36th Annual Computers in Cardiology Conference (CinC), Park City, UT, USA, 13–16 September 2009; Available online: https://ieeexplore.ieee.org/document/5445344 (accessed on 7 October 2024).
  49. Lin, C.-C.; Yang, C.-M. Heartbeat classification using normalized RR intervals and morphological features. Math. Probl. Eng. 2014, 1, 712474. [Google Scholar] [CrossRef]
50. Oliveira, R.F.; Freitas, V.L.S.; Moreira, G.J.P.; Luz, E.J.S. Explorando Redes Neurais de Grafos para Classificação de Arritmias [Exploring Graph Neural Networks for Arrhythmia Classification]. In Anais do XXII Simpósio Brasileiro de Computação Aplicada à Saúde (SBCAS); Sociedade Brasileira de Computação (SBC): Teresina, Brazil, 2022. [Google Scholar] [CrossRef]
  51. Zahid, M.U.; Kiranyaz, S.; Gabbouj, M. Global ECG Classification by Self-Operational Neural Networks with Feature Injection. IEEE Trans. Biomed. Eng. 2023, 70, 205–215. [Google Scholar] [CrossRef]
  52. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  53. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved Training of Wasserstein GANs. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar] [CrossRef]
  54. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, 6 August 2017; Available online: https://dl.acm.org/doi/10.5555/3305381.3305404 (accessed on 7 October 2024).
  55. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on International Conference on Machine Learning, Lille, France, 6–11 July 2015; Available online: https://dl.acm.org/doi/10.5555/3045118.3045167 (accessed on 7 October 2024).
  56. Awais, M.; Bin Iqbal, T.; Bae, S.-H. Revisiting Internal Covariate Shift for Batch Normalization. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 11. [Google Scholar] [CrossRef] [PubMed]
  57. Cortes, C.; Vapnik, V. Support-vector networks. Mach Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  58. Crammer, K.; Singer, Y. On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2001, 2, 265–292. Available online: https://dl.acm.org/doi/10.5555/944790.944813 (accessed on 7 October 2024).
  59. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar] [CrossRef]
  60. Morady, F. Catheter Ablation of Supraventricular Arrhythmias: State of the Art. J. Cardiovasc. Electrophysiol. 2004, 15, 124–139. [Google Scholar] [CrossRef]
  61. Zhang, Z.; Dong, J.; Luo, X.; Choi, K.-S.; Wu, X. Heartbeat classification using disease-specific feature selection. Comput. Biol. Med. 2014, 46, 79–89. [Google Scholar] [CrossRef]
  62. Kachuee, M.; Fazeli, S.; Sarrafzadeh, M. ECG Heartbeat Classification: A deep transferable representation. In Proceedings of the IEEE International Conference on Healthcare Informatics (ICHI), New York, NY, USA, 4–7 June 2018. [Google Scholar] [CrossRef]
  63. Huang, J.; Chen, B.; Yao, B.; He, W. ECG arrhythmia classification using STFT-based spectrogram and convolutional neural network. IEEE Access 2019, 7, 92871–92880. [Google Scholar] [CrossRef]
  64. Zhai, X.; Tin, C. Automated ECG Classification using Dual Heartbeat Coupling based on Convolutional Neural Network. IEEE Access 2018, 6, 27465–27472. [Google Scholar] [CrossRef]
  65. Shaker, A.M.; Tantawi, M.; Shedeed, H.A.; Tolba, M.F. Generalization of convolutional neural networks for ECG classification using generative adversarial networks. IEEE Access 2020, 8, 35592–35605. [Google Scholar] [CrossRef]
  66. Xu, X.; Jeong, S.; Li, J. Interpretation of electrocardiogram (ECG) rhythm by combined CNN and BiLSTM. IEEE Access 2020, 8, 125380–125388. [Google Scholar] [CrossRef]
  67. Qiao, F.; Li, B.; Zhang, Y.; Guo, H.; Li, W.; Zhou, S. A Fast and Accurate Recognition of ECG Signals Based on ELM-LRF and BLSTM Algorithm. IEEE Access 2020, 8, 71189–71198. [Google Scholar] [CrossRef]
  68. Seitanidis, P.; Gialelis, J.; Papaconstantinou, G. Identifying heart arrhythmias through multi-level algorithmic processing of ECG on edge devices. Procedia Comput. Sci. 2022, 203, 699–706. [Google Scholar] [CrossRef]
Figure 1. Signal of a normal heartbeat.
Figure 2. Overview of the proposed CNN-AM structure.
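To make the overview concrete, the following is a minimal sketch of the two-branch idea in Figure 2: one CNN branch per image modality (HSFC and RP), with the branch features concatenated before the classifier. The layer sizes and names here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TwoBranchCNN(nn.Module):
    # Sketch of a multimodal CNN: one branch per image modality,
    # features concatenated before a final classifier (illustrative sizes).
    def __init__(self, n_classes: int = 3):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((6, 6)), nn.Flatten(),
            )
        self.hsfc_branch = branch()   # processes HSFC images
        self.rp_branch = branch()     # processes RP images
        self.classifier = nn.Linear(2 * 16 * 6 * 6, n_classes)

    def forward(self, x_hsfc, x_rp):
        z = torch.cat([self.hsfc_branch(x_hsfc), self.rp_branch(x_rp)], dim=1)
        return self.classifier(z)

model = TwoBranchCNN()
logits = model(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
print(logits.shape)  # -> torch.Size([1, 3])
```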
Figure 3. Transformation of ECG segments (classes N, S and V) into RGB rainbow images via the HSFC and RP techniques.
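For readers reproducing this step, a minimal sketch of the two transformations named in Figure 3 follows, assuming an unthresholded recurrence plot [36] and the classic iterative Hilbert index-to-coordinate mapping [33]; the segment length, curve order and function names are illustrative, not taken from the paper.

```python
import numpy as np

def recurrence_plot(x, eps=None):
    # Recurrence plot [36]: pairwise distances between samples of the
    # 1-D segment x; thresholded if eps is given, unthresholded otherwise.
    d = np.abs(x[:, None] - x[None, :])
    return d if eps is None else (d < eps).astype(float)

def hilbert_d2xy(order, d):
    # Map a 1-D index d to (x, y) on a 2^order x 2^order Hilbert curve
    # (classic iterative algorithm; see [33]).
    x = y = 0
    t = d
    s = 1
    while s < 2 ** order:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hsfc_image(segment, order=5):
    # Fold a 1-D segment of length 4**order into a 2-D image whose pixel
    # layout follows the Hilbert space-filling curve [31].
    n = 2 ** order
    img = np.zeros((n, n))
    for d, v in enumerate(segment[: n * n]):
        x, y = hilbert_d2xy(order, d)
        img[y, x] = v
    return img

seg = np.sin(np.linspace(0, 8 * np.pi, 1024))  # stand-in for an ECG segment
rp = recurrence_plot(seg)                       # 1024 x 1024 recurrence plot
hc = hsfc_image(seg, order=5)                   # 32 x 32 HSFC image
```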
Figure 4. Classifier with an attention mechanism.
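As a rough orientation for Figure 4, the sketch below shows a squeeze-and-excitation-style channel attention block, which is one common way to realize adaptive attention over CNN feature maps. The reduction ratio and placement are assumptions, not the paper's confirmed design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Squeeze-and-excitation-style channel attention (illustrative sketch):
    # global average pooling summarizes each channel, a small MLP produces
    # per-channel weights in (0, 1), and the feature maps are reweighted.
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                              # per-channel weights
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # reweight feature maps

feat = torch.randn(2, 256, 6, 6)          # e.g., fused CNN features
print(ChannelAttention(256)(feat).shape)  # -> torch.Size([2, 256, 6, 6])
```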
Figure 5. Three optimized N-class segments.
Table 1. Structure of a sequence of convolutional layers for an image.

| Layer Name | Output Size | Kernel Size | Padding | Stride |
|---|---|---|---|---|
| Input | 3 × [224 × 224] | | | |
| Features: | | | | |
| Conv2D | [3 × 64] | [11 × 11] | [6 × 6]: Reflect | [4 × 4] |
| ReLU | | | | |
| MaxPool2D | | [3] | [0] | [2] |
| Conv2D | [64 × 192] | [5 × 5] | [2 × 2]: Reflect | [1 × 1] |
| ReLU | | | | |
| MaxPool2D | | [3] | [0] | [2] |
| Conv2D | [192 × 384] | [3 × 3] | [1 × 1]: Reflect | [1 × 1] |
| ReLU | | | | |
| Conv2D | [384 × 256] | [3 × 3] | [1 × 1]: Reflect | [1 × 1] |
| ReLU | | | | |
| Conv2D | [256 × 256] | [3 × 3] | [1 × 1]: Reflect | [1 × 1] |
| ReLU | | | | |
| MaxPool2D | | [3] | [0] | [2] |
| Avg. pool: | | | | |
| Adaptive AvgPool2D | 256 × [6 × 6] | | | |
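A minimal PyTorch sketch of the feature extractor in Table 1 follows (an AlexNet-style stack [39] with reflect padding). It reproduces the table's kernel sizes, strides, padding and output sizes; the variable names are illustrative.

```python
import torch
import torch.nn as nn

# Per-image feature extractor matching Table 1 (AlexNet-style layout with
# reflect padding and adaptive average pooling).
features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=6, padding_mode="reflect"),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, stride=1, padding=2, padding_mode="reflect"),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, stride=1, padding=1, padding_mode="reflect"),
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1, padding_mode="reflect"),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1, padding_mode="reflect"),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.AdaptiveAvgPool2d((6, 6)),  # 256 x [6 x 6] output, as in the table
)

x = torch.randn(1, 3, 224, 224)   # one 224 x 224 RGB input image
print(features(x).shape)          # -> torch.Size([1, 256, 6, 6])
```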
Table 2. Organization of the annotations in the MIT-BIH database.

| Group | Note | Class |
|---|---|---|
| N (any heartbeat not categorized as S, V, F or Q) | N | Normal heartbeat. |
| | L | Left bundle branch block heartbeat. |
| | R | Right bundle branch block heartbeat. |
| | e | Atrial escape heartbeat. |
| | j | Nodal (junctional) escape heartbeat. |
| S (supraventricular ectopic heartbeat) | A | Atrial premature heartbeat. |
| | a | Aberrant atrial premature heartbeat. |
| | J | Nodal (junctional) premature heartbeat. |
| | S | Supraventricular premature heartbeat. |
| V (ventricular ectopic heartbeat) | V | Premature ventricular contraction. |
| | E | Ventricular escape heartbeat. |
| F (fusion heartbeat) | F | Fusion of ventricular and normal beats. |
| Q (unknown heartbeat) | P or / | Paced heartbeat. |
| | f | Fusion of paced and normal heartbeat. |
| | U | Unclassifiable heartbeat. |
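The grouping in Table 2 reduces to a simple lookup from MIT-BIH annotation symbols to AAMI classes; a minimal sketch (symbol set taken directly from the table):

```python
# AAMI grouping of MIT-BIH annotation symbols, following Table 2.
AAMI_GROUPS = {
    "N": ["N", "L", "R", "e", "j"],   # normal, bundle branch block and escape beats
    "S": ["A", "a", "J", "S"],        # supraventricular ectopic beats
    "V": ["V", "E"],                  # ventricular ectopic beats
    "F": ["F"],                       # fusion of ventricular and normal beats
    "Q": ["/", "f", "U"],             # paced, fusion-of-paced and unclassifiable beats
}

SYMBOL_TO_GROUP = {sym: grp for grp, syms in AAMI_GROUPS.items() for sym in syms}

print(SYMBOL_TO_GROUP["L"])  # -> "N"
print(SYMBOL_TO_GROUP["A"])  # -> "S"
```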
Table 3. Data organization via the interpatient paradigm (consolidated counts).

| Class | DS1 | DS2 |
|---|---|---|
| N | 45,781 | 43,598 |
| S | 975 | 667 |
| V | 3,786 | 3,219 |
| Total | 50,542 | 47,484 |
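The interpatient organization follows the DS1/DS2 record partition of de Chazal et al. [44]; the record lists below are those commonly reported for that protocol and should be checked against the original paper before reuse.

```python
# DS1/DS2 inter-patient record split as commonly reported for the
# de Chazal et al. protocol [44] (listed for reference only).
DS1 = [101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122, 124,
       201, 203, 205, 207, 208, 209, 215, 220, 223, 230]   # training records
DS2 = [100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210, 212,
       213, 214, 219, 221, 222, 228, 231, 232, 233, 234]   # test records

assert not set(DS1) & set(DS2)  # no patient appears in both sets
```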
Table 4. Data organization via the intrapatient paradigm.

| Class | DS1 + DS2 | Training (80%) | Test (20%) |
|---|---|---|---|
| N | 89,379 | 71,503 | 17,876 |
| S | 1,642 | 1,314 | 328 |
| V | 7,005 | 5,604 | 1,401 |
| Total | 98,026 | 78,421 | 19,605 |
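The intrapatient organization pools beats from all records and splits them 80/20; a minimal sketch with class stratification (placeholder data stand in for the real beat images and labels):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Intra-patient 80/20 split with class stratification, as in Table 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(98_026, 8))                                 # placeholder features
y = rng.choice(["N", "S", "V"], size=98_026, p=[0.912, 0.017, 0.071])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
print(len(X_train), len(X_test))  # roughly 78,421 / 19,605
```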
Table 5. Data augmentation for training classes in the interpatient paradigm.

| Class | Original Training Data | Training Data with Data Augmentation |
|---|---|---|
| N | 45,781 | 45,781 |
| S | 975 | 30,000 |
| V | 3,786 | 10,000 |
| Total | 50,542 | 85,781 |
Table 6. Data augmentation for training classes in the intrapatient paradigm.

| Class | Original Training Data | Training Data with Data Augmentation |
|---|---|---|
| N | 71,503 | 71,503 |
| S | 1,314 | 30,000 |
| V | 5,604 | 10,000 |
| Total | 78,421 | 111,503 |
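The reference list cites SMOTE [52] and WGAN variants [53,54]; assuming a SMOTE-style oversampler is used to reach the Table 6 targets, a minimal sketch with imbalanced-learn would look like this (placeholder features; the real inputs would be beat representations):

```python
from collections import Counter
import numpy as np
from imblearn.over_sampling import SMOTE

# Oversample the minority classes to the Table 6 targets
# (intra-patient training set); N is left unchanged.
rng = np.random.default_rng(0)
X = rng.normal(size=(78_421, 16))                      # placeholder features
y = np.array(["N"] * 71_503 + ["S"] * 1_314 + ["V"] * 5_604)

smote = SMOTE(sampling_strategy={"S": 30_000, "V": 10_000}, random_state=42)
X_res, y_res = smote.fit_resample(X, y)
print(Counter(y_res))  # {'N': 71503, 'S': 30000, 'V': 10000} -> 111,503 total
```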
Table 7. Global indicators for the test database using the single-lead interpatient paradigm (MLII).

| | Acc | Pre | Recall | Spe | F1 |
|---|---|---|---|---|---|
| MLII-F1 | 97.29 | 74.41 | 78.33 | 95.42 | 75.96 |
| MLII-S | 83.89 | 59.51 | 84.86 | 93.24 | 61.86 |
Table 8. Global indicators for the test database using the two-lead interpatient paradigm (MLII and V).

| | Acc | Pre | Recall | Spe | F1 |
|---|---|---|---|---|---|
| MLII + V-F1 | 98.48 | 94.15 | 80.23 | 96.34 | 81.91 |
| MLII + V-S | 95.47 | 72.28 | 87.21 | 96.62 | 75.25 |
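The global indicators in Tables 7 and 8 appear to be averages over the three classes; a minimal sketch of computing them with scikit-learn, assuming macro averaging and a per-class specificity averaged the same way (toy labels for illustration):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = np.array(["N", "N", "S", "V", "V", "N"])   # toy labels
y_pred = np.array(["N", "S", "S", "V", "N", "N"])

acc = accuracy_score(y_true, y_pred)
pre = precision_score(y_true, y_pred, average="macro", zero_division=0)
rec = recall_score(y_true, y_pred, average="macro", zero_division=0)
f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

def specificity_macro(y_true, y_pred, labels=("N", "S", "V")):
    # Per-class specificity = TN / (TN + FP), averaged over classes.
    specs = []
    for c in labels:
        tn = np.sum((y_true != c) & (y_pred != c))
        fp = np.sum((y_true != c) & (y_pred == c))
        specs.append(tn / (tn + fp))
    return np.mean(specs)

print(acc, pre, rec, specificity_macro(y_true, y_pred), f1)
```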
Table 9. Indicators by class for the test database using the single-lead interpatient paradigm (MLII).

| | N: Recall | N: Pre | N: Spe | N: F1 | S: Recall | S: Pre | S: Spe | S: F1 | V: Recall | V: Pre | V: Spe | V: F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MLII-F1 | 98.3 | 98.9 | 88.0 | 98.6 | 41.1 | 28.0 | 98.5 | 33.3 | 95.6 | 96.2 | 99.7 | 95.9 |
| MLII-S | 83.3 | 99.6 | 95.9 | 90.7 | 77.5 | 7.6 | 86.6 | 13.9 | 93.8 | 71.4 | 97.3 | 81.1 |
Table 10. Indicators by class for the test database using the two-lead interpatient paradigm (MLII and V).

| | N: Recall | N: Pre | N: Spe | N: F1 | S: Recall | S: Pre | S: Spe | S: F1 | V: Recall | V: Pre | V: Spe | V: F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MLII + V-F1 | 99.4 | 99.1 | 88.4 | 99.2 | 43.3 | 56.3 | 99.7 | 49.0 | 98.0 | 97.0 | 99.7 | 97.5 |
| MLII + V-S | 95.8 | 99.4 | 97.0 | 97.6 | 69.3 | 20.2 | 96.1 | 31.3 | 96.6 | 97.2 | 99.8 | 96.6 |
Table 11. Global and per-class indicators for the test database using the intrapatient paradigm (MLII and MLII + V).

| | Acc | Pre | Recall | Spe | F1 |
|---|---|---|---|---|---|
| MLII | 99.59 | 97.17 | 95.97 | 99.04 | 96.56 |
| MLII + V | 99.70 | 98.01 | 97.26 | 99.28 | 97.64 |

| | N: Recall | N: Pre | N: Spe | N: F1 | S: Recall | S: Pre | S: Spe | S: F1 | V: Recall | V: Pre | V: Spe | V: F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MLII | 99.8 | 99.7 | 97.3 | 99.8 | 89.2 | 92.7 | 99.9 | 90.9 | 98.8 | 99.1 | 99.9 | 99.0 |
| MLII + V | 99.9 | 99.8 | 97.9 | 99.8 | 92.9 | 94.7 | 99.9 | 93.8 | 99.0 | 99.6 | 99.9 | 99.3 |
Table 12. Comparison of results for the interpatient paradigm—global indicators.

| Work | Method | Acc % | Pre % | Recall % | Spe % | F1 % |
|---|---|---|---|---|---|---|
| de Chazal et al. (2004) [44] | Statistical | 83.88 | 45.57 | 66.00 | 96.05 | 59.74 |
| Soria, Martínez (2009) [48] | LDA + QDA | 91.47 | 69.89 | 89.98 | 94.60 | 76.43 |
| Llamedo, Martínez (2011) [46] | QDA | 90.62 | 66.26 | 86.18 | 95.83 | 66.61 |
| Lin, Yang (2014) [49] | LDA | 93.00 | 67.60 | 83.50 | - | - |
| Zhang et al. (2014) [61] | SVM | 86.66 | - | - | - | - |
| Garcia et al. (2017) [22] | ANN + SVM + PSO | 92.38 | 70.12 | 81.10 | 92.12 | 74.59 |
| Wang et al. (2021) [15] | Wavelet + CNN | 97.48 | 70.75 | 67.47 | 96.01 | 68.76 |
| Dias et al. (2021) [23] | LDA | 80.58 | 59.37 | 84.84 | 94.35 | 60.77 |
| Oliveira et al. (2022) [50] | GNN + VG | 86.12 | 76.72 | 77.04 | 92.87 | 76.88 |
| Zahid et al. (2023) [51] | Self-ONN | 98.19 | 93.25 | 91.35 | 96.37 | 92.27 |
| This work | CNN-AM, MLII-F1 | 97.29 | 74.41 | 78.33 | 95.42 | 75.96 |
| This work | CNN-AM, MLII-S | 83.89 | 59.51 | 84.86 | 93.24 | 61.86 |
| This work | CNN-AM, MLII + V-F1 | 98.48 | 94.15 | 80.23 | 96.34 | 81.91 |
| This work | CNN-AM, MLII + V-S | 95.47 | 72.28 | 87.21 | 96.62 | 75.25 |
Table 13. Results of research on the interpatient paradigm—indicators by class.

| Work | Acc % | Recall % (N) | Recall % (S) | Recall % (V) | Pre % (N) | Pre % (S) | Pre % (V) | F1 % (N) | F1 % (S) | F1 % (V) |
|---|---|---|---|---|---|---|---|---|---|---|
| de Chazal et al. (2004) [44] | 83.88 | 86.9 | 75.9 | 77.7 | 99.2 | 38.5 | 81.6 | 92.6 | 51.1 | 79.6 |
| Soria, Martínez (2009) [48] | 91.47 | 91.7 | 88.3 | 89.9 | 98.9 | 39.5 | 71.2 | 95.2 | 54.6 | 79.5 |
| Llamedo, Martínez (2011) [46] | 90.62 | 91.8 | 84.8 | 81.9 | 99.5 | 10.9 | 88.4 | 95.5 | 19.3 | 85.1 |
| Lin, Yang (2014) [49] | 93.00 | 91.6 | 81.4 | 86.2 | 99.3 | 31.6 | 31.6 | 95.3 | 45.5 | 79.5 |
| Zhang et al. (2014) [61] | 86.66 | 88.9 | 79.1 | 85.5 | 99.0 | 36.0 | 92.8 | - | - | - |
| Garcia et al. (2017) [22] | 92.38 | 94.0 | 62.0 | 87.3 | 98.0 | 53.0 | 59.4 | 95.9 | 57.1 | 70.7 |
| Wang et al. (2021) [15] | 97.48 | 99.4 | 74.6 | 95.7 | 98.2 | 89.5 | 93.2 | 98.8 | 81.4 | 94.4 |
| Dias et al. (2021) [23] | 80.58 | 79.6 | 91.3 | 87.3 | 99.5 | 40.3 | 93.2 | 88.5 | 56.0 | 90.1 |
| Oliveira et al. (2022) [50] | 86.12 | 99.8 | 46.1 | 85.2 | 98.6 | 44.9 | 86.6 | 99.2 | 45.5 | 85.9 |
| Zahid et al. (2023) [51] | 98.19 | 99.3 | 83.3 | 91.4 | 98.9 | 83.5 | 97.4 | 99.1 | 83.4 | 94.3 |
| CNN-AM (MLII-F1) | 97.29 | 98.3 | 41.1 | 95.6 | 98.9 | 28.0 | 96.2 | 98.6 | 33.3 | 95.9 |
| CNN-AM (MLII-S) | 83.89 | 83.3 | 77.5 | 93.8 | 99.6 | 7.6 | 71.4 | 90.7 | 13.9 | 81.1 |
| CNN-AM (MLII + V-F1) | 98.48 | 99.4 | 43.3 | 98.0 | 99.1 | 56.3 | 97.0 | 99.2 | 49.0 | 97.5 |
| CNN-AM (MLII + V-S) | 95.47 | 95.8 | 69.3 | 96.6 | 99.4 | 20.2 | 97.2 | 97.6 | 31.3 | 96.9 |
Table 14. Research results for the intrapatient paradigm—global indicators.

| Work | Method | Acc % | Pre % | Recall % | Spe % | F1 % |
|---|---|---|---|---|---|---|
| Kachuee et al. (2018) [62] | CNN | 95.90 | 95.20 | 95.10 | - | - |
| Izci et al. (2019) [3] | CNN | 92.96 | 90.08 | 80.08 | 98.14 | 82.17 |
| Huang et al. (2019) [63] | CNN | 99.00 | - | - | - | - |
| Zhai, Tin (2018) [64] | CNN | 96.05 | 65.91 | 72.06 | 97.83 | 68.06 |
| Shaker et al. (2020) [65] | CNN | 98.35 | 82.24 | 93.82 | 99.01 | 87.29 |
| Xu et al. (2020) [66] | SVM + RF + CNN | 95.90 | 96.34 | 95.90 | - | 95.92 |
| Qiao et al. (2020) [67] | DELM-LRF-BLSTM | 99.32 | 98.30 | 97.15 | - | 97.71 |
| He et al. (2021) [24] | SVM + ANN | 98.29 | 99.22 | 98.29 | - | - |
| Ahmad et al. (2021) [4] | CNN + SVM | 99.70 | 98.00 | 98.00 | - | - |
| Seitanidis et al. (2022) [68] | CNN | 95.20 | - | 95.20 | 98.80 | - |
| Islam et al. (2023) [21] | ANN + CNN | 99.60 | 97.66 | 99.60 | - | 98.21 |
| Mewada (2023) [8] | CNN | 99.52 | 95.12 | 96.18 | - | 95.64 |
| Zhou, Fang (2024) [9] | CNN-FCA | 99.66 | 84.19 | 97.92 | 99.70 | 87.72 |
| This work | CNN-AM, MLII | 99.59 | 97.17 | 95.97 | 99.04 | 96.56 |
| This work | CNN-AM, MLII + V | 99.70 | 98.01 | 97.26 | 99.28 | 97.64 |
Table 15. Global interpatient indicators.

| No. | Input | Acc | Pre | Recall | Spe | F1 |
|---|---|---|---|---|---|---|
| M1 | MLII | 96.74 | 73.02 | 77.54 | 94.64 | 74.40 |
| M2 | MLII | 97.14 | 73.42 | 76.28 | 94.92 | 74.69 |
| M3 | MLII-F1 | 97.32 | 74.72 | 78.39 | 95.37 | 76.02 |
| M3 | MLII-S | 85.32 | 62.39 | 83.98 | 93.43 | 64.01 |
| M4 | MLII-F1 | 97.29 | 74.41 | 78.33 | 95.42 | 75.96 |
| M4 | MLII-S | 83.89 | 59.51 | 84.86 | 93.24 | 61.86 |
| M5 | MLII + V-F1 | 98.15 | 81.17 | 80.18 | 96.30 | 80.56 |
| M5 | MLII + V-S | 92.79 | 63.84 | 69.17 | 91.9 | 62.70 |
| M6 | MLII + V-F1 | 98.48 | 94.15 | 80.23 | 96.34 | 81.91 |
| M6 | MLII + V-S | 95.47 | 72.28 | 87.21 | 96.62 | 75.25 |
Table 16. Indicators by interpatient class.

| No. | Input | N: Recall | N: Pre | N: Spe | N: F1 | S: Recall | S: Pre | S: Spe | S: F1 | V: Recall | V: Pre | V: Spe | V: F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M1 | MLII | 97.9 | 98.8 | 86.1 | 98.3 | 42.0 | 23.5 | 98.1 | 30.1 | 92.8 | 96.8 | 99.8 | 94.7 |
| M2 | MLII | 98.3 | 98.8 | 86.6 | 98.5 | 36.3 | 28.5 | 98.7 | 31.9 | 94.3 | 93.0 | 99.5 | 93.6 |
| M3 | MLII-F1 | 98.3 | 98.9 | 97.8 | 98.6 | 40.8 | 26.2 | 98.4 | 31.9 | 96.1 | 99.1 | 99.9 | 97.6 |
| M3 | MLII-S | 84.8 | 99.5 | 94.9 | 91.6 | 72.1 | 7.4 | 87.1 | 13.4 | 95.1 | 80.3 | 98.3 | 87.0 |
| M4 | MLII-F1 | 98.3 | 98.9 | 88.0 | 98.6 | 41.1 | 28.0 | 98.5 | 33.3 | 95.6 | 96.2 | 99.7 | 95.9 |
| M4 | MLII-S | 83.3 | 99.6 | 95.9 | 90.7 | 77.5 | 7.6 | 86.6 | 13.9 | 93.8 | 71.4 | 97.3 | 81.1 |
| M5 | MLII + V-F1 | 99.0 | 99.1 | 90.0 | 99.0 | 43.2 | 50.5 | 99.4 | 46.6 | 98.4 | 93.9 | 99.5 | 96.1 |
| M5 | MLII + V-S | 95.3 | 98.3 | 81.5 | 97.0 | 48.4 | 12.8 | 95.3 | 20.2 | 63.5 | 80.5 | 98.9 | 71.0 |
| M6 | MLII + V-F1 | 99.4 | 99.1 | 88.4 | 99.2 | 43.3 | 56.3 | 99.7 | 49.0 | 98.0 | 97.0 | 99.7 | 97.5 |
| M6 | MLII + V-S | 95.8 | 99.4 | 97.0 | 97.6 | 69.3 | 20.2 | 96.1 | 31.3 | 96.6 | 97.2 | 99.8 | 96.6 |
Table 17. Global intrapatient indicators.

| No. | Input | Acc | Pre | Recall | Spe | F1 |
|---|---|---|---|---|---|---|
| M1 | MLII | 98.81 | 94.37 | 91.85 | 99.06 | 93.03 |
| M2 | MLII | 98.53 | 94.04 | 91.33 | 98.88 | 92.62 |
| M3 | MLII | 99.58 | 97.02 | 95.89 | 99.02 | 96.44 |
| M4 | MLII | 99.59 | 97.17 | 95.97 | 99.04 | 96.56 |
| M5 | MLII + V | 99.73 | 98.21 | 97.26 | 99.34 | 97.73 |
| M6 | MLII + V | 99.70 | 98.01 | 97.26 | 99.28 | 97.64 |
Table 18. Indicators by intrapatient class.

| No. | Input | N: Recall | N: Pre | N: Spe | N: F1 | S: Recall | S: Pre | S: Spe | S: F1 | V: Recall | V: Pre | V: Spe | V: F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M1 | RP | 99.7 | 99.1 | 95.8 | 99.4 | 82.0 | 93.6 | 98.8 | 87.4 | 98.0 | 97.7 | 99.8 | 96.4 |
| M2 | HSFC | 99.5 | 99.0 | 95.1 | 99.3 | 79.7 | 90.0 | 99.8 | 84.5 | 95.2 | 96.4 | 99.8 | 95.8 |
| M3 | MLII | 99.8 | 99.7 | 97.2 | 99.8 | 88.9 | 92.0 | 99.9 | 90.5 | 98.9 | 99.3 | 99.9 | 99.1 |
| M4 | MLII | 99.8 | 99.7 | 97.3 | 99.8 | 89.2 | 92.7 | 99.9 | 90.9 | 98.8 | 99.1 | 99.9 | 99.0 |
| M5 | MLII + V | 99.9 | 99.8 | 98.1 | 99.9 | 92.6 | 95.3 | 99.9 | 93.9 | 99.3 | 99.6 | 100 | 99.4 |
| M6 | MLII + V | 99.9 | 99.8 | 97.9 | 99.8 | 92.9 | 94.7 | 99.9 | 93.8 | 99.0 | 99.6 | 99.9 | 99.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
