1. Introduction
The electrocardiogram (ECG) is one of the most widely used non-invasive diagnostic tools for monitoring heart activity and detecting cardiac abnormalities, including arrhythmias. Early and accurate classification of normal and abnormal heartbeats is essential for timely medical intervention and effective treatment. However, analyzing ECG signals presents significant challenges due to their complex temporal dynamics, variations in signal morphology across individuals, and the presence of noise and artifacts. Traditional methods for automated ECG classification typically rely on either manual feature extraction or deep learning approaches based on signal reconstruction. Both techniques have limitations that hinder their ability to generalize and their accuracy in real-world applications [
1].
Feature extraction-based methods attempt to capture relevant characteristics from ECG signals using statistical, time-domain, and frequency-domain techniques. However, the extracted features introduce bias, limit the model’s ability to adapt to new datasets and unseen abnormalities, and often require significant computational resources, making real-time processing difficult [
2]. On the other hand, deep learning methods that reconstruct ECG signals and detect anomalies based on reconstruction errors struggle to generalize, especially when encountering rare or underrepresented abnormal patterns in training data. Additionally, their high computational complexity and latency further hinder their suitability for real-time applications. These limitations highlight the need for a more efficient, effective, and generalizable approach to ECG classification that can operate reliably in real-time scenarios [
3].
To overcome these limitations, recent research in signal-based deep learning has increasingly adopted end-to-end architectures that learn discriminative representations directly from raw input signals, bypassing the dependency on handcrafted features or reconstruction losses. Such architectures can automatically discover both spatial and temporal dependencies, improving robustness and generalisation across diverse datasets and noise conditions. This paradigm has also shown success in related domains of signal interpretation [
4,
5]. These studies demonstrate the versatility of deep neural models in learning complex signal representations and capturing long-term dependencies, reinforcing the motivation for an end-to-end approach in ECG anomaly detection.
In this study, we propose an end-to-end deep learning model, ECG-CBA, which integrates convolutional neural networks (CNNs), Bidirectional long short-term memory networks (Bi-LSTMs), and a multi-head Attention mechanism. Unlike conventional approaches, ECG-CBA directly learns discriminative features from raw ECG signals rather than relying on manual feature extraction or signal reconstruction. The CNN component captures local spatial patterns in ECG waveforms, while the Bi-LSTM module models the sequential dependencies of heartbeats, enabling the recognition of temporal variations. The attention mechanism further enhances the model’s ability to focus on critical ECG segments, improve the detection of anomalies, and boost classification accuracy.
The proposed model is evaluated on the ECG5000 and MIT-BIH Arrhythmia datasets for binary classification of normal and abnormal heartbeats. Experimental results demonstrate that ECG-CBA achieves high classification performance, with accuracies of 99.60% and 98.80% on the ECG5000 and MIT-BIH datasets, respectively. Compared to traditional deep learning approaches, ECG-CBA improves sensitivity, specificity, and overall classification accuracy, making it a robust solution for ECG-based anomaly detection.
The key contributions of this research are summarized as follows:
We propose ECG-CBA, an end-to-end framework that learns discriminative representations directly from raw ECG signals, eliminating the need for hand-crafted feature extraction or signal reconstruction. This approach enhances generalization to unseen data and improves robustness against noise and inter-patient variability, making it highly suitable for real-world clinical applications.
The architecture integrates convolutional neural networks (CNNs) for spatial feature extraction, bidirectional long short-term memory (Bi-LSTM) networks for modeling temporal dependencies, and a multi-head attention mechanism to emphasize clinically relevant ECG segments.
Comprehensive experiments conducted on two benchmark datasets, ECG5000 and MIT-BIH Arrhythmia, demonstrate that ECG-CBA consistently outperforms existing approaches across multiple performance metrics, including accuracy, sensitivity, specificity, and F1-score.
The rest of the paper is organised as follows:
Section 2 presents the relevant related works on ECG anomaly detection.
Section 3 details the Background study of CNN, Bi-LSTM and Attention models.
Section 4 presents the design and development of the proposed ECG-CBA model.
Section 5 describes the implementation and training framework of the ECG-CBA model.
Section 6 outlines the performance evaluation methods, while
Section 7 presents the experimental results of the ECG-CBA model, compared against related ECG classification methods as benchmarks. Finally,
Section 8 provides the conclusions of this study and suggests directions for future research.
2. Related Work
Anomalies in electrocardiogram (ECG) signals are critical indicators of various cardiovascular diseases, which remain a leading cause of morbidity and mortality worldwide [
6]. The advent of deep learning has revolutionized the field of medical diagnostics, particularly in the interpretation of ECG signals, enabling more accurate and efficient detection of arrhythmias and other cardiac anomalies [
7]. This literature review focuses on applying deep learning models for detecting and classifying anomalies in ECG signals, explicitly utilizing the ECG5000 and MIT-BIH Arrhythmia Databases. These datasets are widely recognized for their comprehensive representation of various ECG patterns, making them suitable for training and validating deep learning models.
The ECG5000 dataset, introduced by [
8], comprises 5000 ECG recordings from five different classes of heartbeats, including Normal, Atrial Fibrillation, and others. Recent studies have employed this dataset to leverage the capabilities of deep learning architectures [
9]. The dataset’s structure enables a balanced representation of both normal and abnormal ECG signals, facilitating the development of robust models that can generalize across different classes. The dataset’s size and diversity make it a suitable candidate for training deep learning models that require large amounts of labeled data to achieve high accuracy.
The MIT-BIH Arrhythmia Database is another pivotal resource in ECG research, consisting of 48 half-hour ECG recordings from 47 subjects, annotated with 11 different types of arrhythmias [
10]. Recent studies have highlighted the unique challenges posed by this dataset, such as class imbalance and the need for precise temporal segmentation of ECG signals [
11]. The annotations provided in this dataset offer a rich ground for training deep learning models, enabling researchers to explore various architectures and preprocessing techniques to improve classification performance.
Recent advancements in deep learning have led to the development of various architectures tailored for ECG signal analysis. Convolutional neural networks (CNNs) have emerged as a popular choice due to their ability to automatically learn spatial hierarchies in data [
12]. For instance, the authors of that study proposed a hybrid CNN-LSTM model that combines the strengths of CNNs in feature extraction with Long Short-Term Memory (LSTM) networks for temporal sequence modeling, achieving state-of-the-art performance on the ECG5000 and MIT-BIH datasets.
Additionally, attention mechanisms have been integrated into deep learning models to enhance their performance further. A Transformer-based model that incorporates self-attention layers was introduced by Ref. [
13], enabling the model to focus on critical segments of the ECG signal while disregarding noise. Their findings demonstrated significant improvements in classification accuracy compared to traditional CNN approaches.
Liu et al. [
5] introduced a new method for long-term temperature compensation in structural health monitoring using ultrasonic guided waves. This study emphasizes the versatility and growing importance of deep learning for extracting discriminative representations and modeling long-term dependencies across diverse signal domains.
Roy et al. [
14] proposed a novel approach for detecting anomalies in ECG signals using a deep LSTM autoencoder. The method employs an encoder to learn a lower-dimensional representation of ECG sequences and a decoder to reconstruct the original ECG signal. The model is trained on normal ECG signals, and anomaly detection is performed by analyzing the reconstruction loss of test ECG signals. The authors determine a reconstruction loss threshold using both manual and Kapur’s automated thresholding procedures. When applied to the ECG5000 dataset, the proposed model achieved an accuracy of over 98%.
Qin et al. [
15] introduced a novel one-class classification GAN (ECG-ADGAN) for ECG anomaly detection. The method incorporates a Bi-directional Long-Short Term Memory (Bi-LSTM) layer into a GAN architecture. It employs a mini-batch discrimination training strategy in the discriminator to synthesize ECG signals. The model was trained to generate samples that match the distribution of normal signals, enabling the reliable detection of anomalies, even those not well-represented in the training data. Experiments on the MIT-BIH arrhythmia database demonstrated the method’s effectiveness, achieving an accuracy of 95.5% and an AUC of 95.9%, outperforming state-of-the-art semi-supervised learning algorithms.
Pereira et al. [
16] proposed an unsupervised approach for learning representations of ECG sequences and detecting anomalies. They trained a variational autoencoder model with Bi-LSTM encoders and decoders for representation learning. Then, they introduced new unsupervised methods for anomaly detection in the latent space. The clustering step focused on defining the two clusters, which include the normal and anomalous heartbeats. Such a technique relies on the fact that normal heartbeats are the majority, and the anomalous ones are in a latent space different from the normal ones. Their model was regularized using a sparsity penalty.
Additionally, Dutta et al. [
17] presented MED-NET, a novel approach to ECG anomaly detection using LSTM autoencoders. The model uses a stacked LSTM architecture and autoencoders to represent temporal attributes in a latent matrix. The model processes and extracts around 140 features from a dataset of 5000 samples. The LSTM network is structured by combining an Encoder–Decoder LSTM, allowing the model to accept variable-length input sequences and predict or output variable-length output sequences. The recreated time series-based ECG data output by the final layer is compared with the original ECG time series-based input data to calculate the reconstruction loss.
Roy et al. [
14] proposed ECG-NET, an LSTM-based autoencoder for detecting anomalous ECG signals. A key contribution of this method is that it only requires normal ECG signals to train the model. This approach addressed the challenges of data imbalance and the limited availability of annotated anomalous ECG signals. The method also incorporated an automated reconstruction loss threshold selection approach during testing, where if the reconstruction loss value is above a certain threshold, the signal is considered an anomaly; otherwise, it is considered normal.
Shaik Munawar et al. [
18] proposed a Multi-Task Group Bi-directional LSTM method to improve the performance of arrhythmia classification. The MTGBi-LSTM model learns unique features in a shared representation that help overcome overfitting problems and increase the model’s learning rate. The global and intra LSTM method selects the relevant feature and easily escapes from local optima. The multi-task learning technique learns two ECG signals in a shared representation for effective learning.
Gutiérrez-Gnecchi et al. [
19] presented a DSP-based method for real-time arrhythmia classification designed for online ambulatory operation. The algorithm classifies eight heartbeat conditions using a wavelet transform based on quadratic wavelets to identify individual ECG waves and obtain a fiducial marker array. Classification is conducted using a Probabilistic Neural Network and tested with ECG records from the PhysioNet repository.
The study by Ref. [
20] proposed classifying arrhythmias into five categories using LSTM with a Luong Attention Mechanism so that the model learns the critical part of the ECG signal at each time step. In the proposed model, the authors used a Continuous Wavelet Transform (CWT) to remove low- and high-frequency noise from the signals. Then, a set of features was extracted using statistical measures (such as skewness and kurtosis) and time–frequency domain features.
Ramaiah et al. [
21] addressed the critical issue of ECG ventricular fibrillation (VF) type signal detection. VF is a life-threatening cardiac arrhythmia with high mortality. The authors proposed a novel Long Short-Term Memory (LSTM) classifier optimized with Improved Penguin Optimization (IPEO) to mitigate overfitting. The method utilized the ECG recordings from the MIT-BIH and CPSC 2018 datasets. The proposed model also employed Fuzzy C-Means and Enhanced Fuzzy Rough Set methods for effective feature selection, extracting informative features, and clustering membership degrees.
The study by Ref. [
22] proposed a novel one-dimensional convolutional neural network (1D-CNN) architecture for the automated classification of arrhythmias. Leveraging the MIT-BIH dataset, incorporating both real and noise-attenuated ECG signals, the authors aimed to develop a robust and efficient diagnostic tool. Such noise reflects the inherent randomness of arrhythmic events, which can lead to potential misdiagnosis.
Mohammed [
23] addressed the limitations of cloud-based machine learning (ML) for electrocardiogram (ECG) arrhythmia detection, focusing on the emerging field of edge inference to meet real-time, privacy, and availability demands. The author illuminated the computational challenge of deploying modern ML algorithms on resource-constrained edge devices. The challenge was addressed by the proposed model, which is a compact convolutional neural network (CNN) classifier enhanced with matched filter (MF) theory. The model’s minimal size (15 KB) and rapid inference time position it as a superior, low-complexity alternative for real-time ECG monitoring on edge devices, potentially benefiting a large population of patients with cardiovascular disease.
In Ref. [
24], a novel deep learning approach was proposed utilizing a Hybrid Residual Network (Hybrid ResNet). The method enhanced the feature extraction and classification accuracy through an architecture that integrated standard convolution, depthwise separable convolution, and residual connections. Their model ensured high-quality input because the ECG signals undergo preprocessing, including baseline drift removal, denoising via discrete wavelet transform, and heartbeat segmentation. Furthermore, the study tackles the class imbalance issue in the MIT-BIH Arrhythmia Database by employing the Synthetic Minority Oversampling Technique (SMOTE).
In Ref. [
25], the study proposed a hybrid deep learning-based technique that transforms 1D ECG data into 2D Scalogram images for automated noise reduction and feature extraction. The core contribution is the Residual attention-based 2D-CNN-LSTM-CNN (RACLC) model, which combines 2D convolutional neural networks (CNNs) and Long Short-Term Memory (LSTM) systems to capture both morphological and temporal information from ECG signals. Integrating an attention block enhances the model’s ability to focus on critical details within the ECG signal, thereby improving classification efficiency.
In Ref. [
26], the research proposed a 1D convolutional neural network (CNN) with residual connections. Using the residual connections enabled the model to capture the most critical feature in the input signal.
Murat et al. [
27] analyzed literature reports that use deep learning on arrhythmia ECG data. The authors utilized ECG data from five classes, comprising 100,022 beats, obtained from the MIT-BIH arrhythmia database, to evaluate deep learning techniques. They found that classifying raw ECG signals with deep learning-based systems without using any manual feature extraction is a significant advantage. However, some studies have shown that using certain temporal features (i.e., RR interval) and raw signals improves model performance. They also noted that the imbalance of ECG datasets is a significant problem that can give misleading information concerning model performance. The authors concluded that efficient hybrid models can provide more distinctive features from ECG signals.
The study by Ref. [
28] proposed AttentivECGRU, a novel arrhythmia detection method that employs a Gated Recurrent Unit (GRU)-based autoencoder with an attention mechanism. The model, trained solely on normal electrocardiogram (ECG) signals, learns to reconstruct these signals and effectively identifies abnormalities by analyzing reconstruction errors. In their study, a fuzzy logic-based threshold selection process is applied to differentiate between normal and arrhythmic ECGs, addressing the overlap in reconstruction loss distributions.
3. Background Study
This section describes the fundamental building blocks of the ECG-CBA model for ECG anomaly detection: convolutional neural networks (CNN), Bidirectional Long Short-Term Memory (Bi-LSTM), and Multi-Head Attention. Each technique contributes to the precise and interpretable detection of anomalies in ECG signals. The extraction of spatial features is performed using a CNN module, while temporal dependencies are captured by utilizing a Bi-LSTM. Additionally, the Attention Mechanism helps highlight key portions of the sequence. The ECG-CBA model is described in detail in the subsequent subsections.
3.1. CNN
CNNs are a class of deep learning architectures that are particularly effective for processing data with a grid-like topology, such as time series signals or images. In the context of ECG signals, CNNs extract spatial features such as wave peaks, troughs, and transitions between cardiac cycles [
29,
30]. As shown in
Figure 1, a typical convolutional neural network (CNN) architecture consists of several key stages, including convolution, pooling, flattening, and fully connected layers for classification.
The primary operation in CNNs is the convolution, where a kernel slides across the input signal to produce a feature map. Mathematically, the 1D convolution operation can be expressed as follows:

$$y[n] = \sum_{k=0}^{K-1} x[n+k]\, w[k]$$

where x is the input signal, w is the kernel (filter) of size K, and y is the output feature map. The 1D convolution in ECG processing applies filters over temporal sequences to detect localized patterns such as QRS complexes (Q, R, and S waves representing ventricular depolarization).
Following convolution, a non-linear activation function such as ReLU is typically applied:

$$a[n] = \max(0,\, y[n])$$

This enhances model expressiveness while introducing sparsity. Max-pooling layers are also commonly used to reduce the dimensionality of the feature maps:

$$p[n] = \max_{0 \le j < P} a[nP + j]$$

where P is the pooling window size. Pooling reduces computational cost and controls overfitting by retaining the most salient features.
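The three operations above can be illustrated with a minimal NumPy sketch; the signal values and kernel below are arbitrary toy numbers, not taken from the datasets used in this paper:

```python
import numpy as np

def conv1d_valid(x, w):
    """1D convolution (valid padding): y[n] = sum_k x[n+k] * w[k]."""
    K = len(w)
    return np.array([np.dot(x[n:n + K], w) for n in range(len(x) - K + 1)])

def relu(y):
    """Element-wise ReLU activation: a[n] = max(0, y[n])."""
    return np.maximum(0.0, y)

def max_pool1d(a, P):
    """Non-overlapping max-pooling with window size P."""
    trimmed = a[: (len(a) // P) * P]
    return trimmed.reshape(-1, P).max(axis=1)

# Toy ECG-like segment and a simple edge-detecting kernel (illustrative values only)
x = np.array([0.0, 0.1, 0.2, 1.5, 0.3, 0.1, 0.0, -0.2, 0.0, 0.1])
w = np.array([-1.0, 0.0, 1.0])

features = max_pool1d(relu(conv1d_valid(x, w)), P=2)
print(features)
```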
In ECG anomaly detection, CNNs can automatically learn discriminative features from raw input, eliminating the need for handcrafted feature engineering. These learned features are then forwarded to sequence models, such as Bi-LSTM, for temporal analysis.
3.2. Bi-LSTM
Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN) designed to overcome the vanishing gradient problem and capture long-range dependencies in sequential data. The Bi-LSTM processes input data in both forward and backward directions, enhancing the ability to capture temporal relationships across the entire sequence [
31,
32,
33].
Figure 2 illustrates the structure of a Bi-LSTM network, where inputs are processed in both forward and backward directions to capture comprehensive temporal dependencies.
An LSTM unit updates its internal states through three gates, the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$, which control the flow of information into and out of the memory cell $c_t$. The operations at each time step t are defined as follows:

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$

where
$x_t$ is the input vector at time step t,
$h_{t-1}$ is the hidden state from the previous time step,
$W_{\{i,f,o,c\}}$ and $U_{\{i,f,o,c\}}$ are the weight matrices for input and recurrent connections, respectively,
$b_{\{i,f,o,c\}}$ are the bias terms,
$\sigma$ is the sigmoid activation function,
$\tanh$ is the hyperbolic tangent function,
⊙ denotes element-wise multiplication,
$\tilde{c}_t$ is the candidate cell state,
$c_t$ is the updated cell state, and
$h_t$ is the output hidden state.
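For completeness, a single LSTM time step corresponding to the equations above can be written in NumPy as follows; the weights are randomly initialized and purely illustrative, not trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM update; W, U, b hold the input, forget, output, and candidate parameters."""
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])          # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])          # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])          # output gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                              # updated cell state
    h_t = o_t * np.tanh(c_t)                                        # output hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
d_in, d_hid = 1, 4  # single-channel ECG sample, 4 hidden units (toy sizes)
W = {g: rng.normal(size=(d_hid, d_in)) for g in "ifoc"}
U = {g: rng.normal(size=(d_hid, d_hid)) for g in "ifoc"}
b = {g: np.zeros(d_hid) for g in "ifoc"}

h, c = np.zeros(d_hid), np.zeros(d_hid)
for x_t in np.array([[0.1], [1.2], [0.3]]):   # three consecutive ECG amplitudes
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h)
```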
In a Bi-LSTM network, two separate LSTM layers are applied to the input sequence: one processes the sequence forward, and the other processes it backward. This structure allows the model to capture information from past and future time steps at each point in the sequence. The final hidden state at each time step t is formed by concatenating the forward and backward hidden states:

$$h_t = [\overrightarrow{h}_t \, ; \, \overleftarrow{h}_t]$$

where
$\overrightarrow{h}_t$ is the hidden state from the forward LSTM pass at time step t,
$\overleftarrow{h}_t$ is the hidden state from the backward LSTM pass at time step t,
$h_t$ is the concatenated output representing both temporal directions, and
$[\,\cdot\,;\,\cdot\,]$ denotes vector concatenation.
This bidirectional representation enables the model to learn more comprehensive temporal dependencies, which is especially beneficial in time-series tasks, such as ECG anomaly detection. This bidirectional context also allows the model to understand dependencies that span past and future time steps. This is critical in ECG analysis since some abnormalities can only be detected when considering prior and subsequent waveform segments. The Bi-LSTM thus captures richer temporal features than unidirectional RNNs or LSTMs.
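In Keras, the bidirectional wrapper performs this forward/backward concatenation automatically; a minimal sketch (layer sizes chosen for illustration) is:

```python
import tensorflow as tf

# A Bi-LSTM over a 140-timestep, single-channel ECG sequence; units are illustrative.
inputs = tf.keras.Input(shape=(140, 1))
bilstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(64, return_sequences=True)  # 64 units per direction
)(inputs)

# Forward and backward hidden states are concatenated, so the feature size doubles to 128.
print(bilstm.shape)  # (None, 140, 128)
```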
3.3. Multi-Head Attention
Attention mechanisms enhance deep learning models by allowing them to focus on relevant parts of the input data [
34]. Multi-head attention extends this concept by employing multiple attention heads, each learning different relationships and representations [
35,
36].
Figure 3 presents the architecture of a transformer model block, highlighting the role of Multi-Head Attention, which is applied multiple times in both the encoder and decoder to capture contextual dependencies across input and output sequences effectively.
Given queries Q, keys K, and values V, the scaled dot-product attention is computed as follows:

$$\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $d_k$ is the dimensionality of the key vectors. This operation produces a weighted sum of values, allowing the model to attend to essential segments.
In Multi-Head Attention, this process is performed h times in parallel, using different learned projections:

$$\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h)\, W^{O}$$

where each head is defined as follows:

$$\text{head}_i = \text{Attention}(QW_i^{Q},\, KW_i^{K},\, VW_i^{V})$$
The multiple attention heads allow the model to capture diverse patterns and dependencies across different parts of the ECG signal.
This is crucial in ECG anomaly detection, as it allows the model to focus on segments that exhibit subtle deviations from normal rhythms. For instance, rare arrhythmic episodes may only affect small waveform regions; attention mechanisms can prioritize such anomalies without being distracted by irrelevant patterns. The Multi-Head Attention mechanism enhances interpretability and performance by enabling the model to allocate focus across the input sequence dynamically. This is particularly beneficial in medical time-series data, such as ECG signals.
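A minimal NumPy version of the scaled dot-product attention defined above, with toy dimensions rather than the model's actual sizes, illustrates the computation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # pairwise similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

rng = np.random.default_rng(1)
T, d_k = 5, 8                        # 5 timesteps, key dimension 8 (toy sizes)
Q, K, V = (rng.normal(size=(T, d_k)) for _ in range(3))
context = scaled_dot_product_attention(Q, K, V)
print(context.shape)                 # (5, 8)
```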
4. Proposed ECG-CBA Model
This section presents the proposed ECG-CBA model. We first describe its network architecture, then explain how its design enhances ECG anomaly detection. Finally, we provide implementation details to ensure reproducibility.
Figure 4 illustrates the ECG anomaly detection process, which is further explained in the following subsections.
4.1. Data Preprocessing
Preprocessing the ECG signal is crucial to ensure the data are clean, structured, and suitable for training machine learning or deep learning models. ECG signals often contain noise, artifacts, and variations in amplitude, which can hinder the performance of predictive models if not addressed. The preprocessing pipeline typically involves several key steps, including data segmentation, normalization, label encoding, and splitting into training and testing sets. These steps ensure that the data are formatted correctly and optimized for practical model training, which allows accurate analysis and classification of normal and abnormal ECG patterns. It is essential to highlight that the datasets used in this study are already segmented, with each ECG signal divided into 140 time intervals for the ECG5000 dataset and 188 time intervals for the MIT-BIH dataset. The data are further processed to ensure compatibility with deep learning models and enhance the overall quality of the analysis. The following sections provide a detailed overview of the preprocessing steps applied to the datasets used.
4.1.1. Data Segmentation
The preprocessing of the ECG5000 dataset begins with data segmentation, which ensures that the ECG signals are structured in a format suitable for deep learning models. Each record in the dataset consists of 140 time steps representing an ECG signal, with the last column indicating whether the signal is normal or anomalous. To prepare the data, the ECG features are first separated from their corresponding labels. Since deep learning models like CNNs and Bi-LSTMs require fixed-length sequences, the data is reshaped into a three-dimensional format: (samples, time steps, channels), where the channel dimension is set to 1, as the ECG5000 dataset contains single-lead signals. Next, normalization is applied to standardize the data, ensuring that all values have a zero mean and unit variance, which improves the stability of model training and prevents certain features from dominating others. This step is crucial as ECG signals can have varying amplitude ranges depending on the recording conditions.
4.1.2. Data Splitting
Following segmentation, data splitting is performed to create separate training and testing sets. Typically, the dataset is divided using an 80–20%, 90–10%, or 10–90% split, ensuring that both splits contain a representative distribution of normal and abnormal samples. Since the ECG5000 and MIT-BIH datasets support binary classification, the class labels are converted into a Boolean format, where normal signals are mapped to one and abnormal signals are mapped to zero. Furthermore, to align with TensorFlow's input expectations for the CNN layer, the data are further reshaped into the format (batch_size, time_steps, features), maintaining the time series structure of the ECG data while making it compatible with deep learning architectures. This structured preprocessing approach ensures that the ECG signals are appropriately formatted for training an ECG-CBA model, optimizing both feature extraction and sequence learning to enhance the detection of ECG anomalies.
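The preprocessing steps described above can be sketched as follows; the file name, the loading call, and the assumption that the normal class carries label 1 are illustrative and not taken from the paper:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical loading: each row = 140 ECG values followed by a class label (ECG5000 layout).
data = np.loadtxt("ecg5000.csv", delimiter=",")
X, y = data[:, :-1], data[:, -1]

# Binary labels: normal beats -> 1 (True), all other classes -> 0 (False).
y = (y == 1).astype(bool)

# Standardize each feature to zero mean and unit variance, then add a channel dimension.
X = StandardScaler().fit_transform(X)
X = X.reshape((X.shape[0], X.shape[1], 1))   # (samples, time_steps, channels)

# 80-20 train-test split with a representative class distribution.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```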
4.2. Proposed ECG-CBA Architecture
As illustrated in
Figure 5, the proposed model architecture consists of two main components: an encoder and a decoder. The encoder employs CNN layers to extract local features from the ECG signals, followed by a Bi-LSTM layer to capture sequential patterns. An attention mechanism is then applied to focus on the most relevant parts of the data, enhancing the model’s ability to detect anomalies.
The decoder flattens the encoded features and passes them through a dense layer with a sigmoid activation function to produce binary classification outputs. The model is compiled using the Adam optimizer and binary cross-entropy loss, with accuracy as the evaluation metric. Early stopping is implemented to prevent overfitting by monitoring validation loss and restoring the best weights.
4.2.1. ECG Encoder Blocks
CNN layers: The CNN plays a crucial role in extracting meaningful features from raw ECG time-series signals. CNN effectively captures local patterns such as peaks, waveforms, and abrupt changes that are essential for identifying anomalies in ECG data. In the proposed model, CNN serves as the first stage of processing, preparing the data for deeper sequential learning using Bi-LSTM and attention mechanisms. The Conv1D layers apply convolutional filters that slide over the ECG signal, detecting key features like P-waves, QRS complexes, and T-waves, ensuring that relevant characteristics are automatically learned. Following this, MaxPooling1D layers reduce the dimensionality of the extracted features, thereby lowering computational complexity while preserving critical information. This step also prevents overfitting by eliminating unnecessary variations in the data. Finally, the high-level representations generated by CNN are passed to the Bi-LSTM layer, allowing the model to capture long-term dependencies in ECG signals. By first extracting spatial dependencies through CNN, the model ensures that the LSTM can focus on learning sequential relationships. CNN serves as an automated feature extractor, transforming raw ECG signals into meaningful representations that improve the efficiency and accuracy of anomaly detection in the system.
Bi-LSTM layers: The Bi-LSTM layer plays a critical role in capturing the sequential dependencies and temporal patterns within ECG signals. Unlike traditional LSTMs, which process data in a single direction (forward or backward), Bi-LSTM processes the input in both directions, enabling the model to learn patterns from past and future time steps simultaneously. This is particularly important in ECG anomaly detection, where abnormalities may depend on both preceding and succeeding cardiac cycles. The Bi-LSTM layer follows the CNN feature extraction stage to ensure that the input passed to the LSTM network is already enriched with relevant spatial features. The CNN extracts local features, such as QRS complexes and wave morphologies, while the Bi-LSTM focuses on learning the temporal relationships between these features. This allows the model to recognize irregularities in heart rhythms that may not be apparent in isolated segments of the ECG signal. By leveraging Bi-LSTM, the model effectively detects anomalies by understanding both short-term fluctuations and long-term dependencies in ECG patterns. This enhances the robustness of the anomaly detection system and improves its ability to differentiate between normal and abnormal heartbeats. In addition, the return_sequences = True parameter in the Bi-LSTM layer ensures that the output maintains its sequential structure, which allows subsequent layers, such as the attention mechanism, to refine the learned representations further.
Multi-Head Attention layers: The Multi-Head Attention layer plays a vital role in enhancing the ECG-CBA model's ability to focus on essential features within the ECG signal. Unlike traditional sequential models that process data linearly, the attention mechanism enables the model to selectively attend to different parts of the ECG signal simultaneously, capturing both local and long-range dependencies effectively. The use of the attention layer enables the model to recognize subtle variations in ECG signals that may indicate arrhythmias or other abnormalities.
Normalization layers: Finally, a normalization layer is added to stabilize training and enhance feature consistency, ensuring that the attended information is well-integrated before final classification.
4.2.2. ECG Decoder Blocks
The decoder block in the ECG anomaly detection model is responsible for transforming the encoded feature representations into a final classification decision. It consists of a Flatten layer, which converts the multi-dimensional feature representations from the encoder into a 1D vector, making it suitable for classification. This is followed by a Dense layer with a sigmoid activation function, which outputs a probability score between 0 and 1, determining whether the ECG signal is normal (0) or anomalous (1). The decoder receives the processed feature representations from the encoder. By flattening these high-dimensional features and applying a dense layer with sigmoid activation, the decoder enables binary classification. It plays a crucial role in translating learned feature representations into meaningful predictions, allowing the model to distinguish between normal and abnormal heartbeats effectively.
4.3. Classification
The model is trained on the training data and evaluated on the test set, with performance metrics such as accuracy, precision, recall, and F1-score calculated to assess its effectiveness. The results demonstrate the model’s ability to classify ECG signals as normal or abnormal, providing a robust solution for ECG-based anomaly detection. This approach leverages the strengths of CNNs, Bi-LSTMs, and attention mechanisms to achieve high accuracy and interpretability in detecting cardiac abnormalities.
In binary classification problems, models often output a probability score between 0 and 1, which must be converted into a class label using a decision threshold. A standard default threshold is 0.5, but this may not be optimal for anomaly detection. To find the optimal classification threshold, we used Youden’s J statistic [
37], which identifies the threshold that maximizes the difference between the True Positive Rate (TPR) and the False Positive Rate (FPR):

$$J(T) = \text{TPR}(T) - \text{FPR}(T)$$

The threshold that maximizes J is chosen as the optimal threshold:

$$T^{*} = \arg\max_{T} J(T)$$

where $T^{*}$ is the optimal threshold.
The ROC-based threshold selection method improves model performance by dynamically choosing the threshold that maximizes sensitivity while minimizing false alarms. This method is particularly advantageous in applications such as anomaly detection, medical diagnosis, and fraud detection, where the costs of False Positives and False Negatives are critical. In heartbeat anomaly detection, a lower threshold helps catch all possible cases, even at the cost of some false alarms.
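A common way to implement this selection with scikit-learn is sketched below, assuming the predicted probabilities `y_prob` and true labels `y_test` from an earlier evaluation step are available:

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, y_prob):
    """Return the decision threshold that maximizes J = TPR - FPR."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    j_scores = tpr - fpr
    return thresholds[np.argmax(j_scores)]

# Example usage with model outputs (y_prob = model.predict(X_test).ravel()):
# best_t = youden_threshold(y_test, y_prob)
# y_pred = (y_prob >= best_t)
```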
The pseudocode of the proposed ECG-CBA is outlined in Algorithm 1.
Algorithm 1 ECG-CBA: ECG Classification using CNN, Bi-LSTM, and Attention.
1: Input: Training set (normal and anomaly ECG samples), Validation set (normal and anomaly ECG samples), Test set (normal and anomaly ECG samples)
2: Output: Predicted test ECG as normal or anomaly
3: Hyper-parameters: Activation: Sigmoid, batch_size = 32, Learning rate: 0.001, Optimizer: Adam, Loss: Binary Cross-Entropy (BCE), EarlyStopping(patience = 5)
4: Training Phase
5: for each ECG sample in the training set do
6:     Pass the sample through CNN layers to capture local patterns
7:     Process features through Bi-LSTM layers to learn temporal dependencies
8:     Apply Multi-Head Attention to highlight important features
9:     Normalize the pattern representation of the ECG signal
10:    Pass the encoded representation through a fully connected output layer (sigmoid activation)
11:    Compute and minimize the binary cross-entropy loss
12: end for
13: Validate the model using validation ECG samples and compute validation loss
14: Testing Phase
15: for each test ECG sample (normal or anomaly) do
16:    Pass the test sample through the trained ECG-CBA model
17:    Compute the classification probability P
18:    Determine the threshold T using manual tuning or adaptive thresholding (Youden's J method)
19:    if P ≥ T then
20:        Predict ECG as Normal
21:    else
22:        Predict ECG as Abnormal
23:    end if
24: end for
5. Implementation and Training Framework of the ECG-CBA Model
This section describes the implementation details and training configuration of the proposed ECG-CBA model to ensure experimental reproducibility.
5.1. Model Implementation
Each ECG sample serves as the input to the model, represented as a one-dimensional sequence of timesteps with a single feature channel. Prior to training, the data are reshaped to (samples, timesteps, 1) for CNN compatibility and normalized to the range [0, 1].
The encoder network combines convolutional, recurrent, and attention-based components to extract hierarchical temporal features from the ECG signals. Its structure is summarized as follows:
Conv1D Layer 1: 32 filters, kernel size = 3, stride = 1, activation = ReLU, padding = “same”. Output shape: (140, 32) for the 140-timestep ECG5000 input.
MaxPooling1D Layer 1: Pool size = 2. Output shape: (70, 32).
Conv1D Layer 2: 64 filters, kernel size = 3, stride = 1, activation = ReLU, padding = “same”. Output shape: (70, 64).
MaxPooling1D Layer 2: Pool size = 2. Output shape: (35, 64).
Bidirectional LSTM: 64 units per direction, return_sequences = True. Output shape: (35, 128).
Multi-Head Self-Attention: Four heads, key dimension = 64. Output shape: (35, 128).
Layer Normalization: Normalizes attention outputs.
The Bi-LSTM layer produces an output tensor of shape (35, 128), corresponding to 35 timesteps with 128 features obtained by concatenating the forward and backward LSTM outputs (each with 64 units). This tensor is used as the input to the Multi-Head Self-Attention mechanism. The same tensor provides the Query, Key, and Value representations, each projected through learned weight matrices that map the 128-dimensional features to the 64-dimensional key space of each head. The attention layer employs four heads with a key dimension of 64.
Within each head, pairwise similarity scores between the query and key vectors are computed and normalized using the softmax function to derive attention weights. These weights scale the Value representations to generate context-aware features. The four head outputs, each of dimension 64, are concatenated into a (35, 256) tensor and linearly transformed back to (35, 128) using an output projection. Layer Normalization follows this step to stabilize model convergence and maintain consistent feature distributions. This self-attention process enables the model to capture long-range temporal dependencies across ECG sequences, complementing the localized and sequential representations learned by the CNN and Bi-LSTM layers.
The decoder module consists of a flattening layer followed by a single dense neuron with a sigmoid activation function. This configuration outputs a probability value between 0 and 1, indicating the likelihood of an anomalous ECG segment.
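Putting the encoder and decoder together, the described architecture can be expressed in Keras roughly as follows; this is a sketch consistent with the layer settings listed above, and details such as the function and model names are assumptions:

```python
import tensorflow as tf

def build_ecg_cba(timesteps=140):
    """ECG-CBA encoder-decoder sketch: CNN -> Bi-LSTM -> Multi-Head Attention -> Dense."""
    inputs = tf.keras.Input(shape=(timesteps, 1))

    # Encoder: convolutional feature extraction
    x = tf.keras.layers.Conv1D(32, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling1D(2)(x)
    x = tf.keras.layers.Conv1D(64, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.MaxPooling1D(2)(x)

    # Encoder: bidirectional temporal modeling (output feature size 128 = 2 x 64)
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True)
    )(x)

    # Encoder: multi-head self-attention (Q = K = V = Bi-LSTM output) + normalization
    attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=64)(x, x)
    x = tf.keras.layers.LayerNormalization()(attn)

    # Decoder: flatten and binary classification head
    x = tf.keras.layers.Flatten()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs, name="ECG_CBA")

model = build_ecg_cba(timesteps=140)
model.summary()
```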
5.2. Training Configuration
As shown in
Table 1, the model was trained using the Adam optimizer with an initial learning rate of 0.001; the remaining Adam hyperparameters (β₁, β₂, and ε) are listed in Table 1. A fixed batch size of 32 and a maximum of 50 epochs were adopted. Early stopping monitored validation loss with a patience of five epochs and the minimum delta given in Table 1, ensuring training termination when performance plateaued. The binary cross-entropy loss function was employed, and accuracy served as the primary performance metric. Model checkpoints were automatically saved at the lowest validation loss to prevent overfitting.
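A sketch of the corresponding compilation and training setup is given below; the callback arguments mirror the values above, while the checkpoint file name and the validation split are assumptions:

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

callbacks = [
    # Stop when validation loss stops improving and restore the best weights.
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True
    ),
    # Keep the checkpoint with the lowest validation loss.
    tf.keras.callbacks.ModelCheckpoint(
        "ecg_cba_best.keras", monitor="val_loss", save_best_only=True
    ),
]

history = model.fit(
    X_train, y_train,
    validation_split=0.1,
    epochs=50,
    batch_size=32,
    callbacks=callbacks,
)
```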
Post-training evaluation included accuracy, precision, recall, F1-score, and AUC metrics. All experiments were conducted using TensorFlow 2.15 and the Keras 3 API to ensure replicable and transparent results.
7. Experiments
7.1. Experiments Setup
The experiments were conducted using Google Colab for training on the ECG5000 dataset and Kaggle's cloud-based environment for training on the MIT-BIH dataset. Google Colab was selected for ECG5000 due to its availability of a Tesla T4 GPU (16 GB VRAM, NVIDIA, Santa Clara, CA, USA), which was sufficient for handling the relatively smaller dataset. For the larger MIT-BIH dataset, Kaggle's NVIDIA A100 GPU environment was utilized due to its higher computational capacity, allowing for efficient training on a more extensive dataset.
7.2. Experimental Results
The proposed ECG-CBA model was trained with the same architecture for both the ECG5000 and MIT-BIH datasets. The architecture consisted of two Conv1D layers, where the first had 32 filters with a ReLU activation function, followed by MaxPooling with a pool size of two. The second Conv1D layer had 64 filters, also using ReLU activation and MaxPooling with a pool size of two. After convolutional feature extraction, a Bi-LSTM layer with 64 units was applied. Additionally, a Multi-Head Attention mechanism with four heads and a key dimension of 64 was integrated to enhance important sequential patterns. Finally, a Layer Normalization step was used to stabilize training and improve model convergence. The proposed model was trained for 200 epochs using an early stopping mechanism (patience = 5) with a learning rate of 0.001 and a binary cross-entropy loss function.
Table 3 presents the classification performance of the proposed ECG-CBA on the ECG5000 and MIT-BIH datasets, using the following metrics: accuracy, precision, recall, and F1 score. The ECG5000 model demonstrates exceptional performance, achieving near-perfect scores across all metrics, with 99.6% accuracy, 99.31% precision, 100% recall, and 99.65% F1 score. The perfect recall score of 1.0 indicates that the model correctly identifies all positive cases without false negatives, while the high precision shows minimal False Positives. In addition, the MIT-BIH dataset yields slightly lower but still strong results, with 98.80% accuracy, 98.55% precision, 96.90% recall, and 97.72% F1 score. Both datasets show balanced precision and recall, as evidenced by F1 scores closely aligned with their accuracy, indicating robust model performance in both cases.
Figure 9 and
Figure 10 clearly show that both the training and validation accuracy continuously increase over the epochs and eventually stabilize.
We evaluate the performance of our proposed model using the ROC curve, which illustrates how well the classification model performs across different thresholds. As shown in
Figure 11 and
Figure 12, the ROC curve of the ECG-CBA model exhibits a steep ascent toward the top-left corner, indicating a strong classification capability. Additionally, the Area Under the Curve (AUC-ROC) is very close to 1.0, indicating near-perfect discrimination between normal and abnormal ECG signal cases.
The confusion matrix results demonstrate the strong performance of the proposed ECG-CBA model in classifying ECG5000 test samples into Normal and Abnormal rhythm categories.
As shown in
Figure 13 and
Figure 14, the confusion matrices demonstrate a strong classification performance of the ECG-CBA model in distinguishing between normal and abnormal cases on both ECG5000 and MIT-BIH datasets. For normal instances, it correctly classified 436 (ECG5000) and 1717 (MIT-BIH) cases, with only four (ECG5000) and 34 (MIT-BIH) False Positives (misclassified as abnormal). For abnormal instances, it accurately identified 558 (ECG5000) and 1509 (MIT-BIH) cases, with just two (ECG5000) and 49 (MIT-BIH) False Negatives (misclassified as normal). This reflects robust classification accuracy across both datasets, with slightly higher precision on the ECG5000 dataset. This indicates a high level of accuracy with minimal misclassification errors. The low False Positive Rate suggests strong specificity, ensuring normal cases are rarely misidentified as abnormal. Similarly, the low false negative rate highlights the model’s excellent sensitivity, ensuring that abnormal cases are accurately detected. The balance between precision and recall confirms the model’s reliability in detecting ECG anomalies, making it highly effective for real-world applications in medical diagnostics.
7.3. Sensitivity Analysis
To achieve optimal performance of the ECG-CBA model, a series of experiments was conducted to analyze parameter sensitivity and fine-tune the model by adjusting key hyperparameters. These included the Threshold selection, the number of neurons per convolutional layer, the direction of the LSTM (forward vs. bidirectional), the number of multi-head attention mechanisms, and the training-to-testing data split percentage.
7.3.1. Threshold Selection
In
Table 4, the analysis of different threshold values on the ECG5000 and MIT-BIH datasets reveals significant variations in classification performance. The thresholds of 0.3 and 0.5 were manually selected, while the 0.7 threshold was determined using Youden’s method, optimizing the balance between sensitivity and specificity. At 0.3, recall is the highest (100% for ECG5000, 97.93% for MIT-BIH), but precision is lower due to a higher number of false positives. Increasing the threshold to 0.5 improves precision (98.97% for ECG5000, 97.79% for MIT-BIH) while maintaining high accuracy. At 0.7, accuracy reaches its peak (99.60% for ECG5000, 98.80% for MIT-BIH), with better precision but slightly lower recall. These findings indicate that lower thresholds are more suitable for scenarios where missing positive cases are critical, such as detecting cardiac anomalies. In comparison, higher thresholds are beneficial in reducing false alarms and ensuring more reliable positive classifications.
7.3.2. Training and Testing Splits
Table 5 presents the performance comparison of the ECG-CBA model on the ECG5000 and MIT-BIH datasets using different train–test splits. The results demonstrate that the ECG-CBA model maintains the highest performance across different train–test splits. While the training data is considerably reduced, the accuracy, precision, recall, and F1 score remain high, particularly for the ECG5000 dataset. Even with only 10% of the data used for training, the model still achieves an accuracy of 0.9859 for ECG5000 and 0.9283 for MIT-BIH. This indicates the robustness of the model, as it generalizes well despite limited training data.
7.3.3. CNN Layers
Table 6 compares the performance of different CONV1D configurations on the ECG5000 and MIT-BIH datasets. The results indicate that increasing the number of filters generally improves performance up to a certain point. The CONV1D 32, 64 configuration achieves the highest accuracy (0.9960 for ECG5000 and 0.9880 for MIT-BIH), demonstrating that this setting effectively captures features from the ECG signals. The CONV1D 16, 32 configuration performs slightly lower, while the CONV1D 64, 128 configuration results in a slight drop in accuracy and F1 score, likely due to overfitting or an increased complexity that does not generalize as well. These findings suggest that a balanced approach, rather than simply increasing or decreasing filters, is essential for optimal performance.
7.3.4. Bi-LSTM Layer
Table 7 compares the performance of Forward LSTM and Bi-LSTM on the ECG5000 and MIT-BIH datasets. The results show that Bi-LSTM outperforms Forward LSTM across all metrics for both datasets. Specifically, Bi-LSTM achieves higher accuracy (0.9960 vs. 0.9860 for ECG5000 and 0.9880 vs. 0.9646 for MIT-BIH), indicating its ability to better capture temporal dependencies in ECG signals. The improvement in recall (1.0000 vs. 0.9862 for ECG5000 and 0.9690 vs. 0.9346 for MIT-BIH) suggests that Bi-LSTM is more effective in correctly identifying abnormal heartbeats. These results confirm that considering both past and future contexts, as Bi-LSTM does, enhances the model’s performance compared to a unidirectional LSTM.
7.4. Ablation Experiments
Table 8 presents a comprehensive performance comparison of various model architectures utilizing CNN, Bi-LSTM, and Attention mechanisms on the ECG5000 and MIT-BIH datasets. The performance comparison of different model configurations highlights the effectiveness of integrating CNN, Bi-LSTM, and Attention mechanisms for ECG anomaly detection. The best-performing configuration, which includes all three components, achieves the highest accuracy (0.9960 for ECG5000 and 0.9880 for MIT-BIH), demonstrating that CNN extracts spatial features, Bi-LSTM captures long-term dependencies, and the Attention mechanism enhances feature weighting. Removing the Attention layer results in a noticeable performance drop, particularly in recall, suggesting that Attention enhances the model’s ability to detect anomalies in the minority class. Similarly, excluding Bi-LSTM while retaining CNN and Attention results in a decrease in accuracy and F1 score, confirming that Bi-LSTM plays a crucial role in learning sequential dependencies. Models relying solely on CNN or Bi-LSTM exhibit significantly lower performance, showing that these components complement each other—CNN excels at feature extraction, while Bi-LSTM models temporal dependencies. Additionally, the results indicate that ECG5000 outperforms MIT-BIH across all configurations, likely due to differences in class distribution between the datasets. While CNN + Bi-LSTM remains a viable trade-off for lightweight models, the combination of CNN, Bi-LSTM, and Attention provides the most robust performance, achieving optimal accuracy and recall. This comprehensive hybrid model proves to be the most effective for ECG anomaly detection across different datasets and train–test splits.
7.5. Comparison Performance with Related Works
In this section, we have included in the comparison only models that were tested on the entire set of features of the datasets rather than on a subset of features, since the selection of only a sample of features could introduce bias in accuracy and lead to a feature selection process that is not suitable for real-time applications. This ensures a fair evaluation and highlights the superiority of direct feature extraction over signal reconstruction, making the proposed ECG-CBA model more accurate, robust, and efficient for real-time ECG classification.
In
Table 9, we compare ECG classification models based on two primary approaches: signal reconstruction-based models and feature extraction-based models. Signal reconstruction-based models, such as ECG-NET [
14] (98.36% on ECG5000), AttentivECGRU [
28] (99.14% on ECG5000), and Qin et al. [
15] (95.50% on MIT-BIH), rely on reconstructing ECG signals and identifying anomalies based on reconstruction errors. While these models achieve high accuracy, they are sensitive to noise and struggle with detecting unseen abnormal patterns. In contrast, feature extraction-based models, including Farag et al. [
23] (98.18% on MIT-BIH), and Pham et al. [
26] (98.28% on MIT-BIH), directly learn discriminative features from ECG signals, improving robustness and adaptability.
Notably, almost all existing models have been tested on a single dataset, making it difficult to assess their generalizability across different ECG sources. To the best of our knowledge, no prior model has been evaluated on both the ECG5000 and MIT-BIH datasets. In contrast, the proposed ECG-CBA model has been tested on both datasets, demonstrating superior adaptability and robustness. As shown, ECG-CBA outperforms other models, achieving 99.60% on ECG5000 and 98.80% on MIT-BIH. These results confirm that ECG-CBA provides a highly accurate, reliable, and generalizable solution for real-time ECG anomaly detection.