1. Introduction
Electrocardiography (ECG) is an efficient, non-invasive, and low-cost tool widely used for the analysis and diagnosis of arrhythmias and other cardiac diseases. However, ECG signals are usually contaminated by various types of noise, including baseline wander (BW), muscle artifacts (MA), and electrode movement (EM) [1]. These noise components can obscure crucial diagnostic features of the ECG, including the P-wave, QRS complex, and T-wave, which may result in misdiagnosis of cardiac abnormalities [2].
In practice, the 12-lead ECG represents the most routine and crucial diagnostic tool for identifying arrhythmias. Zhang et al. [3] investigated inter-lead relationships in a 12-lead ECG dataset by analyzing the impact of lead absence on deep neural networks and quantifying the correlation strength of individual leads through a projection-based process, demonstrating that the removal of certain leads can substantially degrade diagnostic performance.
Several approaches have been reported in the literature for ECG signal denoising. In recent years, neural network-based denoising algorithms, commonly referred to as denoising autoencoders (DAEs), have shown promising performance in noise reduction. These include fully convolutional DAEs [1,4], stacked denoising autoencoders [5], and the running denoising autoencoder (RunDAE) model [6]. Convolutional DAEs in particular have performed well in denoising biomedical signals such as ECG. One notable advantage of the convolutional DAE model is its ability to denoise ECG signals without relying on an R-peak detection algorithm to align input segments. However, a significant drawback of convolutional DAE models is their complexity, as they require multiple hidden layers and long input segments to achieve effective denoising performance [6].
In [1], the proposed convolutional DAE, i.e., the well-known fully convolutional neural network (FCN)-based DAE, demonstrated superior denoising performance in removing physical noises from ECG signals. Bouny, Khalil, and Adib (2021) further demonstrated the effectiveness of the convolutional DAE model in removing additive Gaussian white noise from ECG signals, with promising results in terms of signal-to-noise ratio ($SNR$) and root mean square error ($RMSE$) [7]. In [8], a stacked contractive denoising autoencoder was proposed for ECG signal denoising, which showed significant improvements in $SNR$ and $RMSE$. In general, these studies suggest that convolutional DAE models, including FCN-based and stacked contractive variants, are effective in denoising ECG signals, with potential applications in clinical practice.
Ref. [9] investigated the effects of correlated, uncorrelated, and jittered datasets on denoising autoencoder (DAE) performance, demonstrating that correlated ECG segments across Einthoven leads I, II, and III require fewer hidden neurons for effective denoising. In addition, a novel architecture termed the multiple parallel hidden layers DAE (MPHL-DAE) was introduced in [10], which showed promise in capturing distinct ECG signal features and achieved superior or comparable signal-to-noise ratio improvements ($SNR_{imp}$) compared with the conventional multiple hidden layers DAE (MHL-DAE).
However, these ECG denoising methods often rely on single-lead processing, which fails to exploit the inherent correlations between different leads in a standard 12-lead ECG. This limitation can result in suboptimal denoising performance. Using multiple single-lead models can also be time-consuming and computationally expensive, particularly with a high number of hidden layers, and it still does not leverage the correlations between ECG leads to reduce contaminating noise. Moreover, the MPHL-DAE model and convolutional DAE model have only been evaluated on Gaussian white noise; in [10], the authors suggest evaluating MPHL-DAE performance on other types of noise that can affect ECG signals (e.g., baseline wander, electrode movement).
Multi-lead autoencoders can leverage the natural correlation between ECG leads to denoise signals by analyzing all leads simultaneously, efficiently isolating clean signals from noise. This correlation could enhance the autoencoder’s ability to learn a more accurate and robust representation of the ECG signals, resulting in better denoising performance. For example, the network can recognize that similar leads should have similar QRS complex shapes. By processing multiple leads simultaneously, this type of autoencoder can learn complex noise patterns that are often difficult to remove using traditional filtering techniques. Multi-lead autoencoders can also be trained in an unsupervised manner, eliminating the need for large, labeled datasets of clean and noisy ECGs. This approach is advantageous because obtaining high-quality clean ECG recordings can be challenging [11].
Therefore, we propose a multi-lead convolutional denoising autoencoder (ML-CDAE) for ECG signal denoising. The architecture of the ML-CDAE is inspired by the multiple-input and multiple-output neural network (MIMO-NN) framework [12]. We then compare its performance with the traditional single-lead convolutional denoising autoencoder (SL-CDAE) and the current state-of-the-art FCN-DAE model using both quantitative and qualitative evaluation, including metrics such as $SNR_{imp}$ and $MSE$.
3. Materials and Methods
3.1. Data Acquisition and Preparation
In this work, both simulated and real multi-lead ECG datasets were employed to develop and evaluate the proposed ML-CDAE denoising model. For controlled experimentation and the availability of ground-truth signals, a well-established ECG signal simulator was used to generate clean 12-lead ECG recordings [23,28]. Obtaining noise-free multi-lead ECG recordings in real clinical settings is inherently challenging; therefore, the simulator provides an effective framework for systematic analysis. The simulated ECG signals include all 12 standard leads: bipolar limb leads (I, II, III), augmented unipolar limb leads (aVR, aVL, aVF), and unipolar chest leads (V1–V6).
To enhance realism and ensure applicability to real-world scenarios, recorded physical noise from the MIT-BIH stress test noise database was incorporated into the simulated ECG signals [24]. Specifically, a mixture of physical noises (MoN), including muscle artifacts, electrode motion artifacts, and baseline wander, was added to the clean ECG signals. The noise signals were superimposed on the corresponding ECG leads to preserve inter-lead consistency. All simulated ECG signals were generated at a sampling frequency of 250 Hz. Sixty-second recordings were generated for each lead from 50 simulated subjects. The resulting dataset was divided into training (60%), validation (20%), and testing (20%) subsets.
The training and validation datasets were corrupted at input signal-to-noise ratio levels of −5 dB, 0 dB, and 5 dB. To evaluate model generalization under unseen noise conditions, the testing dataset was corrupted at input signal-to-noise ratio levels of −7 dB, 0 dB, and 7 dB.
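The corruption step described above can be illustrated with a minimal sketch. It assumes that the input signal-to-noise ratio is defined from mean signal and noise power and that the noise record is rescaled by a single gain; the function name `add_noise_at_snr` and the sinusoidal and Gaussian stand-in signals are hypothetical and are not taken from the original work.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + scaled noise has the requested input SNR (dB)."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # From SNR_dB = 10 * log10(P_signal / P_noise), solve for the target noise power.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / noise_power)
    return clean + gain * noise

rng = np.random.default_rng(0)
t = np.arange(0, 60, 1 / 250)            # 60 s at 250 Hz, as in the simulated data
clean = np.sin(2 * np.pi * 1.2 * t)      # hypothetical stand-in for one clean ECG lead
noise = rng.standard_normal(t.size)      # hypothetical stand-in for a recorded noise trace
noisy = {snr: add_noise_at_snr(clean, noise, snr) for snr in (-5, 0, 5)}
```

Applying the same gain to every lead of a recording would be one way to keep the noise consistent across leads, although the text does not specify how the gain was chosen.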
In addition to simulated data, real ECG recordings from the St. Petersburg INCART 12-lead Arrhythmia Database [29] were used to further validate the proposed model. This database consists of 12-lead ECG recordings sampled at 257 Hz. A total of 75 records were initially considered. An automatic screening procedure was applied to each record to extract the cleanest continuous 60 s fragment, i.e., the one with the lowest estimated noise level, following the algorithm described in [30]. The fragment selection was based on the average noise level computed across all 12 leads, ensuring a global assessment of signal quality rather than lead-specific optimization. The selected 60 s fragments share identical time stamps across all leads, thereby preserving inter-lead temporal alignment and physiological correlation. This step ensured the selection of high-quality, multi-lead reference segments suitable for denoising experiments.
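The idea of a global, lead-averaged fragment selection can be sketched as follows. This is only an illustration: the noise score used here (variance of the first difference) is a hypothetical proxy and is not the estimator of [30], and the function name, the 1 s search step, and the array layout (leads by samples) are assumptions.

```python
import numpy as np

def cleanest_fragment(record, fs, seconds=60, step_seconds=1):
    """Return (start, stop) sample indices of the window with the lowest mean noise score.

    record: array of shape (n_leads, n_samples).
    The noise score is a HYPOTHETICAL proxy, not the algorithm of [30].
    """
    record = np.asarray(record, dtype=float)
    win = int(seconds * fs)
    step = int(step_seconds * fs)
    best_start, best_score = 0, np.inf
    for start in range(0, record.shape[1] - win + 1, step):
        window = record[:, start:start + win]
        per_lead = np.var(np.diff(window, axis=1), axis=1)  # one score per lead
        score = per_lead.mean()                              # averaged over all leads
        if score < best_score:
            best_start, best_score = start, score
    # The same indices are used for every lead, so the leads stay time-aligned.
    return best_start, best_start + win
```

Because a single pair of indices is returned for the whole record, slicing `record[:, start:stop]` yields a fragment with identical time stamps on all leads.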
Prior to segmentation and model input, the real ECG recordings were normalized on a per-lead basis using min–max normalization. For each lead, signal amplitudes were linearly scaled to the range [0, 1] based on the minimum and maximum values computed over the corresponding 60 s reference segment. This normalization step was applied to reduce inter-patient and inter-lead amplitude variability while preserving relative waveform morphology and inter-lead relationships. The same normalization procedure was applied consistently across training, validation, and testing subsets.
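A minimal sketch of per-lead min–max normalization is given below, assuming the 60 s reference segment is stored as an array of shape (leads, samples). The function name and the small `eps` guard against a constant lead are assumptions added for illustration.

```python
import numpy as np

def minmax_per_lead(segment, eps=1e-12):
    """Scale each lead (row) of a (n_leads, n_samples) array to the range [0, 1]."""
    segment = np.asarray(segment, dtype=float)
    lo = segment.min(axis=1, keepdims=True)   # per-lead minimum
    hi = segment.max(axis=1, keepdims=True)   # per-lead maximum
    # eps avoids division by zero for a flat lead (an assumption, not from the source).
    return (segment - lo) / np.maximum(hi - lo, eps)

example = np.array([[0.0, 5.0, 10.0],
                    [-1.0, 0.0, 1.0]])
scaled = minmax_per_lead(example)
```

Each row is scaled independently, so both example leads map to the same values (0, 0.5, 1) despite their different amplitude ranges.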
From the extracted segments, 60 recordings were randomly selected for training, while the remaining 15 were reserved exclusively for testing the proposed model. To maintain consistency with the simulated data experiments, the same noise types and levels were applied to the real ECG recordings.
Finally, both simulated and real noisy ECG signals were segmented into fixed-length windows of 256, 512, or 1024 samples. This segmentation strategy was adopted to investigate the effect of segment length on the denoising performance of the proposed ML-CDAE and SL-CDAE models.
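The segmentation step can be sketched as below. The sketch assumes non-overlapping windows and discards the incomplete tail of each recording; the text does not state whether windows overlap, so both choices, like the function name, are assumptions.

```python
import numpy as np

def segment_windows(record, length):
    """Split a (n_leads, n_samples) array into windows of shape (n_windows, n_leads, length)."""
    record = np.asarray(record)
    n_windows = record.shape[1] // length
    trimmed = record[:, :n_windows * length]          # drop the incomplete tail
    return trimmed.reshape(record.shape[0], n_windows, length).transpose(1, 0, 2)

record = np.zeros((12, 60 * 250))                     # 12 leads, 60 s at 250 Hz
shapes = {n: segment_windows(record, n).shape for n in (256, 512, 1024)}
```

Keeping the lead axis inside each window means that every window still holds the 12 leads over the same time span.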
3.2. Architecture of ML-CDAE Model
The proposed convolutional denoising autoencoder has a multiple-input and multiple-output (MIMO) architecture that denoises multi-lead ECG signals simultaneously, as shown in Figure 3A. We therefore denote this model the multi-lead convolutional denoising autoencoder (ML-CDAE). The ML-CDAE accepts multiple inputs in order to leverage the inter-lead correlation among the 12-lead ECG signals, in contrast to the single-lead convolutional denoising autoencoder (SL-CDAE) (see Figure 3B), which maps a single input to a single output (SISO). The encoder of the ML-CDAE consists of two sequential convolutional layers for each input. The outputs of the second convolutional layers (referred to as Conv. Layer 2) are combined by a merge layer, which computes their average. The first and second convolutional layers use kernels of size 16 and 8, respectively, with 128 filters each. Within these layers, the number of neurons is progressively reduced by applying a stride of 2 with zero-padding at each hidden layer. In the decoder, the output of the merge layer is passed through two transpose convolutional layers whose hyperparameters mirror those of the encoder’s hidden layers. However, the final transpose convolutional layer contains only a single filter, so that its output matches the length of the input segment.
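The output of each layer is given by the following equations [1]:
$$h = f(Wx + b)$$
$$\hat{x} = g(W'h + b')$$
where $x$ is the layer input, $h$ is the encoded representation, and $\hat{x}$ is the reconstructed output; $W$ and $b$ represent the weight and bias matrices of the encoder layer, respectively, and $W'$ and $b'$ represent the weight and bias matrices of the decoder layer, respectively. Meanwhile, $f$ and $g$ denote non-linear activation functions.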
The single-lead CDAE model shares the same architecture as the ML-CDAE model in terms of the number of hidden layers and their hyperparameters, providing a fair comparison. However, it is designed for a single input layer and excludes the merge layer used in the ML-CDAE, as depicted in Figure 3B. The proposed models, ML-CDAE and SL-CDAE, were implemented using the following hardware and software setup:
Processor (CPU): Apple M1 chip with an 8-core CPU (4 performance cores and 4 efficiency cores) (Apple Inc., Cupertino, CA, USA).
Graphics Processing Unit (GPU): Integrated 8-core GPU (Apple Inc., Cupertino, CA, USA).
Memory (RAM): 8 GB unified memory.
Software environment: Python (v3.9.13); TensorFlow (v2.11); NumPy (v1.23.5); SciPy (v1.9.3).
For compilation, the mean square error was used as the loss function and adaptive moment estimation (Adam; considering the following values of learning rate , exponential decay rate ) as the optimizer. All proposed models were trained for 200 epochs with ReLU as the activation function.
3.3. Performance Evaluation
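The performance of the proposed models was evaluated based on the $SNR_{imp}$ of the ECG signals achieved by the network. $SNR_{imp}$ is a metric that quantifies the improvement in signal strength relative to the noise level following the application of ML-CDAE. Higher $SNR_{imp}$ values indicate a greater separation between the pure ECG signal and the noise, i.e., a larger gain in signal quality from the denoising process [4]. This metric is estimated for a channel $k$ of a signal as follows:
$$SNR_{imp}^{(k)} = 10\log_{10}\frac{\sum_{i=1}^{M}\left(\tilde{x}_i^{(k)} - x_i^{(k)}\right)^2}{\sum_{i=1}^{M}\left(\hat{x}_i^{(k)} - x_i^{(k)}\right)^2}$$
where $x_i^{(k)}$, $\tilde{x}_i^{(k)}$, and $\hat{x}_i^{(k)}$ denote the $i$-th sample of the clean, noisy, and denoised ECG signal of channel $k$, respectively, and $M$ is the number of samples.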
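The improvement metric can be sketched in Python as the difference between the output and input signal-to-noise ratios, each computed per channel against the clean reference. This is one common formulation and is assumed here; the function names and the array layout (leads by samples) are hypothetical.

```python
import numpy as np

def snr_db(clean, estimate):
    """SNR (dB) of `estimate` with respect to `clean`, computed along the last axis."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_energy = np.sum(clean ** 2, axis=-1)
    error_energy = np.sum((estimate - clean) ** 2, axis=-1)
    return 10.0 * np.log10(signal_energy / error_energy)

def snr_improvement(clean, noisy, denoised):
    """SNR improvement (dB): output SNR minus input SNR, one value per channel."""
    return snr_db(clean, denoised) - snr_db(clean, noisy)
```

For example, a denoiser that reduces the residual noise amplitude by a factor of 10 yields an improvement of 20 dB, regardless of the input noise level.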
Another metric used to evaluate the results of ML-CDAE is the mean square error ($MSE$). The $MSE$ estimates the average squared difference between the denoised ECG and the clean ECG signal, expressing the error introduced by the denoising process. Lower values of $MSE$ indicate better denoising performance. Its equation is as follows [1]:
$$MSE = \frac{1}{M}\sum_{i=1}^{M}\left(x_i - \hat{x}_i\right)^2$$
where $x_i$ and $\hat{x}_i$ denote the $i$-th sample of the clean and denoised ECG signal, respectively, and $M$ denotes the total number of samples.
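A minimal sketch of the mean square error computation is shown below, assuming the clean and denoised signals are arrays of shape (leads, samples) so that one value is returned per lead. The function name and the toy arrays are hypothetical.

```python
import numpy as np

def mse_per_lead(clean, denoised):
    """Mean squared difference between clean and denoised signals along the last axis."""
    clean = np.asarray(clean, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    return np.mean((clean - denoised) ** 2, axis=-1)

clean = np.array([[0.0, 0.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0, 1.0]])
denoised = np.array([[1.0, 1.0, 1.0, 1.0],
                     [1.0, 1.0, 1.0, 3.0]])
errors = mse_per_lead(clean, denoised)
```

The second lead shows that a single large deviation can produce the same error as a constant small offset, which is one reason the squared-error metric is usually read alongside the waveform plots.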
5. Discussion
For both real and simulated ECG recordings, the results indicate that ML-CDAE outperforms SL-CDAE across all tested segment lengths and noise levels in key metrics such as $SNR_{imp}$ and $MSE$. This improvement arises from the model’s ability to exploit correlations between multiple leads in a standard 12-lead ECG. By analyzing signals from multiple leads, ML-CDAE can more effectively distinguish clean ECG signals from noise, thereby enhancing the denoising process [3]. The redundancy in multi-lead recordings allows the model to compensate for noise present in individual leads [25], improving robustness against common clinical interferences such as baseline wander, motion artifacts, and electrode motion noise [26]. Similar benefits have been observed in tensor decomposition approaches for single- and multi-lead ECG denoising [31]. As a result, ML-CDAE can also better suppress noise that appears on several leads at once.
Model size and latency indicate how feasible it is to deploy the proposed models in real-time clinical and wearable device applications. The ML-CDAE model has a larger memory footprint (approximately 12,498 KB) than FCN-DAE (3120 KB); this increase is primarily attributed to its multi-input architecture and large number of filters per layer, which enable hierarchical feature learning and enhanced denoising capability. Despite its larger size, ML-CDAE is faster at inference: it processes 12-lead input segments simultaneously with a latency of only 3.4 ms, whereas FCN-DAE requires nearly twice as long (6.4 ms) for the same task. Considering this trade-off between memory consumption and computational cost, along with the improved denoising performance, ML-CDAE offers a favorable balance for real-time multi-lead ECG denoising, particularly when optimizations such as model pruning, quantization, or reduced-precision arithmetic are applied.
Clinically, the improved denoising of ML-CDAE has direct benefits. Cleaner ECG signals improve the accuracy of arrhythmia detection and preserve critical waveform features such as P-waves, QRS complexes, and T-waves, reducing the risk of misdiagnosis (see Figure 4). Moreover, its ability to operate effectively in an unsupervised manner enhances accessibility for widespread clinical use [11].