3.1. Model Architecture
The different artifacts corrupting the ECG signal can be represented by two main components: a multiplicative component and an additive component. For example, respiratory motion causes the heart to change orientation and move from its original position (w.r.t. the measuring electrodes), causing amplitude modulation of the measured ECG [61]. Furthermore, the change in conductivity of the skin tissues due to chest movement and blood volume shifts in the area of measurement causes an effect known as baseline drift. Changes in body position, muscle tremors, and electrode motion can also cause similar effects [62]. To model these effects, we consider the non-ECG components as one of two components, i.e., a multiplicative component $m$ and an additive component $n$, which together account for the rotation, scaling, baseline drift, and additive Gaussian noise, as shown in Equation (3):

$\tilde{x} = m \otimes x + n$ (3)
where $x$ is the clean ECG signal, $\tilde{x}$ is the noisy ECG signal, and $\otimes$ denotes point-wise multiplication. The general aim is to obtain the clean ECG signal $x$ from the noisy ECG signal $\tilde{x}$ by suppressing the artifact components $m$ and $n$. However, in this study, we are concerned with the additive component only, which represents the majority of the different artifacts.
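For illustration, the following minimal NumPy sketch synthesizes a noisy segment according to the corruption model of Equation (3); the modulation frequency, modulation depth, drift amplitude, and noise level used here are illustrative assumptions only, not values used in our experiments.

import numpy as np

def corrupt_ecg(x, fs=360, mod_depth=0.1, drift_amp=0.2, noise_std=0.05, seed=0):
    # Synthesize a noisy ECG segment following Equation (3): x_tilde = m (*) x + n.
    # x: clean ECG segment (1D array); fs: sampling rate in Hz.
    # mod_depth, drift_amp, and noise_std are illustrative values only.
    rng = np.random.default_rng(seed)
    t = np.arange(len(x)) / fs
    # Multiplicative component m: slow amplitude modulation (e.g., respiration at ~0.3 Hz)
    m = 1.0 + mod_depth * np.sin(2 * np.pi * 0.3 * t)
    # Additive component n: baseline drift plus white Gaussian noise
    n = drift_amp * np.sin(2 * np.pi * 0.05 * t) + rng.normal(0.0, noise_std, size=len(x))
    return m * x + n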
A denoising autoencoder (DAE) [32] utilizing self-organized Operational Neural Network (self-ONN) [48] layers is proposed to remove the additive artifacts in ECG signals. The DAE attempts to reconstruct the original clean data $x$ from a corrupted version of these data $\tilde{x}$, usually obtained by applying a stochastic corruption process such as Additive White Gaussian Noise (AWGN) or other means of partial destruction of the data. The encoder part of the DAE, $f(\cdot)$, tries to map the corrupted input data to a lower-dimensional manifold while ignoring the variations of the input data. The decoder part of the DAE, $g(\cdot)$, then tries to reconstruct the clean data from the learned manifold. The overall DAE model can be summarized by Equation (4), where $\hat{x}$ is the estimated clean data:

$\hat{x} = g(f(\tilde{x}))$ (4)
An overview of the proposed model is shown in Figure 1. In our proposed model, we utilize the U-Net architecture [57], which has proven efficient for ECG denoising. In the U-Net architecture, the signal is passed through several encoder layers that find a latent representation of the input signal, which the decoder layers then use to reconstruct the clean ECG signal. After each encoder block, the length of the signal is compressed while the number of channels increases progressively; the opposite happens after each decoder block, where the signal length increases and the number of channels decreases. For the signal shape after each stage of the proposed model, we followed shapes similar to those proposed in [36], which we found to be well suited for our model.
The structures of the encoder and decoder blocks are shown in Figure 2. In the encoder block, we employ a gated self-ONN layer for feature extraction, in which two 1D self-ONN layers (kernel size = 9 × 1) process the input signal in parallel. One layer (input self-ONN) extracts features from the input signal, while the other layer (gate self-ONN), followed by a sigmoid activation, generates the gating mask applied to the extracted features. A dropout layer [63] is introduced in the gating path after the sigmoid activation to force the gate self-ONN to produce a robust gating mask and to prevent overfitting, which is inspired by [64]. The dropout layer is, by default, active only during training, and we additionally limit it to be active only when the number of channels is greater than one, which means, for example, that the dropout layer is not active in “Encoder block 5” shown in Figure 1. The same applies to all the dropout layers used anywhere in the proposed model. The gated self-ONN layer can be expressed by Equation (5):

$y = Q_{in}(x) \otimes D(\sigma(Q_{gate}(x)))$ (5)
where $x$ and $y$ are the input and output signals of the gated self-ONN layer, respectively, and $\otimes$ represents point-wise multiplication. $Q_{in}(\cdot)$ and $Q_{gate}(\cdot)$ represent the input and gate self-ONN layers, respectively, generating the input and gating feature maps, while $\sigma(\cdot)$ and $D(\cdot)$ represent the sigmoid activation and dropout layers, respectively.
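As a rough PyTorch sketch of the gated layer in Equation (5), the snippet below uses a standard Conv1d as a stand-in for the 1D self-ONN operator (a self-ONN implementation such as that of [48] would replace it in the actual model); the dropout rate and the channel-based dropout condition are illustrative assumptions.

import torch
import torch.nn as nn

class GatedLayer1d(nn.Module):
    # Gated feature-extraction layer per Equation (5). A plain Conv1d stands in
    # for the 1D self-ONN operator used in the actual model.
    def __init__(self, in_ch, out_ch, kernel_size=9, dropout=0.1):
        super().__init__()
        pad = kernel_size // 2
        self.input_layer = nn.Conv1d(in_ch, out_ch, kernel_size, padding=pad)   # Q_in
        self.gate_layer = nn.Conv1d(in_ch, out_ch, kernel_size, padding=pad)    # Q_gate
        # Dropout on the gating path only; disabled for single-channel blocks
        # (one possible reading of the channel condition described above).
        self.drop = nn.Dropout(dropout) if in_ch > 1 else nn.Identity()

    def forward(self, x):                                     # x: (batch, in_ch, length)
        feats = self.input_layer(x)                           # Q_in(x)
        gate = self.drop(torch.sigmoid(self.gate_layer(x)))   # D(sigma(Q_gate(x)))
        return feats * gate                                   # point-wise multiplication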
The decoder block uses a similar feature-extraction structure (Gated Deconv), where the only difference is that the 1D self-ONN layers are replaced by 1D transposed convolutional (a.k.a. deconvolution) layers (kernel size = 9 × 1). Equation (6) depicts the structure of the Gated Deconv block, where $T_{in}(\cdot)$ and $T_{gate}(\cdot)$ represent the input and gate deconvolution layers, respectively, generating the input and gating feature maps:

$y = T_{in}(x) \otimes D(\sigma(T_{gate}(x)))$ (6)
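A corresponding sketch of the Gated Deconv block of Equation (6), again with hypothetical hyperparameters, could look as follows; stride 2 with matching output padding doubles the signal length, providing the unpooling effect discussed in the next paragraph.

import torch
import torch.nn as nn

class GatedDeconv1d(nn.Module):
    # Decoder counterpart of the gated layer, per Equation (6): transpose
    # convolutions with stride 2 double the signal length (unpooling effect).
    def __init__(self, in_ch, out_ch, kernel_size=9, stride=2, dropout=0.1):
        super().__init__()
        pad = kernel_size // 2
        out_pad = stride - 1  # keeps the output length at stride * input length
        self.input_layer = nn.ConvTranspose1d(in_ch, out_ch, kernel_size, stride=stride,
                                              padding=pad, output_padding=out_pad)  # T_in
        self.gate_layer = nn.ConvTranspose1d(in_ch, out_ch, kernel_size, stride=stride,
                                             padding=pad, output_padding=out_pad)   # T_gate
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        feats = self.input_layer(x)                           # T_in(x)
        gate = self.drop(torch.sigmoid(self.gate_layer(x)))   # D(sigma(T_gate(x)))
        return feats * gate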
The feature extraction layers in the encoder and decoder blocks are followed by a normalization layer, for which we employ instance normalization [65], which in our experiments performed better than, for example, batch normalization [66]. Instance normalization operates on each individual sample independently, rather than normalizing across the entire batch as in batch normalization. For non-linearity, we use a LeakyReLU activation following the normalization layer. Then, in the encoder only, a max pooling layer is utilized after the activation to halve the signal length, as explained earlier. The decoder does not need a max-unpool layer since the stride is set to 2 in the “Gated Deconv” layers, which provides the needed unpooling effect. Finally, a channel attention module is utilized at the end of the encoder and decoder blocks. In our model, we use the Efficient Channel Attention (ECA) [35]. Yet, upon testing, we found that incorporating the information obtained from max pooling alongside the original average pooling resulted in better performance. This is similar to the channel attention proposed by [37]. In the proposed model, we introduce a small modification that enhances its performance. Namely, the input signal is max-pooled and average-pooled simultaneously, the pooled features are passed through a shared 1D convolutional layer, and each output is then activated by a sigmoid activation function. The activation outputs are added together to form the attention mask. A dropout layer is applied to the attention mask to enhance generalization and prevent overfitting. The channel attention can be expressed as follows:

$M = D(\sigma(\Phi(x_{max})) + \sigma(\Phi(x_{avg})))$ (7)
$y = M \otimes x$ (8)
where $x$ and $y$ are the input and output signals of the channel attention block, respectively. The $\Phi(\cdot)$ operator represents the shared 1D convolutional layer, and $\sigma(\cdot)$ and $D(\cdot)$ represent the sigmoid activation and dropout layers, respectively. The attention mask is referred to as $M$, while $x_{max}$ and $x_{avg}$ are the output features of applying global max pooling and global average pooling to the input signal, respectively. These are 1D features with $C$ data points, where $C$ is the number of channels in the input signal $x$. We follow the other configurations for the convolutional layer proposed by [35]; for the proposed model structure, the kernel size of the shared convolutional layer is 3 × 1, and “same” padding is used in all the channel attention layers.
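The modified channel attention of Equations (7) and (8) can be sketched in PyTorch as below; the dropout rate is an illustrative assumption, and the shared convolution slides across the channel dimension in the ECA style [35].

import torch
import torch.nn as nn

class DualPoolChannelAttention(nn.Module):
    # Channel attention per Equations (7) and (8): an ECA-style shared 1D
    # convolution applied to both global max- and average-pooled features.
    def __init__(self, kernel_size=3, dropout=0.1):
        super().__init__()
        self.shared_conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                                   # x: (batch, C, length)
        x_max = torch.amax(x, dim=-1, keepdim=True)         # (batch, C, 1)
        x_avg = torch.mean(x, dim=-1, keepdim=True)         # (batch, C, 1)
        # The shared convolution operates across the channel dimension (ECA style)
        a_max = torch.sigmoid(self.shared_conv(x_max.transpose(1, 2)))
        a_avg = torch.sigmoid(self.shared_conv(x_avg.transpose(1, 2)))
        mask = self.drop(a_max + a_avg).transpose(1, 2)     # attention mask M: (batch, C, 1)
        return x * mask                                     # broadcast over the length axis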
It is known that using residual connections, as in ResNet architectures [67], can help overcome issues like vanishing gradients and allows for the effective training of very deep networks. Recent works have shown that residual connections in U-Net architectures are beneficial for ECG denoising. Furthermore, it has been shown by [62] that using spatial attention with point convolution on the residual connections of a similar architecture helps retain the key features of the encoded feature maps and leads to better reconstruction in the decoder layers. Hence, in our work, we use the idea of residual gating proposed by Oktay et al. [68] to utilize the information from both the skip connection and the decoder output to form a gating for the decoder output, as shown in Figure 2. The proposed residual gating is more global and robust than using attention on pooled features as in [62], since both the spatial and channel information are retained and utilized effectively to generate the required gating mask. In the proposed residual gating, two convolutional layers with a kernel size of 1 × 1 are used to generate features from the residual connection signal and from the output of the preceding decoder block, respectively. A dropout layer is employed after the “Residual Conv” layer to put more emphasis on the output of the “Input Conv” layer while forcing more robust features from the residual connection. The rationale here is that the residual connection signal might still partially contain noisy components, hence the need for more robustness. A sigmoid activation is applied to the sum of the obtained features to generate the gating mask applied to the input signal of the residual gating block. The residual gate block can be described by Equation (9):

$y = \sigma(\Psi_{in}(x_{in}) + D(\Psi_{res}(x_{res}))) \otimes x_{in}$ (9)
where $x_{in}$, $x_{res}$, and $y$ are the input feature map to be gated (i.e., the output of the previous decoder block), the residual connection feature map, and the output of the residual gating block, respectively. The $\Psi_{in}(\cdot)$ and $\Psi_{res}(\cdot)$ operators represent the 1D convolutional layers (k = 1) that operate on the input and residual feature maps, respectively, while $\sigma(\cdot)$ and $D(\cdot)$, as usual, represent the sigmoid activation and dropout layers, respectively.
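A minimal sketch of the residual gating block of Equation (9) is given below, assuming the skip connection and the decoder output have the same number of channels; the dropout rate is illustrative.

import torch
import torch.nn as nn

class ResidualGate(nn.Module):
    # Residual gating per Equation (9): skip-connection features and the previous
    # decoder output jointly produce a mask that gates the decoder output.
    def __init__(self, channels, dropout=0.1):
        super().__init__()
        self.input_conv = nn.Conv1d(channels, channels, kernel_size=1)      # Psi_in
        self.residual_conv = nn.Conv1d(channels, channels, kernel_size=1)   # Psi_res
        self.drop = nn.Dropout(dropout)

    def forward(self, x_in, x_res):
        # x_in: output of the previous decoder block; x_res: skip-connection features
        gate = torch.sigmoid(self.input_conv(x_in) + self.drop(self.residual_conv(x_res)))
        return x_in * gate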
3.2. Training Loss Functions
In this study, we aim to design a model that can benefit from the training process to gain the capability of performing effective denoising. Reducing the reconstruction error while preserving the morphology of the original signal as much as possible is of utmost priority. Hence, we designed a loss function that drives the model to learn the hidden features of the signal while retaining sparsity and generalization ability. As a reconstruction loss, we use the loss function proposed by [28], as shown in Equation (10):

$\mathcal{L}_{rec} = \text{SSD}(x, \hat{x}) + \lambda_{MAD}\,\text{MAD}(x, \hat{x})$ (10)

where the sum of squared distance (SSD) (also defined in Equation (17)) and the maximum absolute distance (MAD) (also defined in Equation (18)) are combined, and the tunable scaling coefficient $\lambda_{MAD}$ for MAD is set to 50, similar to [28,36]. This combined loss ensures that the overall signal looks similar to the target signal while also minimizing the point-wise errors and rejecting outliers.
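A direct PyTorch implementation of the combined SSD + MAD reconstruction loss of Equation (10) could be written as follows (tensor shapes of (batch, channels, length) are assumed):

import torch

def reconstruction_loss(x, x_hat, mad_weight=50.0):
    # Combined SSD + MAD reconstruction loss per Equation (10).
    # x, x_hat: tensors of shape (batch, channels, length).
    ssd = torch.sum((x - x_hat) ** 2, dim=(-2, -1))        # per-sample sum of squared distance
    mad = torch.amax(torch.abs(x - x_hat), dim=(-2, -1))   # per-sample maximum absolute distance
    return torch.mean(ssd + mad_weight * mad)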
To enhance the spatial resolution of the reconstructed signal and encourage the model to pay attention to fine details, a gradient loss (G-Loss) [69] is used as part of the overall model loss. The G-Loss was proposed by Ge et al. to improve the resolution of reconstructed images. In our case, it helps the model to follow the target signal closely and focus on subtle details that are not noise-related. The use of G-Loss also helps the model learn a more detailed yet robust manifold of the ECG signals, which we believe enhances the generalization ability of the model. We employed the idea of G-Loss, yet we calculated the MAD loss (instead of the sum of absolute errors used originally) between the gradients of the target signal $x$ and the estimated signal $\hat{x}$. Following [69], we did not add a scaling parameter to this loss component. This loss component aims to minimize the largest deviation between the gradients, preventing outliers and localized high-intensity noise. Equations (11)–(13) depict the proposed gradient loss:

$\nabla x[i] = x[i+1] - x[i]$ (11)
$\nabla \hat{x}[i] = \hat{x}[i+1] - \hat{x}[i]$ (12)
$\mathcal{L}_{G} = \text{MAD}(\nabla x, \nabla \hat{x}) = \max_{i} \left|\nabla x[i] - \nabla \hat{x}[i]\right|$ (13)
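The gradient loss of Equations (11)–(13) can be sketched as below, where the gradient is taken as the first difference along the time axis:

import torch

def gradient_loss(x, x_hat):
    # G-Loss variant per Equations (11)-(13): MAD between the first differences
    # (discrete gradients) of the target and estimated signals.
    grad_x = x[..., 1:] - x[..., :-1]
    grad_x_hat = x_hat[..., 1:] - x_hat[..., :-1]
    return torch.mean(torch.amax(torch.abs(grad_x - grad_x_hat), dim=(-2, -1)))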
To further enhance the morphological similitude between the target signal and the estimated signal, we employ a correlation loss using Pearson's correlation coefficient to measure the correlation between the signals. An exponential transformation of correlation values was used in [38] to enhance the correlation between the power spectra of the target and estimated signals, while the absolute value of the correlation was used as an auxiliary loss in [70] to enhance the morphology of ECG signals generated from given photoplethysmography (PPG) signals. In this study, we wanted to leverage the overall morphological resemblance between the target and the estimated signals, and hence we penalize the minimum correlation, as shown in Equation (14):

$\mathcal{L}_{corr} = \lambda_{corr}\left(1 - \min_{j}\,\rho(x_j, \hat{x}_j)\right)$ (14)

The rationale here is that the minimum correlations usually come from outliers or difficult pathological cases, and we wanted our model to retain the morphology even for these challenging samples. The correlation loss is computed as $1 - \min_{j}\rho(x_j, \hat{x}_j)$, where $x_j$ and $\hat{x}_j$ are the $j$-th target and estimated samples and $\rho(\cdot,\cdot)$ denotes Pearson's correlation coefficient. This means that the possible range of this loss component is from 0 to 2. The tunable scaling factor $\lambda_{corr}$ is used to balance the significance of this component against the other components of the overall loss. Empirically, we tested values of $\lambda_{corr}$ in the range of 5 to 30 and found 10 to be suitable for our model, being neither so large that it dominates the other loss components nor so small that it becomes negligible.
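A sketch of the minimum-correlation loss of Equation (14) is shown below; taking the minimum over the samples of a mini-batch is an implementation assumption of this sketch.

import torch

def correlation_loss(x, x_hat, scale=10.0, eps=1e-8):
    # Minimum-correlation loss per Equation (14): penalize the batch sample with
    # the lowest Pearson correlation between the target and the estimate.
    xc = x.flatten(1) - x.flatten(1).mean(dim=1, keepdim=True)
    yc = x_hat.flatten(1) - x_hat.flatten(1).mean(dim=1, keepdim=True)
    rho = (xc * yc).sum(dim=1) / (xc.norm(dim=1) * yc.norm(dim=1) + eps)
    return scale * (1.0 - rho.min())   # the unscaled term lies in [0, 2]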
To promote the model's sparsity, we employ L1 regularization on the weights of the encoder part of the model to ensure sparse and robust manifold learning. L1 regularization adds a penalty equal to the sum of the absolute values of the weights. It can lead to sparse models in which some weights become exactly zero, effectively selecting features and preventing overfitting [71]. The L1-regularization cost function ($\mathcal{L}_{L1}$) is described in Equation (15):

$\mathcal{L}_{L1} = \lambda_{L1} \sum_{l} \left\| W_l \right\|_1$ (15)

where $W_l$ is the set of weights of the $l$-th layer of the encoder, while the tunable weight decay factor $\lambda_{L1}$ is used to control the strength of the regularization. Empirically, we tested values for $\lambda_{L1}$ in the range of 0.005 to 1 and found that 0.01 was suitable for our model. We noticed that higher values of $\lambda_{L1}$ negatively affect the model performance and prevent it from converging due to over-regularization, whereas a very low value of $\lambda_{L1}$ makes the loss value so small that it no longer affects the performance of the model.
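The encoder L1 penalty of Equation (15) can be sketched as a simple sum over the encoder weights:

import torch

def l1_regularization(encoder, weight_decay=0.01):
    # L1 penalty on the encoder weights per Equation (15), promoting sparsity.
    penalty = sum(p.abs().sum() for name, p in encoder.named_parameters() if "weight" in name)
    return weight_decay * penalty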
The overall loss function ($\mathcal{L}$) is the sum of all the discussed terms, as shown in Equation (16), promoting better reconstruction, morphological similitude, and sparsity:

$\mathcal{L} = \mathcal{L}_{rec} + \mathcal{L}_{G} + \mathcal{L}_{corr} + \mathcal{L}_{L1}$ (16)
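Combining the previously sketched terms, the overall objective of Equation (16) reduces to a simple sum (this snippet assumes the loss functions sketched above are in scope):

def total_loss(x, x_hat, encoder):
    # Overall training objective per Equation (16): sum of all loss terms.
    return (reconstruction_loss(x, x_hat)
            + gradient_loss(x, x_hat)
            + correlation_loss(x, x_hat)
            + l1_regularization(encoder))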
3.3. Datasets and Experimental Setup
In this study, we follow the benchmark experiment guidelines proposed by [28]. The benchmark tests the model performance on a dataset created by combining signals from two openly available databases: the QT Database (QTDB) [72] and the MIT-BIH Noise Stress Test Database (NSTDB) [73]. The QTDB provides the clean ECG samples, collected from seven other databases. A total of 105 dual-channel ECG records, each 15 min long and sampled at 250 Hz, are available from this database; they were collected from different leads and contain different pathological cases. The NSTDB provides three artifact types (namely, baseline wander (BW), muscle artifacts (MA), and electrode motion (EM)) that resemble ambulatory ECG noise. Each available artifact type has a duration of 30 min and a sampling rate of 360 Hz. The samples from the QTDB are resampled to 360 Hz to match the sampling rate of the noise records. In the original benchmark experiment, the ECG records from the QT database are corrupted by the noise in the NSTDB that contains baseline wander (BW) only. The contaminating noise samples are scaled randomly to have maximum amplitudes in the range of 0.2 to 2 times the maximum amplitude of the ECG sample. Since the NSTDB has noise records in two channels, each channel is split in half (where the first and second halves are referred to as “a” and “b”, respectively), and one half from each channel is used for either training or testing, as shown in Table 1.
The selected models are trained on ECG signals corrupted with “channel 1_a” and tested on ECG signals corrupted with “channel 2_b” (this noise version is referred to as “Noise v1 (nv1)” elsewhere in this study), and then trained on ECG signals corrupted with “channel 2_a” and tested on ECG signals corrupted with “channel 1_b” (this noise version is referred to as “Noise v2 (nv2)” elsewhere in this study). This ensures that each time, the model is trained and tested on different, uncorrelated noises. The combined results of the two experiments are then reported, as in the following section. The test datasets use selected subjects from the QT database, as shown in Table 2. The selected test set represents 13% of the total amount of data. Using this inter-patient scheme for data division ensures a better evaluation of the model's generalization capability and better reflects real-world scenarios.
However, the original experiment makes unrealistic assumptions in the data preparation, such as having one beat per data sample, centered and zero-padded to a length of 512. In most real-life situations, it is not possible to perform such segmentation before denoising, especially when the signal is heavily corrupted. Furthermore, assuming spatial consistency of the input samples is impractical. Also, using only one noise type as in [28], or all three noise types combined as in [36], does not cover the full range of realistic noisy signals. Hence, we follow the data preparation scheme proposed by [38], where a sliding window of length 512 with an overlap of 256 is used to segment the signals into data samples of length 512 each, with no prior assumptions about the beat length or its spatial position within the obtained segment. Additionally, we noticed that the clean ECG signals are not entirely clean, as some of the segments include very slow baseline variations. We removed these variations by subtracting the mean of each segment. Following [28], no amplitude normalization of the clean ECG signals was performed.
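A minimal sketch of this sliding-window segmentation with per-segment mean removal is given below (a window of 512 samples and an overlap of 256 samples, as described above):

import numpy as np

def segment_signal(signal, window=512, overlap=256):
    # Sliding-window segmentation with per-segment mean removal.
    step = window - overlap
    segments = []
    for start in range(0, len(signal) - window + 1, step):
        seg = np.asarray(signal[start:start + window], dtype=np.float32)
        segments.append(seg - seg.mean())   # remove the slow baseline offset
    return np.stack(segments)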
Furthermore, we utilize the same Random Mixed Noise (RMN) scheme proposed by [38], where 8 random combinations of the noise types are created in the training and testing datasets (shown in Table 3), giving more variation to the datasets with more realistic scenarios.
The random noise scaling and the random noise-type combinations are independent of each other. An important property of this combination is that it yields some clean signal samples, which is essential to have in the training datasets so that the deep learning models apply denoising only to noisy signals rather than blindly suppressing innate features of the signal. For each noise split, we end up with 91,062 pairs of noisy (input) and clean (reference) ECG samples for the training data and 15,535 pairs for the testing data.
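The following sketch illustrates one possible way to corrupt a clean segment under the random scaling and random noise-type combination described above; whether the scaling is applied to the combined noise or to each noise type separately is an assumption of this sketch, and the exact RMN procedure follows [38].

import numpy as np

def add_random_mixed_noise(clean_seg, bw, ma, em, rng):
    # Corrupt one clean segment with a random on/off combination of the three
    # noise types (including the all-off, i.e., clean, case) and random scaling.
    # bw, ma, em: noise segments of the same length as clean_seg; rng: np.random.Generator.
    use_bw, use_ma, use_em = rng.integers(0, 2, size=3).astype(bool)
    noise = np.zeros_like(clean_seg)
    for flag, n in ((use_bw, bw), (use_ma, ma), (use_em, em)):
        if flag:
            noise = noise + n
    if noise.any():
        # Scale the combined noise so that its peak is 0.2-2.0 times the clean ECG peak
        target_peak = rng.uniform(0.2, 2.0) * np.abs(clean_seg).max()
        noise = noise * (target_peak / np.abs(noise).max())
    return clean_seg + noise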