Deep Learning-Based Stacked Denoising and Autoencoder for ECG Heartbeat Classification



Introduction
The electrocardiogram (ECG) is a valuable technique for making decisions regarding cardiac heart diseases (CHDs) [1]. However, ECG signal acquisition involves high-gain instrumentation amplifiers that are easily contaminated by different sources of noise, each with a characteristic frequency spectrum depending on the source [2]. ECG contaminants can be classified into several categories [2][3][4]: (i) power line interference at 60 or 50 Hz, depending on the power supply frequency; (ii) electrode contact noise of about 1 Hz, caused by improper contact between the body and the electrodes; (iii) motion artifacts that produce long distortions of 100-500 ms, caused by the patient's movements affecting the electrode-skin impedance; (iv) muscle contractions, producing noise of up to 10% of the regular peak-to-peak ECG amplitude, with frequency content up to 10 kHz, for around 50 ms; and (v) baseline wander caused by respiratory activity at 0-0.5 Hz. All of these kinds of noise can interfere with the original ECG signal, deforming the ECG waveforms and producing an abnormal signal.
To keep as much of the ECG signal as possible, the noise must be removed from the original signal to provide an accurate diagnosis. Unfortunately, denoising is a challenging task because the noise overlaps the signal at both low and high frequencies [4]. To prevent noise interference, several approaches have been proposed to denoise ECG signals, based on adaptive filtering [5][6][7], wavelet methods [8,9], and empirical mode decomposition [10,11]. However, all these techniques require analytical calculation and heavy computation; moreover, because cut-off processing can lose clinically essential components of the ECG signal, these techniques run the risk of misdiagnosis [12]. Currently, one machine learning (ML) technique, the denoising autoencoder (DAE), can be applied to reconstruct clean data from its noisy version. DAEs can extract robust features by adding noise to the input data [13]. Previous results indicate that DAEs outperform conventional denoising techniques [14][15][16]. Recently, DAEs have been used in various fields, such as image denoising [17], human activity recognition [18], and feature representation [19].
To produce the proper interpretation of CHDs, the ECG signal must be classified after the denoising process. However, DAEs cannot automatically produce extracted features [15,16]. Feature extraction is an important phase of the classification process for obtaining robust performance: a poor feature representation will cause the classifier to perform poorly. Such a limitation in DAEs leaves room for improving upon the existing ECG denoising method through combination with other methods for feature extraction. Some techniques, including principal component analysis (PCA) and linear discriminant analysis (LDA) algorithms, have been proposed [20,21]. However, these cannot extract the features directly from the network structure, and they usually require a time-consuming trial-and-error process [20,21]. Currently, by using autoencoders (AEs), feature extraction from the raw input data can work automatically. This improves the performance of prediction models while, at the same time, reducing the complexity of the feature design task. Hence, combining the two models, DAEs and AEs, is a challenging task in ECG signal processing applications.
The classification phase in ECG signal processing studies can be divided into two types of learning: supervised and unsupervised [22][23][24][25][26][27]. Both types of learning provide good performance in ECG beat [26][27][28] and rhythm classification [27][28][29]. Among them, Yu et al. [26] proposed higher-order statistics of sub-band components for heartbeat classification with noisy ECG. Their proposed classifier is a feedforward backpropagation neural network (FBNN), and the feature selection algorithm is based on the correlation coefficient with five levels of the discrete wavelet transform (DWT). However, for an exacting evaluation, the DWT becomes computationally intensive. Because of its discretization, the DWT is also less efficient and less natural, and it takes time and effort to learn which wavelets serve each specific purpose. Li et al. [27] focused on a five-level ECG signal quality classification algorithm, adding three types of real ECG noise at different signal-to-noise ratio (SNR) levels. A support vector machine (SVM) classifier with a Gaussian radial basis function kernel was employed to classify ECG signal quality. However, ECG signal classification with traditional ML based on a supervised shallow architecture is limited in feature extraction and classification because it uses a handcrafted, feature-based approach. Moreover, for larger amounts of ECG data with greater variance, such shallow architectures are difficult to employ for this purpose. On the other hand, the deep learning (DL) technique extracts features directly from the data [24,25]. In our previous work [30], the DL technique successfully generated feature representations from raw data. This is carried out by conducting an unsupervised training approach for feature learning, followed by a classification process. DL is superior in automated feature learning, while ML is limited to feature engineering.
Artificial neural networks (ANNs) are a well-known technique in ML. ANNs increase the depth of their structure by adding multiple hidden layers, yielding deep neural networks (DNNs). Some of these layers can be adjusted to better predict the final outcome. More layers enable DNNs to fit complex functions with fewer parameters and improve accuracy [27]. Compared with shallow neural networks, DNNs with multiple nonlinear hidden layers can discover more complex relationships between the input and output layers. High-level layers can learn features from lower layers to obtain higher-order, more abstract expressions of the inputs [28]. However, DNNs cannot learn features from noisy data. The combination of DNNs and autoencoders (AEs) can learn efficient data codings in an unsupervised manner; however, AEs do not perform well when the data samples are very noisy [28]. Therefore, DAEs were invented to enable AEs to learn features from noisy data by adding noise to the input data [28].
To the best of our knowledge, no research has implemented the DL technique using stacked DAEs and AEs to accomplish feature learning for noisy ECG heartbeat classification. This paper proposes a combined DAEs-AEs-DNNs processing method to compute appropriate features from raw ECG data and address automated classification. The technique consists of beat segmentation, noise cancelation with DAEs, feature extraction with AEs, and heartbeat classification with DNNs. The validation and evaluation of the classifiers are based on the performance metrics of accuracy, sensitivity, specificity, precision, and F1-score. The rest of this paper is organized as follows.
In Section 2, we explain the materials and the proposed method. In Section 3, we conduct an experiment on a public dataset and compare the proposed method with existing research. Finally, we conclude the paper and discuss future work in Section 4.

Autoencoder
Autoencoders (AEs) are neural networks trained to map the input to its output in an unsupervised manner. AEs have a hidden layer $h$ that describes the code used to represent the input [29]. AEs consist of two parts, the encoder $h = f(a)$ and the decoder $r = g(h)$, where $f$ and $g$ are called the encoder and decoder mappings, respectively. The number of hidden units is smaller than that of the input or output layers, which encodes the data in a lower-dimensional space and extracts the most discriminative features. Given the training samples of $D$-dimensional vectors $\{a^{(1)}, a^{(2)}, \ldots, a^{(m)}\}$, the encoder transforms the input vector $a$ into a $d$-dimensional hidden representation

$$h = \sigma\left(W^{(1)} a + b^{(1)}\right),$$

where $\sigma$ is the activation function, $W^{(1)}$ is a $d \times D$ weight matrix, and $b^{(1)}$ is a $d$-dimensional bias vector. This study implements the rectified linear unit (ReLU) as the activation function in the first hidden (encoding) layer. The hidden vector $h$ is then transformed back into the reconstruction vector

$$r = \sigma'\left(W^{(2)} h + b^{(2)}\right),$$

where $W^{(2)}$ is a $D \times d$ weight matrix and $b^{(2)}$ is a $D$-dimensional bias vector.
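As a concrete illustration of the encoder/decoder mappings above, the following numpy sketch runs one encode/decode pass. The weights are randomly initialized and untrained, and the dimensions (a 252-sample input with a 126-unit code) mirror the beat length and encoding size used later in this study; everything here is purely illustrative.

```python
import numpy as np

def relu(v):
    return np.maximum(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def ae_forward(a, W1, b1, W2, b2):
    """One encode/decode pass: h = ReLU(W1 a + b1), r = sigmoid(W2 h + b2)."""
    h = relu(W1 @ a + b1)        # d-dimensional code (d < D)
    r = sigmoid(W2 @ h + b2)     # D-dimensional reconstruction
    return h, r

rng = np.random.default_rng(0)
D, d = 252, 126                  # input length and code size (illustrative)
W1, b1 = rng.normal(0, 0.1, (d, D)), np.zeros(d)
W2, b2 = rng.normal(0, 0.1, (D, d)), np.zeros(D)
h, r = ae_forward(rng.normal(size=D), W1, b1, W2, b2)
```

In a trained AE, W1, b1, W2, and b2 would be fitted by minimizing the reconstruction loss introduced below; here they only demonstrate the shapes of the two mappings.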
The AEs' training aims to optimize the parameter set so as to reduce the reconstruction error. The mean squared error (MSE) is used as the loss function in standard AEs [26,30]:

$$L(a, r) = \frac{1}{m} \sum_{i=1}^{m} \left\| a^{(i)} - r^{(i)} \right\|^2.$$

AEs are usually trained using only a clean ECG signal dataset. For the further task of treating noisy ECG data, denoising AEs (DAEs) are introduced. In the case of a single-hidden-layer AE trained with noisy ECG data as input and the clean signal as output, it includes one nonlinear encoding and decoding stage, as follows:

$$y = \sigma\left(W_1 \tilde{x} + b\right), \qquad z = \sigma'\left(W_2 y + c\right),$$

where $\tilde{x}$ is a corrupted version of $x$, $b$ and $c$ represent the bias vectors of the input and output layers, respectively, and $x$ is the desired output. Usually, a tied weight matrix (i.e., $W_2 = W_1^{T}$) is used as one type of regularization. This paper uses a noisy signal to train the DAEs before the automated feature extraction with AEs. DAEs are a stochastic extension of classic AEs: they try to reconstruct a clean input from its corrupted version. The initial input $x$ is corrupted to $\tilde{x}$ by a stochastic mapping $\tilde{x} \sim q(\tilde{x} \mid x)$. Subsequently, the DAEs use the corrupted $\tilde{x}$ as the input data, map it to the corresponding hidden representation $y$, and ultimately to its reconstruction $z$. After the reconstruction signal is obtained from the DAEs, the signal-to-noise ratio (SNR) must be calculated so that the signal quality can be measured [30], as follows:

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{n} x(n)^2}{\sum_{n} \left( x_d(n) - x(n) \right)^2},$$

where $x_d$ and $x$ are the reconstructed and the original signal, respectively.
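The MSE loss and the SNR measure above translate directly into numpy. In this sketch, the sine-wave "beat" and the noise level are illustrative stand-ins, not data from the study:

```python
import numpy as np

def mse(a, r):
    """Mean squared reconstruction error used as the AE loss."""
    return np.mean((np.asarray(a) - np.asarray(r)) ** 2)

def snr_db(x, x_rec):
    """SNR of a reconstruction x_rec against the original signal x, in dB."""
    x, x_rec = np.asarray(x, float), np.asarray(x_rec, float)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((x_rec - x) ** 2))

t = np.linspace(0, 0.7, 252)                 # one ~0.7 s beat window
clean = np.sin(2 * np.pi * 5 * t)            # stand-in for a clean beat
noisy = clean + 0.05 * np.random.default_rng(1).normal(size=t.size)
snr = snr_db(clean, noisy)                   # quality of the "reconstruction"
```

A perfect reconstruction gives an MSE of zero and an infinite SNR; a DAE trained as described in this paper would be evaluated by computing `snr_db` between its output and the clean target signal.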
DAEs have shown good results in extracting robust features from noisy ECG signals and in other applications [14].

Proposed Deep Learning Structure
The basis of DL is a bio-inspired algorithm from the earliest neural networks. Fundamentally, DL formalizes how a biological neuron works, in which the brain processes information through billions of interlinked neurons [30]. DL has provided new, advanced approaches to training DNN architectures with many hidden layers, outperforming its ML counterparts. The features extracted in DL by data-driven methods can be more accurate. An autoencoder is a neural network designed specifically for this purpose. In our previous work [30], deep AEs were shown to be applicable in ECG signal processing before the classification task, extracting high-level features not only from the training data but also from unseen data. The Softmax function is used as the activation function of the classifier's output layer, and its output can be treated as the probability of each label. Here, let $N$ be the number of units of the output layer and let $x_i$ be the input to unit $i$. Then, the output $p(i)$ of unit $i$ is defined by the following equation:

$$p(i) = \frac{\exp(x_i)}{\sum_{j=1}^{N} \exp(x_j)}.$$

Cross entropy is used as the loss function $L_f$ of the classifier, as follows:

$$L_f = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} y_{ij} \log p_{ij},$$

where $n$ is the sample size, $m$ is the number of classes, $p_{ij}$ is the output of the classifier for class $j$ of the $i$-th sample, and $y_{ij}$ is the annotated label for class $j$ of the $i$-th sample.
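The Softmax output and cross-entropy loss can be written directly from the equations above. The score vector below is an arbitrary illustrative example; in this study's classifier there would be one output unit per beat class:

```python
import numpy as np

def softmax(x):
    """p(i) = exp(x_i) / sum_j exp(x_j), shifted by max(x) for numerical stability."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy(P, Y):
    """L_f = -(1/n) * sum_i sum_j y_ij * log(p_ij), with a small epsilon for safety."""
    P, Y = np.asarray(P, float), np.asarray(Y, float)
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))

scores = np.array([2.0, 1.0, 0.1, -1.0, 0.5])   # illustrative pre-activation scores
p = softmax(scores)                              # class probabilities, sums to 1
```

Subtracting the maximum score before exponentiating does not change the result but prevents overflow for large inputs, which is the standard way to implement Softmax.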
In our study, the proposed DL structure consists of noise cancelation with DAEs, automated feature extraction with AEs, and DNNs as a classifier, as presented in Figure 1. The DAEs structure is divided into three layers, namely the input, encoding, and output layers. Two DAE models are structured for validation: in Model 1, the input, encoding, and output layers each have 252 nodes, while in Model 2 the input and output layers each have 252 nodes and the encoding layer has 126 nodes. For both models, the activation function of the encoding layer is the rectified linear unit (ReLU) and that of the output layer is the sigmoid. The compilation of the DAEs model requires two arguments, namely the optimizer and the loss function. The optimization method used in the DAE construction is adaptive moment estimation (Adam), with the mean squared error as the loss function. In the proposed DAEs structure, the SNR −6 dB recording is used as the input, being the noisiest ECG, and the SNR 24 dB recording as the desired signal, being the best ECG quality in the dataset used in this study. After some experiments and completion of the training phase, good accuracy is obtained with a total of 400 epochs and a batch size of 64. The DAEs model can then be used to reconstruct the signal with an SNR of −6 dB, and the reconstructed signal will approach an SNR of 24 dB. After the noise has been removed by the DAEs, the next step is to extract the features of the signal. Automated feature extraction using the AEs [30] is the final step before the ECG heartbeat can be classified using DNNs. The ECG signal reconstructed by the DAEs goes through a training process with 200 epochs and a batch size of 32 for the AE architecture. After completing the AEs' training phase, the reconstructed signal is passed through the encoder for prediction, from the input layer to the encoding layer, which yields the feature representation of the reconstructed signal. This representation is used as the classifier input of the DNNs. As in the DAEs architecture, ReLU and sigmoid were implemented in the encoding layer and output layer, respectively. The encoding layer of the AEs is used as the input for the DNNs classifier, which has five hidden layers, and whose output represents the five classes of ECG heartbeat.
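The stacking described above (DAE denoising, AE encoding, DNN classification head) can be sketched end-to-end in numpy. The weights here are random and untrained, the five 100-node hidden layers of the real classifier are collapsed into a single output layer, and all names are illustrative, so this only shows how the stages' shapes fit together, not the trained pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda v: np.maximum(0.0, v)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def layer(n_out, n_in):
    """Random (untrained) weight matrix and zero bias for illustration."""
    return rng.normal(0, 0.1, (n_out, n_in)), np.zeros(n_out)

# Stage 1: DAE (252 -> 252 -> 252, as in Model 1) denoises the beat.
W_e, b_e = layer(252, 252)
W_d, b_d = layer(252, 252)
# Stage 2: AE encoder (252 -> 126) yields the feature vector.
W_f, b_f = layer(126, 252)
# Stage 3: classifier head (126 -> 5 classes A, R, P, V, N), hidden layers omitted.
W_c, b_c = layer(5, 126)

beat = rng.normal(size=252)                     # one segmented, normalized beat
denoised = sigmoid(W_d @ relu(W_e @ beat + b_e) + b_d)
features = relu(W_f @ denoised + b_f)
logits = W_c @ features + b_c
probs = np.exp(logits - logits.max())
probs /= probs.sum()                            # Softmax over the five classes
```

In the actual method, each stage is trained separately (DAE on noisy/clean pairs, AE on the denoised signals, DNN on the encoded features) before being chained in this order.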

ECG Segmentation and Normalization
In our previous work [30], ECG signal segmentation was used to find the R-peak position. After the R-peak position was detected, sampling was performed over an approximately 0.7-s segment for a single beat. The segment was divided into two intervals: t1 of 0.25 s before the R-peak position and t2 of 0.45 s after the R-peak position. The ECG records 118 (24 dB), 118 (−6 dB), 119 (24 dB), and 119 (−6 dB) were segmented into beats (see Figure 3). Record 118 contained four types of heartbeat: Atrial Premature (A), Right Bundle Branch Block (R), Non-conducted P-wave (P), and Premature Ventricular Contraction (V). Record 119 contained two types of heartbeat: Normal (N) and Premature Ventricular Contraction (V). The total numbers of beats in these records were 2287 and 1987, respectively. The number and representation of each beat are given in Table 1, and a sample heartbeat after segmentation is shown in Figure 4. After the segmentation of each beat, the ECG signal was normalized into the range 0 to 1 using the normalized bound. The normalized bound changes the lower limit ($lb$) and upper limit ($ub$) on the amplitude of a signal to the desired range without changing the pattern or shape of the signal itself. In this study, the data were pre-processed with a lower limit of 0 and an upper limit of 1, respectively. The mathematical function of the normalization with the normalized bound is as follows:

$$n(t) = \frac{\left( x(t) - mid_x \right)(ub - lb)}{x_{\max} - x_{\min}} + mid,$$

where $mid_x = (x_{\max} + x_{\min})/2$ is the midpoint of the input signal, $x_{\max}$ is the upper point of the input signal, $x_{\min}$ is the lower point of the input signal, and $mid = (ub + lb)/2$ is the midpoint of the specified limits. The sample result of normalized ECG beats can be seen in Figure 5, in which the amplitude (mV) range of the signal is 0 to 1. After the DAEs model was trained, a reconstruction result (a clean ECG signal) was produced. The clean ECG signal was used as an input to the AEs model to extract features. The features were learned through the encoder and decoder steps. After the AEs model was trained, the encoder part was used as the feature output, and all features (f1, f2, f3, ..., fn) were used as the input to the DNNs model to classify the ECG heartbeat. All steps of feature learning are presented in Figure 7. In the third and fourth models there are five layers, with 63 nodes in the encoding layer. For the fifth and sixth models, we propose seven layers, with 32 nodes in the encoding layer. Based on our previous work [30], for all six models in this study we propose ReLU and Softmax activation functions with Adam optimization (learning rate of 0.0001). Adam optimization allows for the use of adaptive learning rates for each variable. The total number of epochs for each model is 200, with a batch size of 48. To select a good classifier model, the structures of the DAEs, AEs, and DNNs are varied in the validation phase, and only the DL model with high performance is selected as the best DL model. This paper presents a total of six models for ECG heartbeat classification validation (see Table 2). The weights of the classifier are extracted from the pre-trained DAEs-AEs model, and a fully connected layer is added as the output layer, with Softmax as the activation function and categorical cross-entropy as the loss function.
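The segmentation window (0.25 s before and 0.45 s after the R-peak, i.e., 252 samples at the MIT-BIH sampling rate of 360 Hz) and the normalized-bound formula above can be sketched as follows. The synthetic sine signal and the R-peak index are illustrative only:

```python
import numpy as np

FS = 360                       # MIT-BIH sampling rate, Hz
T1, T2 = 0.25, 0.45            # seconds before / after the R-peak

def segment_beat(signal, r_peak):
    """Slice one ~0.7 s beat around a detected R-peak (252 samples at 360 Hz)."""
    start = r_peak - int(T1 * FS)      # 90 samples before the peak
    stop = r_peak + int(T2 * FS)       # 162 samples after the peak
    return signal[start:stop]

def normalize_bound(x, lb=0.0, ub=1.0):
    """Rescale x into [lb, ub] without changing the shape of the waveform."""
    x = np.asarray(x, float)
    mid_x = (x.max() + x.min()) / 2.0  # midpoint of the input signal
    mid = (ub + lb) / 2.0              # midpoint of the specified limits
    return (x - mid_x) * (ub - lb) / (x.max() - x.min()) + mid

ecg = np.sin(np.linspace(0, 20 * np.pi, 3600))   # synthetic stand-in signal
beat = normalize_bound(segment_beat(ecg, r_peak=1000))
```

Note that 0.7 s at 360 Hz gives exactly 252 samples, matching the 252-node input layer of the DAEs described above.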

The classification performances were measured using the following five metrics from the literature: accuracy, sensitivity, specificity, precision, and F1-score [22,30]. While accuracy can be used to evaluate the overall performance for benchmarking, the other metrics measure the performance on a specific class of the validation model. The five metrics measured for model validation can be expressed as follows:

$$Accuracy_i = \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i},$$

$$Sensitivity_i = \frac{TP_i}{TP_i + FN_i},$$

$$Specificity_i = \frac{TN_i}{TN_i + FP_i},$$

$$Precision_i = \frac{TP_i}{TP_i + FP_i},$$

$$F1_i = \frac{2 \times Precision_i \times Sensitivity_i}{Precision_i + Sensitivity_i},$$

for $i = 1, \ldots, I$, where $I$ is the number of beat classes; $TP_i$ (true positives) is the number of class-$i$ beats that are correctly classified; $TN_i$ (true negatives) is the number of non-class-$i$ beats that are correctly classified; $FP_i$ (false positives) is the number of non-class-$i$ beats that are incorrectly predicted as class $i$; and $FN_i$ (false negatives) is the number of class-$i$ beats that are incorrectly predicted as non-class $i$.
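A direct transcription of the five per-class metrics follows, with illustrative one-vs-rest counts that are not taken from the paper's confusion matrices:

```python
def class_metrics(tp, tn, fp, fn):
    """Per-class metrics from one-vs-rest confusion counts for class i."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)       # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f1

# Illustrative counts: 90 true class-i beats found, 10 missed,
# 5 other beats wrongly labelled class i, 395 correctly rejected.
m = class_metrics(tp=90, tn=395, fp=5, fn=10)
```

Averaging these per-class values over the $I$ classes gives the overall scores reported for each model.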

Results and Analysis
In this study, the training process used 90% of the data, and the remaining data were used for the testing process. The testing set was used to make decisions on the hyperparameter values and model selection in Table 2. The next procedure was the classification stage with DNNs, after the features had been extracted by the AEs. Firstly, different numbers of nodes in the input layer were compared: 32, 63, and 126. Then, the fine-tuned nodes were used to obtain the best architecture for the input layer. As seen in Table 2, the first DNNs model, with 126 nodes, was chosen because it outperformed the other node counts. The architecture of the first DNNs model consisted of one input layer, five hidden layers (each with 100 nodes), and an output layer representing the five classes of ECG beat (i.e., A, R, P, V, and N). In the output layer, we applied the Softmax function. To evaluate our proposed DNNs model, five performance metrics were used: accuracy, sensitivity, specificity, precision, and F1-score. The comparison of the six DL models of Table 2 based on these metrics is shown in Tables 3 and 4, for the training and testing sets, respectively. As displayed in Table 3, Model 1 yields the highest accuracy, specificity, precision, and F1-score in the training phase. For the testing results in Table 4, Model 1 also yields the highest results in all performance metrics. Therefore, Model 1 is our proposed DL architecture for ECG noise removal and AE feature extraction for the DNNs classification task. Unfortunately, poor performances were shown by Models 5 and 6. This is because Models 5 and 6 use the second DAEs model, which produced a loss of 0.0037 and an SNR reconstruction of 20.6129 dB. This reconstruction is not close to the 24 dB target signal, meaning that the ECG is still noisy, which ultimately affects the classifier's performance. The sensitivity, precision, and F1-score achieved only 56.97% to 59.19%. The sensitivity of Model 5 in training is inversely proportional to that of the testing set: in training, the sensitivity is 98.90%, while in testing it only achieves 56.97%. As seen in Tables 2 and 4, the number of AE encoding-layer nodes in Models 5 and 6 yields the worst result in the testing phase of the classifier. As for classification, to analyze the predictions of the DL model for each class during the training and testing phases, the confusion matrix was applied. Tables 5 and 6 present the predicted ECG heartbeat classes only for the first model, as the proposed architecture. For the A class (atrial premature beat) compared to the R class (right bundle branch block), there are 11 misclassified ECG heartbeats in the training phase and only two misclassified beats in the testing phase. From the confusion matrix results, the A class shows the worst result of all the classes: A beats were misclassified as R beats in both the training and testing phases. This may have occurred because of the larger amount of data for R beats compared to A beats; moreover, the morphology of these two beats is similar.
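The one-vs-rest counts fed into the performance metrics come from a confusion matrix such as those in Tables 5 and 6. A minimal sketch of how such a matrix is accumulated follows, with made-up labels rather than the study's predictions:

```python
import numpy as np

CLASSES = ["A", "R", "P", "V", "N"]   # beat classes used in this study

def confusion_matrix(y_true, y_pred, n_classes=5):
    """Rows index the annotated class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Illustrative labels only (indices into CLASSES): one A beat is
# misclassified as R, mimicking the confusion pattern discussed above.
cm = confusion_matrix([0, 0, 1, 1, 4], [0, 1, 1, 1, 4])
```

Off-diagonal entries such as `cm[0, 1]` count A beats predicted as R, which is exactly the misclassification highlighted in the analysis of Tables 5 and 6.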

In this study, we compared the classification process for two feature-learning conditions: with and without AEs. When excluding the AEs structure, the comparison is conducted only for the best DNNs model. This condition helps investigate how the DNNs classifier performs without feature extraction. With the same DL architecture with DAEs, the result of the testing phase is presented in Table 7. Table 7 shows a significant gap between the DAEs-AEs-DNNs model and the DAEs-DNNs model. The DAEs-AEs-DNNs model outperformed the DAEs-DNNs model for all five ECG beat classes, with average scores of 99.34%, 93.83%, 99.575%, 89.81%, and 91.44% in accuracy, sensitivity, specificity, precision, and F1-score, respectively. Without AEs, all the performance metrics decrease significantly, including a concerningly poor precision of 78.67%. DAEs are trained in a similar yet different way: when performing the self-supervised training, the input ECG signal is corrupted by adding noise, and the DAEs, whose task is to recover the original input, are trained locally to denoise corrupted versions of their inputs. However, sometimes the input and expected output of the ECG are not identical, which affects the DNNs classifier. Therefore, the DAEs are stacked with AEs as a solution to provide a good representation for a better-performing classifier. Intuitively, if a representation allows for a proper reconstruction of its input, it has retained much of the information that was present in that input. To verify the proposed DL model, the accuracy and loss curves are shown in Figure 8a,b, respectively. In Figure 8a, the accuracy on the training set starts above 60% in the first epoch; later, on the testing set, the model achieves a satisfactory accuracy of about 99.34%. Figure 8b shows the error decreasing to 0.6% as the number of epochs increases for the training and testing sets. Both curves have suitable shapes, as the DAEs' and AEs' reconstruction results show effective noise cancelation before processing with the DNNs. However, after 60 rounds of training, the DL model cannot predict accurately when AEs are not included in the structure (see Figure 8c,d). This shows that while the DAEs perform the denoising process to reconstruct the clean data, they fail to reconstruct representative features for the classifier. Without AEs, the input to the DNNs is the DAEs' encoding. Therefore, the output classifier reaches a 40% loss in the testing process and overfits, with the ECG beat signals being incorrectly predicted by the DNNs (see Figure 8c). As seen in Figure 8d, the training error is much lower than the testing error, thus producing high variance. The performance of the proposed DL structure was compared to three previous studies in the literature [26,27,31], owing to the limited previous research classifying heartbeats with noisy data using ML, including SVM, FBNNs, and CNNs (see Table 8). The results show that, with an accuracy of 99.34%, our proposed DL structure outperformed the SVM, FBNN, and CNN classifiers with ECG noise removal and feature representation algorithms: the proposed DL structure yields higher accuracy than the FBNNs, SVM, and CNNs, which yield 97.50%, 80.26%, and 95.70% accuracy, respectively. We can clearly observe that the proposed DL model with stacked DAEs-AEs successfully reconstructs the heartbeat ECG signal from the noisy signal and is effective for arrhythmia diagnosis. In our previous research [30], a DL model with DWT for noise cancelation was proposed on the MITDB dataset without noise stress. This study took the best DL model from [30] and added data from the NSTDB, producing higher than 99% accuracy for two conditions of the ECG signal. This means the proposed DL model may be generalizable for arrhythmia diagnosis.

Figure 7. A sample of the stacked DAEs-AEs process for the ECG signal.

Table 1. Heartbeat distribution after the ECG segmentation.

Table 3. The performance evaluation of the DL models in the training phase.

Table 4. The performance evaluation of the DL models in the testing phase.

Table 5. Confusion matrix in the training phase.

Table 6. Confusion matrix in the testing phase.

Table 8. The benchmark of the proposed model against other DL structures.