Electrocardiogram Signals Classification Using Deep-Learning-Based Incorporated Convolutional Neural Network and Long Short-Term Memory Framework †

Abstract: Cardiovascular diseases (CVDs) such as arrhythmia and heart failure remain the world’s leading cause of death. These conditions can be triggered by high blood pressure, diabetes, and simply the passage of time. Despite substantial advancements in artificial intelligence (AI) and technology, the early detection of these heart issues is still a significant challenge. This research addresses this hurdle by developing a deep-learning-based system capable of predicting arrhythmias and heart failure from abnormalities in electrocardiogram (ECG) signals. The system leverages a model that combines long short-term memory (LSTM) networks with convolutional neural networks (CNNs). Extensive experiments were conducted using ECG data from both the MIT-BIH and BIDMC databases under two scenarios. The first scenario employed data from five distinct ECG classes, while the second focused on classifying data from three classes. The results from both scenarios demonstrated that the proposed deep-learning-based classification approach outperformed existing methods.


Introduction
Artificial intelligence (AI) was pioneered by John McCarthy, the father of AI [1]. AI was developed with the aim of emulating human cognitive processes, fostering the hope that it would not only augment but also significantly assist in human endeavors. While traditional machine learning techniques like support vector machines (SVMs) have long been used in medical image and signal classification, they are falling short. First, their accuracy is not quite good enough for real-world use. Second, they have seen slow progress in recent years. On top of that, extracting and selecting features is a time-consuming task that often varies depending on the algorithm used and the type of image being analyzed. Deep neural networks (DNNs), specifically convolutional neural networks (CNNs), have been revolutionizing classification tasks for more than a decade and have reached impressive performance levels [2]. In fact, some CNN-based research in medical image and signal classification has achieved accuracy that rivals even human experts. AI has lately been used in diverse fields such as facial expression recognition [3], cyber security [4], space missions [5], management [6], online education [7], and chip development [8]. Furthermore, AI has driven content-creation software such as Photoshop or Midjourney, in which AI can now generate images or art from just a description of the scene. Moreover, many big companies and researchers have lately been focusing on using AI in biomedical fields such as COVID-19 detection [9,10], cancer detection [11], and arrhythmia detection [12]. Arrhythmia is a kind of heart disease that causes an irregular rhythm within the heart. When it comes to heart diseases, it is necessary to perform what is called an ECG test. The ECG test is a non-invasive procedure in which doctors attach several sensors to areas of the patient's body, such as the arms, legs, and, most importantly, the chest. The main purpose of these sensors is to record the electrical activity of the heart and pass it to the ECG machine, where it is presented to doctors for a diagnosis of the patient's heart condition. Unfortunately, doctors might sometimes misdiagnose the ECG waveform, which prevents early-stage treatment and in turn might lead to the patient's death. Therefore, using an automated system is good practice to effectively increase correct diagnoses without overwhelming the doctors. Only suspicious cases need be referred to highly skilled and experienced doctors for further analysis. To achieve this, a model must be trained on reliable databases for extracting and classifying the relevant information. The time-domain representation of the signal might not be enough to extract the salient discriminating features. Instead, transforming the ECG signal to the frequency domain using the Fourier transform [13] can help in extracting more useful information. Other transform methods such as the continuous wavelet transform [14,15] can also be used; these mainly represent signals in the frequency domain as heat-map images of either scalograms or spectrograms.
Many researchers have proposed different algorithms and techniques for detecting and predicting the type of arrhythmia in the heartbeat using various arrhythmia datasets such as the MIT-BIH and BIDMC databases [13-19]. Researchers in [20] introduced a federated semi-supervised learning (FSSL) framework named FedECG for ECG abnormality prediction based on the ResNet-9 model. They claimed that their framework managed to solve both the challenge of non-i.i.d. data [21,22] and that of heavy labeling while maintaining personal privacy. The development of a generalized model and data imbalance are two further challenges addressed in another study that used a four-layer convolutional neural network (CNN) model [23]. They proposed a new data augmentation for the ECG signal using simple identical-length segments and re-arrangement to obtain well-distinguished synthetic signals. In other studies, a combined network of a convolutional neural network (CNN) and long short-term memory (LSTM) was used, in which the major feature selection and classification steps were merged in the deep network [12,24].
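The segment-rearrangement augmentation idea of [23] can be sketched as follows; the recording length, segment length, and the permutation itself are illustrative assumptions, not the authors' exact settings:

```python
import numpy as np

# Split a 1-D "ECG recording" (synthetic stand-in) into identical-length
# segments and permute them to synthesize a new training signal.
rng = np.random.default_rng(0)
signal = np.arange(1000, dtype=float)   # placeholder recording
seg_len = 100                           # illustrative segment length

segments = signal.reshape(-1, seg_len)  # 10 segments of 100 samples each
augmented = segments[rng.permutation(len(segments))].ravel()

# The synthetic signal keeps the original length and sample values;
# only their order at segment granularity changes.
```

Because the augmented signal reuses real samples, it stays physiologically plausible at the segment level while presenting the model with a new ordering.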
Another system, called the spatiotemporal attention-based convolutional recurrent neural network (STA-CRNN) and consisting of a CNN subnetwork, spatiotemporal attention modules, and an RNN subnetwork, was developed in [25]. The study claimed that the classification performance could be greatly improved by such a combination, as earlier networks ignored the fact that different channels and temporal segments of the feature map extracted from the 12-lead ECG signal contribute differently to cardiac arrhythmia detection. Another classifier proposed transforming the ECG heartbeat signals into images before processing them [26]. The proposed ADCGNet (attention-based dual-channel Gabor network) applies the analytical Morlet transform to the obtained ECG images before applying 32 Gabor filters with Sobel edge detection for feature enhancement.
Another work proposed the automatic identification and classification of ECG by developing a dense heart rhythm network that combines a 24-layer deep convolutional neural network (DCNN) and bidirectional long short-term memory (BiLSTM) [27]. Their networks deeply extract the hierarchical and time-sensitive features of ECG data using different sizes of convolution kernels. Wavelet transform and median filtering were applied to eliminate the influence of noise on the signal. Other researchers suggested a hybrid approach using a two-dimensional CNN-LSTM accompanied by the continuous wavelet transform (CWT) as the feature extractor [28]. To predict arrhythmias, the ECG signals were transformed from 1D into scalogram images using CWT. They managed to extract the required features and eliminate irrelevant artifacts from the signal. The combined time-frequency representation provided a better understanding of different arrhythmia characteristics. The obtained images are fed to a CNN-LSTM model, where the CNN extracts features using convolutional operations. The LSTM, in turn, can also extract and memorize relevant features to be used for enhancing the prediction.
The researchers in [29] implemented a similar framework using CNN and LSTM for COVID-19 detection. Their system takes time series data, specifically statistical information from sick people, and tests it for COVID-19 infection. Another system utilized two modalities for detecting COVID-19, using both speech (voice, coughing, and breathing) and X-ray images [30]. Since detecting COVID-19 symptoms is a challenging task, a combination of CNN and LSTM was used to enhance the prediction of such symptoms. Data augmentation was applied to overcome the problem of small data size when implementing this speech-image-based model. Our research in this study on the ECG classification problem involved extensive experiments on two databases: MIT-BIH and BIDMC. The first scenario focused on decoding five distinct classes in the MIT-BIH database. In the second scenario, we merged both databases, MIT-BIH and BIDMC, into a three-class setting. In both scenarios, our approach uses a CNN-LSTM combination as the DL model for ECG signal classification. FFT extracts features from the ECG signals before feeding them to the DL model. The proposed approach had to be well trained to separate the subtle differences in the ECG signals and accurately classify each beat into its correct category.
The rest of the paper is organized as follows: Section 2 explains the methodology of database preparation and segmentation, feature extraction, and the proposed approach. Experimental results are shown and discussed in detail in Section 3. Section 4 concludes the paper with proposals for future research directions.

Database Preparation
For the experiments conducted in this study, two different heartbeat ECG databases from the freely available medical research repository PhysioNet have been utilized, namely the MIT-BIH and BIDMC databases. The MIT-BIH database, as shown in Figure 1, has two main classes, the normal sinus rhythm (NSR) class and the arrhythmia (ARR) class, while the BIDMC database has only one, the congestive heart failure (CHF) class. In addition, the arrhythmia (ARR) class in the MIT-BIH database consists of four different sub-classes, namely supraventricular ectopic beat (S), ventricular ectopic beat (V), fused beat (F), and unknown beats (Q). The data originally had 162 recordings divided among the 3 main classes: the ARR class consists of 96 recordings, the CHF class of 30 recordings, and the NSR class of 36 recordings. For consistency, a sampling frequency of 128 Hz was used for all the recordings. The interclass imbalanced-data problem was resolved by using only 30 recordings from each class, ending up with 90 recordings.
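The class balancing described here amounts to undersampling each class to the size of the smallest one; a minimal sketch of the bookkeeping:

```python
# Recording counts per class across the two databases (ARR and NSR from
# MIT-BIH, CHF from BIDMC), as reported in the text.
counts = {"ARR": 96, "CHF": 30, "NSR": 36}
total_recordings = sum(counts.values())          # 162 in total

# Undersample: keep as many recordings per class as the smallest class has.
per_class = min(counts.values())                 # 30 recordings each
balanced_total = per_class * len(counts)         # 90 recordings used
```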

Database Segmentation
Segmentation is an important tool that splits the original signal into smaller parts for easier analysis and processing, especially with inputs having a massive number of features. Each of the recordings consists of a vector of 65,536 features. This vector was segmented into smaller, non-overlapping, and equally sized samples of 500 features. Through this process, each ECG signal recording was segmented into 131 feature vectors (⌊65,536/500⌋). So, each of the 3 main classes (ARR, NSR, CHF) has 3930 samples/vectors, making a total of 11,790 feature vectors extracted from the 90 recordings. Figure 2 shows a portion of the original ECG signal (first 2000 values) and a segmented feature vector of the ECG signal with 500 features.
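The segmentation arithmetic above can be reproduced directly; the reshape-based splitting at the end is a sketch of one natural implementation, not necessarily the authors' code:

```python
import numpy as np

n_features = 65_536        # samples per recording
segment_len = 500          # features per segmented vector

# Complete, non-overlapping segments per recording: floor(65,536 / 500).
n_segments = n_features // segment_len           # 131

# 30 recordings per class, 3 classes (ARR, NSR, CHF).
vectors_per_class = n_segments * 30              # 3930
total_vectors = vectors_per_class * 3            # 11,790

# Segmenting one recording: drop the incomplete tail, then reshape.
recording = np.zeros(n_features)                 # placeholder signal
segments = recording[: n_segments * segment_len].reshape(n_segments, segment_len)
```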

Feature Extraction
An electrocardiogram (ECG) records the atrial depolarization (P wave), ventricular depolarization (QRS complex), and repolarization (T wave) of the heart, using electrodes placed on the body surface to measure the heart's electrical activity. Based on the intervals between the ECG waves, the heart rate can be calculated. The fast Fourier transform (FFT), an algorithm for the discrete Fourier transform, solves a wide range of problems, including data filtering, digital signal processing, and partial differential equations. It has been used in many applications such as speech enhancement [31], radar signal processing [32], and ECG classification [12,33]. In this study, the FFT is used to extract all the frequency components contributing to the heartbeat signal, including linear and nonlinear components. The electric signal generated by the human heart that creates the cardiac cycle comprises three basic components: the P wave, the QRS complex, and the T wave, as shown in Figure 3. The linear components of the heartbeat signal are represented by the P wave, the T wave, and the QRS complex. There are two nonlinear components, namely the PR segment and the ST segment.
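A minimal sketch of FFT-based feature extraction on one 500-sample segment, using a synthetic sinusoid plus noise in place of a real ECG vector:

```python
import numpy as np

fs = 128                   # sampling rate of the recordings (Hz)
segment_len = 500

# Synthetic stand-in for one segmented ECG vector: a 1.2 Hz component
# (roughly heart-rate scale) plus noise.
rng = np.random.default_rng(1)
t = np.arange(segment_len) / fs
segment = np.sin(2 * np.pi * 1.2 * t) + 0.2 * rng.standard_normal(segment_len)

# FFT features: one-sided magnitude spectrum of the segment
# (251 bins for a length-500 real signal, from DC up to Nyquist).
features = np.abs(np.fft.rfft(segment))
freqs = np.fft.rfftfreq(segment_len, d=1 / fs)

# The strongest non-DC bin should land near the 1.2 Hz component.
peak_freq = freqs[np.argmax(features[1:]) + 1]
```

The resulting 251-dimensional magnitude spectrum is the kind of frequency-domain feature vector that can then be fed to the classifier.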


The Proposed Approach
This research focuses on developing an automated deep learning model that can classify various classes using a convolutional neural network (CNN) and long short-term memory (LSTM). It has been proven that CNNs can be used for complex applications such as ECG classification [12,33,34]. LSTM, in turn, is an advanced model developed from recurrent neural networks (RNNs) by Hochreiter and Schmidhuber [35]. LSTM is a sequential deep learning model that allows information to be preserved in memory so that the system remembers the related data and uses them to enhance the classification process [36-38]. As shown in Figure 5, the LSTM works by using the current input x_t, the previous cell state C_{t-1}, and the previous hidden state h_{t-1} to calculate the current cell state C_t and the current hidden state h_t at each computational step.
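For reference, the standard LSTM cell updates behind Figure 5 (with forget, input, and output gates f_t, i_t, o_t, logistic sigmoid σ, and element-wise product ⊙) are:

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\!\left(W_C\,[h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh\!\left(C_t\right)
\end{aligned}
```

The forget gate decides how much of C_{t-1} to keep, the input gate how much of the candidate state to write, and the output gate how much of the cell state to expose as h_t.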

Figure 4 shows how the proposed approach is composed of 5 main layers. FFT, CNN, and LSTM are the three main components of the model, combined to enhance the classification accuracy. The first layer is the input convolutional layer, which is a one-dimensional layer. A batch normalization layer normalizes the output after the convolution. A max-pooling layer then slides a kernel over the result to obtain the maximum value at each location within the input. The outputs propagate through the next 4 convolutional layers in the same manner. To enhance the learning process, an LSTM with 200 units was chosen to allow the model to remember as many complex features as possible, boosting the performance. The output of the LSTM layer continues to the output layer, a dense layer with k output neurons based on the number of classes in each scenario. The softmax function is used as the activation function. In addition, the stopping criterion for training was set to 30 epochs. Figure 6 shows the accuracy and the loss of the model for both training and validation at each epoch.
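The layer stack described above can be summarized as follows; the filter counts and kernel size are illustrative assumptions, since the text fixes only the structure (5 convolutional blocks with batch normalization and max pooling, an LSTM with 200 units, and a dense softmax head):

```python
# Layer-by-layer summary of the proposed model. The filter progression
# (32 doubling per block) and kernel_size=5 are assumptions for the sketch.
layers = []
filters = 32
for _ in range(5):
    layers += [f"Conv1D({filters}, kernel_size=5, padding='same')",
               "BatchNormalization()",
               "MaxPooling1D(pool_size=2)"]
    filters *= 2
layers += ["LSTM(200)", "Dense(k, activation='softmax')"]

# Sequence length reaching the LSTM for a 500-sample input:
# 'same' convolutions preserve length; each pooling halves it (floor).
length = 500
for _ in range(5):
    length //= 2        # 500 -> 250 -> 125 -> 62 -> 31 -> 15
```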
The training of the model stopped at epoch 10 because no improvement was recorded for three consecutive epochs, which also helps avoid overfitting. The batch size controls how the input data are fed to the model in sequence, driving the model to learn at either a low rate (slow convergence) or a high rate (fast convergence). This convergence affects the error estimation in both the training and validation losses and accuracies. It is therefore important to set a proper batch size, as it affects the model's performance. Different batch sizes were tested, and a batch size of 40 was found to be the optimal value for our model.
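The early-stopping behaviour described above (patience of three epochs, capped at 30) can be sketched as a small helper; the validation-loss values are made up purely to illustrate a run that stops at epoch 10:

```python
def stopping_epoch(val_losses, patience=3, max_epochs=30):
    """Return the 1-based epoch at which training stops: either when the
    validation loss has not improved for `patience` consecutive epochs,
    or at the max_epochs cap, whichever comes first."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return min(len(val_losses), max_epochs)

# Made-up validation losses that improve up to epoch 7, then plateau:
losses = [0.90, 0.70, 0.50, 0.40, 0.35, 0.33, 0.30, 0.31, 0.32, 0.31, 0.33]
```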

Results and Discussion
Our research on the ECG classification problem involved extensive experiments on two databases: MIT-BIH and BIDMC. The first scenario focused on decoding five distinct classes in the MIT-BIH database (see Figure 7). Four of these classes, S, V, F, and Q, represented various types of arrhythmias, each with its own irregularities in the ECG recording. The fifth, NSR, represented the normal beat, the baseline against which the other classes were judged. Our approach had to be well trained to separate these subtle differences and accurately classify each beat into its correct category. In the second scenario, we merged both databases, MIT-BIH and BIDMC, into a three-class setting (see Figure 8). Here, the ARR and NSR classes from MIT-BIH remained, but a new class was added: CHF, congestive heart failure from the BIDMC database.


Results of the First Scenario
Our research delved into classifying five distinct heart rhythm types from the MIT-BIH database, deploying diverse machine learning (ML) and deep learning (DL) algorithms. Examples of the ML algorithms used here are as follows: principal component analysis (PCA), which scrutinizes the ECG signal's complexities, identifying key patterns and reducing its dimensionality [39,40]; independent component analysis (ICA), which unmasks hidden or overlapping signals lurking within the ECG [41,42]; random forest (RF), which builds a team of decision trees, each analyzing the ECG signal slightly differently, and then votes for the most likely heart rhythm category [43-45]; and the K-best algorithm, which picks the K most relevant features from the ECG signal, focusing on the distinctive features that matter most for classification [46]. As for deep learning algorithms, both CNN and LSTM were used. The performance comparison among the literature results, the implemented algorithms, and the proposed approach is shown in Table 1. In this scenario, the MIT-BIH database had five distinct classes: normal sinus rhythm (N), supraventricular ectopic beat (S), ventricular ectopic beat (V), fused beat (F), and unknown beats (Q). The results of both approaches from [44] were calculated from the confusion matrices provided in their article. The PCA-ICA+RF and FFT+CNN algorithms were implemented for comparison purposes. The proposed approach, FFT+CNN-LSTM, achieved the highest accuracy (97.4%) among all the compared models. Incorporating the LSTM into the CNN helped slightly improve the performance of the proposed approach. The confusion matrix of the FFT+CNN-LSTM approach is shown in Table 2.
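In the spirit of the PCA+RF baseline, a scikit-learn sketch on synthetic stand-in data; the dataset, dimensionalities, and hyperparameters here are illustrative assumptions, not those of the compared studies:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic 5-class stand-in for the FFT feature vectors (251-dimensional,
# matching a one-sided spectrum of a 500-sample segment).
X, y = make_classification(n_samples=600, n_features=251, n_informative=30,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Dimensionality reduction with PCA, followed by a random-forest classifier.
clf = make_pipeline(PCA(n_components=20),
                    RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X_tr, y_tr)
score = clf.score(X_te, y_te)
```

The same pipeline shape applies to ICA (swap in `FastICA`) or K-best selection (swap in `SelectKBest`) ahead of the classifier.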

Results of the Second Scenario
In this scenario, three classes were used in the classification experiments. Two main classes from the MIT-BIH database were used (normal sinus rhythm, NSR, and arrhythmia, ARR) together with a third class from the BIDMC database (congestive heart failure, CHF). The categories of the main classes from both databases are shown in Figure 1. The data originally had 162 recordings for the three classes, where ARR had 96 recordings, CHF had 30 recordings, and NSR had 36 recordings. To solve this imbalanced-data problem, only 30 recordings from each class were used for extracting the feature vectors, making a total of 90 recordings. As mentioned in the data preparation section, each recording was segmented into equally sized feature vectors of length 500, making a total of 11,790 feature vectors. Table 3 compares the performance of various existing approaches against the proposed approach. The proposed FFT+CNN-LSTM approach outperformed all the models, achieving 99.2% accuracy. The result of the FFT+CNN-LSTM approach was obtained using 9432 (80%) vectors for training and 2358 (20%) vectors for testing. Table 4 illustrates the confusion matrix of the proposed FFT+CNN-LSTM approach. The recorded execution time for classifying all the test vectors/samples was 213 s using an Asus ROG STRIX G17 laptop with CPU: Intel(R) Core(TM) i7-10750H @ 2.60 GHz; GPU: NVIDIA GeForce RTX 2060 6 GB; RAM: 16 GB. As can be observed from the results in the previous tables, by successfully differentiating between these cardiac conditions across different ECG signals, our proposed approach further solidified its potential for real-world application in the diverse area of heart disease diagnosis. Convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, as examples of artificial intelligence (AI), were shown to be superior algorithms for revolutionizing medical diagnosis, but like any powerful tool, they come with a double-edged sword. On the one hand, AI's potential to improve accuracy, speed, and accessibility is undeniable. CNNs, for instance, excel in analyzing medical signals such as ECG and EEG signals or medical images like X-rays, MRI scans, and CT scans, detecting subtle patterns that might elude even the most experienced eyes. This translates to earlier diagnoses and better treatment decisions, and ultimately it can significantly influence treatment outcomes. AI algorithms, meanwhile, can process vast amounts of clinical data, identifying risk factors and predicting disease progression with amazing precision. This allows doctors to personalize care and prioritize resources efficiently. However, we must be aware of the potential drawbacks and risks. Firstly, AI algorithms are only as good as the data they are trained on. Biased datasets can lead to biased algorithms, disadvantaging people belonging to certain demographics, races, or classes. Secondly, opaque AI models offer little transparency in their decision making, potentially eroding trust between patients and doctors. Thirdly, overreliance on AI could diminish the critical role of human expertise and intuition in diagnosis. The final diagnosis should always be made by a qualified doctor, with AI acting as an assisting tool, not a replacement.
In a paper released by NIST, Phillips et al. introduce four principles that are believed to be fundamental properties of explainable AI systems [50]. They recognized that not all AI systems require explanations. However, AI systems that are intended or required to be explainable should adhere to four principles: they should provide an explanation, by evidence or reason, for their outputs; the explanations should be meaningful to their users; the explanations should accurately reflect the system's process; and the systems should have knowledge limits, operating only under the conditions for which they were designed and only when they reach sufficient confidence in their output.
The ENISA report on securing machine learning algorithms dispels the notion that a single, uniform security strategy is applicable across the board [51]. Its findings suggest that organizations relying on AI should conduct thorough, individual analyses of their specific systems. Different algorithms have unique vulnerabilities, and applying the same set of controls everywhere is ineffective because different security measures carry different trade-offs: some might enhance security at the cost of speed, while others might boost performance but leave vulnerabilities exposed. To strike the right balance between security, privacy, and performance, ENISA emphasizes the importance of targeted risk assessments tailored to each unique AI system. The entire cybersecurity and privacy strategy must therefore be meticulously tailored to the context and reality of the individual organization. This way, organizations can make informed decisions about the security controls they implement, ensuring optimal protection without sacrificing performance or privacy.

Conclusions
The automatic and accurate diagnosis of irregularities in the ECG signal is crucial for a patient's life. A deep-learning-based system combining convolutional neural networks (CNNs) and long short-term memory (LSTM) networks was developed to predict different heartbeat irregularities associated with various heart diseases. Experiments were conducted on ECG data from PhysioNet, the medical research repository. An FFT transformation was applied as a preprocessing stage before feeding the results to the deep learning model for classification. The proposed approach was trained and tested in two scenarios: one with five classes (normal beats and four types of arrhythmias), and another with three classes (normal, arrhythmia, and congestive heart failure). Experimental results showed that the proposed FFT+CNN-LSTM approach outperformed other machine learning and deep learning models in both scenarios, achieving accuracies of 97.6% (five classes) and 99.2% (three classes), respectively. Our deep-learning-based system proved adept at identifying these heart conditions, potentially paving the way for earlier diagnoses and improved patient outcomes. The proposed model uses the actual values of the heartbeat signal as a one-dimensional vector instead of statistical readings from intervals such as the PR, QT, or QRS intervals of the ECG signal (presented in Figure 3). This makes the system robust to changes in the acquisition machine, where slight changes, shifts, and noise might affect the recorded signals.
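As a minimal illustration of the FFT preprocessing stage mentioned above, the magnitude spectrum of one 500-sample segment can be computed as follows. The exact transform details used in the experiments (windowing, normalization) are not restated here, so this is only a sketch under assumed defaults:

```python
import numpy as np

def fft_features(segment):
    """Illustrative FFT preprocessing: magnitude spectrum of one ECG
    segment, scaled to [0, 1]. The normalization choice is an assumption."""
    spectrum = np.abs(np.fft.rfft(segment))
    return spectrum / (np.max(spectrum) + 1e-12)

# A pure 5-cycle sine over 500 samples concentrates energy in one bin.
segment = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 500, endpoint=False))
feat = fft_features(segment)
print(feat.shape)  # (251,) -- rfft of 500 real samples yields 251 bins
```

Each 500-sample time-domain vector thus becomes a 251-bin frequency-domain feature vector before classification.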
In conclusion, it is evident that deep neural networks have the potential to transform medical diagnosis, but their integration requires careful consideration and ethical implementation. Future work must focus on addressing the main drawbacks and issues, such as data biases, data privacy, and security, improving explainability, and ensuring responsible use alongside human expertise. Only then can we truly harness the power of such tools to create a healthier future for mankind.

Figure 1. The used databases with corresponding classes.

Figure 2. Part of the original ECG signal (top) vs. an example of the segmented ECG signal (bottom).

Figure 3. The characteristics of a normal heartbeat.
In addition, the fast Fourier transform was used as a feature extraction technique to extract the salient information from the ECG signal. The extracted feature vectors are fed to the CNN-LSTM model for classification. An illustration of the proposed CNN-LSTM model is shown in Figure 4. The model has five convolution blocks, each consisting of convolution, batch normalization, and max pooling layers. The output of the last convolution block is fed to the LSTM layer before classification. The output layer has k outputs (3 or 5), depending on the number of classes in the applied scenario.
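The described architecture (five blocks of convolution, batch normalization, and max pooling, followed by an LSTM and a k-way output layer) can be sketched in PyTorch as shown below. The filter counts, kernel sizes, and LSTM hidden dimension are assumptions for illustration; the paper's exact hyperparameters are not listed in this excerpt:

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Sketch of the described CNN-LSTM: five conv blocks, then an LSTM,
    then a linear layer with k outputs (3 or 5 classes). Channel sizes,
    kernel width, and hidden size are illustrative assumptions."""
    def __init__(self, n_classes=3, channels=(16, 32, 64, 64, 128)):
        super().__init__()
        blocks, in_ch = [], 1
        for out_ch in channels:
            blocks += [nn.Conv1d(in_ch, out_ch, kernel_size=5, padding=2),
                       nn.BatchNorm1d(out_ch),
                       nn.ReLU(),
                       nn.MaxPool1d(2)]   # halves the sequence length
            in_ch = out_ch
        self.conv = nn.Sequential(*blocks)
        self.lstm = nn.LSTM(input_size=channels[-1], hidden_size=64,
                            batch_first=True)
        self.fc = nn.Linear(64, n_classes)  # k outputs: 3 or 5

    def forward(self, x):          # x: (batch, 1, 500)
        h = self.conv(x)           # (batch, 128, 15) after five poolings
        h = h.transpose(1, 2)      # (batch, 15, 128) for the LSTM
        _, (h_n, _) = self.lstm(h)
        return self.fc(h_n[-1])    # (batch, n_classes)

model = CNNLSTM(n_classes=3)
out = model(torch.randn(2, 1, 500))
print(out.shape)  # torch.Size([2, 3])
```

With a 500-sample input, the five pooling layers reduce the sequence length to 15 steps, which the LSTM then summarizes before classification.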

Figure 4. The architecture of the proposed approach.

Figure 5. The architecture of the long short-term memory (LSTM) network.

Figure 6. The model's accuracy and loss during the training and validation processes.

Figure 7. Examples of the five classes (four arrhythmias and normal ECG signals) used in the first scenario's experiments.

Table 1. Performance of different approaches compared to the proposed approach for the first scenario.

Table 2. Confusion matrix for the proposed FFT+CNN-LSTM approach for the first scenario.

Table 3. Comparison between different approaches for the second scenario.

Table 4. Confusion matrix for the proposed FFT+CNN-LSTM approach for the second scenario.