Automatic ECG Diagnosis Using Convolutional Neural Network

Abstract: Cardiovascular disease (CVD) is the most common class of chronic and life-threatening diseases and is therefore considered one of the main causes of mortality. This paper proposes a new neural architecture, based on convolutional neural networks (CNN), for the development of automatic heart disease diagnosis systems using electrocardiogram (ECG) signals. More specifically, ECG signals were passed directly to a properly trained CNN. The database consisted of more than 4000 ECG signal instances extracted from outpatient ECG examinations obtained from 47 subjects: 25 males and 22 females. The confusion matrix derived from the testing dataset indicated 99% accuracy for the "normal" class. For the "atrial premature beat" class, ECG segments were correctly classified 100% of the time. Finally, for the "premature ventricular contraction" class, ECG segments were correctly classified 96% of the time. In total, the average classification accuracy was 98.33%. The sensitivity (SNS) and the specificity (SPC) were 98.33% and 98.35%, respectively. The new approach based on deep learning and, in particular, on a CNN delivered excellent performance in the automatic recognition and, therefore, prevention of cardiovascular diseases.


Introduction
For many years, doctors have been aware that cardiovascular diseases are one of the main causes of mortality [1]. Cardiovascular diseases often manifest as myocardial infarction (MI). Myocardial infarction, commonly referred to as a heart attack, refers to the failure of the heart muscles to contract for a fairly long period of time. Appropriate treatment within an hour of the onset of a heart attack can reduce the mortality risk of the person suffering from it.
When a heart condition occurs, the first diagnostic check consists of an electrocardiogram (ECG), which, therefore, is the main diagnostic tool for cardiovascular disease (CVD). The electrocardiograph detects the electrical activity of the heart during the test time, which is then represented on a graphic diagram that reflects cyclical electrophysiological events in the cardiac muscle [2]. By conducting a careful analysis of the ECG trace, doctors can diagnose a probable myocardial infarction. It is important, however, to underline that the sensitivity and specificity of manual detection of acute myocardial infarction are 91% and 51%, respectively [3].
Developing a computer-aided system to automatically detect MI would help cardiologists make better decisions. Hence, various studies have recently been conducted on automatic MI detection.
Given the nonlinearity of the heart anomaly classification problem, techniques based on neural networks have recently been adopted. In a previous study, the authors proposed a training technique based [...]
Many other papers extract features from the PQRST complex and take advantage of machine learning algorithms based on other techniques. In [19], the authors used rough sets (RS) and a quantum neural network (QNN) to recognize electrocardiogram (ECG) signals. After normalization of the signals, the wavelet transform (WT) was used for feature extraction (peaks of the P, Q, R, S, and T waves). Then, RS attribute reduction was applied as a preprocessor so that redundant attributes and conflicting objects could be deleted from the decision-making table while retaining the useful information without loss. After that, the QNN-based classification model was trained using a gradient descent method; the accuracy of this system was 91.7%.
In [20], the RR interval was calculated using recordings from the MIT-BIH Arrhythmia Database, and multilayer perceptron neural network (MLPNN) and support vector machine (SVM) classifiers were compared. The results showed that MLPNN performed well in testing, while SVM performed well in training.
In [21], the authors presented a survey on the classification of ECG signals based on machine learning techniques other than CNN. Table 1 of that study highlights the main techniques for classifying ECG signals, including the number of features, feature names, pre-processing techniques, database, modeling techniques, performance measures used, and accuracy achieved in each paper.
This paper proposes a low-complexity solution for automatic heart disease recognition based on the direct application of a CNN-based classification network to ECG signals, thus bypassing any transformation of the ECG signals from the time domain to other domains, e.g., the frequency domain as MFCC (Mel-Frequency Cepstral Coefficients), wavelets, etc. The performance of the classifier was evaluated on the following three classes: "normal", "atrial premature beat", and "premature ventricular contraction". The obtained performances were remarkable.

ECG Signal and Dataset
From a graphic or numerical point of view, electrocardiogram (ECG) represents the electrical activity of the heart during its operation. The most important elements of an ECG waveform, which repeats for each cardiac cycle, are shown in Figure 1. ECG is carried out to provide information about different heart diseases that a person can suffer from [22], in order to guarantee effective therapy.
According to international conventions, the specific points identified in the trace of an electrocardiogram are labeled with the letters P, Q, R, S, and T and, in particular, are the following:
• P wave: the first wave that occurs in the ECG cycle, a small deflection that represents atrial depolarization, more commonly called "atrial contraction";
• T wave: represents the repolarization of the ventricles, more commonly called "ventricular relaxation";
• Q, R, and S waves: together, these waves form the so-called QRS complex. The QRS complex represents the contraction of the ventricles or, technically speaking, the depolarization complex of the ventricles. In particular, the Q wave represents the depolarization of the interventricular septum, the R wave reflects the depolarization of the main mass of the ventricles, and the S wave is the final depolarization of the ventricles at the base of the heart.
Taken together, the P, Q, R, S, and T waves make up the so-called PQRST complex. Cardiologists denote the interval between two PQRST complexes by the term "R-R interval", which corresponds to a cardiac cycle.
Other parameters that have been extensively used to make medical diagnoses from the ECG trace are:
• PR interval or PQ interval: the PR interval is a stretch formed by the P wave and the PR segment (rectilinear stretch); it begins with the P wave, that is, with the first deflection, and ends at the QRS complex. This interval indicates the time that the depolarization wave takes to propagate from the sinoatrial node along the part of the electrical conduction system of the heart located in the myocardium;
• ST segment, i.e., the time between the end of the QRS complex and the start of the T wave;
• QT interval, i.e., the time between the beginning of the QRS complex and the end of the T wave, which is the electrocardiographic manifestation of ventricular depolarization and repolarization [23].
When an ECG is performed on a patient suffering from heart disease, the diagram outlines a different waveform from that shown in Figure 1. For example, the QT interval may be longer than normal, indicating that the patient may be suffering from a ventricular arrhythmia; the ST segment may have an elevation, which may be associated with myocardial infarction [24,25].
One of the most commonly used databases in the field is PhysioNet [26,27]; in particular, the MIT-BIH arrhythmia database was used in this study, as shown in Table 1. A large collection of recorded physiologic signals is available under the Open Data Commons-Public Domain Dedication & Licence v1.0 [28].
The PhysioNet database is composed of 48 two-channel ambulatory ECG recordings, each 30 min long, associated with different clinical pathologies (e.g., ventricular and supraventricular arrhythmia, ventricular tachyarrhythmia, atrial fibrillation, etc.).
The database contained ECG recordings from 47 subjects: 25 males aged between 32 and 89 years and 22 females aged between 23 and 89 years. Twenty-three recordings were chosen at random from a set of 4000 24-h ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample. The recordings were digitized at 360 samples per second per channel with an 11-bit resolution over a 10 mV range.
Cardiologists independently annotated each recording; disagreements were resolved to obtain the computer-readable reference annotations for each beat (approximately 110,000 annotations in all) included in the database.
The database is made up of three classes: normal, atrial premature beat, and premature ventricular contraction. Figure 2 shows the differences in the ECG wave between the normal beat, the premature atrial beat, and the premature ventricular contraction. The first graph of Figure 2 shows the ECG wave of a normal beat, i.e., a heartbeat not affected by pathologies. This graph can be traced back to the "ideal" one in Figure 1. The second graph shows the ECG wave affected by a premature atrial beat or premature atrial contraction (PAC), a common cardiac dysrhythmia characterized by premature heartbeats originating in the atria. While the sinoatrial node typically regulates the heartbeat during normal sinus rhythm, PACs occur when another region of the atria depolarizes before the sinoatrial node and thus triggers a premature heartbeat. Therefore, the difference from a normal ECG wave lies in the PR segment, which is formed prematurely. In Figure 2, "RR longer" stands for the time between QRS complexes, while "SA reset" indicates the reformation of the electrical impulse beginning in the sinoatrial (SA) node and propagating to the atrioventricular (AV) node.
The third graph shows an ECG wave affected by premature ventricular contraction (PVC), a relatively common event in which the heartbeat is initiated in the ventricles rather than by the sinoatrial node.
From the above, it is clear that an automatic diagnosis system must detect these differences in the duration and shape of the waves and segments that make up the PQRST complex.
The dataset used was not recorded by the authors but originated from a 2001 study by Moody et al. [26]. Therefore, the authors were not responsible for the data collection procedure applied. The original authors of the database stated that all ethical requirements had been followed. Moreover, the database has been available online for an extended period and has been used extensively in many recent publications (see Table 1). Finally, all records in the database have been anonymized.

CNN General Characteristics and Architecture Adopted
Convolutional neural networks, or CNNs, are a specialized kind of neural network for processing data that has a known grid-like topology. Examples include time-series data, which may be considered as a 1-D grid taking samples at regular time intervals, and image data, which may be considered as a 2-D grid of pixels.
The general characteristics and architecture of this network are described in [29], where the only difference is the sample rate used. In this study and also in [30], the sample rate was 44.1 kHz instead of 8 kHz.
The deep convolutional neural network is mainly composed of: 1. 1D convolution layers; 2.
A convolutional kernel composed of 80 elements was used only in the first convolution layer; in the subsequent convolution layers, the kernel size was set to 3, with the aim of reducing the computational cost.
After each convolution, batch normalization was carried out to avoid exploding parameters and the phenomenon of "vanishing gradients". Batch normalization made it possible to train deep networks; it was applied after each convolutional layer and before the ReLU (rectified linear unit) activation function. The pooling layer, placed before the ReLU, reduced overfitting and halved the size of its input.
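The effect of the kernel sizes and of the pooling on the length of the feature maps can be illustrated with a short calculation. The following Python sketch is only an illustration: the stride of 4 in the first layer, the stride and padding of the later layers, and the function names are assumptions that are not taken from the text; only the kernel sizes (80, then 3), the halving performed by the pooling, and the 10,800-sample input come from the paper.

```python
def conv1d_output_length(length, kernel_size, stride=1, padding=0):
    """Number of output samples of a 1-D convolution (no dilation)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def pool_output_length(length, factor=2):
    """Number of output samples after non-overlapping pooling."""
    return length // factor


# 30 s of ECG at 360 samples per second (values stated in the paper).
input_length = 30 * 360

# First layer: kernel of 80 elements; the stride of 4 is an assumption.
after_conv1 = conv1d_output_length(input_length, kernel_size=80, stride=4)
after_pool1 = pool_output_length(after_conv1)

# Later layers: kernel of 3 elements; stride 1 and padding 1 are assumptions.
after_conv2 = conv1d_output_length(after_pool1, kernel_size=3, padding=1)
after_pool2 = pool_output_length(after_conv2)

print(input_length, after_conv1, after_pool1, after_conv2, after_pool2)
```

Under these assumed settings, a small kernel with padding leaves the length unchanged, so the reduction in length along the network comes from the first strided convolution and from the pooling stages.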
Unlike classic CNNs, which use fully connected neurons as their output layer, this network performed a single AvgPool and then a LogSoftmax, i.e., a softmax followed by a natural logarithm, log(softmax(x)).
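The output operation log(softmax(x)) can be written in a few lines. The sketch below is a plain-Python illustration and not the implementation used in the paper; the function name, the subtraction of the maximum (a common trick for numerical stability), and the three example scores are assumptions introduced for illustration.

```python
import math


def log_softmax(scores):
    """Return log(softmax(scores)) computed in a numerically stable way."""
    shift = max(scores)
    log_sum = math.log(sum(math.exp(s - shift) for s in scores))
    return [s - shift - log_sum for s in scores]


# Hypothetical averaged scores for the three classes (illustrative values only).
class_names = ["normal", "atrial premature beat", "premature ventricular contraction"]
scores = [2.0, 0.5, -1.0]

log_probs = log_softmax(scores)
predicted = class_names[log_probs.index(max(log_probs))]
print(predicted)
```

The exponentials of the returned values sum to one, and the predicted class is the one with the largest log-probability.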
The structure of the proposed convolutional neural network is given in Table 2 and shown in Figure 3. Deep neural networks can both extract and classify feature representations, rather than performing these two functions separately. After being processed, the ECG recording was sent to the CNN as input and classified into one of three classes: normal, atrial premature beat, and premature ventricular contraction.

Training/Validation and Testing Dataset
Neural network input consists of 30-s segments where every second of ECG recording is equivalent to 360 samples, for a total of 10,800 samples.
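The segment length follows directly from the sampling rate: 30 s × 360 samples/s = 10,800 samples. A minimal Python sketch of such a segmentation is given below; the paper does not describe how the segments were extracted, so the non-overlapping windows, the dropping of any incomplete final window, the function name, and the placeholder recording are all assumptions.

```python
def split_into_segments(signal, sampling_rate=360, segment_seconds=30):
    """Cut a 1-D sequence into non-overlapping, fixed-length segments."""
    segment_length = sampling_rate * segment_seconds
    n_segments = len(signal) // segment_length
    return [
        signal[i * segment_length:(i + 1) * segment_length]
        for i in range(n_segments)
    ]


# Placeholder recording: 30 min of zeros at 360 samples per second.
recording = [0.0] * (30 * 60 * 360)
segments = split_into_segments(recording)
print(len(segments), len(segments[0]))
```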
This dataset was subsequently divided into two different datasets (see Figure 4):
• Training/validation set, consisting of 995 segments for the "normal" class, 234 segments for the "premature ventricular contraction" class, and 93 segments for the "atrial premature beat" class. Seventy percent of this set was used for training, and the other 30% was used for validation;
• Testing set, consisting of 426 segments for the "normal" class, 101 segments for the "premature ventricular contraction" class, and 40 segments for the "atrial premature beat" class.
Figure 4. The distribution of ECG segments used for learning (70%) and testing (30%). Thirty percent of the learning dataset was used for the validation of the network.
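The reported segment counts are consistent with a 70/30 split applied to each class separately. The Python sketch below checks this arithmetic; the per-class (stratified) splitting and the round-half-up rule are assumptions, since the paper only states the resulting counts and the overall percentages.

```python
# Segment counts per class as reported in the text.
learning_counts = {"normal": 995, "premature ventricular contraction": 234,
                   "atrial premature beat": 93}
testing_counts = {"normal": 426, "premature ventricular contraction": 101,
                  "atrial premature beat": 40}


def split_count(total, test_percent=30):
    """Return (learning, testing) sizes, rounding the test share half up."""
    testing = (total * test_percent + 50) // 100
    return total - testing, testing


for name in learning_counts:
    total = learning_counts[name] + testing_counts[name]
    learning, testing = split_count(total)
    print(f"{name}: total={total}, learning={learning}, testing={testing}")

total_learning = sum(learning_counts.values())
total_testing = sum(testing_counts.values())
test_share = total_testing / (total_learning + total_testing)
print(f"overall testing share: {test_share:.3f}")
```

With these assumptions, the computed per-class sizes coincide with the counts listed above, and the testing set amounts to about 30% of the 1889 segments.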
At first, the network was trained by entering the data relating to the "training set" as input, then it was validated using the "validation set", in order to evaluate the performance of the neural network (the percentage of loss and accuracy). Finally, the "testing set" was applied to validate and verify, through the accuracy estimate, the robustness of the neural network to data external to the training/validation set.

Methods
As previously stated, for the purposes of performance evaluation, this study used the PhysioNet database, typically employed as a reference database in the automatic classification of cardiac pathologies based on ECG signals. From this dataset, the data for learning and testing the neural network were obtained for the assessment of classification accuracy. The accuracy indicated that the network classified well both the two classes related to heart disease ("atrial premature beat" and "premature ventricular contraction") and the class relating to the state of good health. Based on the results obtained from the confusion matrix, the proposed method was evaluated by applying the statistical classification functions [31]: sensitivity, also known as the true positive rate (TPR); specificity, also known as the true negative rate (TNR); fall-out, also known as the false positive rate (FPR); and the measure of the test accuracy.
Hence, the meaning of each statistical classification parameter described above can be defined as follows: sensitivity indicated the percentage of ECG recordings belonging to a specific category that were correctly classified in that category; specificity measured how often the classifier correctly rejected ECG recordings not belonging to that category; fall-out indicated that ECG recordings were considered to belong to a specific category but, in reality, were not part of it; false discovery ratio indicated that ECG recordings were not considered to belong to a specific category but, in reality, were part of it; the F1 score took into account the precision and recall of the test, where precision was the number of true positives (TP) divided by the number of all positive results, i.e., true positives (TP) plus false positives (FP), while recall was the number of true positives (TP) divided by the number of all tests that should have been positive, that is, true positives (TP) plus false negatives (FN).
The following equations relate to the classification functions previously described.
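For reference, the standard textbook definitions of these measures, in terms of the per-class counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), are given below. They are assumed to correspond to the equations intended here; the symbols SNS and SPC follow the abbreviations used in the abstract.

```latex
\begin{align}
\mathrm{SNS} = \mathrm{TPR} &= \frac{TP}{TP + FN} \\
\mathrm{SPC} = \mathrm{TNR} &= \frac{TN}{TN + FP} \\
\mathrm{FPR} &= \frac{FP}{FP + TN} = 1 - \mathrm{SPC} \\
\mathrm{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\mathrm{Precision} &= \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN} \\
F_1 &= 2\,\frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\end{align}
```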

Test Results
In this section, the results of training and subsequent validation of the neural network are presented and discussed. Figure 5a,b represents the progress of the training and validation loss and the progress of the training and validation accuracy, respectively. As the graphs show, after 100 epochs, training and validation losses stabilized at a value close to zero (Figure 5a), while training and validation accuracy stabilized at 100% (Figure 5b).
These results were encouraging, since they indicated that the network had learned to classify the three classes described above with high accuracy.
In order to evaluate the performance of the CNN network with ECG sequences external to the training dataset, the accuracy obtained with the "testing set" was assessed. Figure 6 shows the relative confusion matrix.
The matrix highlighted an average classification accuracy level of 98.33%. The results obtained in terms of the statistical parameters described in Section 5 are shown in Table 3.
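Per-class statistics of this kind can be derived from a confusion matrix in a one-versus-rest fashion. The Python sketch below is an illustration only: the function name and the toy matrix are assumptions, and the toy counts are made up and are not the values of Figure 6. The last lines check that the per-class accuracies quoted in the abstract (99%, 100%, and 96%) average to the reported 98.33%.

```python
def per_class_rates(matrix):
    """Return (sensitivity, specificity) per class for a square confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    rates = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp
        fp = sum(row[k] for row in matrix) - tp
        tn = total - tp - fn - fp
        rates.append((tp / (tp + fn), tn / (tn + fp)))
    return rates


# Toy confusion matrix with made-up counts (NOT the values of Figure 6).
toy_matrix = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]
rates = per_class_rates(toy_matrix)

# Per-class accuracies quoted in the abstract, in percent.
reported = [99.0, 100.0, 96.0]
average = sum(reported) / len(reported)
print(f"{average:.2f}")
```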

Cross-Validation Analysis
This section describes the cross-validation method, which was used to obtain reliable estimates of the generalization error of the model, i.e., of how the CNN behaves on data other than the learning data.
In particular, k-fold [32] cross-validation was used in this study, which involved randomly dividing the training dataset into k parts without replacement: k-1 parts were used for training the model, and one part was used for testing. This procedure was repeated k times so as to obtain k models and performance estimates.
Subsequently, the average performance of the models was calculated on the basis of the different independent subdivisions to obtain an estimate of the performance that was less sensitive to the partitioning of the training data.
Since k-fold cross-validation is a resampling technique without replacement, each sample is used for testing exactly once (and for training in the remaining k-1 iterations), which provides a lower-variance estimate of the model performance.
For this study, the training dataset was divided into ten parts (k = 10); during each of the ten iterations, nine parts were used for training, and one part was used as a test set for model evaluation. The estimated performance E_i (for example, the classification accuracy) of each part was then used to calculate the average estimated performance E of the model. Figure 7 depicts the concept of the k-fold cross-validation technique. The average accuracy and standard deviation for the model used in this study were 96.8 ± 1.2%. Table 4 shows a comparison between our method and other methods in terms of feature extraction (FE), the model used, the system's accuracy, and the statistical classification accuracy.
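The k-fold procedure described in this section can be sketched in plain Python as follows. This is a minimal illustration, not the code used in the study: the function names, the random seed, and the way the folds are formed are assumptions, and `evaluate_fold` is a placeholder that returns a dummy value where a real experiment would train the CNN on the training indices and return its accuracy on the test indices.

```python
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def evaluate_fold(train, test):
    """Placeholder for training and scoring a model; returns a dummy value."""
    return len(train) / (len(train) + len(test))


n_samples = 995 + 234 + 93  # segments in the training/validation set
scores = [evaluate_fold(train, test) for train, test in k_fold_indices(n_samples)]
mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)
print(f"{mean_score:.3f} +/- {std_score:.3f}")
```

Each index appears in exactly one test fold, and the mean and standard deviation over the k scores play the role of the average performance E and its spread.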

Discussion
This section discusses the differences between this work and the state of the art. In [33,34], the authors used decision tree (DT) and R-peak (RP) extraction as features and did not apply convolutional neural networks (CNN) but rather the discrete wavelet transform (DWT) and a feed-forward neural network (FFNN). The authors claimed an average accuracy of 96.56% and 87.66%, respectively, while, in our study, the average accuracy was equal to 98.1%, which is higher than the results reported in [33,34].
Compared to the approaches proposed in [5,14-16,33,34], our method had higher classification performance. The studies proposed in [17,18] had quite comparable performance, but they used more hidden layers than our study, with a consequent increase in computational cost. In addition, they preprocessed the data using the wavelet transform, which implied an additional computational cost. From the point of view of the structure of the neural network, in [17], in particular, five layers (two convolution layers, two downsampling layers, and one fully connected layer) plus a Softmax output layer were used for classification; we used a different structure (previously described), which was more robust to the "vanishing gradients" phenomenon. In addition, to check the validity of the model, we applied the k-fold technique (previously described) for cross-validation, obtaining an average accuracy of 96.8% and a standard deviation of ±1.2%.
Usually, the processing unit implements the automatic disease classification algorithm described above and shows the result of the diagnosis on a display. A possible alternative is to transmit ECG sequences in real time via a cellular data connection (4G dongle) [35,36] to a cloud platform, where automatic ECG diagnosis is implemented in "as a service" mode. The robustness to IP (Internet Protocol) packet loss, typical of a 4G data connection, was verified by sending the test database several times from a transmitter to a 4G data receiver. The classification results matched those obtained when processing on the local board.

Conclusions
This paper proposed an automated heart disease recognition technique based on recent and innovative CNNs. The proposed technique had high accuracy and low implementation complexity. This approach harnessed the potential of deep learning to capture the typical characteristics of a given heart disease in the ECG signal domain.
Using the "validation set", the proposed method yielded the following results: By comparing and contrasting various methods in the "Discussion" section, we could affirm that the method applied in the present paper yielded considerably better performances than those of the state-of-the-art.