Myocardial Infarction Classification Based on Convolutional Neural Network and Recurrent Neural Network

Abstract: Myocardial infarction is one of the most threatening cardiovascular diseases for human beings. With the rapid development of wearable devices and portable electrocardiogram (ECG) medical devices, it has become feasible to detect and monitor myocardial infarction ECG signals in a timely manner. This paper proposed a multi-channel automatic classification algorithm combining a 16-layer convolutional neural network (CNN) and a long short-term memory (LSTM) network for lead I myocardial infarction ECG. The algorithm first preprocessed the raw data to extract heartbeat segments; the segments were then fed to the multi-channel CNN and LSTM, which learned features automatically and completed the myocardial infarction ECG classification. We used the Physikalisch-Technische Bundesanstalt (PTB) database for algorithm verification and obtained an accuracy of 95.4%, a sensitivity of 98.2%, a specificity of 86.5%, and an F1 score of 96.8%, indicating that the model can achieve good classification performance without complex handcrafted features.


Introduction
Myocardial infarction is a cardiovascular disease in which coronary artery occlusion causes insufficient myocardial blood supply or even myocardial necrosis. According to statistics from the American Heart Association, nearly 720,000 Americans suffer from myocardial infarction each year [1]. In the early stage of the disease, patients usually show symptoms such as chest pain and chest tightness, but some patients have no obvious symptoms, which makes timely treatment difficult and threatens their lives [2]. The early diagnosis of myocardial infarction therefore has significant clinical value and has become a research topic for many scholars.
The electrocardiogram (ECG) is one of the routine examination methods for myocardial infarction [2]. In the field of ECG signal processing, many traditional studies have focused on extracting features of myocardial infarction ECG signals, including time-domain, frequency-domain, and wavelet-transform characteristics. Sun et al. extracted ST segments and combined a support vector machine (SVM) with multi-instance learning to classify myocardial infarction ECG [3]. Arif et al. started from three time-domain features (T wave amplitude, Q wave amplitude, and ST segment offset level) and used the K-nearest neighbor (KNN) algorithm to detect and localize myocardial infarction [4]. Sharma et al. obtained frequency-domain features of ECG such as sample entropy, and applied SVM and the KNN algorithm to classify different types of myocardial infarction ECG [5]. Similar to Arif et al., Safdarian et al. extracted T wave characteristics from ECG signals and employed pattern recognition methods for myocardial infarction classification [6]. Although these algorithms can achieve good results, ECG signals are very weak and susceptible to noise interference, so reliable recognition of the feature points cannot be guaranteed, which limits such methods.
In recent years, with the development of deep learning, the convolutional neural network (CNN) and the recurrent neural network (RNN) have achieved great success in image classification, object detection, and speech recognition. Deep learning methods such as CNN automatically learn and extract features through deep neural networks, without relying on handcrafted features or expert knowledge [7][8][9]. In ECG signal processing, deep learning avoids handcrafted feature extraction and thereby simplifies the implementation process to a certain extent, and it has been applied by a number of scholars. Xiong et al. classified atrial fibrillation ECG with a 16-layer CNN [10]. Because ECG signals are time series, the long short-term memory (LSTM) network, a type of RNN, also performs well. Saadatnejad et al. used LSTM to classify arrhythmia ECG [11]. For myocardial infarction ECG, Reasat et al. extracted ECG signals from leads II, III, and aVF, and performed preprocessing and classification with a shallow CNN [12]; moreover, Acharya et al. established a deep CNN to classify both noisy and denoised myocardial infarction ECG [13].
With the popularization of handheld electrocardiographs, smart bands, and smart watches, single-lead ECG has become accessible for personal and home monitoring. It is therefore important to detect and prevent myocardial infarction through single-lead ECG. Since only a few studies have addressed single-lead myocardial infarction ECG, considerable room for exploration remains. This paper therefore proposed a deep learning method combining CNN and RNN: it established a multi-channel CNN-LSTM network structure, segmented the pre-processed ECG signal, extracted spatial features in the multi-channel convolutional network, and acquired the temporal characteristics through LSTM. This method unified the feature extraction and classification procedures, realized the automatic classification of single-lead myocardial infarction ECG, and included an in-depth analysis of the convolution kernel size, the optimizer, and other model parameters.

One-Dimensional CNN
CNN is a feedforward neural network characterized by sparse connectivity and weight sharing. A typical CNN model consists of a series of convolutional layers, pooling layers, and fully-connected layers. As an important part of CNN, the convolutional layers convolve the output feature maps of the previous layer with learned kernels and pass the result through the activation function to construct the output feature maps. The mathematical model can be expressed as Equation (1):
$$x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right) \qquad (1)$$
where $x_j^{l}$ is the $j$-th output feature map of layer $l$; $f$ is the activation function; $M_j$ represents the set of input feature maps; $l$ denotes the layer index; $k$ is the convolution kernel; and $b$ is the network bias vector. The pooling layers reduce the dimension of the feature maps from the preceding layer and thereby filter the information. In practical applications, max-pooling is often used, and its mathematical model is shown as Equation (2):
$$P_i^{l+1}(j) = \max_{(j-1)W + 1 \le t \le jW} \left\{ q_i^{l}(t) \right\} \qquad (2)$$
where $q_i^{l}(t)$ represents the value of the $t$-th neuron of the $i$-th feature map in layer $l$; $W$ is the width of the pooling region; and $P_i^{l+1}(j)$ is the value of the corresponding neuron in layer $l + 1$.
In the fully-connected layer, each neuron node is connected to all nodes of the previous layer of neurons, and the feature classification is performed using a specific activation function.
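The convolution and max-pooling operations of Equations (1) and (2) can be illustrated with a minimal pure-Python sketch. The function names `conv1d` and `max_pool1d`, the ReLU default, the unpadded ("valid") convolution, and the toy signal are assumptions made for illustration and are not taken from the paper. As in most deep learning frameworks, the kernel is applied without flipping (cross-correlation), and indices are 0-based rather than 1-based.

```python
def conv1d(feature_maps, kernels, bias, activation=lambda v: max(0.0, v)):
    """Unpadded 1-D convolution of several input maps into one output map.

    feature_maps: list of input maps x_i^{l-1} (equal-length lists of floats)
    kernels:      one kernel k_ij^l per input map (equal-length lists)
    bias:         scalar bias b_j^l
    """
    k_len = len(kernels[0])
    out_len = len(feature_maps[0]) - k_len + 1
    output = []
    for t in range(out_len):
        total = bias
        for x, k in zip(feature_maps, kernels):
            # Sum over the input maps in M_j; the kernel is not flipped.
            total += sum(x[t + s] * k[s] for s in range(k_len))
        output.append(activation(total))
    return output


def max_pool1d(feature_map, width):
    """Non-overlapping max-pooling; an incomplete trailing window is dropped."""
    n_out = len(feature_map) // width
    return [max(feature_map[j * width:(j + 1) * width]) for j in range(n_out)]


# Toy signal and a simple difference kernel (both hypothetical).
signal = [0.0, 1.0, 3.0, 2.0, -1.0, 0.5, 4.0, 1.0]
edge = conv1d([signal], kernels=[[-1.0, 0.0, 1.0]], bias=0.0)
pooled = max_pool1d(edge, width=2)
```

For this toy input, `edge` is `[3.0, 1.0, 0.0, 0.0, 5.0, 0.5]` and `pooled` is `[3.0, 0.0, 5.0]`: the activation zeroes the negative responses, and pooling halves the length while keeping the strongest response in each window.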

RNN
Compared with CNNs, RNNs are suitable for processing sequence signals. The output of a neuron in an RNN is determined by its current input and by its output at the previous moment. Assume the input $x = (x_1, x_2, \ldots, x_T)$, the hidden layer $h = (h_1, h_2, \ldots, h_T)$, and the output $y = (y_1, y_2, \ldots, y_T)$; then for time $t$:
$$h_t = H\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right) \qquad (3)$$
$$y_t = W_{hy} h_t + b_y \qquad (4)$$
where $W$ denotes the weight matrices; $b$ is the bias vector; and $H$ is the activation function of the hidden layer. However, the traditional RNN suffers from the vanishing gradient problem. One way to solve this problem is to use the LSTM model. The basic unit of LSTM is the cell, and the input, forget, and output gates control the behavior of the cell to achieve long-term storage of memory information. Figure 1 shows the workflow of an LSTM model [14]. According to this workflow, the LSTM can be computed as follows:
$$i_t = \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i\right) \qquad (5)$$
$$f_t = \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f\right) \qquad (6)$$
$$c_t = f_t c_{t-1} + i_t \tanh\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right) \qquad (7)$$
$$o_t = \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o\right) \qquad (8)$$
$$h_t = o_t \tanh\left(c_t\right) \qquad (9)$$
where $\sigma$ represents the sigmoid function; $i_t$, $f_t$, and $o_t$ are the input gate, the forget gate, and the output gate, respectively; $c_t$ is the cell activation vector, which has the same length as the hidden vector $h_t$; and $W_{ci}$, $W_{cf}$, and $W_{co}$ are the weight matrices of the peephole connections.
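A single time step of an LSTM cell with peephole connections, as described above, can be sketched in pure Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: the peephole weights are assumed to be diagonal (stored as vectors and applied elementwise), and the names `lstm_step`, `affine`, `zero_params`, and the dictionary keys are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W_x, x, W_h, h, b):
    """Return W_x @ x + W_h @ h + b for list-of-lists matrices."""
    return [
        sum(wx * xi for wx, xi in zip(row_x, x))
        + sum(wh * hi for wh, hi in zip(row_h, h))
        + bi
        for row_x, row_h, bi in zip(W_x, W_h, b)
    ]


def lstm_step(x, h_prev, c_prev, p):
    """One peephole-LSTM time step; p maps names such as 'W_xi' to weights."""
    n = len(h_prev)
    pre_i = affine(p["W_xi"], x, p["W_hi"], h_prev, p["b_i"])
    pre_f = affine(p["W_xf"], x, p["W_hf"], h_prev, p["b_f"])
    pre_c = affine(p["W_xc"], x, p["W_hc"], h_prev, p["b_c"])
    pre_o = affine(p["W_xo"], x, p["W_ho"], h_prev, p["b_o"])
    # Input and forget gates peek at the previous cell state c_{t-1}.
    i = [sigmoid(pre_i[k] + p["W_ci"][k] * c_prev[k]) for k in range(n)]
    f = [sigmoid(pre_f[k] + p["W_cf"][k] * c_prev[k]) for k in range(n)]
    c = [f[k] * c_prev[k] + i[k] * math.tanh(pre_c[k]) for k in range(n)]
    # The output gate peeks at the updated cell state c_t.
    o = [sigmoid(pre_o[k] + p["W_co"][k] * c[k]) for k in range(n)]
    h = [o[k] * math.tanh(c[k]) for k in range(n)]
    return h, c


def zero_params(n_in, n_hidden):
    """All-zero weights, used here only to make the result easy to check."""
    p = {}
    for gate in "ifco":
        p["W_x" + gate] = [[0.0] * n_in for _ in range(n_hidden)]
        p["W_h" + gate] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b_" + gate] = [0.0] * n_hidden
    for gate in "ifo":
        p["W_c" + gate] = [0.0] * n_hidden
    return p


params = zero_params(n_in=1, n_hidden=2)
h, c = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], params)
```

With all weights at zero, every gate evaluates to 0.5, so the cell state is simply halved (`c` becomes `[0.5, -1.0]`) and `h` equals `0.5 * tanh(c)` elementwise. The example shows how the forget gate scales the stored memory independently of the current input.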

Network Model
After a comprehensive analysis of the characteristics of the ECG signals, this paper designed the multi-channel CNN-LSTM model structure shown in Figure 2. The input layer received the heartbeat signals extracted during preprocessing: five adjacent heartbeats (i.e., 3 s of ECG data) were selected and fed into five identical CNN channels to obtain feature maps, which were concatenated and passed to the LSTM network to capture the temporal characteristics between heartbeats. Finally, the temporal characteristics were output to the fully-connected layer for classification. The network structure had 16 layers: nine convolutional layers, two max-pooling layers, one global average pooling layer, one dropout layer, one LSTM layer, one flatten layer, and one fully-connected layer.
(1) Convolutional layers: since the input ECG signals were one-dimensional, one-dimensional convolutional layers were used, with 4, 8, and 16 filters, respectively; the convolution kernel size was 5 and the stride was 1. The parameter selection process is described in Section 4. Batch normalization was applied to stabilize the input distribution of each layer of the network, and the ReLU function was used as the activation function. Compared with the sigmoid and tanh functions, the ReLU function converges faster and alleviates the over-fitting problem. The ReLU activation function can be expressed as Equation (10):
$$f(x) = \max(0, x) \qquad (10)$$
(2) Pooling layers: this paper used max-pooling (kernel size 5) in the fourth and eighth layers of the network to reduce the dimensionality of the computed features while retaining the significant ones, which accelerates the calculation. The tenth layer used global average pooling to further reduce the extracted features and produce the output of the convolutional network.
(3) Dropout layer: the dropout layer was applied between the global average pooling layer and the LSTM layer to achieve stronger generalization capability by randomly deactivating some network nodes.
(4) LSTM layer: because ECG signals have strong temporal correlation, an LSTM layer was connected after the convolutional network described above to capture the temporal characteristics in its output features.
(5) Flatten layer: the multi-dimensional output of the LSTM network was converted into a one-dimensional output.
(6) Fully-connected layer: the resulting features were input to the fully-connected layer for classification, with Softmax as the classifier.
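To make the dimensions in one CNN channel concrete, the following sketch traces the signal length through the convolutional stack. Several details are assumptions that the text does not state: the nine convolutional layers are arranged as three blocks of three layers (with 4, 8, and 16 filters), max-pooling follows the first two blocks, convolutions are unpadded, and pooling is non-overlapping. The helper names are hypothetical.

```python
def conv_out(length, kernel=5, stride=1):
    """Output length of an unpadded ('valid') 1-D convolution."""
    return (length - kernel) // stride + 1


def pool_out(length, size=5):
    """Output length of non-overlapping max-pooling."""
    return length // size


def trace_channel(length=600, filters=(4, 8, 16), convs_per_block=3):
    """Return (layer name, length, feature maps) after each layer of one channel."""
    trace = [("input", length, 1)]
    for block, n_filters in enumerate(filters, start=1):
        for _ in range(convs_per_block):
            length = conv_out(length)
            trace.append((f"conv{block}", length, n_filters))
        if block < len(filters):
            # Assumed layout: max-pooling after the first two blocks only.
            length = pool_out(length)
            trace.append((f"maxpool{block}", length, n_filters))
    # Global average pooling collapses each feature map to a single value.
    trace.append(("global_avg_pool", 1, filters[-1]))
    return trace


trace = trace_channel()
# Five channels, one feature vector each, form the sequence seen by the LSTM.
lstm_input_shape = (5, trace[-1][2])
```

Under these assumptions, a 600-sample heartbeat shrinks to 588 samples after the first block, 117 after the first pooling, 105 after the second block, 21 after the second pooling, and 9 after the third block, and the LSTM receives a sequence of five 16-dimensional vectors. With padded convolutions the lengths would differ, but the LSTM input shape would be the same.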

Data Sources
The myocardial infarction ECG data used in this paper were taken from the Physikalisch-Technische Bundesanstalt (PTB) database [15], provided by the German National Metrology Institute. The PTB database contains 549 records from 290 subjects. Each record was acquired synchronously on 15 leads (the conventional 12 leads and 3 Frank VCG leads) and was labeled by a professional medical practitioner. The sampling frequency of the ECG signals in the PTB database is 1000 Hz. The database includes 148 subjects with myocardial infarction (368 records) and 52 healthy volunteers (80 records); the remaining records cover other heart conditions such as myocarditis, rhythm disorders, and unstable angina. To study single-lead myocardial infarction ECG classification, we extracted 30 s segments of lead I from the myocardial infarction and healthy records as the experimental data.

Data Preprocessing
During acquisition, ECG signals are subject to three main types of noise: myoelectric interference, baseline drift, and power line interference. The wavelet transform is effective at removing all three. This paper used the wavelet transform method proposed in [16] to filter the noise in the original ECG, using the Daubechies D6 ('db6') wavelet basis function to decompose the ECG signals into 10 levels. Table 1 lists the frequency bands of the wavelet components of the ECG signals. The low-frequency and high-frequency components are called approximation and detail, respectively. We removed the D1 (250-500 Hz), D2 (125-250 Hz), and D3 (62.5-125 Hz) detail components and the A10 (0-0.4875 Hz) approximation component, and reconstructed the signal from the remaining components to obtain a denoised signal. Figure 3 shows the heartbeats before and after noise removal.
After noise removal, each ECG record was segmented into fixed-length pieces to match the input requirements of the CNN. First, we used the Pan-Tompkins algorithm to detect the R-peaks [17]; the R-peak positions were then used for heartbeat segmentation. After segmentation, each heartbeat contained 600 sampling points (199 sampling points to the left of the R-peak, the R-peak itself, and 400 sampling points to the right), so a single heartbeat lasted 0.6 s, which essentially covers a complete P-QRS-T wave. Because the amplitude distributions of the normal and myocardial infarction segments differed, which would slow down training, the segmented ECG signals were normalized to improve the convergence speed of the model. As shown in Figure 4, there was a clear distinction between the normal ECG and the myocardial infarction ECG [18].
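The segmentation and normalization steps can be sketched as follows, assuming the R-peak indices are already available (for example from a Pan-Tompkins detector, which is not implemented here). The paper does not state which normalization was used; min-max scaling to [0, 1] is an assumption, as are the function names and the synthetic signal.

```python
def segment_beats(signal, r_peaks, left=199, right=400):
    """Cut one fixed-length window around each R-peak index.

    Each window holds `left` samples before the peak, the peak itself, and
    `right` samples after it. Peaks too close to either end are skipped.
    """
    beats = []
    for r in r_peaks:
        start, stop = r - left, r + right + 1
        if start >= 0 and stop <= len(signal):
            beats.append(signal[start:stop])
    return beats


def min_max_normalize(beat):
    """Rescale a beat to [0, 1]; a flat beat maps to all zeros."""
    lo, hi = min(beat), max(beat)
    if hi == lo:
        return [0.0] * len(beat)
    return [(v - lo) / (hi - lo) for v in beat]


# Synthetic stand-in for a denoised 3 s recording at 1000 Hz,
# with unit spikes playing the role of R-peaks.
fs = 1000
r_peaks = [100, 900, 1700, 2500]
ecg = [0.0] * (3 * fs)
for r in r_peaks:
    ecg[r] = 1.0

beats = [min_max_normalize(b) for b in segment_beats(ecg, r_peaks)]
```

In this example the first peak (index 100) lies fewer than 199 samples from the start and is dropped, leaving three 600-sample beats (0.6 s at 1000 Hz) with the R-peak at index 199 of each window.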

Balanced Data
The ECG data samples used in the study were not balanced: there were significantly fewer healthy records than myocardial infarction records. To avoid over-fitting during training and to improve the generalization ability of the model, the healthy data were randomly oversampled. After oversampling, the number of healthy samples was approximately the same as the number of myocardial infarction samples.
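Random oversampling of the minority class can be sketched as below. The function name, the class labels, and the fixed seed are illustrative assumptions; the paper does not describe its sampling procedure beyond the text above.

```python
import random


def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate samples of smaller classes until all classes match."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw with replacement from the existing samples of this class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: 8 myocardial infarction samples, 2 healthy samples.
samples = [f"beat_{i}" for i in range(10)]
labels = ["MI"] * 8 + ["healthy"] * 2
balanced_samples, balanced_labels = oversample_minority(samples, labels)
```

After the call, both classes contain eight samples. One design point worth noting: if duplicates are created before the data are split, copies of the same heartbeat can land in both the training and test sets, so a cautious pipeline would oversample only the training portion of each fold. The text does not say which order was used.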

Cross-Validation
To improve the robustness of the model, 10-fold cross-validation was used in the training process. The pre-processed data were randomly divided into 10 parts. In each fold, 90% of the data was used to train the model and the remaining 10% was used as a test set to evaluate its performance. This process was repeated 10 times, and the evaluation indicators of each run were recorded. In addition, to monitor the training process and prevent over-fitting, 20% of the training data (i.e., 18% of all data) was held out as a validation set to evaluate the model after each epoch. The data partitioning is shown in Figure 5.
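The partitioning described above can be sketched with a small index-splitting generator. The function name, the shuffling seed, and the interleaved fold assignment are assumptions for illustration only.

```python
import random


def ten_fold_splits(n_samples, n_folds=10, val_fraction=0.2, seed=0):
    """Yield (train, validation, test) index lists for each fold."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal the shuffled indices into n_folds nearly equal parts.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        rest = [i for j, fold in enumerate(folds) if j != k for i in fold]
        # Hold out a fraction of the non-test data for per-epoch validation.
        n_val = int(round(val_fraction * len(rest)))
        yield rest[n_val:], rest[:n_val], test


splits = list(ten_fold_splits(1000))
train, val, test = splits[0]
```

For 1000 samples, each fold yields 720 training, 180 validation, and 100 test indices, which corresponds to a 72% / 18% / 10% split of the whole data set, and every sample appears in exactly one test set across the ten folds.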

Evaluation Index
To analyze the classification performance of the model, this paper considered four indicators: accuracy (Acc), sensitivity (Sen), specificity (Spec), and F1 score (F1). They are calculated as follows:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Sen} = \frac{TP}{TP + FN}$$
$$\mathrm{Spec} = \frac{TN}{TN + FP}$$
$$\mathrm{F1} = \frac{2\,TP}{2\,TP + FP + FN}$$
where true positive (TP) is the number of myocardial infarction ECG samples correctly classified as myocardial infarction; true negative (TN) is the number of normal ECG samples correctly classified as normal; false positive (FP) is the number of normal ECG samples marked as myocardial infarction; and false negative (FN) is the number of myocardial infarction ECG samples marked as normal. Because this paper addressed a binary classification problem, the receiver operating characteristic (ROC) curve and the area under the curve (AUC) were also used to describe the performance of the model.
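The four indicators follow directly from the confusion-matrix counts, with myocardial infarction as the positive class. The sketch below uses hypothetical counts chosen only to make the arithmetic easy to follow; they are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity, and F1 from raw counts."""
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "Sen": tp / (tp + fn),
        "Spec": tn / (tn + fp),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts: 100 infarction samples and 50 normal samples.
metrics = classification_metrics(tp=90, tn=40, fp=10, fn=10)
```

With these counts the sensitivity is 0.9, the specificity is 0.8, the F1 score is 0.9, and the accuracy is 130/150 (about 0.867). When the positive class dominates, as in this example, accuracy and F1 can stay high even though specificity is noticeably lower, which is why all four indicators are reported together.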

Development Environment
The experimental environment of this paper was as follows: an Intel Core i5-4590 CPU at 3.30 GHz, 8 GB of RAM, and a GTX 750 graphics card. The development platform was Python 3.7, using the Keras framework with TensorFlow as the back-end.

Impact of Channel Numbers on Performance Indicators
In the experiment testing the influence of input length on model accuracy, this paper found that the accuracy was highest when the number of model channels was set to five, i.e., when five adjacent heartbeats (3 s) were selected as the input. Table 2 shows the experimental results for different numbers of heartbeats. Compared with an input length of 10 heartbeats (6 s), the accuracy, sensitivity, specificity, and F1 increased by 1.1%, 1.7%, 4.6%, and 1.4%, respectively; compared with an input length of 15 heartbeats (9 s), they increased by 2.7%, 4.6%, 6.0%, and 3.4%, respectively. The number of channels was therefore set to five. Using five heartbeats as the input gave the confusion matrix shown in Figure 6, which indicates that the model could identify 98% of the myocardial infarction ECG samples. The ROC curve is shown in Figure 7. The AUC value of 0.9868 indicates that the model had excellent classification performance.
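Forming the multi-channel input from adjacent heartbeats can be sketched as follows. The text does not state whether consecutive groups overlap; non-overlapping groups are assumed here, and the function name and placeholder beats are hypothetical.

```python
def group_adjacent_beats(beats, n_channels=5):
    """Split a list of beats into consecutive, non-overlapping groups."""
    n_groups = len(beats) // n_channels
    return [beats[g * n_channels:(g + 1) * n_channels] for g in range(n_groups)]


# Twelve placeholder beats of 600 samples each (values are arbitrary).
beats = [[float(i)] * 600 for i in range(12)]
groups = group_adjacent_beats(beats)

# Each group feeds one beat to each of the five CNN channels.
samples_per_group = sum(len(beat) for beat in groups[0])
```

Twelve beats yield two complete groups of five, and the two leftover beats are discarded. Each group contains 3000 samples, i.e., 3 s of signal at 1000 Hz, matching the input length reported above. Setting `n_channels` to 10 or 15 gives the 6 s and 9 s configurations that were compared.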

Impact of Convolution Kernel Sizes on Classification Results
After the number of channels and layers of the network structure had been determined, we tested five different convolution kernel sizes for the convolutional network, since the kernel size strongly influences both classification performance and computation speed. The results are shown in Table 3. The AUC value improved when the convolution kernel size was in the range [5, 9]. Therefore, a kernel size in the range [5, 9] gave better classification performance without wasting computing resources.

Impact of Different Optimizers and Learning Rates on Performance Index
Model training speed and classification accuracy can be improved by selecting an appropriate optimizer and learning rate. In this paper, three commonly used optimizers (RMSprop, SGD, and Adam) were selected, the model was trained with different learning rates, and the average accuracy on the test set was used as the evaluation index. As the experimental results in Table 4 show, the accuracy was highest when the Adam optimizer was used with a learning rate of 0.0001.

Determination of Model Parameters
Based on the above experiments, the model parameters were determined as shown in Table 5. The numbers of filters in the convolutional layers were 4, 8, and 16, respectively. The convolution kernel size was 5, the stride was 1, and the activation function was ReLU; the max-pooling size was 5; the dropout rate was 0.5; the optimizer was Adam with a learning rate of 0.0001; the batch size was 32; and each training run lasted 100 epochs.

Discussion
This paper used 10-fold cross-validation to train the model, and Table 6 compares the resulting evaluation indicators with those of existing methods. As Table 6 shows, the classification and recognition of myocardial infarction ECG has mainly been studied on multi-lead data. By extracting time-frequency domain features [3,5] and wavelet coefficient features [7] from multi-lead ECG signals, myocardial infarction can be classified with a high recognition rate through SVM, KNN, and other methods. Besides the traditional 12 leads, the 3 Frank leads can be used to derive the vectorcardiogram (VCG) to detect myocardial infarction. Dawson et al. found that the 12-lead ECG could be linearly transformed from a 3-lead VCG [19]. To classify myocardial infarction from VCG signals, Huang et al. acquired VCG signals from the Frank XYZ leads and extracted 64 features to complete the detection [20]. Ge obtained multivariable autoregressive coefficients from the VCG signal to classify myocardial infarction [21]. However, all of the above methods required a complicated handcrafted feature extraction step and a relatively cumbersome calculation process. Moreover, the data volume recorded by a multi-lead system is often very large and the system places more constraints on patients, which makes it unsuitable for portable monitoring and limits its use to a certain extent. In studies of single-lead myocardial infarction, Safdarian [6] and Zewdie [22] used T-wave detection or morphological information as features and classified with the Naive Bayes and SVM methods, respectively; Acharya [13] achieved classification through a CNN structure with a 95.2% accuracy rate. Unlike Acharya [13], this paper treated the ECG signal as a time series and combined CNN and LSTM to extract deeper features, eliminating steps such as complex feature extraction without decreasing accuracy.
It is worth mentioning that although the deep learning model learns feature information automatically, it places higher demands than traditional methods on the amount of training data and on training time. The model took about 15 s per epoch during training. However, the trained model does not need to be retrained when classifying new data. During the test of 358 test samples, the total test time was 2.2 s, and the average test time was about 60 ms. Although conventional portable devices cannot match the processing speed of the computer used in the experiment, the algorithm can be deployed on a cloud platform for real-time processing to meet the requirements of clinical applications.

Conclusions
Early diagnosis of myocardial infarction is crucial for reducing patient mortality. To diagnose different types of myocardial infarction, many researchers have focused on the 12-lead ECG and the Frank-lead VCG and have achieved strong performance. However, multi-lead ECG devices are cumbersome instruments that are only available in hospitals and clinics. Thanks to advances in technology, single-lead ECG devices are now available for individual and home use for basic cardiac monitoring. Particularly in recent years, with the prevalence of portable ECG testing equipment, single-lead ECG has come to play an important role in preventing and monitoring myocardial infarction. This paper proposed a classification model for myocardial infarction ECG based on a multi-channel CNN and RNN. The deep network structure could capture both the spatial and the temporal characteristics of ECG signals. Without any handcrafted feature extraction, the model obtained an accuracy of 95.4%, a sensitivity of 98.2%, a specificity of 86.5%, and an F1 score of 96.8%. It is an effective solution for the automatic classification of myocardial infarction ECG, which can help clinicians, non-specialists, and individuals with the prevention and diagnosis of myocardial infarction. However, myocardial infarction is often accompanied by other types of abnormal ECG, so in future work we will focus on further optimizing the model structure. We will also cooperate with clinicians to obtain more types of ECG data and apply the model to the recognition and classification of other abnormal ECG signals.