Electrocardiogram Classification Based on Faster Regions with Convolutional Neural Network

The classification of electrocardiograms (ECG) plays an important role in the clinical diagnosis of heart disease. This paper proposes the development and implementation of an effective system for ECG classification based on the faster regions with a convolutional neural network (Faster R-CNN) algorithm. The one-dimensional ECG signals used in this experiment comprise preprocessed ECG signals from patients and selected ECG recordings from the MIT-BIH database. Each ECG beat of the one-dimensional ECG signals was transformed into a two-dimensional image to build the training sets and test sets. As a result, we classified the ECG beats into five categories with an average accuracy of 99.21%. In addition, we conducted a comparative experiment using the one versus rest support vector machine (OVR SVM) algorithm, and the classification accuracy of the proposed Faster R-CNN was 2.59 percentage points higher.


Introduction
An electrocardiogram (ECG), as a record of cardiac activity, provides important information about the state of the heart [1]. ECG arrhythmia detection is necessary for the early diagnosis of heart disease patients. On the one hand, it is very difficult for a doctor to analyze a long ECG recording in a limited time. On the other hand, people are almost unable to recognize the morphological changes of ECG signals without tool support. Therefore, an effective computer-aided diagnosis system is needed to solve this problem.
Most ECG classification methods are based on one-dimensional ECG data. These methods usually need to extract the waveform's characteristics, the intervals between adjacent waves, and the amplitude and period of each wave as input. The main difference between them is the selection of the classifier.
Early on, Yuzhen et al. [2] used a BP neural network to classify ECG beats, reaching a classification accuracy of 93.9%. Martis et al. [3,4] computed discrete cosine transform (DCT) coefficients from the segmented ECG beats, reduced their dimensionality by principal component analysis, and used a probabilistic neural network (PNN) for automatic classification. They classified the ECG beats into five categories with the highest average sensitivity of 98.69%, specificity of 99.91%, and classification accuracy of 99.52%. Luo et al. used a multi-order feedforward artificial neural network to classify ECG beats into six categories, and obtained a classification accuracy of 90.6% [5]. Osowski et al. designed a classifier that cascades a fuzzy self-organizing layer and a multi-layer perceptron, and classified ECG beats into seven categories with a classification accuracy of 96% [6]. Ceylan et al. used feedforward neural networks as the classifier, and they realized the [...]
In this paper, we propose an ECG classification method using faster regions with a convolutional neural network (Faster R-CNN) with ECG images. This method uses a CNN to extract features of ECG images. We apply a two-dimensional CNN to ECG images, rather than to the raw signal, because two-dimensional convolutional and pooling layers are more suitable for filtering the spatial locality of the ECG images. As a result, higher ECG classification accuracy can be obtained. In addition, a physician judges arrhythmia in a patient's ECG signals by visual inspection. Therefore, we concluded that applying the two-dimensional CNN model to the ECG image is most similar to the physician's arrhythmia diagnosis process. Moreover, this method can be applied to ECG signals from various ECG devices with different sampling rates. Before the one-dimensional ECG signals were converted to two-dimensional ECG images, we preprocessed them by empirical mode decomposition (EMD) [33]. Finally, we classified the ECG beats into five categories with 99.21% average accuracy.
Meanwhile, we conducted a comparative experiment using the one versus rest (OVR) SVM [34] algorithm, and the classification accuracy of our method was higher than that of the OVR SVM. At the end of the article, we compare our model with previous works that used machine learning algorithms to classify ECG, where the proposed method achieved the best results in average accuracy.
The rest of the paper is structured as follows: Section 2 introduces the method of ECG signal preprocessing and the Faster R-CNN architecture. Section 3 presents the experimental design based on the Faster R-CNN algorithm. Section 4 describes the experimental results and the comparative analysis. Conclusions are drawn in Section 5.

Methods
In this paper, we used one-dimensional ECG data from recordings in the MIT-BIH database and from patients. We first preprocessed the patient ECG data, which had quite serious noise interference. Since the Faster R-CNN model takes two-dimensional images as input, we transformed the ECG signals into ECG images. Finally, we used Faster R-CNN to classify the ECG beats into five categories. The overall procedure is shown in Figure 1.

ECG Signal Pre-Processing
In general, because the ECG signal is weak and is affected by the acquisition equipment, noise is easily mixed in during the acquisition process. This noise hampers the analysis of ECG signals, so effective preprocessing is a key step before ECG classification. Common ECG noise sources include power frequency interference, baseline drift, and myoelectric interference.
As shown in Figure 2, the original ECG signal from the patient was decomposed into 10 intrinsic mode functions (IMFs) using the EMD algorithm. The noise was mainly concentrated in the IMF1 and IMF2 modes, and the baseline drift in the IMF9 and IMF10 modes, while the remaining modes contained the important information of the ECG signal.

The high-frequency modes (IMF1 and IMF2) were denoised by the wavelet transform algorithm, while the baseline drift in the low-frequency modes (IMF9 and IMF10) was eliminated by the median filtering algorithm. The processed modes and the remaining unprocessed modes were then recombined to reconstruct a smooth ECG signal with the noise largely removed. The process is shown in Figure 3. As shown in Figure 4, removing the noise and baseline drift makes the ECG signal easier to classify.
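The baseline-drift step can be illustrated with a minimal sketch. It assumes a plain sliding-window median filter applied directly to a toy signal, not to the low-frequency IMFs used in the paper; the function names and the window length are hypothetical and chosen only for illustration.

```python
from statistics import median


def median_filter(signal, window):
    """Sliding-window median; the window is truncated at the signal edges."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    n = len(signal)
    filtered = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        filtered.append(median(signal[lo:hi]))
    return filtered


def remove_baseline(signal, window):
    """Estimate the slowly varying baseline with a median filter and subtract it."""
    baseline = median_filter(signal, window)
    return [s - b for s, b in zip(signal, baseline)]


# Toy example (assumed data): a constant offset of 5 with one sharp peak.
toy = [5, 5, 5, 9, 5, 5, 5]
detrended = remove_baseline(toy, window=3)
```

In this toy case the median tracks the offset and ignores the narrow peak, so subtracting it leaves the peak on a zero baseline. For a real ECG the window would need to be longer than a QRS complex so that the filter follows only the drift.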

Transforming the ECG Signals into ECG Images
Before transforming the one-dimensional ECG signal into a two-dimensional image, the R wave position of the ECG signal needs to be found. In this paper, discrete wavelet transform (DWT) [35] was adopted to find the R wave.

R Wave Detection
The wavelet transform has achieved good results in improving the noise robustness and accuracy of QRS complex detection. According to wavelet transform theory, the R wave peak corresponds to the zero crossing between a pair of modulus maxima. The R wave peak is therefore located by detecting the modulus maxima associated with the R wave, and the start and end points of the QRS complex are then searched backward and forward from the R peak position.
In the experiment, the DWT method and the adaptive threshold denoising method were used to detect the R wave of the ECG signal. The modulus maxima and the zero crossing were detected to find the position of the QRS complex. Then, the adaptive noise threshold method was used to judge whether a detected peak was an R wave or a glitch. Figure 5 shows the position of the R wave points in the ECG signal.

Extracting the ECG Beat
In this paper, the sliding window search method was used to extract the ECG beat. Taking the current R wave point as reference, we searched 150 ms to the left. If the point existed, the coordinates of the left endpoint were recorded; otherwise, the search was stopped. We then searched 150 ms to the right of the current R wave point. If the point existed, the coordinates of the right endpoint were recorded; otherwise, the search was stopped. Finally, we cut the graph between the left and right endpoints and used it as the input sample of the deep learning network in the experiment. Figure 6 shows the process of extracting the ECG beats, and Figure 7 shows the ECG beat samples.
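The windowing step can be sketched as follows. This is a minimal illustration that assumes the R peak positions are already known as sample indices and that beats whose window would run past either end of the recording are skipped; the function name and parameters are hypothetical. The 360 Hz value is the MIT-BIH sampling rate; patient recordings from other devices may differ.

```python
def extract_beats(signal, r_peaks, fs, half_window_ms=150):
    """Cut a window of +/- half_window_ms around each R peak.

    signal   -- sequence of ECG samples
    r_peaks  -- sample indices of detected R peaks
    fs       -- sampling rate in Hz
    Beats whose window would leave the signal are skipped.
    """
    half = int(round(half_window_ms * fs / 1000))
    beats = []
    for r in r_peaks:
        left, right = r - half, r + half
        if left < 0 or right >= len(signal):
            continue  # endpoint does not exist: stop the search for this beat
        beats.append(signal[left:right + 1])
    return beats


# Toy example (assumed data): sample values equal their own index.
toy_signal = list(range(1000))
beats = extract_beats(toy_signal, r_peaks=[10, 500, 990], fs=360)
```

At 360 Hz, 150 ms corresponds to 54 samples, so each extracted beat spans 109 samples with the R peak at its center.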

Faster R-CNN Architecture
In this paper, we used Faster R-CNN based on the ZF net to classify ECG beats. As shown in Figure 8, Faster R-CNN is composed of the ZF net, the region proposal network (RPN), and the Fast R-CNN net. The ZF net is a CNN architecture used to extract the feature map of ECG images. The RPN then runs on the feature map and generates approximately 20,000 rectangular boxes, which are sorted by score in descending order. The first 300 rectangular boxes are taken as inputs for the Fast R-CNN net, which reduces time complexity while maintaining high accuracy. Finally, the Fast R-CNN net outputs the probability of a category and a coordinate matrix (containing four coordinate values).

Region Proposal Network
The region proposal network is a neural network that integrates three processes (generating candidate boxes, extracting features, and classifying them) into a single network model, so that the RPN can be trained and run end to end. The RPN takes an image of any size as input and outputs a set of candidate boxes, where each box has a score evaluating the similarity between the box and the target.
A small network slides over the convolutional feature map output by the last shared convolutional layer. Each sliding window is mapped to a low-dimensional vector, which is fed to two sibling fully connected layers: the rectangular frame regression layer and the rectangular frame classification layer.
Faster R-CNN introduces an anchor mechanism. As shown in Figure 9, at each position of the sliding window, k region proposals are predicted at the same time, so the rectangular frame regression layer has 4k outputs, that is, the coordinate values (x_1, y_1, x_2, y_2) of k boxes. The rectangular frame classification layer has 2k outputs, namely the target and non-target scores of each box. The k anchors are centered on the current sliding window. Faster R-CNN uses three scales and three aspect ratios; thus, there are k = 9 anchors at each sliding position. A convolutional feature map of size W × H therefore has a total of W × H × k anchors [36].
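The anchor count can be made concrete with a short sketch. The scales (128, 256, 512 pixels) and aspect ratios (1:2, 1:1, 2:1) below are the defaults reported for the original Faster R-CNN and are an assumption here, since this paper does not state the values it used; the function names are hypothetical.

```python
def make_anchors(center_x, center_y,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) anchors as (x1, y1, x2, y2).

    Each anchor has area scale**2 and aspect ratio height / width = ratio,
    and is centered on (center_x, center_y).
    """
    anchors = []
    for scale in scales:
        area = float(scale * scale)
        for ratio in ratios:
            width = (area / ratio) ** 0.5
            height = width * ratio
            anchors.append((center_x - width / 2, center_y - height / 2,
                            center_x + width / 2, center_y + height / 2))
    return anchors


def count_anchors(feature_width, feature_height, k=9):
    """Total number of anchors on a W x H feature map."""
    return feature_width * feature_height * k


anchors = make_anchors(0.0, 0.0)
total = count_anchors(60, 40)
```

For instance, a 60 × 40 feature map with k = 9 yields 21,600 anchors, which is in line with the roughly 20,000 boxes mentioned for the RPN.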

Loss Function
Each anchor is assigned a label that is a "target" (positive label) or "non-target" (negative label). The Faster R-CNN network inherits the multi-task loss mechanism of the Fast R-CNN network. For an anchor i, its loss function is defined as
$$L(p_i, t_i) = L_{cls}(p_i, p_i^*) + \lambda \, p_i^* \, L_{reg}(t_i, t_i^*), \tag{1}$$
where i is the index of an anchor in the mini-batch, p_i is the predicted probability that anchor i is a positive label, and \lambda is a weight balancing the two terms. If the anchor is a positive label, p_i^* is 1; otherwise, p_i^* is 0. t_i = (t_x, t_y, t_w, t_h) is a vector representing the four parameterized coordinates of the predicted rectangular box, and t_i^* = (t_x^*, t_y^*, t_w^*, t_h^*) is the coordinate vector of the rectangle corresponding to the anchor with a positive label.
The classification loss function in Equation (1) is
$$L_{cls}(p_i, p_i^*) = -\log\left[ p_i^* p_i + (1 - p_i^*)(1 - p_i) \right]. \tag{2}$$
The regression loss function in Equation (1) is
$$L_{reg}(t_i, t_i^*) = R(t_i - t_i^*). \tag{3}$$
The R function in Equation (3) is the smooth L1 function shown in Equation (4):
$$R(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases} \tag{4}$$
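As a numerical illustration, the per-anchor loss can be sketched in a few lines. The sketch assumes the standard Faster R-CNN formulation: a log loss for classification, a smooth L1 loss summed over the four box coordinates for regression, and a balancing weight (called `lam` here, with a default of 1.0 chosen only for illustration). The function names are hypothetical.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def cls_loss(p, p_star):
    """Log loss over two classes (target vs. non-target); p_star is 0 or 1."""
    return -math.log(p_star * p + (1 - p_star) * (1 - p))


def reg_loss(t, t_star):
    """Smooth L1 summed over the four parameterized coordinates."""
    return sum(smooth_l1(a - b) for a, b in zip(t, t_star))


def anchor_loss(p, p_star, t, t_star, lam=1.0):
    """Per-anchor loss; the regression term is active only for positive anchors."""
    return cls_loss(p, p_star) + lam * p_star * reg_loss(t, t_star)


# A negative anchor ignores box regression; a positive anchor includes it.
negative = anchor_loss(0.5, 0, (0, 0, 0, 0), (1, 1, 1, 1))
positive = anchor_loss(1.0, 1, (0, 0, 0, 0), (0.5, 0, 0, 0))
```

Multiplying the regression term by the label means that only anchors labeled as targets contribute to box regression.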

Convolutional Feature Sharing
Faster R-CNN shares convolutional layers between the RPN and the Fast R-CNN network, so that the two networks are trained jointly rather than learned separately. The training algorithm consists of four steps, the basic idea of which is alternating optimization:
(a) Initialize the network parameters with a model pre-trained on ImageNet, and fine-tune the RPN network [37];
(b) Use the RPN network from step (a) to extract region proposals, and train the Fast R-CNN network on them;
(c) Re-initialize the RPN using the Fast R-CNN network from step (b), keep its convolutional layers fixed while training the RPN network, and fine-tune only the layers unique to the RPN;
(d) Keep the convolutional layer parameters learned by Fast R-CNN in step (b) fixed and, on this basis, use the region proposals extracted by the RPN in step (c) to fine-tune the Fast R-CNN network.

Non-Maximum Suppression
As shown in Figure 10, intersection-over-union (IoU) measures the degree of overlap between two rectangular boxes and is defined as the ratio of the area of their intersection to the area of their union. Non-maximum suppression (NMS) is a strategy for finding maxima and suppressing non-maximal values according to certain rules. In target detection, NMS is often used to remove redundant windows. As shown in Figure 11, all boxes are sorted by score in ascending order, and the IoU between the highest-scoring box and each of the remaining boxes is calculated. If the IoU exceeds a preset threshold, the corresponding box is suppressed. The procedure is then repeated on the boxes that remain.
Figure 11. Non-maximum suppression.
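The IoU definition and the suppression loop can be sketched as follows. This is a minimal greedy NMS, not the authors' implementation; the function names, the `top_n` argument, and the example threshold are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold, top_n=None):
    """Greedy NMS: keep the best box, drop boxes overlapping it too much, repeat.

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep if top_n is None else keep[:top_n]


# Toy example (assumed data): the second box overlaps the first heavily.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_threshold=0.5)
```

Here the first two boxes have an IoU of 81/119, about 0.68, so the lower-scoring one is suppressed while the distant third box survives. The optional `top_n` cut mirrors the idea of passing only the highest-scoring proposals on to the next stage.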

Experimental Platform
This experiment was conducted under the Windows 10 operating system. Table 1 lists the software information in the experiment.

ECG Beat Classification Criteria
There are many types of ECG beats, and many similarly shaped beats can only be identified accurately by physicians with specialized experience. Table 2 shows the currently accepted classification standard for ECG beats.

Table 2 (columns: AAMI ECG Beat Class, MIT-BIH ECG Beat Types).
The classification performance is measured by four criteria: specificity (Spe), sensitivity (Sen), positive rate (Pre), and accuracy (Acc). (1) Specificity (Spe): the proportion of normal ECG beats that are correctly classified, which represents the correct rate for non-patients. [...]
Table 3 shows the definition of the four classification results, written here as true positive (TP), false positive (FP), true negative (TN), and false negative (FN). According to the definitions of specificity, sensitivity, and accuracy, the formulas are as follows:
$$\mathrm{Spe} = \frac{TN}{TN + FP}, \qquad \mathrm{Sen} = \frac{TP}{TP + FN}, \qquad \mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}.$$
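The four criteria can be computed from the counts of the four classification results. The sketch below assumes the usual confusion-matrix definitions (true/false positives and negatives); the function name and the example counts are hypothetical and are not results from this paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Specificity, sensitivity, positive rate (precision), and accuracy."""
    return {
        "Spe": tn / (tn + fp),
        "Sen": tp / (tp + fn),
        "Pre": tp / (tp + fp),
        "Acc": (tp + tn) / (tp + fp + tn + fn),
    }


# Illustrative counts only (assumed, not taken from the experiments).
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
```

With these made-up counts, accuracy is 970/1000 = 0.97 while sensitivity is only 90/110, which illustrates why accuracy alone can hide missed abnormal beats when normal beats dominate the data.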

Building Dataset
The ECG data used in this paper were obtained from the MIT-BIH database and from patients who cooperated with us. The MIT-BIH database contains 48 half-hour ECG recordings collected from 47 patients, with approximately 110,000 ECG beats of 15 different types. From the MIT-BIH database, we included the normal beat (NOR) and four types of ECG arrhythmias: left bundle branch block beat (LBBB), right bundle branch block beat (RBBB), premature ventricular contraction (PVC), and fusion of ventricular and normal beat (FVN). Table 4 shows the number of ECG beats per class in the MIT-BIH database.

Impact of Learning Rate on Classification Performance
Learning rate is an important parameter in deep learning. If the learning rate is too large, the optimization tends to overshoot. If the learning rate is too small, this will lead to slow convergence or overfitting. Therefore, the learning rate has to be determined for the actual problem. Table 6 shows the effect of the learning rate on classification performance in this experiment. As the learning rate decreases, the loss function value decreases and gradually stabilizes. The classification accuracy is highest at a learning rate of 0.001, although training takes longer at this setting; the learning rate was therefore set to 0.001.

Effect of Weight Attenuation Coefficient on Classification Performance
In the loss function, the weight decay (attenuation) coefficient multiplies the regularization term, which usually represents the complexity of the model. Changing the coefficient therefore adjusts the influence of model complexity on the loss function: the greater the weight attenuation, the larger the value of the loss function. Table 7 shows the effect of the weight attenuation coefficient on classification performance. As the coefficient decreases, the loss function value decreases; the classification accuracy is highest when the coefficient is 0.0005. The weight attenuation coefficient was therefore set to 0.0005.

Influence of the Number of Iterations on Classification Performance
The iteration parameter sets the maximum number of iterations during the training process. If it is too small, the model will not be adequately trained. If it is too large, the model may suffer from overfitting. Table 8 shows the impact of the number of iterations on classification performance. Based on these results, the final number of iterations was chosen as 4000.

Experimental Results
The Faster R-CNN framework used in this paper produces three outputs for each classified ECG beat: a target position box, a target classification, and a score. The target position consists of the coordinates of the upper left and lower right vertices of a rectangular box. The target classification is the model's judgment of the image category at that position, which took only five discrete values in this experiment. The score is a probability value showing how likely the ECG beat is to belong to this category, also called the confidence coefficient. The ECG beats were classified as shown in Figure 12.
Precision rate and recall rate are also two important evaluation indexes for classification results. According to the definitions in Table 3, they are computed as
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$
where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives. Because the same ECG beat could be assigned to multiple categories, each with its own score, we needed to set a score threshold to remove the categories with low scores. When the score threshold is too low, multiple target frames appear and the precision rate decreases. When the score threshold is too high, the precision rate increases and the recall rate decreases. Figure 13 plots the relationship between precision rate and recall rate as the score threshold was adjusted. After many experiments, we found that a score threshold of 0.7 gave a good level of both precision rate and recall rate.
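The trade-off controlled by the score threshold can be sketched as follows. The sketch assumes each detection is a (score, is_correct) pair and that the number of ground-truth beats is known; the function name, the convention of reporting a precision of 1.0 when nothing is kept, and the example data are all hypothetical.

```python
def precision_recall_at(detections, num_ground_truth, threshold):
    """Precision and recall after discarding detections below the threshold.

    detections        -- list of (score, is_correct) pairs
    num_ground_truth  -- number of true beats that should be detected
    """
    kept = [is_correct for score, is_correct in detections if score >= threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 1.0  # convention
    recall = true_positives / num_ground_truth
    return precision, recall


# Illustrative detections only (assumed, not taken from the experiments).
detections = [(0.95, True), (0.90, True), (0.80, True),
              (0.60, False), (0.40, True), (0.30, False)]
strict = precision_recall_at(detections, num_ground_truth=4, threshold=0.7)
loose = precision_recall_at(detections, num_ground_truth=4, threshold=0.2)
```

In this toy data the stricter threshold keeps only correct detections but misses one beat, while the looser threshold recovers every beat at the cost of two false detections.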
According to the definition in Table 3, the formulas of precision rate and recall rate are as follows: Because the same ECG beat could be divided into multiple categories and each category had a score, we needed to set a score threshold to remove the categories with low scores. However, when the score threshold is too low, multiple target frames will appear, and the precision rate will decrease. When the score threshold is too high, the precision rate will increase, and the recall rate will decrease. Figure 13 plots the relationship between precision rate and recall rate when we adjusted the score threshold. After many experiments, when the score threshold value was equal to 0.7, the precision rate and the recall rate could reach a better level. Figure 13. Score threshold test results. Figure 13. Score threshold test results. Table 9 shows the classification results of Faster R-CNN. The classification accuracy of each class was more than 99%. The average classification accuracy rate was 99.21%. For testing the robustness of the test classification model, this experiment made a test set with multiple ECG beats in a sample picture, as shown in Figure 14, and still showed good classification results.  Since SVM is also a widely applied classification method in ECG arrhythmia detection, we did a comparative experiment with the OVR SVM method. SVM is a binary classifier, which was originally designed for binary classification problems. When dealing with multi-class problems, it is necessary to construct an appropriate multi-class classifier.
Figure 14. The detected sample graph consisting of multiple ECG beats.
Since SVM is also widely applied to ECG arrhythmia detection, we ran a comparative experiment with the OVR SVM method. SVM was originally designed for binary classification problems, so dealing with a multi-class problem requires constructing an appropriate multi-class classifier.
We built the multi-class classifier by combining multiple binary classifiers. During training, the samples of each category in turn are treated as one class, and all remaining samples are treated as the other class. Because we needed to divide the ECG beats into five categories, the training sets were selected as shown in Table 10. We used these five training sets to obtain five separate trained models. Then, we used the test set of each category to test the five trained models, so that each test set yielded five test results. The largest of the five test results was selected as the final classification result for each test set. Table 11 shows the classification results of the OVR SVM. The average classification accuracy rate was 96.62%.
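The one-versus-rest construction for the SVM baseline can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the experiment: the linear SVM is trained here by plain stochastic subgradient descent on the L2-regularized hinge loss, and the helper names (`train_linear_svm`, `train_ovr`, `predict_ovr`), the hyperparameters, the class labels `c1` to `c5`, and the five-dimensional toy features are all hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.001, seed=0):
    """Linear soft-margin SVM trained by stochastic subgradient descent
    on the L2-regularized hinge loss. Labels in y must be +1 or -1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * (dot(w, X[i]) + b) < 1.0
            w = [wj * (1.0 - lr * lam) for wj in w]   # step on the L2 penalty
            if violated:                               # hinge-loss subgradient step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def train_ovr(X, labels, classes):
    """Train one binary SVM per class: that class is +1, all others are -1."""
    return {c: train_linear_svm(X, [1 if l == c else -1 for l in labels])
            for c in classes}


def predict_ovr(models, x):
    """Return the class whose binary SVM gives the largest decision value."""
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])


# Hypothetical toy data: five well-separated classes, eight samples each
classes = ["c1", "c2", "c3", "c4", "c5"]
rng = random.Random(42)
X, labels = [], []
for k, c in enumerate(classes):
    for _ in range(8):
        x = [rng.uniform(-0.2, 0.2) for _ in classes]
        x[k] += 2.0   # each class is offset along its own feature axis
        X.append(x)
        labels.append(c)

models = train_ovr(X, labels, classes)
predictions = [predict_ovr(models, x) for x in X]
accuracy = sum(p == l for p, l in zip(predictions, labels)) / len(labels)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

The accuracy printed here is measured on the toy training samples only, so it says nothing about ECG data; the point of the sketch is the relabeling in `train_ovr` and the largest-score rule in `predict_ovr`.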
Figure 15 shows the comparison of classification results of the two methods. The average classification accuracy, sensitivity, and specificity of Faster R-CNN were all higher than those of the OVR SVM.
The benefits of Faster R-CNN are as follows:
i. No manual feature extraction is required.
ii. The sampling rate of the original ECG signal does not need to be considered.
iii. The approach is insensitive to the ECG signal quality.
iv. High classification accuracy.
The drawbacks of Faster R-CNN are as follows:
i. Training set samples need to be manually labeled.
ii. Training takes a long time and requires specialized hardware, such as a graphics processing unit (GPU), to be efficient.
However, once training on the ECG signals is completed, the classification of ECG beats is fast: it takes about 0.025 s to classify a test sample of ECG beats.

Comparison with Existing Approaches
Various machine learning methods are also used to classify ECG signals. Therefore, we also compared the classification results of these algorithms. Table 12 presents a performance comparison with previous works. From Table 12, we can see that the proposed method achieved the best results in average accuracy.

Conclusions
In this paper, we proposed an effective ECG classification method using Faster R-CNN based on a ZF net, with ECG images as input. To obtain better-quality ECG images, we first preprocessed the one-dimensional ECG signals with the EMD method; the DWT algorithm was then used to find the R wave positions of the ECG signals, and the one-dimensional ECG signals were transformed into two-dimensional ECG images using the sliding window algorithm. After several experiments and parameter optimization, we classified the ECG beats into five categories with an average accuracy of 99.21%. In a comparative experiment, the average classification accuracy of our method was 2.59% higher than that of the OVR SVM algorithm (96.62%). We also compared our results with previous works that used machine learning algorithms to classify ECG signals, and the proposed method achieved the best results in average accuracy. Furthermore, the proposed ECG classification method can be applied to medical robots. In future work, we will streamline and optimize the model structure of this algorithm so that it can classify ECG signals in real time and play an important role in future healthcare.