Robust PVC Identification by Fusing Expert System and Deep Learning

Premature ventricular contraction (PVC) is one of the most common ventricular arrhythmias and may cause stroke or sudden cardiac death. Automatic long-term electrocardiogram (ECG) analysis algorithms can provide diagnostic suggestions and even early warnings for physicians. However, existing algorithms rarely achieve robustness, generalization and low complexity at the same time. In this study, a novel PVC recognition algorithm is proposed that combines a deep learning-based heartbeat template clusterer with an expert system-based heartbeat classifier. A long short-term memory-based auto-encoder (LSTM-AE) network was used to extract features from ECG heartbeats for K-means clustering. The templates were then constructed and their types determined based on the clustering results. Finally, the PVC heartbeats were recognized using a combination of multiple rules, including template matching and rhythm characteristics. Three quantitative parameters, sensitivity (Se), positive predictive value (P+) and accuracy (ACC), were used to evaluate the performance of the proposed method on the MIT-BIH Arrhythmia database and the St. Petersburg Institute of Cardiological Technics database. Se on the two test databases was 87.51% and 87.92%, respectively; P+ was 92.47% and 93.18%, respectively; and ACC was 98.63% and 97.89%, respectively. The PVC scores on the training set and hidden test set of the third China Physiological Signal Challenge (CPSC2020) were 36,256 and 46,706, respectively, which would rank first among the open-source codes. The results showed that combining an expert system with deep learning can provide new insights for robust and generalized PVC identification from long-term single-lead ECG recordings.


Introduction
Cardiovascular diseases (CVDs) are the leading cause of death worldwide, accounting for over 31% of deaths every year. With the progressive aging of populations worldwide, the number of patients with CVDs may continue to increase. It is estimated that the number of deaths due to CVDs will increase from 17 million in 2016 to 24 million in 2030 [1]. Therefore, monitoring and preventing CVDs in advance has become an important task for many countries [2].
Arrhythmia is a common type of CVD and refers to a range of irregularities in heart rhythm and/or waveform. As one of the most common arrhythmias, premature ventricular contraction (PVC) is caused by premature ectopic beats originating in the right or left ventricle [3]. Detecting frequent PVCs and multisource PVCs has important clinical significance [4]. Clinicians generally detect PVCs by observing rhythmic changes and subtle morphological changes in the electrocardiogram (ECG) signal. However, visual inspection increases the manual interpretation workload for physicians and is inefficient for long-term PVC recognition. To reduce the workload of clinicians and improve PVC detection accuracy, researchers have developed computer-aided systems for automatic diagnosis [5].

INCART Database
The performance of the proposed algorithm was evaluated on the INCART database, which consists of 75 12-lead ECG recordings. Each recording is 30 min long and was sampled at 257 Hz. The annotations were produced by an automatic algorithm and then corrected manually, and the database contains over 175,000 annotations in total [15]. Of the available leads, lead II is adopted as our experimental data [20]; ventricular ectopic beats (V) are regarded as PVC beats, and all other beats as Non_PVC beats.

CPSC2020 Database
The CPSC2020 database is a wearable ECG database constructed for challenging PVC and supraventricular premature beat detection tasks [21]; it includes pathological arrhythmias and poor signal quality due to artifacts and noise. The training data consist of 10 single-lead ECG recordings collected from arrhythmia patients, each lasting almost 24 h. The test set contains similar ECG recordings, which are not public. All data were collected at a sampling frequency of 400 Hz. It is worth noting that we did not participate in CPSC2020, to avoid any doubts (we are affiliated with the organizer), but we tested our algorithm on this database and compared it with the top five teams.

Method
In this study, ECG recordings were cut into 30 min ECG segments. Each 30 min ECG segment was preprocessed to exclude noise episodes and filter artifacts for accurate R-peak detection. Thereafter, the feature vectors extracted by the LSTM-AE were used for template construction based on K-means clustering, and the type of each template was determined by a rule-based method. Finally, PVC heartbeats were identified by several rules. The flowchart of the proposed method is illustrated in Figure 1.

Signal Preprocessing
ECG signals are easily contaminated by various kinds of noise, including body movement artifacts and ECG lead-off. Corrupted ECG data can significantly affect PVC identification. To remove ECG segments with unacceptable signal quality, a signal quality assessment based on our previous work [22] is applied. In brief, seven signal quality indices (SQIs) are calculated to train an SVM-based signal quality classification model; the training strategy and parameter settings are the same as in our previous work. After that, baseline drift and high-frequency noise are removed by a Butterworth band-pass filter (0.1-45 Hz). Then, R-peaks are detected using an adaptive and time-efficient algorithm [23] that integrates wavelet-based multiresolution analysis, signal mirroring, local maximum detection, and amplitude and time interval thresholding. To address R-peak misalignment, the R-peaks are refined three times by replacing each detected R-peak with the position of the maximum absolute amplitude in its neighborhood (±25 ms). Finally, each 30 min ECG segment is divided into heartbeats using a 0.5 s window around each detected R-peak (0.1 s before and 0.4 s after), following previous work [24].
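The R-peak refinement and heartbeat windowing described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `refine_r_peaks` and `cut_heartbeats` are hypothetical, the signal is assumed to be a plain list of samples, and beats whose window would run past the segment boundary are simply skipped (an assumption, since the text does not say how edges are handled).

```python
def refine_r_peaks(signal, r_peaks, fs, half_window_s=0.025, passes=3):
    """Move each R-peak to the largest |amplitude| within +/- half_window_s.

    The search is repeated `passes` times, as the text describes three
    refinement rounds with a +/-25 ms neighbourhood.
    """
    half = max(1, int(round(half_window_s * fs)))
    peaks = list(r_peaks)
    for _ in range(passes):
        refined = []
        for p in peaks:
            lo = max(0, p - half)
            hi = min(len(signal), p + half + 1)
            # Index of the maximum absolute amplitude in the neighbourhood.
            refined.append(max(range(lo, hi), key=lambda i: abs(signal[i])))
        peaks = refined
    return peaks


def cut_heartbeats(signal, r_peaks, fs, before_s=0.1, after_s=0.4):
    """Cut a 0.5 s window per beat: 0.1 s before and 0.4 s after the R-peak."""
    n_before = int(round(before_s * fs))
    n_after = int(round(after_s * fs))
    beats = []
    for p in r_peaks:
        start, stop = p - n_before, p + n_after
        # Assumption: beats too close to the segment edges are dropped.
        if start >= 0 and stop <= len(signal):
            beats.append(signal[start:stop])
    return beats
```

At a sampling frequency of 400 Hz, each heartbeat produced this way contains 200 samples, with the R-peak at index 40.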

Figure 1. Flowchart of proposed method.

Feature Vectors Extraction Based on LSTM-AE
The long short-term memory-based autoencoder (LSTM-AE) network is used to extract the feature vectors of ECG heartbeats in this research. Figure 2 shows the structure of the LSTM-AE. LSTM is designed for processing time series within the framework of the recurrent neural network and consists of three gate structures: the input gate, the forget gate and the output gate. The forget gate decides what information will be discarded from the previous cell state. Its vector $f_t$ is generated from the hidden state $h_{t-1}$ of the previous LSTM cell and the input $x_t$ of the current step $t$:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{1}$$

where $\sigma$ is the sigmoid function, $W_f$ is the weight matrix of the forget gate and $b_f$ is the bias. For the input gate, the vector $i_t$ and the input candidate information $\tilde{C}_t$ are also generated from the hidden state $h_{t-1}$ and the input $x_t$:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{2}$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \tag{3}$$

The weight matrices $W_i$, $W_C$ and biases $b_i$, $b_C$ represent the connections between the two components, respectively. The forget gate and the input gate together determine the current cell state $C_t$:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4}$$

The output gate also generates a vector $o_t$ to determine the hidden state $h_t$ output by the LSTM, as shown in the following equations:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{5}$$

$$h_t = o_t \odot \tanh(C_t) \tag{6}$$

In Equation (5), $W_o$ is the weight matrix of the output gate and $b_o$ is the bias. The LSTM-AE network is adopted in this study to extract feature vectors of the heartbeats. The training parameters are feature number = 32, batch size = 128 and number of epochs = 100, and Adam is selected as the optimizer [25].
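To make the gate computations concrete, the following is a minimal pure-Python sketch of a single LSTM step, using the standard forget, input and output gate formulation. It is illustrative only: the parameter dictionary keys (`W_f`, `W_i`, `W_C`, `W_o` and the matching biases), the helper names, and the choice in `encode_beat` to take the final hidden state as the feature vector are assumptions, not details confirmed by the text. A practical model would be trained in a deep learning framework.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: returns the new hidden state and cell state."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]
    i_t = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]
    c_hat = [math.tanh(a) for a in affine(params["W_C"], params["b_C"], z)]
    # Forget part of the old cell state and add the gated candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    o_t = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def encode_beat(beat, params, hidden_size):
    """Run the encoder over one heartbeat, one sample per time step.

    Assumption: the final hidden state serves as the feature vector.
    """
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for sample in beat:
        h, c = lstm_step([sample], h, c, params)
    return h
```

With `hidden_size = 32`, the returned list would have the same dimensionality as the 32 features reported in the text.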
This research embeds the LSTM network into the AE framework; thus, both the encoder and the decoder are implemented by LSTM. The encoder converts the input $x_t$ to a hidden representation $h_t$ (the feature vector) using a deterministic mapping function $f$:

$$h_t = f(W x_t + b) \tag{7}$$

where $W$ is the weight between the input $x_t$ and the hidden representation $h_t$, and $b$ is the bias. The decoder reconstructs the output $\hat{x}_t$ from $h_t$ using a mapping function $g$, which can be expressed as

$$\hat{x}_t = g(W' h_t + b') \tag{8}$$

where $W'$ is the weight between the hidden representation $h_t$ and the output $\hat{x}_t$, and $b'$ is the bias.

K-Means Clustering Using Feature Vectors
The ECG heartbeats in each 30 min ECG segment are preliminarily clustered into K groups (K ≤ M, where M is the total number of heartbeats) based on their feature vectors using the K-means clustering technique. In this study, K is determined by the silhouette coefficient (SC):

$$SC(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \tag{9}$$

where $a(i)$ and $b(i)$ are the intracluster dissimilarity and intercluster dissimilarity of the ith coded feature, respectively. The number of clusters that yields the maximum SC is selected as K.
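Selecting K by the silhouette coefficient can be sketched as below. This is a simplified illustration under stated assumptions: the Euclidean distance is used as the dissimilarity, the per-sample SC values are averaged to score each candidate K, K-means uses a deterministic farthest-first initialisation for reproducibility, and the function names `kmeans`, `mean_silhouette` and `select_k` are hypothetical. None of these choices is confirmed by the text.

```python
import math


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def mean_silhouette(points, labels):
    """Average of (b - a) / max(a, b) over all samples."""
    clusters = {}
    for idx, l in enumerate(labels):
        clusters.setdefault(l, []).append(idx)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        a = sum(dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist(p, points[j]) for j in idx) / len(idx)
            for l, idx in clusters.items()
            if l != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def select_k(points, k_candidates):
    """Return the candidate K with the highest mean silhouette."""
    scored = [(mean_silhouette(points, kmeans(points, k)[0]), k) for k in k_candidates]
    return max(scored)[1]
```

For example, `select_k(features, range(2, 6))` would compare K = 2 to 5 on a list of feature tuples; the candidate range itself is a free choice.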

Template Construction and Template Classification
After K-means clustering, the distance between each coded feature sample in a group and the centroid of that group is calculated, and the distances are sorted in ascending order (Equation (10)):

$$sort\_label_j = \underset{i = 1, \ldots, N_t}{\operatorname{argsort}} \; d(x_i, a_j) \tag{10}$$

where $d(x_i, a_j)$ is the distance between the ith sample $x_i$ in group j and the centroid $a_j$, $sort\_label_j$ is the index of the samples after sorting by this distance, and $N_t$ is the number of samples in the group.
The first 30 samples after sorting are selected to construct templates, and the type of each template is determined as PVC/Non_PVC based on the morphological rules referring to our previous work in [26]. In brief, three features (the QRS complex height, the QRS complex width, and the correlation coefficient of each template) and several prior-knowledge-based rules are used to determine the type of each template.
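The selection of the 30 samples closest to the centroid can be sketched as follows. The text does not state how the selected heartbeats are combined into a template, so point-wise averaging of their waveforms is used here purely as an assumption; the Euclidean distance and the function name `build_template` are likewise hypothetical.

```python
import math


def build_template(features, beats, centroid, n_closest=30):
    """Pick the beats whose features lie closest to the centroid.

    Returns the chosen indices (in ascending order of distance) and a
    template formed by averaging their waveforms (assumed combination rule).
    """
    if not features or len(features) != len(beats):
        raise ValueError("features and beats must be non-empty and aligned")

    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    order = sorted(range(len(features)), key=lambda i: dist(features[i], centroid))
    chosen = order[:n_closest]
    n_samples = len(beats[0])
    template = [sum(beats[i][t] for i in chosen) / len(chosen) for t in range(n_samples)]
    return chosen, template
```

If a group holds fewer than 30 heartbeats, this sketch uses all of them.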

Heartbeat Classification
To quantify the similarity between each heartbeat waveform (HW) and the determined template waveform (TW), three characteristics are adopted in this study: the cross-correlation coefficient (Covr), the area difference (ArDiff) and the energy difference (EnDiff). Covr is defined as

$$Covr = \frac{\sum_{i=1}^{N} (HW_i - \overline{HW})(TW_i - \overline{TW})}{\sqrt{\sum_{i=1}^{N} (HW_i - \overline{HW})^2 \sum_{i=1}^{N} (TW_i - \overline{TW})^2}} \tag{11}$$

where $\overline{HW}$ and $\overline{TW}$ are the mean values of HW and TW, respectively, and N is the number of sample points in HW and TW. ArDiff indicates the area difference between HW and TW (Equation (12)), and EnDiff assesses the energy difference between HW and TW (Equation (13)). The proposed heartbeat classification proceeds as follows:

Step 1: Evaluate the similarity between each template and its intracluster heartbeats to determine the heartbeat type. If the current heartbeat and its intracluster template satisfy condition (14), the current heartbeat is assigned the same type as its template; otherwise, the current heartbeat is labeled "Unknown".

$$Covr \geq 0.9 \;\; \text{or} \;\; (Covr \geq 0.8 \;\; \text{and} \;\; ArDiff < 10 \;\; \text{and} \;\; EnDiff < 1) \tag{14}$$

Step 2: Evaluate the similarity between each "Unknown" heartbeat and all determined templates. The template matching results between the "Unknown" heartbeat and all determined templates, together with the rhythmic rules defined in [26], are considered simultaneously to identify the type of the "Unknown" heartbeat.
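The Step 1 matching rule of condition (14) can be sketched as below. Covr is computed here as the Pearson correlation coefficient, which is an assumption about the exact definition. The formulas for ArDiff and EnDiff are not reproduced in this text, so they are passed in as precomputed values rather than guessed. All function names are hypothetical.

```python
import math


def covr(hw, tw):
    """Correlation coefficient between a heartbeat and a template waveform."""
    n = len(hw)
    mean_h = sum(hw) / n
    mean_t = sum(tw) / n
    num = sum((h - mean_h) * (t - mean_t) for h, t in zip(hw, tw))
    den = math.sqrt(
        sum((h - mean_h) ** 2 for h in hw) * sum((t - mean_t) ** 2 for t in tw)
    )
    return num / den if den > 0 else 0.0


def matches_template(covr_value, ar_diff, en_diff):
    """Condition (14): strong correlation alone, or moderate correlation
    combined with small area and energy differences."""
    return covr_value >= 0.9 or (covr_value >= 0.8 and ar_diff < 10 and en_diff < 1)


def classify_in_cluster(hw, tw, template_type, ar_diff, en_diff):
    """Step 1: inherit the template type on a match, else mark as Unknown."""
    if matches_template(covr(hw, tw), ar_diff, en_diff):
        return template_type
    return "Unknown"
```

Heartbeats returned as "Unknown" would then go through Step 2, which also relies on the rhythmic rules of [26] and is not sketched here.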
For the long-term ECG signals in CPSC2020, each 24 h signal is divided into 30 min segments, and the first 30 min segment is processed as described above. For each subsequent segment, a rule-based method is used to determine whether the templates need to be updated. If so, the previously described steps are performed to update the templates; otherwise, the templates of the previous 30 min segment are reused for the current segment.
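The segment-wise template reuse can be sketched as a simple loop. The rule that decides whether an update is needed is not specified in this text, so it is represented by a caller-supplied `needs_update` callback; that callback, `build_templates` and the function name `process_recording` are all hypothetical placeholders.

```python
def process_recording(n_samples, fs, build_templates, needs_update, segment_min=30):
    """Walk through a long recording in 30 min segments.

    build_templates(start, stop) -> templates for that sample range
    needs_update(start, stop, templates) -> True if templates must be rebuilt
    Returns a list of (start, stop, templates) tuples, one per segment.
    """
    seg_len = int(segment_min * 60 * fs)
    templates = None
    used = []
    for start in range(0, n_samples, seg_len):
        stop = min(start + seg_len, n_samples)
        # The first segment always builds templates; later ones only on demand.
        if templates is None or needs_update(start, stop, templates):
            templates = build_templates(start, stop)
        used.append((start, stop, templates))
    return used
```

A 24 h recording sampled at 400 Hz yields 48 such segments of 720,000 samples each.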

Evaluation Method
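Three common metrics, Se, P+ and ACC, are used to evaluate the performance of the proposed method [27]:

$$Se = \frac{TP}{TP + FN} \tag{15}$$

$$P{+} = \frac{TP}{TP + FP} \tag{16}$$

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{17}$$

where TP, TN, FP and FN denote the numbers of true positive, true negative, false positive and false negative detections, respectively. We adopt the scoring rules of the CPSC2020 competition (PVC score) to evaluate the performance of the algorithm on the CPSC2020 database, so that our algorithm can be compared with the teams that participated in the competition. The scoring rules are as follows.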

• For a false negative (FN) detection, 5 points are deducted. From a clinical perspective, a missed diagnosis is more serious than a misdiagnosis, so FN detections are penalized.

The final score for PVC is the sum of all deducted points, so a lower score indicates better performance.
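The evaluation metrics and the deduction-based score can be sketched as follows. Se, P+ and ACC use their standard confusion-matrix definitions. For the PVC score, only the FN penalty of 5 points is stated in this passage, so the FP penalty is left as a required argument instead of being assumed; the function names are hypothetical.

```python
def beat_metrics(tp, tn, fp, fn):
    """Return (Se, P+, ACC) from confusion-matrix counts.

    Assumes tp + fn > 0 and tp + fp > 0, i.e. the class is both present
    and predicted at least once.
    """
    se = tp / (tp + fn)
    ppv = tp / (tp + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return se, ppv, acc


def pvc_score(fp, fn, fp_penalty, fn_penalty=5):
    """Sum of deducted points; lower is better.

    fn_penalty = 5 follows the text. fp_penalty must be supplied by the
    caller because it is not given in this passage.
    """
    return fp * fp_penalty + fn * fn_penalty
```

For instance, with hypothetical counts of 90 true positives, 900 true negatives, 5 false positives and 10 false negatives, `beat_metrics` gives Se = 0.90.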

Effectiveness of Feature Vectors Extracted by LSTM-AE
The LSTM-AE model combines the LSTM network with the AE, which means that the encoding and decoding processes are both performed by LSTM. Through LSTM, the encoder extracts features from the input ECG signal, while the decoder converts the feature maps back to the output. The parameters of the encoding and decoding operations are computed using unsupervised greedy training. In this paper, the input of the LSTM-AE model is the raw ECG without filtering, while the loss function used to optimize the model is calculated between the bandpass-filtered ECG signal and the reconstructed ECG signal. To determine the hyperparameters (batch size and feature number) of the LSTM-AE model, we tested the PVC detection performance under different parameter settings. Table 2 lists the classification accuracy on the MIT-BIH-AR database under different hyperparameter settings, taking record 100 as an example. The model provides better classification performance when the batch size and feature number are set to 128 and 32, respectively, so these values are used in this paper.
Figure 3 shows the ranked feature vectors of PVC and Non_PVC beats in record 228 from the MIT-BIH-AR database, sorted according to their t-test p-values. The feature values of Non_PVC beats fluctuate slightly around 1, while those of PVC beats vary greatly from 0 to 10. In addition, more than half of the features differ between PVC and Non_PVC beats, which indicates that the feature vectors can substitute for the original ECG data in heartbeat clustering.

Results of K-Means Clustering
An example K-means clustering result for record 210 in the MIT-BIH-AR database is shown in Figure 4. The heartbeats are clustered into only two groups (K = 2), containing 164 and 2475 heartbeats, respectively (Figure 4a,b). The heartbeats within each group are highly similar, while the templates (Figure 4e,f) constructed from the 30 heartbeats closest to the centroid of each group differ markedly (Figure 4c,d). This demonstrates that K-means clustering based on the feature vectors can effectively divide the heartbeats into different groups.

Results on MIT-BIH-AR Database
Figure 5a shows the confusion matrix of the results on the MIT-BIH-AR database, and the detailed results for this database are given in the appendix (Table A1). The overall ACC is 98.63%, which is comparable to state-of-the-art algorithms. The Se for Non_PVC and PVC beats is 99.46% and 87.51%, respectively, and the P+ is 99.06% and 92.47%, respectively.

Results on INCART Database
The confusion matrix for the INCART database is shown in Figure 5b, and the results for each recording are shown in the appendix (Table A2). For this database, we obtained an overall ACC of 97.89%, with Se of 99.17% and P+ of 98.46% for Non_PVC beats, and Se of 87.92% and P+ of 93.18% for PVC beats. To evaluate the multilead robustness of our method, the algorithm was independently verified on all 12 leads of the INCART database (Figure A1). The results indicate that the proposed method generalizes well across leads.

Results on CPSC2020 Database
Table 3 shows the results of the proposed method on the CPSC2020 dataset. According to the scoring standards of the competition, the PVC score reached 46,706 on the hidden test set and 36,256 on the training set. Compared with the final scores of the top five teams on the hidden test set, our method ranks first among the open-source codes. In addition, the computational complexity on the hidden test set was analyzed with the help of the competition organizing committee. Compared with the top five teams, the running time of our method is much shorter. This indicates that the proposed method has the potential to be applied in long-term dynamic ECG monitoring for PVC recognition.

Discussion
A PVC recognition algorithm that integrates deep learning (DL) and expert system (ES) rules was proposed in this study. Many ES-based or DL-based automatic ECG heartbeat classification algorithms have achieved high recognition results. However, they are complementary in terms of robustness and generalization.
The contribution of this paper is the combination of DL-assisted template construction and ES-based heartbeat classification, which not only guarantees accuracy but also improves the interpretability, robustness and generalization ability of the algorithm. A wavelet-based statistical process control (SPC) method was proposed for PVC recognition on the MIT-BIH-AR database [28]; the overall ACC was 97.90%, and the Se and P+ for PVC were 87.20% and 84.60%, respectively. That method could improve PVC sensitivity by manually adjusting parameter thresholds according to different situations, whereas our method achieves high PVC sensitivity without any manual process. A real-time premature beat (PB) detection method for single-lead ECG was proposed based on several simple rules [26]; it was reported to have low computational complexity and could be used for real-time PB detection in portable ambulatory ECG monitoring. However, its accuracy on the total data (85.56%) was still insufficient for accurate clinical diagnosis. Malek et al. [29] developed an improved template matching technique for identifying normal and PVC beats in ECG signals, which was evaluated on the INCART, QT, MIT-BIH Supraventricular Arrhythmia, and Fantasia databases, with accuracies of 97.91%, 99.34%, 99.89%, and 98.44%, respectively. One strength of this method was the use of an adaptable threshold without the need for expert intervention; however, the features they adopted were more complex than ours. Talbi et al. [30] studied the effectiveness of the fractional linear prediction (FLP) technique for ECG signal modeling and developed a PVC recognition method based on the three FLP coefficients and KNN, achieving a best accuracy of 96% on the MIT-BIH-AR database. Most of the existing ES-based methods are efficient and require little expert intervention, but their robustness still needs to be improved for daily life applications.
Table 4 compares the PVC recognition performance of the proposed method with that of existing methods on the MIT-BIH-AR and INCART databases. The satisfactory performance of the proposed method on these two clinical databases demonstrates that our method not only retains the accuracy and robustness advantages of DL-based methods, but also offers the generalization capacity and interpretability advantages of ES-based methods. With the popularity of machine learning, many researchers have applied machine learning algorithms to arrhythmia recognition and achieved high performance. Mazidi et al. [32] designed a linear kernel-based SVM classifier with morphology, time domain, time-frequency domain and nonlinear features for PVC recognition; the method achieved a higher overall ACC and Se (99.78% and 99.91%, respectively) than our method. Wang et al. [34] proposed a PVC detection scheme based on image processing and CNN for scanned clinical ECG reports, and their Se and ACC reached 95.47% and 98.25%, respectively. However, our method is unsupervised, whereas the training set used in their method overlapped with their test set. Oh et al. [12] proposed an automated system using a combination of CNN and LSTM for variable-length ECG classification (five classes); they obtained a high classification accuracy of 98.10% without noise elimination on the MIT-BIH-AR database. The system could analyze ECG signals of different lengths with only a single type of arrhythmia, but it was computationally intensive. Yang et al. [27] applied stacked sparse autoencoders (SSAEs) and Softmax regression (SF) to the classification of six types of ECG beats and achieved an average Se of 99.22% and P+ of 99.37% on the MIT-BIH-AR database. The features extracted by the SSAE had no individual independent differences in feature selection and extraction accuracy, and almost no useful heartbeat information was lost.
However, the method was semisupervised and required trained cardiologists to first classify each beat cluster into normal or ventricular. Therefore, it was inappropriate for analyzing long-term signals.
Although we did not participate in CPSC2020 as we were affiliated with the organizer of the challenge, the performance of the proposed method on long-term wearable ECG database (CPSC2020) was also compared with the published top five teams for PVC recognition in CPSC2020 (Table 3). The method proposed by the published champion team employed DenseNet model to classify the heartbeats into three categories (normal, premature ventricular contraction and supraventricular premature beat) and refined the results by a postprocessing procedure with several clinical rules. The algorithms of other teams were almost all DL-based methods, and they could achieve excellent performance on the training set, but they could not maintain such good results on the test set. The reason might be that these teams overoptimized the accuracy of their algorithm on the training set, leading to overfitting, which affected the algorithm results on hidden test set. Both our method and the published champion team's results outperformed DL-based methods, indicating that the fusion of these two (ES-based and DL-based) methods had the potential to reform the existing methods based only on ES or DL.
To evaluate the computational complexity of our method, we computed and compared the operating time of our method and the CPSC2020 top five teams on the hidden test set. In addition, we also compared the running time with some published works in parallel. Three morphological features and seven statistical features were directly extracted, normalized and fed into CFNN classifier for PVC recognition, which could process 20-s segment within 2.1 s on a Samsung Galaxy J1 motherboard (a quad-core Cortex-A7 CPU clocked at up to 1.2 GHz with 1 GB RAM, OS Android 6.0) [15]. Khalaf et al. [37] proposed an SVM-based method on MATLAB R2010a on Intel ® Core™ i5 3.2 GHz processor and 8 GB RAM, and it consumed 54.8 ms for each beat classification. Arrais Junior et al. [38] reported an adaptive threshold and redundant discrete wavelet transform fusion method, which can process 30 min signals using only 61.2 s on the Matlab 2014a platform. These results showed that (1) the superposition of deep learning and time-frequency conversion processes will increase the complexity of the algorithm; (2) complex deep learning frameworks are indeed more time-consuming than simple CNN; (3) the DL-based feature extraction + ES-based postprocessing analysis generally take less time. The comparison results further verified the advantage of the fusion of these two (ES-based and DL-based) methods.
The employed DL-based method (LSTM-AE module) was used to extract features from ECG heartbeats for K-means clustering, and the PVC identification was based on a combination of multiple rules, including template matching and rhythm characteristics. The features used for classification are extracted according to the R-peak-relevant clinical experience: the Covr, ArDiff and EnDiff are used to map the morphological and frequency domain difference between PVC and Non_PVC, and the rhythmic rules are used to map the variation of RR intervals between PVC and Non_PVC. All these features are extracted only based on R peaks instead of those complex features detected from precise fiducial points (Q wave, S wave, etc.) and professional knowledge, which can not only retain the interpretability of the proposed algorithm, but also improve the antinoise ability of the algorithm.
Although the proposed method is an important contribution to unsupervised PVC identification, it has three main limitations. (1) The performance is affected by misalignment of the QRS complex; a more accurate QRS detection algorithm should be designed to locate the peak of each QRS complex for precise ECG classification. (2) The method has been trained and tested only on the Windows platform, so further work is needed to embed the algorithm in mobile terminals for daily life monitoring applications. (3) Only single-channel information is considered in this paper; multichannel information from multilead ECG monitoring systems should be considered to improve the accuracy of PVC recognition, or even of other kinds of heartbeat classification.

Conclusions
In summary, an unsupervised adaptive PVC recognition algorithm is proposed for single-lead ECG based on a novel expert system and deep learning combination strategy. The personalized heartbeat templates are firstly clustered by K-means using LSTM-AE extracted features and determined by rule-based methods. Then, each heartbeat is classified into PVC or Non_PVC by a series of rules. The performance of the proposed algorithm is tested on the clinical databases (MIT-BIH database and INCART database) and long-term wearable databases (CPSC2020 training set and hidden test set). The comparison with the existing PVC algorithms shows that the proposed method embraces the advantages of deep learning and rules, and achieves high accuracy, robustness, and interpretability.