Diagnostic Accuracy of the Deep Learning Model for the Detection of ST Elevation Myocardial Infarction on Electrocardiogram

We aimed to measure the diagnostic accuracy of the deep learning model (DLM) for ST-elevation myocardial infarction (STEMI) on a 12-lead electrocardiogram (ECG) according to culprit artery type. From January 2017 to December 2019, we recruited patients with STEMI who received more than one stent insertion for culprit artery occlusion. The DLM was trained with STEMI and normal sinus rhythm ECGs for external validation. The primary outcome was the diagnostic accuracy of the DLM for STEMI according to the three different culprit arteries. The outcomes were measured using the area under the receiver operating characteristic curve (AUROC), sensitivity (SEN), and specificity (SPE) at the cutoff determined by the Youden index. A total of 60,157 ECGs were obtained, comprising 117 STEMI-ECGs and 60,040 normal sinus rhythm ECGs. Using the DLM, the AUROC for overall STEMI was 0.998 (0.996–0.999) with SEN 97.4% (95.7–100) and SPE 99.2% (98.1–99.4). There were no significant differences in diagnostic accuracy among the three culprit arteries. Baseline wander, present in 83.7% (345/412) of false positive cases, significantly interfered with the accurate interpretation of ST elevation on the ECG. The DLM showed high diagnostic accuracy for STEMI detection, regardless of the type of culprit artery. Baseline wander on the ECG could contribute to misinterpretation by the DLM.


Introduction
Although the mortality of acute myocardial infarction (AMI) has recently improved with coronary reperfusion therapy, AMI is still the leading cause of death worldwide [1,2]. AMI can be divided into ST elevation myocardial infarction (STEMI) and non-ST elevation myocardial infarction (NSTEMI). STEMI presents with more severe symptoms than NSTEMI and carries a higher mortality rate with rapid disease progression [3,4].
Since STEMI requires urgent reperfusion therapy, the quick and accurate interpretation of the electrocardiogram (ECG) is essential. The interpretation of STEMI on ECG is a difficult task for emergency physicians and cardiologists [2].
ECG machines have their own automatic interpretation programs, which differ by manufacturer. Clinicians have used the automatic ECG interpretation provided by ECG machines to determine whether an ECG shows true STEMI. However, previous studies reported that the inaccuracy of the automatic interpretation of ECG machines ranged from 5.9% to as high as 29%. Thus, the automatic ECG interpretation of STEMI remains unreliable [5–7]. Although several studies have been conducted to improve the accuracy of ECG machine interpretation, obtaining an accurate ECG interpretation remains difficult [8,9].
Deep learning has recently been applied to improve the accuracy of ECG analysis. Additionally, efforts to improve the accuracy of STEMI-ECG interpretation using deep learning have been actively pursued [1,10–12]. However, the accuracy of STEMI-ECG interpretation by deep learning has not yet reached a clinically applicable level for diagnosing true STEMI.
This study aimed to demonstrate that the application of deep learning can improve the accuracy of STEMI-ECG interpretation.

Study Design
This was a single-center, retrospective, observational study. The recruitment period was from January 2017 to December 2019. This study was approved by the Hallym University Kangnam Sacred Heart Hospital Institutional Review Board in November 2019 (No. HKS 2019-10-021-001).
By searching electronic medical records during the recruitment period, we recruited patients with AMI who received more than one stent insertion for culprit artery occlusion after visiting the emergency room in the Hallym University Kangnam Sacred Heart Hospital. The culprit coronary artery was defined as any vessel with acute thrombotic total or subtotal occlusion [13].
A single ECG machine (MAC 5500 HD, GE Healthcare, Chicago, IL, USA) was used in the emergency room during the recruitment period for the STEMI group. Two trained cardiologists with sufficient experience interpreted and confirmed STEMI-ECGs among AMI patients using reliable STEMI diagnostic criteria [14]. The cardiologists were board certified physicians in cardiology and had worked in a university hospital for more than ten years.
The STEMI group was categorized into three groups according to the type of culprit artery receiving coronary intervention. The culprit arteries were the left anterior descending artery (LAD), left circumflex artery (LCX), and right coronary artery (RCA). We collected information on baseline characteristics and analyzed the success rate of STEMI detection using an ECG machine.
Patients with an ECG interpretation of "normal sinus rhythm (NSR) and normal ECG" were considered as the NSR group. The ECGs of the NSR group were collected using the Cardiology Information System (MUSE®, GE Healthcare, Chicago, IL, USA) during the same recruitment period as patients with STEMI.

Outcomes
The primary outcome was the diagnostic accuracy of the deep learning model (DLM) for STEMI according to the culprit arteries. The outcomes were measured using the area under the receiver operating characteristic curve (AUROC), together with sensitivity (SEN), specificity (SPE), positive predictive value (PPV), and negative predictive value (NPV) at the cutoff determined by the Youden index.
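As an illustration of how a Youden-index cutoff is chosen, the following minimal sketch scans candidate thresholds and keeps the one that maximizes J = SEN + SPE - 1. The scores and labels are toy values invented for this example, and the function names are hypothetical; this is not the analysis code used in the study.

```python
def confusion_at(scores, labels, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_cutoff(scores, labels):
    """Return (threshold, J, SEN, SPE) for the threshold maximizing J.

    Ties are resolved in favor of the lowest threshold.
    """
    best = None
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion_at(scores, labels, t)
        sen = tp / (tp + fn)
        spe = tn / (tn + fp)
        j = sen + spe - 1
        if best is None or j > best[1]:
            best = (t, j, sen, spe)
    return best


# Toy data (assumption): model probabilities and true labels (1 = STEMI).
scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.70, 0.80, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, j, sen, spe = youden_cutoff(scores, labels)
print(threshold, j, sen, spe)
```

SEN, SPE, PPV, and NPV are then all read from the confusion matrix at this single cutoff, whereas the AUROC summarizes performance across all cutoffs.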

Application of Deep Learning Model
We performed external validation on the ECG dataset to measure the performance of the DLM in distinguishing STEMI-ECGs from NSR-ECGs. The DLM was also applied to measure the difference in STEMI detection performance according to culprit artery occlusion.
The flow of the DLM and the model architecture are shown in Figure 1. The twelve-lead raw ECG, its differentiated data, and the P, QRS, and T section information were concatenated as the input. Each ten-second ECG was split into four 2.5 s segments, and the four outputs were ensembled to ensure stable performance. The weight-shared encoder, which is based on the ResNet structure with some modifications (Document S1), outputs the information for each lead [15]. These outputs were concatenated and used as the input of the classifier, which consists of fully connected layers and a sigmoid layer. The classifier outputs a probability that discriminates between STEMI-ECGs and NSR-ECGs.
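The data flow described above can be sketched in plain Python. This is only a toy illustration under stated assumptions: the sampling rate (500 Hz), the two-feature `shared_encoder` (a stand-in for the ResNet-based encoder), and the classifier weights are all hypothetical, and the P, QRS, and T section inputs are omitted. It shows the differentiated input, the split into four 2.5 s segments, one encoder applied with the same parameters to every lead, concatenation, a sigmoid classifier, and averaging of the four segment outputs.

```python
import math


def first_difference(signal):
    """Differentiated signal: sample-to-sample change, padded to keep length."""
    return [0.0] + [b - a for a, b in zip(signal, signal[1:])]


def split_segments(signal, n_segments=4):
    """Split one lead into equal consecutive segments (10 s -> 4 x 2.5 s)."""
    size = len(signal) // n_segments
    return [signal[i * size:(i + 1) * size] for i in range(n_segments)]


def shared_encoder(raw, diff):
    """Toy stand-in for the encoder; the same function is used for every lead."""
    mean_raw = sum(raw) / len(raw)
    max_abs_diff = max(abs(d) for d in diff)
    return [mean_raw, max_abs_diff]


def classify(features, weights, bias):
    """One fully connected layer followed by a sigmoid."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def predict(ecg_12_lead, weights, bias, n_segments=4):
    """Return the ensembled probability and the per-segment probabilities."""
    per_lead = []
    for lead in ecg_12_lead:
        raw_segs = split_segments(lead, n_segments)
        diff_segs = split_segments(first_difference(lead), n_segments)
        per_lead.append(list(zip(raw_segs, diff_segs)))
    probs = []
    for k in range(n_segments):
        features = []
        for lead_segs in per_lead:
            raw, diff = lead_segs[k]
            features.extend(shared_encoder(raw, diff))  # concatenate leads
        probs.append(classify(features, weights, bias))
    return sum(probs) / len(probs), probs


fs = 500  # assumed sampling rate in Hz (not stated in the text)
ecg = [[0.0] * (10 * fs) for _ in range(12)]  # flat 10 s, 12-lead placeholder
weights = [0.1] * 24  # hypothetical: 12 leads x 2 features per lead
prob, segment_probs = predict(ecg, weights, bias=0.0)
print(prob, segment_probs)
```

Because the encoder is shared, adding or reordering leads does not add encoder parameters; only the classifier input grows with the number of leads.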

Statistical Analysis
The data were compiled using a standard spreadsheet application (Excel, Microsoft, Redmond, WA, USA) and analyzed using the Statistical Package for the Social Sciences (SPSS) 26.0 KO for Windows (SPSS Inc., Chicago, IL, USA). We generated descriptive statistics and presented them as frequencies and percentages for categorical variables. Continuous variables are presented as mean with standard deviation (mean ± SD) for parametric data or median with interquartile range for nonparametric data. Normality for continuous variables was tested using the Shapiro-Wilk test. To identify the correlation between factors and autointerpretation of ECG, the chi-square test or Fisher's exact test was used for categorical variables. An independent t-test (parametric data) or Mann-Whitney test (nonparametric data) was used for continuous variables. The external validation performance of the DLM was evaluated using the AUROC, SEN, and SPE with two-sided 95% confidence intervals (CI). The AUROCs of the three culprit artery groups were compared using DeLong's test. Statistical significance was set at p < 0.05.
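For readers unfamiliar with the AUROC, it can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which is the Mann-Whitney formulation. The sketch below uses toy values and a hypothetical function name; it does not implement DeLong's test or the confidence intervals, and it is not the SPSS procedure used in this study.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumption): model probabilities and true labels (1 = STEMI).
scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.70, 0.80, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

area = auroc(scores, labels)
print(area)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for illustration; a rank-based implementation would be preferable for a dataset of tens of thousands of ECGs.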


Diagnostic Accuracy of DLM between STEMI-ECGs versus NSR-ECGs
A total of 60,157 ECGs were obtained. These included 117 STEMI-ECGs and 60,040 NSR-ECGs (Figure 2). The baseline characteristics of the experimental groups are shown in Table 1. There were no significant factors affecting the success of STEMI detection by automated ECG interpretation.
The DLM also showed a high SEN and SPE, as shown in Table 2. In the detection of the overall STEMI-ECGs, the SEN and SPE were 97.4% and 99.2%, respectively. In the analyses by culprit arteries, all SEN and SPE values were in the range of 95-100%. The DLM demonstrated a very high overall NPV (99.9%), with the NPV above 99% regardless of the type of culprit artery. The PPV, on the other hand, was low (Total: 20.2%; RCA: 4.6%; LAD: 15.7%; LCX: 3.2%).

Analysis for False Positive and False Negative Results
We found that the DLM classified 412 NSR-ECGs as STEMI-ECGs; these false positive cases are summarized in Table 3. The false positive ECGs included 345 ECGs with baseline wander (83.7%, 345/412). Among the false positive cases, ten true STEMI patients were identified in the chart review by cardiologists. The presence of true STEMI-ECGs did not differ significantly according to the existence of baseline wander (p = 0.49). In the measurement of the effect of baseline wander, the results suggested that the interpretation of ST elevation was significantly impeded by baseline wander (p = 0.001). Six examples of false positive cases are shown in Supplementary Figure S1 using 12-lead ECGs and heatmap images. Three STEMI-ECGs were false negatives by the DLM. Two were inferior wall STEMIs (both with RCA culprit arteries) and one was an anteroseptal wall STEMI (LAD culprit artery). All false negative ECGs showed baseline wander.

Discussion
This study demonstrated that the DLM could significantly improve the accuracy of STEMI-ECG interpretation. The DLM showed high diagnostic performance for STEMI detection (AUROC, 0.998; SEN, 97.4%; SPE, 99.2%). In the analysis according to the three different culprit coronary arteries, the AUROC for LAD, LCX, and RCA STEMI was greater than 0.99. In addition, the sensitivity and specificity of the DLM were >95%. The PPV was only 20.2%, whereas the NPV was more than 99%. We believe that the low PPV is linked to the cohort's low STEMI prevalence (117 STEMI-ECG vs. 60,040 NSR-ECG; prevalence = 0.2%). By analyzing false positive and false negative ECGs, we also found that baseline wander on the ECG could contribute to misinterpretation by the DLM.
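The dependence of PPV on prevalence follows directly from Bayes' theorem. The sketch below plugs the rounded overall SEN (97.4%) and SPE (99.2%) and the cohort prevalence (117/60,157) into the standard formulas; the function names are hypothetical, and the 50% prevalence is a purely illustrative comparison. Because the inputs are rounded, the result is close to, but does not exactly reproduce, the reported PPV of 20.2%.

```python
def ppv(sen, spe, prevalence):
    """Positive predictive value from sensitivity, specificity, prevalence."""
    true_pos = sen * prevalence
    false_pos = (1 - spe) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sen, spe, prevalence):
    """Negative predictive value from sensitivity, specificity, prevalence."""
    true_neg = spe * (1 - prevalence)
    false_neg = (1 - sen) * prevalence
    return true_neg / (true_neg + false_neg)


sen, spe = 0.974, 0.992          # rounded overall values from the text
prevalence = 117 / 60157         # STEMI-ECGs / all ECGs in this cohort

ppv_cohort = ppv(sen, spe, prevalence)
npv_cohort = npv(sen, spe, prevalence)
ppv_balanced = ppv(sen, spe, 0.5)  # illustrative 50% prevalence (assumption)

print(f"PPV at cohort prevalence: {ppv_cohort:.3f}")
print(f"NPV at cohort prevalence: {npv_cohort:.5f}")
print(f"PPV at 50% prevalence:    {ppv_balanced:.3f}")
```

With identical SEN and SPE, the PPV is roughly 0.19 at the cohort prevalence but exceeds 0.99 in a balanced sample, which supports the interpretation that the low PPV reflects the rarity of STEMI in this dataset rather than poor discrimination.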
We applied the DLM to classify STEMI-ECGs from NSR-ECGs. Compared with typical convolutional neural network (CNN) models, which have been widely used in ECG studies, the DLM has some advantages for STEMI detection [16–18]. First, the DLM processes the information of each of the 12 leads (channels) separately by applying weight-shared encoders to each channel, which can reduce information loss by retaining every lead. Typical models, which have a single encoder for 12 leads, usually aggregate channel information in the first layer. However, since the ST elevation feature may appear on only a few leads associated with the obstruction of the culprit artery, the DLM showed better performance in the STEMI detection task than the single encoder. Second, the DLM can optimally process the ECG feature input. The major ECG feature inputs are the P wave, QRS complex, and T wave, together with the shape, amplitude, and latency of each wave. In a conventional CNN with a single encoder, the aggregation of this channel information can impede rapid ECG input processing.
In the analysis of false positive and false negative ECGs, we found that baseline wander significantly affected the accuracy of the DLM. Baseline wander was detected in 83.7% (345/412) of false positive ECGs and 100% (3/3) of false negative ECGs. In the selection of STEMI-ECGs by cardiologists, we removed those with baseline wander to ensure correct interpretation. However, when extracting NSR-ECGs through the cardiology information system, several NSR-ECGs with baseline wander were included, as a filtering function for baseline wander was not embedded in this system. The baseline wander might have originated from unresolved human factors such as movement, shivering, or poor contact of ECG leads with dry or hairy skin [19]. Considering these results, we anticipate that applying a baseline wander filtering system will improve the diagnostic accuracy of the DLM.
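One simple way such a wander filter could work is to estimate the slow baseline with a moving average and subtract it from the signal. The sketch below is only an assumption about what a filter might look like; no specific filter was applied or evaluated in this study. The sampling rate, the window length, and the synthetic 0.1 Hz drift are hypothetical. High-pass filtering of this kind can itself distort the ST segment, so any real filter would need validation before clinical use.

```python
import math


def moving_average(signal, window):
    """Centered moving average; the window shrinks at the signal edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def remove_baseline_wander(signal, fs, window_seconds=1.0):
    """Subtract a moving-average baseline estimate from one ECG lead."""
    window = max(1, int(window_seconds * fs))
    baseline = moving_average(signal, window)
    return [s - b for s, b in zip(signal, baseline)]


fs = 100  # assumed sampling rate in Hz, kept low so the toy example runs fast
t = [i / fs for i in range(10 * fs)]  # 10 s of samples

# Synthetic slow drift only (0.1 Hz, amplitude 0.5), standing in for wander.
drift = [0.5 * math.sin(2 * math.pi * 0.1 * x) for x in t]
cleaned = remove_baseline_wander(drift, fs)

print(max(abs(v) for v in drift))
print(max(abs(v) for v in cleaned[fs:-fs]))  # interior samples only
```

Away from the edges, the drift is reduced to a small fraction of its original amplitude; the first and last second are less well corrected because the averaging window is truncated there.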
The analysis of false positive cases also showed that the control group included 10 STEMI-ECGs (3 ECGs without baseline wander and 7 ECGs with baseline wander). This result was caused by setting the control group using the autointerpretation of ECG machines. Excluding these 10 STEMI-ECGs, the false positive results comprised 402 NSR-ECGs. Unfortunately, we could not identify the cause of the false positive results because the DLM does not report the reasons for its misinterpretation. Additionally, the ST elevation of ECGs without baseline wander was significantly higher than that of ECGs with baseline wander. This result also demonstrated that baseline wander could significantly interfere with the interpretation of ST elevation by the DLM [20].
The DLM may allow STEMI to be detected quickly and with high accuracy in circumstances where it is difficult to identify STEMI on the ECG, such as the prehospital stage, or where medical professionals at primary medical institutions are unfamiliar with ECG reading. This might help STEMI patients by allowing them to be transferred quickly to a cardiovascular center where revascularization can be performed.
This study has some limitations. First, the sample size of the STEMI group (n = 117) was smaller than that of the NSR group (n = 60,040) during the same recruitment period. Although the model showed 100% accuracy in the autointerpretation of LCX-STEMI, the sample size of LCX-STEMI was smaller than that of LAD-STEMI and RCA-STEMI. Therefore, the accuracy of the autointerpretation of ECG machines or the DLM might differ in a study with a larger STEMI sample. Second, the control group was selected using the autointerpretation of ECG machines, which resulted in 10 STEMI cases being misclassified as controls. Third, this was a retrospective single-center study, which limits its generalizability. Fourth, only one type of ECG machine was used in this study. If the algorithm for the autointerpretation of ECGs varies according to the type or manufacturer of the ECG machine, the accuracy of autointerpretation might change. Fifth, some ECGs with left ventricular hypertrophy (LVH), left bundle branch block (LBBB), and right bundle branch block (RBBB) were not included in the STEMI group. According to the definition of myocardial infarction, conditions such as LVH and LBBB can simulate ST deviation [14]. For STEMI-ECGs with LBBB, the Sgarbossa criteria are available. Although RBBB is not an exclusion criterion in the definition of myocardial infarction, we took into account the difficulty of interpreting ST elevation on precordial leads such as V1-2 in a clinical setting. As a result, during the external validation process, we eliminated the STEMI-ECG with LVH or LBBB. The diagnostic accuracy of the deep learning model will be affected if these ECGs are incorporated. Sixth, we used C statistics to assess the diagnostic accuracy of the DLM. Although C statistics have been routinely utilized to assess the predictive power of models, their correlation to clinical outcomes has been questioned. Novel statistical indices such as "net benefit" have been proposed for measuring the diagnostic performance of tools or the prediction power of models to counteract this flaw [21]. To compute "net benefit", we attempted to establish a "harm to benefit ratio". However, despite a comprehensive search, we were unable to find a publication reporting a "harm to benefit ratio" for myocardial infarction. Further study is recommended before applying a robust performance measure such as "net benefit".
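To make the role of the "harm to benefit ratio" concrete, the sketch below applies the standard decision-curve definition of net benefit, NB = TP/N - (FP/N) x pt/(1 - pt), where pt is the threshold probability and pt/(1 - pt) is the harm to benefit ratio. The counts are derived from the figures reported here (117 STEMI-ECGs with 3 false negatives, 412 false positives, 60,157 ECGs in total). The threshold probability of 5% is a purely hypothetical placeholder, since no published value could be identified, and the function name is also an assumption.

```python
def net_benefit(tp, fp, n, threshold_probability):
    """Net benefit at a given threshold probability (decision-curve analysis)."""
    harm_to_benefit = threshold_probability / (1 - threshold_probability)
    return tp / n - (fp / n) * harm_to_benefit


n_total = 60157
n_stemi = 117
false_negatives = 3
false_positives = 412
true_positives = n_stemi - false_negatives  # 114, derived from the text

pt = 0.05  # hypothetical threshold probability (assumption, not from the study)

nb_model = net_benefit(true_positives, false_positives, n_total, pt)

# Reference strategy: treat every ECG as STEMI (all positives, all negatives FP).
nb_treat_all = net_benefit(n_stemi, n_total - n_stemi, n_total, pt)

print(f"Net benefit of the model: {nb_model:.6f}")
print(f"Net benefit of treat-all: {nb_treat_all:.6f}")
```

The result depends entirely on the chosen threshold probability, which is why an agreed "harm to benefit ratio" for myocardial infarction would be needed before such an analysis could be interpreted.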

Conclusions
The DLM showed high diagnostic accuracy for STEMI detection regardless of the type of culprit artery. Baseline wander on the ECG could contribute to misinterpretation by the DLM.