Early Identification of Resuscitated Patients with a Significant Coronary Disease in Out-of-Hospital Cardiac Arrest Survivors without ST-Segment Elevation

This study aimed to develop a machine learning (ML)-based model for identifying patients with significant coronary artery disease among out-of-hospital cardiac arrest (OHCA) survivors without ST-segment elevation (STE). This multicenter observational study used data from the Korean Hypothermia Network prospective registry (KORHN-PRO) gathered between October 2015 and December 2018. We used information available before targeted temperature management (TTM) as predictor variables, and the primary outcome was a significant coronary artery lesion on coronary angiography (CAG). Among 1373 OHCA patients treated with TTM, 331 patients without STE who underwent CAG were enrolled. Of these, 127 patients (38.4%) had a significant coronary artery lesion. Four ML algorithms, namely regularized logistic regression (RLR), random forest classifier (RF), CatBoost classifier (CBC), and voting classifier (VC), were trained with data collected before CAG. The VC model showed the highest accuracy for predicting significant lesions (area under the curve of 0.751). Eight variables (older age, male sex, initial shockable rhythm, shorter total collapse duration, higher glucose and creatinine, and lower pH and lactate) were significant in the ML models. These results suggest that ML models may be useful for developing early predictive tools to identify high-risk patients with a significant stenosis on CAG.


Introduction
Coronary artery disease is the main cause of out-of-hospital cardiac arrest (OHCA) [1]. Immediate reperfusion of the culprit coronary lesion is proposed to improve outcomes in OHCA survivors [2]. Current international guidelines recommend that coronary angiography (CAG) be performed emergently for all cardiac arrest patients with a suspected cardiac cause of arrest and ST-segment elevation (STE) on the electrocardiogram (ECG) [3]. In OHCA patients without STE, however, early CAG is suggested only for selected patients, and guidelines do not specify the characteristics of patients who may benefit from immediate CAG [4]. Thus, the main challenge is to identify the best candidates for CAG among resuscitated cardiac arrest patients without STE. Previous research has focused on selecting CAG candidates who will recover with good neurologic outcomes. The benefit of CAG, however, is associated primarily with providing percutaneous coronary intervention (PCI), and therefore identifying patients without STE who have a significant coronary occlusion is crucial, since OHCA survivors without STE do not always have obstructive coronary artery disease.


Study Design and Setting
This multicenter prospective observational study used data from the Korean Hypothermia Network prospective registry (KORHN-PRO) 1.0 gathered between October 2015 and December 2018. KORHN is a multicenter clinical research consortium for targeted temperature management (TTM) in South Korea. Twenty-two academic hospitals participated in KORHN-PRO. The study included an informed consent form approved by all participating hospitals, and the study was registered at the International Clinical Trials Registry Platform (NCT02827422). Written informed consent was obtained from all patients' legal surrogates.

Population and Variables
The inclusion criteria of KORHN-PRO were as follows: OHCA regardless of etiology, age older than 18 years, unconsciousness (Glasgow Coma Scale score < 8) after return of spontaneous circulation (ROSC), and treatment with TTM. The exclusion criteria were as follows: active intracranial bleeding, acute stroke, known limitations in therapy or a do-not-attempt-resuscitation order, known prearrest cerebral performance category (CPC) of 3 or 4, known disease making 6-month survival unlikely, and body temperature < 30 °C on admission. Data were entered into a web-based electronic database registry using a standardized registry form. Each participating hospital had a designated research coordinator responsible for ensuring data accuracy.
For the present study, we excluded from KORHN-PRO the patients with an obvious non-cardiac cause of arrest and those with STE on the initial ECG. This study used 37 independent variables and one dependent variable, described below, from the registry: age, sex, comorbidities (previous arrest, previous acute myocardial infarction, previous angina, previous arrhythmia, chronic heart failure, previous transient ischemic attack (TIA) or stroke, hypertension, diabetes mellitus, pulmonary disease, neurologic disease other than cerebrovascular accident (CVA), chronic kidney disease, malignancy, previous PCI, previous coronary artery bypass grafting, and previous ischemic heart disease), family history (cardiac arrest, stroke, angina and myocardial infarction, arrhythmia, unknown), resuscitation variables (initial shockable rhythm, witnessed arrest, bystander cardiopulmonary resuscitation, time from collapse to ROSC, total epinephrine dose), post-ROSC variables (immediate ECG findings and laboratory findings), and extracorporeal life support. OHCA survivors were defined as OHCA patients who had a return of spontaneous circulation and were admitted to the hospital for post-cardiac arrest care including TTM. The outcome variable of this work was whether OHCA survivors without STE had a significant coronary artery lesion. Significant coronary artery disease was defined on invasive CAG as >50% stenosis of the left main stem or >70% stenosis in a major coronary vessel.
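The angiographic outcome definition above can be expressed as a small labeling function. This is an illustrative sketch: the vessel codes (LM, LAD, LCX, RCA) and the input format are assumptions for the example, not the registry's actual schema.

```python
def has_significant_lesion(stenoses):
    """Apply the study's angiographic definition of significant
    coronary artery disease: >50% stenosis of the left main stem
    or >70% stenosis in any major coronary vessel.

    `stenoses` maps (hypothetical) vessel codes to the maximal
    stenosis in percent, e.g. {"LM": 40, "LAD": 80, "LCX": 30}.
    """
    for vessel, pct in stenoses.items():
        if vessel == "LM" and pct > 50:
            return True        # left main: >50% counts
        if vessel != "LM" and pct > 70:
            return True        # other major vessels: >70% counts
    return False

# Boundary cases: exactly 50%/70% do not qualify under a strict ">".
print(has_significant_lesion({"LM": 60, "LAD": 20}))        # significant
print(has_significant_lesion({"LM": 50, "LAD": 70}))        # not significant
```

Note the strict inequalities: a 50% left main or 70% vessel stenosis falls below the threshold as the definition is worded.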

ML Algorithms
Through automated ML procedures, four ML algorithms (regularized logistic regression (RLR) [8], random forest (RF) [9], CatBoost classifier (CBC) [10], and a voting classifier (VC) [11] that combined the other three models with 1:1:1 weights) were selected to predict patients needing CAG among OHCA survivors without STE. In the curated data, 29 of the 37 independent variables are categorical, and the dependent variable is binary. Linearity between the independent variables and the dependent variable cannot be assumed.
Logistic regression (LR) is a statistical model that uses the logistic function as a link function to explain the relationship between independent and dependent variables. In the field of ML, a penalty term scaled by an arbitrary constant is added to the cost function of LR, and the resulting model is used to classify unseen data. When the model is fit to training data, this penalty helps balance bias against variance by preventing overfitting. LR with this constraint is called RLR, and the coefficients of the explanatory variables in the optimized model represent the feature importance of each variable.
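A minimal sketch of RLR with scikit-learn (one of the packages the study lists), using synthetic data as a stand-in for the registry; the feature counts and settings here are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the registry: 331 patients, 10 predictors,
# binary outcome (significant lesion or not).
X, y = make_classification(n_samples=331, n_features=10,
                           n_informative=5, random_state=0)

# L2-regularized logistic regression; the penalty strength (1/C) is
# the constant added to the cost function that trades bias against
# variance and limits overfitting.
rlr = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
rlr.fit(X, y)

# The magnitudes of the fitted coefficients serve as per-feature
# importance scores, as described in the text.
importance = np.abs(rlr.coef_[0])
print(sorted(enumerate(importance), key=lambda t: -t[1])[:3])
```

Smaller values of `C` impose stronger regularization; in practice `C` is tuned by cross-validation, as the study did via random search.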
RF is known as an algorithm that can quickly build an optimal model for prediction or classification from tabular data in which continuous and categorical variables are mixed. In particular, it prevents overfitting by sampling part of the data and generating a number of mutually independent decision trees, each using only a subset of variables that greatly reduce the cost function; it then predicts the outcome through majority voting and evaluates the importance of the independent features. These are the reasons we adopted RF in this study.
In our data, most independent variables are categorical, and most of these are binary, so it is essential for the algorithms to handle binary categorical variables well. RF may have difficulty generating optimized models when many binary categorical variables are included, because it builds deep tree structures from variables that each have only one splitting point. As its name suggests, CatBoost can handle categorical variables effectively by sequentially encoding them or by combining highly correlated categorical variables. In addition, because CatBoost uses a boosting algorithm, it can generate an optimized decision tree model in a way different from that of RF.
Each of the three algorithms, with its different strengths and weaknesses, produces its own optimized model. Finally, a voting algorithm combines the prediction probabilities of these models through one of several voting methods to form an ensemble model.
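The 1:1:1 soft-voting ensemble can be sketched with scikit-learn's `VotingClassifier`. To keep the example dependency-free, `GradientBoostingClassifier` stands in for CatBoost here, and the data are synthetic; this is a sketch of the technique, not the study's exact pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=331, n_features=10, random_state=0)

# Three base learners; GradientBoostingClassifier is a stand-in for
# the CatBoost classifier used in the study.
rlr = LogisticRegression(penalty="l2", max_iter=1000)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
gbc = GradientBoostingClassifier(random_state=0)

# Soft voting with equal (1:1:1) weights averages the predicted
# probabilities of the three models, as in the paper's VC model.
vc = VotingClassifier([("rlr", rlr), ("rf", rf), ("gbc", gbc)],
                      voting="soft", weights=[1, 1, 1])
scores = cross_val_score(vc, X, y, cv=5, scoring="roc_auc")
print(f"5-fold AUROC: {scores.mean():.3f}")
```

With `voting="soft"`, the ensemble averages `predict_proba` outputs rather than taking a majority of hard labels, which is the arithmetic-mean scheme the Discussion describes.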
Multiclass classification by splitting the minority outcome group into two subgroups was attempted to enhance model performance. Additionally, analyses were carried out with feature selection by permutation [12] and with dimensionality reduction by principal component analysis [13] and uniform manifold approximation and projection [14]. However, none of these trials improved model performance. A stratified five-fold cross-validation method was applied to avoid generating overfitted models and to evaluate realistic model performance for a clinical environment. A random search method [15] was used to optimize hyperparameters so that the Cohen's kappa of the RLR, RF, and CBC models reached the highest values. For model evaluation, we used the predicted ratio of patients with significant lesions among OHCA patients without STE, the area under the receiver operating characteristic curve (AUROC), log loss, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1-score (the harmonic mean of sensitivity and PPV), Cohen's kappa, and net reclassification improvement (NRI). Model performance was evaluated as a function of the number of samples used for training and of the number of variables included, in descending order of relative feature importance (RFI). Lastly, the VC model's explainability in terms of global feature importance was presented using Shapley values. Python [16] and its extension packages, such as NumPy 1.20.3 [17], scikit-learn 0.24.2 [18], pandas 1.2.0 [19], SciPy 1.6.1 [20], matplotlib 3.3.2 [21], CatBoost 0.24.4 [10], shap 0.39.0 [22], TPOT 0.11.7 [23], and PyCaret 2.3.1 [24], were used for the ML analyses.
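A minimal sketch of the stratified five-fold cross-validation combined with a kappa-targeted random search, shown for a single RF model on synthetic data in place of the registry; the hyperparameter ranges are illustrative assumptions.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

X, y = make_classification(n_samples=331, n_features=10, random_state=0)

# Stratified folds preserve the outcome prevalence (38.4% in the
# study) in every fold; random search draws hyperparameter settings
# and keeps the one with the highest mean Cohen's kappa.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(50, 300),
                         "max_depth": randint(2, 10)},
    n_iter=10,
    cv=cv,
    scoring=make_scorer(cohen_kappa_score),
    random_state=0)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`make_scorer(cohen_kappa_score)` turns the kappa metric into a cross-validation objective, mirroring the study's choice of Cohen's kappa as the optimization target.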

Statistical Methods
Continuous variables are presented as means with standard deviations or as medians with interquartile ranges (IQRs). Categorical variables are presented as frequencies and percentages. For patient characteristics and comparisons between groups (significant lesion vs. no significant lesion), we used Student's t-test for normally distributed continuous variables and the Mann-Whitney U test for nonparametric cases. Fisher's exact test was used for low-frequency categorical variables, and the chi-square test was used otherwise. p-values ≤ 0.05 indicated statistical significance.
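The group comparisons above can be sketched with SciPy. The group sizes mirror the study's 127 vs. 204 split, but the values and the contingency table are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical creatinine values in the lesion vs. no-lesion groups.
lesion = rng.normal(1.4, 0.5, 127)
no_lesion = rng.normal(1.1, 0.4, 204)

# Normally distributed continuous variable: Student's t-test.
t, p_t = stats.ttest_ind(lesion, no_lesion)
# Nonparametric alternative: Mann-Whitney U test.
u, p_u = stats.mannwhitneyu(lesion, no_lesion)

# Categorical variable: chi-square test on a 2x2 contingency table,
# or Fisher's exact test when expected cell counts are low.
table = np.array([[40, 87], [30, 174]])
chi2, p_chi, dof, _ = stats.chi2_contingency(table)
odds, p_fisher = stats.fisher_exact(table)
print(p_t < 0.05, p_u < 0.05, p_chi < 0.05, p_fisher < 0.05)
```

In practice, normality (e.g. by Shapiro-Wilk) and expected cell counts decide which test applies to each variable, as described above.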
In our dataset, nine variables had missing values, and Little's missing completely at random test [25] was run on the dataset to confirm that all variables with missing values were missing completely at random. In the imputation process, to increase statistical reliability, the predictive mean matching algorithm of the multivariate imputation by chained equations (MICE) package was used to produce 30 different datasets [26]. Combining the 30 imputed datasets with 5-fold cross-validation allowed stable mean values and 95% confidence intervals (CIs) of the evaluation metrics to be computed. Statistical analysis was performed using R version 4.1.0 beta [27] and its packages, such as MICE 3.13.0 [26].
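The study ran imputation with R's MICE package. As a rough Python analogue, scikit-learn's `IterativeImputer` implements the same chained-equations idea (using Bayesian ridge regression by default rather than predictive mean matching); varying the random seed with `sample_posterior=True` yields multiply imputed datasets in the spirit of the 30 MICE datasets. The data below are synthetic.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(331, 5))
X[:, 1] += 0.8 * X[:, 0]          # correlated columns aid imputation
mask = rng.random(X.shape) < 0.1  # ~10% missing completely at random
X_missing = X.copy()
X_missing[mask] = np.nan

# Chained equations: each variable with missing values is regressed
# on the others in turn until convergence. Three seeds -> three
# imputed datasets (the study used 30).
imputed = [IterativeImputer(sample_posterior=True, random_state=s)
           .fit_transform(X_missing) for s in range(3)]
print(len(imputed), np.isnan(imputed[0]).any())
```

Downstream metrics are then computed on each imputed dataset and pooled, which is how the study derived stable means and 95% CIs.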

Baseline Statistics
During the study period, a total of 1373 OHCA patients (>18 years) treated with TTM were enrolled in KORHN-PRO (Figure S1). Among them, 521 patients with an obvious non-cardiac cause of arrest, 11 patients lacking ECG data, four patients with no outcome information, and 163 patients with STE were excluded. A further 343 patients were excluded because CAG was not performed within 2 weeks. Thus, 331 patients without STE who underwent CAG were ultimately included in this study.
Among them, 127 patients (38.4%) had a significant coronary artery lesion. Table 1 presents the baseline characteristics of patients classified according to the presence or absence of a significant coronary lesion. Several variables (age, male sex, previous acute myocardial infarction, diabetes mellitus, chronic kidney disease, ST-segment depression, glucose, creatinine, and extracorporeal life support) differed statistically between the two groups. Values are presented as mean ± standard deviation, median (IQR, interquartile range), or number (percentage), as appropriate.

Model Performance and Validation
The four ML models were trained to identify patients with significant coronary artery disease among OHCA survivors without STE. The AUROC (95% CI) values of the trained RLR, RF, CBC, and VC models were 0.734 (0.722-0.746), 0.737 (0.729-0.745), 0.736 (0.726-0.746), and 0.751 (0.740-0.762), respectively (Figure 1). The AUROC of the VC model was statistically better than those of the RLR, RF, and CBC models at a significance level of 0.05. Figure S2 shows the sensitivity, specificity, PPV, and NPV of the four models measured as the classification threshold was incremented from 0 to 1.
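The threshold sweep behind Figure S2-style curves can be sketched as follows: each candidate threshold turns predicted probabilities into labels, from which the four metrics follow. The data here are synthetic, not the registry.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=331, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
proba = (LogisticRegression(max_iter=1000)
         .fit(X_tr, y_tr).predict_proba(X_te)[:, 1])

rows = []
for thr in np.linspace(0, 1, 11):
    pred = proba >= thr
    tp = np.sum(pred & (y_te == 1)); fp = np.sum(pred & (y_te == 0))
    fn = np.sum(~pred & (y_te == 1)); tn = np.sum(~pred & (y_te == 0))
    sens = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
    spec = tn / (tn + fp) if tn + fp else 0.0   # specificity
    ppv = tp / (tp + fp) if tp + fp else 0.0    # positive predictive value
    npv = tn / (tn + fn) if tn + fn else 0.0    # negative predictive value
    rows.append((thr, sens, spec, ppv, npv))
print(rows[5])  # metrics at the 0.5 threshold
```

At threshold 0 every patient is flagged (sensitivity 1, specificity 0); raising the threshold trades sensitivity for specificity, producing the crossing curves seen in such plots.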
The mean and 95% CI of the loss and metric values were estimated with the 150 models generated by combining the 30 imputed datasets with the five-fold cross-validation procedure, and the results are shown in Table 2. All four models performed better than a random classifier in terms of the absolute value of log loss, considering the ratio of the dependent variable labels (38.4%) [28]. The VC model was superior to the other three models on all metrics from AUROC to NRI; however, the lower bound of the 95% CI of the NRI for the VC model was greater than zero only in the comparison with the RLR model [29].

The learning curves for Cohen's kappa, AUROC, and F1-score of the VC, CBC, RF, and RLR models as a function of the number of samples are shown in Figure 2. As the number of samples increases, the scores of each model evaluated on the training set gradually decrease and then flatten, while those evaluated on the testing set increase and then reach saturation. Figure 2 implies that the performance of all four models could be enhanced if more samples were collected.

The coefficients of feature importance were estimated by (a) the mean of the coefficients of the RLR models, (b) the Gini impurity-based method of the RF model, and (c) the prediction-value-change algorithm of the CBC models (Figure 3). Note that the values on the x-axis in Figure 3 are meaningful only as relative importance between features within a model. The order of the top three variables by feature importance (age, creatinine, and glucose) was consistent across the three models. In addition, the predictor variables shown to be statistically significant in Table 1 tend to have high feature importance coefficients.

Figure 3. Relative feature importance (RFI) of the RLR (a), RF (b), and CBC (c) models. The box extends from the first quartile (Q1) to the third quartile (Q3), spanning the IQR. The black and green bars in the box are the median and mean of the RFI. The left whisker end is Q1 − 1.5·IQR, and the right whisker end is Q3 + 1.5·IQR. Hollow circles beyond the whisker ends are outliers of the RFI coefficients. The feature importance coefficients of the RF and CBC models sum to 1 and 100, respectively.

Figure 4 shows the results of evaluating model performance with three metrics (Cohen's kappa, AUROC, and F1-score) according to the number of variables included in the model. The three metric scores were computed by adding predictor variables in descending order of feature importance coefficient. The number of variables and imputed datasets included in this experiment was limited by the available computing resources and the time-consuming training procedure. On all three metrics, the model built with only the top nine variables by feature importance appeared noninferior to the model with all 37 predictor variables.
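The learning-curve analysis of Figure 2 can be sketched with scikit-learn's `learning_curve` utility, here on synthetic data with a single RF model standing in for the four study models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=331, n_features=10, random_state=0)

# Train/test scores as a function of training-set size: the training
# score falls and flattens while the cross-validated score rises; a
# still-rising test curve suggests more data would help.
sizes, train_scores, test_scores = learning_curve(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=5, scoring="roc_auc",
    train_sizes=np.linspace(0.2, 1.0, 5),
    shuffle=True, random_state=0)
print(sizes)
print(train_scores.mean(axis=1).round(3))
print(test_scores.mean(axis=1).round(3))
```

Each row of `train_scores`/`test_scores` holds the five cross-validation folds for one training-set size, so the mean over `axis=1` gives the plotted curve.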
Figure 4. Changes in the F1-score of the (g) CBC, (h) RF, and (i) RLR models. The mean values and standard deviations of the three metric scores are represented by filled circles and bars with caps, respectively. The three metrics were calculated for 1, 2, 3, 4, 5, 7, 9, 12, 15, 20, 25, 30, and 37 features owing to the time-consuming model training procedure and limited computing resources.
The global feature importance of the VC model, the best-performing model in this study, was calculated using Shapley values to reliably select the variables important for identifying patients with a significant coronary artery lesion. In Figure S3, features higher on the vertical axis have a greater influence on the predicted outcome. When a patient's variable value lies on the right side of the horizontal axis, the observed value contributes to increasing the predicted probability that the patient has a significant coronary artery lesion. More specifically, older age, male sex, initial shockable rhythm, shorter total collapse duration, higher glucose and creatinine, and lower pH and lactate predicted a higher probability of a positive outcome.
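As a dependency-free illustration of the Shapley attribution used for Figure S3 (the study itself used the shap package), the exact Shapley values of a toy model can be computed directly from the defining formula. The feature names and contribution numbers below are made up.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values for a feature set and a coalition value
    function `value(frozenset) -> float` (exponential in len(features),
    so only feasible for toy examples)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for s in combinations(others, k):
                s = frozenset(s)
                # Weight of a coalition of size k in the Shapley formula.
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

# Toy additive "model": each present feature adds a fixed (made-up)
# amount to the predicted probability of a significant lesion.
contrib = {"age": 0.20, "creatinine": 0.15, "glucose": 0.05}
value = lambda s: sum(contrib[f] for f in s)
phi = shapley_values(list(contrib), value)
print(phi)
```

For an additive game like this, each feature's Shapley value equals its own contribution, and the values sum to the full-coalition payoff (the efficiency property); real models are non-additive, which is why tools like shap are needed.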

Discussion
In this study, we developed models based on ML algorithms to identify patients with a significant coronary artery lesion requiring early PCI using only basic patient information available before admission. We found that the VC model showed the highest accuracy for predicting a significant lesion (AUROC of 0.751). We also found that eight easily captured variables (older age, male sex, initial shockable rhythm, shorter total collapse duration, higher glucose and creatinine, and lower pH and lactate) were significant.
Coronary artery disease is particularly prevalent in cardiac arrest patients with a shockable rhythm: in up to 96% of patients with STE on the post-resuscitation ECG and up to 42% of patients without STE [30,31]. Our study showed that 38.4% of patients had a significant coronary artery lesion, in agreement with recent studies that found coronary occlusions in approximately 30% of patients without STE [32][33][34].
Considering that the benefit of CAG is associated primarily with providing PCI, identifying patients who have a significant coronary occlusion is crucial. Waldo et al. developed a risk prediction model for the presence of an acute coronary lesion among patients resuscitated from an arrest [35]. They found four variables—angina, congestive heart failure, shockable rhythm, and STE—of which STE carried the highest number of points. However, few studies of OHCA survivors without STE have gone beyond selecting CAG candidates based on neurologic outcome [36][37][38][39]. Although selecting OHCA survivors without STE who will have good neurologic outcomes is important, the next step is to identify which OHCA survivors have a significant coronary occlusion. Comparing immediate and delayed CAG in cardiac arrest patients without STE, recent randomized controlled trials, such as Coronary Angiography after Cardiac Arrest (COACT) and Angiography after Out-of-Hospital Cardiac Arrest without ST-Segment Elevation (TOMAHAWK), and a systematic review failed to show the superiority of immediate over delayed CAG in improving the survival of resuscitated OHCA patients without STE [40,41]. Thus, the aim of our study was to develop an ML-based model for the early (within 24 h) identification of patients with a significant coronary artery lesion.
For now, except for patients with STE or cardiogenic shock, the specific subgroups that would benefit from early CAG remain unknown [2,3]. A study from the United States using the International Cardiac Arrest Registry developed the CREST score to predict circulatory-etiology death using data obtained at the time of intensive care unit admission in OHCA patients without STE. A history of coronary artery disease, nonshockable rhythm, initial ejection fraction <30%, shock at presentation, and total ischemic time >25 min accurately and pragmatically predicted the risk of circulatory-etiology death [42]. The American College of Cardiology Interventional Council suggested triaging OHCA survivors using 10 unfavorable resuscitation features, determined by consensus: unwitnessed arrest, initial rhythm other than ventricular fibrillation, no bystander cardiopulmonary resuscitation, >30 min from collapse to ROSC (time-to-ROSC), ongoing cardiopulmonary resuscitation, pH < 7.2, lactate > 7 mmol/L, age > 85 years, end-stage renal disease, and non-cardiac etiology [43]. Moreover, a very recent study by Harhash et al. aimed to identify resuscitated cardiac arrest patients with unfavorable clinical features for whom invasive procedures are unlikely to improve survival [36]. They found that the most powerful predictors of adverse outcome were age > 85 years, >30 min before return of spontaneous circulation, and initial non-shockable rhythm. Although these variables predicted poor outcome, some of them, such as age, duration of cardiopulmonary resuscitation, shockable rhythm, and blood gas pH, coincide with the factors our ML models identified for predicting a significant coronary lesion. Noh et al. investigated a prediction ML algorithm to select the high-risk group of patients with acute coronary syndrome (ACS) requiring revascularization using data from 2344 patients [44].
The obtained prediction functions were relevant, with an AUROC of 0.860 for predicting ACS requiring revascularization. They found that the main obstacles to applying ML techniques were missing data and uncertain labeling, which reduced the amount of usable data.
Data accumulated in a stressful emergency room are affected by several biases, including typos and discontinuous laboratory measurements. These have a great impact on ML studies that use data made up of already curated tabular variables. Nevertheless, it was possible to create a better-performing VC model by synthesizing the effective RLR, RF, and CBC models, whose different algorithms have complementary advantages. The soft voting method, which uses the arithmetic mean of the prediction probabilities from the RLR, RF, and CBC models, worked well in this study.
The aim of this research was to determine which variables strongly hint at the presence of a significant coronary artery lesion. Feature importance determined using Shapley values, which consider all possible combinations of variables in the VC model, can be regarded as the most reliable result. This provides guidance to emergency room practitioners on the need to carefully monitor patients with the following features: older age, male sex, initial shockable rhythm, shorter total collapse duration, higher glucose and creatinine, and lower pH and lactate. This result can also be used to prioritize which variables to collect with limited emergency room resources.
As shown in Figure 4, the eight to eleven independent variables with the highest feature importance in the RLR, RF, and CBC models yielded performance comparable to that of the optimized models with all 37 predictor variables. As mentioned earlier, this result suggests that better clinical outcomes might be achieved if currently limited resources were focused on compiling the important variables. Figures 1-4 depict large standard deviations for almost all metrics, which seem to be due to the insufficient number of samples and the difficulty of selecting representative values for time-series variables, rather than to the set of variables included [45]. The AUROC values of models trained on the complete dataset excluding the nine variables with missing values, to enable a rapid response in the current emergency room environment, are presented in Figure S4; they were inferior to the models using all 37 variables. We also observed that the missing value issue, which Noh et al. noted in 2019, remains a barrier that makes it difficult to employ ML algorithms in clinical practice [44].
This study has several limitations. First, there is an important concern about the selected population. KORHN-PRO had no firm indication for CAG other than STE or cardiogenic shock. The selection of patients who underwent CAG was made by the treating physicians, implying an already high suspicion of ACS in these patients and creating selection bias. Other issues include the need for a computer implementation of the ML model and the limited power to detect other clinical characteristics that may be associated with significant coronary artery disease, given the modest sample size. Moreover, for appropriately selecting patients who would benefit from CAG, a thrombotic occlusion or culprit lesion would be a more useful target than significant coronary artery disease. Lastly, potentially valuable variables, such as echocardiography results, were not included in our registry. Therefore, our VC model is not yet complete enough to be used in the clinical field. Further, larger studies with additional variables will be needed.

Conclusions
ML models may be helpful as early predictive tools for identifying high-risk patients with a significant stenosis on CAG. However, further investigations with large datasets including other variables are warranted to improve the prediction model, which could support clinicians' decisions on early CAG in OHCA survivors without STE.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/jcm10235688/s1: Figure S1: Study Patient Flow Diagram. ECG, electrocardiogram; OHCA, out-of-hospital cardiac arrest; STE, ST-segment elevation; TTM, targeted temperature management. Figure S2: Four metrics (sensitivity (blue), specificity (orange), PPV (green), NPV (red)) for RLR (a), RF (b), CBC (c), and VC (d) models in response to thresholds, ranging from 0 to 1; PPV, positive predictive value; NPV, negative predictive value; RLR, regularized logistic regression; RF, random forest; CBC, CatBoost classifier; VC, voting classifier. Figure S3: Feature importance of the voting classifier model estimated by Shapley values. The x-axis shows the voting classifier model's SHAP values, which are the contributions of each variable value to predicting the model outcome as label 1. When it is positive, the predicted probability of a significant lesion increases, and vice versa. The variables higher on the y-axis have more influence on the predicted outcome in the model. The colored dots are sample data from the training set, and the red and blue dots indicate, respectively, high and low values in the range of each variable. ABGA, arterial blood gas analysis; HbA1c, glycated hemoglobin; ROSC, return of spontaneous circulation. Figure S4: ROC and AUROC of the four models (RLR (blue), RF (orange), CBC (green), VC (red)) trained on the complete set excluding the nine variables with missing values. AUROC, area under the receiver operating characteristic curve; RLR, regularized logistic regression; RF, random forest; CBC, CatBoost classifier; VC, voting classifier. Table S1: The definition and collection form of the variables in the Korean Hypothermia Network prospective registry (KORHN-PRO).