Integrating Coronary Plaque Information from CCTA by ML Predicts MACE in Patients with Suspected CAD

Conventional prognostic risk analysis in patients undergoing noninvasive imaging is based upon a limited selection of clinical and imaging findings, whereas machine learning (ML) algorithms include a greater number and complexity of variables. Therefore, this paper aimed to explore the predictive value of integrating coronary plaque information from coronary computed tomographic angiography (CCTA) with ML to predict major adverse cardiovascular events (MACEs) in patients with suspected coronary artery disease (CAD). Patients who underwent CCTA due to suspected coronary artery disease with a 30-month follow-up for MACEs were included. We collected demographic characteristics, cardiovascular risk factors, and information on coronary plaques by analyzing CCTA information (plaque length, plaque composition and coronary artery stenosis of 18 coronary artery segments, coronary dominance, myocardial bridge (MB), and patients with vulnerable plaque) and follow-up information (cardiac death, nonfatal myocardial infarction and unstable angina requiring hospitalization). An ML algorithm was used for survival analysis (CoxBoost). This analysis showed that chest symptoms, the stenosis severity of the proximal anterior descending branch, and the stenosis severity of the middle right coronary artery were among the top three variables in the ML model. After the 22nd month of follow-up, in the testing dataset, ML showed the largest C-index and AUC compared with Cox regression, SIS, SIS score + clinical factors, and clinical factors. The DCA of all the models showed that the net benefit of the ML model was the highest when the treatment threshold probability was between 1% and 9%. Integrating coronary plaque information from CCTA based on ML technology provides a feasible and superior method to assess prognosis in patients with suspected coronary artery disease over an approximately three-year period.


Introduction
Coronary computed tomography angiography (CCTA) is increasingly accepted as a first-line noninvasive imaging examination that has shown high accuracy for diagnosing and excluding coronary artery disease (CAD) [1,2]. Furthermore, CCTA examination was used to evaluate various stages of atherosclerosis ranging from plaque formation (length, composition, and morphology) to plaque progression, aiding in risk stratification for future major adverse cardiovascular events (MACE) and medical decision-making for patients with CAD [3][4][5][6][7].
Conventional CCTA risk scores were used to stratify patients with CAD mainly based on the presence, length, composition, and luminal stenosis of coronary plaque in a 16-segment model [8][9][10]. This plaque information was integrated into a single score, assuming a linear relationship between the atherosclerosis extent and outcomes [8,11,12]. Machine learning (ML) is a field of computer science that uses advanced algorithms including a great number of variables to optimize prediction, and this methodology has the potential to maximize the utilization of the coronary plaque information derived from CCTA without prior assumptions for independent variables. Previous studies have demonstrated that ML showed improved predictive values for death, myocardial ischemia, and myocardial infarction compared with conventional risk scores [13][14][15]. The aim of the present study was to explore whether ML based on survival data with a time-dependent outcome integrating plaque information from CCTA exhibits better predictive values for MACEs over an approximately three-year follow-up period than the conventional CCTA risk score in patients with suspected coronary artery disease.

Study Population
This is a single-center prospective observational study that was approved by the institutional review board of PLA General Hospital. All patients provided written informed consent. A total of 5526 patients with suspected coronary artery disease who consecutively underwent CCTA at the Department of Cardiology of PLA General Hospital from January 2015 to December 2016 were initially included. The inclusion criteria were complete CCTA and clinical data. The exclusion criteria were prior known CAD (defined as prior myocardial infarction or revascularization), early revascularization after CCTA (defined as within 3 months), incomplete CCTA, motion artifacts, poor-quality images, or severe coronary artery calcification that precluded interpretation (Figure 1). After exclusions, 4017 patients were included.

Figure 1.
A flowchart about the framework of this study. The data were randomly divided into a training dataset and a testing dataset at a ratio of 7:3. The training dataset was used to build the prediction model, whereas the testing dataset was independently used to verify the effectiveness of the prediction model generated by the training dataset by computing C-index, AUC, Brier score and DCA.

Clinical Data
Demographic characteristics (age, male sex, and body mass index [BMI]) and conventional cardiovascular risk factors (dyslipidemia, hypertension, diabetes, current smoking, and family history of CAD) were collected by checking the medical record system. Hypertension was defined as a history of blood pressure >140 mmHg or treatment with antihypertensive medications. Diabetes mellitus was defined by a diagnosis made previously and/or use of insulin or oral hypoglycemic agents. Smoking was defined as current smoking or cessation of smoking within the last 3 months. A family history of premature CAD was defined as MI in a first-degree relative <55 years (male) or <65 years (female). Dyslipidemia was defined as known but untreated dyslipidemia or current treatment with lipid-lowering medications.

Image Acquisition and Analysis
A second-generation dual-source CT scanner (Siemens CT SOMATOM Definition Flash, SIEMENS AG, Munich, Germany) was used for the CCTA scanning. The acquisition protocols were performed in accordance with the Society of Cardiovascular Computed Tomography guidelines [16]. A detailed methodology has been previously published [17].
All images were analyzed by three radiologists or cardiologists using the 16-segment coronary artery tree model for the segment involvement score (SIS score) and the 18-segment coronary artery tree model for ML [10,16]. Plaque was defined as a tissue structure >1 mm² within or adjacent to the coronary artery lumen that could be distinguished from surrounding pericardial tissue, epicardial fat, or the vessel lumen [8]. The presence of plaque was evaluated with the corresponding stenosis severity in each segment. The coronary plaques in each segment were classified as noncalcified, mixed, or calcified plaques. The corresponding stenosis severity of the plaques was classified as 0%, 1-24%, 25-49%, 50-69%, 70-99%, or 100%. Lengths of coronary plaque were classified as 0 mm, <10 mm, 10-20 mm, or >20 mm. Coronary dominance was divided into left dominant, right dominant, and balanced types. Myocardial bridge was defined as a coronary artery segment that was surrounded by myocardium and led to systolic compression of a part of the myocardium covering the epicardial vessels [18]. Plaques with two or more characteristics (positive remodeling, spotty calcification, low attenuation plaque, and napkin-ring sign) at the same time were defined as vulnerable plaques [19]. Positive remodeling was assessed as the cross-sectional area at the site of maximal stenosis divided by an average of the proximal and distal reference segment cross-sectional areas [20]. Spotty calcification was defined by calcium deposits (>130 HU) that were <3 mm within an atheroma [21]. A low attenuation plaque was defined as a plaque with an average attenuation <30 HU and a necrotic core >1 mm² [19]. The napkin-ring sign was defined as a ring of attenuation of <130 HU that formed an arc of higher attenuation around a low attenuating plaque [22].
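The plaque definitions above can be illustrated with a minimal Python sketch. This is not the authors' code (the study was implemented in R); the class and function names are hypothetical, and the stenosis input is assumed to be an integer percentage.

```python
from dataclasses import dataclass


@dataclass
class PlaqueFeatures:
    """High-risk CCTA features of one plaque (hypothetical container)."""
    positive_remodeling: bool
    spotty_calcification: bool
    low_attenuation: bool
    napkin_ring_sign: bool


def is_vulnerable(plaque: PlaqueFeatures) -> bool:
    """A plaque is vulnerable when two or more high-risk features coexist."""
    n_features = sum([
        plaque.positive_remodeling,
        plaque.spotty_calcification,
        plaque.low_attenuation,
        plaque.napkin_ring_sign,
    ])
    return n_features >= 2


def remodeling_index(lesion_area: float, proximal_ref_area: float,
                     distal_ref_area: float) -> float:
    """Lesion cross-sectional area divided by the mean reference area."""
    mean_reference = (proximal_ref_area + distal_ref_area) / 2.0
    return lesion_area / mean_reference


def stenosis_category(percent: int) -> str:
    """Map an integer percent stenosis to the six categories in the text."""
    if not 0 <= percent <= 100:
        raise ValueError("stenosis must be between 0 and 100")
    if percent == 0:
        return "0%"
    if percent < 25:
        return "1-24%"
    if percent < 50:
        return "25-49%"
    if percent < 70:
        return "50-69%"
    if percent < 100:
        return "70-99%"
    return "100%"
```

For example, a plaque with positive remodeling and spotty calcification only would be flagged as vulnerable, whereas a plaque with a single feature would not.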

Outcome
The survival status of the patient was obtained by reviewing the electronic medical record system or patient interviews at least 90 days after CCTA examination from 1 January 2015 to 31 August 2020. MACEs, including nonfatal myocardial infarction, unstable angina requiring hospitalization, and cardiac death, were recorded as the outcome of the present study. Two physicians judged each event independently. In the case of divergence, a third physician was consulted.

Machine Learning Algorithm with Survival Times
Fifty-seven CCTA variables (including plaque length, plaque composition and stenosis severity of 18 coronary artery segments, coronary artery dominance, myocardial bridge, and vulnerable plaque) and nine clinical factors (including male, age, BMI, diabetes, hypertension, dyslipidemia, family history of CAD, current smoking, and chest symptoms) were available (Table 1). Machine learning involved automated feature selection, model building, and 10-fold stratified cross-validation for the entire process [23,24]. Machine learning techniques were implemented using R version 4.0.2. First, the data were randomly divided into a training dataset and a testing dataset at a 7:3 ratio. The training dataset was used to build the prediction model, and the testing dataset was independently used to verify the effectiveness of the prediction model generated by the training dataset.
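The data partitioning described above can be sketched as follows. This is an illustrative Python sketch using only the standard library, not the R code used in the study; the function names, the seed, and the exact stratification rule (balancing event and non-event cases across folds) are assumptions.

```python
import random


def split_train_test(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient identifiers into training and testing sets."""
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


def stratified_folds(events, k=10, seed=0):
    """Assign each case to one of k folds, balancing events across folds.

    `events` holds 1 for a patient with a MACE and 0 otherwise.
    Returns a list of fold indices (0..k-1), one per patient.
    """
    rng = random.Random(seed)
    folds = [0] * len(events)
    for label in (0, 1):
        indices = [i for i, e in enumerate(events) if e == label]
        rng.shuffle(indices)
        for position, i in enumerate(indices):
            folds[i] = position % k
    return folds
```

With 4017 patients and a 7:3 ratio, this sketch yields 2812 training and 1205 testing cases; the actual counts in the study depend on the authors' implementation.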
Second, automated feature selection for fifty-seven CCTA variables and nine clinical factors was performed in the training dataset using least absolute shrinkage and selection operator regression for Cox regression (LASSO-COX), which minimizes the log partial likelihood subject to the sum of the absolute values of the parameters being bounded by a constant, shrinks coefficients, and produces some coefficients that are zero, allowing for efficient variable selection (Table 1) [23].
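The constrained problem solved by LASSO-COX can be written out explicitly. The notation below is an assumption introduced for illustration and does not appear in the original: x_i is the covariate vector of patient i, δ_i the event indicator, t_i the observed time, R(t_i) the set of patients still at risk at t_i, and s (equivalently λ) the tuning constant chosen by cross-validation. Bounding the sum of absolute coefficients while optimizing the partial likelihood is what drives some coefficients exactly to zero.

```latex
% Cox log partial likelihood
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \left[ x_i^{\top}\beta
  - \log \sum_{k \in R(t_i)} \exp\!\left(x_k^{\top}\beta\right) \right]

% Constrained (LASSO) form
\hat{\beta} = \arg\max_{\beta}\; \ell(\beta)
  \quad \text{subject to} \quad \sum_{j} \lvert \beta_j \rvert \le s

% Equivalent penalized form
\hat{\beta} = \arg\min_{\beta}\;
  \left\{ -\ell(\beta) + \lambda \sum_{j} \lvert \beta_j \rvert \right\}
```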
Then, filtered CCTA variables were included for model generation. The model for MACE prediction was constructed using 'CoxBoost', an algorithm used to fit a Cox proportional hazards model by componentwise likelihood based on the offset-based boosting approach. This algorithm is especially suited for models with a large number of variables and allows for mandatory covariates with unpenalized parameter estimates [25][26][27][28].
The model building procedure using the training dataset included two steps, as follows. First, the hyperparameters of CoxBoost (penalty, optimal step, and number of estimators) were automatically calculated from the training dataset. The penalty value was calculated using a coarse line search that led to an optimal number of boosting steps for CoxBoost, as determined by 10-fold cross-validation [29]. The optimal step of the model was confirmed using a coarse line search considering the connections between parameters to identify a potential combination of tuned hyperparameters (the penalty updating scheme was complemented by an optimal step-size modification for CoxBoost), which results in an optimal model in terms of cross-validated partial log-likelihood [26]. Second, after tuning the hyperparameters by 10-fold stratified cross-validation, the model was refitted on the entire training dataset to obtain the trained model. Then, the trained model was validated on the independent testing dataset (30% of the entire data) to obtain the predicted probabilities. The performance of the ML model, and its comparison with the other models, was assessed on the testing dataset.

The Reference Models
First, Cox proportional hazard regression (Cox regression), including the same variables as the ML model, and the conventional CCTA risk score (SIS score), which assesses overall plaque burden, were used as reference models in this study. The SIS score was calculated as the total number of coronary segments with plaque (0-16) [30]. Second, a model in which the clinical factors were added to the SIS score (SIS score + clinical factors) and a model using only the clinical factors were also used as reference models.
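As an illustration of the SIS score described above, the following minimal Python sketch counts the segments with plaque in a 16-segment read. The function name and the input layout (one boolean per segment) are assumptions made for this example.

```python
def sis_score(segment_has_plaque):
    """Segment involvement score: number of the 16 segments with plaque."""
    segments = list(segment_has_plaque)
    if len(segments) != 16:
        raise ValueError("the SIS score expects exactly 16 segments")
    return sum(1 for has_plaque in segments if has_plaque)
```

A patient with plaque in three segments therefore has an SIS score of 3, regardless of the stenosis severity or composition of those plaques, which is the information the ML model is intended to exploit in addition.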

Statistical Analysis
Continuous variables are presented as the mean ± standard deviation, and categorical variables are presented as counts (%). We assessed the ability of each prediction model (CoxBoost, Cox regression, SIS score, SIS score + clinical factors, and clinical factors) to discriminate outcomes on the testing dataset using the C-index and AUC [31]. The Cox regression model included the variables used in the ML model. We evaluated the calibration of each prediction model using the Brier score [32]. The Brier score is the mean squared distance between the predicted probabilities and the actual outcomes; a smaller value indicates better calibration, and a value <0.25 (the score obtained by predicting a probability of 0.5 for every patient) indicates a useful model [32]. Decision curve analysis (DCA) of all models was used to identify the preferred model, that is, the model with the best net benefit at a given threshold probability. The statistical analysis was implemented in R version 4.0.2. A two-sided p value < 0.05 was considered statistically significant.
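The two main performance measures can be illustrated with a short Python sketch. It is a simplified illustration, not the implementation used in the study: the C-index follows Harrell's pairwise definition, and the Brier score is shown in its basic binary form without the censoring weights that a survival-specific version would need. All function and argument names are assumptions.

```python
def harrell_c_index(times, events, risks):
    """Fraction of comparable patient pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had an event and a shorter
    observed time than patient j. Ties in predicted risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def brier_score(predicted_probs, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    if len(predicted_probs) != len(outcomes):
        raise ValueError("inputs must have the same length")
    squared = [(p - y) ** 2 for p, y in zip(predicted_probs, outcomes)]
    return sum(squared) / len(squared)
```

Predicting 0.5 for every patient gives a Brier score of exactly 0.25, which is why that value serves as the reference for an uninformative model.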

Feature Selection and Model Generation
In this study, feature selection was performed by LASSO-COX (Figure 2). When the hyperparameter of feature selection was determined (at the minimum partial likelihood deviance), the algorithm output 21 filtered variables with non-zero coefficients: chest symptoms (symptom); MB; plaque composition of the middle right coronary, the left main coronary artery, the proximal, middle and distal anterior descending branch, the first obtuse marginal branch, and the ramus intermedius artery (RCAm_composition, LM_composition, LADp_composition, LADm_composition, LADd_composition, OM1_composition, RI_composition); plaque length of the distal right coronary, the proximal anterior descending branch, and the proximal circumflex branch (RCAd_length, LADp_length, LCXp_length); and stenosis of the proximal and middle right coronary, the left main coronary artery, the proximal, middle and distal anterior descending branch, the first and second diagonal branch, and the proximal circumflex branch (RCAp_stenosis, RCAm_stenosis, LM_stenosis, LADp_stenosis, LADm_stenosis, LADd_stenosis, D1_stenosis, D2_stenosis, LCXp_stenosis).

Figure 2.
Selecting process for features by LASSO-COX. Automated feature selection for fifty-seven CCTA variables and nine clinical factors was performed using LASSO-COX, which minimizes the log partial likelihood subject to the sum of the absolute values of the parameters being bounded by a constant, shrinks coefficients, and produces some coefficients that are zero, allowing efficient variable selection (a). When the hyperparameters of feature selection were determined (partial likelihood deviance is minimum) (b), the algorithm output 21 filtered variables with non-zero coefficients (the filtered variables were subsequently included in model generation).

After feature selection, the filtered variables were included in model generation (Figure 3). When the hyperparameters of the ML model were determined (the penalty was 1116, and the step was 74), the optimal model (the logplik of the 10-fold stratified cross validation was the largest) was identified in the training dataset (Figure 3a). In the ML model, chest symptoms, stenosis of the proximal anterior descending branch, and stenosis of the middle right coronary artery were the top three variables (Figure 3b).

Assessment of the Performance of Each Prediction Model
After the 22nd month of follow-up, the ML model showed the largest C-index and AUC for the prediction of MACEs in the testing dataset (30% of the data, not used for model building) compared with the other models (Cox regression, SIS score, SIS score + clinical factors, and clinical factors) (Table 4).

Model Evaluation Using Calibration and DCA
In this study, we evaluated each model through calibration and DCA. In the model calibration, the Brier score of each model for predicting MACEs was less than 0.040 over approximately three years (a score <0.25 indicates a useful model) (Table 5). The DCA of all the models showed that the net benefit for the population each year was highest when the risk assessment of the ML model was used to guide treatment, for treatment threshold probabilities between 1% and 9% over a period of approximately three years (Figure 6).
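The net benefit plotted in a decision curve can be sketched in Python as follows. This uses the standard definition of net benefit for a binary outcome at a fixed time horizon and ignores censoring, so it is a simplified illustration rather than the study's implementation; the function names are assumptions.

```python
def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    True positives are credited in full; false positives are penalized by
    the odds of the threshold probability, threshold / (1 - threshold).
    """
    n = len(outcomes)
    pairs = list(zip(predicted_risks, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * weight


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of the reference strategy of treating every patient."""
    prevalence = sum(outcomes) / len(outcomes)
    weight = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * weight
```

A decision curve is obtained by evaluating `net_benefit` for each model over a grid of thresholds (for example 1% to 9%) and comparing the curves with the treat-all and treat-none (net benefit 0) strategies.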


Discussion
In this study, we used ML integrating numerous coronary plaque factors (stenosis severity, lesion length, plaque location and composition considering the 18 coronary segments, coronary dominance, myocardial bridge (MB), and the presence of vulnerable plaque) and clinical and demographic information to predict MACEs over an approximately three-year period in a cohort study that accounts for time to event. The results of this study suggest that a newly generated model based on ML, accounting for nonlinearities, provided better event prediction. This study, integrating coronary plaque information from CCTA and clinical factors based on ML technology, provides a feasible and superior method to assess prognosis in patients with suspected coronary artery disease over an approximately three-year period.

Risk Stratification with CCTA
Until recently, cardiac imaging studies were more inclined to use clinical and coronary plaque features (presence, extent, location, and composition) of CCTA for risk stratification of future events [33,34]. Cheruvu et al. showed that the maximal severity of CAD is related to major cardiovascular events [35]. The number of segments with plaque, location, and composition also improve risk assessment [36,37]. Currently, the use of CCTA information is far from sufficient, whereas the resolution of CCTA can provide a large amount of information for mining. The conventional CCTA risk score, linear assumptions, and conventional statistical approaches may be insufficient to exploit this information [38].

Machine Learning Algorithms Improve the Integration of Coronary Plaque Information for Survival Analysis
ML, a subset of artificial intelligence accounting for nonlinearities, is able to integrate a number of variables [11]. Cox regression is often limited for data mining purposes due to the correlation between variables, nonlinearity of variables (including potential complex interactions among them), and the possibility of overfitting.
The feasibility of ML has been demonstrated previously in CAD risk reclassification analysis. Using 25 clinical and 44 CCTA features, Motwani et al. showed that ML significantly improved the prediction of death compared with the Framingham risk score, SSS, SIS, and the Duke prognostic index [13]. Moreover, Dey et al. showed that an ML model incorporating semiautomatically quantified measures of coronary plaque (plaque volumes, stenosis severity, lesion length, and contrast density difference) identified vessels with hemodynamically significant CAD (fractional flow reserve ≤ 0.80) with high accuracy (AUC = 0.84) [14]. Specifically, the ML model showed greater diagnostic accuracy than a conventional statistical model that utilized the exact same data. The findings above suggest that ML improves the integration of the available data for the prediction of a certain outcome.
However, these studies are similar to a cross-sectional study (as opposed to a cohort study) because the follow-up outcomes of these studies do not include survival time and only showed dichotomous outcomes (not time-dependent).
This study accounted for time to event to obtain a more appropriate risk estimation. In the ML model, chest symptoms, stenosis of the proximal anterior descending branch, and stenosis of the middle right coronary artery were the top three factors (Figure 3), suggesting that we need to pay more attention to these characteristics in patients with suspected coronary disease. In the assessment of the model's performance, this study shows that the ML model significantly improved the prediction of MACEs compared with other models (Tables 2 and 3).
In the model evaluation, the ML model showed good calibration over approximately three years (Brier score < 0.040, where a score <0.25 indicates a useful model), demonstrating a small difference between the predicted risk and the actual observed risk for events (Table 5). The decision curve analysis of all models showed that the ML model was the preferred model, with the best net benefit when the treatment threshold probability was between 1% and 9% over approximately three years (Figure 6).
This ML model can potentially translate the detailed 18-segment CCTA reads and clinical factors into an individualized risk report that might help physicians tailor preventive medical therapy. The present study established an integrated ML model to predict clinical outcomes and compared it with currently available tools, including the SIS score, SIS score + clinical factors, and clinical factors models. The results demonstrated that the ML model was feasible and easy to obtain. Furthermore, the ML model demonstrated the best performance in discrimination and calibration. The ML model could directly output a MACE risk assessment within three years based on 13 non-zero variables and their coefficients in Figure 3b (symptom, LADp_stenosis, RCAm_stenosis, LCXp_length, LM_stenosis, LADm_composition, RCAp_stenosis, RI_composition, LADd_composition, OM1_composition, LCXp_stenosis, RCAd_length). For individualized preventive therapy, as shown in the present study, the net benefit for the population each year was between 0% and 3% when the risk assessment of the ML model was used to guide treatment, for treatment threshold probabilities between 1% and 9% over a period of approximately three years (Figure 6). Considering the incidence of MACEs (4.4%), a yearly net benefit of up to 3% for the population is a meaningful gain.

Study Limitations
This study, which was designed as a single-center cohort study, was performed in a middle-aged population with suspected coronary artery disease. Therefore, the results of this study may not be generalizable to other study populations. This study lacked medication history, and follow-up was limited to approximately three years. Further research may follow up for longer, add follow-up medication history, include genetic data, and identify the image feature-genome interaction, while the combined prediction ability may potentially improve the risk estimation.

Conclusions
Integrating coronary plaque information from CCTA based on machine learning technology provides a feasible and superior method to assess prognosis in patients with suspected coronary artery disease over an approximately three-year period.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of Chinese PLA General Hospital (protocol code S2020-255-01; date of approval 23 June 2020).

Informed Consent Statement:
Written informed consent has been obtained from the patient(s) to publish this paper.

Data Availability Statement:
Not applicable.