Prediction of Coronary Artery Calcium Score Using Machine Learning in a Healthy Population

Background: Coronary artery calcium score (CACS) is a reliable predictor of future cardiovascular disease risk. Although deep learning studies using computed tomography (CT) images to predict CACS have been reported, no study has assessed the feasibility of machine learning (ML) algorithms for predicting CACS from clinical variables in a healthy general population. Therefore, we aimed to assess whether ML algorithms other than binary logistic regression (BLR) could predict high CACS in a healthy population using general health examination data. Methods: This retrospective observational study included participants who underwent regular health screening including coronary CT angiography. High CACS was defined as an Agatston score ≥ 100. Univariable and multivariable BLR analyses were performed to assess predictors of high CACS in the entire dataset. For ML prediction of high CACS, the dataset was randomly divided into training and test datasets at a 7:3 ratio. BLR, catboost, and xgboost algorithms with 5-fold cross-validation and a grid search technique were used to find the best performing classifier. The performance of each ML algorithm was evaluated with the area under the receiver operating characteristic (AUROC) curve. Results: A total of 2133 participants were included in the final analysis. Mean age and the proportion of male sex were 55.4 ± 11.3 years and 1483 (69.5%), respectively. In multivariable BLR analysis, age (odds ratio [OR], 1.12; 95% confidence interval [CI], 1.10–1.15, p < 0.001), male sex (OR, 2.91; 95% CI, 1.57–5.38, p < 0.001), systolic blood pressure (OR, 1.02; 95% CI, 1.00–1.03, p = 0.019), and low-density lipoprotein cholesterol (OR, 1.00; 95% CI, 0.99–1.00, p = 0.047) were significant predictors of high CACS. The AUROC of xgboost for predicting high CACS was 0.823, followed by catboost (0.750) and BLR (0.585). The difference in AUROC between xgboost and BLR was significant (p for AUROC comparison < 0.001).
Conclusions: The xgboost ML algorithm was a more reliable predictor of high CACS in healthy participants than the BLR algorithm. ML algorithms may be useful for predicting CACS with only laboratory data in healthy participants.


Introduction
Cardiovascular disease (CVD) is one of the leading causes of death worldwide [1]. Inflammation of vascular smooth muscle cells results in increased calcium deposits that develop into atherosclerotic plaque on the internal wall of the coronary artery. The internal diameter of the vessel then narrows as the calcified plaque expands and ruptures, resulting in cardiovascular disease [2]. When evaluating the risk of coronary atherosclerosis, we usually assess an individual's conventional risk factors, including hypertension, diabetes, dyslipidemia, and smoking. In addition, the coronary artery calcium score (CACS) on computed tomography (CT) is an important predictor of future CVD development and mortality in the general population [3][4][5]. It has been reported that assessing CACS alongside Framingham Risk Score stratification can be more useful for predicting future CVD development than the latter alone [6]. Likewise, identifying CACS in the general population is a useful way to identify patients at high risk for CVD in primary prevention.
Machine learning (ML) has recently been adopted for a variety of medical problem-solving and outcome-prediction tasks because its greater computational capacity often yields higher accuracy than conventional statistical methods [7]. Previous studies have focused on deep learning algorithms that predict CACS from chest CT, which have already shown promising results in image-based deep learning tasks [8,9]. Moreover, a variety of ML algorithms have classified high and low CVD risk more efficiently than conventional logistic regression analysis [10][11][12]. The performance of these ML algorithms is gradually improved by several ML techniques, such as data scaling/normalization, outlier or noise processing, and cross-validation to minimize overfitting and underfitting [13]. Recently, Al'Aref et al. suggested that an ML model including clinical features and CACS can improve prediction of future coronary artery obstructive disease (CAOD) risk in 35,281 patients from the Coronary CT Angiography Evaluation for Clinical Outcomes: An International Multicenter (CONFIRM) registry [10]. However, to the best of our knowledge, no study has assessed the feasibility of ML algorithms for predicting CACS using only clinical variables in a healthy general population. Therefore, the aim of this study was to assess the performance of several ML classifiers in predicting CACS in addition to conventional binary logistic regression (BLR) analysis. We hypothesized that the classification performance of several ML classifiers would be superior to that of BLR analysis.

Study Participants
This is a retrospective observational study of hospital-based participants who underwent regular health screening. In Korea, the entire population has medical insurance, and those aged 40 and above are eligible for a general health screening every two years, provided by the National Health Insurance Service. In addition, if prior screening revealed any abnormal results, or if the health policyholder or business owner wants additional investigations at personal cost, examinations such as brain MRI or coronary CT angiography (CTA) are made available. We included patients examined between January 2014 and December 2019 at three tertiary academic centers. The inclusion criteria were patients (1) who were 40 years or older and (2) who opted for coronary CTA in addition to the regular health examination. The exclusion criteria were (1) missing clinical or laboratory information and (2) repeated coronary CTA; if a patient had undergone coronary CTA at multiple regular health examinations, we included only the information from the last health examination and its coronary CTA results. Figure 1 shows the flow chart of the inclusion and exclusion strategy used in this study. The study protocol was approved by the Hallym University Hospital Institutional Review Board (No. 2019-05-010-001). Informed consent was waived by the IRB because we only used fully deidentified data.

Data Collection
As described earlier, we included laboratory parameters from the regular health examinations, including complete blood count, liver and renal function tests, lipid profile, atherosclerosis markers, and screening tests for diabetes. In addition, anthropometric measures such as age, sex, blood pressure, and body mass index were included as input features for the ML tasks. Information on clinical characteristics and risk factors for CVD (hypertension, diabetes, dyslipidemia, current smoking) was not included, because the corresponding parameters, such as blood pressure, fasting blood glucose, glycated hemoglobin, and total and low-density lipoprotein (LDL) cholesterol, were already included in the prediction model.

Coronary Artery Calcium Score
The three participating centers performed ECG-triggered cardiac CT scans on a Sensation 64 or Somatom Definition Flash scanner (Siemens Medical Solutions, Forchheim, Germany). All CT scanners had 64-detector scanning capability, and the parameters of the cardiac CT scan were as follows: tube voltage, 120 kVp; window level, 40; window width, 120; slice thickness, 3 mm. CACS was calculated with the semi-automated Agatston method [14]. The primary outcome measure was high CACS, defined as a CACS of 100 or more on coronary CTA.
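The Agatston method itself is a published weighting scheme: each calcified lesion contributes its area multiplied by a density factor determined by its peak attenuation. As a rough illustration only (the scanners compute this semi-automatically from the CT images; the lesion areas and peak HU values below are invented), a minimal Python sketch:

```python
def agatston_weight(peak_hu: float) -> int:
    # Standard Agatston density weighting by a lesion's peak attenuation (HU).
    if peak_hu >= 400:
        return 4
    if peak_hu >= 300:
        return 3
    if peak_hu >= 200:
        return 2
    if peak_hu >= 130:
        return 1
    return 0


def agatston_score(lesions):
    # Each lesion contributes area (mm^2) x density weight; the total score
    # sums over all calcified lesions on all slices.
    return sum(area * agatston_weight(hu) for area, hu in lesions)


# Hypothetical example: two lesions -> 10*2 + 25*4 = 120,
# i.e. "high CACS" under the study's >= 100 cutoff.
score = agatston_score([(10.0, 250.0), (25.0, 450.0)])
high_cacs = score >= 100
print(score, high_cacs)
```

The study's binary outcome is simply this score thresholded at 100.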

Machine Learning
We used anthropometric and laboratory variables from the general health examination as input variables. BLR, catboost, and extreme gradient boost (xgboost) ML algorithms were used to classify participants into high and low CACS groups. In particular, catboost and xgboost are ensemble tree-based classifiers, which can minimize the overfitting issue of tree-based classifiers [15][16][17]. We randomly divided the whole dataset into training and test datasets at a 7:3 ratio. The proportion of the high CACS group was identically distributed in the training and test datasets. Input variables were not scaled or normalized during preprocessing but were entered as raw values in the ML task. All ML tasks were performed with 5-fold cross-validation, which prevents overestimation of the parameters and reduces information leakage. We used a grid search technique for hyperparameter optimization of the ML algorithms. From the ML training process, we extracted feature importance information for each ML classifier and compared them with each other.
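The split-plus-tuning pipeline above can be sketched as follows. The study was carried out in R (caret/xgboost/catboost); this Python sketch substitutes scikit-learn's GradientBoostingClassifier for xgboost and synthetic data for the health-examination variables, so the grid values and class balance are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the health-examination features
# (real study: ~2100 participants, ~11% high-CACS prevalence).
X, y = make_classification(n_samples=500, n_features=10, weights=[0.89],
                           random_state=42)

# 7:3 split, stratified so the high-CACS proportion matches in both sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

# Grid search over a small hyperparameter grid with 5-fold cross-validation,
# scored by AUROC, mirroring the paper's tuning procedure.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=5,
    scoring="roc_auc",
)
grid.fit(X_tr, y_tr)
print(grid.best_params_)
```

Raw (unscaled) inputs are fine here, as in the paper: tree-based ensembles split on thresholds and are insensitive to monotone rescaling of features.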

Statistical Methods
Baseline characteristics of the participants with high vs. low CACS were compared with Student's t-test, the Mann-Whitney U test, or Pearson's χ²-test, as appropriate. We performed univariable and multivariable BLR analyses on the whole dataset. Statistically significant features with a p-value less than 0.05 in the univariable logistic regression analysis were entered into the multivariable model. Predictors in the BLR analysis are reported as odds ratios (OR) with 95% confidence intervals (CI).
After the BLR analysis of the whole dataset, each ML classifier was trained on four folds of the training dataset and validated on the remaining fold, with folds assigned randomly. The performance of each ML classifier was then evaluated on the unseen test data. We calculated a probability score for each observation in each ML algorithm and allocated those with a score of more than 0.5 to the high CACS group. The performance of each ML algorithm was measured by the area under the receiver operating characteristic (AUROC) curve. ML classification tasks were performed with R version 3.6.1 (The R Foundation for Statistical Computing) using the moonBook, caret, xgboost, and catboost R packages.
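The test-set evaluation step (probability score, 0.5 cutoff for the high CACS group, AUROC) might look like the following in outline; this is again a scikit-learn stand-in on synthetic data, not the study's R pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the health-examination features.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Probability score for the positive (high-CACS) class on unseen test data.
proba = clf.predict_proba(X_te)[:, 1]

# Observations with a score above 0.5 are allocated to the high CACS group;
# AUROC summarizes discrimination across all possible cutoffs, so it does
# not depend on the 0.5 threshold itself.
pred_high = (proba > 0.5).astype(int)
auroc = roc_auc_score(y_te, proba)
print(round(auroc, 3))
```

Keeping the test set untouched until this final step is what makes the reported AUROC an estimate of out-of-sample performance.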

Results
A total of 2123 participants were included in the final analysis. Mean (± standard deviation) age and the proportion of male sex were 55.4 ± 11.3 years and 1483 (69.5%), respectively. Among all participants, 237 (11.2%) had a high CACS. Comparisons of the anthropometric and laboratory differences between the high and low CACS groups in the whole dataset are shown in Table 1. The high CACS group was older, shorter in height, more likely to be male, and had a larger abdominal circumference and higher systolic and diastolic blood pressure than the low CACS group. Fasting blood sugar, glycated hemoglobin, lactate dehydrogenase, serum glutamic-oxaloacetic transaminase, blood urea nitrogen, creatinine, and mean corpuscular volume of red blood cells were higher, and total bilirubin, estimated glomerular filtration rate, total cholesterol, LDL cholesterol, and platelet count were lower in the high CACS group than in the low CACS group.

Performance of Machine Learning Classifier of High Coronary Artery Calcium Score
Supplementary Table S1 shows the comparison of input variables between the training and test datasets. There were no differences among input features, with the exception of triglyceride level. In BLR analysis of the training dataset only, age and male sex were significant predictors of high CACS (Supplementary Table S2). Supplementary Table S3 lists the hyperparameters of the best catboost and xgboost classifiers. Figure 2 shows the overall comparison of the AUROCs of the three algorithms predicting high CACS. The performance of xgboost was better than that of the BLR classifier (p for AUROC comparison = 0.008). Figure 3 summarizes the feature importance of each ML model. Age, systolic blood pressure, and LDL cholesterol were significant features in all three ML classifiers. Male sex was a significant predictor of high CACS in BLR analysis but was not a significant feature in the xgboost ML classifier. When we additionally performed ML prediction with the 10 variables that had the highest importance values (Figure 3), the AUROC values did not differ much from the original models (0.605 for binary logistic regression, 0.749 for catboost, and 0.822 for xgboost).

Discussion
In this study of CACS prediction using health examination data, age, male sex, and systolic blood pressure were significant predictors of high CACS in a healthy population. BLR analysis provides reasonable information about the significant predictors of high CACS, but its performance as a prediction classifier was limited. Other ML algorithms, such as the catboost and xgboost classifiers, outperformed the BLR classifier and provided satisfactory binary classification predictions.
Several well-known risk calculators stratify CVD risk in symptomatic and asymptomatic patients [18][19][20]. Diamond and Forrester originally developed the Duke Clinical Score using logistic regression, which included age, sex, type of chest pain, and several risk factors for CVD [21]. After reports that the Duke Clinical Score overestimated the pretest probability of symptomatic CAOD, the CAD Consortium score and the CONFIRM Risk Score were developed; they still use age, sex, and several risk factors for CVD as predictors of high CAD risk in patients with CAOD [22][23][24]. In our ML prediction model, age, sex, and several risk factors also contributed to the classification prediction. Among them, age, systolic blood pressure, and LDL cholesterol were important features for all three ML classifiers, which is concordant with previously developed risk stratification calculators.
In our BLR of high CACS in all participants, LDL cholesterol had a negative association with high CACS. In our data, the high CACS group had lower total and LDL cholesterol levels (184 mg/dL and 86 mg/dL, respectively) than the low CACS group, which differs from other study populations [4,6,25]. This phenomenon could be explained by the possibility that those with high CACS may have been taking more lipid-lowering agents for primary prevention of CVD than those with low CACS. Further, our data could be partially explained by the fact that a high aspartate transaminase/alanine aminotransferase ratio and increased lactate dehydrogenase in individuals with high CACS have been associated with a higher proportion of statin-induced muscle or liver injury than in those with low CACS [26,27]. If drug history had been included as an input feature of our ML algorithms, the performance of these models might have been better than the current models. However, since we did not include the history of previous prescriptions, including lipid-lowering agents, this hypothesis should be verified in future studies.
In our xgboost algorithm, male sex was not a significant predictor of high CACS. Why, then, does the importance of input features differ across ML models in predicting the high CACS group? Logistic regression explains various phenomena through odds ratios but must always take interaction effects into account when constructing the model [28]. In contrast, catboost and xgboost do not consider interaction effects in the same way because these algorithms are composed of tree-ensemble structures, which minimize feature interaction in classification or regression tasks [15,29,30]. In addition, identifying catboost and xgboost feature importance could provide novel feature information for CACS prediction. In our results, alkaline phosphatase, platelet count, estimated glomerular filtration rate, body mass index, and white blood cell count were important features in the ML algorithms. Alkaline phosphatase is already one of the risk factors and treatment targets for cardiovascular disease, alongside highly sensitive C-reactive protein, as an atherosclerotic biomarker [31]. Likewise, implementing ML beyond logistic regression could be useful for improving prediction performance and for finding novel associations between input features and the target disease.
Our study has some limitations. First, we did not extract or use information on drug usage, regular exercise, or compliance with primary prevention of CVD; in particular, the absence of information on statin usage and regular exercise, both associated with CVD development, is a disadvantage. Second, this study was conducted in a Korean population, and therefore its results may not generalize to other ethnic groups. When applying our method to data from other ethnic groups, it should be kept in mind that the contribution of several factors to CVD development in Asians may be underestimated. Third, this study was conducted on patients attending tertiary hospitals, and therefore there is a possibility of selection bias, even though the participants were likely to be a healthy general population.
Despite these limitations, our study has some strengths. First, physically healthy participants were the main population in our study. Most other CVD risk scoring systems have been developed for symptomatic CVD patients; in order to screen for the risk of CAOD in asymptomatic patients, there was a need to study patients without CVD rather than those with CVD. Second, some of our ML classifiers showed improved performance compared to logistic regression, even though no clinical variables except age and sex were used in the training process. Therefore, our results could be useful for predicting future CVD risk in patients with only limited laboratory data.

Conclusions
In this study, we found that when predicting CACS from laboratory data in healthy participants, logistic regression was not sufficient to classify them into a high CACS group. Instead, the xgboost algorithm can improve the prediction power of CACS estimation.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2075-4426/10/3/96/s1. Table S1: Comparison of input features between the training and test datasets. Table S2: Results of the binary logistic regression analysis of the training dataset. Table S3: Parameter optimization of each machine learning algorithm.

Conflicts of Interest:
The authors declare no conflict of interest.