Predictive Model for High Coronary Artery Calcium Score in Young Patients with Non-Dialysis Chronic Kidney Disease

Cardiovascular disease is a major complication of chronic kidney disease. The coronary artery calcium (CAC) score is a surrogate marker for the risk of coronary artery disease. The purpose of this study was to predict high CAC scores in non-dialysis chronic kidney disease patients under the age of 60 using machine learning techniques. We developed the predictive models with a representative chronic kidney disease cohort, the Korean Cohort Study for Outcomes in Patients with Chronic Kidney Disease (KNOW-CKD). We divided the cohort into a training dataset (70%) and a validation dataset (30%). The test dataset was an external dataset of patients who were not included in the KNOW-CKD cohort. Support vector machine, random forest, XGBoost, logistic regression, and multilayer perceptron neural network models were used as predictive models. We evaluated each model's performance using the area under the receiver operating characteristic curve (AUROC). Shapley additive explanation values were applied to select the important features. The random forest model showed the best predictive performance (AUROC 0.87 in the test dataset), and its performance differed significantly from that of the traditional logistic regression model. This study may help identify young chronic kidney disease patients at high risk of cardiovascular complications and establish individualized treatment strategies.


Introduction
Chronic kidney disease (CKD) is a major health problem, both worldwide and in Korea. When CKD progresses to end-stage kidney disease, it imposes a heavy socioeconomic burden on both individual patients and communities [1,2]. Among the various complications of CKD, cardiovascular disease (CVD) is at least the second most common cause of death across all CKD stages, and it is the most common cause of death for patients with stage 3-5 CKD [3]. Therefore, CVD risk assessment and timely intervention may improve the prognosis of CKD patients. The evaluation of CVD risk is particularly important in younger patients: because they are often more involved in socioeconomic activities than older patients, the development of CVD in young patients has a greater adverse effect on society.
CKD increases the risk of atheromatosis, which can progress to atherosclerosis [4,5]. The traditional risk factors for CVD in CKD patients include age, hypertension, high fasting glucose, dyslipidemia, and smoking history [6][7][8]. Coronary computed tomography (CT), a non-invasive method for evaluating atherosclerosis of the coronary arteries, has been widely used to assess CKD patients. Coronary CT can be used to calculate a patient's coronary artery calcium (CAC) score, a marker of subclinical coronary artery disease, by measuring the amount of CAC [9][10][11]. However, the use of coronary CT is limited in developing countries due to its high cost, and radiation exposure precludes its excessive use.
Recently, many studies have applied various machine learning techniques to clinical problems. Prediction models using machine learning techniques have demonstrated better performance than traditional prediction models, such as scoring systems for critical care [12] and traditional statistical models [13]. However, to the best of our knowledge, no study has examined the prediction of CAC scores in young non-dialysis CKD patients using machine learning. Therefore, the purpose of this study was to develop a predictive model using machine learning techniques that can screen for patients at high risk of coronary artery disease among young patients with chronic kidney disease, and to compare the performance of machine learning techniques with that of traditional logistic regression.

Data Source and Study Population
We analyzed data from the Korean Cohort Study for Outcomes in Patients with Chronic Kidney Disease (KNOW-CKD), a nationwide, multicenter prospective cohort study that included non-dialysis patients with stage 1-5 CKD, aged 20-75 years. The detailed methods and design of the study were published previously (NCT01630486 at http://www.clinicaltrials.gov, accessed on 14 December 2021) [14]. The KNOW-CKD cohort included a total of 2238 patients. We excluded 897 patients who had missing CAC scores or were over 60 years of age. The final derivation cohort comprised 1341 patients. In addition, we established an external validation cohort of patients who were treated at Chonnam National University Hospital. A total of 83 patients with CKD who were under 60 years of age were included in the external validation cohort. The enrollment of patients in this study is summarized in Figure 1.

Measurement and Definition
Various factors are known to be associated with coronary artery calcification; we selected 35 representative features related to coronary artery calcification and used them for the analysis. The details of the selected features are summarized in Table S1. Demographic and baseline clinical data, including age, sex, smoking history, cause of CKD, economic status, educational status, comorbidities, and medication history, were collected by well-trained research coordinators. Blood pressure was measured using an electronic sphygmomanometer in the clinic after five minutes of seated rest. Venous blood samples were collected after an overnight fast. Serum creatinine was measured using an isotope-dilution mass spectrometry-traceable method. The estimated glomerular filtration rate (eGFR) was calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation [15]. First-voided urine was used to measure spot urinary metrics, such as protein and creatinine. Coronary multi-detector CT was performed to calculate the coronary calcium score. The quantitative CAC score was calculated using the method described by Agatston et al. [16]. The primary outcome variable of this study was a high CAC score, which was defined as a CAC score ≥100, and the patients were classified based on this criterion.
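As an illustration of the eGFR calculation, the following is a minimal sketch that assumes reference [15] refers to the 2009 creatinine-based CKD-EPI equation; the text does not state which version was used. The function name and argument names are hypothetical, and serum creatinine is assumed to be in mg/dL.

```python
def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool,
                 black: bool = False) -> float:
    """Estimate GFR (mL/min/1.73 m^2) with the 2009 CKD-EPI creatinine equation.

    Assumption: this is the equation cited as [15]; the study's actual
    implementation is not given in the text.
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine scale
    alpha = -0.329 if female else -0.411    # exponent applied below kappa
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


# Hypothetical patient: 48-year-old man with serum creatinine 1.4 mg/dL.
example_egfr = ckd_epi_2009(1.4, 48, female=False)
```

Because the equation is monotonic in creatinine and age, a higher creatinine or an older age always yields a lower estimate for otherwise identical inputs.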
To maintain a constant ratio of the primary outcome in both the training and validation datasets, we divided the derivation cohort into a training dataset (70%) and a validation dataset (30%) using stratified sampling. The training dataset was used to develop the predictive models, whereas the validation dataset was used to validate and compare them. To find the optimal hyperparameters for each machine learning technique, we performed 10-fold cross-validation with a grid search over hyperparameter combinations and selected the combination with the highest area under the receiver operating characteristic curve (AUROC). We used the AUROC as our main evaluation metric because it is independent of class skew and invariant to the classification threshold. The test dataset was used only to test the performance of the final predictive model.
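The stratified 70/30 split described above can be sketched as follows. This is a standard-library illustration on toy labels, not the code used in the study; the function name, the seed, and the toy class ratio are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, valid_idx) so each class keeps roughly the same share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, valid_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)                       # random order within the class
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])            # first 70% of this class
        valid_idx.extend(indices[cut:])            # remaining 30% of this class
    return sorted(train_idx), sorted(valid_idx)


# Toy outcome vector: 37 patients with a high CAC score (1) and 63 without (0).
toy_labels = [1] * 37 + [0] * 63
train_idx, valid_idx = stratified_split(toy_labels)
train_rate = sum(toy_labels[i] for i in train_idx) / len(train_idx)
valid_rate = sum(toy_labels[i] for i in valid_idx) / len(valid_idx)
```

Splitting within each class separately is what keeps the outcome rate nearly identical in the two subsets, which a purely random split does not guarantee for a cohort of this size.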
For the neural network model and the support vector machine (SVM), all the variables were normalized with the minimum and maximum values of each variable in the training dataset:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the variable in the training dataset. We also created dummy features of the discrete variables for appropriate analyses. We calculated the AUROC to quantify the performance of the predictive models and applied the DeLong test to compare the performance of each predictive model. Because missing values cannot be used in machine learning, simple imputation was performed using the MICE package in R [20]. The continuous variables were imputed using the pmm (predictive mean matching) method, the binary variables were imputed by the logreg (logistic regression) method, and the multinomial variables were imputed by the polyreg (polytomous logistic regression) method. p-values < 0.05 were considered statistically significant.
The Shapley additive explanations (SHAP) value was calculated to determine feature importance. SHAP is based on game theory [21] and local explanations [22]. Lundberg and Lee [23] reported the SHAP value for an explainable model with additive feature attribution methods. Additive feature attribution methods were defined as follows [23]:
$$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i$$
where $z' \in \{0, 1\}^M$, $M$ is the number of input features, and $\phi_i \in \mathbb{R}$. An important property of the class of additive feature attribution methods is that it has a single unique solution with three desirable properties: local accuracy, missingness, and consistency [23]. Based on the above method, the authors suggested Tree SHAP, which uses a conditional expectation rather than a marginal expectation [24]:
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(M - |S| - 1)!}{M!} \left[ f_x(S \cup \{i\}) - f_x(S) \right]$$
where $N$ is the set of all input features and $f_x(S) = E[f(x) \mid x_S]$ is the expected model output conditioned on the feature subset $S$. SHAP values can be obtained using the conditional expected value function of the machine learning model, and SHAP utilizes a technique for estimating the Shapley value for the input feature value of each instance [23]. Using SHAP, consistent variable importance can be extracted; Tree SHAP was used in this study.
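To make the Shapley attribution concrete, the following minimal sketch computes exact Shapley values by enumerating all feature subsets for a small toy model. It is not Tree SHAP: absent features are replaced by a fixed baseline value rather than by a conditional expectation, and the toy model, the instance, and the baseline are hypothetical. It is meant only to show the subset weighting and the local accuracy property.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at instance x, relative to a baseline.

    Features outside the coalition S are set to their baseline value
    (a simplifying assumption; Tree SHAP uses conditional expectations).
    """
    m = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return f(z)

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):
            # Weight |S|! (M - |S| - 1)! / M! for coalitions of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical risk score with one interaction term."""
    return 2.0 * z[0] + 3.0 * z[1] + z[0] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

In this toy case the interaction term is shared equally between the two features involved, and the attributions sum to the difference between the model output at the instance and at the baseline, which is the local accuracy property mentioned above. The enumeration grows exponentially with the number of features, which is why tree-specific algorithms are used in practice.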

Clinical Characteristics of Study Population
The percentages of missing data for all the variables in the derivation cohort were <5%, except for C-reactive protein (7.189%), waist-hip ratio (6.594%), and serum chloride (8.676%), and these missing values were imputed via a simple imputation method using the "MICE" package. The data from 1341 patients were analyzed. The median age and eGFR of the patients in the derivation cohort were 48.0 years and 55.3 mL/min/1.73 m², respectively. The proportion of female patients was 42.4% (568 patients), and the mean waist-hip ratio and fasting blood glucose levels were 0.9 and 107.3 mg/dL, respectively, for the entire derivation cohort. The number of patients with high CAC scores (>100) was 345 (36.8%) in the training dataset and 148 (36.7%) in the validation dataset. Only six features, namely serum albumin, low-density lipoprotein, total cholesterol, educational status, serum calcium, and serum phosphate, differed significantly between the training and validation datasets. The detailed characteristics of the study population are summarized in Table S1.

Predictive Models for Coronary Artery Calcium Score
We constructed five predictive models. The summarized results, including the sensitivity, specificity, accuracy, AUROC, and p-value for the DeLong test of each predictive model, are described in Table 1. The random forest (RF) and XGBoost models showed better accuracy, sensitivity, and specificity than conventional logistic regression. The SVM and multilayer perceptron (MLP) neural network models had lower accuracy, sensitivity, and AUROC than logistic regression. Among the machine learning techniques, the RF model showed the best performance with respect to AUROC and was the only model whose performance differed significantly from that of logistic regression. The AUROC curves are visualized in Figure 2. For the logistic regression, the Akaike information criterion was applied to select the features using the backward elimination method. The logistic regression results are summarized in Table 2.
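For readers less familiar with the metric, the AUROC values reported above correspond to the probability that a randomly chosen patient with a high CAC score receives a higher predicted score than a randomly chosen patient without one, with ties counted as one half. The following is a minimal standard-library sketch of that pairwise definition on toy data; the function name and the example scores are hypothetical, and the study's own computation (and the DeLong test) are not reproduced here.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: four patients, two with a high CAC score.
toy_y = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_y, toy_scores)

# Any strictly increasing rescaling of the scores leaves the AUROC unchanged.
rescaled_auc = auroc(toy_y, [10 * s + 3 for s in toy_scores])
```

Because only the ordering of scores matters, the metric does not depend on a particular classification threshold, which is the property cited as the reason for choosing it.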

Final Predictive Model
The SHAP values of the features in the RF model that showed the best performance were calculated, and the results of the top 20 features are summarized in Figure 3. Age and fasting blood glucose were the highest-ranking features, followed by waist-hip ratio, sex, and high-density lipoprotein. Finally, we selected four features, age, sex, waist-hip ratio, and fasting blood glucose, based on their SHAP values and clinical accessibility. We assessed the performance of the final predictive model with the test dataset, which was independent of the derivation cohort. The AUROC of the final predictive model using the test dataset is visualized in Figure 4, and its AUROC was 0.87.

Discussion
In this study, we found that traditional logistic regression has limitations in predicting the CAC score of young CKD patients and classifying them into high-risk groups. The RF model showed the best performance with respect to AUROC, and the final model required only four easily obtained clinical variables.
Cardiovascular complications are major complications among CKD patients. Selecting CKD patients at high risk of cardiovascular complications and providing early interventions are particularly challenging tasks in the medical field. Considering the socioeconomic benefits of early intervention in young patients and the fact that the prevalence of a high CAC score (>100) increases with age in Korea [25], we excluded patients over 60 years old. The CAC score is a surrogate marker that can predict the occurrence of cardiovascular events. It can be measured using the Agatston method, which computes the score as the weighted sum of lesions with density >130 HU, multiplying the area of calcium by a factor related to maximum plaque attenuation [16]. Recent research has reported that patients with CAC scores >100 have a 4.3 times greater risk of experiencing a major cardiovascular event than patients with a CAC score of zero [26]. Additionally, the CAC score correlates well with the Framingham risk score, which estimates the 10-year risk of coronary artery disease [27]. Current treatment guidelines recommend that if coronary artery disease is strongly suspected, coronary angiography should be performed first so that both diagnosis and treatment can be performed at the same time. If patients who are expected to have a high CAC score, i.e., patients at high risk of coronary artery disease who require coronary angiography, can be screened in advance, they can proceed directly to angiography. This is important because an unnecessary double dose of radiation and contrast agents can be avoided.
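The Agatston weighting mentioned above can be sketched as follows. This is a simplified illustration, not the scanner software's algorithm: it assumes the commonly cited attenuation bins (130-199, 200-299, 300-399, and 400 HU or more, mapped to factors 1 to 4), which are not stated in this text, and it ignores the minimum lesion area and slice-thickness conventions. The lesion values are hypothetical.

```python
def density_factor(peak_hu: float) -> int:
    """Weight derived from the maximum attenuation of a lesion (assumed bins)."""
    if peak_hu < 130:
        return 0    # below the calcification threshold, not scored
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4


def agatston_score(lesions) -> float:
    """Sum of lesion area (mm^2) times density factor over all lesions.

    `lesions` is an iterable of (area_mm2, peak_hu) pairs, one per lesion
    per slice.
    """
    return sum(area * density_factor(peak_hu) for area, peak_hu in lesions)


# Hypothetical scan with three calcified lesions and one sub-threshold region.
toy_lesions = [(10.0, 150), (5.0, 320), (8.0, 450), (4.0, 110)]
toy_score = agatston_score(toy_lesions)
```

In this toy scan the densest lesion contributes more than half of the total despite its moderate area, which shows how the attenuation factor amplifies the contribution of heavily calcified plaque.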
RF is an ensemble-based tree model that offers the advantage of being resistant to overfitting. In addition, its performance does not depend strongly on the size of the training dataset, which means that good performance can be achieved even with smaller datasets. However, RF has disadvantages: it has a high computational cost, and it is unable to extract non-uniform feature importance. To overcome these problems, we constructed the final predictive model by extracting important variables based on the SHAP value and ensured the universality and applicability of the model. Previous reports [28,29] showed the underperformance of predictive models with machine learning techniques in test datasets, which reflects the universality problem of machine learning. However, our final model for predicting patients with high CAC scores showed strong predictive performance (AUROC 0.87) in the external test cohort. This was achieved through careful feature selection.
Currently, machine learning explainability is a major concern, especially in the medical field. Traditional methods for measuring feature importance in tree models (Gini and split count) are limited by inconsistency across models or individual trees. Consistency and accuracy are the important components with which to evaluate feature importance [19]. Among various methods, SHAP is one of the most reliable techniques for assessing feature importance [30]. Our final predictive model comprised only four clinical variables (age, sex, fasting blood glucose, and waist-hip ratio). Known traditional risk factors for CAC include age, high fasting glucose, hypertension, male sex, blood glucose, and waist-hip ratio [7,8,31]. The four variables selected by the SHAP value were consistent with traditional risk factors for CAC, and similar results were observed in logistic regression. Although the SHAP value cannot confirm causality, we concluded that our prediction model is clinically reliable based on these results.
To the best of our knowledge, this study is the first to predict CAC scores based on clinical variables in young non-dialysis CKD patients. Our derivation cohort has many strengths, including its prospective observational design, robust data collection, and large study population. We applied robust statistical methods, including our imputation method and the SHAP value, which minimized omitted variable bias. These strengths support the reliability of our analyses. However, our study also has some limitations. First, the database used in this study is larger than other disease-specific cohorts, but it is smaller than general population cohorts. Second, although the model was verified with an external validation cohort, we cannot confirm its universality, which is a limitation of all machine learning methods. Lastly, the study design was retrospective, and the problems of hidden bias, confounding variables, and omitted variables, which are common to many machine learning techniques, could not be solved completely here.
In this study, our final predictive RF model demonstrated better predictive performance than logistic regression in the assessment of young CKD patients. Our predictive model may help to screen for patients at high risk of cardiovascular complications among young patients with chronic kidney disease, without subjecting them to radiation exposure. Given the simplicity and clinical significance of our predictive model, the results of this study may offer great benefits for the efficient use of resources in settings where the use of expensive medical resources, such as CT, is limited. In addition, these results may help in the application of personalized treatment strategies for high-risk patients. In the future, we aim to complete a follow-up study to demonstrate the universality and generalizability of the model.

Informed Consent Statement:
Since the database used in CNUH-2021-292 did not include personal identifiers and the study is retrospective and observational in design, the need for informed consent was waived.

Data Availability Statement:
The datasets generated during and/or analyzed during the current study are available from the corresponding author (S.W.K.) on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.