Machine Learning Models for the Prediction of Renal Failure in Chronic Kidney Disease: A Retrospective Cohort Study
Diagnostics 2022, 12, 2454

This study assessed the feasibility of five separate machine learning (ML) classifiers for predicting disease progression in patients with pre-dialysis chronic kidney disease (CKD). The study enrolled 858 patients with CKD treated at a veterans hospital in Taiwan. After classification into early and advanced stages, patient demographics and laboratory data were processed and used to predict progression to renal failure, and important features for optimal prediction were identified. The random forest (RF) classifier with synthetic minority over-sampling technique (SMOTE) had the best predictive performance among patients with early-stage CKD who progressed within 3 and 5 years and among patients with advanced-stage CKD who progressed within 1 and 3 years. Important features identified for predicting progression from early- and advanced-stage CKD were urine creatinine and serum creatinine levels, respectively. The RF classifier demonstrated the optimal performance, with area under the receiver operating characteristic curve (AUROC) values of 0.96 for predicting progression within 5 years in patients with early-stage CKD and 0.97 for predicting progression within 1 year in patients with advanced-stage CKD. The proposed method resulted in the optimal prediction of CKD progression, especially for progression within 1 year in advanced-stage CKD. These results will be useful for predicting prognosis among patients with CKD.


Introduction
Chronic kidney disease (CKD) is a global health problem associated with a high risk of adverse clinical events and high health care costs [1]. In addition, CKD progression can induce the development of cardiovascular disease [2] and diabetes [3]. In the United States, Medicare costs associated with CKD and end-stage renal disease (ESRD) were reported to total more than $120 billion in 2017 [4]. The number of patients with ESRD is currently projected to increase to between 971,000 and 1,259,000 by 2030, representing a 41.3-83.2% increase in the prevalence from 687,093 patients with ESRD reported in 2015 [5]. In Taiwan, the hospitalization rate for ESRD has gradually increased from 964.1 per 1000 person-years in 2010 to 1037.9 per 1000 person-years in 2018 [6]. The total number of people on dialysis increased by 28.9%, from 65,610 patients in 2010 to 84,615 patients in 2018 [7]. The ESRD prevalence rate has gradually increased among older adults, especially among men 65 years and older.
CKD is typically silent and extremely variable. Moreover, the development of several chronic diseases has been associated with CKD progression, making clinical management particularly challenging. Timely interventions for patients with CKD could improve the quality of medical care and reduce morbidity, mortality, and healthcare costs [8,9]. Therefore, the development of a reliable model able to predict the risk of CKD progression even during early stages is necessary. Traditional statistical methods, such as the Cox proportional hazards model, have been applied to the prediction of renal failure among patients with CKD in prior studies [10,11]. However, machine learning (ML) is increasingly being adopted to assess patients with CKD and consider all possible interactions between various input data [12-14]. A logistic regression (LR) classifier was able to predict the onset of renal replacement therapy (RRT) within 12 months with an area under the receiver operating characteristic curve (AUROC) value of 0.773 [15]. These results provided a screening approach for predicting the risk of RRT within 12 months. In another study, Subas et al. explored the abilities of several ML models for CKD diagnosis, including artificial neural network, support vector machine (SVM), k-nearest neighbor, C4.5 decision tree and random forest (RF) classifiers [16]. The RF model had the highest accuracy (100%), followed by the C4.5 decision tree classifier (99%), when applied to a dataset obtained from the University of California at Irvine ML Repository. Electronic health record (EHR) systems have drawn extensive and consistent attention, and predictive models for clinical disease progression can be developed using features extracted from the EHR [17,18], allowing for the development and validation of predictive models that combine available laboratory data with data obtained from the EHR.
To provide a clinical database for medical assessments and improve healthcare quality, the pre-ESRD patient care and education program was initiated by the National Health Insurance (NHI) administration under the Ministry of Health and Welfare in Taiwan. The program includes CKD patients classified as Stages 3B to 5. In addition, the NHI reimbursement program has been shown to improve health care quality for patients with early-stage CKD (Stages 1 to 3A) [19]. The management program is required to follow specific clinical guidelines for each CKD stage. Patients with early-stage CKD are evaluated for urine protein to creatinine ratio (UPCR), serum creatinine levels, low-density lipoprotein cholesterol (LDL-C) levels and glycated hemoglobin (HbA1c) levels. Patients with pre-ESRD CKD are assessed for hemoglobin, blood urea nitrogen (BUN), serum creatinine levels, albumin levels, serum calcium levels, serum phosphate levels, fasting glucose levels, HbA1c levels, LDL-C levels, uric acid levels, sodium levels, potassium levels, triglyceride levels and UPCR.
In the present study, predictive models for progression from pre-dialysis CKD to ESRD were established for patients with Stages 2 to 5 CKD using various ML algorithms, including LR, RF, extreme gradient boosting (XGBoost), SVM and Gaussian naïve Bayes (GNB). Important classification features were evaluated for their value as high-risk factors to determine the optimal predictive features for use in each of the five models. Optimal model development could promote early CKD diagnosis and improve CKD management, preventing progression to kidney failure. These results could improve access to timely treatments among patients with CKD.

Patient Population
In this retrospective cohort study, a total of 858 patients enrolled in the NHI program and diagnosed with early- (Stages 2 and 3A) or advanced-stage (Stages 3B, 4 and 5) CKD were treated from November 2006 to December 2019 at a branch of the Taipei Veterans General Hospital, including 516 with early-stage CKD and 342 with advanced-stage CKD. Early-stage CKD was defined as Stage 2 if 60 < eGFR < 89.9 mL/min/1.73 m² and as Stage 3A if 45 < eGFR < 59.9 mL/min/1.73 m². Advanced-stage CKD was defined as Stage 3B if 30 < eGFR < 44.9 mL/min/1.73 m², as Stage 4 if 15 < eGFR < 29.9 mL/min/1.73 m² and as Stage 5 if eGFR < 14.9 mL/min/1.73 m². Based on the NHI program requirements, eGFR was calculated using the simplified Modification of Diet in Renal Disease equation. The outcome of this study was ESRD, defined as a diagnosis of renal failure or the initiation of hemodialysis or peritoneal dialysis. None of our study subjects were treated by kidney transplantation. Transfer to other hospitals, death and loss to follow-up were regarded as observation endpoints without reaching ESRD. De-identified data associated with patients diagnosed with Stages 2-5 CKD and enrolled in the two NHI CKD programs were retrieved from the hospital information database. The study was reviewed and approved by the Institutional Review Board (IRB) of Taipei Veterans General Hospital (No. 2020-01-024BC). Due to the use of de-identified data, the need for informed consent was waived by the IRB.
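The staging rule described above can be illustrated with a short sketch. This is not the study's code: the function names `ckd_stage` and `ckd_group` are hypothetical, and the half-open intervals are an assumption made here to close the 0.1 gaps between the quoted bounds (for example, between 59.9 and 60).

```python
# Lower eGFR bounds (mL/min/1.73 m^2) for each stage, highest first.
# Assumption: each stage covers [lower bound, next higher bound), which
# closes the 0.1 gaps left by the bounds quoted in the text.
STAGE_LOWER_BOUNDS = [(60.0, "2"), (45.0, "3A"), (30.0, "3B"), (15.0, "4"), (0.0, "5")]
EARLY_STAGES = {"2", "3A"}


def ckd_stage(egfr):
    """Return the CKD stage label (2, 3A, 3B, 4 or 5) for an eGFR value."""
    if egfr < 0 or egfr >= 90:
        raise ValueError("eGFR outside the range covered by Stages 2-5")
    for lower, stage in STAGE_LOWER_BOUNDS:
        if egfr >= lower:
            return stage


def ckd_group(egfr):
    """Return 'early' for Stages 2-3A and 'advanced' for Stages 3B-5."""
    return "early" if ckd_stage(egfr) in EARLY_STAGES else "advanced"
```

For example, `ckd_group(52.0)` places a patient in the early-stage group (Stage 3A), whereas `ckd_group(28.0)` places a patient in the advanced-stage group (Stage 4).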

Study Design
In total, 858 patients with CKD were enrolled in this study, comprising 119 patients with Stage 2, 397 with Stage 3A, 111 with Stage 3B, 143 with Stage 4 and 88 with Stage 5. The numbers of early-stage CKD patients who progressed to ESRD within 3 and 5 years were 44 (5 with Stage 2 and 39 with Stage 3A) and 50 (6 with Stage 2 and 44 with Stage 3A), respectively. The numbers of advanced-stage CKD patients who progressed to ESRD within 1 and 3 years were 38 (10 with Stage 4 and 28 with Stage 5) and 59 (2 with Stage 3B, 17 with Stage 4 and 40 with Stage 5), respectively. A flow chart of the patient selection and categorization processes is shown in Figure 1.

The dataset was divided into patients with and without ESRD. In addition to the original dataset, the synthetic minority over-sampling technique (SMOTE) was also applied. Randomized data subsets were used for cross-validation (K = 5). The LR, RF, XGBoost, SVM and GNB classifiers were used to determine whether pre-dialysis CKD data could be used to predict progression to ESRD. Shapley additive explanations (SHAP) values were used to select important characteristic factors for predicting CKD progression. Model retraining was performed using the most important risk factors for CKD progression to achieve optimal classification outcomes. The progression of CKD to kidney failure among patients diagnosed with early-stage CKD was followed for up to 5 years and the models were used to identify risk factors for CKD progression within 3 and 5 years. The progression of CKD to kidney failure in patients diagnosed with advanced-stage CKD was followed for up to 3 years and the models were used to identify risk factors for CKD progression within 1 and 3 years. The flow chart of model training and performance evaluation is shown in Figure 2.
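The resampling and cross-validation steps described above can be sketched in plain Python. This is a minimal illustration of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours) and of a five-fold split, not the study's implementation. The function names are hypothetical, stratifying the folds by class is an assumption, and so is oversampling only the training portion of each split, since the text does not state at which point SMOTE was applied.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to the base.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % n_folds].append(index)
    return folds


def balanced_training_set(X, y, train_idx, seed=0):
    """Oversample the minority class (label 1) in the training portion only."""
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    minority = [x for x, label in zip(X_train, y_train) if label == 1]
    n_new = (len(y_train) - len(minority)) - len(minority)
    extra = smote(minority, max(n_new, 0), seed=seed)
    return X_train + extra, y_train + [1] * len(extra)


if __name__ == "__main__":
    # Toy data only: 44 progressors among 516 early-stage patients (the 3-year
    # counts reported above); the two features are random placeholders.
    rng = random.Random(42)
    y = [1] * 44 + [0] * 472
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in y]
    folds = stratified_folds(y, n_folds=5)
    train_idx = [i for fold in folds[1:] for i in fold]  # fold 0 is held out
    X_bal, y_bal = balanced_training_set(X, y, train_idx)
    print(y_bal.count(1), y_bal.count(0))
```

Keeping the held-out fold free of synthetic samples means that the evaluation reflects only real patients; whether the study followed this order is not stated in the text.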

Variables
Each subject's predictor variables and baseline characteristics were obtained during the first clinic visit. The clinical characteristics of CKD were classified into four categories. Demographic variables included age, sex, height and weight. Laboratory data included serum and urine assessments, including eGFR, hemoglobin, hematocrit, creatinine, BUN, sodium, potassium, calcium, phosphorus and UPCR. Comorbid conditions included hypertension, diabetes and cardiovascular diseases. Risk-related biophysical and biochemical data included blood pressure, lipid profile and HbA1c levels. Variables with more than 30% missing values were excluded from the analysis. Missing values for the remaining variables were replaced using multiple imputation. Five imputed datasets were created using the multivariate imputation via chained equations package in R. All baseline characteristics and laboratory variables were obtained from the NHI pre-ESRD Patient Care and Education Program and the NHI Reimbursement Plans that Improve Health Care Quality of Early-Stage CKD Program implemented by the NHI.
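The 30% missingness rule described above can be illustrated as follows. This is a sketch under stated assumptions: the helper name `drop_sparse_variables` is hypothetical, missing values are represented as `None`, and the subsequent multiple imputation step (performed in R in the study) is not reproduced here.

```python
def drop_sparse_variables(records, max_missing=0.30):
    """Split variable names into (kept, dropped) by their fraction of missing values.

    records: list of dicts mapping variable name -> value, with None for missing.
    A variable is dropped only when its missing fraction exceeds max_missing.
    """
    if not records:
        raise ValueError("no records supplied")
    names = sorted({name for row in records for name in row})
    kept, dropped = [], []
    for name in names:
        missing = sum(1 for row in records if row.get(name) is None)
        fraction = missing / len(records)
        (dropped if fraction > max_missing else kept).append(name)
    return kept, dropped


if __name__ == "__main__":
    # Toy records only; the values are placeholders, not study data.
    toy = [
        {"hemoglobin": 12.1, "uric_acid": None},
        {"hemoglobin": 11.4, "uric_acid": None},
        {"hemoglobin": None, "uric_acid": 6.8},
        {"hemoglobin": 10.9, "uric_acid": 7.2},
    ]
    print(drop_sparse_variables(toy))
```

In the toy example, hemoglobin (25% missing) is retained for imputation, whereas uric acid (50% missing) is excluded.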

Statistical Analysis
In this study, baseline demographic and laboratory data from the first clinic visit at which CKD was diagnosed were used to train the models. Due to differences in the collection of clinical characteristics, clinical data for Stages 2-3A CKD (early stage) were processed separately from clinical data for Stages 3B-5 CKD (advanced stage). Clinical indicators, including age, eGFR, serum creatinine, urine creatinine, LDL-C, HbA1c and UPCR, were associated with a significant risk of ESRD among patients diagnosed with Stages 2-5 CKD. In addition to these risk factors, the indicators associated with a significant risk for ESRD among Stages 3A-5 CKD included uric acid, albumin, fasting plasma glucose (FPG), triglyceride, cholesterol, hemoglobin, hematocrit, BUN, sodium, potassium, calcium and phosphorus. The detailed demographic characteristics of the cohort are listed in Table 1. Five classification models were developed and the predictive performance of each model for determining the progression risk at various stages of pre-dialysis CKD was analyzed. The predictive performances of the five models were evaluated using an AUROC analysis, and the sensitivity, specificity, accuracy, precision, F1 score and negative predictive value (NPV) were calculated. A cutoff value was identified based on the AUROC analysis to provide optimal sensitivity and specificity. Important features were selected and applied to determine the optimal model for predicting ESRD progression in patients with early- and advanced-stage CKD.
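The evaluation measures named above can be written out explicitly. The sketch below computes them from binary labels and predicted scores; the function names are hypothetical. The text does not state how the cutoff was chosen, so the use of Youden's J (sensitivity + specificity - 1) is an assumption made here for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, precision, F1 and NPV from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "npv": ratio(tn, tn + fn),
    }


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def youden_cutoff(y_true, scores):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    A sample is predicted positive when its score is >= the cutoff.
    """
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        predicted = [1 if s >= cutoff else 0 for s in scores]
        m = classification_metrics(y_true, predicted)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

The pairwise formulation of the AUROC is quadratic in the number of samples, which is acceptable for a cohort of this size; a rank-based formulation would be preferable for much larger datasets.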

Results
The characteristics of patients with early-stage CKD (Stages 2 to 3A) and advanced-stage CKD (Stages 3B to 5) are shown in Table 1. The majority of patients were approximately 80 years old and most were men. High proportions of patients with CKD had comorbid diabetes and hypertension, particularly those with advanced-stage CKD. For example, the proportions of patients with hypertension among those with Stages 3B, 4 and 5 CKD were 75.7%, 74.1% and 79.5%, respectively. The proportions of patients with Stages 4 and 5 CKD who progressed to ESRD were higher than those of patients with Stages 2 and 3 CKD. Serum creatinine, urine creatinine, UPCR and HbA1c levels increased from Stages 2 to 5, whereas eGFR levels decreased as CKD progressed from early stages to advanced stages.
The predictive performances of the five tested models were evaluated using the AUROC analysis and other discrimination indicators. Significant performance differences were identified between the five models. The performances of each model in predicting the progression of early- and advanced-stage CKD to ESRD are shown in Tables 2 and 3, respectively. For early-stage CKD, the AUROC value for the RF classifier using SMOTE was 0.97 for predicting progression to ESRD within 3 years. Both RF and XGBoost classifiers with SMOTE resulted in AUROC values of 0.98 when predicting progression to ESRD within 5 years. For advanced-stage CKD, both RF and XGBoost classifiers with SMOTE resulted in AUROC values of 0.99 for the prediction of progression to ESRD within 1 year. The AUROC value of the RF classifier with SMOTE was 0.97 for the prediction of progression to ESRD within 3 years. The accuracy, specificity and sensitivity of the RF classifier were all greater than 90%. The AUROC plots of the RF classifier with SMOTE for predicting the progression of both early- and advanced-stage CKD are shown in Figure 3. The best performances among all five models were obtained using the RF classifier, which is suitable for predicting the progression of CKD to ESRD at all CKD stages. To assess the contributions of various features to the prediction of ESRD progression, the SHAP value method was applied. For the progression of early-stage CKD to ESRD within 3 and 5 years, 13 features were analyzed by SHAP, as shown in Figure 4a,b. The results showed that urine creatinine and eGFR levels are the most influential features. A lower urine creatinine level is associated with a lower risk of progression to ESRD, whereas a lower eGFR level is associated with a higher risk of progression to ESRD. In addition, for the progression of advanced-stage CKD to ESRD within 1 and 3 years, 24 features were analyzed by SHAP, as shown in Figure 4c,d.
The results indicated that serum creatinine level was the most important predictive feature in advanced-stage CKD. A lower serum creatinine level is associated with a lower risk of progression to ESRD. The second-most important features are hematocrit and urine creatinine, which have predictive value for progression within 1 and 3 years, respectively. A negative correlation between progression and hematocrit level was observed for progression within 1 year. Features associated with the progression of advanced-stage CKD to ESRD within 1 year showed more pronounced positive and negative associations with the risk of progression than the features associated with progression in 3 or 5 years for either advanced- or early-stage CKD.
According to the SHAP analysis, the top six features were used to retrain the model for predicting the progression of early-stage CKD within 5 years, as shown in Figure 5a,b. In addition, the top 10 features were used to retrain the model for predicting the progression of advanced-stage CKD within 1 year, as shown in Figure 5c,d. The AUROC values for the RF classifier with SMOTE were 0.96 for early-stage CKD and 0.97 for advanced-stage CKD. These results indicated that the RF classifier was the best model for predicting the risk of progression among patients with CKD, especially for predicting progression within 1 year in patients with advanced-stage CKD.
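The idea behind the SHAP feature attributions reported above can be illustrated with an exact Shapley value computation on a toy model. This is not the procedure used in the study, whose SHAP implementation details are not given in the text: the sketch enumerates all feature coalitions by brute force, replaces absent features with a single baseline value (a simplification), and uses a hypothetical risk function `toy_risk` whose coefficients are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) to each feature."""
    n = len(x)

    def value(present):
        # Features outside `present` are replaced by their baseline value.
        mixed = [x[j] if j in present else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                phi[i] += weight * (value(present | {i}) - value(present))
    return phi


def toy_risk(features):
    """Hypothetical risk score; the coefficients are invented for illustration."""
    serum_creatinine, egfr, hematocrit = features
    return 0.30 * serum_creatinine - 0.01 * egfr - 0.005 * hematocrit * serum_creatinine


if __name__ == "__main__":
    patient = [4.0, 20.0, 30.0]    # placeholder values, not study data
    reference = [1.5, 50.0, 40.0]  # placeholder baseline
    for name, phi in zip(["serum creatinine", "eGFR", "hematocrit"],
                         shapley_values(toy_risk, patient, reference)):
        print(f"{name}: {phi:+.3f}")
```

The attributions sum to the difference between the prediction for the patient and the prediction for the baseline, which is the property that allows features to be ranked by their mean absolute contribution. The enumeration is exponential in the number of features, so it is practical only for small toy examples such as this one.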

Discussion
In the present study, we evaluated several models for the ability to predict the progression of Stages 2-5 pre-dialysis CKD to ESRD. Based on our results, the SMOTE method can significantly improve the abilities of the five models to predict CKD progression. The RF classifier showed the highest AUROC of 0.99 for predicting progression from advanced-stage CKD to ESRD within 1 year while achieving a sensitivity of 0.96, a specificity of 0.91, an accuracy of 0.93 and a precision of 0.90. Similarly, the XGBoost classifier showed an AUROC of 0.99, a sensitivity of 0.96, a specificity of 0.94, an accuracy of 0.95 and a precision of 0.93 for predicting the progression of advanced-stage CKD to ESRD within 1 year. Compared with the other classifiers, both the RF and XGBoost classifiers are more suitable for the early prediction of progression in patients with advanced-stage CKD.
In patients with early-stage CKD, the performance of the RF classifier was better than the performance of the XGBoost classifier, including better sensitivity, specificity, accuracy, precision and F1 scores. The AUROC values for the RF classifiers were 0.97 and 0.98 for predicting progression to ESRD within 3 and 5 years, respectively. Therefore, the RF classifier demonstrated the best performance for predicting progression in patients with both early- and advanced-stage pre-dialysis CKD. In a different study, Ravindra et al. reported an accuracy of 0.94 achieved by an SVM neural network for distinguishing between CKD and non-CKD [20]. In addition to classifier selection, feature selection is important for the performance of ML algorithms. Dulhare and Ayesha [21] indicated that the selection of suitable features or predictors is crucial for training ML classifiers. Their results showed that the GNB classifier had optimal performance when operated by a one-rule attribute selector.
In our study, we ranked the features associated with early- and advanced-stage CKD according to the SHAP value. The primary impact of urine creatinine can be observed for both early- and advanced-stage CKD. The results indicated that low urine creatinine levels are associated with a low risk of progression to ESRD, whereas eGFR and systolic blood pressure are risk factors for progression that can be observed during early-stage CKD. These results demonstrated that a low eGFR level is associated with a high risk of ESRD. High systolic blood pressure is an important risk factor that can be identified in early-stage CKD. Seyedzadeh et al. reported that ESRD was concurrent with several clinical symptoms, among which hypertension (52.3%) was the most commonly identified symptom in 128 patients [22]. In our study, a high prevalence of hypertension (79.5%) was observed among Stage 5 CKD patients. However, a negative association was identified between systolic blood pressure and ESRD in advanced-stage CKD based on the SHAP value. A high impact of serum creatinine was also identified for advanced-stage CKD. Therefore, in addition to serum creatinine, urine creatinine, eGFR and blood pressure are all highly associated with ESRD and each factor has different impacts at different CKD stages. Urine creatinine appears to be a parameter of special relevance for predicting the evolution of kidney damage. Nonetheless, it should be considered that this parameter may be altered by urine dilution at different times of sample collection. Our study suggests that multiple parameters are needed simultaneously when using the prediction model. The calculation of eGFR has been offered as a practical and easy approach for converting serum creatinine values, as reviewed by Mula-Abed [23]. Further study of the relationship between eGFR and creatinine will improve the qualitative estimation of the interaction between eGFR and creatinine and their impacts on ESRD in CKD patients.
Additionally, our results showed that high serum phosphate is a common complication in patients with ESRD. Seyedzadeh et al. reported a large impact for serum phosphate in approximately 50% of patients with ESRD, associated with renal osteodystrophy [22]. Our study found increased serum phosphate levels in patients with Stages 3B to 5 CKD, which is consistent with the previous study.
Anemia and low serum albumin are also common complications of CKD. Our findings indicated that low hematocrit and albumin levels are associated with an increased risk of ESRD among patients with advanced-stage CKD. Anemia is strongly associated with poor kidney function in CKD patients [24]. A previous study reported prevalence rates of anemia among patients with CKD of 42%, 33%, 48%, 71% and 82% for Stages 1, 2, 3, 4 and 5, respectively, in Saudi Arabia [25]. Decreased serum albumin levels were also associated with a decline in eGFR and may be related to proteinuria or underlying inflammation [26]. The criteria for hyperuricemia include an increased uric acid level, which was also observed to increase from Stages 3B to 5 in the current study. These results implied that high uric acid levels in patients with advanced-stage CKD might be associated with impending renal failure. Past work indicated a 'J-shaped' association between uric acid levels and mortality in hemodialysis patients [27]. Thus, maintaining uric acid levels in patients with advanced-stage CKD within normal levels should be a clinical goal and uric acid levels should be monitored. Oda and Kawai reported that LDL-C levels were significantly higher in Stages 2 and 3 CKD than in Stage 1 CKD in a study including 3897 patients [28]. In our study, the LDL-C levels in patients with Stages 3B and 4 CKD were higher than those in patients with early-stage CKD, which is consistent with the findings of the previous study and provides additional information for more advanced CKD stages. Due to collinearity and the reduced impacts of hypertension and diabetes in our models, these two comorbidities were excluded from our models. HbA1c levels are a well-known indicator of diabetes control and showed a positive impact on the progression of advanced-stage CKD. Based on our results, the impacts of risk factors differ across different stages of CKD. Thus, specific management strategies are necessary for different stages, making the early diagnosis of CKD particularly important.
To assess the optimal predictive model for the progression of pre-dialysis CKD to ESRD, the top six features (eGFR, blood pressure, UPCR, serum creatinine, urine creatinine and LDL-C) identified for early-stage CKD and the top 10 features (serum creatinine, uric acid, urine creatinine, calcium, LDL-C, hemoglobin, HbA1c, cholesterol, phosphorus and triglyceride) identified for advanced-stage CKD were used to retrain all five models. The RF classifier demonstrated the optimal risk prediction performance for the progression of pre-dialysis CKD to ESRD. The AUROC values for the RF classifier with SMOTE were 0.96 for the progression of early-stage CKD within 5 years and 0.97 for the progression of advanced-stage CKD within 1 year. A slight difference (0.98 to 0.96) was observed for the RF classifier when using only six features compared with using all features for the prediction of early-stage CKD progression. Similarly, the AUROC of the RF classifier using only the top 10 features showed a slight decline (0.99 to 0.97) compared with the RF classifier using all features when predicting advanced-stage CKD progression. The results indicated that the RF classifier could be used with specific features to predict ESRD risk in patients with pre-dialysis CKD. This approach can help clinicians understand the risk factors associated with ESRD and the progression of patients with CKD at different stages. An effective predictive model can help medical teams quickly and easily identify the key factors contributing to the deterioration of renal function, track the rate of renal function decline and modify the care goals on a rolling basis. In addition, predicting the time of progression to ESRD can give care providers, patients and relatives early warning of the dangers and complications of ESRD. Certain strategies, such as stricter diet control, treatment of electrolyte imbalances and acidemia, improvement of anemia and uremia, or an early decision on dialysis mode, can then be implemented in time to reduce the impact on the body and on life.
Our approach provides a reference for clinical strategy. Nonetheless, there are several limitations to this study. First, this cohort consisted of a relatively small sample, so the model performance may have been affected by the training data. Second, the laboratory data were limited by geography and subject demographics, limiting the generalizability of these findings to the wider population. Third, the individual laboratory records may have changed over time. Because our models were based on baseline data, our study cannot present the trajectory of disease progression. A longitudinal model is likely to better reflect associations between risk factors and disease progression and should be evaluated in future studies.

Conclusions
The present study demonstrated a reliable ML method for predicting the risk of progression to eventual ESRD among patients with Stages 2-5 CKD. The RF classifier with SMOTE showed the best performance for the early prediction of CKD prognosis. In addition, the ML classifiers retained high performance when the analysis was limited to the predominant features. These findings indicate that the RF classifier is suitable for risk assessments among patients with pre-dialysis CKD, and the results could be advantageous for patient screening initiatives.

Institutional Review Board Statement:
The study protocol used was reviewed and approved by the Institutional Review Board of Taipei Veterans General Hospital (No. 2020-01-024BC).

Informed Consent Statement: Not applicable.
Data Availability Statement: The datasets generated and analyzed during the current study are not publicly available due to privacy/ethical restrictions but are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare that they have no conflict of interest.