Predicting the Risk of Chronic Kidney Disease (CKD) Using Machine Learning Algorithm

Faculty of Software Information Science, Iwate Prefectural University, Iwate 020-0693, Japan
Sendai Foundation for Applied Information Sciences, Sendai 980-0012, Japan
Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(1), 202
Received: 11 November 2020 / Revised: 7 December 2020 / Accepted: 23 December 2020 / Published: 28 December 2020
Background: Creatinine is a blood metabolite that is strongly correlated with glomerular filtration rate (GFR). Because GFR is difficult to measure directly, the creatinine value is used to estimate GFR and, from it, the stage of chronic kidney disease (CKD). Adding a creatinine test to routine health examinations could detect CKD. Because each additional test item raises the cost of an examination, creatinine testing is not included in routine health examinations in many countries. An algorithm that evaluates the risk of CKD from common test results, without a creatinine test, would increase the chance of early detection and treatment. Methods: In this study, we used open-source data containing 1 million samples. These data contain 23 health-related features, including common diagnostic test results, provided by the National Health Insurance Sharing Service (NHISS). A low GFR indicates possible CKD. As is commonly accepted in the medical community, a GFR of 60 mL/min is used as the threshold, below which a subject is considered to have CKD. The first step builds a regression model to predict the creatinine value from the 23 features; the predicted creatinine value is then combined with the original 23 features to evaluate the risk of CKD. We show by simulation that the proposed method achieves better prediction results than direct prediction from the 23 features. The data are extremely unbalanced with respect to the target variable, creatinine. We used an undersampling method and proposed a new cost-sensitive mean-squared error (MSE) loss function to deal with this problem. Regarding model selection, this work used three machine learning models: a bagging tree model, Random Forest; a boosting tree model, XGBoost; and a neural-network-based model, ResNet. To improve the result of the creatinine predictor, we averaged the results from eight predictors, a method known as ensemble learning.
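As an illustration of the two ideas named above, the following is a minimal sketch of a cost-sensitive MSE and of ensemble averaging. The abstract does not define the proposed loss, so the weighting scheme here (inverse frequency of the target-value bin) is an assumption chosen only to show the general principle; the function names `cost_sensitive_mse` and `ensemble_average` and the parameter `n_bins` are likewise hypothetical.

```python
import numpy as np


def cost_sensitive_mse(y_true, y_pred, n_bins=10):
    """Weighted MSE in which rare target values count more.

    ASSUMPTION: weights are the inverse frequency of the target-value bin.
    This is an illustrative choice, not the loss defined in the paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Split the target range into equal-width bins.
    edges = np.histogram_bin_edges(y_true, bins=n_bins)
    # Use interior edges only, so bin indices run from 0 to n_bins - 1.
    bin_idx = np.digitize(y_true, edges[1:-1])
    counts = np.bincount(bin_idx, minlength=n_bins)

    # Every sample lies in a non-empty bin, so the division is safe.
    weights = 1.0 / counts[bin_idx]
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return float(np.sum(weights * (y_true - y_pred) ** 2))


def ensemble_average(predictions):
    """Average the outputs of several predictors, one row per predictor."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)


if __name__ == "__main__":
    # Four common target values and one rare, badly predicted one.
    y_true = [1, 1, 1, 1, 5]
    y_pred = [1, 1, 1, 1, 3]
    plain_mse = float(np.mean((np.array(y_true) - np.array(y_pred)) ** 2))
    print("plain MSE:", plain_mse)
    print("cost-sensitive MSE:", cost_sensitive_mse(y_true, y_pred))
    print("ensemble:", ensemble_average([[1, 2], [3, 4]]))
```

In this toy case the single rare sample carries half of the total weight, so its error dominates the weighted loss, whereas the plain MSE dilutes it across the four well-predicted common samples.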
Finally, the predicted creatinine and the original 23 features are used to predict the risk of CKD. Results: We optimized the R-squared (R2) value to select the appropriate undersampling strategy and regression model for the creatinine-prediction stage. The ensemble model achieved the best performance, with an R2 of 0.5590. Of the 23 features, the six that most strongly affect the creatinine value are sex, age, hemoglobin, urine protein level, waist circumference, and smoking habit. Using the predicted value of creatinine, an area under the receiver operating characteristic curve (AUC) of 0.76 is achieved when classifying samples for CKD. Conclusions: Using commonly available health parameters, the proposed system can assess the risk of CKD for public health. High-risk subjects can be screened and advised to take a creatinine test for further confirmation. In this way, we can reduce the impact of CKD on public health and facilitate early detection for many people where blanket creatinine testing is not available.
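For readers unfamiliar with the two metrics reported above, the following sketch shows how R2 and AUC are conventionally computed. It uses the standard textbook definitions and is not taken from the authors' code; the function names `r_squared` and `roc_auc` are hypothetical, and the pairwise AUC computation is suitable only for small samples because it builds a positives-by-negatives matrix.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the study.
    print(r_squared([1, 2, 3], [2, 2, 2]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An R2 of 0 corresponds to always predicting the mean of the target, and an AUC of 0.5 corresponds to random ranking, which gives a baseline for reading the reported values of 0.5590 and 0.76.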
Keywords: chronic kidney disease; creatinine; ensemble learning; regression; unbalanced data

MDPI and ACS Style

Wang, W.; Chakraborty, G.; Chakraborty, B. Predicting the Risk of Chronic Kidney Disease (CKD) Using Machine Learning Algorithm. Appl. Sci. 2021, 11, 202.

