Type 2 Diabetes Prediction Model in China: A Five-Year Systematic Review
Abstract
1. Introduction
2. Methods
2.1. Search Strategy
2.2. Inclusion/Exclusion Criteria
2.3. Risk of Bias and Applicability Assessment
2.4. Data Synthesis
3. Results
3.1. Literature Screening Process and Results
3.2. Basic Characteristics of the Included Literature
3.3. Basic Features Included in the Prediction Model
3.3.1. Establishment and Validation of the Model
3.3.2. Performance of Predictive Factors in the Model and Research Limitations
3.4. Literature Quality Assessment
4. Discussion
4.1. Homogenization of Predictors
4.2. Treatment of Continuous Variables and High Risk of Bias
4.3. Model Validation and Application
4.4. Comparison of Traditional Statistical Methods and Machine Learning Prediction Methods
4.5. Contrast with TRIPOD Reporting Standards
4.6. Early Intervention Strategies
4.7. Perspective for Clinical Practice
5. Implications for Future Research
6. Limitations
7. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
ALT | Alanine Aminotransferase |
ANN | Artificial Neural Network |
AST | Aspartate Aminotransferase |
AUC | Area Under the Curve |
BMI | Body Mass Index |
BPNN | Back Propagation Neural Network |
BUN | Blood Urea Nitrogen |
C4.5 | C4.5 Decision Tree Algorithm |
CAD | Coronary Artery Disease |
CART | Classification and Regression Tree |
CB | Calibration Belt |
CHOL | Cholesterol |
COX | Cox Proportional Hazards Model |
CR | Creatinine |
CREA | Creatinine |
DBP | Diastolic Blood Pressure |
DL | Deep Learning |
DNN | Deep Neural Network |
DT | Decision Tree |
ECG | Electrocardiogram |
EH | Essential Hypertension |
FBG | Fasting Blood Glucose |
FPG | Fasting Plasma Glucose |
GLU | Glucose |
HB | Hemoglobin |
HbA1c | Hemoglobin A1c |
HDL | High-Density Lipoprotein |
HDL-C | High-Density Lipoprotein Cholesterol |
HTN | Hypertension |
IDF | International Diabetes Federation |
KNN | K-Nearest Neighbors |
LASSO | Least Absolute Shrinkage and Selection Operator |
LDL-C | Low-Density Lipoprotein Cholesterol |
LGBM | Light Gradient Boosting Machine |
LR | Logistic Regression |
MCHC | Mean Corpuscular Hemoglobin Concentration |
ML | Machine Learning |
MLP | Multilayer Perceptron |
PDM | Prediabetes Mellitus |
PLT | Platelets |
PRISMA | Preferred Reporting Items for Systematic Reviews and Meta-Analyses |
RF | Random Forest |
RFE | Recursive Feature Elimination |
SBP | Systolic Blood Pressure |
SCR | Serum Creatinine |
SHAP | SHapley Additive exPlanations |
SS | Salt Sensitivity |
SUA | Serum Uric Acid |
SVM | Support Vector Machine |
T2D | Type 2 Diabetes |
TabNet | Tabular Neural Network |
TBIL | Total Bilirubin |
TC | Total Cholesterol |
TG | Triglycerides |
VC | Variable Combination |
VIF | Variance Inflation Factor |
WBC | White Blood Cell |
WC | Waist Circumference |
WHR | Waist-to-Hip Ratio |
XGB | Extreme Gradient Boosting |
SDOH | Social Determinants of Health |
EPV | Events Per Variable |
References
- Heald, A.H.; Stedman, M.; Davies, M.; Livingston, M.; Alshames, R.; Lunt, M.; Rayman, G.; Gadsby, R. Estimating life years lost to diabetes: Outcomes from analysis of National Diabetes Audit and Office of National Statistics data. Cardiovasc. Endocrinol. Metab. 2020, 9, 183–185.
- American Diabetes Association. Standards of medical care in diabetes—2020 abridged for primary care providers. Clin. Diabetes: A Publ. Am. Diabetes Assoc. 2020, 38, 10–38.
- International Diabetes Federation. IDF Diabetes Atlas, 11th ed.; International Diabetes Federation: Brussels, Belgium, 2025; Available online: https://diabetesatlas.org/ (accessed on 2 August 2025).
- Cavan, D. Why screen for type 2 diabetes? Diabetes Res. Clin. Pract. 2016, 121, 215–217.
- Li, G.; Zhang, P.; Wang, J.; Gregg, E.W.; Yang, W.; Gong, Q.; Li, H.; Li, H.; Jiang, Y.; An, Y.; et al. The long-term effect of lifestyle interventions to prevent diabetes in the China Da Qing Diabetes Prevention Study: A 20-year follow-up study. Lancet 2008, 371, 1783–1789.
- Lowe, W.L.; Bain, J.R. “Prediction is very hard, especially about the future”: New biomarkers for type 2 diabetes? Diabetes 2013, 62, 1384–1385.
- Janghorbani, M.; Adineh, H.; Amini, M. Evaluation of the Finnish Diabetes Risk Score (FINDRISC) as a screening tool for the metabolic syndrome. Rev. Diabet. Stud. 2013, 10, 283–292.
- Petridis, P.D.; Kristo, A.S.; Sikalidis, A.K.; Kitsas, I.K. A review on trending machine learning techniques for type 2 diabetes mellitus management. Informatics 2024, 11, 70.
- Li, W.; Liu, X.; Liu, Z.; Xing, Q.; Liu, R.; Wu, Q.; Hu, Y.; Zhang, J. The signaling pathways of selected traditional Chinese medicine prescriptions and their metabolites in the treatment of diabetic cardiomyopathy: A review. Front. Pharmacol. 2024, 15, 1416403.
- Nazirun, N.N.N.; Wahab, A.A.; Selamat, A.; Fujita, H.; Krejcar, O.; Kuca, K. Prediction models for type 2 diabetes progression: A systematic review. IEEE Access 2024, 12, 161595–161619.
- Bini, S.A. Artificial intelligence, machine learning, deep learning, and cognitive computing: What do these terms mean and how will they impact health care? J. Arthroplast. 2018, 33, 2358–2361.
- Negi, A.; Jaiswal, V. A first attempt to develop a diabetes prediction method based on different global datasets. In 2016 Fourth International Conference on Parallel, Distributed and Grid Computing (PDGC); IEEE: New York, NY, USA, 2016; pp. 237–241.
- Shaik, T.; Tao, X.; Higgins, N.; Li, L.; Gururajan, R.; Zhou, X.; Acharya, U.R. Remote patient monitoring using artificial intelligence: Current state, applications, and challenges. WIREs Data Min. Knowl. Discov. 2023, 13, e1485.
- Bell, M.L.; Fiero, M.; Horton, N.J.; Hsu, C.-H. Handling missing data in RCTs; a review of the top medical journals. BMC Med. Res. Methodol. 2014, 14, 118.
- Asgari, S.; Khalili, D.; Hosseinpanah, F.; Hadaegh, F. Prediction models for type 2 diabetes risk in the general population: A systematic review of observational studies. Int. J. Endocrinol. Metab. 2021, 19, e109206.
- Sung, K.; Lee, S. Social determinants of health and type 2 diabetes in Asia. J. Diabetes Investig. 2025, 16, 971–983.
- Hu, G.; Lin, L.; Hu, X.; Zheng, Y.; Liu, X.; Xu, Z.; He, Y.; Zhang, Y. Machine learning-based diagnosis of type 2 diabetes mellitus using social determinants of health. Mol. Cell. Biomech. 2025, 22, 1461.
- Lan, X.; Ji, X.; Zheng, X.; Ding, X.; Mou, H.; Lu, S.; Ye, B. Socio-demographic and clinical determinants of self-care in adults with type 2 diabetes: A multicenter cross-sectional study in Zhejiang province, China. BMC Public Health 2025, 25, 397.
- Zhao, Y.; Li, H.-F.; Wu, X.; Li, G.-H.; Golden, A.R.; Cai, L. Rural-urban differentials of prevalence and lifestyle determinants of pre-diabetes and diabetes among the elderly in southwest China. BMC Public Health 2023, 23, 603.
- Chang, G.; Tian, S.; Luo, X.; Xiang, Y.; Cai, C.; Zhu, R.; Cai, H.; Yang, H.; Gao, H. Hypoglycemic effects and mechanisms of polyphenols from Myrica rubra pomace in type 2 diabetes (db/db) mice. Mol. Nutr. Food Res. 2025, 69, e202400523.
- Dong, W.; Wan, E.; Bedford, L.; Wu, T.; Wong, C.; Tang, E.; Lam, C. Prediction models for the risk of cardiovascular diseases in Chinese patients with type 2 diabetes mellitus: A systematic review. Public Health 2020, 186, 144–156.
- Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
- Alam, A.; Sohel, A.; Hasan, K.M.; Islam, M.A. Machine learning and artificial intelligence in diabetes prediction and management: A comprehensive review of models. J. Next-Gen Eng. Syst. 2024, 1, 107–124.
- Sun, Y.; Gregersen, H.; Yuan, W. Chinese health care system and clinical epidemiology. Clin. Epidemiol. 2017, 9, 167–178.
- Chen, R.; Wang, S.F.; Zhou, J.C.; Sun, F.; Wei, W.W.; Zhan, S.Y. Introduction of the Prediction model Risk Of Bias Assessment Tool: A tool to assess risk of bias and applicability of prediction model studies. Chin. J. Epidemiol. 2020, 41, 776–781.
- Gusenbauer, M.; Gauster, S.P. How to search for literature in systematic reviews and meta-analyses: A comprehensive step-by-step guide. Technol. Forecast. Soc. Change 2025, 212, 123833.
- Lorenzetti, D.L.; Ghali, W.A. Reference management software for systematic reviews and meta-analyses: An exploration of usage and usability. BMC Med. Res. Methodol. 2013, 13, 141.
- Xu, W.; Zhou, Y.; Jiang, Q.; Fang, Y.; Yang, Q. Risk prediction models for diabetic nephropathy among type 2 diabetes patients in China: A systematic review and meta-analysis. Front. Endocrinol. 2024, 15, 1407348.
- Bozkurt, S.; Cahan, E.M.; Seneviratne, M.G.; Sun, R.; Lossio-Ventura, J.A.; Ioannidis, J.P.A.; Hernandez-Boussard, T. Reporting of demographic data and representativeness in machine learning models using electronic health records. J. Am. Med. Inform. Assoc. 2020, 27, 1878–1884.
- Xu, T.; Yu, D.; Zhou, W.; Yu, L. A nomogram model for the risk prediction of type 2 diabetes in healthy eastern China residents: A 14-year retrospective cohort study from 15,166 participants. EPMA J. 2022, 13, 397–405.
- Lin, Y.; Shen, Y.; He, R.; Wang, Q.; Deng, H.; Cheng, S.; Liu, Y.; Li, Y.; Lu, X.; Shen, Z. A novel predictive model for optimizing diabetes screening in older adults. J. Diabetes Investig. 2024, 15, 1403–1409.
- Wang, S.; Chen, R.; Wang, S.; Kong, D.; Cao, R.; Lin, C.; Luo, L.; Huang, J.; Zhang, Q.; Yu, H.; et al. Comparative study on risk prediction model of type 2 diabetes based on machine learning theory: A cross-sectional study. BMJ Open 2023, 13, e069018.
- Liu, H.; Dong, S.; Yang, H.; Wang, L.; Liu, J.; Du, Y.; Liu, J.; Lyu, Z.; Wang, Y.; Jiang, L.; et al. Comparing the accuracy of four machine learning models in predicting type 2 diabetes onset within the Chinese population: A retrospective study. J. Int. Med. Res. 2024, 52, 3000605241253786.
- Yang, J.; Liu, D.; Du, Q.; Zhu, J.; Lu, L.; Wu, Z.; Zhang, D.; Ji, X.; Zheng, X. Construction of a 3-year risk prediction model for developing diabetes in patients with pre-diabetes. Front. Endocrinol. 2024, 15, 1410502.
- Tong, Y.-T.; Gao, G.-J.; Chang, H.; Wu, X.-W.; Li, M.-T. Development and economic assessment of machine learning models to predict glycosylated hemoglobin in type 2 diabetes. Front. Pharmacol. 2023, 14, 1216182.
- Shao, X.; Wang, Y.; Huang, S.; Liu, H.; Zhou, S.; Zhang, R.; Yu, P.; Hu, C. Development and validation of a prediction model estimating the 10-year risk for type 2 diabetes in China. PLoS ONE 2020, 15, e0237936.
- Jiang, L.; Xia, Z.; Zhu, R.; Gong, H.; Wang, J.; Li, J.; Wang, L. Diabetes risk prediction model based on community follow-up data using machine learning. Prev. Med. Rep. 2023, 35, 102358.
- Li, L.; Cheng, Y.; Ji, W.; Liu, M.; Hu, Z.; Yang, Y.; Wang, Y.; Zhou, Y. Machine learning for predicting diabetes risk in western China adults. Diabetol. Metab. Syndr. 2023, 15, 165.
- Wang, Y.; Zhang, Y.; Wang, K.; Su, Y.; Zhuge, J.; Li, W.; Wang, S.; Yao, H. Nomogram model for screening the risk of type II diabetes in western Xinjiang, China. Diabetes, Metab. Syndr. Obesity: Targets Ther. 2021, 14, 3541–3553.
- Dong, W.; Tse, T.Y.E.; Mak, L.I.; Wong, C.K.H.; Wan, Y.F.E.; Tang, H.M.E.; Chin, W.Y.; Bedford, L.E.; Yu, Y.T.E.; Ko, W.K.W.; et al. Non-laboratory-based risk assessment model for case detection of diabetes mellitus and pre-diabetes in primary care. J. Diabetes Investig. 2022, 13, 1374–1386.
- Liu, Q.; Zhang, M.; He, Y.; Zhang, L.; Zou, J.; Yan, Y.; Guo, Y. Predicting the risk of incident type 2 diabetes mellitus in Chinese elderly using machine learning techniques. J. Pers. Med. 2022, 12, 905.
- Hu, H.; Wang, J.; Han, X.; Li, Y.; Miao, X.; Yuan, J.; Yang, H.; He, M. Prediction of 5-year risk of diabetes mellitus in relatively low risk middle-aged and elderly adults. Acta Diabetol. 2020, 57, 63–70.
- Yang, H.; Yuan, L.; Wu, J.; Li, X.; Long, L.; Teng, Y.; Feng, W.; Lyu, L.; Xu, B.; Ma, T.; et al. Construction of a predictive model for diabetes mellitus type 2 in middle-aged and elderly populations based on the medical checkup data of National Basic Public Health Service. Sichuan Da Xue Xue Bao. Yi Xue Ban = J. Sichuan University. Med. Sci. Ed. 2024, 55, 662–670.
- Long, X.; Hua, H.; Wu, Y.; Zhang, W.; Yin, C.; Li, N.; Cheng, N. Construction and validation of a risk prediction model for diabetes incidence. J. Lanzhou Univ. (Med. Ed.) 2024, 50, 70–78.
- Ma, W.; Wang, K.; Yu, B.; Feng, C.; Ji, J. Comparative study of diabetes risk prediction models based on physical examination data. Mod. Inf. Technol. 2020, 4, 72–75.
- Miao, Q.; Zhu, Y. Diabetes prediction model based on PSO-FWSVM. Comput. Digit. Eng. 2020, 48, 993–998.
- Ma, Y.; Che, Q.; Zheng, Q.; Chen, S.; Zhou, Z.; Yang, J.; Wu, Y.; Wu, T.; Hu, Y.; Zhang, L.; et al. Common evaluation methods of prediction model for risk of type 2 diabetes mellitus. Chin. J. Prev. Control. Chronic Dis. 2020, 28, 94–100.
- Ouyang, P.; Li, X.; Leng, F.; Lai, X.; Zhang, H.; Yan, C.; Wang, C.; Bai, Y.; Xing, Z.; Liu, X.; et al. Application of machine learning algorithms in predicting diabetes risk in a physical examination population. Chin. J. Dis. Control. Prev. 2021, 25, 849–853.
- Wu, H.; Chen, S.; Chen, Z.; Yang, Y.; Zeng, C.; Wu, S.; Su, X. Study on diabetes prediction model based on LightGBM model. China Health Stand. Manag. 2023, 14, 64–67.
- Yang, S. Study on key biological indicators of diabetes based on statistical tests. J. Clin. Nurs. Res. 2024, 8, 267–273.
- Deberneh, H.M.; Kim, I. Prediction of type 2 diabetes based on machine learning algorithm. Int. J. Environ. Res. Public Health 2021, 18, 3317.
- Tarumi, S.; Takeuchi, W.; Qi, R.; Ning, X.; Ruppert, L.; Ban, H.; Robertson, D.H.; Schleyer, T.; Kawamoto, K. Predicting pharmacotherapeutic outcomes for type 2 diabetes: An evaluation of three approaches to leveraging electronic health record data from multiple sources. J. Biomed. Inform. 2022, 129, 104001.
- Hatmal, M.M.; Alshaer, W.; Mahmoud, I.S.; Al-Hatamleh, M.A.I.; Al-Ameer, H.J.; Abuyaman, O.; Zihlif, M.; Mohamud, R.; Darras, M.; Al Shhab, M.; et al. Investigating the association of CD36 gene polymorphisms (rs1761667 and rs1527483) with T2DM and dyslipidemia: Statistical analysis, machine learning based prediction, and meta-analysis. PLoS ONE 2021, 16, e0257857.
- Ngufor, C.; Van Houten, H.; Caffo, B.S.; Shah, N.D.; McCoy, R.G. Mixed effect machine learning: A framework for predicting longitudinal change in hemoglobin A1c. J. Biomed. Inform. 2019, 89, 56–67.
- Shin, J.; Lee, J.; Ko, T.; Lee, K.; Choi, Y.; Kim, H.-S. Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness. J. Pers. Med. 2022, 12, 1899.
- Kopitar, L.; Kocbek, P.; Cilar, L.; Sheikh, A.; Stiglic, G. Early detection of type 2 diabetes mellitus using machine learning-based prediction models. Sci. Rep. 2020, 10, 11981.
- Yuk, H.; Gim, J.; Min, J.K.; Yun, J.; Heo, T.-Y. Artificial intelligence–based prediction of diabetes and prediabetes using health checkup data in Korea. Appl. Artif. Intell. 2022, 36, 2145644.
- Hegde, H.; Shimpi, N.; Panny, A.; Glurich, I.; Christie, P.; Acharya, A. Development of non-invasive diabetes risk prediction models as decision support tools designed for application in the dental clinical environment. Inform. Med. Unlocked 2019, 17, 100254.
- Syed, A.H.; Khan, T. Machine learning-based application for predicting risk of type 2 diabetes mellitus (T2DM) in Saudi Arabia: A retrospective cross-sectional study. IEEE Access 2020, 8, 199539–199561.
- Oh, R.; Lee, H.K.; Pak, Y.K.; Oh, M.-S. An Interactive Online App for Predicting Diabetes via Machine Learning from Environment-Polluting Chemical Exposure Data. Int. J. Environ. Res. Public Health 2022, 19, 5800.
- Gollapalli, M.; Alansari, A.; Alkhorasani, H.; Alsubaii, M.; Sakloua, R.; Alzahrani, R.; Al-Hariri, M.; Alfares, M.; AlKhafaji, D.; Al Argan, R.; et al. A novel stacking ensemble for detecting three types of diabetes mellitus using a Saudi Arabian dataset: Pre-diabetes, T1DM, and T2DM. Comput. Biol. Med. 2022, 147, 105757.
- Islam, S.; Qaraqe, M.K.; Belhaouari, S.B.; Abdul-Ghani, M.A. Advanced techniques for predicting the future progression of type 2 diabetes. IEEE Access 2020, 8, 120537–120547.
- Deberneh, H.M.; Kim, I.; Park, J.H.; Cha, E.; Joung, K.H.; Lee, J.S.; Lim, D.S. 1233-P: Prediction of type 2 diabetes occurrence using machine learning model. Diabetes 2020, 69 (Suppl. 1), 1233.
- Navarro, C.L.A.; Damen, J.A.A.; Takada, T.; Nijman, S.W.J.; Dhiman, P.; Ma, J.; Collins, G.S.; Bajpai, R.; Riley, R.D.; Moons, K.G.M.; et al. Risk of bias in studies on prediction models developed using supervised machine learning techniques: Systematic review. BMJ 2021, 375, n2281.
- Altman, D.G.; Royston, P. The cost of dichotomising continuous variables. BMJ 2006, 332, 1080.
- Harrell, F.E., Jr. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2015.
- Ramspek, C.L.; Jager, K.J.; Dekker, F.W.; Zoccali, C.; van Diepen, M. External validation of prognostic models: What, why, how, when and where? Clin. Kidney J. 2021, 14, 49–58.
- Collins, G.S.; Reitsma, J.B.; Altman, D.G.; Moons, K.G. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD Statement. BMC Med. 2015, 13, 148–158.
- Mansmann, U.; Ön, B.I. The validation of prediction models deserves more recognition. BMC Med. 2025, 23, 166.
- Nieboer, D.; van der Ploeg, T.; Steyerberg, E.W.; Collins, G. Assessing discriminative performance at external validation of clinical prediction models. PLoS ONE 2016, 11, e0148820.
- Iwagami, M.; Matsui, H. Introduction to clinical prediction models. Ann. Clin. Epidemiol. 2022, 4, 72–80.
- Hanf, M.; Guégan, J.-F.; Ahmed, I.; Nacher, M. Disentangling the complexity of infectious diseases: Time is ripe to improve the first-line statistical toolbox for epidemiologists. Infect. Genet. Evol. 2014, 21, 497–505.
- Zhang, L.; Wang, Y.; Niu, M.; Wang, C.; Wang, Z. Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: The Henan Rural Cohort Study. Sci. Rep. 2020, 10, 4406.
- Teng, Q.; Liu, Z.; Song, Y.; Han, K.; Lu, Y. A survey on the interpretability of deep learning in medical diagnosis. Multimedia Syst. 2022, 28, 2335–2355.
- Chen, H.; Chen, J.; Ding, J. Data evaluation and enhancement for quality improvement of machine learning. In 2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS); IEEE: New York, NY, USA, 2020; p. 13.
- Zaid, M.M.A.; Mohammed, A.A. Hybrid models in diabetes prediction: A review of techniques, performance, and potential. J. Al-Qadisiyah Comput. Sci. Math. 2024, 16, 298–308.
- Tuomilehto, J.; Lindström, J.; Eriksson, J.G.; Valle, T.T.; Hämäläinen, H.; Ilanne-Parikka, P.; Keinänen-Kiukaanniemi, S.; Laakso, M.; Louheranta, A.; Rastas, M.; et al. Prevention of type 2 diabetes mellitus by changes in lifestyle among subjects with impaired glucose tolerance. N. Engl. J. Med. 2001, 344, 1343–1350.
- Knowler, W.C.; Barrett-Connor, E.; Fowler, S.E.; Hamman, R.F.; Lachin, J.M.; Walker, E.A.; Nathan, D.M.; Diabetes Prevention Program Research Group. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N. Engl. J. Med. 2002, 346, 393–403.
- Ramachandran, A.; Snehalatha, C.; Mary, S.; Mukesh, B.; Bhaskar, A.D.; Vijay, V.; Indian Diabetes Prevention Programme (IDPP). The Indian Diabetes Prevention Programme shows that lifestyle modification and metformin prevent type 2 diabetes in Asian Indian subjects with impaired glucose tolerance (IDPP-1). Diabetologia 2006, 49, 289–297.
- Cangelosi, G.; Mancin, S.; Pantanetti, P.; Nguyen, C.T.T.; Palomares, S.M.; Biondini, F.; Sguanci, M.; Petrelli, F. Lifestyle medicine case manager nurses for type two diabetes patients: An overview of a job description framework—A narrative review. Diabetology 2024, 5, 375–388.
- Alshowair, A.; Altamimi, S.; Alshahrani, S.; Almubrick, R.; Ahmed, S.; Tolba, A.; Alkawai, F.; Alruhaimi, F.; Alsafwani, E.; AlSuwailem, F.; et al. Effectiveness of case manager–led multi-disciplinary team approach on glycemic control amongst T2DM patients in primary care in Riyadh: A retrospective follow-up study. J. Prim. Care Community Health 2023, 14, 21501319231204592.
- Tourkmani, A.M.; Abdelhay, O.; Alkhashan, H.I.; Alaboud, A.F.; Bakhit, A.; Elsaid, T.; Alawad, A.; Alobaikan, A.; Alqahtani, H.; Alqahtani, A.; et al. Impact of an integrated care program on glycemic control and cardiovascular risk factors in patients with type 2 diabetes in Saudi Arabia: An interventional parallel-group controlled study. BMC Fam. Pract. 2018, 19, 1.
- Yu, X.; Chau, J.P.C.; Huo, L.; Li, X.; Wang, D.; Wu, H.; Zhang, Y. The effects of a nurse-led integrative medicine-based structured education program on self-management behaviors among individuals with newly diagnosed type 2 diabetes: A randomized controlled trial. BMC Nurs. 2022, 21, 217.
First Author | Year of Publication | Country | Research Type | Age of Research Subjects (Years) | Sample Source | Sample Size | Number of Endpoint Events | Observation Endpoint | References |
---|---|---|---|---|---|---|---|---|---|
Xu et al. | 2022 | China | Retrospective study | 12~94 | Nanjing Drum Tower Hospital Health Management Center | 15,166 | 623 | ①②⑤ | [30] |
Lin et al. | 2024 | China | Retrospective study | ≥60 | Nanjing Shengrun Hospital | D: 1564 V: 671 | 99 | ② | [31] |
Wang et al. | 2023 | China | Retrospective study | ≥18 | Monitoring Data on Chronic Disease Risk Factors Among Residents of Dongguan City | 4106 | 149 | ①③④⑤ | [32] |
Liu et al. | 2024 | China | Retrospective study | — | National Health Examination Center Database | D: 32,372 V: 13,875 | D: 411 V: 205 | ①②③⑥ | [33] |
Yang et al. | 2024 | China | Retrospective study | ≥20 | Suzhou University First Affiliated Hospital Health Checkup Center | D: 3221 V: 1381 | 760 | ①②③ | [34] |
Tong et al. | 2023 | China | Retrospective study | ≥18 | Sichuan Provincial People’s Hospital | 980 | 513 | ①②③④⑥ | [35] |
Shao et al. | 2020 | China | Retrospective study | 20~80 | China Health and Nutrition Survey | D: 4498 V: 1525 | D: 257 V: 92 | ①②③④⑤ | [36] |
Jiang et al. | 2024 | China | Retrospective study | 50~75 | Guangzhou Haizhu District Grassroots Community Service Management Information System | 252,176 | — | ① | [37] |
Li et al. | 2023 | China | Retrospective study | ≥18 | National Physical Examination (NPE) project | 4,075,431 | 301,347 | ①③⑤ | [38] |
Wang et al. | 2021 | China | Retrospective study | ≥18 | 2018 Health Checkup Data for All Residents of Ili Kazakh Autonomous Prefecture, Xinjiang | D: 366,523 V: 91,630 | D: 30,758 V: 7577 | ①③ | [39] |
Dong et al. | 2022 | China | Retrospective study | 18~84 | Department of Health, Government of the Hong Kong Special Administrative Region | 1857 | 280 | ①②④⑤ | [40] |
Liu et al. | 2022 | China | Retrospective study | ≥65 | Wuhan Elderly Health Screening Data | 127,031 | — | ① | [41] |
Hu et al. | 2019 | China | Prospective study | — | Retired employees of Dongfeng Motor Corporation (DMC) in Shiyan City, Hubei Province, China | 4833 | 171 | ①②⑤ | [42] |
Yang et al. | 2024 | China | Retrospective study | 43~102 | Health check-up data for middle-aged and elderly people in Hongguang Street, Pidu District, Chengdu | 7602 | 434 | ①②④ | [43] |
Long et al. | 2024 | China | Prospective study | ≥18 | Jinchuan Group Staff Hospital | D: 22,025 V: 9438 | — | ①⑤ | [44] |
Ma et al. | 2020 | China | Retrospective study | — | Beijing Huazhao Yisheng Health Checkup Data | D: 4754 V: 2375 | — | ① | [45] |
Miao et al. | 2020 | China | Retrospective study | — | Physical examination data from a hospital in China | 936 | — | — | [46] |
Ma et al. | 2020 | China | Prospective study | 38~88 | A cohort survey of chronic cardiovascular diseases in Fangshan District, Beijing, China | 3127 | 187 | ①③④⑤ | [47] |
Ouyang et al. | 2021 | China | Retrospective study | ≥18 | Southern Hospital Health Management Center | 36,292 | 2244 | ①⑤ | [48] |
Wu et al. | 2023 | China | Retrospective study | 18~80 | Fujian Shishi Community Health Center | 165,263 | — | — | [49] |
First Author | Modeling Methods | Variable Selection Methods | Methods for Handling Continuous Variables | AUC (95% CI) | Validation Method | Calibration Method |
---|---|---|---|---|---|---|
Xu et al. | LR | LASSO regression | Kept continuous | 0.865 (0.847, 0.865) | Internal validation | Calibration curve |
Lin et al. | LR | LASSO regression | Kept continuous | D: 0.824 (0.765, 0.883) V: 0.809 (0.732, 0.886) | Internal validation | Calibration curve |
Wang et al. | LR, DT (CART, C4.5), BPNN, SVM, DNN | Univariate analysis, stepwise selection | Kept continuous | Model 1: 0.962 Model 2: 0.906 Model 3: 0.888 Model 4: 0.977 Model 5: 0.911 Model 6: 0.845 | Internal validation | — |
Liu et al. | XGBoost, SVM, LR, RF | Univariate and multivariate analysis | Kept continuous | Model 1: D: 0.986 V: 0.812 Model 2: D: 0.896 V: 0.668 Model 3: D: 0.914 V: 0.913 Model 4: D: 0.998 V: 0.838 | Internal validation | Hosmer–Lemeshow test, calibration curve |
Yang et al. | LR | Univariate and multivariate analysis, stepwise selection | Converted to categorical | 0.800 (0.770, 0.829) | Internal validation | Calibration curve |
Tong et al. | RF, MLP, XGBoost, LGBM, CB | Univariate and multivariate analysis, LASSO regression | Kept continuous | Model 1: 0.840 Model 2: 0.816 Model 3: 0.848 Model 4: 0.852 Model 5: 0.850 | Internal validation | Calibration curve |
Shao et al. | LR | Univariate and multivariate analysis, LASSO regression | Kept continuous | Model 1: 0.788 (0.761, 0.816) Model 2: 0.807 (0.780, 0.834) Model 3: 0.905 (0.879, 0.932) Model 4: 0.882 (0.853, 0.912) | Internal and external validation | Calibration curve and bootstrap resampling |
Jiang et al. | RF, KNN, XGBoost, VC | — | Kept continuous | — | Internal validation | Calibration curve and bootstrap resampling |
Li et al. | CART, LGBM, RF, XGBoost, TabNet, MLP, LR | Univariate and multivariate analysis | Converted to categorical | Model 1: 0.884 Model 2: 0.881 Model 3: 0.873 Model 4: 0.912 Model 5: 0.876 Model 6: 0.875 Model 7: 0.816 | Internal validation | Calibration curve |
Wang et al. | LR | Univariate and multivariate analysis, LASSO regression | Converted to categorical | D: Male: 0.894 Female: 0.816 V: Male: 0.865 Female: 0.815 | Internal validation | Hosmer–Lemeshow test, calibration curve |
Dong et al. | LR, XGBoost | Univariate and multivariate analysis, stepwise selection | Converted to categorical | Model 1: 0.812 (0.769, 0.853) Model 2: 0.822 (0.779, 0.863) | Internal validation | Hosmer–Lemeshow test, calibration curve |
Liu et al. | LR, DT, RF, XGBoost | Univariate analysis, LASSO regression | Converted to categorical | Model 1: 0.760 Model 2: 0.728 Model 3: 0.777 Model 4: 0.780 | Internal validation | Calibration curve |
Hu et al. | Cox | Univariate and multivariate analysis | Converted to categorical | D: 0.850 V: 0.830 | Internal validation | — |
Yang et al. | LR | Univariate and multivariate analysis | Converted to categorical | 0.794 (0.771, 0.816) | Internal validation | Calibration curve |
Long et al. | Cox | Univariate and multivariate analysis | Converted to categorical | D: 3 year: 0.783 5 year: 0.825 7 year: 0.842 V: 3 year: 0.782 5 year: 0.805 7 year: 0.807 | Internal validation | Calibration curve |
Ma et al. | RF, LR, SVM, DT, Naive Bayes (NB) | Multivariate analysis | Kept continuous | Model 1: 0.931 Model 2: 0.903 Model 3: 0.813 Model 4: 0.776 Model 5: 0.858 | Internal validation | — |
Miao et al. | SVM (PSO-FWSVM) | Multivariate analysis | Kept continuous | — | Internal validation | — |
Ma et al. | LR | Multivariate analysis and stepwise selection | Converted to categorical | Original model: 0.878 (0.853, 0.903) Model 1: 0.880 (0.856, 0.903) Model 2: 0.880 (0.855, 0.903) Model 3: 0.879 (0.854, 0.903) | Internal validation | Hosmer–Lemeshow test, calibration curve |
Ouyang et al. | LR, LGBM | Univariate and multivariate analysis, stepwise selection | Kept continuous | Model 1: 0.906 Model 2: 0.910 | Internal validation | — |
Wu et al. | LR, LGBM | — | Kept continuous | — | Internal validation | — |
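As an illustration of how an AUC with a 95% confidence interval, as reported in the table above, can be obtained, the following is a minimal sketch and not the procedure used by any included study. It computes the AUC from its rank-based (Mann–Whitney) definition and a percentile bootstrap interval. The function names `auc` and `bootstrap_auc_ci`, the number of resamples, and the toy data are assumptions made purely for illustration.

```python
import random


def auc(labels, scores):
    """AUC as the share of (case, non-case) pairs in which the case
    has the higher predicted score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap (1 - alpha) interval.

    Participants are resampled with replacement; resamples containing
    only one outcome class are redrawn because AUC is undefined there.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(labels, scores), lower, upper


if __name__ == "__main__":
    # Hypothetical outcomes (1 = incident diabetes) and predicted risks.
    y = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.30, 0.15, 0.60, 0.45]
    point, lower, upper = bootstrap_auc_ci(y, p)
    print(f"AUC = {point:.3f} (95% CI {lower:.3f}, {upper:.3f})")
```

The pairwise loop is quadratic in sample size, so for cohorts as large as several in the table a rank-based implementation from an established statistics library would be the practical choice; the sketch only makes the definition explicit.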
Method | Usage Count | Strengths | Weaknesses | Reference |
---|---|---|---|---|
LR | 15 |  |  | [30,31,34,51,52] |
XGBoost | 6 |  |  | [33,35,53] |
RF | 6 |  |  | [35,37,54] |
DT | 4 |  |  | [32,55] |
SVM | 4 |  |  | [32,33,56] |
LGBM | 4 |  |  | [48,49,57] |
MLP | 2 |  |  | [38,58] |
COX | 2 |  |  | [44,55] |
ANN | 1 |  |  | [59] |
DNN | 1 |  |  | [60] |
KNN | 1 |  |  | [61] |
NB | 1 |  |  | [45,62] |
First Author | Predictor Factor | Limitations |
---|---|---|
Xu et al. | Gender, Age, BMI, ALT, CREA, CHOL, HDL, GLU, MCHC, WBC, | Predicting the risk of type 2 diabetes solely based on laboratory data does not include factors such as diet, exercise, or genetics, which have been proven to be closely related to type 2 diabetes. Single-center data sources, lack of external validation. |
Lin et al. | Age, Gender, BMI, FBG, ALT, ALT/AST, BUN, TG, Hb | Single-center data sources, lack of external validation, exclusion of key variables (such as family history and history of gestational diabetes), and the model’s applicability being limited to the elderly population. |
Wang et al. | Age, drinking, cereals, potatoes, beans, fruits, eggs, milk, poultry, fish, DBP, FPG, TC, TG, HDL-C, LDL-C | Single-center data sources, lack of external validation. Failure to incorporate common disease risk factors such as genetics and self-care conditions (physical activity, sleep duration, etc.) into the model. |
Liu et al. | FPG, Age, TG, ALT, BMI, CR, DBP, gender, family | Lack of external validation, single source of data, non-inclusion of key indicators such as HbA1c. |
Yang et al. | Gender, Age, BMI, Blood Glucose, HDL-C, LDL-C, Fatty liver, ALT/AST | Few women were included, resulting in an imbalanced male-to-female ratio. No external validation was conducted. The follow-up period was short. |
Tong et al. | FBG, previous HbA1c values, having a rational and reasonable diet, health status scores, type of metformin manufacturer, interval of measurement, EQ-5D scores, occupational status, Age | The sample size is small, and some variables are subject to recall bias. |
Shao et al. | Model 1: Age, gender, race, BMI, waist circumference, hypertension Model 2: Model 1 + diet (calories, carbohydrates, protein), exercise, sleep duration Model 3: Model 2 + FPG, HbA1c, TG, LDL, HDL Model 4: FPG, HbA1c, TG, LDL, HDL | The data come only from the China Health and Nutrition Survey (CHNS), which limits regional and sample representativeness, and further external validation to assess the model’s generalizability is lacking. |
Jiang et al. | BMI, age, systolic BP, diastolic BP, staple food, exercise frequency, exercise time | The feature variables are not comprehensive enough. |
Li et al. | Gender, age, ethnicity, EH, SS, HTN, CAD, PDM, WC, BMI, WBC, PLT, FBG, ECG, TC, TG, LDL-C, HDL-C | Cross-sectional data cannot establish causal relationships, and the high heterogeneity and missing rates of health check-up data affect the model’s test performance. |
Wang et al. | Age, FHOT, WC, TC, TG, BMI, HDL-C, and history of hypertension | Causal relationships cannot be inferred from cross-sectional data, the regional limitations of the data sources affect generalizability, and the model may not cover all risk factors for type 2 diabetes, which could lead to prediction bias. |
Dong et al. | Model 1: Age, BMI, WHR, smoking status, sleep duration, vigorous recreational activity time per week, and fruit consumption per week Model 2: age, BMI, WHR, SBP, waist circumference, sleep duration, smoking status, and vigorous recreational activity time per week | This study did not include key risk factors such as family history of diabetes and gestational diabetes history, and the validation was limited to the same population sample, which restricted the comprehensiveness of the results. |
Liu et al. | Age, gender, education, marital status, hypertension, fatty liver, exercise, current smoking, BMI, WC, SBP, DBP, FPG, TC, TG, HDL-C, LDL-C, ALT, AST, TBIL, SCR, BUN, and SUA | Selection bias, omission of certain key risk factors (such as HbA1c and insulin), failure to use OGTT may lead to diagnostic bias, only internal validation was conducted and external validation is lacking. |
Hu et al. | Age, gender, BMI, waist circumference, blood pressure, fasting blood glucose, lipid profile (TC, TG, HDL-C, LDL-C), serum uric acid, smoking and drinking status, physical activity, history of hypertension, and family history of diabetes. | Insufficient sample representativeness, lack of important predictive factors, internal validation only, short follow-up period, and potential bias in some self-reported data. |
Yang et al. | Age, gender, BMI, waist circumference, triglycerides, HDL-C, smoking status, drinking status, history of hypertension, and family history of diabetes | Insufficient sample representativeness, exclusion of certain key risk factors, limitations of diagnostic methods, internal validation only, and potential biases in lifestyle data. |
Long et al. | Sex, age, body mass index, alcohol consumption, alcohol abstinence, hypertension, triglycerides, HDL-C, glutamyl transferase, family history of diabetes mellitus, cholecystitis, gallbladder agenesis. | No external validation; no inclusion of lifestyle variables such as diet and exercise; single source of data. |
Ma et al. | Forty-seven characteristics such as blood lipids, urinalysis, liver function, blood pressure, age, gender, and height | Single source of data, lack of external validation, substantial missing data. |
Miao et al. | BMI, family history of diabetes, diastolic blood pressure, fasting blood glucose, total cholesterol, triglycerides, LDL, heart rate | Single data source, lack of external validation, risk of bias due to sample imbalance, weak interpretability of features. |
Ma et al. | Smoking, history of lipid-lowering drug use, 2h-PG, FPG, BMI, family history of diabetes mellitus, abnormal blood pressure markers, history of hypertension drug use | Lack of external validation and limited extrapolation; low number of incident cases and inaccurate prediction of high risk; all continuous variables were categorized, which may reduce prediction accuracy. |
Ouyang et al. | Sex, age, BMI, waist circumference, heart rate, systolic blood pressure, diastolic blood pressure, FBG, uric acid, 4 biochemical indicators, 2 liver function indicators, 2 renal function indicators, and 17 routine blood tests, totaling 34 study indicators, were used as independent variables. | Single source of data, lack of external validation, failure to assess the calibration ability of the model, failure to include variables such as lifestyle behaviors, possible retrospective bias. |
Wu et al. | Only 42 characteristics were noted, but no specific | No external validation, no reported AUC or ROC curves, no model calibration assessment, lack of model interpretability. |
Study | ROB: Participants | ROB: Predictors | ROB: Outcome | ROB: Analysis | Overall ROB | Applicability: Participants | Applicability: Predictors | Applicability: Outcome | Overall Applicability |
---|---|---|---|---|---|---|---|---|---|
Xu et al. | + | + | + | − | − | + | + | + | + |
Lin et al. | + | + | + | − | − | + | + | + | + |
Wang et al. | + | + | + | − | − | + | + | + | + |
Liu H et al. | + | + | + | − | − | + | + | + | + |
Yang et al. | + | + | + | − | − | + | + | + | + |
Tong et al. | + | + | + | − | − | + | + | + | + |
Shao et al. | + | + | + | + | + | + | + | + | + |
Jiang et al. | + | ? | + | − | − | + | + | + | + |
Li et al. | + | + | + | − | − | + | + | + | + |
Wang et al. | + | ? | + | − | − | + | + | + | + |
Dong et al. | + | + | + | − | − | + | + | + | + |
Liu et al. | + | + | + | − | − | + | + | + | + |
Hu et al. | + | ? | + | − | − | + | + | + | + |
Yang et al. | + | + | + | − | − | + | + | + | + |
Long et al. | + | + | + | − | − | + | + | + | + |
Ma et al. | + | + | − | − | − | + | + | + | + |
Miao et al. | + | + | ? | − | − | + | + | + | + |
Ma et al. | + | + | + | − | − | + | + | + | + |
Ouyang et al. | + | + | + | − | − | + | + | + | + |
Wu et al. | + | + | + | − | − | + | + | + | + |
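The overall ROB column above appears to follow the usual domain-combination convention, which can be sketched as below. This is a minimal illustration, not the review's stated procedure: it assumes that "+" denotes low risk, "−" high risk, and "?" unclear risk, and that the overall rating follows a PROBAST-style rule (any high domain gives high; otherwise any unclear domain gives unclear; otherwise low). The function name `overall_rob` and the row variables are hypothetical.

```python
# Assumed legend (not stated in the table itself):
# "+" = low risk, "−" = high risk, "?" = unclear risk.

def overall_rob(domains):
    """Combine per-domain ratings into one overall rating.

    Assumed PROBAST-style rule: any "high" domain makes the overall
    rating "high"; otherwise any "unclear" domain makes it "unclear";
    only when every domain is "low" is the overall rating "low".
    """
    if "high" in domains:
        return "high"
    if "unclear" in domains:
        return "unclear"
    return "low"

# Domain order: participants, predictors, outcome, analysis.
# Ratings transcribed from two rows of the table under the assumed legend.
shao = ["low", "low", "low", "low"]        # Shao et al.
jiang = ["low", "unclear", "low", "high"]  # Jiang et al.

print(overall_rob(shao))
print(overall_rob(jiang))
```

Under these assumptions, a single high-risk analysis domain is enough to make a study's overall rating high, which matches the pattern in the table where the analysis domain drives most overall judgements.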
Aspect | Main Findings | Recommendation |
---|---|---|
Predictor diversity | Predictors focus on biological indicators and lack SDOH (lifestyle factors, socioeconomic factors, etc.) | Add multidimensional predictors |
Continuous variable processing | Dichotomizing or categorizing continuous variables leads to information loss and exacerbates bias. | Adopt methods that preserve continuity |
Risk of bias assessment | Overall risk of bias is high, affecting model reliability | Strengthen risk of bias control and implement pre-registration and standardized development processes |
Model validation and application | Validation is mostly internal; external multicenter validation and clinician-friendly deployment tools are lacking | Conduct multicenter external validation and develop easy-to-use interfaces such as line charts, WeChat applets, and EHR plug-ins |
Statistics vs. machine learning | Adoption of machine learning methods is growing rapidly, but interpretability and clinical integration remain to be improved | Explore interpretable hybrid models and optimize for clinical needs |
Aspect | Binning | Continuous |
---|---|---|
Information Retention | Loses within-bin variability | Retains full numeric detail |
Statistical Power | Categorizing substantially reduces statistical power | Preserves full variability, maximizing power |
Model Calibration | Risk estimates “jump” at bin boundaries, hindering smooth calibration | Spline- or polynomial-based fits yield smoother, more accurate calibration |
Interpretability | Easy to explain cut-points and risk groups | Requires interpretation of coefficients or spline functions |
Overfitting Risk | Simpler structure may reduce overfitting | Complex fits need regularization or cross-validation to avoid overfitting |
Sample Size Needs | Lower requirements but must ensure balanced counts per bin | Requires larger sample and EPV ≥ 10 to support reliable estimation |
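The information-loss row of this comparison can be illustrated with a small simulation. The sketch below is hypothetical throughout: the cohort is synthetic, the FPG distribution, the risk curve, and the 6.1 mmol/L cut-point are assumptions chosen for illustration, and none of the numbers come from the included studies. It compares discrimination (AUC) when a predictor is kept continuous versus dichotomized, and includes a helper for the events-per-variable (EPV) check mentioned in the table.

```python
import math
import random

def auc(scores, labels):
    """Rank-based AUC: probability that a random case scores higher
    than a random non-case, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def events_per_variable(n_events, n_candidate_params):
    """EPV = number of outcome events / number of candidate parameters."""
    return n_events / n_candidate_params

random.seed(42)

# Synthetic cohort (assumption): FPG in mmol/L, with diabetes risk
# rising smoothly with FPG through a logistic curve.
fpg = [random.gauss(5.5, 0.7) for _ in range(2000)]
risk = [1 / (1 + math.exp(-(-2.0 + 1.5 * (x - 5.5)))) for x in fpg]
y = [1 if random.random() < r else 0 for r in risk]

# Continuous predictor: every FPG value contributes to the ranking.
auc_continuous = auc(fpg, y)

# Dichotomized predictor: everyone on the same side of the cut-point is tied.
CUT_POINT = 6.1  # illustrative threshold, not taken from the review
auc_binned = auc([1 if x >= CUT_POINT else 0 for x in fpg], y)

print(f"AUC, continuous FPG:   {auc_continuous:.3f}")
print(f"AUC, dichotomized FPG: {auc_binned:.3f}")
print(f"EPV with one predictor: {events_per_variable(sum(y), 1):.0f}")
```

Because the dichotomized score ties everyone within a bin, it cannot rank individuals inside that bin, so its AUC in this simulation falls below that of the continuous score. The EPV helper makes the sample-size row concrete: 100 events and 10 candidate parameters give an EPV of 10.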
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Duan, J.; Nayan, N.M. Type 2 Diabetes Prediction Model in China: A Five-Year Systematic Review. Healthcare 2025, 13, 2007. https://doi.org/10.3390/healthcare13162007