1. Introduction
According to the World Health Organization, the global burden of diabetes has reached more than 800 million adults, a fourfold increase since 1990. Between 1990 and 2022, the prevalence of diabetes in adults doubled from 7% to 14%, with low- and middle-income countries (LMICs) experiencing the largest surge. Alarmingly, 59% of adults with diabetes, nearly 450 million people, remain untreated, with 90% of them living in LMICs, according to the WHO’s South-East Asia and Eastern Mediterranean Regions report [1].
Predicting diabetes (Type 2-NIDD) risk involves identifying individuals with a higher probability of developing diabetes or experiencing related complications. Diabetes mellitus is a chronic metabolic disorder that affects millions of people worldwide and can cause serious complications if not properly managed, so accurate early prediction is critical for mitigating its impact through timely intervention and treatment. Numerous studies have focused on developing machine learning models to predict diabetes accurately. Research has demonstrated the efficacy of algorithms such as Logistic Regression, Support Vector Machines, and Random Forests in identifying patterns from clinical and demographic data; see, e.g., Olusanya et al. [2] and Ahmad et al. [3]. The Pima Indian Diabetes Dataset has been widely used in these studies to benchmark predictive performance (Ahmed et al. [4]). Advanced machine learning algorithms have become pivotal in predicting diabetes risk, utilising features such as age, body mass index (BMI), blood glucose levels, and genetic markers (Deberneh et al. [5]). Models such as logistic regression, decision trees, random forests, and support vector machines (SVMs) have shown high precision in predicting diabetes risk based on clinical and demographic data (Chauhan et al. [6] and Wang et al. [7]).
Recent studies emphasize the incorporation of ensemble methods, such as gradient boosting and XGBoost, to improve predictive performance. These models analyze complex, high-dimensional datasets and identify critical risk factors, including oxidative stress and genetic predisposition. Additionally, novel approaches, such as deep learning and artificial neural networks, have been employed to enhance the accuracy of diabetes prediction using image data and real-time health monitoring systems (Aslan et al. [8]). The Pima Indian Diabetes Dataset, along with other publicly available datasets, has been extensively used as a benchmark to assess model performance (Kakoly et al. [9] and Naz et al. [10]). These studies highlight the potential of machine learning techniques in identifying people at risk of diabetes and guiding preventive measures.
Each model differs in methodology and uses different factors to predict the existence of diabetes, and the reported accuracies differ significantly. The models’ parameters, the methods used for computing the error function, and the activation functions also vary considerably. Many studies have compared the models used to predict diabetes, such as those by Chatterjee et al. [11], Grundy et al. [12], and Aponte et al. [13], without much concern for fixing the model parameters. Therefore, the resulting accuracy estimates are neither reliable nor comparable.
Patients with diabetes may suffer from various complications, such as retinopathy, nephropathy, and cardiovascular disease, which must be predicted in advance so that corrective action can be taken. Individual biomarker measurements fall in different ranges, and a certain level of risk is associated with every biomarker, depending on the measurement level. The risk associated with each biomarker contributes to the risk of diabetes. Complications are related to the overall risk of diabetes and the individual risk of biomarkers. Assessment of diabetes risk through biomarkers involves identifying and analysing various biological indicators that can predict the likelihood of developing diabetes. Recent research has highlighted the potential of both traditional and novel biomarkers in improving the accuracy of diabetes risk prediction. These biomarkers can provide insight into the underlying pathophysiological processes and inform the development of early intervention strategies. Various types of biomarkers and their roles in assessing diabetes risk have been presented by Ecesoy et al. [14] and Hathaway et al. [15].
Diabetes is often accompanied by severe complications (Reddy et al. [16]), such as nephropathy, retinopathy, neuropathy, and cardiovascular disease. Understanding and predicting the risk levels of these complications is crucial for effective patient care. Various machine learning models, including Decision Trees and Random Forests, have been used to analyze the risk factors that contribute to these complications. The level of the biomarkers, however, has not been considered in this context.
Recommendation systems for managing the risk of diabetes complications are emerging as powerful tools in personalized healthcare. By integrating clinical data, such as HbA1c levels, blood pressure, BMI, and lifestyle factors, with predictive analytics, these systems provide actionable insights for both healthcare providers and patients. Hybrid recommender systems, which combine collaborative filtering and machine learning models, have demonstrated high effectiveness in improving the management of diabetes-related complications, such as nephropathy, retinopathy, and cardiovascular disease (Alian et al. [17]). Recent advances in recommender systems emphasize the use of real-time patient monitoring and adaptive algorithms to deliver customized recommendations to manage risks associated with diabetes complications. For example, systems that leverage artificial neural networks and hybrid collaborative filtering approaches have been developed to predict the progression of complications and suggest appropriate interventions, such as dietary adjustments, medication regimens, and exercise plans (Xie et al. [18] and Arzouk et al. [19]). Most systems suffer from an inaccurate prediction of complications, as the risk levels of the biomarkers used for prediction are not considered.
5. Related Work
The American Diabetes Association (ADA) [20] provides comprehensive guidelines on the diagnosis of diabetes mellitus, serving as a global standard for clinical and research purposes. The ADA’s diagnostic criteria are based on extensive clinical evidence that correlates blood glucose levels with the risk of diabetes complications.
The American Heart Association (AHA) [21] categorises blood pressure and explains its relevance in diabetes management. It emphasizes the association between elevated blood pressure and an increased risk of cardiovascular complications in diabetic patients. The AHA’s framework helps both clinicians and patients effectively track and manage blood pressure to improve overall health outcomes.
The American Heart Association [22] also presents the Body Mass Index (BMI) classification system, its connection to obesity-related health risks, and the importance of maintaining a healthy BMI to reduce the risk of diabetes.
The International Diabetes Federation (IDF) [23] provides global statistics on diabetes, highlighting its socioeconomic impact and advocating for preventive strategies and patient education. The IDF emphasises the urgent need for improved prevention, early diagnosis, and access to effective treatment, particularly in low- and middle-income countries, where the disease is rapidly increasing. These figures stress the importance of coordinated global efforts to combat the diabetes epidemic and reduce its health and economic consequences.
Shin et al. [24] focus on enhancing the clinical effectiveness of machine learning (ML) models for diabetes prediction by addressing both model accuracy and practical applicability. Their study evaluates various ML algorithms using clinical datasets to improve predictive performance while ensuring the models’ interpretability for healthcare professionals. Their findings demonstrate that tailored ML approaches not only improve diabetes risk stratification but also facilitate better clinical decision-making.
Syed and Khan [25] present a machine learning-based application designed to predict the risk of Type 2 Diabetes Mellitus (T2DM) within the Saudi Arabian population, using a retrospective cross-sectional study approach. Their research utilized clinical and demographic data to train various machine learning models, aiming to identify individuals at high risk for T2DM early on. They assert that AI-driven tools can support healthcare professionals with timely and accurate diabetes risk assessments.
Al-Sadi and Balachandran [26] developed a comprehensive prediction model for Type 2 Diabetes Mellitus (T2DM) focused on prediabetic patients in Oman, employing artificial neural networks (ANNs) alongside six different machine learning classifiers. The authors systematically compared the performance of multiple classifiers, including support vector machines, random forests, and logistic regression, highlighting the strengths and limitations of each model within the Omani population context. Their findings demonstrate that combining ANN with other classifiers can enhance prediction accuracy, supporting personalised and region-specific diabetes management strategies. They did not, however, consider whether the parameters used for training the different models were equivalent.
Dutta et al. [27] investigated the early prediction of diabetes by employing an ensemble of machine learning models to improve predictive accuracy and robustness. Their study combines multiple classifiers, including decision trees, support vector machines, and logistic regression, to leverage the complementary strengths of each. The ensemble approach demonstrated superior performance compared to individual models, effectively handling the complexity and variability of diabetes-related data. They likewise did not consider whether the parameters used across the models were equivalent.
Qin et al. [28] explored the use of machine learning models for predicting diabetes risk based on individual lifestyle types, highlighting the role of behavioural factors in disease development. Their study integrates lifestyle data, such as diet, physical activity, and smoking habits, with clinical indicators to build predictive models. They demonstrated that machine learning can effectively differentiate diabetes risk levels and improve prediction accuracy.
Yuk et al. [29] presented an artificial intelligence (AI)-based approach for predicting diabetes and prediabetes using routine health checkup data collected in Korea. This study leverages machine learning algorithms to analyse diverse clinical parameters and biomarkers, aiming to identify individuals at risk with high accuracy, while minimising the impact of model parameters.
Farnoodian et al. [30] focused on the detection and prediction of diabetes through the identification and utilization of effective biomarkers. Their study highlights the integration of advanced computational techniques with biomedical data to improve the accuracy of diabetes diagnosis. By analyzing various biomarkers linked to glucose metabolism, inflammation, and lipid profiles, the authors developed predictive models that improve early detection capabilities. Their approach leverages machine learning algorithms to handle complex biomarker data, demonstrating improved sensitivity and specificity in diabetes prediction. The equivalence of model parameters, however, is not addressed.
Massaro et al. [31] propose a diabetes prediction system that integrates Long Short-Term Memory (LSTM) networks with decision support system (DSS) automation and dataset optimization techniques. The study emphasizes the use of LSTM, a type of recurrent neural network, to capture temporal dependencies in patient health data for more accurate diabetes forecasting. Their automated DSS framework facilitates real-time decision-making by healthcare providers, offering risk assessments and early warning signals for diabetes onset.
Larabi-Marie-Sainte et al. [32] provide a comprehensive review of current techniques used for diabetes prediction. The paper surveys a range of machine learning and statistical methods, including support vector machines, decision trees, neural networks, and ensemble models. The authors analyze the strengths, limitations, and performance metrics of these techniques in predicting diabetes onset. They show how integrating various predictive models can enhance accuracy and reliability in clinical settings.
Madan et al. [33] propose an optimization-based diabetes prediction model that combines Convolutional Neural Networks (CNNs) and Bi-Directional Long Short-Term Memory (Bi-LSTM) networks to enhance prediction accuracy in real-time environments. This study leverages the CNN’s ability to extract spatial features and the Bi-LSTM’s capability to capture temporal dependencies from health data sequences. The model integrates optimization techniques to fine-tune parameters, improving both efficiency and predictive performance.
Sonia et al. [34] present a machine-learning-based approach for predicting diabetes mellitus risk, utilising a multi-layer neural network combined with a No-Prop algorithm. This study focuses on developing an efficient neural architecture that minimises computational overhead while maintaining high prediction accuracy. The No-Prop algorithm streamlines the training process by minimizing the need for extensive backpropagation, making the model faster and more scalable for real-world applications.
Fitriyani et al. [35] conducted a comprehensive comparative analysis of diabetes risk screening scores across diverse populations, including Chinese, Japanese, Korean, US-PIMA Indian, and Trinidadian cohorts. The study evaluates the predictive performance and applicability of various established diabetes risk assessment tools within these ethnic groups. By analyzing the differences in sensitivity, specificity, and overall accuracy, the authors identify population-specific strengths and limitations of each screening score.
Dritsas and Trigka [36] provide a comprehensive overview of data-driven machine learning methods applied to diabetes risk prediction. The authors examine various algorithms, including decision trees, support vector machines, and ensemble methods, highlighting their effectiveness in early diabetes risk detection based on clinical and demographic data.
Huang et al. [37] explore the early prediction of prediabetes and Type 2 diabetes by integrating genetic risk scores with biomarkers of oxidative stress. Their study emphasizes the combined use of genetic predisposition and physiological stress indicators to enhance the accuracy of diabetes risk assessment. By incorporating these novel biomarkers into predictive models, the authors demonstrate improved sensitivity and specificity compared to traditional risk factors alone. This approach highlights the potential of personalised medicine strategies that utilise both genetic and biochemical data for earlier detection and targeted prevention of diabetes.
Tan et al. [38] developed a predictive model focusing on the incidence of Type 2 diabetes in individuals with abdominal obesity, based on data from the general population, as published in Diabetes, Metabolic Syndrome and Obesity: Targets and Therapy. Their study highlights abdominal obesity as a significant risk factor for diabetes onset, and integrates demographic, clinical, and lifestyle variables to build a robust risk prediction tool. Using statistical and machine learning techniques, the model effectively stratifies individuals based on their likelihood of developing Type 2 diabetes, underscoring the importance of targeted screening and early intervention in high-risk populations.
Toledo-Marín et al. [39] present an advanced approach to predicting blood risk scores in diabetes patients using deep neural networks. Their study leverages deep learning techniques to analyze complex clinical and biochemical data, aiming to improve the accuracy and reliability of diabetes risk stratification. By employing neural network architectures that can capture nonlinear relationships within the data, the authors demonstrate enhanced predictive performance compared to traditional machine learning methods.
Alghamdi et al. [40] investigate the prediction of diabetes complications using computational intelligence techniques. The study focuses on leveraging advanced algorithms, including machine learning and hybrid models, to accurately identify patients at high risk of developing various diabetes-related complications. By analyzing clinical, demographic, and biochemical data, the proposed models demonstrate significant improvements in predictive accuracy and early detection capabilities compared to conventional methods.
Sun et al. [41] developed prediction models specifically targeting the risk of diabetic kidney disease (DKD) in Chinese patients with Type 2 diabetes mellitus, as detailed in Renal Failure. Their research employed various statistical and machine learning techniques to identify key clinical and biochemical risk factors associated with the onset and progression of DKD. The study emphasizes the importance of localized, population-specific models that account for demographic and genetic differences, enhancing prediction accuracy.
Serés-Noriega et al. [42] investigated the screening of subclinical atherosclerosis and its predictive value for cardiovascular events in individuals with Type 1 diabetes, as published in the Journal of Clinical Medicine. The study emphasizes the heightened cardiovascular risk faced by this population and investigates non-invasive imaging and biomarker-based screening tools to detect early vascular changes before clinical symptoms emerge. Their findings suggest that systematic screening for subclinical atherosclerosis can significantly improve the prediction of adverse cardiovascular events, enabling timely clinical interventions. This research underscores the importance of integrating advanced diagnostic methods into diabetes management to reduce cardiovascular morbidity and mortality in patients with Type 1 diabetes.
Gosak et al. [43] present a comprehensive literature review on the use of artificial intelligence (AI) for predicting diabetic foot risk in patients with diabetes, published in Applied Sciences. The review highlights how AI techniques, including machine learning and deep learning algorithms, have been increasingly applied to analyze clinical, imaging, and sensor data to identify early signs of diabetic foot complications. The study emphasises the potential of AI-based predictive models to improve early detection, personalise risk assessment, and assist healthcare professionals in preventing severe outcomes such as ulcers and amputations. By synthesising findings from multiple studies, Gosak et al. underscore the critical role of AI in enhancing diabetic foot care and the need for further research to optimise model accuracy and clinical implementation.
Mu et al. [44] investigated the prediction of diabetic kidney disease (DKD) in patients newly diagnosed with Type 2 diabetes mellitus. Their study focused on identifying clinical and biochemical markers that could effectively forecast the onset of DKD, a serious microvascular complication of diabetes. Utilizing advanced statistical and machine learning models, the authors developed predictive tools aimed at early risk stratification, which is crucial for timely intervention and prevention of kidney function decline.
Xia et al. [45] conducted a comprehensive systematic review and meta-analysis on risk prediction models for mild cognitive impairment (MCI) in patients with Type 2 diabetes mellitus. The study critically evaluated various predictive algorithms and biomarkers associated with cognitive decline in diabetic populations, highlighting the growing concern of neurological complications in diabetes management. By synthesising evidence from multiple studies, the authors identified key clinical, metabolic, and lifestyle factors that contribute to MCI risk and assessed the performance of machine learning and statistical models in early detection. Their findings emphasize the need for accurate, individualized risk assessment tools to prevent or delay cognitive deterioration among patients with Type 2 diabetes.
Wang et al. [46] developed and internally validated a kidney risk prediction model specifically for patients with Type 2 diabetes mellitus. The study leveraged advanced machine learning algorithms to analyse clinical and biochemical data, aiming to improve the early detection of diabetic kidney disease (DKD).
Chen et al. [47] investigated the risk prediction of diabetes progression by applying big data mining techniques to a comprehensive set of physical examination indicators. The study utilised a range of clinical and biochemical features from large-scale datasets to identify key predictors and patterns associated with the progression of diabetes. By integrating diverse physical examination data with advanced data mining algorithms, their model achieved high accuracy in forecasting the progression stages of diabetes.
Kong et al. [48] employed Bayesian network analysis to explore the complex relationships among factors influencing Type 2 diabetes, coronary heart disease, and their comorbidities. This probabilistic graphical modelling approach allowed for the identification of direct and indirect causal links between various clinical, demographic, and lifestyle variables. The study provides insights into how these factors interplay to increase the risk of both diseases and their coexistence, enabling better risk stratification and targeted prevention strategies.
Toofanee et al. [49] proposed DFU-Siam, a novel deep learning-based model for classifying diabetic foot ulcers. Their approach utilises Siamese neural networks to effectively distinguish between different ulcer types, thereby improving diagnostic accuracy and facilitating timely clinical intervention. The study demonstrated that DFU-Siam outperforms traditional classification methods by learning subtle differences in ulcer images, which is critical for personalized treatment and reducing complications associated with diabetic foot ulcers.
Sun and Zhang [50] developed a diagnostic model for diabetic retinopathy using electronic health records (EHRs). Their study employed machine learning techniques to analyze clinical data for early detection and severity assessment of diabetic retinopathy.
Islam et al. [51] conducted an in-depth study on advanced techniques for predicting the future progression of Type 2 diabetes, highlighting the importance of early and accurate forecasting in clinical decision-making. They utilised a variety of machine learning algorithms, including ensemble methods, recurrent neural networks, and support vector machines, to analyse longitudinal patient data from electronic health records. The study emphasized the integration of temporal trends and multiple clinical indicators such as blood glucose levels, HbA1c, BMI, and comorbidities to improve prediction accuracy. By leveraging these multifaceted data inputs, their models were capable of forecasting disease progression stages with higher precision compared to traditional static models.
Reshan et al. [52] proposed an innovative ensemble deep learning-based Clinical Decision Support System (CDSS) designed to enhance the accuracy of diabetes prediction. This study integrated multiple deep learning architectures, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, within an ensemble framework to leverage the strengths of each model. The system processes complex clinical datasets, including patient demographics, laboratory results, and lifestyle factors, transforming them into actionable insights for early detection of diabetes. By combining predictions from different deep learning models, the ensemble approach enhances robustness and generalization, outperforming individual models in both sensitivity and specificity.
Linkon et al. [53] present an evaluation of feature transformation techniques combined with various machine learning models aimed at the early detection of diabetes mellitus. The study examines the impact of various data preprocessing methods, such as normalisation, scaling, and dimensionality reduction, on the performance of classifiers, including decision trees, support vector machines, and neural networks. The research focuses on enhancing prediction accuracy and model robustness by refining feature representation.
Dorcely et al. [54] investigate emerging biomarkers that play a crucial role in the early detection and monitoring of prediabetes, diabetes, and their related complications. These novel biomarkers include inflammatory markers, adipokines, and metabolic indicators, which reflect underlying pathophysiological changes such as insulin resistance, beta-cell dysfunction, and chronic inflammation. Identifying these biomarkers enables more precise risk assessment and helps tailor personalized interventions to delay or prevent disease progression and complications such as cardiovascular disease, nephropathy, and neuropathy. The study emphasizes the importance of integrating biomarker data with clinical parameters to improve diabetes management outcomes.
Fazakis et al. [55] present an extensive study on the application of various machine learning tools for the long-term prediction of Type 2 diabetes risk. The research evaluates multiple algorithms, including decision trees, support vector machines, and ensemble methods, to identify key predictors from clinical and lifestyle data. Their findings highlight the effectiveness of machine learning models in capturing complex patterns and improving prediction accuracy compared to traditional statistical methods.
Guo et al. [56] explore the role of oxidative stress and epigenetic regulation in the pathological changes of lens epithelial cells that contribute to diabetic cataract development. The study details how chronic hyperglycaemia-induced oxidative damage and epigenetic modifications disrupt cellular homeostasis, leading to lens opacity. By elucidating these molecular mechanisms, the research provides insight into potential therapeutic targets for preventing or slowing cataract formation in patients with diabetes.
Alkhodari et al. [57] present a study focused on screening for cardiovascular autonomic neuropathy (CAN) in diabetic patients exhibiting microvascular complications, leveraging machine learning methods. CAN is a serious complication affecting the autonomic regulation of the cardiovascular system in diabetes, often leading to increased morbidity and mortality. The research utilized 24 h heart rate variability (HRV) data, which reflect the autonomic nervous system’s control over the heart, as a key diagnostic biomarker. Various machine learning algorithms were applied to analyse HRV features extracted from continuous ECG monitoring in order to classify patients with CAN accurately.
Rahim et al. [58] proposed an integrated machine learning framework designed to improve the prediction accuracy of cardiovascular diseases (CVDs). Recognizing the complexity and multifactorial nature of CVDs, the study combined multiple machine learning algorithms to leverage the strengths of each and enhance predictive performance. The framework incorporated data preprocessing, feature selection, and model optimization techniques to handle high-dimensional clinical datasets effectively. Various classifiers, such as random forest, support vector machines, and gradient boosting, were evaluated, with the integrated approach demonstrating superior accuracy, sensitivity, and specificity compared to individual models. The study emphasises the importance of a holistic machine learning pipeline for early detection and risk assessment of cardiovascular conditions, which could facilitate timely clinical decision-making and personalised patient care.
Salih et al. [59] present a machine learning approach for diabetes prediction using the PIMA Indian dataset. The study evaluates multiple algorithms, including decision trees, support vector machines, and neural networks, to identify the most effective model for early detection of diabetes. The authors report that ensemble methods and hybrid models achieve superior performance compared to single classifiers. Their findings demonstrate that machine learning can provide reliable and cost-effective tools for diabetes screening and risk assessment, potentially aiding in timely medical interventions.
Dagliati et al. [60] explored the application of machine learning techniques to predict complications in patients with diabetes. Using clinical data, the study evaluated models such as decision trees, support vector machines, and logistic regression. These models were trained to identify patterns and risk factors associated with long-term complications, including nephropathy, retinopathy, and cardiovascular diseases. A key focus was on the temporal nature of the data and its integration into predictive modelling. The study utilized electronic health records (EHRs) to generate dynamic patient profiles over time. The results showed that machine learning can accurately stratify risk and assist in proactive disease management. Interpretability and clinical relevance were emphasized to support real-world decision-making.
Dinh et al. [61] presented a data-driven approach for predicting both diabetes and cardiovascular disease using machine learning techniques. The study utilized a large dataset containing patient health records and lifestyle variables. Various algorithms, including decision trees, random forests, and support vector machines, were applied and compared for performance. The goal was to assess predictive accuracy and identify key risk factors. Feature importance analysis highlighted the relevance of age, BMI, blood pressure, and cholesterol levels. The results showed that ensemble models, particularly random forests, performed best in classification tasks.
Tan et al. [62] conducted a systematic review evaluating various machine learning (ML) methods developed for predicting diabetes-related complications. The review analyzed 48 peer-reviewed studies covering complications such as nephropathy, retinopathy, cardiovascular disease, and diabetic foot. Commonly used ML algorithms included logistic regression, decision trees, support vector machines, random forests, and neural networks. Most models focused on Type 2 diabetes populations and utilized structured datasets such as electronic health records (EHRs).
Kee et al. [63] present a systematic review focused on predicting cardiovascular complications in diabetic patients using machine learning (ML) models. The review analyzed over 40 studies, primarily targeting Type 2 diabetes mellitus (T2DM) populations. Common complications assessed included coronary artery disease, stroke, and heart failure. Frequently used ML methods included logistic regression, random forests, support vector machines, and deep learning architectures. Data sources were largely based on electronic health records and longitudinal clinical datasets. Feature variables often included age, HbA1c, blood pressure, cholesterol, and duration of diabetes.
Li et al. [64] investigated the performance of various machine learning (ML) models in predicting Diabetic Ketoacidosis (DKA) in adults with Type 1 diabetes using electronic health record (EHR) data. The study utilized a cohort of over 5000 adult patients from a national health database. Several ML algorithms were compared, including logistic regression, random forest, support vector machines, and gradient boosting. Key predictors included insulin dosage, HbA1c levels, frequency of hypoglycemia, and previous hospital admissions. The gradient boosting model performed best, achieving high predictive accuracy. Logistic regression also showed strong performance with fewer variables. Feature importance analysis identified HbA1c and history of prior DKA as top predictors. The study demonstrated the utility of EHR-based prediction models for identifying early risk.
Seerapu et al. [
65] explored the integration of machine learning (ML) and computer vision in the diagnosis and management of diabetic cataracts, a common complication of diabetes. The study discussed recent advancements in automated image analysis, particularly deep learning models applied to fundus and slit-lamp imaging. Convolutional Neural Networks (CNNs) were highlighted for their accuracy in detecting early-stage cataract changes in diabetic patients. The authors emphasized the importance of early detection, which can reduce the risk of vision impairment. Several ML models were evaluated for their classification and segmentation capabilities.
7. Methodology
The methodology adopted to predict diabetes and the related complications is shown in
Figure 1. First, the PIMA dataset was downloaded from Kaggle. The required features, namely glucose (GL), blood pressure (BP), body mass index (BMI), and age, were extracted into a separate dataset. A total of 758 records were extracted, of which 90% were used for training and the remaining 10% for testing the machine learning models.
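The 90/10 partition described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `split_records`, the shuffling step, the fixed seed, and the placeholder rows are all assumptions introduced here.

```python
import random


def split_records(records, train_fraction=0.9, seed=42):
    """Shuffle a copy of the records and split it into (train, test).

    The seed is an assumption made only so that the split is repeatable.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows standing in for the 758 extracted records
# (glucose, blood pressure, BMI, age, outcome); not real data.
records = [(i,) for i in range(758)]

train, test = split_records(records)
print(len(train), len(test))  # 682 76
```

With 758 records, a 90% training share gives 682 training records and 76 test records.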
The model parameters were fixed, including the learning rate, number of epochs, batch size, activation function, error computations, and optimization function. The issue of overfitting was dealt with by using a regularization function wherever required.
The machine learning models, namely Random Forest (RF), Decision Tree (DT), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), and the Naive Bayes classifier, were trained using common model parameters and the PIMA training dataset. The models were then tested on the PIMA test dataset, and the performance metrics of accuracy, precision, recall, and F1 score were calculated. A comparative analysis was performed to identify the best model, and RF was found to yield the highest accuracy.
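The four performance metrics named above follow from the confusion-matrix counts. The sketch below computes them for a binary classifier in plain Python; the function name `classification_metrics` and the toy label vectors are illustrative assumptions and do not come from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no positives predicted/present).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = diabetic, 0 = non-diabetic).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics equal 0.75. Applying the same function to each model's test-set predictions would give the per-model figures needed for a comparison of the kind described.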
Risk analysis was carried out using the test data, and a new dataset was created to reflect the risk associated with each biomarker; the individual risks were then combined to obtain the overall risk. This dataset was again split, with 80% of the records used for training and the rest for testing. From the test data, the records for the specific set of biomarkers related to a given complication were selected, and a neural network was trained to predict the possibility of that complication occurring. Eight neural networks were trained in this way, one for each complication.
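One way to picture the combination of per-biomarker risks into an overall risk is a weighted average. The sketch below is an assumption-laden illustration: the text does not give the scoring rule, so the linear scaling in `biomarker_risk`, the reference ranges in `RANGES`, and the weights in `WEIGHTS` are hypothetical placeholders, not clinical values and not the values used in the study.

```python
def biomarker_risk(value, low, high):
    """Map a reading to [0, 1]: 0 at or below `low`, 1 at or above `high`."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def overall_risk(risks, weights):
    """Weighted average of per-biomarker risks (weights need not sum to 1)."""
    total_weight = sum(weights[name] for name in risks)
    return sum(risks[name] * weights[name] for name in risks) / total_weight


# Hypothetical (low, high) ranges and weights, for illustration only.
RANGES = {"glucose": (100, 200), "bp": (80, 120), "bmi": (25, 40), "age": (30, 70)}
WEIGHTS = {"glucose": 0.4, "bp": 0.3, "bmi": 0.2, "age": 0.1}

# Hypothetical patient readings.
patient = {"glucose": 150, "bp": 100, "bmi": 40, "age": 30}

risks = {name: biomarker_risk(patient[name], *RANGES[name]) for name in patient}
combined = overall_risk(risks, WEIGHTS)
print(risks)
print(round(combined, 2))
```

Under these placeholder values the per-biomarker risks are 0.5, 0.5, 1.0, and 0.0, and the weighted overall risk is 0.55. A per-complication variant would restrict `risks` and `WEIGHTS` to the biomarkers relevant to that complication before the result is passed to the corresponding neural network.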
The proposed work comprises two phases. Phase A focuses on predicting the existence of diabetes using different machine learning models and identifying the best-performing model. Phase B involves performing risk analysis based on biomarkers and predicting complications using artificial neural network (ANN) models.
11. Conclusions
Detecting diabetes in a patient and predicting the likelihood of various complications are crucial and urgent tasks, since they allow corrective action to be taken immediately.
The comparison of various machine learning models revealed that Random Forest produces the most accurate prediction of the existence of diabetes.
Comparative studies have employed various machine learning methods, but there is little commonality among them. Each ML model is evaluated separately and optimized through its own choice of model parameters, and the models are trained using different approaches. This non-uniformity means that such comparisons do not reliably reveal how suitable a given method is for predicting diabetes.
In this paper, all the ML models were trained with common model parameters, and a comparison of their accuracy was presented. The comparison revealed little variability in accuracy across the models; Random Forest was slightly more accurate, although it required more processing time.
Every biomarker contributes to the risk of complications that arise due to diabetes, and the percentage of risk contribution varies from biomarker to biomarker. In this paper, a method was presented for assigning the risk due to each biomarker and its weight in contributing to a particular complication, which contributed to accurate prediction (100%) of the complications due to diabetes.
This system gathers standard test results to diagnose diabetes mellitus in patients and offers predictions to help manage the disease effectively. In addition, it analyses the input parameters to assess the risk of developing diabetic complications, enabling proactive measures to prevent them. By evaluating key metrics such as blood glucose level, blood pressure, BMI, and age, the system offers insight into the risks of the most important diabetes-related complications, namely diabetic ketoacidosis, stroke, cardiomyopathy, nephropathy, diabetic foot ulcers/gangrene, retinopathy, diabetic neuropathy, and cataracts. The resulting risk assessments and predictions of complications help patients to take precautionary measures. This strategy highlights the value of preventative care and informed decision-making, which might improve quality of life for people with diabetes and empower both patients and healthcare professionals.