Diagnostics
  • Article
  • Open Access

10 December 2025

A Lifestyle-Based Fuzzy-Enhanced ANN Model for Early Prediction of Type 2 Diabetes and Personalized Management in the North Indian Population

and
1
Department of Health Information Management and Technology, College of Applied Medical Sciences, King Faisal University, Al-Ahsa 31982, Saudi Arabia
2
Department of Computer Sciences, Baba Ghulam Shah Badshah University, Rajouri 185234, Jammu and Kashmir, India
*
Authors to whom correspondence should be addressed.
Diagnostics 2025, 15(24), 3139; https://doi.org/10.3390/diagnostics15243139
(registering DOI)
This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics

Abstract

Background: Type 2 Diabetes Mellitus (T2DM) continues to rise rapidly in Indian communities, affecting millions and posing a major public health challenge. Early identification of risk and timely lifestyle intervention are crucial for prevention. This study aims to develop a lifestyle-driven, fuzzy-enhanced Artificial Neural Network (ANN) model for early T2DM prediction and to design a personalized recommendation framework tailored to the North Indian population. Methods: A comprehensive exploratory data analysis, including statistical significance testing and age-cohort assessment, was conducted to evaluate data quality and identify key lifestyle associations. The ANN model was trained on 1939 lifestyle profiles and classified individuals into four risk categories: low, moderate, high-risk, and diabetic. A monotonic spline-based calibration method was used to refine predicted probabilities. Additionally, a web-based system, the Personalized Care and Intelligence System for Early Diabetes Assessment (PCISEDA), was developed to deliver individualized diet and physical activity recommendations. Cost-effective lifestyle options were curated via a structured web-scraping pipeline. Results: The proposed fuzzy-enhanced ANN model achieved an accuracy of 93.64%, precision of 94.00%, recall of 93.50%, F1-score of 93.50%, and a multiclass ROC–AUC of 94.07%, demonstrating strong discriminative performance. Feature importance analysis revealed age, weight, urination frequency, and thirst as the most influential lifestyle predictors of T2DM risk. The PCISEDA system successfully generated personalized and economically feasible lifestyle recommendations for each risk category. Conclusions: This lifestyle-based AI framework demonstrates substantial potential for early T2DM risk stratification and tailored lifestyle management. 
The integration of fuzzy calibration and personalized recommendations offers an accurate, scalable, and cost-effective solution that may support diabetes prevention and management in resource-constrained healthcare settings.

1. Introduction

Diabetes mellitus is one of the major healthcare problems affecting millions of people globally (https://www.who.int/health-topics/diabetes#tab=tab_1, accessed on 10 September 2025). It is a metabolic disorder that can be fatal and arises when the body cannot produce enough insulin or cannot use the hormone effectively [1,2]. Individuals with diabetes may suffer a harmful deterioration of pancreatic beta cells caused by an autoimmune attack targeted specifically at these cells [3]. It is a serious, long-term condition that reduces life expectancy and degrades quality of life. Poorly managed diabetes can lead to severe complications affecting several organs and tissues of the human body (https://my.clevelandclinic.org/health/diseases/7104-diabetes-mellitus-an-overview, accessed on 10 September 2025). Typical initial signs observed in diabetic and potentially diabetic patients include heightened thirst, extreme fatigue, weight fluctuations, dizziness, changes in skin color, sexual dysfunction, fungal infections, elevated blood sugar levels, and frequent urination [4]. Nearly 40 different types of diabetes exist, and most people worldwide are unaware of the complications associated with this disease because of inadequate healthcare resources and systems (https://diabetesatlas.org/resources/idf-diabetes-atlas-2025/, accessed on 12 September 2025). Among the common types, Type 1 Diabetes Mellitus, also called insulin-dependent diabetes, is caused by autoimmune dysfunction and may occur at any stage of life, but is mostly observed in children and adolescents [5]. Type 2 Diabetes Mellitus, also called non-insulin-dependent diabetes, is the most common type and occurs when the body does not produce or use insulin appropriately [4]. This type of diabetes can be controlled and managed through a healthy lifestyle. Another type is gestational diabetes, also called Type 3 Diabetes, which occurs in women during pregnancy because of hyperglycemia and can even affect the baby’s health [6]. Finally, Type 4 Diabetes, also called prediabetes, is attributed to an increase in the production of insulin [7].
According to the latest International Diabetes Federation (IDF) Diabetes Atlas 2025, approximately 560 million adults are living with diabetes globally, and the number is projected to reach 632 million by 2035 [8]. Figure 1 presents the estimates of diabetes prevalence and undiagnosed cases (ages 20–79 years) across the top seven countries most affected by diabetes, based on the IDF Diabetes Atlas 2025. China reports the highest number of adults living with diabetes (148 million), followed by India (89.8 million), the United States (38.5 million), Pakistan (34.5 million), Indonesia (20.4 million), Brazil (16.6 million), and Bangladesh (13.9 million). It also highlights the substantial burden of undiagnosed diabetes, which remains critically high in several countries, most notably Indonesia (73.2%), China (49.7%), India (43%), and Pakistan (26.9%). This visualization provides an updated overview of global diabetes distribution using the latest IDF 2025 modeling data.
Figure 1. Top seven countries suffering from diabetes mellitus according to the IDF Atlas 2025.
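The undiagnosed percentages quoted above can be turned into approximate absolute counts. The sketch below does this for the four countries whose undiagnosed share is stated in the text. It assumes that each percentage is a share of that country's adults living with diabetes; the variable and function names are illustrative only.

```python
# Figures quoted in the text from the IDF Diabetes Atlas 2025 (adults, 20-79 years).
# Only the four countries with a stated undiagnosed share are included.
adults_with_diabetes_millions = {
    "China": 148.0,
    "India": 89.8,
    "Pakistan": 34.5,
    "Indonesia": 20.4,
}
undiagnosed_share_percent = {
    "China": 49.7,
    "India": 43.0,
    "Pakistan": 26.9,
    "Indonesia": 73.2,
}


def undiagnosed_millions(country: str) -> float:
    """Approximate number of undiagnosed adults, in millions.

    Assumption: the undiagnosed share applies to the adults living with diabetes.
    """
    total = adults_with_diabetes_millions[country]
    share = undiagnosed_share_percent[country] / 100.0
    return total * share


undiagnosed_estimates = {
    country: round(undiagnosed_millions(country), 1)
    for country in adults_with_diabetes_millions
}
for country, estimate in undiagnosed_estimates.items():
    print(f"{country}: about {estimate} million undiagnosed adults")
```

Under this assumption, India's 43% share of 89.8 million corresponds to roughly 38.6 million undiagnosed adults.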
Traditional clinical methods for diabetes detection, including pathways, reports, and tests, lack the ability to capture the dynamic and personalized nature of Type 2 Diabetes prediction [9,10]. These approaches ignore lifestyle parameters in disease prediction. AI-based techniques, by contrast, enable feature selection, risk assessment, predictive modeling, early detection, and personalized medicine. Fuzzification transforms precise data into a fuzzy representation, incorporating the uncertainty inherent in real-world data [11,12]. Integrating a predictive and personalized recommendation system into clinical workflows can improve self-management and early intervention [13]. Seamless connectivity with electronic health records (EHRs) enables automatic retrieval of demographics, medical history, and recent laboratory results, allowing clinicians to review AI-generated recommendations during consultations. Automated digital surveys supported by mobile technologies can routinely collect lifestyle data such as diet, physical activity, and symptom patterns, ensuring continuous risk monitoring [14]. Clinical decision support alerts may notify providers when a patient’s risk exceeds a threshold or when adherence declines, prompting timely follow-up. Such a system can also integrate with scheduling platforms, pharmacy systems, and telemedicine services, enabling automated appointment reminders, medication adherence alerts, and remote consultations, which are particularly valuable for rural populations with limited access to care. These features align with patient-centered care principles by ensuring timely support, personalized communication, and improved accessibility [15].
Despite significant progress in diagnostic technologies, conventional screening approaches often fail to capture early-stage risk, especially among individuals who exhibit mild or lifestyle-driven symptoms [16]. Early prediction using lifestyle indicators, rather than solely clinical biomarkers, offers a promising, low-cost strategy for identifying at-risk individuals and delaying disease progression [17,18]. With the increasing adoption of AI-driven predictive analytics, there is a growing opportunity to integrate machine learning with personalized lifestyle recommendations to support self-management and preventive care [19,20].
The primary objective of this research is to develop an integrated, lifestyle-centric Artificial Intelligence framework for realistic and early-stage healthcare management of Type 2 Diabetes Mellitus (T2DM). The proposed system incorporates two major components: (i) an ANN-based predictive analytics module enhanced with fuzzy logic for multi-class risk stratification, and (ii) a personalized recommender system that generates diet and physical activity suggestions tailored to each risk category. The key contributions of this work are summarized as follows:
  • Identification of lifestyle indicators: We identified, refined, and collected lifestyle parameters (e.g., thirst, urination frequency, fatigue, smoking, drinking, anthropometrics, and family history) in consultation with endocrinologists, diabetologists, and nutritionists to ensure clinical relevance and real-world applicability.
  • Exploratory data analysis: Extensive exploratory data analysis (EDA) was performed, including duplicate detection, missing value assessment, outlier analysis (IQR method), and statistical significance testing (Cramer’s V + chi-square). No missing or corrupted data were found, ensuring a high-quality dataset for model development.
  • Development of ANN–fuzzy model: We developed a deep ANN architecture (three hidden layers) enhanced with a spline-based fuzzy membership function for calibrating outputs and subdividing non-diabetic classes into low-risk (A), moderate-risk (B), and high-risk (C) groups. This hybrid ANN–fuzzy model provides interpretable and robust multi-class prediction for T2DM.
  • Personalized recommender system: A rule-driven and cost-aware recommender system was designed to generate individualized diet plans and physical activity charts based on risk category, symptom intensity, and expert dietary guidance. Web-scraped nutritional and pricing data were incorporated to ensure affordability.
  • Knowledge-based interface: A web-based interface was built to deliver predictions, risk explanations, and personalized recommendations to end users. This interface supports early monitoring, promotes self-management, and enhances doctor–patient engagement.
The remainder of this paper is structured as follows. Section 1 introduces Type 2 Diabetes Mellitus (T2DM), its clinical complications, and its global burden, supported by recent statistical evidence. Section 2 provides a comprehensive review of existing research on diabetes prediction models and personalized recommender systems and synthesizes the key research gaps motivating this study. Section 3 details the proposed methodology, including data acquisition, exploratory data analysis, correlation analysis, preprocessing, feature significance assessment, ANN-based multi-class prediction, fuzzy logic calibration, and training and testing configurations of the proposed model. Section 4 presents and discusses experimental findings, including feature importance, model performance, comparative evaluation with existing models, and class-wise recommendations. Finally, Section 5 concludes the study by summarizing key contributions, outlining the limitations, and suggesting future research directions to enhance scalability, clinical integration, and real-time applicability.

2. Literature Review

AI-based approaches with different computational models have contributed significantly to healthcare and have the potential to assist in the early detection, intervention, and effective management of chronic diseases such as diabetes, heart disease [21], obesity [17], and cancer [22]. Lifestyle is a key determinant of these chronic diseases, especially diabetes. In recent years, researchers and healthcare professionals worldwide have employed various computational models to predict the onset of Type 2 Diabetes. Several of the approaches that have been implemented are as follows:
Kumar et al. [21] developed a deep neural network framework to predict diabetes using the publicly available PIMA dataset. The authors evaluated the model using statistical measures such as accuracy, specificity, sensitivity, and recall. The model reached 98.16% accuracy with a random train–test split, outperforming state-of-the-art methods. Rahman et al. [22] developed a novel model for the classification of diabetes. The authors utilized a combination of a convolutional neural network (CNN), traditional long short-term memory (LSTM), and CNN-LSTM models, and then compared their performance with the developed model using the PIMA Indians Diabetes Database (PIDD). The Boruta algorithm was employed to select significant features. Kannadasan K et al. [23] presented a deep neural network (DNN) model for diabetes classification that employs stacked autoencoders to improve accuracy and other evaluation metrics. The parameters of the DNN framework were tuned in a supervised fashion. Features were extracted from the dataset using stacked autoencoders, and classification was performed via a softmax layer. To validate the results, evaluation metrics such as precision, recall, specificity, and F1-score were calculated. The DNN achieved an accuracy of 86.26%, outperforming various existing classification methods. Dahiwade et al. [24] designed a model based on k-nearest neighbor (KNN) and a convolutional neural network using the PIMA diabetes dataset sourced from the UCI machine learning repository. The authors developed a framework to identify risk factors in patients based on a binary classification approach, using KNN and CNN classifiers to predict diabetes risk. CNN achieved the highest accuracy, 84.5%. In addition, in terms of time and space complexity, KNN requires more resources than CNN. Huang and Lu [25] proposed a model that combines information gain and deep neural networks to reduce the complications and severity of diabetes. The authors used the Waikato Environment for Knowledge Analysis (WEKA) tool for implementation and for selecting optimal attributes to identify diabetes risk factors. The developed model achieved an accuracy of 90.26%. Joshi and Borse [26] developed a GUI tool using the MATLAB R2015a (version 8.5) integrated development environment to predict whether a patient is diabetic. A backpropagation neural network (BPNN) was trained and tested to build software that acts as a medium between the doctor and patients. The dataset contained eight contributing parameters for identifying diabetes. The BPNN achieved an accuracy of 81% with fewer iterations and a minimum mean squared error (MSE) of 0.107%. Sumi and Balachandran [27] developed a predictive model for diabetes using various multilayer perceptron (MLP) algorithms. The experimental study utilized the MATLAB R2013 integrated development environment and the PIMA diabetes dataset. The results revealed that, among all the algorithms, the Levenberg–Marquardt algorithm demonstrated the best performance in terms of metrics such as precision, recall, and F1-score.
Personalized recommender systems for diabetes management can be developed from various sources, including clinical pathways, hospital repositories, laboratory tests, electronic health records (EHRs), and lifestyle indicators from multiple applications. By analyzing each source, recommender systems can offer tailored suggestions for medication regimens, diet plans, exercise routines, and lifestyle modifications. These systems leverage AI-based techniques to adapt recommendations based on individual preferences, clinical parameters, and responses to interventions. The main approaches to building such systems are as follows:
Nguyen et al. [28] presented a case study on the correlation between lifestyle, diet, and the risk of developing Type 2 Diabetes Mellitus in Vietnam. Data from 1100 patients aged 40 to 65 years were collected for the research. The authors recognized the significance of lifestyle and dietary patterns among the Vietnamese population and drew statistical observations from the data samples to examine the relationship of diet and lifestyle with T2DM. Balducci et al. [29] performed a systematic review of the effect of diet on Type 2 Diabetes Mellitus. They identified and presented various diet charts associated with improvements in metabolic conditions. Patients with Type 2 Diabetes can choose among numerous useful dietary regimens according to their preferences and social customs. The authors suggested that weight is a significant risk factor for different chronic diseases. Alian et al. [30] proposed a proactive personalized recommender system designed specifically for American Indians to help fight diabetes. A mobile application served as the platform for gathering information for smart healthcare systems for American Indian patients. The system was evaluated through use cases and human expert verification, which demonstrated its reliability and validity. The authors proposed that this research can be extended to people all over the globe by collecting patient information and using the same recommender system for diabetes prevention. Aklima et al. [31] developed a framework for a diet model among patients suffering from Type 2 Diabetes in Indonesia. For observation, sixty patients with T2DM were selected from four villages of the Banda Raya Community Health Centre in Banda Aceh, Indonesia. The results showed a moderate level of dietary behaviors among patients with T2DM, but the identification of required caloric intake was at a low level. The authors therefore suggested that this work can be extended to develop a realistic framework for dietary self-management using an intervention approach to improve the diet of patients with Type 2 Diabetes Mellitus. Karami et al. [32] implemented a framework for analyzing public opinions related to diabetes, diet, exercise, and obesity (DDEO) extracted from tweets. Analysis of a dataset comprising 4.5 million tweets showed that 8% of the tweets discussed diabetes, 23.7% mentioned diet, 16.6% referenced exercise, and 51.7% addressed obesity. The strongest correlation was between exercise and obesity, followed by diabetes and obesity. These data can help clinical providers and public health experts gain better insight into prevailing public opinions regarding diabetes, diet, exercise, and obesity. Mohammed and Hagras [33] designed a fuzzy logic-based diet recommendation system for diabetes. It provides a balanced diet plan that helps patients achieve a healthy lifestyle and control and reduce complications of the disease. The authors suggested that the proposed white-box (T2FS) model can be used for large datasets, along with consultation with domain experts, to achieve better results. Sharawat and Dubey [34] developed a diet model for diabetes using an analytical hierarchy process (AHP) based on the multi-criteria decision-making (MCDM) method. The proposed model presented an optimal selection of foods based on various benchmarks. The authors suggested that this diet can be used in daily life to help prevent diabetes, leading to a healthy infancy, which is very important for a country and contributes to hale and hearty surroundings.
However, some limitations are evident across the existing literature. First, most predictive models are trained on clinical or benchmark datasets (e.g., PIMA) and focus on binary diagnosis rather than early risk stratification based purely on lifestyle indicators, which are more accessible in low-resource settings. This restricts their applicability to individuals who present with clear symptoms or have undergone clinical testing, thereby missing the window for early, lightly symptomatic, or pre-symptomatic prediction. Second, many recommender systems provide generic or rule-based advice and are not tightly integrated with a data-driven risk prediction model; they rarely adapt recommendations dynamically to an individual’s evolving lifestyle profile, sociodemographic context, and risk level. Third, there is a scarcity of large, lifestyle-centric datasets and end-to-end frameworks that simultaneously offer (i) risk prediction, (ii) multi-level patient stratification, and (iii) personalized, yet cost-effective, recommendations validated by domain experts.
To address these gaps, we propose an integrated lifestyle-based AI framework for T2DM that combines an ANN-driven risk prediction model, fuzzy logic-based risk stratification, and a personalized recommender system for diet and physical activity. Unlike prior work that relies predominantly on clinical variables, our model is trained on a real-world lifestyle dataset covering a broad age range (5–83 years) and uses routinely observable factors such as age, weight, thirst, urination, fatigue, smoking, drinking, and family history. The ANN component performs multi-class risk prediction, the fuzzy inference engine refines risk categories using expert-defined linguistic rules, and the recommender module generates individualized, economically feasible lifestyle plans tailored to each risk stratum. By focusing on early prediction from lifestyle data and tightly linking prediction with personalized recommendations, the proposed framework aims to operationalize scalable, low-cost, and patient-centered T2DM prevention and management in everyday settings.

3. Methodology Adopted

Figure 2 illustrates the overall workflow of the proposed lifestyle-based fuzzy-enhanced ANN framework for early prediction and personalized management of Type 2 Diabetes Mellitus (T2DM). The methodology begins with the collection of user-specific lifestyle indicators, which serve as input to the ANN-based prediction module. The ANN model classifies individuals into four categories (low-risk, moderate-risk, high-risk, and diabetic) based on their lifestyle patterns. These predictions are then integrated with expert-curated knowledge from dieticians and nutritionists to generate personalized recommendations. The recommender system processes the predicted risk category and provides tailored diet and physical activity plans. It incorporates a web-scraping module that extracts cost-effective dietary options from multiple online sources, enabling the construction of affordable meal packages. In parallel, expert-designed physical exercise charts are mapped to each risk class. The final output consists of individualized diet plans and exercise charts aligned with the user’s risk level, supporting early intervention, lifestyle modification, and continuous self-management.
Figure 2. Proposed methodology of the research work.
Figure 3 illustrates the end-to-end development process of an AI-driven system for Type 2 Diabetes Mellitus (T2DM) prediction and management. The focus is on the convergence of AI techniques and domain expertise in a user-centric, modern platform that brings together clinical insights and effective lifestyle intervention. The Personalized Care and Intelligence System for Early Diabetes Assessment (PCISEDA) incorporates several components: data collection from various sources, preprocessing methods, risk prediction through an ANN model, a classification system based on fuzzy logic, and a recommender system for personalized lifestyle modification. Overall, PCISEDA can be viewed as a comprehensive approach to preventing and controlling diabetes, as it focuses on early detection and lifestyle changes to prevent progression. The system therefore not only assists medical professionals with data processing and decision-making, but also enables people to make informed decisions to prevent T2DM or at least delay its onset.
Figure 3. Comprehensive framework (PCISEDA) for T2DM prediction and personalized management system.

3.1. Data Selection and Collection

Data were collected through survey forms/questionnaires and Google Forms, as shown in Figure 4. These instruments were designed after selecting lifestyle indicators in consultation with domain experts, such as diabetologists and endocrinologists. Data from hospitals were collected through survey forms, while Google Forms were shared among diverse groups of individuals to ensure a balanced mix of participants: people from different areas, a balanced male–female ratio, patients from both urban and rural settings, and individuals from different age groups. The selected parameters/attributes were intentionally kept simple, as they are commonly known and represent outcomes of sedentary lifestyles and poor dietary habits. Importantly, no prior knowledge or awareness of the disease was necessary, and participants did not require clinical advice or guidance to provide the requested data.
Figure 4. Google Forms and survey form/questionnaire for data collection.

3.2. Parameter Information

The dataset contains 1939 records with 11 biological and lifestyle parameters. The first ten parameters are predictors, and the last one is the target variable. Table 1 presents the attribute descriptions, measurement units, and the range of values for each parameter.
Table 1. Attribute information of the dataset.

3.3. Exploratory Data Analysis

Exploratory data analysis (EDA) is a crucial technique used to uncover meaningful patterns and insights within a dataset, essential for various tasks [35]. It involves the application of descriptive statistics to data, helping us gain a thorough understanding of its characteristics and structure. EDA for data preprocessing includes data wrangling, data standardization, and data transformation to improve the quality assessment of the considered dataset. In the following subsections, we present and discuss the obtained results.

3.3.1. Descriptive Statistics of Parameters

Table 2 displays summary statistics for the dataset’s lifestyle parameters: record count, minimum, 25th, 50th, and 75th percentiles, maximum, mean, and standard deviation (Std) for each parameter. For instance, age has 1939 records, with a mean of 41.77 and a Std of 15.84, ranging from 5 to 83. The same statistical analysis applies to the other parameters in the dataset.
Table 2. Descriptive statistics of the diabetes mellitus dataset.
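As an illustration of how the summary measures listed above can be computed for one numeric parameter, the following is a minimal sketch using only the Python standard library. The `describe` helper and the sample ages are hypothetical; they are not the study's code or records, and the quartile interpolation method used by the authors is not stated, so the "inclusive" method here is an assumption.

```python
import statistics


def describe(values):
    """Summary statistics for one numeric column (hypothetical helper)."""
    data = sorted(values)
    # Assumption: quartiles are linearly interpolated between the observed
    # minimum and maximum ("inclusive" method).
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation (n - 1)
        "min": data[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": data[-1],
    }


# Made-up ages for illustration only; these are not study records.
ages = [5, 18, 27, 33, 41, 46, 52, 60, 71, 83]
summary = describe(ages)
for name, value in summary.items():
    print(f"{name:>5}: {value:.2f}")
```

Applying the same helper to every numeric parameter would yield one row per parameter, which is the layout a descriptive-statistics table typically follows.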

3.3.2. Age-Wise Distribution

The age-cohort analysis examines the distribution of participants across the five age groups (5–17, 18–30, 31–45, 46–60, >60 years) and the corresponding prevalence of Type 2 Diabetes. As presented in Table 3, diabetes prevalence increases steadily with age, rising from 2.7% in the youngest cohort to 47.5% among individuals above 60 years. While the majority of participants fall within the 31–45 age group, the highest disease burden is concentrated among older adults. This pattern highlights a strong age-related trend in diabetes risk, confirming that susceptibility increases significantly after age 45. The high prevalence in older cohorts emphasizes the need for targeted screening and preventive strategies.
Table 3. Age-cohort distribution of participants and corresponding prevalence of Type 2 Diabetes.
Although the dataset spans a wide age range (5 to 83 years), we employed a unified ANN-based model for prediction due to the absence of sufficient cohort-specific training samples, especially in younger age brackets. Preliminary stratified performance checks across age cohorts (children < 18, adults 18–60, seniors > 60) revealed that the unified model retained high classification accuracy (>90%) across all cohorts without notable loss in sensitivity or specificity. Incorporating age as an input lifestyle indicator allowed the model to implicitly adjust predictions based on age-specific patterns. The findings support the modeling approach used in this study by demonstrating that age is a critical predictor for early diabetes risk assessment and must be incorporated into lifestyle-based AI frameworks for accurate early detection.
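The age-cohort tabulation described in this subsection can be sketched as follows. The cohort boundaries are the five groups named in the text; the record layout, the function names, and the toy records are assumptions made for illustration, so the printed prevalences are unrelated to the values in Table 3.

```python
from collections import defaultdict

# Cohort boundaries follow the five age groups named in the text.
COHORTS = [
    ("5-17", 5, 17),
    ("18-30", 18, 30),
    ("31-45", 31, 45),
    ("46-60", 46, 60),
    (">60", 61, float("inf")),
]


def cohort_of(age):
    """Return the cohort label for an age given in whole years."""
    for label, low, high in COHORTS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside the studied range")


def prevalence_by_cohort(records):
    """records: iterable of (age, is_diabetic) -> {cohort: (n, prevalence in %)}."""
    counts = defaultdict(lambda: [0, 0])  # cohort -> [participants, diabetic cases]
    for age, is_diabetic in records:
        bucket = counts[cohort_of(age)]
        bucket[0] += 1
        bucket[1] += int(is_diabetic)
    return {label: (n, 100.0 * cases / n) for label, (n, cases) in counts.items()}


# Hypothetical toy records; not drawn from the study dataset.
records = [
    (12, False), (16, False), (25, False), (29, True),
    (38, False), (40, True), (55, True), (58, False),
    (66, True), (72, True), (80, False),
]
table = prevalence_by_cohort(records)
for label, _, _ in COHORTS:
    n, pct = table[label]
    print(f"{label:>6}: n={n}, prevalence={pct:.1f}%")
```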

3.3.3. Data Preprocessing

Data preprocessing is a crucial step in ensuring the integrity and reliability of the model-building process [36]. In this study, we applied the isnull() function and the SimpleImputer strategy to inspect the dataset for missing values. We also executed additional validation steps, including duplicate detection, corruption checks, and parameter-level consistency verification. No missing values, duplicate entries, or corrupted records were identified during this assessment, indicating that the dataset was already clean and complete. To investigate the presence of outliers, we used the Interquartile Range (IQR) method; however, no significant outliers were detected across the lifestyle variables.
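The data quality checks described above (missing values, duplicates, and IQR-based outliers) can be sketched in plain Python as shown below. This is not the authors' pipeline, which the text says relied on isnull() and SimpleImputer. The row layout, the helper names, and the conventional fence multiplier k = 1.5 are assumptions, since the text does not state the multiplier that was used.

```python
import statistics


def count_missing(rows):
    """Number of cells equal to None (the stand-in for a missing value here)."""
    return sum(value is None for row in rows for value in row.values())


def count_duplicates(rows):
    """Number of rows that exactly repeat an earlier row."""
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    return duplicates


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumed default."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical rows for illustration only.
rows = [
    {"age": 34, "weight": 70.0},
    {"age": 51, "weight": 82.5},
    {"age": 34, "weight": 70.0},  # exact duplicate of the first row
    {"age": 47, "weight": None},  # missing weight
]
print("missing cells:", count_missing(rows))
print("duplicate rows:", count_duplicates(rows))
print("outliers:", iqr_outliers([55, 58, 60, 62, 64, 66, 70, 150]))
```

A dataset that is already clean, as reported for this study, would return zero missing cells, zero duplicates, and an empty outlier list for every parameter.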
To assess the statistical strength of associations between the categorical lifestyle variables and the diabetes outcome, we used Cramér’s V correlation matrix, as presented in Figure 5. To further strengthen interpretation, the statistical significance of each association was evaluated using the chi-square test of independence. All correlation values accompanied by p < 0.05 were considered statistically significant. The analysis confirmed that several lifestyle factors, particularly age, urination frequency, thirst, fatigue, weight, and family history, show statistically significant associations with diabetes outcome (p < 0.05), indicating that these variables meaningfully contribute to risk differentiation. Conversely, variables with lower Cramér’s V values and non-significant p-values (p ≥ 0.05), such as sex and smoking, demonstrated weaker or negligible relationships with diabetes status. Incorporating statistical significance supports the robustness of the correlation matrix and validates the inclusion of the most influential lifestyle predictors in the subsequent modeling process.
Figure 5. Cramer’s V association matrix, which summarizes effect sizes between lifestyle variables. Pairwise statistical significance for all variable combinations was additionally computed using chi-square tests. Only associations with p < 0.05 were treated as statistically significant during interpretation, although individual p-values are not embedded in the heatmap to maintain visual clarity.
Table 4 presents the association between each lifestyle factor and the presence of Type 2 Diabetes (T2DM), quantified by an appropriate correlation coefficient (Cramer’s V) alongside the p-value from a chi-square test of independence. Because Cramer’s V does not provide directionality, the indicated arrows reflect the empirical trend observed in the data (e.g., higher weight more common in diabetics), rather than positive or negative correlation in the statistical sense. Notably, age, weight, frequency of thirst, frequent urination, self-reported fatigue, and family history of diabetes all show moderate correlation coefficients and reach statistical significance (p < 0.05). These significant associations suggest that individuals who are older or have higher body weight, experience excessive thirst and urination, report chronic fatigue, or have a family history of diabetes are more likely to have T2DM. In contrast, factors such as sex, smoking status, alcohol drinking habits, and height exhibit low correlation values that do not attain significance (p ≥ 0.05). This indicates that gender and these particular lifestyle choices or traits have only a weak or negligible relationship with diabetes in our dataset.
Table 4. Cramer’s V coefficients and p-values for lifestyle factors with respect to T2DM outcome.
The inclusion of p-values and significance indicators underscores the robustness of the observed correlations and directly addresses the need for statistical support in our analysis. Associations accompanied by p < 0.05 validate that the correlations are unlikely due to chance, reinforcing the importance of the corresponding lifestyle factors in relation to T2DM. For instance, the significant correlations for age, weight, thirst, urination frequency, fatigue, and family history confirm that these factors are non-randomly linked to diabetes outcomes, lending credibility to their role as key predictors. Conversely, the lack of statistical significance for sex, smoking, drinking, and height aligns with their low correlation values, confirming that any observed relationship for these features is not statistically meaningful.
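As a minimal sketch of how a Cramér’s V value and its chi-square statistic can be obtained for one pair of categorical variables, the following uses only the Python standard library. The function name `cramers_v` and the toy data are illustrative assumptions, not the study’s code or data, and the closed-form p-value shown applies only when the table has one degree of freedom.

```python
import math
from collections import Counter

def cramers_v(x, y):
    """Return (chi2, dof, V) for two equal-length categorical sequences."""
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))
    chi2 = 0.0
    for r, r_tot in row_totals.items():
        for c, c_tot in col_totals.items():
            expected = r_tot * c_tot / n          # count expected under independence
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    k = min(len(row_totals), len(col_totals)) - 1
    v = math.sqrt(chi2 / (n * k)) if k > 0 else 0.0
    return chi2, dof, v

# Toy data invented purely for illustration (not taken from the study).
thirst = ["high"] * 30 + ["low"] * 10 + ["high"] * 10 + ["low"] * 30
outcome = ["diabetic"] * 40 + ["non-diabetic"] * 40

chi2, dof, v = cramers_v(thirst, outcome)
# With one degree of freedom the chi-square tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2)) if dof == 1 else None
```

For tables with more than one degree of freedom, a statistics library would normally supply the p-value; that step is omitted here to keep the sketch dependency-free.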

3.4. Diet Plans and Physical Exercise Charts

Recent research studies in medical sciences have proven experimentally that diet and exercise play an important role in controlling, delaying, and managing diabetes, especially Type 2 Diabetes Mellitus [37,38,39]. Existing literature demonstrates relationships/associations between diet and exercise plans and various chronic diseases [40]. To mitigate the risk of developing diabetes in the future, it is essential to consistently monitor and maintain current lifestyle indicators, including factors such as height, weight, obesity, diet, food consumption, and exercise habits [41]. By regulating insulin levels in the blood, a well-balanced diet and regular physical exercise can assist in maintaining optimal blood sugar levels in both diabetic and potentially diabetic individuals [42].

3.5. Artificial Neural Network

An Artificial Neural Network (ANN) is a computational model inspired by the structure and functioning of biological neurons, capable of learning complex nonlinear relationships from data [43]. The network architecture used in this study consists of three hidden layers (as shown in Figure 6), enabling deep feature extraction and robust classification of lifestyle-based risk patterns related to Type 2 Diabetes Mellitus (T2DM). Equation (1) represents the computation within a single neuron (typically within an input or hidden layer), which is a component of the larger ANN topology.
$$y_i = \varphi\left(\sum_{j=1}^{n} w_{ij}\, x_j + b_i\right) \qquad (1)$$
where $y_i$ is the predicted output for class $i$, $\varphi$ is the activation function (ReLU for hidden layers, Softmax in the output layer), $x_j$ denotes the $j$-th input feature, $n$ is the number of input features, $w_{ij}$ is the weight connecting feature $j$ to output node $i$, and $b_i$ is the bias term.
Figure 6. ANN topology for T2DM prediction.
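The neuron computation in Equation (1) can be sketched as a small forward pass. The weights, biases, layer sizes, and input values below are arbitrary illustrative assumptions (the paper does not report them), and a single hidden layer stands in for the three used in the study.

```python
import math

def relu(z):
    """ReLU activation applied element-wise."""
    return [max(0.0, v) for v in z]

def softmax(z):
    """Softmax activation; subtracting the max keeps exp() numerically stable."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, biases, activation):
    """One layer of Equation (1): weights[i][j] connects input j to node i."""
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b
         for row, b in zip(weights, biases)]
    return activation(z)

# Hypothetical normalized inputs and parameters, chosen only for illustration.
x = [0.5, 0.2, 0.9]
hidden = dense(x, [[0.4, -0.6, 0.1], [-0.3, 0.8, -0.5]], [0.0, 0.1], relu)
probs = dense(hidden,
              [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0], [0.0, 0.0]],
              [0.0, 0.0, 0.0, 0.0],
              softmax)  # four outputs, one per risk class A-D
```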

3.6. ANN Architecture and Topology

We employed the ANN architecture shown in Figure 6 to predict T2DM. The input to the ANN consists of all predictor variables. ANNs are well suited to this task, as previous studies indicate: their multi-layered structure handles complex, nonlinear relationships in data, enabling deep feature extraction, and they learn by adjusting weights through backpropagation to reduce error. They can generalize from training data to unseen data, and their flexibility has supported applications ranging from image recognition to disease diagnosis. Because they can approximate complex relationships and learn from diverse data sources, often more effectively than classical methods, they have been applied successfully across many fields.
In this architecture, three hidden layers were used during experimentation and tuning. A spline-based membership function divides the non-diabetic class to identify the probability of future disease involvement. Non-diabetic patient records are divided into the following sub-classes: low-risk (A), moderate-risk (B), and high-risk (C) using reverse feature engineering. The four classes—A, B, C, and D—present the output. The results are discussed in the following subsections.

3.6.1. Fuzzy Membership Function

Fuzzy logic is widely used in fields such as artificial intelligence, where data values are imprecise or vague with respect to the outcome of a problem [44]. It offers a straightforward approach to inference, even when dealing with noisy, vague, ambiguous, imprecise, or incomplete data, and operates as a multi-valued logic that supports approximate reasoning, prioritizing flexibility over precision [45]. In this work, the fuzzy S-shaped membership function (Equation (2)) is applied solely to derive a continuous risk score, which is subsequently used to subdivide the non-diabetic population into three meaningful risk groups (classes A–C). Only the derived risk score is fuzzified; the original lifestyle parameters remain unchanged and are not used in fuzzified form for model training. The ANN therefore receives the normalized original feature values as input, and fuzzification influences only the generation of risk-based class labels, not the feature representations. The parameters ‘a’ and ‘b’ correspond to the endpoints of the inclined segment of the curve. Figure 7 illustrates the spline-shaped membership function. A membership function is characterized by three aspects (core, support, and boundary) and maps data samples to a degree of membership between 0.0 and 1.0.
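$$\mu(x; a, b) = \begin{cases} 0, & x \le a \\ 2\left(\dfrac{x-a}{b-a}\right)^{2}, & a \le x \le \dfrac{a+b}{2} \\ 1 - 2\left(\dfrac{x-b}{b-a}\right)^{2}, & \dfrac{a+b}{2} \le x \le b \\ 1, & x \ge b \end{cases} \qquad (2)$$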
Figure 7. Curve of spline-shaped membership function.
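The S-shaped membership function of Equation (2) translates directly into a short function. The name `s_membership` and the example endpoints a = 0 and b = 1 are illustrative assumptions; the paper does not state the values of a and b it used.

```python
def s_membership(x, a, b):
    """S-shaped fuzzy membership: 0 below a, 1 above b, smooth in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    midpoint = (a + b) / 2
    if x <= midpoint:
        return 2 * ((x - a) / (b - a)) ** 2      # rising, convex half
    return 1 - 2 * ((x - b) / (b - a)) ** 2      # rising, concave half

# Example with assumed endpoints a = 0 and b = 1.
degrees = [s_membership(x, 0.0, 1.0) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

The two quadratic branches meet at the midpoint with value 0.5, which is what gives the curve its continuous S shape.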

3.6.2. Fuzzy Logic-Based Probability Calibration

The fuzzy logic-based spline calibration described in this section is applied exclusively to the ANN output probabilities, not to the raw lifestyle input features. After the ANN produces the predicted probability vector, a class-wise monotonic spline function is fitted to refine these probabilities. This calibration step adjusts the ANN outputs to improve probability reliability and enforce normalization across the four T2DM risk classes. No fuzzification or spline transformation is applied to input features at any stage; the only fuzzified values in the model pipeline are those used earlier for generating class labels (Section 3.6.1).
To improve the reliability of the ANN-predicted probability scores, we apply a monotonic spline-based calibration method inspired by isotonic regression. In this approach, the raw predicted probabilities for each class are first sorted and then fitted with a monotonic cubic regression spline that enforces non-decreasing behavior across the probability range. This ensures smooth, well-behaved calibrated scores while preserving the ranking of samples. The calibrated class-specific spline functions are subsequently normalized across all classes so that the resulting probability vector remains within the m-dimensional probability simplex. The overall calibration workflow is formally described in Algorithm 1.
Let
$$\hat{P}^{k} = \left(\hat{P}_1^{k},\ \hat{P}_2^{k},\ \hat{P}_3^{k},\ \hat{P}_4^{k}\right)$$
be the ANN-predicted probability vector for sample $k$, where
$$k = 1, 2, \ldots, n$$
and $n = 1939$. For each class $i \in \{1, 2, 3, 4\}$ we construct the corresponding one-vs-all indicator label:
$$y_i^{k} = \begin{cases} 1, & \text{if sample } k \text{ belongs to class } i, \\ 0, & \text{otherwise.} \end{cases}$$
A monotonic spline-based regression function $F_i(\cdot)$ is fitted between the predicted probabilities and the true one-vs-all labels:
$$F_i = \mathrm{SplineFit}\left(\hat{P}_i^{k},\ y_i^{k}\right)$$
After fitting the spline for each class, the calibrated probability for class $i$ is computed as follows:
$$\tilde{P}_i^{k} = \frac{F_i\left(\hat{P}_i^{k}\right)}{\sum_{j=1}^{4} F_j\left(\hat{P}_j^{k}\right)}$$
This normalization guarantees:
$$\sum_{i=1}^{4} \tilde{P}_i^{k} = 1, \quad \forall k.$$
Algorithm 1: Spline-Based Multi-class Calibration
Input: $\hat{P} \in [0, 1]^{n \times m}$: predicted probability matrix from the ANN.
              $Y \in \{1, 2, \ldots, m\}^{n}$: true class labels encoded as integers.
Output: A calibrated multiclass mapping $F: \Delta^{m} \to \Delta^{m}$, where $\Delta^{m}$ denotes the $m$-dimensional probability simplex.
1.    For each class $k = 1, 2, \ldots, m$:
       1.1  Extract the raw predicted probabilities:
          $\hat{P}_k = \hat{P}_{:,k}$
       1.2  Construct the one-vs-all indicator vector: for each sample $i = 1, \ldots, n$,
          $y_i^{k} = 1$ if $Y_i = k$, and $y_i^{k} = 0$ otherwise.
       1.3  Sort probabilities for monotonic spline fitting:
             •  Sort $\hat{P}_k$ in ascending order
             •  Reorder $y^{k}$ using the same sorted indices
       1.4  Fit a monotonic cubic regression spline:
               $f_k = \mathrm{SplineFit}\left(\hat{P}_k,\ y^{k}\right)$
              where SplineFit denotes a monotonic cubic spline fitted under isotonic (non-decreasing) constraints, ensuring smoothness and proper probability behaviour.
2.    For each sample $i = 1, \ldots, n$ and each class $k = 1, \ldots, m$:
            Compute the calibrated score for class $k$:
             $F_k(x_i) = \dfrac{f_k\left(\hat{P}_{ik}\right)}{\sum_{j=1}^{m} f_j\left(\hat{P}_{ij}\right)}$
         This normalization ensures that
             $\sum_{k=1}^{m} F_k(x_i) = 1$
and that the final calibrated vector lies in the m-dimensional probability simplex $\Delta^{m}$.
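The structure of Algorithm 1 can be sketched as follows. One substitution should be noted plainly: instead of the monotonic cubic regression spline, this sketch fits a piecewise-constant isotonic function with the pool-adjacent-violators procedure, which satisfies the same non-decreasing constraint but is not smooth. The function names, the small floor `eps` that guards against division by zero, the 0-based class indices, and the toy probabilities are all assumptions made for illustration.

```python
def isotonic_fit(scores, labels):
    """Pool-adjacent-violators: non-decreasing step fit of labels on scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block holds [sum of labels, count, largest score covered]
    for i in order:
        blocks.append([float(labels[i]), 1, scores[i]])
        # Merge backwards while the previous block's mean exceeds the new one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c, hi = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] = hi
    return [(hi, s / c) for s, c, hi in blocks]

def isotonic_predict(fit, score):
    """Evaluate the step function at a single score."""
    for hi, value in fit:
        if score <= hi:
            return value
    return fit[-1][1]

def calibrate(prob_matrix, y_true, n_classes, eps=1e-6):
    """Steps 1 and 2 of Algorithm 1 with an isotonic stand-in for SplineFit."""
    fits = []
    for k in range(n_classes):
        column = [row[k] for row in prob_matrix]               # step 1.1
        indicator = [1 if y == k else 0 for y in y_true]       # step 1.2
        fits.append(isotonic_fit(column, indicator))           # steps 1.3-1.4
    calibrated = []
    for row in prob_matrix:                                    # step 2
        raw = [max(isotonic_predict(fits[k], row[k]), eps) for k in range(n_classes)]
        total = sum(raw)
        calibrated.append([v / total for v in raw])            # simplex normalization
    return calibrated

# Toy ANN outputs and labels, invented for illustration (classes 0-3 stand for A-D).
toy_probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.1, 0.1, 0.2, 0.6],
    [0.3, 0.3, 0.2, 0.2],
]
toy_labels = [0, 0, 1, 2, 3, 1]
toy_calibrated = calibrate(toy_probs, toy_labels, n_classes=4)
```

In practice the calibration functions would be fitted on held-out data rather than on the samples being calibrated; the sketch fits and applies on the same toy set only for brevity.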

3.6.3. ANN Training and Testing Configuration

To ensure reproducibility and transparency, the complete training and testing configuration of the ANN model used for T2DM risk classification is summarized in Table 5. The dataset was split into 70% for training and 30% for testing, ensuring that all four classes (A–D) were proportionally represented in both sets. The ANN consists of three hidden layers and uses the ReLU activation function for all hidden layers, while the output layer employs Softmax for multi-class probability prediction. Model optimization was performed using the Adam optimizer, with categorical cross-entropy as the loss function. Training was conducted for 150 epochs with early stopping, a batch size of 32, and a learning rate of 0.001. This configuration allowed stable convergence, effective generalization, and robust classification performance across all four risk classes.
Table 5. ANN architecture, training, and testing configurations.
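A class-proportional 70/30 split of the kind described above can be sketched with the standard library alone. The `CONFIG` dictionary simply restates the hyperparameters given in the text; the function name, the random seed, and the toy label counts are assumptions, and hidden-layer widths are omitted because they are not stated in this section.

```python
import random
from collections import defaultdict

# Hyperparameters as stated in the text; hidden-layer sizes are not given there.
CONFIG = {
    "hidden_layers": 3,
    "hidden_activation": "relu",
    "output_activation": "softmax",
    "optimizer": "adam",
    "loss": "categorical_crossentropy",
    "epochs": 150,
    "batch_size": 32,
    "learning_rate": 0.001,
    "test_fraction": 0.30,
}

def stratified_split(labels, test_fraction=0.30, seed=42):
    """Return (train_indices, test_indices) preserving class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable splits
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy class labels, invented for illustration.
toy_classes = ["A"] * 40 + ["B"] * 30 + ["C"] * 20 + ["D"] * 10
train_idx, test_idx = stratified_split(toy_classes, CONFIG["test_fraction"])
```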

4. Experiment, Results, and Discussion

In this section, we present and discuss the outcomes of our proposed framework for a realistic healthcare management system for Type 2 Diabetes Mellitus. The results are directed towards improving self-care strategies for individuals affected by Type 2 Diabetes Mellitus (T2DM) through the utilization of lifestyle data.

4.1. Performance Evaluation of the Fuzzy-Based ANN Model

To extend binary classification to multi-class scenarios, we explored previous research works [46,47]. We considered extending the smoothing spline function to higher dimensions, an approach known as thin-plate splines. However, directly extending this approach to the multi-class context is infeasible, because the required number of knots grows exponentially with the number of classes. To prepare the lifestyle dataset for predictive modeling, a structured feature engineering workflow was implemented, comprising normalization, fuzzy membership computation, and class-level risk categorization. All lifestyle parameters were normalized to the interval [0,1], as presented in Table 6, to ensure uniform scaling and prevent any feature from disproportionately influencing ANN training. This transformation preserves the underlying behavioral and physiological trends while improving numerical stability during model optimization.
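The text states that features were scaled to [0,1] but does not name the scaling method. The sketch below assumes plain min-max scaling, one common choice; the function name and the raw ages are hypothetical.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw ages, used only to show the transformation.
raw_ages = [25, 40, 55, 70]
scaled_ages = min_max_normalize(raw_ages)
```

Because the mapping is linear and increasing, the ordering of individuals on each feature is preserved.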
Following normalization, a spline-based fuzzy membership function was applied to quantify the degree of association between an individual’s lifestyle indicators and their likelihood of developing Type 2 Diabetes Mellitus (T2DM). Unlike crisp threshold-based methods, the fuzzy membership function yields graded scores ranging from 0 to 1, capturing smooth and nonlinear transitions in symptom severity. This approach is particularly effective in early-stage or borderline cases, where risk patterns may not be distinctly separable through rigid clinical cut-off points. The resulting fuzzified values were subsequently mapped to four clinically interpretable T2DM risk classes:
  • Class A (0.0–0.3): Low-risk;
  • Class B (0.4–0.6): Moderate-risk;
  • Class C (0.7–0.9): High-risk;
  • Class D (1.0): Confirmed diabetic.
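The mapping from fuzzified value to risk class can be sketched as below. The listed ranges leave gaps (for example between 0.3 and 0.4), and the text does not say how such values are handled; this sketch assumes the score is rounded to one decimal place before binning, which reproduces the class labels of the sample rows in Table 6. The function name is illustrative.

```python
def risk_class(fuzzified_value, outcome):
    """Map a fuzzified risk score and diabetes outcome (0 or 1) to a class A-D."""
    if outcome == 1:
        return "D"                      # confirmed diabetic
    score = round(fuzzified_value, 1)   # assumption: bins are defined on one decimal place
    if score <= 0.3:
        return "A"                      # low-risk
    if score <= 0.6:
        return "B"                      # moderate-risk
    return "C"                          # high-risk

# (fuzzified value, outcome) pairs taken from the sample rows of Table 6.
table6_samples = [(0.100281, 0), (0.749944, 0), (0.570111, 0), (1.0, 1),
                  (0.854766, 0), (0.000541, 0), (0.458965, 0)]
assigned = [risk_class(value, outcome) for value, outcome in table6_samples]
```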
Table 6 presents the complete results of this feature engineering process, demonstrating how raw lifestyle attributes are systematically transformed into fuzzy scores and subsequently into risk categories. The table shows the continuity between lifestyle patterns and disease likelihood, supporting the fuzzy approach as an effective intermediary between raw data and predictive modeling. The outcomes of this feature engineering framework are crucial for both model performance and clinical interpretability. First, the fuzzy membership values provide graded class labels that encode nuanced symptom variations which crisp thresholds would miss; consistent with Section 3.6.1, they are not supplied to the ANN as input features. Second, the risk categories (A–D) bridge the predictive ANN model and the personalized recommendation module, enabling the system to generate tailored diet and physical activity plans aligned with the user’s risk level. Finally, this methodology reinforces the model’s ability to detect early-stage T2DM risk by highlighting subtle but clinically relevant behavioral patterns, thereby enhancing both diagnostic precision and the practical utility of the recommender system.
Table 6. Results of feature engineering process.
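| S. No | Age | Urination | Thirst | Weight | Height | Fatigue | Outcome | Fuzzified Value | Class |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.340312 | 0.189349 | 0.09184 | 0.53635 | 0.245802 | 0.65691 | 0 | 0.100281 | A [0.0–0.3] |
| 2 | 0.617188 | 0.704142 | 0.17346 | 0.48773 | 0.539559 | 0.65691 | 0 | 0.749944 | C [0.7–0.9] |
| 3 | 0.572187 | 0.573964 | 0.16327 | 0.64762 | 0.371089 | 0.92228 | 0 | 0.570111 | B [0.4–0.6] |
| 4 | 0.772187 | 0.704142 | 0.2551 | 0.68785 | 0.723598 | 0.65691 | 1 | 1 | D [1.0] |
| 5 | 0.659688 | 0.573964 | 0.2551 | 0.60494 | 0.157675 | 0.92228 | 0 | 0.854766 | C [0.7–0.9] |
| 6 | 0.89875 | 0.704142 | 0.9898 | 0.37342 | 0.686438 | 0.65691 | 1 | 1 | D [1.0] |
| 7 | 0.327654 | 0.189349 | 0.09528 | 0.48773 | 0.09892 | 0.92228 | 0 | 0.000541 | A [0.0–0.3] |
| 8 | 0.834687 | 0.893491 | 0.97184 | 0.37342 | 0.136083 | 0.92228 | 1 | 1 | D [1.0] |
| 9 | 0.524687 | 0.704142 | 0.36735 | 0.46365 | 0.157675 | 0.65691 | 0 | 0.458965 | B [0.4–0.6] |
| 10 | 0.89875 | 0.704142 | 0.9898 | 0.72565 | 0.010796 | 0.92228 | 1 | 1 | D [1.0] |
| 11 | 0.360243 | 0.199675 | 0.09379 | 0.64765 | 0.245802 | 0.65691 | 0 | 0.100345 | A [0.0–0.3] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1938 | 0.834687 | 0.893491 | 0.9898 | 0.37342 | 0.245802 | 0.92228 | 1 | 1 | D [1.0] |
| 1939 | 0.524687 | 0.704142 | 0.2551 | 0.46365 | 0.010796 | 0.92228 | 0 | 0.458965 | B [0.4–0.6] |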
Note: The numerical lifestyle parameters presented in this table correspond to the normalized original dataset values. Only the final “Fuzzified Value” column reflects the S-shaped membership transformation used for class derivation. These fuzzified values are not used as ANN inputs; the ANN is trained exclusively on non-fuzzified lifestyle features, while the fuzzy spline calibration in Section 3.6.2 is applied only to ANN output probabilities.
Performance evaluation metrics assess a model’s efficacy, reliability, and robustness, and provide insight into how well a model performs on a given task, whether classification, regression, clustering, or another machine learning or statistical analysis. In our study, we evaluated the ANN model on both the training and test sets using accuracy, precision, recall, F1-score, kappa, macro average, and weighted average. The ANN model achieved realistic results under this metric analysis, as shown in Figure 8. The training and testing accuracies were 95.73% and 93.64%, respectively. Precision was 96% on the training set and 94% on the test set. Recall and F1-score were both 96% on the training set and 93.50% on the test set. The kappa scores for the training and test sets were 94.30% and 91.50%, respectively. The macro and weighted averages were 95.50% for the training phase and 94% for the testing phase. The confusion matrix in Figure 9 shows the relationship between the actual and predicted values.
Figure 8. Performance evaluation metrics for training and testing datasets.
Figure 9. Confusion matrix for (a) training and (b) testing for the ANN Model.
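The metrics reported above can all be derived from a confusion matrix. The sketch below shows one way to compute accuracy, macro-averaged precision, recall and F1-score, and Cohen’s kappa; the function name and the 4×4 matrix are illustrative assumptions and do not reproduce the counts in Figure 9.

```python
def classification_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(cm[i]) for i in range(n_classes)]
    col_sums = [sum(cm[r][j] for r in range(n_classes)) for j in range(n_classes)]
    accuracy = sum(cm[i][i] for i in range(n_classes)) / total

    precisions, recalls, f1s = [], [], []
    for i in range(n_classes):
        tp = cm[i][i]
        p = tp / col_sums[i] if col_sums[i] else 0.0
        r = tp / row_sums[i] if row_sums[i] else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)

    # Cohen's kappa: agreement beyond what the marginals alone would produce.
    expected = sum(row_sums[i] * col_sums[i] for i in range(n_classes)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / n_classes,
        "macro_recall": sum(recalls) / n_classes,
        "macro_f1": sum(f1s) / n_classes,
        "kappa": kappa,
    }

# Hypothetical confusion matrix for classes A-D, invented for illustration.
toy_cm = [
    [9, 1, 0, 0],
    [1, 8, 1, 0],
    [0, 1, 9, 0],
    [0, 0, 0, 10],
]
toy_metrics = classification_metrics(toy_cm)
```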
Figure 10 presents the multi-class AUC-ROC curves for the ANN model across the four risk categories (Class A, Class B, Class C, and Class D). The ROC curves appear as piecewise-linear segments rather than smooth curves. This behavior arises because the ANN produces a limited number of distinct probability values for each class in the one-vs-rest evaluation setting, resulting in only a few threshold points for the computation of true-positive and false-positive rates. Despite the step-like appearance, the area under the curve (AUC) is computed over the complete set of threshold values and reliably reflects the discriminative performance of the proposed model across all classes. The model demonstrates strong discriminative performance, with AUC values of 0.947 (Class A), 0.935 (Class B), 0.949 (Class C), and 0.932 (Class D). All curves lie well above the diagonal reference line, indicating that the classifier performs significantly better than random prediction across all classes.
Figure 10. Each ROC curve represents true-positive rate vs. false-positive rate across all threshold values, and the AUC reflects the area under this curve, not just one threshold point.
Among the four categories, Class C shows the highest discriminative ability (AUC = 0.949), closely followed by Class A (AUC = 0.947), while Class D exhibits the lowest but still strong performance (AUC = 0.932). These consistent AUC values confirm stable and reliable risk stratification for each diabetes category. It can be observed that AUC values above 0.93 for all classes demonstrate excellent sensitivity–specificity trade-offs and suggest strong model generalizability across varying risk profiles, including low-risk (A), moderate-risk (B), high-risk (C), and diabetic (D) categories.
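A one-vs-rest AUC can be sketched without plotting a curve at all, using the equivalent rank interpretation: the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function names and the toy scores are illustrative assumptions, and class indices are 0-based.

```python
def roc_auc(scores, is_positive):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(prob_matrix, y_true, n_classes):
    """One AUC per class: that class's probability column against all other classes."""
    return [roc_auc([row[k] for row in prob_matrix], [y == k for y in y_true])
            for k in range(n_classes)]

# Toy scores, invented for illustration.
auc_perfect = roc_auc([0.9, 0.8, 0.3, 0.2], [True, True, False, False])
auc_partial = roc_auc([0.9, 0.3, 0.8, 0.2], [True, True, False, False])
auc_tied = roc_auc([0.5, 0.5], [True, False])
```

When a model emits only a few distinct probability values, many pairs are tied, which is the same effect that produces the few threshold points and piecewise-linear ROC curves described above.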
To quantify the discriminative strength of each lifestyle feature in separating the four T2DM risk classes (A–D), we compute feature importance using the ANOVA (Analysis of Variance) F-statistic. This method evaluates how much of the variance of a feature is explained by class differences relative to unexplained within-class variance. Given a feature X partitioned into m classes, the F-score is computed as follows:
$$F = \frac{SS_{between} / (m - 1)}{SS_{within} / (N - m)}$$
where
$$SS_{between} = \sum_{i=1}^{m} n_i \left(\mu_i - \mu\right)^{2}$$
$$SS_{within} = \sum_{i=1}^{m} \sum_{k \in i} \left(x_k - \mu_i\right)^{2}$$
with n i as the number of samples in class i, μ i as the mean of feature X in class i, μ as the overall mean across all samples, and N as the total number of samples.
A higher F-value indicates that the feature exhibits stronger separation between classes, meaning its class-wise differences are significantly larger than the variability within each class. This makes the ANOVA F-statistic particularly suitable for ranking feature importance in multi-class prediction problems. Given that our features are continuous and normalized (range [0,1]), the ANOVA assumptions of normality, independence, and homogeneity of variance are reasonable and consistent with standard practice in feature selection. The resulting F-scores, presented in Figure 11, highlight the lifestyle factors most influential in predicting progression toward Type 2 Diabetes Mellitus. It can be observed that age, weight, urination frequency, and thirst are the four most influential lifestyle predictors, with F-scores of 159, 150, 126, and 110, respectively. Height also contributes moderately (F = 90), while features such as fatigue, family history, drinking, smoking, and sex show comparatively lower influence. These importance scores indicate the relative contribution of each variable toward the model’s decision-making process and highlight the lifestyle factors most strongly associated with T2DM risk in the analyzed population.
Figure 11. Feature importance scores for lifestyle predictors using the ANOVA F-statistic.
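The F-statistic defined above can be computed for a single feature in a few lines. The function name and the toy values are illustrative assumptions; the numbers are unrelated to the F-scores shown in Figure 11.

```python
def anova_f(values, labels):
    """One-way ANOVA F-statistic of a feature across the classes given by labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n_total = len(values)
    m = len(groups)
    grand_mean = sum(values) / n_total
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((x - means[label]) ** 2
                    for label, g in groups.items() for x in g)
    return (ss_between / (m - 1)) / (ss_within / (n_total - m))

# Toy feature values for two classes, invented for illustration.
toy_values = [1, 2, 3, 4, 5, 6]
toy_groups = ["A", "A", "A", "B", "B", "B"]
toy_f = anova_f(toy_values, toy_groups)
```

Here the class means are 2 and 5 around a grand mean of 3.5, so the between-class sum of squares is 13.5, the within-class sum of squares is 4, and F = (13.5 / 1) / (4 / 4) = 13.5.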
To evaluate the performance of the proposed ANN model, we conducted a comparative analysis against several state-of-the-art machine learning and ensemble methods reported in the literature. Table 7 summarizes the results across key evaluation metrics, including accuracy, precision, recall, F1-score, and AUC-ROC.
Table 7. Comparative analysis of our proposed model with state-of-the-art methods.

4.2. Recommender System for Economical Diet Plans and Physical Exercise Charts

The proposed architecture, shown in Figure 12, outlines a workflow for recommending economical diet packages and physical exercise charts to potential and diabetic patients. In this framework, the results achieved through the ANN model using lifestyle parameters for prediction of T2DM are explored. These results were presented and discussed with experts, like dieticians/nutritionists, to design diet plans and physical exercise charts for different categories of patients. Also, a web crawler/web scraper was used to extract data regarding recommended diet plans in order to generate economical diet packages. Finally, the customized economical diet plans and physical exercise charts are recommended to all patient categories to follow healthy lifestyle habits in advance.
Figure 12. Recommender system for diet plans and physical exercise charts.

4.2.1. Recommendation of Diet Plans

The growing economic burden of noncommunicable diseases such as T2DM, cardiovascular diseases, and cancer is increasingly recognized worldwide [53]. Hospital treatment and readmission costs are obstacles to adopting healthy eating habits, particularly for individuals from lower socioeconomic backgrounds [54]. In addition, the cost of food items for patients with lifestyle-related diseases is a barrier to healthier diet plans. With the alarming rise of chronic degenerative diseases in many countries worldwide, prioritizing healthy eating habits can no longer be delayed [55]. Computational techniques can be used to develop an economical diet model that significantly reduces the healthcare costs associated with managing T2DM by following dietary principles aligned with the Mediterranean diet. An economical diet model offers a platform to identify the most budget-friendly dietary items from various sources, tailored to different categories of individuals. The workflow for developing a framework for economical diet plans is discussed in the following subsections.

4.2.2. Data Sources

Various web sources were used to collect data on recommended diet plans. Popular online grocery websites served as data sources, and an API (Application Programming Interface) was developed to gather diet packages for T2DM patients. Bigbasket (https://www.bigbasket.com/), Sabzi Bazar (https://www.sabzibazar.co.in/), Nature’s Basket (https://www.naturesbasket.co.in/), and Flipkart (https://www.flipkart.com/) were selected because these online grocery stores stock a sufficient range of the daily diet items required by end users. Figure 13 shows the BigBasket website, one of the sites from which data for economical diet plans were collected.
Figure 13. BigBasket website used to collect data for economical diet plans.

4.2.3. Architecture for Development of Economical Diet Packages

Web mining techniques were explored to extract data from different sources, which would be hard and inefficient to obtain manually [56]. Manual extraction is also time-consuming and error-prone. The architecture shown in Figure 14 illustrates the design of the economical diet packages. Data regarding diet plans from different online sites are accessed using the uniform resource locator (URL) of each webpage and extracted using web scrapers such as Data Miner (https://dataminer.io/), Webharvy (https://www.webharvy.com/), and import.io (https://www.import.io/). The URLs are added to the web crawler/web scraper, which accesses the webpages and extracts the relevant information from the web content. An example of script execution for URL searching on the Flipkart website is shown in Figure 15. The extracted content is stored in a content database, as shown in Figure 16 for the Bigbasket website. The content database stores all the information scraped from the webpages of the different URLs, including data extracted from the other websites. The stored data were then compared in order to select the most economical diet packages for end users. Finally, these economical diet packages are designed to cater to different patient categories.
Figure 14. Architecture for economical diet packages.
Figure 15. Output of execution for URLs searching.
Figure 16. Extracted data related to diet plans from the Bigbasket website.
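The comparison step, in which stored records are compared to select the most economical option per item, can be sketched as follows. The record fields, the function name, and the sample records (with placeholder store names and prices) are hypothetical; the sketch does not perform any scraping and does not reflect real prices from the websites named above.

```python
def cheapest_offers(content_db):
    """Return, for each item, the record with the lowest price across all sources."""
    best = {}
    for record in content_db:
        current = best.get(record["item"])
        if current is None or record["price"] < current["price"]:
            best[record["item"]] = record
    return best

# Hypothetical content-database records; store names and prices are placeholders.
content_db = [
    {"item": "oats", "source": "store_a", "price": 120.0},
    {"item": "oats", "source": "store_b", "price": 110.0},
    {"item": "lentils", "source": "store_a", "price": 90.0},
    {"item": "lentils", "source": "store_b", "price": 95.0},
]

best_offers = cheapest_offers(content_db)
package_cost = sum(record["price"] for record in best_offers.values())
```

A real pipeline would also need to normalize units and pack sizes before prices become comparable; that step is left out of this sketch.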

4.2.4. Recommendation of Physical Exercise Charts

The physical exercise charts were designed in consultation with domain experts for different categories of patients based on the prediction results. The prediction results, along with the contribution of each lifestyle parameter towards the disease, were discussed in detail with the experts. The dieticians and nutritionists provided a list of diet items and physical exercise charts for potentially diabetic and diabetic patients. The physical exercise charts for the different categories of patients are presented in Table 8 and Table 9. These customized physical exercise charts can help patients to follow healthy lifestyle habits in advance.
Table 8. Exercise chart for diabetic patients.
Table 9. Exercise chart for non-diabetic patients.

4.3. Knowledge-Based Interface

Artificial intelligence has rapidly emerged as a predominant technology in online systems, and designers can harness its capabilities to enhance web applications, enabling real-time predictions from a few input parameters. A knowledge-based intelligent interface, the Realistic Healthcare Management System Type 2 Diabetes Mellitus (RHMST2DM), has been developed for healthcare providers and end users. This web-based application is convenient, allowing users to easily provide input and receive their results. The web-based interface shown in Figure 17 is used to collect lifestyle data from users. The ANN framework developed for T2DM prediction is integrated with the backend of this interface to calculate the probability of disease.
Figure 17. Graphical user interface for input of lifestyle parameters.
Existing recommendations, however, are usually general and do not take into account the specific characteristics of individuals at risk of the disease [37]. Proper recommendations for diet plans and physical exercise charts are still under research [56]; for example, a food recommendation described in terms of nutritional values may be hard to follow for patients with low medical literacy. A personalized real-time recommender system that provides diet plans and physical exercise charts for all categories of patients would therefore be of great use. The classification of results is based on the severity of, or inclination towards, T2DM. The predictive classes are diabetic, low-risk, moderate-risk, and high-risk non-diabetic patients, as shown in Figure 18. Based on the prediction results, users can obtain the required diet plans and physical exercise charts.
Figure 18. Classification of predicted results towards T2DM.

4.4. Case Example

The recommender system generates personalized lifestyle guidance for each risk category identified by the ANN and fuzzy logic model. Based on a user’s symptoms, demographic profile, and lifestyle patterns, the system provides tailored diet plans, physical activity recommendations, behavioral modifications, and monitoring strategies as presented in Table 10. These individualized recommendations aim to support preventive, corrective, and therapeutic interventions according to the severity of diabetes risk. The integration of a personalized recommender system bridges the gap between risk prediction and actionable lifestyle management. By offering class-specific, affordable, and practical recommendations, the framework enhances user engagement and supports early intervention, particularly for individuals in high-risk or pre-diabetic states. This ensures that the model is not merely predictive but also clinically meaningful, promoting real-world impact in diabetes prevention and management.
Table 10. Personalized lifestyle recommendations generated by the proposed recommender system for each diabetes risk class (A–D).

5. Conclusions and Future Scope

This study presents a comprehensive AI-driven framework for the early prediction and personalized management of Type 2 Diabetes Mellitus (T2DM) using real-world lifestyle indicators. The proposed model combines an Artificial Neural Network with fuzzy logic to generate accurate multi-class risk predictions while simultaneously delivering individualized lifestyle, diet, and physical activity recommendations through an integrated recommender system. The framework has been developed, validated, and assessed with support from domain experts, including endocrinologists, diabetologists, and nutritionists, ensuring clinical relevance and practical usability. Implemented as a user-friendly web application, the system provides cost-effective and accessible support for individuals at varying levels of risk, enabling informed decision-making and encouraging sustainable behavioral modifications. By incorporating web-scraped food pricing data and expert-curated menus, the recommender engine offers economically feasible diet options, enhancing accessibility for diverse socioeconomic groups. Overall, the study demonstrates that lifestyle-centric AI models can complement standard diagnostic pathways and contribute meaningfully to preventive care and early intervention strategies for T2DM.
Although the framework demonstrates strong predictive capability and applicability, several limitations remain. First, the dataset is derived from a specific regional population, which may limit generalizability across broader demographic, cultural, or dietary contexts. Second, the model relies on lifestyle parameters and does not incorporate clinical biomarkers, laboratory measurements, or genetic predispositions, which could enhance diagnostic precision. Third, the recommender system provides structured lifestyle and dietary suggestions but does not account for personalized factors such as food allergies, comorbidities, financial constraints, or preferences beyond cost efficiency. Additionally, the model does not learn from user feedback or longitudinal changes, and its performance depends on static inputs rather than dynamic monitoring.
Future work can expand the system into a more robust and adaptive decision support framework. Integration with wearable devices, such as glucometers, smartwatches, and fitness trackers, will enable real-time monitoring of physiological and behavioral parameters, supporting continuous risk assessment. Incorporating clinical biomarkers, genetic indicators, and medical history can improve predictive accuracy and support comprehensive T2DM management. The recommender system may be enhanced using reinforcement learning or context-aware engines that adapt to user feedback, preferences, and behavioral patterns over time. Scaling the framework to diverse geographical regions and cultural dietary practices will improve generalizability. Finally, adopting distributed health data architectures or federated learning approaches will allow secure analysis of large-scale heterogeneous datasets, advancing proactive and personalized T2DM prevention strategies.

Author Contributions

S.M.G.: Conceptualization, Data Curation, Methodology, Software, and Experimentation; S.M.G. and M.B.M.: Formal Analysis, Validation, Visualization, Figure Preparation, Writing—Original Draft Preparation, and Manuscript Editing and Revision; S.M.G.: Funding Acquisition. All authors edited the final version of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia, [Grant No: KFU254400].

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Karamanou, M. Milestones in the history of diabetes mellitus: The main contributors. World J. Diabetes 2016, 7, 1–7. [Google Scholar] [CrossRef]
  2. Wu, Y.; Ding, Y.; Tanaka, Y.; Zhang, W. Risk factors contributing to type 2 diabetes and recent advances in the treatment and prevention. Int. J. Med. Sci. 2014, 11, 1185–1200. [Google Scholar] [CrossRef]
  3. Martinsson, J.; Schliep, A.; Eliasson, B.; Mogren, O. Blood Glucose Prediction with Variance Estimation Using Recurrent Neural Networks. J. Healthc. Inform. Res. 2020, 4, 1–18. [Google Scholar] [CrossRef] [PubMed]
  4. Sneha, N.; Gangil, T. Analysis of diabetes mellitus for early prediction using optimal features selection. J. Big Data 2019, 6, 13. [Google Scholar] [CrossRef]
  5. Zhu, T.; Kuang, L.; Li, K.; Zeng, J.; Herrero, P.; Georgiou, P. Blood Glucose Prediction in Type 1 Diabetes Using Deep Learning on the Edge. In Proceedings of the 2021 IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Korea, 22–28 May 2021; pp. 1–5. [Google Scholar] [CrossRef]
  6. Anand, A.; Shakti, D. Prediction of diabetes based on personal lifestyle indicators. In Proceedings of the 2015 1st International Conference on Next Generation Computing Technologies (NGCT), Dehradun, India, 4–5 September 2015; pp. 673–676. [Google Scholar] [CrossRef]
  7. Kaur, P.; Sharma, M. Analysis of Data Mining and Soft Computing Techniques in Prospecting Diabetes Disorder in Human Beings: A Review. Int. J. Pharm. Sci. Res. 2018, 9, 2700. [Google Scholar] [CrossRef]
  8. “million. Available online: https://diabetesatlas.org/resources/idf-diabetes-atlas-2025/ (accessed on 5 November 2025).
  9. Oliver, S.J. The role of traditional medicine practice in primary health care within Aboriginal Australia: A review of the literature. J. Ethnobiol. Ethnomed. 2013, 9, 46. [Google Scholar] [CrossRef]
  10. Nigussie, S.; Godana, A.; Birhanu, A.; Abdeta, T.; Demeke, F.; Lami, M.; Gemechu, K.; Eyeberu, A.; Bogale, K.; Dechasa, D.B.; et al. Practice of Traditional Medicine and Associated Factors Among Residents in Eastern Ethiopia: A Community-Based Cross-Sectional Study. Front. Public Health 2022, 10, 915722. [Google Scholar] [CrossRef]
  11. Kaur, J.; Khehra, B.S.; Singh, A. Significance of Fuzzy Logic in the Medical Science. In Computer Vision and Robotics; Bansal, J.C., Engelbrecht, A., Shukla, P.K., Eds.; Springer: Singapore, 2022; pp. 497–509. [Google Scholar]
  12. Brahma, B.; Bhuyan, H.K. Soft Computing and Machine Learning Techniques for e-Health Data Analytics. In Connected e-Health: Integrated IoT and Cloud Computing; Mishra, S., González-Briones, A., Bhoi, A.K., Mallick, P.K., Corchado, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 83–104. [Google Scholar] [CrossRef]
  13. Gowthami, S.; Venkata Siva Reddy, R.; Ahmed, M.R. Personalized diabetes diagnosis using machine learning and electronic health records. Int. J. Electr. Comput. Eng. 2024, 14, 4791–4801. [Google Scholar] [CrossRef]
  14. Guelmami, N.; Fekih-Romdhane, F.; Mechraoui, O.; Bragazzi, N.L. Injury Prevention, Optimized Training and Rehabilitation: How Is AI Reshaping the Field of Sports Medicine. New Asian J. Med. 2023, 1, 30–34. [Google Scholar] [CrossRef]
  15. Galaviz, K.I.; Narayan, K.M.V.; Lobelo, F.; Weber, M.B. Lifestyle and the Prevention of Type 2 Diabetes: A Status Report. Am. J. Lifestyle Med. 2018, 12, 4–20. [Google Scholar] [CrossRef]
  16. Mahboob Alam, T.; Iqbal, M.A.; Ali, Y.; Wahab, A.; Ijaz, S.; Baig, T.I.; Hussain, A.; Malik, M.A.; Raza, M.M.; Ibrar, S.; et al. A model for early prediction of diabetes. Inform. Med. Unlocked 2019, 16, 100204. [Google Scholar] [CrossRef]
  17. Ganie, S.M.; Reddy, B.B.; H.K.; Rege, M. An investigation of ensemble learning techniques for obesity risk prediction using lifestyle data. Decis. Anal. J. 2025, 14, 100539. [Google Scholar] [CrossRef]
  18. Sadiq, I.Z.; Katsayal, B.S.; Ibrahim, B.; Ibrahim, M.; Hassan, H.A.; Ghali, U.M.; Usman, A.G.; Usman, A.; Abba, S.I. Data-driven diabetes mellitus prediction and management: A comparative evaluation of decision tree classifier and artificial neural network models along with statistical analysis. Sci. Rep. 2025, 15, 19339. [Google Scholar] [CrossRef] [PubMed]
  19. Ganie, S.M.; Pramanik, P.K.D.; Malik, M.B.; Nayyar, A.; Kwak, K.S. An Improved Ensemble Learning Approach for Heart Disease Prediction Using Boosting Algorithms. Comput. Syst. Sci. Eng. 2023, 46, 3993–4006. [Google Scholar] [CrossRef]
  20. Ganie, S.M.; Malik, M.B.; Arif, T. Early prediction of diabetes mellitus using various artificial intelligence techniques: A technological review. Int. J. Bus. Intell. Syst. Eng. 2022, 1, 325–346. [Google Scholar] [CrossRef]
  21. Bala Manoj Kumar, P.; Srinivasa Perumal, R.; Nadesh, R.K.; Arivuselvan, K. Type 2: Diabetes mellitus prediction using Deep Neural Networks classifier. Int. J. Cogn. Comput. Eng. 2020, 1, 55–61. [Google Scholar] [CrossRef]
  22. Rahman, M.; Islam, D.; Mukti, R.J.; Saha, I. A deep learning approach based on convolutional LSTM for detecting diabetes. Comput. Biol. Chem. 2020, 88, 107329. [Google Scholar] [CrossRef]
  23. Kannadasan, K.; Edla, D.R.; Kuppili, V. Type 2 diabetes data classification using stacked autoencoders in deep neural networks. Clin. Epidemiol. Glob. Health 2019, 7, 530–535. [Google Scholar] [CrossRef]
  24. Dahiwade, D.; Patle, G.; Meshram, E. Designing Disease Prediction Model Using Machine Learning Approach. In Proceedings of the 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), Erode, India; pp. 1211–1215. [CrossRef]
  25. Huang, L.; Lu, C. Intelligent Diagnosis of Diabetes based on Information Gain and Deep Neural Network. In Proceedings of the 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS), Nanjing, China; pp. 493–496. [CrossRef]
  26. Joshi, S.; Borse, M. Detection and Prediction of Diabetes Mellitus Using Back-Propagation Neural Network. In Proceedings of the 2016 International Conference on Micro-Electronics and Telecommunication Engineering (ICMETE), Ghaziabad, India; pp. 110–113. [CrossRef]
  27. Saji, S.A.; Balachandran, K. Performance analysis of training algorithms of multilayer perceptrons in diabetes prediction. In Proceedings of the 2015 International Conference on Advances in Computer Engineering and Applications, Ghaziabad, India; pp. 201–206. [CrossRef]
  28. Nguyen, C.T.; Pham, N.M.; Tran, D.V.; Lee, A.H.; Binns, C.W. Lifestyle and diet in relation to risk of type 2 diabetes in Vietnam: A hospital-based case–control study. Springerplus 2016, 5, 687. [Google Scholar] [CrossRef]
  29. Balducci, S.; Sacchetti, M.; Haxhi, J.; Orlando, G.; D’Errico, V.; Fallucca, S.; Menini, S.; Pugliese, G. Physical exercise as therapy for type 2 diabetes mellitus. Diabetes Metab. Res. Rev. 2014, 30, 13–23. [Google Scholar] [CrossRef]
  30. Alian, S.; Li, J.; Pandey, V. A Personalized Recommendation System to Support Diabetes Self-Management for American Indians. IEEE Access 2018, 6, 73041–73051. [Google Scholar] [CrossRef]
  31. Aklima, A.; Kritpracha, C.; Thaniwattananon, P. Dietary Behaviors Among Patients with Type 2 Diabetes Mellitus in Indonesia. Nurse Media 2013, 3, 499–509. [Google Scholar]
  32. Karami, A.; Dahl, A.A.; Turner-McGrievy, G.; Kharrazi, H.; Shaw, G. Characterizing diabetes, diet, exercise, and obesity comments on Twitter. Int. J. Inf. Manag. 2018, 38, 1–6. [Google Scholar] [CrossRef]
  33. Mohammed, H.A.; Hagras, H. Towards Developing Type 2 Fuzzy Logic Diet Recommendation System for Diabetes. In Proceedings of the 2018 10th Computer Science and Electronic Engineering (CEEC), Colchester, UK; pp. 56–59. [CrossRef]
  34. Sharawat, K.; Dubey, S.K. Diet Recommendation for Diabetic Patients Using MCDM Approach. In Intelligent Communication, Control and Devices; Singh, R., Choudhury, S., Gehlot, A., Eds.; Springer: Singapore, 2018; pp. 239–246. [Google Scholar]
  35. Komorowski, M.; Marshall, D.C.; Salciccioli, J.D.; Crutain, Y. Exploratory Data Analysis. In Secondary Analysis of Electronic Health Records; MIT Critical Data, Ed.; Springer International Publishing: Cham, Switzerland, 2016; pp. 185–203. [Google Scholar] [CrossRef]
  36. Gonzales Zelaya, C.V. Towards Explaining the Effects of Data Preprocessing on Machine Learning. In Proceedings of the 2019 IEEE 35th International Conference on Data Engineering (ICDE), Macao, China; pp. 2086–2090. [CrossRef]
  37. Agapito, G.; Simeoni, M.; Calabrese, B.; Caré, I.; Lamprinoudi, T.; Guzzi, P.H.; Pujia, A.; Fuiano, G.; Cannataro, M. DIETOS: A dietary recommender system for chronic diseases monitoring and management. Comput. Methods Programs Biomed. 2018, 153, 93–104. [Google Scholar] [CrossRef]
  38. Ganie, S.M.; Malik, M.B.; Arif, T. Performance analysis and prediction of type 2 diabetes mellitus based on lifestyle data using machine learning approaches. J. Diabetes Metab. Disord. 2022, 21, 339–352. [Google Scholar] [CrossRef]
  39. Nelson, K.M.; Reiber, G.; Boyko, E.J. Diet and Exercise Among Adults With Type 2 Diabetes: Findings from the Third National Health and Nutrition Examination Survey (NHANES III). Diabetes Care 2002, 25, 1722–1728. [Google Scholar] [CrossRef]
  40. Ganie, S.M.; Malik, M.B. An ensemble Machine Learning approach for predicting Type-II diabetes mellitus based on lifestyle indicators. Healthc. Anal. 2022, 2, 100092. [Google Scholar] [CrossRef]
  41. Wilcox, G. Insulin and Insulin Resistance. Clin. Biochem. Rev. 2005, 26, 19–39. [Google Scholar]
  42. Madhiarasan, M.; Louzazni, M. Analysis of Artificial Neural Network: Architecture, Types, and Forecasting Applications. J. Electr. Comput. Eng. 2022, 2022, 5416722. [Google Scholar] [CrossRef]
  43. Malik, M.B.; Asger, M.; Ali, R.; Sarvar, A. A model for privacy preserving in data mining using Soft Computing techniques. In In Proceedings of the 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 11–13 March 2015; pp. 181–186. [Google Scholar]
  44. Zulkifly, M.I.E.; Wahab, A.F.; Zakaria, R. B-Spline Curve Interpolation Model by using Intuitionistic Fuzzy Approach. IAENG Int. J. Appl. Math. 2020, 50, 1–7. [Google Scholar]
  45. Zadrozny, B.; Elkan, C. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In Proceedings of the Proceedings of the Eighteenth International Conference on Machine Learning, Williamstown, MA, USA, 28 June–1 July 2001. [Google Scholar]
  46. Zadrozny, B.; Elkan, C. Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002; pp. 694–699. [Google Scholar] [CrossRef]
  47. Singh, N.; Singh, P. Stacking-based multi-objective evolutionary ensemble framework for prediction of diabetes mellitus. Biocybern. Biomed. Eng. 2020, 40, 1–22. [Google Scholar] [CrossRef]
  48. Wang, X.; Wang, Y.; Zhang, S.; Yao, L.; Xu, S. Analysis and Prediction of Gestational Diabetes Mellitus by the Ensemble Learning Method. Int. J. Comput. Intell. Syst. 2022, 15, 72. [Google Scholar] [CrossRef]
  49. Liu, R.; Zhan, Y.; Liu, X.; Zhang, Y.; Gui, L.; Qu, Y.; Nan, H.; Jiang, Y. Stacking Ensemble Method for Gestational Diabetes Mellitus Prediction in Chinese Pregnant Women: A Prospective Cohort Study. J. Healthc. Eng. 2022, 2022, 8948082. [Google Scholar] [CrossRef]
  50. Yang, T.; Zhang, L.; Yi, L.; Feng, H.; Li, S.; Chen, H.; Zhu, J.; Zhao, J.; Zeng, Y.; Liu, H. Ensemble Learning Models Based on Noninvasive Features for Type 2 Diabetes Screening: Model Development and Validation. JMIR Med. Inform. 2020, 8, 15431. [Google Scholar] [CrossRef]
  51. Sivashankari, R.; Sudha, M.; Hasan, M.K.; Saeed, R.A.; Alsuhibany, S.A.; Abdel-Khalek, S. An Empirical Model to Predict the Diabetic Positive Using Stacked Ensemble Approach. Front. Public Health 2021, 9, 792124. [Google Scholar] [CrossRef]
  52. Jones, J.P.H.; Abdullah, M.M.H.; Wood, D.; Jones, P.J.H. Economic modeling for improved prediction of saving estimates in healthcare costs from consumption of healthy foods: The mediterranean-style diet case study. Food Nutr. Res. 2019, 63, 1–10. [Google Scholar] [CrossRef]
  53. Blumfield, M.; Starck, C.; Keighley, T.; Petocz, P.; Roesler, A.; Abbott, K.; Cassettari, T.; Marshall, S.; Fayet-Moore, F. Diet and economic modelling to improve the quality and affordability of the australian diet for low and medium socioeconomic households. Int. J. Environ. Res. Public Health 2021, 18, 5711. [Google Scholar] [CrossRef]
  54. Perignon, M.; Vieux, F.; Soler, L.G.; Masset, G.; Darmon, N. Improving diet sustainability through evolution of food choices: Review of epidemiological studies on the environmental impact of diets. Nutr. Rev. 2017, 75, 2–17. [Google Scholar] [CrossRef]
  55. Ibrahim, K.K.; Obaid, A.J. Web Mining Techniques and Technologies: A Landscape View. In Journal of Physics: Conference Series; IOP Publishing Ltd.: Bristol, UK, 2021. [Google Scholar] [CrossRef]
  56. Latha, K.; Raj Kumar, B. Personal diabetic diet recommendation system based on trustworthiness. Int. J. Appl. Eng. Res. 2014, 9, 4967–4972. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
