Article

Using Machine Learning for the Risk Factors Classification of Glycemic Control in Type 2 Diabetes Mellitus

1
Department of Psychology, College of Humanities and Social Sciences, Kaohsiung Medical University, Kaohsiung 807378, Taiwan
2
The Lin’s Clinic, Kaohsiung 807057, Taiwan
3
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
4
Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung 807378, Taiwan
*
Author to whom correspondence should be addressed.
Healthcare 2023, 11(8), 1141; https://doi.org/10.3390/healthcare11081141
Submission received: 6 March 2023 / Revised: 5 April 2023 / Accepted: 13 April 2023 / Published: 15 April 2023
(This article belongs to the Section Artificial Intelligence in Medicine)

Abstract

Several risk factors are related to glycemic control in patients with type 2 diabetes mellitus (T2DM), including demographics, medical conditions, negative emotions, lipid profiles, and heart rate variability (HRV, an index of cardiac autonomic activity). The interactions between these risk factors remain unclear. This study aimed to use machine learning methods to explore the relationships between various risk factors and glycemic control in T2DM patients. The study utilized a database from Lin et al. (2022) that included 647 T2DM patients. Regression tree analysis was conducted to identify the interactions among risk factors that contribute to glycated hemoglobin (HbA1c) values, and various machine learning methods were compared for their accuracy in classifying T2DM patients. The results of the regression tree analysis revealed that high depression scores may be a risk factor in one subgroup but not in others. When comparing different machine learning classification methods, the random forest algorithm emerged as the best-performing method with a small set of features. Specifically, the random forest algorithm achieved 84% accuracy, 95% area under the curve (AUC), 77% sensitivity, and 91% specificity. Using machine learning methods can provide significant value in accurately classifying patients with T2DM when considering depression as a risk factor.

1. Introduction

The prevalence of diabetes mellitus (DM) was 10.5% among individuals aged 20–79 years in 2021, and it is expected to increase to 12.2% by 2045. Type 2 diabetes mellitus (T2DM) accounts for the largest portion of DM cases. Global diabetes-related health expenditures were about 966 billion USD in 2021 and are estimated to reach 1054 billion USD by 2045 [1]. Regarding the risk factors for T2DM, Ismail et al. [2] reviewed 106 studies and found that high serum uric acid levels, sleep quality and quantity, smoking, depression, cardiovascular disease, dyslipidemia, hypertension, aging, ethnicity, family history of diabetes, physical inactivity, and obesity were related to the development of T2DM. Haghighatpanah et al. [3] found that, among patients with T2DM and secondary medical complications, those who were female, younger than 65 years, obese (body mass index [BMI] ≥ 30), engaged in housework, had low high-density lipoprotein (HDL) levels, or were on certain types of medication were more likely to have poor glycemic control.
Poor glycemic control was related to sociodemographic factors (such as duration of diabetes, age of onset, family history, job status, and educational status), medical status (hypertension, lipid profiles, and fasting plasma glucose levels), lifestyle (dietary compliance, physical activity, self-monitoring of blood glucose, and drug compliance), and complications [4]. Research also found that high lipid levels (such as low-density lipoprotein [LDL]) and high lipid ratios (LDL/HDL ratio and triglycerides [TG]/HDL ratio) were predictive markers for poor glycemic control in T2DM [5]. Poor glycemic control is associated with increased hypoglycemia, cardiovascular disease, sudden death during a severe episode, and mortality in diabetes [6]. Therefore, defining the risk factors for poor glycemic control in T2DM is important for preventing poor prognosis and for medical management.
Studies utilizing machine learning methods to predict diagnostic outcomes for chronic diseases have become increasingly popular [7,8]. In the case of diabetes, numerous studies have compared the effectiveness of various machine learning methods [9,10,11,12,13,14,15]. Nusinovici et al. [16] conducted a study that examined four different chronic diseases (diabetes, hypertension, cardiovascular diseases, and chronic kidney disease) using machine learning techniques, such as random forest and neural networks, alongside standard logistic regression. Of these diseases, diabetes prediction is of significant interest. For instance, Dagliati et al. [17] identified logistic regression with machine learning as a useful tool for predicting factors related to different diabetes complications, such as retinopathy, neuropathy, and nephropathy, at different time points using longitudinal data. Many studies have used machine learning approaches to classify the likelihood of having diabetes from variables or attributes such as background information, BMI, and heart rate [9,10,11,12,13,14,15]. Research into diabetes has also led to the development of risk factor scores that enable patients to assess their own risk factors and self-care needs. For instance, Lindström and Tuomilehto [18] proposed a method of summing risk factor scores based on factors like physical activity and fruit and vegetable consumption. Bang et al. [19] developed a new score using several individual scores, including obesity and health habits. Recently, Yang et al. [20] went a step further by using big data and creating an online risk factor score calculation system for personalized health management.
While these studies have been effective in detecting type 2 diabetes mellitus in a non-invasive and cost-effective way, they have one limitation: emotional factors have not been incorporated into the risk score calculation. The popularity of machine learning methods can be seen in the increasing number of review papers published in recent years, including those by Kavakiotis et al. [21], Abhari et al. [22], and Olusanya et al. [23]. While previous studies have shown promising results in applying machine learning methods to diabetes research, negative emotions (such as depression and anxiety) have received less attention. Some studies have investigated the use of machine learning methods to predict anxiety or depression [24,25], and satisfactory prediction rates have been achieved. However, only a few studies have examined negative emotions with diabetes-related conditions [26,27,28], and research is scarce in this direction. Ducat et al. [29] reviewed the literature and identified anxiety, depression, and eating disorders as the three major health comorbidities for diabetes.
The current state of research suggests that negative emotions (such as anxiety and depression) have not been fully considered in machine learning applications for predicting diabetes, despite evidence suggesting a connection between these factors and diabetes [30,31,32]. The careful selection of the variables from relevant literature is critical for enhancing the model’s predictive power [33,34]. Therefore, further research is needed to deepen our understanding of these relationships and contribute to the field. To address this gap, the present study aims to explore the potential interactions of risk factors using regression tree analysis and to identify the best machine learning method for the classification of diabetes when negative emotions are involved. It is hoped that this study will contribute to the development of more accurate and effective prediction models for diabetes, considering the impact of negative emotions.

2. Materials and Methods

2.1. Participants

Participants were recruited from the Department of Internal Medicine at Kaohsiung Medical University Hospital and Kaohsiung Municipal Siaogang Hospital. A total of 647 patients aged over 20 years old and diagnosed with T2DM completed the study between 25 August 2020 and 30 May 2021. The original dataset was analyzed with traditional statistical methods to explore the association between cardiac autonomic activity and glycemic control in T2DM and was published by Lin et al. [35]. There were 361 males (56%) and 286 females (44%). Participants’ mean age was 62.64 years (SD = 10.32; range, 31 to 91 years). The mean HbA1c value was 7.25% (SD = 1.18%; range, 4.68% to 15.13%); the mean and SD of depression scores were 2.10 and 3.02, respectively, with 84.70% within the normal range and 15.30% higher than mild depression; and the mean and SD of anxiety scores were 1.38 and 2.46, respectively, with 90.88% within the normal range and 9.12% higher than mild anxiety.
This study used machine learning methods to identify the best prediction model for multiple risk factors in glycemic control. Patients with a pacemaker or arrhythmia were excluded because HRV indices cannot be analyzed correctly under these conditions. The study protocol was approved by the ethics committee of Kaohsiung Medical University Hospital (KMUHIRB-E(I)-20200194), and informed consent was obtained from each patient before the study.

2.2. Measurement

Demographic data, including age, gender, and BMI, were collected. All participants were measured with the Patient Health Questionnaire-9 (PHQ-9) [36] and Generalized Anxiety Disorder-7 (GAD-7) [37] for the symptoms of depression and anxiety. The PHQ-9 includes nine items with a four-point Likert scale ranging from 0 to 3 to measure depressive symptoms during the past two weeks. The internal consistency (Cronbach’s α) of the PHQ-9 was 0.86 to 0.89, and the test-retest reliability was 0.84 [38]. The GAD-7 includes seven items with a four-point Likert scale ranging from 0 to 3 to measure anxiety-related symptoms during the past two weeks. The internal consistency (Cronbach’s α) of the GAD-7 was 0.92 and the intraclass test-retest reliability was 0.83 [37].
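The scoring rule described above (nine or seven items, each rated 0 to 3, summed into a total) can be sketched as follows. This is a minimal illustration in Python, not the authors' scoring procedure; the function names and the example responses are hypothetical.

```python
from typing import Sequence


def total_score(items: Sequence[int], n_items: int) -> int:
    """Sum Likert items scored 0-3 after checking item count and range."""
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(items)}")
    if any(not 0 <= value <= 3 for value in items):
        raise ValueError("each item must be an integer from 0 to 3")
    return sum(items)


def phq9_total(items: Sequence[int]) -> int:
    """PHQ-9 total: nine items, possible range 0-27."""
    return total_score(items, 9)


def gad7_total(items: Sequence[int]) -> int:
    """GAD-7 total: seven items, possible range 0-21."""
    return total_score(items, 7)


# Hypothetical responses for one participant (illustration only).
example_phq9 = phq9_total([1, 0, 2, 0, 0, 1, 0, 0, 0])
example_gad7 = gad7_total([0, 1, 0, 0, 1, 0, 0])
```

Validating the item count and range before summing guards against silently scoring an incomplete questionnaire.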
Subsequently, the electrocardiography (ECG) signals were collected by using a lead II QOCA portable ECG monitoring device (Quanta Computer Inc., Taiwan), which was approved by the Ministry of Health and Welfare, Taiwan (Number 005428). A five-minute ECG was recorded for each patient at a seated resting baseline. The ECG device was connected to a Samsung Galaxy Tab A 10.1 SM-T515 (Samsung, Gyeonggi-do, Republic of Korea), which received the raw ECG signals that were then uploaded to the QOCA platform (Quanta Computer Inc., Taoyuan, Taiwan).
Blood test results, including the lipid profiles (HDL, LDL, and TG) and HbA1c, were obtained from the electronic medical records system within three months of the ECG measurement. All blood samples were taken after at least 12 h of fasting.

2.3. Data Reduction and Statistical Analysis

Researchers checked the ECG waveform and deleted arrhythmia and movement artifacts. The interbeat interval (IBI) data were downloaded from the QOCA platform through Python software (Quanta, Taiwan) and then imported to the CardioPro Infiniti HRV Analysis Module (Thought Technology Ltd., Montreal, Quebec, Canada), which transformed the IBI data into the time and frequency domains of HRV. The time domain of HRV included the standard deviation of normal-to-normal intervals (SDNN), root mean square of the successive differences (RMSSD), number of pairs of successive NNs that differ by more than 50 ms (NN50), and percentage of NN50 (pNN50) [39]. The frequency domain of HRV included very low frequency (VLF; 0.0033–0.04 Hz), low frequency (LF; 0.04–0.15 Hz, refers to sympathetic and parasympathetic nervous systems co-regulation or baroreceptor gain), high frequency (HF; 0.15–0.40 Hz, refers to the activity of the parasympathetic nervous system), total power (TP; 0.0033–0.4 Hz, refers to total HRV), and LF/HF ratio (commonly interpreted as the balance between sympathetic and parasympathetic activity) [39]. Due to the skewed HRV distributions, VLF, LF, HF, and TP of HRV were transformed using natural logarithms into lnVLF, lnLF, lnHF, and lnTP.
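The time-domain indices defined above can be computed directly from a cleaned series of normal-to-normal intervals. The following Python sketch illustrates the definitions only; it is not the CardioPro Infiniti implementation, and the choice of the sample standard deviation for SDNN and the example interval values are assumptions.

```python
import math
import statistics
from typing import Dict, List


def time_domain_hrv(nn_ms: List[float]) -> Dict[str, float]:
    """Return SDNN, RMSSD, NN50, and pNN50 for NN intervals given in ms."""
    if len(nn_ms) < 3:
        raise ValueError("need at least three NN intervals")
    # Successive differences between adjacent NN intervals.
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    sdnn = statistics.stdev(nn_ms)  # sample SD (assumption)
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    nn50 = sum(1 for d in diffs if abs(d) > 50)
    pnn50 = 100.0 * nn50 / len(diffs)
    return {"SDNN": sdnn, "RMSSD": rmssd, "NN50": nn50, "pNN50": pnn50}


def ln_power(power: float) -> float:
    """Natural-log transform used for skewed spectral power values."""
    return math.log(power)


# Hypothetical NN intervals in milliseconds (illustration only).
nn_intervals = [800, 810, 790, 870, 860, 800]
hrv = time_domain_hrv(nn_intervals)
```

In this toy series two of the five successive differences exceed 50 ms in absolute value, so NN50 is 2 and pNN50 is 40%.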
To investigate which combinations of risk factors could predict HbA1c values and to determine whether machine learning methods were suitable for accurately classifying poor and normal glycemic control groups, the study used the following statistical analyses. First, regression tree analysis was conducted, which has been used in previous studies to identify risk factors that predict health outcomes [40,41,42]. In this study, we used regression tree analysis to determine which combinations of risk factors can lead to high values of HbA1c. Second, while the regression tree shows how different factors interplay to result in high HbA1c values, it does not indicate how accurately these factors can classify patients' glycemic control status, nor which machine learning method is best for this classification task. To address this gap, we compared different machine learning classification approaches to determine the most useful method for identifying poor or normal glycemic control groups. We dichotomized the HbA1c outcome variable using the criterion of 6.5% [43] into two groups: the poor glycemic control group (HbA1c values ≥ 6.5%, n = 495) and the normal glycemic control group (HbA1c values < 6.5%, n = 152).
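To make the regression tree step concrete, the sketch below shows how a single split on one predictor can be chosen by minimizing the within-node sum of squared errors, and how a complexity parameter can be used to decide whether the split is worth keeping. It is a simplified Python illustration of the general technique, not the rpart implementation used in this study; the toy SDNN and HbA1c values are invented, and comparing the relative error reduction with 0.01 mirrors the cp idea only at the root node.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared errors around the mean of the values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: List[float], y: List[float]) -> Tuple[Optional[float], float]:
    """Find the threshold on x that most reduces the SSE of y.

    Returns (threshold, sse_reduction); rows with x < threshold go left.
    """
    pairs = sorted(zip(x, y))
    total = sse([v for _, v in pairs])
    best: Tuple[Optional[float], float] = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        gain = total - sse(left) - sse(right)
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best


# Invented toy data (illustration only, not study data).
sdnn = [8, 10, 12, 13, 16, 20, 25, 30]
hba1c = [9.0, 8.8, 9.2, 8.6, 7.0, 7.2, 6.9, 7.1]

threshold, gain = best_split(sdnn, hba1c)
# Keep the split only if it reduces the root-node error by at least cp.
cp = 0.01
keep_split = gain / sse(hba1c) >= cp
```

A full tree repeats this search over every predictor and recursively within each child node, which is how combinations of conditions such as those reported in the Results arise.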
To determine the best machine learning method for accurately classifying poor or normal glycemic control groups, we compared several classification techniques, including support vector machine (SVM), boosting, classification tree, neural network, K-nearest neighbors (KNN), and random forest. To avoid biased estimation due to unequal class sizes between groups [44], we balanced the class sizes using the groupdata2 package in R [45,46] to oversample the minority group (the normal glycemic control group), resulting in a final sample size of 990 subjects, with 495 in each group. We randomly split the data into 80% training data and 20% testing data and coded the outcome variable as 1 for the poor glycemic control group and 0 for the normal glycemic control group.
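The preparation steps described above (coding the outcome at the 6.5% cutoff, oversampling the minority class to equal size, and splitting 80/20) can be sketched as follows. This is a Python illustration under stated assumptions, not the groupdata2 or JASP workflow used in the study; the function names, the random seed, and the placeholder HbA1c values are hypothetical, and only the class counts (495 and 152) come from the text.

```python
import random
from typing import List, Tuple


def label_poor_control(hba1c: float, cutoff: float = 6.5) -> int:
    """Code 1 for poor glycemic control (HbA1c >= cutoff), else 0."""
    return 1 if hba1c >= cutoff else 0


def oversample_minority(rows: List[float], labels: List[int],
                        rng: random.Random) -> List[Tuple[float, int]]:
    """Randomly duplicate minority-class rows until classes are equal."""
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    minority = min(by_class, key=lambda k: len(by_class[k]))
    majority = 1 - minority
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    balanced = [(row, majority) for row in by_class[majority]]
    balanced += [(row, minority) for row in by_class[minority] + extra]
    return balanced


def train_test_split(data: list, test_fraction: float,
                     rng: random.Random) -> Tuple[list, list]:
    """Shuffle a copy of the data and split off the test fraction."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Placeholder HbA1c values that reproduce only the reported class counts.
hba1c_values = [7.5] * 495 + [6.0] * 152
labels = [label_poor_control(v) for v in hba1c_values]
balanced = oversample_minority(hba1c_values, labels, rng)
train, test = train_test_split(balanced, 0.20, rng)
```

The sketch follows the order given in the text, balancing first and splitting second. With that order, duplicated records from the oversampled group can land in both the training and testing partitions, which is worth keeping in mind when reading test-set metrics.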
To compare the models, we examined their sensitivity, specificity, area under the curve (AUC), and classification accuracy rate on the testing data. Sensitivity represents the proportion of positive cases that are correctly classified, while specificity represents the proportion of negative cases that are correctly classified. The AUC is the area under the receiver operating characteristic curve, which plots sensitivity against 1 − specificity. We used the R package rpart [47] and the machine learning functions in JASP [48] to perform the above analyses.
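The four evaluation metrics can be computed from true labels, predicted labels, and predicted scores as in the following Python sketch. It illustrates the definitions only and is not the JASP output; the example labels and scores are invented, and the AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn


def sensitivity(y_true: List[int], y_pred: List[int]) -> float:
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true: List[int], y_pred: List[int]) -> float:
    _, _, fp, tn = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + fp + tn)


def auc(y_true: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels and scores (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
```

Sensitivity, specificity, and accuracy depend on the 0.5 threshold used to turn scores into labels, whereas the AUC summarizes ranking quality across all thresholds.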

3. Results

3.1. Regression Tree

The regression tree analysis used the full sample of 647 patients. For the classification analyses, the balanced sample comprised the normal glycemic control group (n = 495; 50%) and the poor glycemic control group (n = 495; 50%). The regression tree analysis identified multiple risk factors that could lead to higher values of HbA1c. The optimal fit of the model was achieved using a complexity parameter (cp) of 0.01. The starting node indicated that the average HbA1c value of the full sample was 7.2%. The analysis suggested that patients with multiple risk factors had higher HbA1c values than those with only one risk factor. The key findings were as follows:
Group 1: Patients with SDNN < 14, BMI < 34, age < 54 years old, and lnVLF < 2.6 had the highest HbA1c values (11%).
Group 2: Patients with SDNN < 14 and BMI ≥ 34 had high HbA1c values (9.5%), although lower than those of the first group.
Group 3: Patients with the same conditions as the first group except for lnVLF ≥ 2.6 had lower HbA1c values (7.9%).
Group 4: Patients with SDNN < 14, BMI < 34, age ≥ 54 years old, and LDL ≥ 107 had higher HbA1c values (8.6%).
Group 5: Patients with multiple risk factors, including SDNN < 14, 25 ≤ BMI < 34, age ≥ 54 years old, LDL < 107, and HDL < 37 had higher HbA1c values (8.3%).
Group 6: Patients with SDNN ≥ 14, TG ≥ 72, LDL ≥ 145 had high HbA1c values (8.1%).
Group 7: Patients with SDNN ≥ 14, TG ≥ 72, LDL < 145, PHQ-9 score ≥ 3, and BMI ≥ 23 had HbA1c values of 7.5%.
The analysis suggested that multiple risk factors interacted with each other to result in higher HbA1c values. The results showed that one risk factor alone did not necessarily lead to higher HbA1c values. Figure 1 displays the regression tree results.
Note: BMI, body mass index; HDL, high-density lipoprotein; LDL, low-density lipoprotein; lnVLF, natural logarithms of very low frequency; PHQ-9, Patient Health Questionnaire-9; SDNN, the standard deviation of normal-to-normal intervals; TG, triglycerides. In each node, the number denotes the predictive mean HbA1c value of individuals within that group, while the percentage represents the proportion of individuals in that node relative to the total sample size.
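The seven subgroups listed above can be written as explicit rules, which makes it easier to see that they are mutually exclusive and that depression (PHQ-9) enters only in Group 7. The Python sketch below encodes only the conditions and mean HbA1c values reported in the text; it is not the fitted rpart model, the function and argument names are hypothetical, and patients who fall outside the listed groups return None because the remaining leaves of the tree are not described here.

```python
from typing import Optional

# Mean HbA1c (%) reported in the text for each listed subgroup.
REPORTED_MEAN_HBA1C = {1: 11.0, 2: 9.5, 3: 7.9, 4: 8.6, 5: 8.3, 6: 8.1, 7: 7.5}


def reported_subgroup(sdnn: float, bmi: float, age: float, ln_vlf: float,
                      ldl: float, hdl: float, tg: float,
                      phq9: int) -> Optional[int]:
    """Return the listed subgroup (1-7) a patient matches, or None."""
    if sdnn < 14:
        if bmi >= 34:
            return 2
        # From here on BMI < 34.
        if age < 54:
            return 1 if ln_vlf < 2.6 else 3
        # From here on age >= 54.
        if ldl >= 107:
            return 4
        if bmi >= 25 and hdl < 37:
            return 5
        return None
    # From here on SDNN >= 14.
    if tg >= 72:
        if ldl >= 145:
            return 6
        if phq9 >= 3 and bmi >= 23:
            return 7
    return None


# Hypothetical patient (illustration only).
group = reported_subgroup(sdnn=20, bmi=26, age=60, ln_vlf=3.0,
                          ldl=120, hdl=45, tg=150, phq9=5)
```

For the hypothetical patient above, SDNN is at least 14, TG is at least 72, LDL is below 145, the PHQ-9 score is at least 3, and BMI is at least 23, so the function returns Group 7.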

3.2. Comparisons of AI Machine Learning Classification Methods

Table 1 summarizes the comparison of various machine learning classification methods, including SVM, boosting, classification tree, neural network, K-nearest neighbors (KNN), and random forest, in terms of sensitivity, specificity, area under the curve (AUC), and accuracy rate on the testing data. Our analysis showed that random forest had the highest sensitivity (77%) and specificity (91%), resulting in an accuracy rate of 84% and the largest AUC (95%) among all the machine learning methods (Table 1).
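The reported random forest figures are mutually consistent: accuracy is the class-weighted average of sensitivity and specificity. The short Python check below assumes the testing data were approximately balanced (half poor control, half normal control), which is plausible for a random split of the balanced sample but is not stated in the text; the function name is hypothetical.

```python
def accuracy_from_rates(sensitivity: float, specificity: float,
                        positive_fraction: float) -> float:
    """Accuracy as the class-weighted mean of sensitivity and specificity."""
    return (positive_fraction * sensitivity
            + (1.0 - positive_fraction) * specificity)


# Reported random forest sensitivity and specificity; the 0.5 positive
# fraction is an assumption about the composition of the testing data.
rf_accuracy = accuracy_from_rates(0.77, 0.91, 0.5)
```

Under that assumption the result is 0.84, matching the reported accuracy rate of 84%.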

4. Discussion

A gap in the previous literature was the limited inclusion of depression and anxiety as predictors of diabetes. In this study, we found evidence suggesting that depression can be an important factor in certain subgroups of T2DM. Although the causality between depression and diabetes is not clear, previous studies have reported a high comorbidity rate between these two diseases [49,50]. While some studies have examined depression, they either used depression as the outcome in a group of T2DM patients or as a predictor of other co-occurring diseases in patients with T2DM. The current study has included depression and anxiety to examine their relations with diabetes directly. Using regression tree analysis, this study identified three pathways of multiple risk factors associated with poor glycemic control in T2DM patients who have low parasympathetic activation and whose age is younger than 56 years old or in patients who have low parasympathetic activation (HF), whose age is higher than 56 years old with high LDL, and whose LF is higher or lower than 1.2.
The results of the regression tree in this study provide valuable information on how multiple factors interact to create subgroups within diabetes patients and can be informative for developing prevention strategies for T2DM. Traditional regression analysis is useful in identifying risk factors [51] but does not provide information on how these factors interact. In contrast, the regression tree analysis used in this study revealed different sets of conditions that could all lead to high HbA1c values. This approach is particularly useful in situations where people may not have certain critical risk factors but still have high HbA1c values. While some recent studies have applied decision tree analysis to understand subgroups in diabetes-related situations [25,26], the focus of these studies was not on how risk factors interact to create subgroups for HbA1c values. The current study aims to fill this gap in the literature by providing information about subgroups that can be informative for clinicians. The use of regression tree analysis is a valuable methodological contribution, as it provides a more nuanced understanding of the complex relationships between multiple risk factors and poor glycemic control in T2DM patients.
The second analysis aimed to provide further insight into the comparison of different machine learning methods for predicting diabetes, as the results from previous studies were diverse [21,22]. Our classification analysis showed that random forest had the highest classification accuracy (84%) among the machine learning methods compared. This finding is consistent with previous studies that also found random forest to be the best model [9,10,13,15], while others did not [11,12,14]. In our study, we included variables such as depression and anxiety that were not analyzed in previous studies. This could be one of the reasons why SVM was not the best-performing method in our analysis, although some studies have suggested that it performs best [21]. However, a review by Abhari et al. [22] pointed out that the performance of different machine learning methods can vary depending on the variables included, the methods used, and the types of outcomes analyzed. Therefore, our findings should be interpreted in the context of our study design and variables. In addition, the study by Raghavendra and Santosh [52] suggests another possible explanation for our finding that random forest outperformed SVM. They found that random forest performs better than other machine learning methods when studies have fewer variables, which may be the case in our study. Therefore, the small number of features in our dataset may have contributed to the superior performance of random forest over SVM.
This study has several limitations that should be considered when interpreting the results. First, the sample size of 647 patients with T2DM may be considered small for machine learning analysis. Although we used oversampling to increase the size of the normal glycemic control group and obtain equal class sizes, the total sample size was still only 990. Previous studies have shown that machine learning can still be potentially biased with a sample size of fewer than 1000. Therefore, future studies should aim to increase the sample size to confirm the risk factors for poor glycemic control identified in the regression tree model. Second, the data on blood glucose and lipid profiles were collected only once, which may not fully represent long-term glycemic control in patients with T2DM. Collecting blood samples multiple times over a long period and exploring possible risk factors for poor glycemic control may provide more accurate results. Third, while we found that depression is a risk factor for a subgroup of T2DM patients, the causality between depression and diabetes is still not fully clear. Despite these limitations, our comparison of machine learning methods suggested that random forest is a more accurate method for analyzing small sets of features and is particularly useful when depression is included as a risk factor. Further research can extend these findings by using cross-lagged models with longitudinal data to better understand the relationship between depression and diabetes.

5. Conclusions

In conclusion, our study identified key risk factors and pathways for poor glycemic control in patients with T2DM using the regression tree algorithm. The random forest approach found that multiple risk factors are important in screening, monitoring, diagnosis, and prevention of poor glycemic control, such as considering demographic data, physiology dimensions (BMI, lipid profiles, and HRV indices), and emotional dimensions (depression and anxiety). The multiple risk factors can also be considered as a framework for designing potential intervention programs in future studies.

Author Contributions

All of the authors participated in the study. Conceptualization, Y.-L.C. and I.-M.L.; methodology, investigation, and formal analysis, Y.-L.C., Y.-R.W., I.-M.L. and C.-H.R.L.; resources and acquisition of the data, K.-D.L. and Y.-R.W.; interpretation of data, K.-D.L., Y.-R.W. and I.-M.L.; supervision and project administration, I.-M.L.; funding acquisition, I.-M.L., Y.-L.C. and C.-H.R.L. All authors have read and agreed to the published version of the manuscript.

Funding

I.-M.L. and C.-H.R.L. received a research grant from NSYSU-KMU JOINT RESEARCH PROJECT (grant number NSYSUKMU111KN-002), Kaohsiung, Taiwan (grant number KN-111KN002). Y.-L.C. received a research grant from the Ministry of Science and Technology, Taiwan (MOST-111-2410-H-037-004-MY2). The funding sources had no role in the design and conduct of the study, preparation, review, or approval of the manuscript.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Kaohsiung Medical University Hospital, Taiwan (KMUHIRB-E(I)-20200194).

Informed Consent Statement

Written informed consent was obtained from all participants involved in the study.

Data Availability Statement

Data are available via correspondence upon request.

Acknowledgments

We thank Wei-Hao Hsu at the Department of Internal Medicine, Kaohsiung Municipal Siaogang Hospital, Taiwan for referring patients to participate in our study. We would like to thank the study participants for participating in this study, and research assistants Cheng-Hsuan Tsai, Yi-Chi Huang, Wei-Shan Chang, Ching-Cheng Lin, Yong-Chuan Chung, Yi-Tsen Ko, and Yen-Ling Ting for data collection.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Sun, H.; Saeedi, P.; Karuranga, S.; Pinkepank, M.; Ogurtsova, K.; Duncan, B.B.; Stein, C.; Basit, A.; Chan, J.C.N.; Mbanya, J.C.; et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res. Clin. Pract. 2022, 183, 109119.
  2. Ismail, L.; Materwala, H.; Al Kaabi, J. Association of risk factors with type 2 diabetes: A systematic review. Comput. Struct. Biotechnol. J. 2021, 19, 1759–1785.
  3. Haghighatpanah, M.; Nejad, A.S.M.; Haghighatpanah, M.; Thunga, G.; Mallayasamy, S. Factors that correlate with poor glycemic control in type 2 diabetes mellitus patients with complications. Osong Public Health Res. Perspect. 2018, 9, 167.
  4. Kayar, Y.; Ilhan, A.; Kayar, N.B.; Unver, N.; Coban, G.; Ekinci, I.; Eroglu, H. Relationship between the poor glycemic control and risk factors, life style and complications. Biomed. Res. 2017, 28, 1581–1586.
  5. Artha, I.M.J.R.; Bhargah, A.; Dharmawan, N.K.; Pande, U.W.; Triyana, K.A.; Mahariski, P.A.; Yuwono, J.; Bhargah, V.; Prabawa, I.P.Y.; Manuaba, I.B.A.P.; et al. High level of individual lipid profile and lipid ratio as a predictive marker of poor glycemic control in type-2 diabetes mellitus. Vasc. Health Risk Manag. 2019, 15, 149.
  6. Amiel, S.A.; Aschner, P.; Childs, B.; Cryer, P.E.; de Galan, B.E.; Frier, B.M.; Gonder-Frederick, L.; Heller, R.S.; Jones, T.; Khunti, K.; et al. Hypoglycaemia, cardiovascular disease, and mortality in diabetes: Epidemiology, pathogenesis, and management. Lancet Diabetes Endocrinol. 2019, 7, 385–396.
  7. Battineni, G.; Sagaro, G.G.; Chinatalapudi, N.; Amenta, F. Applications of machine learning predictive models in the chronic disease diagnosis. J. Pers. Med. 2020, 10, 21.
  8. Mishra, S.; Mallick, P.K.; Tripathy, H.K.; Bhoi, A.K.; González-Briones, A. Performance evaluation of a proposed machine learning model for chronic disease datasets using an integrated attribute evaluator and an improved decision tree classifier. Appl. Sci. 2020, 10, 8137.
  9. Daghistani, T.; Alshammari, R. Comparison of statistical logistic regression and random forest machine learning techniques in predicting diabetes. J. Inf. Technol. 2020, 11, 78–83.
  10. Dritsas, E.; Trigka, M. Data-driven machine-learning methods for diabetes risk prediction. Sensors 2022, 22, 5304.
  11. Kopitar, L.; Kocbek, P.; Cilar, L.; Sheikh, A.; Stiglic, G. Early detection of type 2 diabetes mellitus using machine learning-based prediction models. Sci. Rep. 2020, 10, 11981.
  12. Lai, H.; Huang, H.; Keshavjee, K.; Guergachi, A.; Gao, X. Predictive models for diabetes mellitus using machine learning techniques. BMC Endocr. Disord. 2019, 19, 101.
  13. Laila, U.E.; Mahboob, K.; Khan, A.W.; Khan, F.; Taekeun, W. An ensemble approach to predict early-stage diabetes risk using machine learning: An empirical study. Sensors 2022, 22, 5247.
  14. Zhang, L.; Wang, Y.; Niu, M.; Wang, C.; Wang, Z. Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: The Henan Rural Cohort Study. Sci. Rep. 2020, 10, 4406.
  15. Zou, Q.; Qu, K.; Luo, Y.; Yin, D.; Ju, Y.; Tang, H. Predicting diabetes mellitus with machine learning techniques. Front. Genet. 2018, 9, 515.
  16. Nusinovici, S.; Tham, Y.C.; Yan, M.Y.C.; Ting, D.S.W.; Li, J.; Sabanayagam, C.; Wong, Y.T.; Cheng, C.Y. Logistic regression was as good as machine learning for predicting major chronic diseases. J. Clin. Epidemiol. 2020, 122, 56–69.
  17. Dagliati, A.; Marini, S.; Sacchi, L.; Cogni, G.; Teliti, M.; Tibollo, V.; De Cata, P.; Chiovato, L.; Bellazzi, R. Machine learning methods to predict diabetes complications. J. Diabetes Sci. Technol. 2018, 12, 295–302.
  18. Lindström, J.; Tuomilehto, J. The diabetes risk score: A practical tool to predict type 2 diabetes risk. Diabetes Care 2003, 26, 725–731.
  19. Bang, H.; Edwards, A.M.; Bomback, A.S.; Ballantyne, C.M.; Brillon, D.; Callahan, M.A.; Kern, L.M. A patient self-assessment diabetes screening score: Development, validation, and comparison to other diabetes risk assessment scores. Ann. Intern. Med. 2009, 151, 775.
  20. Yang, H.; Luo, Y.; Ren, X.; Wu, M.; He, X.; Peng, B.; Deng, K.; Yan, D.; Tang, H.; Lin, H. Risk prediction of diabetes: Big data mining with fusion of multifarious physical examination indicators. Inf. Fusion 2021, 75, 140–149.
  21. Kavakiotis, I.; Tsave, O.; Salifoglou, A.; Maglaveras, N.; Vlahavas, I.; Chouvarda, I. Machine learning and data mining methods in diabetes research. Comput. Struct. Biotechnol. J. 2017, 15, 104–116.
  22. Abhari, S.; Kalhori, S.R.N.; Ebrahimi, M.; Hasannejadasl, H.; Garavand, A. Artificial intelligence applications in type 2 diabetes mellitus care: Focus on machine learning methods. Healthc. Inform. Res. 2019, 25, 248–261.
  23. Olusanya, M.O.; Ogunsakin, R.E.; Ghai, M.; Adeleke, M.A. Accuracy of machine learning classification models for the prediction of type 2 diabetes mellitus: A systematic survey and meta-analysis approach. Int. J. Environ. Res. Public Health 2022, 19, 14280.
  24. Kumar, P.; Garg, S.; Garg, A. Assessment of anxiety, depression and stress using machine learning models. Procedia Comput. Sci. 2020, 171, 1989–1998.
  25. Nemesure, M.D.; Heinz, M.V.; Huang, R.; Jacobson, N.C. Predictive modeling of depression and anxiety using electronic health records and a novel machine learning approach with artificial intelligence. Sci. Rep. 2021, 11, 1980.
  26. Chu, H.; Chen, L.; Yang, X.; Qiu, X.; Qiao, Z.; Song, X.; Zhao, E.; Zhou, J.; Zhang, W.; Mehmood, A.; et al. Roles of anxiety and depression in predicting cardiovascular disease among patients with type 2 diabetes mellitus: A machine learning approach. Front. Psychol. 2021, 12, 645418.
  27. Khalil, R.M.; Al-Jumaily, A. Machine learning based prediction of depression among type 2 diabetic patients. In Proceedings of the 2017 12th International Conference on Intelligent Systems and Knowledge Engineering, Nanjing, China, 24–26 November 2017.
  28. Rees, G.; Xie, J.; Fenwick, E.K.; Sturrock, B.A.; Finger, R.; Rogers, S.L.; Lim, L.; Lamoureux, E.L. Association between diabetes-related eye complications and symptoms of anxiety and depression. JAMA Ophthalmol. 2016, 134, 1007–1014.
  29. Ducat, L.; Philipson, L.H.; Anderson, B.J. The mental health comorbidities of diabetes. JAMA 2014, 312, 691–692.
  30. Grigsby, A.B.; Anderson, R.J.; Freedland, K.E.; Clouse, R.E.; Lustman, P.J. Prevalence of anxiety in adults with diabetes: A systematic review. J. Psychosom. Res. 2002, 53, 1053–1060.
  31. Nouwen, A.; Adriaanse, M.C.; van Dam, K.; Iversen, M.M.; Viechtbauer, W.; Peyrot, M.; Caramlau, I.; Kokoszka, A.; Kanc, K.; de Groot, M.; et al. Longitudinal associations between depression and diabetes complications: A systematic review and meta-analysis. Diabet. Med. 2019, 36, 1562–1572.
  32. Smith, K.J.; Deschênes, S.S.; Schmitz, N. Investigating the longitudinal association between diabetes and anxiety: A systematic review and meta-analysis. Diabet. Med. 2018, 35, 677–693.
  33. Bagherzadeh-Khiabani, F.; Ramezankhani, A.; Azizi, F.; Hadaegh, F.; Steyerberg, E.W.; Khalili, D. A tutorial on variable selection for clinical prediction models: Feature selection methods in data mining could improve the results. J. Clin. Epidemiol. 2016, 71, 76–85. [Google Scholar] [CrossRef] [PubMed]
  34. Chowdhury, M.Z.I.; Turin, T.C. Variable selection strategies and its importance in clinical prediction modeling. Fam. Med. Community Health 2020, 8, 4. [Google Scholar] [CrossRef] [Green Version]
  35. Lin, K.D.; Chang, L.H.; Wu, Y.R.; Hsu, W.H.; Kuo, C.H.; Tsai, J.R.; Yu, M.L.; Su, W.S.; Lin, I.M. Association of depression and parasympathetic activation with glycemic control in type 2 diabetes mellitus. J. Diabetes Complicat. 2022, 36, 108264. [Google Scholar] [CrossRef] [PubMed]
  36. Spitzer, R.L.; Williams, J.B.; Kroenke, K.; Hornyak, R.; McMurray, J. Validity and utility of the PRIME-MD Patient Health Questionnaire in assessment of 3000 obstetric-gynecologic patients: The PRIME-MD Patient Health Questionnaire Obstetrics-Gynecology Study. Am. J. Obstet. Gynecol. 2000, 183, 759–769. [Google Scholar] [CrossRef]
  37. Spitzer, R.L.; Kroenke, K.; Williams, J.B.; Löwe, B. A brief measure for assessing generalized anxiety disorder: The GAD-7. Arch. Intern. Med. 2006, 166, 1092–1097. [Google Scholar] [CrossRef] [Green Version]
  38. Kroenke, K.; Spitzer, R.L.; Williams, J.B. The PHQ-9: Validity of a brief depression severity measure. J. Gen. Intern. Med. 2001, 16, 606–613. [Google Scholar] [CrossRef]
  39. Shaffer, F.; Ginsberg, J.P. An overview of heart rate variability metrics and norms. Front. Public Health 2017, 5, 258. [Google Scholar] [CrossRef] [Green Version]
  40. King, M.W.; Resick, P.A. Data mining in psychological treatment research: A primer on classification and regression trees. J. Consult. Clin. Psychol. 2014, 82, 895. [Google Scholar] [CrossRef]
  41. Lemon, S.C.; Roy, J.; Clark, M.A.; Friedman, P.D.; Rakowski, W. Classification and regression tree analysis in public health: Methodological review and comparison with logistic regression. Ann. Behav. Med. 2003, 26, 172–181. [Google Scholar] [CrossRef]
  42. Richardson, B.; Fuller-Tyszkiewicz, M.; O’Donnell, R.; Ling, M.; Staiger, P.K. Regression tree analysis of ecological momentary assessment data. Health Psychol. Rev. 2017, 11, 235–241. [Google Scholar] [CrossRef] [PubMed]
  43. Wu, I.C.; Hsu, C.C.; Chen, C.Y.; Chuang, S.C.; Cheng, C.W.; Hsieh, W.S.; Wu, M.S.; Liu, Y.T.; Liu, Y.H.; Tsai, T.L.; et al. Paradoxical relationship between glycated hemoglobin and longitudinal change in physical functioning in older adults: A prospective cohort study. J. Gerontol. A Biol. Sci. 2019, 74, 949–956. [Google Scholar] [CrossRef] [PubMed]
  44. Japkowicz, N.; Stephen, S. The class imbalance problem: A systematic study. Intell. Data Anal. 2002, 6, 429–449. [Google Scholar] [CrossRef]
  45. Olsen, L.R. Groupdata2: Creating Groups from Data. 2019. Available online: https://cran.r-project.org/package=groupdata2 (accessed on 20 October 2022).
  46. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2023; Available online: http://www.R-project.org/ (accessed on 20 October 2022).
  47. Therneau, T.; Atkinson, B.; Ripley, B.; Ripley, M.B. Package ‘Rpart’. Available online: https://cran.r-project.org/web/packages/rpart/rpart.pdf (accessed on 20 April 2016).
  48. JASP Team. JASP [Computer Software], Version 0.17.1; JASP Team: Washington, DC, USA, 2023.
  49. Darwish, L.; Beroncal, E.; Sison, M.V.; Swardfager, W. Depression in people with type 2 diabetes: Current perspectives. Diabetes Metab. Syndr. Obes. Targets Ther. 2018, 11, 333–343. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  50. Khaledi, M.; Haghighatdoost, F.; Feizi, A.; Aminorroaya, A. The prevalence of comorbid depression in patients with type 2 diabetes: An updated systematic review and meta-analysis on huge number of observational studies. Acta Diabetol. 2019, 56, 631–650. [Google Scholar] [CrossRef]
  51. Rothenbacher, D.; Rüter, G.; Saam, S.; Brenner, H. Younger patients with type 2 diabetes need better glycaemic control: Results of a community-based study describing factors associated with a high HbA1c value. Br. J. Gen. Pract. 2003, 53, 389–391. Available online: http://www.ncbi.nlm.nih.gov/pmc/articles/pmc1314599/ (accessed on 20 October 2022).
  52. Raghavendra, S.; Santosh, K.J. Performance evaluation of random forest with feature selection methods in prediction of diabetes. Int. J. Electr. Comput. Eng. 2020, 10, 353–359. [Google Scholar] [CrossRef]
Figure 1. Regression tree for type II diabetes (N = 647).
Table 1. Comparison of classification methods of AI machine learning.
Testing data
Model                    Sensitivity   Specificity   Area under curve   Accuracy rate
Boosting                 51%           63%           63%                57%
Support vector machine   52%           65%           58%                58%
Classification tree      59%           80%           69%                69%
Neural network           69%           80%           60%                74%
K-nearest neighbors      63%           93%           78%                78%
Random forest            77%           91%           95%                84%
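
For readers less familiar with the metrics in Table 1, the following is a minimal Python sketch of how sensitivity, specificity, and accuracy are conventionally derived from the four counts of a binary confusion matrix. The function name and the example counts are hypothetical and chosen only for illustration; they are not taken from the study's data or code. The area under the curve is not covered here because it needs predicted scores across thresholds rather than a single set of counts.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard metrics from binary confusion-matrix counts.

    tp: true positives, fn: false negatives,
    tn: true negatives, fp: false positives.
    """
    sensitivity = tp / (tp + fn)                 # share of actual positives detected
    specificity = tn / (tn + fp)                 # share of actual negatives detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all cases classified correctly
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only (not from the study).
example = classification_metrics(tp=40, fn=10, tn=45, fp=5)
for name, value in example.items():
    print(f"{name}: {value:.0%}")
```

Accuracy is a weighted average of sensitivity and specificity, with weights given by the proportions of positive and negative cases, so it always lies between the two. Each row of Table 1 is consistent with this.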
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Cheng, Y.-L.; Wu, Y.-R.; Lin, K.-D.; Lin, C.-H.R.; Lin, I.-M. Using Machine Learning for the Risk Factors Classification of Glycemic Control in Type 2 Diabetes Mellitus. Healthcare 2023, 11, 1141. https://doi.org/10.3390/healthcare11081141
