Application of Machine Learning Models for Early Detection and Accurate Classification of Type 2 Diabetes
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Models
3.1.1. Logistic Regression
3.1.2. Decision Tree
3.1.3. Bernoulli Naïve Bayes
3.1.4. Support Vector Machine
3.1.5. K-Nearest Neighbor
3.2. Development of the Case Study
3.2.1. Dataset Processing
3.2.2. Training and Evaluation of Models
4. Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References








| Column | Non-Null Count | Dtype |
|---|---|---|
| Number of pregnancies | 768 non-null | int64 |
| Glucose level (amount of sugar in the blood) | 768 non-null | int64 |
| Diastolic blood pressure | 768 non-null | int64 |
| Thickness of skin folds | 768 non-null | int64 |
| Insulin levels | 768 non-null | int64 |
| BMI | 768 non-null | float64 |
| Genetic history of diabetes | 768 non-null | float64 |
| Age | 768 non-null | int64 |
| Diabetes result (yes/no) | 768 non-null | int64 |
| dtypes: float64 (2), int64 (7) | | |

| Statistic | Pregnancies | Glucose | Blood Pressure | Skin Thickness | Insulin | BMI | Diabetes Pedigree Function | Age | Diabetes |
|---|---|---|---|---|---|---|---|---|---|
| count | 768 | 763 | 733 | 541 | 394 | 757 | 768 | 768 | 768 |
| mean | 3.85 | 121.69 | 72.41 | 29.15 | 155.55 | 32.46 | 0.472 | 33.24 | 0.35 |
| std | 3.37 | 30.54 | 12.38 | 10.48 | 118.78 | 6.92 | 0.333 | 11.76 | 0.48 |
| minimum | 0 | 44 | 24 | 7 | 14 | 18.2 | 0.081 | 21 | 0.00 |
| 25% | 1 | 99 | 64 | 22 | 76.25 | 27.5 | 0.254 | 24 | 0.00 |
| 50% | 3 | 117 | 72 | 29 | 125 | 32.3 | 0.367 | 29 | 0.00 |
| 75% | 6 | 141 | 80 | 36 | 190 | 36.6 | 0.653 | 41 | 1 |
| maximum | 17 | 199 | 122 | 99 | 846 | 67.1 | 2.24 | 81 | 1 |

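
The counts below 768 for Glucose, Blood Pressure, Skin Thickness, Insulin, and BMI, together with their non-zero minimums, suggest that zero readings in those columns were treated as missing before the statistics were computed. The source does not show that step, so the following Python sketch is only an illustration under that assumption; the toy records, the field names, and the `ZERO_MEANS_MISSING` set are hypothetical and are not the study's data or code.

```python
from statistics import mean

# Toy records for illustration only; NOT taken from the study's dataset.
records = [
    {"Pregnancies": 6, "Glucose": 148, "Insulin": 0, "BMI": 33.6},
    {"Pregnancies": 1, "Glucose": 85, "Insulin": 0, "BMI": 26.6},
    {"Pregnancies": 0, "Glucose": 0, "Insulin": 94, "BMI": 0.0},
    {"Pregnancies": 0, "Glucose": 89, "Insulin": 168, "BMI": 28.1},
]

# Assumption: a zero in these columns is a placeholder for "not measured".
# Zero pregnancies, by contrast, is a valid value and is kept.
ZERO_MEANS_MISSING = {"Glucose", "Insulin", "BMI"}


def column_summary(rows, column):
    """Return (count, mean) over the non-missing values of one column."""
    values = [row[column] for row in rows]
    if column in ZERO_MEANS_MISSING:
        values = [v for v in values if v != 0]
    return len(values), (mean(values) if values else None)


summary = {col: column_summary(records, col) for col in records[0]}

for col, (count, avg) in summary.items():
    print(f"{col}: count={count}, mean={avg}")
```

With this handling, a column's count drops by the number of zero placeholders it contained, which is one way a 768-row dataset can report fewer than 768 values per column.
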
| Model / Technique | F1-Score | Accuracy | Precision | Recall |
|---|---|---|---|---|
| K-NN | | | | |
| SMOTE | 0.667 | 0.721 | 0.573 | 0.796 |
| KNN | 0.612 | 0.753 | 0.682 | 0.556 |
| PCA | 0.471 | 0.649 | 0.500 | 0.444 |
| DT | | | | |
| DT | 0.602 | 0.708 | 0.576 | 0.630 |
| SMOTE | 0.590 | 0.721 | 0.608 | 0.574 |
| PCA | 0.423 | 0.610 | 0.440 | 0.407 |
| LR | | | | |
| SMOTE | 0.555 | 0.694 | 0.597 | 0.727 |
| LR | 0.513 | 0.698 | 0.612 | 0.672 |
| PCA | 0.485 | 0.594 | 0.530 | 0.648 |
| BNB | | | | |
| SMOTE | 0.677 | 0.692 | 0.479 | 0.772 |
| BNB | 0.387 | 0.461 | 0.528 | 0.662 |
| PCA | 0.529 | 0.539 | 0.582 | 0.597 |
| SVM | | | | |
| SMOTE | 0.56 | 0.701 | 0.588 | 0.717 |
| SVM | 0.53 | 0.670 | 0.618 | 0.689 |
| PCA | 0.461 | 0.628 | 0.539 | 0.462 |

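
F1-score is the harmonic mean of precision and recall, so any row of the table above can be cross-checked from its last two columns. The Python sketch below recomputes F1 for three rows copied from the table; the helper name `f1_score` and the tolerance value are illustrative choices, not part of the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from three rows of the table.
rows = {
    "K-NN (SMOTE)": (0.573, 0.796, 0.667),
    "K-NN": (0.682, 0.556, 0.612),
    "DT": (0.576, 0.630, 0.602),
}

# Precision and recall are themselves rounded to three decimals, so the
# recomputed F1 can differ slightly from the reported one.
TOLERANCE = 0.002

recomputed = {name: f1_score(p, r) for name, (p, r, _) in rows.items()}
consistent = {
    name: abs(recomputed[name] - reported) <= TOLERANCE
    for name, (_, _, reported) in rows.items()
}

for name, value in recomputed.items():
    print(f"{name}: recomputed F1 = {value:.3f}, consistent = {consistent[name]}")
```

The same function can be applied to any other row; a recomputed value outside the tolerance would be a reason to recheck how that row's columns were transcribed.
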
| Best Performing Models | F1-Score | Accuracy | Precision | Recall |
|---|---|---|---|---|
| K-NN (SMOTE) | 0.667 | 0.721 | 0.573 | 0.796 |
| BNB (SMOTE) | 0.677 | 0.692 | 0.479 | 0.772 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Iparraguirre-Villanueva, O.; Espinola-Linares, K.; Flores Castañeda, R.O.; Cabanillas-Carbonell, M. Application of Machine Learning Models for Early Detection and Accurate Classification of Type 2 Diabetes. Diagnostics 2023, 13, 2383. https://doi.org/10.3390/diagnostics13142383