CTGAN-Augmented Ensemble Learning Models for Classifying Dementia and Heart Failure
Abstract
1. Introduction
2. Literature Review
2.1. Dementia and Heart Failure Association
2.2. Predictive Model for Dementia Study
2.3. Predictive Model for Heart Failure Study
2.4. Dementia Feature Study
2.5. Classification Model for Dementia and Heart Failure
2.6. Data Synthetic Generation
2.7. The Proposed Study
3. Research Methodology
3.1. Data Collection
3.2. Data Preprocessing
3.2.1. Handling Disease Codes
3.2.2. Feature Encoding
3.2.3. Missing Value Imputation
3.3. Feature Engineering
3.3.1. Dataset Segregation
3.3.2. Feature Scaling
3.3.3. Skewed Data Normalization
3.3.4. Dataset Integration
3.4. Model Construction
3.4.1. Dataset Splitting for Training and Testing
3.4.2. Synthetic Data Generation Using CTGAN
3.4.3. Model Training and Testing
3.5. Model Evaluation
3.6. Feature Evaluation
3.6.1. Pearson Correlation Coefficient
3.6.2. Shapley Value
4. Experimental Results and Discussion
4.1. Pearson Correlation Coefficient of Original Dataset
4.2. Cross-Validation Evaluation
4.3. Performance Evaluation Metrics for Classification Models
4.3.1. Accuracy Evaluation
4.3.2. Precision Evaluation
4.3.3. Recall Evaluation
4.3.4. F1-Score Evaluation
4.3.5. AUC ROC Score Evaluation
4.4. Model Performance Discussion
4.5. Feature Contribution
4.5.1. Feature Importance Using XGBoost
4.5.2. Impact of Features on Model Classification by XGBoost
4.6. Study Scope and Limitations
4.7. Summary of the Findings
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CTGANs | Conditional Tabular Generative Adversarial Networks |
| HFSA | Heart Failure Society of America |
| MRI | Magnetic Resonance Imaging |
| GAN | Generative Adversarial Network |
| XGBoost | Extreme Gradient Boosting |
| LightGBM | Light Gradient Boosting Machine |
| GB | Gradient Boosting |
| RF | Random Forest |
| ET | Extra Trees |
| ML | Machine Learning |
| EHRs | Electronic Health Records |
| BUN | Blood Urea Nitrogen |
| DTs | Decision Trees |
| SMOTE | Synthetic Minority Oversampling Technique |
| AUC ROC | Area Under the Receiver-Operating Characteristic Curve |
| ICD-10 | International Classification of Diseases, Tenth Revision |
| HDL | High-Density Lipoprotein |
| SBP | Systolic Blood Pressure |
| DBP | Diastolic Blood Pressure |
| FBS | Fasting Blood Sugar |
| LDL | Low-Density Lipoprotein |
| HB | Hemoglobin |
| K | Potassium |
| NA | Sodium |
| WBC | White Blood Cells |
| SDX | Secondary Diagnosis |
| TP | True Positive |
| TN | True Negative |
| FP | False Positive |
| FN | False Negative |
| SHAP | SHapley Additive exPlanations |
Appendix A. Point-Biserial Correlation with Target


| Features | Dementia (n = 3372) | Heart Failure (n = 7752) | p-Value |
|---|---|---|---|
| Age | 74.86 ± 8.44 | 73.45 ± 8.66 | 1.826 × 10⁻¹⁵ |
| Biological Characteristics | Male = 1551; Female = 1821 | Male = 3307; Female = 4445 | 1.195 × 10⁻³ |
| Weight | 52.52 ± 20.43 | 58.39 ± 11.76 | 3.046 × 10⁻²⁵ |
| Height | 154.97 ± 11.68 | 154.74 ± 7.04 | 4.666 × 10⁻¹ |
| Systolic Blood Pressure (SBP) | 130.19 ± 47.9 | 127.39 ± 17.15 | 3.939 × 10⁻³ |
| Diastolic Blood Pressure (DBP) | 70.07 ± 12.63 | 68.15 ± 11.33 | 5.571 × 10⁻⁵ |
| Cholesterol | 183.69 ± 34.52 | 163.81 ± 30.15 | 7.832 × 10⁻⁹⁸ |
| Fasting Blood Sugar (FBS) | 122.2 ± 37.22 | 126.48 ± 39.2 | 4.246 × 10⁻⁴ |
| Creatinine | 1.26 ± 0.6 | 1.77 ± 1.91 | 5.392 × 10⁻³⁰ |
| Triglycerides | 116.5 ± 28.87 | 118.98 ± 53.31 | 2.490 × 10⁻² |
| Low-Density Lipoprotein (LDL) | 120.07 ± 27.46 | 111.44 ± 25.29 | 2.310 × 10⁻¹⁸ |
| High-Density Lipoprotein (HDL) | 48.36 ± 8.76 | 40.8 ± 9.12 | 3.882 × 10⁻¹⁷⁷ |
| Blood Urea Nitrogen (BUN) | 19.63 ± 8.16 | 29.98 ± 21.66 | 1.462 × 10⁻⁵⁰ |
| Hemoglobin (HB) | 11.45 ± 1.5 | 11.19 ± 2.2 | 9.194 × 10⁻⁵ |
| Potassium (K) | 3.97 ± 0.32 | 3.93 ± 0.62 | 4.791 × 10⁻³ |
| Sodium (NA) | 137.77 ± 2.15 | 137.67 ± 3.64 | 1.970 × 10⁻¹ |
| White Blood Cells (WBC) | 7592.87 ± 1608.01 | 9788.52 ± 3417.32 | 1.530 × 10⁻¹³² |
| Neutrophils | 65.67 ± 8.38 | 78.03 ± 14.83 | 5.666 × 10⁻²¹¹ |
| Platelets | 233,121.44 ± 60,042.44 | 241,230.45 ± 73,159.79 | 5.755 × 10⁻⁵ |
| Lymphocytes | 22.1 ± 6.66 | 16.67 ± 8.92 | 1.669 × 10⁻¹¹⁰ |
| Smoker | 14 smokers; 126 non-smokers | 150 smokers; 1708 non-smokers | 5.213 × 10⁻¹ |
| Drinker | 23 drinkers; 117 non-drinkers | 139 drinkers; 1719 non-drinkers | 3.441 × 10⁻⁴ |
| Secondary Diagnosis (SDX) | 696 with SDX; 2676 without SDX | 2166 with SDX; 5586 without SDX | 1.426 × 10⁻¹⁹⁹ |

| Actual Class | Classified as Heart Failure | Classified as Dementia |
|---|---|---|
| Heart Failure | True Positive | False Negative |
| Dementia | False Positive | True Negative |
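
The four confusion-matrix counts (TP, FP, FN, TN) are the inputs to the accuracy, precision, recall, and F1-score evaluated in Sections 4.3.1 to 4.3.4. The following is a minimal sketch of the standard formulas, not the authors' code. The function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not taken from the study).
example = classification_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

These are single-class (binary) definitions; whether the reported scores are averaged over both classes is not stated in the tables, so the sketch makes no claim about that.
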

| Pearson Correlation Coefficient | Correlation Type | Interpretation |
|---|---|---|
| Between 0 and 1 | Positive Correlation | The features change consistently in the same direction |
| 0 | No Correlation | There is no correlation between the features |
| Between 0 and −1 | Negative Correlation | The features change consistently in the opposite direction |
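
As a minimal sketch of how the coefficient interpreted above is computed, the function below implements the textbook Pearson formula in plain Python. The name `pearson_r` and the sample values are hypothetical. When one variable is a 0/1 class label, the same formula gives the point-biserial correlation named in Appendix A.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical feature values and a binary class label (illustration only).
feature = [1.0, 2.0, 3.0, 4.0]
label = [0, 0, 1, 1]
print(round(pearson_r(feature, label), 4))
```
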

| Pearson Correlation Coefficient (r) | Correlation Type | Interpretation |
|---|---|---|
| 0 < r < 0.027 | Positive Negligible | The feature shows a very weak positive association. |
| 0.027 ≤ r < 0.050 | Positive Weak | The feature changes in the same direction with a weak relationship. |
| 0.050 ≤ r < 0.124 | Positive Moderate | The feature changes in the same direction with a moderate relationship. |
| r ≥ 0.124 | Positive Strong | The feature strongly changes in the same direction. |
| −0.027 < r < 0 | Negative Negligible | The feature shows a very weak negative association. |
| −0.050 < r ≤ −0.027 | Negative Weak | The feature changes in the opposite direction with a weak relationship. |
| −0.124 < r ≤ −0.050 | Negative Moderate | The feature changes in the opposite direction with a moderate relationship. |
| r ≤ −0.124 | Negative Strong | The feature strongly changes in the opposite direction. |
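
The thresholds in the table above are symmetric around zero, so they can be applied to the magnitude of r and combined with its sign. The sketch below shows one way to encode them. The function name `interpret_r` is hypothetical, and the handling of exactly r = 0 ("No Correlation") is an assumption borrowed from the general interpretation table, since this table does not list that case.

```python
def interpret_r(r: float) -> str:
    """Map a correlation coefficient to the category defined by the threshold table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    if r == 0:
        # Assumption: the threshold table has no row for exactly zero.
        return "No Correlation"
    sign = "Positive" if r > 0 else "Negative"
    magnitude = abs(r)
    if magnitude < 0.027:
        strength = "Negligible"
    elif magnitude < 0.050:
        strength = "Weak"
    elif magnitude < 0.124:
        strength = "Moderate"
    else:
        strength = "Strong"
    return f"{sign} {strength}"


# Boundary values fall into the stronger category, as in the table.
for value in (0.01, 0.027, -0.05, -0.124):
    print(value, "->", interpret_r(value))
```
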
| Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC ROC (%) |
|---|---|---|---|---|---|
| XGBoost | 96.37 ± 0.26 | 96.26 ± 0.41 | 95.10 ± 0.31 | 95.65 ± 0.31 | 99.04 ± 0.31 |
| LightGBM | 96.31 ± 0.24 | 96.20 ± 0.23 | 95.02 ± 0.35 | 95.58 ± 0.35 | 99.04 ± 0.35 |
| GB | 95.89 ± 0.15 | 95.99 ± 0.20 | 94.22 ± 0.17 | 95.04 ± 0.17 | 98.14 ± 0.17 |
| RF | 95.70 ± 0.11 | 95.87 ± 0.15 | 93.89 ± 0.14 | 94.80 ± 0.14 | 98.52 ± 0.14 |
| ET | 94.47 ± 0.17 | 93.98 ± 0.17 | 92.81 ± 0.28 | 93.37 ± 0.28 | 97.91 ± 0.28 |
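
Each cell in the cross-validation table is reported as mean ± standard deviation. The sketch below shows how such a summary can be derived from per-fold scores. The fold scores are hypothetical, and the number of folds and the choice of sample standard deviation (`statistics.stdev`) are assumptions, since the table does not state either.

```python
import statistics
from typing import Sequence, Tuple


def summarize_folds(scores: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and sample standard deviation of per-fold scores."""
    if len(scores) < 2:
        raise ValueError("at least two fold scores are required")
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical per-fold accuracies in percent (illustration only, not study data).
fold_accuracy = [96.0, 96.5, 96.2, 96.8, 96.4]
mean, std = summarize_folds(fold_accuracy)
print(f"{mean:.2f} ± {std:.2f}")
```
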

| Classifier | CTGAN: Accuracy (%) | CTGAN: Precision (%) | CTGAN: Recall (%) | CTGAN: F1-Score (%) | CTGAN: AUC ROC (%) | SMOTE: Accuracy (%) | SMOTE: Precision (%) | SMOTE: Recall (%) | SMOTE: F1-Score (%) | SMOTE: AUC ROC (%) | Baseline: Accuracy (%) | Baseline: Precision (%) | Baseline: Recall (%) | Baseline: F1-Score (%) | Baseline: AUC ROC (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| XGBoost | 96.94 | 96.93 | 95.79 | 96.34 | 99.09 | 96.45 | 96.11 | 95.44 | 95.77 | 99.13 | 96.36 | 96.29 | 95.04 | 95.63 | 99.06 |
| LightGBM | 96.67 | 96.60 | 95.48 | 96.01 | 99.08 | 96.36 | 96.12 | 95.21 | 95.65 | 99.12 | 96.13 | 95.95 | 94.84 | 95.37 | 99.09 |
| GB | 95.96 | 96.07 | 94.29 | 95.12 | 98.15 | 95.33 | 94.93 | 93.92 | 94.40 | 98.41 | 95.91 | 96.23 | 94.05 | 95.04 | 98.55 |
| RF | 95.91 | 96.18 | 94.09 | 95.05 | 98.58 | 95.87 | 96.15 | 94.01 | 94.99 | 98.79 | 95.87 | 96.44 | 93.76 | 94.96 | 98.97 |
| ET | 95.28 | 94.86 | 93.89 | 94.35 | 98.18 | 95.37 | 95.94 | 93.07 | 94.35 | 98.52 | 95.10 | 96.13 | 92.33 | 93.96 | 98.56 |
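
To make the comparison in the table above easier to read, the following sketch computes the accuracy difference of each augmentation method relative to the baseline, in percentage points. The accuracy values are copied from the table. The dictionary layout and the function name `gains_over_baseline` are hypothetical.

```python
# Test-set accuracy (%) per classifier, copied from the comparison table.
ACCURACY = {
    "XGBoost": {"CTGAN": 96.94, "SMOTE": 96.45, "Baseline": 96.36},
    "LightGBM": {"CTGAN": 96.67, "SMOTE": 96.36, "Baseline": 96.13},
    "GB": {"CTGAN": 95.96, "SMOTE": 95.33, "Baseline": 95.91},
    "RF": {"CTGAN": 95.91, "SMOTE": 95.87, "Baseline": 95.87},
    "ET": {"CTGAN": 95.28, "SMOTE": 95.37, "Baseline": 95.10},
}


def gains_over_baseline(table: dict) -> dict:
    """Accuracy difference (percentage points) of each method versus the baseline."""
    return {
        clf: {
            method: round(scores[method] - scores["Baseline"], 2)
            for method in ("CTGAN", "SMOTE")
        }
        for clf, scores in table.items()
    }


gains = gains_over_baseline(ACCURACY)
for clf, gain in gains.items():
    print(f"{clf}: CTGAN {gain['CTGAN']:+.2f}, SMOTE {gain['SMOTE']:+.2f}")
```

On these figures, CTGAN augmentation raises accuracy over the baseline for all five classifiers, most clearly for XGBoost and LightGBM. SMOTE lowers accuracy for GB and is marginally ahead of CTGAN only for ET.
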
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Phanbua, P.; Arwatchananukul, S.; Hristov, G.; Temdee, P. CTGAN-Augmented Ensemble Learning Models for Classifying Dementia and Heart Failure. Inventions 2025, 10, 101. https://doi.org/10.3390/inventions10060101