Prediction of Stroke Disease with Demographic and Behavioural Data Using Random Forest Algorithm
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. Machine Learning Algorithms
3.2. Logistic Regression
3.3. Decision Tree Algorithm
3.4. Random Forest
3.5. Dataset
- Sex: the sex of the patient: “Male”, “Female” or “Other”.
- Age: the age of the patient.
- Hypertension: 0 if the patient doesn’t have hypertension; 1 if the patient has hypertension.
- Heart_Disease: 0 if the patient doesn’t have heart disease; 1 if the patient has heart disease.
- Ever_Married: “No” or “Yes”.
- Work_Type: “children”, “Govt_job”, “Never_worked”, “Private” or “Self-employed”.
- Residence_Type: “Rural” or “Urban”.
- Avg_Glucose_Level: the average glucose level in the patient’s blood.
- BMI: the body mass index of the patient.
- Smoking_Status: the smoking status of the patient: “formerly smoked”, “never smoked”, “smokes” or “Unknown”.
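
The attribute list above can be turned into a numeric feature vector before it is passed to a classifier. The following is a minimal sketch of one common approach (one-hot encoding of the categorical attributes), not the preprocessing used in the paper, which is not shown here. The function names `encode` and `feature_names`, the spelling “Govt_job”, and the example patient values are assumptions made for illustration only.

```python
# Allowed values for each categorical attribute, taken from the list above.
# "Govt_job" is an assumed spelling of the government-job category.
CATEGORIES = {
    "Sex": ["Male", "Female", "Other"],
    "Ever_Married": ["No", "Yes"],
    "Work_Type": ["children", "Govt_job", "Never_worked", "Private", "Self-employed"],
    "Residence_Type": ["Rural", "Urban"],
    "Smoking_Status": ["formerly smoked", "never smoked", "smokes", "Unknown"],
}

# Attributes that are already numeric (or 0/1 flags) and are passed through.
NUMERIC = ["Age", "Hypertension", "Heart_Disease", "Avg_Glucose_Level", "BMI"]


def feature_names():
    """Column names of the encoded vector, in the order produced by encode()."""
    names = list(NUMERIC)
    for attribute, levels in CATEGORIES.items():
        names.extend(f"{attribute}={level}" for level in levels)
    return names


def encode(record):
    """Return a flat list of floats: numeric values, then one-hot categories."""
    row = [float(record[name]) for name in NUMERIC]
    for attribute, levels in CATEGORIES.items():
        value = record[attribute]
        if value not in levels:
            raise ValueError(f"unexpected value {value!r} for {attribute}")
        row.extend(1.0 if value == level else 0.0 for level in levels)
    return row


# Hypothetical patient record (values invented for illustration only).
example = {
    "Sex": "Female",
    "Age": 61,
    "Hypertension": 0,
    "Heart_Disease": 0,
    "Ever_Married": "Yes",
    "Work_Type": "Self-employed",
    "Residence_Type": "Rural",
    "Avg_Glucose_Level": 120.5,
    "BMI": 27.3,
    "Smoking_Status": "never smoked",
}

encoded = encode(example)
for name, value in zip(feature_names(), encoded):
    print(f"{name}: {value}")
```

Each record is expanded into 21 columns: the 5 numeric attributes plus 16 indicator columns, one per category level. Rejecting unknown category values early makes misspelled labels visible before training.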
3.6. Evaluation Metrics
4. Result
4.1. Exploratory Data Analysis (EDA)
4.2. Classification Results
4.2.1. Logistic Regression
4.2.2. Decision Tree
4.2.3. Random Forest
4.3. Model Accuracy
4.4. Predictor Importance
4.5. Model Classification Result
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

Attribute | Category | |
---|---|---|
Sex | Male | 0.052 |
Sex | Female | 0.048 |
Hypertension | Non-Hypertensive Patients | 0.04 |
Hypertension | Hypertensive Patients | 0.18 |
Heart Disease | Yes | 0.18 |
Heart Disease | No | 0.04 |

Attribute | Category | |
---|---|---|
Age | 25 | 0.0025 |
Age | 25–50 | 0.005 |
Age | 50–75 | 0.075 |
Age | 75–100 | 0.2 |
BMI | <20 | 0.032 |
BMI | 20–25 | 0.072 |
BMI | 25–30 | 0.056 |
BMI | 30–35 | 0.046 |
BMI | >40 | 0.08 |
Average Glucose Level | 30–90 | 0.20 |
Average Glucose Level | 90–150 | 0.20 |
Average Glucose Level | 150–210 | 0.60 |
Average Glucose Level | 210–270 | 0.80 |
Average Glucose Level | 270–330 | 1.00 |

Attribute | Category | |
---|---|---|
Marriage | Ever Married | 0.07 |
Marriage | Never Married | 0.018 |
Work Type | Private | 0.05 |
Work Type | Self-Employed | 0.08 |
Work Type | Govt Job | 0.05 |
Work Type | Children | 0.005 |
Residence Type | Urban | 0.052 |
Residence Type | Rural | 0.046 |
Smoking Status | Formerly Smoked | 0.078 |
Smoking Status | Never Smoked | 0.046 |
Smoking Status | Smokes | 0.052 |
Smoking Status | Unknown | 0.03 |

Machine Learning Algorithm | Accuracy (%) | Precision (%) | Recall (%) | Macro F1-Score (%) |
---|---|---|---|---|
Random Forest | 94.11 | 93.27 | 95.06 | 94.16 |
Logistic Regression | 91.43 | 94.41 | 88.06 | 91.12 |
Decision Tree | 88.83 | 89.41 | 88.07 | 88.73 |
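
As a consistency check on the results table above, the F1-score can be recomputed from the reported precision and recall using the standard harmonic-mean formula, F1 = 2PR / (P + R). The sketch below does only that; the names `reported`, `f1_from` and `recomputed` are illustrative. A macro F1-score is normally the average of per-class F1 values, so agreement with this formula is an observation about the reported figures, not a statement of how the authors computed them.

```python
# Reported scores (%) copied from the table above: (precision, recall, F1).
reported = {
    "Random Forest": (93.27, 95.06, 94.16),
    "Logistic Regression": (94.41, 88.06, 91.12),
    "Decision Tree": (89.41, 88.07, 88.73),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (both in the same units)."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for each model from its reported precision and recall.
recomputed = {name: f1_from(p, r) for name, (p, r, _) in reported.items()}

for name, (_, _, f1) in reported.items():
    print(f"{name}: reported F1 = {f1:.2f}, recomputed = {recomputed[name]:.2f}")
```

For all three models the recomputed value matches the reported F1-score to within 0.01 percentage points.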
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Shobayo, O.; Zachariah, O.; Odusami, M.O.; Ogunleye, B. Prediction of Stroke Disease with Demographic and Behavioural Data Using Random Forest Algorithm. Analytics 2023, 2, 604-617. https://doi.org/10.3390/analytics2030034