Predicting Brain Stroke Risk Using Machine Learning: A Comprehensive Approach to Early Detection and Prevention

Noor, Isha; Aslam, Amara; Mir, Azka; Insany, Gina Purnama

doi:10.3390/engproc2025107123

Open Access Proceeding Paper

¹ Department of Software Engineering, University of Sialkot, Sialkot 51040, Pakistan

² Department of Information Technology, Nusa Putra University, Sukabumi 43152, West Java, Indonesia

* Author to whom correspondence should be addressed.

† Presented at the 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society, Aizuwakamatsu City, Japan, 20–26 January 2025.

Eng. Proc. 2025, 107(1), 123; https://doi.org/10.3390/engproc2025107123

Published: 9 October 2025

(This article belongs to the Proceedings of The 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society)

Abstract

Brain stroke is a medical condition in which a disruption of the blood supply to the brain damages brain cells. As a result, the affected area loses the abilities and functions it normally performs. It usually affects people over the age of 25 and under 70. This paper aims to predict the risk of brain stroke before it occurs in order to help save lives. The dataset used in this paper, obtained from the Kaggle website, has 4981 samples and 11 features, or risk factors, that influence the occurrence of this condition. This study uses several supervised machine learning algorithms: Logistic Regression, Decision Tree, Random Forest, Naïve Bayes, and K-Nearest Neighbors (KNN). The best result, an accuracy of 94.3%, was obtained with Random Forest.

Keywords:

brain stroke; stroke prediction; machine learning; supervised learning; random forest; logistic regression; decision tree; naïve bayes; k-nearest neighbors; risk factors

1. Introduction

Brain stroke is a medical condition or event that occurs when the blood supply to part of the brain is interrupted. When blood flow stops for more than a few seconds, brain cells cannot obtain enough oxygen and nutrients, and as a result they die. This causes a loss of the abilities controlled by the affected area. Stroke can also be caused by the rupture of an artery in the brain. Brain stroke is one of the leading causes of death and disability in the world and affects millions of people and their families. According to the World Health Organization, about 15 million people suffer a stroke each year; 5 million of them die, and another 5 million are left with a permanent disability that places a burden on their families. A 2022 fact sheet from the World Stroke Organization reports that almost 6.5 million people worldwide die from stroke annually. Notably, 6 percent of these deaths occur between the ages of 15 and 49 years, and 34 percent occur in people under 70 years old. Moreover, from 1990 to 2019, the number of deaths from stroke increased by 43 percent. Most of this burden, 86 percent of the deaths and 89 percent of the disability, falls on low- and lower-middle-income countries. Older people have a higher risk of brain stroke, but the condition is now increasing in the younger population due to lifestyle factors such as poor diet, lack of physical activity, and smoking. High blood pressure, hyperglycemia (high glucose level), and hypoglycemia (low glucose level) are also among the biggest risk factors for brain stroke. Identifying these risk factors early and receiving appropriate treatment can help prevent this condition. Brain stroke is also called a cerebrovascular accident, cerebrovascular insult, or brain attack.

The acronym FAST is used to remember the warning signs of stroke: F stands for face drooping, A for arm drift, S for speech problems, and T for time. Time is the most critical factor once a stroke occurs. Other symptoms of brain stroke include the following:

Sudden headache with no apparent cause;
Numbness;
Tingling in one part of the body;
Weakness in one part of the body;
Speech problems;
Vertigo;
Difficulty walking;
Dizziness;
Temporary blindness, etc.

It is necessary to reach the hospital within 3 h of symptom onset, or as soon as possible. A person with a family history of brain stroke has a higher chance of having a stroke. Men also have more strokes than women. Studies show that people with diabetes, an irregular heartbeat, or high alcohol consumption have a higher chance of stroke.

There are two types of brain stroke: ischemic stroke and hemorrhagic stroke. Ischemic stroke is more common, accounting for more than 80 percent of strokes. It happens when a vessel supplying blood to the brain is blocked or obstructed, typically as a result of atherosclerosis. Ischemic stroke has two subtypes: thrombotic stroke and embolic stroke. A hemorrhagic stroke occurs when a cerebral artery ruptures, resulting in bleeding within or around the brain. This kind of stroke is less prevalent than ischemic stroke, although it is generally more lethal. The escaping blood exerts pressure on brain tissue, compromising its function and causing substantial damage to, or death of, neurons. Causes of hemorrhagic stroke include high blood pressure, trauma, brain tumors, and the use of blood-thinning medication. Hemorrhagic stroke also has two subtypes: intracerebral hemorrhage and subarachnoid hemorrhage.

2. Literature Review

This work presents an automated technique for detecting abnormalities at the slice level of non-contrast computed tomography (CT) images and classifying them as acute infarct, chronic infarct, or hemorrhage. The proposed approach has three primary steps: image enhancement, midline symmetry detection, and abnormal slice classification. To enhance the region of interest, a windowing technique is used to redistribute intensity. Rotation- and translation-invariant abnormality detection is based on domain knowledge about the anatomy of the brain and skull. Features from the intensity and wavelet domains are used to identify abnormalities with a two-level classification approach. The approach was tested on a dataset of 347 image slices from 15 patients. At the patient level, it detects abnormalities with 90% accuracy and 100% recall; at the slice level, it obtains an average precision of 91% and a recall of 90% [1].

A novel neurological model of brain repair after stroke is predicated on the following assumptions: brain plasticity is a synaptic phenomenon that is primarily stimulus-dependent; brain repair requires behavioral and physical interventions specifically designed to reorganize brain circuits; and neural repair mechanisms inherently involve cellular and circuit plasticity. The authors describe this model, examine existing methods for brain regeneration following a stroke, and review the data, reasoning, and biological underpinnings of their approach to language and upper-extremity rehabilitation. They suggest that the model may eventually lead to a stroke cure by improving plasticity at the level of brain network interactions [2].

This research presents a comparative investigation of stroke diagnosis on CT and MRI images. It proposes identifying infarction and hemorrhage in the human brain using digital image processing techniques. Median filtering is used to preprocess the medical images, and seeded region growing and Gabor filtering are used for segmentation. The technique is demonstrated on CT and MRI brain images showing several kinds of infarcts, and its outcomes are assessed visually. The approach shows promise in detecting strokes and indicates that MRI is better than CT for this purpose [3].

A study of stroke and non-stroke brain attacks in children compared its findings with a random-effects meta-analysis of related adult studies. Over 17 months, 287 children, 46% of whom were male, made 301 presentations, and a third arrived by ambulance. The median duration of symptoms before arrival was 6 h (interquartile range: 2–28 h), and the median time between triage and medical evaluation was 22 min (interquartile range: 6–55 min). The most common symptoms were headache (56%), vomiting (36%), focal weakness (35%), numbness (24%), seizures (21%), and altered consciousness (21%). The most common signs were focal weakness (31%), numbness (13%), ataxia (10%), and speech disturbance (8%). Neuroimaging comprised MRI (31%), which was abnormal in 62% of cases, and CT (30%), which was abnormal in 27% of cases. The most frequently diagnosed conditions were migraine (28%), seizures (15%), Bell palsy (10%), stroke (7%), and conversion disorders (6%). The relative proportions of stroke, migraine, seizures, and conversion disorders differed considerably between children and adults [4].

This study focuses on the diet that stroke patients should follow to prevent the negative effects of their condition. It examined the problems stroke patients face and created a customized diet for each of them. Dysphagia, hypertension, hyperglycemia, and impairments in certain body parts are among the many problems that stroke patients deal with. The goal of the research is to use soft computing algorithms to provide individualized care and an informed diet. Patients filled out a form every day, and the diet they were given was based on their responses [5].

This paper presents a thorough investigation of the use of EEG and MRI in detecting brain disorders, together with a detailed comparison of the two modalities. The work has two parts. The first is a thorough analysis of EEG processing. The second is a comparative analysis of identifying brain disorders with EEG and with MRI [6].

CT is the most widely used neuroimaging modality for stroke because it can show the extent and severity of the event. Compared with other modalities, CT scans are also more affordable, quicker, and easier to perform. Because stroke is a medical emergency, computer-aided diagnosis (CAD) systems that can interpret CT scans are crucial for obtaining information that speeds up diagnosis and the choice of therapy. This study presents Parzen Analysis of Brain Tissue Thicknesses, a novel feature extractor for brain CT images used to detect and classify strokes. The method uses Parzen window estimation to compute the likelihood that each pixel falls within a predetermined range, which reduces the subjectivity in defining brain tissue bands. Furthermore, the technique is fully integrated with an Internet of Things framework, so it can be used remotely to help medical professionals diagnose and treat strokes. Among all the approaches evaluated, the method achieved the best accuracy (98.41%), F1-score (97.61%), negative predictive value (98.80%), and positive predictive value (95.45%). These outcomes show that the technique extracts pertinent information from brain CT scans to determine the presence or absence of a stroke and identify its type [7].

To differentiate between hemorrhagic stroke, ischemic stroke, and a healthy brain, this study proposes an Internet of Things (IoT) framework for classifying stroke from CT scans using Convolutional Neural Networks (CNNs). Under the transfer learning paradigm, CNNs were combined with various established machine learning techniques, including Bayesian classifiers, multilayer perceptrons, K-Nearest Neighbors, Random Forests, and Support Vector Machines. Such techniques can capture information that is invisible to the human eye, helping to automate the diagnostic process and achieve more accurate diagnoses. In addition, IoT provides an efficient and flexible tool for addressing issues in healthcare services, enabling remote patient monitoring and diagnosis. The method was validated using accuracy, F1-score, recall, precision, and processing time. The CNN outperformed most of the investigated classifiers in accuracy, F1-score, recall, and precision. The Bayesian classifier gave the fastest test and training times, 0.001 s and 0.015 s, respectively. The authors conclude that the method is effective and reliable for stroke detection [8].

This study uses four machine learning algorithms to identify the type of stroke that may occur, based on medical reports and a person’s physical condition. The authors collected a large number of records from hospitals. According to the classification results, the approach is suitable for use with real-time medical reports. The authors argue that machine learning can help healthcare and support a better understanding of diseases. The naïve Bayes (NB) classifier achieved an accuracy of 85.6%, while random forest (RF), k-NN, and J48 each achieved 99.8%. NB’s precision, recall, and F-measure are 88.1%, 85.6%, and 86.1%, respectively. The precision, recall, and F-measure of J48, k-NN, and RF are all 99.8% [9].

This study uses CNN-based deep learning models to diagnose brain stroke from MRI. The method classifies brain stroke MRI images as normal or abnormal and uses semantic segmentation to identify the affected regions. Two convolutional neural network architectures, LeNet and SegNet, are used. All layers of LeNet were trained on preprocessed stroke MRI images for classification, distinguishing normal from abnormal patients. The abnormal patient data are then stored in a two-dimensional array and fed into SegNet, an encoder-decoder model for segmentation, which trains all of its layers except the fully connected layer. In the experiments, the segmentation model achieved an accuracy of 85–87% and the classification model an accuracy of 96–97%. Based on these results, the authors conclude that deep learning models apply to medical as well as non-medical images and provide accurate diagnoses, particularly for recognizing brain strokes [10].

As the world’s population ages, the frequency of strokes will rise, rapidly increasing the financial strain on society. The proposed approach estimates the incidence of stroke using a variety of machine learning classification techniques, each applied to a reduced set of features. These techniques are Decision Tree, Deep Neural Network Learning, Expectation Maximization, Random Forest, and a Gaussian Naïve Bayesian Classifier. The features are extracted from the medical data mainly with the Principal Component Analysis (PCA) algorithm, and these reduced features are used to determine whether the patient has a stroke. Compared with the other machine learning techniques, the proposed Deep Neural Network Learning Classifier yielded 86.42% accuracy, 74.89% sensitivity, and 88.49% specificity. Such a model could therefore help both patients and medical professionals in treating stroke [11].

In this study, stroke prediction is performed using several machine learning techniques:
- AdaBoost, and AdaBoost with stochastic gradient descent (SGD);
- Random Forest;
- linear, polynomial, and RBF SVM;
- Logistic Regression;
- Gaussian Naïve Bayes;
- Decision Tree.

Random Forest was found to be the most effective of these algorithms, with 94.23% accuracy, 92.16% sensitivity, 95.07% specificity, and a low error rate of 0.04% [12,13].

Another study proposes early prediction of stroke using various machine learning techniques, based on factors such as age, smoking status, heart disease, body mass index, hypertension, average glucose level, and prior strokes. Ten classifiers (Logistic Regression, Stochastic Gradient Descent, Decision Tree, AdaBoost, Gaussian, Quadratic Discriminant Analysis, Multilayer Perceptron, K-Nearest Neighbors, Gradient Boosting, and XGBoost) were trained on these high-importance features to predict stroke. A weighted voting approach then aggregates the base classifiers’ predictions. The weighted voting classifier outperforms the base classifiers, achieving an accuracy of 97%, the highest among the compared models [13].

This study introduces and describes a new prototype. It is based on a low-complexity architecture and uses a small number of antennas, strategically arranged and designed to be mounted on a helmet. The device produces three-dimensional images of the stroke using a differential imaging technique. Preliminary tests used a 3D phantom filled with a brain tissue-mimicking liquid. They confirm the system’s ability to image a spherical target with a radius of 1.25 cm that mimics a stroke [14].

Stroke ranks second in illness-related mortality worldwide, making it one of the deadliest diseases. About 16 million people worldwide suffer a stroke each year, and about 38% of cases result in death. CT is a very useful tool for the medical diagnosis of stroke. However, different experts may interpret the same image differently. To overcome the difficulty of detecting stroke in CT images, this study proposes a fully automated method based on the Health of Things. The method uses deep learning networks to classify skull CT images into two groups, hemorrhagic stroke and healthy. After classification, Mask R-CNN, combined with machine learning techniques through transfer learning, is used to segment the stroke. The approach outperformed existing automatic methods in the literature. Classification reached 100% accuracy, and the best segmentation model (Mask + KNN) achieved 99.93% specificity and 99.73% accuracy with a segmentation time of 4.00 s [15].

In many nations, stroke is a leading cause of both disability and death. This study preprocessed CT scans of stroke patients to improve image quality and reduce noise. Machine learning algorithms were then applied to classify the patients’ images into two subtypes of stroke, namely ischemic stroke and hemorrhagic stroke. Eight machine learning algorithms were used: K-Nearest Neighbors, Naive Bayes, Logistic Regression, Decision Tree, Random Forest, Multi-layer Perceptron (MLP-NN), Deep Learning, and Support Vector Machine. Random Forest produced the best accuracy (95.97%), precision (94.39%), recall (96.12%), and F1-score (95.39%) [16].

The dataset used in this study was obtained from Kaggle and contains patients’ basic physiological data, medical history, and living situation. Before model construction, the dataset was randomly split into three subsets: 70% training data, 15% test data, and 15% validation data. Artificial neural networks (ANNs) are well suited to medical applications and have shown promising results in predicting cardiovascular disease (CVD). This article therefore constructs a stroke prediction model using ANNs trained on patients’ physiological data. With 1000 iterations of cross-validation, the ANN approach achieves about 98% classification accuracy [17].

Another study predicts the likelihood of brain stroke using a variety of physiological parameters and machine learning methods, including Naïve Bayes, K-Nearest Neighbors, Support Vector Machine, Random Forest, Decision Tree, and Logistic Regression. Naïve Bayes performed best, with an accuracy of almost 82% [18].

3. Methodology

3.1. Machine Learning (ML)

Machine learning is the branch of artificial intelligence that uses data to learn and then predicts new results based on previous data [19]. Put simply, instead of following hand-written rules, a program is trained on data until it can make predictions on its own. Machine learning is divided into three categories: supervised learning, unsupervised learning, and reinforcement learning. In this study, we used supervised learning techniques [20], which are described below.

3.1.1. Naïve Bayes

Naïve Bayes is a supervised learning technique that predicts the class of a sample using Bayes’ theorem. It relies on the simplifying ("naïve") assumption that the features are conditionally independent given the class. The algorithm is widely used in text classification. Naïve Bayes works as a probabilistic classifier: it assigns each sample to the class with the highest posterior probability.
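
To make the probabilistic reasoning concrete, below is a minimal pure-Python sketch of a categorical Naïve Bayes classifier. It assumes binary or categorical inputs such as hypertension and heart disease. It is not the RapidMiner operator used in this study, and the function names and the Laplace smoothing parameter `alpha` are illustrative assumptions. Each class score is log P(class) plus the sum of log P(feature value | class), and the class with the highest score is predicted.

```python
import math
from collections import Counter

def fit_naive_bayes(X, y, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(y)
    n_features = len(X[0])
    # value_counts[c][j][v]: number of class-c samples whose feature j equals v
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    seen_values = [set() for _ in range(n_features)]
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            value_counts[label][j][v] += 1
            seen_values[j].add(v)
    return {"class_counts": class_counts, "value_counts": value_counts,
            "seen_values": seen_values, "alpha": alpha, "n": len(y)}

def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior score."""
    best_class, best_score = None, -math.inf
    for c, n_c in model["class_counts"].items():
        score = math.log(n_c / model["n"])  # log prior P(c)
        for j, v in enumerate(row):
            # Laplace-smoothed likelihood P(x_j = v | c)
            numerator = model["value_counts"][c][j][v] + model["alpha"]
            denominator = n_c + model["alpha"] * len(model["seen_values"][j])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data: (hypertension, heart_disease) -> stroke
X = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 0), (0, 1), (0, 0), (0, 0)]
y = [1, 1, 1, 0, 0, 0, 0, 0]
model = fit_naive_bayes(X, y)
print(predict_naive_bayes(model, (1, 1)))  # 1
print(predict_naive_bayes(model, (0, 0)))  # 0
```

The sketch works in log space to avoid numerical underflow when many small probabilities are multiplied together.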

3.1.2. Decision Tree

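Decision Trees can solve both classification and regression problems. A decision tree is a tree-structured model consisting of a root node, internal decision nodes, and leaf nodes: each internal node tests a feature, and each leaf holds a prediction. The tree learns these tests from past data. It predicts the result for a new sample by following the tests from the root down to a leaf.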
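
The core step in growing a decision tree is choosing the split that best separates the classes. The sketch below makes two assumptions. It uses the CART-style Gini impurity criterion, which the paper does not specify, and it works on numeric features such as age and glucose level. The helpers `gini` and `best_split` are hypothetical names. A full tree would apply `best_split` recursively to each child node until the nodes are pure or a depth limit is reached.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2; 0.0 means the node is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y):
    """Exhaustively search splits of the form x[j] <= threshold and return
    (feature index, threshold, weighted child impurity) of the best one."""
    best = (None, None, gini(y))  # no split unless it lowers the impurity
    n = len(y)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= threshold]
            right = [label for row, label in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if impurity < best[2]:
                best = (j, threshold, impurity)
    return best

X = [[45, 80], [50, 90], [67, 230], [72, 210]]  # hypothetical (age, glucose)
y = [0, 0, 1, 1]
print(best_split(X, y))  # (0, 50, 0.0)
```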

3.1.3. Random Forest

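Random Forest is a supervised learning algorithm that builds many Decision Trees. Each tree is trained on a random bootstrap sample of the data and typically considers only a random subset of the features at each split. The forest combines the trees’ predictions, by majority vote for classification, to produce the final result. In this study, this classifier achieved the best accuracy.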
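
Below is a minimal sketch of the Random Forest idea. It assumes numeric features and, to keep it short, uses one-split trees (decision stumps) instead of full trees. Each tree sees a bootstrap sample of the rows and a random subset of the features, and the forest predicts by majority vote. All function names and defaults (`n_trees`, `n_features`, `seed`) are illustrative assumptions, not the settings used in this study.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Fit a one-split tree on the given feature indices, choosing the threshold
    with the fewest misclassifications; each side predicts its majority label."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    if best is None:
        # No valid split (all sampled values equal): always predict the majority label.
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def fit_random_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: n row indices drawn with replacement.
        rows = [rng.randrange(n) for _ in range(n)]
        # Random feature subset (drawn per tree here, since a stump has one split).
        features = rng.sample(range(len(X[0])), n_features)
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], features))
    return forest

def predict_random_forest(forest, row):
    """Majority vote over the trees' predictions."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical (age, glucose) records with stroke labels.
X = [[30, 80], [35, 85], [40, 90], [45, 95], [50, 100],
     [65, 200], [70, 210], [75, 220], [80, 230], [85, 240]]
y = [0] * 5 + [1] * 5
forest = fit_random_forest(X, y)
print(predict_random_forest(forest, [25, 70]), predict_random_forest(forest, [90, 250]))
```

Averaging many trees, each fitted to a different resample, reduces the variance of a single tree. This is the usual explanation for why Random Forest tends to be more stable than one Decision Tree.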

3.1.4. K-Nearest Neighbor

The KNN algorithm is used to solve classification and regression problems, and it is also commonly used to impute missing values. It predicts the label (or missing value) of a sample by finding the k most similar samples, its nearest neighbors in feature space. It then takes a majority vote among them for classification, or their average for regression.
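
The following sketch shows KNN classification using Euclidean distance and a majority vote among the k nearest training samples. The function `knn_predict` and the default `k=3` are illustrative assumptions, not the configuration used in this study.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training samples, using Euclidean distance."""
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort (distance, label) pairs so that the closest samples come first.
    neighbors = sorted(
        (math.dist(row, query), label) for row, label in zip(X_train, y_train)
    )
    nearest_labels = [label for _, label in neighbors[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]  # hypothetical, already scaled
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [1.5, 1.5]))  # 0
print(knn_predict(X_train, y_train, [8.5, 8.5]))  # 1
```

Distance is dominated by features with large ranges, such as glucose level compared with binary flags, so features are usually normalized before KNN is applied.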

3.1.5. Gradient Boosting

The Gradient Boosting algorithm combines weak models, usually Decision Trees, and trains them sequentially. Each new model is fitted to reduce the errors (the loss) of the models trained before it. It is also used to solve both regression and classification problems.
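
Below is a minimal sketch of gradient boosting for binary classification. It assumes a single numeric feature (for example, age) and the log-loss. Each round fits a regression stump to the pseudo-residuals y - p and adds it to the model, scaled by a learning rate. These residuals are the negative gradient of the log-loss with respect to the model’s log-odds score. For simplicity, the leaf values are the mean pseudo-residuals; common implementations use a Newton step instead. All names and defaults are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_regression_stump(x, residuals):
    """Fit a 1-D regression stump (x <= t vs. x > t) minimizing squared error.
    Assumes x contains at least two distinct values."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]

def fit_gradient_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Boost regression stumps on the log-loss; y must contain both 0 and 1."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1.0 - p0))  # start from the log-odds of the base rate
    scores = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss w.r.t. the score is y - sigmoid(score).
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        t, left_value, right_value = fit_regression_stump(x, residuals)
        stumps.append((t, left_value, right_value))
        # Each new stump corrects part of the error left by the previous ones.
        scores = [s + learning_rate * (left_value if xi <= t else right_value)
                  for xi, s in zip(x, scores)]
    return f0, learning_rate, stumps

def predict_proba(model, xi):
    """Probability of the positive class (stroke = 1) for one input value."""
    f0, learning_rate, stumps = model
    score = f0 + sum(learning_rate * (lv if xi <= t else rv) for t, lv, rv in stumps)
    return sigmoid(score)

ages = [40, 45, 50, 55, 70, 75, 80, 85]  # hypothetical
stroke = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_gradient_boosting(ages, stroke)
print(round(predict_proba(model, 42), 3), round(predict_proba(model, 82), 3))
```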

3.2. Dataset and Attributes

The dataset used in this study is available on the Kaggle website for research purposes. It is derived from real healthcare data and contains 12 attributes.

3.2.1. About the Dataset

Brain stroke is a serious medical condition that occurs when the blood supply to the brain, or to a specific part of it, stops, even for a few seconds. As a result, the affected area loses the ability to function as it did before. This problem has many causes, including high blood pressure, high cholesterol, diabetes, past trauma, obesity, smoking, and alcohol consumption.

Around 795,000 people in the US experienced a stroke in 2018, and stroke is one of the top five leading causes of death in the world. According to the World Health Organization, stroke is one of the main causes of death worldwide and a leading cause of disability. According to another report from the CDC, 1 in 6 people will have a stroke in their lifetime, and 1 in 3 strokes is dangerous or deadly. This dataset was uploaded to Kaggle by JillaniSoftTech and is publicly available.

3.2.2. Context

The dataset was obtained from Kaggle, and its attributes and their descriptions are shown in Table 1. The brain stroke dataset contains 4982 instances with 12 attributes, 10 of which are considered the most important factors for predicting stroke. The target is binary: 0 (no stroke) or 1 (stroke). Many researchers use this dataset in their studies. The models were trained and tested on the dataset using RapidMiner, a data science software platform.

3.2.3. Attributes with Description

Below is a detailed description of the dataset:

Table 1. Attributes description and data types.

Attribute	Description	Type of Data
Age	Age of the patient	Integer
Gender	Gender of the patient (male or female)	Binomial
Hypertension	Whether the patient has hypertension	Binomial
Heart Disease	Whether the patient has any heart disease	Binomial
Marital Status	Whether the patient is single or married	Binomial
Work Type	Profession or job type of the patient	Categorical
Residence Type	Whether the patient lives in a rural or urban area	Categorical
Avg Glucose Level	Average glucose level of the patient	Integer
BMI	Body mass index of the patient	Integer
ID	Unique identifier	Integer
Stroke	Whether the patient had a stroke (0 = no, 1 = yes)	Binomial
Smoking Status	Smoking status of the patient	Categorical
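
The paper does not describe how the categorical attributes in Table 1 were converted into inputs for the classifiers. One common approach, sketched below as an assumption, has three parts:
- keep numeric attributes as numbers;
- one-hot encode categorical attributes;
- leave out the ID column, because a unique identifier carries no predictive information.

The column names (`age`, `gender`, `smoking_status`, and so on) and the record values are hypothetical.

```python
def build_encoder(records, categorical_columns):
    """Map each categorical column to the sorted list of values observed in the data."""
    return {col: sorted({r[col] for r in records}) for col in categorical_columns}

def encode(record, encoder, numeric_columns):
    """Turn one record (a dict) into a numeric feature vector.

    The ID and target columns are simply not listed, so they are left out.
    """
    vector = [float(record[col]) for col in numeric_columns]
    for col, values in encoder.items():
        # One-hot: 1.0 for the value this record has, 0.0 for every other value.
        vector.extend(1.0 if record[col] == v else 0.0 for v in values)
    return vector

# Hypothetical records; not taken from the Kaggle dataset.
records = [
    {"id": 1, "age": 70, "hypertension": 0, "avg_glucose_level": 210.5,
     "gender": "Male", "smoking_status": "formerly smoked", "stroke": 1},
    {"id": 2, "age": 45, "hypertension": 1, "avg_glucose_level": 95.5,
     "gender": "Female", "smoking_status": "smokes", "stroke": 0},
]
encoder = build_encoder(records, ["gender", "smoking_status"])
X = [encode(r, encoder, ["age", "hypertension", "avg_glucose_level"]) for r in records]
y = [r["stroke"] for r in records]
print(X[0])  # [70.0, 0.0, 210.5, 0.0, 1.0, 1.0, 0.0]
```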

3.3. Framework

The framework shown in Figure 1 describes our system from its inception to its final operation and covers all system tasks; visualizing the process aids comprehension. For cerebral stroke prediction, we tried multiple classifiers to obtain the best accuracy. The Kaggle dataset was imported into RapidMiner, a data mining and analysis platform. It was then split into training and testing sets, with 70% of the data used for training and 30% for testing. Next, we selected and applied the classifiers to the model and compared their performance.
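
The 70/30 split was performed in RapidMiner. As an illustration only, the sketch below applies the same proportions in plain Python. It also stratifies by class, an assumption not stated in the paper, so that both sets contain the same share of stroke cases. This matters because stroke cases are usually a small minority in such datasets. The function name and the seed are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(X, y, train_fraction=0.7, seed=42):
    """Split each class separately (70/30 by default) so that the training and
    test sets keep the original class ratio. Returns X_train, y_train, X_test, y_test."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])

X = [[i] for i in range(100)]  # hypothetical feature rows
y = [0] * 90 + [1] * 10        # 10% positive (stroke) cases
X_train, y_train, X_test, y_test = stratified_split(X, y)
print(len(X_train), sum(y_train), len(X_test), sum(y_test))  # 70 7 30 3
```

A fixed seed makes the split reproducible, so every classifier can be evaluated on exactly the same test set.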

4. Results

Several machine learning techniques achieved promising results in forecasting brain stroke risk, as shown in Figure 2. The Random Forest model produced the best accuracy, 95.05%, with a classification error of 4.95%. Its precision and recall were both reported as 100%, indicating that it identified actual positive cases without false positives. The Decision Tree model also performed very well, with an accuracy of 94.44%, a classification error of 5.56%, a precision of 95.26%, and a recall of 99.08%. Another model achieved an accuracy of 94.51%, a classification error of 5.49%, a precision of 95.20%, and a recall of 99.23%. The K-Nearest Neighbor method produced an accuracy of 93.17%, a classification error of 6.83%, a precision of 95.39%, and a recall of 97.54%. The Gradient Boosting model scored well in terms of precision (96.62%) and recall (88.52%). The Naïve Bayes model had the lowest accuracy, 86.14%, and correspondingly the largest classification error, 13.86%. These results are shown in Table 2.

4.1. Accuracy

Accuracy is the proportion of correctly predicted cases out of all cases, and it is the most commonly used performance measure. For example, the Random Forest model’s accuracy of 95.05% means that 95.05% of all its predictions were correct.
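
Accuracy can be computed directly from the true and predicted labels, as in the short sketch below. The function name is illustrative, and label 1 denotes stroke, as in Table 1.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))  # 0.8 (4 of 5 correct)
```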

4.2. Precision

Precision is the proportion of positive predictions that are actually positive, which makes it a vital statistic when false positives are expensive. Random Forest’s precision was 100%. Every time the model predicted a positive instance, it was correct; that is, it produced no false positives.
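
Precision is TP / (TP + FP), where TP and FP count true and false positives. The sketch below computes it from label lists, taking 1 (stroke) as the positive class. Returning 0.0 when the model predicts no positives at all is an assumed convention.

```python
def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP): the share of predicted positives that are truly positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # Assumed convention: 0.0 when there are no positive predictions.
    return tp / (tp + fp) if tp + fp else 0.0

# 2 true positives and 1 false positive, so precision = 2/3.
print(precision([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))
```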

4.3. Recall

Recall, often referred to as sensitivity, measures a model’s ability to find every positive case. A recall of 100% indicates that the model found all actual positive cases. For Random Forest, it means the model identified every stroke case.
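
Recall is TP / (TP + FN), where FN counts missed positive cases. The sketch below mirrors the precision example, again taking 1 (stroke) as the positive class. Returning 0.0 when there are no positive cases is an assumed convention.

```python
def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN): the share of actual positives that the model found."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: 0.0 when there are no actual positives.
    return tp / (tp + fn) if tp + fn else 0.0

print(recall([1, 1, 1, 1, 0], [1, 1, 1, 0, 0]))  # 0.75 (one stroke case missed)
```

In stroke screening, a false negative means a missed stroke, so recall is often weighted more heavily than precision.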

4.4. Classification Error

Classification error is the fraction of incorrect predictions made by a model. It is computed as 1 minus the accuracy, so a lower classification error corresponds to a more accurate model. For instance, Random Forest had a classification error of 4.95%, meaning that 4.95% of its predictions were incorrect. Random Forest achieved the highest accuracy and recall. Various methods and techniques were applied to improve the results. An accuracy chart compares the classifiers used in the study, and a confusion matrix is presented for the Random Forest classifier, which achieved the highest accuracy. Table 3 shows the precision and recall values.
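
The sketch below builds the binary confusion matrix from true and predicted labels. It then derives the classification error as the share of false positives and false negatives, which equals 1 minus the accuracy. The function names and the dictionary layout are illustrative, and the labels are a hypothetical example, not this study’s confusion matrix.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts

def classification_error(counts):
    """(FP + FN) / total, i.e. 1 - accuracy."""
    total = sum(counts.values())
    return (counts["fp"] + counts["fn"]) / total

y_true = [1, 0, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)                        # {'tp': 2, 'fp': 1, 'tn': 4, 'fn': 1}
print(classification_error(cm))  # 0.25, matching an accuracy of 0.75
```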

Table 4 compares the accuracy achieved by various studies conducted in different years on different datasets.

5. Conclusions

Although the present work shows positive results, several directions remain open for further research. Feature selection is one important area: further improving model accuracy may require adding new features or applying more advanced feature selection methods. Furthermore, ensemble learning techniques, which combine the strengths of multiple algorithms, might offer improved prediction accuracy and stability. Finally, building confidence in these models and ensuring their usefulness in clinical settings depends on improving model interpretability, so that healthcare practitioners can base their decisions on the predictions. Such advances in future studies should increase the reliability and applicability of predictive models for brain stroke, enabling better healthcare outcomes.

Author Contributions

Conceptualization, I.N. and A.A.; methodology, I.N. and A.M.; software, A.M.; validation, I.N. and G.P.I.; formal analysis, A.A. and A.M.; investigation, I.N. and A.M.; resources, G.P.I.; data curation, A.A. and G.P.I.; writing and original draft preparation, I.N. and A.M.; writing review and editing, A.A. and G.P.I.; visualization, A.M.; supervision, I.N.; project administration, I.N. and G.P.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

Chawla, M.; Sharma, S.; Sivaswamy, J.; Kishore, L.T. A method for automatic detection and classification of stroke from brain CT images. In Proceedings of the 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Minneapolis, MN, USA, 3–6 September 2009; pp. 3581–3584.
Small, S.L.; Buccino, G.; Solodkin, A. Brain Repair after Stroke—A Novel Neurological Model. Nat. Rev. Neurol. 2013, 9, 698–707.
Byna, A.; Lakulu, M.M.; Panessai, I.Y. Current critical review on prediction stroke using machine learning. Bull. Electr. Eng. Inform. 2024, 13, 3470–3480.
Mackay, M.T.; Chua, Z.K.; Lee, M.; Yock-Corrales, A.; Churilov, L.; Monagle, P.; Donnan, G.A.; Babl, F.E. Stroke and Nonstroke Brain Attacks in Children. In Proceedings of the 2018 International Conference on Advances in Computing and Communication Engineering (ICACCE), Paris, France, 22–23 June 2018; IEEE: Piscataway, NJ, USA, 2018.
Bhattacharjee, S.; Ghatak, S.; Dutta, S.; Chatterjee, B.; Gupta, M. A Survey on Comparison Analysis between EEG Signal and MRI for Brain Stroke Detection. In Proceedings of the International Conference on Emerging Technologies in Data Mining and Information Security, Kolkata, India, 23–25 July 2018; Advances in Intelligent Systems and Computing. Springer: Singapore, 2019; Volume 814, pp. 377–382.
Sarmento, R.M.; Vasconcelos, F.F.X.; Filho, P.P.R.; de Albuquerque, V.H.C. An IoT Platform for the Analysis of Brain CT Images Based on Parzen Analysis. Future Gener. Comput. Syst. 2020, 105, 135–147.
Dourado, C.M.J.M.; da Silva, S.P.P.; da Nóbrega, R.V.M.; Antonio, A.C.; Filho, P.P.R.; de Albuquerque, V.H.C. Deep Learning IoT System for Online Stroke Detection in Skull Computed Tomography Images. Comput. Netw. 2019, 152, 25–39.
Shoily, T.I.; Islam, T.; Jannat, S.; Tanna, S.A.; Alif, T.M.; Ema, R.R. Detection of Stroke Disease Using Machine Learning Algorithms. In Proceedings of the 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kanpur, India, 6–8 July 2019; pp. 1–6.
Nayak, S.; Gupta, N. Enhancing Stroke Prediction with Machine Learning in Smart Healthcare Systems. In Proceedings of the 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT 2024), Kamand, India, 24–28 June 2024; pp. 9616–9621.
Sailasya, G.; Aruna Kumari, G.L. Analyzing the Performance of Stroke Prediction Using ML Classification Algorithms. Available online: http://www.ijacsa.thesai.org (accessed on 1 January 2025).
Emon, M.U.; Keya, M.S.; Meghla, T.I.; Rahman, M.M.; Al Mamun, M.S.; Kaiser, M.S. Performance Analysis of Machine Learning Approaches in Stroke Prediction. In Proceedings of the 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 5–7 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1464–1469.
Choi, Y.A.; Park, S.J.; Jun, J.A.; Pyo, C.S.; Cho, K.H.; Lee, H.S.; Yu, J.H. Deep Learning-Based Stroke Disease Prediction System Using Real-Time Bio Signals. Sensors 2021, 21, 4269.
Tobon Vasquez, J.A.; Scapaticci, R.; Turvani, G.; Bellizzi, G.; Rodriguez-Duarte, D.O.; Joachimowicz, N.; Duchêne, B.; Tedeschi, E.; Casu, M.R.; Crocco, L.; et al. A Prototype Microwave System for 3D Brain Stroke Imaging. Sensors 2020, 20, 2607.
Xu, Y.; Holanda, G.; Souza, L.F.D.F.; Silva, H.; Gomes, A.; Silva, I.; Ferreira, M.; Jia, C.; Han, T.; de Albuquerque, V.H.C.; et al. Deep Learning-Enhanced Internet of Medical Things to Analyze Brain CT Scans of Hemorrhagic Stroke Patients: A New Approach. IEEE Sens. J. 2021, 21, 24941–24951.
Mia, R.; Khanam, S.; Mahjabeen, A.; Ovy, N.H.; Ghimire, D.; Park, M.-J.; Begum, M.I.A.; Hosen, A.S.M.S. Exploring Machine Learning for Predicting Cerebral Stroke: A Study in Discovery. Electronics 2024, 13, 686.
Liang, E.N.; Pei, S.; Staibano, P.; van der Woerd, B. Clinical applications of large language models in medicine and surgery: A scoping review. J. Int. Med. Res. 2025, 53, 1–24.
Dev, S.; Wang, H.; Nwosu, C.S.; Jain, N.; Veeravalli, B.; John, D. A predictive analytics approach for stroke prediction using machine learning and neural networks. Healthc. Anal. 2022, 2, 100032.
Diwaker, C.; Tomar, P.; Solanki, A.; Nayyar, A.; Jhanjhi, N.Z.; Abdullah, A.; Supramaniam, M. A New Model for Predicting Component-Based Software Reliability Using Soft Computing. IEEE Access 2019, 7, 147191–147203.
Javed, D.; Jhanjhi, N.Z.; Khan, N.A.; Ray, S.K.; Ashfaq, F.; Das, S.R. Diabetes detection framework for imbalanced data via explainable machine learning. In AIP Conference Proceedings, Proceeding of the International Conference on Cognitive Computing and Artificial Intelligence Chennai (ICCCAI—2024), Chennai, India, 7–8 March 2024; AIP Publishing: Melville, NY, USA, 2025; Volume 3257, p. 020040.
Ashfaq, F.; Jhanjhi, N.Z.; Khan, N.A.; Javaid, D.; Masud, M.; Shorfuzzaman, M. Enhancing ECG Report Generation with Domain-Specific Tokenization for Improved Medical NLP Accuracy. IEEE Access 2025, 13, 85493–85506.

Figure 1. Proposed framework.

Figure 2. Results of classifiers.

Table 2. Performance comparison of classification algorithms.

Algorithm	Accuracy	Classification Error	Precision
Random Forest	95.05%	4.95%	95.05%
Decision Tree	94.44%	5.56%	95.26%
K-Nearest Neighbor	94.51%	5.49%	95.20%
Gradient Boosting	93.17%	6.83%	95.39%
Naïve Bayes	86.14%	13.86%	96.62%
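Table 2 reports classification error next to accuracy. For a single-label classifier the error is simply 100% minus the accuracy, so the two columns can be cross-checked. The sketch below is an illustration only, not part of the authors' pipeline: it transcribes Table 2 into a dictionary (the names `TABLE_2`, `error_matches_accuracy`, and `rank_by_accuracy` are hypothetical), checks that relation to within two-decimal rounding, and orders the algorithms by accuracy.

```python
# Values transcribed from Table 2: (accuracy %, classification error %, precision %).
TABLE_2 = {
    "Random Forest": (95.05, 4.95, 95.05),
    "Decision Tree": (94.44, 5.56, 95.26),
    "K-Nearest Neighbor": (94.51, 5.49, 95.20),
    "Gradient Boosting": (93.17, 6.83, 95.39),
    "Naïve Bayes": (86.14, 13.86, 96.62),
}


def error_matches_accuracy(accuracy: float, error: float, tol: float = 0.005) -> bool:
    """True if error equals 100% minus accuracy, allowing for two-decimal rounding."""
    return abs((100.0 - accuracy) - error) <= tol


def rank_by_accuracy(table: dict) -> list:
    """Algorithm names ordered from highest to lowest accuracy."""
    return sorted(table, key=lambda name: table[name][0], reverse=True)


if __name__ == "__main__":
    for name, (acc, err, _prec) in TABLE_2.items():
        ok = error_matches_accuracy(acc, err)
        print(f"{name:<20} accuracy={acc:.2f}% error={err:.2f}% consistent={ok}")
    print("Ranking by accuracy:", rank_by_accuracy(TABLE_2))
```

Every row satisfies the relation. Ranked strictly by accuracy, K-Nearest Neighbor (94.51%) sits just above Decision Tree (94.44%), so the ranking differs slightly from the row order of the table.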

Table 3. Confusion matrix with precision and recall.

	True 0	True 1	Class Precision
Pred 0	0	0	0.00%
Pred 1	74	1420	95.05%
Class Recall	0.00%	100.00%
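Table 3 appears to place predicted classes in rows and true classes in columns. Under that reading, the four counts reproduce the Random Forest row of Table 2. The minimal stdlib sketch below is a hypothetical illustration, not the authors' code: the `(predicted, true)` dictionary layout and the helpers `safe_ratio` and `metrics` are assumptions, and a 0/0 precision is reported as 0.0 because that is the convention Table 3 seems to follow.

```python
from typing import Dict, Tuple

# Hypothetical layout: CONFUSION[(predicted, true)] = count, transcribed from Table 3.
CONFUSION: Dict[Tuple[int, int], int] = {
    (0, 0): 0,
    (0, 1): 0,
    (1, 0): 74,
    (1, 1): 1420,
}


def safe_ratio(num: int, den: int) -> float:
    """num / den, or 0.0 when den is 0 (Table 3 shows 0.00% in that case)."""
    return num / den if den else 0.0


def metrics(cm: Dict[Tuple[int, int], int], classes=(0, 1)) -> dict:
    """Accuracy, classification error, and per-class precision and recall."""
    total = sum(cm.values())
    correct = sum(cm[(c, c)] for c in classes)
    out = {"accuracy": correct / total, "error": 1 - correct / total}
    for c in classes:
        predicted_c = sum(cm[(c, t)] for t in classes)  # row sum: predicted as c
        true_c = sum(cm[(p, c)] for p in classes)       # column sum: truly c
        out[f"precision_{c}"] = safe_ratio(cm[(c, c)], predicted_c)
        out[f"recall_{c}"] = safe_ratio(cm[(c, c)], true_c)
    return out


if __name__ == "__main__":
    for key, value in metrics(CONFUSION).items():
        print(f"{key:<12} {value * 100:.2f}%")
```

With these counts, accuracy is 1420/1494 ≈ 95.05%, classification error ≈ 4.95%, class-1 precision ≈ 95.05%, and class-1 recall is 100%. Class-0 recall is 0% because no sample was predicted as class 0. Note that scikit-learn's `confusion_matrix` uses the opposite orientation (rows are true labels), so a transposition is needed when comparing against it.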

Table 4. Accuracy of various algorithms from the literature.

Ref.	Year	Accuracy	Algorithms
[1]	2009	90.00%	CT images
[2]	2019	99.99%	Bayesian classifier, RF, KNN, SVM-Linear, SVM-RBF
[3]	2019	98.41%	KNN-OLF-SVM
[4]	2019	99.8%	Naive Bayes, J48, k-NN, RF in WEKA toolkit
[5]	2020	86.42%	DT, RF, EM, GNB, and DNN
[6]	2020	94.23%	DT, GNB, LR, L-SVM, P-SVM, RBF SVM, RF, AB, and AB with SGD
[7]	2020	97%	LR, SGD, DT, AB, Gaussian, QDA, MLP, KNN, GB, and XGB
[8]	2020	99.73%	DT, AB, RF, NB, KNN, SVM, MLP, and EM
[9]	2020	95.97%	K-NN, NB, LR, DT, RF, MLP-NN, DL, and SVM
[10]	2020	98%	ANN, AT Algorithms such as LM and SCG
[11]	2021	82%	LR, DT, RF, K-NN, SVM, and NB
[12]	2021	94.0%	LSTM, B-LSTM, CNN-LSTM, and CNN-B-LSTM
[13]	2021	96%	NB, K-NN, LR, RF, AB, and DTC
[14]	2022	97%	RF, K-NN, LR, SVM, NB, and MCC
[15]	2023	97.17%	NB, SVM, RF, AB, and XGB
[16]	2023	99%	XGB, AB, LGBM, RF, DT, LR, K-NN, SVM-LK, NB, and DNN
[17]	2024	95.16%	NB, LR, SVM, K-NN, DT, RF, XGB, and NN
[18]	2024	97%	NB, RF, etc.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Noor, I.; Aslam, A.; Mir, A.; Insany, G.P. Predicting Brain Stroke Risk Using Machine Learning: A Comprehensive Approach to Early Detection and Prevention. Eng. Proc. 2025, 107, 123. https://doi.org/10.3390/engproc2025107123


