Machine Learning in Stroke Lesion Segmentation and Recovery Forecasting: A Review

Sasidharan, Simi Meledathu; Mdletshe, Sibusiso; Wang, Alan

doi:10.3390/app151810082

Open Access Review


by Simi Meledathu Sasidharan ^1,*, Sibusiso Mdletshe ^1,* and Alan Wang ^1,2,3,4,5,6

¹ Department of Anatomy and Medical Imaging, FMHS, University of Auckland, Auckland 1023, New Zealand
² Auckland Bioengineering Institute, University of Auckland, Auckland 1010, New Zealand
³ Centre for Brain Research, University of Auckland, Auckland 1023, New Zealand
⁴ Matai Medical Research Institute, Gisborne 4010, New Zealand
⁵ Medical Imaging Research Centre, University of Auckland, Auckland 1023, New Zealand
⁶ Centre for Co-Created Ageing Research, University of Auckland, Auckland 1023, New Zealand
^* Authors to whom correspondence should be addressed.

Appl. Sci. 2025, 15(18), 10082; https://doi.org/10.3390/app151810082

Submission received: 22 July 2025 / Revised: 3 September 2025 / Accepted: 12 September 2025 / Published: 15 September 2025

(This article belongs to the Special Issue Advances in Medical Imaging: Techniques and Applications)


Abstract

Introduction: Stroke remains a major cause of disability worldwide, and precise identification of stroke lesions is essential for prognosis and rehabilitation planning. Machine learning has emerged as a powerful tool for automating stroke lesion segmentation and outcome prediction; however, these tasks are often studied in isolation. The two tasks are inherently interdependent, since segmentation provides lesion-based features that directly inform prediction models. Methods: This narrative review synthesises studies published between 2010 and 2024 on the application of machine learning in stroke lesion segmentation and recovery forecasting. A total of 23 relevant studies were reviewed, including 10 focused on lesion segmentation and 13 on recovery prediction. Results: Convolutional Neural Networks (CNNs), including architectures such as U-Net, have improved segmentation accuracy on the Anatomical Tracings of Lesions After Stroke (ATLAS) V2 dataset; however, dataset bias and inconsistent evaluation metrics limit comparability. Integrating imaging-derived lesion characteristics with clinical features further improves predictive accuracy. Furthermore, semi-supervised and self-supervised methods enhance performance where annotated datasets are scarce. Discussion: The review highlights the interdependence between segmentation and outcome prediction. Reliable segmentation provides biologically meaningful features that underpin recovery forecasting, while prediction tasks validate the clinical relevance of segmentation outputs. This bidirectional relationship underlines the need for unified pipelines integrating lesion segmentation with outcome prediction. Future research can improve generalisability and foster clinically robust models by advancing semi-supervised and self-supervised learning, bridging the gap between automated image analysis and patient-centred prognosis.
Conclusion: Accurate lesion segmentation and outcome prediction should be viewed not as separate goals but as mutually reinforcing components of a single pipeline. Progress in segmentation strengthens recovery forecasting, while predictive modelling emphasises the clinical importance of segmentation outputs. This interdependence provides a pathway for developing more effective, generalisable, and relevant AI-driven stroke care tools.

Keywords:

stroke; segmentation; prediction; ATLAS; MRI; machine learning

1. Introduction

A stroke is a medical emergency marked by an interruption of blood flow to the brain, which deprives brain cells of oxygen and nutrients [1]. Stroke is a leading cause of disability worldwide, and precise identification and quantification of stroke lesions is pivotal for diagnosis, prognosis, and treatment planning. There are two major types of stroke: ischemic stroke and haemorrhagic stroke [2]. Each has distinct characteristics and causes. Ischemic stroke, which accounts for almost 80% of all stroke cases, occurs when a blockage or plaque accumulation obstructs or narrows a blood vessel, interrupting the blood supply to part of the brain [3].

Neuroimaging plays a pivotal role in identifying and characterising stroke lesions. Structural imaging techniques such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) provide detailed anatomical information, while functional imaging modalities, including Diffusion Weighted Imaging (DWI), perfusion MRI, and functional MRI (fMRI), reveal dynamic changes in brain physiology and connectivity [4]. Zarahn et al. (2011) investigated whether early task-based fMRI adds predictive value to initial clinical impairment in patients with acute stroke [5]. Their results suggest that functional imaging may capture aspects of cortical reorganisation not reflected in clinical scales, offering valuable prognostic insights, especially for severely impaired patients. The study by Kalinosky et al. (2017) highlights how multimodal imaging provides a richer picture of recovery potential than structural or functional measures alone; their results underscore that optimal recovery prediction requires both lesion-focused structural measures and measures of network-level functional reorganisation [6]. The development of functional imaging techniques, such as perfusion-CT and perfusion-MRI, has allowed a fuller appreciation of stroke’s dynamic and evolving nature, shifting the paradigm to a more complex and multifaceted one [7].

Stroke lesions are notoriously heterogeneous, even among patients with similar clinical profiles, and addressing this heterogeneity has proven difficult for the research community. Researchers have explored several machine learning techniques to classify and segment stroke lesions, including Support Vector Machines (SVM), random forests, and deep learning models [8,9]. To cope with the variability in lesion characteristics, more sophisticated techniques have been investigated, such as pyramid pooling and focal loss, which aim to capture both global and local contextual information and to focus on the more difficult samples during training [10].
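To make the idea of focusing on difficult samples concrete, the following is a minimal sketch of the standard binary focal loss for a single voxel, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). It is not taken from [10]: the function name, the defaults gamma = 2 and alpha = 0.25, and the example probabilities are assumptions chosen for illustration only.

```python
import math

def binary_focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one voxel: -alpha_t * (1 - p_t)**gamma * log(p_t).

    prob   : predicted probability that the voxel is lesion
    target : 1 for lesion, 0 for background
    gamma, alpha : illustrative defaults, not values reported in the review
    """
    prob = min(max(prob, eps), 1.0 - eps)          # clamp to avoid log(0)
    p_t = prob if target == 1 else 1.0 - prob      # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct lesion voxel contributes far less than a hard one.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.30, 1)
print(f"easy voxel: {easy:.6f}, hard voxel: {hard:.6f}")
```

The (1 - p_t)^gamma factor shrinks the contribution of well-classified voxels, so the many easy background voxels do not dominate the gradient; with gamma = 0 the expression reduces to a class-weighted cross-entropy.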

Although significant progress has been made in stroke neuroimaging, research on lesion segmentation and recovery prediction has often developed in parallel rather than in an integrated manner. Segmentation studies focus on achieving technical accuracy, whereas recovery prediction research is more concerned with clinical outcomes. The latter often depends on broad lesion measures or simple clinical scores, without entirely using advanced lesion features. This gap makes it difficult to translate advances in imaging into practical tools that can guide rehabilitation strategies, underscoring the importance of linking segmentation research with outcome prediction.

Lesion segmentation is not an isolated task; it forms the foundation for recovery prediction. Accurate delineation of stroke lesions provides quantitative biomarkers such as lesion size, topography, and other attributes. Traditional scoring systems such as the NIH Stroke Scale (NIHSS) capture acute severity well but fall short in explaining the wide individual variability observed in long-term motor, language, or cognitive outcomes [11,12]. Integrating lesion segmentation with clinical scales like NIHSS will provide a more reliable and comprehensive foundation for outcome prediction. Moreover, integrating multimodal imaging enhances predictive accuracy [13,14]. In this way, segmentation enables the extraction of reliable imaging-derived predictors, while predictive frameworks translate them into clinically actionable forecasts of recovery trajectories.

Unsurprisingly, integrating multimodal MRI has become a popular approach for accurate segmentation of stroke lesions because of the complementary information each sequence provides. DWI is highly sensitive to acute ischemic changes, revealing cytotoxic oedema within minutes of onset, making it a cornerstone modality for early lesion localisation [15]. However, its limitations in delineating lesion boundaries and in later phases of stroke are mitigated by Fluid-Attenuated Inversion Recovery (FLAIR) and T2-weighted imaging, which provide better sensitivity to vasogenic oedema and chronic structural changes. T1-weighted images, while less commonly used in acute stroke, offer high-resolution anatomical context and are particularly useful for chronic lesion mapping and radiomics-based outcome prediction. Recent studies have leveraged multimodal fusion architectures, either through early channel concatenation or through modality-specific encoding pathways, to enhance segmentation accuracy [15]. These findings highlight the growing consensus that modalities should be treated as synergistic sources of information to enhance both the robustness and the clinical interpretability of lesion segmentation.

Table 1 summarises recent studies on stroke lesion segmentation using multimodal approaches. Rather than relying on a single modality, these studies combine multiple modalities to increase accuracy and Dice score. DWI is considered the core signal for stroke lesion segmentation because it quickly detects acute infarcts [13,15,16,17]. Apparent Diffusion Coefficient (ADC) and Enhanced DWI (eDWI) provide quantitative diffusion contrasts that further refine lesion segmentation, boosting Dice by 1–2% when combined with DWI [15]. FLAIR, T1, and T2 provide anatomical and oedema-related context; combining them improves boundary definition and general robustness [13,14].

A deep learning model was proposed for ischemic stroke lesion segmentation using multimodal MRI as input, specifically combining DWI, ADC, and eDWI into a three-channel input per slice [15]. The architecture features a DenseNet121 encoder paired with a Self-Organising Operational Neural Network (SelfONN)-based decoder, enhanced by Channel and Spatial Compound Attention (CSCA) and Deep Supervision Enhancement (DSE) modules. Although the exact spatial resolution of the input slices is not specified, the model leverages a weighted composite loss combining Dice and Jaccard losses to improve segmentation performance across lesion boundaries and imbalanced data. The study [16] tackled 3D segmentation of ischemic stroke lesions on the ISLES 2022 dataset—comprising 400 multi-vendor MRI scans (250 training, 150 test cases), each including DWI, ADC, and FLAIR modalities. However, FLAIR was excluded because of misalignment. All images underwent preprocessing steps, namely resampling and skull stripping. A Segmentation Residual Neural Network (SegResNet) architecture was employed, and training included an ensemble of 15 models, resulting in a Dice score of 0.824.
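As an illustration of what a weighted composite of Dice and Jaccard losses can look like, the sketch below computes both soft overlap terms on flattened probability maps. The weights, the smoothing constant, and all names are assumptions made for this example; the exact formulation and weighting used in [15] are not given here.

```python
def dice_jaccard_loss(pred, target, w_dice=0.5, w_jaccard=0.5, smooth=1.0):
    """Weighted sum of soft Dice loss and soft Jaccard (IoU) loss.

    pred   : flattened predicted probabilities in [0, 1]
    target : flattened binary ground-truth mask (0 or 1)
    w_dice, w_jaccard, smooth : illustrative values, not taken from [15]
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    dice = (2.0 * intersection + smooth) / (total + smooth)
    jaccard = (intersection + smooth) / (total - intersection + smooth)
    return w_dice * (1.0 - dice) + w_jaccard * (1.0 - jaccard)

# Toy example: four voxels, two of which belong to the lesion.
pred = [0.9, 0.8, 0.1, 0.2]
target = [1, 1, 0, 0]
print(f"composite loss: {dice_jaccard_loss(pred, target):.4f}")
```

Both terms are overlap ratios rather than per-voxel averages, which is why losses of this family are commonly chosen when lesions occupy only a small fraction of the image.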

Stroke unsupervised Registration and Segmentation (St-RegSeg), an unsupervised registration-based lesion segmentation framework using multimodal MRI (DWI, ADC, and FLAIR), was introduced [13]. The pipeline combines a novel unsupervised cascaded registration network, the ConvNXMorph registration model (featuring a cascaded ConvNeXt-R backbone with a Modality-Independent Neighborhood Descriptor (MIND)-based loss and attention gates), with nnUNet-v2 for downstream lesion segmentation. Evaluated on the ISLES 2022 dataset, St-RegSeg achieved superior Dice scores while significantly improving inference speed. The study also demonstrated that combining DWI and FLAIR yielded better segmentation performance than combining ADC and FLAIR, indicating DWI’s higher value for lesion segmentation. By integrating semantic alignment and lesion segmentation, the framework offers a clinically relevant, robust solution for multimodal stroke imaging.

Jeong et al. (2024) explored the application of transfer learning for ischemic stroke lesion segmentation using multimodal MRI inputs, specifically DWI and ADC, with an ensemble of nnU-Net models [17]. Their approach leveraged pre-trained weights and adaptive training strategies to improve performance on small datasets, achieving competitive results on the ISLES 2022 benchmark while maintaining generalisability across domains.

In parallel, García-Salgado et al. (2024) presented a lightweight deep learning model based on an Attention U-Net enhanced with Generalised Dice Focal Loss, designed for segmenting ischemic stroke lesions across FLAIR, DWI, and T2 MRI modalities [14]. Evaluated on ISLES 2015 and 2022 datasets, the model demonstrated strong lesion boundary detection and robustness to class imbalance, making it suitable for clinical deployment in diverse imaging settings.

Accurate identification and quantification of lesions are vital for treatment planning. Automatic segmentation of stroke lesions from medical imaging data, such as MRI and perfusion-CT scans, is challenging because of the variability in lesion size, shape, and location, intensity variation, the difficulty of capturing complex lesion boundaries precisely, and the underlying cerebrovascular dynamics [18]. These factors make it difficult to distinguish the targeted lesions from the surrounding brain tissue, as highlighted in Figure 1. These challenges highlight the necessity of reliable, broadly applicable, and clinically interpretable segmentation methods for stroke imaging.

In stroke imaging, manual segmentation of lesions is time-consuming, requires expert knowledge, and is costly, making large-scale annotation impractical. As a result, semi-supervised and self-supervised learning approaches have gained attention for their ability to reduce reliance on extensive labelled data. Recent studies have emphasised the promise of these methods in alleviating the annotation bottleneck in medical image segmentation.

Su et al. (2024) proposed a mutual learning framework in which two subnetworks exchange only reliable pseudo-labels, determined by both prediction confidence and intra-class feature similarity, thereby reducing the influence of unreliable pseudo-labels and achieving state-of-the-art results across cardiac, pancreatic, and brain tumour benchmarks [19]. Complementing this, Soh and Rajapakse (2025) introduced a noise-induced self-supervised hybrid UNet-Transformer (HUT-NSS) that leverages unlabelled CT perfusion data with a noise anchor regulariser and domain adaptation, achieving substantial Dice score improvements over state-of-the-art models. These approaches highlight how integrating unlabelled data through reliability-aware semi-supervision and noise-regularised self-supervision can advance segmentation performance in challenging contexts such as ischemic stroke [20].
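The notion of a reliable pseudo-label can be sketched as a two-part filter: a voxel's pseudo-label is kept only if the peer network is confident and the voxel's feature vector resembles the mean feature (prototype) of its predicted class. The code below is a simplified illustration under stated assumptions and is not the method of [19]; the thresholds, the use of cosine similarity to a class prototype, and all names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def class_prototype(features, labels, cls):
    """Mean feature vector of all voxels currently predicted as class `cls`."""
    members = [f for f, y in zip(features, labels) if y == cls]
    if not members:
        return None
    return [sum(f[i] for f in members) / len(members) for i in range(len(members[0]))]

def select_reliable(probs, features, conf_thr=0.9, sim_thr=0.8):
    """Return (index, pseudo_label) pairs that pass both reliability checks.

    probs    : peer network's lesion probability per voxel
    features : peer network's feature vector per voxel
    conf_thr, sim_thr : hypothetical thresholds chosen for illustration
    """
    labels = [1 if p >= 0.5 else 0 for p in probs]
    prototypes = {c: class_prototype(features, labels, c) for c in (0, 1)}
    reliable = []
    for i, (p, f, y) in enumerate(zip(probs, features, labels)):
        confidence = p if y == 1 else 1.0 - p
        proto = prototypes[y]
        if confidence >= conf_thr and proto is not None and cosine(f, proto) >= sim_thr:
            reliable.append((i, y))
    return reliable

# Toy example: voxel 2 is confident but dissimilar to its class; voxel 3 is uncertain.
probs = [0.97, 0.95, 0.96, 0.55, 0.03]
features = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.1], [0.8, 0.0], [0.0, 1.0]]
reliable = select_reliable(probs, features)
print(reliable)
```

In a mutual learning setup of this kind, each subnetwork would be trained only on the pseudo-labels that its peer marks as reliable, so that confident but atypical predictions are not propagated.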

Early work by Bai et al. (2019) demonstrated the potential of self-supervised strategies in medical imaging by introducing an anatomical position prediction task for cardiac MR segmentation [21]. Instead of relying on manually drawn labels, the model learned in a self-supervised manner by predicting anatomical positions and was later fine-tuned for segmentation, showing significant performance gains with minimal annotations. Together, these studies show how semi-supervised and self-supervised methods reduce the need for extensive manual annotation.

The research community has initiated numerous challenges to support lesion segmentation research, including ATLAS R2.0 (Stroke Lesion Segmentation) [22], the Ischemic Stroke Lesion Segmentation Challenge ISLES’22 [23], and the Ischemic Stroke Lesion Segmentation Challenge ISLES’24, https://isles-24.grand-challenge.org/ (accessed on 18 June 2025) [24]. These challenges have become the de facto evaluation standard for new algorithms, particularly when they adhere to the standards listed on the same web resource: both training and testing datasets are representative of the task, well described, and large enough to draw significant conclusions from the results; the associated ground truth is created by experts following a clearly defined set of rules; the evaluation metrics are chosen to capture all aspects relevant for the task; and, ideally, challenges remain open for future contestants and serve as an ongoing benchmark for algorithms in the field.

Public datasets such as ATLAS and ISLES demonstrated advances in automated lesion segmentation. At the same time, they have also highlighted ongoing challenges, including variation across datasets, inconsistent evaluation practices, and a lack of thorough external validation. These issues make it difficult to turn algorithmic improvements into tools that have real clinical impact. In this context, our review aims to (i) bring together developments in stroke lesion segmentation and recovery prediction; (ii) highlight their interdependence; and (iii) critically assess current challenges and opportunities for clinical translation. By synthesising these fields, the review seeks to provide researchers and clinicians with an integrated perspective on how imaging and machine learning can jointly advance personalised stroke care.

2. Methods

This study assessed the current research on machine learning and deep learning methods for segmenting brain stroke lesions and predicting recovery after a stroke. This process involved systematically reviewing and synthesising studies from PubMed, Scopus and Web of Science, focusing on identifying trends, challenges, and future directions for research in these areas.

Study Search Strategy

This review includes a comprehensive search of primary research studies published from 2010 to September 2024, excluding reviews and surveys. The following search strings were used: (i) (stroke OR ischemic stroke OR hemorrhagic stroke) AND (lesion* OR scar*) AND (segmentation); (ii) (post-stroke OR “post stroke” OR “stroke recovery” OR “stroke rehabilitation”) AND (prediction OR predictor* OR forecast* OR prognos*) AND (recovery OR outcome* OR improvement OR rehabilitation OR “functional recovery”); and (iii) (stroke OR “post-stroke” OR “ischemic stroke” OR “hemorrhagic stroke”) AND (recovery OR rehabilitation OR outcome* OR improvement) AND (prediction OR predictor* OR forecast* OR prognos*) AND (“U-Net” OR “U Net” OR “UNet” OR “convolutional neural network” OR “deep learning”). A total of 23 articles were included for analysis in this review; the screening flow chart (Figure 2) details the review process.

3. Results

This review focuses on two key domains in stroke research: stroke lesion segmentation using the ATLAS dataset and post-stroke recovery prediction through machine learning techniques. Studies that utilised the ATLAS dataset for segmentation tasks were selected to ensure consistency and comparability, as this dataset serves as a benchmark in the field. For post-stroke recovery prediction, the review concentrated on research employing machine learning methods to predict functional outcomes, such as motor recovery, language restoration, and cognitive performance. By combining these two research areas, the review explores the intersection of lesion segmentation and post-stroke recovery prediction and provides insights into how advancements in one domain could inform progress in the other.

3.1. Machine Learning-Driven Stroke Lesion Segmentation with ATLAS Dataset

The key findings and characteristics of ten studies in stroke segmentation using deep learning with ATLAS as the dataset are outlined; seven utilised ATLAS V2.0 [22], while three relied on ATLAS V1 [25]. These studies were published between 2020 and 2023.

To take advantage of MRI in stroke lesion segmentation, several deep learning models have been proposed. Seven studies employed the ATLAS V2.0 dataset; most applied intensity normalisation, skull stripping, slicing, and cropping as preprocessing steps [26,27,28,29]. One study instead used Gaussian denoising as a preprocessing step and reported the highest Dice score among the seven studies (94.2%) [30]. Another study used matrix complement as a preprocessing step and achieved a Dice coefficient of 69.72% [31].
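Two of the preprocessing steps named above, intensity normalisation and cropping, can be sketched on a single 2D slice as follows. This is an illustrative example only: it assumes skull stripping has already produced a non-empty binary brain mask, uses z-score normalisation as one common choice, and does not reproduce the specific pipelines of [26,27,28,29].

```python
import statistics

def zscore_normalise(slice_2d, brain_mask):
    """Z-score intensities inside the brain mask; set everything outside to 0."""
    values = [v for row, mrow in zip(slice_2d, brain_mask)
              for v, m in zip(row, mrow) if m]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0   # guard against a constant image
    return [[(v - mean) / std if m else 0.0 for v, m in zip(row, mrow)]
            for row, mrow in zip(slice_2d, brain_mask)]

def crop_to_mask(slice_2d, brain_mask):
    """Crop the slice to the bounding box of the (non-empty) brain mask."""
    rows = [i for i, mrow in enumerate(brain_mask) if any(mrow)]
    cols = [j for j in range(len(brain_mask[0]))
            if any(mrow[j] for mrow in brain_mask)]
    return [row[cols[0]:cols[-1] + 1] for row in slice_2d[rows[0]:rows[-1] + 1]]

# Toy 4x4 slice with a 2x2 "brain" region in the centre (made-up intensities).
image = [[0, 0, 0, 0],
         [0, 10, 20, 0],
         [0, 30, 40, 0],
         [0, 0, 0, 0]]
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
normalised = zscore_normalise(image, mask)
cropped = crop_to_mask(normalised, mask)
print(cropped)
```

Computing the statistics only inside the brain mask keeps the large zero-valued background from skewing the mean and standard deviation, and cropping to the mask reduces the volume the network has to process.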

During the literature search, it was observed that many researchers adopted the U-Net model as the segmentation technique. Some focused on improving its structure, while others used it as the base of their method. For example, Verma et al. (2022) reported applying a 3D U-Net, described as the first hybrid contextual semantic network, with k-fold cross-validation and data augmentation to regularise against overfitting [29]. A benchmarking framework employing standard U-Net variants for both 2D and 3D models was also presented, achieving its highest Dice scores of 0.583 and 0.504 with the 2D transformer-based model and the 3D residual U-Net, respectively [26].

Similarly, the work proposed by Huo et al. used the nnU-Net framework for lesion segmentation [27]. This method achieved first place on the unseen test data in the 2022 MICCAI (Medical Image Computing and Computer Assisted Intervention) ATLAS Challenge with an average Dice score of 0.6667. They also tried to obtain an ensemble prediction for better results with effective preprocessing. A Fuzzy Information Seeded Region Growing (FISRG) approach was introduced by integrating fuzzy logic with seeded region growing for enhanced segmentation accuracy (94.2%) [30]. The study employed a final post-processed image where morphological operations refined the segmentation results. Liu et al. provided a Hybrid Contextual Semantic Module (HCSM) in the skip connection layer and residual blocks in the encoder/decoder layers by extending the U-shaped architecture [31]. They argued that it can accurately segment and detect small-size stroke lesions from magnetic resonance images.

Liu et al. experimented with a Simulated Quantum Mechanics-based joint Learning network (SQMLP-net) that simultaneously segments lesions and assesses the Thrombolysis in Cerebral Infarction (TICI) grade [32]. They concluded that the accuracy of stroke lesion segmentation was negatively correlated with the severity of thrombolysis in cerebral infarction. Mohapatra et al. conducted an extensive study of eight 2D model architectures and implemented an ensemble method involving stacking and an agreement window for the final enhanced prediction [28]. They used an in-house dataset along with ATLAS for training and testing, and adopted both a mixed data approach and an intermediate task training approach. In the mixed data approach, VNet outperformed all other 2D models, and in intermediate task training, Fully Connected Densenet outperformed the other models. They concluded that the agreement window method, applied super-region-wise, generated the best lesion volume.

An enhanced Multi-scale Long-range Interactive and Regional Attention Network (MLiRA-Net) was proposed, integrating multi-scale hierarchical and local-global features for better segmentation accuracy [33]. The authors improved the performance of MLiRA-Net by adding skip connections between the output layers of the encoder and the input layers of the decoder. An extended 3D UNet architecture was also proposed for more effective and accurate volumetric segmentation of chronic stroke lesions on T1-weighted MRI scans, with the aim of assisting neuroradiologists and supporting personalised rehabilitation for adequate recovery [34]. Table 2 presents a summary of recent segmentation studies based on ATLAS.

3.2. Post-Stroke Recovery Prediction Through Machine Learning Techniques

This section explores studies that apply machine learning techniques to predict recovery outcomes in stroke patients. These studies aimed to improve the prediction of recovery trajectories and to optimise personalised rehabilitation strategies by leveraging multiple types of data, including clinical, demographic, neuroimaging, and functional assessment data.

Corticospinal tract injury observed in acute stroke imaging has been identified as a key predictor of upper extremity motor recovery, with its predictive accuracy unaffected by the extent of damage to the primary motor and premotor cortices [36]. Factors such as age, sex, obesity, education level, stroke location, stroke severity, comorbidities, smoking and alcohol history, medical complications, and functional assessments may influence the ability to predict functional outcomes such as motor function, mobility, cognition, language, swallowing, and activities of daily living in first-time stroke patients [37].

The Motor-Evoked Potentials (MEP) response in Transcranial Magnetic Stimulation (TMS) and preserved Corticospinal Tracts (CST) integrity in Diffusion Tensor Imaging (DTI) have been identified as strong predictors of Upper-Limb (UL) motor recovery in ischemic stroke patients [38]. A high score on the Fugl–Meyer Assessment (FMA) and Motricity Index (MI) has the potential to refine the outcome of the prediction model further.

The patterns of recovery from aphasia within the first year after a stroke have been investigated, focusing on how lesion location and extent influence recovery outcomes [39]. Initial speech and language function, assessed using the Quick Aphasia Battery (QAB) within 5 days of the stroke, served as the dependent variable, and lesions manually delineated on MRI or CT imaging served as the independent variable of the model. Although the recovery pattern varied significantly across speech and language domains, patients with circumscribed frontal lesions recovered well, whereas those with extensive damage in the middle cerebral artery distribution or temporoparietal regions had persistent deficits.

Regression was the most common method to develop models among the included studies. Specifically, multivariable logistic regression analysis was used in four studies [38,40,41,42], and three studies used linear, ridge, and elastic net regression models [36,43,44]. According to Lee et al. the integrity of the cerebellar tract was identified as the primary biomarker predicting the recovery of upper extremity motor function over three months [45]. Their study examined biomarkers for good and poor recovery. They implemented fractional anisotropy analysis from diffusion tensor imaging, functional connectivity analysis from resting-state functional MRI, and statistical analysis to extract the biomarkers.

Three studies used multivariable logistic regression analysis with clinical and demographic data to predict post-stroke outcomes: motor impairment, physical functioning, and upper limb functioning [40,41,42]. They measured performance with the Area Under the Curve (AUC), reporting values of 0.833, 0.883, and 0.86. The highest AUC (0.883) was achieved for physical functioning, using data from 717 patients and building models for improved functioning and independence. Variations in AUC may reflect differences in data richness. In one study, the logistic model was converted to an integer score using a regression coefficient-based scoring method called the Scoring Rule, which helps identify patients at risk of treatment failure and aids admission decision-making [42]. Lundquist et al. highlighted that Motor-Evoked Potentials (MEP) status and neglect are crucial for predicting non-normal UL use [40].
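One common way to turn logistic regression coefficients into an integer score is to divide each coefficient by the smallest absolute coefficient and round to the nearest integer. The sketch below illustrates that generic idea; it is an assumption about how a coefficient-based score can be built, not the specific procedure of [42], and the predictor names and coefficient values are invented for illustration.

```python
def coefficients_to_points(coefficients):
    """Scale each coefficient by the smallest non-zero absolute one and round."""
    reference = min(abs(b) for b in coefficients.values() if b != 0)
    return {name: round(b / reference) for name, b in coefficients.items()}

def total_score(points, patient):
    """Sum the points of all binary predictors that are present for a patient."""
    return sum(points[name] for name, present in patient.items() if present)

# Hypothetical coefficients (log-odds) for three binary predictors.
coefficients = {"age_over_75": 0.42, "severe_neglect": 0.85, "absent_mep": 1.30}
points = coefficients_to_points(coefficients)

# Hypothetical patient with two of the three risk factors.
patient = {"age_over_75": True, "severe_neglect": False, "absent_mep": True}
score = total_score(points, patient)
print(points, score)
```

An integer score of this kind trades a small amount of discrimination for a rule that can be applied at the bedside without a computer, which fits the admission decision-making use described above.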

The study by Sale et al. utilised the SVM approach to identify significant inflammatory biomarkers for predicting rehabilitation outcomes in stroke patients [46]. Features were selected using Kernel Density Estimation and the Mutual Information (MI) criterion. They also concluded that the type of stroke is not a vital variable for predicting the discharge data. Some multimodal CNNs combine MRI data with demographic and clinical characteristics to improve prediction accuracy [47]. That study revealed that left hemisphere lesion size was least important when damage to critical anatomical regions of interest was incorporated, and that some of the 2D models outperformed 3D CNNs in terms of mean accuracy. Tang et al. analysed the key factors influencing independent walking prediction in post-stroke patients [48]. Predictions were made using logistic regression (LR) and three machine learning (ML) models: eXtreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), and Random Forest (RF). The XGBoost model showed the best overall performance, with age, lower limb spasticity, Functional Ambulation Category (FAC) at admission, and Fugl–Meyer Motor Assessment of the Lower Extremity (FMA-LE) identified as the key factors influencing independent walking. Table 3 compares various machine learning models used for post-stroke recovery prediction.

In this review, we identified recent articles on brain stroke lesion segmentation with machine learning using MRI (70%), CT (10%), and demographic data (30%). Some papers use several modalities, so the percentages may sum to more than 100%. These papers used Dice score (43%), accuracy (17%), coefficient of determination (13%), area under the curve (22%), and correlation (4%) as performance metrics.

A comparative analysis of the Dice scores reported by the stroke lesion segmentation studies is presented in Figure 3. All methods were evaluated on the same dataset, ATLAS; however, the split into training, validation, and test sets differed between methods. Mean Dice coefficients range from 0.58 to 0.94. Researchers used U-Net (30%), CNN (9%), fuzzy logic (4%), and regression (40%) models, while 17% used other models, including other deep learning architectures. A total of 74% of papers used a dataset of more than 200 image scans, and 26% used a dataset of fewer than 60 image scans.

4. Discussion

Machine learning plays a vital role in medical image segmentation, facilitating accurate delineation of areas of interest. Machine learning offers novel approaches to address complex medical challenges. The ability to accurately predict post-stroke recovery by applying ML in quantifiable image features will improve patient outcomes as it allows healthcare providers to tailor interventions based on individual patient characteristics and projected recovery trajectories [49].

The spectrum of association between different medical parameters and post-stroke recovery is not yet fully understood. More light could be shed on this by applying more precise techniques to medical imaging and clinical data. There is a significant association between behavioural measures and the prediction of anomia recovery, and resting-state functional MRI (rsfMRI) features such as fALFF (fractional Amplitude of Low-Frequency Fluctuations) helped predict agrammatism and dysgraphia [43]. The authors used 27 behavioural measures derived from 11 language and cognitive assessments. Rivier et al. combined Fugl–Meyer Assessment (FMA) scores with MRI-based brain connectivity measures to predict motor recovery in stroke patients [44]. The connectivity analysis was conducted by introducing virtual lesions into healthy connectomes, allowing the study to estimate the impact of stroke on the brain’s network. Measures such as modularity, participation coefficient, and eigenvector centrality were derived from these connectivity analyses.

We suggest that stroke lesion segmentation researchers share their source code, data, and models, enabling other researchers to gain insights from similar studies and stimulating a new wave of lesion segmentation advancements. In the studies included in this review, fewer than 50% of researchers published their source code, and in some cases access to the data had to be requested. Machine learning-based methods, such as 3D CNNs and U-Net networks, have shown high accuracy in MRI and CT image segmentation tasks, benefiting from the excellent soft tissue contrast of these images. However, they can be computationally intensive and require extensive training data. Other methods have also shown potential for clinical applications but may be less accurate than deep learning-based approaches. Table 2 shows that fuzzy logic combined with the seeded region growing algorithm reported the highest accuracy for stroke lesion segmentation. In contrast, Table 3 shows regression models to be the best-performing method for predicting post-stroke recovery.

The literature on stroke lesion segmentation demonstrates evident methodological progress, with the ATLAS dataset as a cornerstone. Deep learning architectures, particularly U-Net derivatives, remain the dominant approach, with Dice scores ranging between 0.58 and 0.75 in most cases. Refinements such as weighted or compound loss functions (Dice + BCE, TopK loss), patch-based strategies, and ensemble transfer learning modestly improve performance, while hybrid models such as SQMLP-net and HCSNet attempt to capture contextual or multi-task features. Notably, classical intensity-based methods like FISRG reported Dice values as high as 0.94, but these often suffer from computational inefficiency and poor robustness to variable lesion textures. Collectively, segmentation research reflects a balance between architectural sophistication and the persistent challenge of capturing small, subtle lesions.
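For readers less familiar with the metrics above, the following minimal sketch shows how a Dice score and a compound Dice + BCE loss can be computed on flattened binary masks. The function names, the equal weighting, and the toy masks are assumptions for illustration and do not reproduce any specific study in Table 2.

```python
import math

def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two flat binary masks (lists of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * intersection / total

def dice_bce_loss(probs, truth, dice_weight=0.5, eps=1e-7):
    """Weighted sum of soft Dice loss and binary cross-entropy on predicted probabilities."""
    intersection = sum(p * t for p, t in zip(probs, truth))
    soft_dice = (2.0 * intersection + eps) / (sum(probs) + sum(truth) + eps)
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1.0 - p, eps))
        for p, t in zip(probs, truth)
    ) / len(truth)
    return dice_weight * (1.0 - soft_dice) + (1.0 - dice_weight) * bce

# Toy example (assumption): an 8-voxel image with a 3-voxel lesion.
truth = [0, 1, 1, 1, 0, 0, 0, 0]
pred = [0, 1, 1, 0, 1, 0, 0, 0]          # two true positives, one miss, one false positive
dice = dice_coefficient(pred, truth)      # 2 * 2 / (3 + 3)

perfect_loss = dice_bce_loss([float(t) for t in truth], truth)
imperfect_loss = dice_bce_loss([0.1, 0.9, 0.8, 0.3, 0.6, 0.1, 0.1, 0.1], truth)
```

Because Dice is normalised by lesion size, a single missed voxel costs far more on a small lesion than on a large one, which is one reason small lesions remain difficult.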

In contrast, recovery prediction studies span a broader methodological spectrum, from classical regression models to advanced machine learning and deep learning approaches. Smaller-scale studies leveraging functional MRI and diffusion imaging demonstrate encouraging results (e.g., elastic net regression explaining 95% of variance in language recovery; ridge regression with R² 0.68 for motor outcomes) but are hampered by limited sample sizes and a lack of external validation. Larger retrospective cohorts (n = 700–7800) relying primarily on demographic and clinical data achieve respectable predictive accuracy (AUCs ranging from 0.83 to 0.89). However, these often omit neuroimaging or use only coarse imaging proxies, leaving a disconnect between image-derived biomarkers and large-scale prediction. This divergence reflects a key pattern: imaging-rich studies prioritise mechanistic insights but lack scale, while large clinical datasets provide statistical power but limited imaging depth.
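The two performance measures quoted above, R² for regression models and AUC for classifiers, can be sketched as follows. The data are invented toy values and the function names are hypothetical. AUC is computed here by direct pairwise comparison, which is one of several equivalent formulations.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: share of outcome variance explained by the predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def auc(labels, scores):
    """Probability that a randomly chosen positive case is scored above a negative one."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Toy recovery scores (assumption): observed versus predicted outcomes.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [14.0, 18.0, 33.0, 36.0]
r2 = r_squared(observed, predicted)       # 1 - 45 / 500

# Toy binary outcome (assumption): 1 = good recovery, 0 = poor recovery.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
area = auc(labels, scores)                # 8 of 9 positive-negative pairs ranked correctly
```

The two measures answer different questions, so an R² from a small imaging cohort and an AUC from a large clinical cohort cannot be compared directly.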

Segmentation studies agree on the necessity of rigorous preprocessing (normalisation, skull stripping, registration, and augmentation) but differ widely in architecture and loss design, leading to divergent outcomes. Prediction studies similarly converge on the prognostic importance of lesion burden and location but diverge in data types: some rely on advanced imaging modalities, others on demographics and clinical scores, and some combine both.
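As an illustration of the normalisation step, the sketch below applies z-score intensity normalisation restricted to voxels inside a brain mask, such as one produced by skull stripping. The function name and the five-voxel toy image are assumptions, and individual studies may normalise differently.

```python
def zscore_normalise(intensities, brain_mask):
    """Z-score normalise intensities using statistics from in-mask voxels only.

    Voxels outside the mask are set to 0.0 so that background does not bias the statistics.
    """
    inside = [v for v, m in zip(intensities, brain_mask) if m]
    mean = sum(inside) / len(inside)
    variance = sum((v - mean) ** 2 for v in inside) / len(inside)
    std = variance ** 0.5 or 1.0  # guard against a constant image
    return [(v - mean) / std if m else 0.0 for v, m in zip(intensities, brain_mask)]

# Toy flattened image (assumption): background voxels at both ends.
intensities = [0.0, 100.0, 200.0, 300.0, 0.0]
brain_mask = [0, 1, 1, 1, 0]
normalised = zscore_normalise(intensities, brain_mask)
```

Restricting the statistics to the brain mask keeps the large number of background voxels from dominating the mean and standard deviation.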

The most pressing gap is the lack of integration: segmentation pipelines frequently stop at lesion segmentation without linking to clinical recovery. At the same time, prediction models often omit imaging or rely on manually derived lesion masks. This disconnect hinders the translation of imaging advances into clinically meaningful prognostic tools. Another gap is external validation: few segmentation studies conduct tests beyond ATLAS, and few prediction studies replicate across multiple cohorts. This lack of cross-dataset validation limits generalisability.

Despite progress in stroke lesion segmentation and recovery prediction, key challenges remain. Dataset heterogeneity—differences in scanners, protocols, and preprocessing—limits model generalisability, while small and imbalanced cohorts reduce robustness. The lack of standardised evaluation metrics makes studies difficult to compare, and external validation is often missing, raising concerns about overfitting. Technically, models may still miss subtle lesions or produce false positives from artefacts. Finally, issues of interpretability and bias affect clinician trust and equity. Addressing these barriers is crucial for moving segmentation–prediction pipelines into routine clinical practice.

A deeper examination underscores the central role of segmentation in enabling reliable recovery prediction. Lesion segmentation provides precise spatial and volumetric information about ischemic injury, informing prognostic modelling. Without accurate segmentation, models risk relying on coarse proxies such as lesion volume alone, which may obscure critical topographic information. Furthermore, segmentation outputs can be coupled with radiomics feature extraction or deep feature embedding, supplying high-dimensional descriptors that capture lesion heterogeneity. These imaging-derived features improve outcome models’ interpretability and predictive performance when integrated with clinical and demographic data. Thus, segmentation is not merely a technical task but a foundational step that bridges raw imaging data with clinically actionable recovery forecasts.
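The point about volume being a coarse proxy can be made concrete with a minimal sketch. It assumes a binary lesion mask stored as nested lists indexed as (z, y, x) and a hypothetical `lesion_features` helper. It extracts only volume and centroid, far fewer features than a radiomics pipeline would compute.

```python
def empty_mask(depth, height, width):
    """Create an all-zero 3D mask indexed as mask[z][y][x]."""
    return [[[0] * width for _ in range(height)] for _ in range(depth)]

def lesion_features(mask, voxel_mm=(1.0, 1.0, 1.0)):
    """Return lesion volume in mm^3 and centroid in voxel coordinates (z, y, x)."""
    coords = [
        (z, y, x)
        for z, plane in enumerate(mask)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value
    ]
    voxel_volume = voxel_mm[0] * voxel_mm[1] * voxel_mm[2]
    if not coords:
        return {"volume_mm3": 0.0, "centroid_vox": None}
    count = len(coords)
    centroid = tuple(sum(c[axis] for c in coords) / count for axis in range(3))
    return {"volume_mm3": count * voxel_volume, "centroid_vox": centroid}

# Two toy lesions (assumption) with identical volume but different locations.
mask_a = empty_mask(1, 4, 4)
mask_a[0][0][0] = mask_a[0][0][1] = 1
mask_b = empty_mask(1, 4, 4)
mask_b[0][3][2] = mask_b[0][3][3] = 1

features_a = lesion_features(mask_a)
features_b = lesion_features(mask_b)
```

Both toy lesions have the same volume, so a model given volume alone could not tell them apart, whereas the centroid already separates them.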

5. Future Directions

A promising avenue for enhancing stroke recovery prediction models lies in incorporating White Matter Hyperintensities (WMHs), MRI-detectable lesions associated with chronic small vessel disease [50]. While most current deep learning models focus on acute lesion segmentation, WMHs provide vital information about brain reserve and pre-existing tissue vulnerability, both of which significantly influence functional recovery trajectories. Studies have demonstrated that a higher WMH burden is correlated with increased risk of stroke recurrence, cognitive decline, and impaired post-stroke rehabilitation outcomes [51,52]. Importantly, Peng et al. (2025) emphasised not only the burden but also the spatial distribution of WMHs. Evaluating WMH location may help predict functional outcome after stroke more accurately [53].

Therefore, we suggest that future models consider the joint analysis of acute stroke lesions and chronic WMHs, possibly through multimodal input fusion, to enhance recovery forecasting accuracy. Comparatively few studies have examined the impact of WMHs on motor systems. A large-scale observational study of the relationship between brain age, lesion volume, and functional outcomes following stroke [54] found that younger-appearing brains are associated with better post-stroke outcomes than older-appearing brains. We also suggest that incorporating eXplainable AI (XAI) into stroke outcome models that include both acute ischemic lesions and WMHs can significantly improve their clinical applicability and trustworthiness.

Additionally, we recommend considering Federated Learning (FL) for data sharing, which provides a way for centres to collaborate on model training without the need to share raw patient data [55]. By enabling secure multi-centre studies, FL reduces the barriers of data silos and enhances reproducibility. Recent work by Rangel et al. used a FedAvg-based framework that showed strong cross-institution performance, achieving a Dice Similarity Coefficient (DSC) of 0.71 ± 0.24, outperforming both centralised and alternative federated baselines [56].
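The aggregation step at the heart of FedAvg can be sketched in a few lines. This is a generic illustration with hypothetical names and toy numbers, not the framework of Rangel et al. [56]. Each centre trains locally and shares only its model parameters, which a coordinator averages in proportion to local sample counts.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by each client's number of training samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Toy round (assumption): two centres with flattened two-parameter models.
centre_a = [1.0, 2.0]   # trained locally on 100 scans
centre_b = [3.0, 6.0]   # trained locally on 300 scans
global_model = fed_avg([centre_a, centre_b], [100, 300])
```

Only the parameter vectors leave each centre in this scheme, so raw patient scans stay on site, and larger centres contribute proportionally more to the shared model.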

6. Conclusions

Stroke lesion segmentation and recovery prediction are central to advancing precision medicine in stroke, as accurate lesion segmentation quantifies injury and underpins prognostic modelling and rehabilitation planning. While deep learning approaches such as U-Net variants have improved segmentation accuracy, and predictive models increasingly combine imaging with clinical features, challenges remain: small and heterogeneous datasets, inconsistent metrics, limited external validation, and the underuse of imaging biomarkers in large cohorts all hinder clinical translation. Future progress will depend on bridging segmentation and prediction with federated learning, domain adaptation, and semi-/self-supervised strategies, enabling robust multi-centre models. At the same time, multimodal integration and explainable AI will be key to clinician trust and adoption. Together, these advances can drive the development of clinically meaningful tools that support individualised stroke recovery pathways.

Author Contributions

S.M.S. conceptualised the study, conducted a literature search, synthesised the findings, and wrote the manuscript. S.M. and A.W. contributed to the supervision, review, and editing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the Health Research Council of New Zealand’s project 21/144, the Marsden Fund Project 22-UOA-120, and the Royal Society Catalyst: Seeding General Project 23-UOA-055-CSG.

Data Availability Statement

All data generated or analysed during this study are included in the published article.

Acknowledgments

The authors would like to thank the University of Auckland and the Auckland Bioengineering Institute for their support and guidance during the preparation of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

ADC	Apparent Diffusion Coefficient
ANTs	Advanced Normalisation Tools
ATLAS	Anatomical Tracings of Lesions After Stroke
AUC	Area Under the Curve
BCE	Binary Cross-Entropy Loss
CNN	Convolutional Neural Network
ConvNeXt	Convolutional Neural Network with Next-Generation Architecture
CSCA	Channel and Spatial Compound Attention
CST	Corticospinal Tracts
CT	Computed Tomography
DICE	Dice Loss
DNN	Deep Neural Network
DSC	Dice Similarity Coefficient
DTI	Diffusion Tensor Imaging
eDWI	Enhanced Diffusion-Weighted Imaging
eXGBoost	eXtreme Gradient Boosting
FAC	Functional Ambulation Category
fALFF	fractional Amplitude of Low-Frequency Fluctuations
FISRG	Fuzzy Information Seeded Region Growing
FLAIR	Fluid-Attenuated Inversion Recovery
FMA	Fugl–Meyer Assessment
FMA-LE	Fugl–Meyer Motor Assessment of the Lower Extremity
HCSM	Hybrid Contextual Semantic Module
HCSNet	Hybrid Contextual Semantic Network
ISLES’22	Ischemic Stroke Lesion Segmentation Challenge 2022
ISLES’24	Ischemic Stroke Lesion Segmentation Challenge 2024
LR	Linear Regression
MEP	Motor-Evoked Potentials
MICCAI	Medical Image Computing and Computer-Assisted Intervention
MI	Mutual Information
MIND	Modality Independent Neighborhood Descriptor
ML	Machine Learning
MLiRA-Net	Multi-scale Long-range Interactive and Regional Attention Network
MRI	Magnetic Resonance Imaging
QAB	Quick Aphasia Battery
RF	Random Forest
rsfMRI	Resting-State Functional Magnetic Resonance Imaging
SegResNet	Segmentation Residual Neural Network
SelfONN	Self-Organising Operational Neural Network
SQMLP-net	Simulated Quantum Mechanics-based Joint Learning Network
St-RegSeg	Stroke Unsupervised Registration and Segmentation Framework
SVM	Support Vector Machine
TICI	Thrombolysis in Cerebral Infarction
TMS	Transcranial Magnetic Stimulation
TransMorph	Transformer-Based Deformable Image Registration Model
U-Net	U-shaped Convolutional Neural Network
UL	Upper Limb
VNet	V-shaped Convolutional Neural Network
WMH	White Matter Hyperintensity
XAI	Explainable Artificial Intelligence

References

1. Mackay, J.; Mensah, G.A. The Atlas of Heart Disease and Stroke; World Health Organization: Geneva, Switzerland, 2004.
2. Ling, X.; Zheng, Y.; Tao, J.; Zheng, Z.; Chen, L. Association study of polymorphisms in the ABO gene with ischemic stroke in the Chinese population. BMC Neurol. 2016, 16, 146.
3. Benjamin, E.J.; Muntner, P.; Alonso, A.; Bittencourt, M.S.; Callaway, C.W.; Carson, A.P.; Chamberlain, A.M.; Chang, A.R.; Cheng, S.; Das, S.R.; et al. Heart Disease and Stroke Statistics-2019 Update: A Report From the American Heart Association. Circulation 2019, 139, e56–e528.
4. Kakkar, P.; Kakkar, T.; Patankar, T.; Saha, S. Current Approaches and Advances in the Imaging of Stroke. Dis. Model. Mech. 2021, 14, dmm048785.
5. Zarahn, E.; Alon, L.; Ryan, S.; Lazar, R.; Vry, M.S.; Weiller, C.; Marshall, R.; Krakauer, J. Prediction of Motor Recovery Using Initial Impairment and fMRI 48 h Poststroke. Cereb. Cortex 2011, 21, 2712–2721.
6. Kalinosky, B.; Schmit, B.; Schindler-Ivens, S. Structurofunctional Resting-State Networks Correlate with Motor Function in Chronic Stroke. Neuroimage Clin. 2017, 16, 610–623.
7. Ahuja, C.K.; Gupta, V.; Khandelwal, N. Acute Stroke Imaging: Current Trends. Ann. Natl. Acad. Med. Sci. 2019, 55, 193–201.
8. Halme, H.L.; Korvenoja, A.; Salli, E. ISLES (SISS) Challenge 2015: Segmentation of Stroke Lesions Using Spatial Normalization, Random Forest Classification and Contextual Clustering. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries; Lecture Notes in Computer Science; Crimi, A., Menze, B., Maier, O., Reyes, M., Handels, H., Eds.; Springer International Publishing: Cham, Switzerland, 2016; Volume 9556, pp. 211–221.
9. Kamnitsas, K.; Ledig, C.; Newcombe, V.F.J.; Simpson, J.P.; Kane, A.D.; Menon, D.K.; Rueckert, D.; Glocker, B. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 2017, 36, 61–78.
10. Abulnaga, S.M.; Rubin, J. Ischemic Stroke Lesion Segmentation in CT Perfusion Scans using Pyramid Pooling and Focal Loss. In International MICCAI Brainlesion Workshop; Springer International Publishing: Cham, Switzerland, 2018.
11. Peters, H.; White, S.E.; Page, S.J. The NIH Stroke Scale Lacks Validity in Chronic Hemiparetic Stroke. Am. J. Occup. Ther. 2016, 70 (4_Supplement_1), 7011500008p1.
12. Marsh, E.B.; Lawrence, E.; Hillis, A.E.; Chen, K.; Gottesman, R.F. The NIH Stroke Scale Has Limited Utility in Accurate Daily Monitoring of Neurologic Status. Neurohospitalist 2016, 6, 97–101.
13. Gi, C.; An, X.; Li, T.; Liu, S.; Ming, D. St-RegSeg: An unsupervised registration based framework for multimodal magnetic resonance imaging stroke lesion segmentation. Quant. Imaging Med. Surg. 2024, 14, 9459–9476.
14. García-Salgado, B.P.; Almaraz-Damian, J.A.; Cervantes-Chavarria, O.; Ponomaryov, V.; Reyes-Reyes, R.; Cruz-Ramos, C.; Sadovnychiy, S. Enhanced Ischemic Stroke Lesion Segmentation in MRI Using Attention U-Net with Generalized Dice Focal Loss. Appl. Sci. 2024, 14, 8183.
15. Rahman, A.; Chowdhury, M.E.; Wadud, M.S.I.; Sarmun, R.; Mushtak, A.; Zoghoul, S.B.; Al-Hashimi, I. Deep learning driven segmentation of ischemic stroke lesions using multi-channel MRI. Biomed. Signal Process. Control 2025, 105, 107676.
16. Siddiquee, M.M.R.; Yang, D.; He, Y.; Xu, D.; Myronenko, A. Automated ischemic stroke lesion segmentation from 3D MRI: ISLES 2022 challenge report. arXiv 2022, arXiv:2209.09546.
17. Jeong, H.; Lim, H.; Yoon, C.; Won, J.; Lee, G.Y.; de la Rosa, E.; Kirschke, J.S.; Kim, B.; Kim, N.; Kim, C.; et al. Robust Ensemble of Two Different Multimodal Approaches to Segment 3D Ischemic Stroke Segmentation Using Brain Tumor Representation Among Multiple Center Datasets. J. Imaging Inform. Med. 2024, 37, 2375–2389.
18. Maier, O.; Schröder, C.; Forkert, N.D.; Martinetz, T.; Handels, H. Classifiers for Ischemic Stroke Lesion Segmentation: A Comparison Study. PLoS ONE 2015, 10, e0145118.
19. Su, J.; Luo, Z.; Lian, S.; Lin, D.; Li, S. Mutual Learning with Reliable Pseudo Label for Semi-Supervised Medical Image Segmentation. Med. Image Anal. 2024, 94, 103111.
20. Soh, W.; Rajapakse, J. Noise-Induced Self-Supervised Hybrid UNet Transformer for Ischemic Stroke Segmentation with Limited Data Annotations. Sci. Rep. 2025, 15, 19783.
21. Bai, W.; Chen, C.; Tarroni, G.; Duan, J.; Guitton, F.; Petersen, S.E.; Guo, Y.; Matthews, P.M.; Rueckert, D. Self-Supervised Learning for Cardiac MR Image Segmentation by Anatomical Position Prediction. In Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Shenzhen, China, 13–17 October 2019; Springer: Cham, Switzerland, 2019; Volume 11765, pp. 541–549.
22. Liew, S.; Lo, B.P.; Donnelly, M.R.; Zavaliangos-Petropulu, A.; Jeong, J.N.; Barisano, G.; Hutton, A.; Simon, J.P.; Juliano, J.M.; Suri, A.; et al. A large, curated, open-source stroke neuroimaging dataset to improve lesion segmentation algorithms. Sci. Data 2022, 9, 320.
23. Hernandez Petzsche, M.R.; de la Rosa, E.; Hanning, U.; Wiest, R.; Valenzuela Pinilla, W.E.; Reyes, M.; Meyer, M.I.; Liew, S.-L.; Kofler, F.; Ezhov, I.; et al. ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset. Sci. Data 2022, 9, 762.
24. Evamaria, O.R.; Ezequiel, d.R.; Anh, B.T.; Moritz, H.P.; Hakim, B.; Kaiyuan, Y.; Antonio, M.F.; Houjing, H.; David, R.; Oscar, S.J.; et al. ISLES’24 – A Real-World Longitudinal Multimodal Stroke Dataset. arXiv 2025, arXiv:2408.11142.
25. Liew, S.L.; Zavaliangos-Petropulu, A.; Sondag, M. Anatomical Tracings of Lesions After Stroke (ATLAS) v2.0: A Curated Dataset of Stroke Lesions for Data Sharing and Machine Learning. Sci. Data 2023, 10, 51.
26. Deb, P.; Baru, L.B.; Dadi, K.; S, B.R. BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision. In Proceedings of the 9th International Workshop, BrainLes 2023, and 3rd International Workshop, SWITCH 2023, Held in Conjunction with MICCAI 2023, Vancouver, BC, Canada, 8–12 October 2023.
27. Huo, J.; Chen, L.; Liu, Y.; Boels, M.; Granados, A.; Ourselin, S.; Sparks, R. MAPPING: Model Average with Post-processing for Stroke Lesion Segmentation. arXiv 2022, arXiv:2211.15486.
28. Mohapatra, S.; Gosai, A.; Shinde, A.; Rutkovskii, A.; Nouduri, S.; Schlaug, G. Meta-Analysis of Transfer Learning for Segmentation of Brain Lesions. arXiv 2023, arXiv:2306.11714.
29. Verma, K.; Kumar, S.; Paydarfar, D. Automatic Segmentation and Quantitative Assessment of Stroke Lesions on MR Images. Diagnostics 2022, 12, 2055.
30. González, M.P. Fuzzy Information Seeded Region Growing for Automated Lesions After Stroke Segmentation in MR Brain Images. arXiv 2023, arXiv:2311.11742.
31. Liu, L.; Chang, J.; Liu, Z.; Zhang, P.; Xu, X.; Shang, H. Hybrid Contextual Semantic Network for Accurate Segmentation and Detection of Small-Size Stroke Lesions From MRI. IEEE J. Biomed. Health Inform. 2023, 27, 4062–4073.
32. Liu, L.; Chang, J.; Liang, G.; Xiong, S. Simulated Quantum Mechanics-Based Joint Learning Network for Stroke Lesion Segmentation and TICI Grading. IEEE J. Biomed. Health Inform. 2023, 27, 3372–3383.
33. Wu, Z.; Zhang, X.; Li, F.; Wang, S.; Huang, L. Multi-scale long-range interactive and regional attention network for stroke lesion segmentation. Comput. Electr. Eng. 2022, 103, 108345.
34. Tomita, N.; Jiang, S.; Maeder, M.E.; Hassanpour, S. Automatic post-stroke lesion segmentation on MR images using 3D residual convolutional neural network. Neuroimage Clin. 2020, 27, 102276.
35. Alquhayz, H.; Tufail, H.Z.; Raza, B. The multi-level classification network (MCN) with modified residual U-Net for ischemic stroke lesions segmentation from ATLAS. Comput. Biol. Med. 2022, 151, 106332.
36. Lin, D.J.; Cloutier, A.M.; Erler, K.S.; Cassidy, J.M.; Snider, S.B.; Ranford, J.; Parlman, K.; Giatsidis, F.; Burke, J.F.; Schwamm, L.H.; et al. Corticospinal Tract Injury Estimated From Acute Stroke Imaging Predicts Upper Extremity Motor Recovery After Stroke. Stroke 2019, 50, 3569–3577.
37. Shin, S.; Chang, W.H.; Kim, D.Y.; Lee, J.; Sohn, M.K.; Song, M.K.; Shin, Y.; Lee, Y.; Joo, M.C.; Lee, S.Y.; et al. Clustering and prediction of long-term functional recovery patterns in first-time stroke patients. Front. Neurol. 2023, 14, 1130236.
38. Kumar, P.; Prasad, M.; Das, A.; Vibha, D.; Garg, A.; Goyal, V.; Srivastava, A.K. Utility of transcranial magnetic stimulation and diffusion tensor imaging for prediction of upper-limb motor recovery in acute ischemic stroke patients. Ann. Indian Acad. Neurol. 2022, 25, 54.
39. Wilson, S.M.; Entrup, J.L.; Schneck, S.M.; Onuscheck, C.F.; Levy, D.F.; Rahman, M.; Willey, E.; Casilio, M.; Yen, M.; Brito, A.C.; et al. Recovery from aphasia in the first year after stroke. Brain 2023, 146, 1021–1039.
40. Lundquist, C.B.; Nielsen, J.F.; Brunner, I.C. Prediction of Upper Limb use Three Months after Stroke: A Prospective Longitudinal Study. J. Stroke Cerebrovasc. Dis. 2021, 30, 106025.
41. Scrutinio, D.; Lanzillo, B.; Guida, P.; Mastropasqua, F.; Monitillo, V.; Pusineri, M.; Formica, R.; Russo, G.; Guarnaschelli, C.; Ferretti, C.; et al. Development and Validation of a Predictive Model for Functional Outcome After Stroke Rehabilitation: The Maugeri Model. Stroke 2017, 48, 3308–3315.
42. Scrutinio, D.; Guida, P.; Lanzillo, B.; Ferretti, C.; Loverre, A.; Montrone, N.; Spaccavento, S. Rehabilitation Outcomes of Patients With Severe Disability Poststroke. Arch. Phys. Med. Rehabil. 2019, 100, 520–529.e3.
43. Iorga, M.; Higgins, J.; Caplan, D.; Zinbarg, R.; Kiran, S.; Thompson, C.K.; Rapp, B.; Parrish, T.B. Predicting language recovery in post-stroke aphasia using behavior and functional MRI. Sci. Rep. 2021, 11, 8419.
44. Rivier, C.; Preti, M.G.; Nicolo, P.; Van De Ville, D.; Guggisberg, A.G.; Pirondini, E. Prediction of poststroke motor recovery benefits from measures of sub-acute widespread network damages. Brain Commun. 2023, 5, fcad055.
45. Lee, J.; Kim, H.; Kim, J.; Chang, W.H.; Kim, Y.H. Multimodal Imaging Biomarker-Based Model Using Stratification Strategies for Predicting Upper Extremity Motor Recovery in Severe Stroke Patients. Neurorehabilit. Neural Repair 2022, 36, 217–226.
46. Sale, P.; Ferriero, G.; Ciabattoni, L.; Cortese, A.M.; Ferracuti, F.; Romeo, L.; Masiero, S. Predicting Motor and Cognitive Improvement Through Machine Learning Algorithm in Human Subject that Underwent a Rehabilitation Treatment in the Early Stage of Stroke. J. Stroke Cerebrovasc. Dis. 2018, 27, 2962–2972.
47. White, A.; Saranti, M.; d’Avila Garcez, A.; Hope, T.M.H.; Price, C.J.; Bowman, H. Predicting recovery following stroke: Deep learning, multimodal data and feature selection using explainable AI. Neuroimage Clin. 2024, 43, 103638.
48. Tang, Z.; Su, W.; Liu, T.; Lu, H.; Liu, Y.; Li, H.; Zhang, H. Prediction of poststroke independent walking using machine learning: A retrospective study. BMC Neurol. 2024, 24, 332.
49. Jabal, M.S.; Joly, O.; Kallmes, D.; Harston, G.; Rabinstein, A.; Huynh, T.; Brinjikji, W. Interpretable Machine Learning Modeling for Ischemic Stroke Outcome Prediction. Front. Neurol. 2022, 13, 884693.
50. Ferris, J.K.; Tavenner, B.P.; Barisano, G.; Brodtmann, A.; Buetefisch, C.M.; Conforto, A.B.; Liew, S.L. Modulation of the association between corticospinal tract damage and outcome after stroke by white matter hyperintensities. Neurology 2024, 102, e209387.
51. Kang, H.J.; Stewart, R.; Park, M.S.; Bae, K.Y.; Kim, S.W.; Kim, J.M.; Shin, I.S.; Cho, K.H.; Yoon, J.S. White Matter Hyperintensities and Functional Outcomes at 2 Weeks and 1 Year after Stroke. Cerebrovasc. Dis. 2013, 35, 138–145.
52. Giese, A.K.; Schirmer, M.D.; Dalca, A.V.; Sridharan, R.; Donahue, K.L.; Nardin, M.; Rost, N.S. White Matter Hyperintensity Burden in Acute Stroke Patients Differs by Ischemic Stroke Subtype: A Multicenter Study. Neurology 2020, 95, e79–e88.
53. Peng, Y.; Luo, D.; Zeng, P.; Zeng, B.; Xiang, Y.; Wang, D.; Luo, T. Impact of white matter hyperintensity location on outcome in acute ischemic stroke patients: A lesion symptom mapping study. Brain Imaging Behav. 2025, 19, 269–278.
54. Liew, S.L.; Schweighofer, N.; Cole, J.H.; Zavaliangos-Petropulu, A.; Tavenner, B.P.; Han, L.K.; Thompson, P. Association of brain age, lesion volume, and functional outcome in patients with stroke. Neurology 2023, 100, e2103–e2113.
55. Sheller, M.J.; Edwards, B.; Reina, G.A.; Martin, J.; Pati, S.; Kotrotsou, A.; Milchenko, M.; Xu, W.; Marcus, D.; Colen, R.R.; et al. Federated Learning in Medicine: Facilitating Multi-Institutional Collaborations without Sharing Patient Data. Sci. Rep. 2020, 10, 12598.
56. Rangel, E.; Martinez, F. Federative Ischemic Stroke Segmentation as an Alternative to Overcome Domain-Shift Multi-Institution Challenges. arXiv 2025, arXiv:2508.18296.

Figure 1. Challenges in stroke lesion segmentation illustrated using diffusion-weighted magnetic resonance imaging (MRI-DWI). Panel (a) shows examples of complex and irregular lesion shapes with heterogeneous textures, which make boundary delineation difficult. Panels (b,c) depict cases with low spatial resolution and poor lesion-to-background contrast, leading to uncertainty in automated segmentation. Panels (d,e) highlight the difficulty of distinguishing ischemic lesions from surrounding normal tissue.

Figure 2. Flow diagram showing the literature screening and selection process. The diagram tracks the number of studies identified from database searches, records removed as duplicates, and articles excluded after title, abstract, and full-text screening. The final number of studies that met the inclusion criteria and were analysed in this review is indicated at the end of the flow. This visualisation clarifies how the evidence base for the review was systematically narrowed to the most relevant studies.

Figure 3. Comparison of Dice similarity coefficients reported by different stroke lesion segmentation methods on the Anatomical Tracings of Lesions After Stroke (ATLAS) dataset, including U-Net [26,29,34,35], nnU-Net [27], Fuzzy Algorithm [30], Attention Network [33], Semantic Network [31], Simulated Quantum Mechanics [32], and Transfer Learning [28]. The results illustrate how methodological choices—including algorithm selection, learning strategy, and data pre- and postprocessing—affect segmentation accuracy while underscoring the variability across studies.

Table 1. Multimodal studies on ischemic stroke lesion segmentation. The table summarises representative approaches integrating different imaging modalities, such as Diffusion-Weighted Imaging (DWI), Apparent Diffusion Coefficient (ADC), FLAIR, and multi-channel MRI, to improve lesion delineation. Each entry lists the study with author and year, the imaging modalities used, the methodology, and the results or reported outcomes. These examples demonstrate that multimodal integration yields more accurate and reliable lesion segmentation than a single imaging modality alone.

Study (Author, Year)	Year	Modalities Used	Method	Result
Rahman et al., 2025 [15]	2025	DWI, ADC, and eDWI	DenseNet121 encoder + SelfONN-CSCA (Channel and Spatial Compound Attention)-UNet decoder	DWI only: 83.88, DWI + ADC: 85.86, DWI + ADC + eDWI: 87.49
Siddiquee et al., 2022 [16]	2022	DWI and ADC	SegResNet + Deep Supervision + Auto3DSeg	DSC = 0.824
Gi et al., 2024 [13]	2024	ADC + FLAIR vs. DWI + FLAIR	integrates an unsupervised registration model, ConvNXMorph, and a segmentation model, nnUNet-v2	DSC = 0.84
Jeong et al., 2024 [17]	2024	DWI and ADC	Base model: nnU-Net	Dataset I: 60.35% with ensemble model. Dataset II: 74.12% with ensemble model. ISLES’22 Challenge: Achieved first rank overall, with an average Dice = 78.69% across test cases
García-Salgado et al., 2024 [14]	2024	FLAIR, DWI, and T2	U-Net architecture	F1-Scores over 0.7 in FLAIR, DWI, and T2

Table 2. Summary of Machine Learning (ML) approaches applied to the Anatomical Tracings of Lesions After Stroke (ATLAS) dataset for Stroke Lesion Segmentation. The table outlines key studies, the segmentation method implemented, loss function, and their main findings, demonstrating how the ATLAS dataset serves as a common benchmark for developing and validating automated stroke lesion segmentation methods. By providing standardised lesion annotations on MRI, ATLAS enables direct comparison across techniques and facilitates progress toward more accurate and generalisable segmentation models.

Ref.	Preprocessing	Dataset	Segmentation Method	Loss Function	Performance Metric	Gap/Limitations
[29]	Intensity normalisation, registration and defacing, brain extraction, skull removal	ATLAS V2.0	Deep Neural Network (DNN) using 3D-UNet with 5-fold cross-validation	NM	DSC = 0.65	Potential biases from the manual segmentation process could influence the model, and the reliance on a subset for testing may limit generalisability; the scarcity of publicly accessible stroke datasets with manual segmentation labels makes independent validation difficult.
[26]	Z-score normalisation, slicing for 2D modality	ATLAS V2.0	Multiple U-Net variants	Proportionate weighting of Dice loss (DICE) and Binary Cross-Entropy loss (BCE)	Dice: 0.583	There is a need for better data augmentation, U-Net models, supervised learning, handling small lesions, and decision-making uncertainty.
[27]	Normalisation and registration	ATLAS V2.0	nnU-Net	Compound loss (Dice plus cross-entropy), TopK10 loss	Dice: 0.6667	Small stroke lesions are hard to segment, especially with artifacts or similar intensities to surrounding tissue. Training schemes often predict unconnected lesions as continuous grey matter due to similar intensities.
[30]	Gaussian denoising	ATLAS V2.0	Fuzzy Information Seeded Region Growing (FISRG) algorithm	NM	Dice: 0.94	The algorithm struggles with abrupt lesion topology changes, misclassifies regions with similar intensities, and has increased computational time. Intensity-based classification causes errors, especially with variable lesion textures.
[33]	Image slicing, cropping, patch partitioning, patch embedding, normalisation, and augmentation	ATLAS v1	MLiRA-Net (Multi-scale Long-range Interactive and Regional Attention Network)	Dice loss + weighted binary cross-entropy loss	Dice: 0.6119	It comes at the cost of increased computational complexity. Additionally, the current implementation is limited to two-dimensional segmentation.
[31]	Matrix complement and clipping method	ATLAS V2.0	Hybrid Contextual Semantic Network (HCSNet)	Mixing-loss function	Dice: 0.69	Segmenting small lesions is challenging due to the heavy reliance on training data quality and quantity. Additionally, the model’s ability to generalise to real-world clinical settings needs further validation.
[32]	Intensity correction, MNI-152 template registration	ATLAS V2.0	Simulated Quantum Mechanics-based joint Learning Network (SQMLP-net)	The joint loss function incorporates the segmentation and classification losses	Dice: 0.7098	Finding the right balance between the trade-offs of multi-task learning weights is crucial for optimising task performance.
[28]	Resampling and normalisation, skull stripping, slicing, and augmentation	ATLAS V2.0	Transfer learning and mixed data approaches	NM	Dice: 0.736	The accuracy of the ensemble methods may vary with the chosen parameters, which may require further adjustment for different datasets or lesion types. The ensemble method tends to overpredict lesions by approximately 10%.
[34]	Normalisation, cropping, a zoom-in and out training strategy	ATLAS v1	3D U-Net architecture with residual learning	Binary cross entropy (BCE) + Dice loss	Dice: 0.64	The small dataset limits generalisability, and the model struggles with smaller lesions. Further validation with diverse scanners is needed. Manual tracing introduces variability, and the dataset only focuses on embolic strokes, leaving other types untested.
[35]	Slice selection, patch extraction, data augmentation	ATLAS v1	U-Net architecture with Xception as the backbone (XU-Net)	BCE and Jaccard coefficient	Dice: 0.754	The study struggles with accurately segmenting small stroke lesions and reducing false positives. The model’s generalisability requires further validation on diverse datasets. Additionally, the approach increases computational complexity, limiting real-time application.

Table 3. Study characteristics, modelling approaches, and predictive performance of included studies that used datasets other than ATLAS. Unlike the ATLAS-based studies, which primarily focus on lesion segmentation, these investigations concentrate on outcome prediction. The table summarises dataset characteristics (e.g., cohort size, imaging modality), model methodology, and reported predictive performance for functional or cognitive recovery.

Ref.	Study Design	Sample Size	Data Modality	Method	Focus Area	Performance Metrics	Gap/Limitations
[43]	Longitudinal design	57 patients	Resting-state functional MRI; behavioural language measures	Elastic net regression models	Language recovery	R² = 0.948	Small sample size and lack of validation set. Use of leave-one-out cross-validation (LOOCV), a less robust and reliable model selection method. Recruitment constraints and minimal fMRI filtering resulted in lower accuracy and generalisability.
[44]	Longitudinal design	37 patients	T2 and diffusion-weighted MRI; Fugl–Meyer Assessment (FMA) scores	Ridge regression	Motor recovery	R² = 0.68	Small sample size and no age-matching; manual lesion masking; larger, diverse cohorts required to confirm findings.
[36]	Longitudinal design	48 patients	Magnetic resonance diffusion-weighted images and CT (one case)	Logistic and linear regression models	Upper extremity motor recovery	AUC ranging from 0.70 to 0.8	The study sample was younger, predominantly male, and ethnically homogeneous compared to national averages, with a small sample size of 48 participants.
[37]	Longitudinal design	7858 patients	Demographic and clinical characteristics	K-means clustering	Functional recovery	Ischemic: 0.926; hemorrhagic: 0.887	Survivor bias; design limitations, including the absence of functional MRI or dynamic nomograms.
[38]	Prospective cohort study	29 patients	Transcranial magnetic stimulation parameters, diffusion tensor imaging parameters, or their combinations	Multivariate logistic regression analysis	Upper-limb motor function	Predictive ability: 93.3%	Rigorous inclusion criteria: only 29 patients were enrolled, limiting generalisability. Exclusion criteria: participants with medical issues were excluded, potentially skewing the results.
[39]	Longitudinal design	334 patients	Demographic factors; MRI or CT imaging	Linear model	Language recovery	QAB overall: 59.5% of the variance	Smaller group sizes and missing data points. Lesions identified through acute clinical imaging may not accurately reflect irreversible tissue damage. QAB limitation: Does not assess written word comprehension or writing.
[45]	Retrospective cohort study	104, 42 patients	Structural, diffusion, and functional magnetic resonance imaging	Multiple linear regression algorithm	Upper extremity motor recovery	R² = 0.853	Small sample size; single-centre study.
[42]	Retrospective cohort study	1265 patients	Data from electronic records	Multivariable logistic regression model	Motor impairment	AUC = 0.833	Retrospective design may introduce biases. No control group. Dynamic motor changes: study does not account for rehabilitation-related variations.
[41]	Retrospective cohort study	717 patients	Clinical and demographic data	Multivariable logistic regression model	Physical functioning	AUC = 0.883	Retrospective design introduces biases. Exclusion of mild stroke patients. Did not assess the prognostic role of neuroimaging.
[40]	Observational prospective cohort design	87 patients	Clinical, demographic, and statistical data; accelerometer data	Multivariate regression model	Upper-limb	AUC = 0.86	Limited accelerometer use, and visible devices could have led to overestimation.
[46]	Longitudinal cohort design	55 patients	Demographic, clinical, biochemical, and hematological parameters; health status data at discharge	SVM	Motor and cognitive improvement	Correlation ranged from 0.75 to 0.81	Single hospital data. Reliance on common inflammatory biomarkers, which can vary between individuals, and short follow-up period.
[47]	Cross-sectional design	758 patients	T1-weighted structural MRI, demographic and clinical characteristics	CNN	Aphasia	Classification accuracy: 0.854	Restricted to research-quality MRI scanners, and the measure of initial severity is relatively crude. Overfitting risk. Simplifies the Comprehensive Aphasia Test T-scores.
[48]	Retrospective cohort study	778 patients	Demographic and clinical information; functional scores at admission	Machine learning models	Walking independence	AUC: LR 0.891; XGBoost 0.880; SVM 0.659; RF 0.713	Retrospective, single-centre design. No external validation. Only clinical admission data were used. No long-term outcomes.
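
The studies in Table 3 report different performance measures, most often R² for regression models and AUC for classifiers, so the values are not directly comparable across rows. The following minimal sketch shows how the two quantities are commonly defined. It is an illustration only, not the evaluation code of any reviewed study, and the function names and toy data are assumptions.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: hypothetical recovery scores and binary outcomes.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
r2 = r_squared(observed, predicted)

outcome = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(outcome, risk)  # 3 of the 4 positive/negative pairs are ranked correctly
```

R² measures the share of outcome variance explained by the model, whereas AUC measures only how well cases are ranked, so a high value on one scale says little about the other.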

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
