Abstract
Hospital readmission among stroke survivors is frequent, especially in contexts of social vulnerability, compromising recovery and overburdening health services. This study aimed to develop a predictive model of hospital readmission among socially vulnerable stroke survivors, based on the Chronic Conditions Care Model (CCCM). Machine learning algorithms were applied, specifically decision tree and logistic regression, with data split into training (70% and 80%) and testing (30% and 20%) sets. Analyses were conducted using Python, with accuracy evaluated through ROC curves, AUC, and the confusion matrix in Analyse-it®, adopting a 5% significance level. The decision tree with an 80/20 partition achieved an accuracy of 92.45%. The variables most associated with readmission were falls, time since the first stroke, presence of a caregiver, and difficulty sleeping. In logistic regression, falls increased the risk by 235%, ischemic stroke by 155%, complications by 153.53%, COVID-19 by 132%, and time since stroke by 11.5% per year. The model proved to be feasible and robust, with the decision tree standing out, highlighting its potential to support preventive strategies and enhance care management.
1. Introduction
In Latin America, predictive models have been developed to estimate population health outcomes []. In countries such as Colombia [] and Mexico [], recent studies have applied Machine Learning (ML) techniques to predict outcomes in chronic diseases, including diabetes, hypertension, and stroke, demonstrating their potential to support clinical decision-making in contexts marked by inequality and fragmented health systems. In Brazil, the SABE Study employed ML algorithms to predict the five-year mortality risk among older adults [].
In the context of non-communicable diseases (NCDs), predictive models show promise for improving health service organization. They provide prognostic information on the risks of death, disability, readmission, functional limitations, and access to services. Among these diseases, stroke stands out due to its high mortality and morbidity rates, with permanent sequelae that make post-discharge care a significant challenge, particularly due to the frequent need for hospital readmissions [,]. This challenge is especially relevant in Latin America, where countries face structural and access-related barriers to the longitudinal follow-up of stroke survivors [].
The severity of impairments among stroke survivors may lead to unplanned hospital readmissions, often associated with discontinuity of home care, placing an added burden on health systems. The causes of readmission tend to recur, reflecting the clinical status of the survivor and requiring individualized post-discharge care interventions [,,,,,,]. Socially vulnerable individuals who have suffered a stroke present a higher prevalence of risk factors such as low income, limited education, unemployment, and restricted access to healthcare, all of which hinder post-stroke recovery [,].
Social vulnerability is the process through which individuals or groups are exposed to conditions that increase their fragility and risk of exclusion, thus limiting access to fundamental rights such as health, education, housing, and employment. This concept goes beyond a static view by incorporating structural, historical, and contextual factors that intensify inequalities and limit opportunities [,].
This condition of vulnerability is especially pronounced in regions with low Human Development Index (HDI), reflecting inequalities in access to essential services such as healthcare, education, and income []. In Latin America, the demographic and socioeconomic characteristics of countries underscore the relevance of structural determinants in shaping health outcomes and the responsiveness of health systems [,].
The development of a predictive model based on ML that addresses social vulnerability may enable the identification of stroke survivors at high risk of readmission and pave the way for improved care management. The implementation of such a model aims to inform individualized interventions, guide public policy development, and highlight key elements, such as counter-referral services in primary healthcare (PHC), to which stroke survivors should be directed after their initial hospital discharge, following a coordinated care network logic.
From this perspective, the Chronic Conditions Care Model (CCCM), proposed by the World Health Organization (WHO) and structuring the Brazilian Care Networks for People with Chronic Diseases (RASPDC), seeks to improve population health by understanding individual and collective risks to which individuals are exposed, encouraging people and their families to engage in their own care. It incorporates health condition management technologies and case management, enabling preventive interventions on risk factors at various levels of the CCCM [].
Understanding that the combination of clinical and social factors increases the likelihood of readmission reinforces the importance of ML-based predictive models that account for social vulnerability. Such models can identify stroke survivors at greater risk, guide individualized interventions, and support public policies aimed at restructuring care systems, particularly in contexts where structural inequalities still limit access to integrated and continuous health services. In this context, the present study aimed to develop a predictive model for hospital readmission among socially vulnerable stroke survivors.
2. Materials and Methods
The study employed a cross-sectional design [], with data collection conducted from August to October 2023 and from April to May 2024, in primary healthcare (PHC) units in two municipalities in northeastern Brazil characterized by low Human Development Index (HDI) and marked social inequality. The study report followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines [].
The study population consisted of patients with a history of ischemic or hemorrhagic stroke, as documented in the follow-up records of the respective primary healthcare unit. Inclusion criteria were age 18 years or older, given the higher prevalence of stroke in adults [], and registration with the Family Health Strategy (Estratégia Saúde da Família, ESF) in one of the municipalities. Exclusion criteria included acute diagnosis of Transient Ischemic Attack (TIA) or symptomatic cerebrovascular conditions reported by the patient, family member, or healthcare team; planned or scheduled readmissions; emergency care without hospitalization; follow-up or reassessment consultations; and death during hospitalization.
A simple random sample was calculated using G*Power® software, version 3.1.9.7, with an a priori analysis of the required sample size for evaluating variables in contingency tables []. The effect size was set at 0.30, with an alpha error probability of 0.05 and power (1 − β error probability) of 0.80. The minimum required sample size with a 95% confidence interval was 143 participants. By the end of the study, 267 participants were included. Participants were randomly selected, and those who declined participation were replaced through the same random selection process.
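The a priori calculation above can be approximated outside G*Power. The sketch below is not the authors' procedure: it assumes a chi-square test, uses SciPy (not among the libraries listed in this study), and assumes five degrees of freedom, a value the text does not report. Under these assumptions the result should land close to the reported minimum of 143 participants.

```python
from scipy.stats import chi2, ncx2


def required_sample_size(effect_size_w, alpha, power, df):
    """Smallest N for which a chi-square test reaches the requested power."""
    critical = chi2.ppf(1.0 - alpha, df)  # rejection threshold under H0
    n = 2
    # Under H1 the statistic follows a noncentral chi-square distribution
    # with noncentrality N * w**2; increase N until the power is reached.
    while ncx2.sf(critical, df, n * effect_size_w ** 2) < power:
        n += 1
    return n


ASSUMED_DF = 5  # assumption: degrees of freedom are not stated in the text
n_min = required_sample_size(0.30, 0.05, 0.80, ASSUMED_DF)
print(n_min)
```

Because the required N grows with the degrees of freedom, the assumed value of `ASSUMED_DF` matters and should be replaced by the dimension of the contingency table actually analyzed.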
The data collection instrument consisted of four groups of variables: socioeconomic, health condition, biological, and psychosocial variables. The dependent variable was hospital readmission, defined as hospital admission following the discharge date from the stroke-related hospitalization (first or only event). Both stroke-related and stroke-unrelated readmissions were identified.
To assess functional dependence, the Barthel Index [] was applied, classifying patients as: total dependence (≤25 points), severe dependence (26–50 points), moderate dependence (51–75 points), mild dependence (76–99 points), and fully independent (100 points) [].
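The cut-off points above translate directly into a lookup rule. The following is a minimal sketch; the function name `barthel_category` is hypothetical and integer scores between 0 and 100 are assumed.

```python
def barthel_category(score: int) -> str:
    """Map a Barthel Index score (0-100) to the dependence category used here."""
    if not 0 <= score <= 100:
        raise ValueError("Barthel Index scores range from 0 to 100")
    if score <= 25:
        return "total dependence"
    if score <= 50:
        return "severe dependence"
    if score <= 75:
        return "moderate dependence"
    if score <= 99:
        return "mild dependence"
    return "fully independent"


for example_score in (20, 50, 51, 99, 100):
    print(example_score, barthel_category(example_score))
```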
A pilot test was conducted with a population not included in the final sample to adjust the data collection instrument []. Data from the pilot were not used in the main study results. Participants were interviewed at home in a private setting to ensure confidentiality and anonymity, following prior scheduling, and accompanied by a Community Health Agent (ACS) or another team member.
Stroke can result in disabling complications, with up to 90.00% of patients experiencing expressive speech disorders [], potentially preventing coherent responses to the questionnaire; in such cases, questions were directed to a family member or responsible caregiver. To minimize recall bias, documents issued by healthcare professionals/services, such as prescriptions, discharge reports, and test results, were reviewed.
Exploratory data analysis involved calculating descriptive statistics, such as frequencies, central tendency (mean), and dispersion (standard deviation). Subsequently, logistic regression and decision tree models were fitted to predict readmission. These models were selected based on the binary nature of the outcome variable (readmitted or not). To standardize variable scales and prevent variables with larger magnitudes from disproportionately influencing model performance, the Min–Max normalization technique was employed, transforming variable values to a 0–1 range.
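As an illustration of the Min–Max transformation, each value x in a column is mapped to (x − min) / (max − min). The sketch below uses NumPy and invented example values (age in years and time since stroke in months), not study data; the helper name `min_max_scale` is hypothetical.

```python
import numpy as np


def min_max_scale(values):
    """Rescale each column of a 2-D array to the 0-1 range."""
    values = np.asarray(values, dtype=float)
    col_min = values.min(axis=0)
    col_range = values.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns are mapped to 0
    return (values - col_min) / col_range


# Invented example rows: [age in years, months since stroke]
raw = np.array([[38, 1], [70, 12], [111, 60]])
scaled = min_max_scale(raw)
print(scaled)
```

When data are split into training and testing sets, the minimum and maximum are normally taken from the training set only and then reused on the test set, so that no information from the test set leaks into model fitting.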
For classification problems involving categorical variables, boundary plots were generated to identify optimal separation between classes, including for socioeconomic variables, health conditions, biological variables, functional dependence (assessed using the Barthel Index), and psychosocial variables.
Two predictive models were developed: logistic regression and decision tree, which were compared. The Ridge logistic regression model was adopted to minimize the risk of overfitting and multicollinearity among predictors. The model was optimized through stratified 5-fold cross-validation and GridSearchCV, testing different values of the regularization hyperparameter. Predictors were standardized using the StandardScaler method. In addition to the penalized version (Ridge), a classical unpenalized logistic regression model was also estimated to obtain the regression coefficients and odds ratios (OR), allowing for a direct comparison between both approaches. Confidence intervals were estimated using the bootstrap resampling technique with 1000 iterations.
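The tuning procedure for the Ridge model can be sketched with scikit-learn as follows. This is an assumed reconstruction, not the authors' script: the data are synthetic (`make_classification`), the grid of `C` values and the `roc_auc` scoring choice are illustrative, and scikit-learn's `C` is the inverse of the regularization strength.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the study data (267 rows, binary outcome).
X, y = make_classification(n_samples=267, n_features=10, n_informative=4,
                           random_state=0)

# Scaling sits inside the pipeline so it is refitted on each training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("ridge", LogisticRegression(max_iter=1000)),  # default penalty is L2
])

param_grid = {"ridge__C": [0.01, 0.1, 1.0, 10.0, 100.0]}  # illustrative grid
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv)
search.fit(X, y)

best_c = search.best_params_["ridge__C"]
coefficients = search.best_estimator_.named_steps["ridge"].coef_.ravel()
odds_ratios = np.exp(coefficients)  # per one standard deviation of each predictor
print(best_c, round(search.best_score_, 3))
```

Since the predictors are standardized, each exponentiated coefficient is an odds ratio per standard deviation rather than per original unit, and penalized coefficients are shrunk toward zero, which is one reason for also fitting the unpenalized model when odds ratios are to be reported.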
The decision tree model applied was CART (Classification and Regression Tree), prioritizing clinical segmentation interpretability. The Gini index was used as the impurity criterion, and tree complexity was controlled using the ccp_alpha (Cost-Complexity Pruning) parameter. Multiple values of alpha were tested, and the one that maximized predictive performance on the validation set was selected. After this optimization, variables with relative importance lower than 2% were removed, and the model was recalibrated.
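The pruning and variable-removal steps described above could look like the following scikit-learn sketch. It runs on synthetic data, and the single validation split, the random seeds, and the fallback for the case where no variable passes the 2% threshold are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the study data.
X, y = make_classification(n_samples=267, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Candidate alphas come from the cost-complexity pruning path of a full tree.
path = DecisionTreeClassifier(criterion="gini", random_state=0) \
    .cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative rounding errors
    candidate = DecisionTreeClassifier(criterion="gini", ccp_alpha=alpha,
                                       random_state=0).fit(X_train, y_train)
    score = candidate.score(X_val, y_val)  # accuracy on the validation set
    if score > best_score:
        best_alpha, best_score = alpha, score

tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=best_alpha,
                              random_state=0).fit(X_train, y_train)

# Drop variables with relative importance below 2% and refit.
keep = tree.feature_importances_ >= 0.02
if not keep.any():  # assumed fallback: keep all variables if none qualifies
    keep[:] = True
final_tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=best_alpha,
                                    random_state=0).fit(X_train[:, keep], y_train)
print(best_alpha, int(keep.sum()), round(final_tree.score(X_val[:, keep], y_val), 3))
```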
Both models were evaluated under three distinct validation schemes to test the robustness and stability of the results: a 70%/30% training/testing split, an 80%/20% training/testing split, and stratified 5-fold cross-validation (5KFold), the latter used to estimate mean performance and standard deviation across folds.
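The three validation schemes can be sketched as follows with scikit-learn. The data are synthetic and the shallow tree is only a placeholder model; with 267 rows, `train_test_split` rounds the test set up to 81 cases for the 70/30 split and to 54 cases for the 80/20 split.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the study data (267 rows, binary outcome).
X, y = make_classification(n_samples=267, n_features=10, n_informative=4,
                           random_state=0)
model = DecisionTreeClassifier(max_depth=3, random_state=0)  # placeholder model

results = {}
test_sizes = {}
# Schemes 1 and 2: stratified hold-out splits.
for label, test_fraction in (("70/30", 0.30), ("80/20", 0.20)):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_fraction, stratify=y, random_state=0)
    results[label] = model.fit(X_tr, y_tr).score(X_te, y_te)
    test_sizes[label] = len(y_te)

# Scheme 3: stratified 5-fold cross-validation, reporting mean and SD.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
results["5-fold mean"] = fold_scores.mean()
results["5-fold sd"] = fold_scores.std()

print(test_sizes)
print({name: round(value, 3) for name, value in results.items()})
```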
Performance was assessed with the following metrics and target thresholds: accuracy ≥ 0.70 (proportion of correct classifications), AUC-ROC > 0.75 (global discriminative ability), Cohen’s Kappa between 0.40 and 0.60 (agreement between observed and predicted values), and RMSE < 0.50 (mean calibration error). False positive and false negative rates were also reported.
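A minimal sketch of how these metrics can be computed with scikit-learn is given below. The helper name `evaluate`, the 0.5 classification threshold, and the toy labels and probabilities are assumptions. False positives and false negatives are expressed as shares of all test cases, which appears to be how they are reported in the Results (together with accuracy they sum to 100%).

```python
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, roc_auc_score)


def evaluate(y_true, y_prob, threshold=0.5):
    """Return the metrics listed above for predicted probabilities."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "auc_roc": roc_auc_score(y_true, y_prob),
        "kappa": cohen_kappa_score(y_true, y_pred),
        # RMSE between the observed outcome (0/1) and the predicted probability
        "rmse": float(np.sqrt(np.mean((y_true - y_prob) ** 2))),
        "fp_share": fp / len(y_true),
        "fn_share": fn / len(y_true),
    }


# Toy example: 1 = readmitted, 0 = not readmitted (invented values).
metrics = evaluate([0, 0, 0, 0, 1, 1, 1, 1],
                   [0.1, 0.2, 0.6, 0.3, 0.8, 0.7, 0.4, 0.9])
print({name: round(float(value), 3) for name, value in metrics.items()})
```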
All analyses were conducted using Python software (version 3.13), with the following libraries: pandas, numpy, matplotlib, scikit-learn, and seaborn. A statistical significance level of 5% was adopted for all analyses.
This study followed all ethical and legal principles for research involving human subjects (Brazil, 2012; Brazil, 2016; Brazil, 2018). It was approved by the Research Ethics Committee (CEP) of the Federal University of Piauí (UFPI), under CAAE 67949523.4.0000.5214 and approval number 5.981.191. All participants signed the Informed Consent Form (ICF).
3. Results
3.1. Sample Characterization
Male participants comprised the majority of the sample (52.40%). The mean age was 70.5 years (SD = 12.1), with stroke cases occurring across a wide age range, from 38 to 111 years. Most participants (88.40%) self-identified as Black or Brown. More than half (54.30%) lived with a partner. Most did not have a caregiver (58.40%), and among those who did, the majority were informal caregivers (37.80%).
Regarding health history and lifestyle, 75.30% had hypertension (HTN), 35.60% had diabetes mellitus (DM), 24.70% had experienced a previous stroke, 27.70% had dyslipidemia, 12.30% reported a cardiac comorbidity, 17.60% had pneumonia, and 17.20% had some type of dementia. Concerning primary health conditions presented by stroke survivors, 59.90% reported not seeking medical assistance prior to the index event. Among the strokes that occurred, 88.00% were ischemic strokes (IS).
In the overall sample, the prevalence of readmission at any point was 46.80%. Among these, 60.80% were readmitted for stroke-related causes. The prevalence of readmission within one year following the index event was 37.10%.
With regard to psychosocial interaction variables among stroke survivors, 83 participants (31.10%) reported some degree of difficulty sleeping after the index event. In addition, 16 (6.00%) and 82 (30.70%) presented with aphasia and dysphasia after the illness, respectively (Table 1).
Table 1.
Socioeconomic characteristics, health background, lifestyle, health conditions, and functional dependence of stroke survivors (n = 267). Teresina/Piauí, 2025.
3.2. Construction of ML Models
For the decision tree, the dataset was divided into two distinct parts: a training set (70%) used to train the ML model, and a test set (30%) used to evaluate the performance of the trained model on separate data. The model’s accuracy was 74.1%, demonstrating a better balance between false negative and false positive classifications (Table 2).
Table 2.
Predictive Performance of Logistic Regression (Ridge) and Decision Tree (CART) Models for Hospital Readmission within One Year after Stroke.
The area under the ROC curve of the decision tree model for this partition was 80.3%. This value indicates that the model has a high capacity to correctly distinguish between patients who were readmitted and those who were not. Dividing the data into a training set (80%) and a test set (20%) allowed for evaluating the model’s generalization ability. The results indicate that, after training, the model performed better in classifying new patients (test set) with respect to the dependent variable ‘readmitted’.
Analysis of the confusion matrix for the 80%/20% partition showed that the model performed well in classifying patients. The accuracy was 70.4%, with 25.9% false positives but only 3.7% false negatives, characterizing high sensitivity but lower specificity (Table 2). For this partition, the area under the ROC curve of the decision tree model reached 81.8%, indicating a good ability to distinguish between readmitted and non-readmitted patients. The Decision Tree (CART) model evaluated through 5-fold cross-validation demonstrated consistent performance, with an average accuracy of 70.8% and a balance between false positive and false negative classifications. Furthermore, an AUC-ROC of 77.7% indicates acceptable discriminatory power and stability across validation folds.
Analysis of the confusion matrix for the logistic regression model with the 70%/30% partition revealed good performance in patient classification. The model misclassified 17.3% of patients as readmitted when they were not (false positives) and 16.0% as not readmitted when they were (false negatives). The accuracy based on available data was 66.7%, with a greater loss of sensitivity (Table 2). The area under the ROC curve of the logistic regression model reached 78.4%, indicating that the model has adequate capacity to correctly distinguish between patients who were readmitted and those who were not.
Analysis of the confusion matrix with the 80%/20% partition showed that the logistic model performed well in classifying patients. It correctly identified 27 patients who were not readmitted (true negatives) and 14 who were readmitted (true positives). However, the model made seven errors by incorrectly classifying patients who were not readmitted as readmitted (false positives) and six errors by misclassifying readmitted patients as not readmitted (false negatives), indicating balanced errors and good sensitivity. For this sample, the logistic classification model achieved an accuracy of 75.9% (Table 2).
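The counts reported for this partition can be checked by hand. The short calculation below uses only the four confusion-matrix cells given in the text; the sensitivity, specificity, and Cohen's Kappa values are derived here for illustration and are not quoted from Table 2.

```python
# Confusion-matrix cells reported for logistic regression, 80%/20% partition.
tn, tp, fp, fn = 27, 14, 7, 6
n = tn + tp + fp + fn  # 54 test cases

accuracy = (tp + tn) / n
sensitivity = tp / (tp + fn)  # share of readmitted patients correctly flagged
specificity = tn / (tn + fp)  # share of non-readmitted patients correctly cleared

# Cohen's Kappa: observed agreement corrected for agreement expected by chance.
expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
kappa = (accuracy - expected) / (1 - expected)

print(f"accuracy={accuracy:.3f}")        # accuracy=0.759
print(f"sensitivity={sensitivity:.3f}")  # sensitivity=0.700
print(f"specificity={specificity:.3f}")  # specificity=0.794
print(f"kappa={kappa:.3f}")              # kappa=0.489
```

The accuracy reproduces the reported 75.9%, and the derived Kappa falls inside the 0.40 to 0.60 band that the Methods define as the target for moderate agreement.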
For this partition (80% training and 20% testing), the area under the ROC curve of the logistic regression model reached 80.3%, showing a better balance between sensitivity and specificity. This result demonstrates the model’s good precision in patient classification. The Ridge Logistic Regression model assessed using 5-fold cross-validation achieved an average accuracy of 74.1% and an AUC-ROC of 80.4%, demonstrating good discrimination, moderate agreement, and stable performance across validation folds.
The logistic regression model using a 70% training and 30% testing data split showed moderate performance, with good discrimination (AUC 78.4%) and moderate accuracy (66.7%). The Ridge Logistic Regression model with 5-fold cross-validation achieved a mean accuracy of 70.0% (±0.08) and a mean AUC-ROC of 77.9% (±0.084), indicating predictive stability and consistent performance across folds (Table 2).
Both models show moderate accuracies, with better performance observed for the Ridge Regression model (80/20) and the Decision Tree model (70/30). However, the cross-validation results demonstrate similar robustness, indicating model stability in the face of data variation. All models exhibited an AUC > 0.75, which is considered good discriminative performance, with the 80/20 Decision Tree model standing out. The Kappa values indicate moderate agreement between predictions, suggesting that both models adequately represented the readmission pattern. The root mean squared error (RMSE) values ranging from 0.41 to 0.48 indicate good calibration, with the Decision Tree model showing predicted probabilities closer to the observed outcomes.
Among the most important explanatory variables for hospital readmission of stroke survivors in the decision tree model, using readmission within one year as the root variable, the splits selected complications during hospitalizations and falls as key health-related predictors; having a caregiver and difficulty sleeping were identified as the most important socioeconomic and psychosocial variables, respectively, in determining hospital readmission (Figure 1).
Figure 1.
Decision tree of the most important explanatory variables for hospital readmission among stroke survivors. Data partitioning: 80% training/20% testing (stratified). Source: research data.
Both the decision tree and logistic regression models converge on key determinants of hospital readmission after stroke, particularly clinical complications during hospitalization and the presence of a caregiver, which highlight the dual importance of clinical stability and social support in post-stroke care. The integration of these findings within the Chronic Conditions Care Model (CCCM) underscores the need for risk stratification, multidisciplinary follow-up, and continuity of care across levels of the health system. These results reinforce that combining predictive analytics with structured chronic care management can improve coordination, prevent avoidable readmissions, and enhance long-term outcomes for stroke survivors.
According to the logistic regression model, the explanatory variables (X1 … X61) with the greatest classification power are presented below. The variable with the highest coefficient is the most significant for the model’s classification—that is, for determining whether the patient will be readmitted or not. The eight most important variables for predicting readmission within one year are as follows. Acute clinical variables—complications during hospitalization, falls, skin lesions, and type of stroke—had the greatest weight in the risk of readmission (between 15% and 18%). Functional and social factors—caregiver presence, sleep difficulty, and time since the stroke in months—showed an intermediate influence (between 7% and 15%), while home care interventions demonstrated a mild protective effect (−3%) (Table 3).
Table 3.
Most influential explanatory variables for hospital readmission of stroke survivors according to Classical Logistic Regression vs. Ridge Logistic Regression. Data partitioning: 80% training/20% testing (stratified).
Table 4 presents an interpretative framework of the results of the predictive models in light of the CCCM.
Table 4.
Interpretive framework of the results of the predictive models in light of the CCCM.
4. Discussion
The prediction of hospital readmission among stroke survivors is essential for optimizing care and reducing recurrent hospitalizations. Predictive models based on the CCCM provide a robust theoretical framework that can enhance our understanding of the challenges faced in caring for this population. By aligning the principles of the CCCM with the results of this study, it becomes possible to more comprehensively assess the needs of post-stroke patients.
The analysis of the patients’ profiles indicates that factors such as advanced age, comorbidities, and unfavorable socioeconomic conditions increase vulnerability to hospitalization and clinical complications. Populations in situations of social vulnerability face significant barriers to accessing healthcare, which can worsen clinical conditions, increase frailty, and negatively affect functionality and quality of life. These findings highlight that the sociodemographic and clinical characteristics of patients directly influence the risk of hospital readmission, reinforcing the importance of care strategies tailored to these conditions. This perspective reinforces the concept of coordinated and continuous care, capable of anticipating demands, closing care gaps, and promoting a more structured approach to preventing readmissions.
The implementation of the predictive model in the investigated healthcare services can restructure the care provided to stroke survivors, fostering the development of routines and workflows that prioritize patients at higher risk of readmission. Identifying individuals at risk can lead to the creation of joint therapeutic plans involving primary healthcare teams, family members, and caregivers, thereby enhancing monitoring, enabling effective interventions, reducing the risk of readmission, and improving quality of life. Therapeutic plans that include preventive home measures are a promising strategy to be explored, encompassing the strengthening of self-care, symptom surveillance, and prompt health-seeking behavior, coordinated through community health workers.
The implementation of the CCCM and similar models derived from it in Latin America, promoted by the Pan American Health Organization (PAHO), emphasizes the fundamental importance of PHC and recognizes that the best clinical outcomes are achieved when all components of the model are interconnected and operate in a coordinated manner [].
The analysis using the Decision Tree model, with a 70% training and 30% testing split, showed promising results in identifying predictive factors for readmission, with an accuracy of 74.1%, above the 0.70 target set for this study and considered satisfactory for clinical application []. Models with accuracy above 75.00% are deemed relevant for clinical applications, provided they are accompanied by complementary analyses to minimize classification errors []; the 70/30 decision tree fell just short of that stricter threshold.
The AUC-ROC, a widely used metric in the evaluation of predictive models [], was 80.3%, indicating good discriminative ability between readmitted and non-readmitted patients. This performance exceeds the 0.75 target set for this study and approaches, without reaching, the reference threshold of an AUC-ROC above 85.00% cited as an indicator of reliable classification []; it nonetheless enables more robust analysis for clinical decision-making [].
Tests using an 80/20 partition, common practice in model evaluation [], demonstrated improved performance, with the decision tree showing greater predictive capacity due to the larger training dataset []. Despite this improvement, the model still presented limitations such as false positives and false negatives [].
Logistic regression with a 70/30 partition achieved an AUC-ROC of 78.4% and accuracy of 66.7%, indicating adequate discriminative ability. Studies highlight that an AUC-ROC above 0.70 indicates acceptable discrimination, while values greater than 0.80 represent good predictive performance in clinical models []. However, the presence of false negatives is concerning in clinical settings, where at-risk patients may not be identified in time for preventive interventions []. The model showed the greatest loss of sensitivity compared to the decision tree model [].
In the 80/20 partition, logistic regression achieved an AUC-ROC of 80.3% and an accuracy of 75.9%, demonstrating a performance gain with a larger training dataset []. Nonetheless, adjustments aimed at improving sensitivity and reducing false negatives are necessary to enhance clinical reliability []. The comparison between models revealed that in the 70/30 partition, the decision tree outperformed logistic regression in both accuracy (74.1% vs. 66.7%) and AUC-ROC (80.3% vs. 78.4%). In the 80/20 partition, the decision tree achieved the higher AUC-ROC (81.8% vs. 80.3%), whereas logistic regression achieved the higher accuracy (75.9% vs. 70.4%); the decision tree nonetheless stands out as an effective predictive tool to support clinical decision-making because of its low false negative rate. The 5-fold cross-validation results show similar robustness for both models, indicating their stability in the face of data variation.
Both models demonstrated satisfactory predictive performance, with accuracy and AUC-ROC values considered acceptable for clinical risk prediction models. These findings suggest that the decision tree model excels in interpretability and clinical applicability, being more sensitive for identifying patients at risk of readmission, whereas the Ridge regression model offers greater statistical robustness and lower overfitting tendency, supporting better generalizability.
This indicates that increasing the proportion of training data positively contributes to the performance of the decision tree model, highlighting the importance of fine-tuning ML models, such as parameter optimization and relevant variable selection, to achieve better predictive outcomes []. These findings suggest that both logistic regression and decision tree models have potential for use in predicting patient readmissions, each with distinct characteristics and strengths. The choice of the most appropriate model should be guided not only by performance metrics, but also by the specific needs of the clinical setting, as well as a consideration of the impacts that classification errors may have on patient care and management.
The analysis of hospital readmission among stroke survivors using a decision tree identified relevant predictive factors, such as complications during hospitalization and falls. Socioeconomic and psychosocial factors, such as the absence of a caregiver and difficulty sleeping, increase the likelihood of readmission, aligning with evidence that a lack of adequate support elevates the risk of complications []. This is a widely shared challenge among health systems in Latin America and the Caribbean, which are often marked by limited access to post-discharge care and home-based rehabilitation services [].
In logistic regression, falls increased the likelihood of readmission, corroborating Gaspari et al. (2019) []. Ischemic stroke raised the probability of readmission, reinforcing its severity [,]. Complications during hospitalization and skin lesions showed the strongest contribution to readmission risk, indicating that acute clinical instability and secondary physical complications significantly increase the likelihood of hospital return after stroke []. These results underscore the need for a comprehensive approach that considers clinical, social, and economic conditions, especially in contexts of high social vulnerability such as those observed in Latin American countries, where precarious social determinants of health amplify the risk of adverse events after hospital discharge [].
Additionally, health systems in Latin America are mixed in nature, with joint participation of the public and private sectors. Each subsystem has its own financing model and adopts uncoordinated strategies in response to health issues, generating inequalities in the quality of care delivered. This fragmentation and segmentation result in a heavy burden of inequity, treating the human being as an object of the economy and disregarding their dignity. As a consequence, healthcare is often neglected, particularly for individuals with chronic conditions [].
The structure of the CCCM (Microsystem, Mesosystem, and Macrosystem) provides a comprehensive framework for interpreting predictive models. At the microsystem level, aspects such as functional sequelae, home care, low adherence to therapies, and use of continuous medications can be observed. These factors highlight the urgency of strengthening the integration between primary care, specialized services, and rehabilitation programs, resonating with the need for coordinated healthcare networks, as advocated by recent public system reforms in Latin America aimed at overcoming fragmentation and promoting continuity of care []. Moreover, it is essential to expand family and community support to improve care for stroke survivors.
At the mesosystem level, the need to integrate rehabilitation services and psychosocial support is evident, especially for patients with high dependency and informal caregivers. Additionally, sleep difficulties and the lack of psychological support underscore the importance of expanded care networks. Health education and self-management remain challenges, particularly when working with vulnerable populations. Patients with low literacy levels and dependency on social benefits require adapted educational strategies. Furthermore, socioeconomic vulnerability and the absence of formal support emphasize the need for intersectoral coordination that integrates health, social assistance, education, and other public policies to address inequalities, promote more favorable recovery, and reduce readmissions.
At the macrosystem level, structural determinants such as poverty, social exclusion, cultural values, racial and gender inequality, and low educational attainment limit access to and continuity of care. These factors call for structural changes and public policies that promote a more equitable and inclusive social and health system. The CCCM reinforces the importance of public policies that integrate health and social determinants, promoting equity and better outcomes.
The study revealed weaknesses in the follow-up care of post-stroke patients, with many readmissions linked to complications that could have been prevented. Brazilian programs such as “Melhor em Casa” (“Better at Home”) have sought to address this demand by offering home care within the framework of the Unified Health System (SUS), but their coverage and effectiveness remain limited given the complexity of the cases []. The CCCM emerges as an alternative for more proactive management of chronic conditions, with the potential to better coordinate services and reduce hospital readmissions, provided that it is supported by structured public policies.
The decision tree presents five possible pathways that can be interpreted according to the levels of the Chronic Conditions Care Model (CCCM). The first pathway represents patients without in-hospital complications or sequelae, who have caregivers and no sleep difficulties, resulting in a low risk of readmission. Care for this group could be managed at Level 3 of the CCCM, focusing on the management of chronic conditions. These individuals may receive routine follow-up, self-care promotion, mild rehabilitation, and annual monitoring.
The pathways representing “patients without acute complications or falls, but with sleep difficulties even when having a caregiver” and “patients without complications or falls, but without a caregiver” indicate an intermediate risk of readmission and can be placed at Level 4. These patients should have priority actions in primary healthcare (PHC) directed toward the implementation of individualized care plans, with expanded home visits and integration with the Family Health Support Center and multidisciplinary rehabilitation.
Finally, the pathways corresponding to “patients who experienced in-hospital complications and have a caregiver” and “patients with in-hospital complications and no caregiver” represent the most vulnerable profiles. Level 5 targets patients with a high probability of readmission—individuals whose pathways are marked by falls, time since stroke of less than 12 months, hemorrhagic stroke, or absence of a caregiver. At this level, given the severity, actions such as case management by PHC, intensive home care, specialized support, and coordination with hospital and rehabilitation services may be the most effective.
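As an illustration, the five pathways described above can be written as a small rule set. The sketch below is not the fitted tree: the class `PatientProfile`, the function `cccm_level`, and the boolean field names are hypothetical, the split thresholds of the actual model are not reproduced, and treating any fall as a Level 5 marker is a simplifying assumption drawn from the description of that level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientProfile:
    """Hypothetical summary of the predictors named in the pathways."""
    in_hospital_complications: bool
    had_fall: bool
    has_caregiver: bool
    sleep_difficulty: bool


def cccm_level(patient: PatientProfile) -> int:
    """Map a patient profile to a suggested CCCM level (3, 4, or 5).

    Assumption: in-hospital complications or a fall place the patient in
    the high-risk group, with or without a caregiver.
    """
    if patient.in_hospital_complications or patient.had_fall:
        return 5  # high risk: case management, intensive home care
    if not patient.has_caregiver or patient.sleep_difficulty:
        return 4  # intermediate risk: individualized care plan in PHC
    return 3      # low risk: routine follow-up and self-care promotion


# Example: no complications or falls, caregiver present, sleep difficulty.
example = PatientProfile(
    in_hospital_complications=False,
    had_fall=False,
    has_caregiver=True,
    sleep_difficulty=True,
)
print(cccm_level(example))
```

Writing the pathways as ordered rules makes the priority explicit: the high-risk markers are checked first, so the presence of a caregiver lowers the suggested level only for patients without complications or falls.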
In other Latin American countries, similar initiatives face comparable challenges. In Costa Rica [], the Long-Term Care program has emphasized home care, the training of formal caregivers, and family support, although budget constraints and legal uncertainties threaten its continuity. In Colombia [], Home Care Programs for Chronic Patients aim to better integrate services but still face difficulties related to the diversity of clinical profiles and the organization of health teams’ work. These examples demonstrate that home care is a significant strategy for reorganizing health services; however, its effectiveness depends on investment, planning, and political will.
One limitation of the study was the difficulty patients and/or family members had in accurately recalling information about the disease. This recall bias was minimized by requesting exams, discharge reports, and medical prescriptions.
Another limitation concerns the use of “hospital readmission” as a general outcome, encompassing both stroke-related and non-stroke-related causes, which may reduce the specificity of the identified predictors. However, this methodological choice was intentional and conceptually aligned with the notion that stroke survivors represent a population with chronic vulnerability. Following the acute event, patients experience functional, cognitive, and metabolic decline that predisposes them to a broad range of health complications and hospitalizations. Therefore, considering overall readmission captures the systemic and long-term impact of stroke and provides a clinically relevant indicator of post-stroke frailty. The use of a composite outcome was also justified by the sample size, as stratifying readmissions by cause would compromise statistical power. Future studies with larger cohorts may refine this model by distinguishing stroke-specific readmissions to increase predictive precision.
A further limitation was that the sample consisted of patients from two municipalities in Northeastern Brazil with a low Human Development Index (HDI), which restricts the generalizability of the model. As with any machine learning model transferred to a setting other than the one in which it was developed, it must be tested and re-evaluated for accuracy prior to broader implementation.
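The re-evaluation step mentioned above can be sketched as a comparison between observed readmissions in a new cohort and the predictions of the existing model. The snippet below is a minimal illustration in plain Python, not the study's analysis code: the function names and the example labels are hypothetical, and a real validation would also report the ROC curve and AUC on the new data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 1 = readmitted."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate_on_new_cohort(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity on external data."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        # Sensitivity: share of true readmissions the model flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        # Specificity: share of non-readmissions the model cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical labels for eight patients from a different setting.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = evaluate_on_new_cohort(observed, predicted)
print(metrics)
```

If these figures fall clearly below those obtained in the development sample, the model would need recalibration or retraining before use in the new setting.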
Beyond the limitations discussed above, the decision to adopt overall hospital readmission as the main outcome was both strategic and theoretically grounded. Stroke is a marker of systemic vulnerability, and hospital readmissions reflect post-stroke frailty rather than merely neurological complications. Evidence indicates that a substantial proportion of readmissions occur due to non-neurological causes, such as infections, falls, cardiac complications, and metabolic disorders, demonstrating greater susceptibility to multiple health conditions and constituting a clinical marker of chronic vulnerability [,,]. By adopting overall readmission as the outcome, the model becomes more useful for clinical management and primary healthcare: it allows risk stratification of complex patients, guides personalized interventions, and supports continuous surveillance within the logic of the Chronic Conditions Care Model (CCCM). Thus, the model’s scope, focused on general readmission rather than stroke-specific recurrence, captures the multidimensional vulnerability of stroke survivors and provides a pragmatic tool for care management, aligned with the principles of the Chronic Care Model and the conceptual structure of the CCCM.
Among the study’s strengths were the high accuracy of the predictive models and the integration of multiple variables, covering not only clinical factors but also socioeconomic and psychosocial aspects. This gave a broader view of hospital readmissions among socially vulnerable patients. The use of the CCCM offered a solid theoretical framework, reinforcing the importance of intersectoral coordination in post-stroke care.
Furthermore, the proposed model can serve as a reference for the development of other machine learning-based predictive models, taking into account the framework of the Chronic Care Model (CCM), patients in situations of social vulnerability, and the outcome of hospital readmission among stroke survivors.
5. Conclusions
In a context where several Latin American countries have been striving to strengthen home care as part of their response to chronic conditions, this study developed predictive models of hospital readmission in stroke survivors living in situations of social vulnerability, based on the principles of the CCCM. The models demonstrated the ability to estimate the probability of readmission with adequate levels of sensitivity and specificity. Their good accuracy and adequate area under the ROC curve indicate that they have the potential to assist in forecasting hospital readmissions and, consequently, in implementing more effective prevention strategies within post-stroke care.
The decision tree model met the recommended standards for clinical risk models. Among the most relevant variables for predicting readmission were complications during hospitalization, occurrence of falls, presence of a caregiver, and difficulty sleeping. The analysis of socioeconomic characteristics, such as educational attainment and occupation, was also important, although the model proved more sensitive to clinical aspects such as stroke type and functional dependency.
Machine learning models can be incorporated into post-stroke patient follow-up protocols, prioritizing their use for risk stratification of hospital readmission and the planning of personalized care interventions. Integrating the model into electronic health record systems and developing simplified assessment tools based on the key predictive variables identified in this study may facilitate its routine use by healthcare teams. As a future perspective, multicenter studies are recommended to validate the model across different population contexts and to extend its application to other chronic conditions associated with readmission risk. The integration of such models into public healthcare policies for care management could enhance longitudinal surveillance of vulnerable patients and promote the continuous improvement of care quality.
Author Contributions
Conceptualization, E.S.d.S. and J.W.P.B.; methodology, E.S.d.S., J.W.P.B. and F.L.d.L.F.; software, E.S.d.S., J.W.P.B.; validation, A.C.C.d.S., A.M.R.d.S., A.R.V.d.S. and L.M.F.; formal analysis, E.S.d.S. and J.W.P.B.; investigation, E.S.d.S., F.L.d.L.F. and M.B.d.S.J.; resources, E.S.d.S., M.B.d.S.J. and J.W.P.B.; data curation, E.S.d.S. and J.W.P.B.; writing—original draft preparation, E.S.d.S., J.W.P.B. and F.L.d.L.F.; writing—review and editing, M.d.P.S.G., T.M.M.M. and I.W.D.S.; visualization, L.C.P. and J.C.d.P.; supervision, J.W.P.B.; project administration, E.S.d.S.; funding acquisition, E.S.d.S. and J.W.P.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Process No. 402281/2024-1.
Institutional Review Board Statement
The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Federal University of Piauí (Certificate of Presentation for Ethical Appraisal, CAAE, No. 67949523.4.0000.5214, and Opinion No. 5.981.191; 3 April 2023) for studies involving humans.
Informed Consent Statement
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement
The data presented in this study are available upon request to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Kunstmann, F.S.; Lira, C.T.; Icaza, N.G.; Núñez, F.L.; Grazia, R.D. Estratificación de riesgo cardiovascular en la población chilena. Rev. Médica Clínica Las Condes 2012, 23, 657–665.
- Monroy, B.; Sanchez, K.; Arguello, P.; Estupiñán, J.; Bacca, J.; Correa, C.V.; Valencia, L.; Castillo, J.C.; Mieles, O.; Arguello, H.; et al. Automated Chronic Wounds Medical Assessment and Tracking Framework Based on Deep Learning. Comput. Biol. Med. 2023, 165, 107335.
- Prieto, K. Current Forecast of COVID-19 in Mexico: A Bayesian and Machine Learning Approaches. PLoS ONE 2022, 17, e0259958.
- Santos, H.G.D.; Nascimento, C.F.D.; Izbicki, R.; Duarte, Y.A.D.O.; Porto Chiavegatto Filho, A.D. Machine Learning Para Análises Preditivas Em Saúde: Exemplo de Aplicação Para Predizer Óbito Em Idosos de São Paulo, Brasil. Cad. Saúde Pública 2019, 35, e00050818.
- Global Burden of Disease Collaborative Network. Global Burden of Disease Study 2019 (GBD 2019) Results; Institute for Health Metrics and Evaluation (IHME): Seattle, WA, USA, 2020; Available online: https://vizhub.healthdata.org/gbd-compare (accessed on 25 October 2025).
- Rissetti, J.; Feistauer, J.B.; Luiz, J.M.; Da Silveira, L.D.S.; Ovando, A.C. Independência Funcional e Comprometimento Motor Em Indivíduos Pós-AVE Da Comunidade. Acta Fisiátr. 2020, 27, 27–33.
- Gonzalez-Aquines, A.; Rosales, J.; De Souza, A.C.; Corredor-Quintero, A.; Barboza, M.A.; Navia-Gonzalez, V.; Brunet-Perez, F.; Lagos-Servellon, J.; Novarro-Escudero, N.; Ortega-Moreno, D.A.; et al. Availability and Barriers to Access Post-Stroke Rehabilitation in Latin America. J. Stroke Cerebrovasc. Dis. 2024, 33, 107917.
- Ang, S.H.; Hwong, W.Y.; Bots, M.L.; Sivasampu, S.; Abdul Aziz, A.F.; Hoo, F.K.; Vaartjes, I. Risk of 28-Day Readmissions among Stroke Patients in Malaysia (2008–2015): Trends, Causes and Its Associated Factors. PLoS ONE 2021, 16, e0245448.
- Deng, Z.; Wu, X.; Hu, L.; Li, M.; Zhou, M.; Zhao, L.; Yang, R. Risk Factors for 30-Day Readmission in Patients with Ischemic Stroke: A Systematic Review and Meta-Analysis. Ann. Palliat. Med. 2021, 10, 11083–11105.
- Kilkenny, M.F.; Dalli, L.L.; Kim, J.; Sundararajan, V.; Andrew, N.E.; Dewey, H.M.; Johnston, T.; Alif, S.M.; Lindley, R.I.; Jude, M.; et al. Factors Associated With 90-Day Readmission After Stroke or Transient Ischemic Attack: Linked Data From the Australian Stroke Clinical Registry. Stroke 2020, 51, 571–578.
- Marques, J.C.; Silva, F.A.R.; Martins, A.N.; Oliveira Perdigão, F.S.; Martins Prudente, C.O.; Fagundes, R.R. Perfil de Pacientes Com Sequelas de Acidente Vascular Cerebral Internados Em Um Centro de Reabilitação. Acta Fisiátr. 2019, 26, 144–148.
- Qiu, X.; Xue, X.; Xu, R.; Wang, J.; Zhang, L.; Zhang, L.; Zhao, W.; He, L. Predictors, Causes and Outcome of 30-Day Readmission among Acute Ischemic Stroke. Neurol. Res. 2021, 43, 9–14.
- Tay, M.R.J. Hospital Readmission in Stroke Survivors One Year versus Three Years after Discharge from Inpatient Rehabilitation: Prevalence and Associations in an Asian Cohort. J. Rehabil. Med. 2021, 53, jrm00208.
- Wen, T.; Liu, B.; Wan, X.; Zhang, X.; Zhang, J.; Zhou, X.; Lau, A.Y.L.; Zhang, Y. Risk Factors Associated with 31-Day Unplanned Readmission in 50,912 Discharged Patients after Stroke in China. BMC Neurol. 2018, 18, 218.
- Lesaine, E.; Francis, F.; Domecq, S.; Miganeh-Hadi, S.; Sevin, F.; Sibon, I.; Rouanet, F.; Pradeau, C.; Coste, P.; Cetran, L.; et al. Social and Clinical Vulnerability in Stroke and STEMI Management during the COVID-19 Pandemic: A Registry-Based Study. BMJ Open 2024, 14, e073933.
- Tong, X.; Carlson, S.A.; Kuklina, E.V.; Coronado, F.; Yang, Q.; Merritt, R.K. Social Vulnerability Index and All-Cause Mortality After Acute Ischemic Stroke, Medicare Cohort 2020–2023. JACC Adv. 2024, 3, 101258.
- Carmo, M.E.D.; Guizardi, F.L. O Conceito de Vulnerabilidade e Seus Sentidos Para as Políticas Públicas de Saúde e Assistência Social. Cad. Saúde Pública 2018, 34, e00101417.
- Dimenstein, M.; Cirilo Neto, M. Abordagens conceituais da vulnerabilidade no âmbito da saúde e assistência social. Rev. Pesqui. Práticas Psicossociais 2020, 15, 1–17.
- Vieira-Meyer, A.P.G.F.; Morais, A.P.P.; Campelo, I.L.B.; Guimarães, J.M.X. Violência e Vulnerabilidade No Território Do Agente Comunitário de Saúde: Implicações No Enfrentamento Da COVID-19. Ciênc. Saúde Coletiva 2021, 26, 657–668.
- Souza, C.D.F.D.; Oliveira, D.J.D.; Silva, L.F.D.; Santos, C.D.D.; Pereira, M.C.; Paiva, J.P.S.D.; Leal, T.C.; Mariano, R.D.S.; Araújo, A.K.B.F.D.; Baggio, J.A.D.O. Tendência Da Mortalidade Por Doenças Cerebrovasculares No Brasil (1996–2015) e Associação Com Desenvolvimento Humano e Vulnerabilidade Social. Arq. Bras. Cardiol. 2021, 116, 89–99.
- Mendes, E.V. O Cuidado das Condições Crônicas na Atenção Primária à Saúde: O Imperativo da Consolidação da Estratégia da Saúde da Família, 1st ed.; Organização Pan-Americana da Saúde: Brasília, Brazil, 2012; ISBN 978-85-7967-078-7.
- Polit, D.; Beck, C. Fundamentos de Pesquisa em Enfermagem: Avaliação de Evidências Para a Prática da Enfermagem; Artmed: Porto Alegre, Brazil, 2021; ISBN 978-85-8271-489-8.
- Malta, M.; Cardoso, L.O.; Bastos, F.I.; Magnanini, M.M.F.; Silva, C.M.F.P.D. Iniciativa STROBE: Subsídios Para a Comunicação de Estudos Observacionais. Rev. Saúde Pública 2010, 44, 559–565.
- Barbosa, A.M.D.L.; Pereira, C.C.M.; Miranda, J.P.R.; Rodrigues, J.H.D.L.; De Carvalho, J.R.O.; Rodrigues, A.C.E. Perfil Epidemiológico Dos Pacientes Internados Por Acidente Vascular Cerebral No Nordeste Do Brasil. Acervo Saúde 2021, 13, e5155.
- Faul, F.; Erdfelder, E.; Lang, A.-G.; Buchner, A. G*Power 3: A Flexible Statistical Power Analysis Program for the Social, Behavioral, and Biomedical Sciences. Behav. Res. Methods 2007, 39, 175–191.
- Caneda, M.A.G.D.; Fernandes, J.G.; Almeida, A.G.D.; Mugnol, F.E. Confiabilidade de Escalas de Comprometimento Neurológico Em Pacientes Com Acidente Vascular Cerebral. Arq. Neuro-Psiquiatr. 2006, 64, 690–697.
- Girotto, E.; Andrade, S.M.D.; Cabrera, M.A.S.; Matsuo, T. Adesão Ao Tratamento Farmacológico e Não Farmacológico e Fatores Associados Na Atenção Primária Da Hipertensão Arterial. Ciênc. Saúde Coletiva 2013, 18, 1763–1772.
- Anderle, P.; Rockenbach, S.P.; Goulart, B.N.G.D. Reabilitação Pós-AVC: Identificação de Sinais e Sintomas Fonoaudiológicos Por Enfermeiros e Médicos Da Atenção Primária à Saúde. CoDAS 2019, 31, e20180015.
- Organização Pan-Americana da Saúde (OPAS). Cuidados Inovadores Para Condições Crônicas: Organização e Prestação de Atenção de Alta Qualidade Às Doenças Crônicas Não Transmissíveis Nas Américas; OPAS: Washington, DC, USA, 2015; ISBN 978-92-75-71738-7.
- Fabrizzio, G.C.; Erdmann, A.L.; Oliveira, L.M.D.; Lorenzini, E.; Jensen, R.; Santos, J.L.G.D. Prediction of COVID-19 Patients’ Admission to the Intensive Care Unit Based on the Precision Nursing Framework. J. Health Inform. 2024, 16.
- Topol, E.J. High-Performance Medicine: The Convergence of Human and Artificial Intelligence. Nat. Med. 2019, 25, 44–56.
- Sathyanarayanan, S.; Tantri, B.R. Confusion Matrix-Based Performance Evaluation Metrics. Afr. J. Biomed. Res. 2024, 27, 4023–4031.
- Lundberg, S.M.; Erion, G.G.; Lee, S.-I. Consistent Individualized Feature Attribution for Tree Ensembles. arXiv 2018, arXiv:1802.03888.
- Chicco, D.; Jurman, G. Machine Learning Can Predict Survival of Patients with Heart Failure from Serum Creatinine and Ejection Fraction Alone. BMC Med. Inform. Decis. Mak. 2020, 20, 16.
- Beam, A.L.; Kohane, I.S. Big Data and Machine Learning in Health Care. JAMA 2018, 319, 1317.
- Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67.
- Man, S.; Bruckman, D.; Tang, A.S.; Uchino, K.; Schold, J.D. The Association of Socioeconomic Status and Discharge Destination with 30-Day Readmission after Ischemic Stroke. J. Stroke Cerebrovasc. Dis. 2021, 30, 106146.
- Organização Pan-Americana da Saúde (OPAS). La Situación de Los Cuidados a Largo Plazo en América Latina y el Caribe; OPAS: Washington, DC, USA, 2023; ISBN 978-92-75-32687-9.
- Gaspari, A.P.; Cruz, E.D.D.A.; Batista, J.; Alpendre, F.T.; Zétola, V.; Lange, M.C. Preditores de Internação Prolongada Em Unidade de Acidente Vascular Cerebral (AVC). Rev. Lat.-Am. Enferm. 2019, 27, e3197.
- Giovanella, L.; Almeida, P.F.D. Atenção Primária Integral e Sistemas Segmentados de Saúde Na América Do Sul. Cad. Saúde Pública 2017, 33, e00118816.
- Florian Ángeles, J.M. Sistemas de Salud En Latinoamérica Durante El Periodo 2020 al 2023. Rev. Climatol. 2024, 24, 1374–1381.
- Almeida, P.F.D.; Giovanella, L.; Schenkman, S.; Franco, C.M.; Duarte, P.O.; Houghton, N.; Báscolo, E.; Bousquat, A. Perspectivas Para Las Políticas Públicas de Atención Primaria En Salud En Suramérica. Ciênc. Saúde Coletiva 2024, 29, e03792024.
- Nishimura, F.; Carrara, A.F.; Freitas, C.E.D. Effect of the Melhor Em Casa Program on Hospital Costs. Rev. Saúde Pública 2019, 53, 104.
- Chaverri-Carvajal, A.; Matus-López, M. Cuidados de larga duración en Costa Rica: Enseñanzas para América Latina desde la evidencia internacional. Rev. Panam. Salud Pública 2021, 45, e146.
- Sánchez, Y.C.A. Recomendaciones Para la Articulación de los Programas de Atención Domiciliaria en Paciente Crónico de la Red Pública y Privada En Bogotá: Una Reflexión Desde las Ciencias Contemporáneas. Master’s Thesis, Facultad de Medicina, Universidad El Bosque, Bogotá, Colombia, 2021.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).