A Machine Learning Approach to Predicting Readmission or Mortality in Patients Hospitalized for Stroke or Transient Ischemic Attack

Readmissions after stroke are not only associated with greater levels of disability and a higher risk of mortality but also increase overall medical costs. Predicting readmission risk and understanding its causes are thus essential for healthcare resource allocation and quality improvement planning. Using machine learning techniques on initial admission data, this study aimed to develop prediction models for readmission or mortality after stroke. During model development, resampling methods were implemented to balance the class distribution. Two-layer nested cross-validation was used to build and evaluate the prediction models. A total of 3422 patients were included in the analysis. The 90-day rate of readmission or mortality was 17.6%. This study identified several important predictive factors, including age, prior emergency department visits, pre-stroke functional status, stroke severity, body mass index, consciousness level, and use of a nasogastric tube. The Naïve Bayes model with class weighting to compensate for class imbalance achieved the highest discriminatory capacity in terms of the area under the receiver operating characteristic curve (0.661). Despite room for improvement, the prediction models could be used for early risk assessment of patients with stroke. Identifying patients at high risk for readmission or mortality immediately after admission could enable early discharge planning and transitional care interventions.


Introduction
Stroke is a leading cause of mortality and adult disability worldwide [1,2]. It also imposes a substantial financial burden on healthcare systems [3]. Stroke survivors are prone to stroke recurrence: approximately 7% to 12% of patients with a first ischemic stroke have a recurrent stroke within one year [4,5]. In addition, patients with stroke are likely to develop complications such as pneumonia, urinary tract infection, and falls, which may adversely affect stroke outcomes [6]. As a result, a substantial proportion of stroke survivors are readmitted for various reasons after the initial hospitalization. In Taiwan, the 30-day readmission rate in patients with stroke was around 10% [7,8], and the 1-year readmission rate was between 30% and 43% [7][8][9][10]. Moreover, nearly 30% of the first-year medical cost for patients with stroke was spent on readmission [9]. Readmission after stroke is therefore costly and warrants more attention.
Patients with stroke who are readmitted have greater levels of disability, a higher risk of mortality, and more medical resource utilization than those not readmitted [11,12]. At the same time, risk-standardized mortality and readmission rates are used as indicators of hospital performance [13]. Although linking these indicators to reimbursement is debatable [14], the US Centers for Medicare & Medicaid Services (CMS) has implemented a program to reduce payments to hospitals with excess readmissions [15]. Specifically, for stroke care, CMS uses a hospital-level 30-day risk-standardized all-cause readmission measure to determine payments to hospitals [16]; hospitals are thereby penalized if their risk-standardized readmission rates are higher than expected [17]. Similarly, in Taiwan, the rate of unplanned 14-day readmission for the same or a related diagnosis and the crude mortality rate are among the care quality indicators continuously monitored under Taiwan's hospital accreditation system [18]. The hospital level, which is determined by the results of hospital accreditation, in turn determines the reimbursement a hospital receives from the National Health Insurance program in Taiwan.
Thorough knowledge of the risk and causes of readmission or mortality is thus essential for both quality improvement planning and healthcare resource allocation. Because medical complications increase the risk of readmission or mortality, aggressive management and treatment of potentially modifiable complications might reduce readmissions or mortality following stroke [6,8,19]. Furthermore, early risk assessment using information available upon admission might help identify high-risk patients for targeted interventions, which could prevent avoidable readmissions, reduce mortality, improve functional outcomes, and even strengthen the financial health of hospitals.
Because machine learning (ML) techniques have been widely used in clinical decision support, this study aimed to use ML-based techniques to explore the predictive factors and to develop prediction models for readmission or mortality in patients hospitalized for stroke or transient ischemic attack (TIA). Specifically, this study compared prediction models that were developed using various ML techniques based on initial admission data from a large teaching hospital in Taiwan.

Study Population
The study hospital is a 1000-bed teaching hospital serving a city and its adjoining rural areas of approximately 500,000 inhabitants. The study population was identified from the hospital stroke registry, which enrolled consecutive patients hospitalized for ischemic stroke, hemorrhagic stroke, or TIA within 10 days of symptom onset. Ischemic stroke, hemorrhagic stroke, and TIA were defined in accordance with the criteria of the Taiwan Stroke Registry [20]. Adult patients admitted with the principal diagnosis of stroke or TIA between October 2007 and January 2016 who were discharged alive were identified. Only those who gave informed consent to participate in the stroke registry were included. Patients who were lost to follow-up at 90 days were excluded. The earliest admission during the study period was designated as the index hospitalization. The study protocol was approved by the Ditmanson Medical Foundation Chia-Yi Christian Hospital Institutional Review Board (CYCH-IRB No.104098).

Variables
The dependent variable of the dataset was a combined outcome of readmission or mortality within 90 days after discharge from the index hospitalization. While previous studies mainly focused on 30-day readmission models [21,22], around half of the first readmissions within one year after stroke occurred within 90 days [8]. The prediction of 90-day readmission after stroke has gained attention in recent years [23,24].
As a standard operating protocol for the stroke registry, each patient was interviewed in person or by telephone at 90 days. For patients who could not be interviewed because of neurological deficits or death, information was obtained from a proxy. A combined outcome was used to avoid underestimating readmission rates: patients who died of critical illness or sudden death before they could be rehospitalized might be recorded during the telephone interview as deaths rather than readmissions.
The independent variables included variables that were available upon admission, such as demographic data, initial vital signs and laboratory results, past medical history and comorbidities, treatment-seeking behavior, pre-stroke functional status as assessed using the modified Rankin Scale (mRS), and initial stroke severity as assessed using the National Institutes of Health Stroke Scale (NIHSS). Age, pre-stroke mRS, and initial NIHSS were treated as continuous variables. The frequency of prior emergency department (ED) visits within one year was categorized into 0, 1, or ≥2 visits. Prior hospitalizations within one year before the index hospitalization were categorized as yes or no. Physiological measurements and laboratory values were categorized into meaningful groups to align with clinical practice (Table S1). For example, the body mass index (BMI) status was categorized according to the local standard as follows: underweight (<18.5 kg/m²), normal (18.5-23.9 kg/m²), overweight (24-26.9 kg/m²), and obese (≥27 kg/m²) [25].
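As an illustration of the categorization rules above, the following Python sketch maps a raw BMI value and a count of prior ED visits to the categories used in this study. The function names are hypothetical, and treating each BMI band as a half-open interval (so that a value such as 23.95 kg/m² falls into the normal band) is an assumption, because the published cut-offs are given to one decimal place.

```python
def bmi_category(weight_kg: float, height_m: float) -> str:
    """Classify BMI using the local cut-offs listed in the text.

    Assumption: bands are half-open intervals, e.g. normal is [18.5, 24).
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 24.0:
        return "normal"
    if bmi < 27.0:
        return "overweight"
    return "obese"


def ed_visit_category(n_visits: int) -> str:
    """Collapse the number of ED visits in the prior year into 0, 1, or >=2."""
    if n_visits < 0:
        raise ValueError("visit count cannot be negative")
    return str(n_visits) if n_visits < 2 else ">=2"


print(bmi_category(70, 1.70))   # overweight (BMI is about 24.2)
print(ed_visit_category(3))     # >=2
```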
Class imbalance is common in health-related datasets and may distort the performance evaluation of ML methods, because classifiers tend to favor the majority class. Cost-sensitive learning and data resampling have therefore been widely used to address this problem [26,27]. This study implemented several resampling methods, including undersampling, oversampling, the synthetic minority oversampling technique (SMOTE), and class weighting, to investigate their effect on classifier performance. Specifically, the SpreadSubsample, Resample, SMOTE, and ClassBalancer filters in Weka were used.
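The four strategies can be illustrated with a plain-Python sketch. These functions are hypothetical stand-ins written for exposition; they are not the Weka filters used in the study and do not reproduce their exact sampling behavior or options. class_balancing_weights follows the idea behind ClassBalancer (every class receives the same total weight while the overall weight is preserved), and smote is a simplified variant that assumes purely numeric features, whereas many variables in this study are categorical.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Return {label: [rows]} preserving input order within each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def class_balancing_weights(y):
    """Instance weights giving every class the same total weight,
    while the weights still sum to the number of instances."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return [n / (k * counts[label]) for label in y]


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class is as small as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    m = min(len(rows) for rows in groups.values())
    pairs = [(row, c) for c, rows in groups.items() for row in rng.sample(rows, m)]
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [c for _, c in pairs]


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows (with replacement) so every class
    is as large as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    m = max(len(rows) for rows in groups.values())
    pairs = []
    for c, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(m - len(rows))]
        pairs.extend((row, c) for row in rows + extra)
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [c for _, c in pairs]


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE for numeric features: each synthetic point lies on the
    segment between a random minority row and one of its k nearest minority
    neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
print(Counter(random_undersample(X, y)[1]))  # two rows per class
```

In a nested cross-validation design, such transformations are normally applied only to the training folds, so that the test folds keep the original class distribution.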

Experiments
Experiments were conducted using Python 3.7 with the python-weka-wrapper3 package (version 0.1.7) on macOS 10.15. The scripts can be downloaded from the Supplementary Material. Figure 1 illustrates the process of classifier building. Two-layer nested cross-validation was used to build and evaluate the classifiers. In the outer loop, the dataset was split by stratified random sampling into 10 folds; each fold served once as the holdout test set while the remaining nine folds formed the training set (a 9:1 ratio). This splitting was repeated three times with different random seeds, generating 30 training and test set pairs. In the inner loop, another 10-fold cross-validation on each training set was used to find the optimal hyperparameters (Table 1). The classifiers with the optimal hyperparameters were then tested on the holdout test set, and the evaluation metrics from the 30 training and test set pairs were averaged to estimate the generalization performance of the classifiers. This approach keeps the training, validation, and evaluation data completely separate.
Figure 1. The process of classifier building. In the outer loop, 10-fold cross-validation (CV) was used to estimate the performance of the classifiers. In the inner loop, another 10-fold CV was used to find the optimal hyperparameters.
Feature selection is a common ML technique that can improve training efficiency, result comprehensibility, and prediction performance. During the experiments, feature selection was therefore applied to the training sets, mainly to filter out redundant or irrelevant features from the original data and build a more explainable model. In this study, correlation-based feature subset selection (the CfsSubsetEval module in Weka) with the BestFirst search method was used to perform the feature selection procedure.
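The nested cross-validation scheme can be sketched in plain Python as follows. The helper names, the inner-loop seed offset, and the fit_and_score callback (which would train a model with a given hyperparameter setting on one set of row indices and return a metric such as the AUC on another) are hypothetical; the study itself ran Weka classifiers through python-weka-wrapper3. The sketch only reproduces the splitting logic: 10 stratified outer folds repeated with three seeds (30 training and test set pairs) and 10 stratified inner folds for hyperparameter selection.

```python
import random
from collections import defaultdict


def stratified_folds(indices, labels, k, seed):
    """Split `indices` into k folds whose class mix mirrors the full set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):  # fixed class order keeps runs reproducible
        members = by_class[label]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[(offset + j) % k].append(i)
        offset += len(members)  # continue the round-robin so fold sizes stay even
    return folds


def nested_cv_splits(labels, k_outer=10, k_inner=10, repeats=3):
    """Yield (train, test, inner_folds): k_outer folds for each of `repeats` seeds."""
    all_idx = list(range(len(labels)))
    for seed in range(repeats):
        outer = stratified_folds(all_idx, labels, k_outer, seed)
        for f, test in enumerate(outer):
            train = [i for g, fold in enumerate(outer) if g != f for i in fold]
            inner = stratified_folds(train, labels, k_inner, seed + 1000)
            yield train, test, inner


def nested_cv(labels, candidates, fit_and_score, **split_kwargs):
    """Pick hyperparameters by inner CV, then score once on each outer test fold.

    fit_and_score(params, fit_idx, eval_idx) is a hypothetical callback that
    trains on fit_idx and returns a metric (e.g. AUC) computed on eval_idx.
    """
    outer_scores = []
    for train, test, inner in nested_cv_splits(labels, **split_kwargs):
        def inner_score(params):
            total = 0.0
            for v, val_fold in enumerate(inner):
                fit_idx = [i for u, fold in enumerate(inner) if u != v for i in fold]
                total += fit_and_score(params, fit_idx, val_fold)
            return total / len(inner)

        best = max(candidates, key=inner_score)
        outer_scores.append(fit_and_score(best, train, test))
    return outer_scores  # average these to estimate generalization performance
```

Because the hyperparameters are chosen using only the inner folds of each training set, no outer test fold influences model selection, which is the separation of training, validation, and evaluation data described above.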
A correlation-based feature selection method evaluates the correlations between feature subsets and the dependent variable. The optimal feature subset contains features that are highly correlated with the dependent variable but uncorrelated with each other [28]. The best-first search strategy was used to find an optimal feature subset in the feature space. This approach does not specify a threshold for determining the most important features; instead, the search terminates when it reaches the limit on the number of fully expanded subsets that result in no improvement [28].
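The CFS criterion and a best-first search over feature subsets can be sketched as below. For a subset of k features, the merit follows the standard CFS heuristic, k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|), where r_cf are feature-outcome correlations and r_ff are feature-feature correlations. As simplifying assumptions, the sketch measures association with the absolute Pearson correlation (Weka's CfsSubsetEval handles nominal attributes with other measures), searches forward only, and stops after a fixed number of consecutive expansions without improvement (5 here), which is meant to resemble, not reproduce, Weka's BestFirst termination setting. All names are hypothetical.

```python
import heapq
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cfs_merit(subset, features, target):
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def best_first_forward(features, target, max_stale=5):
    """Forward best-first search that always expands the open subset with the
    highest merit; stops after `max_stale` consecutive expansions that do not
    improve on the best merit found so far."""
    names = sorted(features)
    open_list = [(0.0, ())]  # (negated merit, subset) so heapq pops the best first
    visited = {()}
    best, best_merit = (), 0.0
    stale = 0
    while open_list and stale < max_stale:
        _, subset = heapq.heappop(open_list)
        improved = False
        for name in names:
            if name in subset:
                continue
            child = tuple(sorted(subset + (name,)))
            if child in visited:
                continue
            visited.add(child)
            merit = cfs_merit(list(child), features, target)
            heapq.heappush(open_list, (-merit, child))
            if merit > best_merit + 1e-12:  # tolerance ignores float noise
                best, best_merit, improved = child, merit, True
        stale = 0 if improved else stale + 1
    return list(best), best_merit


# Toy data: x2 duplicates x1 (redundant), noise is unrelated to the outcome.
y = [0, 0, 0, 0, 1, 1, 1, 1]
x1 = [0.1, 0.2, 0.0, 0.3, 1.0, 0.9, 1.2, 1.1]
data = {"x1": x1, "x2": list(x1), "noise": [1, -1, 1, -1, 1, -1, 1, -1]}
print(best_first_forward(data, y))  # a single informative feature is kept
```

The toy example shows the behavior described in the text: adding a redundant copy of an informative feature does not raise the merit, so the search keeps only one of them, and no explicit importance threshold is needed.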

Evaluation Metrics and Statistical Analysis
Several metrics were calculated to evaluate the performance of classifiers built using different ML techniques and resampling methods. True positives (TP) indicate the number of patients correctly classified as having the outcome whereas true negatives (TN) mean the number of patients correctly classified as not having the outcome. False positives (FP) indicate the number of patients incorrectly classified as having the outcome while false negatives (FN) mean the number of patients incorrectly classified as not having the outcome. Accuracy is (TP + TN) / (TP + TN + FP + FN). Sensitivity is TP / (TP + FN) whereas specificity is TN / (TN + FP). The receiver operating characteristic (ROC) curve displays the full picture of the trade-off between sensitivity and specificity by plotting sensitivity as a function of (1 − specificity) for all possible thresholds. The performance of classifiers was compared according to the area under the ROC curve (AUC).
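The metric definitions above translate directly into code. This sketch (function names hypothetical) computes accuracy, sensitivity, and specificity from binary predictions, and computes the AUC through its probabilistic interpretation: the probability that a randomly chosen patient with the outcome receives a higher score than a randomly chosen patient without it, with ties counted as one half. The pairwise form is written for clarity, not speed.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 = has the outcome."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity as defined in the text."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


def auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2; equals the area under the empirical ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # one possible threshold
print(threshold_metrics(y_true, y_pred))  # (0.75, 0.5, 1.0)
print(auc(y_true, scores))                # 0.75
```

Sensitivity and specificity depend on the chosen threshold (0.5 here), whereas the AUC summarizes all thresholds at once, which is why it was used to compare classifiers.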
Continuous variables were reported as means and standard deviations or as medians and interquartile ranges, and categorical variables were reported as counts and percentages. Clinical features were compared between patient groups using chi-square tests for categorical variables and t-tests or Mann-Whitney U tests for continuous variables. The performance estimates from the 30 training and test set pairs were compared between models using paired t-tests. All statistical analyses were performed using Stata 15.1 (StataCorp, College Station, Texas). Two-tailed p values < 0.05 were considered statistically significant.
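The paired t-test used to compare models across the training and test set pairs can be illustrated with a short sketch. The study performed these tests in Stata; here the statistic is computed directly from the per-split differences, and scipy.stats.ttest_rel would return the same statistic together with a p-value. The function name and the per-split AUC values in the example are made up for illustration.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples a and b,
    e.g. per-split AUCs of two models evaluated on the same test sets."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Illustrative (made-up) AUCs of two models on the same four splits.
model_a = [0.66, 0.65, 0.67, 0.64]
model_b = [0.64, 0.64, 0.66, 0.62]
print(paired_t_statistic(model_a, model_b))
```

Pairing by split matters because both models are evaluated on the same test sets, so split-to-split variation in difficulty cancels out of the differences.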

Results
A total of 5581 eligible patients were identified from the hospital stroke registry. After excluding patients who did not give informed consent (n = 1384) and those who were lost to follow-up at 90 days (n = 775), the remaining 3422 patients comprised the study population. The study population did not differ significantly from the excluded patients in age (68.3 ± 12.6 versus 68.3 ± 12.9 years, p = 0.928), sex (female 39.9% versus 41.3%, p = 0.317), or stroke severity (NIHSS median 4, interquartile range [IQR] 2-9 versus 5, IQR 2-11, p = 0.056). Table 2 presents the demographics and stroke characteristics of the study population. Patients with the combined outcome were older, more likely to be female, and had greater stroke severity. The 90-day rate of readmission or mortality was 17.6% (602/3422). Table S1 lists the independent variables considered for building the models; most of these variables differed significantly between groups.

Important Features
Correlation-based feature selection was applied to the 30 training sets to find the optimal feature subset. Table 3 lists the features and the number of times each feature was selected across the training sets. Among them, age, prior ED visits within one year, pre-stroke functional status as assessed by the mRS, initial stroke severity as assessed by the NIHSS, BMI, consciousness level as assessed by the Glasgow Coma Scale, and use of a nasogastric tube were the most important features for predicting readmission or mortality at 90 days after stroke. These features were consistently selected by the feature selection algorithm across the 30 training sets. In addition, failed dysphagia screening, coronary artery disease, cancer, heart failure, atrial fibrillation, recent infection, prior hospitalization within one year, stage of renal dysfunction as assessed by the estimated creatinine clearance rate, and use of a Foley catheter were selected in more than 20 of the 30 feature selection runs and were also considered key features.
Table 4 gives the average AUC, sensitivity, specificity, and accuracy of the prediction models. The average AUC values across ML techniques and resampling methods are displayed as a heatmap in Figure 2. In general, the AUC values of the Naïve Bayes (NB) models (0.602-0.661) and logistic regression (LR) models (0.539-0.659) were higher than those of the other ML techniques, whereas the multilayer perceptron (MLP) models (0.545-0.563) had the lowest AUC values. Among the resampling methods, undersampling improved prediction performance for most ML techniques, whereas SMOTE resulted in lower performance for all ML techniques. Figure 3 shows the average AUC values ordered from highest to lowest. The NB models with class weighting, undersampling, no resampling (imbalanced data), and oversampling, together with the LR model with class weighting, ranked in the top five. The AUC values of the top five models did not differ significantly according to paired t-tests.
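The selection counts reported in Table 3 can be tallied as sketched below, given the feature subset chosen in each of the 30 training sets. The helper names, feature names, and counts in the example are made up for illustration; the threshold of more than 20 selections mirrors the criterion stated in the text.

```python
from collections import Counter


def selection_frequency(selected_subsets):
    """Count in how many runs each feature was selected (once per run at most)."""
    return Counter(f for subset in selected_subsets for f in set(subset))


def features_selected_more_than(selected_subsets, min_runs):
    """Features chosen in more than `min_runs` runs, most frequent first."""
    counts = selection_frequency(selected_subsets)
    return sorted((f for f, c in counts.items() if c > min_runs),
                  key=lambda f: (-counts[f], f))


# Made-up selections from 30 training sets, for illustration only.
runs = [["age", "nihss", "bmi"]] * 22 + [["age", "nihss"]] * 8
print(selection_frequency(runs))
print(features_selected_more_than(runs, 20))  # ['age', 'nihss', 'bmi']
```

Counting selections across resampled training sets gives a simple stability measure: a feature chosen in every run is less likely to be an artifact of one particular data split.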

Principal Findings
By analyzing a hospital stroke registry, this study found a rate of readmission or mortality of 17.6% at 90 days after stroke or TIA. The most important features for predicting readmission or mortality included age, prior ED visits within one year, pre-stroke functional status, initial stroke severity, BMI, consciousness level, and use of a nasogastric tube. Several ML techniques were applied to build prediction models, and the NB and LR models performed better than the other models in terms of AUC. The best model, the NB model with class weighting, achieved an AUC of 0.661. Although data resampling was expected to improve prediction performance, not all resampling methods performed equally well; the undersampling method improved prediction performance for most ML techniques.

Comparisons with Past Studies
Previous studies that examined 90-day readmissions after stroke have found readmission rates ranging from around 18% to 26% [12,24,29,30], even though the inclusion criteria and outcome measures varied slightly across studies. The rate of readmission or mortality in this study was similar to those in previous reports. However, this study differed from previous readmission models in that only variables available upon stroke admission were used to build the prediction models.
Several prior studies have investigated readmissions in patients with stroke admitted to post-acute care or inpatient rehabilitation facilities [21,22,31]. Even though their predictor variables were more informative and generally included past medical history, comorbidities, complications during acute care, and factors related to post-acute care [21,22,31], their AUC values were not much different from those in this study. For example, a large retrospective study of 803,124 patients with stroke using inpatient rehabilitation facility functional outcome data achieved AUC values ranging from 0.553 to 0.694 in predicting 30-day readmissions [21]. In other words, predicting readmission after stroke is not a trivial task, probably because various clinical and social characteristics of patients are associated with readmission and these characteristics are not always routinely collected in clinical databases [32].
In addition to readmission models for stroke, a systematic review found that readmission models for various disease populations based on administrative data, clinical data, or both generally performed inadequately [33]. Those tested in large populations had a particularly poor discriminative ability with AUCs between 0.55 and 0.65. A study that developed readmission models for heart failure also reported that the use of ML algorithms did not improve prediction performance compared with traditional statistical models [34].

Clinical Implications and Applications in Real-World Settings
The mechanisms underlying readmission are complex and remain incompletely understood. In addition to physiological factors and medical conditions, a variety of psychological, social, and economic factors may interact to cause readmission. As shown in Table 3, variables related to socioeconomic status, such as education and occupation, and variables regarding social support, such as ED arrival mode and the patient's main caregiver, were associated with the risk of readmission to varying degrees. Moreover, without adequate hospital discharge planning and transitional care interventions, patients may be readmitted after discharge from acute stroke care even though their medical conditions have been properly treated. For example, family members may lack the training, skills, and support services needed to care for disabled stroke survivors and may therefore bring patients back to the hospital. Adequate preparation and support for the transition from acute stroke care to home may be required to reduce readmissions [24].
Previous studies have shown that a substantial proportion (up to 12.9%) of readmissions were potentially preventable [16,35]. Hospital discharge planning has the potential to prevent readmissions [36], and one of its key steps is to identify patients at risk of readmission, preferably early in the admission. The prediction models in this study were developed using only variables available upon admission and can therefore be used to estimate the probability of readmission soon after patients are admitted to the hospital. A clinical decision support system employing these ML prediction models could help clinicians identify patients likely to experience readmission at the time of hospital admission, enabling early targeted interventions for patients at high risk. These predischarge interventions should be tailored to the individual patient according to the estimated risk of readmission and may include patient needs assessment, patient education, medication reconciliation, arrangement of early outpatient follow-up, and referrals to home health rehabilitation [37][38][39]. Furthermore, after hospital discharge, at-risk patients identified by the prediction models can be closely monitored so that emerging problems are detected in time to prevent a return to the hospital [40].

Future Directions
Several approaches may be attempted to improve the performance of prediction models for readmission. First, high-dimensional information in administrative claims data may be used to supplement clinical information in the development of prediction models. Readmission models for patients with chronic pancreatitis using standardized billing codes and basic patient characteristics performed reasonably well, with AUCs ranging from 0.65 to 0.73 [41]. Second, in addition to structured clinical information, significant determinants of readmission, such as social factors, can be extracted from clinical notes through natural language processing [42] and then used to build prediction models. Third, other ML algorithms can be explored. For example, extreme gradient boosting models were shown to have higher AUCs than traditional statistical models in predicting 90-day readmissions due to recurrent ischemic events in patients with ischemic stroke [23].

Limitations
First, this is a single-hospital study, so the findings should be generalized with caution. Second, about 39% (2159/5581) of patients were excluded from the analysis because they declined informed consent or lacked 3-month data. However, selection bias is less likely to be a major concern because the study population did not differ significantly from excluded patients in age, sex, and stroke severity. Third, the dependent variable of interest, readmission or mortality, was obtained through interviews with patients or their proxies; recall bias might therefore have caused underestimation of the outcome. Fourth, this study did not use variables related to the process of acute care or post-acute care hospitalizations to develop the prediction models, which may have limited prediction performance. On the other hand, prediction models using only information available upon admission might be advantageous for delivering early targeted interventions to patients at high risk of readmission or mortality.

Conclusions
This study developed ML-based models to predict readmission or mortality in patients hospitalized for stroke or TIA. Several important predictive factors for readmission or mortality were identified, including age, prior ED visits within one year, pre-stroke functional status, initial stroke severity, BMI, consciousness level, and use of a nasogastric tube. Various resampling methods were implemented to balance the class distribution; however, they did not always improve predictive performance. The NB model with class weighting to compensate for class imbalance achieved the highest prediction performance in terms of the AUC. Although the prediction models did not have high discriminatory capacity, they could be useful for identifying patients at high risk for readmission or mortality immediately after admission, enabling early discharge planning and transitional care interventions. Future studies may explore previously unrecognized predictive factors in clinical text or high-dimensional electronic medical records.