Prediction of Poststroke Depression Based on the Outcomes of Machine Learning Algorithms

Poststroke depression (PSD) is a major psychiatric disorder that develops after stroke; however, whether PSD treatment improves cognitive and functional impairments is not clearly understood. We reviewed data from 31 subjects with PSD and 34 age-matched controls without PSD; all subjects underwent neurological, cognitive, and functional assessments, including the National Institutes of Health Stroke Scale (NIHSS), the Korean version of the Mini-Mental State Examination (K-MMSE), computerized neurocognitive test (CNT), the Korean version of the Modified Barthel Index (K-MBI), and functional independence measure (FIM), at admission to the rehabilitation unit in the subacute stage following stroke and 4 weeks after the initial assessments. Machine learning methods (support vector machine, k-nearest neighbors, random forest, and voting ensemble models) and statistical analysis using logistic regression were applied. PSD occurrence was predicted using a support vector machine with a radial basis function kernel (area under the curve (AUC) = 0.711, accuracy = 0.700), and PSD prognosis was predicted using a linear support vector machine (AUC = 0.830, accuracy = 0.771). The statistical method did not have a better AUC than the machine learning algorithms. We concluded that the occurrence and prognosis of PSD in stroke patients can be predicted effectively based on patients’ cognitive and functional statuses using machine learning algorithms.


Introduction
Poststroke depression (PSD) is one of the most common psychiatric disorders in stroke patients [1,2]. Its incidence ranges from 10 to 52%, depending on subject selection and diagnostic criteria [2–6]. The pathophysiology of PSD is not well understood, although it might be associated with the secondary effects of psychological distress and cognitive impairment rather than with the stroke itself [7]. The relationship between PSD and cognitive impairment has been widely investigated. Cognitive impairment is known to be one of the major predictors of PSD [6,8,9]; moreover, PSD is associated with a greater degree of cognitive impairment and has a negative impact on the activities of daily living (ADL) that are integral to recovery [4,5,10,11]. Given the strong relationship between PSD and cognitive impairment, reductions in PSD symptoms might also enhance patients' cognitive and functional recovery [12], although the treatment effect remains unclear. PSD is also associated with functional impairment: rehabilitation is less effective, and hospital stays are longer, in PSD patients than in stroke patients without depression [13,14]. Some studies have reported that treatment for PSD also leads to cognitive and functional improvements [4,12]. However, others have found no improvement in cognitive impairment despite significant reductions in depressive symptoms in stroke patients [15,16].
Machine learning (ML) algorithms are widely used in the medical field for the prediction of disease diagnosis and treatment outcomes. Some recent studies reported that ML methods are more effective in predicting stroke outcomes than statistical methods or scoring systems [17–19]. In addition, various psychiatric disorders after stroke, including depression, were predictable using ML algorithms [20].
Some predictors for the treatment outcome of PSD were analyzed statistically in a previous study [21]; however, no studies have used an ML algorithm to predict PSD in stroke patients in combination with the treatment outcome of PSD based on comprehensive cognitive and functional analysis.
In this study, we aimed to use various ML algorithms to predict the occurrence and prognosis of PSD in stroke patients based on their cognitive and functional status and to evaluate whether ML algorithms are superior to statistical methods.

Subjects
A total of 623 patients who had a first-ever hemorrhagic or ischemic stroke were reviewed, and 31 patients who were diagnosed with PSD on admission were finally included (Figure 1). PSD was confirmed by psychiatrists using the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) criteria for major depressive disorder [22]. Patients with any of the following conditions were excluded: preexisting major depressive disorder, dementia, Parkinson's disease, any other brain lesions, no follow-up psychiatric assessment of the progress of PSD, or no follow-up computerized neurocognitive test (CNT). Over the course of 4 weeks following their diagnosis, all PSD patients received psychiatric and medical treatments, including antidepressants (i.e., escitalopram, amitriptyline, or fluoxetine). A total of 34 age-matched patients who had a first-ever stroke without PSD were recruited as controls, and their cognitive and functional changes were compared with those of PSD patients.

Measurements
Basic characteristics including medical and family history, education period, and smoking habits were documented, and neurological, cognitive, and functional tests were performed on all subjects, including those in the control (n = 34) and PSD (n = 31) groups, at admission to a rehabilitation unit in the subacute stage following stroke and 4 weeks after initial assessments.

Neurological Assessment
A neurological assessment scored on the National Institutes of Health Stroke Scale (NIHSS) [23] was performed. In addition, the type of stroke, which was classified as hemorrhagic or ischemic, and the laterality of stroke were evaluated by magnetic resonance imaging.

Psychological Assessment
When subjects were admitted to a rehabilitation unit, a rehabilitation doctor administered the Hamilton Rating Scale for Depression, a test commonly used to screen for and identify depression [24], with a cutoff score of 10 [3]. Subsequently, a psychiatrist interviewed the selected patients and diagnosed major depressive disorder according to the DSM-IV criteria (five or more of nine depressive symptoms, at least one of which is depressed mood or loss of interest or pleasure, present for at least two weeks) [22,25]. All of the PSD patients were followed up 4 weeks later by the same psychiatrist to assess any improvement in their depressive symptoms. According to the psychiatrist's follow-up results, PSD patients were divided into improved (Imp) and not improved (NoImp) groups (Figure 1).

Cognitive Assessments
The Korean version of the Mini-Mental State Examination (K-MMSE) and the CNT were administered to all subjects at admission to a rehabilitation unit and again 4 weeks after the initial assessment. The K-MMSE was modified from the original MMSE [26,27] by Kang et al. [28] and consists of 11 questions in the following five categories: orientation to place and time, registration, recall, attention and calculation, and language and complex commands; the total score ranges from 0 to 30. The CNT consists of 20 items in 4 major subtests: visual memory, language memory, visual perception, and language perception [29]. Each item is scored as a T-score ranging from 27 to 80, so the total score over the 20 items ranges from 540 to 1600.

Functional Assessments
The severity of impairment in ADL performance was evaluated using the Korean version of the Modified Barthel Index (K-MBI) and functional independence measure (FIM) for all subjects at admission to a rehabilitation unit and again 4 weeks after the initial assessment. The K-MBI is composed of 10 items: hygiene (grooming), bathing, eating, toileting, stair-climbing, dressing, bowel control, bladder control, toilet transfer, and ambulation; scores range from 0 (completely dependent) to 100 (independent in basic ADL) [30,31]. The FIM is composed of 18 items divided into six areas: self-care, sphincter control, transfer, locomotion, communication, and social cognition. Scores range from 18 (completely dependent) to 126 (independent in basic ADL) [32].

ML Analysis
We used two sets of data: control vs. PSD and Imp vs. NoImp (Figure 2). Feature selection started from the baseline characteristics and the initial cognitive and functional data. Input features were then eliminated one at a time, starting with the least important, which is known as the wrapper method. It was used to select the set of features that gave the best prediction based on feature importance, and it improved the average accuracy of the SVM model from 53.21% to 60.61%. Prediction models for PSD occurrence and prognosis were developed retrospectively from all subjects using five ML models: linear support vector machine (SVM_L), support vector machine with a radial basis function (RBF) kernel (SVM_R), k-nearest neighbors (KNN), random forest (RF), and a voting ensemble (VE). The selected predictors were analyzed by each ML algorithm with 5- and 10-fold cross-validation, and model performance was assessed using accuracy and receiver operating characteristic (ROC) curves. In 10-fold cross-validation, the data were randomly shuffled and partitioned into 10 groups, each of which was used once as the test set while the remaining nine were used for training. A total of 13 randomly selected test datasets were used for the prediction of PSD occurrence and prognosis. Decision tree classifiers are commonly used to provide a descriptive representation of a classifier. The inner nodes of the decision tree represent features, and the branches represent decision rules. In this paper, we used a decision tree classifier from the scikit-learn package with Gini impurity as the partitioning criterion [33].
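The model comparison described in this section can be sketched with scikit-learn, the package cited for the decision tree. This is a minimal illustration on synthetic data, not the study's code or data. The hyper-parameters, the scaling step, the soft-voting rule, and the placement of recursive feature elimination inside each cross-validation fold are all assumptions made for the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a 65-subject feature table (NOT the study data).
X, y = make_classification(n_samples=65, n_features=20, n_informative=5,
                           random_state=0)


def base_models():
    """Four base classifiers; all hyper-parameters are illustrative defaults."""
    return {
        "SVM_L": SVC(kernel="linear", probability=True, random_state=0),
        "SVM_R": SVC(kernel="rbf", probability=True, random_state=0),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    }


def build_pipelines(n_features=5):
    """Wrap each model with scaling and linear-SVM-based feature elimination."""
    models = base_models()
    # Soft voting (averaged class probabilities) is an assumption; the paper
    # does not state which voting rule was used.
    models["VE"] = VotingClassifier(list(base_models().items()), voting="soft")
    pipelines = {}
    for name, clf in models.items():
        # RFE drops the least important feature one step at a time.
        selector = RFE(SVC(kernel="linear"), n_features_to_select=n_features,
                       step=1)
        pipelines[name] = make_pipeline(StandardScaler(), selector, clf)
    return pipelines


def evaluate(X, y, n_splits):
    """Mean cross-validated AUC per model for a given number of folds."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return {
        name: cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc").mean()
        for name, pipe in build_pipelines().items()
    }


results = {k: evaluate(X, y, k) for k in (5, 10)}
for k, scores in results.items():
    for name, auc in scores.items():
        print(f"{k}-fold {name}: mean AUC = {auc:.3f}")
```

Running feature elimination inside the pipeline means the features are reselected on each training fold, which keeps the test fold out of the selection step. Selecting features once on the full dataset is simpler but can inflate the cross-validated scores.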

Statistics
Statistical analyses were performed using PASW Statistics 18 for Windows (IBM Corp., New York, NY, USA). The Shapiro-Wilk test was performed to assess the normality of all numerical data from each group. The likelihood ratio was obtained to analyze baseline categorical data, such as sex and the cause and laterality of stroke on initial assessment. The Mann-Whitney U test was used to compare age, educational period, NIHSS score, and time since stroke between the control and PSD groups and between the Imp and NoImp groups. The Mann-Whitney U test was also used to compare the initial, follow-up, and gain values of the K-MMSE, CNT, K-MBI, and FIM total and subtest scores between the same pairs of groups. A Wilcoxon signed-rank test was conducted to compare the initial values with the follow-up values of the K-MMSE, CNT, FIM, and K-MBI total and subtest scores within the same subjects in the control and PSD groups. Spearman rank correlation analysis was performed on the parameters that differed significantly in the Mann-Whitney U test between the control and PSD groups and between the Imp and NoImp groups. Logistic regression analysis was performed, and significant variables for the prediction of PSD (control vs. PSD groups) and PSD prognosis (Imp vs. NoImp) were determined using the forward Wald method. Predictive performance was assessed by the area under the ROC curve (AUC) with its 95% confidence interval (CI), sensitivity, and specificity. Statistical significance was set at p < 0.05.
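To make the performance measures concrete, the following sketch computes AUC, sensitivity, specificity, and accuracy from predicted scores in plain Python. The labels, scores, and the 0.5 cutoff are invented toy values, not study data. The AUC is computed as the proportion of positive-negative pairs that are ranked correctly, which is the Mann-Whitney U statistic divided by the number of pairs.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(labels, scores, cutoff=0.5):
    """Sensitivity, specificity, and accuracy at a given score cutoff."""
    preds = [1 if s >= cutoff else 0 for s in scores]
    tp = sum(1 for lab, p in zip(labels, preds) if lab == 1 and p == 1)
    fn = sum(1 for lab, p in zip(labels, preds) if lab == 1 and p == 0)
    tn = sum(1 for lab, p in zip(labels, preds) if lab == 0 and p == 0)
    fp = sum(1 for lab, p in zip(labels, preds) if lab == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }


# Toy example: 1 = PSD, 0 = control; scores are hypothetical model outputs.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

toy_auc = auc_from_scores(toy_labels, toy_scores)   # 15 of 16 pairs -> 0.9375
toy_metrics = confusion_metrics(toy_labels, toy_scores)
print(toy_auc, toy_metrics)
```

The AUC does not depend on the cutoff, whereas sensitivity and specificity do, so a model with a reasonable AUC can still show low sensitivity or specificity at a particular cutoff.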

Baseline Characteristics
Among the 31 PSD patients, depressive symptoms had improved in 13 patients (41.9%) 4 weeks after their initial assessment. The number of positive depressive symptoms on the nine-item depression module based on the DSM-IV [34] did not differ between the NoImp and Imp groups initially (6.06 ± 1.39 and 6.69 ± 1.60, respectively), but the Imp group showed fewer depressive symptoms at follow-up (6.67 ± 1.53 and 2.08 ± 2.25 for the NoImp and Imp groups, respectively; Table 1). There were no differences in age, sex, educational period, time from stroke onset to initial evaluation, type and laterality of stroke, NIHSS score, family history, history of mental disorder, smoking years, or history of diabetes mellitus or hypertension between the control and PSD groups, whereas the educational period differed between the NoImp and Imp groups (Table 1).
Note: Values are presented as the number of subjects (%) or mean ± standard deviation. 1 p values between participants in the NoImp and Imp groups determined by the Mann-Whitney U test or the likelihood ratio; 2 p values between participants in the control and PSD groups determined by the Mann-Whitney U test or the likelihood ratio; * p < 0.05. Abbreviations: PSD = poststroke depression; NoImp = PSD patients with no symptom improvement; Imp = PSD patients with improvement in their symptoms; NIHSS = National Institutes of Health Stroke Scale.

Cognitive and Functional Analysis Using Statistical Methods
Initial cognitive status, expressed as the total K-MMSE score (14.0 ± 8.4 and 14.1 ± 7.7 for the control and PSD groups, respectively) and the total CNT score (419.1 ± 180.1 and 390.1 ± 127.8, respectively), did not differ between the groups (Table 2). Initial functional status based on the total scores of the K-MBI (25.7 ± 25.2 and 19.8 ± 15.5 for the control and PSD groups, respectively) and FIM (46.3 ± 22.9 and 44.1 ± 15.5, respectively) also did not differ between the groups. During the 4-week follow-up period, the K-MMSE, K-MBI, and FIM total scores improved in both the control and PSD groups without differences between groups, but the CNT total score did not change from the initial values (Table 2). The follow-up total K-MBI and FIM scores were lower in the PSD group (40.1 ± 19.7 and 59.1 ± 17.5, respectively) than in the control group (46.5 ± 28.6 and 64.7 ± 28.2, respectively). Among PSD patients, follow-up K-MMSE total scores were higher in the Imp group (21.6 ± 6.8) than in the NoImp group (16.1 ± 8.2) (Table 2).
The detailed scores of the K-MMSE were divided into five categories. All categories except registration improved from their initial values at follow-up, but there was no difference between the control and PSD groups (Table 3). Among PSD patients, the recall subscore at follow-up was higher in the Imp group (2.8 ± 2.2) than in the NoImp group (0.9 ± 1.0) (Table 3). Among the 12 subtests of the CNT, the initial visual attention omission error score was lower in the PSD group (30.5 ± 12.0) than in the control group (38.2 ± 18.2), and the gain scores of auditory attention correct time standard deviation (SD) and visual attention commission error in the PSD group (5.9 ± 12.9 and 5.4 ± 15.5, respectively) differed from those in the control group (−7.5 ± 21.7 and 9.3 ± 13.9, respectively) (Table 4). The follow-up scores of most categories, except auditory and visual attention omission error, improved in the control group, whereas the follow-up scores of only four subtests (digit span backward (DSB) language memory, auditory attention correct response, auditory attention commission error, and visual attention correct response) improved in the PSD group (Table 4).
Among the 10 items of the K-MBI, the initial subscore of dressing was significantly lower in the PSD group (2.0 ± 1.9) than in the control group (3.4 ± 2.7). In addition, the follow-up subscores of bathing, toileting, stair-climbing, dressing, bladder control, transfer, and ambulation, and the gain subscore of bladder control, were lower in the PSD group than in the control group (Table 5). However, there was no difference in the initial, follow-up, or gain subscores of any item between the Imp and NoImp groups (Table 5).
Among the six areas of the FIM, the follow-up subscores of self-care, transfer, and locomotion were significantly lower in the PSD group (17.8 ± 7.3, 8.3 ± 3.9, and 4.3 ± 3.3, respectively) than in the control group (23.2 ± 10.2, 12.2 ± 5.5, and 5.9 ± 3.7, respectively). In addition, none of the initial, follow-up, or gain subscores differed significantly between the Imp and NoImp groups (Table 6).
In the correlation analysis between featured parameters, a strong correlation was observed between the total scores and subscores of the functional tests (K-MBI and FIM) in the control and PSD groups (Supplementary Table S1), and between the total scores and subscores of the cognitive tests (K-MMSE and CNT) in the Imp and NoImp groups (Supplementary Table S2).
Then, logistic regression analysis was performed. Two parameters were included for the comparison of the control and PSD groups: the initial subscore of auditory attention omission error on the CNT (CNT1_AA OE) and the initial subscore of bathing on the K-MBI (MBI1_Bat). One parameter, educational period, was included for the comparison of the Imp and NoImp groups. The AUC and accuracy for classifying the control and PSD groups were 0.706 and 0.696, respectively, and those for classifying the Imp and NoImp groups were 0.797 and 0.778, respectively (Table 7).

ML Analysis
The featured parameters were selected through a wrapper method, and each parameter was ranked by importance using recursive feature elimination based on a linear support vector machine (Table 8). Five featured parameters (MBI1_Amb, CNT1_AA OE, MMSE1_Rec, MBI1_Dre, FIM1_Loc) for the prediction of PSD (control vs. PSD groups) and five parameters (MBI1_Bla, MBI1_Bow, FIM1_Tra, CNT1_VM VSB, FIM1_Com) for the prediction of PSD prognosis (Imp vs. NoImp) were used for ML analysis. When the five ML algorithms (SVM_L, SVM_R, KNN, RF, and VE) with 5- or 10-fold cross-validation were compared, SVM_R with 10-fold cross-validation showed the best AUC for the prediction of PSD occurrence (0.711), and SVM_L with five-fold cross-validation showed the best AUC for the prediction of PSD prognosis (0.830) (Table 9). The accuracies of SVM_R with 10-fold cross-validation for the prediction of PSD occurrence and SVM_L with five-fold cross-validation for the prediction of PSD prognosis were 0.700 and 0.771, respectively; the accuracies obtained with various hyper-parameters are given in Supplementary Table S3A,B. SVM_L with 10-fold cross-validation showed the best sensitivity (0.775) for the prediction of PSD occurrence and the best sensitivity (0.650) and specificity (0.950) for the prediction of PSD prognosis (Table 9).
The ROC curves of each ML algorithm and of the logistic regression analysis are shown in Figure 3. The mean AUC of SVM_R with 10-fold cross-validation for the prediction of PSD occurrence was 0.71 ± 0.12, which was comparable to that of logistic regression analysis (0.71 ± 0.07). The mean AUC of SVM_L with five-fold cross-validation for the prediction of PSD prognosis was higher than that of logistic regression analysis (0.83 ± 0.14 vs. 0.80 ± 0.09) (Figure 3).
Table 9. AUC, accuracy, sensitivity, and specificity of each ML algorithm.

Decision-Making Model for the Prediction of PSD Occurrence and Prognosis
Decision tree classification models for the prediction of PSD occurrence were created using the initial values of the featured parameters shown in Table 8 (Figure 4). In this analysis, a tree is constructed recursively: at each node, the data are partitioned on one featured parameter and a threshold value, until each leaf assigns a label (control or PSD) that can support a final decision on PSD in clinical use. The initial subscore of auditory attention omission error on the CNT was identified as the first single discriminator between the control and PSD groups (Figure 4). Initial ADL and locomotor functions, such as dressing and transfer on the K-MBI, were also important discriminators for predicting PSD occurrence (Figure 4).
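How a decision tree picks its first discriminator under the Gini criterion can be illustrated with a short plain-Python sketch. It evaluates every candidate threshold on a single feature and keeps the one with the lowest weighted Gini impurity. The feature values and labels are invented toy data, and the function names are hypothetical; scikit-learn's own implementation differs in detail.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split lowers the impurity, the threshold is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Toy feature (hypothetical scores) with perfectly separable labels.
toy_values = [20, 25, 30, 35, 40, 45]
toy_groups = ["PSD", "PSD", "PSD", "control", "control", "control"]

parent_impurity = gini(toy_groups)          # 0.5 for a balanced two-class node
split = best_split(toy_values, toy_groups)  # (32.5, 0.0)
print(parent_impurity, split)
```

A full tree repeats this search over all features at each node and recurses into the two resulting partitions; the feature chosen at the root is the first single discriminator.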

Discussion
In this study, we found that numerous cognitive and functional measures were associated with the occurrence of PSD in stroke patients (Tables 5 and 6). Recall and auditory attention omission error among the subitems of the MMSE and CNT, and dressing and locomotor function, including ambulation, among the subitems of the K-MBI and FIM, were considered important features for predicting the occurrence of PSD ((A) in Table 8). From the decision-making models, we found that initial scores of visual and auditory attention were important for the prediction of PSD occurrence. A previous study revealed that the severity of depression and the decrease in visual and auditory attention tend to be weakly correlated in PSD patients [35], and depressive patients also showed impaired visual attention omission errors in a meta-analysis [36].
PSD is frequently seen in stroke patients, and it might worsen their cognitive and functional recovery and quality of life [37]. The prevention of PSD using various nonpharmacological modalities and antidepressants has been suggested in previous studies [38]; however, the evidence is very limited, and more clinical trials are needed to confirm effective prevention methods for PSD [39]. There is no doubt that early detection and treatment of PSD can help improve a patient's prognosis; therefore, it is important to identify modifiable risk factors and apply them to stroke patients. Previous meta-analyses identified the following major risk factors, although they are still debated among researchers: previous history of mental disorders, including depression or anxiety; diabetes mellitus; cognitive impairment and functional deficits, including impairment in ADL; and other factors, such as old age, female sex, lesion location, and stroke type [40,41]. It should be noted that risk factors determined by statistical methods cannot be used directly to develop a predictive model, because they include the results, course, or complications of the target disease as well as causative factors; they serve as an explanatory model rather than a predictive one [42].
ML methods are a useful tool to overcome the limitations of statistical methods and help to find predictive models [43]. Most previous ML studies on stroke patients have focused on the prediction of stroke occurrence or outcome [44], and only one study has examined the relationship between stroke and mood disorders, including depression, apathy, and anxiety, using ML analysis [20]. In this study, we used various ML algorithms to predict PSD occurrence and prognosis for the first time, and the linear SVM and the SVM with an RBF kernel were optimal for developing a predictive model for PSD. The RBF kernel is commonly used in SVM classification; it has the advantages of the KNN algorithm and overcomes challenges of using KNN alone, such as the space complexity problem [45]. We also applied other ML algorithms, including KNN and RF, which were effective in predicting stroke occurrence and prognosis in previous studies [44]. The VE algorithm combines all the ML models used in this study in order to improve model performance; however, its AUC and accuracy were not higher than those of the linear SVM or the SVM with the RBF kernel (Table 9). Because the VE model aggregates multiple models, it inherits the weaknesses as well as the strengths of each, so a single algorithm that is superior to the others can outperform the ensemble.
Although cognitive impairment and PSD are closely connected, the effect of PSD treatment on cognitive improvement was not clear in previous studies. In one study, PSD patients whose depression improved showed greater cognitive improvement than those whose depression did not improve [46], but in another study, PSD treatment only helped to improve attention [47]. Previous studies also revealed that motor recovery is evident in treated PSD patients [48]. In our study, cognitive measures such as the visual memory visual span backward subtest of the CNT were closely related to improvement in patients with PSD ((B) in Table 8); however, further studies are needed to confirm these relationships. We found that the educational period was strongly associated with recovery from PSD in the logistic regression analysis (AUC = 0.80). In previous studies, educational level was reported to be associated with anxiety, depression [49], and poststroke cognitive impairment [50] and to be a protective factor against PSD [51,52].
The statistical method of logistic regression analysis did not show a higher AUC than the optimal ML algorithm for the prediction of PSD occurrence and prognosis (Figure 3). In the logistic regression, only one or two parameters were included for the comparison of the control and PSD groups or the Imp and NoImp groups, whereas the ML algorithms used five parameters to calculate AUC and accuracy. Sample size is important for reducing bias in regression coefficients [53], and the small sample size might have contributed to the low AUC and accuracy of the regression analysis in this study. Nevertheless, the ML algorithms in this study showed performance comparable to that of the existing statistical method and might be a suitable way to cope with a small sample size.
The linear SVM and the SVM with RBF kernel showed the best AUC among the ML algorithms for the prediction of PSD occurrence and prognosis; however, the specificity for the prediction of PSD occurrence and the sensitivity for the prediction of PSD prognosis were low, which means that these models might not accurately identify PSD among stroke patients, or PSD patients whose symptoms improved, depending on the cut-off selected. We studied 65 stroke patients, including controls without PSD, to develop ML algorithm-based prediction models of PSD occurrence and prognosis; notably, many previous studies evaluated no more than 60 stroke patients [16,54,55]. Small sample sizes can cause problems, such as poor generalization, in the field of ML. In the medical field, a small sample size often arises from an imbalance in which the number of people with the disease is smaller than the number of people without the disease. This imbalance problem can be addressed by oversampling the relatively small class [56]. In this study, we performed 5- and 10-fold stratified cross-validation to compensate for the small sample size [57]. Although the results differed between ML algorithms, no significant difference was found between 5- and 10-fold cross-validation for the prediction of PSD occurrence and prognosis (Table 9). A prospective study could provide more accurate clinical information and therefore allow better predictive models to be developed. Pertinently, some prospective studies have been designed to predict PSD, but their results were not replicable in an independent stroke group [6].
In this study, we used various ML algorithms to predict PSD occurrence and prognosis for the first time, and these algorithms performed as well as or better than the statistical method. However, further studies with larger sample sizes and longer follow-up periods are needed to establish applications for clinical use.
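The stratified cross-validation mentioned above keeps the ratio of the two classes roughly constant in every fold, which matters when each fold holds only six or seven subjects. The following plain-Python sketch shows one way to build such folds for group sizes matching this study (34 controls, 31 PSD). The function name and the round-robin assignment are illustrative assumptions, not the procedure used by the authors or by scikit-learn.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with near-equal class proportions.

    Indices of each class are shuffled and dealt round-robin across folds;
    the starting fold is carried over between classes so that leftover
    samples are spread out instead of piling up in the first folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for lab in sorted(by_class):
        indices = by_class[lab]
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds


# Group sizes as reported in the study; the labels themselves are synthetic.
group_labels = ["control"] * 34 + ["PSD"] * 31
folds = stratified_folds(group_labels, k=10)

for number, test_idx in enumerate(folds, start=1):
    n_psd = sum(1 for i in test_idx if group_labels[i] == "PSD")
    print(f"fold {number}: {len(test_idx)} subjects, {n_psd} PSD")
```

Each fold then serves once as the test set while the other nine are used for training. With 65 subjects and 10 folds, every test fold contains three or four subjects from each group.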

Conclusions
We concluded that the occurrence and prognosis of PSD in stroke patients can be predicted effectively based on cognitive and functional status using ML algorithms.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jcm11082264/s1, Table S1: Correlation analysis between featured parameters in cognitive and functional tests for control and PSD patients; Table S2: Correlation analysis between featured parameters in cognitive and functional tests for PSD patients; Table S3: Accuracy analysis using various hyper-parameters for the prediction of PSD occurrence and prognosis with 5-fold cross-validation (A), and 10-fold cross-validation (B).

Data Availability Statement:
The data presented in this study are available from the corresponding authors upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.