Predicting Long-Term Recovery of Consciousness in Prolonged Disorders of Consciousness Based on Coma Recovery Scale-Revised Subscores: Validation of a Machine Learning-Based Prognostic Index

Prognosis of prolonged Disorders of Consciousness (pDoC) is influenced by patients’ clinical diagnosis and Coma Recovery Scale-Revised (CRS-R) total score. We compared the prognostic accuracy of a novel Consciousness Domain Index (CDI) with that of clinical diagnosis and CRS-R total score, for recovery of full consciousness at 6-, 12-, and 24-months post-injury. The CDI was obtained by a combination of the six CRS-R subscales via an unsupervised machine learning technique. We retrospectively analyzed data on 143 patients with pDoC (75 in Minimally Conscious State; 102 males; median age = 53 years; IQR = 35; time post-injury = 1–3 months) due to different etiologies enrolled in an International Brain Injury Association Disorders of Consciousness Special Interest Group (IBIA DoC-SIG) multicenter longitudinal study. Univariate and multivariate analyses were utilized to assess the association between outcomes and the CDI, compared to clinical diagnosis and CRS-R. The CDI, the clinical diagnosis, and the CRS-R total score were significantly associated with a good outcome at 6, 12 and 24 months. The CDI showed the highest univariate prediction accuracy and sensitivity, and regression models including the CDI provided the highest values of explained variance. A combined scoring system of the CRS-R subscales by unsupervised machine learning may improve clinical ability to predict recovery of consciousness in patients with pDoC.


Introduction
After a coma due to severe acquired brain injury, patients can persist in prolonged (>28 days) Disorders of Consciousness (pDoC) [1,2]. The prolonged states generally include the vegetative state/unresponsive wakefulness syndrome (VS/UWS) [3], characterized by wakefulness without awareness and reflexive albeit not purposeful behavioral responses, as well as the minimally conscious state (MCS) [4], in which patients can manifest minimal and inconsistent but reproducible voluntary behaviors.
The pDoC can persist chronically, but some patients have the potential to progress to full consciousness. The management of such complex patients requires an expert multidisciplinary team [5,6]. In this context, clinicians are routinely called upon to provide the patient's family with prognostic information for decision-making and interventional planning purposes, the latter including intensity and duration of treatment [7]. Bilateral absence of the N20 cortical component on somatosensory evoked potentials is the most robust predictor of poor outcome, particularly in patients in coma and pDoC due to hypoxic-ischemic/anoxic injury [8]. Unfortunately, N20 responses are rarely recorded in the post-acute phase. A clinical diagnosis of MCS due to traumatic brain insult has been found to predict a more favorable clinical outcome relative to VS/UWS, likely owing to less severe brain damage and to a higher level of cognitive awareness/responsiveness [9–11].
Moreover, bedside neurobehavioral measures such as the Coma Recovery Scale-Revised (CRS-R) [12] total score have been found to provide high prognostic accuracy for long-term outcomes [8,13–16]. The CRS-R is the most reliable and validated clinical tool for distinguishing patients in MCS from those in VS/UWS [12,17]. It consists of 23 items organized in six subscales assessing patients' behavioral responses independently from each other on auditory, visual, motor, oromotor, communication, and arousal functions. A higher score in a single subscale can classify the patient as being in MCS or as emerging from MCS (i.e., eMCS, patients who have recovered full consciousness and demonstrate functional communication and/or functional object use) [4]. By considering only the highest subscore for the diagnostic classification of patients in MCS or eMCS, the other domains evaluated by the CRS-R that describe patients' characteristics are usually neglected. Nonetheless, it has been shown that a composite score obtained by considering the highest CRS-R subscore on every subscale can improve diagnostic accuracy [18].
Recently, a Consciousness Domain Index (CDI), computed by means of unsupervised machine learning through the combination of the CRS-R subscores, classified a large cohort of patients with pDoC into two clusters, where the visual and motor subscales were the most discriminating factors. The clustering predicted patients who showed consciousness recovery at 6 months post-injury with a higher accuracy than the clinical diagnosis [19]. The prognostic value of the CDI has only been tested in a single patient sample for predicting a short-term outcome (i.e., 6 months post-injury). The current study goals included: a) externally validating the prognostic accuracy of the CDI, with respect to patient clinical diagnosis and to CRS-R total score; and b) investigating the prediction accuracy of the CDI on long-term consciousness recovery at 6-, 12-, and 24-months post-injury.

Study Design and Patient Samples
This retrospective analysis was conducted on data from a multicenter prospective study launched by the International Brain Injury Association Disorders of Consciousness-Special Interest Group (IBIA DoC-SIG), aimed at examining the clinical evolution of a large sample of patients with pDoC as well as identifying outcome prognostic factors for these patients [14,15].
The IBIA DoC-SIG database [14] was used for the external validation of the CDI, previously internally validated on a different database from an Italian prospective study (hence "reference database") [6,19]. For both studies, inclusion criteria were: (i) age ≥ 18 years; (ii) clinical diagnosis of VS/UWS or MCS, according to standard diagnostic criteria [4]; (iii) traumatic or non-traumatic (i.e., vascular or hypoxic-ischemic/anoxic) etiology; and (iv) time post-injury from 28 days to 3 months. Exclusion criteria were: (i) previous history of acquired brain injury, psychiatric, or neurodegenerative disease; and (ii) coexisting neoplasms, severe organ dysfunction, or unstable clinical condition (e.g., hemodynamic instability or severe respiratory failure) that could directly impact the outcome. In both studies, each center collected patient demographic data (i.e., age, sex) and information about medical history (i.e., etiology and time post-injury at study entry). Within 1 week from study entry, repeated CRS-R assessments (at least three times within a 1-week period) were performed for all patients to confirm the clinical diagnosis. The CRS-R with the best total score was considered for the statistical analysis.
In the reference database, patients were followed up at 6 months post-injury, whereas in the IBIA DoC-SIG study, patients were followed up at 6, 12, and 24 months post-injury. In both databases, the clinicians at the participating centers assessed patient consciousness level and clinical diagnosis by means of the CRS-R during hospital stay or, after discharge, at home or in chronic care facilities. In the present study, the primary outcome was the recovery of full consciousness (i.e., patients in VS/UWS or MCS who progressed to eMCS) at 6, 12, and 24 months post-injury.
The demographic and clinical characteristics of patients included in the IBIA DoC-SIG database were compared with those of patients in the reference database. Numerical independent variables such as age, time post-injury, CRS-R total score, and subscores were compared by means of t-test or Mann-Whitney U test, as appropriate after normality tests. For categorical independent variables such as sex, etiology, and clinical diagnosis, χ² tests were performed.

Cluster Estimation and External Validation
Patients from the reference database were used to derive cluster properties and to deploy the unsupervised model as internally validated in a previous methodological study [19]. In that study, the number of clusters (N_clusters) that best divided the cohort was 2 (yielding the maximal Silhouette score). In the present study, we applied the following procedure:
(i) Gathering the CRS-R subscores of each patient in the reference database.
(ii) Estimating centroids with a partitional clustering algorithm (K-means++ clustering, 500 random initializations) [20] for each training fold of a five-fold cross-validation split.
(iii) Applying a twin-sample validation approach to each validation set [21], which involved conducting both the cluster training and the validation phases on the training as well as the validation sets, thus obtaining two cluster labels for each sample. The two sets of labels for the validation data were compared, yielding the twin-validation accuracy for each validation fold. These metrics allowed us to check the stability of the clustering process.
(iv) Aggregating twin-validation accuracies across folds to obtain a k-fold cross-validated twin-sample accuracy. The centroids from the fold with the minimum twin-validation error were employed.
(v) Assigning each patient in the external validation set (IBIA DoC-SIG database) to the cluster whose centroid had the minimum 6-dimensional Euclidean distance from her/his CRS-R subscores. The assigned cluster (CDI = 0 or CDI = 1) represents the CDI of that patient.
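The centroid estimation and nearest-centroid assignment described in this procedure can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation: the subscore vectors are invented toy values (ordered as auditory, visual, motor, oromotor, communication, arousal), the function names are hypothetical, only 20 initializations are used instead of 500, and the five-fold twin-sample validation is omitted. Labeling the cluster with the lower centroid sum as CDI = 0 is likewise an assumption made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two subscore vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(point, centroids):
    """Index of the nearest centroid (minimum Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: later seeds are drawn with probability
    proportional to the squared distance from the closest chosen seed."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        total = sum(d2)
        if total == 0:  # every point coincides with a seed
            centroids.append(list(rng.choice(points)))
            continue
        r, acc = rng.uniform(0, total), 0.0
        for p, w in zip(points, d2):
            acc += w
            if w > 0 and acc >= r:
                centroids.append(list(p))
                break
    return centroids

def kmeans(points, k=2, n_init=20, max_iter=100, seed=0):
    """Lloyd iterations repeated over several seedings; the solution
    with the lowest within-cluster sum of squares is returned."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = kmeans_pp_init(points, k, rng)
        for _ in range(max_iter):
            labels = [assign(p, centroids) for p in points]
            new = []
            for j in range(k):
                members = [p for p, l in zip(points, labels) if l == j]
                if members:
                    new.append([sum(col) / len(members) for col in zip(*members)])
                else:  # keep the old centroid if a cluster empties
                    new.append(centroids[j])
            if new == centroids:
                break
            centroids = new
        labels = [assign(p, centroids) for p in points]
        inertia = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
        if best is None or inertia < best[0]:
            best = (inertia, centroids)
    return best[1]

# Toy "reference database": invented CRS-R subscores, not study data.
reference = [
    (1, 0, 2, 1, 0, 1), (1, 1, 2, 1, 0, 2), (0, 0, 1, 1, 0, 1), (1, 0, 2, 2, 0, 2),
    (3, 3, 5, 2, 1, 2), (2, 4, 5, 2, 1, 2), (3, 3, 4, 1, 0, 2), (4, 4, 5, 2, 1, 3),
]
# Assumption: the cluster with the lower centroid sum is labeled CDI = 0.
centroids = sorted(kmeans(reference, k=2), key=sum)

# Step (v): an external patient takes the label of the nearest centroid.
external_patient = (2, 3, 5, 1, 0, 2)
cdi = assign(external_patient, centroids)
print("CDI =", cdi)
```

Because the centroids are fixed once estimated on the reference data, assigning a new patient requires only the six subscores and two distance computations.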
The CDI derived for the patients in the external validation set (IBIA DoC-SIG database) was compared with the clinical diagnosis and the CRS-R total score. In order to appropriately compare the features of the CDI and of the CRS-R total score, the latter was dichotomized based on the two cut-offs that have been found to provide the highest diagnostic accuracy (93%; CRS-R total score = 8) and the highest specificity for identifying MCS or eMCS (100%; CRS-R total score = 10) [22]. Specifically, the CDI was compared with:
(i) The clinical diagnosis at study entry (i.e., VS/UWS vs. MCS).
(ii) A binary CRS-R total score using 8 as cut-off (hence CRS-R 8).
(iii) A binary CRS-R total score using 10 as cut-off (hence CRS-R 10).
The comparisons were performed by means of contingency tables and χ² analyses with recovery of full consciousness at 6, 12, and 24 months post-injury as outcome. For each analysis, χ² values and univariate accuracy were estimated, as well as sensitivity and specificity.
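The univariate metrics named above can be derived from a 2 × 2 contingency table, as in the following Python sketch. The data are invented, the function names are hypothetical, and two points are assumptions rather than details taken from the study: a total score at or above the cut-off is treated as the positive prediction, and the Pearson χ² is computed without continuity correction.

```python
def dichotomize(total_score, cutoff):
    """Assumption: a CRS-R total score >= cutoff counts as a positive prediction."""
    return 1 if total_score >= cutoff else 0

def binary_metrics(predicted, recovered):
    """Accuracy, sensitivity, specificity and Pearson chi-square
    (no continuity correction) for a binary predictor and outcome."""
    tp = sum(1 for p, r in zip(predicted, recovered) if p and r)
    fp = sum(1 for p, r in zip(predicted, recovered) if p and not r)
    fn = sum(1 for p, r in zip(predicted, recovered) if not p and r)
    tn = sum(1 for p, r in zip(predicted, recovered) if not p and not r)
    n = tp + fp + fn + tn
    chi2 = n * (tp * tn - fp * fn) ** 2 / (
        (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    )
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "chi2": chi2,
    }

# Invented example: 10 patients, binary predictor vs. recovery of full consciousness.
predicted = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]
recovered = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
metrics = binary_metrics(predicted, recovered)
print(metrics)
```

The same function applies unchanged to the CDI, the clinical diagnosis, or a dichotomized total score, since each is a binary predictor of the same binary outcome.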

Multivariate Analysis
We investigated the relationships between five independent variables (i.e., CDI, clinical diagnosis, CRS-R 8, CRS-R 10, and the CRS-R total score) and the outcome at 6, 12, and 24 months post-injury by means of multivariate logistic regressions, for a total of 15 regression models (five sets of independent variables × 3 follow-ups). The confounding variables included in the regression models were patients' age, sex, time post-injury, and etiology, all recorded at study entry. The Nagelkerke pseudo-R², which indicates the amount of variability accounted for by the predictors in each regression model, was compared across models. To evaluate the discrimination ability of the logistic regression models, the area under the receiver operating characteristic curve (AuROC) was also computed.
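The two model-level metrics can be computed from the predicted probabilities of any fitted logistic regression, as sketched below in Python. The outcome vector and probabilities are invented placeholders rather than study results, and the function names are hypothetical. The Nagelkerke pseudo-R² is taken here in its standard form, i.e., the Cox-Snell R² divided by its maximum attainable value, and the AuROC is obtained as the proportion of recovered/non-recovered pairs that the model ranks correctly (ties counted as one half).

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi else math.log(1.0 - pi) for yi, pi in zip(y, p))

def nagelkerke_r2(ll_model, ll_null, n):
    """Cox-Snell R2 rescaled by its maximum so that the upper bound is 1."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell

def auroc(y, p):
    """Probability that a recovered patient receives a higher predicted
    probability than a non-recovered one (pairwise comparison)."""
    pos = [pi for yi, pi in zip(y, p) if yi]
    neg = [pi for yi, pi in zip(y, p) if not yi]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Invented outcomes and predicted probabilities (not study data).
y = [1, 1, 1, 0, 0, 0, 1, 0]
p = [0.9, 0.8, 0.35, 0.3, 0.2, 0.4, 0.7, 0.1]

n = len(y)
base_rate = sum(y) / n  # the intercept-only (null) model predicts the base rate
ll_null = log_likelihood(y, [base_rate] * n)
ll_model = log_likelihood(y, p)

r2 = nagelkerke_r2(ll_model, ll_null, n)
auc = auroc(y, p)
print(round(r2, 3), auc)
```

Comparing these two quantities across models that share the same covariates and differ only in the main predictor is what allows the predictors themselves to be ranked.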

Cohort Comparison
The IBIA DoC-SIG database included 143 patients for whom clinical diagnosis was available at each follow-up (i.e., 6, 12, and 24 months after brain injury). The study flow-chart and details on baseline characteristics of included patients are reported elsewhere [14,15].
Patients in the IBIA DoC-SIG database were found to be significantly younger than patients in the reference database (p = 0.003) and showed a significantly longer time post-injury (p < 0.001; Table 1). Etiologies were represented in significantly different proportions between the two databases (χ² = 10.701(1); p = 0.013). In particular, Bonferroni-corrected pairwise z-tests revealed a significantly higher percentage of patients with other etiologies (i.e., other than traumatic, vascular, or hypoxic-ischemic/anoxic) in the reference database (corrected p < 0.05). No significant differences were found between the two databases regarding sex, clinical diagnosis at study entry, CRS-R total score and subscores, and percentages of patients who recovered full consciousness at six-month follow-up (p > 0.05; Table 1).

Cluster Estimation and Validation
In order to select clusters for external validation, the centroids producing the minimum twin-validation error were retained for further analysis; these were the centroids derived from the third fold (see Table 2). The CDI, the clinical diagnosis, and the binary CRS-R scores were each found to be associated with the outcome at 6, 12, and 24 months (all p < 0.001), with a higher χ² value at all follow-ups when the CDI was adopted (Table 3). The univariate prediction accuracy of the CDI was stable across the follow-ups (80%), whereas a slight decrease in accuracy was found as a function of the follow-up time using the clinical diagnosis, the CRS-R 8, or the CRS-R 10 (see Figure 1). Importantly, the CDI univariate prediction accuracy was higher by 4%, 6%, and 7% for the 6-, 12-, and 24-month outcome, respectively, compared to the accuracy of the clinical diagnosis. Similarly, the prediction accuracy of the CDI was also higher than that of the CRS-R 8 and of the CRS-R 10 (see Figure 1).
As expected, the CRS-R 10 provided a higher specificity for identifying patients who recovered full consciousness with respect to the CRS-R 8 (Figure 1).

Multivariate Analysis
The CDI, the clinical diagnosis, the CRS-R 8, the CRS-R 10, and the CRS-R total score were all significantly associated with the outcome at 6, 12, and 24 months after the brain injury (all p < 0.001). In all multivariate logistic regressions, a shorter time post-injury was found to be an indicator of higher likelihood of recovery of full consciousness (p < 0.05; Table 4). In the regression model including the CDI and in the model using the CRS-R total score, a younger age was significantly associated with a good outcome at all follow-ups (p < 0.05), whereas hypoxic-ischemic/anoxic (vs. non-hypoxic-ischemic/anoxic) etiology of brain injury was associated with a poorer outcome at 12 and 24 months post-injury (p < 0.05). In the regression models including the clinical diagnosis, a younger age was significantly associated with a good outcome at 12 and 24 months, and male sex at 6 months (p < 0.05). A younger age and male sex were also significantly associated with a good outcome at all follow-ups (p < 0.05) in the regression model including the CRS-R 8. In the models including the CRS-R 10, a younger age was associated with a good outcome at 12 and 24 months (p < 0.05).
Table 4. Multivariate logistic regressions for predicting recovery of full consciousness at 6, 12, and 24 months after brain injury. Panels A, B, C, D, and E indicate regressions including the CDI, the clinical diagnosis, the CRS-R 8, the CRS-R 10, and the CRS-R total score, respectively. First, second, and third columns refer to the analysis targeting 6-, 12-, and 24-month outcomes, respectively.
The explained variance of the predictors (R²) and the AuROC were found to be higher when using the CDI than either the clinical diagnosis or the CRS-R total score (see Table 4).

Discussion
Accurately identifying patients with pDoC with a higher likelihood of recovery of consciousness is crucial when providing prognostic information to patients' relatives, other practitioners, and payors, as well as for planning tailored care and rehabilitation pathways. Prognostication of patients with pDoC remains a major challenge due to a lack of validated outcome predictors. This multicenter international study confirmed that classifying individual patients by means of a novel clinical index (i.e., the CDI) combining the CRS-R subscores improved prognostic accuracy compared to clinical diagnosis or to the CRS-R total score, as previously observed in a large cohort of patients with pDoC [19].
Notwithstanding the differences in age, time post-injury, and etiology between the present and the original validation databases, the prediction accuracy obtained by means of the CDI was stable at 80%. Importantly, the improvement of the prediction accuracy obtained by means of the CDI, compared to the clinical diagnosis, was found to be stable at approximately 5%, with a sensitivity 10% higher than for clinical diagnosis. This finding suggests that the prediction value of the CDI observed previously [19] was in fact not impacted by patient demographics and medical history but was strictly related to the behavioral findings. The CDI derives from a simultaneous multi-dimensional assessment of consciousness, balancing items derived from all CRS-R subscales. The CDI could therefore be considered an index of the level of behavioral response complexity in the multiple domains assessed by the CRS-R subscales, thus allowing stratification of patients with pDoC.
Further cross-sectional studies are needed to investigate the possible correlation between CDI and functional connectivity in networks associated with processing of cognitive functions evaluated by the CRS-R subscales. Moreover, in line with a previous study [19], in the present cohort of patients the visual, auditory, and motor CRS-R subscales were the most crucial factors to differentiate the two centroids of the CDI clusters. The visual subscale in particular was found to affect dissimilarity among clusters. This finding could likely be ascribed to the main role of the CRS-R visual subscale in detecting early signs of consciousness [5,23].
Confirming the prediction accuracy of the CDI on a multicenter external cohort of patients (i.e., IBIA DoC-SIG database) allows us to validate the prediction model's reproducibility and generalizability and to propose its use in different clinical and geographic locales as well as cohorts of patients with pDoC [24,25].
Additionally, in the univariate analysis, the prediction accuracy of the CDI was higher than that of the two cut-off values (i.e., 8 and 10) of the CRS-R total score, which have been found to be crucial for recognizing patients in VS/UWS (CRS-R total score < 8) and patients with minimal or full consciousness (CRS-R total score ≥ 10) [22]. This finding further supports the hypothesis that the higher predictive value of the CDI could be explained by the fact that the CDI considers information from single subscores, assigning a weight to each subscale, whereas the CRS-R total score can denote the overall level of responsiveness to multisensory stimulations [12].
The CDI, clinical diagnosis, and CRS-R total score, together with younger age, shorter time post-injury, male sex, and non-hypoxic-ischemic/anoxic brain injury, were independent predictors of recovery of full consciousness at a multivariate level. A younger age has been frequently associated with a higher likelihood of a good outcome in patients with pDoC [8,14,16,26], probably due to better general health conditions [27] and to a greater level of potential neuroplasticity [28]. A higher degree of brain plasticity is also usually associated with focal brain injuries (i.e., vascular or traumatic), in which spared cortical areas can support functional reorganization [29]. The association between clinical outcome and male sex might be ascribed to sex-related differences in patient care [30], as care of male patients tends to be more frequently supported by caregivers than care of female patients [15,31,32]. In addition, a shorter time after injury at study entry (i.e., at admission to an early care program for patients with pDoC) might characterize patients who spent less time in intensive care units, likely due to less severe brain injury and fewer medical and neurosurgical complications in the acute phase [14,33,34].
Finally, the improved prediction accuracy of the CDI at a multivariate level, with respect to the clinical diagnosis and to the continuous and binary CRS-R total scores, was confirmed at short (6 months), long (12 months), and very long (24 months) assessment points after the initial brain insult, suggesting that the composite of higher behaviors significantly impacted the probability of recovery of consciousness, regardless of the time after injury.
There were several study limitations worth noting. First, the analysis could not be carried out targeting the minimal improvement from VS/UWS to MCS, which has been considered as an outcome in previous longitudinal studies [14] showing the high prognostic relevance of the CRS-R total score. However, the present study aimed to externally validate a behavioral index of consciousness (i.e., the CDI) with an already described high predictive value for full recovery of consciousness [19]. Second, the CDI in the IBIA DoC-SIG database is quite consistent with the clinical diagnosis, as the majority of patients in VS/UWS (64 out of 68) were assigned a CDI of 0 and the majority of patients in MCS (60 out of 75) were assigned a CDI of 1. Similarly, in the reference database [19], agreement between CDI and clinical diagnosis was found to be 94% (Cohen's κ = 0.85). This substantial overlap may suggest that the CDI is redundant, as the index closely parallels the clinical diagnosis. However, we found a 5% increase in prognostic accuracy and a 10% increase in sensitivity compared with clinical diagnosis, which may further improve clinical ability to identify patients with a high probability of recovery of full consciousness. Furthermore, due to the small sample size, we could not investigate the prognostic value of the CDI within each diagnostic group (i.e., VS/UWS and MCS) by evaluating patients for whom the CDI and the clinical diagnosis differed. However, the CDI is a novel index that could be applied for stratifying patients with pDoC regardless of clinical diagnosis. Further cohort studies are needed to evaluate the ability of the CDI to provide a prognostic profile even for patients with covert consciousness, in whom making an accurate clinical diagnosis based on behavioral responses is challenging because of severe cognitive and motor impairments [35].
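The overlap between CDI and clinical diagnosis in the IBIA DoC-SIG database can be quantified from the counts reported above, as in the following Python sketch. The two off-diagonal cells (4 patients in VS/UWS with CDI = 1 and 15 patients in MCS with CDI = 0) are not stated explicitly and are inferred here by subtraction from the reported group sizes; the function name is hypothetical.

```python
def cohen_kappa(table):
    """Observed agreement and Cohen's kappa for a square contingency table
    (rows: first rating, columns: second rating)."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return observed, (observed - expected) / (1.0 - expected)

# Rows: clinical diagnosis (VS/UWS, MCS); columns: CDI = 0, CDI = 1.
# Off-diagonal counts are inferred: 68 - 64 = 4 and 75 - 60 = 15.
table = [[64, 4],
         [15, 60]]
agreement, kappa = cohen_kappa(table)
print(round(agreement, 3), round(kappa, 3))  # about 0.867 and 0.736
```

Under these inferred counts, raw agreement is about 87% and κ is about 0.74, which is somewhat lower than the values reported for the reference database.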
Finally, we could not evaluate the medical complications that have been found to negatively impact clinical outcomes of patients with DoC [36,37]. However, we believe this innovative and combined measure of behavioral responses could lead to improved prognostic profiling of this challenging patient population with pDoC regardless of the specific clinical diagnosis.

Conclusions
Patients in MCS (vs. in VS/UWS) and with a higher CRS-R total score usually show a better clinical outcome in both the short and the long term, but the prediction accuracy of the clinical diagnosis and of the CRS-R total score remains limited. The present validation study demonstrated that the CDI, an unsupervised machine-learning clinical index based on a combination of the different functions assessed by the CRS-R subscales, can improve prediction accuracy and sensitivity for consciousness recovery at 6, 12, and 24 months after brain injury.

Funding:
This work was supported by grants from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement 778234-DoCMA project. The study was also supported by the Italian Ministry of Health with the Ricerca Corrente RC2021-RC2022 programs, the 5xMille funds AF2018: "Data Science in Rehabilitation Medicine", and AF2019: "Study and development of biomedical data science and machine learning methods to support the appropriateness and the decision-making process in rehabilitation medicine", by the United States National Institutes of Health Director's Office (DP2HD101400), and by the Tuscany Region (Bando Ricerca Salute 2018) through the Tuscany Network for BioElectronic Approaches in Medicine: AI-based predictive algorithms for fine-tuning of electroceutical treatments in neurological, cardiovascular and endocrinological diseases (TUNE_BEAM, n. H14I20000300002). O.G. and A.T. are research associates at FRS-FNRS.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of the coordinator center (protocol 2/16 OSS) and of each center involved in the study.

Informed Consent Statement:
The surrogate decision-makers of the patients enrolled in the study provided their written informed consent after a semi-formalized interview in which the purposes, procedures, and time points of the longitudinal study were clearly explained.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.