A Computer-Assisted System for Early Mortality Risk Prediction in Patients with Traumatic Brain Injury Using Artificial Intelligence Algorithms in Emergency Room Triage

Traumatic brain injury (TBI) remains a critical public health challenge. Although studies have identified several prognostic factors for TBI, a useful early predictive tool for mortality has yet to be developed for emergency room triage. This study aimed to use machine learning algorithms of artificial intelligence (AI) to develop predictive models for TBI patients in emergency room triage. We retrospectively enrolled 18,249 adult TBI patients from the electronic medical records of three hospitals of the Chi Mei Medical Group from January 2010 to December 2019, and used 12 potentially predictive feature variables to predict mortality during hospitalization. Six machine learning algorithms, including logistic regression (LR), random forest (RF), support vector machines (SVM), LightGBM, XGBoost, and multilayer perceptron (MLP), were used to build the predictive models. All six predictive models had a high AUC, ranging from 0.851 to 0.925. Among these models, the LR-based model was the best for mortality risk prediction, with the highest AUC of 0.925; thus, we integrated it into the existing hospital information system to assist clinical decision-making. These results revealed that the LR-based model was the best model to predict the mortality risk in patients with TBI in the emergency room. Since the developed prediction system can easily obtain the 12 feature variables during the initial triage, it can provide quick and early mortality prediction to clinicians to guide decisions on further treatment and to help explain the patient's condition to family members.

This study obtained ethics approval (10911-006) from the institutional review board of the Chi Mei Medical Center, Tainan, Taiwan. The authors carried out all methods in accordance with relevant guidelines and regulations. The Ethics Committee waived the requirement for informed consent due to the retrospective nature of the study.

Flow Chart of Current Study
Our study was performed in compliance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) standards. Figure 1 shows the flow chart for integrating the AI prediction model for patients with TBI in the ED. This study selected 12 feature variables for model training. For comparison, we also built several models with fewer features alongside TTAS, choosing variables that differed significantly (p < 0.05) between the mortality and non-mortality groups and that correlated more strongly with mortality in the correlation coefficient matrix [33][34][35][36]. The models were trained on 70% of the data and validated on the remaining 30%, a test set created by a random split. Six models were then constructed to predict mortality risk.
Significance testing was performed by t-test for numerical variables and Chi-square test for categorical variables. In addition, we performed Spearman correlation analysis to show the strength of correlation between each feature and outcome. Due to an imbalanced outcome class (mortality) in the dataset, we applied the synthetic minority oversampling technique (SMOTE) [34][35][36][37] to oversample the positive outcome cases (mortality) to be equal to the negative ones (survival) for the final model training with each machine learning algorithm.
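The oversampling step described above can be illustrated with a minimal, pure-Python sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a real minority sample and one of its nearest minority neighbours. This is not the authors' implementation; the function name `smote`, the parameter values, and the toy feature values are all hypothetical.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy data (hypothetical values): 3 deaths vs 9 survivors, two features each.
deaths = [[3.0, 6.0], [4.0, 5.5], [3.5, 7.0]]
survivors = [[15.0, 2.5] for _ in range(9)]

# Oversample the positive class until it matches the negative class in size.
new_deaths = smote(deaths, n_new=len(survivors) - len(deaths), k=2)
balanced_deaths = deaths + new_deaths
```

In a sketch like this, oversampling would be applied to the training portion only, so that the held-out test set keeps the original class balance.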

Patient Selection
This study retrospectively enrolled all TBI patients aged 18 years old and above admitted to the emergency room (ER) from 1 January 2010 to 31 December 2019 in the electronic medical records of three hospitals under the Chi Mei Medical Group including one medical center, one regional hospital, and one district hospital.

Features Selection and Model Building
Based on the consensus of our study team, which included several neurosurgeons, we selected potential clinical variables using the following criteria: (i) essential to characterize traumatic brain injury, (ii) routinely acquired/measured, and (iii) easy to interpret with a physical meaning. Then, we made the final feature decision using univariate filter methods (for both continuous and categorical variables; a p-value of 0.05 or lower was the selection criterion), Spearman's correlation coefficient, and experts' opinions. The twelve feature variables were patients' age, gender, body mass index, TTAS, heart rate, body temperature, respiratory rate, GCS, left and right pupil size, and left and right pupillary light reflex, chosen for their wide availability in the triage setting. We used six machine learning algorithms, including LR, RF, SVM, LightGBM, XGBoost, and MLP, to build a model for predicting in-hospital mortality risk.
We conducted a grid search with 5-fold cross-validation for hyper-parameter tuning (Supplementary Table S1) for each algorithm to better evaluate the model performance and thus obtain the optimal model. A default classification threshold value of 0.5 was used to determine the binary outcome. That is, if the result of the predicted probability was equal to or greater than the threshold, we predicted a positive outcome (mortality); otherwise, we predicted a negative outcome (survival).
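As a rough illustration of the tuning procedure and the 0.5 decision rule, the sketch below runs a grid search with 5-fold cross-validation over a single hyper-parameter of a one-feature logistic regression. It is a simplified stand-in, not the study's pipeline: the model, the L2 penalty grid, the synthetic data, and all function names are assumptions made for the example (the actual ranges are in Supplementary Table S1).

```python
import math
import random

def fit_logistic(xs, ys, lam, epochs=200, lr=0.1):
    """One-feature logistic regression with L2 penalty `lam` (gradient descent)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += p - y
        w -= lr * (gw / n + lam * w)
        b -= lr * gb / n
    return w, b

def predict(model, x, threshold=0.5):
    """Positive (1) if the predicted probability is >= threshold, else negative (0)."""
    w, b = model
    p = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return 1 if p >= threshold else 0

def cv_accuracy(xs, ys, lam, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_logistic([xs[i] for i in train], [ys[i] for i in train], lam)
        correct = sum(predict(model, xs[i]) == ys[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k

# Synthetic toy data (hypothetical): one standardized feature, noisy labels.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(100)]
ys = [1 if x + rng.gauss(0, 0.5) > 0 else 0 for x in xs]

grid = [0.0, 0.01, 0.1, 1.0]  # candidate penalty values (assumed)
results = {lam: cv_accuracy(xs, ys, lam) for lam in grid}
best_lam = max(results, key=results.get)
```

Every candidate is scored on the same folds, so differences between grid points reflect the hyper-parameter rather than the split.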

Model Performance Measurement and Calibration
The study used the accuracy, sensitivity, specificity, and AUC (area under the receiver operating characteristic curve) as metrics to measure prediction models' performance, which have long been used as quantitative performance measurement metrics in health care studies [35][36][37][38] as well as in machine learning modeling [36][37][38][39]. Accuracy represents the proportion of true results, either true positive or true negative, in the targeted population. It measures the degree of veracity of a diagnostic test on a condition. Sensitivity represents the proportion of true positives that are correctly identified by a diagnostic test. It means how good the test is at detecting a disease or a disease outcome (i.e., mortality in our model). Specificity represents the proportion of the true negatives correctly identified by a diagnostic test. It means how good the test is at identifying a normal (negative) condition. An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds and AUC measures the entire area underneath the ROC curve representing the degree of separability.
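The four metrics defined above can be computed directly from labels and predicted probabilities. The following sketch uses made-up toy values and hypothetical helper names; AUC is computed as the proportion of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half, which is equivalent to the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 = mortality."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def auc(y_true, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical values): 3 deaths, 5 survivors.
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1, 0.3, 0.5]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # default 0.5 threshold

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # proportion of all correct results
sensitivity = tp / (tp + fn)         # proportion of deaths detected
specificity = tn / (tn + fp)         # proportion of survivors identified
roc_auc = auc(y_true, scores)        # threshold-independent separability
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds.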
Meanwhile, models must be well calibrated for patient-level use cases because errors in individual predicted probabilities can lead to inappropriate decision-making [37][38][39][40]. Thus, we also performed model calibration for performance comparison, using the Platt scaling method. The comparison of models with and without calibration is reported in Section 3.5 (Comparing Model Calibration for the Best Models).
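Platt scaling fits a sigmoid that maps a model's raw scores to calibrated probabilities. The sketch below shows the idea in pure Python by fitting the two sigmoid parameters with gradient descent on the log loss. The toy scores, labels, and function names are hypothetical, and the source does not state which data the calibrator was fitted on, so that choice is left open here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(scores, labels, epochs=5000, lr=0.1):
    """Fit p = sigmoid(a * score + b) by minimizing log loss (gradient descent)."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y
            ga += err * s
            gb += err
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b

# Toy uncalibrated decision scores and observed outcomes (hypothetical values).
raw_scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
outcomes = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]

a, b = fit_platt(raw_scores, outcomes)
calibrated = [sigmoid(a * s + b) for s in raw_scores]
mean_predicted = sum(calibrated) / len(calibrated)
observed_rate = sum(outcomes) / len(outcomes)
```

After fitting, the mean calibrated probability should sit close to the observed event rate, and because the mapping is monotonic it leaves the ranking of patients, and therefore the AUC, essentially unchanged.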

Demographics and Clinical Pictures in Patients with TBI
The present study included 18,249 patients, of whom 9,908 were male and 8,341 were female. Their average age was 57.85 ± 19.44 (mean ± SD) years. The average GCS upon arrival at the triage was 14.35 ± 1.94. Further, 266 patients died, with a total mortality rate of 1.44% (266/18,249). A total of 12,334 (67.59%) patients had a level 3 to 5 TTAS. Compared with the non-mortality group, the mortality group had a lower body temperature and BMI [40,41]. Except for heart rate and respiratory rate, all other feature variables were significantly different between mortality and non-mortality groups (Table 1). Due to an imbalanced outcome class in the dataset, this study applied SMOTE [34][35][36][37] for model training.

Note (Table 1): A t-test was used for numerical variables and the Chi-square test was used for categorical variables; because there were no cases of mild severity (TTAS levels IV-V) in the mortality group, we merged levels III-V for significance testing in demographics.

The Correlation between Feature Variables and Mortality
To quickly select the proper parameters for machine learning, we conducted a correlation analysis (heat map) of mortality and feature variables using a matrix diagram (Figure 2). It was found that the seven leading feature variables correlated to mortality were left and right pupillary light reflex, GCS, TTAS, left pupil size, age, and right pupil size. This matrix also showed that the GCS, right and left pupillary light reflex, and TTAS were negatively correlated with mortality and age, and that heart rate and pupil size were positively correlated with mortality during hospitalization.
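For readers unfamiliar with the statistic behind the matrix, Spearman's coefficient is the Pearson correlation computed on ranks, with tied values sharing their average rank. A minimal pure-Python sketch follows; the toy GCS values and outcomes are invented for illustration and the helper names are hypothetical.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

# Toy data (hypothetical): lower GCS accompanies death, so rho is negative.
gcs = [15, 14, 15, 8, 3, 13, 6, 15]
died = [0, 0, 0, 1, 1, 0, 1, 0]
rho_gcs = spearman(gcs, died)
```

A negative coefficient, as for GCS here, means that higher values of the feature go with survival; a positive one means higher values go with mortality.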

The Predictive Model Using the Twelve Feature Variables
When evaluating the models for mortality risk prediction using the 12 feature variables, this study found that the LR-based model had the best predictive performance (AUC = 0.925), followed by SVM (AUC = 0.920), MLP (AUC = 0.893), XGBoost (AUC = 0.871), random forest (AUC = 0.870), and LightGBM (AUC = 0.851) (Figure 3). Further, the LR-based model had the highest accuracy (0.893) for mortality risk prediction, with a sensitivity of 0.812 and a specificity of 0.894 (Table 2).
Note (Table 2): PPV = positive predictive value; NPV = negative predictive value; CI = confidence interval; AUC = area under receiver operating characteristic curve.

The Predictive Models Using Fewer Feature Variables
In addition to the TTAS-only model, we built models with fewer feature variables, chosen by correlation coefficient, to compare prediction power. The one-feature model used TTAS as the single feature; the five-feature model used TTAS, left and right pupil light reflex, GCS, and left pupil size; the six-feature model used TTAS, left and right pupil light reflex, left pupil size, age, and right pupil size; the seven-feature model used TTAS, left and right pupil light reflex, GCS, left pupil size, age, and right pupil size. Model performances are reported in this order in Table 3. The results reveal that even with TTAS as the only feature, the model still has acceptable performance (AUC = 0.872).
We conducted the DeLong test [41,42] to judge whether one model had a significantly different AUC than another model. At the 0.05 significance level, the p-values showed that the 12-, 7-, and 6-feature models did not differ significantly from each other (Table 4). This implies that if hospitals are unable to collect complete data for all 12 features, they can consider using the 7- or 6-feature models and still maintain excellent prediction performance similar to the best 12-feature model.
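The following is a compact sketch of the standard DeLong procedure for comparing two correlated AUCs measured on the same patients. It is written in pure Python for readability rather than speed, it is not the code used in the study, and the scores and function names are hypothetical. Each model's AUC is decomposed into per-patient "structural components", whose covariances give the variance of the AUC difference.

```python
import math

def _components(pos, neg):
    """Per-positive (V10) and per-negative (V01) placement values for one model."""
    psi = lambda x, y: 1.0 if x > y else (0.5 if x == y else 0.0)
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def _cov(a, b):
    """Sample covariance (denominator n - 1)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def delong_test(y_true, scores_a, scores_b):
    """Return (auc_a, auc_b, z, two-sided p) for two models scored on the same cases."""
    pos = [i for i, t in enumerate(y_true) if t == 1]
    neg = [i for i, t in enumerate(y_true) if t == 0]
    v10a, v01a = _components([scores_a[i] for i in pos], [scores_a[i] for i in neg])
    v10b, v01b = _components([scores_b[i] for i in pos], [scores_b[i] for i in neg])
    auc_a = sum(v10a) / len(v10a)
    auc_b = sum(v10b) / len(v10b)
    # Variance of (auc_a - auc_b) from the positive and negative components.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / len(pos)
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / len(neg))
    if var == 0.0:
        # Degenerate case (e.g. identical scores): no evidence of a difference.
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p

# Toy example (hypothetical scores): 4 deaths, 6 survivors, two models.
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.35, 0.4, 0.3, 0.2, 0.1, 0.25, 0.15]
model_b = [0.6, 0.4, 0.55, 0.2, 0.5, 0.3, 0.45, 0.1, 0.35, 0.25]
auc_a, auc_b, z, p_value = delong_test(y, model_a, model_b)
```

A p-value above the chosen level (0.05 in this study) would be read as no significant difference between the two AUCs, which is how the 12-, 7-, and 6-feature models were judged comparable.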

Comparing Model Calibration for the Best Models
We conducted model calibration to decrease the error between predicted and observed probabilities and thereby prevent inappropriate predictions. The calibrated models (Table 5), which were used as the basis for practical implementation, performed slightly better than the uncalibrated ones (e.g., the AUC changed from 0.925 to 0.926 for the 12-feature model and from 0.909 to 0.910 for the 7-feature model).

External Validation and Computer-Assisted System Development
To confirm the performance of the AI mortality risk prediction model, this study collected data from 200 new patients, with the same definitions of features and outcomes, in the HIS from 10 September 2020 to 10 November 2020 for further external validation (Table 7). The results showed a sensitivity of 100%, a specificity of 84.3%, an accuracy of 84.5%, a PPV of 0.088, and an NPV of 1.0, indicating that this study's model is acceptably stable and reliable for supporting clinical decision-making. After we confirmed the best LR-based model, we developed a web-based AI prediction system with the best model and integrated it with the existing emergency triage system to assist clinicians and nurses in decision making and in communication with patients and/or their family members (Figure 5).
Note: Comparisons were made using the Mann-Whitney U test for continuous variables and Fisher's exact test for categorical variables. A p-value of <0.05 was considered statistically significant.

Figure 5. A screenshot of the computer-assisted AI prediction system.
Table 8 shows a comparison with related studies using machine learning models to predict in-hospital mortality.

Discussion
A review of the related literature suggests that this is an innovative study in developing an early mortality risk prediction system using AI algorithms in the emergency triage setting. The results showed that the model combining the TTAS with the 11 other feature variables had the best predictive performance (AUC = 0.925). This study showed remarkable results that can be useful in the field of neurocritical care in the ED: (1) even without imaging studies or laboratory data collection, our twelve feature variables were highly accurate and better than TTAS alone in predicting mortality risk in the emergency triage setting; (2) the LR-based model had the highest accuracy and the best performance for in-hospital mortality risk prediction; (3) if the calculated mortality risk is greater than 91.15%, the emergency physician must pay extra attention in caring for the patient and explain to the family that the patient's chance of survival is low; (4) this study actually integrated the best model into the existing HIS for clinical use.
Consistent with previous studies [3,4,7-9], our results showed that patients who are older, are male, have a level 1 to 2 TTAS, have a low GCS, lack a pupillary light reflex, or have a larger pupil size have significantly higher mortality risk. Furthermore, it was found that those with low body temperature (36.30 ± 0.70 °C) and low BMI (22.68 ± 3.78) have significantly higher mortality risk.
A correlation coefficient matrix based on the Spearman rank-order correlation method is a good statistical tool for analyzing the relationship between two observed items [36,37]. The correlation coefficient can range from −1 to 1, with −1 or 1 indicating a perfect relationship [40]. This study's results showed that pupillary light reflex, GCS, and TTAS had low correlation to mortality (correlation coefficient 0.1-0.39), and other feature variables had weak correlation to mortality (correlation coefficient < 0.1). This is possibly because, unlike a conventional clinical study, we used all original data without matching the case group and the control group to diminish the effects of confounding factors; therefore, the data were highly diverse. Moreover, the observation end points were obtained at discharge, which may have been affected by other factors present during this period; thus, future studies should develop a dynamic real-time assessment system to enhance the practicality of the model.
Based on the correlation coefficient matrix (Figure 2), the pupillary light reflex had the highest correlation to mortality. Both the pupillary light reflex and GCS had a higher correlation to mortality than TTAS. This result implies that the TTAS should be modified to suit the needs of patients with TBI in the emergency triage setting for the early prediction of mortality risk. The computerized triage system, the Taiwan Triage and Acuity Scale (TTAS), adapted from the Canadian Triage and Acuity Scale (CTAS, 2017) [42][43][44], was officially launched in Taiwan in 2010 to avoid over-triage and deploy more appropriate resources for ED patients [7]. In the current study, TTAS alone had a high AUC (0.876) and sensitivity (0.900) but low specificity (0.693) and accuracy (0.696) for mortality risk prediction. Based on the results, TTAS combined with eleven other feature variables (AUC = 0.925) obtained the best performance in predicting in-hospital mortality risk in patients with TBI (Table 2). Why does our in-hospital mortality risk model perform better than TTAS alone? This may be related to the components of the TTAS, which include parameters such as respiratory, hemodynamic, temperature, cognitive impairment, and trauma mechanisms. In addition to TTAS, we also included age, gender, BMI, pupil size, and pupillary light reflex in our predictive models, which are considered prognostic factors for trauma patients. Although there is still much to learn about its benefits, we recommend that the twelve feature variables in the AI predictive model be integrated into ER triage for clinical applications. Figure 4a shows the probabilities of the edge points, indicating that the median value was 91.15% and that no mortality occurred below 10.89%. The same statistics, excluding the false-positive and true-negative cases, are shown in Figure 4b, indicating that the minimal predictive probability of mortality risk was 51.24%.
Therefore, we recommend the following: (1) the emergency physician in the triage setting should pay extremely close attention to patients with a predicted mortality risk higher than 51.24%, especially those higher than 91.15%; and (2) the treatment protocols should differ among patients with a predicted mortality risk higher than 91.15%, between 51.24% and 91.15%, between 10.89% and 51.24%, and below 10.89%.
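The four risk bands recommended above amount to a simple lookup on the predicted probability. The sketch below is illustrative only: the cut-offs come from Figure 4, but the tier labels, the function name, and the handling of values that fall exactly on a boundary are assumptions, since the text does not specify them.

```python
# Cut-offs taken from the text (Figure 4); everything else is hypothetical.
VERY_HIGH = 0.9115  # median predicted probability reported for Figure 4a
HIGH = 0.5124       # minimal predicted probability reported for Figure 4b
LOW = 0.1089        # no mortality was observed below this value

def risk_tier(probability):
    """Map a predicted mortality probability (0-1) to one of four bands.

    Boundary handling (strict '>' at the upper cut-offs) is an assumption.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability > VERY_HIGH:
        return "very high"
    if probability > HIGH:
        return "high"
    if probability >= LOW:
        return "intermediate"
    return "low"

# Example: four hypothetical patients, one per band.
tiers = [risk_tier(p) for p in (0.95, 0.60, 0.30, 0.05)]
```

Keeping the cut-offs as named constants makes it straightforward to revise them if later validation suggests different thresholds.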
In this study, 136 patients survived despite a predicted mortality risk of more than 91.15%. We checked their detailed data in our emergency system and found that their survival was due to early interventions such as immediate aggressive resuscitation in the ER (3.7%), craniotomy to remove intracranial hemorrhage (38.9%), and early admission to the ICU (86.1%). This implies that although the AI prediction model has excellent predictive performance, it can only be regarded as a decision-support tool rather than a diagnostic determinant.
In the subgroup analysis, this study investigated 15 patients with a mortality risk prediction of less than 50% but who did not survive; six patients died from advanced stage cancer-related complications, five patients died due to delayed intractable intracranial hemorrhage, three patients died due to septic shock, and one patient died because of a concurrent malignant middle cerebral artery infarction. This indicates that patients with TBI may be relatively well during their initial assessment; however, underlying disease, delayed hemorrhage, or other complications may aggravate their condition. Therefore, in order to improve the performance of this model, future studies on specific early intervention and advanced cancer-related complications are necessary.
Furthermore, although the accuracy, sensitivity, specificity, and AUC of our proposed AI model were all higher than 0.8, the PPV was low. It could be mainly due to the scarcity of cases with positive outcomes (dead patients) resulting in imbalanced data distribution, and this may cause a very high false-positive rate if deployed. However, for high-risk TBI in the emergency room, the high false-positive rate could still have clinical value but deserves further improvement. This needs to be further explored by follow-up studies. Table 8 demonstrated a comparison with related studies for predicting in-hospital mortality using machine learning models. Compared to other studies, we have the highest number of cases and the highest predictive power to predict the risk of in-hospital mortality at the time of emergency triage, and the model is currently being used in clinical settings.
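The link between rare outcomes and a low PPV can be made concrete with Bayes' rule. The sketch below plugs in the sensitivity (0.812) and specificity (0.894) reported for the 12-feature LR model and, as an assumption, the whole-cohort mortality rate (266/18,249) as the prevalence. The result is a back-of-the-envelope illustration, not the PPV reported in Table 2, because the prevalence in the actual test set may differ. The function name and the 30% comparison prevalence are hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumption: prevalence equals the overall cohort mortality rate.
cohort_prevalence = 266 / 18249
ppv_at_cohort_rate = ppv(0.812, 0.894, cohort_prevalence)

# Same classifier in a hypothetical setting where 30% of patients die.
ppv_at_high_rate = ppv(0.812, 0.894, 0.30)
```

Under these assumptions the PPV comes out at roughly 0.10, whereas the same sensitivity and specificity at a 30% event rate would give a PPV above 0.7. In other words, most positive alerts are false alarms mainly because deaths are rare, not because the classifier discriminates poorly.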
Core CRASH [45] and core IMPACT [15] are currently the most common prognostic systems for trauma. CRASH includes age, motor score, pupils, hypoxia, hypotension, brain CT scan, and lab findings (glucose and hemoglobin level). Core IMPACT includes age, GCS, pupils reflecting the light, major extra-cranial injury, and brain CT scan findings.
Our prediction system has good results without brain CT and lab components. In the future, the predictive power of three additional prediction systems can be evaluated.
Despite its strengths, this study still has limitations. First, being a retrospective observational study, the feature variables could have been miscoded or biased by many unrecognized confounders which could have affected the mortality of patients with TBI. Second, we did not evaluate other feature variables such as coagulopathy, brain CT scan findings, surgery procedures, and other complications, which could influence the outcome after TBI. Third, as a study of three hospitals of the Chi Mei Medical Group, its results cannot be generalized to other hospitals. Therefore, further external validation is required for more heterogeneous samples to confirm and extend our results. Finally, TTAS as an input feature is a country-specific measurement and may limit the generalizability and adoption of the proposed algorithm outside of Taiwan.

Conclusions
Without clinical laboratory data and imaging studies, our results showed that the LR algorithm was the best algorithm to predict the mortality risk in patients with TBI in the emergency room triage setting. Since the 12 feature variables during the initial triage can be easily obtained, our developed AI system can provide real-time mortality prediction to clinicians to help them explain the patient's condition to family members and to guide them in deciding on further treatment. We believe that predicting the adverse outcomes of patients with TBI using machine learning algorithms is a promising research approach to help physicians' decision-making after patient admission to ER at the earliest possible time.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/brainsci12050612/s1, Table S1: Hyper-parameters range for experiments.
Data Availability Statement: Based on the privacy of patients within the Chi Mei Medical Centers Health Information Network, the primary data underlying this article cannot be shared publicly. However, de-identified data will be shared on reasonable request to the corresponding author.