Performance of Prognostic Scoring Systems in Trauma Patients in the Intensive Care Unit of a Trauma Center

Background: Prediction of mortality outcomes in trauma patients in the intensive care unit (ICU) is important for patient care and quality improvement. We aimed to measure the performance of 11 prognostic scoring systems for predicting mortality outcomes in trauma patients in the ICU. Methods: Prospectively registered data in the Trauma Registry System from 1 January 2016 to 31 December 2018 were used to calculate scores from prognostic scoring systems for 1554 trauma patients in the ICU. The following systems were used: the Trauma and Injury Severity Score (TRISS); the Acute Physiology and Chronic Health Evaluation (APACHE II); the Simplified Acute Physiology Score (SAPS II); the Mortality Prediction Model (MPM II) at admission and at 24, 48, and 72 h; the Multiple Organ Dysfunction Score (MODS); the Sequential Organ Failure Assessment (SOFA); the Logistic Organ Dysfunction Score (LODS); and the Three Days Recalibrated ICU Outcome Score (TRIOS). Predictive performance was determined according to the area under the receiver operating characteristic curve (AUC). Results: MPM II at 24 h had the highest AUC (0.9213), followed by MPM II at 48 h (AUC: 0.9105). MPM II at 24, 48, and 72 h (AUC at 72 h: 0.8956) had a significantly higher AUC than the TRISS (AUC: 0.8814), APACHE II (AUC: 0.8923), SAPS II (AUC: 0.9044), MPM II at admission (AUC: 0.9063), MODS (AUC: 0.8179), SOFA (AUC: 0.7073), LODS (AUC: 0.9013), and TRIOS (AUC: 0.8701). There was no significant difference in predictive performance between MPM II at 24 h and MPM II at 48 h (p = 0.37) or at 72 h (p = 0.10). Conclusions: We compared 11 prognostic scoring systems and demonstrated that MPM II at 24 h had the best predictive performance for 1554 trauma patients in the ICU.


Introduction
Predicting mortality in trauma patients in the intensive care unit (ICU) is important for planning better treatment and improving the overall quality of patient care. The Trauma and Injury Severity Score (TRISS) is the most commonly used algorithm for predicting mortality outcomes in trauma patients [1–3]. The TRISS determines the probability of survival mainly from four variables: age; the Injury Severity Score (ISS), an anatomical variable; the Revised Trauma Score (RTS), a physiological variable derived from the patient's initial Glasgow Coma Scale (GCS) score, systolic blood pressure (SBP), and respiratory rate (RR) [4]; and the injury mechanism (blunt or penetrating). However, prediction based on anatomical and physiological injury scores alone still leaves room for improvement in accuracy [5]. Therefore, adding clinical data such as previous health status, the main diagnosis of acute illness, physiological change, and laboratory data has been recommended to improve the accuracy of the TRISS [6,7].
Many prognostic scoring systems have been developed for critically ill patients in the ICU. Scores for the following prognostic scoring systems are calculated using data collected on the first day in the ICU: the Acute Physiology and Chronic Health Evaluation (APACHE II) [8], the Simplified Acute Physiology Score (SAPS II) [9], and the Mortality Prediction Model (MPM II) at admission [10]. Scores from the following prognostic scoring systems are calculated using data collected from the first day in the ICU until the patient's departure from the ICU, or for the first three days: MPM II at 24, 48, and 72 h [10], the Multiple Organ Dysfunction Score (MODS) [11], the Sequential Organ Failure Assessment (SOFA) [12], the Logistic Organ Dysfunction Score (LODS) [13], and the Three Days Recalibrated ICU Outcome Score (TRIOS) [14]. Although the performance of these systems has been extensively validated in the literature for patients in the ICU, these systems are commonly used for general patients with critical illness. Different scoring systems vary in their predictions according to the diagnoses [15]. For example, the Revised Injury Severity Classification II (RISC II) score is mainly based on severely injured patients treated in the ICU [16], the Emergency Surgery Score (ESS) is recommended for triaging perioperative patients [17], and the Physiological Parameters for Prognosis in Abdominal Sepsis (PIPAS) is for patients with acute peritonitis [18]. The scoring systems currently in use may have varied results for trauma patients [19,20]. As the best predictive score should be validated in the focused population and geographic region where the scoring system is to be employed [6], this study was designed to compare the performance of the aforementioned 11 prognostic scoring systems for predicting mortality outcomes in trauma patients in the ICU. 
This study was performed based on prospectively registered data in the Trauma Registry System of Kaohsiung Chang Gung Memorial Hospital over a three-year period.

Study Population and Data Collection
This study was approved (approval numbers: 201901360B0 and 201900298B0) by the Institutional Review Board (IRB) of Kaohsiung Chang Gung Memorial Hospital, a 2686-bed level I trauma center in Southern Taiwan [21–23]. The informed consent requirement was waived in accordance with IRB regulations. Detailed information on trauma patients admitted to the ICU between 1 January 2016 and 31 December 2018, which had been prospectively registered in the hospital's Trauma Registry System, was retrospectively retrieved for analysis. Information was collected on age, sex, body mass index (BMI), pre-existing comorbidities (diabetes mellitus (DM), hypertension (HTN), coronary artery disease (CAD), congestive heart failure (CHF), cerebral vascular accident (CVA), and end-stage renal disease (ESRD)), the Abbreviated Injury Scale (AIS) scores in different regions of the body (head, face, thorax, abdomen, extremities, and external regions), the ISS, and the TRISS. Vital signs (temperature, SBP, diastolic blood pressure, mean arterial pressure, heart rate (HR), and RR) and GCS scores were recorded at triage on arrival at the emergency department. Laboratory data obtained in the emergency room were also recorded, including sodium (Na) levels, potassium (K) levels, blood urea nitrogen (BUN) levels, creatinine (Cr) levels, bilirubin levels, white blood cell (WBC) counts, hematocrit (Hct) levels, platelet counts, and blood gas levels (oxygenation, arterial pH, and bicarbonate (HCO3)). In-hospital mortality was recorded as the primary outcome for prediction. In the TRISS formula, the age index was set to 1 for patients older than 55 years and to 0 for patients aged 55 years or younger [4]. The APACHE II, SAPS II, and MPM II at admission scores were calculated from the variables recorded at admission. The scores for the MPM II at 24, 48, and 72 h and for the MODS, SOFA, LODS, and TRIOS were calculated according to the originally proposed algorithms [24].
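For orientation, the TRISS formula in which the age index appears is usually written in the logistic form below. This is a sketch of the standard published form, not an equation taken from this study: the symbols b_0 to b_3 are placeholders for regression coefficients that depend on the injury mechanism (blunt or penetrating), and their values are not reported here.

```latex
P_s = \frac{1}{1 + e^{-b}}, \qquad
b = b_0 + b_1 \cdot \mathrm{RTS} + b_2 \cdot \mathrm{ISS} + b_3 \cdot \mathrm{Age\ (index)}
```

Here P_s is the predicted probability of survival, so the corresponding predicted probability of death is 1 − P_s.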

Statistical Analyses
All statistical analyses were performed using SPSS for Windows version 23.0 (IBM Inc., Chicago, IL, USA) or R 3.3.3. The Chi-square test was used to determine the significance of associations between categorical variables. The Kolmogorov-Smirnov test was used to assess the normality of the distribution of continuous variables. Non-normally distributed data were analyzed using the Mann-Whitney U test, and the results are presented as median and interquartile range (IQR). Predictive performance was determined according to the area under the receiver operating characteristic curve (AUC) using the roc and roc.test functions in the pROC package in R [25]. Because the TRISS measures the probability of survival, 1 − TRISS was used as the probability of mortality when plotting the receiver operating characteristic curves. A p-value of <0.05 was considered statistically significant. Calibration curves were plotted to determine the degree of agreement between the observed outcomes and the predicted probabilities of each model, and the Somers' Dxy rank correlation coefficient, the c-index, R², and the Brier score were calculated. Somers' Dxy measures predictive discrimination as the probability of concordance minus the probability of discordance between predicted and observed outcomes. The c-index describes how well the model discriminates between patients who died and those who survived; a c-index >0.9 indicates outstanding discrimination. R² quantifies the goodness-of-fit of a model [26], with R² = 1 indicating that the regression line fits the data perfectly. The Brier score is defined as the mean squared difference between the actual outcome and the predicted probability and ranges from 0 to 1 [27]. A lower Brier score indicates a better-calibrated prediction.
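The discrimination and accuracy measures defined above can be illustrated with a minimal Python sketch. It is not the authors' analysis code (the study used SPSS and the pROC package in R); the function names and the toy data are hypothetical, and the c-index is computed by brute-force pairwise comparison, which is only practical for small samples.

```python
from itertools import product


def c_index(outcomes, probs):
    """Concordance probability for a binary outcome (numerically equal to the AUC).

    outcomes: 1 = died, 0 = survived; probs: predicted probability of death.
    """
    died = [p for y, p in zip(outcomes, probs) if y == 1]
    survived = [p for y, p in zip(outcomes, probs) if y == 0]
    if not died or not survived:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for p_died, p_survived in product(died, survived):
        if p_died > p_survived:
            concordant += 1.0   # the patient who died was ranked as higher risk
        elif p_died == p_survived:
            concordant += 0.5   # ties count as half
    return concordant / (len(died) * len(survived))


def somers_dxy(outcomes, probs):
    """Concordance minus discordance, which equals 2 * (c - 0.5)."""
    return 2.0 * c_index(outcomes, probs) - 1.0


def brier_score(outcomes, probs):
    """Mean squared difference between observed outcome and predicted probability."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probs)) / len(outcomes)


def mortality_from_survival(prob_survival):
    """Convert a survival probability (such as the TRISS) to a mortality probability."""
    return 1.0 - prob_survival


# Hypothetical toy data, for illustration only (not patient data).
toy_outcomes = [1, 1, 0, 0, 0]
toy_probs = [0.9, 0.4, 0.5, 0.2, 0.1]

print("c-index:", c_index(toy_outcomes, toy_probs))
print("Somers' Dxy:", somers_dxy(toy_outcomes, toy_probs))
print("Brier score:", brier_score(toy_outcomes, toy_probs))
```

The relation Dxy = 2 × (c − 0.5) ties the two discrimination measures together: an AUC of 0.9213 corresponds to a Dxy of about 0.843.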

Patient Demographics
As shown in the flow chart in Figure 1, of the 11,449 enrolled trauma patients, 1760 patients were admitted to the ICU. After excluding 60 patients with burns, 129 patients younger than 20 years, and 15 patients with incomplete data, 1554 patients were left in the study population. Among the 1554 patients enrolled, 178 patients died and 1376 patients survived. Patients who died had a higher incidence of pre-existing HTN, CAD, and ESRD and higher AIS scores in the head and thorax regions than those who survived (Table 1). There were no significant differences in sex; pre-existing DM, CHF, or CVA; or AIS scores in the face, abdomen, extremities, or external regions between patients who died and those who survived. Patients who died were significantly older; had higher heart rates and lower body temperatures, blood pressures, and respiratory rates; and had worse GCS scores, ISS, and renal function (BUN level, Cr level, and urine output) than those who survived. Patients who died also had significantly lower levels of Hct, platelets, arterial pH, and HCO3 than those who survived (Table 2). There were no significant differences in BMI, Na levels, K levels, bilirubin levels, WBC counts, or oxygenation levels between patients who died and those who survived (Table 2). Patients who died had significantly shorter stays in the hospital than those who survived (median [IQR]: 5 [1, 14] days vs. 13 [7, 22] days, p < 0.001). Patients who died had significantly lower TRISS and higher APACHE II; SAPS II; MPM II at admission and at 24, 48, and 72 h; and MODS, SOFA, LODS, and TRIOS scores than those who survived (Table 3).

Performance of the Prognostic Scoring Systems
A comparison of AUCs among the 11 prognostic scoring systems (Figure 2) demonstrated that the MPM II at 24 h had the highest AUC (0.9213), followed by the MPM II at 48 h (AUC: 0.9105) (Table 4). The MPM II had a significantly higher AUC at 24, 48, and 72 h than at admission (AUC: 0.9063). There was no significant difference in predictive performance between the MPM II at 24 h and the MPM II at 48 h (p = 0.37) or at 72 h (p = 0.10).
The calibration curves of the 11 scoring systems are shown in Figure 3. The MPM II at 24 h generated a nonparametric line close to the ideal diagonal line, with the highest Somers' Dxy (0.843), c-index (0.921), and R² (0.493) and the lowest Brier score (0.054), followed by the MPM II at 48 h (Somers' Dxy: 0.821, c-index: 0.911, R²: 0.450, and Brier score: 0.059). By contrast, the curves for the MODS, SOFA, and TRIOS deviated markedly from the ideal diagonal line, on which the predicted probability equals the actual outcome.

Discussion
In this study, we compared the performance of 11 prognostic scoring systems for predicting mortality outcomes in trauma patients in the ICU and found that the MPM II had the best predictive performance. The MPM II at 24, 48, and 72 h had a significantly higher AUC than all the other scoring systems. In addition, there were no significant differences in predictive performance among the MPM II at 24, 48, and 72 h.
The MPM II uses data on health condition (medical or unscheduled surgical admission), pre-existing illness (such as metastatic neoplasm and cirrhosis), acute diagnosis (such as infection, coma, and intracranial mass effect), physiological variables (such as Cr levels, urine output, and partial pressure of oxygen), laboratory data (prothrombin time), and other variables (such as mechanical ventilation and use of vasoactive drugs) [10]. The MPM II at 48 and 72 h uses the same variables as the MPM II at 24 h, takes the most deranged values of the preceding 24 h, and applies different weights derived from logistic regression to determine the outcome [28,29]. Because the physiological variables of patients are dynamic and may be influenced by ongoing management and resuscitation, estimating the outcome based only on physiological variables would lead to bias [19]. Differences in the variables used in different systems (e.g., acute diagnosis is a variable in APACHE II, but not in SAPS II) would contribute to discrepancies in the performance of these systems [30]. Hence, the better performance of the MPM II than of the MODS, LODS, and TRIOS in our study is in line with expectations, because the MODS and LODS use only physiological and laboratory data and the TRIOS uses only daily SAPS II and LODS data. In the SOFA system, the physiological or laboratory variables of each organ system are graded as integers from 0 to 4, and using these grades rather than the actual measured values in the computation also markedly reduces its performance in the prediction of mortality. Furthermore, the TRISS does not use laboratory data, which reduces its performance.
When choosing a scoring system for specific populations in the ICU, the performance, feasibility (e.g., time to calculate the score, abstraction burden, copyright), and interobserver variability should be considered [6]. For a prognostic model to be effective for critical care patients, the time required for data collection must be acceptable. The MPM II at admission uses data obtained exclusively at the time of ICU admission, as its proponents have focused on simplicity and feasibility for routine use. The MPM II has the lowest abstraction burden and is less prone to interobserver variability because it uses fewer physiological and laboratory variables [31]. In contrast, the use of the APACHE II is hampered by possible associated comorbidities; furthermore, the selection of only one principal diagnostic category from many specific acute diagnoses may be very difficult [24]. The abstraction burden of the APACHE II is substantially greater than that of the MPM II [32].
The study had a few limitations. First, patients declared dead on arrival at the emergency room were not recorded in the registered database. In addition, only in-hospital mortality, not long-term mortality, was evaluated. This may have led to a selection bias. Second, the lack of data on the mechanism of trauma or severity of injury, which are generally used in the assessment of trauma patients, may have limited the accuracy of the prediction systems for trauma patients. Third, because the effects of any one particular treatment intervention could not be assessed, especially the use of vasopressor support on admission and surgical interventions, we assumed that the treatment outcomes were uniform across all the patients studied. Finally, only single-center data from southern Taiwan were used in the analysis; hence, the study results may not be applicable to other populations.

Conclusions
After comparing the performance of the 11 most popular prognostic scoring systems, this study revealed that the MPM II at 24 h had the best performance for predicting mortality outcomes in 1554 trauma patients in the ICU. Considering that each scoring system contains different variables possibly related to the patient's outcome, further study to update some of these prognostic scores is encouraged to make them more usable for trauma populations.