Comparison of the Ability to Predict Mortality between the Injury Severity Score and the New Injury Severity Score: A Meta-Analysis

Background: Description of the anatomical severity of injuries in trauma patients is important. While the Injury Severity Score has been regarded as the “gold standard” since its creation, several studies have indicated that the New Injury Severity Score is better. Therefore, we aimed to systematically evaluate and compare the accuracy of the Injury Severity Score and the New Injury Severity Score in predicting mortality. Methods: Two researchers independently searched the PubMed, Embase, and Web of Science databases and included studies from which the exact number of true-positive, false-positive, false-negative, and true-negative results could be extracted. Quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies checklist criteria. The meta-analysis was performed using Meta-DiSc. Meta-regression, subgroup analyses, and sensitivity analyses were conducted to determine the source(s) of heterogeneity and factor(s) affecting the accuracy of the New Injury Severity Score and the Injury Severity Score in predicting mortality. Results: The heterogeneity of the 11 relevant studies (total n = 11,866) was high (I2 > 80%). The meta-analysis using a random-effects model resulted in sensitivity of 0.64, specificity of 0.93, positive likelihood ratio of 5.11, negative likelihood ratio of 0.27, diagnostic odds ratio of 27.75, and area under the summary receiver operator characteristic curve of 0.9009 for the Injury Severity Score; and sensitivity of 0.71, specificity of 0.87, positive likelihood ratio of 5.22, negative likelihood ratio of 0.20, diagnostic odds ratio of 24.74, and area under the summary receiver operating characteristic curve of 0.9095 for the New Injury Severity Score. Conclusion: The New Injury Severity Score and the Injury Severity Score have similar abilities in predicting mortality. Further research is required to determine the appropriate use of the Injury Severity Score or the New Injury Severity Score based on specific patient condition and trauma type.


Introduction
With the recent increase in traffic, the degree of trauma has become increasingly serious. In addition, disasters such as earthquakes, tsunamis, and typhoons occur frequently. Trauma has become the leading cause of morbidity and mortality among individuals aged <40 years and is the third main cause for death worldwide [1]. Meanwhile, there are approximately 5 million deaths due to injuries annually worldwide [2,3].
Following an injury, timely description of the anatomical severity of injuries in trauma patients is important. The Injury Severity Score (ISS) has been considered the "gold standard" [4] indicator for anatomical injury severity since it was first introduced in 1974 [5]. The ISS is the sum of the squares of the single highest Abbreviated Injury Scale score for each of the three most severely injured body regions. However, it only considers one injury per body region. Therefore, in 1997, Osler et al. [6] introduced a modification to the ISS to improve its accuracy and named it the New Severity Injury Score (NISS). The NISS is the sum of the squares of the three highest Abbreviated Injury Scale scores for each patient, regardless of body region.
Various subsequent studies in different countries have used different methodologies to compare the predictive capacities of the ISS and NISS for mortality. A number of these studies found that the NISS is superior to the ISS [7][8][9][10][11][12][13], particularly for predicting mortality in blunt trauma patients [14]. However, because the two tools share similar accuracy and calibration, it has recently been suggested that the NISS should not replace the ISS [15][16][17][18][19][20]. Moreover, the ISS reportedly has a superior ability to predict both intensive care unit admissions and length of hospital stay compared to the NISS [21].
Despite the lack of consistent results, a meta-analysis has not previously been performed, due to the heterogeneity among studies, as identified in a previous systematic review [14]. The present diagnostic meta-analysis aimed to determine if the NISS is better than the ISS for predicting mortality.

Search Strategy and Study Selection
Studies of the predictive ability of the NISS and ISS for mortality were included in this meta-analysis, irrespective of publication status. Two investigators (Qiangyu Deng and Bihan Tang) conducted a systematic search for literature using the electronic databases PubMed, Embase, and Web of Science (all studies until 21 July 2015). The following search terms were used in PubMed: "((New Injury Severity Score OR NISS) OR (Injury Severity Score OR ISS)) AND (((sensitivity OR specificity) OR receiver operating characteristic curve) OR ROC)". The following search terms were used in Embase and Web of Science: New Injury Severity Score, or NISS, or Injury Severity Score, or ISS; and sensitivity, or specificity, or receiver operating characteristic curve, or ROC. Additionally, manual searches of references cited in all relevant original and review articles were conducted. If the full text was not available in the databases, we attempted to obtain the information from the authors by email.

Inclusion and Exclusion Criteria
For studies to be considered eligible for inclusion in the meta-analysis, they should have assessed the performance of the ISS or NISS in predicting patient mortality, and the exact number of true-positive, false-positive, false-negative, and true-negative test results could be either directly or indirectly extracted. The exclusion criteria were as follows: did not assess the performance of ISS or NISS for predicting patient mortality; were not published in English; and were conference abstracts or letters to editors. If more than one article reported data from the same population, then the most recent and complete article was included in the meta-analysis. The two investigators who conducted the literature search also independently performed the study selection and decisions regarding inclusion criteria (Qiangyu Deng and Bihan Tang).

Data Extraction and Quality Assessment
The following data were independently extracted from each eligible study by two investigators (Qiangyu Deng and Bihan Tang): the first author's surname, year of publication, study location, proportion of male subjects, age, study design, number of cases, sensitivity, specificity, area under the receiver operator characteristic (ROC) curve (AUC), and cut-off value.
Quality assessments were independently conducted by two investigators (Qiangyu Deng and Bihan Tang) using a 14-item instrument recommended by the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) [22] checklist criteria. This instrument is widely used as a quality assessment tool for systematic reviews of diagnostic studies. The meta-analysis included studies with positive answers for >7 questions on the tool because studies with positive answers for ≥8 questions are considered high-quality studies. The quality assessment of the original articles was reexamined and independently adjudicated by an additional investigator (Lulu Zhang).

Statistical Analysis
Heterogeneity was evaluated using the I 2 test, and thresholds of 25%, 50%, and 75% indicated low, moderate, and high heterogeneity, respectively. We used Spearman coefficient analysis to test the threshold effect, which reflects the correlation between sensitivity and 1-specificity. The cumulative parameters and 95% confidence interval (CI) of the sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), diagnostic odds ratio (DOR), and AUC were then calculated and pooled.
A meta-regression analysis was conducted to test the effect of the number of patients, mortality, cut-off values for the NISS and ISS, and study quality on the diagnostic power of NISS and ISS for mortality. After excluding specific studies, sensitivity analyses were performed to explore if these exclusions would significantly affect the results.
Subgroup analyses were performed for groups based on mortality, number of patients, cut-off value, and country. Mortality reflected the injury severity; the group of serious injuries included mortality ≥0.1, which indicates that ≥10% of the patients in these studies died, and the group of slight injuries included mortality <0.1. A small sample was defined in the inclusion as <100 patients. The cut-off values were grouped using the definitions by Osler et al. [6]. Countries were grouped according to economic characteristics, namely developed and developing countries.
All statistical analyses were performed using Meta-DiSc 1.4 (Hospital Ramon y Cajal and Universidad Complutense de Madrid, Madrid, Spain) [23]. A p value < 0.05 was considered statistically significant.

Search Results and Characteristics of Studies
The search strategy resulted in 7264 studies after removing the duplicates using EndNote ( Figure 1). After reading the title and abstract and excluding studies that did not use the ISS or the NISS as a prognostic method to predict mortality, the full text of 110 studies were evaluated, and 16 studies were excluded because they did not assess the performance of the ISS or NISS in predicting mortality. An additional 77 studies were excluded because the number of true-positive, false-positive, false-negative, and true-negative test results could not be extracted, and six records were excluded because they were not published in English. Finally, 11 full-text articles assessing the performance of the ISS or NISS for predicting mortality were included in the meta-analysis ( Table 1).
The studies originated from seven countries, including America, Brazil, Germany, Iran, and China (Taiwan). The number of patients ranged from 41 to 7208 (total n = 11,886), with all studies including adults, except for one study that included children aged <14 years [18]. Because one study [24] included two groups of patients, and provided results for each group rather than an overall result, we considered the two groups separately in the analysis. More than 50% of the sample in most of the studies comprised men, while one study [25] included only pregnant women. Five of the 11 studies discussed both the ISS and NISS, six studies discussed only the ISS, and none of the studies discussed only the NISS. Most of the patients were based in hospitals, while one study included patients from The Trauma Registry of the German Society for Trauma Surgery (TR-DGU) [26] and one included patients from a level I trauma center [13]. The cut-off value and mortality were also different between studies, and the cut-off value indicates a threshold for predicting mortality. All true-positive, false-positive, false-negative, and true-negative test results were indirectly extracted from the sensitivity, specificity, mortality values and the number of patients. The results of the quality assessment using the QUADAS checklist criteria are also shown in Table 1. The median number of positive answers on the checklist was 9 (range, 8-11). TP, true-positive; FP, false-positive; FN, false-negative; TN, true-negative; AUC, area under the receiver operator characteristic curve; Sen, sensitivity; Spe, specificity; ISS, Injury Severity Score; NISS, New Injury Severity Score; a because the study did not report the cut-off value, we set the cut-off value according to definitions used by Osler et al. [6]; b patients with war-related ballistic injuries; c patients with non-ballistic injuries.

Test of the Threshold Effect and Heterogeneity
The Spearman coefficient analysis to test the threshold effect resulted in positive Spearman correlation coefficients for the sensitivity and specificity of the NISS and ISS (Table 2), indicating no threshold effect. However, the I 2 values between studies were high (sensitivity I 2 = 93.2%; specificity I 2 = 99.3%; PLR I 2 = 97.0%; NLR I 2 = 87.5%; DOR I 2 = 94.0%), indicating high levels of heterogeneity. The heterogeneity might have originated from the differences in the samples, cut-off values, or methods between studies, rather than a threshold effect.

Overall Analysis
Because of the high level of heterogeneity (I 2 > 80%), we pooled related statistical parameters (Table 3)

Test of the Threshold Effect and Heterogeneity
The Spearman coefficient analysis to test the threshold effect resulted in positive Spearman correlation coefficients for the sensitivity and specificity of the NISS and ISS (Table 2), indicating no threshold effect. However, the I 2 values between studies were high (sensitivity I 2 = 93.2%; specificity I 2 = 99.3%; PLR I 2 = 97.0%; NLR I 2 = 87.5%; DOR I 2 = 94.0%), indicating high levels of heterogeneity. The heterogeneity might have originated from the differences in the samples, cut-off values, or methods between studies, rather than a threshold effect.

Overall Analysis
Because of the high level of heterogeneity (I 2 > 80%), we pooled related statistical parameters (Table 3)   The two summary ROC (SROC) curves [32] for the ISS (Figure 2) and NISS (Figure 3) plot the true-positive rates against the false-positive rates from each study. The AUC for the NISS (AUC = 0.9095) was similar to that for the ISS (AUC = 0.9009). The two summary ROC (SROC) curves [32] for the ISS (Figure 2) and NISS (Figure 3) plot the true-positive rates against the false-positive rates from each study. The AUC for the NISS (AUC = 0.9095) was similar to that for the ISS (AUC = 0.9009).    The two summary ROC (SROC) curves [32] for the ISS (Figure 2) and NISS (Figure 3) plot the true-positive rates against the false-positive rates from each study. The AUC for the NISS (AUC = 0.9095) was similar to that for the ISS (AUC = 0.9009).

Meta-Regression and Sensitivity Analyses
The meta-regression analysis [33] was conducted because of the high level of heterogeneity between studies (I 2 > 80%); the number of patients, mortality, cut-off value, and quality of studies did not affect the diagnostic accuracy (Table 4). To determine the source of heterogeneity, sensitivity analysis was conducted ( Table 5). The study by Turina et al. [24] was excluded because it was the only study including patients with war-related injuries (ballistic injuries); the study by Domingues et al. [13] was removed because the patients were from a level I trauma center in America; and the studies by Bulut et al. [18] and Schiff et al. [25] were removed because the samples included only children (<14 years old) and pregnant women, respectively. The sensitivity value did not change, and the I 2 value for sensitivity was similar to that obtained without excluding the studies. However, after removing the study by Lefering [26] because the patients were from the TR-DGU, the sensitivity increased, and the I 2 of sensitivity decreased.

Subgroup Meta-Analysis
Because of the high level of heterogeneity, we performed subgroup meta-analyses by grouping studies according to mortality, number of patients, cut-off value, and country. The changes in specificity and sensitivity values are reported in Table 6. Only five studies analyzed the power of the NISS, which was not adequate to perform a subgroup meta-analysis in detail for NISS; hence, we performed subgroup meta-analysis for ISS only.

Discussion
Some recently developed trauma score systems may even discard the ISS or NISS in favor of the worst and the second worst injury. However, the ISS is by far the most frequently used severity score, and has become a kind of gold standard for traumatologists. In most instances, ISS is calculated days to weeks after injury and is only available after appropriately trained medical personnel have calculated the score based on the injuries identified. It is a very useful epidemiologic tool that can be used to guide the design of trauma care capabilities-from the capabilities of the individual center to the trauma system itself. The NISS and ISS have been compared since the NISS was developed, and even now there is no consistent result. However, understanding the abilities of the NISS and ISS to predict mortality is also important. To the best of our knowledge, this is the first diagnostic meta-analysis to study the accuracy of the NISS and ISS.

ISS vs. NISS
Although the NISS was developed because the ISS reportedly did not adequately consider multiple injuries in the same body region [6], the ISS and NISS had similar accuracy in predicting mortality in the present meta-analysis. The sensitivity and PLR of the NISS were slightly higher than those of the ISS, while the specificity, NLR, and DOR of the NISS were slightly lower than those of the ISS.
Likelihood ratios [34] combine the stability of sensitivity and specificity to provide an omnibus index of test performance, which is far more useful than its constituent parts. The PLR of the NISS (5.22) and that of the ISS (5.11) are similar, with a positive NISS or ISS outcome indicating an approximately five-fold higher risk of mortality. However, we should consider that a PLR should be at least 10 in order to be useful. The lower NLR for the NISS (0.20) suggests that it had a higher accuracy than the ISS (0.27) for predicting survival.
The DOR [35] is a single indicator of test accuracy that combines the sensitivity and specificity data into a single number, ranging from zero to infinity. Higher values indicate better discriminatory performance (higher accuracy) of a test, and a DOR of 1.0 indicates that the NISS or ISS does not discriminate between survival and mortality. In our study, the lower DOR for the NISS (DOR = 24.74) indicates that the NISS has less ability than the ISS (DOR = 27.75) to distinguish between survival and mortality.
The SROC summarizes the sensitivity and specificity, and is an index that is not affected by the threshold effect [32], making it more effective than the other indices. Based on the SROC curve in the present study, the AUC of the NISS (0.9095) was similar to that of the ISS (0.9009).
All of these indices indicate that the overall accuracy of both the NISS and ISS in predicting mortality was as high as expected, and they both had medium sensitivity and high specificity. In the five studies evaluating both the ISS and NISS, four studies [17,25,28,29] concluded that the accuracy of the ISS and NISS for predicting mortality was similar, while another study [13] concluded that the NISS is superior to the ISS. Therefore, we conclude that the ISS and NISS had similar accuracy for predicting mortality, in contrast with the findings of Baker [5], who developed the ISS, and Osler et al., who developed the NISS and found better prediction of mortality with the NISS than with the ISS [6].

Factors Affecting the Accuracy of the ISS
Some factors affected the accuracy of ISS in distinguishing between death and survival. The accuracy of the ISS changed for different subgroups. Sensitivity and specificity values were higher with sample sizes <100 and lower for studies from developed countries. Sample sizes <100 were present in the developing countries of Croatia and Turkey, where war, terrorism, or disaster occur more frequently; therefore, the ISS system is used often and correctly, resulting in better sensitivity. However, studies with a lower sample size always show better results, because moderate results in small samples will not be published. Therefore, the publication bias here should be considered when explaining the results.
The specificity was also lower with mortality ≥0.1, suggesting that the ISS performs worse in patients with severe injuries. The sensitivity after removing the study by Turina et al. did not change, suggesting that the power of the ISS for predicting mortality was not affected by patients with war-related injuries (e.g., ballistic injuries). Moreover, the sensitivity was not affected when the study by Domingues et al. was removed, which included patients from a level-I trauma center in America, indicating that the ISS has the same power in predicting the mortality of patients from a level-I trauma center or a hospital. The inclusion of children and pregnant women also did not affect the sensitivity of the ISS. Only when the study by Lefering was removed did the sensitivity increase. The TR-DGU study [26] included patients from 88 hospitals in Germany, Austria, and Switzerland, potentially resulting in different populations, which decreased the sensitivity of the ISS.

Source of Heterogeneity
The high level of heterogeneity among studies was not affected by the threshold effect; however, the meta-regression analysis did not determine an obvious reason. After removing the study by Rolf et al. from the sensitivity analysis, the I 2 of sensitivity decreased. Therefore, the heterogeneity might also have been affected by the differences in the countries and populations in Germany, Austria, and Switzerland, which supports the results of Tohira et al. [14]. The subgroup meta-analysis showed that the study quality, number of patients (sample size < 100, I 2 = 0.0%, or I 2 = 56.8%), and differences in the economic characteristics between countries (developed country, I 2 = 71.6%) contributed to the heterogeneity. Therefore, larger sample sizes and higher study quality contributed more to the heterogeneity. However, publication bias should be considered when explaining the influence of sample sizes on heterogeneity. The difference in studies from developed and developing countries might also have been large; in developed countries, more studies would observe a uniform standard, and a better economy would provide better health services for patients. Therefore, there was less heterogeneity among these studies. In addition, mortality and the cut-off value had little influence on heterogeneity in the subgroup meta-analysis.

Limitations
This meta-analysis had certain limitations. First, we included only studies published in English, which might have excluded other good studies written in other languages, such as German or Korean; Second, only one study [2] evaluated the ISS and mortality rates in different age groups. Therefore, owing to the limited detail, we were not able to analyze certain factors-such as age, treatment, cure, medical department, ethnic group, and injury type-which might influence the accuracy of the NISS or ISS; Third, most of the studies were retrospective reviews-only one study was prospective [31]. Therefore, well-designed prospective studies are required. Fourth, only two injury severity scoring systems were included in the meta-analysis. There are other tools, including the Trauma and Injury Severity Score [14], International Classification of Diseases-based ISS [14], and Severity Characterization of Trauma [36]. However, because few studies that evaluated these tools met the standards for this meta-analysis, we excluded these tools from this review. Fifth, because of the limited data for meta-analysis, in this study, we discussed only their ability to predict mortality, and excluded age and physiology, which are two of the three determinants of trauma outcome. Despite these limitations, to our knowledge, this is the first meta-analysis focusing on the NISS and ISS for predicting mortality. Our study helps to highlight areas that would benefit from further investigation.

Conclusions
In conclusion, our study demonstrated that the NISS is similar to the ISS for predicting mortality. The sources of heterogeneity and factors affecting their accuracy for distinguishing between mortality and survival were assessed using meta-regression analysis, subgroup analysis, and sensitivity analysis. These findings are valuable for understanding if the NISS is superior to the ISS and how to compare the ISS and NISS. This research might provide directions when using the NISS and ISS for differential prediction of mortality and survival for patients. However, further research is required to determine the appropriate use of the ISS or NISS based on the specific condition of the patient and the type of trauma.