Performance Comparison of Systemic Inflammatory Response Syndrome with Logistic Regression Models to Predict Sepsis in Neonates

In 2005, an international pediatric sepsis consensus conference defined systemic inflammatory response syndrome (SIRS) for children <18 years of age, but excluded premature infants. In 2012, Hofer et al. investigated the predictive power of SIRS for term neonates. In this paper, we examined the accuracy of SIRS in predicting sepsis in neonates, irrespective of their gestational age (i.e., pre-term, term, and post-term). We also created two prediction models, named Model A and Model B, using binary logistic regression. Both models performed better than SIRS. We also developed an Android application so that physicians can easily use Model A and Model B in real-world scenarios. The sensitivity, specificity, positive likelihood ratio (PLR), and negative likelihood ratio (NLR) of SIRS were 16.15%, 95.53%, 3.61, and 0.88, respectively, whereas they were 29.17%, 97.82%, 13.36, and 0.72, respectively, for Model A, and 31.25%, 97.30%, 11.56, and 0.71, respectively, for Model B. All models were significant with p < 0.001.


Introduction
Sepsis is a major cause of death in neonates, and most of these deaths occur in countries that lack resources [1,2]. Because the clinical signs are non-specific, diagnosis of bacterial sepsis remains a difficult task for neonatologists [3]. Blood culture remains the gold standard for identifying sepsis. However, its accuracy is not 100%: a positive blood culture may be due to the presence of contaminants in the blood [4], and a negative blood culture result may be due to a low volume of blood [5,6]. Despite this, whenever there is a suspicion of sepsis, blood is drawn for blood culture, and the neonate is started on antibiotics. For each neonate with blood culture-proven sepsis, approximately 12 additional newborns receive antibiotics, resulting in bacterial resistance and excessive hospital costs [7-10].
The definition of sepsis has been revised three times (in 1992 [11], 2001 [12], and 2016 [13]), but none of these definitions addresses sepsis in neonates. In 2005, a consensus definition of sepsis was determined for all children less than 18 years of age, and pediatric systemic inflammatory response syndrome (SIRS) was defined; however, preterm neonates (<37 weeks of gestation) were excluded from this definition [14]. In 2012, Hofer et al. [15] evaluated the performance of the 2005 SIRS definition, but did not include preterm neonates.
In this paper, we tested the capability of SIRS to predict sepsis in neonates irrespective of their gestational age (i.e., pre-term, term, or post-term), using the cut-off values given in [14]. We also built two prediction models, named Model A and Model B, using logistic regression. Model A consisted of the same parameters as SIRS. Model B included birth weight as an additional independent variable (in addition to the Model A parameters). Our decision to add birth weight was based on the statistical analysis (refer to the Results section) and on statements given in [16] ("The risk of late-onset sepsis increases with decreasing birth weight and gestational age.") and [17] ("Early onset sepsis is an important cause of illness and death among infants with very low birth weights (less than 1500 g)"). To use these models in real-world scenarios, we built an Android application that performs all of the calculations and can be easily used by physicians.

Study Design
This is a retrospective study that uses data from the Medical Information Mart for Intensive Care (MIMIC) III dataset [18,19]. MIMIC III is an open-access research database that contains more than 58,000 hospital admissions for 38,645 adults and 7875 neonates. The data were collected from the intensive care units at Beth Israel Deaconess Medical Center (BIDMC) between June 2001 and October 2012. All data were de-identified in accordance with the Health Insurance Portability and Accountability Act (HIPAA); therefore, patient consent was waived by the Institutional Review Boards of Massachusetts Institute of Technology (MIT) and BIDMC.
Inclusion criteria were the presence of a blood culture report and age ≤30 days at the time of the first phlebotomy for blood culture. Neonates with missing or incomplete data necessary for the calculation of scores were excluded from the study.

Definitions
We defined sepsis as a positive blood and/or cerebrospinal fluid (CSF) culture [20,21]. The time of suspicion of sepsis was defined as the time of the earliest culture draw. We defined a time window of 12 h starting from the suspicion of sepsis, and all the required parameters were taken from this time window. Table 1 shows the cut-off values of the SIRS parameters given in the pediatric consensus definition of SIRS [14]; these cut-off values vary with age. According to this definition, at least two of the four criteria (abnormal temperature, heart rate, respiratory rate, and leukocyte count) must be met, one of which must be abnormal temperature or abnormal leukocyte count. Some literature, however, shows that white blood cell (WBC) count and temperature have limited sensitivity in the diagnosis of neonatal sepsis [22,23].
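The SIRS decision rule described above can be sketched as follows. This is a minimal illustration, assuming that each of the four criteria has already been compared against the age-specific cut-off in Table 1 (the cut-offs themselves are not encoded here); the function name `meets_sirs` and its boolean arguments are hypothetical.

```python
def meets_sirs(abnormal_temp: bool, abnormal_hr: bool,
               abnormal_rr: bool, abnormal_wbc: bool) -> bool:
    """Apply the SIRS rule to four pre-computed abnormality flags.

    Each flag is assumed to result from comparing a measurement with the
    age-specific cut-off in Table 1; that comparison is not done here.
    """
    criteria_met = sum([abnormal_temp, abnormal_hr, abnormal_rr, abnormal_wbc])
    # At least two criteria must be met, and one of them must be
    # abnormal temperature or abnormal leukocyte count.
    return criteria_met >= 2 and (abnormal_temp or abnormal_wbc)
```

Under this rule, a neonate with abnormal heart rate and respiratory rate only does not meet SIRS, because neither temperature nor leukocyte count is abnormal.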

Statistical Analysis
All data were extracted from the MIMIC III dataset using PostgreSQL 9.6 queries (PostgreSQL Global Development Group, Berkeley, CA, USA). We used IBM SPSS Statistics version 24 (SPSS Inc., Chicago, IL, USA) and Microsoft Excel 2016 (Microsoft, Inc., Redmond, WA, USA) for the statistical analysis. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), and negative likelihood ratio (NLR) were calculated for each model.
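For readers who want to reproduce these measures, the sketch below computes them from the four cells of a 2x2 confusion matrix using the standard definitions. The function name `diagnostic_metrics` and the example counts are illustrative assumptions and are not taken from the study data.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one case in each
    row and column, and specificity strictly between 0 and 1).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "plr": sensitivity / (1 - specificity),
        "nlr": (1 - sensitivity) / specificity,
    }

# Illustrative counts only; these are not the study's results.
metrics = diagnostic_metrics(tp=20, fp=10, fn=80, tn=90)
```

As a consistency check on the same formulas, the sensitivity (16.15%) and specificity (95.53%) reported for SIRS give PLR = 0.1615 / (1 - 0.9553) ≈ 3.61 and NLR = (1 - 0.1615) / 0.9553 ≈ 0.88.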
Fisher's Exact Test and Pearson's Chi-Square Test were used as goodness-of-fit tests. We used p < 0.001 to determine statistical significance.
Among the 4651 neonates that met the inclusion criteria, 3053 neonates were excluded because of incomplete data, and 18 were excluded because of duplicate entries. Thus, 1580 neonates with 204 cases of sepsis (12.91% disease prevalence) were studied. Figure 1 shows the complete extraction process from the MIMIC III database.
We extracted a total of 9 parameters within the time window of 12 h (as mentioned above), including blood culture reports, CSF culture, temperature (Temp.), heart rate (HR), respiration rate (RR), WBC count, birth weight, age at the time of admission, and gestational age. All the parameters were extracted using PostgreSQL queries except gestational age, which was extracted manually from text notes available in the MIMIC III dataset. Minimum and maximum values of these parameters (wherever applicable) were calculated. Table 2 shows the baseline characteristics of 1580 neonates.

Results
To select the independent variables for Model A, we used binary logistic regression (LR) with backward elimination and discriminant analysis with the stepwise method, both in SPSS. Both methods resulted in the exclusion of Respiratory Rate (Maximum), as well as Temperature (Maximum). The remaining parameters, as shown in Table 3 (excluding birth weight), were used to build Model A. Table 3 also shows the predictive power of the independent variables in decreasing order, using a structure matrix. For Model B, we used the Omnibus tests of the model coefficients to examine the significance of birth weight as an additional parameter. Table 4 shows the Chi-Square values with p < 0.001 in Model B, with Model A used as the baseline model and birth weight added as an additional parameter. This table shows that Model B provides a better fit for the prediction of neonatal sepsis.
Table 4. Omnibus tests of model coefficients.
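For reference, the step chi-square reported by an omnibus test of this kind is a likelihood-ratio statistic comparing the two nested models. A sketch in our own notation follows; the symbols $\ell_A$ and $\ell_B$ (the maximized log-likelihoods of Model A and Model B on the same data) are introduced here for illustration, and the single degree of freedom assumes that birth weight enters as one continuous term.

```latex
\chi^2_{\text{step}} = \left(-2\,\ell_A\right) - \left(-2\,\ell_B\right)
                     = 2\left(\ell_B - \ell_A\right),
\qquad \text{df} = 1
```

A large value of this statistic relative to the chi-square distribution with one degree of freedom indicates that adding birth weight improves the fit over the baseline model.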
Thus, two models were created using binary logistic regression. Model A consisted of five variables (heart rate, respiratory rate, temperature, leukocyte count, and age at the time of first culture draw) that were then also used to validate SIRS. The main difference between SIRS and Model A is that SIRS is calculated using a simple decision rule based on the cut-off values mentioned in Table 1, whereas Model A was developed using logistic regression and predicts the outcome as a probability rather than as a binary decision. To predict the probability of sepsis, predictor variables are entered, and the probability is calculated using the steps shown in the Mobile Application section of this paper.
In Model B, we included birth weight as an additional parameter. For Model A and Model B, the data were randomly divided into two parts, i.e., 70/30, for training and validation purposes. These two models were trained and validated, and were compared with SIRS. Table 5 shows the comparison of these two models with SIRS. Table 5 illustrates that Model A performed better than SIRS, despite using the same parameters. Moreover, Model B showed better sensitivity than Model A. Both of these models were compared at a 0.5 cut-off value. Table 6 shows the goodness-of-fit test results for the three prediction models using the Pearson Chi-Square and Fisher's Exact Test (whichever is applicable). All the models were statistically significant with p < 0.001.
Table 7 shows the performance of Model A and Model B at different cut-off values of probability. This table helps the reader to gain a better understanding of what to expect from each of these two models with respect to sensitivity and specificity. As expected, the sensitivity decreases as we move from a lower cut-off value to a higher cut-off value. At a cut-off value of 0.7, the sensitivity and specificity of Model A and Model B are approximately equal to those of SIRS; hence, at this cut-off, both models lose their advantage over SIRS.
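The trade-off between sensitivity and specificity across cut-off values can be illustrated with a short sketch. The function name `sens_spec_at_cutoff`, the convention that a probability greater than or equal to the cut-off is called positive, and the probabilities and outcomes below are all assumptions made for illustration; they are not the study data.

```python
def sens_spec_at_cutoff(probs, labels, cutoff):
    """Return (sensitivity, specificity), calling P >= cutoff positive."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= cutoff
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Made-up predicted probabilities and true outcomes (1 = sepsis).
probs = [0.05, 0.20, 0.35, 0.55, 0.65, 0.80, 0.10, 0.45]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

for cutoff in (0.3, 0.5, 0.7):
    sens, spec = sens_spec_at_cutoff(probs, labels, cutoff)
    print(f"cutoff={cutoff:.1f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy example, sensitivity falls from 1.00 to 0.50 to 0.25 as the cut-off rises from 0.3 to 0.5 to 0.7, while specificity does not decrease, which mirrors the direction of the trend described for Table 7.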

Mobile Application
To use Model A and Model B in place of SIRS, it is necessary to make their implementation simpler in the real world. To this end, we created an Android application that can predict sepsis in neonates using all three models, namely SIRS, Model A, and Model B. Figure 2 shows screenshots of the mobile application that we developed using the App Inventor platform [24]. App Inventor is an open-source visual programming environment that is maintained by MIT. At the bottom of Figure 2, we can see that there is a contradiction between the outcomes of the three models. As per the SIRS criteria, the neonate is free from sepsis, whereas the other two models show a high probability of sepsis. In this particular scenario, Model A and Model B correctly predicted the outcome (the data shown are from a sepsis-positive neonate).

Figure 3 shows the algorithm for the mobile application. SIRS was calculated according to the cut-off values given in Table 1, while for Model A and Model B, the logistic regression coefficients were calculated using IBM SPSS and stored in the Android application. The steps shown below were used for the prediction of outcomes in Model A and Model B.
Step 1: Calculate the Logit (L) from the entered predictor values and the stored regression coefficients.
Step 2: Calculate Probability $P = \frac{e^{L}}{1 + e^{L}}$, where L is the Logit calculated in step 1.
The probability calculated in step 2 is displayed as the output of Model A and Model B.
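The two steps can be sketched in a few lines. This is a minimal illustration, assuming the usual linear form of the logit in binary logistic regression (an intercept plus the sum of each coefficient times its predictor value); the function name `sepsis_probability` is hypothetical, and no coefficient values from the fitted models are reproduced here.

```python
import math

def sepsis_probability(intercept, coefficients, values):
    """Two-step logistic prediction.

    Step 1: logit L = intercept + sum(b_i * x_i)  (assumed linear form).
    Step 2: P = e^L / (1 + e^L).
    """
    logit = intercept + sum(b * x for b, x in zip(coefficients, values))
    # Written as in Step 2; math.exp overflows only for logits above ~709.
    return math.exp(logit) / (1.0 + math.exp(logit))

# Placeholder numbers chosen so that L = 0; they are not fitted coefficients.
p = sepsis_probability(intercept=-1.0, coefficients=[0.5], values=[2.0])
```

A logit of 0 corresponds to a probability of 0.5, which is the cut-off used when comparing Model A and Model B with SIRS.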

Discussion
To date, the performance of the SIRS definition given by the pediatric sepsis consensus conference has not been tested in preterm neonates [25]. In this study, we tested the accuracy of SIRS in neonates irrespective of their gestational age, and compared it with two prediction models that we developed using binary logistic regression. Both LR models performed better than SIRS. Figure 4 shows the performance of all three models in terms of correctly predicted positive and negative cases. For comparison, the cut-off values of Model A and Model B were taken as 0.5. The calculations required to compute the probabilities for Model A and Model B are time-consuming and complex. We overcame this limitation by developing an Android application; it does all the calculations and makes it easier for physicians to use complex prediction models.
Our study has several limitations. The values of the prediction variables have to be identified (maximum and minimum of the parameters) and entered manually in the mobile application. In future work, we will try to develop a system that can automatically enter the values of non-invasive parameters. Further improvements of the models may be possible through the use of sophisticated machine-learning techniques, such as artificial neural networks (ANN).
These models can assist physicians in making the decision to start antibiotics in cases of sepsis-positive neonates and to stop antibiotics in sepsis-negative neonates before the blood culture report is available. This study also provides the hope of improving the performance of existing Clinical Decision Rules (CDRs) or scoring systems.

Conclusions
To confirm the results, both models should be evaluated prospectively and externally (using datasets from other sources) for predicting sepsis in neonates within 12 h of the first draw of blood culture. The sensitivity of SIRS in predicting sepsis in neonates (irrespective of their gestational age) was quite low. LR models outperformed SIRS in predicting neonatal sepsis, despite using the same parameters. Further improvements of these models could help the decision-making processes of physicians. We speculate that the same approach may improve the performance of other clinical decision rules and scoring systems.