Reliability of IL-6 Alone and in Combination for Diagnosis of Late Onset Sepsis: A Systematic Review

Diagnosis of neonatal sepsis is difficult due to nonspecific signs and symptoms. Interleukin-6 (IL-6) is a promising marker for neonatal sepsis. We aimed to test the accuracy of IL-6 in neonates after 72 h of life in case of late onset sepsis (LOS). We searched for studies regarding IL-6 accuracy for the diagnosis of LOS between 1990 and 2020 using the PubMed database. Following study selection, the reported IL-6 sensitivities and specificities ranged between 68% and 100% and 28% and 100%, with median values of 85.7% and 82% and pooled values of 88% and 78% (respectively) in the 15 studies including 1306 infants. Subgroup analysis revealed a better sensitivity (87% vs. 82%), but not specificity (both 86%), in preterm infants compared to term infants or mixed populations. Early sample collection revealed the highest sensitivity (84%), but had the lowest specificity (86%). To assess quality, we used a STARD checklist adapted for septic neonates and the QUADAS criteria. Limitations of this review include the heterogeneous group of studies on the one side and the small number of studies on the other side that analyzed different combinations of biomarkers. We concluded that IL-6 demonstrated good performance especially in the preterm infant population and the best results were achieved by measurements at the time of LOS suspicion.


Introduction
The definition of late onset sepsis (LOS) includes presentation after the first 72 h of life and association with the postnatal nosocomial or community environment [1]. Neonates in the NICU are prone to LOS due to their immaturity and their lack of maternal protection by maternal antibody transfer in the case of very preterm infants [2]. Coagulase-negative staphylococci (Gram-positive cocci) represent the most common organisms causing nosocomial infections followed by Gram-negative bacilli and fungi [1,3]. Risk factors for the development of LOS besides immaturity are mechanical ventilation, intravascular catheterization, formula feeding, prolonged duration of intravascular access devices in cases of parenteral nutrition, any surgery, underlying respiratory and cardiovascular disease, and prolonged hospitalization [4]. In high-income countries, the mortality rate due to neonatal sepsis (including both early and late onset sepsis) ranges from 5% to 20% and higher mortality rates of over 70% can be observed in low- and middle-income countries (LMICs) [4]. Early and efficient treatment reduces both mortality and morbidity in neonates with suspected sepsis [5]. Hence, there is a great need for biological markers that immediately increase in cases of inflammation [6].
Released within 2 h after the onset of bacteremia, the pro-inflammatory cytokine Interleukin-6 (IL-6) rises earlier than both procalcitonin (PCT) and C-reactive protein (CRP) in neonatal septic patients [7,8]. IL-6 levels have been shown to be significantly elevated up to 48 h prior to clinical signs of sepsis [9]. Measured at the time of sepsis suspicion, IL-6 levels were found to be associated with sepsis severity and mortality risk in preterm infants [7]. Combinations of IL-6 with later and more specific biomarkers (e.g., CRP) have been reported [10].
The aim of this systematic review was to determine the accuracy of IL-6, both alone and combined with other markers, for the diagnosis of LOS by reviewing studies published between 1990 and 2020, and to explore the factors affecting it. In this meta-analysis, we decided to focus solely on LOS because the type of sepsis had previously been recognized as a source of heterogeneity [11].

Material and Methods
We used the PubMed database to search for diagnostic accuracy studies of IL-6 in neonates published between 1990 and 2020 that evaluated the diagnostic capacity of IL-6. The search terms we used in combination were the following: (Interleukin-6 OR IL-6) AND (neonatal sepsis OR neonatal infection OR sepsis) AND (late onset sepsis OR LOS OR LONS). We did not apply any PubMed filters or language restrictions to the search.
We identified potentially suitable studies by screening the titles and abstracts of the studies. The following criteria had to be fulfilled on review of the abstract: only neonates presenting with culture-proven and/or clinically suspected sepsis, and IL-6 (alone or combined with other inflammatory markers) being evaluated regarding its potential for the diagnosis of LOS. We excluded all studies dealing with early-onset sepsis or other bacterial infections, all studies written in languages other than English or German, and animal and in vitro studies. In line with the PRISMA criteria (see Figure 1), full-text articles were screened for other potentially relevant studies. The following data were extracted from all full-text studies included in the analysis: first author, country, year of publication, definition of LOS, number of neonates, recruitment characteristics, reference standards, analysis of blood samples, and time of sample collection. Finally, the IL-6 test method and its use alone or combined with other markers were documented. All analyses were based on already published studies; thus, neither informed consent nor approval of the local ethics committee was required. Two investigators (JE, ER) independently performed the data extraction of all the included studies. In the case of discrepancies or disagreements during data extraction, the third reviewer (BR) resolved any differences.
We used an adapted STARD checklist for septic neonates as published by Chiesa et al. to assess the study quality [12]. This checklist included 25 items from the key domains (descriptions of participant recruitment, reference standards and index tests), each of which is answered with either yes or no [12]. We additionally applied the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool, which includes 11 questions. Questions with "yes", "no", and "unknown" answers were scored as 1, −1, and 0, respectively [13]. Thus, we could confirm our first analysis using the STARD criteria.
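The QUADAS scoring rule described above (yes = 1, no = −1, unknown = 0 over 11 questions) can be illustrated with a minimal sketch. The function name `quadas_score` and the example answers are hypothetical and are not taken from any of the included studies.

```python
from typing import Iterable

# Scoring rule as stated in the text: yes = 1, no = -1, unknown = 0.
SCORES = {"yes": 1, "no": -1, "unknown": 0}


def quadas_score(answers: Iterable[str]) -> int:
    """Sum the per-question scores for one study (hypothetical helper)."""
    total = 0
    for answer in answers:
        key = answer.strip().lower()
        if key not in SCORES:
            raise ValueError(f"unexpected answer: {answer!r}")
        total += SCORES[key]
    return total


# Hypothetical study: 7 "yes", 2 "no" and 2 "unknown" answers (11 questions).
example_answers = ["yes"] * 7 + ["no"] * 2 + ["unknown"] * 2
print(quadas_score(example_answers))  # 7 - 2 + 0 = 5
```

With 11 questions, the total for a study can range from −11 to 11 under this rule.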
We explored causes for heterogeneity by means of subgroup analysis. The influence of gestational age was evaluated by comparing subgroups of preterm and mixed populations. The timing of sample collection and its influence on IL-6 accuracy were analyzed by dividing the studies into those reporting sample collection at the time when sepsis was suspected, and those reporting collection times earlier than 12 h, earlier than 24 h and earlier than 48 h after suspicion of sepsis. Blinded studies and those with blood-culture-proven sepsis each formed individual subgroups. Biomarker combinations were assessed if at least three studies were found. For the subgroup analysis, we calculated the 2-by-2 tables of true positives, true negatives, false positives, and false negatives of each study from the data provided by the individual studies. We then pooled these values according to the subgroup. Calculating the sensitivity and specificity based on these values gave us the pooled sensitivity and specificity for this subgroup, or the overall sensitivity and specificity in the case of pooling all studies. Finally, to summarize the results of the primary studies, a summary ROC (sROC) curve was generated using the Moses-Littenberg method [14,15]. The homogeneous area under the curve (AUC) was calculated using the formula provided by Rosman et al. [14].
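The pooling step described above (summing the 2-by-2 cells across the studies of a subgroup and then computing sensitivity and specificity) can be sketched as follows. The three tables are hypothetical and do not correspond to any included study. The text does not state how the 95% confidence intervals were obtained, so the Wilson score interval used here is an assumption made only for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Tuple


@dataclass
class TwoByTwo:
    tp: int  # septic, index test positive
    fp: int  # not septic, index test positive
    fn: int  # septic, index test negative
    tn: int  # not septic, index test negative


def pool(tables: List[TwoByTwo]) -> TwoByTwo:
    """Sum each cell across studies, as described in the text."""
    return TwoByTwo(
        tp=sum(t.tp for t in tables),
        fp=sum(t.fp for t in tables),
        fn=sum(t.fn for t in tables),
        tn=sum(t.tn for t in tables),
    )


def wilson_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (assumed, not from the source)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical subgroup of three studies.
subgroup = [
    TwoByTwo(tp=18, fp=5, fn=2, tn=25),
    TwoByTwo(tp=30, fp=10, fn=6, tn=54),
    TwoByTwo(tp=12, fp=2, fn=3, tn=33),
]
pooled = pool(subgroup)
sensitivity = pooled.tp / (pooled.tp + pooled.fn)
specificity = pooled.tn / (pooled.tn + pooled.fp)
sens_ci = wilson_ci(pooled.tp, pooled.tp + pooled.fn)
spec_ci = wilson_ci(pooled.tn, pooled.tn + pooled.fp)
print(f"pooled sensitivity {sensitivity:.3f}, 95% CI {sens_ci[0]:.3f}-{sens_ci[1]:.3f}")
print(f"pooled specificity {specificity:.3f}, 95% CI {spec_ci[0]:.3f}-{spec_ci[1]:.3f}")
```

Pooling raw cells in this way weights each study by its size; it does not model between-study variability, which is one reason the subgroup analyses matter.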
Results
The IL-6 sensitivities and specificities ranged from 68% to 100% and 28% to 100%, respectively, and the median values were 85.7% and 82%, as shown in Figure 2. The pooled sensitivity was 88% (95% CI: 85-90%) and the pooled specificity was 78% (75-81%), as shown in Figure 3. We summarized all the data extracted from the selected studies in Table 1 for IL-6 as a single marker and all the data for IL-6 in combination with other biomarkers in Table 2.
Subgroup analyses are shown in Table 3. The sensitivity was higher in the preterm population (87% vs. 82%), while specificity was the same for both study populations (86%). Eleven studies [3,16-24,26] collected blood samples at the time of sepsis suspicion (0 h). Three studies [17,21,27] collected their samples earlier than 12 h after initial sepsis suspicion, six studies [10,17,21,22,24,27] earlier than 24 h and three studies [10,17,24] earlier than 48 h. One study [9] collected samples within a certain time interval rather than at a specific time point, and thus could not be assigned to one of the subgroups. Collecting the sample at the time of sepsis suspicion showed the highest sensitivity (84%), but the lowest specificity (86%) when compared to the later collection times, as follows: sensitivities and specificities of 57% and 94% before 12 h, 54% and 88% before 24 h and 67% and 92% before 48 h. In three studies [17,23,24], the sepsis group was formed by culture-proven cases only, and summarizing these studies resulted in a pooled sensitivity and specificity of 85% and 74%, respectively. This corresponds to a decrease of almost 10% in IL-6 specificity when compared to subgroups with similar sensitivity. Three studies [22,24,25] reported that their researchers were blinded to the results of the index test and the reference standard, while one of these studies only analyzed biomarker combinations [25]. Hence, blinding only formed part of the study design in two studies eligible for subgroup analysis. With 81% and 80%, both sensitivity and specificity were lower than in the preterm groups, mixed study population and sample collection at the time of sepsis suspicion, as shown in Table 4.
Children 2024, 11, x FOR PEER REVIEW
Table 4 (rows recovered from the extracted text; rows before item 4B are missing). The number after each item is the count given in the source.
4B. Was a reference standard used to exclude sepsis? | 14
4C. Was a composite reference standard used to identify all newborns without sepsis, and verify index test results in uninfected babies? | 4
4D. Did the index test or its comparator form part of the reference standard? | 2
5. Were categories of results of the index test (including cut-offs) and the reference standard defined after obtaining results? | 16
6. Did the study report the number, training and expertise of the persons executing and reading the index tests and the reference standard? | 3
7. Was there blinding to results of the index test and the reference standard? | 4
Statistical methods
8. Describe the statistical methods used to quantify uncertainty (i.e., 95% confidence intervals) | 6
9. Describe methods for calculating test reproducibility | 4
Results: participants and test results
10A. Describe when the study was carried out, including beginning and ending dates of recruitment | 13
10B. Did the study report clinical and demographic (postnatal hours or days, gestational age, birth weight, gender) features in those with and without sepsis? | 15
10C. Did the study report distribution of illness severity scores in those with and without sepsis? | 0
11. Report the number of participants satisfying the criteria for inclusion that did or did not undergo the index tests and/or the reference standard; describe why participants failed to receive either test | 4
12. Report a cross-tabulation of the results (including indeterminate and missing results) using the results of the reference standard; for continuous results, report the distribution of the test results using the results of the reference standard | 2
Results: estimates
13. Report measures of statistical uncertainty (i.e., 95% confidence intervals) | 6
14. Report how indeterminate results, missing responses and outliers of index tests were handled | 1
15. Report estimates of test reproducibility | 5
Seven studies reported the results of biomarker combinations including IL-6 [3,10,16,18,21,22,25]. Four studies combined the early sepsis marker IL-6 with CRP [3,10,16,25]. Combinations with the early markers sTREM-1 (soluble Triggering Receptor Expressed on Myeloid Cells-1) [18] and CD64 (Cluster of Differentiation 64) [21] were studied by one study each. Combinations of up to three biomarkers including, in addition to IL-6, the markers IP-10 (Interferon gamma-induced protein 10), IL-10 (Interleukin-10), CRP and TNF-α (Tumor necrosis factor-α) have been investigated by Ng et al. [10,22]. The positivity criterion of the test was defined by Ng et al. [10,21,22] as any one marker above the cut-off level and by Dillenseger et al. [25] as one of the two above the cut-off level; it was not specified in the remaining studies. In the four studies analyzing a combination of IL-6 and CRP at sepsis suspicion [3,10,16,25], cut-off values ranged from 21.7 to 60 pg/mL and 4.05 to 14 mg/L, respectively, sensitivities ranged between 78.12 and 100% and specificities between 41 and 96%. The biomarker combination of IL-6 and CRP, measured at the time of sepsis suspicion, had the highest overall sensitivity (92%), but the lowest overall specificity (79%) in the subgroup analysis.
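The "any one marker above the cut-off level" positivity criterion explains why combinations tend to gain sensitivity at the cost of specificity: a patient missed by one marker can still be caught by another, but every false positive of either marker also counts. The sketch below uses entirely hypothetical patient values. The IL-6 cut-off of 30 pg/mL appears in the source [16]; the CRP cut-off of 10 mg/L is an assumed value inside the reported 4.05 to 14 mg/L range.

```python
from typing import Dict, List, Tuple

# Hypothetical cohort: (marker values, septic according to reference standard).
Patient = Tuple[Dict[str, float], bool]
patients: List[Patient] = [
    ({"il6": 120.0, "crp": 25.0}, True),
    ({"il6": 45.0, "crp": 3.0}, True),
    ({"il6": 12.0, "crp": 18.0}, True),   # missed by IL-6, caught by CRP
    ({"il6": 8.0, "crp": 2.0}, True),     # missed by both markers
    ({"il6": 10.0, "crp": 1.0}, False),
    ({"il6": 35.0, "crp": 2.0}, False),   # false positive on IL-6
    ({"il6": 5.0, "crp": 12.0}, False),   # false positive on CRP
    ({"il6": 9.0, "crp": 4.0}, False),
]


def any_marker_positive(values: Dict[str, float], cutoffs: Dict[str, float]) -> bool:
    """Test is positive if any listed marker exceeds its cut-off."""
    return any(values[marker] > cutoff for marker, cutoff in cutoffs.items())


def sens_spec(cohort: List[Patient], cutoffs: Dict[str, float]) -> Tuple[float, float]:
    tp = sum(1 for v, septic in cohort if septic and any_marker_positive(v, cutoffs))
    tn = sum(1 for v, septic in cohort if not septic and not any_marker_positive(v, cutoffs))
    n_septic = sum(1 for _, septic in cohort if septic)
    n_healthy = len(cohort) - n_septic
    return tp / n_septic, tn / n_healthy


il6_only = sens_spec(patients, {"il6": 30.0})
il6_or_crp = sens_spec(patients, {"il6": 30.0, "crp": 10.0})  # CRP cut-off assumed
print("IL-6 alone:  sensitivity %.2f, specificity %.2f" % il6_only)
print("IL-6 or CRP: sensitivity %.2f, specificity %.2f" % il6_or_crp)
```

Under this rule, the combined sensitivity can never fall below that of any single marker in the panel, and the combined specificity can never exceed it.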
Only one study reported how analyses were performed in case of unclear results, absent responses or outliers of index tests [2]. None of the studies reported illness severity scores or their distribution in neonates with and without LOS.
Figure 4 depicts the sROC curve summarizing the results of individual studies. The overall AUC was 0.88, which corresponds to a good diagnostic test [29].
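For readers unfamiliar with the Moses-Littenberg approach named in the Methods, the following is a minimal sketch of the standard construction: each study contributes D = logit(TPR) − logit(FPR) and S = logit(TPR) + logit(FPR), a line D = a + bS is fitted by ordinary least squares, and the line is transformed back into ROC space. The four tables are hypothetical. Two choices are assumptions of this sketch: a 0.5 continuity correction is added to every cell of every study, and the AUC is obtained by numerical integration rather than by the closed-form homogeneous formula cited in the text [14].

```python
from math import exp, log
from typing import List, Tuple


def logit(p: float) -> float:
    return log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))


def fit_moses_littenberg(tables: List[Tuple[int, int, int, int]]) -> Tuple[float, float]:
    """Fit D = a + b*S by least squares; tables are (tp, fp, fn, tn)."""
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in tables:
        # Continuity correction on every cell (assumption of this sketch).
        tp_c, fp_c, fn_c, tn_c = (x + 0.5 for x in (tp, fp, fn, tn))
        tpr = tp_c / (tp_c + fn_c)
        fpr = fp_c / (fp_c + tn_c)
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(d_vals)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_tpr(fpr: float, a: float, b: float) -> float:
    """Back-transform of the fitted line: expected TPR at a given FPR."""
    return expit((a + (1.0 + b) * logit(fpr)) / (1.0 - b))


def sroc_auc(a: float, b: float, steps: int = 2000) -> float:
    """Trapezoidal AUC over FPR in [0, 1]; assumes -1 < b < 1."""
    points = [(0.0, 0.0)]
    points += [(i / steps, sroc_tpr(i / steps, a, b)) for i in range(1, steps)]
    points.append((1.0, 1.0))
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical studies as (tp, fp, fn, tn).
studies = [(18, 5, 2, 25), (30, 10, 6, 54), (12, 2, 3, 33), (40, 20, 10, 30)]
a, b = fit_moses_littenberg(studies)
auc = sroc_auc(a, b)
print(f"a = {a:.3f}, b = {b:.3f}, AUC = {auc:.3f}")
```

A slope b close to zero indicates that the diagnostic odds ratio does not depend on the threshold, which is the homogeneous case in which a closed-form AUC is applicable.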

Discussion
Our systematic review revealed a satisfactory pooled sensitivity of IL-6 as a single marker of 88% (95% CI: 85-90%), and a lower pooled specificity of 78% (75-81%). Another review that included 31 studies incorporating 1448 infants demonstrated a global sensitivity of 82% (77-86%) and a specificity of 88% (83-92%) [30]. Only 9 out of the 31 studies (29%) [3,9,10,16-18,22,24] from this review [30] coincided with studies in our review. This was mainly due to the missing differentiation between early and late onset sepsis in their meta-analysis. Other differences compared to our meta-analysis were the study selection process, the missing differentiation by gestational age and time of sampling, and the combinations of IL-6 with other markers.
Fifteen studies analyzed the diagnostic accuracy of IL-6 as a single marker. Most studies measured IL-6 levels at the time of first signs and symptoms of sepsis. Küster et al. [9], in turn, investigated the time course of IL-6 expression and its prognostic power in sepsis diagnostics. IL-6 was found to be superior to CRP in the prediction of sepsis 1 or more days before clinical diagnosis. The sepsis-proven group showed a significant increase in IL-6 levels from median baseline values of 7.5 pg/mL to 89.7 pg/mL on day −2, i.e., 2 days before clinical diagnosis [9]. Multiple studies found that IL-6 was only able to differentiate between sepsis and no sepsis at the onset and had limited potential for diagnosis later during the course of sepsis [18,27]. This is plausible given the early release of IL-6 and its short half-life. Lusyati et al. [17] made serial determinations of IL-6 levels (0, 4, 12, 24, and 48 h). Although IL-6 values decreased over time, significantly higher values were found in the proven sepsis group than in the control group at all five measurement points [17]. In the study by Panero et al. [26], all 51 patient controls had IL-6 concentrations <15 pg/mL, while the 17 patients with LOS had IL-6 levels strikingly greater than 15 pg/mL at presentation, corresponding to a sensitivity and specificity of 100% for IL-6. Gonzales et al. [24] found that IL-6 had a sensitivity of 75%, specificity of 68%, an NPV of 87% and PPV of 50% on day 0 of the sepsis episode. On day 1, the specificity and NPV improved to 90% [24]. However, their cut-off value of 18 pg/mL was defined solely upon inspection of the data [24].
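Predictive values, unlike sensitivity and specificity, depend on how common sepsis is in the tested population. As a worked illustration using the day-0 figures quoted above for Gonzales et al. [24] (sensitivity 75%, specificity 68%), the sketch below derives PPV and NPV from Bayes' rule. The prevalence of 0.30 is an assumption back-calculated here so that the result is consistent with the reported PPV of 50% and NPV of 87%; it is not a value stated in the text.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity and specificity as quoted for day 0 [24];
# prevalence 0.30 is an assumed, back-calculated value.
ppv, npv = predictive_values(0.75, 0.68, 0.30)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")

# The same test in a lower-prevalence setting (hypothetical 10%).
ppv_low, npv_low = predictive_values(0.75, 0.68, 0.10)
print(f"PPV = {ppv_low:.2f}, NPV = {npv_low:.2f}")
```

At lower prevalence the PPV drops and the NPV rises, so predictive values reported by one unit should not be transferred directly to a population with a different baseline risk.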
Seven studies included in the meta-analysis reported results of biomarker combinations including IL-6 [3,10,16,18,21,22,25]. Raynor et al. [2], analyzing seven cytokines, found IL-6 to be the best-performing individual cytokine. IL-6 at a cut-off of 130 pg/mL demonstrated 100% sensitivity and 52% PPV when discriminating between patients without sepsis and those with sepsis (clinical or culture proven) [2]. Testing all 127 possible cytokine combinations for ruling out sepsis revealed that adding any other cytokine to IL-6 did not result in a higher PPV [2]. Ng et al. [10] identified IL-6, TNF-α and CRP as the best three markers for LOS diagnosis. A comparison of the diagnostic value of the individual markers versus a combination or panel of markers revealed higher sensitivity and better negative predictive values for the latter [10]. Serial measurements of inflammatory markers can further improve diagnostic accuracy. The highest sensitivities (98%) and specificities (91%) were reached when CRP and IL-6 were measured at day 0 combined with either TNF-α (day 1) or CRP (day 2) [10]. In a later study, Ng et al. [19] combined IL-6 and CRP at day 0 with CD64 at 24 h (day 1), which resulted in a sensitivity and specificity of 100% and 86%, respectively. Sarafidis et al. [18] found the diagnostic accuracy of IL-6 combined with sTREM-1 (sensitivity and specificity 90% and 62%, respectively) not superior to that of IL-6 alone (sensitivity 80% and specificity 81%). The combination of IL-6 and CRP at time point 0 was superior to other markers and possible combinations in a study by Dillenseger et al. [25]; however, a sensitivity of 78% and specificity of 76% were not sufficient. Comparing two cut-off points, IL-6 at 60 pg/mL was shown to have good specificity (96%), but low sensitivity (67%), while a lower cut-off of 30 pg/mL had excellent sensitivity (100%) but only average specificity (74%) [16]. When the sensitive IL-6 (cut-off of 30 pg/mL) was combined with the more specific CRP, sensitivity and specificity for sepsis prediction improved to 100% and 96% [16]. Comparing the diagnostic potential of the three markers CD64, IL-6, and CRP in combinations versus individual markers revealed only marginal improvement of sensitivity and negative predictive values [21].
Subgroup analysis was used to analyze the influence of the gestational age and the time of sample collection. One study [2] modified their cut-off criteria in order to achieve a sensitivity of 100%. To prevent introducing bias, this study [2] was excluded from the subgroup analysis. Some groups provided multiple results, e.g., for varying cut-off levels.
To avoid introducing the same study population multiple times when comparing preterm versus mixed study populations, each study was included only once. We chose analyses including the whole study population and in cases of different scenarios, we chose those that yielded the best results [31].
Chiesa et al. [12] analyzed IL-6 diagnostic accuracy studies and found that the majority were suboptimal due to missing information on essential parts like the study design, conduct, analysis and interpretation of test accuracy [31]. We used the adapted STARD checklist [12] to analyze the quality of the present studies. Twelve of the sixteen included studies used different reference standards for diagnosing LOS and verifying index test results [2,3,10,16,18-20,22,25-28]. The majority of studies included proven and clinical sepsis cases [2,3,9,10,16-20,25,27]. In two studies, the sepsis group consisted only of culture-proven cases [23,24]. None of the studies included illness severity scores in their study design. As an inflammatory marker, CRP serves as an important comparator of the index test; however, in two studies, it was also used as part of the reference standard to diagnose sepsis [23,25]. All included studies defined their cut-offs post hoc, with most of them using ROC analysis. In one study, the cut-off was chosen solely upon inspection of the data [24] and one study did not provide information on the origin of their cut-off value [26]. For further information on the importance of each item, we refer to a recent publication of our study group [31]. In brief, incorporation bias occurs if the index test or the comparator of the index test forms part of the reference standard. The fact that the person interpreting the results of these tests would gain some knowledge of the results of the reference standard distorts the diagnostic ability of these tests. This holds true for markers related to, and biomarker combinations including, the marker which forms part of the reference standard [31].
Regarding the clinical applicability of IL-6 for sepsis diagnosis, Dillenseger et al. [25] stated that cytokine assays require a minimum time of 85 min to obtain the results, which would be compatible with clinical decision making but nonetheless should be shortened. Compared to CRP, determination of cytokines is more elaborate and their assays are more expensive; therefore, many hospital laboratories are not able to perform these assays [3]. Most laboratories are not able to perform these expensive tests in test batteries, which further hampers their clinical usefulness as early markers [21]. Others, like Değirmencioğlu et al. [23], have already implemented IL-6 into clinical routines. Raynor et al. [2] argue that a diagnostic accuracy of 100% is unlikely to be achieved via cytokines, since a robust systemic inflammatory response might be absent in some cases of clinical or Gram-positive sepsis. Verboon et al. [3] measured IL-6 levels after 48 h of antibiotic treatment to find out whether IL-6 might support the decision about the duration of antibiotic treatment (7 to 14 days) in cases of confirmed bacterial sepsis and clinical recovery. They found that a rapid decrease in IL-6 at 48 h would justify the early discontinuation of antibiotics [3]. The findings of Ng et al. [10] led to the same conclusion for the serial measurement of IL-6 and CRP measured at the day of sepsis suspicion and CRP measured again two days later. While withholding antibiotic treatment at the onset of sepsis is not recommended, the high sensitivity (98%) and negative predictive value (98%) of this combination indicate that antibiotics could be discontinued at 48 h if the infants were in good clinical condition [10]. This finding can only complement the already common practice of empirically treating the infant for at least 48 h while awaiting blood culture results. In the era of continuously monitored blood culture systems, several studies have even challenged this time frame [32]. A study investigating the time-to-positivity (TTP) of blood cultures in children with proven sepsis found that 90% of blood cultures were positive within 36 h, and in most cases even within <24 h of incubation [32]. They concluded that discontinuing empirical treatment in the absence of a positive blood culture should already be considered after 24 to 36 h [32].
The strengths of the study can be outlined as follows: by including only cases of LOS, we eliminated the uncertainty present in many studies that do not distinguish between early and late onset sepsis. Previous subgroup analyses identified the type of sepsis as a significant source of heterogeneity [11,30]. The limitations of the study are that we investigated a heterogeneous group of studies in order to gain information (through subgroup analyses) on IL-6 performance and possible influencing factors. This might have influenced the precision of the study negatively. It might be useful for future research to analyze individual factors causing heterogeneity within otherwise homogeneous subgroups. Unfortunately, only a few studies looked at biomarker combinations.
Based on the findings of this review, IL-6 might be of use for the diagnosis of late onset sepsis in populations of preterm infants when measured at the time of sepsis suspicion. Evaluation of these results in the context of existing literature was difficult since other reviews on this topic included either mixed study populations or even sepsis groups consisting of early and late onset sepsis cases. To confirm the use of IL-6 in the diagnosis of LOS, further prospective studies on well-defined study populations and with well-defined sepsis criteria are needed.

Figure 1. Flow chart of the study selection process for diagnostic accuracy of Interleukin-6 in late onset sepsis between 1990 and 2020. Reasons for the exclusion of 24 papers at abstract level were no diagnostic accuracy study (n = 9), exposome study (n = 1), biomarkers other than inflammatory markers (n = 3), language other than English (n = 2), animal study (n = 4), did not study or report outcomes of interest (n = 1), in vitro study (n = 3), and dealt with EONS (n = 1).

Figure 2. Boxplots of the distribution of IL-6 cutoff (A) and sensitivity and specificity values (B) of all diagnostic accuracy studies on late onset sepsis using IL-6 as a single marker.

Table 1. Characteristics of IL-6 accuracy studies for the diagnosis of late onset sepsis using IL-6 as a single marker.

Table 2. Characteristics of IL-6 accuracy studies for the diagnosis of late onset sepsis using biomarker combinations.

Table 3. Subgroup analysis of IL-6 accuracy studies for diagnosis of late onset sepsis.

Table 4. Quality of IL-6 diagnostic accuracy studies for diagnosis of late onset sepsis from 1990 to 2020 according to the STARD criteria ("Standards of Reporting Diagnostic Accuracy Studies" [12]).
