Diagnostic Accuracy of Imaging Findings in Pleural Empyema: Systematic Review and Meta-Analysis

Computed tomography (CT) diagnosis of empyema is challenging because the current literature describes multiple overlapping pleural findings. We aimed to identify informative findings for structured reporting. Screening according to the inclusion criteria (P: pleural empyema; I: CT; C: culture/gram stain/pathology/pus; O: diagnostic accuracy measures), data extraction, and risk of bias assessment of studies published between 01-1980 and 10-2021 in Pubmed, Embase, and Web of Science (WOS) were performed independently by two reviewers. CT findings whose pooled diagnostic odds ratio (DOR) had a 95% confidence interval not including 1 were considered informative. Summary estimates of diagnostic accuracy for CT findings were calculated using a bivariate random-effects model, and sources of heterogeneity were evaluated. Ten studies with a total of 252 patients with and 846 without empyema were included. From 119 overlapping descriptors, five informative CT findings were identified: pleural enhancement, pleural thickening, loculation, fat thickening, and fat stranding, with an AUC of 0.80 (hierarchical summary receiver operating characteristic, HSROC). Potential sources of heterogeneity were different thresholds, empyema prevalence, and study year.


Introduction
Pleural effusion is common, with an incidence of 0.32% per year in the general population [1], amounting to approximately 1.5 million people each year in the United States alone [2]. Pleural effusion is frequently related to pneumonia, malignancy, or trauma, and may become secondarily infected. Empyema is defined by pus in the pleural space, and its most common cause is pneumonia [3]. Empyema-related hospitalizations are increasing [4]. Although empyema accounts for only 5-10% of parapneumonic effusions [5,6], it is associated with worse outcomes: longer hospital stays and more complications, especially in culture-positive empyemas [7]. Whilst uncomplicated parapneumonic effusions can be treated with antimicrobial therapy, empyema often requires invasive procedures in addition to broad-spectrum antimicrobial therapy [8]. Computed tomography (CT) is a valuable imaging modality for diagnosing pleural effusions and identifying their etiology [9]. Therefore, it is an integral part of diagnostic procedures for a timely diagnosis of empyema.
Case reports, case series, and animal experiments were excluded.

Information Sources
Information sources were Pubmed, Embase, and Web of Science (WOS).

Search Strategy
A sensitive search strategy was established with a MeSH-term and Title/Abstract search in Pubmed, which included the terms "empyema", "computed tomography", and "diagnostic accuracy". This search strategy was translated with the "polyglot search translator" [13] for Embase and Web of Science. The detailed search terms can be found in Appendix A. The literature search was updated monthly, with the last update performed on 31 October 2021. Additionally, the Cochrane Library, PROSPERO, and online clinical trial registries such as ClinicalTrials.gov (https://clinicaltrials.gov, last update: 31 October 2021) and ISRCTN (https://www.isrctn.com, last update: 31 October 2021) were searched for additional relevant studies.

Selection Process
Eligibility screening was conducted in two steps: (1) title and abstract screening against the inclusion criteria and (2) full-text screening. Title, author, and abstract were exported from Pubmed, Embase, and WOS to Microsoft Excel 2019 (Redmond, WA, USA). Duplicates were removed prior to the initiation of the screening process. Both reviewers independently reviewed the title and abstract of all identified studies, blinded to each other.
If disagreement existed or a paper could not be excluded by title and abstract alone, the paper was included for full-text reading. Full-text versions of relevant studies were retrieved for further evaluation. Reference lists of included studies were checked manually to identify other relevant papers.

Data Collection Process
A structured data extraction sheet [14] was designed to review the identified studies; it included QUADAS-2 [15] and all STARD 2015 [16] criteria. The extracted data items are summarized in Appendix B. Assessment of risk of bias and methodological quality is summarized in Appendix C. A study was judged to be at risk of bias if one or more QUADAS-2 criteria were rated unclear or high.

Data Items and Data Extraction
Both reviewers assessed both the individual data items and risk of bias in the uniform data extraction sheet in a blinded design. Any disagreement was resolved by rechecking the original data and consensus.
For each study included in the meta-analysis, data were extracted to generate 2 × 2 contingency tables displaying true positives, true negatives, false positives, and false negatives. Patients without infected pleural effusion were regarded as disease negatives and patients with a positive culture, gram stain, or macroscopic pus as disease positive. False positives were defined as patients having the disease based on a positive pleural finding but categorized as not having the disease by the reference standard.
Pooled sensitivity, specificity, DOR, and AUC (univariate and hierarchical analysis), as well as 95% confidence intervals (CI), were calculated for each pleural finding of the published studies. Forest plots displaying sensitivity and specificity were constructed for all included studies.
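The per-study measures derived from a 2 × 2 table can be sketched as follows. This is a minimal illustration, not the software used in the review: the function name `diagnostic_accuracy`, the Wald-type interval on the log DOR, and the 0.5 continuity correction for zero cells are assumptions made here, and the counts in the usage line are hypothetical.

```python
import math


def diagnostic_accuracy(tp, fp, fn, tn, z=1.96):
    """Sensitivity, specificity, and DOR with a Wald-type CI from one 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Assumption: add 0.5 to every cell when any cell is zero so that the
    # odds ratio and its standard error stay finite.
    a, b, c, d = tp, fp, fn, tn
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))

    dor = (a * d) / (b * c)
    se_log_dor = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_lower = math.exp(math.log(dor) - z * se_log_dor)
    ci_upper = math.exp(math.log(dor) + z * se_log_dor)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "ci_lower": ci_lower,
        "ci_upper": ci_upper,
        # A finding counts as informative when the CI of the DOR excludes 1.
        "informative": ci_lower > 1 or ci_upper < 1,
    }


# Hypothetical counts, for illustration only.
result = diagnostic_accuracy(tp=40, fp=10, fn=10, tn=90)
```

The `informative` flag mirrors the criterion used in this review (95% confidence interval of the DOR not including 1); the pooled estimates themselves come from the bivariate model, not from single-table calculations like this one.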
Since a common implicit cut-off value for test positivity is to be expected and large differences in disease prevalence exist between studies, estimates of pooled sensitivity and specificity were calculated by fitting a bivariate random-effects model to account for both within- and between-study heterogeneity [17,18]. We quantified heterogeneity between the studies using the I² index and the level of heterogeneity (low < 25, moderate 25-75, and high > 75) as defined by Higgins et al. [19]. We are aware that diagnostic accuracy studies are subject to a threshold effect, so that these values can only be interpreted to a limited extent [20].
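As a simplified illustration of the I² index, the sketch below computes Cochran's Q and I² for a set of study-level effects (for example, log DORs) with inverse-variance weights and maps the result to the categories given above. It is a univariate approximation under stated assumptions, not the bivariate model used in this review; the function names and the input values are hypothetical.

```python
def cochran_q_and_i2(effects, variances):
    """Cochran's Q and the I^2 index (in percent) for study-level effects."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2 = (Q - df) / Q, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


def heterogeneity_level(i2):
    """Categories as stated in the text: low < 25, moderate 25-75, high > 75."""
    if i2 < 25:
        return "low"
    if i2 <= 75:
        return "moderate"
    return "high"


# Hypothetical log DORs and their variances, for illustration only.
q, i2 = cochran_q_and_i2([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
level = heterogeneity_level(i2)
```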
Informative CT findings were defined as findings whose DOR had a 95% confidence interval not including 1 [21]. Publication bias could only be assessed to a limited extent, as there is no generally accepted method for diagnostic accuracy studies and the number of included studies was low [22]. Subgroup analyses for sensitivity and specificity with random-effects models were performed regarding informative pleural findings, the negative collectives (parapneumonic effusions, benign effusions, or effusions in general), concerns regarding applicability (QUADAS-2), the reference standard, slice thickness, whether a study was performed after the year 2000, multiple reviewers, and the dichotomized prevalence of empyema (cutoff 30%). Additionally, a meta-analysis with a mixed-effects model based on DOR estimates was used for disease prevalence and study year. We evaluated suspected significance based on meta-regression with permutation tests (1000 iterations). The alpha level was set to 0.05.
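The idea behind a permutation test for meta-regression can be sketched as follows. This is a simplified illustration: it uses an inverse-variance weighted least-squares slope of the log DOR on one covariate (for example, prevalence or study year) rather than the mixed-effects model described above, and the function names, seed, and data layout are assumptions made here.

```python
import random


def wls_slope(x, y, w):
    """Weighted least-squares slope of y on x with weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return sxy / sxx


def permutation_p_value(covariate, log_dor, variances, n_iter=1000, seed=0):
    """Two-sided permutation p-value for the meta-regression slope."""
    rng = random.Random(seed)
    weights = [1 / v for v in variances]
    observed = abs(wls_slope(covariate, log_dor, weights))
    shuffled = list(covariate)
    hits = 0
    for _ in range(n_iter):
        # Reassign covariate values to studies at random; effects and
        # weights stay paired with their own study.
        rng.shuffle(shuffled)
        if abs(wls_slope(shuffled, log_dor, weights)) >= observed:
            hits += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (hits + 1) / (n_iter + 1)
```

A covariate would then be regarded as a significant source of heterogeneity when the returned p-value falls below the alpha level of 0.05.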

Data Extraction/Characteristics of the Included Studies Population
Finally, 10 studies were included in the quantitative synthesis (meta-analysis) with a total of 1098 patients and 252 empyemas. The summary of the baseline characteristics is shown in Table 1. The mean patient age ranged from 56 to 72 years. All studies had a retrospective cohort design.

Risk of Bias
The quality of included studies assessed by QUADAS-2 is summarized in Table 2. As illustrated, there is substantial underreporting in the included studies, resulting in many "unclear" judgments, which consequently diminish the quality of the data. None of the studies reported whether the reference standard was blinded for the index test.

Table 2. QUADAS-2 judgments of the included studies (seven judgments per study, in the order of the table columns).
[47]: unclear, low, unclear, low, unclear, low, low
Jimenez [48]: low, low, unclear, high, low, low, low
Stark [49]: high, low, unclear, high, unclear, low, high
Metintas [50]: low, low, unclear, high, low, low, low
Leung [51]: low, low, unclear, high, low, low, low
Cullu [52]: unclear, high, unclear, low, unclear, unclear, unclear
Waite [53]: low, unclear, unclear, low, low, low, low
Aquino [54]: low, low, unclear, low, low, low, low
Takasugi [55]: unclear, low, unclear, unclear, unclear, low, low

Categorization of Pleural Findings
There were 119 overlapping descriptions, of which 99 describe the pleura, pleural effusion, or the adjoining adipose tissue, and 20 describe other findings such as lymphadenopathy, liver metastases, lung metastases, and pneumonia. Of these, duplicates were removed and 35 CT findings were assessed as descriptors of empyema. Of these findings, 11 were not included in the meta-analysis because they were described in fewer than two studies with the same negative collective (parapneumonic effusion, benign effusion, or pleural effusion in general). Table A2 summarizes the descriptors that were not used for the meta-analysis. Finally, similar descriptors (n = 24) referring to the same imaging finding were subsumed under the following five informative CT findings (visually summarized in Figure 2) after consensus discussion: "pleural enhancement" (including the split pleura sign), "pleural thickening" (visible-4 mm), "loculation", "fat thickening" (visible-4 mm), and "fat stranding". Sensitivity, specificity, and DOR are summarized in Table A3. "Hemisplit pleura sign", "circumferential pleural thickening", "pleural thickening ≥ 4 mm", and "fat thickening > 5 mm" were identified as non-informative (lower limit of the 95% confidence interval of the DOR ≤ 1) and excluded from the following analyses.

Empyema and Subgroup Analysis
If the CT findings are interpreted as different threshold values for the same diagnosis of empyema, the result is a pooled specificity of 90% (95% CI 86-93) and a sensitivity of 62% (95% CI 55-68) with an AUC of 0.80. Figure A5 shows the corresponding HSROC curve.
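To illustrate what these pooled estimates imply for a single positive or negative CT finding, the following sketch converts sensitivity and specificity into likelihood ratios and post-test probabilities. The likelihood ratios and post-test probabilities are derived here for illustration only and are not reported in the review; the pre-test probability is taken as the overall proportion of empyema among the included patients (252 of 1098), and the function names are assumptions.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test, likelihood_ratio):
    """Bayes' rule on the odds scale: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pooled estimates from the text: sensitivity 62%, specificity 90%.
lr_pos, lr_neg = likelihood_ratios(0.62, 0.90)

# Assumption: pre-test probability equals the pooled proportion of empyema.
prevalence = 252 / 1098
p_after_positive = post_test_probability(prevalence, lr_pos)
p_after_negative = post_test_probability(prevalence, lr_neg)
```

Under these assumptions, a positive finding raises the probability of empyema from about 23% to roughly 65%, whereas a negative finding lowers it only to roughly 11%, which reflects the high specificity and moderate sensitivity of the pooled findings.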
The individual pleural finding (p ≤ 0.001 for sensitivity and specificity), the prevalence of empyema (p = 0.04 for specificity), slice thickness (p < 0.001 for sensitivity), and whether a study was published after 2000 (p = 0.01 for specificity) were identified as sources of heterogeneity with significant differences in pooled diagnostic accuracy measures of the subgroups.
Based on the random-effects model, there is a significant difference between the sensitivity (p ≤ 0.001) of the individual pleural findings, ranging from 84% for pleural enhancement to 39% for fat stranding. There is also a significant difference between the specificity (p ≤ 0.001), ranging from 83% for pleural enhancement to 97% for fat stranding. Sensitivities (84%, 68%, p = 0.14) and specificities (83%, 87%, p = 0.40) of pleural enhancement and pleural thickening do not differ significantly.
Of the subsumed pleural findings, pleural enhancement and fat stranding had the highest DORs, with 20.1 and 26.5, respectively. Smooth margin, microbubbles, or pleural gas showed relatively high DORs in the narrative summary (range: 5.6 [46,48]-62.4 [49,50]). Despite comparable feature definitions, there were frequently major differences in the DOR. For example, the DOR of visible fat stranding varied between 28.8 [48] and 19.2 [53], and the DOR of the "split pleura sign" varied between 7.9 [46] and 44.8 [49]. The diagnostic value of the amount of effusion [47,50] and the presence of septations [47,49] remains unclear, as the available studies show conflicting results with regard to the DOR.
There was no significant difference between the negative collectives (sens: p = 0.96/spec: p = 0.84), the reference standard (sens: p = 0.26/spec: p = 0.99), or the number of reviewers (sens: p = 0.75/spec: p = 0.24). A tabular representation of the subgroup analysis can be found in Table A6.
While different studies used different CT findings to indicate thoracocentesis [46,56,57], the identified informative findings can be used to differentiate empyema from other pleural diseases in a more complete and standardized manner. This distinction is important because both clinical management and patient outcomes differ [10,58]. Because pleural effusions are managed conservatively, false-negative empyema diagnoses should be avoided, suggesting that more value should be given to sensitivity than to specificity. Most of the included studies lacked a detailed definition and description of CT findings [46-52], thereby limiting the analysis of different thresholds. Since CT findings have relatively high specificity but lower sensitivity, no threshold other than the visibility of the findings can be recommended. A threshold greater than 4 mm for pleural thickening [53,54] and subcostal fat thickening [53] was not shown to be informative, mainly because it decreases the differentiability from a pleural tumor manifestation. Whereas pleural carcinomatosis is more likely to show nodular, rind-like pleural thickening (>10 mm) [50,51] or a pleural-based soft tissue mass [50,51], empyema tends to show smooth pleural thickening [48,54].
To maximize pleural enhancement, a dedicated CT protocol is warranted [59,60]; this could further increase the sensitivity of pleural enhancement and pleural thickening at the expense of a potentially higher false-positive rate. In addition, more specific features, including fat thickening and fat stranding, should be utilized to achieve a higher overall diagnostic accuracy. With newer CT scanners and modern diagnostic monitors offering higher resolution, increasing sensitivity could be expected over time. Surprisingly, our study showed an inverse correlation between sensitivity and study date, as well as no significant difference with decreasing slice thickness. This could be partly explained by the fact that older studies only partially fulfilled the STARD criteria and that the patient flow in the included studies remained mostly unclear.
There are several limitations to this study. First, the number of included studies was limited, resulting in a paucity of data available for meta-analysis. Second, different CT parameters, especially concerning the administration of contrast medium, could only be compared to a limited extent, as these were not recorded in a standardized manner in the included studies. This also applies to slice thickness, as several studies used different CT scanners or CT settings, and therefore only overlapping subgroups could be formed. Finally, we found high heterogeneity among the included studies, which can only be partially explained by the subgroup analyses. This might be mostly related to poor methodology and serious underreporting of the patient selection process. This is an important cause for concern and should be taken into consideration when interpreting the results.

Conclusions
Our study concludes that an early diagnosis depends on a high index of suspicion. Combined with the presence of one (or more) of the several aforementioned informative pleural findings, the diagnosis of pleural empyema can be made with high specificity.

Future Directions
Imaging advances and a lack of evidence for the optimization of CT protocols with regard to contrast agent administration indicate the need for further studies. In addition to confirming the high specificity already shown in our review, this could lead to improvements in sensitivity. CT imaging, which is often performed routinely, could thus become increasingly reliable and useful for therapy decisions in the management of pleural empyema.

Data Availability Statement: Most data generated or analyzed during the study are included in the published paper. Additional data generated or analyzed during the study are available from the corresponding author by request.

Conflicts of
Web of Science: ("Empyema, pleural" OR (empyema* OR pyothorax) AND (pleura* OR lung)) AND ("Tomography, X-Ray Computed" OR "computer assisted tomography" OR "computed tomography" OR "computed tomographic scan" OR "computed tomography scan" OR "computer tomography" OR "computerized tomography" OR "computerized tomography") AND ("Data Accuracy/statistics and numerical data" OR "Sensitivity and Specificity" OR "ROC Curve" OR "Area Under Curve" OR accuracy* OR sens* OR speci* OR ROC OR AUC).

Appendix B. Extracted Data Items
The extracted data items were: (1) Expected outcome data as absolute numbers (number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN)) to calculate diagnostic accuracy measures: sensitivity, specificity, negative predictive value, positive predictive value, and AUC/ROC. (2) The various pleural findings, which were understood as prespecified thresholds for the diagnosis of empyema and were initially collected separately, since a threshold value for the diagnosis "empyema" from the combination of these findings has not yet been established. (3) A clear definition of the negative collective, as this is needed for comparison and pooling of the different identified studies (mainly: parapneumonic effusions, benign effusions, and effusions in general). (4) A detailed description of the computed tomography (vendor, collimation, slice thickness, contrast, etc.). (5) Additional data items: first author, published paper, study design, unit of assessment (per patient/per effusion), prior testing, method of patient selection, number of participants, number of patients excluded (study overlap, insufficient test, no reference standard), number of empyemas (and prevalence), mean age, distribution of sex, definition of test positivity, thresholds of test positivity, number of readers and reader characteristics (e.g., years of professional experience), definition of the reference standard, the time interval between the reference standard and index test, country, year, follow-up, short conclusion, funding sources, and studied subgroups.

Appendix C. Study Risk of Bias and Assessment of the Methodological Quality
A customized QUADAS-2 [15] was used, based on the four domains of "study selection", "index test", "reference standard", and "flow and timing". After the assessment, disagreements were resolved by consensus. During the pilot review, there was frequent disagreement on the reference standard because most studies had a clear definition of the reference standard, but the handling of indeterminate or missing data, or blinding for index tests, was unclear. In those studies, we rated the risk of bias as "unclear", but concerns regarding applicability as "low" because of the accepted reference standard.

Table note: Pleural findings marked with a 1 were only described in one study in the respective negative collective, which is why sensitivity, specificity, DOR, and AUC were not pooled and tau², Cochran's Q, and Chi² are not calculable ("NA"). * p < 0.05.