Open Access Systematic Review

Peer-Review Record

Diagnostic Test Accuracy and Semi-Quantitative Metrics of ¹⁸F-FDG PET in Assessing Treatment Response in Skull Base Osteomyelitis and Necrotising Otitis Externa: A Systematic Review and Meta-Analysis

Tomography 2026, 12(3), 32; https://doi.org/10.3390/tomography12030032

by Mark Laidlaw¹,*, Maya Reid², Sukanya Rajiv¹,* and Jean-Marc Gerard¹,³,⁴

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 16 January 2026 / Revised: 6 February 2026 / Accepted: 27 February 2026 / Published: 2 March 2026

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This review focuses on the diagnostic accuracy of ¹⁸F-FDG PET for monitoring treatment response in SBO and NOE. While the paper is timely, it suffers from several issues:

1. The mention of a "diagnostic positivity rate" of 96.1% is problematic from a methodological standpoint. Given that six out of seven studies only included confirmed cases and lacked disease-free controls, this figure in fact reflects the case detection rate and not diagnostic accuracy.
2. Presenting 2.1 Protocol and Registration before 2.2 Review Objectives inverts the standard logical sequence. The reader should first learn of the aims of the study before being presented with the methods.
3. The Discussion section lacks depth and over-elaborates background information and findings that have already been presented in the Introduction and Results.
4. The use of "Doctors" in the simple abstract is informal for a scientific manuscript. The better terms would be "clinicians" or "physicians."

Author Response

Peer Review Responses

Responses to Reviewer 1

Thank you to the Reviewer for the thoughtful and constructive feedback provided on our manuscript. We have revised the manuscript accordingly to improve clarity, methodological precision, and presentation.

All line-number references in our responses refer to the latest revised manuscript version uploaded in conjunction with these reviewer responses.

Comment 1:
“1. The mention of a "diagnostic positivity rate" of 96.1% is problematic from a methodological standpoint. Given that six out of seven studies only included confirmed cases and lacked disease-free controls, this figure in fact reflects the case detection rate and not diagnostic accuracy.”

Response 1:

Thank you for this important methodological clarification. We agree that, because most diagnosis-timepoint studies enrolled only confirmed SBO/NOE without disease-free controls, the pooled 96.1% figure does not represent diagnostic accuracy (sensitivity/specificity) at initial presentation and instead reflects a case detection proportion/rate among confirmed cases. We attempted to make this distinction explicit in the Methods, where we wrote: “Secondary Objective 1 represents an exploratory deviation from the registered protocol. The literature search identified that almost all included studies enrolled patients with confirmed rather than suspected SBO/NOE, with the exception of one study [33], precluding meta-analysis of diagnostic test accuracy at the point of initial clinical suspicion as originally planned. Instead, we evaluated the proportion of positive ¹⁸F-FDG PET findings in patients with disease confirmed by other reference standards, providing insight into the diagnostic positivity rate rather than true diagnostic accuracy (sensitivity and specificity) at initial presentation.” However, we agree that the term “diagnostic positivity rate” may remain open to misinterpretation. Therefore, we have further reduced ambiguity by revising terminology throughout the manuscript to refer to this secondary outcome as “detection rate” (case detection rate in confirmed cohorts), as suggested, and we have retained explicit caveat wording that this outcome is not a sensitivity/specificity estimate at initial presentation.

These revisions can be found in

Abstract Pg 2 Line 47
Methods Pg 3 Line 121; Pg 3 Line 131; Pg 4 Line 170; Pg 5 Line 250; Pg 5 Line 267; Pg 5 Line 271
Results Pg 7 Line 307; Pg 7 Line 314; Pg 14 Line 357; Pg 16 Line 413
Discussion Pg 17 Line 448

where “diagnostic positivity rate” was changed to “detection rate”.
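
The distinction drawn in this response can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from the manuscript; the function names are likewise illustrative. In a cohort containing only confirmed cases, a detection rate can be computed, but specificity has no denominator because there are no disease-free participants.

```python
from typing import Optional


def detection_rate(positive_scans: int, confirmed_cases: int) -> float:
    """Proportion of confirmed cases with a positive scan."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return positive_scans / confirmed_cases


def specificity(true_neg: int, false_pos: int) -> Optional[float]:
    """Specificity, or None when no disease-free participants were enrolled."""
    disease_free = true_neg + false_pos
    if disease_free == 0:
        return None  # not estimable without disease-free controls
    return true_neg / disease_free


# Hypothetical confirmed-only cohort (illustrative numbers, not study data).
rate = detection_rate(positive_scans=48, confirmed_cases=50)
spec = specificity(true_neg=0, false_pos=0)

print(f"detection rate among confirmed cases: {rate:.2f}")
print(f"specificity: {spec}")
```

With no disease-free controls the second function returns None, which is why a pooled proportion from confirmed-only cohorts cannot be read as diagnostic accuracy.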

Comment 2:
“2. Presenting 2.1 Protocol and Registration before 2.2 Review Objectives inverts the standard logical sequence. The reader should first learn of the aims of the study before being presented with the methods.”

Response 2:

Thank you for this suggestion. We agree. Therefore, we have reordered these subsections so that the study aims are presented before protocol/registration details. Specifically, “Review Objectives” now appears first within the Methods section (as Section 2.1), followed by “Protocol and Registration” (as Section 2.2), with subsequent Methods subsections continuing unchanged.

Comment 3:
“3. The Discussion section lacks depth and over-elaborates background information and findings that have already been presented in the Introduction and Results.”

Response 3:

Thank you for this constructive feedback. We agree. Therefore, we have restructured the Discussion to improve interpretive depth and to focus more explicitly on (i) synthesis and interpretation of the principal findings, (ii) practical clinical implications of the pooled likelihood ratios for treatment-endpoint decisions, and (iii) key drivers of between-study heterogeneity and evidence gaps (threshold variability, composite reference standards, follow-up definitions, and incorporation bias). These revisions were implemented in conjunction with, and to better integrate, points raised by the other reviewer. The additions are located in:

Discussion, Pg 18 Line 500; “Accordingly, the pooled estimates, particularly specificity and the diagnostic odds ratio, should be interpreted as average effects with substantial uncertainty and may not be directly generalisable to all clinical contexts.”
Discussion, Pg 18 Line 504; “Our primary analysis was conducted per lesion because patient-level clustering could not be consistently reconstructed from published data; while most cohorts were unilateral, any unaccounted within-patient correlation (e.g., bilateral or contralateral disease) may underestimate statistical uncertainty in the pooled estimates.”
Discussion, Pg 18 Line 514; “Notably, disease-specific consensus guidance and validated, standardised PET response criteria to define treatment endpoints in SBO/NOE remain limited. This likely contributes to variability in thresholds and composite reference standards across studies and institutions.”
Discussion, Pg 18 Line 518; “Finally, included cohorts predominantly represented NOE-related (otogenic) SBO, with fewer studies enrolling broader or central SBO. Differences in diagnostic pathways and composite reference standards between these subtypes may influence estimates and generalisability; the limited number of studies per subtype precluded robust subtype-specific comparisons of diagnostic performance.”
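
As a brief illustration of how likelihood ratios inform treatment-endpoint decisions, the sketch below converts a sensitivity/specificity pair into likelihood ratios, a diagnostic odds ratio, and post-test probabilities. The sensitivity, specificity, and pre-test probability are hypothetical values chosen for illustration; they are not the pooled estimates reported in the manuscript.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a given sensitivity and specificity."""
    if not (0 < sens < 1 and 0 < spec < 1):
        raise ValueError("sens and spec must lie strictly between 0 and 1")
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return lr_pos, lr_neg


def post_test_probability(pre_test: float, lr: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Hypothetical inputs (assumptions for illustration only).
sens, spec, pre_test = 0.90, 0.80, 0.30

lr_pos, lr_neg = likelihood_ratios(sens, spec)
dor = lr_pos / lr_neg  # diagnostic odds ratio

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}, DOR = {dor:.1f}")
print(f"P(residual disease | PET positive) = {post_test_probability(pre_test, lr_pos):.2f}")
print(f"P(residual disease | PET negative) = {post_test_probability(pre_test, lr_neg):.2f}")
```

The odds form is used because likelihood ratios multiply odds rather than probabilities; the same two functions apply to any pre-test probability a clinician considers plausible.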

Comment 4:
“4. The use of "Doctors" in the simple abstract is informal for a scientific manuscript. The better terms would be "clinicians" or "physicians."”

Response 4:

We agree. Therefore, we have revised the Simple Summary to replace the term “Doctors” with “clinicians” (and/or “physicians,” as appropriate) to maintain a formal scientific tone.

This revision can be found in

Simple Summary, Pg 1 Line 19, where “Doctors face difficulty determining…” has been changed to “Clinicians face difficulty determining…”

Thank you again for the Reviewer’s constructive comments, which have helped improve the manuscript’s clarity, structure, and scientific tone. We hope the revisions and responses satisfactorily address all concerns raised.

Reviewer 2 Report

Comments and Suggestions for Authors

Introduction & Scope

It is unclear whether the review will include all types of SBO or only those related to NOE. The phrase “including necrotising (malignant) otitis externa” suggests inclusion, but it is not explicit if SBO cases without NOE are excluded. This may affect study selection and generalizability. The distinction between SBO and NOE is mentioned, but it is unclear whether diagnostic procedures differ between the two subtypes. Clarification on whether the review considers potential differences in diagnostic performance would be helpful.
The introduction identifies a gap (no systematic review on PET for treatment response in SBO/NOE), but there is no mention of clinical guidelines or standards to contextualize PET utility.

Methods & Analysis

Handling of per-lesion versus per-patient correlation is not formally discussed; the bivariate meta-analysis assumes independence. No sensitivity analyses or meta-regression are reported to explore heterogeneity (e.g., PET standalone vs PET/CT/MRI).
High heterogeneity, especially for specificity (SD logit = 1.47, CrI 0.61–2.56), suggests the pooled estimate may not be reliable across clinical contexts. Wide credible interval for diagnostic odds ratio (28.6–1148.6) indicates estimate instability.

Heterogeneity Across Studies

Considerable heterogeneity for specificity.
Differences in positivity thresholds (visual vs SUVmax/SUVpeak/TLG) and reference standards.
Per-lesion heterogeneity not formally modeled → risk of overestimating precision.

Protocol Deviations

Secondary outcome (“diagnostic positivity rate”) is post-hoc: most studies included only patients with confirmed SBO/NOE → does not represent true diagnostic sensitivity at initial presentation.
Only one study included patients with suspected SBO/NOE and controls → true DTA at diagnosis.

Semi-quantitative PET Measures

Wide methodological heterogeneity (SUVmax, SUVpeak, TLG, TMV, lesion-to-background ratio, weighted extent score).
Discordant results across studies → difficult to standardize or apply clinically.
Analyses exploratory, hypothesis-generating rather than definitive.

Author Response

Peer Review Responses

Responses to Reviewer 2

We thank the Reviewer for their careful evaluation of our manuscript and for the constructive, detailed comments provided. We have revised the manuscript accordingly to improve clarity, methodological transparency, and interpretability of the findings.

All referenced line numbers correspond to the latest revised manuscript version uploaded together with this response document.

Comment 1:

“It is unclear whether the review will include all types of SBO or only those related to NOE. The phrase “including necrotising (malignant) otitis externa” suggests inclusion, but it is not explicit if SBO cases without NOE are excluded. This may affect study selection and generalizability. The distinction between SBO and NOE is mentioned, but it is unclear whether diagnostic procedures differ between the two subtypes. Clarification on whether the review considers potential differences in diagnostic performance would be helpful.”

Response 1:

Thank you for pointing this out. We agree with this comment. Therefore, we have clarified the review scope in the eligibility criteria to state that we included skull base osteomyelitis (SBO) cases encompassing otogenic SBO/necrotising otitis externa (NOE/MOE) and central SBO, provided studies evaluated ¹⁸F-FDG PET (±CT/MRI) for treatment response in confirmed disease. We also added limitation text noting that diagnostic pathways and composite reference standards may differ between NOE-related and non-otogenic SBO, which may influence diagnostic performance and generalisability; however, available evidence was insufficient to support robust subtype-specific quantitative comparisons. These revisions can be found in:

Methods, 2.5 Inclusion Criteria, Pg 4 Line 158, where
- “Patients of any age with suspected or confirmed skull base osteomyelitis (SBO) or necrotising (malignant) otitis externa (NOE/MOE) undergoing imaging for initial diagnosis and/or assessment of treatment response.”

was replaced with

“Patients of any age with suspected or confirmed skull base osteomyelitis (SBO), including otogenic SBO/necrotising (malignant) otitis externa (NOE/MOE) and central SBO, undergoing imaging for assessment of treatment response (and/or initial evaluation where data permitted).”

Discussion, 4.1 Limitations Pg 18, Line 518 where the following was added:
- “Finally, included cohorts predominantly represented NOE-related (otogenic) SBO, with fewer studies enrolling broader or central SBO. Differences in diagnostic pathways and composite reference standards between these subtypes may influence estimates and generalisability; the limited number of studies per subtype precluded robust subtype-specific comparisons of diagnostic performance.”

Comment 2:

“The introduction identifies a gap (no systematic review on PET for treatment response in SBO/NOE), but there is no mention of clinical guidelines or standards to contextualize PET utility.”

Response 2:

Thank you for pointing this out. We agree with this comment. Therefore, we have revised the Introduction to better contextualise ¹⁸F-FDG PET within current clinical standards by noting that disease-specific, standardised PET response criteria and guideline recommendations for PET-guided treatment endpoints in SBO/NOE remain limited, reflecting the emerging nature of the skull base evidence base (eligible studies from 2019 onwards). We also clarified that the rationale for PET use in SBO/NOE is informed by evidence from chronic osteomyelitis at other anatomical sites, and that this review synthesises the emerging skull base–specific evidence. In addition, we added a brief statement in the Discussion (clinical perspective paragraph) to reinforce the implications for practice. These revisions can be found in:

Introduction Pg 3, Line 94; where the following was added:
- “Disease-specific, standardised PET response criteria and consensus guideline recommendations for PET-guided treatment endpoints in SBO/NOE remain limited. Consequently, interest in ¹⁸F-FDG PET for response assessment in SBO/NOE has grown largely by extrapolation from its established utility in chronic osteomyelitis at other anatomical sites, but the skull base evidence base remains recent and heterogeneous.”
Introduction Pg 3, Line 102; where the following was added:
- “A synthesis of the emerging SBO/NOE-specific evidence is therefore needed to inform clinical decision-making and support future development of standardised response criteria.”
And Discussion, Pg 18, Line 514; where the following was added:
- “Notably, disease-specific consensus guidance and validated, standardised PET response criteria to define treatment endpoints in SBO/NOE remain limited. This likely contributes to variability in thresholds and composite reference standards across studies and institutions.”

Comment 3:

“Handling of per-lesion versus per-patient correlation is not formally discussed; the bivariate meta-analysis assumes independence. No sensitivity analyses or meta-regression are reported to explore heterogeneity (e.g., PET standalone vs PET/CT/MRI).”

Response 3:

Thank you for pointing this out. We agree that the unit-of-analysis assumptions should be stated explicitly. Therefore, we have added text in the Methods to clarify that diagnostic accuracy data were analysed on a per-lesion basis because the included studies predominantly reported unilateral disease and, where bilateral/contralateral involvement was noted, lesion-level outcomes could not be consistently linked at the patient level from published data. We also acknowledge that analysing lesions as independent observations may underestimate statistical uncertainty if within-patient correlation is present, and we added this explicitly as a limitation in the Discussion.

These revisions can be found in

Methods, 2.7 Data Extraction of Diagnostic Accuracy Measures, Pg 5 Line 240, where we have added the following:
- “Most included studies reported unilateral disease. Where bilateral or contralateral involvement was noted, lesion-level outcomes were not consistently reported in a way that allowed linkage of multiple lesions to individual patients; therefore, analysis at the patient level was not possible from published data. We acknowledge that the bivariate model assumes independent observations and that any within-patient correlation could lead to underestimation of uncertainty.”
Discussion, 4.1 Limitations, Pg 18 Line 504, where we have added the following:
- “Our primary analysis was conducted per lesion because patient-level clustering could not be consistently reconstructed from published data; while most cohorts were unilateral, any unaccounted within-patient correlation (e.g., bilateral or contralateral disease) may underestimate statistical uncertainty in the pooled estimates.”
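
The effect of unaccounted within-patient correlation can be sketched with the standard design-effect approximation, DEFF = 1 + (m − 1) × ICC, where m is the mean number of lesions per patient and ICC is the intraclass correlation. This is a generic illustration, not the authors' analysis; the lesion count, cluster size, and ICC below are hypothetical.

```python
import math


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n_lesions: int, mean_cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered lesions are worth."""
    return n_lesions / design_effect(mean_cluster_size, icc)


# Hypothetical values (assumptions): mostly unilateral disease, so the mean
# number of lesions per patient is only slightly above 1.
n_lesions, mean_lesions_per_patient, icc = 120, 1.2, 0.5

deff = design_effect(mean_lesions_per_patient, icc)
n_eff = effective_sample_size(n_lesions, mean_lesions_per_patient, icc)

print(f"design effect = {deff:.2f}")
print(f"effective sample size = {n_eff:.1f} of {n_lesions} lesions")
print(f"standard errors inflate by a factor of {math.sqrt(deff):.3f}")
```

When nearly all patients contribute a single lesion, m is close to 1 and the design effect stays close to 1, which is consistent with the authors' point that predominantly unilateral cohorts limit, but do not eliminate, the underestimation of uncertainty.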

Comment 4:

“High heterogeneity, especially for specificity (SD logit = 1.47, CrI 0.61–2.56), suggests the pooled estimate may not be reliable across clinical contexts. Wide credible interval for diagnostic odds ratio (28.6–1148.6) indicates estimate instability.”

Response 4:

Thank you for highlighting this. We agree that the substantial heterogeneity (particularly for specificity) and the wide credible interval around the diagnostic odds ratio indicate estimate instability and limit generalisability across clinical contexts. In the manuscript, this is reported in Results, 3.3 Primary Outcome: Treatment Response Monitoring (including the specificity heterogeneity estimate) and reflected by the wide credible intervals around the diagnostic odds ratio. Therefore, we have strengthened Discussion, 4.1 Limitations by adding a concise statement emphasising that these pooled estimates should be interpreted cautiously and may not transfer uniformly across settings.

This revision can be found in:

Discussion 4.1 Limitations, Pg 18 Line 500, where the following was added:
- “Accordingly, the pooled estimates, particularly specificity and the diagnostic odds ratio, should be interpreted as average effects with substantial uncertainty and may not be directly generalisable to all clinical contexts.”
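
To give a sense of what a between-study standard deviation of 1.47 on the logit scale implies, the sketch below back-transforms an approximate range of study-level specificities to the probability scale. The pooled specificity of 0.85 is a hypothetical placeholder, not the manuscript's estimate, and the calculation ignores uncertainty in both the pooled mean and the heterogeneity parameter, so it is a rough illustration rather than a formal prediction interval.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def approx_study_range(pooled: float, tau: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate range of study-level values: inv_logit(logit(pooled) +/- z * tau)."""
    centre = logit(pooled)
    return inv_logit(centre - z * tau), inv_logit(centre + z * tau)


pooled_specificity = 0.85  # hypothetical placeholder (assumption)
tau_logit = 1.47           # between-study SD for logit specificity, as quoted by the reviewer

low, high = approx_study_range(pooled_specificity, tau_logit)
print(f"approximate range of study-level specificity: {low:.2f} to {high:.2f}")
```

A range this wide on the probability scale illustrates why the pooled specificity is best read as an average across heterogeneous settings.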

Comment 5:
“• Considerable heterogeneity for specificity.
• Differences in positivity thresholds (visual vs SUVmax/SUVpeak/TLG) and reference standards.
• Per-lesion heterogeneity not formally modelled → risk of overestimating precision.”

Response 5:

Thank you for these observations. We agree that heterogeneity, particularly for specificity, limits transferability across clinical contexts, and that variation in PET positivity thresholds and composite reference standards are key contributors. This is reported and discussed in multiple places in the manuscript, including:

Results, 3.2 Study Characteristics, Pg 13 Line 325:
- “Reference standards were composite and varied across studies, incorporating combinations of clinical assessment (symptom resolution, otoscopic findings, cranial nerve function), biological markers (CRP, leucocyte count), microbiological confirmation, histopathology, and other imaging modalities.”
Results 3.3 Primary Outcome: Treatment Response Monitoring, Pg 13 Line 341
- “Between-study heterogeneity was substantial, particularly for specificity. The between-study standard deviation for logit-transformed specificity was 1.47 (95% CrI 0.61 to 2.56), whilst that for logit-transformed sensitivity was 0.59 (95% CrI 0.04 to 1.92). The between-study correlation was −0.06 (95% CrI −0.83 to 0.80), indicating minimal correlation between sensitivity and specificity across studies.”
Discussion, 4.1 Limitations, Pg 18 Line 496
- “Substantial heterogeneity was observed for specificity estimates, with between-study standard deviation on the logit scale of 1.47 (95% credible interval 0.61 to 2.56), likely reflecting variability in PET positivity criteria (five studies employed semiquantitative parameters, three used qualitative visual assessment) and composite reference standards without clearly prespecified criteria.”

With respect to the unit-of-analysis concern (per-lesion modelling and the risk of overstating precision), this was addressed in conjunction with Comment 3, where we added explicit text acknowledging the independence assumption and potential underestimation of uncertainty if within-patient correlation is present, which can be found in:

Methods, 2.7 Data Extraction of Diagnostic Accuracy Measures, Pg 5 Line 240, where we have added the following:

“Most included studies reported unilateral disease. Where bilateral or contralateral involvement was noted, lesion-level outcomes were not consistently reported in a way that allowed linkage of multiple lesions to individual patients; therefore, analysis at the patient level was not possible from published data. We acknowledge that the bivariate model assumes independent observations and that any within-patient correlation could lead to underestimation of uncertainty.”

Discussion, 4.1 Limitations, Pg 18 Line 504, where we have added the following:

“Our primary analysis was conducted per lesion because patient-level clustering could not be consistently reconstructed from published data; while most cohorts were unilateral, any unaccounted within-patient correlation (e.g., bilateral or contralateral disease) may underestimate statistical uncertainty in the pooled estimates.”

Comment 6:
“• Secondary outcome (“diagnostic positivity rate”) is post-hoc: most studies included only patients with confirmed SBO/NOE → does not represent true diagnostic sensitivity at initial presentation.
• Only one study included patients with suspected SBO/NOE and controls → true DTA at diagnosis.”

Response 6:

Thank you for this clarification. We agree that, because most included studies at the diagnosis timepoint enrolled only confirmed SBO/NOE without disease-free controls, this secondary analysis does not estimate true diagnostic sensitivity/specificity at initial presentation, and only one study provided true DTA at diagnosis. This was explicitly handled as an exploratory protocol deviation in the manuscript. In Methods, 2.2 Review Objectives, we state: “Secondary Objective 1 represents an exploratory deviation from the registered protocol. The literature search identified that almost all included studies enrolled patients with confirmed rather than suspected SBO/NOE, with the exception of one study [33], precluding meta-analysis of diagnostic test accuracy at the point of initial clinical suspicion as originally planned. Instead, we evaluated the proportion of positive ¹⁸F-FDG PET findings in patients with disease confirmed by other reference standards, providing insight into the diagnostic positivity rate rather than true diagnostic accuracy (sensitivity and specificity) at initial presentation.” We also note that the primary objective (treatment response DTA) was performed as intended. Therefore, we have revised terminology throughout to refer to this outcome as “detection rate” (rather than “diagnostic positivity rate”), while retaining the explicit caveat that it is not a true sensitivity/specificity estimate.

These revisions can be found in

Abstract Pg 2 Line 47
Methods Pg 3 Line 121; Pg 3 Line 131; Pg 4 Line 170; Pg 5 Line 250; Pg 5 Line 267; Pg 5 Line 271
Results Pg 7 Line 307; Pg 7 Line 314; Pg 14 Line 357; Pg 16 Line 413
Discussion Pg 17 Line 448

where “diagnostic positivity rate” was changed to “detection rate”.

Comment 7:

“Semi-quantitative PET Measures • Wide methodological heterogeneity (SUVmax, SUVpeak, TLG, TMV, lesion-to-background ratio, weighted extent score). • Discordant results across studies → difficult to standardize or apply clinically. • Analyses exploratory, hypothesis-generating rather than definitive.”

Response 7:

Thank you for these observations. We agree that semi-quantitative PET measures are methodologically heterogeneous and that results are not yet sufficiently consistent to support standardised clinical application; accordingly, these analyses should be interpreted as exploratory and hypothesis-generating. This is explicitly acknowledged in the manuscript in Methods, 2.8.3 Exploratory semi-quantitative measures and PET Thresholds (“Given the exploratory nature and anticipated methodological heterogeneity, these analyses were interpreted cautiously and presented as hypothesis-generating…”), in Results, 3.4 Secondary Outcome: Semi-quantitative Metabolic Parameters (“Quantitative threshold approaches varied substantially…”), and in Discussion, 4.2 Future Research, where we note that standardised thresholds remain undefined and emphasise the need for validated threshold values and standardised response criteria before routine clinical adoption.

Thank you again for your insightful feedback, which has strengthened the manuscript. We hope our revisions and responses satisfactorily address all points raised.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I recommend acceptance for publication.

Reviewer 2 Report

Comments and Suggestions for Authors

I am pleased that the authors have made the changes I suggested. The revisions make the paper clearer and easier to read. Overall, I think the work is good and most comments have been addressed, which makes the article stronger and easier to understand.
