Review Reports
- Fabian Baier*,
- Oliver Koelbl and
- Felix Steger
- et al.
Reviewer 1: Chenxi Li
Reviewer 2: Anonymous
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Current Oncology manuscript curroncol-4066016, entitled “Interobserver variation within planning target volume and organs at risk in a patient with oropharyngeal carcinoma: A contouring study with anatomical analysis”, has been carefully reviewed.
Overall Evaluation
This study investigates inter-observer variability (IOV) in the delineation of planning target volumes (PTV) and organs at risk (OAR) for oropharyngeal cancer radiotherapy. By conducting independent contouring experiments involving ten senior radiation oncologists, the extent of variability was quantified and its anatomical origins were examined. The research directly addresses a key challenge in standardizing radiotherapy planning and holds significant clinical relevance. The methodology is well-structured, incorporating multi-center participation and employing multidimensional quantitative metrics—including volumetric measurements, coefficients of variation, and consistency indices—which enhance the reliability of the findings. Nevertheless, certain aspects—such as finer details in experimental design, deeper exploration of underlying variation mechanisms, and clearer links to clinical outcomes—offer opportunities for further refinement. Addressing these points could strengthen the study’s comprehensiveness and practical applicability.
There are some issues that need to be resolved.
- Supplementary delineation guidance and background information: The manuscript does not make clear whether participants followed unified guidelines (it states only "no specific guidelines required"), nor whether they were aware of the patient's p16-negative status (which influences lymph node drainage patterns). Suggestion for supplementary explanation: Could guidelines such as DAHANCA be provided as reference materials, along with key patient pathological information (e.g., p16 status)? This would help mitigate unnecessary variations arising from information asymmetry and enable readers to better assess whether observed differences stem from guideline interpretation discrepancies.
- Enhance the completeness of OAR delineation: Certain OARs (e.g., pituitary gland and optic chiasm) may not have been delineated (N<10 in Table 3), and the reasons for their omission were not explained (e.g., whether they were deemed unnecessary for protection or simply overlooked). It is recommended to include specific statistics on undelineated structures, analyze potential associations with differences in organ size perception and clinical significance, and explicitly justify the exclusion of unmarked data in the results to prevent data bias.
- Supplement CT scan parameter details: Only the planned CT slice thickness of 4mm was mentioned, without specifying other parameters such as scan range, window width, and window level settings that could impact delineation accuracy. It is suggested to include these details and discuss whether a 4mm slice thickness might affect the precision of delineating small-volume OARs (e.g., pituitary gland and optic chiasm). This would enhance the technical considerations for analyzing the sources of variation.
- Deepen the analysis of variation in lymph node delineation: Currently, the analysis only notes significant variations in regions such as IVb, Va, and Vc, without considering the lymph node metastasis patterns specific to oropharyngeal cancer (particularly the lymphatic drainage characteristics of p16-negative oropharyngeal cancer) to assess the clinical rationale behind these variations. It is recommended to supplement the analysis by incorporating patient-specific factors, such as cN2b staging and p16-negative status. This would allow for a discussion on whether the inclusion of the Va and Vc regions by some clinicians aligns with established clinical practices. Such an approach would help clarify whether the observed variations represent "over-delineation" or stem from "differences in guideline interpretation," thereby enhancing the clinical relevance of the findings.
- Quantitative analysis of target margin application: While the study mentions that PTV variations are associated with margin application, it lacks a quantitative assessment of the differences in margin sizes among participants, specifically the extension distances from GTV to PTV. It is suggested to extract spatial relationship data from the outlined contours to quantify the extent of margin variation. By doing so, the proportion of PTV volume variation attributable to "margin differences" can be clarified, thereby improving the precision of the analysis regarding the underlying mechanisms of variation.
- Strengthen the anatomical details analysis of OAR variation: For instance, the delineation of the inner ear exhibits considerable variation (CoV > 0.68), yet the discussion is limited to whether the petrous part of the temporal bone is included, without further addressing the core areas of controversy, such as the boundary between the cochlea and adjacent anatomical structures. It is recommended to include schematic diagrams illustrating typical OAR delineations (e.g., inner ear, pituitary gland), highlighting the anatomical structures where boundary discrepancies among clinicians occur (e.g., cochlea versus petrous part of the temporal bone, pituitary gland versus sella turcica). This visual aid would make the reasons for variation more intuitive and facilitate a clearer understanding of the delineation challenges.
- Clinical dosimetric impact assessment of contouring variations: The study solely quantified volumetric discrepancies without evaluating their dosimetric implications (e.g., whether PTV2 volume differences result in inadequate target coverage or whether OAR volume variations affect dose constraint adherence). I recommend incorporating dosimetric simulation analyses: Using a standardized treatment plan, import contours from multiple physicians into the planning system to quantify differences in target coverage metrics (e.g., D95) and OAR dose parameters (e.g., inner ear Dmax, brainstem Dmax). This approach would elucidate the actual clinical consequences of contouring variations beyond mere geometric discrepancies.
- Development of standardized contouring guidelines: While the discussion mentions "supplementary contouring atlas," more specific standardization protocols should be proposed based on study findings. These may include: (1) Establishing unified anatomical boundary definitions for highly variable nodal regions (e.g., using the lateral clavicular border as the lateral boundary for level IVb); (2) Precise delineation criteria for OARs (e.g., limiting inner ear contours to the cochlea rather than the entire petrous temporal bone); (3) Creating template-based guidelines for core target volumes (e.g., PTV2) to enhance the practical implementation of recommendations.
- Comparative analysis with prior research: The current investigation did not conduct comparative analyses with analogous studies on target volume delineation variations in head and neck malignancies (e.g., the seminal work by Nelms et al. in 2012: Nelms BE, Tomé WA, Robinson G, Wheeler J. Variations in the contouring of organs at risk: test case from a patient with oropharyngeal cancer. Int J Radiat Oncol Biol Phys. 2012 Jan 1;82(1):368-78. doi: 10.1016/j.ijrobp.2010.10.019. Epub 2010 Dec 1. PMID: 21123004.), thereby limiting the demonstration of this study's novel contributions. Recommended enhancements include: (1) evaluating whether the observed PTV/OAR variation patterns align with previous findings, and (2) determining if the identified high-variation regions (particularly in PTV2) exhibit site-specific characteristics for oropharyngeal carcinoma, which would better establish the study's significance within the field.
- Enhanced discussion of study limitations and future directions: While the manuscript appropriately acknowledges technical limitations including CT slice thickness and absence of standardized delineation protocols, it would benefit from proposing concrete solutions. Suggested additions: (1) "Implementation of high-resolution 3mm slice CT imaging could significantly enhance the precision of small organ-at-risk contouring," and (2) "Development of anatomical landmark-based standardized contouring templates would address current inconsistencies." These specific recommendations would transform the limitations section into a more forward-looking and solution-oriented discussion.
- Rationale for sample size determination: The justification for selecting "10 participants" was not explicitly provided (e.g., whether this number was derived from typical sample sizes in previous interobserver variability studies or established through preliminary experiments). It is recommended to include either a formal sample size calculation or appropriate references to validate that this number of participants adequately represents the variability in contouring observed in actual clinical settings.
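For readers less familiar with the metrics referred to in the points above, the following is a minimal sketch of two of them: the coefficient of variation (CoV) of the delineated volumes and a mean pairwise Jaccard overlap as one possible agreement index. It is an illustration under stated assumptions, not the authors' analysis pipeline: the function names are hypothetical, each contour is assumed to be available as a set of voxel indices, and CoV is assumed to be the sample standard deviation divided by the mean.

```python
from itertools import combinations
from statistics import mean, stdev


def coefficient_of_variation(volumes):
    """CoV = sample standard deviation / mean (assumed definition).

    `volumes` holds one delineated volume per observer, all in the same unit.
    Needs at least two observers, because the sample stdev is undefined for one.
    """
    return stdev(volumes) / mean(volumes)


def jaccard(a, b):
    """Jaccard overlap |A ∩ B| / |A ∪ B| of two contours given as voxel-index sets."""
    union = a | b
    if not union:
        # Two empty contours are treated as identical (a convention chosen here).
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_jaccard(contours):
    """Average Jaccard overlap over every pair of observers (needs >= 2 contours)."""
    return mean(jaccard(a, b) for a, b in combinations(contours, 2))
```

With such helpers, each structure in a table of results could be summarised by one CoV (volume spread) and one overlap value (spatial agreement). The two are complementary: observers can draw equal volumes in different places, which CoV alone would not reveal.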
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This article reports a case study (oropharyngeal cancer) of contouring variability among ten radiation oncologists, evaluated in terms of differences in PTV and OAR contouring. Being a case study, the generalizability of the findings is limited. Also, considering the large number of articles over the past three decades on interobserver variability in contouring, the authors should be more convincing regarding better consensus and improved standardization across radiation therapy departments. Below are my specific comments:
Title: The authors should mention that this is a case study.
Introduction: The references mentioned in the Introduction to demonstrate inconsistencies and interobserver variability in contouring are too old. Please substantiate your statements with more recent references to justify the topicality of this work for 2025.
Results, Table 3: Please include the right submandibular gland and the brainstem in the table.
Discussion: Mention the current role of auto-contouring (using AI-based software) and its advantages compared to manual contouring, particularly regarding the larger number of OARs available in these algorithms. Comment on the possible advantage of AI-based contouring in reducing inconsistencies and interobserver variability.
Discussion: The authors should include a section on the limitations of this study. The lack of contouring guidelines, differences in observers' contouring preferences, and the inclusion of diverse OARs must be mentioned. The limited generalizability of the findings due to the nature of the study (i.e., a case study) must also be discussed.
Discussion / conclusions: The authors highlight that despite the existence of consensus guidelines and atlases to standardize delineation, differences in contouring practices still persist. In view of this, I would ask the authors to state their concluding message more clearly and be more assertive in their recommendations regarding contouring (particularly in HNC). Otherwise, this article could be considered just an addition to the existing literature, reiterating, once again, the same old problem: large interobserver variations in organ contouring, without any concrete changes in clinical practice.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have adequately addressed all comments raised by this reviewer.