Review Reports - Digital Registrar: A Schema-First Framework for Multi-Cancer Privacy-Preserving Pathology Abstraction via Local LLMs

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper addresses the important problem of structured extraction from pathology reports using LLMs and proposes a schema-first, ontology-driven framework. The work is technically sound and practically relevant, especially for privacy-preserving clinical deployments. However, the core novelty is moderate, and the contribution is more of an engineering approach, as similar directions (ontology-guided extraction, local LLM deployment) have been explored in recent literature. Below are some suggestions that the authors must address to bring the manuscript into better shape as an applied clinical AI paper.

a) The paper lacks meaningful baseline comparisons, making it difficult to assess whether the proposed approach truly outperforms existing methods. The authors should include evaluations against strong baselines such as GPT-4, BioClinicalBERT, and rule-based systems to better contextualize the reported performance.
b) The annotation process introduces potential bias since initial labels were generated using the same model being evaluated; mitigate this by incorporating independently annotated samples or reporting inter-annotator agreement metrics (e.g., Cohen’s kappa).
c) The contribution of individual components (schema constraints, DSPy, prompting) is unclear due to the absence of ablation studies; perform controlled experiments by removing or modifying each component to quantify its impact on performance.
d) The experimental analysis lacks statistical rigor, as results are reported without uncertainty estimates or significance testing; strengthen this by adding confidence intervals, statistical tests, and robustness analyses across multiple runs.
e) External validation is limited in scope, as only a subset of fields and datasets is evaluated; if possible, improve generalizability by testing the full schema on more diverse, multi-institutional datasets.
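
To illustrate the kind of reporting requested in points b) and d), the following is a minimal sketch of how inter-annotator agreement (Cohen's kappa) and a percentile bootstrap confidence interval for field-level accuracy could be computed. It is not taken from the manuscript: the function names, the example labels, and the choice of a percentile bootstrap are assumptions made purely for illustration, and the authors may prefer an established statistics library.

```python
import random
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


def bootstrap_accuracy_ci(gold, pred, n_resamples=2000, alpha=0.05, seed=0):
    """Accuracy with a percentile bootstrap confidence interval."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(gold)
    correct = [g == p for g, p in zip(gold, pred)]
    stats = []
    for _ in range(n_resamples):
        # Resample items with replacement and recompute accuracy.
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(resample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(correct) / n, lower, upper


if __name__ == "__main__":
    # Hypothetical pT-category labels from two annotators (not real data).
    annotator_1 = ["pT1", "pT2", "pT1", "pT3", "pT2", "pT1"]
    annotator_2 = ["pT1", "pT2", "pT2", "pT3", "pT2", "pT1"]
    print("kappa:", cohens_kappa(annotator_1, annotator_2))
    print("accuracy, CI:", bootstrap_accuracy_ci(annotator_1, annotator_2))
```

Reporting such figures per field, and over repeated inference runs, would let readers judge whether differences between configurations exceed annotation noise.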

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Digital Registrar: A Schema-First Framework for Multi-Cancer Privacy-Preserving Pathology Abstraction via Local LLMs

Reviewer comments 1:

Chow N. et al. present "Digital Registrar", an innovative architecture that aims to bridge the translational gap between narrative surgical pathology reports and structured cancer registries. The authors emphasize a "schema-first" strategy that uses clinical ontologies based on College of American Pathologists (CAP) protocols rather than tailoring the system to a specific large language model (LLM). The system's on-premises implementation on conventional workstation hardware (a single 48 GB GPU) addresses serious privacy concerns about patient data. Although the design is realistic, several shortcomings need to be addressed before the manuscript can be considered in the next round of review.

Please clarify the following points:

Major:

1) The authors observed minor variations in TNM staging due to details such as "anatomic" vs "pathologic" staging. It would be beneficial to explain how the algorithm could be fine-tuned or improved to better distinguish between these similar but distinct clinical terms.

2) Because the present design focuses on single-primary cases, the model occasionally misidentified double primary malignancies (such as double lung or breast tumours). Prospective institutional users would benefit from a discussion of the planned methodology for detecting these complex cases and flagging them for manual assessment.

3) Methodology, lines 105-109: The authors need to explain their decision to use only two reviewers and how disagreements between the two were resolved. Without a larger group, or at least some statistics on inter-rater reliability, it is difficult to determine whether this "gold standard" is representative or merely a snapshot of two people's perspectives. In addition, the Methodology section is too lengthy in comparison with the Introduction and Discussion sections; shortening it or moving some of the methods to the supplementary materials would be preferable.

4) Performance was lower (60–90%) for poorly documented or quantitative fields like surgical technique or specific tumor percentages. Elaborating on whether these limitations stem from model reasoning or inherent inconsistencies in the source reports would provide more clarity.

Minor:
1) The authors briefly mention the merging of digital images and genetic data as a potential future direction. The conclusion might be strengthened by a brief explanation of how the existing JSON format could be adapted to incorporate these additional data sources.

2) Methodology, lines 82-84: “across ten major cancer types and registry fields, including complex variable-length structures, within a model-agnostic DSPy pipeline executed on a single 48 GB GPU.” Why did the authors select only ten major cancer types? Further explanation is warranted to aid the reader's understanding.

3) Results, breast biomarker extraction performance, lines 327-332: If possible, the authors could include BRCA1, BRCA2, and p53. This would strengthen the section.

4) Abbreviations should be spelled out in full at first mention to make the manuscript easier to read:

- Line 27: DSPy?

- Line 32: TCGA

- Line 33: GB GPU

- Line 103: JSON

- Line 146: AJCC/TNM

- Line 214: VRAM & GPU

5) Line 118: the full name of CAP is repeated here although it was defined previously.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have adequately addressed the major concerns raised in the previous review round. In particular, the revised manuscript now includes meaningful baseline comparisons, a substantially improved annotation and adjudication protocol, component-level ablation analyses, corrected repeated-inference statistical methodology, and expanded external validation.