Review Reports
- Christoph Sippl 1,*,
- Felix Stark 1 and
- Stefan Linsler 1
- et al.
Reviewer 1: Anonymous
Reviewer 2: Sergei Kruglik
Reviewer 3: Anonymous
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have addressed several points in their responses (conflict of interest statement and discussion on ROI heterogeneity and tissue compression). However, the main methodological issues identified in the previous review remain unresolved. Below, I highlight the points that, in my opinion, require improvement:
Interobserver reliability is still being assessed using a non-standard metric and Student’s t-test, which do not quantify agreement (p. 5, lines 145–147). As previously indicated, established statistics (Fleiss’ kappa or ICC) are required.
The authors’ analysis compares the detection rates of individual histological features between HE and SRS, but this comparison is morphological rather than diagnostic. For intraoperative applicability, the clinically relevant question is whether SRS correctly classifies each case (tumor vs. non-tumor). For these reasons, an imaging accuracy analysis (sensitivity, specificity, accuracy, AUC), along with clinically meaningful parameters, is necessary to meaningfully compare HE and SRS. In my opinion, for an intraoperative tool, imaging accuracy is extremely important and absolutely necessary.
Finally, to define a “non-inferiority” study, the authors must demonstrate that the new technique is not inferior to the standard technique beyond a clinically acceptable margin (Δ). Therefore, it is necessary to define a clinical margin, appropriately calculate the sample size, and use a one-sided test. This information is lacking, and therefore the conclusions should be based on “comparable performance.”
Since all these aspects affect the fundamental statistical validity of the conclusions, the manuscript cannot yet be accepted, and I believe that significant methodological revisions are still required.
Author Response
Reviewer 1: Comments and Suggestions for Authors
The authors have addressed several points in their responses (conflict of interest statement and discussion on ROI heterogeneity and tissue compression). However, the main methodological issues identified in the previous review remain unresolved. Below, I highlight the points that, in my opinion, require improvement:
Interobserver reliability is still being assessed using a non-standard metric and Student’s t-test, which do not quantify agreement (p. 5, lines 145–147). As previously indicated, established statistics (Fleiss’ kappa or ICC) are required.
Response: We thank the reviewer for this important methodological suggestion. We agree that established multi-rater agreement statistics are preferable for quantifying interobserver reliability. We have therefore replaced the previously used custom consensus metric with standard multi-rater agreement coefficients. Specifically, we now report Fleiss’ κ for each histopathological feature separately for SRS and HE images, including 95% confidence intervals. Because several features exhibited skewed prevalence distributions, we additionally report Gwet’s AC1 as a prevalence-robust agreement measure in the Supplementary Material. The previous consensus-based metric and associated t-tests have been removed from the main analysis.
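As an illustration of the two agreement coefficients named in this response, the following is a minimal sketch of Fleiss' κ and Gwet's AC1 for a single feature. It is not the authors' analysis code: the function name `agreement_stats` and the rating counts are hypothetical, and confidence intervals are omitted.

```python
import math

def agreement_stats(counts):
    """Return (Fleiss' kappa, Gwet's AC1) for a subjects-by-categories table.

    counts[i][j] = number of raters who assigned subject i to category j.
    Every subject must be rated by the same number of raters (>= 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if n_categories < 2 or n_raters < 2:
        raise ValueError("need at least 2 categories and 2 raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number of ratings")

    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    p_obs = sum(
        sum(c * (c - 1) for c in row) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Overall proportion of ratings that fall into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]

    # Fleiss' kappa: chance agreement is the sum of squared proportions.
    pe_fleiss = sum(p * p for p in p_cat)
    kappa = (p_obs - pe_fleiss) / (1 - pe_fleiss) if pe_fleiss < 1 else math.nan

    # Gwet's AC1: chance agreement shrinks when prevalence is skewed.
    pe_gwet = sum(p * (1 - p) for p in p_cat) / (n_categories - 1)
    ac1 = (p_obs - pe_gwet) / (1 - pe_gwet)
    return kappa, ac1

# Hypothetical ratings of one feature: 5 images, 12 raters, [present, absent].
example = [[12, 0], [11, 1], [12, 0], [10, 2], [12, 0]]
kappa, ac1 = agreement_stats(example)
print(f"Fleiss' kappa = {kappa:.2f}, Gwet's AC1 = {ac1:.2f}")
```

With these made-up counts the raters agree on most images, yet κ is close to zero because "present" dominates the ratings, whereas AC1 remains high. This is the skewed-prevalence behaviour that motivates reporting both coefficients.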
The authors’ analysis compares the detection rates of individual histological features between HE and SRS, but this comparison is morphological rather than diagnostic. For intraoperative applicability, the clinically relevant question is whether SRS correctly classifies each case (tumor vs. non-tumor). For these reasons, an imaging accuracy analysis (sensitivity, specificity, accuracy, AUC), along with clinically meaningful parameters, is necessary to meaningfully compare HE and SRS. In my opinion, for an intraoperative tool, imaging accuracy is extremely important and absolutely necessary.
Response: We thank the reviewer for this important remark. To clarify the scope of the study, we have revised the Discussion section and explicitly stated that the present work was not designed as a diagnostic accuracy study across a spectrum of tumor and non-tumor entities, but rather as a feature-level agreement analysis in neuropathologically confirmed glioblastoma cases. The corresponding limitation and its implications for the interpretation of the results have now been clearly addressed in the revised manuscript.
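The scope limitation described in this response can be made concrete with a small sketch of the case-level metrics the reviewer requested. The function name and all counts are hypothetical and serve only to show that, in a cohort containing confirmed tumor cases only, specificity has no denominator and cannot be estimated.

```python
def diagnostic_accuracy(tp, fn, tn, fp):
    """Case-level sensitivity, specificity and accuracy; None where undefined."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fn + tn + fp),
    }

# Hypothetical tumor-only cohort: 30 confirmed tumor cases, no non-tumor cases.
tumor_only = diagnostic_accuracy(tp=27, fn=3, tn=0, fp=0)
print(tumor_only)
```

Because `tn + fp` is zero, the specificity entry is `None`, and accuracy collapses to sensitivity. A diagnostic accuracy analysis therefore needs non-tumor cases in the sample.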
Finally, to define a “non-inferiority” study, the authors must demonstrate that the new technique is not inferior to the standard technique beyond a clinically acceptable margin (Δ). Therefore, it is necessary to define a clinical margin, appropriately calculate the sample size, and use a one-sided test. This information is lacking, and therefore the conclusions should be based on “comparable performance.”
Response: We acknowledge that the original manuscript used non-inferiority terminology without a pre-specified non-inferiority margin or corresponding one-sided statistical framework. As the present study was not designed as a formal non-inferiority trial, all non-inferiority terminology has been removed. The results are now described as exploratory comparisons demonstrating comparable performance in selected histopathological features.
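For context, the formal framework that the reviewer describes can be sketched as a one-sided confidence bound compared against a pre-specified margin Δ. The example assumes a simple Wald interval on independent observations (which the clustered ratings in this study are not), and the margin and counts are hypothetical.

```python
from statistics import NormalDist

def noninferiority_check(x_new, n_new, x_std, n_std, margin, alpha=0.025):
    """One-sided Wald check for a difference in detection proportions.

    Returns (difference, lower confidence bound, non-inferior?).
    The new method is declared non-inferior only if the lower bound of
    (p_new - p_std) lies above -margin.
    """
    p_new, p_std = x_new / n_new, x_std / n_std
    diff = p_new - p_std
    se = (p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std) ** 0.5
    lower = diff - NormalDist().inv_cdf(1 - alpha) * se
    return diff, lower, lower > -margin

# Hypothetical detection counts with a hypothetical margin of 10 percentage points.
small = noninferiority_check(80, 100, 85, 100, margin=0.10)
large = noninferiority_check(800, 1000, 850, 1000, margin=0.10)
print(small)
print(large)
```

Both calls use the same observed difference of 5 percentage points, but only the larger sample keeps the lower bound above the margin. This is why a non-significant difference in a small sample cannot by itself support a non-inferiority claim.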
Since all these aspects affect the fundamental statistical validity of the conclusions, the manuscript cannot yet be accepted, and I believe that significant methodological revisions are still required.
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript describes the application of a newly developed intraoperative Stimulated Raman Scattering system for high-grade glioma diagnosis via comparison of SRS-images with conventional hematoxylin-eosin staining data. The manuscript is written in a clear manner, and I had no problem following the story. The collection of high-quality imaging data from so many cancer patients and their further histopathological evaluation by twelve neuropathologists is impressive and is of high value for a specialist in the field. Indeed, SRS-imaging is a powerful and rapidly developing tool for intraoperative evaluation of tumor tissue, and as such the related studies are timely and of high interest.
Unfortunately, the manuscript has several drawbacks that require major revision. Before I can recommend publication, I suggest that the authors take the following remarks into account:
1. The SRS apparatus and SRS images have not been properly described (Section 2.4 Raman imaging). The apparatus description lacks many details, and an interested researcher cannot repeat the experiment based on the technical information provided. In my opinion, the authors should describe their system in full; this information may go to the Supplementary Materials if the authors find it inappropriate for the main text. More specific direct references to previous publications may also help. In my opinion, some general information should still appear in the main text, such as:
- Wavenumber shift at which the SRS-image was recorded.
- Laser power density at the sample position.
- Accumulation time at each pixel, and total accumulation time of an image.
- Figure 1 should describe the whole system, including the adaptation for widefield imaging of Fig. 2. Please describe each element concisely in the figure caption, including company and part number (objective, PMT, detectors, galvo scanner, lock-in amplifier, etc.).
- Since SRS-imaging does not utilize staining, the procedure of converting gray-scale intensity distribution data into a “virtual HE” image (Fig. 2, 3) should be described in detail. If this procedure is well known, please provide a relevant reference.
2. In 2021, Di et al. published their research on Stimulated Raman Histology of gliomas in World Neurosurgery with a definite positive conclusion (Reference 18). I suggest that the authors mention this paper in the Introduction, in order to position the current work with respect to what has already been done, and discuss it directly and in more detail in the Discussion.
In my opinion, the problem is that the main conclusion of the present manuscript, that the SRS technique has strong potential to complement established intraoperative neuropathological workflows (Lines 328-329), has already been formulated by Di et al. I suggest that the authors adopt a more balanced approach, combining the already published conclusions on SRS of glioma with the new findings of the current work.
3. It is claimed that “… The device used in our study enables full-spectrum acquisition rather than narrowband detection [10,12], offering the potential to capture rich molecular data …” (Lines 304-306). Could the authors please provide one characteristic SRS-spectrum of the glioma tissue studied in this work, to support the claim?
4. One of the advantages of SRS-imaging seems to be its potential applicability for rapid intraoperative histopathology (Lines 295-298). Could the authors please discuss the effect of tightly focused picosecond laser light with milliwatt power on glioma tissue?
5. In my opinion, the Raman line at ~1005 cm⁻¹ can be attributed to phenylalanine, not to an amide band (Line 286).
Author Response
Reviewer 2: Comments and Suggestions for Authors
The manuscript describes the application of a newly developed intraoperative Stimulated Raman Scattering system for high-grade glioma diagnosis via comparison of SRS-images with conventional hematoxylin-eosin staining data. The manuscript is written in a clear manner, and I had no problem following the story. The collection of high-quality imaging data from so many cancer patients and their further histopathological evaluation by twelve neuropathologists is impressive and is of high value for a specialist in the field. Indeed, SRS-imaging is a powerful and rapidly developing tool for intraoperative evaluation of tumor tissue, and as such the related studies are timely and of high interest. Unfortunately, the manuscript has several drawbacks that require major revision. Before I can recommend publication, I suggest that the authors take the following remarks into account:
- The SRS apparatus and SRS images have not been properly described (Section 2.4 Raman imaging). The apparatus description lacks many details, and an interested researcher cannot repeat the experiment based on the technical information provided. In my opinion, the authors should describe their system in full; this information may go to the Supplementary Materials if the authors find it inappropriate for the main text. More specific direct references to previous publications may also help. In my opinion, some general information should still appear in the main text, such as:
- Wavenumber shift at which SRS-image was recorded.
- Laser power density at sample position.
- Accumulation time at each pixel, and total accumulation time of an image.
Response: We thank the reviewer for this helpful comment. In the revised manuscript, Section 2.4 (Raman imaging) has been substantially expanded to provide a more detailed technical description of the SRS system and the image acquisition workflow. Specifically, we have now included the Raman shifts used for virtual HE imaging (2850 cm⁻¹ and 2940 cm⁻¹) as well as the spectral acquisition range in the fingerprint region (750–1750 cm⁻¹ in 5 cm⁻¹ steps). Laser power at the sample (175 mW for the Stokes beam and 40 mW for the pump beam), pixel dwell times (40 µs for CH-stretch imaging and 10 µs for fingerprint acquisition), and total imaging times for the different acquisition modes have been added.
- Figure 1 should describe the whole system, including adaptation for widefield imaging of Fig. 2. Please describe each element concisely in figure caption, including company and part number (objective, PMT, detectors, galvo scanner, lock-in amplifier, etc).
Response: In addition, the main optical and electronic components of the system—including galvanometric scanners, objective lens, balanced detector, lock-in amplifier, and data acquisition hardware—are now described in detail with manufacturer information. The procedure used to generate pseudo-colored “virtual HE” images from grayscale SRS intensity distributions has also been described, including the channel subtraction method used to isolate nuclear contrast and the subsequent color mapping approach. Relevant references to previously published stimulated Raman histology methods have been added.
- Since SRS-imaging does not utilize staining, the procedure of converting gray-scale intensity distribution data into “virtual HE” image (Fig. 2, 3) should be described in detail. If this procedure is well-known, please provide relevant reference.
Response: Finally, the figure legends have been expanded to clarify the system components shown in Figure 1 and the acquisition workflow illustrated in Figure 2. These additions ensure that the experimental setup and imaging procedure can be understood and reproduced by interested researchers.
- In 2021, Di et al. published their research on Stimulated Raman Histology of gliomas in World Neurosurgery with a definite positive conclusion (Reference 18). I suggest that the authors mention this paper in the Introduction, in order to position the current work with respect to what has already been done, and discuss it directly and in more detail in the Discussion.
Response: We thank the reviewer for this helpful suggestion. The study by Di et al. (World Neurosurgery, 2021) has now been added to the Introduction to better position our work within the existing literature on stimulated Raman histology for intraoperative glioma evaluation. The revised text also clarifies how the present study complements prior work by focusing on the identification of individual histopathological features in SRS images of confirmed glioblastoma specimens.
In my opinion, the problem is that the main conclusion of the present manuscript, that the SRS technique has strong potential to complement established intraoperative neuropathological workflows (Lines 328-329), has already been formulated by Di et al. I suggest that the authors adopt a more balanced approach, combining the already published conclusions on SRS of glioma with the new findings of the current work.
Response: We thank the reviewer for this valuable suggestion. The study by Di et al. has now been incorporated not only in the Introduction but also in the Discussion to better contextualize our findings within the existing literature on stimulated Raman histology for glioma evaluation. The revised text highlights how the present study complements previous work by focusing on the recognition of individual histopathological features in SRS images of confirmed glioblastoma cases.
- It is claimed that “… The device used in our study enables full-spectrum acquisition rather than narrowband detection [10,12], offering the potential to capture rich molecular data …” (Lines 304-306). Could the authors please provide one characteristic SRS-spectrum of the glioma tissue studied in this work, to support the claim?
Response: We thank the reviewer for this helpful suggestion. A representative SRS spectrum acquired from glioblastoma tissue using the imaging system employed in this study has now been added as Supplementary Figure S1. The spectrum illustrates characteristic Raman bands within the fingerprint region and demonstrates the full spectral acquisition capability of the system.
- One of the advantages of SRS-imaging seems to be its potential applicability for rapid intraoperative histopathology (Lines 295-298). Could the authors please discuss the effect of tightly focused picosecond laser light with milliwatt power on glioma tissue?
Response: We thank the reviewer for this important remark. The Discussion section has been expanded to address the potential effects of picosecond laser irradiation on biological tissue during SRS imaging. We now clarify that the milliwatt-level laser powers and short acquisition times used in SRS microscopy result in low energy deposition and have not been associated with detectable tissue damage in previous studies.
- In my opinion, the Raman line at ~1005 cm⁻¹ can be attributed to phenylalanine, not to an amide band (Line 286).
Response: We thank the reviewer for this helpful remark. The text has been corrected accordingly. The Raman line at ~1005 cm⁻¹ is now correctly described as the phenylalanine band rather than an amide band in the revised manuscript.
Reviewer 3 Report
Comments and Suggestions for Authors
This manuscript evaluates a portable stimulated Raman scattering (SRS) system for intraoperative “virtual H&E” imaging using a multi-rater feature checklist applied to paired SRS and conventional H&E frozen-section images from 30 confirmed GBM cases. The clinical motivation is strong, but the current design and analyses do not support several of the translational claims, and multiple reporting/figure issues reduce interpretability and reproducibility.
I first detail the major issues.
1) All samples are neuropathologically confirmed GBM, so the study cannot support claims about tumour vs non-tumour discrimination or low- vs high-grade glioma discrimination, which are stated in the conclusion. The work is closer to a feature-visibility / agreement study in GBM than a diagnostic accuracy study across a clinically relevant differential. Please either (i) re-scope the title/abstract/conclusion to match what was tested, or (ii) expand the dataset to include non-tumour tissue and common mimics (reactive gliosis, metastasis, lymphoma, treatment effect, low-grade glioma, etc.) with an appropriate reference standard and diagnostic accuracy metrics.
2) The Methods state each specimen was divided, with H&E and SRS taken from different portions, and the Results explicitly acknowledge that SRS and H&E were acquired from independent ROIs, conflating “image quality” with “ROI content/heterogeneity.” This is not a minor limitation—it directly undermines modality comparisons for feature recognition. At minimum, the manuscript should (a) stop describing the panels as “corresponding” fields when they are not co-registered, and (b) present a clear ROI selection protocol with any available evidence of comparability. Ideally, include a co-registered validation subset (SRS of a region subsequently sectioned for H&E, or imaging on the same tissue face) to justify modality-level claims.
3) The manuscript uses “non-inferiority” framing (and discusses equivalence), but there is no pre-specified non-inferiority margin, no CI-based analysis, and no power justification. A non-significant p-value from a χ² test is not evidence of non-inferiority. Please either (i) remove non-inferiority terminology and reframe as exploratory comparisons, or (ii) implement a proper non-inferiority framework (margin selection justified clinically, CI for differences, and sample size/power).
4) Each rater evaluates many images; each case contributes one SRS and one H&E image; outcomes are binary; and observations are not independent. Using χ² tests on pooled counts (e.g., 360 vs 360) risks pseudo-replication and inflated significance. Please re-analyze with an approach that accounts for clustering (e.g., mixed-effects logistic regression or GEE with rater and case as random effects / clusters), and report effect sizes with confidence intervals. Also address multiplicity across multiple features.
5) Defining a “good evaluation” as ≥3 discernible GBM features, and then defining “good images” via ≥6 “good evaluations,” is not clearly justified and is acknowledged to be sensitive to ROI heterogeneity. This construct risks circularity (images are “good” if they show more features) and may systematically penalize ROIs with biologically real absence of certain features (e.g., necrosis/pseudopalisading). If retained, provide a rationale and sensitivity analyses using alternative thresholds; preferably complement with objective measures (SNR, contrast, blur/focus metrics) and/or a predefined primary endpoint not dependent on feature counts.
6) The custom “interobserver validity” transformation (distance from 6/12) is unconventional and may be difficult for readers to interpret. Please report established multi-rater agreement statistics (e.g., Fleiss’ κ and/or Gwet’s AC1/AC2), with confidence intervals, and consider rater-experience stratification using appropriate models rather than independent-samples t-tests. If the custom metric is retained, justify it and show how it behaves relative to standard measures.
7) K-means on binary variables is not standard without careful choice of distance/encoding; “centroid distance 1.06” and a schematic “heatmap” do not establish cluster validity. Either remove this analysis (or move to supplement) or re-implement using methods suitable for binary data (e.g., k-modes, hierarchical clustering with Jaccard distance), report internal validity (silhouette/stability), and present cluster-wise feature prevalence with uncertainty.
8) Key acquisition and rendering parameters are missing/underspecified: objective NA/magnification, pixel size, dwell time, scan duration per low-res and high-res mode, laser powers at sample, specific wavenumbers used for virtual H&E mapping, and any post-processing/normalization. Without these, readers cannot judge whether poorer detection of mitoses/necrosis/endothelial proliferation is intrinsic or a tunable acquisition/processing issue.
9) Figures/tables: multiple issues impair clarity and credibility
- Table 1 is presented as a highlighted questionnaire screenshot with blank fields; it is not manuscript-ready and does not define thresholds for each criterion. Replace with a clean table of criteria + operational definitions.
- Figure 3 (representative pairs) is particularly problematic: only four cases are shown, co-registration is unclear, and the caption refers to symbols that are faint/unclear and may not match the exported figure. There is also a stray “f” character on the page near the figure area. Provide higher-resolution panels, consistent annotation visibility, and ideally add high-magnification insets for mitoses/endothelial proliferation.
- Figure 2 (workflow) communicates the narrative but lacks explicit panel/step labels and could better show ROI selection (overlay boxes) and timing per step to support “rapid” claims.
- Figure 5 is not a “heatmap” in the conventional sense (binary blocks with no scale). If kept, show actual prevalence (%) per cluster with a legend/scale.
10) The manuscript contains two “Conflicts of Interest” statements that contradict each other (one says no conflicts; another discloses shareholding/employment at Refined Laser Systems). This must be corrected and consolidated.
Other issues:
- You state the study was conducted per STARD, but the current design is not a classical diagnostic accuracy study across a representative spectrum (all confirmed GBM; feature checklist endpoint). Either justify STARD applicability with a checklist or remove/temper this statement.
- Please clarify: randomization of image order; whether raters were blinded to modality distribution (knowing 50/50 can bias recognition); whether any training examples were provided; and whether there was a washout period to reduce recall.
- “Available on reasonable request” is increasingly viewed as insufficient. Consider sharing de-identified image sets (or a subset) and analysis code to enable verification.
Author Response
Reviewer 3:
1) All samples are neuropathologically confirmed GBM, so the study cannot support claims about tumour vs non-tumour discrimination or low- vs high-grade glioma discrimination, which are stated in the conclusion. The work is closer to a feature-visibility / agreement study in GBM than a diagnostic accuracy study across a clinically relevant differential. Please either (i) re-scope the title/abstract/conclusion to match what was tested, or (ii) expand the dataset to include non-tumour tissue and common mimics (reactive gliosis, metastasis, lymphoma, treatment effect, low-grade glioma, etc.) with an appropriate reference standard and diagnostic accuracy metrics.
Response: We agree with the reviewer that the present cohort includes only neuropathologically confirmed glioblastoma cases and therefore does not permit conclusions regarding tumor versus non-tumor discrimination or low- versus high-grade glioma classification. We have revised the title, abstract, and conclusion accordingly and now clearly position the study as a feature-level agreement analysis in confirmed GBM cases rather than a diagnostic accuracy study across a clinical differential spectrum.
2) The Methods state each specimen was divided, with H&E and SRS taken from different portions, and the Results explicitly acknowledge that SRS and H&E were acquired from independent ROIs, conflating “image quality” with “ROI content/heterogeneity.” This is not a minor limitation—it directly undermines modality comparisons for feature recognition. At minimum, the manuscript should (a) stop describing the panels as “corresponding” fields when they are not co-registered, and (b) present a clear ROI selection protocol with any available evidence of comparability. Ideally, include a co-registered validation subset (SRS of a region subsequently sectioned for H&E, or imaging on the same tissue face) to justify modality-level claims.
Response: We thank the reviewer for this important comment. We have revised the manuscript to clarify that SRS and HE images were obtained from independent regions of interest within the same tumor specimen rather than co-registered fields. The wording describing the images as “corresponding” has been removed throughout the manuscript. In addition, the Methods section now includes a clearer description of the ROI selection procedure based on the widefield overview image used to identify representative tumor areas for SRS imaging. We also expanded the limitations section to explicitly discuss the potential influence of intratumoral heterogeneity resulting from the use of independent ROIs.
3) The manuscript uses “non-inferiority” framing (and discusses equivalence), but there is no pre-specified non-inferiority margin, no CI-based analysis, and no power justification. A non-significant p-value from a χ² test is not evidence of non-inferiority. Please either (i) remove non-inferiority terminology and reframe as exploratory comparisons, or (ii) implement a proper non-inferiority framework (margin selection justified clinically, CI for differences, and sample size/power).
Response: As noted in our response to Reviewer 1, we acknowledge that the original manuscript used non-inferiority terminology without a pre-specified non-inferiority margin or corresponding one-sided statistical framework. As the present study was not designed as a formal non-inferiority trial, all non-inferiority terminology has been removed. The results are now described as exploratory comparisons demonstrating comparable performance in selected histopathological features.
4) Each rater evaluates many images; each case contributes one SRS and one H&E image; outcomes are binary; and observations are not independent. Using χ² tests on pooled counts (e.g., 360 vs 360) risks pseudo-replication and inflated significance. Please re-analyze with an approach that accounts for clustering (e.g., mixed-effects logistic regression or GEE with rater and case as random effects / clusters), and report effect sizes with confidence intervals. Also address multiplicity across multiple features.
Response: We appreciate this methodological clarification. We agree that treating all individual ratings as independent observations does not fully account for the hierarchical structure of the data (multiple raters per image and paired modalities within cases). We have therefore re-analyzed feature detection outcomes using generalized estimating equation (GEE) models with binomial distribution, accounting for clustering at the image level. Effect sizes are reported as odds ratios with 95% confidence intervals.
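The GEE models themselves are not reproduced here. As a simpler illustration of why pooled ratings overstate the available information, the sketch below applies the Kish design-effect approximation to 30 images rated by 12 examiners each. The intracluster correlation (ICC) values are hypothetical, and the formula assumes equal-sized clusters and a single level of clustering.

```python
def effective_sample_size(n_clusters, ratings_per_cluster, icc):
    """Kish approximation: n_eff = n / (1 + (m - 1) * icc)."""
    n_total = n_clusters * ratings_per_cluster
    design_effect = 1 + (ratings_per_cluster - 1) * icc
    return n_total / design_effect

# 30 images x 12 raters = 360 ratings per modality; ICC values are assumed.
for icc in (0.0, 0.2, 0.5):
    n_eff = effective_sample_size(30, 12, icc)
    print(f"ICC = {icc:.1f}: 360 ratings ~ {n_eff:.1f} independent observations")
```

Only when ratings of the same image are uncorrelated do the 360 ratings count as 360 observations; any positive correlation reduces the effective sample size, which is what a χ² test on pooled counts ignores.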
5) Defining a “good evaluation” as ≥3 discernible GBM features, and then defining “good images” via ≥6 “good evaluations,” is not clearly justified and is acknowledged to be sensitive to ROI heterogeneity. This construct risks circularity (images are “good” if they show more features) and may systematically penalize ROIs with biologically real absence of certain features (e.g., necrosis/pseudopalisading). If retained, provide a rationale and sensitivity analyses using alternative thresholds; preferably complement with objective measures (SNR, contrast, blur/focus metrics) and/or a predefined primary endpoint not dependent on feature counts.
Response: We thank the reviewer for this important observation. We agree that the thresholds used to define “good evaluations” and “good images” may be influenced by biological variability and the presence or absence of specific histopathological features. In the revised manuscript, we therefore clarify that this construct was used only as an exploratory descriptive approach to visualize patterns of feature visibility across images rather than as a formal metric of image quality. The corresponding text in the Results and Discussion (limitations) section has been revised to reflect this interpretation.
6) The custom “interobserver validity” transformation (distance from 6/12) is unconventional and may be difficult for readers to interpret. Please report established multi-rater agreement statistics (e.g., Fleiss’ κ and/or Gwet’s AC1/AC2), with confidence intervals, and consider rater-experience stratification using appropriate models rather than independent-samples t-tests. If the custom metric is retained, justify it and show how it behaves relative to standard measures.
Response: We thank the reviewer for this important comment. In the revised manuscript, the previously used custom interobserver validity metric has been removed from the main analysis and replaced by established multi-rater agreement statistics. Interobserver agreement is now reported using Fleiss’ κ with 95% confidence intervals for each feature and modality, and the corresponding Methods, Results, and tables have been revised accordingly. The examiner-experience analysis has been retained only as an exploratory descriptive analysis of consultant-level examiners. Because of the limited sample size, no formal statistical comparisons between modalities were performed, and the text now makes the exploratory nature of this analysis explicit.
7) K-means on binary variables is not standard without careful choice of distance/encoding; “centroid distance 1.06” and a schematic “heatmap” do not establish cluster validity. Either remove this analysis (or move to supplement) or re-implement using methods suitable for binary data (e.g., k-modes, hierarchical clustering with Jaccard distance), report internal validity (silhouette/stability), and present cluster-wise feature prevalence with uncertainty.
Response: We thank the reviewer for this important methodological remark. In response, we have removed the exploratory k-means clustering analysis from the main manuscript to avoid potential misinterpretation related to clustering methods applied to binary variables. The study’s primary analyses focus on feature recognition and interobserver agreement, which remain unchanged and form the central basis of the manuscript.
8) Key acquisition and rendering parameters are missing/underspecified: objective NA/magnification, pixel size, dwell time, scan duration per low-res and high-res mode, laser powers at sample, specific wavenumbers used for virtual H&E mapping, and any post-processing/normalization. Without these, readers cannot judge whether poorer detection of mitoses/necrosis/endothelial proliferation is intrinsic or a tunable acquisition/processing issue.
Response: We thank the reviewer for highlighting the need for more detailed acquisition and rendering parameters. In the revised manuscript, Section 2.4 (Raman imaging) has been expanded to include these technical details. Specifically, we now report the microscope objective (CFI Plan Apochromat Lambda D 20×, Nikon Corporation, Tokyo, Japan) and condenser configuration, pixel size (0.25 µm for virtual HE imaging and 1 µm for fingerprint spectral acquisition), pixel dwell times (40 µs for CH-stretch imaging and 10 µs for fingerprint acquisition), and typical scan durations for overview scans, high-resolution virtual HE imaging, and spectral fingerprint acquisition. In addition, laser powers at the sample (175 mW Stokes and 40 mW pump) and the specific Raman shifts used for virtual HE rendering (2850 cm⁻¹ and 2940 cm⁻¹) are now described. The post-processing procedure used to generate virtual HE images, including channel subtraction and pseudo-color mapping, has also been added with appropriate references. These additions clarify the imaging workflow and allow readers to better assess how acquisition parameters may influence feature visualization.
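To show how the reported pixel size and dwell time relate to field of view and acquisition time, here is a back-of-the-envelope sketch. The frame dimensions in the usage note are hypothetical (the response does not give them), the function names are illustrative, and the result is only a lower bound: it ignores scanner fly-back, stage movement, tiling, and any processing overhead.

```python
def field_of_view_um(n_pixels, pixel_size_um):
    """Side length of the scanned field, in micrometres."""
    return n_pixels * pixel_size_um


def min_frame_time_s(width_px, height_px, dwell_us, n_passes=1):
    """Lower bound on frame time in seconds from the pixel dwell time alone.

    n_passes is an assumption: set it to 2 if the two Raman shifts are
    acquired in sequential passes rather than simultaneously.
    """
    return width_px * height_px * dwell_us * n_passes / 1e6
```

For example, a hypothetical 1000 × 1000 pixel frame at the reported 0.25 µm pixel size and 40 µs dwell time covers 250 µm × 250 µm and needs at least 40 s per pass, which is the scale of figure a reader needs in order to judge the "rapid" workflow claim.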
9) Figures/tables: multiple issues impair clarity and credibility
Table 1 is presented as a highlighted questionnaire screenshot with blank fields; it is not manuscript-ready and does not define thresholds for each criterion. Replace with a clean table of criteria + operational definitions.
Response: We thank the reviewer for this helpful suggestion. Table 1 has been revised accordingly. The previous questionnaire screenshot has been replaced with a clean table listing the evaluated histopathological features together with their operational definitions to improve clarity and manuscript readability.
Figure 3 (representative pairs) is particularly problematic: only four cases are shown, co-registration is unclear, and the caption refers to symbols that are faint/unclear and may not match the exported figure. There is also a stray “f” character on the page near the figure area. Provide higher-resolution panels, consistent annotation visibility, and ideally add high-magnification insets for mitoses/endothelial proliferation.
Response: We thank the reviewer for this helpful comment. Figure 3 has been revised accordingly. The stray character (“f”) near the figure has been removed. The figure resolution has been increased to 600 dpi, and annotations have been improved for better visibility. To enhance representativeness, an additional tumor case has been included. Furthermore, inset panels have been added to highlight selected microscopic features at higher magnification. The caption has also been revised to clarify that SRS and HE images were obtained from independent regions of interest within the same specimen and therefore illustrate representative morphology rather than exact field-to-field correspondence.
Figure 2 (workflow) communicates the narrative but lacks explicit panel/step labels and could better show ROI selection (overlay boxes) and timing per step to support “rapid” claims.
Response: We thank the reviewer for this helpful suggestion. In the revised manuscript, the workflow illustration (Figure 2) and its legend have been clarified. The individual steps of the imaging process are now explicitly labeled (1–3), describing the initial macro-overview image acquisition, the low-resolution SRS scan used to identify a region of interest (ROI), and the subsequent high-resolution SRS imaging for detailed cellular visualization. In addition, the process of ROI selection is now described more clearly in the Methods section (Section 2.4), and the typical acquisition times for overview scans and high-resolution imaging have been added to the text to better contextualize the rapid intraoperative workflow.
Figure 5 is not a “heatmap” in the conventional sense (binary blocks with no scale). If kept, show actual prevalence (%) per cluster with a legend/scale.
Response: In line with the reviewer’s comment, Figure 5 and the associated clustering analysis have been removed from the revised manuscript to avoid potential misinterpretation of clustering results based on binary variables.
10) The manuscript contains two “Conflicts of Interest” statements that contradict each other (one says no conflicts; another discloses shareholding/employment at Refined Laser Systems). This must be corrected and consolidated.
Response: We thank the reviewer for pointing this out. The conflict-of-interest statement has been revised and consolidated into a single section in the revised manuscript to avoid ambiguity. The statement now clearly reflects the affiliations of authors associated with Refined Laser Systems GmbH.
Other issues:
You state the study was conducted per STARD, but the current design is not a classical diagnostic accuracy study across a representative spectrum (all confirmed GBM; feature checklist endpoint). Either justify STARD applicability with a checklist or remove/temper this statement.
Response: We thank the reviewer for this helpful remark. The wording in the Methods section has been revised to clarify the scope of the study. Instead of stating that the study was conducted according to the STARD guidelines, the manuscript now specifies that the study design was informed by STARD recommendations where applicable.
Please clarify: randomization of image order; whether raters were blinded to modality distribution (knowing 50/50 can bias recognition); whether any training examples were provided; and whether there was a washout period to reduce recall.
Response: We thank the reviewer for this helpful suggestion. The Methods section has been revised to clarify the image evaluation procedure. Images were presented in randomized order, and examiners were blinded to the true imaging modality. No training examples were provided prior to evaluation to capture unbiased interpretation of the SRS images. Because each image was assessed only once by each examiner, no washout period was required.
“Available on reasonable request” is increasingly viewed as insufficient. Consider sharing de-identified image sets (or a subset) and analysis code.
Response: We thank the reviewer for this suggestion. To improve transparency and reproducibility, the anonymized dataset used for the statistical analyses has been made publicly available in a public repository (Zenodo). The corresponding data availability statement has been updated in the revised manuscript.
To all reviewers: We thank the reviewers for their constructive comments, which helped improve the clarity and methodological rigor of the manuscript.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors addressed the key methodological issues raised in the previous review. Specifically, the interobserver reliability analysis was revised using established multirater agreement statistics, and the terminology related to non-inferiority was corrected. Regarding diagnostic accuracy as an intraoperative tool (tumor vs. non-tumor), the authors clarify that the study was not designed as a diagnostic accuracy study and explain that, because the dataset includes only confirmed cases of glioblastoma, there is no tumor/non-tumor spectrum; this limitation has been added to the discussion.
Overall, the revisions have significantly improved the methodological clarity of the manuscript. For these reasons, I believe the manuscript can now be considered suitable for publication.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have correctly answered all my questions and remarks. I am recommending publication as is.