Implementation of HER2DX Scores into Treatment Decisions in Early-Stage HER2-Positive Breast Cancer
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript addresses a clinically relevant topic: the real-world implementation of the HER2DX genomic assay in early-stage HER2-positive breast cancer. However, despite the interest of the research question, the study design, sample size, and analytical limitations substantially weaken the validity of the conclusions.
- Study design does not support causal claims
The study is a single-centre, observational analysis in which HER2DX testing was performed at physician discretion. Treatment decisions were made within a multidisciplinary setting without a predefined protocol.
As a result:
- There is substantial selection bias
- There is no control group
- There is no prospective pre-/post-test decision framework
Therefore, the central claim that HER2DX “modified clinical management in 51.1% of patients” is not adequately supported, as the study cannot distinguish the independent contribution of HER2DX from physician judgment or other clinical factors.
- Definition of “treatment modification” is not reliable
The authors define treatment changes relative to a hypothetical institutional standard rather than a documented pre-HER2DX treatment plan.
This approach:
- Introduces classification bias
- Relies on assumptions rather than observed clinical decisions
- Does not reflect true clinical decision-making processes
Consequently, the reported impact of HER2DX on treatment decisions is likely overestimated and not reproducible.
- Sample size is critically insufficient
The cohort includes only 45 patients, with multiple subgroup analyses across:
- clinical stage
- three HER2DX scores
- HR status
- HER2 IHC
- Ki-67
- TILs
- treatment regimens
- pathological outcomes
Many comparisons are based on very small subgroups, leading to:
- unstable odds ratios
- extremely wide confidence intervals
- high risk of false-positive findings
For example, associations such as OR 52.5 for pCR score stratification are not credible in such a small dataset.
- Extensive multiple testing without correction
The manuscript presents numerous statistical comparisons but does not address:
- multiple testing correction
- pre-specification of hypotheses
- control of type I error
Given the small sample size, the likelihood of spurious associations is high, and the statistical significance of several findings is questionable.
- Circular interpretation of pCR outcomes
The pCR score is used to guide treatment selection, and pCR rates are then compared across score categories.
This creates a circular analytical framework, making it impossible to determine whether:
- the score is predictive
- or treatment selection is driving the observed differences
The reported high pCR rate (90%) in the high-score group cannot be interpreted as validation.
- Inadequate control of confounding
Treatment decisions are influenced by multiple factors (stage, nodal status, HR status, tumour size, physician preference), yet the analysis does not adequately isolate the effect of HER2DX.
The use of multivariable models is mentioned but not sufficiently described, and with such a small cohort, these analyses are likely statistically unreliable.
- Overinterpretation of weak or non-significant findings
Several conclusions are not supported by the data:
- The Risk score is described as influencing treatment intensity despite non-significant results (p = 0.16)
- The ERBB2 score is presented as clinically actionable without strong supporting evidence
- The manuscript frequently uses causal language (“guided”, “influenced”) that is not justified
- Proposed clinical algorithm is not justified
Figure 2 presents a treatment decision algorithm based on HER2DX scores.
Given:
- the small sample size
- the observational design
- the lack of validation
this algorithm is premature and potentially misleading, and should not be included in its current form.
- Unsupported conclusions regarding clinical safety
The statement that treatment personalization was achieved “without compromising clinical outcomes” is not supported.
The study:
- does not include survival endpoints
- has limited follow-up
- lacks a comparator group
Therefore, no conclusions regarding clinical safety or efficacy can be drawn.
Moreover, there are multiple typographical and formatting errors (e.g., “immune firm”, “IHQ”, malformed confidence intervals) and inconsistent reporting of TILs (median values unclear).
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
I have reviewed the manuscript by Nogueira et al., “Implementation of HER2DX scores into treatment decisions in early-stage HER2-positive breast cancer”. I consider this manuscript a practical, single-center real-world implementation paper (n=45) showing how HER2DX scores were used to choose NAT vs upfront surgery and to modulate NAT backbone intensity. The concept is clinically relevant, but the manuscript needs tighter reporting of the “counterfactual” treatment plan, stronger statistical framing (small n, wide CIs), and removal of MDPI template placeholders. Also, figure transparency should be addressed.
Major comments
-The authors define a composite endpoint (NAT vs surgery + regimen intensity changes), but the “initial plan based on routine practice” is not fully described. How was the baseline plan determined (before HER2DX), and who documented it? Was it prospectively recorded or reconstructed retrospectively? Without this, the 51.1% “impact” number can be biased.
-The abstract says stage I–II. Fig 2 title uses “Stage II–III”, which conflicts with the cohort definition. Please confirm the actual stages included and correct the figure label (II–III vs I–II).
-I consider that the statistical interpretation should be more cautious (small n, very wide CIs). Example: OR 52.5 for HR status association with pCR score has an extremely wide CI (8.5-323.8). Please avoid causal language and add one short sentence stating that estimates are imprecise due to small strata. Consider presenting key results as descriptive + exact counts, not only ORs.
-HER2DX testing was “at discretion” and decisions are MDT-driven; this can enrich for borderline cases. Can you add a short paragraph clarifying why HER2DX was ordered for these patients (uncertain NAT need? IHC2+/FISH? HR+?), and whether there were “non-tested” eligible HER2+ patients during the same period (to estimate selection bias)?
-pCR reporting is potentially optimistic due to incomplete surgery data in stage II. Here the authors report pCR rates among “measurable stage II cases who had undergone surgery at time of analysis” (15/25). Please clearly state the denominator for pCR per subgroup (who had surgery vs not yet), and avoid mixing partial follow-up with final outcomes.
-The Fig-based “clinical algorithm” is useful but reads as guidance beyond evidence.
The algorithm implies a stage-adapted decision tree using HER2DX. Add one line in the caption/text stating that this is a proposed workflow derived from a small single-center experience and is not a validated clinical guideline.
-The infographic style of Fig 1 (multi-panel icons/rounded blocks) and the flowchart-like Fig 2 have a “generative” look and could plausibly come from an AI tool (style resembles Gemini-type outputs). Can the authors explicitly state in methods/acknowledgments whether any figure elements were AI-generated or AI-edited, and if yes, specify the tool and what exactly was generated vs manually designed? If not AI, please state the design platform (e.g., BioRender, Canva) to avoid ambiguity.
Minor
Fix small typos and formatting (e.g., “IHQ (inmunohistoquímica)” vs “IHC”, decimal comma “61,5%”).
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Thank you for inviting me to review this manuscript draft. This manuscript presents a clinically highly relevant real-world study evaluating the implementation of the HER2DX assay in early-stage HER2-positive breast cancer.
Strengths include a modern topic (de-escalation, genomics), specific integration into multidisciplinary decision-making, and clear, up-to-date data (2023–2026). On the other hand, what makes this manuscript weaker is that it is a small, single-center observational study (n = 45). This leads to significant limitations in statistical power, interpretation of the results, and generalizability. Nevertheless, this manuscript has publication potential but requires substantial revision in interpretation and methodology.
Recommendations for Authors
MAJOR COMMENTS
- Small sample size and limited statistical power. As mentioned above, the manuscript reports on a very small sample of patients (n = 45 over 3 years; why?) and, moreover, divides this cohort into multiple subgroup analyses. As a consequence, the results are unstable, with wide confidence intervals (e.g., OR 52.5; 95% CI 8.5–323.8). My strong recommendation is that the authors state that this study is exploratory in nature and tone down their conclusions.
- Primary endpoint – methodologically problematic. The authors state that the results can lead to a “change in treatment decision.” This is speculative and a subjective endpoint. Treatment is always dependent on MDT decisions. Such a small cohort has specific problems; first of all, there is a high risk of decision bias. I recommend that the authors clearly explain how the “initial plan” was defined, who defined it, and acknowledge bias in their results.
- Absence of control group. The paper is missing a comparison with a non-HER2DX cohort. Therefore, it is not possible to conclude whether HER2DX truly improves decision-making. I suggest the authors emphasize this as a limitation of the study to remain scientifically and clinically accurate.
- Overinterpretation of pCR results. Throughout the manuscript, there are clear signs of overinterpretation of the results. The authors state pCR 90% in the high-score group; however, this is based on a very small number of patients and is impacted by selection bias (favorable biology). My recommendation is to rephrase this as, for example, “consistent with the hypothesis,” not as proof.
- Statistical analysis – limited. The authors used logistic regression and ORs without robust validation. The paper is missing a robust, precise multivariate model and adjustment for confounders (e.g., HR status, stage, nodal involvement). My recommendation is that the authors must reduce inferential claims or expand their analysis appropriately.
- Selection bias. The authors state that HER2DX was performed at physician discretion. This raises a risk of selecting “interesting cases.” Here, the authors need to clearly discuss this bias.
- Overstated clinical implications. In the paper, the authors propose a clinical algorithm (Figure 2). The problem is that this algorithm is not validated and is based on a small cohort. I strongly recommend removing it or significantly softening it and indicating its exploratory nature. It would be very misleading for clinicians and decision-making authorities.
- Informed consent – incomplete. I see placeholder text (copied from the journal examples) present in the manuscript. This is a formal issue: “Informed consent statement: Any research article describing a study involving humans should contain this statement. Please add ‘Informed consent was obtained…’ OR ‘waived due to…’ OR ‘Not applicable’…” The authors should state either “Informed consent was obtained from all subjects involved in the study.” or “Patient consent was waived due to retrospective design and anonymized data analysis.”, etc.
MINOR COMMENTS
Tables. Table 1 is overloaded, with very reduced readability. It needs improvement for better reader comfort. Moreover, it needs a nodal status correlation as a third column in the table.
Figures. Figure 1 is very good (visual clarity), but Figure 2 is problematic (I see clear overinterpretation).
References. Here, the authors did a good job. This section is very up-to-date (e.g., references such as Lancet Oncology 2025, JCO 2026—excellent/top-level) and is a strong point of this paper.
NATC: The authors state that up to 71.1% of patients received NAT despite early-stage cancer. This is mainly based on the result of pCR signature analysis for risk score. Selection criteria must be clearly stated in the methods section. 71.1% (32 of the 45 cases) is a suspiciously high number and may represent a very important selection bias.
DETAILED SECTION ASSESSMENT
Abstract: Strong and clear, however with slightly overstated conclusions. This needs rewording.
Introduction section: Excellent and up-to-date, clearly defining the clinical problem. No suggestions.
Methods. As mentioned previously, the weak side is selection bias and a subjective endpoint. The section needs substantial improvement. Moreover, the clinical decision of NAT vs. upfront surgery was based only on core or FNAC biopsies (furthermore, not clearly stated in the paper; the authors just wrote "pretreatment samples"). As is known from tumor heterogeneity and multiclonal evolution, not every part of a bulky tumor has the same clonal population of cells. Therefore, partial CB/FNAC sampling from some parts of the tumor preoperatively cannot reflect its heterogeneity or other biological profiles in neighboring tumor regions. This is a serious limitation of clinical management. Moreover, there is no correlation between pCR genomic signatures taken from pretreatment material and post-treatment material (excluding NAT, as it may change the tumor profile and lead to different results in final postsurgical sample analysis). None of this was considered in the conclusions and limitations of the study; however, the authors propose a management flowchart based on 45 cases and these serious methodological weaknesses.
Statistics: Needs improvement, as in its current form it is limited by a small sample cohort and wide CIs, without robust multivariable validation.
Results. Findings are presented as firm statements and are clearly overinterpreted. I recommend presenting the results from part 2.2 (HER2DX pCR score) in table form, and the same for 2.3 (HER2DX Risk Score). Moreover, the authors should consider putting these data into a single table. The text in paragraphs 2.2 and 2.3 is difficult to read, and one can miss the impact of the findings. As for paragraph 2.5 (Associations of HER2DX scores and HER2 gene copy number), I strongly recommend that the authors include this association in a correlation scatter graph with a trendline and 95% CI.
Discussion. The section is biologically strong and well written, but once again overly optimistic, with insufficiently critical insight into the robustness of the findings and their validity (N = 45!). This section needs critical improvement; overinterpretations should be addressed more as assumptions or trends.
Moreover, I lack comparison with similar studies on early and advanced breast cancer patient groups using this genomic signature in clinical decision-making.
Paragraph “Study limitations” is weak and needs to be improved. It should include all the above-mentioned methodological limitations, bias, and sampling analysis restrictions.
As for the Conclusion, the text is clearly overstated relative to the data and methodology. It must be rewritten in the form of assumptions.
Comments on the Quality of English Language
Language. Style and grammar are very good overall, but there are occasionally overly complex sentences and minor typographical errors. The manuscript needs improvement and rephrasing/rewording of those sentences.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
This study evaluated how often the use of the commercial HER2DX transcriptomic assay led to modification of the treatment strategy in early-stage breast cancer. The current presentation of the manuscript is somewhat misleading. The Abstract should clearly explain the meaning and the purpose (the corresponding treatment modifications) of the pCR score, the Risk score, and the ERBB2 mRNA score. Please note that there are many approaches to predicting pCR, and even an explanation of this term (pathologic complete response) is important to at least some readers. The Introduction should explicitly describe the available data on the clinical validation of this score. It is self-evident that the use of any assay believed to have decision-making utility will result in some modification of treatment plans, although, I agree, the real-world numbers are important. However, it is unclear whether the use of the HER2DX test indeed helped to improve treatment results: at the least, a comparison with historical controls or published studies would be of interest, particularly with regard to pCR. It would also be interesting to obtain a specific comment on patients with a low ERBB2 score: could it reflect a mistake in the IHC staining? How did these patients respond to HER2-targeted therapy compared with women with intermediate/high scores? The text needs to be carefully checked (for example, see an error in line 152).
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The revised manuscript is improved compared with the previous version, particularly regarding the more cautious interpretation of findings and the expanded limitations section. However, important concerns remain. The study remains a small, single-center observational cohort without a control group, predefined treatment algorithm, or prospective pre-/post-test framework. As a result, the independent contribution of HER2DX to treatment decisions cannot be reliably determined. The definition of “treatment modification” also remains problematic, as treatment changes are defined relative to a retrospectively reconstructed institutional standard rather than documented pre-test clinical plans. This introduces classification bias and limits reproducibility.
In addition, the cohort size is very limited relative to the number of subgroup analyses performed. Several associations rely on extremely small denominators and generate unstable effect estimates with very wide confidence intervals, limiting the robustness of the conclusions. The manuscript also includes numerous statistical comparisons without correction for multiple testing. Given the exploratory nature of the study and the limited sample size, the risk of false-positive findings remains substantial.
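The scale of the multiple-testing concern can be illustrated with a short sketch. It computes the family-wise error rate for a given number of independent tests and applies a Holm step-down adjustment. The p-values and the count of 20 tests are hypothetical placeholders chosen for illustration, not values from the manuscript, and Holm is only one of several possible correction methods.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for illustration only (NOT taken from the manuscript).
example_p = [0.003, 0.02, 0.04, 0.20, 0.45]
fwer_20 = familywise_error_rate(20)  # assumed count of 20 independent tests
adjusted_p = holm_adjust(example_p)

print(f"FWER for 20 tests at alpha = 0.05: {fwer_20:.2f}")
print("Holm-adjusted p-values:", [round(p, 3) for p in adjusted_p])
```

Under these assumptions, a nominal p of 0.02 no longer falls below 0.05 after adjustment, which is the kind of shift the comment above has in mind.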
Although the language has been softened, some conclusions still appear stronger than supported by the data, particularly regarding treatment personalization “without compromising outcomes.” The study is more appropriately interpreted as a hypothesis-generating real-world implementation analysis. Finally, the proposed clinical algorithm remains overly prescriptive relative to the level of evidence provided by this cohort and may be better removed or substantially simplified.
Overall, while the revisions improve the manuscript, the methodological limitations remain substantial and continue to limit the strength and generalizability of the conclusions.
Author Response
We thank the reviewer for their continued evaluation and for acknowledging that the revised manuscript represents an improvement, particularly regarding the more cautious interpretation of findings and the expanded limitations section. We have carefully considered each remaining concern and provide below a concise summary of how each one has been addressed.
- On the independent contribution of HER2DX, the absence of a control group, and the definition of treatment modification
We thank the reviewer for this final and important observation, which we consider fully valid. We are pleased to note that these specific concerns are already explicitly addressed in the revised manuscript's limitations section. Regarding the independent contribution of HER2DX and the definition of treatment modification, the limitations section reads:
“[...] Treatment modifications were defined relative to a retrospectively reconstructed stage- and guideline-based institutional standard rather than a prospectively documented individual pre-test plan. While this approach minimizes anchoring and desirability bias inherent to prospective physician self-report methods, residual confounding by unmeasured clinical factors cannot be excluded, and the true independent contribution of the assay to treatment decisions cannot be established from this study design alone.”
Regarding the absence of a control group, the limitations section states:
“[...] the absence of a contemporaneous non-HER2DX-tested control cohort precludes definitive causal conclusions regarding the impact of the assay on treatment decisions and clinical outcomes.”
We would also note that HER2DX, while validated and endorsed by major scientific societies (including ESMO and SEOM), is not yet publicly funded in healthcare systems, with access depending on individual hospital budgets. The limited sample size, therefore, reflects the current stage of implementation at a reference center rather than a methodological shortcoming, and real-world data of this kind constitute a recognized requirement for future funding and reimbursement negotiations.
- On subgroup analyses, small denominators, and multiple testing
We thank the reviewer for this important comment. Formal adjustment for multiple comparisons was not performed because the primary aim of the study was to evaluate the real-world clinical implementation and therapeutic impact of HER2DX in routine practice, rather than to perform de novo biomarker discovery. Importantly, the associations analyzed were based on predefined biologically plausible relationships that have already been consistently described in prior HER2DX validation studies, as detailed in the Discussion section of the manuscript. In our cohort, these analyses were intended to assess the biological coherence and real-world reproducibility of previously reported findings rather than to generate novel biomarker hypotheses.
In addition, several clinically relevant associations were further explored using multivariable logistic regression models adjusting for clinical stage and other potential confounders, providing additional analytical robustness beyond univariable comparisons. Given the relatively limited sample size of this real-world cohort, formal correction for multiple testing would likely have substantially reduced statistical power and increased the risk of overlooking clinically meaningful associations. Therefore, p-values should be interpreted descriptively and within the context of prior biological and clinical evidence.
- On conclusions regarding treatment personalization “without compromising outcomes.”
We thank the reviewer for this observation. This language has been carefully revised during the revision process. The results section of the revised manuscript reads:
“[...] After treatment modulation, pCR was achieved in 90.0% of the high-score group (9/10 evaluable patients), 85.7% (6/7) of the intermediate-score group, and 33.3% (4/12) of the low-score group (Fisher's exact p<0.05), suggesting that HER2DX-guided treatment modulation did not adversely affect pathological outcomes in this cohort.”
The Conclusions section reads accordingly:
“These findings suggest that treatment personalization based on genomic information may be feasible in routine practice without a detrimental effect on pathological response rates in this exploratory cohort and warrant prospective validation in larger cohorts.”
We believe this framing is appropriately hedged. It does not claim long-term benefit, but describes the absence of an observed detrimental effect on pCR rates — a factual observation from this dataset, explicitly qualified by the exploratory framing and the acknowledgment of all key limitations.
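To make the precision of these pCR proportions concrete, the following sketch computes 95% Wilson score intervals for the counts quoted from the Results section (9/10, 6/7, 4/12). The choice of the Wilson interval is an assumption made here for illustration; it is not stated to be the method used in the manuscript, and an exact (Clopper-Pearson) interval would be a reasonable alternative.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p_hat = successes / total
    denom = 1.0 + z * z / total
    centre = (p_hat + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / total + z * z / (4.0 * total * total)
    ) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# pCR counts per pCR-score group, as quoted in the response above.
pcr_counts = {"high": (9, 10), "intermediate": (6, 7), "low": (4, 12)}

intervals = {
    group: wilson_interval(k, n) for group, (k, n) in pcr_counts.items()
}

for group, (low, high) in intervals.items():
    k, n = pcr_counts[group]
    print(f"{group}: {k}/{n} = {k / n:.1%}, 95% Wilson CI {low:.1%} to {high:.1%}")
```

Reporting counts together with intervals of this kind keeps the denominators visible, in line with the reviewers' request for descriptive presentation.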
- On the clinical algorithm (Figure 2)
We respectfully advocate for retaining Figure 2. The figure represents a proposed conceptual framework for how the three HER2DX components may be integrated in multidisciplinary decision-making, developed on the basis of our clinical experience and grounded in the biological rationale of each score as established in the validation literature.
We would note that, given the current implementation landscape — where HER2DX is not yet publicly funded and its use remains limited to reference centers — clinicians considering its integration into practice have very limited real-world guidance available. In this context, a transparent and explicitly exploratory depiction of how the three scores may be conceptually integrated in multidisciplinary decision-making represents, in our view, one of the most practically valuable contributions of this manuscript.
The figure caption includes an explicit disclaimer, which has been further strengthened in this revision and now reads:
"[...] This algorithm represents a proposed workflow derived from a small single-center exploratory experience, and should not be used to guide individual treatment decisions outside of a prospective research context."
Real-world implementation papers of other genomic assays, including Oncotype DX, MammaPrint, and Prosigna, have routinely included proposed clinical frameworks prior to large-scale prospective validation. We believe this manuscript follows the same established precedent, and we nonetheless defer to the Editors' judgment on this matter.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have made significant changes to their manuscript and responded to the questions raised; however, they still need to correct Fig 1, as it contains errors typical of AI generation, such as incorrect words (e.g., “chango” instead of “change”; see attached PDF). This is mandatory.
Comments for author File:
Comments.pdf
Author Response
We thank the reviewer for their positive assessment of the revised manuscript and for the careful attention given to Figure 1. We sincerely appreciate the detailed review, including the identification of typographical errors in the figure. We are pleased to confirm that Figure 1 has been thoroughly revised and all typographical errors have been corrected. The corrected figure is included in the revised manuscript.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Dear editor,
After reviewing the R1 version of this manuscript, I have the following comments:
The authors should be commended for the substantial effort invested into the revision process and for addressing the reviewer comments in a highly professional and scientifically mature manner.
Compared with the original submission, the revised manuscript is now significantly strengthened in terms of methodological transparency, interpretation of findings, and acknowledgment of study limitations.
Several major concerns raised during the initial review have been appropriately addressed:
Exploratory nature of the study and interpretation of findings
The authors have substantially toned down the conclusions and now consistently describe the study as an exploratory real-world cohort. The revised conclusions are significantly more balanced and scientifically appropriate for a small observational study.
Clarification of the primary endpoint and decision-making process
The explanation regarding reconstruction of the “baseline treatment plan” according to stage- and guideline-adapted institutional standards considerably improves methodological transparency and reduces concerns regarding physician-reported decision bias. The explicit discussion of anchoring bias, desirability bias, and residual confounding is particularly appreciated.
Acknowledgment of absence of a control cohort
The manuscript now appropriately recognizes that the lack of a contemporaneous non-HER2DX cohort precludes definitive causal conclusions regarding the impact of HER2DX on treatment decisions and outcomes.
Reduction of overinterpretation regarding pCR findings
The revised wording is much more cautious and scientifically appropriate. The authors now correctly describe the pCR findings as “consistent with the hypothesis” rather than proof of predictive performance, and they acknowledge treatment-by-score confounding and selection effects.
Selection bias discussion
The manuscript now clearly states that HER2DX testing was preferentially used in clinically uncertain cases identified during multidisciplinary tumor board discussions. This is an important clarification and appropriately limits the generalizability of the findings.
Figure and table improvements
The restructuring of the results into a consolidated table substantially improves readability. The addition of the scatter plot illustrating HER2DX scores versus HER2 copy number is also valuable and strengthens the biological interpretation of the findings.
Formal and ethical corrections
The informed consent statement has been appropriately corrected and the previous placeholder text removed.
Overall, the revised manuscript is substantially improved and now represents a considerably more balanced and methodologically transparent real-world implementation study.
However, several issues still remain and should ideally be addressed before final acceptance:
Statistical robustness remains limited
Although the authors appropriately justify the absence of extensive multivariable modeling due to sample size constraints, the manuscript still presents multiple odds ratios with extremely wide confidence intervals (e.g., OR 52.5; 95% CI 8.5–323.8). These estimates remain unstable and should be interpreted primarily descriptively rather than inferentially. Additional softening of inferential language throughout the manuscript would further improve scientific clarity.
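How unstable this estimate is can be read directly from the reported numbers. The sketch below assumes the interval is a Wald interval, symmetric on the log-odds scale. That is an assumption about the authors' method, although the geometric mean of the two limits (about 52.5) is consistent with it. Under that assumption it recovers the standard error of the log odds ratio and the fold-width of the interval.

```python
import math


def log_scale_se(ci_low, ci_high, z=1.96):
    """Standard error of log(OR) implied by a Wald-type confidence interval."""
    return (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)


def ci_from_or(odds_ratio, se_log, z=1.96):
    """Rebuild a Wald-type interval from an odds ratio and its log-scale SE."""
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


# Values as reported in the manuscript and quoted by the reviewers.
reported_or, reported_low, reported_high = 52.5, 8.5, 323.8

se = log_scale_se(reported_low, reported_high)
fold_width = reported_high / reported_low
rebuilt_low, rebuilt_high = ci_from_or(reported_or, se)

print(f"Implied SE of log(OR): {se:.2f}")
print(f"Upper limit is {fold_width:.0f} times the lower limit")
print(f"Rebuilt interval: {rebuilt_low:.1f} to {rebuilt_high:.1f}")
```

An upper limit nearly 40 times the lower limit means the data are compatible with a very wide range of effect sizes, which supports reading the odds ratio descriptively.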
Figure 2 still appears somewhat guideline-like
Despite the addition of an appropriate disclaimer ("This algorithm represents a proposed workflow derived from a small single-center experience and is not a validated clinical guideline."), the visual presentation of Figure 2 still resembles a semi-validated clinical decision algorithm. Given the small single-center cohort and exploratory nature of the study, the figure may still risk overinterpretation by clinicians. The authors may consider either moving this figure to supplementary material or further emphasizing its illustrative and hypothesis-generating nature. Otherwise, the decision on whether to accept it rests with the Editors.
Discussion section remains relatively long and partially narrative
The Discussion is biologically sophisticated and well referenced; however, several sections remain somewhat review-like and overly extensive relative to the size of the cohort. A modest reduction focusing more directly on the presented data would improve clarity and manuscript balance.
Inconsistency regarding stage III disease
The study cohort includes stage I–II patients, whereas Figure 2 includes treatment pathways for stage II–III disease. Since stage III patients were not analyzed in this cohort, inclusion of stage III recommendations may be methodologically misleading and should be corrected or clarified.
Some conclusions remain slightly optimistic relative to the evidence level
Although substantially improved, certain statements still slightly overstate the implications of the findings considering the small sample size, absence of a control cohort, and selection bias. Further minor refinement of the wording would strengthen the manuscript. It would be better to express implications as associations.
In conclusion, the manuscript has improved considerably after revision and now addresses most of the previously identified major methodological concerns.
The study provides valuable real-world insight into HER2DX implementation in multidisciplinary decision-making for early-stage HER2-positive breast cancer. Remaining issues are mainly related to presentation, framing, and statistical caution rather than fundamental methodological flaws.
Recommendation: Accept after the minor revisions mentioned above
Comments on the Quality of English Language
No specific comments.
Author Response
The authors should be commended for the substantial effort invested into the revision process and for addressing the reviewer comments in a highly professional and scientifically mature manner. Compared with the original submission, the revised manuscript is now significantly strengthened in terms of methodological transparency, interpretation of findings, and acknowledgment of study limitations. Several major concerns raised during the initial review have been appropriately addressed (…).
However, several issues still remain and should ideally be addressed before final acceptance:
We are deeply grateful to the reviewer for this thorough and generous reassessment, and for the detailed recognition of the specific changes introduced throughout the revision. The reviewer’s constructive engagement across both rounds has been instrumental in substantially improving the quality and transparency of this manuscript. We address the remaining points raised below.
Statistical robustness remains limited
Although the authors appropriately justify the absence of extensive multivariable modeling due to sample size constraints, the manuscript still presents multiple odds ratios with extremely wide confidence intervals (e.g., OR 52.5; 95% CI 8.5–323.8). These estimates remain unstable and should be interpreted primarily descriptively rather than inferentially. Additional softening of inferential language throughout the manuscript would further improve scientific clarity.
We thank the reviewer for this important observation, which we consider fully valid and central to the scientific integrity of the manuscript. In direct response to this comment, we have added an explicit qualifier in the Results section, which now reads: "[…] (OR 52.5, 95% CI 8.5-323.8, p<0.01); given the small subgroup sizes, this estimate should be interpreted descriptively […]".
More broadly, we would note that substantial language revision in this direction had already been carried out in the revised manuscript: the Discussion was systematically revised to replace assertive causal language with associative framing throughout (e.g., "appeared to inform", "showed a trend toward", "was associated with modifications in therapeutic decision-making"), and the limitations section already states: "[…] due to the limited size of some subgroups, these estimates should be interpreted with care, as confidence intervals remain wide and the risk of unstable effect size estimates is non-trivial. […]"
We hope the reviewer finds that the cumulative changes introduced across both rounds of revision now fully reflect the exploratory nature of these findings and meet the standards of scientific clarity they have rightly requested.
Figure 2 still appears somewhat guideline-like
Despite the addition of an appropriate disclaimer ("This algorithm represents a proposed workflow derived from a small single-center experience and is not a validated clinical guideline."), the visual presentation of Figure 2 still resembles a semi-validated clinical decision algorithm. Given the small single-center cohort and exploratory nature of the study, the figure may still risk overinterpretation by clinicians. The authors may consider either moving this figure to supplementary material or further emphasizing its illustrative and hypothesis-generating nature. Otherwise, the decision on whether to accept it rests with the Editors.
We thank the reviewer for this balanced and pragmatic suggestion. We have chosen to retain the figure in the main manuscript, as we believe its illustrative value for readers seeking practical guidance on HER2DX integration is substantial. We would note that HER2DX is not yet publicly funded in any healthcare system and its use remains limited to reference centers, meaning that clinicians considering its integration into multidisciplinary practice have very limited real-world guidance available. In this context, a transparent and explicitly exploratory depiction of how the three scores may be conceptually integrated in decision-making represents, in our view, one of the most practically valuable contributions of this manuscript. The figure caption has been further strengthened and now reads:
"[…] This algorithm represents a proposed workflow derived from a small single-center exploratory experience, is not a validated clinical guideline, and should not be used to guide individual treatment decisions outside of a prospective research context."
We would additionally note that real-world implementation papers of other genomic assays, including Oncotype DX, MammaPrint, and Prosigna, have routinely included proposed clinical frameworks prior to large-scale prospective validation, and we believe this manuscript follows the same established precedent. We are nonetheless grateful to the reviewer for acknowledging that the ultimate decision on this matter rests with the Editors, and we fully defer to their judgment.
Discussion section remains relatively long and partially narrative
The Discussion is biologically sophisticated and well referenced; however, several sections remain somewhat review-like and overly extensive relative to the size of the cohort. A modest reduction focusing more directly on the presented data would improve clarity and manuscript balance.
We thank the reviewer for this constructive observation. Following their suggestion, we have condensed the two Discussion paragraphs that were most review-like in nature, refocusing them more directly on the findings of the present cohort.
In the paragraph on TIL biology, we removed the discussion of B-cell and T-cell receptor diversity and the comparison with CALGB 40601 and PAMELA trial data on the concordance between histopathologic TIL assessment and transcriptomic immune signatures. The paragraph now reads: "[…] The correlation between TIL levels and the pCR score reflects the immune-related gene signatures incorporated into the HER2DX assay, including the 14-gene IgG signature (15,16). While TILs provide a morphologic estimate of immune infiltration and have been associated with pCR (18), the IgG signature captures functional immune activation and has demonstrated even stronger predictive and prognostic value. In contrast, neither Risk nor ERBB2 scores showed significant correlations with TIL levels in our cohort, consistent with prior reports and reinforcing the biological non-redundancy of HER2DX components (19). […]"
In the paragraph on the ERBB2 score, we removed the discussion of T-DM1 benefit in residual disease, which was not directly supported by data from our cohort. The paragraph now reads: "[…] Importantly, HER2 amplification, HER2-enriched intrinsic subtype, and ERBB2 mRNA levels represent related but non-equivalent biological constructs. In our cohort, the close relationship with HER2 immunohistochemical intensity and lack of association with FISH copy number in equivocal cases support the concept that transcriptional output does not necessarily parallel genomic amplification, consistent with prior evidence demonstrating that ERBB2 mRNA levels provide clinically meaningful information beyond conventional HER2 classification (25,26). The preferential use of anthracycline-sparing regimens in tumors with higher ERBB2 scores supports the hypothesis that strong HER2 transcriptional activation may identify tumors sufficiently dependent on HER2 signaling to allow de-escalated treatment strategies. […]"
We believe the revised Discussion is now more proportionate to the scope and sample size of the study.
Inconsistency regarding stage III disease
The study cohort includes stage I–II patients, whereas Figure 2 includes treatment pathways for stage II–III disease. Since stage III patients were not analyzed in this cohort, inclusion of stage III recommendations may be methodologically misleading and should be corrected or clarified.
We thank the reviewer for this observation. Although no stage III patients were included in the present cohort, the inclusion of stage II–III in Figure 2 was intentional, as HER2DX has been clinically validated across stage I–III HER2-positive breast cancer, with stage III patients comprising nearly 20% of the individual patient-level meta-analysis by Villacampa et al. (10), and expert recommendations explicitly support its use across this disease spectrum.
We have nonetheless amended the figure caption to clarify this distinction, which now reads:
"[…] The proposed framework extends beyond the present cohort and reflects the broader clinical application of HER2DX across stage I–III disease (10) […]"
[10] Villacampa G, Pascual T, Tarantino P, Cortés J, Perez-García J, Llombart-Cussac A, et al. HER2DX and survival outcomes in early-stage HER2-positive breast cancer: an individual patient-level meta-analysis. Lancet Oncol. 2025;26(8):1100-12. doi:10.1016/S1470-2045(25)00276-1
Some conclusions remain slightly optimistic relative to the evidence level
Although substantially improved, certain statements still slightly overstate the implications of the findings considering the small sample size, absence of a control cohort, and selection bias. Further minor refinement of the wording would strengthen the manuscript. It would be better to express implications as associations.
We thank the reviewer for this final observation. We agree that the Conclusions should express findings as associations rather than implications regarding feasibility.
The Conclusions section has been revised accordingly and now reads:
"[…] These findings suggest that genomic information from HER2DX was associated with treatment modifications in routine multidisciplinary practice, without an observed detrimental effect on pathological response rates in this exploratory cohort, warranting prospective validation in larger series […]"
Reviewer 4 Report
Comments and Suggestions for Authors-
Author Response
We thank the reviewer for their consideration. No comments were raised from Reviewer 4 during this round of revision.
