4.1. Research Background and Construction of Imaging-Based Evaluation Methodology
The combination of taxanes and anti-HER2 agents has served as the cornerstone of neoadjuvant chemotherapy for HER2-positive early breast cancer, consequently improving breast-conservation rates and survival outcomes [
18]. Anthracyclines, which have been widely utilized in cancer therapy owing to their potent antitumor effects, not only reduce tumor volume and enhance surgical resectability but also act synergistically with large-molecule targeted therapies, such as trastuzumab or pertuzumab, to further improve efficacy [
19]. The recent neoCARHP study demonstrated that in early-stage HER2-positive breast cancer patients receiving dual HER2-blockade, the THP regimen was not inferior to TCbHP in terms of pCR rate while also improving tolerability [
20]. This finding suggests that omitting carboplatin may be an effective de-escalation strategy in neoadjuvant therapy. Accordingly, our institution implemented the sequential THP-EC regimen (taxane, trastuzumab, and pertuzumab followed by epirubicin/cyclophosphamide) for HER2-positive patients undergoing neoadjuvant therapy. In the current study, the hormone receptor-negative (HR−) and hormone receptor-positive (HR+) subgroups were nearly equally represented among the 86 enrolled patients (51.1% vs. 48.9%, respectively).
During neoadjuvant therapy, MRI was used to evaluate the relative reduction in primary tumor size before and after treatment. The observed heterogeneity in tumor volumetric change rates (relative to baseline values) reflects interpatient variability in response to the same neoadjuvant regimen, and intrapatient differences emerged even across distinct chemotherapy phases. These differences suggest variations in drug sensitivity or potential resistance to specific agents. To compare the efficacy of THP and EC, MRI data were collected at three timepoints, namely before neoadjuvant therapy, after THP, and after EC. For irregularly shaped tumors, conventional measurements, such as the longest diameter or a simplified length × width × height product, may inaccurately quantify volumetric reduction during chemotherapy because they fail to capture three-dimensional tumor regression. To address this, raw MRI data were processed using 3D Slicer software (version 5.2.2) for manual segmentation of primary lesions at each timepoint. The segmented data were then imported into Python for computational analysis, which enabled precise tumor volume calculations. This approach provided an objective assessment of volumetric changes across chemotherapy phases. The volumetric change rate from baseline to post-THP (δV1) was defined as the tumor volume reduction during the THP phase divided by the baseline volume. Similarly, the volumetric change rate from post-THP to post-EC (δV2) was calculated as the volume reduction during the EC phase divided by the mid-therapy (post-THP) volume. In parallel, changes in the longest diameter were taken from MRI reports, and δL1 (post-THP) and δL2 (post-EC) were derived according to RECIST version 1.1. These linear metrics were compared against volumetric measurements to evaluate their concordance in therapeutic response assessment.
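The definitions of δV1 and δV2 above can be illustrated with a minimal Python sketch. It assumes that each segmentation is available as a binary voxel mask with known voxel spacing; the function names, the synthetic masks, and the spacing values are hypothetical and serve only to show the arithmetic, not the analysis code used in this study.

```python
import numpy as np


def tumor_volume_mm3(mask, spacing_mm):
    """Volume of a binary segmentation mask: voxel count times voxel volume."""
    voxel_volume = float(np.prod(spacing_mm))
    return float(np.count_nonzero(mask)) * voxel_volume


def relative_reduction(before, after):
    """Fractional reduction relative to the earlier timepoint."""
    if before <= 0:
        raise ValueError("reference volume must be positive")
    return (before - after) / before


# Hypothetical masks standing in for exported segmentations (assumption).
spacing = (1.0, 1.0, 2.0)  # voxel size in mm, illustrative only

baseline = np.zeros((40, 40, 20), dtype=bool)
baseline[10:30, 10:30, 5:15] = True   # 4000 voxels

post_thp = np.zeros_like(baseline)
post_thp[15:25, 15:25, 5:15] = True   # 1000 voxels

post_ec = np.zeros_like(baseline)
post_ec[18:22, 18:22, 8:13] = True    # 80 voxels

v0 = tumor_volume_mm3(baseline, spacing)
v1 = tumor_volume_mm3(post_thp, spacing)
v2 = tumor_volume_mm3(post_ec, spacing)

delta_v1 = relative_reduction(v0, v1)  # baseline -> post-THP
delta_v2 = relative_reduction(v1, v2)  # post-THP -> post-EC
print(f"dV1 = {delta_v1:.2f}, dV2 = {delta_v2:.2f}")
```

Note that δV2 is referenced to the post-THP volume rather than to baseline, so the two rates are not additive.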
4.2. Therapeutic Differences Between THP and EC Phases and Identification of Dual-Target Resistance Subgroups
As shown in
Table 2, a larger proportion of patients fell into the δL1 ≥ δL2 and δV1 ≥ δV2 categories, suggesting that most HER2-positive patients responded favorably to targeted treatment, which is consistent with clinical expectations. Despite the cardiotoxic side effects of anthracyclines, which are dose-dependent and may cause symptomatic heart failure years after treatment, thereby complicating risk assessment and prevention [
21], they remain irreplaceable in certain contexts. Recent studies highlight that the PH-FECH regimen (paclitaxel/trastuzumab followed by fluorouracil, epirubicin, and cyclophosphamide) yields higher pCR rates and longer progression-free survival than does TCH (docetaxel/carboplatin/trastuzumab), underscoring the persistent role of anthracyclines in breast cancer therapy [
22,
23]. Although approximately 70% of patients achieved good responses with targeted therapies, 30% responded better to EC than to THP (δV1 < δV2, Table 2). Patients with δL1 and δV1 values lower than δL2 and δV2 achieved a pCR rate of 64%, whereas those with equal or greater changes achieved a rate of 57.4% (p > 0.05). Although this difference was not statistically significant, the trend points to the continued importance of anthracyclines, even among patients who appear sensitive to HER2-targeted therapy, and indicates the need for caution when deciding to omit them. Additionally, EC therapy is particularly vital for patients resistant to targeted therapies [24]. To date, no study has demonstrated superior pCR or disease-free survival with anthracycline-free regimens over anthracycline-containing protocols in specific populations [25], emphasizing the need for early identification of targeted therapy-resistant patients to tailor personalized regimens.
In early breast cancer, pCR rates vary according to hormone receptor (HR) status, with HR−/HER2+ patients achieving higher pCR rates than do HR+/HER2+ patients [
26]. Consistent with the literature, our study observed a numerically higher pCR rate in the HR− subgroup than in the HR+ subgroup (68.2% vs. 50%, p = 0.086), mirroring findings of the TRYPHAENA trial (70% vs. 50% pCR in the HR− vs. HR+ subgroups, respectively) [27]. These findings may stem from the inherently aggressive biology and heightened chemosensitivity of HR− tumors as opposed to HR+ tumors, which often exhibit poorer chemotherapy responses [28]. Although the difference in our cohort did not reach statistical significance, the absolute difference of 18.2 percentage points suggests a potentially meaningful clinical trend that warrants further investigation in a larger cohort. HER2 3+ patients achieved significantly higher pCR rates than did HER2 2+ patients (68.2% vs. 30%, p = 0.004), likely because HER2 2+ tumors have lower HER2 amplification rates, which reduces target sensitivity. HER2 expression intensity has been validated as an independent predictor of pCR [
26]. However, the findings from the exploratory analyses, including the potential relationship between HER2 expression and pCR, should be regarded as hypothesis-generating. These results require further validation in larger, independent cohorts and may be subject to type I error due to the lack of multiple testing correction.
To further explore the efficacy of THP in HER2-positive patients, we calculated the mean values of δL1 and δV1 in the 86 included patients, which were 0.55 and 0.85, respectively. These means represent the central tendency of tumor shrinkage during the THP phase and were therefore used as reference thresholds for classifying patients with typical or above-average responses. Our cohort was divided into two groups at each threshold (at or above vs. below the mean) for analysis. Patients with δL1 ≥ 0.55 achieved a pCR rate of 66.7%, whereas those with values below this threshold reached a pCR rate of 50% (
p = 0.118;
Table 3), although the difference failed to reach significance. After stratifying our cohort based on a δV1 threshold of 0.85, we found that patients with values ≥ 0.85 demonstrated a significantly greater likelihood of achieving pCR than did those with values below this cutoff (68.4% vs. 41.4%,
p = 0.016;
Table 3). These results preliminarily indicate that THP efficacy can predict pCR and that δV1 outperforms δL1 in distinguishing responders. Notably, patients with δV1 < 0.85 accounted for approximately one-third of the cohort, whereas those with δV1 ≥ 0.85 accounted for two-thirds. Although the mean-based threshold did not divide our cohort equally, the majority of our patients achieved better outcomes with THP, and the lower one-third of our cohort (e.g., δV1 = 0.2–0.3) may have included patients resistant to targeted therapies, warranting clinical attention. A subgroup analysis found that among patients with δL1 < 0.55, those classified as δL1 < δL2 achieved a pCR rate of 64.7%, whereas those classified as δL1 ≥ δL2 achieved a pCR rate of only 38.1%. However, in terms of volumetric response, both categories yielded similar pCR rates (38.5% vs. 43.6%). Among patients with δL1 ≥ 0.55, those in the δL1 < δL2 subgroup also tended to have higher pCR rates, albeit not significantly. In contrast, a significant difference in pCR rates was observed within the δV1 ≥ 0.85 cohort: patients with stronger EC-phase shrinkage (δV1 < δV2) achieved a 100% pCR rate, whereas those with greater THP-phase reduction (δV1 ≥ δV2) achieved a 62.5% pCR rate (p = 0.026; Table 3), reaffirming the superior discriminative power of δV1. This finding indicates that even among THP-sensitive patients, EC improves pCR in a subset of patients, necessitating caution when considering the omission of anthracyclines. Identifying biomarkers for EC sensitivity requires further investigation.
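The threshold-based comparison of pCR rates described above can be sketched as a 2 × 2 contingency analysis. The counts below are back-calculated from the reported percentages (roughly two-thirds vs. one-third of 86 patients; 68.4% vs. 41.4% pCR) and are an assumption rather than values taken from Table 3. The use of an uncorrected Pearson chi-square test is likewise an assumption, as the test applied in this study is not stated in this section.

```python
import math


def pcr_rate(n_pcr, n_total):
    """Proportion of patients achieving pCR in a group."""
    return n_pcr / n_total


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Assumed counts, back-calculated from the reported percentages.
high_pcr, high_n = 39, 57   # dV1 >= 0.85
low_pcr, low_n = 12, 29     # dV1 <  0.85

chi2, p_value = chi_square_2x2(
    high_pcr, high_n - high_pcr,
    low_pcr, low_n - low_pcr,
)
print(f"pCR: {pcr_rate(high_pcr, high_n):.1%} vs. {pcr_rate(low_pcr, low_n):.1%}")
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

For the smaller subgroups in Table 3, an exact test may be more appropriate than the chi-square approximation.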
To visually compare δL1 and δV1 in predicting pCR, ROC curves were plotted (
Figure 3). The AUC for δV1 (0.745; 95% CI, 0.642–0.847) was higher than that for δL1 (0.634; 95% CI, 0.512–0.757), suggesting that volumetric changes may better predict pCR. Although this difference did not reach conventional statistical significance (
p = 0.123), the observed effect size suggests a potentially meaningful advantage, which should be interpreted with caution given the uncertainty. A scatterplot comparing δL1 and δV1 with a 45° reference line (
Figure 4) revealed that most patients fell above this line, indicating greater volumetric reduction than diameter reduction. This “volume-first shrinkage” pattern likely reflects rapid tumor necrosis or density reduction, directly correlating with higher pCR rates. Notably, 61% (23/38) of the patients classified as δL1 non-responders were reclassified as responders by δV1, with 69.6% (16/23) achieving pCR. This discrepancy suggests that δL1 alone was inadequate for assessing response in cases with substantial volumetric but minimal diameter changes, explaining the divergent AUC values. Although some patients in the lower right quadrant were still misclassified by δV1, its false-negative rate was lower than that of δL1 (23.5% vs. 37.3%). Our multivariable model combining δV1 and clinicopathological features demonstrated robust discriminative performance (AUC = 0.784, 95% CI: 0.684–0.884) and excellent calibration after bootstrap internal validation. These findings support δV1 as a strong early imaging biomarker for pCR prediction. Although δL1 also showed moderate performance, it did not remain significant in multivariate analysis, highlighting the added value of volumetric over diametric measurements. Importantly, the model maintained reliable calibration with minimal overfitting (mean absolute error = 0.039), indicating that its predictions were well-aligned with actual pCR rates. However, given the relatively small sample size and retrospective design, further external validation in larger cohorts is warranted. The post hoc power analysis indicated that our study had an exceptionally high statistical power of 99.83% (
α = 0.05). This suggests that the study was well-powered to detect significant effects, and the absence of significant differences between groups in the current analysis may reflect the true absence of a clinically meaningful difference rather than a lack of statistical power. Overall, our findings suggest that δV1 more comprehensively reflects tumor shrinkage during therapy than does δL1.
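For readers less familiar with the AUC values reported above, the following sketch shows how an AUC can be computed from a continuous marker and a binary outcome using the rank (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen pCR patient has a higher marker value than a randomly chosen non-pCR patient. The function name and the eight data points are hypothetical and are not drawn from our cohort.

```python
import numpy as np


def auc_rank(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both outcome classes must be present")
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))


# Hypothetical dV1 values and pCR outcomes (1 = pCR), for illustration only.
delta_v1 = [0.95, 0.90, 0.88, 0.60, 0.97, 0.40, 0.85, 0.30]
pcr = [1, 1, 0, 0, 1, 0, 1, 1]

auc = auc_rank(delta_v1, pcr)
print(f"AUC = {auc:.3f}")
```

The pairwise formulation is quadratic in cohort size, which is negligible for cohorts of this scale.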
4.3. Determination of δV1 Predictive Thresholds and Subgroup Stratification Performance
As shown above, patients with δV1 below the mean had significantly lower pCR rates than did those with δV1 at or above the mean, demonstrating that δV1 can quantitatively predict THP efficacy. In the δV1 ≥ mean subgroup, 84.2% of patients exhibited δV1 ≥ δV2, whereas only 15.8% showed δV1 < δV2 (Table 3), indicating a marked imbalance in subgroup distribution. This finding suggests that the majority of chemotherapy-sensitive patients derive their benefits primarily from THP. Notably, the δV1 < δV2 subgroup achieved a 100% pCR rate, highlighting that even among patients with favorable THP responses, a small subset can achieve superior pCR through EC, underscoring its indispensable role. As an alternative to the mean-based cutoff, a second threshold of 0.91 was determined by minimizing the absolute difference between sensitivity and specificity (min|se − sp|), a commonly used approach for selecting diagnostic thresholds. This method identifies the point at which sensitivity and specificity are balanced, so that responders and non-responders are discriminated with similar accuracy. We chose this threshold to balance the two types of misclassification when using δV1 to identify patients likely to achieve pCR. The cohort was nearly evenly divided into δV1 ≥ 0.91 (52.3%) and δV1 < 0.91 (47.7%) subgroups (Table 4). The δV1 ≥ 0.91 subgroup demonstrated a higher pCR rate than did the δV1 < 0.91 subgroup (71.1% vs. 46.3%,
p = 0.02). While this threshold showed potential discriminative value, these findings are exploratory and should be validated in independent cohorts before clinical application. Among patients with δV1 ≥ 0.91, a majority (80%) demonstrated attenuated response during subsequent EC treatment, which was associated with a reduced pCR rate of 63.9% (
p = 0.042;
Table 4). This finding suggests that prolonging dual-targeted therapy may benefit this subgroup. Both of these thresholds (0.85 and 0.91) were derived from the same cohort of patients, making them exploratory findings. As such, these thresholds require further external validation in independent cohorts before they can be applied in clinical practice.
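The min|se − sp| criterion can be sketched as a simple search over candidate cutoffs. The example below is a minimal illustration under stated assumptions: the function name is hypothetical, each observed δV1 value is used as a candidate cutoff, and the ten data points are synthetic values constructed for illustration rather than patient data.

```python
import numpy as np


def balanced_threshold(scores, labels):
    """Cutoff minimizing |sensitivity - specificity|.

    A patient is called a predicted responder when score >= cutoff.
    Returns (cutoff, gap, sensitivity, specificity).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = None
    for t in np.unique(scores):          # candidate cutoffs, ascending
        pred = scores >= t
        se = float(np.mean(pred[labels]))      # true-positive rate
        sp = float(np.mean(~pred[~labels]))    # true-negative rate
        gap = abs(se - sp)
        if best is None or gap < best[1]:
            best = (float(t), gap, se, sp)
    return best


# Synthetic dV1 values and pCR outcomes (1 = pCR), for illustration only.
delta_v1 = [0.97, 0.95, 0.93, 0.91, 0.80, 0.92, 0.85, 0.70, 0.50, 0.30]
pcr = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

cutoff, gap, se, sp = balanced_threshold(delta_v1, pcr)
print(f"cutoff = {cutoff:.2f}, se = {se:.2f}, sp = {sp:.2f}")
```

Because the cutoff is tuned on the same data it is then evaluated on, the resulting sensitivity and specificity are optimistic, which is one reason such thresholds require external validation.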
To investigate whether the δV1 threshold could accurately stratify HER2-positive breast cancer patients with differing pCR rates across hormone receptor status and HER2 expression intensity, a bar chart was generated for comparison (
Figure A1). Regarding hormone receptor status, the HR− group exhibited a higher pCR rate than did the HR+ group in the overall population, as previously described. Subgroup analyses using δV1 thresholds of 0.85 and 0.91 revealed that the lower pCR rate in HR+ patients stemmed from exceptionally poor outcomes in δV1-defined non-responders (HR+). However, among δV1-defined responders, HR− and HR+ patients had comparable pCR rates. When calculating δV1 means separately for HR+ and HR− cohorts, we found that HR+ patients with δV1 below their subgroup-specific mean (0.93) showed significantly lower pCR rates than did those with δV1 above the mean (19.0% vs. 81.0%,
p < 0.001;
Table A1). This finding indicates that HR+ patients with limited early tumor regression (δV1 < subgroup mean) require heightened clinical scrutiny, with the necessity of treatment escalation warranting further study. Despite the high level of significance, these findings are based on subgroup analysis and should be interpreted cautiously. The large effect size highlights the potential utility of δV1 for stratification, pending further validation. Similarly, a comparison of the HER2 2+ and HER2 3+ subgroups showed significant pCR differences in the overall population. After δV1 stratification, the pCR rate gap between HER2 2+ and HER2 3+ patients approached significance in the δV1-defined responder subgroups. HER2 3+ patients consistently achieved high pCR rates across all subgroups, whereas HER2 2+ (low HER2 expression) was associated with inferior therapeutic responses, supporting HER2 expression intensity as a critical predictor of treatment efficacy.
The volumetric reduction rate during early treatment (δV1) may serve as a non-invasive imaging biomarker to identify, early during neoadjuvant therapy, patients who are likely to achieve pCR. Clinically, this metric could be used after completion of the THP (taxane + trastuzumab + pertuzumab) phase to stratify patients into responders vs. non-responders. Those with insufficient volumetric response (e.g., δV1 < 0.85) may be considered for treatment escalation, closer monitoring, or enrollment into clinical trials, while strong responders could potentially benefit from therapy de-escalation or earlier surgery. Nevertheless, relying solely on δV1 for decision-making carries risks of misclassification. False positives may lead to overtreatment or premature surgery, whereas false negatives could delay necessary escalation of therapy. Therefore, δV1 should be integrated with other clinical and pathological features in a multimodal decision framework until prospective validation is achieved.
4.4. Study Limitations and Future Perspectives
This study has several limitations. First, it was conducted at a single center with a retrospective design, which may introduce selection bias and limit the generalizability of the findings. Second, the modest sample size (n = 86) may affect statistical power, particularly in subgroup analyses, and increases the risk of overfitting in multivariable models. Third, spectrum bias may have occurred due to the inclusion of a specific subtype of HER2-positive breast cancer patients receiving a uniform neoadjuvant regimen, which may not reflect the broader patient population. Fourth, the predictive model was developed and internally validated within the same cohort, lacking external validation in independent datasets. This limits the current applicability of the δV1 threshold in diverse clinical settings. Lastly, segmentation of MRI-derived tumor volumes involved semi-manual processes, which are inherently subject to intra- and inter-observer variability, despite the use of standardized protocols and experienced readers. Given the single-center, retrospective nature of this study and the relatively modest sample size, future prospective validation is required to confirm the predictive utility of δV1. An appropriately powered multi-center trial should be designed to include patients with HER2-positive breast cancer undergoing standardized neoadjuvant regimens. Based on an estimated effect size from our cohort (AUC ~0.75), a sample size of approximately 250–300 patients may be needed to ensure sufficient statistical power. Imaging protocols, segmentation workflow, and response criteria should be harmonized across sites to allow reproducible implementation in clinical workflows.