Predicting Neoadjuvant Chemotherapy Response in Triple-Negative Breast Cancer Using Pre-Treatment Histopathologic Images

Hikmat Khan; Ziyu Su; Huina Zhang; Yihong Wang; Bohan Ning; Shi Wei; Hua Guo; Zaibo Li; Muhammad Khalid Khan Niazi

doi:10.3390/cancers17152423


¹ Department of Pathology, College of Medicine, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA

² Department of Pathology, University of Rochester Medical Center, Rochester, NY 14642, USA

³ Department of Pathology and Laboratory Medicine, Warren Alpert Medical School, Brown University, Lifespan Academic Medical Center, Providence, RI 02903, USA

⁴ Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35294, USA

Cancers 2025, 17(15), 2423; https://doi.org/10.3390/cancers17152423

This article belongs to the Section Cancer Informatics and Big Data


Simple Summary

Triple-negative breast cancer (TNBC) is a fast-growing and hard-to-treat form of breast cancer that does not respond to hormone therapies. Although chemotherapy before surgery, called neoadjuvant chemotherapy (NACT), is the standard treatment, not all patients benefit from it. In this study, we developed an artificial intelligence (AI) model that analyzes routine biopsy slides to predict which patients are likely to respond well to NACT. The model showed strong performance in both internal and external patient groups and focused on tumor regions rich in immune biomarkers, such as CD8+ T cells, CD163+ macrophages, and PD-L1. This approach could help personalize treatment, reduce unnecessary side effects, and guide more effective care for patients with TNBC.

Abstract

Triple-negative breast cancer (TNBC) remains a major clinical challenge due to its aggressive behavior and lack of targeted therapies. Accurate early prediction of response to neoadjuvant chemotherapy (NACT) is essential for guiding personalized treatment strategies and improving patient outcomes. In this study, we present an attention-based multiple instance learning (MIL) framework designed to predict pathologic complete response (pCR) directly from pre-treatment hematoxylin and eosin (H&E)-stained biopsy slides. The model was trained on a retrospective in-house cohort of 174 TNBC patients and externally validated on an independent cohort (n = 30). It achieved a mean area under the curve (AUC) of 0.85 during five-fold cross-validation and 0.78 on external testing, demonstrating robust predictive performance and generalizability. To enhance model interpretability, attention maps were spatially co-registered with multiplex immunohistochemistry (mIHC) data stained for PD-L1, CD8+ T cells, and CD163+ macrophages. The attention regions exhibited moderate spatial overlap with immune-enriched areas, with mean Intersection over Union (IoU) scores of 0.47 for PD-L1, 0.45 for CD8+ T cells, and 0.46 for CD163+ macrophages. The presence of these biomarkers in high-attention regions supports their biological relevance to NACT response in TNBC. This not only improves model interpretability but may also inform future efforts to identify clinically actionable histological biomarkers directly from H&E-stained biopsy slides, further supporting the utility of this approach for accurate NACT response prediction and advancing precision oncology in TNBC.

Keywords:

triple-negative breast cancer (TNBC); neoadjuvant chemotherapy (NACT); pathologic complete response (pCR); artificial intelligence (AI); treatment response prediction

1. Introduction

TNBC accounts for approximately 15–20% of all invasive breast cancers worldwide, corresponding to an estimated 200,000–300,000 new cases annually [1]. It is characterized by the absence of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) expression or gene amplification [2,3,4,5,6,7]. This lack of targetable receptors limits systemic treatment options, making neoadjuvant chemotherapy (NACT) the standard initial approach for early-stage disease [8,9]. The goal of NACT is to achieve pathologic complete response (pCR; ypT0N0, indicating no residual invasive carcinoma in the breast or lymph nodes) [10,11] and to downstage tumors, thereby improving surgical outcomes and enabling breast-conserving surgery in patients who might otherwise require mastectomy [12].

Approximately 40 to 50% of patients achieve a pCR [4,13,14,15,16,17], a critical surrogate endpoint that is strongly associated with improved survival [18]. In contrast, patients with residual disease (i.e., non-pCR, who did not achieve pCR) face higher rates of relapse and worse overall survival [8,19]. This underscores the urgent need for early prediction of NACT response to guide clinical decisions, including tailoring treatments, optimizing surgical planning, and avoiding unnecessary toxicity from ineffective regimens [20,21,22]. Furthermore, early identification of patients who are likely to be non-pCR could support timely consideration of alternative therapies or clinical trial enrollment [18,23]. However, due to TNBC’s aggressive biology and the lack of reliable predictive biomarkers in clinical practice, outcome prediction remains a major unmet need [17,20,24,25,26,27]. In this study, we present an attention-based multiple instance learning (MIL) framework to predict NACT response in TNBC patients from pre-treatment H&E-stained biopsy images. The main contributions of our work are as follows:

We employed an attention-based MIL framework that utilizes pre-treatment H&E-stained images to predict response (i.e., either pCR or non-pCR) to NACT in TNBC patients. Our framework demonstrates strong average predictive performance on an in-house cohort of 174 TNBC patients—an accuracy of 82%, AUC of 0.86, F1-score of 0.84, sensitivity of 0.85, specificity of 0.81, and precision of 0.80 based on five-fold cross-validation—outperforming a traditional model that relies only on clinical data.
We evaluated our attention-based MIL framework on an independent cohort of 30 TNBC patients (12 pCR and 18 non-pCR), achieving an accuracy of 76%, AUC of 0.78, F1-score of 0.67, sensitivity of 0.72, specificity of 0.73, and precision of 0.81, demonstrating its generalizability and potential for clinical utility.
To quantitatively evaluate the biological plausibility of the model’s attention, we computed the Intersection over Union (IoU) between our model’s attention regions in H&E-stained biopsy slides and corresponding regions in co-registered multiplex immunohistochemistry (mIHC) data stained for PD-L1, CD8⁺ T cells, and CD163⁺ macrophages. Notably, we found that the model attention regions showed moderate overlap with these biomarkers, with mean IoU scores of 0.47 for PD-L1, 0.45 for CD8⁺ T cells, and 0.46 for CD163⁺ macrophages. The presence of these biomarkers in high-attention regions highlights their biological relevance to NACT response in TNBC and may improve model interpretability while informing future efforts to identify clinically actionable histological biomarkers directly from H&E-stained images.

In summary, these contributions underscore the potential of the proposed attention-based MIL framework applied to pre-treatment H&E-stained biopsy slides for predicting NACT response in TNBC. The proposed framework demonstrates strong predictive performance, biological interpretability via immune biomarker alignment, and generalizability across independent cohorts, highlighting its translational relevance for clinical decision support in precision oncology.

2. Related Work

Many studies have investigated whether standard clinicopathological features—such as tumor size, histologic grade, molecular subtype, and lymph node involvement—can predict response to NACT in TNBC. However, findings remain inconsistent [28]. While some reports suggest that smaller, lower-grade, and node-negative tumors are more likely to achieve a pCR, others find no significant association, indicating that these conventional features alone lack sufficient predictive power [29,30,31,32,33,34,35,36].

In the absence of reliable clinical biomarkers, machine learning and artificial intelligence (AI) approaches have emerged as promising tools for predicting NACT outcomes. Recent studies have focused heavily on radiomics and deep learning methods applied to medical imaging, particularly magnetic resonance imaging (MRI) and ultrasound. For example, Zhou et al. [37] employed a deep learning model on multiparametric MRI (DCE-MRI and DWI) and achieved an AUC of 0.86, suggesting early-treatment-phase imaging may capture predictive signatures. In contrast, Golden et al. [38] reported a more modest AUC of 0.68, potentially due to smaller sample size (n = 60) or suboptimal feature selection. Jiang et al. [39] used ultrasound-based radiomics in a cohort of 592 TNBC patients, achieving an AUC of 0.93 and an accuracy of 0.84.

Several studies combined imaging with clinical variables to improve accuracy. For instance, Xu et al. [40] integrated MRI with clinicopathological data (AUC = 0.76), while Jimenez et al. [41] incorporated tumor-infiltrating lymphocytes (AUC = 0.71). These approaches suggest that multimodal data integration may better capture tumor heterogeneity, although challenges in interpretability persist.

Despite their potential, imaging-based models depend on modality access, protocol consistency, and cost-intensive workflows, limiting their scalability in routine clinical settings. In contrast, H&E-stained biopsy images are universally available and standardized. Yet, histopathology-based deep learning models remain relatively underexplored. Recently, Savitri et al. [42] pioneered deep learning on H&E-stained slides (AUC = 0.75), offering a cost-effective alternative to imaging. Huang et al. [43] proposed IMPRESS, an AI pipeline integrating H&E with mIHC markers (PD-L1, CD8+, CD163+), reporting an AUC of 0.8975 for HER2+ and 0.7674 for TNBC, and demonstrating that AI-based methods can outperform manual pathologist assessments in predicting NACT response. Hussain et al. [44] explored deep learning advancements in biomarker discovery and multi-omics integration to enhance TNBC management, highlighted challenges such as model interpretability and limited data availability, and emphasized the importance of multidisciplinary collaboration and continued research.

In summary, while prior studies have leveraged imaging and clinical data to forecast NACT response in TNBC, histopathology-driven AI models offer a cost-effective, scalable, and biologically interpretable alternative. Our study builds on these early efforts by applying a multiple instance learning framework to pre-treatment H&E-stained biopsy slides, enhanced through alignment with immune markers derived from mIHC.

3. Materials

This section describes the two cohorts (i.e., datasets) used in this study, including the in-house cohort and the independent validation cohort. Both cohorts are available from the corresponding author upon reasonable request for research purposes. The study was approved by the Institutional Review Board (IRB protocol #2016C0025).

3.1. In-House Cohort

In this retrospective study, we included 174 female patients diagnosed with TNBC and treated with NACT at The Ohio State University Wexner Medical Center (OSUWMC) between 2013 and 2020. All patients had documented treatment outcomes, pre-treatment H&E-stained biopsy slides with tumor regions greater than 0.1 cm, and corresponding clinical data. Among these patients, 81 achieved a pCR, while 93 were categorized as non-pCR. Additionally, a subset of 64 patients had pre-NACT mIHC slides stained for CD8+ T cells, CD163+ macrophages, and PD-L1. This subset enabled downstream biomarker analysis and interpretability validation. To ensure robust model evaluation, we performed five-fold cross-validation at the patient level using stratified sampling, which preserves the cohort’s pCR to non-pCR ratio across the training, validation, and test sets. The training set was used for model development, the validation set guided hyperparameter tuning and early stopping, and the test set was used for fold-level performance evaluation.
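A patient-level stratified split of this kind can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and using the fold after the test fold as the validation set (`split_for_fold`) is an assumed scheme, since the text does not state how the validation set was drawn. Because each patient ID lands in exactly one fold, no patient can appear in more than one of the training, validation, and test sets.

```python
import random
from collections import defaultdict


def stratified_patient_folds(labels, n_folds=5, seed=0):
    """Assign each patient ID to one of `n_folds` folds, stratified by outcome.

    labels: dict mapping patient ID -> 1 (pCR) or 0 (non-pCR).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for pid, y in sorted(labels.items()):
        by_class[y].append(pid)

    folds = [[] for _ in range(n_folds)]
    for y in sorted(by_class):
        pids = by_class[y]
        rng.shuffle(pids)
        # Deal this class round-robin so every fold approximately keeps
        # the cohort's pCR:non-pCR ratio
        for i, pid in enumerate(pids):
            folds[i % n_folds].append(pid)
    return folds


def split_for_fold(folds, test_idx):
    """Hypothetical scheme: fold `test_idx` is the test set, the next fold is validation."""
    val_idx = (test_idx + 1) % len(folds)
    train = [pid for i, fold in enumerate(folds)
             if i not in (test_idx, val_idx) for pid in fold]
    return train, folds[val_idx], folds[test_idx]
```

With the in-house counts (81 pCR, 93 non-pCR), each fold receives 16 or 17 pCR patients and 18 or 19 non-pCR patients.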

3.2. Independent Validation Cohort

To assess model generalizability, we included an independent cohort of 30 TNBC patients (12 achieved pCR; 18 were non-pCR) from The University of Texas MD Anderson Cancer Center collected between 2017 and 2022. Each patient had pre-treatment H&E-stained biopsy slides, and this cohort was used exclusively for external testing. Table 1 summarizes the distribution of pCR and non-pCR cases across both cohorts (see Supplementary Table S1 for additional details on individual cohort data).

Table 1. Distribution of patient cohorts used in this study, showing the number of cases with pathologic complete response (pCR) and pathologic incomplete response (non-pCR) for both the in-house Ohio State University Wexner Medical Center cohort and the independent MD Anderson Cancer Center cohort.

4. Method

We employed an attention-based MIL framework to predict response to NACT, distinguishing between pCR and non-pCR outcomes using pre-treatment H&E-stained biopsy slides [45]. Figure 1 illustrates an overview of the framework, which consists of four main stages: (1) tissue patch extraction, (2) patch-level feature encoding, (3) attention-based aggregation, and (4) slide-level classification. We begin with a brief overview of the attention-based weakly supervised learning strategy, followed by a detailed description of each stage in the subsequent subsections.

Figure 1. Overview of the pipeline for predicting pathologic complete response (pCR) to neoadjuvant chemotherapy (NACT) in triple-negative breast cancer (TNBC) using pre-treatment H&E-stained biopsy slides. First, the H&E-stained slide is segmented and divided into a grid to extract tissue patches. Each patch is then encoded into a feature vector using a pretrained deep learning encoder (i.e., UNI v2 [46], a general-purpose, self-supervised pathology foundation model trained on 1.2 million histopathology slides). These patch-level features are aggregated via an attention mechanism [45] that assigns greater weight to the most informative regions, resulting in a slide-level feature representation. A fully connected neural network classifier then uses the slide-level feature representation to predict, for each patient, the likelihood of pCR versus non-pCR (residual disease) after NACT.

4.1. Overview

Each H&E-stained biopsy slide is divided into non-overlapping patches (also referred to as instances or tiles). Patches from the same biopsy slide are grouped into a single “bag” with only a slide-level label (pCR or non-pCR) assigned. The model is trained to identify and attend to the most informative patches within the bag of each slide that contribute to the overall NACT response prediction.

4.2. Patch Extraction and Feature Encoding

In the MIL framework, each H&E-stained biopsy slide is partitioned into non-overlapping 512 × 512-pixel patches at 40× magnification (0.25 µm/pixel resolution). Patches from the same biopsy slide are grouped into a single “bag”. Each patch in the given bag is then passed through a pretrained UNI v2 encoder (a general-purpose, self-supervised pathology foundation model trained on 1.2 million histopathology slides) [46] to extract a discriminative feature vector $h_{i} \in \mathbb{R}^{d}$ for each patch $i$, where $d = 1536$ denotes the feature dimensionality, corresponding to the output of the penultimate layer of the UNI v2 encoder [46,47]. The resulting bag of patch-level feature vectors serves as the input to the next stage, where attention-based aggregation enables the model to focus on the most informative patches for the overall prediction of pCR to NACT.
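The patch geometry implies a simple conversion: at 0.25 µm/pixel, each 512-pixel patch spans 512 × 0.25 = 128 µm per side. The sketch below enumerates the top-left corners of non-overlapping patches on a slide of given pixel dimensions. It is a hypothetical illustration (the function names are ours), it assumes edge remnants smaller than a full patch are discarded, and it omits the tissue segmentation step shown in Figure 1.

```python
PATCH_PX = 512   # patch side length in pixels (as stated in the text)
MPP = 0.25       # microns per pixel at 40x (as stated in the text)


def grid_coordinates(width, height, patch=PATCH_PX):
    """Top-left (x, y) corners of all full, non-overlapping patches on a slide.

    Partial patches at the right and bottom edges are dropped (an assumption);
    tissue detection and background filtering are not modeled here.
    """
    return [(x, y)
            for y in range(0, height - patch + 1, patch)
            for x in range(0, width - patch + 1, patch)]


def patch_side_microns(patch=PATCH_PX, mpp=MPP):
    """Physical side length of one patch in microns."""
    return patch * mpp


print(patch_side_microns())          # 128.0
print(grid_coordinates(1100, 600))   # [(0, 0), (512, 0)]
```

In a real pipeline, each retained coordinate would be read from the slide, encoded by the feature extractor, and the resulting vectors stacked into the patient's bag.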

4.3. Attention-Based Aggregation

In this stage, an attention mechanism is employed to learn an attention weight $\alpha_{i} \in [0, 1]$ for each patch feature vector, satisfying $\sum_{i} \alpha_{i} = 1$ [45], representing the contribution (or importance) of each patch to the final slide-level prediction. Then, a slide-level feature vector $z$ is computed using attention-weighted aggregation, formally defined below:

$$z = \sum_{k=1}^{N} \alpha_{k} h_{k}$$

where $N$ is the number of patches in the given slide, $h_{k}$ is the feature vector of the $k$-th patch, and $\alpha_{k}$ is the corresponding attention weight of the $k$-th patch. The attention weights are computed as follows:

$$\alpha_{k} = \frac{\exp\left(w^{T} \tanh\left(V h_{k}^{T}\right)\right)}{\sum_{j=1}^{N} \exp\left(w^{T} \tanh\left(V h_{j}^{T}\right)\right)}$$

where $w$ and $V$ are learnable parameters of the attention-based MIL framework, and $\alpha_{k}$ represents the normalized attention weight of the $k$-th patch in the final prediction. The attention mechanism offers two key benefits: (1) it enhances predictive performance by adaptively focusing on the most relevant morphological features, and (2) it provides interpretability through spatial attention maps that highlight histological regions strongly associated with NACT response prediction.
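The two equations above translate directly into code. The following dependency-free sketch illustrates the tanh-based attention pooling as written; in practice $V$ and $w$ would be trainable PyTorch tensors, and the toy feature vectors and parameter values here are arbitrary placeholders, not learned weights.

```python
import math


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def attention_mil_pool(H, V, w):
    """Attention-based MIL pooling following the equations above.

    H: N patch feature vectors h_k, each of length d
    V: L x d projection matrix (list of L rows); w: attention vector of length L
    Returns (alphas, z): softmax-normalized weights and the slide-level vector.
    """
    # Unnormalized score per patch: w^T tanh(V h_k)
    scores = [sum(w_l * math.tanh(u_l) for w_l, u_l in zip(w, matvec(V, h)))
              for h in H]
    # Softmax over the patches of this bag; subtracting the max avoids overflow
    s_max = max(scores)
    exps = [math.exp(s - s_max) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Slide-level representation z = sum_k alpha_k h_k
    z = [sum(a * h[j] for a, h in zip(alphas, H)) for j in range(len(H[0]))]
    return alphas, z


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three toy 2-D patch features
V = [[1.0, 0.0], [0.0, 1.0]]               # toy parameters (learned in practice)
w = [1.0, 1.0]
alphas, z = attention_mil_pool(H, V, w)
print([round(a, 3) for a in alphas], [round(v, 3) for v in z])
```

A useful sanity check: if every score is equal (e.g., `w = [0.0, 0.0]`), all weights become $1/N$ and the pooling reduces to mean pooling over the bag.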

4.4. Slide-Level Classification

In this stage, the aggregated slide-level feature vector serves as input to a fully connected network with a final sigmoid activation that estimates the probability of pCR (versus non-pCR) following NACT in TNBC [48].
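A minimal sketch of this classification step is shown below. For simplicity it assumes a single fully connected layer (the text does not specify the depth of the network) and a hypothetical decision threshold of 0.5 for converting the predicted probability into a pCR or non-pCR label; the weights shown are placeholders.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_pcr(z, weights, bias, threshold=0.5):
    """Single linear layer + sigmoid on the slide-level vector z.

    Returns (probability of pCR, predicted label: 1 = pCR, 0 = non-pCR).
    The 0.5 threshold is an assumption; the text does not state one.
    """
    logit = sum(w_j * z_j for w_j, z_j in zip(weights, z)) + bias
    p = sigmoid(logit)
    return p, int(p >= threshold)


print(predict_pcr([1.0, 2.0], weights=[0.5, -0.25], bias=0.0))   # (0.5, 1)
```

The two-branch sigmoid avoids overflow in `math.exp` for large-magnitude logits, which matters little for a toy example but is standard practice.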

4.5. Class-Weighted Loss Function

To address class imbalance between the pCR and non-pCR groups, we employed a class-weighted binary cross-entropy loss function, defined as

$$L = -\frac{1}{B} \sum_{k=1}^{B} \left[ w_{\text{pCR}} \, y_{k} \log p_{k} + w_{\text{non-pCR}} \, (1 - y_{k}) \log (1 - p_{k}) \right]$$

where $B$ is the batch size, $y_{k} \in \{0, 1\}$ is the ground-truth label (1 for pCR, 0 for non-pCR), and $p_{k}$ is the predicted pCR probability for the $k$-th bag. The class weights $w_{\text{pCR}}$ and $w_{\text{non-pCR}}$ are assigned to the pCR and non-pCR classes and are computed as follows:

$$w_{\text{pCR}} = \frac{N}{2 N_{\text{pCR}}}, \qquad w_{\text{non-pCR}} = \frac{N}{2 N_{\text{non-pCR}}}$$

where $N_{\text{pCR}}$ and $N_{\text{non-pCR}}$ are the numbers of pCR and non-pCR patients, respectively, and here $N = N_{\text{pCR}} + N_{\text{non-pCR}}$ denotes the total number of patients. The class weighting scheme compensates for the inherent imbalance in NACT response by assigning a higher penalty to misclassification of the minority class, thereby encouraging the model to be more sensitive to underrepresented cases during training.
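As a worked example, applying the weight formulas to the full in-house cohort (81 pCR, 93 non-pCR, $N = 174$) gives $w_{\text{pCR}} = 174 / (2 \times 81) \approx 1.074$ and $w_{\text{non-pCR}} = 174 / (2 \times 93) \approx 0.935$, so each class contributes the same total weight, $N/2 = 87$. The per-fold training splits have slightly different counts, so the weights used in each fold would differ somewhat. The sketch below computes the weights and the loss in plain Python; the function names and the clipping constant `eps` are our assumptions.

```python
import math


def class_weights(n_pcr, n_non_pcr):
    """w = N / (2 * N_class), with N = n_pcr + n_non_pcr."""
    n = n_pcr + n_non_pcr
    return n / (2 * n_pcr), n / (2 * n_non_pcr)


def weighted_bce(y, p, w_pcr, w_non_pcr, eps=1e-7):
    """Class-weighted binary cross-entropy averaged over a batch of B bags.

    y: labels (1 = pCR, 0 = non-pCR); p: predicted pCR probabilities.
    """
    total = 0.0
    for y_k, p_k in zip(y, p):
        p_k = min(max(p_k, eps), 1.0 - eps)   # clip so log() stays finite
        total += (w_pcr * y_k * math.log(p_k)
                  + w_non_pcr * (1 - y_k) * math.log(1.0 - p_k))
    return -total / len(y)


# Full in-house cohort: 81 pCR, 93 non-pCR (per-fold training counts differ)
w_pcr, w_non_pcr = class_weights(81, 93)
print(round(w_pcr, 4), round(w_non_pcr, 4))   # 1.0741 0.9355
# Batch size 1, as in the training setup: one bag per loss evaluation
print(weighted_bce([1], [0.8], w_pcr, w_non_pcr))
```

In PyTorch, the same per-sample weighting could be obtained by passing a weight tensor to `torch.nn.functional.binary_cross_entropy` through its `weight` argument.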

5. Experimental Setup

5.1. Data Augmentation

To mitigate overfitting and enhance generalization, patch-level data augmentation was applied during training. Augmentations included random rotations (within ±30°), horizontal and vertical flips, each applied with a probability of 0.5, and color jittering (±0.2 adjustment in brightness, contrast, saturation, and hue) applied with a probability of 0.25 [49]. These augmentations are intended to improve robustness to histological and staining variations and to reduce the risk of overfitting, a particular concern in medical imaging tasks with limited sample sizes [50].
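The augmentation policy above can be summarized as a random parameter sampler. The sketch below is hypothetical: it only draws the random parameters for one patch (it does not transform pixels), it assumes rotation is applied to every patch because the text gives no probability for it, and it assumes torchvision-style jitter semantics (multiplicative factors in [0.8, 1.2] and an additive hue shift in [-0.2, 0.2]).

```python
import random


def sample_augmentation(rng):
    """Draw one set of patch-augmentation parameters (parameters only, no pixels)."""
    params = {
        "rotation_deg": rng.uniform(-30.0, 30.0),   # random rotation within +/-30 degrees
        "hflip": rng.random() < 0.5,                # horizontal flip, p = 0.5
        "vflip": rng.random() < 0.5,                # vertical flip, p = 0.5
        "color_jitter": None,
    }
    if rng.random() < 0.25:                         # color jitter, p = 0.25
        params["color_jitter"] = {
            # Multiplicative factors in [0.8, 1.2] and an additive hue shift
            # in [-0.2, 0.2], mirroring torchvision's ColorJitter(0.2, 0.2, 0.2, 0.2)
            "brightness": rng.uniform(0.8, 1.2),
            "contrast": rng.uniform(0.8, 1.2),
            "saturation": rng.uniform(0.8, 1.2),
            "hue": rng.uniform(-0.2, 0.2),
        }
    return params


rng = random.Random(42)
draws = [sample_augmentation(rng) for _ in range(10_000)]
jitter_rate = sum(d["color_jitter"] is not None for d in draws) / len(draws)
print(f"color jitter applied in {jitter_rate:.1%} of draws")
```

In a PyTorch pipeline, the same policy could be expressed with torchvision transforms such as `RandomRotation(30)`, `RandomHorizontalFlip(p=0.5)`, `RandomVerticalFlip(p=0.5)`, and `RandomApply([ColorJitter(0.2, 0.2, 0.2, 0.2)], p=0.25)` inside a `Compose`.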

5.2. Training and Implementation Details

All experiments were implemented in PyTorch v2.7 and executed on an NVIDIA A100 GPU. We used the publicly available Trident library to patchify H&E-stained biopsy slides into non-overlapping 512 × 512-pixel tissue patches at 40× magnification (0.25 µm/pixel resolution). The pretrained UNI v2 model was used as a feature extractor, producing 1536-dimensional feature embeddings for each patch, which served as inputs to the model. Model training was performed using stochastic gradient descent [51,52] with a learning rate of 0.0001, a weight decay of 0.001, and early stopping (patience of 50 epochs) based on minimum validation loss. Each experiment used a batch size of 1 (i.e., one slide-level bag per step) and trained for up to 1024 epochs. A comprehensive list of fixed hyperparameters is provided in Supplementary Tables S2 and S3, and the hyperparameter search space is detailed in Supplementary Table S4. Optimal values were determined using a grid search strategy.
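The early-stopping rule described above (patience of 50 epochs on validation loss, at most 1024 epochs) can be sketched as a small helper class. This is an illustration under stated assumptions, not the authors' training loop: the class and function names are hypothetical, `min_delta` is our addition (0 means any decrease counts as an improvement), and the loop replays a precomputed toy loss curve instead of training a model.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=50, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch   # a real loop would checkpoint the model here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def train_with_early_stopping(val_losses, max_epochs=1024, patience=50):
    """Replay a precomputed validation-loss curve; return (stop_epoch, best_epoch)."""
    stopper = EarlyStopping(patience=patience)
    last_epoch = -1
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        last_epoch = epoch
        if stopper.step(epoch, loss):
            break
    return last_epoch, stopper.best_epoch


# Toy validation-loss curve: improves for 10 epochs, then plateaus
losses = [1.0 - 0.05 * i for i in range(10)] + [0.6] * 100
print(train_with_early_stopping(losses))   # (59, 9)
```

After stopping, the checkpoint saved at `best_epoch` (the minimum validation loss) would typically be restored for test-fold evaluation.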

5.3. Baseline Models for Comparison

To benchmark performance, we compared our model against a set of traditional machine learning classifiers, including logistic regression [53], random forest [54], support vector machines (SVMs) [55], k-nearest neighbors (KNN) [56], and linear discriminant analysis (LDA) [57], trained on clinical data to predict NACT response. The selected hyperparameter settings for each method are listed in Supplementary Table S5, while the corresponding search spaces are provided in Supplementary Table S6. Optimal values were selected via grid search.
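Grid search simply evaluates every combination in a discrete search space and keeps the best-scoring one. The sketch below shows the idea in plain Python; the search space and scoring function are hypothetical placeholders (the actual spaces are given in Supplementary Table S6), and in practice `evaluate` would train a model and return its validation score.

```python
from itertools import product


def grid_search(search_space, evaluate):
    """Exhaustively evaluate every hyperparameter combination.

    search_space: dict of hyperparameter name -> list of candidate values
    evaluate: callable(params dict) -> validation score (higher is better)
    """
    names = sorted(search_space)
    best_params, best_score = None, float("-inf")
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space and scoring function, for illustration only
space = {"C": [0.1, 1.0, 10.0], "penalty": ["l1", "l2"]}


def toy_score(params):
    return -abs(params["C"] - 1.0) + (0.1 if params["penalty"] == "l2" else 0.0)


print(grid_search(space, toy_score))
```

For the scikit-learn classifiers listed above, `sklearn.model_selection.GridSearchCV` performs the same exhaustive search combined with cross-validated scoring.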

5.4. Evaluation

We assessed the model’s predictive performance using three primary metrics: accuracy (ACC), area under the receiver operating characteristic curve (AUC-ROC), and F1-score. Accuracy quantifies the proportion of correct predictions, encompassing both true positives (TPs) and true negatives (TNs), out of all cases, where false negatives (FNs) and false positives (FPs) are the incorrect predictions; it is formally defined as

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FN + FP}$$

The F1-score provides a balanced measure of model performance by combining precision and recall, making it especially useful in the context of class imbalance; it is formally defined as

$$\text{F1-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

Recall is equivalent to sensitivity.
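The metrics above, together with specificity ($TN / (TN + FP)$) and AUC-ROC, can be computed with a few lines of plain Python. The sketch below assumes pCR (label 1) is the positive class, uses a hypothetical 0.5 threshold to turn scores into labels, and computes AUC through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are toy values.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with pCR (label 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def _ratio(num, den):
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)                 # recall = sensitivity
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fn + fp),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


def roc_auc(y_true, y_score):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.3, 0.2, 0.1, 0.7]
y_pred = [int(s >= 0.5) for s in y_score]   # hypothetical 0.5 threshold
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, y_score))
```

Here two of three pCR cases and two of three non-pCR cases are classified correctly, so accuracy, recall, specificity, and F1 all equal 2/3, while eight of the nine positive-negative score pairs are correctly ordered, giving an AUC of 8/9. The pairwise AUC loop is quadratic in cohort size, which is negligible at this scale.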

6. Results and Discussion

6.1. Performance on the In-House Cohort

We initially evaluated the proposed attention-based MIL framework using five-fold cross-validation on the in-house OSUWMC cohort. As detailed in Table 2, our model achieved a mean accuracy of 0.82, AUC-ROC of 0.86, F1-score of 0.84, sensitivity of 0.85, specificity of 0.81, and precision of 0.80. The confusion matrices and ROC curves for each test fold are presented in Figure 2 and Figure 3, respectively. An analysis of the confusion matrices indicates a balanced distribution of false positives (predicting pCR when the patient did not achieve it) and false negatives (predicting non-pCR when the patient did achieve pCR). This balanced error profile, combined with mean sensitivity and specificity both above 0.80, underscores the model’s robust discriminative performance and its potential clinical relevance for predicting pCR to NACT.

Table 2. Performance metrics for predicting pathologic complete response (pCR) versus non-pCR following neoadjuvant chemotherapy (NACT) in triple-negative breast cancer (TNBC) in the in-house cohort, evaluated using five-fold cross-validation. Metrics are reported for each fold (Folds 1–5), along with the mean ± standard deviation across all test folds, demonstrating consistent performance across accuracy, AUC, F1-score, sensitivity, specificity, and precision.

Figure 2. Confusion matrices illustrating the model’s performance on each test fold of the in-house cohort using five-fold cross-validation.

Figure 3. Receiver operating characteristic (ROC) curves for each test fold of the in-house cohort using five-fold cross-validation. Area under the curve (AUC) values range from 0.81 to 0.91.

6.2. Generalization to External Validation Cohort

To assess generalizability, the trained models were evaluated on an independent cohort of 30 TNBC cases from the MD Anderson Cancer Center. As shown in Table 3, the model achieved a mean accuracy of 0.76, AUC-ROC of 0.82, F1-score of 0.77, sensitivity of 0.72, and specificity of 0.73 on the independent validation cohort. Although a relative performance drop of approximately 6% was observed, the model retained balanced sensitivity and specificity, indicating good generalization to out-of-distribution data and supporting its potential applicability in real-world clinical settings.

Table 3. Performance metrics for predicting pathologic complete response (pCR) versus non-pCR following neoadjuvant chemotherapy (NACT) in triple-negative breast cancer (TNBC) are reported for both in-house and independent cohorts. The average metrics are presented as mean ± standard deviation across three independent runs, demonstrating consistent performance in terms of accuracy, AUC, F1-score, sensitivity, specificity, and precision.

6.3. Comparison with Classical ML Models

To contextualize the performance of our framework, we evaluated it against five classical machine learning classifiers trained solely on clinical features: logistic regression [53], random forest [54], SVM [55], KNN [56], and LDA [57]. The clinical features included age, tumor type, HER2 IHC score, HER2 copy number, and HER2 ratio. As shown in Table 4, the best-performing baseline method achieved a maximum AUC of 0.79, which is notably lower than that of our proposed model. This performance gap highlights the advantage of leveraging spatial histopathological features via attention-based MIL rather than depending solely on structured clinical variables for predicting NACT treatment response in TNBC.

Table 4. Comparison of the proposed model with classical ML baselines trained on clinical data.

6.4. Attention Map Analysis and Corresponding Biological Insights

Figure 4 and Figure 5 present attention maps generated by our model for two representative test cases: a patient who achieved pCR and a patient who did not (non-pCR), respectively. Visual inspection of the highlighted regions in the H&E-stained slides, co-registered with their corresponding mIHC slides, showed that the model predominantly focused on regions enriched with immune biomarkers: PD-L1 (shown in brown), CD8⁺ T-cell infiltration (shown in green), and CD163⁺ macrophages (shown in red). To quantitatively assess the biological plausibility of these attention maps, we computed the Intersection over Union (IoU) between the model-generated attention maps and the spatial distribution of immune biomarkers in the aligned mIHC images. Figure 6 shows the H&E image, the registered mIHC counterpart, the model’s attention map, and the masks for CD8⁺ T cells, CD163⁺ macrophages, and PD-L1. The immune cell segmentation masks were generated by fine-tuning the CellViT++ model [58] and subsequently used for IoU calculation. Formally, the IoU for PD-L1, for example, is defined as

$$\mathrm{IoU}_{\text{PD-L1}} = \frac{\left| M_{\text{PD-L1}} \cap A_{\text{bin}} \right|}{\left| M_{\text{PD-L1}} \right|},$$

where $M_{\text{PD-L1}}$ is the PD-L1 mask, $A_{\text{bin}}$ is the binarized attention map, $\cap$ denotes the intersection, and $\lvert \cdot \rvert$ denotes area (number of pixels). The numerator represents the overlapping region between the PD-L1 mask and the binarized attention map, and the denominator corresponds to the total area of the PD-L1 mask. We computed the IoU similarly for the other two biomarkers. As summarized in Table 5, the attention maps exhibited moderate spatial overlap with biomarker-enriched regions, with mean IoU scores of 0.47 ± 0.18 for PD-L1, 0.45 ± 0.20 for CD8⁺ T cells, and 0.46 ± 0.17 for CD163⁺ macrophages.
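Strictly speaking, the ratio defined above divides the intersection by the biomarker mask area alone, so it measures the fraction of each biomarker mask covered by the attention map; the standard IoU (Jaccard index) instead divides by the area of the union of the two regions and is therefore never larger. The sketch below computes both on toy binary masks. The function names are hypothetical, and the top-fraction binarization rule is an assumption, since the text does not state the threshold used to binarize the attention map.

```python
def binarize(attention, threshold):
    """Binarize a 2-D attention map (list of rows); the threshold rule is an assumption."""
    return [[1 if a >= threshold else 0 for a in row] for row in attention]


def top_fraction_threshold(attention, fraction):
    """Threshold keeping roughly the top `fraction` of pixels (e.g. 0.10 for the top 10%)."""
    values = sorted((a for row in attention for a in row), reverse=True)
    k = min(len(values), max(1, round(fraction * len(values))))
    return values[k - 1]


def attention_overlap(mask, att_bin):
    """Return (coverage, jaccard) for two same-shaped binary masks.

    coverage = |mask AND attention| / |mask|           (the ratio defined above)
    jaccard  = |mask AND attention| / |mask OR attention|  (standard IoU)
    """
    inter = union = area = 0
    for mask_row, att_row in zip(mask, att_bin):
        for m, a in zip(mask_row, att_row):
            inter += m & a
            union += m | a
            area += m
    coverage = inter / area if area else 0.0
    jaccard = inter / union if union else 0.0
    return coverage, jaccard


attention = [[0.9, 0.1, 0.8], [0.2, 0.3, 0.4]]   # toy attention map
pdl1_mask = [[1, 1, 0], [0, 0, 0]]               # toy biomarker mask
att_bin = binarize(attention, top_fraction_threshold(attention, 1 / 3))
print(attention_overlap(pdl1_mask, att_bin))
```

In this toy case, the attention map covers one of the two mask pixels (coverage 0.5), while the union spans three pixels (Jaccard 1/3).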

Figure 4. Attention map visualization of an attention-based multiple instance learning (MIL) model for a correctly classified triple-negative breast cancer (TNBC) patient (true positive) who achieved pCR to neoadjuvant chemotherapy (NACT). (a) H&E-stained biopsy slide thumbnail with (b) corresponding attention heatmap of the MIL model. (c) Weighted attention heatmap representation showing individual patches weighted by the model’s attention scores. (d) Median attention heatmap. (e–g) Progressive filtering of the median attention: (e) top 10%, (f) top 5%, and (g) top 1% attention hotspots. (h–j) Zoomed-in H&E slides of the identified hotspot region at increasing magnifications: 5×, 10×, and 20×, respectively. (k–m) Multiplex immunohistochemistry (mIHC) slides of consecutive tissue sections from the same hotspot region at the same magnifications (5×, 10×, 20×), revealing the presence of PD-L1 (brown), CD8+ T cells (green), and CD163+ macrophages (red) in the model-identified regions. These immune markers have been associated with pCR in TNBC [27], suggesting that the model attends to immunologically relevant regions rich in biomarkers.

Figure 5. Attention map visualization of an attention-based multiple instance learning (MIL) model for a correctly classified triple-negative breast cancer (TNBC) patient (true negative) who did not achieve pathological complete response (non-pCR) to neoadjuvant chemotherapy (NACT). (a) H&E-stained biopsy slide thumbnail with (b) corresponding attention heatmap generated by the MIL model. (c) Weighted attention representation showing individual patches weighted by the model’s attention scores. (d) Median attention. (e–g) Progressive filtering of the median attention: (e) top 10%, (f) top 5%, and (g) top 1% attention hotspots. (h–j) Zoomed-in H&E slides of the identified hotspot region at increasing magnifications: 5×, 10×, and 20×, respectively. (k–m) Multiplex immunohistochemistry (mIHC) slides of consecutive tissue sections from the same hotspot region at the same magnifications (5×, 10×, 20×), revealing the presence of PD-L1 (brown), CD8+ T cells (green), and CD163+ macrophages (red) in the model-identified regions. These immune markers have been associated with pCR in TNBC [27], suggesting that the model attends to immunologically relevant regions rich in biomarkers.

Figure 6. Columns (a) through (d) display (a) the original H&E-stained biopsy slide, (b) the corresponding co-registered multiplex immunohistochemistry (mIHC) slide, (c) the median attention map generated by the attention model, and (d) the binarized version of the attention map. Column (e) shows the CD8+ T-cell mask, and column (f) illustrates the intersection between the binarized attention map (d) and the CD8+ T-cell mask (e), indicating the presence of CD8+ T cells within the model’s attention regions. Similarly, column (g) presents the CD163+ cell mask, and column (h) shows the intersection between (d) and (g), reflecting the attention overlap with CD163+ regions. Column (i) displays the PD-L1 mask, and column (j) presents the intersection between (d) and (i), quantifying the presence of PD-L1 in the attended regions.

Table 5. Quantification of PD-L1, CD8⁺ T, and CD163⁺ biomarkers using model attention maps. This table presents the mean Intersection over Union (IoU) values between each biomarker’s cell mask and the model’s binarized attention map. A higher IoU indicates greater presence of the biomarker within the model’s attended region.

These IoU results (see Table 5) support the biological plausibility of the model’s attention mechanism and are consistent with prior studies that highlight the role of immune infiltration and tumor-associated macrophages (TAMs), especially CD163+ M2-polarized subsets, in therapy resistance and poor pCR outcomes [28,59,60,61,62]. Notably, even in misclassified samples (Figure 7, Figure 8, and Figure 9), the model’s attention consistently localized to tumor-dense regions, suggesting that its predictions of NACT response are based on morphologically and biologically meaningful patterns.

Figure 7. Attention map visualization for an incorrectly classified triple-negative breast cancer (TNBC) patient who achieved pathological complete response (pCR) to neoadjuvant chemotherapy (NACT) but was predicted by the model as non-pCR. (a) H&E-stained biopsy slide thumbnail. (b,c) First row: corresponding attention heatmap generated by the deep learning model; second row: weighted attention representation showing individual patches weighted by the model’s attention scores. (c) Median attention. (d–f) Top 10%, 5%, and 1% attention masks, each shown above the individual patches weighted by the model’s attention scores. (g,h) Zoomed-in H&E slides of the identified hotspot region 1 (highlighted by the red rectangle) at increasing magnifications of 20× and 40×, respectively. (i,j) Zoomed-in H&E slides of the identified hotspot region 2 (highlighted by the blue rectangle) at increasing magnifications of 20× and 40×, respectively.

Figure 8. Attention map visualization of an attention-based multiple instance learning (MIL) model for an incorrectly classified TNBC patient (false negative) who achieved pathological complete response (pCR) to neoadjuvant chemotherapy (NACT) but whom the model classified as non-pCR. (a) H&E-stained biopsy slide thumbnail with (b) the corresponding attention heatmap of the MIL model. (c) Weighted attention representation showing individual patches weighted by the model’s attention scores. (d) Median attention. (e–g) Progressive filtering of attention regions: (e) top 10%, (f) top 5%, and (g) top 1% attention hotspots. (h–j) Zoomed-in H&E views of the hotspot region highlighted by the red rectangle at increasing magnifications of 8×, 20×, and 40×, respectively. (k–m) Zoomed-in H&E views of the hotspot region highlighted by the sky-blue rectangle at the same magnifications (8×, 20×, and 40×).

Figure 9. Attention map visualization of an attention-based multiple instance learning (MIL) model for an incorrectly classified triple-negative breast cancer (TNBC) patient (false positive) who did not achieve pathological complete response (non-pCR) to neoadjuvant chemotherapy (NACT) but whom the model classified as pCR. (a) H&E-stained biopsy slide thumbnail with (b) the corresponding attention heatmap of the MIL model. (c) Weighted attention representation showing individual patches weighted by the model’s attention scores. (d) Median attention. (e–g) Progressive filtering of attention regions: (e) top 10%, (f) top 5%, and (g) top 1% attention hotspots. (h–j) Zoomed-in H&E views of the hotspot region highlighted by the red rectangle at increasing magnifications of 10×, 20×, and 40×, respectively. (k–m) Zoomed-in H&E views of the hotspot region highlighted by the sky-blue rectangle at the same magnifications (10×, 20×, and 40×).
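The top 10%, 5%, and 1% attention masks in Figures 7–9 rank patches by attention score and keep a fixed fraction of them. The sketch below assumes one attention score per patch and rounds the retained count up. It omits the mapping from each patch index back to slide coordinates. `top_percent_mask` is a hypothetical name.

```python
import numpy as np


def top_percent_mask(scores, percent):
    """Boolean mask over patches whose attention score is in the top `percent`%."""
    scores = np.asarray(scores, dtype=float).ravel()
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    k = max(1, int(np.ceil(scores.size * percent / 100.0)))  # keep at least one patch
    order = np.argsort(scores)[::-1]  # patch indices, highest attention first
    mask = np.zeros(scores.size, dtype=bool)
    mask[order[:k]] = True
    return mask


# 200 hypothetical patches with increasing attention scores
scores = np.linspace(0.0, 1.0, 200)
masks = {p: top_percent_mask(scores, p) for p in (10, 5, 1)}
print({p: int(m.sum()) for p, m in masks.items()})  # {10: 20, 5: 10, 1: 2}
```

Every mask is drawn from the same ranking, so the masks are nested: the top 1% patches are a subset of the top 5%, which are a subset of the top 10%. This matches the "progressive filtering" described in the captions.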

Given the absence of definitive histological biomarkers for pCR to NACT [34], the biologically grounded interpretability of the attention maps not only enhances transparency but may also inform biomarker discovery directly from H&E images. This could improve the reliability of NACT response prediction in TNBC patients and advance precision oncology.

6.5. Significance and Clinical Implications

The ability to predict NACT response in TNBC holds significant clinical value, enabling more personalized and timely treatment decisions [2,3,4,5,6]. Approximately 40–50% of TNBC patients achieve pCR, which correlates strongly with improved survival. In contrast, patients with residual disease (non-pCR) have a higher risk of recurrence and mortality, underscoring the importance of predicting pCR before treatment begins. Our framework demonstrates robust predictive performance and biological interpretability and is designed to integrate into existing digital pathology workflows. At The Ohio State University Wexner Medical Center, one of the largest academic cancer centers in the U.S. and a recognized leader in digital pathology, the model could be embedded into whole-slide imaging systems to provide pathologists with interpretable predictions that support treatment planning. Ultimately, this work advances the integration of AI into clinical practice, helping oncologists and pathologists make timely, patient-specific decisions, especially for aggressive cancers such as TNBC.

7. Conclusions

In this study, we present an attention-based MIL framework to predict treatment response (i.e., pCR) to NACT in patients with TNBC from routine pre-treatment H&E-stained biopsy slides. The framework demonstrated strong predictive performance in both the internal and external cohorts, generalizing well despite the inherent heterogeneity of TNBC. Notably, the attention mechanism, combined with mIHC slides available for a subset of patients, enabled spatial interpretability: co-registered mIHC revealed alignment between high-attention regions and the immune biomarkers PD-L1, CD8+ T cells, and CD163+ macrophages. The moderate overlap (mean IoU ≈ 0.46) between attention maps and immune-enriched regions supports the biological relevance of the model’s predictions. These findings highlight the potential of our framework to support personalized treatment planning, reduce overtreatment, and accelerate biomarker discovery in TNBC. Future work will focus on validating this approach in larger, multi-institutional cohorts, integrating clinical and genomic variables, and extending its application to other breast cancer subtypes and treatment response endpoints.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/cancers17152423/s1: Supplementary Table S1 details the distribution of patient cohorts used in the study, including the number of pathological complete and incomplete response cases for both the in-house OSU Wexner Medical Center cohort and the independent MD Anderson Cancer Center cohort. Supplementary Table S2 lists the software libraries and packages, with versions, used to ensure reproducibility. Supplementary Tables S3 and S5 provide the grid-search ranges of the hyperparameters explored during model selection for the attention-based MIL and traditional ML models, respectively. Supplementary Tables S4 and S6 present the final selected hyperparameters for those models.

Author Contributions

H.K. led the work, designed the experiments, analyzed the data, prepared the figures and tables, and wrote the manuscript. Z.L. and M.K.K.N. obtained the study cohort. Z.S., H.Z., Y.W., B.N., S.W. and H.G. reviewed the manuscript and provided valuable insights and comments. All authors have read and agreed to the published version of the manuscript.

Funding

This project was supported by R01 CA276301 (PIs: Niazi, Chen) from the National Cancer Institute. The project was also supported by The Ohio State University Comprehensive Cancer Center, Pelotonia Research Funds and the Department of Pathology. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or National Cancer Institute.

Institutional Review Board Statement

This study was approved by the Institutional Review Board (IRB Protocol #2016C0025).

Informed Consent Statement

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. Informed consent was obtained from all individual patients included in the study.

Data Availability Statement

The original data used in this study can be requested by emailing the corresponding authors, Hikmat Khan (hikmat.Khan@osumc.edu) or Zaibo Li (zaibo.li@osumc.edu).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

TNBC	Triple-negative breast cancer
H&E	Hematoxylin and eosin
AI	Artificial intelligence
CD8+ T	Cluster of differentiation 8-positive T cells
CD163+	Cluster of differentiation 163-positive macrophages
PD-L1	Programmed death-ligand 1
ER	Estrogen receptor
PR	Progesterone receptor
HER2	Human epidermal growth factor receptor 2
NACT	Neoadjuvant chemotherapy
pCR	Pathological complete response
AUC	Area under the curve
IHC	Immunohistochemistry
TME	Tumor microenvironment
TILs	Tumor-infiltrating lymphocytes
OSU	The Ohio State University
MIL	Multiple instance learning
WSI	Whole-slide image
mIHC	Multiplex immunohistochemistry
TAMs	Tumor-associated macrophages
CAD	Computer-aided diagnosis
GPU	Graphics processing unit
VRAM	Video random-access memory
ROC	Receiver operating characteristic
UNI v2	Self-supervised pathology foundation model

References

1. Marra, A.; Curigliano, G. Adjuvant and neoadjuvant treatment of triple-negative breast cancer with chemotherapy. Cancer J. 2021, 27, 41–49.
2. Soliman, A.; Li, Z.; Parwani, A.V. Artificial intelligence’s impact on breast cancer pathology: A literature review. Diagn. Pathol. 2024, 19, 38.
3. Ferlay, J.; Steliarova-Foucher, E.; Lortet-Tieulent, J.; Rosso, S.; Coebergh, J.-W.W.; Comber, H.; Forman, D.; Bray, F. Cancer incidence and mortality patterns in Europe: Estimates for 40 countries in 2012. Eur. J. Cancer 2013, 49, 1374–1403.
4. van den Ende, N.S.; Nguyen, A.H.; Jager, A.; Kok, M.; Debets, R.; van Deurzen, C.H. Triple-negative breast cancer and predictive markers of response to neoadjuvant chemotherapy: A systematic review. Int. J. Mol. Sci. 2023, 24, 2969.
5. Xiong, N.; Wu, H.; Yu, Z. Advancements and challenges in triple-negative breast cancer: A comprehensive review of therapeutic and diagnostic strategies. Front. Oncol. 2024, 14, 1405491.
6. Huang, Z.; Peng, Q.; Mao, L.; Ouyang, W.; Xiong, Y.; Tan, Y.; Chen, H.; Zhang, Z.; Li, T.; Hu, Y. Neoadjuvant Strategies for Triple Negative Breast Cancer: Current Evidence and Future Perspectives. MedComm–Future Med. 2025, 4, e70013.
7. Han, R.; Regpala, S.; Slodkowska, E.; Nofech-Mozes, S.; Hanna, W.; Parra-Herran, C.; Lu, F.-I. Lack of standardization in the processing and reporting of post-neoadjuvant breast cancer specimens: A survey of Canadian pathologists and pathology assistants. Arch. Pathol. Lab. Med. 2020, 144, 1262–1270.
8. Marczyk, M.; Mrukwa, A.; Yau, C.; Wolf, D.; Chen, Y.-Y.; Balassanian, R.; Nanda, R.; Parker, B.; Krings, G.; Sattar, H. Treatment Efficacy Score—Continuous residual cancer burden-based metric to compare neoadjuvant chemotherapy efficacy between randomized trial arms in breast cancer trials. Ann. Oncol. 2022, 33, 814–823.
9. Cerbelli, B.; Scagnoli, S.; Mezi, S.; De Luca, A.; Pisegna, S.; Amabile, M.I.; Roberto, M.; Fortunato, L.; Costarelli, L.; Pernazza, A. Tissue immune profile: A tool to predict response to neoadjuvant therapy in triple negative breast cancer. Cancers 2020, 12, 2648.
10. Carlino, F.; Feliciano, S. Efficacy of sacituzumab govitecan in a patient with TNBC with early relapse after neoadjuvant chemotherapy. Recent. Progress. Med. 2024, 115, 26e–30e.
11. Gradishar, W.J.; Moran, M.S.; Abraham, J.; Abramson, V.; Aft, R.; Agnese, D.; Allison, K.H.; Anderson, B.; Bailey, J.; Burstein, H.J. Breast cancer, version 3.2024, NCCN clinical practice guidelines in oncology. J. Natl. Compr. Cancer Netw. 2024, 22, 331–357.
12. Kubouchi, K.; Shimada, K.; Yokoe, T.; Tsutsumi, Y. Avoidance and period-shortening of neoadjuvant chemotherapy against triple-negative breast cancer in stages I and II: Importance of Ki-67 labeling index and the recognition of apocrine-type lesions. Technol. Cancer Res. Treat. 2020, 19, 1533033820943246.
13. Zhu, M.; Liang, C.; Zhang, F.; Zhu, L.; Chen, D. A nomogram to predict disease-free survival following neoadjuvant chemotherapy for triple negative breast cancer. Front. Oncol. 2021, 11, 690336.
14. Abuhadra, N.; Stecklein, S.; Sharma, P.; Moulder, S. Early-stage triple-negative breast cancer: Time to optimize personalized strategies. Oncologist 2022, 27, 30–39.
15. Schmid, P.; Cortes, J.; Pusztai, L.; McArthur, H.; Kümmel, S.; Bergh, J.; Denkert, C.; Park, Y.H.; Hui, R.; Harbeck, N. Pembrolizumab for early triple-negative breast cancer. N. Engl. J. Med. 2020, 382, 810–821.
16. Mittendorf, E.A.; Zhang, H.; Barrios, C.H.; Saji, S.; Jung, K.H.; Hegg, R.; Koehler, A.; Sohn, J.; Iwata, H.; Telli, M.L. Neoadjuvant atezolizumab in combination with sequential nab-paclitaxel and anthracycline-based chemotherapy versus placebo and chemotherapy in patients with early-stage triple-negative breast cancer (IMpassion031): A randomised, double-blind, phase 3 trial. Lancet 2020, 396, 1090–1100.
17. Zhao, Y.; Schaafsma, E.; Cheng, C. Gene signature-based prediction of triple-negative breast cancer patient response to Neoadjuvant chemotherapy. Cancer Med. 2020, 9, 6281–6295.
18. da Costa, R.E.A.R.; de Oliveira, F.T.R.; Araújo, A.L.N.; Vieira, S.C. Impact of pathologic complete response on the prognosis of triple-negative breast cancer patients: A cohort study. Cureus 2023, 15, e37396.
19. Bernemann, C.; Hülsewig, C.; Ruckert, C.; Schäfer, S.; Blümel, L.; Hempel, G.; Götte, M.; Greve, B.; Barth, P.J.; Kiesel, L. Influence of secreted frizzled receptor protein 1 (SFRP1) on neoadjuvant chemotherapy in triple negative breast cancer does not rely on WNT signaling. Mol. Cancer 2014, 13, 174.
20. Bianchini, G.; De Angelis, C.; Licata, L.; Gianni, L. Treatment landscape of triple-negative breast cancer—Expanded options, evolving needs. Nat. Rev. Clin. Oncol. 2022, 19, 91–113.
21. Dent, R.; Trudeau, M.; Pritchard, K.I.; Hanna, W.M.; Kahn, H.K.; Sawka, C.A.; Lickley, L.A.; Rawlinson, E.; Sun, P.; Narod, S.A. Triple-negative breast cancer: Clinical features and patterns of recurrence. Clin. Cancer Res. 2007, 13, 4429–4434.
22. Oshi, M.; Newman, S.; Murthy, V.; Tokumaru, Y.; Yan, L.; Matsuyama, R.; Endo, I.; Takabe, K. ITPKC as a prognostic and predictive biomarker of neoadjuvant chemotherapy for triple negative breast cancer. Cancers 2020, 12, 2758.
23. Biswas, T.; Efird, J.T.; Prasad, S.; Jindal, C.; Walker, P.R. The survival benefit of neoadjuvant chemotherapy and pCR among patients with advanced stage triple negative breast cancer. Oncotarget 2017, 8, 112712.
24. Wimberly, H.; Brown, J.R.; Schalper, K.; Haack, H.; Silver, M.R.; Nixon, C.; Bossuyt, V.; Pusztai, L.; Lannin, D.R.; Rimm, D.L. PD-L1 expression correlates with tumor-infiltrating lymphocytes and response to neoadjuvant chemotherapy in breast cancer. Cancer Immunol. Res. 2015, 3, 326–332.
25. Bae, S.B.; Cho, H.D.; Oh, M.-H.; Lee, J.-H.; Jang, S.-H.; Hong, S.A.; Cho, J.; Kim, S.Y.; Han, S.W.; Lee, J.E. Expression of programmed death receptor ligand 1 with high tumor-infiltrating lymphocytes is associated with better prognosis in breast cancer. J. Breast Cancer 2016, 19, 242.
26. Velcheti, V.; Schalper, K.A.; Carvajal, D.E.; Anagnostou, V.K.; Syrigos, K.N.; Sznol, M.; Herbst, R.S.; Gettinger, S.N.; Chen, L.; Rimm, D.L. Programmed death ligand-1 expression in non-small cell lung cancer. Lab. Investig. 2014, 94, 107–116.
27. Xin, Y.; Shen, G.; Zheng, Y.; Guan, Y.; Huo, X.; Li, J.; Ren, D.; Zhao, F.; Liu, Z.; Li, Z. Immune checkpoint inhibitors plus neoadjuvant chemotherapy in early triple-negative breast cancer: A systematic review and meta-analysis. BMC Cancer 2021, 21, 1261.
28. Arole, V.; Nitta, H.; Wei, L.; Shen, T.; Parwani, A.V.; Li, Z. M2 tumor-associated macrophages play important role in predicting response to neoadjuvant chemotherapy in triple-negative breast carcinoma. Breast Cancer Res. Treat. 2021, 188, 37–42.
29. Kim, T.; Han, W.; Kim, M.K.; Lee, J.W.; Kim, J.; Ahn, S.K.; Lee, H.-B.; Moon, H.-G.; Lee, K.-H.; Kim, T.-Y. Predictive significance of p53, Ki-67, and Bcl-2 expression for pathologic complete response after neoadjuvant chemotherapy for triple-negative breast cancer. J. Breast Cancer 2015, 18, 16.
30. Guestini, F.; Ono, K.; Miyashita, M.; Ishida, T.; Ohuchi, N.; Nakagawa, S.; Hirakawa, H.; Tamaki, K.; Ohi, Y.; Rai, Y. Impact of Topoisomerase IIα, PTEN, ABCC1/MRP1, and KI67 on triple-negative breast cancer patients treated with neoadjuvant chemotherapy. Breast Cancer Res. Treat. 2019, 173, 275–288.
31. Mohammed, A.A.; Elsayed, F.M.; Algazar, M.; Rashed, H.E.; Anter, A.H. Neoadjuvant chemotherapy in triple negative breast cancer: Correlation between androgen receptor expression and pathological response. Asian Pac. J. Cancer Prev. APJCP 2020, 21, 563.
32. Zhu, M.; Yu, Y.; Shao, X.; Zhu, L.; Wang, L. Predictors of response and survival outcomes of triple negative breast cancer receiving neoadjuvant chemotherapy. Chemotherapy 2020, 65, 101–109.
33. Kedzierawski, P.; Macek, P.; Ciepiela, I.; Kowalik, A.; Gozdz, S. Evaluation of complete pathological regression after neoadjuvant chemotherapy in triple-negative breast cancer patients with BRCA1 founder mutation Aided Bayesian A/B testing approach. Diagnostics 2021, 11, 1144.
34. Bignon, L.; Fricker, J.P.; Nogues, C.; Mouret-Fourme, E.; Stoppa-Lyonnet, D.; Caron, O.; Lortholary, A.; Faivre, L.; Lasset, C.; Mari, V. Efficacy of anthracycline/taxane-based neo-adjuvant chemotherapy on triple-negative breast cancer in BRCA 1/BRCA 2 mutation carriers. Breast J. 2018, 24, 269–277.
35. Masuda, H.; Masuda, N.; Kodama, Y.; Ogawa, M.; Karita, M.; Yamamura, J.; Tsukuda, K.; Doihara, H.; Miyoshi, S.; Mano, M. Predictive factors for the effectiveness of neoadjuvant chemotherapy and prognosis in triple-negative breast cancer patients. Cancer Chemother. Pharmacol. 2011, 67, 911–917.
36. Van Bockstal, M.R.; Noel, F.; Guiot, Y.; Duhoux, F.P.; Mazzeo, F.; Van Marcke, C.; Fellah, L.; Ledoux, B.; Berlière, M.; Galant, C. Predictive markers for pathological complete response after neo-adjuvant chemotherapy in triple-negative breast cancer. Ann. Diagn. Pathol. 2020, 49, 151634.
37. Zhou, Z.; Adrada, B.E.; Candelaria, R.P.; Elshafeey, N.A.; Boge, M.; Mohamed, R.M.; Pashapoor, S.; Sun, J.; Xu, Z.; Panthi, B. Prediction of pathologic complete response to neoadjuvant systemic therapy in triple negative breast cancer using deep learning on multiparametric MRI. Sci. Rep. 2023, 13, 1171.
38. Golden, D.I.; Lipson, J.A.; Telli, M.L.; Ford, J.M.; Rubin, D.L. Dynamic contrast-enhanced MRI-based biomarkers of therapeutic response in triple-negative breast cancer. J. Am. Med. Inform. Assoc. 2013, 20, 1059–1066.
39. Jiang, M.; Li, C.-L.; Luo, X.-M.; Chuan, Z.-R.; Lv, W.-Z.; Li, X.; Cui, X.-W.; Dietrich, C.F. Ultrasound-based deep learning radiomics in the assessment of pathological complete response to neoadjuvant chemotherapy in locally advanced breast cancer. Eur. J. Cancer 2021, 147, 95–105.
40. Xu, Z.; Zhou, Z.; Son, J.B.; Feng, H.; Adrada, B.E.; Moseley, T.W.; Candelaria, R.P.; Guirguis, M.S.; Patel, M.M.; Whitman, G.J. Deep Learning Models Based on Pretreatment MRI and Clinicopathological Data to Predict Responses to Neoadjuvant Systemic Therapy in Triple-Negative Breast Cancer. Cancers 2025, 17, 966.
41. Jimenez, J.E.; Abdelhafez, A.; Mittendorf, E.A.; Elshafeey, N.; Yung, J.P.; Litton, J.K.; Adrada, B.E.; Candelaria, R.P.; White, J.; Thompson, A.M. A model combining pretreatment MRI radiomic features and tumor-infiltrating lymphocytes to predict response to neoadjuvant systemic therapy in triple-negative breast cancer. Eur. J. Radiol. 2022, 149, 110220.
42. Krishnamurthy, S.; Jain, P.; Tripathy, D.; Basset, R.; Randhawa, R.; Muhammad, H.; Huang, W.; Yang, H.; Kummar, S.; Wilding, G. Predicting response of triple-negative breast cancer to neoadjuvant chemotherapy using a Deep convolutional neural network–based artificial intelligence tool. JCO Clin. Cancer Inform. 2023, 7, e2200181.
43. Huang, Z.; Shao, W.; Han, Z.; Alkashash, A.M.; De la Sancha, C.; Parwani, A.V.; Nitta, H.; Hou, Y.; Wang, T.; Salama, P. Artificial intelligence reveals features associated with breast cancer neoadjuvant chemotherapy responses from multi-stain histopathologic images. NPJ Precis. Oncol. 2023, 7, 14.
44. Hussain, M.S.; Ramalingam, P.S.; Chellasamy, G.; Yun, K.; Bisht, A.S.; Gupta, G. Harnessing Artificial Intelligence for Precision Diagnosis and Treatment of Triple Negative Breast Cancer. Clin. Breast Cancer 2025, 25, 406–421.
45. Ilse, M.; Tomczak, J.; Welling, M. Attention-based deep multiple instance learning. arXiv 2018, arXiv:1802.04712.
46. Chen, R.J.; Ding, T.; Lu, M.Y.; Williamson, D.F.; Jaume, G.; Song, A.H.; Chen, B.; Zhang, A.; Shao, D.; Shaban, M.; et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 2024, 30, 850–862.
47. Bilal, M.; Raza, M.; Altherwy, Y.; Alsuhaibani, A.; Abduljabbar, A.; Almarshad, F.; Golding, P.; Rajpoot, N. Foundation Models in Computational Pathology: A Review of Challenges, Opportunities, and Impact. arXiv 2025, arXiv:2502.08333.
48. Dubey, S.R.; Singh, S.K.; Chaudhuri, B.B. Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 2022, 503, 92–108.
49. Wang, Z.; Wang, P.; Liu, K.; Wang, P.; Fu, Y.; Lu, C.-T.; Aggarwal, C.C.; Pei, J.; Zhou, Y. A comprehensive survey on data augmentation. arXiv 2024, arXiv:2405.09591.
50. Faryna, K.; van der Laak, J.; Litjens, G. Automatic data augmentation to improve generalization of deep learning in H&E stained histopathology. Comput. Biol. Med. 2024, 170, 108018.
51. Bottou, L. Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 421–436.
52. Ketkar, N. Stochastic gradient descent. In Deep Learning with Python: A Hands-On Introduction; Springer: Berlin/Heidelberg, Germany, 2017; pp. 113–132.
53. Kleinbaum, D.G.; Dietz, K.; Gail, M.; Klein, M.; Klein, M. Logistic Regression; Springer: Berlin/Heidelberg, Germany, 2002.
54. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
55. Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
56. Guo, G.; Wang, H.; Bell, D.; Bi, Y.; Greer, K. KNN model-based approach in classification. In OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”; Springer: Berlin/Heidelberg, Germany, 2003; pp. 986–996.
57. Izenman, A.J. Linear discriminant analysis. In Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning; Springer: Berlin/Heidelberg, Germany, 2013; pp. 237–280.
58. Hörst, F.; Rempe, M.; Becker, H.; Heine, L.; Keyl, J.; Kleesiek, J. CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models. arXiv 2025, arXiv:2501.05269.
59. Ye, J.-H.; Wang, X.-H.; Shi, J.-J.; Yin, X.; Chen, C.; Chen, Y.; Wu, H.-Y.; Jiong, S.; Zhang, M.; Shi, X.-B. Tumor-associated macrophages are associated with response to neoadjuvant chemotherapy and poor outcomes in patients with triple-negative breast cancer. J. Cancer 2021, 12, 2886.
60. Baharun, N.B.; Adam, A.; Zailani, M.A.H.; Rajpoot, N.M.; Xu, Q.; Zin, R.R.M. Automated scoring methods for quantitative interpretation of Tumour infiltrating lymphocytes (TILs) in breast cancer: A systematic review. BMC Cancer 2024, 24, 1202.
61. Bhattarai, S.; Saini, G.; Li, H.; Seth, G.; Fisher, T.B.; Janssen, E.A.; Kiraz, U.; Kong, J.; Aneja, R. Predicting neoadjuvant treatment response in triple-negative breast cancer using machine learning. Diagnostics 2023, 14, 74.
62. Bhattarai, S.; Rupji, M.; Chao, H.-p.; Xu, Q.; Saini, G.; Rida, P.; Aleskandarany, M.A.; Green, A.R.; Ellis, I.O.; Janssen, E.A. Cell cycle traverse rate predicts long-term outcomes in a multi-institutional cohort of patients with triple-negative breast cancer. BJC Rep. 2024, 2, 87.

Figure 1. Overview of the pipeline for predicting pathologic complete response (pCR) to neoadjuvant chemotherapy (NACT) in triple-negative breast cancer (TNBC) from pre-treatment H&E-stained biopsy slides. First, the tissue in the H&E-stained slide is segmented and divided into a grid to extract tissue patches. Each patch is then encoded into a feature vector by a pretrained deep learning encoder (UNI v2 [46], a general-purpose, self-supervised pathology foundation model trained on 1.2 million histopathology slides). These patch-level features are aggregated by an attention mechanism [45] that assigns greater weight to the most informative regions, yielding a slide-level feature representation. A fully connected neural network classifier then uses this slide-level representation to predict, for each patient, the likelihood of pCR versus non-pCR (residual disease) after NACT.
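The attention-based aggregation in Figure 1 follows the attention pooling of Ilse et al. [45]. Each patch feature h_k receives a weight a_k = softmax_k(wᵀ tanh(V h_k)), and the slide embedding is z = Σ_k a_k h_k. Below is a forward-pass sketch of this non-gated variant in NumPy. A single linear layer with a sigmoid stands in for the fully connected classifier. The weight shapes, the random toy features, and the function name are illustrative assumptions. They do not represent the trained model or the actual UNI v2 feature dimension.

```python
import numpy as np


def attention_mil_forward(H, V, w, c, b):
    """Forward pass of non-gated attention-MIL pooling (after Ilse et al.).

    H: (K, D) patch features for one slide; V: (L, D) and w: (L,) attention
    parameters; c: (D,) and b: scalar for a simplified linear classifier.
    Returns the predicted probability of pCR and the (K,) attention weights.
    """
    logits = np.tanh(H @ V.T) @ w        # (K,) unnormalized attention scores
    logits = logits - logits.max()       # shift for a numerically stable softmax
    a = np.exp(logits)
    a = a / a.sum()                      # attention weights sum to 1 over patches
    z = a @ H                            # (D,) attention-weighted slide embedding
    prob = 1.0 / (1.0 + np.exp(-(z @ c + b)))  # sigmoid -> P(pCR)
    return float(prob), a


rng = np.random.default_rng(0)
K, D, L = 50, 16, 8                      # toy sizes, not the real feature dimension
H = rng.normal(size=(K, D))
V, w = rng.normal(size=(L, D)), rng.normal(size=L)
c, b = rng.normal(size=D), 0.0
prob, attn = attention_mil_forward(H, V, w, c, b)
print(round(prob, 3), attn.argmax())     # probability and the most-attended patch
```

The pooled embedding is a weighted sum, so the prediction does not depend on patch order. The weights a_k are also the per-patch scores that the attention heatmaps visualize.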

Figure 2. Confusion matrices illustrating the model’s performance on each test fold of the in-house cohort using five-fold cross-validation.

Figure 3. Receiver operating characteristic (ROC) curves for each test fold of the in-house cohort using five-fold cross-validation. Area under the curve (AUC) values range from 0.81 to 0.91.
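The per-fold AUC values in Figure 3 can be computed without tracing the curve. The ROC AUC equals the probability that a randomly chosen pCR case receives a higher score than a randomly chosen non-pCR case, with ties counted as one half. The NumPy sketch below assumes that pCR is the positive class (label 1). For binary labels, it should agree with the trapezoidal ROC area reported by standard tools such as scikit-learn's `roc_auc_score`.

```python
import numpy as np


def roc_auc(y_true, y_score):
    """ROC AUC as P(score of a pCR case > score of a non-pCR case), ties counted as 1/2."""
    y = np.asarray(y_true).astype(bool)   # True = pCR (positive class, assumed)
    s = np.asarray(y_score, dtype=float)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC requires at least one pCR and one non-pCR case")
    greater = (pos[:, None] > neg[None, :]).sum()   # all positive-negative pairs
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


# Toy fold: 2 pCR and 2 non-pCR patients with predicted pCR probabilities
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise comparison costs O(n_pos × n_neg), which is negligible for test folds of roughly 35 patients.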

Figure 4. Attention map visualization of an attention-based multiple instance learning (MIL) model for a correctly classified triple-negative breast cancer (TNBC) patient (true positive) who achieved pCR to neoadjuvant chemotherapy (NACT). (a) H&E-stained biopsy slide thumbnail with (b) the corresponding attention heatmap of the MIL model. (c) Weighted attention heatmap representation showing individual patches weighted by the model’s attention scores. (d) Median attention heatmap. (e–g) Progressive filtering of attention regions: (e) top 10%, (f) top 5%, and (g) top 1% attention hotspots. (h–j) Zoomed-in H&E views of the identified hotspot region at increasing magnifications of 5×, 10×, and 20×, respectively. (k–m) Multiplex immunohistochemistry (mIHC) images of consecutive tissue sections from the same hotspot region at the same magnifications (5×, 10×, and 20×), revealing PD-L1 (brown), CD8+ T cells, and CD163+ macrophages (red) in the model-identified regions. These immune markers have been associated with NACT response in TNBC [27], suggesting that the model attends to immunologically relevant, biomarker-rich regions.
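The captions do not specify how the weighted attention representation in panel (c) is rendered. One plausible approach is to fade each patch toward white in proportion to its min-max-normalized attention score. The sketch below assumes RGB patches with values in [0, 255] and a white background. The blending scheme and the name `weight_patches` are assumptions, not the authors' method.

```python
import numpy as np


def weight_patches(patches, attn, background=255.0):
    """Fade each patch toward a white background by (1 - normalized attention).

    patches: (K, h, w, 3) RGB values in [0, 255]; attn: (K,) attention scores.
    """
    patches = np.asarray(patches, dtype=float)
    attn = np.asarray(attn, dtype=float)
    span = attn.max() - attn.min()
    # Min-max normalize so the most-attended patch is shown unchanged
    a = (attn - attn.min()) / span if span > 0 else np.ones_like(attn)
    a = a[:, None, None, None]            # broadcast over height, width, channels
    return a * patches + (1.0 - a) * background


tiles = np.zeros((3, 4, 4, 3))            # three black 4x4 toy patches
out = weight_patches(tiles, [0.0, 0.5, 1.0])
print(out[:, 0, 0, 0].tolist())           # [255.0, 127.5, 0.0]
```

Min-max normalization maps each slide onto its own scale. Faded patches therefore indicate low attention relative to other patches on the same slide, not on an absolute scale.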

Figure 5. Attention map visualization of an attention-based multiple instance learning (MIL) model for a correctly classified triple-negative breast cancer (TNBC) patient (true negative) who did not achieve pathological complete response (non-pCR) to neoadjuvant chemotherapy (NACT). (a) H&E-stained biopsy slide thumbnail with (b) the corresponding attention heatmap generated by the MIL model. (c) Weighted attention representation showing individual patches weighted by the model’s attention scores. (d) Median attention. (e–g) Progressive filtering of attention regions: (e) top 10%, (f) top 5%, and (g) top 1% attention hotspots. (h–j) Zoomed-in H&E views of the identified hotspot region at increasing magnifications of 5×, 10×, and 20×, respectively. (k–m) Multiplex immunohistochemistry (mIHC) images of consecutive tissue sections from the same hotspot region at the same magnifications (5×, 10×, and 20×), revealing PD-L1 (brown), CD8+ T cells, and CD163+ macrophages (red) in the model-identified regions. These immune markers have been associated with NACT response in TNBC [27], suggesting that the model attends to immunologically relevant, biomarker-rich regions.

Table 1. Distribution of patient cohorts used in this study, showing the number of cases with pathologic complete response (pCR) and pathologic incomplete response (non-pCR) for both the in-house Ohio State University Wexner Medical Center cohort and the independent MD Anderson Cancer Center cohort.

Cohorts	pCR Cases	Non-pCR Cases	Total
OSU-Wexner Medical Center (in-house)	81	93	174
MD Anderson Cancer Center (independent)	12	18	30

Table 2. Performance metrics for predicting pathologic complete response (pCR) versus non-pCR following neoadjuvant chemotherapy (NACT) in triple-negative breast cancer (TNBC) in the in-house cohort, evaluated using five-fold cross-validation. Metrics are reported for each fold (Folds 1–5), along with the mean ± standard deviation across all test folds, demonstrating consistent performance across accuracy, AUC, F1-score, sensitivity, specificity, and precision.

Folds	Accuracy	AUC	F1-Score	Sensitivity	Specificity	Precision
1	0.83	0.88	0.88	0.89	0.82	0.78
2	0.78	0.83	0.78	0.78	0.78	0.78
3	0.83	0.91	0.86	0.90	0.80	0.75
4	0.83	0.85	0.88	0.89	0.82	0.78
5	0.83	0.81	0.80	0.78	0.84	0.89
Mean ± SD	0.82 ± 0.02	0.86 ± 0.03	0.84 ± 0.04	0.85 ± 0.06	0.81 ± 0.01	0.80 ± 0.05
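The summary row of Table 2 aggregates the five per-fold values. A minimal sketch of that aggregation, using the fold values copied from the table, is given below. The source does not state whether the sample or the population standard deviation was used, so the sketch assumes the sample standard deviation; the recomputed means agree with the table to two decimals, while the standard deviations may differ slightly in the second decimal.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple

# Per-fold test metrics copied from Table 2 (Folds 1-5).
FOLD_METRICS = {
    "Accuracy":    [0.83, 0.78, 0.83, 0.83, 0.83],
    "AUC":         [0.88, 0.83, 0.91, 0.85, 0.81],
    "F1-Score":    [0.88, 0.78, 0.86, 0.88, 0.80],
    "Sensitivity": [0.89, 0.78, 0.90, 0.89, 0.78],
    "Specificity": [0.82, 0.78, 0.80, 0.82, 0.84],
    "Precision":   [0.78, 0.78, 0.75, 0.78, 0.89],
}


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-fold values."""
    return mean(values), stdev(values)


for name, values in FOLD_METRICS.items():
    m, s = summarize(values)
    print(f"{name}: {m:.2f} +/- {s:.2f}")
```

Swapping `stdev` for `statistics.pstdev` gives the population standard deviation instead; with only five folds the two can differ noticeably.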

Table 3. Performance metrics for predicting pathologic complete response (pCR) versus non-pCR following neoadjuvant chemotherapy (NACT) in triple-negative breast cancer (TNBC) are reported for both the in-house and the independent cohorts. The average metrics are presented as mean ± standard deviation across three independent runs, demonstrating consistent performance in terms of accuracy, AUC, F1-score, sensitivity, specificity, and precision.

Cohorts	Accuracy	AUC	F1-Score	Sensitivity	Specificity	Precision
OSU-Wexner Medical Center (in-house)	0.82 ± 0.02	0.86 ± 0.03	0.84 ± 0.04	0.85 ± 0.06	0.81 ± 0.01	0.80 ± 0.05
MD Anderson Cancer Center (independent)	0.76 ± 0.03	0.78 ± 0.02	0.67 ± 0.07	0.72 ± 0.11	0.73 ± 0.02	0.81 ± 0.11
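The six metrics reported in Tables 2 and 3 follow their standard definitions. A minimal, dependency-free sketch is shown below. It assumes pCR is encoded as the positive class (label 1) and that predicted probabilities are thresholded at 0.5; both are assumptions, since the source does not state them here. AUC is computed with the pairwise (Mann–Whitney) formulation, in which ties count as one half. The function name `binary_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Standard binary metrics, treating label 1 (assumed pCR) as positive."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on pCR cases
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on non-pCR cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(y_true)

    # AUC: probability that a random positive scores above a random negative.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    pairs = [1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg]
    auc = sum(pairs) / len(pairs)

    return {"Accuracy": accuracy, "AUC": auc, "F1-Score": f1,
            "Sensitivity": sensitivity, "Specificity": specificity,
            "Precision": precision}


# Toy example: three pCR (1) and three non-pCR (0) cases with made-up scores.
print(binary_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]))
```

Note that AUC is threshold-free, whereas the other five metrics depend on the chosen decision threshold; this is why AUC can stay comparatively stable while sensitivity or specificity shift between cohorts.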

Table 4. Comparison of the proposed model with classical machine learning (ML) baselines trained on clinical data.

Model	Accuracy	AUC	F1-Score	Sensitivity	Specificity	Precision
Logistic Regression [53]	0.57 ± 0.08	0.71 ± 0.05	0.68 ± 0.05	0.89 ± 0.07	0.27 ± 0.167	0.55 ± 0.07
Random Forest [54]	0.60 ± 0.05	0.72 ± 0.04	0.62 ± 0.06	0.65 ± 0.08	0.56 ± 0.10	0.59 ± 0.06
SVM [55]	0.61 ± 0.05	0.67 ± 0.05	0.72 ± 0.03	1.00 ± 0.00	0.22 ± 0.90	0.56 ± 0.03
K-Nearest Neighbors [56]	0.64 ± 0.04	0.74 ± 0.04	0.70 ± 0.02	0.86 ± 0.11	0.42 ± 0.17	0.20 ± 0.08
Linear Discriminant Analysis [57]	0.57 ± 0.07	0.71 ± 0.04	0.67 ± 0.02	0.89 ± 0.07	0.24 ± 0.18	0.54 ± 0.05
Ours	0.82 ± 0.02	0.86 ± 0.03	0.84 ± 0.04	0.85 ± 0.06	0.81 ± 0.01	0.80 ± 0.05

Table 5. Quantification of PD-L1, CD8+ T cell, and CD163+ macrophage biomarkers using the model’s attention maps. Values are the mean ± standard deviation Intersection over Union (IoU) between biomarker-positive cells and the model’s attention map. A higher IoU indicates greater presence of the biomarker within the model’s attended region.

Biomarker	IoU (with B-Attention Map)
PD-L1	0.47 ± 0.18
CD8+ T	0.45 ± 0.20
CD163+	0.46 ± 0.17
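The IoU values in Table 5 measure the spatial overlap between two binary masks. A minimal NumPy sketch follows. It assumes the biomarker-positive regions from the co-registered mIHC image and the attention map have already been rasterized onto the same pixel grid as boolean masks; that preprocessing, the helper name `iou`, and the convention for two empty masks are assumptions, not the authors' pipeline.

```python
import numpy as np


def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection over Union of two boolean masks on the same grid."""
    if mask_a.shape != mask_b.shape:
        raise ValueError("masks must share the same shape (co-registered grid)")
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # assumed convention: two empty masks count as no overlap
    return float(np.logical_and(a, b).sum() / union)


# Toy 4x4 example: attention covers the left half, the biomarker the top half.
attention = np.zeros((4, 4), dtype=bool)
attention[:, :2] = True
biomarker = np.zeros((4, 4), dtype=bool)
biomarker[:2, :] = True
print(iou(attention, biomarker))
```

In the toy example the masks share a 2×2 corner (4 pixels) out of a 12-pixel union, so the IoU is 1/3. Values near 0.45 to 0.47, as in Table 5, therefore indicate substantial but far from complete overlap.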

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
