Clinically Oriented Evaluation of Transfer Learning Strategies for Cross-Site Breast Cancer Histopathology Classification
Abstract
1. Introduction
- A unified comparison of three adaptation strategies—threshold calibration, head-only fine-tuning, and full fine-tuning—for cross-site histopathology classification.
- Strict patient-level separation for both internal and BreaKHis datasets to prevent information leakage.
- A magnification-aware analysis quantifying the trade-off between adaptation cost and diagnostic performance.
- A clinically oriented discussion on model interpretability and realistic deployment scenarios.
2. Related Work
- Most studies rely on a single adaptation strategy, typically full fine-tuning.
- Lightweight strategies such as threshold calibration or head-only fine-tuning remain underexplored.
- Few works systematically quantify performance variation across magnification levels.
- Robustness under domain shift is not consistently evaluated.
| Study/Approach | Model Type/Innovation | Dataset(s) | Advantages | Limitations |
|---|---|---|---|---|
| Spanhol et al. [8] | Baseline CNNs | BreaKHis | Establishes initial benchmarks | Limited depth; modest performance |
| Araujo et al. [9] | Deeper CNNs | BreaKHis | Improved representational capacity | Requires large datasets; trained from scratch |
| Bayramoglu et al. [10] | Multi-scale CNN | BreaKHis | Integrates multi-magnification cues | High computational cost |
| Transfer learning CNNs [11] | Pretrained ResNet/DenseNet | BreaKHis | Efficient training; strong accuracy | Focus on full fine-tuning |
| Structured DL models [12] | Structured/hierarchical CNNs | BreaKHis | Better modeling of tissue structure | Increased architecture complexity |
| Transformers [16,17,18,19,20] | ViT, Swin, graph-attention, FDT | Various histopathology datasets | SOTA performance; improved interpretability | Requires large datasets; limited cross-site testing |
| Feature fusion & ensembles [23,24,25,26] | CNN fusion, multi-branch, ensemble optimization | Various medical imaging datasets | Enhanced robustness; multi-feature integration | Heavy models; risk of overfitting |
| Radiology-driven approaches [21] | Multi-fractal + fusion | Mammography | Strong texture encoding | Not histopathology-specific |
| Stain normalization & adaptation [14,28] | Stain transfer, adversarial adaptation | Histopathology | Mitigates domain shift | Complex pipeline; higher compute |
| Self-supervised learning [15] | SSL pretraining | Medical Imaging Datasets | Strong features with few labels | Long training time |
| Campanella et al. [27] | Weakly supervised WSI classification | Whole-slide images | Demonstrates need for patient-level independence | Shows risk of inflated accuracy under patch-level splits |
3. Materials and Methods
3.1. Generalized Algorithm for Histopathological Image Classification
Algorithm 1. Generalized Workflow for Histopathological Image Classification

```
Input:  Histopathology dataset D = {(Ii, yi)} consisting of whole-slide images (WSIs)
        or pre-extracted patches Ii with class labels yi (e.g., benign/malignant)
Output: Trained model M* and predicted labels ŷi for unseen samples

Stage 1: Data preparation and splitting
  Acquire raw WSIs or image patches from one or more institutions
  Optionally perform stain normalization and artefact removal
  If WSIs are used:
    Tile each WSI into patches
    Discard background tiles
  Resize all patches to a fixed input resolution (e.g., 224 × 224)
  Split patients (not images) into train, validation, and test sets
  Ensure that no patient appears in more than one split (patient-level separation)

Stage 2: Model initialization
  Choose a backbone architecture (e.g., CNN or Vision Transformer)
  Initialize the backbone with pretrained weights (e.g., ImageNet) or random weights
  Replace the final classification layer with a task-specific head (e.g., 2-class output)
  Select an adaptation strategy:
    (a) Threshold calibration only
    (b) Head-only fine-tuning
    (c) Full fine-tuning

Stage 3: Training/adaptation
  If using threshold calibration:
    Apply the pretrained model to the target training/validation data
    Learn an optimal decision threshold on the validation set
  Else:
    Freeze or unfreeze layers according to the chosen adaptation strategy
    Train the model on the training set using a suitable loss (e.g., weighted cross-entropy)
    Monitor performance on the validation set
    Select the best checkpoint M* based on a validation metric (e.g., F1-score)

Stage 4: Inference and evaluation
  Apply M* to the test set to obtain predicted probabilities pi
  Apply the chosen decision threshold to obtain final labels ŷi
  Compute evaluation metrics (accuracy, precision, recall, F1-score, ROC-AUC, PR-AUC)
  Optionally generate interpretability maps (e.g., Grad-CAM) for qualitative analysis

Return: Trained model M* and performance metrics on the test set
```
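The patient-level separation in Stage 1 is the step most often violated in histopathology pipelines. A minimal sketch of such a split, assuming records of the form `(image_path, label, patient_id)`; all names here are illustrative, not the authors' actual code:

```python
# Hedged sketch of Algorithm 1, Stage 1: split by patient so that no patient's
# images appear in more than one subset. Assumes simple (path, label, patient)
# tuples; fractions and seed are illustrative defaults.
from collections import defaultdict
import random

def patient_level_split(records, val_frac=0.1, test_frac=0.2, seed=42):
    """Split records into train/val/test with strict patient-level separation."""
    by_patient = defaultdict(list)
    for rec in records:                      # rec = (image_path, label, patient_id)
        by_patient[rec[2]].append(rec)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)    # shuffle patients, not images
    n_test = int(len(patients) * test_frac)
    n_val = int(len(patients) * val_frac)
    test_p = set(patients[:n_test])
    val_p = set(patients[n_test:n_test + n_val])
    split = {"train": [], "val": [], "test": []}
    for p, recs in by_patient.items():
        key = "test" if p in test_p else "val" if p in val_p else "train"
        split[key].extend(recs)              # a patient's images stay together
    return split
```

Because whole patients (not images) are assigned to subsets, near-duplicate patches from one slide can never leak across the train/test boundary.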
3.2. Proposed Training and Evaluation Procedure (Kaggle → BreaKHis)
Algorithm 2. Proposed Training and Evaluation Procedure for Kaggle → BreaKHis Adaptation

```
Input: ResNet50V2 architecture M (initialized with ImageNet weights),
       internal dataset D_Kaggle = {train, val, test},
       external dataset D_BreaKHis = {train, val, test},
       magnifications = {40×, 100×, 200×, 400×}

# Stage 1: Baseline fine-tuning on internal dataset (Kaggle)
Train M on D_Kaggle[train] using weighted cross-entropy loss
Validate on D_Kaggle[val]; select best checkpoint M*
Save M* as pretrained baseline for adaptation

# Stage 2: Adaptation and evaluation on BreaKHis
for each magnification m in magnifications do
  for each adaptation strategy s in {SiteCalib, LightFT, FullFT} do
    if s == SiteCalib: Apply M* without retraining; calibrate decision threshold on val[m]
    if s == LightFT:   Freeze backbone; train classifier head for 5 epochs on train[m]
    if s == FullFT:    Unfreeze all layers; train for 5 epochs on train[m]
    Select best checkpoint by validation F1
    Optimize threshold on validation set
    Evaluate on test[m]; record ACC, F1, ROC-AUC, PR-AUC; generate Grad-CAM
  end for
end for

Output: Comparative performance and interpretability for all strategies
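The three strategies in Stage 2 differ only in which layers are marked trainable. This dispatch can be sketched framework-agnostically, with a model represented as a list of layer dicts carrying a `trainable` flag; the representation and names are stand-ins, not the authors' implementation:

```python
# Framework-agnostic sketch of Algorithm 2's strategy dispatch. In a real
# framework the `trainable` flag would live on actual layer objects; here a
# list of dicts is used purely for illustration.
def configure_strategy(layers, strategy):
    """Set per-layer trainable flags; the last layer is the classifier head."""
    if strategy == "SiteCalib":
        # No weight updates at all; only the decision threshold is calibrated.
        for layer in layers:
            layer["trainable"] = False
    elif strategy == "LightFT":
        # Freeze the backbone; retrain only the classifier head.
        for layer in layers[:-1]:
            layer["trainable"] = False
        layers[-1]["trainable"] = True
    elif strategy == "FullFT":
        # Unfreeze everything and optimize jointly.
        for layer in layers:
            layer["trainable"] = True
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return layers
```

Keeping the dispatch in one place ensures that all three strategies share the same downstream checkpoint selection and threshold optimization, which is what makes the comparison fair.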
3.3. Datasets
3.4. Model Architecture
- an initial stem (7 × 7 convolution + max pooling),
- four residual stages with bottleneck blocks (1 × 1 → 3 × 3 → 1 × 1 convolutions),
- global average pooling,
- a fully connected classification head.
- ResNet backbones have proven robust and widely adopted in computational pathology and biomedical imaging [4].
- The residual structure enables stable gradient propagation and reliable transfer learning even with limited training data [31].
- ResNet50V2 offers a favorable balance between representational depth and computational efficiency, making it suitable for both lightweight and full fine-tuning scenarios.
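The identity-mapping property behind the second bullet can be shown numerically: in the residual form y = x + F(x), the shortcut passes the input through unchanged even when the learned branch contributes nothing, which is what keeps gradients usable in deep stacks. A toy numpy sketch, not the actual ResNet50V2 code:

```python
# Toy illustration of the residual identity mapping y = x + F(x), with F a
# ReLU-activated linear map standing in for a bottleneck branch.
import numpy as np

def residual_block(x, W):
    """y = x + F(x); W parameterizes the illustrative branch F."""
    return x + np.maximum(0.0, x @ W)

x = np.array([1.0, -2.0, 0.5])
y = residual_block(x, np.zeros((3, 3)))  # "untrained" branch: output equals input
```

With zero branch weights the block is exactly the identity, so stacking many such blocks cannot degrade the signal the way plain convolutions can.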
3.5. Training Strategies
- Site Calibration (Threshold Calibration): the baseline model M*, fine-tuned on the internal Kaggle dataset, is applied to BreaKHis without any retraining; only the classification decision threshold is recalibrated on the BreaKHis validation set. This simulates a realistic clinical scenario in which model weights cannot be updated and only decision calibration is feasible.
- Light Fine-Tuning: the backbone is frozen and only the classifier head is retrained for five epochs on the BreaKHis training data; the best checkpoint and the decision threshold are both selected on the validation set.
- Full Fine-Tuning: all network layers are unfrozen and optimized jointly for five epochs on BreaKHis, again with checkpoint selection and threshold recalibration on the validation set.
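All three strategies end with the same threshold-recalibration step on the validation set. A hedged sketch of such a calibration sweep using scikit-learn; the grid resolution and function names are assumptions, not the authors' exact procedure:

```python
# Hedged sketch of validation-set threshold calibration: sweep candidate
# thresholds over the predicted malignancy probabilities and keep the one
# that maximizes weighted F1. Grid bounds/step are illustrative choices.
import numpy as np
from sklearn.metrics import f1_score

def calibrate_threshold(y_val, p_val, grid=np.linspace(0.05, 0.95, 181)):
    """Return the threshold on p_val maximizing weighted F1 against y_val."""
    scores = [f1_score(y_val, (p_val >= t).astype(int), average="weighted")
              for t in grid]
    return float(grid[int(np.argmax(scores))])
```

Because only a scalar is fitted, this step cannot overfit the way weight updates can, which is what makes Site Calibration viable when retraining is prohibited.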
3.6. Implementation Details
3.7. Evaluation Metrics
- TP = true positives
- TN = true negatives
- FP = false positives
- FN = false negatives
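From these confusion-matrix counts, the standard metric definitions (reconstructed here in their usual form; the original equations do not survive in this extract) are:

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP},
```
```latex
\mathrm{Recall\ (Sensitivity)} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
```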
- F1-score captures performance under class imbalance, which is critical when malignant cases must not be missed.
- ROC-AUC provides a threshold-independent assessment of separability and is widely used for medical classifiers.
- PR-AUC emphasizes the positive (malignant) class and remains informative under class imbalance, where ROC-AUC alone can appear overly optimistic.
- Combining ROC-AUC and PR-AUC allows robust evaluation of model discrimination under domain shift (Kaggle → BreaKHis).
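A minimal sketch of how the listed metrics can be computed with scikit-learn; variable names are illustrative:

```python
# Hedged sketch of the evaluation step: thresholded metrics (accuracy,
# weighted F1) plus the threshold-independent ROC-AUC and PR-AUC.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score,
                             roc_auc_score, average_precision_score)

def evaluate(y_true, p_malignant, threshold=0.5):
    """Compute the paper's metric set from malignancy probabilities."""
    y_pred = (p_malignant >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_weighted": f1_score(y_true, y_pred, average="weighted"),
        "roc_auc": roc_auc_score(y_true, p_malignant),          # ranking quality
        "pr_auc": average_precision_score(y_true, p_malignant), # positive-class focus
    }
```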
3.8. Interpretability Analysis (Grad-CAM)
4. Results
4.1. Initial Dataset
4.2. Site Calibration on BreaKHis
4.3. Light Fine-Tuning
4.4. Full Fine-Tuning
4.5. Comparative Analysis
4.6. Grad-CAM Visualization
5. Discussion
5.1. Key Findings
5.2. Interpretation of Magnification-Dependent Behavior
5.3. Comparison with Prior Work
5.4. Clinical Applicability and Deployment Scenarios
5.5. Interpretability and Limitations of Grad-CAM
5.6. Limitations and Future Work
6. Conclusions
- We propose a unified, patient-level evaluation pipeline for cross-site breast histopathology classification, eliminating information leakage.
- We conduct the first systematic comparison of three adaptation strategies (threshold calibration, head-only fine-tuning, full fine-tuning) under identical domain-shift conditions and across four magnifications.
- We provide a clinically grounded analysis of malignant detection rates, demonstrating where lightweight adaptation is feasible and where full model retraining is required.
- We include a pathologist-reviewed interpretability assessment, highlighting fundamental limitations of Grad-CAM for clinical reasoning.
- We design a lightweight, reproducible cross-site adaptation framework that reflects realistic constraints in medical institutions.
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| AUC | Area Under the Curve |
| BalAcc | Balanced Accuracy |
| CAD | Computer-Aided Diagnosis |
| CNN | Convolutional Neural Network |
| F1w | F1-score (weighted) |
| FT | Fine-Tuning |
| Grad-CAM | Gradient-weighted Class Activation Mapping |
| MIL | Multiple Instance Learning |
| PR-AUC | Precision–Recall Area Under the Curve |
| ROC | Receiver Operating Characteristic |
| WSI | Whole-Slide Image |
References
- Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018, 68, 394–424. [Google Scholar] [CrossRef] [PubMed]
- Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.W.M.; van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [Google Scholar] [CrossRef] [PubMed]
- Esteva, A.; Robicquet, A.; Ramsundar, B.; Kuleshov, V.; DePristo, M.; Chou, K.; Cui, C.; Corrado, G.; Thrun, S.; Dean, J. A guide to deep learning in healthcare. Nat. Med. 2019, 25, 24–29. [Google Scholar] [CrossRef] [PubMed]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar] [CrossRef]
- Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 11976–11986. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16×16 words: Transformers for image recognition at scale. In Proceedings of the 9th International Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021. [Google Scholar]
- Spanhol, F.A.; Oliveira, L.S.; Petitjean, C.; Heutte, L. A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. 2016, 63, 1455–1462. [Google Scholar] [CrossRef] [PubMed]
- Araujo, T.; Aresta, G.; Castro, E.; Rouco, J.; Aguiar, P.; Eloy, C.; Polónia, A.; Campilho, A. Classification of breast cancer histology images using convolutional neural networks. PLoS ONE 2017, 12, e0177544. [Google Scholar] [CrossRef]
- Bayramoglu, N.; Kannala, J.; Heikkilä, J. Deep learning for magnification independent breast cancer histopathology image classification. In Proceedings of the IEEE International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 2440–2445. [Google Scholar] [CrossRef]
- Deniz, E.; Şengür, A.; Kadiroğlu, Z.; Guo, Y.; Bajaj, V.; Budak, Ü. Transfer learning based histopathologic image classification for breast cancer detection. Health Inf. Sci. Syst. 2018, 6, 18. [Google Scholar] [CrossRef]
- Han, Z.; Wei, B.; Zheng, Y.; Yin, Y.; Li, K.; Li, S. Breast cancer multi-classification from histopathological images with structured deep learning model. Sci. Rep. 2017, 7, 4172. [Google Scholar] [CrossRef]
- Ilse, M.; Tomczak, J.M.; Welling, M. Attention-based Deep Multiple Instance Learning. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 2127–2136. [Google Scholar]
- Macenko, M.; Niethammer, M.; Marron, J.S.; Borland, D.; Woosley, J.T.; Guan, X.; Schmitt, C.; Thomas, N.E. A method for normalizing histology slides for quantitative analysis. In Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI), Boston, MA, USA, 28 June–1 July 2009; pp. 1107–1110. [Google Scholar] [CrossRef]
- Azizi, S.; Mustafa, B.; Ryan, F.; Beaver, Z.; Freyberg, J.; Deaton, J.; Loh, A.; Karthikesalingam, A.; Kornblith, S.; Chen, T.; et al. Big self-supervised models advance medical image classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 3478–3488. [Google Scholar] [CrossRef]
- Wang, Y.; Luo, F.; Yang, X.; Wang, Q.; Sun, Y.; Tian, S.; Feng, P.; Huang, P.; Xiao, H. The Swin-Transformer network based on focal loss is used to identify images of pathological subtypes of lung adenocarcinoma with high similarity and class imbalance. J. Cancer Res. Clin. Oncol. 2023, 149, 8581–8592. [Google Scholar] [CrossRef] [PubMed]
- Qu, Y.; Zhou, X.; Huang, P.; Liu, Y.; Mercaldo, F.; Santone, A.; Feng, P. CGAM: An end-to-end causality graph attention Mamba network for esophageal pathology grading. Biomed. Signal Process. Control 2025, 103, 107452. [Google Scholar] [CrossRef]
- Huang, P.; Xiao, H.; He, P.; Li, C.; Guo, X.; Tian, S.; Feng, P.; Chen, H.; Sun, Y.; Mercaldo, F.; et al. LA-ViT: A Network With Transformers Constrained by Learned-Parameter-Free Attention for Interpretable Grading in a New Laryngeal Histopathology Image Dataset. IEEE J. Biomed. Health Inform. 2024, 28, 3557–3570. [Google Scholar] [CrossRef] [PubMed]
- Huang, P.; Feng, P.; Tian, S.; Xiao, H.; Mercaldo, F.; Santone, A.; Qin, J. A ViT-AMC Network With Adaptive Model Fusion and Multiobjective Optimization for Interpretable Laryngeal Tumor Grading From Histopathological Images. IEEE Trans. Med. Imaging 2023, 42, 15–28. [Google Scholar] [CrossRef] [PubMed]
- Huang, P.; Luo, X. FDTs: A Feature Disentangled Transformer for Interpretable Squamous Cell Carcinoma Grading. IEEE/CAA J. Autom. Sin. 2025, 12, 2365–2367. [Google Scholar] [CrossRef]
- Zebari, D.A.; Ibrahim, D.A.; Zeebaree, D.Q.; Mohammed, M.A.; Haron, H.; Zebari, N.A.; Damaševičius, R.; Maskeliūnas, R. Breast cancer detection using mammogram images with improved multi-fractal dimension approach and feature fusion. Appl. Sci. 2021, 11, 12122. [Google Scholar] [CrossRef]
- Aldakhil, L.A.; Alhasson, H.F.; Alharbi, S.S. Attention-based deep learning approach for breast cancer histopathological image multi-classification. Diagnostics 2024, 14, 1402. [Google Scholar] [CrossRef] [PubMed]
- Loddo, A.; Usai, M.; Di Ruberto, C. Gastric cancer image classification: A comparative analysis and feature fusion strategies. J. Imaging 2024, 10, 195. [Google Scholar] [CrossRef] [PubMed]
- Çetin-Kaya, Y. Equilibrium optimization-based ensemble CNN framework for breast cancer multiclass classification using histopathological images. Diagnostics 2024, 14, 2253. [Google Scholar] [CrossRef] [PubMed]
- Amin, M.S.; Ahn, H. FabNet: A features agglomeration-based convolutional neural network for multiscale breast cancer histopathology image classification. Cancers 2023, 15, 1013. [Google Scholar] [CrossRef] [PubMed]
- Balasubramanian, A.A.; Al-Heejawi, S.M.A.; Singh, A.; Breggia, A.; Ahmad, B.; Christman, R.; Ryan, S.T.; Amal, S. Ensemble deep learning-based image classification for breast cancer subtype and invasiveness diagnosis from whole slide image histopathology. Cancers 2024, 16, 2222. [Google Scholar] [CrossRef] [PubMed]
- Campanella, G.; Hanna, M.G.; Geneslaw, L.; Miraflor, A.; Werneck Krauss Silva, V.; Busam, K.J.; Brogi, E.; Halpern, M.; Samboy, J.; Klimstra, D.S.; et al. Clinical-grade computational pathology using weakly supervised deep learning on whole-slide images. Nat. Med. 2019, 25, 1301–1309. [Google Scholar] [CrossRef] [PubMed]
- Kamnitsas, K.; Baumgartner, C.; Ledig, C.; Newcombe, V.; Simpson, J.; Kane, A.; Menon, D.; Nori, A.; Criminisi, A.; Rueckert, D.; et al. Unsupervised domain adaptation in brain lesion segmentation with adversarial networks. In Proceedings of the Information Processing in Medical Imaging (IPMI), Boone, NC, USA, 25–30 June 2017; pp. 597–609. [Google Scholar] [CrossRef]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar] [CrossRef]
- Teng, Q.; Liu, Z.; Song, Y.; Han, K.; Lu, Y. A Survey on the Interpretability of Deep Learning in Medical Diagnosis. Multimed. Syst. 2022, 28, 2335–2355. [Google Scholar] [CrossRef] [PubMed]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity Mappings in Deep Residual Networks. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645. [Google Scholar] [CrossRef]








| Split | Benign | Malignant | Total |
|---|---|---|---|
| Train | 137,750 | 137,750 | 275,500 |
| Validation | 28,994 | 9811 | 38,805 |
| Test | 31,994 | 12,185 | 44,179 |
| Magnification | Split | Benign | Malignant | Total |
|---|---|---|---|---|
| 40× | Train | 1288 | 2077 | 3365 |
| 40× | Validation | 272 | 384 | 656 |
| 40× | Test | 583 | 808 | 1391 |
| 100× | Train | 1068 | 2042 | 3110 |
| 100× | Validation | 254 | 368 | 622 |
| 100× | Test | 597 | 754 | 1351 |
| 200× | Train | 975 | 1781 | 2756 |
| 200× | Validation | 226 | 334 | 560 |
| 200× | Test | 575 | 731 | 1306 |
| 400× | Train | 870 | 1522 | 2392 |
| 400× | Validation | 211 | 312 | 523 |
| 400× | Test | 584 | 704 | 1288 |
| Subset | AUC | F1-Weighted |
|---|---|---|
| Train | 0.983 | 0.975 |
| Validation | 0.898 | 0.886 |
| Test | 0.876 | 0.854 |
| Magnification | ROC-AUC | F1w (Test) | BalAcc | Threshold | F1w (Val) |
|---|---|---|---|---|---|
| 40× | 0.6045 | 0.6816 | 0.5756 | 0.139 | 0.7139 |
| 100× | 0.6371 | 0.5790 | 0.5963 | 0.772 | 0.7754 |
| 200× | 0.7300 | 0.7189 | 0.6830 | 0.337 | 0.8436 |
| 400× | 0.6390 | 0.6344 | 0.6066 | 0.287 | 0.7461 |
| Magnification | ROC-AUC | F1w (Test) | BalAcc | Threshold | F1w (Val) |
|---|---|---|---|---|---|
| 40× | 0.6954 | 0.6845 | 0.6284 | 0.535 | 0.6803 |
| 100× | 0.6875 | 0.6772 | 0.6243 | 0.584 | 0.8152 |
| 200× | 0.7456 | 0.7305 | 0.6544 | 0.535 | 0.8306 |
| 400× | 0.6387 | 0.6395 | 0.5968 | 0.604 | 0.7852 |
| Magnification | ROC-AUC | F1w (Test) | BalAcc | Threshold | F1w (Val) |
|---|---|---|---|---|---|
| 40× | 0.9500 | 0.9306 | 0.9067 | 0.198 | 0.8884 |
| 100× | 0.9405 | 0.9127 | 0.9010 | 0.406 | 0.9509 |
| 200× | 0.9207 | 0.8953 | 0.8847 | 0.465 | 0.9226 |
| 400× | 0.9314 | 0.8631 | 0.8646 | 0.564 | 0.9056 |
| Magnification | ROC-AUC (SiteCalib) | F1w (SiteCalib) | ROC-AUC (LightFT) | F1w (LightFT) | ROC-AUC (FullFT) | F1w (FullFT) |
|---|---|---|---|---|---|---|
| 40× | 0.6045 | 0.6816 | 0.6954 | 0.6845 | 0.9500 | 0.9306 |
| 100× | 0.6371 | 0.5790 | 0.6875 | 0.6772 | 0.9405 | 0.9127 |
| 200× | 0.7300 | 0.7189 | 0.7456 | 0.7305 | 0.9207 | 0.8953 |
| 400× | 0.6390 | 0.6344 | 0.6387 | 0.6395 | 0.9314 | 0.8631 |
| Magnification | Strategy | Correct/Total | % Correct |
|---|---|---|---|
| 40× | Site Calibration | 755/808 | 93.4% |
| 40× | Light FT | 770/808 | 95.3% |
| 40× | Full FT | 785/808 | 97.2% |
| 100× | Site Calibration | 690/754 | 91.5% |
| 100× | Light FT | 712/754 | 94.4% |
| 100× | Full FT | 723/754 | 95.9% |
| 200× | Site Calibration | 648/731 | 88.6% |
| 200× | Light FT | 669/731 | 91.5% |
| 200× | Full FT | 689/731 | 94.2% |
| 400× | Site Calibration | 656/704 | 93.2% |
| 400× | Light FT | 674/704 | 95.7% |
| 400× | Full FT | 690/704 | 98.0% |
| Magnification | Strategy | Sensitivity | Specificity | ROC-AUC |
|---|---|---|---|---|
| 40× | Site Calibration | 0.905 | 0.602 | 0.6045 |
| 40× | Light FT | 0.959 | 0.782 | 0.6954 |
| 40× | Full FT | 0.964 | 0.911 | 0.9500 |
| 100× | Site Calibration | 0.509 | 0.735 | 0.6371 |
| 100× | Light FT | 0.926 | 0.856 | 0.6875 |
| 100× | Full FT | 0.926 | 0.901 | 0.9332 |
| 200× | Site Calibration | 0.731 | 0.735 | 0.7300 |
| 200× | Light FT | 0.859 | 0.880 | 0.7456 |
| 200× | Full FT | 0.942 | 0.920 | 0.9558 |
| 400× | Site Calibration | 0.641 | 0.625 | 0.6421 |
| 400× | Light FT | 0.703 | 0.671 | 0.6458 |
| 400× | Full FT | 0.980 | 0.945 | 0.9832 |
| Study | Method/Model | Split Level | Reported Metric (Best) | Our Results (Full FT, Patient-Level) |
|---|---|---|---|---|
| Spanhol et al. (2016) [8] | Baseline CNN | Image-level | Accuracy ≈ 77–83% | – |
| Araujo et al. (2017) [9] | CNN trained from scratch | Image-level | Accuracy ≈ 83% | – |
| Bayramoglu et al. (2016) [10] | Multi-scale CNN | Image-level | Accuracy ≈ 84% | – |
| Han et al. (2017) [12] | ResNet/DenseNet fine-tuned | Image-level | AUC ≈ 0.85–0.90 | – |
| Aldakhil et al. (2024) [22] | Attention-based CNN | Image-level | Accuracy > 90% | – |
| Çetin-Kaya (2024) [24] | Ensemble CNN + optimization | Image-level | Accuracy ≈ 91–93% | – |
| Amin & Ahn (2023) [25] | FabNet (multiscale feature aggregation) | Image-level | AUC ≈ 0.90 | – |
| Balasubramanian et al. (2024) [26] | Ensemble DL | Image-level | Accuracy ≈ 92% | – |
| This work (2025) | ResNet50V2 Full Fine-Tuning | Patient-level | ROC-AUC = 0.92–0.95, F1w = 0.86–0.93 | Best: 40× (ROC-AUC = 0.95, F1w = 0.93) |
| Study | Method/Model | Split Level | Reported Metric (Best) |
|---|---|---|---|
| Swin-Transformer (2023) [16] | Transformer with focal loss for lung adenocarcinoma subtype classification | Image-level | AUC ≈ 0.93–0.95 |
| ViT-AMC (2023) [19] | Vision Transformer with adaptive model fusion for laryngeal tumor grading | Image-level | AUC ≈ 0.94 |
| FDTs (2025) [20] | Feature Disentangled Transformer for squamous cell carcinoma grading | Image-level | AUC ≈ 0.95 |
Share and Cite
Stanescu, L.; Stoica-Spahiu, C. Clinically Oriented Evaluation of Transfer Learning Strategies for Cross-Site Breast Cancer Histopathology Classification. Appl. Sci. 2025, 15, 12819. https://doi.org/10.3390/app152312819