Diagnostic Performance of Artificial Intelligence in Predicting Malignant Upgrade of B3 Breast Lesions: Systematic Review and Meta-Analysis
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Sources and Search Strategy
2.2. Eligibility Criteria
- Population: Patients with biopsy-proven high-risk (B3 or equivalent) breast lesions diagnosed via core-needle or vacuum-assisted biopsy.
- Index model: Development or validation of an AI/ML model intended to predict malignant upgrade (DCIS or invasive carcinoma) at surgical excision and/or malignant outcome at follow-up. Models could use imaging-derived inputs, conventional radiologic descriptors, clinical variables, pathological variables, or combinations of these.
- Sample size: ≥20 high-risk/B3 lesions.
- Reference standard: Surgical pathology or ≥24-month imaging follow-up for lesions not excised.
- Outcomes: Reported or derivable diagnostic-performance data. For quantitative pooling of predictive values, studies had to provide enough information at a stated operating point (threshold) to allow derivation of a 2 × 2 table (TP/FP/TN/FN) or directly report PPV/NPV with denominators.
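
The derivation of a 2 × 2 table from reported proportions can be sketched as follows. This is a minimal illustration, not the review's extraction procedure: the function name `two_by_two` is hypothetical, and it assumes the reported sensitivity and specificity were computed on the full cohort whose upgrade counts are known. The example uses the figures tabulated for Aslan et al. in this review.

```python
def two_by_two(sensitivity, specificity, n_upgraded, n_not_upgraded):
    """Reconstruct approximate TP/FP/TN/FN counts from reported proportions.

    Assumes (hypothetically) that both proportions were computed on the
    same cohort of n_upgraded + n_not_upgraded lesions.
    """
    tp = round(sensitivity * n_upgraded)        # upgraded lesions flagged for excision
    fn = n_upgraded - tp                        # upgraded lesions sent to surveillance
    tn = round(specificity * n_not_upgraded)    # non-upgraded lesions sent to surveillance
    fp = n_not_upgraded - tn                    # non-upgraded lesions flagged for excision
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Aslan et al. as tabulated in this review: sensitivity 0.61, specificity 1.00,
# 23 upgraded and 71 non-upgraded lesions.
counts = two_by_two(0.61, 1.00, 23, 71)
print(counts)  # {'TP': 14, 'FP': 0, 'TN': 71, 'FN': 9}
```

Because rounding can hide small discrepancies, reconstructed counts are best checked against any PPV/NPV fractions the study reports directly.
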
2.3. Study Selection and Data Extraction
- Study characteristics: country, design, enrollment period, setting, inclusion criteria.
- Cohort details: number of lesions, lesion subtype mix (e.g., ADH vs. mixed B3), biopsy method, and upgrade prevalence.
- Model details: predictors used (pathological variables, radiologic descriptors, and imaging-derived features), algorithm used (e.g., random forest, SVM), and validation approach.
- Operating point: how the threshold was chosen (e.g., fixed predicted-risk cut-off or sensitivity-targeted cut-off).
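
The two ways of choosing an operating point named above can be contrasted with a small sketch. The data are invented toy values, and the helper names `sensitivity_at` and `threshold_for_sensitivity` are hypothetical; none of this is taken from the included studies.

```python
import math


def sensitivity_at(risks, upgraded, cutoff):
    """Fraction of upgraded lesions with predicted risk >= cutoff."""
    flagged = sum(1 for r, y in zip(risks, upgraded) if y and r >= cutoff)
    total = sum(1 for y in upgraded if y)
    return flagged / total


def threshold_for_sensitivity(risks, upgraded, target):
    """Highest observed cut-off whose sensitivity meets the target."""
    positives = sorted((r for r, y in zip(risks, upgraded) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one upgraded lesion")
    # Number of upgraded lesions that must be flagged; the small epsilon
    # guards against floating-point overshoot in target * count.
    needed = max(1, math.ceil(target * len(positives) - 1e-9))
    return positives[needed - 1]


# Toy predicted risks and upgrade labels (1 = upgraded), for illustration only.
risks = [0.02, 0.03, 0.04, 0.06, 0.08, 0.12, 0.20, 0.35, 0.50, 0.70]
upgraded = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

fixed_cutoff = 0.05                                   # fixed predicted-risk cut-off
targeted_cutoff = threshold_for_sensitivity(risks, upgraded, 1.0)

print(fixed_cutoff, sensitivity_at(risks, upgraded, fixed_cutoff))
print(targeted_cutoff, sensitivity_at(risks, upgraded, targeted_cutoff))
```

In this toy set the fixed 5% cut-off misses one upgraded lesion, whereas the sensitivity-targeted cut-off drops to 0.04 to capture all of them, at the cost of flagging more lesions for excision.
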
2.4. Risk of Bias and Applicability
2.5. Statistical Analysis
- Test positive: model recommends excision/“high risk”.
- Test negative: model supports surveillance/“low risk”.
- From each study, we derived (or extracted) TP/FP/TN/FN counts at the stated operating point and calculated the following:
- PPV—upgraded cancers among predicted-excision lesions (surgical yield).
- NPV—non-upgraded lesions among predicted-surveillance lesions (rule-out reassurance).
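
As a worked instance of these definitions, the sketch below computes PPV and NPV from a 2 × 2 table. The function name `predictive_values` is hypothetical; the counts are those implied by the fractions tabulated for Bahl et al. in this review (PPV 37/243, NPV 91/92).

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV); a value is None when its denominator is zero."""
    predicted_excision = tp + fp        # test-positive lesions
    predicted_surveillance = tn + fn    # test-negative lesions
    ppv = tp / predicted_excision if predicted_excision else None
    npv = tn / predicted_surveillance if predicted_surveillance else None
    return ppv, npv


# Bahl et al.: 37 true positives among 243 predicted-excision lesions,
# 91 true negatives among 92 predicted-surveillance lesions.
ppv, npv = predictive_values(tp=37, fp=206, tn=91, fn=1)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")  # PPV = 0.152, NPV = 0.989
```
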
3. Results
3.1. Study Selection
3.2. Study Characteristics
- Bahl [22]: A machine-learning model using structured clinical, imaging, and pathology variables (and report-derived features), with a low predicted-risk threshold (e.g., >5%) intended to prioritize sensitivity.
- Harrington [23]: An ML model for ADH upgrades, reported at an operating point targeting very high sensitivity.
- Aslan [24]: ML classifiers using clinical and radiologic descriptors. The selected SVM operating point emphasized specificity, producing a very low false-positive rate.
3.3. Risk of Bias and Applicability
3.4. Predictive Performance at Study-Selected Operating Points
- PPV ranged from 0.15 to 1.00;
- NPV ranged from 0.89 to 0.99.
- In high-sensitivity settings (Bahl; Harrington), NPV was high, but specificity was low and the model recommended excision for most lesions (high false-positive burden).
- In the high-specificity setting (Aslan), PPV was very high, but sensitivity was substantially lower, with more missed upgrades.
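
The trade-off described above can be made concrete by computing, for each operating point, the share of lesions sent to excision and the share of upgrades missed. This is an illustrative sketch: the function name `operating_point_burden` is hypothetical, the Bahl counts follow from the tabulated fractions, and the Aslan counts are inferred from the tabulated sensitivity, PPV, and cohort totals (an assumption, since the full 2 × 2 table is not shown here).

```python
def operating_point_burden(tp, fp, tn, fn):
    """Return (excision rate, missed-upgrade rate) for one 2 x 2 table."""
    n = tp + fp + tn + fn
    excision_rate = (tp + fp) / n      # lesions the model sends to surgery
    missed_rate = fn / (tp + fn)       # upgrades the model sends to surveillance
    return excision_rate, missed_rate


studies = {
    "Bahl (high sensitivity)": dict(tp=37, fp=206, tn=91, fn=1),
    # Inferred counts, assuming evaluation on all 94 lesions.
    "Aslan (high specificity)": dict(tp=14, fp=0, tn=71, fn=9),
}

results = {name: operating_point_burden(**c) for name, c in studies.items()}
for name, (excision, missed) in results.items():
    print(f"{name}: excision rate {excision:.1%}, missed upgrades {missed:.1%}")
```

Under these counts the high-sensitivity operating point recommends excision for roughly three quarters of lesions while missing under 3% of upgrades, and the high-specificity operating point recommends excision for about 15% while missing close to 40%.
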
3.5. Meta-Analysis of PPV
3.6. Meta-Analysis of NPV
3.7. AUC
4. Discussion
4.1. Principal Findings
4.2. Should NPV Be Emphasized?
- Primary (safety): sensitivity/missed-upgrade rate plus NPV (confidence in surveillance recommendations);
- Secondary (burden/yield): PPV plus the implied excision rate (how many patients the model would send to surgery);
- Contextual (threshold-free): AUC (and calibration, if available).
4.3. Why PPV Was So Variable
4.4. Clinical Implications
- The excision rate implied by the chosen threshold;
- The missed-upgrade count and proportion (false negatives);
- Calibration (so predicted probabilities reflect observed risks);
- Clinical-utility analyses such as decision-curve analysis across plausible threshold ranges.
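
Decision-curve analysis rests on the net-benefit statistic, which can be sketched as below. This is a simplified illustration with hypothetical function names: it evaluates one fixed operating point (the Bahl et al. counts tabulated in this review) across several threshold probabilities, whereas a full decision curve would re-threshold the model's predicted risks at each value.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of excising test-positive lesions at a risk threshold."""
    odds = threshold / (1 - threshold)  # harm of one false positive vs. one true positive
    return tp / n - (fp / n) * odds


def net_benefit_excise_all(n_upgraded, n, threshold):
    """Net benefit of the default strategy: excise every lesion."""
    odds = threshold / (1 - threshold)
    return n_upgraded / n - ((n - n_upgraded) / n) * odds


# Bahl et al. counts implied by the tabulated fractions.
tp, fp, tn, fn = 37, 206, 91, 1
n = tp + fp + tn + fn

curve = {}
for pt in (0.02, 0.05, 0.10):
    curve[pt] = (net_benefit(tp, fp, n, pt), net_benefit_excise_all(tp + fn, n, pt))
    print(f"threshold {pt:.2f}: model {curve[pt][0]:.4f}, excise-all {curve[pt][1]:.4f}")
```

A model adds clinical value at a given threshold only if its net benefit exceeds both the excise-all strategy and the surveil-all strategy (net benefit of zero).
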
4.5. Limitations
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Forester, N.D.; Lowes, S.; Mitchell, E.; Twiddy, M. High risk (B3) breast lesions: What is the incidence of malignancy for individual lesion subtypes? A systematic review and meta-analysis. Eur. J. Surg. Oncol. 2019, 45, 519–527.
2. Rageth, C.J.; O’Flynn, E.A.M.; Pinker, K.; Kubik-Huch, R.A.; Mundinger, A.; Decker, T.; Tausch, C.; Dammann, F.; Baltzer, P.A.; Fallenberg, E.M.; et al. Second International Consensus Conference on lesions of uncertain malignant potential in the breast (B3 lesions). Breast Cancer Res. Treat. 2019, 174, 279–296.
3. Elfgen, C.; Leo, C.; Kubik-Huch, R.A.; Muenst, S.; Schmidt, N.; Quinn, C.; McNally, S.; van Diest, P.J.; Mann, R.M.; Bago-Horvath, Z.; et al. Third International Consensus Conference on lesions of uncertain malignant potential in the breast (B3 lesions). Virchows Arch. 2023, 483, 5–20.
4. D’archi, S.; Carnassale, B.; Sanchez, A.M.; Accetta, C.; Belli, P.; De Lauretis, F.; Di Guglielmo, E.; Di Leone, A.; Franco, A.; Magno, S.; et al. Navigating the uncertainty of B3 breast lesions: Diagnostic challenges and evolving management strategies. J. Pers. Med. 2025, 15, 36.
5. American Society of Breast Surgeons. Resource Guide: Surgical Management of Benign or High-Risk Lesions; American Society of Breast Surgeons: Columbia, MD, USA, 2024.
6. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
7. Corsi, F.; Cabri, G.F.; Albasini, S.; Bossi, D.; Truffi, M. Management of B3 breast lesions: Potential clinical implications from a retrospective study conducted in an accredited Breast Unit following the 2024 EUSOMA guidelines. Eur. J. Surg. Oncol. 2025, 51, 109579.
8. Romeo, V.; Cuocolo, R.; Apolito, R.; Stanzione, A.; Ventimiglia, A.; Vitale, A.; Verde, F.; Accurso, A.; Amitrano, M.; Insabato, L.; et al. Clinical value of radiomics and machine learning in breast ultrasound: A multicenter study for differential diagnosis of benign and malignant lesions. Eur. Radiol. 2021, 31, 9511–9519.
9. Hussain, S.; Lafarga-Osuna, Y.; Ali, M.; Naseem, U.; Ahmed, M.; Tamez-Peña, J.G. Deep learning, radiomics and radiogenomics applications in digital breast tomosynthesis: A systematic review. BMC Bioinform. 2023, 24, 259.
10. Altabella, L.; Benetti, G.; Camera, L.; Cardano, G.; Montemezzi, S.; Cavedon, C. Machine learning for multi-parametric breast MRI: Radiomics-based approaches for lesion classification. Phys. Med. Biol. 2022, 67, TR01.
11. Qi, Y.-J.; Su, G.-H.; You, C.; Zhang, X.; Xiao, Y.; Jiang, Y.-Z.; Shao, Z.-M. Radiomics in breast cancer: Current advances and future directions. Cell Rep. Med. 2024, 5, 101719.
12. Wolff, R.F.; Moons, K.G.M.; Riley, R.; Whiting, P.F.; Westwood, M.; Collins, G.S.; Reitsma, J.B.; Kleijnen, J.; Mallett, S. for the PROBAST Group. PROBAST: A tool to assess risk of bias and applicability of prediction model studies. Ann. Intern. Med. 2019, 170, 51–58.
13. Moons, K.G.M.; Wolff, R.F.; Riley, R.D.; Whiting, P.F.; Westwood, M.; Collins, G.S.; Reitsma, J.B.; Kleijnen, J.; Mallett, S. PROBAST: Explanation and elaboration. Ann. Intern. Med. 2019, 170, W1–W33.
14. Collins, G.S.; Moons, K.G.M.; Dhiman, P.; Riley, R.D.; Beam, A.L.; Van Calster, B.; Ghassemi, M.; Liu, X.; Reitsma, J.B.; van Smeden, M.; et al. TRIPOD+AI: Updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024, 385, e078378.
15. Hanley, J.A.; McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143, 29–36.
16. Hanley, J.A.; McNeil, B.J. A method of comparing the areas under ROC curves derived from the same cases. Radiology 1983, 148, 839–843.
17. DerSimonian, R.; Laird, N. Meta-analysis in clinical trials. Control. Clin. Trials 1986, 7, 177–188.
18. IntHout, J.; Ioannidis, J.P.A.; Borm, G.F. The Hartung–Knapp–Sidik–Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian–Laird method. BMC Med. Res. Methodol. 2014, 14, 25.
19. Röver, C.; Knapp, G.; Friede, T. Hartung–Knapp–Sidik–Jonkman approach and its modification for random-effects meta-analysis with few studies. BMC Med. Res. Methodol. 2015, 15, 99.
20. Egger, M.; Davey Smith, G.; Schneider, M.; Minder, C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997, 315, 629–634.
21. Viechtbauer, W. Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 2010, 36, 1–48.
22. Bahl, M.; Barzilay, R.; Yedidia, A.B.; Locascio, N.J.; Yu, L.; Lehman, C.D. High-Risk Breast Lesions: A Machine Learning Model to Predict Pathologic Upgrade and Reduce Unnecessary Surgical Excision. Radiology 2018, 286, 810–818.
23. Harrington, L.; diFlorio-Alexander, R.; Trinh, K.; MacKenzie, T.; Suriawinata, A.; Hassanpour, S. Prediction of Atypical Ductal Hyperplasia Upgrades Through a Machine Learning Approach to Reduce Unnecessary Surgical Excisions. JCO Clin. Cancer Inform. 2018, 2, 1–11.
24. Aslan, Ö.; Oktay, A.; Katuk, B.; Erdur, R.C.; Dikenelli, O.; Yeniay, L.; Zekioğlu, O.; Özbek, S.S. Prediction of malignancy upgrade rate in high-risk breast lesions using an artificial intelligence model: A retrospective study. Diagn. Interv. Radiol. 2023, 29, 260–267.
25. Ye, D.M.; Wang, H.T.; Yu, T. The application of radiomics in breast MRI: A review. Technol. Cancer Res. Treat. 2020, 19, 1533033820916191.

| Study | Target Population | Total Lesions (n) | Upgraded to Cancer, n (%) | Non-Upgrade (n) |
|---|---|---|---|---|
| Bahl et al. (2017) [22] | High-risk breast lesions (HRLs) found via image-guided core biopsy | 1006 | 115 (11.4%) | 891 |
| Harrington et al. (2018) [23] | Atypical ductal hyperplasia (ADH) found via core needle biopsy with surgical excision outcomes | 128 | 30 (23.4%) | 98 |
| Aslan et al. (2023) [24] | High-risk breast lesions (HRLs), mixed subtypes | 94 | 23 (24.5%) | 71 |

| Study | Model | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|
| Bahl et al. [22] | Random forest | 0.97 (37/38) | 0.31 (91/297) | 0.15 (37/243) | 0.99 (91/92) |
| Harrington et al. [23] | Random forest | 0.98 | 0.16 | 0.26 | 0.96 |
| Aslan et al. [24] | SVM | 0.61 | 1.00 | 1.00 | 0.89 |

| Study | True Negatives in Predicted Surveillance (TN) | Total Predicted Surveillance (TN + FN) | NPV | 95% CI (Lower) | 95% CI (Upper) |
|---|---|---|---|---|---|
| Bahl et al. (2017) [22] | 91 | 92 | 0.989 | 0.941 | 1.000 |
| Harrington et al. (2018) [23] | 16 | 17 | 0.941 | 0.713 | 0.999 |
| Aslan et al. (2023) [24] | 71 | 80 | 0.888 | 0.797 | 0.947 |

| Summary | k | Pooled NPV | 95% CI (Lower) | 95% CI (Upper) | I² (%) | τ² |
|---|---|---|---|---|---|---|
| Random-effects pooled NPV (AI-only; Bahl + Harrington + Aslan) | 3 | 0.948 | 0.810 | 0.987 | 63.1 | 1.039 |

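
A random-effects pooling of the per-study NPVs can be sketched as follows. The sketch assumes a logit transformation of each proportion, the DerSimonian–Laird estimator of between-study variance [17], and a normal-approximation confidence interval; these choices and the function name `pool_logit_dl` are assumptions made for illustration, not a statement of the exact procedure used in the review.

```python
import math


def pool_logit_dl(events, totals, z=1.959964):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    Requires 0 < events < totals for every study (no continuity correction).
    """
    y = [math.log(x / (n - x)) for x, n in zip(events, totals)]   # logit proportions
    v = [1 / x + 1 / (n - x) for x, n in zip(events, totals)]     # within-study variances
    w = [1 / vi for vi in v]                                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))       # Cochran's Q
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                                 # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]                          # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def expit(t):
        return 1 / (1 + math.exp(-t))

    return {
        "pooled": expit(mu),
        "ci_low": expit(mu - z * se),
        "ci_high": expit(mu + z * se),
        "i2": i2,
        "tau2": tau2,
    }


# TN and TN + FN per study, as tabulated above (Bahl, Harrington, Aslan).
summary = pool_logit_dl(events=[91, 16, 71], totals=[92, 17, 80])
print({k: round(val, 3) for k, val in summary.items()})
```

Under these assumptions the output lands close to the tabulated summary (pooled NPV 0.948, 95% CI 0.810 to 0.987, I² 63.1%, τ² 1.039). With only three studies, a Hartung–Knapp–Sidik–Jonkman adjustment [18,19] would typically give a wider interval.
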
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Ferre, R.; Kuzmiak, C.M. Diagnostic Performance of Artificial Intelligence in Predicting Malignant Upgrade of B3 Breast Lesions: Systematic Review and Meta-Analysis. Diagnostics 2026, 16, 75. https://doi.org/10.3390/diagnostics16010075

