Density-Aware Multi-Dataset Evaluation of Deep Learning for Mammographic Mass Detection and BI-RADS Classification
Abstract
1. Introduction
2. Materials and Methods
2.1. Publicly Available Datasets
| Dataset | # Patients | # Images | # Masses | BI-RADS | ACR |
|---|---|---|---|---|---|
| CBIS-DDSM [11] | 2620 | 10,480 | 1698 | Yes | Yes |
| INBREAST [45] | 115 | 410 | 108 | Yes | Yes |
| VinDr-Mammo [47] | 5000 | 20,000 | 1205 | Yes | Yes |
| DMID [46] | 510 | 2040 | 469 | Yes | Yes |
| Mass-Bench | 8245 | 32,930 | 3480 | Yes | Yes |
2.2. Canonical Schema and Label Harmonization
- dataset: source dataset identifier.
- image_id: globally unique image identifier.
- patient_id: anonymized patient-level identifier.
- view: mammographic projection (CC or MLO).
- laterality: breast side (left or right).
- bbox_x, bbox_y, bbox_w, bbox_h: normalized bounding box coordinates.
- acr_mc: multi-class ACR density (1–4).
- acr_bin: binary ACR grouping (1–2 vs. 3–4).
- birads_mc: multi-class BI-RADS category.
- birads_bin: binary BI-RADS grouping (<4 vs. ≥4).
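As an illustration, the schema above could be represented as a small Python record. This is a minimal sketch rather than the authors' implementation: the class name `MassRecord`, the helper `normalize_bbox`, the 0/1 encoding of the binary groupings, and the choice to normalize bounding boxes by image width and height are all assumptions made here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MassRecord:
    """One annotated mass; field names follow the canonical schema."""
    dataset: str
    image_id: str
    patient_id: str
    view: str          # "CC" or "MLO"
    laterality: str    # "left" or "right"
    bbox_x: float      # normalized coordinates (assumed range [0, 1])
    bbox_y: float
    bbox_w: float
    bbox_h: float
    acr_mc: int        # ACR density, 1-4
    birads_mc: int     # BI-RADS category

    def __post_init__(self):
        # Reject values outside the ranges listed in the schema.
        if self.view not in {"CC", "MLO"}:
            raise ValueError(f"unknown view: {self.view!r}")
        if self.laterality not in {"left", "right"}:
            raise ValueError(f"unknown laterality: {self.laterality!r}")
        if self.acr_mc not in {1, 2, 3, 4}:
            raise ValueError(f"ACR density out of range: {self.acr_mc}")

    @property
    def acr_bin(self) -> int:
        """Binary ACR grouping: 0 for ACR 1-2, 1 for ACR 3-4 (assumed encoding)."""
        return int(self.acr_mc >= 3)

    @property
    def birads_bin(self) -> int:
        """Binary BI-RADS grouping: 0 for <4, 1 for >=4 (assumed encoding)."""
        return int(self.birads_mc >= 4)


def normalize_bbox(x, y, w, h, img_w, img_h):
    """Hypothetical helper: scale a pixel-space box by the image size."""
    return (x / img_w, y / img_h, w / img_w, h / img_h)


# Example with made-up identifiers and a 1000 x 500 pixel image.
bx, by, bw, bh = normalize_bbox(100, 50, 200, 100, 1000, 500)
record = MassRecord(
    dataset="INBREAST", image_id="img-0001", patient_id="p-0001",
    view="CC", laterality="left",
    bbox_x=bx, bbox_y=by, bbox_w=bw, bbox_h=bh,
    acr_mc=3, birads_mc=4,
)
```

Deriving `acr_bin` and `birads_bin` from the multi-class fields, instead of storing them separately, keeps the binary and multi-class labels from drifting apart when records from different source datasets are merged.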
2.3. Mass-Bench Construction and Clinical Balancing
2.4. Experimental Framework
2.4.1. Pre-Processing
2.4.2. Automated Mass Localization via YOLO
2.4.3. ROI Generation
2.4.4. Feature Extraction
2.4.5. Classification
2.5. Evaluation Metrics
3. Results
3.1. Performance of Mass Localization Across ACR Breast Density Grades
3.2. Breast Density Imbalance on Localization Performance
3.3. Overall Localization Performance on the Mass-Bench
ACR Breast Density Classification
3.4. BI-RADS Classification Performance
4. Discussion
| Report | Year | Model | Dataset | ACR-Balanced | Precision | Recall | mAP@50 | mAP@50-95 |
|---|---|---|---|---|---|---|---|---|
| [53] | 2022 | YOLOv5-L6 | CBIS-DDSM | ✕ | – | – | 0.65 | – |
| [53] | 2022 | YOLOv5-L6 | INBREAST | ✕ | – | – | 0.61 | – |
| [54] | 2024 | YOLOv5s | CBIS-DDSM | ✕ | 0.85 | 0.72 | 0.83 | – |
| [71] | 2025 | Improved YOLOv10 | CBIS-DDSM | ✕ | 0.84 | 0.86 | 0.85 | – |
| [72] | 2025 | YOLOv5n | CBIS-DDSM | ✕ | – | – | 0.50 | 0.20 |
| [72] | 2025 | YOLOv5n | INBREAST | ✕ | – | – | 0.68 | 0.31 |
| [55] | 2025 | MANGA-YOLO | CBIS-DDSM | ✕ | 0.69 | 0.71 | 0.66 | 0.27 |
| [55] | 2025 | MANGA-YOLO | INBREAST | ✕ | 0.91 | 0.79 | 0.88 | 0.56 |
| [55] | 2025 | MANGA-YOLO | VinDr-Mammo | ✕ | 0.68 | 0.70 | 0.69 | 0.34 |
| [55] | 2025 | MANGA-YOLO | VinDr-Mammo + CBIS-DDSM | ✕ | 0.88 | 0.36 | 0.36 | 0.07 |
| [56] | 2025 | YOLOv12-L | INBREAST | ✕ | 0.98 | 0.85 | 0.96 | – |
| [56] | 2025 | YOLOv12-L | CBIS-DDSM | ✕ | 0.61 | 0.55 | 0.56 | – |
| [56] | 2025 | YOLOv12-L | VinDr-Mammo | ✕ | 0.73 | 0.51 | 0.59 | – |
| [56] | 2025 | YOLOv12-L | CBIS-DDSM + VinDr-Mammo + INBREAST | ✕ | 0.71 | 0.59 | 0.63 | – |
| [56] | 2025 | RTMDet-X | CBIS-DDSM + VinDr-Mammo + INBREAST | ✕ | 0.73 | 0.65 | 0.68 | – |
| Ours | 2026 | YOLOv11 | Mass-Bench | ✓ | 0.71 | 0.99 | 0.66 | 0.33 |
| Report | Year | Model | Dataset | Task | Acc (Bin) | Acc (Multi) | F1 (Bin) | F1 (Multi) | AUC |
|---|---|---|---|---|---|---|---|---|---|
| [79] | 2022 | Stacked ensemble of ResNet models | CBIS-DDSM | BI-RADS (2–6) on detected/segmented masses | – | 0.85 | – | – | 0.94 |
| [79] | 2022 | Stacked ensemble of ResNet models | INBREAST | BI-RADS (2–6) on detected/segmented masses | – | 0.99 | – | – | 1.00 |
| [35] | 2022 | Deep neural network | Clinical cohort | BI-RADS (0,1,2,3,4A,4B,4C,5) | – | 0.94 | – | 0.95 | 0.97 |
| [81] | 2022 | Deep CNN | Clinical dataset | BI-RADS (1 vs. 2/3 vs. 4/5) on manually cropped ROIs | – | 0.90 | – | – | – |
| [82] | 2025 | Deep learning model | Multicenter clinical cohort | BI-RADS (3 vs. 4A) | 0.80 | – | – | – | 0.74 |
| [80] | 2025 | Explainable multi-task CAD | CBIS-DDSM | BI-RADS assessment (B2–B5) on whole mammograms | – | – | – | 0.77 | 0.92 |
| [80] | 2025 | Explainable multi-task CAD | CDD-CESM | BI-RADS assessment (B1–B5) on whole mammograms | – | – | – | 0.78 | 0.92 |
| [80] | 2025 | Explainable multi-task CAD | INBREAST | BI-RADS assessment (B1–B5) on whole mammograms | – | – | – | 0.83 | 0.97 |
| [83] | 2024 | InceptionResNetV2 | RSNA + VinDr-Mammo | BI-RADS (0 vs. 1–2 vs. 4–5) | – | – | – | – | – |
| Ours | 2026 | ML + handcrafted/deep features | Mass-Bench | BI-RADS binary and multiclass (2–5) | 0.90 | 0.84 | 0.90 | 0.82 | – |
Limitations
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 2021, 71, 209–249.
2. DeSantis, C.E.; Ma, J.; Gaudet, M.M.; Newman, L.A.; Miller, K.D.; Sauer, A.G.; Jemal, A.; Siegel, R.L. Breast cancer statistics, 2019. CA Cancer J. Clin. 2019, 69, 438–451.
3. Ren, W.; Chen, M.; Qiao, Y.; Zhao, F. Global guidelines for breast cancer screening: A systematic review. Breast 2022, 64, 85–99.
4. Samala, R.K.; Chan, H.P.; Hadjiiski, L.; Helvie, M.A.; Wei, J.; Cha, K. Mass detection in digital breast tomosynthesis: Deep convolutional neural network with transfer learning from mammography. Med. Phys. 2016, 43, 6654–6666.
5. Surendiran, B.; Ramanathan, P.; Vadivel, A. Effect of BIRADS shape descriptors on breast cancer analysis. Int. J. Med. Eng. Inform. 2015, 7, 65–79.
6. American College of Radiology. ACR BI-RADS Atlas: Breast Imaging Reporting and Data System, 5th ed.; American College of Radiology: Reston, VA, USA, 2013.
7. Couture, H.D.; Williams, L.A.; Geradts, J.; Nyante, S.J.; Butler, E.N.; Marron, J.S.; Perou, C.M.; Troester, M.A.; Niethammer, M. Image analysis with deep learning to predict breast cancer grade, ER status, histologic subtype, and intrinsic subtype. npj Breast Cancer 2018, 4, 30.
8. Li, H.; Meng, X.; Wang, T.; Tang, Y.; Yin, Y. Breast masses in mammography classification with local contour features. BioMed. Eng. OnLine 2017, 16, 44.
9. Bodewes, F.T.; van Asselt, A.A.; Dorrius, M.D.; Greuter, M.J.; de Bock, G.H. Mammographic breast density and the risk of breast cancer: A systematic review and meta-analysis. Breast 2022, 66, 62–68.
10. Mann, R.M.; Athanasiou, A.; Baltzer, P.A.; Camps-Herrero, J.; Clauser, P.; Fallenberg, E.M.; Forrai, G.; Fuchsjäger, M.H.; Helbich, T.H.; Killburn-Toppin, F.; et al. Breast cancer screening in women with extremely dense breasts recommendations of the European Society of Breast Imaging (EUSOBI). Eur. Radiol. 2022, 32, 4036–4045.
11. Lee, R.S.; Gimenez, F.; Hoogi, A.; Miyake, K.K.; Gorovoy, M.; Rubin, D.L. A curated mammography data set for use in computer-aided detection and diagnosis research. Sci. Data 2017, 4, 170177.
12. Crivellé, M.S.I. An approach to breast density. Rev. Senol. Patol. Mamar. 2014, 27, 138–142.
13. Dhungel, N.; Carneiro, G.; Bradley, A.P. Automated Mass Detection in Mammograms Using Cascaded Deep Learning and Random Forests. In Proceedings of the 2015 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2015; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2015.
14. Agarwal, R.; Díaz, O.; Yap, M.H.; Lladó, X.; Martí, R. Deep learning for mass detection in Full Field Digital Mammograms. Comput. Biol. Med. 2020, 121.
15. Hassan, N.M.; Hamad, S.; Mahar, K. Mammogram breast cancer CAD systems for mass detection and classification: A review. Multimed. Tools Appl. 2022, 81, 20043–20075.
16. Ballard, D.; Lecun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation Applied to Handwritten Zip Code Recognition; Technical Report; MIT Press: Cambridge, MA, USA, 1989.
17. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation Tech report (v5). In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Technical Report; IEEE: New York, NY, USA, 2015.
18. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision; Technical Report; IEEE: New York, NY, USA, 2015.
19. Shen, L.; Margolies, L.R.; Rothstein, J.H.; Fluder, E.; McBride, R.; Sieh, W. Deep learning to improve breast cancer detection on screening mammography. Sci. Rep. 2019, 9, 12495.
20. Medeiros, A.; Ohata, E.F.; Silva, F.H.; Rego, P.A.; Filho, P.P.R. An approach to bi-rads uncertainty levels classification via deep learning with transfer learning technique. In Proceedings of the 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS); Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2020; pp. 603–608.
21. Kolb, T.M.; Lichy, J.; Newhouse, J.H. Comparison of the Performance of Screening Mammography, Physical Examination, and Breast US and Evaluation of Factors that Influence Them: An Analysis of 27,825 Patient Evaluations. Radiology 2002, 225, 165–175.
22. Mandelson, M.T. Breast Density as a Predictor of Mammographic Detection: Comparison of Interval- and Screen-Detected Cancers. J. Natl. Cancer Inst. 2000, 92, 1081–1087.
23. Centers for Disease Control and Prevention. About Dense Breasts. 2024. Available online: https://www.cdc.gov/breast-cancer/about/dense-breasts.html (accessed on 15 March 2026).
24. Nazari, S.S.; Mukherjee, P. An overview of mammographic density and its association with breast cancer. Breast Cancer 2018, 25, 259–267.
25. Campanini, R.; Dongiovanni, D.; Iampieri, E.; Lanconelli, N.; Masotti, M.; Palermo, G.; Riccardi, A.; Roffilli, M. A novel featureless approach to mass detection in digital mammograms based on support vector machines. Phys. Med. Biol. 2004, 49, 961–975.
26. Heath, M.; Bowyer, K.; Kopans, D.; Moore, R.; Kegelmeyer, P. The Digital Database for Screening Mammography; Technical Report; Springer: Dordrecht, The Netherlands, 1998.
27. Xiong, S.; Lu, J. Mass detection in digital mammograms using twin support vector machine-based CAD system. In Proceedings of the 2009 WASE International Conference on Information Engineering, ICIE 2009; IEEE: New York, NY, USA, 2009; Volume 1, pp. 240–243.
28. Ertosun, M.G.; Rubin, D.L. Probabilistic Visual Search for Masses within Mammography Images Using Deep Learning. In Proceedings of the 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); IEEE: New York, NY, USA, 2015; pp. 1310–1315.
29. Al-masni, M.A.; Al-antari, M.A.; Park, J.M.; Gi, G.; Kim, T.Y.; Rivera, P.; Valarezo, E.; Choi, M.T.; Han, S.M.; Kim, T.S. Simultaneous detection and classification of breast masses in digital mammograms via a deep learning YOLO-based CAD system. Comput. Methods Programs Biomed. 2018, 157, 85–94.
30. Al-antari, M.A.; Al-masni, M.A.; Choi, M.T.; Han, S.M.; Kim, T.S. A fully integrated computer-aided diagnosis system for digital X-ray mammograms via deep learning detection, segmentation, and classification. Int. J. Med. Inform. 2018, 117, 44–54.
31. Al-antari, M.A.; Han, S.M.; Kim, T.S. Evaluation of deep learning detection and classification towards computer-aided diagnosis of breast lesions in digital X-ray mammograms. Comput. Methods Programs Biomed. 2020, 196, 105584.
32. Rahman, M.M.; Jahangir, M.Z.B.; Rahman, A.; Akter, M.; Nasim, M.A.A.; Gupta, K.D.; George, R. Breast Cancer Detection and Localizing the Mass Area Using Deep Learning. Big Data Cogn. Comput. 2024, 8, 80.
33. Baccouche, A.; Garcia-Zapirain, B.; Olea, C.C.; Elmaghraby, A.S. Breast lesions detection and classification via YOLO-based fusion models. Comput. Mater. Contin. 2021, 69, 1407–1425.
34. Keller, B.M.; Nathan, D.L.; Wang, Y.; Zheng, Y.; Gee, J.C.; Conant, E.F.; Kontos, D. Estimation of breast percent density in raw and processed full field digital mammography images via adaptive fuzzy c-means clustering and support vector machine segmentation. Med. Phys. 2012, 39, 4903–4917.
35. Tsai, K.J.; Chou, M.C.; Li, H.M.; Liu, S.T.; Hsu, J.H.; Yeh, W.C.; Hung, C.M.; Yeh, C.Y.; Hwang, S.H. A High-Performance Deep Neural Network Model for BI-RADS Classification of Screening Mammography. Sensors 2022, 22, 1160.
36. Nguyen, H.T.X.; Tran, S.B.; Nguyen, D.B.; Pham, H.H.; Nguyen, H.Q. A Novel Multi-View Deep Learning Approach for BI-RADS and Density Assessment of Mammograms. arXiv 2022.
37. Goceri, E. Medical image data augmentation: Techniques, comparisons and interpretations. Artif. Intell. Rev. 2023, 56, 12561–12605.
38. Islam, T.; Hafiz, M.S.; Jim, J.R.; Kabir, M.M.; Mridha, M. A systematic review of deep learning data augmentation in medical imaging: Recent advances and future research directions. Healthc. Anal. 2024, 5, 100340.
39. Velarde, O.M.; Lin, C.; Eskreis-Winkler, S.; Parra, L.C. Robustness of deep networks for mammography: Replication across public datasets. J. Imaging Inform. Med. 2024, 37, 536–546.
40. Zafari, Y.; Pan, H.; Durak, G.; Bagci, U.; Rashed, E.A.; Mabrok, M. MammoClean: Toward Reproducible and Bias-Aware AI in Mammography through Dataset Harmonization. arXiv 2025, arXiv:2511.02400.
41. Pan, H.; Durak, G.; Aktas, H.E.; Bejar, A.M.; Tutun, B.; Uysal, E.; Bulbul, E.; Dogan, M.F.; Erok, B.; Yildirim, B.A.; et al. LUMINA: A Multi-Vendor Mammography Benchmark with Energy Harmonization Protocol. arXiv 2026, arXiv:2603.14644.
42. Añez, D.; Conti, G.; Uriarte, J.J.; Serrano-Olmedo, J.J.; Martínez-Murillo, R.; Casanova-Carvajal, O. Artificial Intelligence Pipeline for Mammography-Based Breast Cancer Analysis. Medicina 2025, 61, 2237.
43. Xu, Z.; Li, J.; Yao, Q.; Li, H.; Zhao, M.; Zhou, S.K. Addressing fairness issues in deep learning-based medical image analysis: A systematic review. npj Digit. Med. 2024, 7, 286.
44. Chegini, M.; Mahloojifar, A. Uncertainty-aware deep learning-based CAD system for breast cancer classification using ultrasound and mammography images. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 2024, 12, 2297983.
45. Moreira, I.C.; Amaral, I.; Domingues, I.; Cardoso, A.; Cardoso, M.J.; Cardoso, J.S. INbreast: Toward a Full-field Digital Mammographic Database. Acad. Radiol. 2012, 19, 236–248.
46. Oza, P.; Oza, R.; Oza, U.; Sharma, P.; Patel, S.; Kumar, P.O. Digital mammography Dataset for Breast Cancer Diagnosis Research (DMID). Biomed. Eng. Lett. 2023, 14, 317–330.
47. Nguyen, H.T.; Nguyen, H.Q.; Pham, H.H.; Lam, K.; Le, L.T.; Dao, M.; Vu, V. VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography. Sci. Data 2023, 10, 277.
48. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2019; pp. 658–666.
49. Zuiderveld, K. Contrast Limited Adaptive Histogram Equalization. In Graphics Gems IV; Heckbert, P.S., Ed.; Academic Press: Cambridge, MA, USA, 1994; pp. 474–485.
50. Suradi, S.H.; Abdullah, K.A.; Mat Isa, N.A. Improvement of image enhancement for mammogram images using fuzzy anisotropic diffusion histogram equalisation contrast adaptive limited (fadhecal). Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 2022, 10, 67–75.
51. Oyelade, O.N.; Ezugwu, A.E. A novel wavelet decomposition and transformation convolutional neural network with data augmentation for breast cancer detection using digital mammogram. Sci. Rep. 2022, 12, 5913.
52. Fusco, R.; Granata, V.; Vallone, P.; Petrosino, T.; Iasevoli, M.D.; Raso, M.M.; Pupo, D.; Trovato, P.; Simonetti, I.; Pariante, P.; et al. Engineering the Image Representation for Deep Learning in Contrast-Enhanced Mammography: A Systematic Analysis of Preprocessing and Anatomical Masking. Bioengineering 2026, 13, 322.
53. Su, Y.; Liu, Q.; Xie, W.; Hu, P. YOLO-LOGO: A transformer-based YOLO segmentation model for breast mass detection and segmentation in digital mammograms. Comput. Methods Programs Biomed. 2022, 221, 106903.
54. Prinzi, F.; Insalaco, M.; Orlando, A.; Gaglio, S.; Vitabile, S. A yolo-based model for breast cancer detection in mammograms. Cogn. Comput. 2024, 16, 107–120.
55. Trang, K.; Ting, F.F.; Vuong, B.Q.; Ting, C.M. MANGA-YOLO: A Mamba-inspired YOLO model with group attention for breast mass detection in mammograms. Comput. Biol. Med. 2025, 199, 111339.
56. Abdikenov, B.; Rakishev, D.; Orazayev, Y.; Zhaksylyk, T. Enhancing breast lesion detection in mammograms via transfer learning. J. Imaging 2025, 11, 314.
57. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2016; pp. 779–788.
58. Hussain, M. YOLOv1 to v8: Unveiling Each Variant-A Comprehensive Review of YOLO. IEEE Access 2024, 12, 42816–42833.
59. Jocher, G.; Qiu, J.; Chaurasia, A. Ultralytics YOLO. Version 8.0.0. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 1 June 2026).
60. Dhungel, N.; Carneiro, G.; Bradley, A.P. Deep structured learning for mass segmentation from mammograms. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP); IEEE: New York, NY, USA, 2015; pp. 2950–2954.
61. Ribli, D.; Horváth, A.; Unger, Z.; Pollner, P.; Csabai, I. Detecting and classifying lesions in mammograms with deep learning. Sci. Rep. 2018, 8, 4165.
62. Haralick, R.M.; Shanmugam, K.; Dinstein, I. Textural Features for Image Classification. IEEE Trans. Syst. Man Cybern. 1973, SMC-3, 610–621.
63. Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987.
64. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In International Conference on Learning Representations (ICLR); ICLR: Appleton, WI, USA, 2015.
65. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2016; pp. 770–778.
66. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2017; pp. 4700–4708.
67. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML); PMLR: New York, NY, USA, 2019; pp. 6105–6114.
68. Manning, C.D.; Raghavan, P.; Schütze, H. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008.
69. Youden, W.J. Index for Rating Diagnostic Tests. Cancer 1950, 3, 32–35.
70. Kullback, S.; Leibler, R.A. On Information and Sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
71. Abudukelimu, H.; Gao, Y.; Abulizi, A.; Musideke, M.; Wu, S.; Wang, M.; Aizizi, M.; Yehaiya, G.; Abudukelimu, M. DVF-YOLO-Seg: A two-stage breast mass segmentation model with enhanced feature extraction and small lesion detection. Digit. Health 2025, 11, 20552076251374192.
72. Manolakis, D.; Bizopoulos, P.; Lalas, A.; Votis, K. A two-stage lightweight deep learning framework for mass detection and segmentation in mammograms using YOLOv5 and depthwise SegNet. J. Imaging Inform. Med. 2025, 38, 3852–3867.
73. Mohamed, A.A.; Berg, W.A.; Peng, H.; Luo, Y.; Jankowitz, R.C.; Wu, S. A deep learning method for classifying mammographic breast density categories. Med. Phys. 2018, 45, 314–321.
74. Lehman, C.D.; Yala, A.; Schuster, T.; Dontchos, B.; Bahl, M.; Swanson, K.; Barzilay, R. Mammographic Breast Density Assessment Using Deep Learning: Clinical Implementation. Radiology 2019, 290, 52–58.
75. Lopez-Almazan, H.; Pérez-Benito, F.J.; Larroza, A.; Perez-Cortes, J.C.; Pollan, M.; Perez-Gomez, B.; Trejo, D.S.; Casals, M.; Llobet, R. A deep learning framework to classify breast density with noisy labels regularization. Comput. Methods Programs Biomed. 2022, 221, 106885.
76. Rigaud, B.; Weaver, O.O.; Dennison, J.B.; Awais, M.; Anderson, B.M.; Chiang, T.Y.D.; Yang, W.T.; Leung, J.W.T.; Hanash, S.M.; Brock, K.K. Deep Learning Models for Automated Assessment of Breast Density Using Multiple Mammographic Image Types. Cancers 2022, 14, 5003.
77. Busaleh, M.; Hussain, M.J.; Aboalsamh, H.A.; e Amin, F.; Al Sultan, S.A. TwoViewDensityNet: Two-View Mammographic Breast Density Classification Based on Deep Convolutional Neural Network. Mathematics 2022, 10, 4610.
78. Ragab, D.A.; Sharkas, M.; Marshall, S.; Ren, J. Breast cancer detection using deep convolutional neural networks. Biomed. Signal Process. Control 2021, 65, 102280.
79. Baccouche, A.; Garcia-Zapirain, B.; Elmaghraby, A.S. An integrated framework for breast mass classification and diagnosis using stacked ensemble of residual neural networks. Sci. Rep. 2022, 12, 12259.
80. Li, P.; Zhong, J.; Chen, H.; Hong, J.; Li, H.; Li, X.; Shi, P. An explainable and comprehensive BI-RADS-assisted diagnosis pipeline for mammograms. Phys. Medica 2025, 132, 104949.
81. Sabani, A.; Landsmann, A.; Hejduk, P.; Schmidt, C.; Marcon, M.; Borkowski, K.; Rossi, C.; Ciritsis, A.; Boss, A. BI-RADS-Based Classification of Mammographic Soft Tissue Opacities Using a Deep Convolutional Neural Network. Diagnostics 2022, 12, 1564.
82. Lin, X.; Liao, T.; Yang, Y.; Ouyang, R.; Zhou, Y.; Lai, X.; Ma, J. Value of deep learning model for predicting Breast Imaging Reporting and Data System 3 and 4A lesions on mammography. Quant. Imaging Med. Surg. 2025, 15, 4047–4058.
83. Tekin, A.; Toktay, B.; Günay, A.C.; Yazgan, H.; İnan, N.G.; Kocadağlı, O. BI-RADS classification in mammography using deep learning. In Güncel Ekonometri ve İstatistiksel Uygulamalar ile Akademik Çalışmalar; Özgür Publications: Istanbul, Turkey, 2024.



| Dataset | ACR | # Images (Train/Aug/Val/Test) | Accuracy | Precision | Sensitivity | F1-Score |
|---|---|---|---|---|---|---|
| CBIS-DDSM | 1 | 236/1180/67/34 | 0.640 | 0.786 | 0.775 | 0.701 |
| CBIS-DDSM | 2 | 530/2650/151/76 | 0.739 | 0.766 | 0.850 | 0.791 |
| CBIS-DDSM | 3 | 314/1570/90/45 | 0.727 | 0.914 | 0.700 | 0.677 |
| CBIS-DDSM | 4 | 107/535/31/15 | 0.630 | 0.895 | 0.680 | 0.654 |
| INBREAST | 1 | 30/150/8/4 | 0.667 | 0.833 | 0.769 | 0.714 |
| INBREAST | 2 | 26/130/8/4 | 0.941 | 0.999 | 0.941 | 0.941 |
| INBREAST | 3 | 15/75/4/2 | 0.624 | 0.615 | 0.889 | 0.696 |
| INBREAST | 4 | 5/25/1/1 | 0.999 | 0.999 | 0.999 | 0.999 |
| VinDr-Mammo | 1 | 3/15/1/1 | 0.769 | 0.769 | 0.999 | 0.870 |
| VinDr-Mammo | 2 | 102/510/29/14 | 0.636 | 0.913 | 0.913 | 0.750 |
| VinDr-Mammo | 3 | 690/3450/197/98 | 0.533 | 0.696 | 0.696 | 0.604 |
| VinDr-Mammo | 4 | 49/245/14/7 | 0.538 | 0.824 | 0.609 | 0.571 |
| DMID | 1 | 51/255/15/7 | 0.935 | 0.700 | 0.720 | 0.765 |
| DMID | 2 | 124/620/36/18 | 0.929 | 0.780 | 0.757 | 0.778 |
| DMID | 3 | 128/640/36/18 | 0.870 | 0.765 | 0.695 | 0.724 |
| DMID | 4 | 25/125/7/4 | 0.899 | 0.788 | 0.674 | 0.702 |
| Mass-Bench | 1 | 168/840/48/24 | 0.630 | 0.999 | 0.630 | 0.773 |
| Mass-Bench | 2 | 168/840/48/24 | 0.766 | 0.999 | 0.764 | 0.866 |
| Mass-Bench | 3 | 168/840/48/24 | 0.764 | 0.999 | 0.764 | 0.866 |
| Mass-Bench | 4 | 168/840/48/24 | 0.716 | 0.999 | 0.716 | 0.835 |
| Model | Accuracy | Precision | Recall | F1-Score | mAP@50 | mAP@50-95 |
|---|---|---|---|---|---|---|
| YOLOv5 | | | | | | |
| YOLOv8 | | | | | | |
| YOLOv11 | | | | | | |
| Dataset | Binary Metrics | Best Binary Setup | Multi-Class Metrics | Best Multi-Class Setup |
|---|---|---|---|---|
| CBIS-DDSM | /// | ENB3 + XGB | /// | DN121 + RF |
| INBREAST | /// | R50 + SVM | /// | HC + RF |
| VinDr-Mammo | /// | HC + RF | /// | HC + LR |
| DMID | /// | R50 + LR | /// | ENB3 + XGB |
| Mass-Bench | /// | DN121 + KNN | /// | HC + RF |
| Dataset | Binary Metrics | Best Binary Setup | Multi-Class Metrics | Best Multi-Class Setup |
|---|---|---|---|---|
| CBIS-DDSM | /// | DN121 + KNN | /// | R50 + XGB |
| INBREAST | /// | DN121 + SVM | /// | ENB3 + LR |
| VinDr-Mammo | /// | ENB3 + SVM | /// | ENB3 + SVM |
| DMID | /// | ENB3 + KNN | /// | DN121 + RF |
| Mass-Bench | /// | HC + RF | /// | HC + RF |
| Report | Year | Model | Dataset | ACR Classes | Acc (Bin) | Acc (Multi) | F1 (Bin) | F1 (Multi) | AUC (Bin) | AUC (Multi) |
|---|---|---|---|---|---|---|---|---|---|---|
| [73] | 2018 | CNN | Clinical dataset | B vs. C (binary) | – | – | – | – | 0.94 | – |
| [74] | 2019 | CNN | Clinical cohort | ACR 1–4 + binary | 0.87 | 0.77 | – | – | – | – |
| [75] | 2022 | Deep CNN | DDM-Spain | ACR 1–4 | – | 0.85 | – | – | – | – |
| [77] | 2022 | TwoViewDensityNet | DDSM/INbreast | ACR 1–4 | – | 0.96 | – | – | 0.99 | – |
| [76] | 2022 | EfficientNet | Multi-center dataset | ACR 1–4 + binary | 0.88 | 0.82 | – | – | 0.95 | 0.93 |
| Ours | 2026 | ML + Deep features | Mass-Bench | ACR 1–4 | 0.90 | 0.82 | 0.90 | 0.74 | – | – |
| Study | BI-RADS Composition | Split |
|---|---|---|
| Baccouche et al. (2022) [79] | CBIS: 2 (792), 3 (1938), 4 (2328), 5 (3402); INBREAST: 2 (144), 3 (78), 4 (126), 5 (276), 6 (48) | 80/10/10 |
| Li et al. (2025) [80] | CBIS: 1 (0), 2 (345), 3 (434), 4 (1534), 5 (530); INBREAST: 1 (67), 2 (220), 3 (23), 4 (43), 5 (57) | 80/20; 90/10; CV |
| Tsai et al. (2022) [35] | 0 (520), 1 (0), 2 (2125), 3 (847), 4A (367), 4B (277), 4C (217), 5 (204) | Train/test (blocks) |
| Sabani et al. (2022) [81] | Grouped: 1 vs. (2–3) vs. (4–5) | 70/20/10 |
| Lin et al. (2025) [82] | 3 (632), 4A (214) | No standard split |
| Tekin et al. (2024) [83] | Not explicitly reported | Not clearly specified |
| Ours (complete bench) | 2 (300), 3 (1258), 4 (1591), 5 (597) | Multi-dataset |
| Ours (balanced) | 2 (300), 3 (300), 4 (300), 5 (300) | Balanced |
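The relationship between the two "Ours" rows can be illustrated with a short class-balancing sketch. The source does not state how the balanced subset was drawn, so the random downsampling to the smallest class used here is an assumption, and the function name `balance_by_class` and the fixed seed are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_class(labels, seed=0):
    """Return sorted sample indices with an equal number of samples per class.

    Every class is randomly downsampled to the size of the smallest class.
    This is an assumed strategy, not one confirmed by the source.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n_keep = min(len(indices) for indices in by_class.values())
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = []
    for label in sorted(by_class):
        kept.extend(rng.sample(by_class[label], n_keep))
    return sorted(kept)


# BI-RADS counts of the complete bench, taken from the table above.
counts = {2: 300, 3: 1258, 4: 1591, 5: 597}
labels = [category for category, n in counts.items() for _ in range(n)]
kept = balance_by_class(labels)
```

With these counts the smallest class is BI-RADS 2 (300 samples), so the sketch keeps 300 samples per category, which matches the balanced composition reported in the table.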
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zepeda-Reyes, H.E.; Peregrina-Barreto, H.; Lopez-Armas, G.C. Density-Aware Multi-Dataset Evaluation of Deep Learning for Mammographic Mass Detection and BI-RADS Classification. Mathematics 2026, 14, 2080. https://doi.org/10.3390/math14122080

