Leveraging Attention-Based Deep Learning in Binary Classification for Early-Stage Breast Cancer Diagnosis
Abstract
1. Introduction
2. Related Work
3. Methods and Materials
- Incorporation of the Convolutional Block Attention Module (CBAM): ECSAnet integrates a CBAM before the final classifier block. This module refines spatial and channel-wise features, enhancing the model’s ability to capture subtle variations in histopathological images. This is particularly critical for distinguishing between tumor subtypes and addressing variability in image characteristics.
- Augmentation of the Final Classifier Block: The final classifier block is enhanced with the addition of two fully connected (FC) layers. These layers improve feature representation and interpretation, ultimately boosting the classification accuracy.
- Reduction in the Number of Output Features: The number of output features in the last FC layer is reduced from 1000 to 8, aligning the output with the eight tumor subtypes in the BreakHis dataset. This modification ensures that the model is tailored to the specific classification task (a PyTorch-style sketch of these modifications follows this list).
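To make these modifications concrete, below is a minimal PyTorch sketch of a CBAM block and an augmented classifier head. It is illustrative only: the 1280-channel feature map (typical of an EfficientNetV2-S backbone) and the 512-unit hidden layer are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention: average- and max-pool the spatial dims, pass both
    through a shared MLP, and gate each channel with a sigmoid (Woo et al. [14])."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        scale = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                              self.mlp(x.amax(dim=(2, 3))))
        return x * scale.view(b, c, 1, 1)


class SpatialAttention(nn.Module):
    """Spatial attention: pool across channels, convolve, gate each location."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))


class CBAM(nn.Module):
    """Channel attention followed by spatial attention, per the CBAM design."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialAttention()

    def forward(self, x):
        return self.spatial(self.channel(x))


# CBAM placed before the classifier block, followed by the two added FC layers;
# channel count and hidden width are illustrative stand-ins.
features = 1280
head = nn.Sequential(
    CBAM(features),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(features, 512), nn.ReLU(inplace=True),  # added FC layers
    nn.Linear(512, 8),                                # eight BreakHis subtypes
)
```

Channel attention reweights feature maps before spatial attention reweights locations, matching the sequential arrangement of CBAM [14].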
3.1. Efficient Channel-Spatial Attention Network
3.2. Machine Learning Approaches
3.2.1. Support Vector Machines (SVM)
3.2.2. K-Nearest Neighbor (KNN)
3.2.3. Linear Discriminant Analysis (LDA)
3.2.4. Decision Trees
3.2.5. Naïve Bayes
3.2.6. Logistic Regression
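The following scikit-learn sketch shows how the six classical classifiers listed above can be trained on features extracted by a CNN backbone, as in the hybrid framework of Section 3.3. The random features and all hyperparameters (the RBF kernel, k = 5, etc.) are illustrative stand-ins, not the paper's settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Stand-in features: the real pipeline feeds deep features extracted by the
# CNN backbone; labels are 0 = benign, 1 = malignant.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 64)), rng.integers(0, 2, 200)
X_test, y_test = rng.normal(size=(50, 64)), rng.integers(0, 2, 50)

classifiers = {
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LDA": LinearDiscriminantAnalysis(),
    "DT": DecisionTreeClassifier(random_state=0),
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name}: {accuracy_score(y_test, clf.predict(X_test)):.3f}")
```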
3.3. Proposed Framework
3.4. Dataset
3.4.1. Data Analysis
- Class Imbalance Between Benign and Malignant Samples: Malignant samples substantially outnumber benign samples, revealing a significant disparity in class sizes.
- Prevalence of Ductal Carcinoma (DC): Within the eight-class distribution, Ductal Carcinoma (DC) samples are notably more frequent than any other tumor subtype.
- Balanced Distribution Across Magnification Levels: The dataset exhibits a relatively uniform distribution of samples across the four magnification factors (40×, 100×, 200×, and 400×).
3.4.2. Data Preprocessing and Augmentation
- Balancing: The AugMix technique [24] was applied exclusively to the minority classes to balance the dataset. AugMix generates diverse augmented images by blending multiple transformations (e.g., rotations, posterizations, and distortions). Blending weights were drawn from a Dirichlet distribution, ensuring diversity without excessive duplication.
- Oversampling: After balancing, geometric transformations were applied to all classes to oversample the entire dataset to three times its original size. These transformations included random horizontal and vertical flips (each with a 50% probability), random rotations (affine transformations with angles from −45 to 45 degrees), random translations (up to 10% of the image dimensions), random scaling (with factors between 0.8 and 1.2), and random shearing (with angles from 0 to 10 degrees). This step enriched the dataset with realistic variations to improve model generalization (a transform-pipeline sketch follows this list).
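A minimal torchvision sketch of this two-stage pipeline is given below (AugMix requires torchvision ≥ 0.13). The AugMix severity and mixture width shown are the library defaults, not necessarily the paper's values; the geometric ranges mirror the description above.

```python
from PIL import Image
import torchvision.transforms as T

# Balancing: AugMix applied only to minority-class images. Its mixing weights
# are drawn from a Dirichlet distribution internally, as described above.
minority_aug = T.AugMix(severity=3, mixture_width=3)

# Oversampling: geometric transforms applied to all classes.
geometric = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),
    T.RandomAffine(
        degrees=(-45, 45),      # random rotations
        translate=(0.1, 0.1),   # translations up to 10% of image size
        scale=(0.8, 1.2),       # random scaling
        shear=(0, 10),          # shear angles from 0 to 10 degrees
    ),
])

img = Image.new("RGB", (224, 224))       # stand-in for a BreakHis patch
augmented = geometric(minority_aug(img))
```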
3.5. Experiments Setup and Evaluation Metrics
4. Experimental Results and Discussion
4.1. Quantitative Results
4.2. Qualitative Results
4.3. Comparative Analysis with the Literature
5. Discussion of Results
5.1. Main Findings
- Low Magnifications (40×, 100×, 200×): These levels provide a broader view of tissue architecture, making them valuable for identifying larger structures or patterns. The high performance at these magnifications supports their use as the primary diagnostic focus in clinical practice.
- High Magnification (400×): This level is critical for detailed cellular analysis, such as assessing nuclear atypia and mitotic figures, but it introduces challenges such as increased noise and feature variability. While high magnifications are essential for specific diagnostic tasks, their utility in binary classification may be limited compared to lower magnifications.
5.2. Model Strengths
- High Accuracy Across All Magnifications: The majority of predictions for both benign and malignant classes are correctly classified, as evident from the strong diagonal dominance in the confusion matrices (e.g., 87/88 correctly classified benign samples at 40× and 345/347 malignant samples at 100×). This demonstrates the model’s ability to generalize well across magnification levels.
- Low False Negative Rate: The number of malignant cases misclassified as benign (false negatives) is minimal (e.g., 1 false negative at 40× and only 1–2 across other magnifications). This is critical in medical diagnostics to minimize missed cancer diagnoses.
- Robust Performance at Lower Magnifications: At 40×, 100×, and 200×, the model achieves high classification accuracy with minimal errors, demonstrating its strength in leveraging broader tissue-level context for binary classification.
5.3. Model Limitations
- Slight Increase in Errors at High Magnification (400×): At 400×, the number of false positives (benign samples misclassified as malignant) increases slightly (e.g., eight at 400× versus seven at lower magnifications). This suggests that higher magnifications introduce noise and fine-grained details that can confuse the model. Similar observations have been made in other studies, such as the binary classification work by Li et al. [25] and the multi-class classification research by Boumaraf et al. [26]. Conversely, some studies, including [27], reported reduced classification performance at the 100× magnification level. Taheri et al. [28] provide further insight into how magnification and classification type affect model performance, with the aim of improving the accuracy and reliability of automated cancer diagnosis systems.
- Imbalance in Error Distribution: False positives (benign misclassified as malignant) are slightly more common than false negatives, which could lead to unnecessary follow-up procedures. While less critical than missed cancers, this still reflects a slight bias in the model. Clement et al. [29] emphasize strategies such as class balancing, careful selection of features across resolutions, and the use of evaluation metrics that account for imbalance.
- Sensitivity to Magnification Variability: Although the model performs well overall, the slight drop in performance at higher magnifications indicates a limitation in consistently handling fine-grained cellular details where contextual information is reduced.
6. Conclusions and Future Work
- Multimodal Imaging: Combining histopathological images with complementary imaging modalities, such as mammography, ultrasound, or MRI, could provide additional information and enhance model performance.
- Hybrid Architectures: While this paper demonstrates the effectiveness of combining CNNs with ensemble classifiers such as DT and SVM, future work will explore further optimization and evaluation of these hybrid architectures. This includes refining integration techniques, assessing their performance on more complex datasets, and identifying scenarios where such combinations are most beneficial for feature extraction and classification.
- Magnification-Specific Models: Developing specialized models optimized for specific magnification levels might address the limitations observed at higher resolutions.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar]
- Guo, M.H.; Xu, T.X.; Liu, J.J.; Liu, Z.N.; Jiang, P.T.; Mu, T.J.; Zhang, S.H.; Martin, R.R.; Cheng, M.M.; Hu, S.M. Attention Mechanisms in Computer Vision: A Survey. Comput. Vis. Media 2022, 8, 331–368. [Google Scholar] [CrossRef]
- Aldakhil, L.A.; Alhasson, H.F.; Alharbi, S.S. Attention-Based Deep Learning Approach for Breast Cancer Histopathological Image Multi-Classification. Diagnostics 2024, 14, 1402. [Google Scholar] [CrossRef] [PubMed]
- Rashmi, R.; Prasad, K.; Udupa, C.B.K. Breast histopathological image analysis using image processing techniques for diagnostic purposes: A methodological review. J. Med. Syst. 2022, 46, 7. [Google Scholar] [CrossRef] [PubMed]
- Smith, J.; Doe, A. The role of AI in modern microscopy: Enhancing feature identification across scales. J. Microsc. Imaging Sci. 2023, 12, 45–60. [Google Scholar]
- Gupta, K.; Chawla, N. Analysis of Histopathological Images for Prediction of Breast Cancer Using Traditional Classifiers with Pre-Trained CNN. Procedia Comput. Sci. 2020, 167, 878–889. [Google Scholar] [CrossRef]
- Murtaza, G.; Shuib, L.; Mujtaba, G.; Raza, G. Breast Cancer Multi-classification through Deep Neural Network and Hierarchical Classification Approach. Multimed. Tools Appl. 2020, 79, 15481–15511. [Google Scholar] [CrossRef]
- Gour, M.; Jain, S.; Sunil Kumar, T. Residual learning based CNN for breast cancer histopathological image classification. Int. J. Imaging Syst. Technol. 2020, 30, 621–635. [Google Scholar] [CrossRef]
- Ahmad, N.; Asghar, S.; Gillani, S.A. Transfer learning-assisted multi-resolution breast cancer histopathological images classification. Vis. Comput. 2022, 38, 2751–2770. [Google Scholar] [CrossRef]
- Clement, D.; Agu, E.; Suleiman, M.A.; Obayemi, J.; Adeshina, S.; Soboyejo, W. Multi-Class Breast Cancer Histopathological Image Classification Using Multi-Scale Pooled Image Feature Representation (MPIFR) and One-Versus-One Support Vector Machines. Appl. Sci. 2023, 13, 156. [Google Scholar] [CrossRef]
- Tan, M.; Le, Q. EfficientNetV2: Smaller models and faster training. In Proceedings of the International Conference on Machine Learning, Virtual Event, 18–24 July 2021; PMLR: New York, NY, USA, 2021; pp. 10096–10106. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar] [CrossRef]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520. [Google Scholar] [CrossRef]
- Gupta, S.; Akin, B. Accelerator-aware Neural Network Design using AutoML. arXiv 2020, arXiv:2003.02838. [Google Scholar] [CrossRef]
- Kramer, O. K-Nearest Neighbors. In Dimensionality Reduction with Unsupervised Nearest Neighbors; Kramer, O., Ed.; Intelligent Systems Reference Library; Springer: Berlin/Heidelberg, Germany, 2013; pp. 13–23. [Google Scholar] [CrossRef]
- Balakrishnama, S.; Ganapathiraju, A. Linear discriminant analysis—A brief tutorial. Inst. Signal Inf. Process. 1998, 18, 1–8. [Google Scholar]
- Rokach, L.; Maimon, O. Decision Trees. In Data Mining and Knowledge Discovery Handbook; Maimon, O., Rokach, L., Eds.; Springer: Boston, MA, USA, 2005; pp. 165–192. [Google Scholar] [CrossRef]
- Webb, G.I. Naïve Bayes. In Encyclopedia of Machine Learning and Data Mining; Sammut, C., Webb, G.I., Eds.; Springer: Boston, MA, USA, 2016; pp. 1–2. [Google Scholar] [CrossRef]
- Böhning, D. Multinomial logistic regression algorithm. Ann. Inst. Stat. Math. 1992, 44, 197–200. [Google Scholar] [CrossRef]
- Spanhol, F.A.; Oliveira, L.S.; Petitjean, C.; Heutte, L. Breast cancer histopathological image classification using Convolutional Neural Networks. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 2560–2567. [Google Scholar] [CrossRef]
- Reinhard, E.; Ashikhmin, M.; Gooch, B.; Shirley, P. Color Transfer between Images. IEEE Comput. Graph. Appl. 2001, 21, 34–41. [Google Scholar] [CrossRef]
- Hendrycks, D.; Mu, N.; Cubuk, E.D.; Zoph, B.; Gilmer, J.; Lakshminarayanan, B. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Li, X.; Shen, X.; Zhou, Y.; Wang, X.; Li, T.Q. Classification of breast cancer histopathological images using interleaved DenseNet with SENet (IDSNet). PLoS ONE 2020, 15, e0232127. [Google Scholar] [CrossRef] [PubMed]
- Boumaraf, S.; Liu, X.; Zheng, Z.; Ma, X.; Ferkous, C. A new transfer learning based approach to magnification dependent and independent classification of breast cancer in histopathological images. Biomed. Signal Process. Control. 2021, 63, 102192. [Google Scholar] [CrossRef]
- Toğaçar, M.; Özkurt, K.B.; Ergen, B.; Cömert, Z. BreastNet: A novel convolutional neural network model through histopathological images for the diagnosis of breast cancer. Phys. A Stat. Mech. Appl. 2020, 545, 123592. [Google Scholar] [CrossRef]
- Taheri, S.; Golrizkhatami, Z.; Basabrain, A.A.; Hazzazi, M.S. A Comprehensive Study on Classification of Breast Cancer Histopathological Images: Binary Versus Multi-Category and Magnification-Specific Versus Magnification-Independent. IEEE Access 2024, 12, 50431–50443. [Google Scholar] [CrossRef]
- Clement, D.; Agu, E.; Obayemi, J.; Adeshina, S.; Soboyejo, W. Breast cancer tumor classification using a Bag of deep multi-resolution convolutional features. In Informatics; MDPI: Basel, Switzerland, 2022; Volume 9, p. 91. [Google Scholar]
Benign subtypes: adenosis (A), fibroadenoma (F), phyllodes tumor (PT), and tubular adenoma (TA). Malignant subtypes: ductal carcinoma (DC), lobular carcinoma (LC), mucinous carcinoma (MC), and papillary carcinoma (PC).

Magnification | A | F | PT | TA | DC | LC | MC | PC | Total
---|---|---|---|---|---|---|---|---|---
40× | 114 | 253 | 109 | 149 | 864 | 156 | 205 | 145 | 1995
100× | 113 | 260 | 121 | 150 | 903 | 170 | 222 | 142 | 2081
200× | 111 | 264 | 108 | 140 | 896 | 163 | 196 | 135 | 2013
400× | 106 | 237 | 115 | 130 | 788 | 137 | 169 | 138 | 1820
Subtype total | 444 | 1014 | 453 | 569 | 3451 | 626 | 792 | 560 | 7909
Type total | 2480 (benign) | | | | 5429 (malignant) | | | | 7909
Before Balancing and Oversampling

Magnification | A | F | PT | TA | DC | LC | MC | PC | Total
---|---|---|---|---|---|---|---|---|---
40× | 80 | 177 | 76 | 104 | 603 | 109 | 143 | 101 | 1393
100× | 79 | 181 | 85 | 105 | 631 | 119 | 155 | 99 | 1454
200× | 77 | 184 | 75 | 98 | 626 | 114 | 137 | 94 | 1405
400× | 74 | 165 | 80 | 91 | 550 | 97 | 118 | 96 | 1271
Subtype total | 310 | 707 | 316 | 398 | 2410 | 439 | 553 | 390 | 5523
Type total | 1731 (benign) | | | | 3792 (malignant) | | | | 5523

After Balancing and Oversampling

Magnification | A | F | PT | TA | DC | LC | MC | PC | Total
---|---|---|---|---|---|---|---|---|---
40× | 1812 | 1812 | 1812 | 1812 | 1812 | 1812 | 1812 | 1812 | 14,496
100× | 1896 | 1896 | 1896 | 1896 | 1896 | 1896 | 1896 | 1896 | 15,168
200× | 1881 | 1881 | 1881 | 1881 | 1881 | 1881 | 1881 | 1881 | 15,048
400× | 1653 | 1653 | 1653 | 1653 | 1653 | 1653 | 1653 | 1653 | 13,224
Subtype total | 7242 | 7242 | 7242 | 7242 | 7242 | 7242 | 7242 | 7242 | 57,936
Type total | 28,968 (benign) | | | | 28,968 (malignant) | | | | 57,936
Metrics | Formula | Definition
---|---|---
Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ | The ratio of correctly predicted instances (true positives and true negatives) to the total number of instances, measuring the model's overall correctness.
Precision | $\frac{TP}{TP + FP}$ | The ratio of true positive predictions to all positive predictions made by the model, indicating the model's accuracy for the positive class.
Recall | $\frac{TP}{TP + FN}$ | The ratio of true positive predictions to all actual positives in the dataset, measuring the model's ability to identify every relevant instance of the positive class.
Jaccard index | $\frac{TP}{TP + FP + FN}$ | Also known as the Intersection over Union (IoU); measures the similarity between the predicted and ground-truth sets as the size of their intersection divided by the size of their union.
F1-score | $\frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ | The harmonic mean of Precision and Recall, providing a balanced measure of a model's accuracy when Precision and Recall differ.
Matthews Correlation Coefficient (MCC) | $\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$ | A performance metric that evaluates the quality of binary classifications; particularly useful for imbalanced datasets because it considers all four confusion-matrix components (TP, TN, FP, FN).

TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
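As a worked example, the following sketch computes all six metrics directly from confusion-matrix counts. The counts passed in at the bottom are hypothetical, not results from the paper.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the metrics in the table above from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "jaccard": tp / (tp + fp + fn),
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / math.sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
        ),
    }


# Hypothetical counts for illustration only.
print(binary_metrics(tp=140, tn=61, fp=1, fn=2))
```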
Hyperparameter | Value |
---|---|
Learning Rate | 0.001 |
Learning Rate Scheduler | ReduceLROnPlateau |
Early Stopping | Patience = 25 epochs |
Weight Decay | 0.01 |
Number of Epochs | 50 |
Batch Size | 16 |
Optimizer | SGD |
Loss Function | CrossEntropyLoss |
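The following PyTorch sketch wires these hyperparameters together. The toy model and random tensors are stand-ins for ECSAnet and the BreakHis loader, and the early-stopping loop tracks training loss here where the real setup would track validation loss.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins: a toy model and random tensors replace ECSAnet and BreakHis images.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 8))
images, labels = torch.randn(64, 3, 224, 224), torch.randint(0, 8, (64,))
loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.001, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
criterion = nn.CrossEntropyLoss()

best_loss, patience, bad_epochs = float("inf"), 25, 0
for epoch in range(50):
    epoch_loss = 0.0
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    scheduler.step(epoch_loss)  # the real setup would pass validation loss
    if epoch_loss < best_loss:  # early stopping, patience of 25 epochs
        best_loss, bad_epochs = epoch_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```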
Model | Magnification | Accuracy (%) | Precision (%) | F1-Score (%) | Jaccard Index (%) | AUC (%) | MCC (%)
---|---|---|---|---|---|---|---
ECSAnet [5] | 40× | 99.52 | 100.00 | 99.64 | 99.29 | 99.99 | 99.00
 | 100× | 98.12 | 97.35 | 98.66 | 97.35 | 99.59 | 96.00
 | 200× | 98.07 | 97.90 | 98.59 | 97.22 | 99.94 | 95.00
 | 400× | 94.71 | 92.75 | 96.24 | 92.75 | 99.23 | 86.00
ECSAnet [5] + DT | 40× | 99.52 | 99.52 | 99.52 | 99.04 | 99.64 | 99.00
 | 100× | 98.12 | 98.12 | 98.12 | 96.32 | 97.80 | 96.00
 | 200× | 99.52 | 99.52 | 99.52 | 99.04 | 99.65 | 99.00
 | 400× | 95.77 | 95.79 | 95.73 | 91.86 | 94.30 | 90.24
ECSAnet [5] + KNN | 40× | 99.52 | 99.52 | 99.52 | 99.04 | 99.64 | 99.00
 | 100× | 98.12 | 98.12 | 98.12 | 96.32 | 97.79 | 96.00
 | 200× | 98.07 | 98.07 | 98.06 | 96.20 | 99.63 | 98.00
 | 400× | 95.24 | 95.39 | 95.16 | 90.83 | 95.66 | 91.00
ECSAnet [5] + LDA | 40× | 99.52 | 99.52 | 99.52 | 99.04 | 99.64 | 99.00
 | 100× | 98.12 | 98.12 | 98.12 | 96.32 | 97.80 | 96.00
 | 200× | 98.55 | 98.55 | 98.55 | 97.14 | 98.47 | 98.00
 | 400× | 95.24 | 95.28 | 95.18 | 90.87 | 93.81 | 91.00
ECSAnet [5] + LR | 40× | 99.52 | 99.52 | 99.52 | 99.04 | 99.99 | 99.00
 | 100× | 97.65 | 97.65 | 97.65 | 95.42 | 99.59 | 96.00
 | 200× | 99.52 | 99.52 | 99.52 | 99.04 | 99.89 | 99.00
 | 400× | 95.24 | 95.28 | 95.18 | 90.87 | 99.23 | 91.00
ECSAnet [5] + NB | 40× | 98.07 | 98.18 | 98.08 | 96.24 | 98.57 | 98.00
 | 100× | 96.24 | 96.38 | 96.27 | 92.85 | 96.44 | 94.00
 | 200× | 98.07 | 98.07 | 98.06 | 96.20 | 97.37 | 98.00
 | 400× | 94.71 | 94.89 | 94.75 | 90.08 | 94.81 | 84.00
ECSAnet [5] + SVM | 40× | 99.52 | 99.52 | 99.52 | 99.04 | 99.99 | 99.00
 | 100× | 98.12 | 98.12 | 98.12 | 96.32 | 97.55 | 96.00
 | 200× | 98.55 | 98.55 | 98.55 | 97.14 | 99.92 | 98.00
 | 400× | 95.24 | 95.39 | 95.16 | 90.83 | 95.47 | 91.00
Class | Precision (%) | Sensitivity (%) | Specificity (%) | F1-Score (%) | Support |
---|---|---|---|---|---|
Benign | 99.00 | 100.00 | 99.29 | 99.00 | 62 |
Malignant | 100.00 | 99.00 | 100.00 | 100.00 | 140 |
Accuracy | | | | 100.00 | 207 |
Macro Average | 99.00 | 100.00 | 99.64 | 99.00 | 207 |
Weighted Average | 100.00 | 100.00 | 99.76 | 100.00 | 207 |
Class | Precision (%) | Sensitivity (%) | Specificity (%) | F1-Score (%) | Support |
---|---|---|---|---|---|
Benign | 97.00 | 97.00 | 98.64 | 97.00 | 66 |
Malignant | 99.00 | 99.00 | 96.97 | 99.00 | 147 |
Accuracy | | | | 98.00 | 213 |
Macro Average | 98.00 | 98.00 | 97.80 | 98.00 | 213 |
Weighted Average | 98.00 | 98.00 | 97.52 | 98.00 | 213 |
Class | Precision (%) | Sensitivity (%) | Specificity (%) | F1-Score (%) | Support |
---|---|---|---|---|---|
Benign | 99.00 | 100.00 | 99.29 | 99.00 | 66 |
Malignant | 100.00 | 99.00 | 100.00 | 100.00 | 141 |
Accuracy | | | | 100.00 | 207 |
Macro Average | 99.00 | 100.00 | 99.64 | 99.00 | 207 |
Weighted Average | 100.00 | 100.00 | 99.76 | 100.00 | 207 |
Class | Precision (%) | Sensitivity (%) | Specificity (%) | F1-Score (%) | Support |
---|---|---|---|---|---|
Benign | 96.00 | 90.00 | 98.44 | 93.00 | 61 |
Malignant | 95.00 | 98.00 | 90.16 | 97.00 | 128 |
Accuracy | | | | 96.00 | 189 |
Macro Average | 96.00 | 94.00 | 94.30 | 95.00 | 189 |
Weighted Average | 96.00 | 96.00 | 92.83 | 96.00 | 189 |
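Per-class reports of this form can be generated from model predictions as in the sketch below. The label vectors are hypothetical, and specificity (which scikit-learn's classification_report omits) is derived separately from the confusion matrix.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labels: 0 = benign, 1 = malignant; one benign sample misclassified.
y_true = np.array([0] * 62 + [1] * 140)
y_pred = y_true.copy()
y_pred[0] = 1

print(classification_report(y_true, y_pred, target_names=["Benign", "Malignant"]))

# Specificity is not included in classification_report; derive it from the
# confusion matrix (here treating malignant as the positive class).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"Specificity: {tn / (tn + fp):.4f}")
```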
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).