Article

Optimized Deep Learning for Mammography: Augmentation and Tailored Architectures

by
Syed Ibrar Hussain
*,† and
Elena Toscano
Dipartimento di Matematica e Informatica, Università degli Studi di Palermo, via Archirafi 34, 90123 Palermo, Italy
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Information 2025, 16(5), 359; https://doi.org/10.3390/info16050359
Submission received: 17 March 2025 / Revised: 25 April 2025 / Accepted: 28 April 2025 / Published: 29 April 2025
(This article belongs to the Special Issue Applications of Deep Learning in Bioinformatics and Image Processing)

Abstract

This paper investigates the categorization of mammogram images into benign, malignant and normal categories, providing novel approaches based on Deep Convolutional Neural Networks to the early identification and classification of breast lesions. Multiple DCNN models were tested to see how well deep learning works for difficult, multi-class categorization problems. These models were trained on pre-processed datasets with optimized hyperparameters (e.g., the batch size, learning rate, and dropout), which increased the precision of classification. Evaluation measures like confusion matrices, accuracy, and loss demonstrated their high classification efficiency with low overfitting, and the validation results aligned well with the training results. DenseNet-201 and MobileNet-V3 Large displayed strong generalization ability, whilst EfficientNetV2-B3 and NASNet Mobile struck the best balance between accuracy and efficiency, making them suitable for practical applications. The use of data augmentation also improved the management of data imbalances, resulting in more accurate large-scale detection. Unlike prior approaches, the combination of the architectures, pre-processing approaches, and data augmentation improved the system’s accuracy, indicating that these models are suitable for medical imaging tasks that require transfer learning. The results show precise and accurate classification even in the presence of class imbalances and poor dataset quality. In particular, we have not defined a new framework for computer-aided diagnosis here, but we have reviewed a variety of promising solutions for future developments in this field.

1. Introduction

In this paper, we have reviewed the main neural network architectures with different characteristics in terms of their performance and hardware requirements, comparing them for a problem of great practical importance. In particular, we used a public-domain dataset of real mammograms acquired under uncontrolled conditions. We considered various techniques for improving the results, and an extensive experiment allowed us to verify the actual effectiveness of all of the models considered, identifying the values of their operating parameters. The approaches investigated may form the basis of future systems to support specialist physicians during the diagnostic phase, minimizing the risk of errors and the time required to classify any pathology present.
Breast cancer remains one of the most common and life-threatening illnesses globally, particularly among women. It encompasses a variety of disorders, classified as benign, malignant, and normal tissue abnormalities. The variability in breast cancer, both in terms of its histological appearance and genetic subgroups, highlights the complexities of its diagnosis and treatment. Early and precise identification is critical, as this significantly influences the treatment success and survival rates, particularly for aggressive subtypes. This study focuses on applying sophisticated computational approaches to classifying breast tissue into these three groups, addressing the challenges posed by the disease’s complexity. The standard diagnostic procedures, such as mammography and biopsy, though successful, are prone to variability due to radiologists’ subjective judgments and the inadequate resolution for detecting minor abnormalities. The use of artificial intelligence (AI) in breast cancer diagnostics tackles these constraints by providing consistent, scalable, and highly accurate detection capabilities. Convolutional Neural Networks (CNNs), a subset of AI, have transformed healthcare by offering unprecedented accuracy in image-based classification tasks. These models are specifically designed to learn the spatial hierarchies in information, making them exceptionally useful for medical imaging applications such as breast cancer diagnosis [1,2].
Artificial-intelligence-powered techniques, particularly those based on deep learning, are increasingly being used to diagnose breast cancer. These techniques analyze mammographic images to find abnormalities such as microcalcifications and masses, allowing them to discriminate between benign and malignant lesions. This minimizes diagnostic variability while increasing accuracy, particularly in large-scale screening programs. In this work, AI enables the accurate categorization of breast tissues, bridging the gap between clinical competence and computational precision. This work makes use of sophisticated deep learning architectures such as EfficientNet and ResNet, which are well known for their cutting-edge performance in image classification. EfficientNet uses a compound scaling approach to optimize the depth, width, and resolution, resulting in higher accuracy with fewer parameters [3]. ResNet uses residual connections to overcome the vanishing gradient problem, allowing for the training of deeper networks. The use of these systems demonstrates AI’s rising significance in offering accurate and scalable solutions to crucial healthcare concerns [4,5].

Related Works

Recent research has highlighted the potential of CNNs in breast cancer diagnosis. Uzun Ozşahin et al. (2022) conducted a thorough evaluation of the AI approaches used in breast cancer detection, demonstrating the efficiency of machine learning algorithms in interpreting medical images and patient data. This study emphasized that AI systems, particularly those that use deep learning, have demonstrated high accuracy in recognizing cancerous tissues [6]. Similarly, Rautela et al. (2022) conducted a comprehensive study of deep learning algorithms for breast cancer detection and concluded that CNNs excel in image analysis tasks, thus enhancing diagnosis. Recent research has focused on improving the accuracy of AI models [7]. Liu et al. (2019) created an AI-based system for detecting breast cancer node metastases, revealing insights into the model’s decision-making process to promote transparency and reliability in clinical settings. This strategy tackles the ‘black box’ aspect of AI, making the technology more understandable to healthcare specialists [8]. Furthermore, Huang et al. (2023) used AI to analyze multi-stain histopathological images, identifying characteristics linked with the response of breast cancer to neoadjuvant treatment. These findings show that artificial intelligence can help predict treatment outcomes, also allowing for personalized patient care [9].
Large-scale research has also looked at how AI can be integrated into breast cancer screening systems. Marinovich et al. (2023) conducted a study to assess the performance of an AI system in a population-based cohort. This study found that AI-assisted screening could match or exceed the accuracy of traditional approaches, implying that AI may play an important role in future screening strategies [10]. Tan et al. (2013) also studied the use of computer-aided detection in automated 3D breast ultrasounds, suggesting that artificial intelligence may improve the identification of malignant tumors in complicated imaging modalities [11].
Charan et al. (2018) used CNNs on mammographic images and achieved a good classification accuracy by utilizing pre-trained models such as VGG19 [12]. Similarly, Ismail et al. (2019) increased the model performance by using batch normalization and learning rate decay techniques [13]. Another significant work by Ebrahim et al. (2018) emphasized the need for data augmentation to improve the resilience of CNNs for medical imaging. Collectively, these findings highlight the transformational potential of AI for the detection and diagnosis of breast cancer [14]. EfficientNet, DenseNet, MobileNet, ResNet, and hybrid architectures have consistently outperformed the older approaches in breast cancer diagnosis due to their superior feature extraction and classification skills. For example, EfficientNet strikes a balance between model depth and parameter economy, making it ideal for complicated datasets such as mammography images.
This study seeks to provide a strong foundation for breast cancer categorization utilizing cutting-edge CNN models. The approach is outlined in the following major steps:
  • Data preparation: Obtaining mammographic images from the mammographic dataset and performing pre-processing operations that include cleaning, normalization, and augmentation;
  • Model architecture: Using pre-trained CNN models such as EfficientNet and ResNet, with customized changes for breast tissue categorization;
  • Training and optimization: Using transfer learning, regularization, and learning rate modifications to improve the model performance;
  • Reliable categorization: Applying evaluation measures such as accuracy, precision, recall, and F1 score;
  • Interpretability: Using visualization techniques such as Grad-CAM to yield insights into the model’s decision-making while guaranteeing its clinical applicability.
This work aims to expand the area of AI-driven breast cancer diagnostics by addressing issues such as class imbalance, limited datasets, and interpretability while also providing a practical tool for early diagnoses and improved patient outcomes.

2. Breast Cancer Detection

Here, we describe the CNN architecture used to categorize medical images, with a particular emphasis on the diagnosis of breast cancer. A few pre-trained CNN models, such as VGG19 and ResNet, are used as feature extractors, followed by custom fully connected layers created specifically for the classification task. To improve their performance and generalization, the CNN models are optimized through the use of dropout, regularization, batch normalization, and transfer learning approaches. These methods make use of strong pre-trained architectures and modify them to fit the particular medical categorization issue at hand.

2.1. The Dataset

We used the Mammographic Image Analysis Society (MIAS) dataset [15], which includes real images, to diagnose breast cancers and abnormalities, including benign, malignant, and normal tissues (see Figure 1). This dataset was subjected to extensive preparation, including dividing it into training, validation, and testing sets; normalization; and enhancement by rotating the images to different angles. In this analysis, many well-known CNN models were trained and assessed. To evaluate the generalization performance of these models, different batch sizes and epochs were used during training. The original image size is 1024 × 1024 pixels, stored in lossless portable gray map (PGM) format. All in all, 113 images are abnormal, while 209 are normal. In order to make the images compatible with CNN models and other deep learning models, they were first downsized through bilinear interpolation to the standard input size of 299 × 299 pixels for efficient training and inference. Data augmentation was carried out by rotating the images through 360 degrees at 6-degree intervals, improving the robustness and variance. To guarantee a representative distribution of each class, the dataset was divided into subsets for training (70%), testing (15%), and validation (15%), as shown in Table 1. Effective learning requires the CNN to receive high-quality input, which is why these procedures are so important.
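As an illustration of this pre-processing step, the following minimal sketch (Python with Pillow, NumPy, and scikit-learn) resizes the 1024 × 1024 PGM images to 299 × 299 with bilinear interpolation, normalizes them, and performs a stratified 70/15/15 split. The function names and file paths are ours, not the authors' code.

```python
# Minimal pre-processing sketch (assumed libraries: Pillow, NumPy, scikit-learn).
import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split

def load_and_resize(paths, size=(299, 299)):
    """Read MIAS portable gray map images and downsize them with bilinear interpolation."""
    images = []
    for p in paths:
        img = Image.open(p).convert("L")            # 1024 x 1024 grayscale PGM
        img = img.resize(size, Image.BILINEAR)      # bilinear interpolation to 299 x 299
        images.append(np.asarray(img, dtype=np.float32) / 255.0)  # normalize to [0, 1]
    return np.stack(images)[..., np.newaxis]        # shape (N, 299, 299, 1)

def split_dataset(x, y, seed=42):
    """Stratified 70/15/15 train/validation/test split."""
    x_train, x_tmp, y_train, y_tmp = train_test_split(
        x, y, test_size=0.30, stratify=y, random_state=seed)
    x_val, x_test, y_val, y_test = train_test_split(
        x_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```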

2.2. Cleaning the Dataset and Creating the Labels

The dataset is cleaned first, and any missing values are dealt with. The missing values for SEVERITY are either filled in as ‘N’ (Normal) or removed completely. This preparation stage prepares the data for input into the model and guarantees label consistency. Accurate labeling is crucial in a medical setting because incorrect labels might distort the model’s learning. The reliability of the dataset is increased when suitable solutions are used to handle missing data. By dropping or replacing records with missing values, only correctly labeled data are included in the training process.
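One possible implementation of this cleaning step with pandas is sketched below; the file name (mias_info.csv) and column names (REFNUM, SEVERITY, CLASS) follow common conversions of the MIAS metadata but are assumptions, not the authors' script.

```python
# Label-cleaning sketch with pandas (hypothetical file and column names).
import pandas as pd

labels = pd.read_csv("mias_info.csv")

# Images without a SEVERITY entry are treated as normal tissue and filled with 'N';
# records missing an image identifier are dropped rather than guessed.
labels["SEVERITY"] = labels["SEVERITY"].fillna("N")
labels = labels.dropna(subset=["REFNUM"])

# Map the cleaned severities onto the three training labels.
label_map = {"N": "normal", "B": "benign", "M": "malignant"}
labels["CLASS"] = labels["SEVERITY"].map(label_map)
```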

2.3. Class Balancing

Undersampling of the ‘normal’ class is performed to bring it in line with the number of samples available for the benign and malignant categories. This prevents the model from being skewed in favor of the majority class—a problem that frequently arises in datasets related to medicine. By preventing a single class from dominating the model’s predictions, class balancing improves the generalization to unobserved circumstances.
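A simple way to perform this undersampling, assuming a pandas DataFrame with a CLASS column as in the previous sketch, is shown below; reducing every class to the size of the smallest one is one possible interpretation of the balancing described above, not necessarily the authors' exact procedure.

```python
# Undersampling sketch: reduce every class to the size of the rarest one.
import pandas as pd

def undersample(df, label_col="CLASS", seed=42):
    n_min = df[label_col].value_counts().min()       # samples in the rarest class
    return (df.groupby(label_col, group_keys=False)
              .apply(lambda g: g.sample(n=n_min, random_state=seed))
              .reset_index(drop=True))
```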

2.4. Image Augmentation

Image augmentation is a crucial step in improving the dataset. Because medical datasets, such as the one employed here, frequently contain relatively few images, augmentation is a crucial strategy to avoid overfitting and to enhance the model’s generalization capabilities. By performing different transformations on the original images, such as rotations, flips, scaling, and shifts, augmentation artificially increases the size of the dataset. Each image is rotated by 6 degrees at a time, resulting in 60 variants overall. This approach makes the model invariant to orientation [16,17]: because lesions may appear at various orientations depending on the imaging method used, models need to be able to recognize structures at various angles. For this reason, the rotation approach was used in this particular situation. By simulating changes that the algorithm is likely to encounter in real-world applications, image augmentation enhances the model’s capacity to generalize to new, unseen data. In the absence of augmentation, the model’s performance on fresh test data may suffer due to overfitting to the particular orientations and pixel values found in this small dataset.
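The rotation augmentation can be reproduced with a few lines of Pillow code, as in the sketch below; the function name is ours, and the authors may equally have used a generator-based equivalent.

```python
# Rotation augmentation sketch: one image -> 60 rotated variants at 6-degree steps.
import numpy as np
from PIL import Image

def rotate_variants(img_array, step_deg=6):
    """Return rotated copies of a single normalized grayscale image."""
    img = Image.fromarray((img_array.squeeze() * 255).astype(np.uint8))
    variants = []
    for angle in range(0, 360, step_deg):            # 0, 6, 12, ..., 354 degrees
        rotated = img.rotate(angle, resample=Image.BILINEAR)
        variants.append(np.asarray(rotated, dtype=np.float32) / 255.0)
    return np.stack(variants)[..., np.newaxis]        # shape (60, H, W, 1)
```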
Early stopping prevents the model from being trained on the training set for an excessive number of epochs when there is no improvement in the validation loss. The learning rate is adjusted using ReduceLROnPlateau when the model’s loss stops improving [18]. By decreasing the learning rate, the optimizer can better escape poor local minima and converge towards a lower minimum. These callbacks keep the training process effective while maximizing the model’s performance and preventing overfitting.
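In Keras these two mechanisms correspond to the EarlyStopping and ReduceLROnPlateau callbacks; the sketch below shows typical usage, with the patience values and reduction factor chosen for illustration rather than taken from the paper.

```python
# Early stopping and learning-rate scheduling (illustrative settings, not the authors').
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

callbacks = [
    EarlyStopping(monitor="val_loss", patience=10,
                  restore_best_weights=True),          # stop when the validation loss stalls
    ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                      patience=5, min_lr=1e-6),         # shrink the learning rate on a plateau
]
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, batch_size=128, callbacks=callbacks)
```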

2.5. Performance Measures

As reported in Section 1, lesion class prediction is the final step in the suggested DNN paradigm and is based on the combined prediction values for each class from each model. The accuracy of a model’s class predictions is used to assess its classification performance. All of the widely used performance measures, namely accuracy, precision, recall, and F1 score, are employed in this study to corroborate the model performance [19,20]. More precisely, accuracy measures the statistical soundness of the detection and classification of multi-class cancer. Here, TP stands for true positive, TN for true negative, FP for false positive, and FN for false negative. Because this measure depends on both the FPs and FNs, relying on it alone might occasionally be deceptive when assessing a predictor’s performance. This suggests that it is possible to have two models with identical accuracy, one with high FPs and low FNs and the other with low FPs and high FNs. The first model can therefore be preferred over the second since it has fewer FNs, a distinction that cannot be made from the accuracy score alone:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}.$$
Precision measures the proportion of images predicted to belong to a given class that actually belong to that class. Conversely, recall measures the proportion of all images of a given class that are correctly identified as belonging to it:
$$\mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}.$$
Precision and recall can be combined by taking their harmonic mean. Namely, we define the F1 score as
$$F_1 = \frac{2\,\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}.$$
Cohen’s kappa score is defined as follows:
$$\mathrm{Kappa} = \frac{\mathrm{Observed\ Accuracy} - \mathrm{Expected\ Accuracy}}{1 - \mathrm{Expected\ Accuracy}},$$
where
$$\mathrm{Observed\ Accuracy} = \frac{TP + TN}{TP + FP + TN + FN},$$
$$\mathrm{Expected\ Accuracy} = \frac{(TP + FP)(TP + FN) + (TN + FN)(TN + FP)}{(TP + FP + TN + FN)^{2}}.$$
It is noteworthy that the recall, precision, and F1 score range between 0 and 1, while the kappa score ranges between −1 and 1. A model performs better for a certain classification task if its recall, precision, and F1 score are higher.
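All of these measures are available in scikit-learn; the sketch below computes them from the true and predicted class labels of a test set. The weighted averaging choice for the multi-class setting is our assumption, not a detail stated in the paper.

```python
# Metric computation sketch with scikit-learn (y_true/y_pred are integer class labels).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, confusion_matrix)

def evaluate(y_true, y_pred):
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="weighted"),
        "recall":    recall_score(y_true, y_pred, average="weighted"),
        "f1":        f1_score(y_true, y_pred, average="weighted"),
        "kappa":     cohen_kappa_score(y_true, y_pred),
        "confusion": confusion_matrix(y_true, y_pred),
    }
```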

2.6. Convolutional Neural Network Models

A CNN architecture was employed in the study to categorize medical images, with a focus on breast cancer classification. Because the CNN models used in this work have a comparable design, their performance can be consistently evaluated and compared. Two key parts make up the overall architecture: a pre-trained base network and, on top of it, fully connected layers employed for the particular categorization task. The models all utilize distinct base networks, as mentioned below, but they all adhere to the same basic paradigm. The CNN models used in this work classify breast cancer images using pre-trained architectures with specially designed fully connected layers (Figure 2). Transfer learning cuts down on the time and resources needed to train highly accurate models, while batch normalization, dropout, and regularization approaches guarantee that the models generalize successfully [21].
The network’s basis makes use of proven, pre-trained CNN architectures including VGG19, ResNet, EfficientNet, and others. These models are excellent feature extractors for a variety of image classification tasks; they were initially trained on huge image datasets such as ImageNet. When the fully connected layers originally created for the ImageNet classification task are removed, only the convolutional layers remain in the model. From the input images, these layers extract rich, hierarchical features that are then passed to customized classification layers tailored to the particular job at hand.
Pre-trained models use feature representations learned from massive datasets like ImageNet to drastically reduce the training time and computational resources needed. Transfer learning is very effective for tasks where there is a lack of annotated data, like medical image classification, since it enables these features to be easily adapted to the particular task at hand. Custom layers are added to the pre-trained model to modify the network for the purpose of categorizing images of breast cancer into three categories: normal, malignant, and benign. In order to improve the generalization and avoid overfitting, the structure adheres to a widely used deep learning pattern in which the features retrieved by the trained model are flattened and passed through fully connected layers, after which batch normalization, activation, and dropout are applied [22]. Important elements in this area consist of the following:
  • In order to make the output of the base model’s convolutional layers ready for the dense layers, it is first converted into a one-dimensional vector. Each fully connected layer is then followed by batch normalization, which speeds up and stabilizes the training process [23].
  • The He uniform initializer, which is well suited to layers with ReLU activation, is used to initialize the layer weights and helps the network converge more quickly. Two dense layers, with 512 and 256 units respectively, are introduced, and the ReLU activation function is employed. A dropout layer with a 0.3 dropout rate is added in order to prevent overfitting by randomly deactivating a subset of the neurons during training [24].
  • A final dense layer with softmax activation assigns each image to one of the three groups [25]. The fully connected layers enable the network to recognize particular textures, shapes, or patterns that are important for diagnosing breast cancer, helping it make judgements based on the features extracted from the images.
  • During training, the weights of the pre-trained network’s convolutional layers are frozen and therefore kept constant. This method makes use of these models’ strong feature extraction capabilities without requiring the time or computing resources to retrain them on the new dataset. In addition to preventing overfitting, freezing the foundation layers is also helpful when dealing with very limited datasets, as is frequently the case in medical imaging.
The models are built using the Adam optimizer, whose adaptive learning rate makes it a good fit for deep learning. For multi-class classification problems, categorical cross-entropy is a suitable loss function. To optimize the model training further and avoid overfitting, early stopping and learning-rate reduction callbacks are used during training. To make sure the model does not overfit to the training set, the early stopping callback monitors the validation loss and stops the training if there is no progress after a predetermined number of epochs. When the loss reaches a plateau, the ReduceLROnPlateau callback lowers the learning rate, enabling the model to make more precise adjustments and arrive at a better minimum (Figure 3).
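The sketch below assembles this general architecture in Keras, using EfficientNetV2-B3 as one example of a frozen, pre-trained base. The 299 × 299 × 3 input assumes the grayscale mammograms are replicated to three channels, and the learning rate is illustrative; the layer sizes, dropout rate, initializer, loss, and optimizer follow the description above, but the code is a sketch rather than the authors' implementation.

```python
# Sketch of the classification head described above (TensorFlow / Keras).
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.EfficientNetV2B3(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
base.trainable = False                                  # freeze the convolutional layers

model = models.Sequential([
    base,
    layers.Flatten(),                                   # features -> one-dimensional vector
    layers.Dense(512, activation="relu", kernel_initializer="he_uniform"),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(256, activation="relu", kernel_initializer="he_uniform"),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(3, activation="softmax"),              # normal / benign / malignant
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```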

3. Results and Discussion

Several CNN architectures are assessed in this study on the task of categorizing mammographic images into the normal, malignant, and benign groups. Results are given for three sets of hyperparameters: a batch size equal to 128 and a number of epochs equal to 100 (Table 2), 200 (Table 3), and 400 (Table 4). EfficientNetV2 Small, EfficientNetV2-B3, and EfficientNetV2 Large achieve near-perfect accuracy, precision, recall, F1 scores, and kappa scores across all of the experiments, especially with 400 epochs, where EfficientNetV2-B7 achieved 100% accuracy. The EfficientNet models routinely outperformed the other architectures. However, at 200 and 400 epochs, models such as VGG-19, ResNet-50, and ResNet-152 also showed a good performance, with accuracies around 94% and kappa values above 93%. Architectures like DenseNet-201, XceptionNet, and NASNet meanwhile demonstrated a modest performance, with accuracies ranging from 75% to 89%. ConvNet consistently performed poorly, with accuracies below 60%, despite having a simpler structure. Overall, the findings show that the EfficientNet models dominate this challenge and that most of the architectures perform better when trained for extended periods of time (up to 400 epochs).
The CNN models used in this study are generally structured in a consistent and effective manner, with the goal of using pre-trained architectures for feature extraction and customized layers for classification. In order to capture the general image characteristics, each model starts with a pre-trained base, such as MobileNet, EfficientNet, or ResNet, which is initialized with weights from the ImageNet dataset. To preserve these learned characteristics and lower the computational costs, the base model’s layers are frozen, and its final fully connected layers are excluded. The base model’s multidimensional output is then transformed into a one-dimensional vector by applying a flattening layer. The network then learns abstract, higher-level features from the retrieved representations through a number of dense layers whose kernels are initialized as described above. The ReLU activation functions add non-linearity to improve the learning capacity, while batch normalization is used with each dense layer to stabilize and expedite the training process. Dropout, which randomly deactivates a subset of the neurons during training, is used to avoid overfitting. In the last layer, a softmax activation function intended for multi-class classification is used to generate the class probabilities. The accuracy, precision, recall, F1 score, and Cohen’s kappa score are used to assess the models’ performance after they are compiled using the Adam optimizer and trained using the categorical cross-entropy loss function. The models achieve a strong performance across all of the data due to this structured methodology, which maintains a balance between accurate classification and effective feature extraction.
We employed a generic model structure from the EfficientNet series (B3, B7, Small, and Large) which consisted of layers for classification after a pre-trained EfficientNet for feature extraction. Our dataset of mammograms was used to refine each model with an emphasis on normal, malignant, and benign classes. Since the models rapidly achieve high accuracy with a minimal validation loss, the training accuracy and loss curves for each model exhibit rapid convergence. With essentially similar accuracy and loss curves throughout the training and validation sets, EfficientNetV2 B3 and B7 in particular showed consistency throughout the training process. Despite having a more intricate design, EfficientNetV2 Large demonstrated a notable decrease in the training and validation loss while retaining good accuracy over time. The confusion matrices showed extremely accurate classifications into each of the three categories, demonstrating the exceptional accuracy of the EfficientNet models. In particular, EfficientNetV2 B3 and B7 demonstrated an outstanding classification performance, obtaining great accuracy in distinguishing between benign and malignant cases.
The outcomes offer a thorough understanding of how accurately InceptionNetV3, XceptionNet, ResNet-152, NASNet Large, the hybrid Inception-ResNetV2, and MobileNet-V3 Large can categorize mammograms into benign, malignant, and normal classes. The accuracy and loss curves and confusion matrices were used to assess these models. In the early epochs, the accuracy and loss curves for all models tended to converge rapidly, with the accuracy rising steadily and the loss falling drastically. Strong generalization was demonstrated by models such as ResNet-152 and MobileNet-V3 Large, which showed a consistent performance with training and validation accuracy curves that were closely matched. Although XceptionNet and Inception-ResNetV2 showed somewhat slower convergence, they were nevertheless able to achieve high final accuracy scores. These models’ loss curves stabilized at low levels, suggesting effective learning. Although it still performed well overall, NASNet Large had a somewhat wider disparity between its training and validation accuracy, indicating some overfitting. With few misclassifications, ResNet-152 and MobileNet-V3 Large demonstrated an exceptional classification accuracy, especially for benign and malignant instances. XceptionNet and the hybrid Inception-ResNetV2 both did well, although they made a few more errors when identifying malignant cases. InceptionNetV3 demonstrated low errors in categorizing benign, malignant, and normal instances, achieving a balanced precision across all categories. Compared to the other models, NASNet Large showed more misclassifications, especially for malignant and normal instances, even if it still performed adequately. The respective confusion matrices show that ResNet-152 and MobileNet-V3 Large perform well overall, whereas the other models perform similarly but have somewhat higher misclassification percentages.
The hardware used in this study consisted of NVIDIA Tesla V100 PCIe graphics cards for all computational tasks (4× Tesla GPUs, 2496 cores per GPU, 12 GiB of GDDR5 VRAM per GPU), a 2.3 GHz Xeon CPU (with 8 dual-thread cores), and 64 GiB of RAM.

4. Conclusions

This research provides a group of innovative DCNN-based approaches to the early identification and categorization of breast cancer. To boost the system’s performance, we utilized architectures made up of multiple DCNN designs together with a series of pre-processing steps and augmentation methods. The proposed DNN algorithms were trained to learn at multiple levels using multiple dense layers and dropout layers, which increased the precision of classification for early-stage cancers. Xception, DenseNet, MobileNet, NASNet Mobile, and EfficientNetV2-B3 were among the CNN models evaluated, showing how successful deep learning is for challenging, multi-class classification tasks. The models were tested with hyperparameters such as the batch size, learning rate, and dropout regularization optimized for performance.
Plots of the accuracy and loss (Figure 4 and Figure 5) and confusion matrices (Figure 6) were used to support the findings, which showed strong categorization efficiency with little overfitting, as the validation efficiency closely resembled the training results. DenseNet-201 and MobileNet-V3 Large also performed well; their capacity to generalize to fresh data was facilitated by dense layers and dropout regularization. The majority of the models exhibited consistent validation measures and steady loss curves, demonstrating how well the chosen architectures handled the complexity of the dataset. In particular, the EfficientNetV2-B3 and NASNet Mobile models achieved the greatest trade-off between accuracy and estimation precision, which makes them well suited for real-world applications where speed and accuracy are needed. The multi-model statistical measures and excellent findings in early diagnosis show the proposed study’s strength in identifying multi-class breast cancer. Additionally, selecting multiple augmentation rates per class improved the handling of imbalanced data and enabled more accurate detection at a large scale.
We also investigated the use of different deep learning models for the categorization of mammogram images into benign, malignant, and normal categories, considered as a second problem. To guarantee a balanced representation of the classes and a reliable model assessment, the dataset underwent comprehensive pre-processing that included normalization, data augmentation through image rotation, and separation into three sets. As seen from their individual accuracy/loss curves and confusion matrices, the B3 and B7 variants of the EfficientNetV2 models performed exceptionally well across all of the tests, attaining a nearly perfect classification accuracy and low errors. Because of their highly optimized architecture, which effectively balances depth, width, and resolution, these models performed better than the others. All of the models’ confusion matrices showed how well they differentiated between benign and malignant instances. With the training and validation accuracy closely matched, the accuracy/loss curves showed that the majority of the models achieved rapid convergence, guaranteeing generalization to unknown data. In conclusion, this study shows that these state-of-the-art CNN architectures performed well in the crucial task of classifying mammograms. These results also imply that the use of pre-trained models in transfer learning is a feasible method for medical imaging tasks, enabling excellent accuracy even with limited datasets. For the sake of completeness, we report known results gained using similar approaches considered state of the art in the literature (Table 5); these results refer to medical applications in general and not necessarily to mammographic analyses.
Our models performed extremely well, but they are not intended to take the place of radiologists. Rather, our methodology can benefit physicians by significantly lowering the number of adverse results, which is essential for accurate medical assessments. We endorse the use of these models to help expert medical specialists identify different types of cancers. This contribution’s entire workflow, which includes compiling real breast cancer images, building classification models, enlarging the data considerably through augmentation, and classifying multi-class images, can be applied to various medical image analyses, particularly for datasets that have inadequately labeled data or intraclass imbalances.
Although public-domain benchmark datasets were used in this study, bigger and more varied datasets could be analyzed in future studies to explore the application of the proposed technique to a wider range of demographics and types of breast abnormalities. This could assist in determining the approach’s robustness and generalization, as well as guaranteeing its efficacy in real-life healthcare settings. Future studies could also assess the transferability of the suggested approach to other datasets and unseen lesions. Evaluating the adaptability and performance of models trained on multiple datasets might shed light on their resilience and usefulness in a range of clinical contexts. We have not proposed an exact framework to support diagnoses, but general improvements to mammographic analyses could be pursued as future research directions. This would result in the development of more precise, effective, and easily available instruments to facilitate early identification and categorization, which would ultimately improve the outcomes for patients and lessen the incidence of breast cancer and other malignancies.

Author Contributions

Conceptualization, S.I.H. and E.T.; methodology, S.I.H. and E.T.; software, S.I.H. and E.T.; validation, S.I.H. and E.T.; data curation, S.I.H. and E.T.; writing—original draft preparation, S.I.H. and E.T.; writing—review and editing, S.I.H. and E.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted on original data from the downloadable public-domain Mammographic Image Analysis Society dataset [15].

Informed Consent Statement

Figure 1, Figure 2 and Figure 3 show examples from the downloadable public-domain Mammographic Image Analysis Society dataset [15].

Data Availability Statement

The original data presented in the study are openly available in the Mammographic Image Analysis Society dataset [15].

Acknowledgments

Elena Toscano is supported by the research fund of the University of Palermo: FFR 2024 Elena Toscano. Elena Toscano is a member of the “Gruppo Nazionale Calcolo Scientifico—Istituto Nazionale di Alta Matematica (GNCS-INdAM)”.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Anandhi, S.; Mahure, S.J.; Royal, T.Y.; Nikhil, V.V.; Viswanadh, U. Mammography scans for breast cancer detection using CNN. In Computer Science Engineering; CRC Press: Boca Raton, FL, USA, 2024; pp. 3–8. [Google Scholar]
  2. Aguerchi, K.; Jabrane, Y.; Habba, M.; El Hassani, A.H. A CNN Hyperparameters Optimization Based on Particle Swarm Optimization for Mammography Breast Cancer Classification. J. Imaging 2024, 10, 30. [Google Scholar] [CrossRef] [PubMed]
  3. Vijetha, K.J.; Priya, S.S.S. A Comparative Analysis of CNN Architectures and Regularization Techniques for Breast Cancer Classification in Mammograms. Ing. Syst. d’Inf. 2024, 29, 2433. [Google Scholar] [CrossRef]
  4. Wahed, M.A.; Alqaraleh, M.; Alzboon, M.S.; Al-Batah, M.S. Evaluating AI and Machine Learning Models in Breast Cancer Detection: A Review of Convolutional Neural Networks (CNN) and Global Research Trends. LatIA 2025, 3, 117. [Google Scholar] [CrossRef]
  5. Hussain, S.I.; Toscano, E. An extensive investigation into the use of machine learning tools and deep neural networks for the recognition of skin cancer: Challenges, future directions, and a comprehensive review. Symmetry 2024, 16, 366. [Google Scholar] [CrossRef]
  6. Şahin, Z.; Kalkan, Ö.; Aktaş, O. The physiology of laughter: Understanding laughter-related structures from brain lesions. J. Health Life 2022, 4, 242–251. [Google Scholar]
  7. Rautela, K.; Kumar, D.; Kumar, V. A systematic review on breast cancer detection using deep learning techniques. Arch. Comput. Methods Eng. 2022, 29, 4599–4629. [Google Scholar] [CrossRef]
  8. Liu, H.; Cui, G.; Luo, Y.; Guo, Y.; Zhao, L.; Wang, Y.; Subasi, A.; Dogan, S.; Tuncer, T. Artificial intelligence-based breast cancer diagnosis using ultrasound images and grid-based deep feature generator. Int. J. Gen. Med. 2022, 15, 2271–2282. [Google Scholar] [CrossRef]
  9. Huang, Z.; Shao, W.; Han, Z.; Alkashash, A.M.; De la Sancha, C.; Parwani, A.V.; Nitta, H.; Hou, Y.; Wang, T.; Salama, P.; et al. Artificial intelligence reveals features associated with breast cancer neoadjuvant chemotherapy responses from multi-stain histopathologic images. NPJ Precis. Oncol. 2023, 7, 14. [Google Scholar] [CrossRef]
  10. Marinovich, M.L.; Wylie, E.; Lotter, W.; Lund, H.; Waddell, A.; Madeley, C.; Pereira, G.; Houssami, N. Artificial intelligence (AI) for breast cancer screening: BreastScreen population-based cohort study of cancer detection. EBioMedicine 2023, 90, 104498. [Google Scholar] [CrossRef]
  11. Tan, T.Z.; Miow, Q.H.; Huang, R.Y.J.; Wong, M.K.; Ye, J.; Lau, J.A.; Wu, M.C.; Bin Abdul Hadi, L.H.; Soong, R.; Choolani, M.; et al. Functional genomics identifies five distinct molecular subtypes with clinical relevance and pathways for growth control in epithelial ovarian cancer. EMBO Mol. Med. 2013, 5, 1051–1066. [Google Scholar] [CrossRef]
  12. Charan, G.; Yuvaraj, R. Estimating the effectiveness of alexnet in classifying tumor in comparison with resnet. In Proceedings of the AIP Conference Proceedings, Contemporary Innovations in Engineering and Management, Nandyal, India, 22–23 April 2022; AIP Publishing: Melville, NY, USA, 2023; Volume 2821. [Google Scholar]
  13. Ismail, N.S.; Sovuthy, C. Breast cancer detection based on deep learning technique. In Proceedings of the 2019 International UNIMAS STEM 12th engineering conference (EnCon), Kuching, Malaysia, 28–29 August 2019; pp. 89–92. [Google Scholar]
  14. Ibrahim, M.; Yadav, S.; Ogunleye, F.; Zakalik, D. Male BRCA mutation carriers: Clinical characteristics and cancer spectrum. BMC Cancer 2018, 18, 179. [Google Scholar] [CrossRef] [PubMed]
  15. Charan, S.; Khan, M.J.; Khurshid, K. Breast cancer detection in mammograms using convolutional neural network. In Proceedings of the 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan, 3–4 March 2018; pp. 1–5. [Google Scholar]
  16. Ebrahim, M.; Alsmirat, M.; Al-Ayyoub, M. Performance study of augmentation techniques for hep2 cnn classification. In Proceedings of the 2018 9th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 3–5 April 2018; pp. 163–168. [Google Scholar]
  17. O’Gara, S.; McGuinness, K. Comparing data augmentation strategies for deep image classification. In Proceedings of the Irish Machine Vision & Image Processing Conference (IMVIP), Dublin, Ireland, 28–30 August 2019; Technological University Dublin: Dublin, Ireland, 2019. [Google Scholar]
  18. Thakur, A.; Gupta, M.; Sinha, D.K.; Mishra, K.K.; Venkatesan, V.K.; Guluwadi, S. Transformative breast Cancer diagnosis using CNNs with optimized ReduceLROnPlateau and Early stopping Enhancements. Int. J. Comput. Intell. Syst. 2024, 17, 14. [Google Scholar]
  19. Chanda, D.; Onim, M.S.H.; Nyeem, H.; Ovi, T.B.; Naba, S.S. DCENSnet: A new deep convolutional ensemble network for skin cancer classification. Biomed. Signal Process. Control 2024, 89, 105757. [Google Scholar] [CrossRef]
  20. Rao, Y.; Lee, Y.; Jarjoura, D.; Ruppert, A.S.; Liu, C.g.; Hsu, J.C.; Hagan, J.P. A comparison of normalization techniques for microRNA microarray data. Stat. Appl. Genet. Mol. Biol. 2008, 7, 22. [Google Scholar] [CrossRef]
  21. Kumar, R.; Corvisieri, G.; Fici, T.; Hussain, S.; Tegolo, D.; Valenti, C. Transfer Learning for Facial Expression Recognition. Information 2025, 16, 320. [Google Scholar] [CrossRef]
  22. Al-Kababji, A.; Bensaali, F.; Dakua, S.P. Scheduling techniques for liver segmentation: Reducelronplateau vs onecyclelr. In Proceedings of the Second International Conference on Intelligent Systems and Pattern Recognition, Hammamet, Tunisia, 24–26 March 2022; Springer: Cham, Switzerland, 2022; Volume 1589, pp. 204–212. [Google Scholar]
  23. Chen, G.; Chen, P.; Shi, Y.; Hsieh, C.Y.; Liao, B.; Zhang, S. Rethinking the usage of batch normalization and dropout in the training of deep neural networks. arXiv 2019, arXiv:1905.05928. [Google Scholar]
  24. Huang, R.; Wu, H. Skin cancer severity analysis and prediction framework based on deep learning. In Proceedings of the 2024 3rd International Conference on Artificial Intelligence and Intelligent Information Processing, Tianjin China, 25–27 October 2024; pp. 192–198. [Google Scholar]
  25. Joseph, A.A.; Abdullahi, M.; Junaidu, S.B.; Ibrahim, H.H.; Chiroma, H. Improved multi-classification of breast cancer histopathological images using handcrafted features and deep neural network (dense layer). Intell. Syst. Appl. 2022, 14, 200066. [Google Scholar] [CrossRef]
  26. Chaturvedi, S.; Gupta, K.; Prasad, P. Skin Lesion Analyser: An Efficient Seven-Way Multi-class Skin Cancer Classification Using MobileNet. In Proceedings of the Advanced Machine Learning Technologies and Applications, Jaipur, India, 13–15 February 2020; Springer: Singapore, 2020; Volume 1141. [Google Scholar]
  27. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. Lect. Notes Comput. Sci. 2016, 9908, 630–645. [Google Scholar]
  28. Huang, H.; Hsu, B.; Lee, C.; Tseng, V. Development of a light-weight deep learning model for cloud applications and remote diagnosis of skin cancers. J. Dermatol. 2021, 48, 310–316. [Google Scholar] [CrossRef]
  29. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K. Densely Connected Convolutional Networks. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  30. El-Nouby, A.; Touvron, H.; Caron, M.; Bojanowski, P.; Douze, M.; Joulin, A.; Laptev, I.; Neverova, N.; Synnaeve, G.; Verbeek, J. XCiT: Cross-Covariance Image Transformers. arXiv 2021, arXiv:2106.09681. [Google Scholar]
  31. Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  32. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning. Scientific Research, Long Beach, CA, USA, 9–15 June 2019. [Google Scholar]
  33. Pacal, I.; Ozdemir, B.; Zeynalov, J.; Gasimov, H.; Pacal, N. A novel CNN-ViT-based deep learning model for early skin cancer diagnosis. Biomed. Signal Process. Control 2025, 104, 107627. [Google Scholar] [CrossRef]
  34. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  35. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobileNetV3. In Proceedings of the International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  36. Touvron, H.; Bojanowski, P.; Caron, M.; Cord, M.; El-Nouby, A.; Grave, E.; Izacard, G.; Joulin, A.; Synnaeve, G.; Verbeek, J.; et al. ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 5314–5321. [Google Scholar] [CrossRef] [PubMed]
  37. Yu, W.; Zhou, P.; Yan, S.; Wang, X. InceptionNeXt: When Inception Meets ConvNeXt. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–22 June 2024; pp. 5672–5683. [Google Scholar]
  38. Tang, Y.; Han, K.; Guo, J.; Xu, C.; Xu, C.; Wang, Y. GhostNetV2: Enhance Cheap Operation with Long-Range Attention. Adv. Neural Inf. Process. Syst. 2022, 35, 9969–9982. [Google Scholar]
  39. Shahin, A.; Kamal, A.; Elattar, M. Deep Ensemble Learning for Skin Lesion Classification from Dermoscopic Images. In Proceedings of the 9th Cairo International Biomedical Engineering Conference, Cairo, Egypt, 20–22 December 2018. [Google Scholar]
  40. Carcagnì, P.; Leo, M.; Cuna, A.; Mazzeo, P.; Spagnolo, P.; Celeste, G.; Distante, C. Classification of Skin Lesions by Combining Multilevel Learnings in a DenseNet Architecture. In Proceedings of the 20th International Conference, Image Analysis and Processing, Trento, Italy, 9–13 September 2019; Springer: Cham, Switzerland, 2019; Volume 11751. [Google Scholar]
  41. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. PVT v2: Improved baselines with Pyramid Vision Transformer. Comp. Vis. Media 2022, 8, 415–424. [Google Scholar] [CrossRef]
  42. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Volume 139, pp. 10347–10357. [Google Scholar]
  43. Han, D.; Yun, S.; Heo, B.; Yoo, Y. Rethinking Channel Dimensions for Efficient Model Design. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
  44. Wang, A.; Chen, H.; Lin, Z.; Han, J.; Ding, G. RepViT: Revisiting Mobile CNN From ViT Perspective. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024. [Google Scholar]
  45. Wu, K.; Zhang, J.; Peng, H.; Liu, M.; Xiao, B.; Fu, J.; Yuan, L. TinyViT: Fast Pretraining Distillation for Small Vision Transformers. In Proceedings of the 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; Volume 13681. [Google Scholar]
  46. Hangbo, B.; Li, D.; Songhao, P.; Furu, W. BEiT: BERT Pre-Training of Image Transformers. arXiv 2021, arXiv:2106.08254. [Google Scholar]
  47. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021. [Google Scholar]
  48. Hatamizadeh, A.; Yin, H.; Heinrich, G.; Kautz, J.; Molchanov, P. Global Context Vision Transformers. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
  49. Chaturvedi, S.; Tembhurne, J.; Diwan, T. A multi-class skin Cancer classification using deep convolutional neural networks. Multimed. Tools Appl. 2020, 79, 28477–28498. [Google Scholar] [CrossRef]
  50. Ozdemir, B.; Pacal, I. A robust deep learning framework for multiclass skin cancer classification. Sci. Rep. 2025, 15, 4938. [Google Scholar] [CrossRef]
  51. Alsunaidi, S.; Almuhaideb, A.; Ibrahim, N. Applications of Big Data Analytics to Control COVID-19 Pandemic. Sensors 2021, 21, 2282. [Google Scholar] [CrossRef]
  52. Aladhadh, S.; Alsanea, M.; Aloraini, M.; Khan, T.; Habib, S.; Islam, M. An Effective Skin Cancer Classification Mechanism via Medical Vision Transformer. Sensors 2022, 22, 4008. [Google Scholar] [CrossRef]
  53. Thwin, S.; Hyun-Seok Park, H.; Seo, S. A Trustworthy Framework for Skin Cancer Detection Using a CNN with a Modified Attention Mechanism. Appl. Sci. 2025, 15, 1067. [Google Scholar] [CrossRef]
Figure 1. Example images from the MIAS dataset [15].
Figure 2. CNN flow-chart.
Figure 3. Basic neural network structure.
Figure 4. Accuracy and loss of EfficientNetV2-B3, EfficientNetV2-B7, EfficientNetV2 Small, EfficientNetV2 Large, and InceptionNetV3 (the number of epochs is 100, and the batch size is 128). Here, the reader is referred to the electronic version of this article for the correct interpretation of color.
Figure 5. Accuracy and loss of XceptionNet, ResNet152, NASNet Large, Hybrid InceptionNet-Resnet V2, and MobileNet-V3 Large (the number of epochs is 100, and the batch size is 128).
Figure 6. Confusion matrices for all models considered.
Table 1. Data splitting with original and augmented datasets.

Class | Original Images | Augmented Samples | Train (70%) | Val (15%) | Test (15%)
Normal | 209 | 12,540 | 8760 | 1890 | 1890
Benign | 62 | 3720 | 2580 | 570 | 570
Malignant | 51 | 3060 | 2160 | 450 | 450
Total | 322 | 19,320 | 13,500 | 2910 | 2910
Table 2. Performance on the augmented datasets (the number of epochs is 100, and the batch size is 128).

CNN Models | Accuracy | Precision | Recall | F1 Score | Kappa Score
VGG-19 | 0.9557 | 0.9557 | 0.9557 | 0.9554 | 0.9332
MobileNet-V3 Large | 0.9215 | 0.9221 | 0.9215 | 0.9216 | 0.8814
DenseNet-201 | 0.7545 | 0.7557 | 0.7545 | 0.7548 | 0.6290
XceptionNet | 0.7887 | 0.7886 | 0.7887 | 0.7884 | 0.6806
MobileNet-V2 | 0.8028 | 0.8034 | 0.8028 | 0.8029 | 0.7022
DenseNet-121 | 0.8672 | 0.8698 | 0.8672 | 0.8668 | 0.7989
DenseNet-169 | 0.8692 | 0.8706 | 0.8692 | 0.8690 | 0.8019
ResNet-50 | 0.9416 | 0.9427 | 0.9416 | 0.9415 | 0.9118
ResNet-101 | 0.9135 | 0.9147 | 0.9135 | 0.9137 | 0.8691
ResNet-152 | 0.9427 | 0.9477 | 0.9477 | 0.9477 | 0.9211
EfficientNetV2-B3 | 0.9970 | 0.9970 | 0.9970 | 0.9997 | 0.9954
NASNet | 0.7903 | 0.7904 | 0.7903 | 0.7903 | 0.6832
EfficientNetV2 Large | 0.9774 | 0.9774 | 0.9774 | 0.9774 | 0.9658
ConvNet | 0.5445 | 0.5817 | 0.5445 | 0.5201 | 0.2923
EfficientNetV2 Small | 1.000 | 0.9991 | 0.9990 | 0.9974 | 0.9989
EfficientNetV2-B7 | 0.9985 | 0.9985 | 0.9985 | 0.9985 | 0.9977
Inception-ResNetV2 | 0.8612 | 0.8647 | 0.8612 | 0.8616 | 0.7901
ResNet101V2 | 0.7692 | 0.7696 | 0.7692 | 0.7688 | 0.6504
NASNet Large | 0.8778 | 0.8785 | 0.8778 | 0.8780 | 0.8155
InceptionNet V3 | 0.8643 | 0.8646 | 0.8643 | 0.8644 | 0.7946
Table 3. Performance on the augmented datasets (the number of epochs is 200, and the batch size is 128).

CNN Models | Accuracy | Precision | Recall | F1 Score | Kappa Score
VGG-19 | 0.9437 | 0.9439 | 0.9437 | 0.9436 | 0.9147
MobileNet-V3 Large | 0.9416 | 0.943 | 0.9416 | 0.9417 | 0.9116
DenseNet-201 | 0.7686 | 0.7694 | 0.7686 | 0.7683 | 0.6491
XceptionNet | 0.8451 | 0.8450 | 0.8451 | 0.8448 | 0.7653
MobileNet-V2 | 0.8390 | 0.8398 | 0.8390 | 0.8391 | 0.756
DenseNet-121 | 0.9115 | 0.9124 | 0.9115 | 0.9117 | 0.8661
DenseNet-169 | 0.8531 | 0.8550 | 0.8531 | 0.8532 | 0.7769
ResNet-50 | 0.9557 | 0.9560 | 0.9557 | 0.9557 | 0.9329
ResNet-101 | 0.9316 | 0.9321 | 0.9316 | 0.9315 | 0.8962
ResNet-152 | 0.9457 | 0.9458 | 0.9457 | 0.9457 | 0.9178
EfficientNetV2-B3 | 0.9970 | 0.9970 | 0.9970 | 0.9998 | 0.9956
NASNet | 0.7949 | 0.7967 | 0.7949 | 0.7954 | 0.6919
EfficientNetV2 Large | 0.9955 | 0.9955 | 0.9955 | 0.9955 | 0.9932
ConvNet | 0.5716 | 0.5845 | 0.5716 | 0.5617 | 0.3492
EfficientNetV2 Small | 0.9970 | 0.9997 | 0.9997 | 0.9997 | 0.9955
EfficientNetV2-B7 | 0.9955 | 0.9955 | 0.9955 | 0.9955 | 0.9932
Inception-ResNetV2 | 0.8959 | 0.8960 | 0.8959 | 0.8959 | 0.8434
ResNet101V2 | 0.8009 | 0.8025 | 0.8009 | 0.8010 | 0.7002
NASNet Large | 0.8748 | 0.8749 | 0.8748 | 0.8746 | 0.8116
InceptionNet V3 | 0.8974 | 0.8991 | 0.8974 | 0.8977 | 0.8459
Table 4. Performance on the augmented datasets (the number of epochs is 400, and the batch size is 128).

CNN Models | Accuracy | Precision | Recall | F1 Score | Kappa Score
VGG-19 | 0.8567 | 0.8571 | 0.8567 | 0.8568 | 0.7848
MobileNet-V3 Large | 0.9020 | 0.9020 | 0.9020 | 0.9018 | 0.8526
DenseNet-201 | 0.8944 | 0.8945 | 0.8944 | 0.8944 | 0.8412
XceptionNet | 0.9578 | 0.9579 | 0.9578 | 0.9577 | 0.9365
MobileNet-V2 | 0.9457 | 0.9462 | 0.9457 | 0.9455 | 0.9183
DenseNet-121 | 0.9668 | 0.9670 | 0.9668 | 0.9668 | 0.9501
DenseNet-169 | 0.9502 | 0.9503 | 0.9502 | 0.9502 | 0.9252
ResNet-50 | 0.9744 | 0.9744 | 0.9744 | 0.9743 | 0.9615
ResNet-101 | 0.9985 | 0.9985 | 0.9985 | 0.9985 | 0.9977
ResNet-152 | 0.9834 | 0.9934 | 0.9834 | 0.9834 | 0.9751
EfficientNetV2-B3 | 1.0000 | 0.9993 | 0.9997 | 0.9998 | 0.9994
NASNet | 0.8115 | 0.8115 | 0.8113 | 0.8115 | 0.7139
EfficientNetV2 Large | 0.9894 | 0.9895 | 0.9894 | 0.9895 | 0.9840
ConvNet | 0.5445 | 0.5684 | 0.5445 | 0.5271 | 0.2897
EfficientNetV2 Small | 1.0000 | 0.9998 | 1.0000 | 0.9998 | 0.9995
EfficientNetV2-B7 | 1.0000 | 0.9995 | 0.9995 | 0.9995 | 0.9991
Inception-ResNetV2 | 0.9351 | 0.9351 | 0.9351 | 0.9350 | 0.9015
ResNet101V2 | 0.7783 | 0.7785 | 0.7783 | 0.7781 | 0.6637
NASNet Large | 0.8658 | 0.8659 | 0.8658 | 0.8657 | 0.7963
InceptionNet V3 | 0.9005 | 0.9005 | 0.9005 | 0.9003 | 0.8487
Table 5. Comparison against state-of-the-art approaches. Results sorted by accuracy (a dash indicates a metric not reported for that approach).

Approach | Accuracy | Precision | Recall | F1 Score | Kappa Score
MobileNet [26] | 0.8310 | 0.8900 | 0.8300 | 0.8300 | –
ResNetv250 [27] | 0.8493 | 0.7809 | 0.7406 | 0.7571 | –
Deep learning models [28] | 0.8580 | 0.7518 | – | – | –
DenseNet121 [29] | 0.8635 | 0.8126 | 0.7798 | 0.7932 | –
XCiT-Small-Patch16 [30] | 0.8785 | 0.8390 | 0.7921 | 0.8123 | –
Res2NeXt50 [31] | 0.8798 | 0.8401 | 0.8144 | 0.8255 | –
EfficientNet-B4 [32] | 0.8827 | 0.8210 | 0.8035 | 0.8110 | –
EfficientNetv2-small [33] | 0.8858 | 0.8580 | 0.8148 | 0.8324 | –
Xception [34] | 0.8858 | 0.8677 | 0.8094 | 0.8345 | –
MobileNetv3-large-075 [35] | 0.8877 | 0.8442 | 0.8077 | 0.8249 | –
ResMLP-24 [36] | 0.8885 | 0.8786 | 0.8171 | 0.8449 | –
InceptionNeXt-base [37] | 0.8929 | 0.8616 | 0.8308 | 0.8444 | –
GhostNetv2-100 [38] | 0.8982 | 0.8615 | 0.8337 | 0.8440 | –
ResNet-50 + Inception V3 [39] | 0.8990 | 0.8620 | 0.7960 | – | –
DenseNet + SVM [40] | 0.9000 | 0.8800 | 0.7600 | 0.8200 | –
PvTv2-B2 [41] | 0.9029 | 0.8754 | 0.8399 | 0.8532 | –
DeiT-base [42] | 0.9034 | 0.8887 | 0.8359 | 0.8588 | –
RexNet200 [43] | 0.9040 | 0.8785 | 0.8596 | 0.8677 | –
RepViT-m2 [44] | 0.9061 | 0.8792 | 0.8669 | 0.8713 | –
Tiny-ViT-21 [45] | 0.9082 | 0.8740 | 0.8724 | 0.8720 | –
BeiTv2-base [46] | 0.9090 | 0.8775 | 0.8731 | 0.8741 | –
PiT-base [44] | 0.9092 | 0.8952 | 0.8456 | 0.8675 | –
Swinv-base [47] | 0.9179 | 0.9049 | 0.8757 | 0.8893 | –
GcViT-small [48] | 0.9213 | 0.9127 | 0.8742 | 0.8913 | –
ResNeXt-101 [49] | 0.9320 | 0.8800 | 0.8800 | – | –
ConvNeXtV2 + ViT [50] | 0.9348 | 0.9324 | 0.9070 | 0.9182 | –
Deep learning models [51] | 0.9580 | 0.9222 | 0.8420 | 0.8803 | –
MVT + MLP [52] | 0.9614 | 0.9600 | 0.9650 | 0.9700 | –
DCAN-Net [53] | 0.9757 | 0.9700 | 0.9757 | 0.9710 | –
Proposed methodology | 1.0000 | 0.9998 | 1.0000 | 0.9998 | 0.9995