Figure 1.
Workflow of the proposed methodology for breast cancer detection, comprising (1) data analysis, including image collection from the CancerImaging and CBIS-DDSM datasets, preprocessing, and data augmentation; (2) feature engineering, integrating radiomics and deep learning features into a multimodal feature space; and (3) transfer learning using models such as ResNet50, ResNet152V2, and DenseNet121, followed by visualization and internal validation.
Figure 2.
Mammogram images with highlighted regions showing the segmented ROIs.
Figure 3.
ResNet152 with radiomics and custom input for breast cancer classification.
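In code, the Figure 3 design corresponds to a two-branch network that fuses ResNet152 image embeddings with the radiomics feature vector. The sketch below is a minimal Keras rendering of that idea; the radiomics dimension (100), the dense-layer widths, and the L2 coefficient are illustrative assumptions, not the paper's exact settings.

```python
# Hypothetical sketch of the Figure 3 architecture: a ResNet152 image branch
# fused with a radiomics vector. The radiomics dimension (100) and layer
# widths are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models, applications

def build_fusion_model(img_shape=(224, 224, 3), n_radiomics=100):
    # Image branch: ImageNet-pretrained ResNet152 as a feature extractor.
    base = applications.ResNet152(include_top=False, weights="imagenet",
                                  input_shape=img_shape, pooling="avg")
    img_in = layers.Input(shape=img_shape, name="mammogram")
    deep_feats = base(img_in)

    # Radiomics branch: a small dense projection of the handcrafted features.
    rad_in = layers.Input(shape=(n_radiomics,), name="radiomics")
    rad_feats = layers.Dense(64, activation="relu")(rad_in)

    # Fuse both modalities and classify benign vs. malignant.
    fused = layers.Concatenate()([deep_feats, rad_feats])
    x = layers.Dense(128, activation="relu",
                     kernel_regularizer=tf.keras.regularizers.l2(1e-4))(fused)
    out = layers.Dense(1, activation="sigmoid", name="malignant_prob")(x)
    return models.Model([img_in, rad_in], out)

model = build_fusion_model()
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="binary_crossentropy", metrics=["accuracy"])
```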
Figure 4.
Comprehensive feature selection analysis: (a) Performance Metrics Comparison (Top 10 Methods); (b) Average F1 Score by Method Category, comparing F1 scores across categories; (c) Performance vs. Feature Count (Color = Stability Score), illustrating the relationship between the number of features and performance; (d) Top 15 Most Efficient Methods, ranked by efficiency score; (e) Stability vs. Performance Trade-off, plotting stability score against F1 score to show the trade-off relationship; (f) Multi-metric Comparison, comparing accuracy, precision, recall, F1 score, and stability score across the top five methods.
Figure 5.
Training and validation loss (top) and accuracy (bottom) of ResNet152 over 40 epochs. Loss decreases sharply before stabilizing, while accuracy rises quickly and remains high, indicating effective learning and generalization.
Figure 6.
Confusion matrix for ResNet152 showing classification performance with minimal misclassifications and high accuracy.
Figure 7.
Comparison of model accuracy and loss across transfer learning (VGG, MobileNet, and InceptionV3) architectures.
Figure 8.
Comparison of model accuracy and loss across transfer learning (DenseNet) architectures.
Figure 9.
Comparison of model accuracy and loss across transfer learning (ResNet50, ResNet101, and ResNet152) architectures.
Figure 10.
Comparison of model accuracy and loss across transfer learning (ResNet50V2, ResNet101V2, and ResNet152V2) architectures.
Figure 11.
Comparison of confusion matrices across transfer learning architectures (VGG16, VGG19, MobileNet, InceptionV3, ResNet50, and ResNet101).
Figure 12.
Comparison of confusion matrices across transfer learning architectures (ResNet152; ResNet50V2, ResNet101V2, and ResNet152V2; DenseNet121, DenseNet169, and DenseNet201).
Figure 13.
The ROC curves highlight the performance of the transfer learning models, with ResNet152, VGG19, and DenseNet169 showing the strongest balance between true-positive and false-positive rates.
Figure 14.
The precision–recall curves illustrate the trade-off between precision and recall, with ResNet152 and VGG19 maintaining high precision across a wide recall range, outperforming models like MobileNet and InceptionV3.
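Both curve families are computed from the same per-image malignancy scores. A minimal scikit-learn sketch, with `y_true` and `y_score` as stand-ins for the test labels and model outputs:

```python
# Sketch of computing the ROC and precision-recall curves of Figures 13-14;
# y_true / y_score are placeholders for test labels and predicted
# malignancy probabilities, not the paper's actual data.
import numpy as np
from sklearn.metrics import roc_curve, auc, precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                           # stand-in labels
y_score = np.clip(y_true * 0.6 + rng.random(200) * 0.5, 0, 1)   # stand-in scores

fpr, tpr, _ = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)                                         # area under ROC
prec, reca, _ = precision_recall_curve(y_true, y_score)
print(f"AUC = {roc_auc:.3f}")
```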
Figure 15.
Bootstrap accuracy distributions across models. The box plot shows the distribution of accuracies obtained through bootstrapping, allowing comparison of model performance variability.
Figure 16.
Mean ranks across bootstrap samples. A simplified critical difference diagram illustrates the average model ranks based on the Friedman test, allowing visual comparison of performance differences.
Table 1.
Description of the dataset distribution, including the count of original and augmented images for each class (benign and malignant).
| Labels | Original Images | Augmented Images |
|---|---|---|
| Benign | 1930 | 8498 |
| Malignant | 1354 | 8498 |
| Total | 3284 | 16,996 |
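Augmentation balanced the two classes to 8498 images each. A minimal Keras sketch of such a pipeline follows; the specific transforms (rotations, flips, zoom) are assumptions, since the table reports only the resulting counts, and the `data/` directory layout is hypothetical.

```python
# Minimal augmentation sketch consistent with Table 1's balanced classes.
# The chosen transforms are assumptions, not the paper's reported settings.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,        # small rotations preserve lesion geometry
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.1,
    fill_mode="nearest",
)

# flow_from_directory expects class subfolders, e.g. data/benign, data/malignant.
gen = augmenter.flow_from_directory("data/", target_size=(224, 224),
                                    batch_size=32, class_mode="binary")
```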
Table 2.
Summary of feature selection methods and the number of selected features.
| Method | Number of Selected Features |
|---|---|
| RFE (Random Forest) | 10, 20, 50, 100 |
| RFE (Logistic Regression) | 10, 20, 50, 100 |
| RFECV (Random Forest) | 74 |
| RFECV (Logistic Regression) | 647 |
| SelectKBest (ANOVA) | 10, 20, 50, 100 |
| LASSO (LassoCV) | 90 |
| LASSO (LassoCV) | 157 |
| XGBoost GPU | 50, 100, 200 |
| LightGBM GPU | 50, 100, 200 |
| CatBoost GPU | 50, 100, 200 |
| Mutual Information | 50, 100, 200 |
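Most of these selectors are available in scikit-learn. The sketch below exercises three of them on placeholder data sized to match the 1040 radiomics features summed in Table 4; the estimator choices and step size are illustrative.

```python
# Sketch of three selectors from Table 2 with scikit-learn; X and y are
# placeholders for the radiomics feature matrix and labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LassoCV

X, y = make_classification(n_samples=300, n_features=1040, random_state=0)

rfe = RFE(RandomForestClassifier(random_state=0),
          n_features_to_select=50, step=50).fit(X, y)
kbest = SelectKBest(f_classif, k=100).fit(X, y)
lasso = LassoCV(cv=5, random_state=0).fit(X, y)   # nonzero coefficients = kept

print(rfe.support_.sum(), kbest.get_support().sum(), (lasso.coef_ != 0).sum())
```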
Table 3.
Architectural and training configuration of the pre-trained deep learning models used in the experimental evaluation. The table summarizes key specifications, including input dimensions, kernel sizes, architectural details, fine-tuned layers, optimizer, learning rate (LR), batch size, and regularization strategy (L2 penalty), which are common across all models. All models were fine-tuned using the Adam optimizer with a batch size of 32.
| Model | Input Size | Base Kernel Sizes | Model Architecture | Fine-Tuned Layers | Optimizer | LR | Batch Size | Regularization |
|---|---|---|---|---|---|---|---|---|
| VGG16 | 224 × 224 × 3 | 3 × 3 conv, 2 × 2 pool | 13 conv + 3 FC (standard VGG) | Top 30% | Adam | – | 32 | L2 |
| VGG19 | 224 × 224 × 3 | 3 × 3 conv, 2 × 2 pool | 16 conv + 3 FC (standard VGG) | Top 30% | Adam | – | 32 | L2 |
| ResNet50 | 224 × 224 × 3 | 7 × 7, 3 × 3, 1 × 1 | 50 layers; bottleneck blocks | Top 30% | Adam | – | 32 | L2 |
| ResNet101 | 224 × 224 × 3 | 7 × 7, 3 × 3, 1 × 1 | 101 layers; bottleneck blocks | Top 30% | Adam | – | 32 | L2 |
| ResNet152 | 224 × 224 × 3 | 7 × 7, 3 × 3, 1 × 1 | 152 layers; bottleneck blocks | Top 30% | Adam | – | 32 | L2 |
| ResNet50V2 | 224 × 224 × 3 | 7 × 7, 3 × 3, 1 × 1 | Improved pre-activation | Top 30% | Adam | – | 32 | L2 |
| ResNet101V2 | 224 × 224 × 3 | 7 × 7, 3 × 3, 1 × 1 | Improved pre-activation | Top 30% | Adam | – | 32 | L2 |
| ResNet152V2 | 224 × 224 × 3 | 7 × 7, 3 × 3, 1 × 1 | Improved pre-activation | Top 30% | Adam | – | 32 | L2 |
| DenseNet121 | 224 × 224 × 3 | 7 × 7, 3 × 3, 1 × 1 | 121 layers; dense connections | Top 30% | Adam | – | 32 | L2 |
| DenseNet169 | 224 × 224 × 3 | 7 × 7, 3 × 3, 1 × 1 | 169 layers; dense connections | Top 30% | Adam | – | 32 | L2 |
| DenseNet201 | 224 × 224 × 3 | 7 × 7, 3 × 3, 1 × 1 | 201 layers; dense connections | Top 30% | Adam | – | 32 | L2 |
| MobileNet | 224 × 224 × 3 | 3 × 3 depthwise, 1 × 1 pointwise | Lightweight separable conv | Top 30% | Adam | – | 32 | L2 |
| InceptionV3 | 299 × 299 × 3 | 1 × 1, 3 × 3, 5 × 5 | Inception modules, factorized | Top 30% | Adam | – | 32 | L2 |
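The "Top 30%" fine-tuning scheme can be sketched in Keras by freezing the first 70% of the backbone's layers and training the rest together with a new head. The learning rate and L2 coefficient below (both 1e-4) are assumed placeholders rather than the paper's reported values.

```python
# Sketch of Table 3's fine-tuning scheme: freeze the bottom 70% of a
# pretrained backbone, train the top 30% plus a new classification head.
# The LR (1e-4) and L2 coefficient (1e-4) are assumed values.
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers, applications

base = applications.DenseNet121(include_top=False, weights="imagenet",
                                input_shape=(224, 224, 3), pooling="avg")

cutoff = int(len(base.layers) * 0.7)      # bottom 70% stays frozen
for layer in base.layers[:cutoff]:
    layer.trainable = False
for layer in base.layers[cutoff:]:
    layer.trainable = True

model = models.Sequential([
    base,
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
```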
Table 4.
Distribution of radiomics features by category.
| Category | Number of Features |
|---|---|
| GLCM | 264 |
| GLSZM | 176 |
| GLRLM | 176 |
| First-order | 198 |
| GLDM | 154 |
| NGTDM | 55 |
| Shape | 9 |
| Diagnostics | 5 |
| Other | 3 |
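These feature families map onto PyRadiomics feature classes. A minimal extraction sketch, with `image.nii` and `mask.nii` as placeholder paths for a mammogram and its ROI mask:

```python
# Sketch of extracting the Table 4 feature families with PyRadiomics.
# image.nii / mask.nii are placeholder paths, and force2D is assumed here
# since mammogram ROIs are planar.
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor(force2D=True)
extractor.disableAllFeatures()
# Enable the classes listed in Table 4 (diagnostics entries are always emitted).
for cls in ["firstorder", "glcm", "glszm", "glrlm", "gldm", "ngtdm", "shape2D"]:
    extractor.enableFeatureClassByName(cls)

features = extractor.execute("image.nii", "mask.nii")
print(len(features), "values extracted, including diagnostics entries")
```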
Table 5.
Comprehensive comparison of radiomics feature selection methods [Ac = Accuracy, Prec = Precision, Reca = Recall, F1s = F1 Score, St = Stability].
| Method | Features | Ac (%) | Prec (%) | Reca (%) | F1s (%) | St |
|---|---|---|---|---|---|---|
| RFE (RF) | 100 | 83.70 | 86.20 | 80.00 | 83.00 | 0.485 |
| RFECV (RF) | 74 | 83.10 | 85.40 | 79.50 | 82.30 | 0.353 |
| RFE (RF) | 20 | 82.90 | 84.90 | 79.70 | 82.20 | 0.331 |
| RFE (RF) | 50 | 82.80 | 84.70 | 79.70 | 82.10 | 0.403 |
| RFE (RF) | 10 | 82.70 | 85.20 | 78.70 | 81.80 | 0.346 |
| RFE (LR) | 50 | 82.50 | 84.60 | 79.10 | 81.70 | 0.239 |
| CatBoost GPU | 50 | 82.40 | 85.50 | 78.80 | 82.10 | 0.313 |
| SelectKBest (ANOVA) | 100 | 82.40 | 85.60 | 77.50 | 81.40 | 0.766 |
| SelectKBest (ANOVA) | 20 | 82.20 | 83.50 | 79.80 | 81.60 | 0.897 |
| LightGBM GPU | 100 | 82.20 | 84.60 | 79.40 | 81.90 | 0.348 |
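The table does not give the exact stability formula; a common choice, assumed here, is the mean pairwise Jaccard similarity between the feature subsets selected across resamples:

```python
# Sketch of a selection-stability score as mean pairwise Jaccard similarity
# across resamples; this definition is an assumption, not the paper's stated
# formula.
from itertools import combinations

def stability(subsets):
    """subsets: list of sets of selected feature names/indices."""
    pairs = list(combinations(subsets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

runs = [{1, 2, 3, 4}, {2, 3, 4, 5}, {1, 3, 4, 6}]
print(round(stability(runs), 3))   # 0.511 for these toy subsets
```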
Table 6.
Analysis of the performance characteristics of transfer learning models [Ac = Accuracy, Prec = Precision, Reca = Recall, Spec = Specificity, F1s = F1 Score, AUC = Area Under Curve, Ep = Epochs].
| Model | Ac (%) | Prec (%) | Reca (%) | Spec (%) | F1s (%) | AUC (%) | Ep |
|---|---|---|---|---|---|---|---|
| DenseNet169 | 95.0 | 94.0 | 94.0 | 96.0 | 95.0 | 98.80 | 50 |
| DenseNet201 | 94.0 | 93.0 | 94.0 | 96.0 | 94.0 | 98.90 | 50 |
| DenseNet121 | 93.0 | 93.0 | 94.0 | 95.0 | 94.0 | 98.60 | 50 |
| ResNet152 | 97.0 | 97.0 | 98.0 | 96.0 | 97.0 | 99.30 | 40 |
| ResNet101 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 99.10 | 48 |
| ResNet50 | 94.0 | 95.0 | 94.0 | 96.0 | 94.0 | 98.80 | 50 |
| ResNet50V2 | 93.0 | 93.0 | 93.0 | 93.0 | 93.0 | 98.50 | 50 |
| ResNet101V2 | 96.0 | 96.0 | 96.0 | 94.0 | 96.0 | 99.20 | 45 |
| ResNet152V2 | 94.0 | 94.0 | 94.0 | 93.0 | 94.0 | 98.70 | 50 |
| InceptionV3 | 89.0 | 89.0 | 88.0 | 91.0 | 90.0 | 97.50 | 50 |
| MobileNet | 88.0 | 87.0 | 86.0 | 89.0 | 88.0 | 97.00 | 50 |
| VGG19 | 96.0 | 96.0 | 96.0 | 96.0 | 96.0 | 99.00 | 50 |
| VGG16 | 94.0 | 94.0 | 94.0 | 94.0 | 94.0 | 98.50 | 29 |
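All of these metrics except specificity come directly from scikit-learn; specificity is derived from the confusion matrix, since scikit-learn has no dedicated function for it. A sketch on placeholder predictions:

```python
# Sketch of computing Table 6's metrics; y_true / y_pred / y_score are
# placeholders, not the paper's test data.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_score = np.array([0.1, 0.2, 0.9, 0.8, 0.4, 0.3, 0.7, 0.6])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)       # true-negative rate
print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), specificity,
      f1_score(y_true, y_pred), roc_auc_score(y_true, y_score))
```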
Table 7.
Model comparison results with 95% bootstrap confidence intervals and mean ranks [Ac = Accuracy, Prec = Precision, Reca = Recall, F1s = F1 Score].
| Model | Ac (%) | Prec (%) | Reca (%) | F1s (%) | Mean Rank |
|---|---|---|---|---|---|
| VGG16 | 96 [94, 97] | 96 [94, 97] | 96 [94, 97] | 96 [94, 97] | 3.5 |
| ResNet152 | 96 [94, 97] | 96 [95, 97] | 96 [95, 97] | 96 [94, 97] | 3.5 |
| ResNet101 | 96 [94, 97] | 96 [94, 97] | 96 [94, 97] | 96 [94, 97] | 3.5 |
| VGG19 | 95 [94, 97] | 95 [94, 97] | 95 [94, 97] | 95 [94, 97] | 6.0 |
| ResNet50 | 95 [94, 97] | 95 [94, 97] | 95 [94, 97] | 95 [94, 97] | 6.0 |
| DenseNet169 | 95 [93, 96] | 95 [93, 96] | 95 [93, 96] | 95 [93, 96] | 6.5 |
| DenseNet201 | 95 [94, 97] | 95 [94, 97] | 95 [94, 97] | 95 [94, 97] | 6.0 |
| ResNet101V2 | 95 [93, 96] | 95 [93, 96] | 95 [93, 96] | 95 [93, 96] | 6.5 |
| ResNet50V2 | 94 [92, 95] | 94 [92, 95] | 94 [92, 95] | 94 [92, 95] | 8.5 |
| DenseNet121 | 94 [92, 95] | 94 [92, 95] | 94 [92, 95] | 94 [92, 95] | 8.5 |
| ResNet152V2 | 93 [92, 95] | 93 [92, 95] | 93 [92, 95] | 93 [92, 95] | 10.0 |
| InceptionV3 | 91 [89, 93] | 91 [89, 93] | 91 [89, 93] | 91 [89, 93] | 11.5 |
| MobileNet | 87 [85, 89] | 87 [84, 89] | 87 [84, 89] | 87 [85, 89] | 13.0 |
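A sketch of the bootstrap confidence intervals and Friedman mean ranks behind this table and Figures 15 and 16, using placeholder per-sample outcomes and an assumed 1000 resamples:

```python
# Sketch of percentile bootstrap CIs and Friedman mean ranks. `correct`
# holds placeholder per-sample correctness for each model; the test-set
# size and resample count are assumptions.
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

rng = np.random.default_rng(0)
n_test, n_boot = 500, 1000
correct = {m: rng.random(n_test) < p            # stand-in per-sample outcomes
           for m, p in [("ResNet152", 0.96), ("VGG16", 0.96),
                        ("MobileNet", 0.87)]}

idx = rng.integers(0, n_test, size=(n_boot, n_test))
acc = {m: c[idx].mean(axis=1) for m, c in correct.items()}  # bootstrap accuracies

for m, a in acc.items():
    lo, hi = np.percentile(a, [2.5, 97.5])      # 95% percentile interval
    print(f"{m}: {a.mean():.3f} [{lo:.3f}, {hi:.3f}]")

stat, p = friedmanchisquare(*acc.values())      # omnibus rank test
ranks = rankdata(-np.column_stack(list(acc.values())), axis=1).mean(axis=0)
print("Friedman p =", p, "mean ranks =", dict(zip(acc, ranks)))
```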
Table 8.
Per-class metrics for the malignant class [Prec = Precision, Reca = Recall, F1s = F1 Score].
| Model | Prec (%) | Reca (%) | F1s (%) |
|---|---|---|---|
| VGG16 | 29.3 | 95.9 | 44.9 |
| ResNet152 | 29.3 | 95.3 | 44.8 |
Table 9.
Comparison of performance with recent studies using deep learning and transfer learning [Ac = Accuracy].
| Author(s) | Methods | Ac (%) |
|---|---|---|
| Yu et al. [28] | ResNet34 | 72 |
| | ResNet50 | 82 |
| | VGG16 | 71 |
| Gao et al. [31] | ResNet | 82 |
| Wei et al. [29] | ResNet50 | 72 |
| | ResNet101 | 76 |
| | VGG19 | 83 |
| | InceptionV3 | 72 |
| Sharmin et al. [30] | ResNet50V2 | 95 |
| Yang et al. [32] | 3D ResNet | 74 |
| Alexandru et al. [19] | DenseNet121 | 99.6 |
| Francesca et al. [21] | ResNet50 | 60 |
| Wang et al. [22] | CNN | 96.4 |
| Our Study | ResNet50 | 94 |
| | ResNet50V2 | 93 |
| | ResNet101 | 96 |
| | ResNet101V2 | 96 |
| | ResNet152 | 97 |
| | ResNet152V2 | 94 |
| | VGG19 | 96 |
| | VGG16 | 94 |
| | InceptionV3 | 89 |