Article

Breast Cancer Image Classification Using Phase Features and Deep Ensemble Models

by Edgar Omar Molina Molina and Victor H. Diaz-Ramirez *
Instituto Politécnico Nacional, CITEDI, Av. Instituto Politécnico Nacional 1310, Nueva Tijuana, Tijuana 22435, Mexico
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(14), 7879; https://doi.org/10.3390/app15147879
Submission received: 20 May 2025 / Revised: 5 July 2025 / Accepted: 10 July 2025 / Published: 15 July 2025
(This article belongs to the Special Issue Object Detection and Image Processing Based on Computer Vision)

Abstract

Breast cancer is a leading cause of mortality among women worldwide. Early detection is crucial for increasing patient survival rates. Artificial intelligence, particularly convolutional neural networks (CNNs), has enabled the development of effective diagnostic systems by digitally processing mammograms. CNNs have been widely used for the classification of breast cancer in images, obtaining accurate results similar in many cases to those of medical specialists. This work presents a hybrid feature extraction approach for breast cancer detection that employs variants of the EfficientNetV2 network and a convenient image representation based on phase features. First, a region of interest (ROI) is extracted from the mammogram. Next, a three-channel image is created using the local phase, amplitude, and orientation features of the ROI. A feature vector is constructed for the processed mammogram using the developed CNN model. The size of the feature vector is reduced using simple statistics, achieving a redundancy suppression of 99.65%. The reduced feature vector is classified as either malignant or benign using a classifier ensemble. Experimental results using a training/testing ratio of 70/30 on 15,506 mammography images from three datasets produced an accuracy of 86.28%, a precision of 78.75%, a recall of 86.14%, and an F1-score of 80.09% with the modified EfficientNetV2 model and stacking classifier. However, an accuracy of 93.47%, a precision of 87.61%, a recall of 93.19%, and an F1-score of 90.32% were obtained using only CSAW-M dataset images.

1. Introduction

Breast cancer detection and treatment are major priorities in both basic and clinical medicine, as breast cancer is one of the leading causes of mortality among women worldwide [1]. Furthermore, in recent years, the mortality rate has increased by 130% in females and 648% in males [2]. Approximately 2.3 million women are diagnosed with breast cancer each year, making it the most frequently diagnosed cancer among women worldwide [3,4]. Several studies indicate that the survival rate for this disease is higher when it is detected in its initial stages [5]. Thus, the need for auxiliary tools motivates the development of image processing and pattern recognition techniques for the detection of breast cancer. The extraction of useful features from medical images and the use of artificial intelligence techniques for pattern classification have been intensively investigated for this purpose. Texture-based features such as the fractal dimension [6], the gray-level co-occurrence matrix (GLCM) [7], and local binary patterns (LBPs) [8] have been considered for this problem. These features are often extracted from a region of interest (ROI), which is typically the portion of the image where the potential cancer lesion is located. The techniques used to classify these features are usually based on conventional machine learning (ML) methods, considering both binary and multiclass classification [9,10,11,12,13,14,15].
Due to the need for automated systems for the detection and classification of cancer in mammograms, different artificial intelligence strategies have been explored. CNNs, which can be used for feature extraction, classification, and image segmentation, have emerged as an attractive strategy for mammography image classification. Several existing CNN-based methods have achieved effectiveness similar to that of expert radiologists under controlled conditions. The effectiveness of a CNN model, however, depends strongly on the availability of a large number of annotated training images and adequate data balancing. These requirements are not always feasible, so data augmentation is commonly employed to reduce model overfitting [16,17,18,19,20]. However, augmented data can misrepresent real mammography images, potentially leading to poor generalization of extracted features to clinical data.
Recently, schemes that combine multiple CNN architectures with machine learning strategies have been proposed for breast cancer classification [21,22,23]. This strategy uses various CNN models as feature extractors, whose outputs are then processed by different classifiers. This hybrid approach exploits the advantages of CNN-based features while benefiting from the effectiveness of traditional ML algorithms for both binary and multiclass classification.
Various hybrid systems combine convolutional neural networks (CNNs) with recurrent neural network (RNN) modules, such as bidirectional long short-term memory (BLSTM) layers, for feature extraction and classification [24]. Other methods integrate CNN architectures with Transformer encoders and specialized convolutional modules for binary classification [25]. An approach that combines stereoscopic attention mechanisms with Transformer-based feature fusion has also been explored to enhance lesion localization and classification in automated breast ultrasound (ABUS) images [26]. Additional strategies apply bio-inspired optimization algorithms [27] or combine CNN-derived features with texture-based descriptors using ensemble classifiers [28].
It is noteworthy that several existing methods based on CNN or hybrid-CNN have achieved high performance in breast cancer detection. However, these methods typically require data augmentation and specialized procedures for parameter optimization. These requirements increase computational costs and implementation complexity, additionally increasing the risk of overfitting and sensitivity to initial values.
This work addresses the challenge of effectively extracting discriminative features for breast cancer detection from mammograms using lightweight and computationally efficient learning models, without relying on data augmentation or large training datasets. This challenge is particularly relevant in clinical scenarios where data availability is limited and computational resources are constrained. Initially, the mammograms are preprocessed to remove annotations and suppress noise. Afterwards, an ROI containing the potential cancer lesion is extracted from each mammogram. Each extracted ROI is then transformed into a three-channel image consisting of the local phase, local orientation, and phase congruency features in each channel, respectively. These components are obtained with the help of the first-order Riesz transform. Next, a data vector of 1280 elements is obtained from the ROI, employing variants of EfficientNetV2 trained on an image dataset composed of the Mammographic Image Analysis Society (MIAS), Digital Database for Screening Mammography (DDSM), and An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer (CSAW-M) datasets. Afterwards, a compact five-element feature vector is obtained by computing different statistics from the resultant 1280-element data vector. Finally, four classifier ensembles (voting, stacking, boosting, and bagging) are considered for breast cancer classification.
The main contributions of this work are as follows: (1) a simple preprocessing strategy is introduced to eliminate undesired artifacts and enhance the quality of regions of interest (ROIs) in mammography images; (2) a novel image representation method based on phase information is proposed, transforming ROIs into three-channel images that capture important structural details that are not easily distinguishable by conventional techniques; (3) an efficient CNN-based feature extractor is developed by adapting EfficientNetV2 variants for well-known mammography datasets; and (4) an effective approach is suggested to generate compact feature vectors using basic statistical techniques, thereby reducing overfitting and simplifying classification tasks. These contributions allowed the proposed method to achieve high classification performance without relying on data augmentation or extensive hyperparameter tuning, reducing complexity and contributing to the feasibility of practical implementation in clinical settings.
The paper is organized as follows. Section 2 presents a review of existing methods for breast cancer detection in mammograms. Section 3 explains in detail the proposed methodology. Section 4 presents the results obtained with the proposed methodology for breast cancer detection in terms of objective measures. These results are discussed and compared with respect to existing state-of-the-art methods. Finally, Section 5 presents the conclusions.

2. Literature Review

Breast cancer classification in mammograms has been addressed using a wide range of computational methods. These methods can be grouped into three main categories: classical machine learning methods, convolutional neural network-based methods, and hybrid strategies that combine elements of both. This section reviews recent advances in these three categories, focusing on their methodologies, datasets, and reported performance.

2.1. Machine Learning-Based Methods

Classical machine learning methods have been extensively investigated for breast cancer classification in digital mammograms due to their good results. Table 1 summarizes recent ML-based methods considering variations in feature extraction, dimensionality reduction, and classifiers.
One method employs cross-diagonal texture matrix (CDTM) and Haralick features from non-overlapping ROI blocks in mammograms. Feature optimization is performed using kernel principal component analysis (KPCA) and the grasshopper optimization algorithm (GOA). This method reports classification accuracies of 97.49% for MIAS [9,29] and 92.61% for DDSM [30].
Another approach utilizes the fast discrete curvelet transform with wrapping (FDCT-WRP) for feature extraction. Dimensionality reduction is performed using principal component analysis (PCA) and linear discriminant analysis (LDA). Classification is carried out using an extreme learning machine (ELM) optimized by particle swarm optimization (PSO). The results report accuracies of 100%, 98.94%, and 98.76% on the MIAS, DDSM, and INbreast datasets, respectively [10].
A different method applies the discrete Chebyshev transform (DCT) combined with the contrast-limited adaptive histogram equalization (CLAHE). Feature selection is carried out using KPCA and differential evolution (DE) [11]. This approach reports near 100% precision on the MIAS dataset and 91.13% on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset.
Feature extraction based on the crow search algorithm (CSA) and Harris Hawks optimization (HHO) has also been explored. The extracted features are classified using artificial neural networks (ANNs) and support vector machines (SVMs), achieving 97.85% accuracy on the DDSM dataset with a 70/30 train–test split [12,31].
In addition, a proposed computer-aided diagnosis (CAD) system designed for contrast-enhanced spectral mammography (CESM) utilizes morphological and texture features. These features are reduced using correlation-based feature selection (CFS) and classified by SVM with a sigmoid kernel, reporting an accuracy of 96.87%, sensitivity of 97.23%, specificity of 95.47%, and an area under curve (AUC) of 0.98 [13].
A multi-ROI method based on k-means clustering and bidimensional empirical mode decomposition (BEMD) extracts texture features using the gray level co-occurrence matrix (GLCM) and gray level run length matrix (GLRLM). Classification is performed using SVM with a radial basis function (RBF) kernel, achieving accuracies of 98.04%, 98.26%, and 98.62% on the MIAS, DDSM, and INbreast datasets, respectively [14].
In summary, classical ML methods demonstrate good performance on various mammography datasets, performing feature extraction combined with traditional classifiers. However, their dependence on manual ROI selection, ad hoc feature extraction methods, and dataset-specific tuning may limit scalability and generalization in broader clinical applications.

2.2. Convolutional Neural Network-Based Methods

Convolutional neural networks have demonstrated significant potential in breast cancer classification due to their ability to automatically learn hierarchical image features, achieve robust performance, and reduce the need for handcrafted feature extraction. Table 2 summarizes recent CNN-based methods applied to breast cancer classification.
A CAD system based on the You Only Look Once (YOLO) framework was proposed for ROI extraction, using feedforward CNN, ResNet50, and InceptionResNet-V2 architectures for classification [16]. This system, trained on augmented DDSM and INbreast datasets with a 70/20/10 split, reported accuracies of 97.50% and 95.32%, with corresponding AUCs of 0.9750 and 0.9501, respectively.
A CNN model with three convolutional layers, four max-pooling layers, and three fully connected layers was proposed for binary classification [32]. Using ROIs extracted through Otsu’s binarization and contrast-limited adaptive histogram equalization (CLAHE), the model was evaluated on DDSM, MIAS, and INbreast datasets, reporting accuracies of 91.20%, 95.30%, and 96.52%, respectively.
A recent study compared classification performance with and without data augmentation, considering AlexNet, DenseNet, and ShuffleNet models [33]. Performance evaluation on the INbreast dataset showed that DenseNet achieved an accuracy of 99.72% in binary classification, while ShuffleNet reached 97.84% accuracy in multi-density classification, using an 80/20 train–test split.
A model based on the multi-scale attention-guided network (MSANet) was proposed to improve feature extraction [34]. This model was evaluated on the DDSM and INbreast datasets, reporting AUC values of 0.942 and 0.9285, respectively.
A transfer learning-based model combining ECA-Net50 and ResNet50, and employing focal loss (FL) to address class imbalance, was proposed for binary classification using the INbreast dataset. This model achieved an accuracy of 92.9% and an AUC of 0.960 [17].
Another method incorporates phase information from mammograms using the Riesz transform and evaluates multiple CNN models, including VGG16, InceptionV2, and ResNet50 [35]. The highest reported accuracy of 82.2% was achieved using the ResNet50 model on the combined MIAS, DDSM, INbreast, and CSAW-M datasets.
A CAD system based on AlexNet, VGG-16, and VGG-19 was proposed and trained on the combined datasets of MIAS, DDSM, and INbreast. The system achieves an overall accuracy of 92.27% on the combined datasets and accuracies of 95.95%, 96.53%, and 96.53% on the MIAS, INbreast, and DDSM datasets, respectively, using an 80/20 train–test split [18].
Another approach replaced the fully connected layers of EfficientNet models with mobile inverted bottleneck convolution (MBConv) layers. This model was designed to classify craniocaudal (CC) and mediolateral oblique (MLO) mammography views. It achieved 93.44% accuracy on the CBIS-DDSM dataset using data augmentation [36].
Also, a ResNet18-based model using function-preserving transformations was designed and tested on the CBIS-DDSM dataset, achieving an AUC of 83.13% with an 85/15 train–test split [20].
Finally, the multitask information bottleneck network (MIB-Net), based on VGG-16 and UNet, was proposed for simultaneous segmentation and classification of mammography and ultrasound images [19]. Using the Local Enhanced Set (LES), Denoised Enhanced Set (DES), and Breast Ultrasound Image (BUSI) datasets, the model achieved an accuracy of 91.28% with a training/validation/testing split of 80/10/10.
Overall, CNN-based methods have shown superior performance compared with traditional machine learning approaches. However, their effectiveness depends on factors such as data quality, network architecture, and dataset size and balance.

2.3. Hybrid Methods Based on CNN

Hybrid methods that combine CNN feature extraction with machine learning classifiers have gained attention for breast cancer classification. This approach aims to retain the strengths of both deep and traditional learning techniques, improving interpretability, generalization, and performance. Table 3 summarizes recent hybrid methods applied to breast cancer detection in mammography.
One approach integrates CNN-based feature extraction with SVM classification, using variants of AlexNet, GoogleNet, and ResNet [21]. This system achieved 97.9% accuracy on DDSM and 95.4% on MIAS using an 80/20 train–test split and 5-fold cross-validation.
Another method applies the modified entropy whale optimization algorithm (MEWOA) to optimize features extracted from MobileNetV2 and NasNet Mobile [22]. This method reports accuracies of 99.7% on INbreast, 99.8% on MIAS, and 93.8% on DDSM with 10-fold cross-validation.
An ensemble approach combining SVM, random forest, and sigmoid classifiers with CNN-based feature extraction reports 99.4% accuracy on MIAS and 98.5% on BCDR [23].
For multiclass classification, a CNN-BiLSTM hybrid model achieved 98.56% accuracy on MIAS and 92.26% on INbreast, demonstrating the benefit of combining convolutional and recurrent layers [37].
A multiview classification model combined ResNeXt with Transformer encoders and multiplex convolutions reports 90.57% accuracy and 94.86% AUC on a private dataset [25].
Another hybrid model used ResNet18 for feature extraction, combined with feature reduction using PSO, the dragonfly optimization algorithm (DFOA), and the crow search optimization algorithm (CSOA) and classification with weighted KNN, achieving accuracies of 84.35%, 83.19%, and 97.36% on the MIAS, INbreast, and WDBC datasets, respectively [27].
Finally, a system combining texture features and an ensemble of classifiers with CNN refinement achieved 93.5% accuracy on MIAS and 91.45% on DDSM [28]. Table 4 provides a comparative overview of the advantages and limitations of machine learning, deep learning, and hybrid methods for mammography-based breast cancer classification.

3. Proposed Methodology

This section presents the proposed methodology for feature extraction and breast cancer detection in mammograms. Section 3.1 describes the datasets used for training and testing. Section 3.2 and Section 3.3 detail the proposed mammogram preprocessing and phase image feature extraction methods. Finally, Section 3.4 and Section 3.5 present the convolutional neural network-based model used for feature extraction and classification using variants of the EfficientNetV2 architecture. A schematic diagram of the proposed method for breast cancer detection is shown in Figure 1.

3.1. Datasets

The proposed methodology is evaluated using three publicly available digital mammography datasets: MIAS, mini-DDSM, and CSAW-M.
The Mammographic Image Analysis Society (MIAS) database [29] contains 322 digitized film mammograms with a resolution of 1024 × 1024 pixels. Each image is annotated with breast density, lesion type (if present), and severity (normal, benign, or malignant). Lesion center coordinates and approximate radii are also provided. This dataset is widely used for early-stage algorithm validation due to its manageable size and well-defined labels.
The mini-DDSM dataset [38] is a curated subset of the DDSM (Digital Database for Screening Mammography) dataset, composed of 9684 mammograms with improved contrast and reduced noise. Images are labeled as normal, benign, or malignant. The dataset includes breast density information and lesion contours when applicable. The mini-DDSM provides a more diverse and balanced representation across classes compared with MIAS.
The CSAW-Mammography dataset [39] includes 10,020 mammograms collected from routine clinical screenings in Sweden. Images are captured using full-field digital mammography and labeled as either malignant or normal based on biopsy-confirmed diagnoses. The dataset covers a wide range of patient ages and breast densities, providing realistic clinical variability. No lesion localization is provided, making it suitable for weakly supervised classification tasks.
In total, 20,026 images are considered for the development and evaluation of the proposed methodology. It is worth mentioning that the CSAW-M dataset comprises a significantly larger number of images (10,020) compared with MIAS and mini-DDSM, which could introduce a bias in the combined dataset. To mitigate this effect in the binary classification task (malignant vs. benign), we ensured that class balance was preserved during training and evaluation. Specifically, the positive and negative class proportions were maintained consistently across all datasets.

3.2. Mammography Preprocessing

An essential stage for a successful mammogram classification is the appropriate extraction of ROIs, as illustrated in Figure 2. For this purpose, an input mammogram is cropped by 5% with respect to its full size, as shown in Figure 2b. To suppress undesired artifacts and reduce noise, the mammogram is transformed as follows:
$$\tilde{f}(x,y)=\begin{cases} f(x,y), & \text{if } 0.25\,f_{\max}\le f(x,y)\le 0.75\,f_{\max},\\ 0, & \text{otherwise}, \end{cases} \qquad (1)$$
where $f_{\max}$ is the maximum intensity value of $f(x,y)$, as shown in Figure 2c. Afterwards, the CLAHE algorithm [40] is applied for contrast enhancement, as shown in Figure 2d. Finally, the center of mass of the whole image is computed, as shown in Figure 2e, obtaining an ROI per image, as shown in Figure 2f.
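A minimal sketch of this preprocessing pipeline in Python with OpenCV is shown below. The function name extract_roi is illustrative, the 5% crop is interpreted as a border crop, and the CLAHE clip limit and tile size are assumptions, since the text does not fix them.

```python
import cv2
import numpy as np

def extract_roi(mammogram, roi_size=256):
    """Preprocess an 8-bit grayscale mammogram and extract a centered ROI,
    following the pipeline of Figure 2."""
    # Crop 5% from each border to discard annotations near the edges
    h, w = mammogram.shape
    dy, dx = int(0.05 * h), int(0.05 * w)
    img = mammogram[dy:h - dy, dx:w - dx]

    # Intensity filtering, Equation (1): keep the [0.25*fmax, 0.75*fmax] band
    fmax = float(img.max())
    band = (img >= 0.25 * fmax) & (img <= 0.75 * fmax)
    img = np.where(band, img, 0).astype(np.uint8)

    # Contrast enhancement with CLAHE (clip limit and tile size are assumed)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    img = clahe.apply(img)

    # Center of mass of the enhanced image
    m = cv2.moments(img)
    cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])

    # ROI window centered at the center of mass, clamped to the image bounds
    half = roi_size // 2
    y0 = min(max(cy - half, 0), img.shape[0] - roi_size)
    x0 = min(max(cx - half, 0), img.shape[1] - roi_size)
    return img[y0:y0 + roi_size, x0:x0 + roi_size]
```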

3.3. Extraction of Phase Image Features

The process of highlighting relevant details in mammography images is an important step for effective feature extraction and subsequent classification. Morrone et al. [41,42,43] proposed the local energy model, which obtains the image phase spectrum by considering image structures with maximum amplitude values in the Fourier transform. By employing local energy models, distinctive features such as lines, edges, and shadows are enhanced. Felsberg [44] proposed the scale-space monogenic signal, using the Poisson kernel as an alternative to the Gaussian kernel for linear scale-space analysis, followed by the 2D Riesz transform. This transformation is useful for enhancing fine details in images, such as delicate tissue, which is particularly relevant for mammography image classification.
Let $f(x,y)$ be a 2D grayscale image with a Fourier transform given by $F(u,v)=\mathcal{F}\{f(x,y)\}$. The monogenic signal in the scale-space representation is given by [35,44]
$$F_{M_s}(u,v)=F_{bp}(u,v)+i\,\mathbf{H}\cdot F_{bp}(u,v), \qquad (2)$$
where
$$F_{bp}(u,v)=P_{bp}(u,v)\,F(u,v) \qquad (3)$$
is the filtered spectrum of $f(x,y)$ using a Poisson band-pass filter, given by
$$P_{bp}(u,v)=e^{-2\pi S_0 \lambda^{k}\sqrt{u^2+v^2}}-e^{-2\pi S_0 \lambda^{k-1}\sqrt{u^2+v^2}}, \qquad (4)$$
where $S_0$ is the coarsest scale, $\lambda\in(0,1)$ is the relative bandwidth, and $k\in\mathbb{N}$ is the index of the filter used. In our experiments, we set $\lambda=0.45$, as recommended in [44]. Additionally, $\mathbf{H}=\left[H_1(u,v),\,H_2(u,v)\right]$ is the frequency response based on the first-order Riesz transform [35], given by
$$H_1(u,v)=\frac{i\,u}{\sqrt{u^2+v^2}},\qquad H_2(u,v)=\frac{i\,v}{\sqrt{u^2+v^2}}. \qquad (5)$$
Next, from the scale-space monogenic signal $F_{M_s}(u,v)$, the following components are computed:
$$f_{bp_1}(x,y)=\mathcal{F}^{-1}\{H_1(u,v)\cdot F_{bp}(u,v)\},\quad f_{bp_2}(x,y)=\mathcal{F}^{-1}\{H_2(u,v)\cdot F_{bp}(u,v)\},\quad f_{bp}(x,y)=\mathcal{F}^{-1}\{F_{bp}(u,v)\}, \qquad (6)$$
where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform. Thus, the local phase can be computed as
$$\varphi(x,y)=\tan^{-1}\!\left(\frac{\sqrt{f_{bp_1}(x,y)^2+f_{bp_2}(x,y)^2}}{f_{bp}(x,y)}\right), \qquad (7)$$
whereas the local orientation can be obtained by
$$\theta(x,y)=\tan^{-1}\!\left(\frac{f_{bp_2}(x,y)}{f_{bp_1}(x,y)}\right). \qquad (8)$$
A useful measure of the significance of an extracted feature is the phase congruency ($PC$), which produces a value close to unity for highly significant features and a value close to zero for less significant features. According to [45], the $PC$ can be computed by
$$PC(x)=\frac{\sum_n W(x)\,\lfloor A_n(x)\cos(\Phi_n(x))-T\rfloor}{\sum_n A_n(x)+\epsilon}, \qquad (9)$$
where $W(x)$ is the sigmoidal weighting function of the frequency spread, $A_n(x)$ is the amplitude of the $n$-th Fourier element, $\Phi_n(x)=\max_{\bar{\varphi}\in[0,2\pi]}\{\varphi_n(x)-\bar{\varphi}(x)\}$, $\varphi_n(x)$ is the local phase of the $n$-th Fourier element, $\bar{\varphi}(x)$ is the amplitude-weighted mean local phase angle of all Fourier elements at a given point, $\epsilon$ is a constant to avoid division by zero, and $T$ is a threshold value. Figure 3a shows an example of an extracted ROI from a mammography image of cancer. The local phase of the ROI is shown in Figure 3b, and its local orientation and phase congruency are shown in Figure 3c and Figure 3d, respectively. It can be seen that highly detailed information of the mammogram tissue is enhanced by the transformation (Figure 3b–d) compared with the original ROI in Figure 3a.
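To make the transformation concrete, the following NumPy sketch computes the local phase and local orientation channels from Equations (2)–(8). The value λ = 0.45 follows the text, whereas the function name phase_features and the defaults s0 = 2.0 and k = 1 are illustrative assumptions; in practice, the phase congruency channel of Equation (9) can be obtained with a library such as Phasepack (used in Section 4.1).

```python
import numpy as np

def phase_features(roi, s0=2.0, lam=0.45, k=1):
    """Local phase and local orientation from the scale-space monogenic
    signal of a grayscale ROI (Equations (2)-(8))."""
    rows, cols = roi.shape
    U, V = np.meshgrid(np.fft.fftfreq(cols), np.fft.fftfreq(rows))
    radius = np.sqrt(U**2 + V**2)

    # Poisson band-pass filter, Equation (4); it vanishes at the DC term
    P_bp = (np.exp(-2 * np.pi * s0 * lam**k * radius)
            - np.exp(-2 * np.pi * s0 * lam**(k - 1) * radius))

    # First-order Riesz transform frequency responses, Equation (5)
    safe = np.where(radius == 0, 1.0, radius)  # avoid 0/0 at the origin
    H1, H2 = 1j * U / safe, 1j * V / safe

    F_bp = P_bp * np.fft.fft2(roi)  # Equation (3)

    f_bp = np.real(np.fft.ifft2(F_bp))        # even (band-pass) part
    f_bp1 = np.real(np.fft.ifft2(H1 * F_bp))  # odd parts, Equation (6)
    f_bp2 = np.real(np.fft.ifft2(H2 * F_bp))

    odd = np.sqrt(f_bp1**2 + f_bp2**2)
    local_phase = np.arctan2(odd, f_bp)           # Equation (7)
    local_orientation = np.arctan2(f_bp2, f_bp1)  # Equation (8)
    return local_phase, local_orientation
```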

3.4. EfficientNetV2 Models

Training a convolutional neural network (CNN) model typically requires a large number of images to reduce overfitting and improve accuracy. Transfer learning enables the reuse of pre-trained CNN models on extensive datasets for smaller datasets, either through fine-tuning or feature extraction [46]. In this work, EfficientNetV2 [47] is utilized for feature extraction due to its advanced architecture and efficiency.
EfficientNetV2 is an improved version of EfficientNet [48] that incorporates a combination of mobile inverted bottleneck convolution (MBConv) and fused MBConv layers in its structure, as illustrated in Figure 4. MBConv is a building block that enhances model efficiency by combining depthwise separable convolutions and inverted residuals, which help reduce computational cost while maintaining high accuracy. These enhancements allow EfficientNetV2 models to train faster and be more lightweight compared with other state-of-the-art models [47]. EfficientNetV2 models are pre-trained on the extensive ImageNet-21K dataset [49], which provides a robust reference for various computer vision tasks. A key difference between conventional EfficientNet and EfficientNetV2 is the incorporation of Fused-MBConv layers in the initial stages of the model. Specifically, EfficientNetV2 uses these layers in the first three stages, achieving a trade-off between training speed and network effectiveness [50]. Fused-MBConv layers combine the efficiency of standard convolutions with the benefits of depthwise separable convolutions, improving both computational efficiency and model performance.
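To illustrate the structural difference between the two block types, the following Keras sketch implements simplified versions of both; it omits the squeeze-and-excitation stage and the exact layer sizes of the official architecture, so it should be read as a schematic rather than the actual EfficientNetV2 blocks.

```python
import tensorflow as tf
from tensorflow.keras import layers

def mbconv_block(x, out_channels, expand_ratio=4, stride=1):
    """MBConv: 1x1 expansion, 3x3 depthwise conv, 1x1 projection,
    with an inverted residual connection when shapes match."""
    in_channels = x.shape[-1]
    h = layers.Conv2D(in_channels * expand_ratio, 1, use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)
    h = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)
    h = layers.Conv2D(out_channels, 1, use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([h, x])  # inverted residual
    return h

def fused_mbconv_block(x, out_channels, expand_ratio=4, stride=1):
    """Fused-MBConv: the expansion and depthwise convolutions are fused
    into a single regular 3x3 convolution."""
    in_channels = x.shape[-1]
    h = layers.Conv2D(in_channels * expand_ratio, 3, strides=stride,
                      padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)
    h = layers.Conv2D(out_channels, 1, use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([h, x])
    return h
```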
There are several variants of EfficientNetV2. These variants are designed to adapt to various applications, ranging from resource-constrained environments to high-performance computing scenarios. Table 5 presents the variants of the EfficientNetV2 model, detailing the input image sizes and the dimensions of the feature vectors in the final fully connected layer. This information is essential for integrating EfficientNetV2 into machine learning algorithms, as it ensures compatibility and optimized performance.
In summary, EfficientNetV2 offers significant improvements over its predecessor, making it a highly effective choice for feature extraction in image classification tasks. Its ability to train rapidly while maintaining a lightweight structure and high accuracy makes it particularly suitable for applications with limited computational resources.

3.5. Feature Extraction CNN-Based Model

This work considers four variants of EfficientNetV2 as feature extractors. The MIAS, mini-DDSM, and CSAW-M datasets are combined, and one ROI of size 256 × 256 pixels is extracted from each mammogram, as shown in Figure 2. Next, a 3-channel image is created utilizing Equations (7)–(9), with the local phase, local orientation, and phase congruency as the respective channels. Note that the employed phase image features provide information on both broad and fine details of the breast tissue for classification. Afterwards, for the EfficientNetV2S to EfficientNetV2XL models in Table 5, the last layers are removed, and only the 1280-element vector of the last fully connected (FC) layer is considered as the feature vector for all models. However, in many cases, classification models are more efficient when using compact vectors containing only the most significant features [51]. Dimensionality reduction is crucial for processing high-dimensional data, as it removes redundant features and increases classification efficiency [52]. This task also reduces the complexity of classification models and improves accuracy by retaining the most relevant information, thereby decreasing computing time [53]. The proposed model employs simple statistical descriptors for dimensionality reduction instead of more complex techniques. Specifically, the following statistics are used to reduce the dimensionality of the FC layer:
$$\mu(FC)=\frac{1}{L}\sum_{i=1}^{L}FC(i), \qquad (10)$$
$$\sigma^2(FC)=\frac{1}{L-1}\sum_{i=1}^{L}\big(FC(i)-\mu(FC)\big)^2, \qquad (11)$$
$$\sigma(FC)=\sqrt{\sigma^2(FC)}, \qquad (12)$$
$$\max(FC)=\max_{i=1,\dots,L}FC(i), \qquad (13)$$
$$\min(FC)=\min_{i=1,\dots,L}FC(i). \qquad (14)$$
Consequently, the final descriptor for each mammogram is
$$S_{FC}=\big[\mu(FC),\,\sigma^2(FC),\,\sigma(FC),\,\max(FC),\,\min(FC)\big]. \qquad (15)$$
This dimensionality reduction approach aims to simplify classification, reduce computational complexity, and mitigate the risk of overfitting without significantly degrading essential information, particularly when working with small medical imaging datasets.
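A minimal sketch of the feature extraction and reduction stages is given below, assuming the publicly available ImageNet weights of EfficientNetV2S in tf.keras stand in for the fine-tuned variants described in the text; the 384 × 384 input size corresponds to this variant (see Table 5), and the function name compact_descriptor is illustrative.

```python
import numpy as np
import tensorflow as tf

# EfficientNetV2S without its classification head; global average pooling
# yields the 1280-element vector of the last layer used as features.
backbone = tf.keras.applications.EfficientNetV2S(
    include_top=False, weights="imagenet", pooling="avg",
    input_shape=(384, 384, 3))

def compact_descriptor(three_channel_roi):
    """Map a phase-based 3-channel ROI (values scaled to [0, 255]) to the
    5-element descriptor S_FC of Equation (15)."""
    x = tf.image.resize(three_channel_roi, (384, 384))
    fc = backbone(tf.expand_dims(x, 0), training=False).numpy().ravel()  # 1280 values
    # Statistics of Equations (10)-(14): mean, variance, std, max, min
    return np.array([fc.mean(), fc.var(ddof=1), fc.std(ddof=1),
                     fc.max(), fc.min()])
```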
To classify the descriptor S F C , four machine learning-based ensemble classifiers were employed: voting, stacking, bagging, and boosting [54,55,56,57]. These classifiers were selected for their strong predictive performance, improved generalization, and reduced risk of overfitting. Also, these classifiers are robust to moderate class and dataset imbalances, making them well suited for the combined datasets used in this study. In addition, no data augmentation or resampling techniques were applied in order to preserve the original data distribution and evaluate the proposed method under realistic clinical conditions.
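The sketch below instantiates the four ensembles with scikit-learn over the 5-element descriptors; the base estimators and hyperparameters are illustrative assumptions, since the text does not specify them here, and random data stands in for the actual descriptors.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))      # stand-in for S_FC descriptors
y = rng.integers(0, 2, size=200)   # stand-in labels (1 = malignant)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)  # 70/30 split, as in Section 4.1

base = [("knn", KNeighborsClassifier()),
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier())]
ensembles = {
    "voting": VotingClassifier(estimators=base, voting="soft"),
    "stacking": StackingClassifier(estimators=base,
                                   final_estimator=LogisticRegression()),
    "bagging": BaggingClassifier(n_estimators=50),    # decision-tree base
    "boosting": AdaBoostClassifier(n_estimators=100),
}
for name, clf in ensembles.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))
```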
Algorithm 1 presents the pseudocode of the proposed method for feature extraction and breast cancer detection.
Algorithm 1: Proposed method for breast cancer detection
Require: Input mammogram image $f(x,y)$
Ensure: Binary class label (1 = malignant, 0 = benign)
1: Step 1: Preprocessing
2: Crop 5% from the image borders to remove outer artifacts
3: Apply intensity filtering using Equation (1) and enhance contrast using CLAHE
4: Compute the center of mass and extract a $256\times256$ ROI centered at this location
5: Step 2: Phase-based feature representation
6: Compute the Fourier transform $F(u,v)$ of the ROI
7: Apply the Poisson band-pass filter using Equation (3)
8: Compute the monogenic signal using Equation (2)
9: Obtain the inverse transforms $f_{bp_1}(x,y)$, $f_{bp_2}(x,y)$, and $f_{bp}(x,y)$ using Equation (6)
10: Compute the local phase $\varphi(x,y)$ with Equation (7), the local orientation $\theta(x,y)$ with Equation (8), and the phase congruency $PC(x)$ with Equation (9)
11: Construct a 3-channel image using $\varphi(x,y)$, $\theta(x,y)$, and $PC(x)$
12: Step 3: Feature extraction with CNN
13: Input the 3-channel image into a pre-trained EfficientNetV2 variant (S, M, L, or XL)
14: Remove the final classification layers
15: Extract a 1280-dimensional vector from the last fully connected layer
16: Step 4: Dimensionality reduction
17: Construct the reduced feature descriptor $S_{FC}=[\mu,\sigma^2,\sigma,\max,\min]$ from the 1280-dimensional vector by computing the statistics defined in Equations (10)–(14)
18: Step 5: Classification
19: Classify $S_{FC}$ using the ensemble learning methods: voting, stacking, bagging, and boosting
20: Return the predicted class label

4. Experimental Results

This section evaluates and discusses the performance of the proposed method for breast cancer detection in mammograms. Section 4.1 describes the experimental setup and evaluation metrics. Section 4.2 presents the performance results of the proposed method for feature extraction and classification on well-known mammography datasets. First, the performance of EfficientNetV2 variants trained with the proposed methodology is evaluated on the combined mini-DDSM, MIAS, and CSAW-M datasets. Then, results obtained using only the CSAW-M dataset are discussed. Section 4.3 discusses both the overall performance of the proposed method and the impact of the phase-based features for mammography representation. Finally, Section 4.4 presents a comparative performance evaluation between the proposed method and existing approaches for breast cancer detection.

4.1. Experimental Details and Evaluation

The proposed method was implemented in Python 3.9.12 using OpenCV 4.8.0, TensorFlow 2.13.0, scikit-learn 1.1.2, and the Phasepack library (available at https://bit.ly/47wDCou, accessed on 4 October 2024). The implementation was carried out on an HP personal computer with an Intel Core i5 processor, 12 GB of RAM, and the Windows 10 operating system. Additionally, the Google Colab Pro service with an NVIDIA A100 GPU was used. For feature extraction, the proposed methodology described in Section 3.5 was applied, considering a training/testing split of 70/30. All experiments employed the voting, stacking, bagging, and boosting ensemble classifiers. It is noteworthy that these ensemble classifiers perform effectively on unbalanced datasets and help reduce bias and overfitting. For evaluation, four widely used performance metrics are considered: accuracy (Acc), precision (Pre), recall (Rec), and F1-score (F1). These metrics are computed from true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs).
Accuracy ($Acc$) measures the ratio of correct predictions to the total number of predictions:
$$Acc=\frac{TP+TN}{TP+TN+FP+FN}. \qquad (16)$$
Precision ($Pre$) quantifies the ratio of true positives to all positive predictions:
$$Pre=\frac{TP}{TP+FP}. \qquad (17)$$
Recall ($Rec$), or sensitivity, evaluates the model's ability to correctly identify positive cases:
$$Rec=\frac{TP}{TP+FN}. \qquad (18)$$
Finally, the F1-score ($F1$) is the harmonic mean of precision and recall:
$$F1=\frac{2\times Pre\times Rec}{Pre+Rec}. \qquad (19)$$
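As a small worked example, these four metrics can be computed directly from the confusion-matrix counts; the helper below is illustrative and mirrors the four equations above.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from TP, TN, FP, and FN."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * pre * rec / (pre + rec)
    return acc, pre, rec, f1
```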

4.2. Performance Evaluation of Feature Extraction and Cancer Detection on Mammography Datasets

This section presents evaluation results of the proposed feature extraction approach based on the EfficientNetV2 architecture. The classification performance of several EfficientNetV2 variants is assessed across three publicly available mammography datasets: mini-DDSM, MIAS, and CSAW-M. In addition, the proposed method is compared with existing feature extraction techniques to quantify its performance in terms of accuracy, precision, recall, and F1-score.

4.2.1. Evaluation of EfficientNetV2 Variants on the Combined Dataset (Mini-DDSM, MIAS, and CSAW-M)

To evaluate the performance of the proposed model, we consider the four variants of EfficientNetV2: S, M, L, and XL. The first experiment tests the proposed methodology on the three combined datasets. We utilized 5164 images from mini-DDSM, 322 images from MIAS, and 10,020 images from CSAW-M. A total of 15,506 images were used, of which 10,855 were employed for training and 4651 for testing. The images are preprocessed as explained in Section 3.2 and converted to a monogenic signal space as described in Section 3.3. From each image, an ROI of size 256 × 256 pixels is extracted. Table 6 shows the results for the EfficientNetV2S to EfficientNetV2XL variants. For all ensemble classifiers, the four performance measures described in Section 4.1 are computed as the average of 100 repetitions.
Note that when using the voting KNN classifier, EfficientNetV2S produced the best results, with 86.17% accuracy, 77.58% precision, 86.17% recall, and a 79.86% F1-score. The stacking classifier with EfficientNetV2M produced the best results, with 86.28% accuracy, 78.75% precision, 86.14% recall, and an 80.09% F1-score. The bagging classifier with the EfficientNetV2S model yields 86.24% accuracy, 76.54% precision, 85.24% recall, and a 79.80% F1-score. For the boosting classifier, both EfficientNetV2M and EfficientNetV2L produced an accuracy of 86.25%; EfficientNetV2M yielded a precision of 84.12%, a recall of 85.20%, and an F1-score of 84.65%, whereas EfficientNetV2L yielded a precision of 84.00%, a recall of 85.05%, and an F1-score of 84.50%.
Although accuracy is often prioritized in classification tasks, the overall best performance in this experiment was achieved using the EfficientNetV2M model with the stacking classifier, with an accuracy of 86.28%, a precision of 78.75%, a recall of 86.14%, and an F1-score of 80.09%, as illustrated in Figure 5a.

4.2.2. Evaluation of the EfficientNetV2 Variants on the CSAW-M Dataset

This experiment was carried out using only the CSAW-M dataset, as it provides a large and diverse set of annotated mammography images suitable for robust and clinically relevant evaluation. The results are presented in Table 7. For the voting KNN classifier, the best performance was obtained with the EfficientNetV2M model, achieving an accuracy of 93.45%, a precision of 93.27%, a recall of 93.45%, and an F1-score of 90.29%. The best results for the stacking classifier were achieved with the EfficientNetV2XL model, reaching an accuracy of 93.47%, a precision of 87.61%, a recall of 93.19%, and an F1-score of 90.32%. The bagging classifier obtained its best performance with the EfficientNetV2S model, achieving an accuracy of 93.30%, a precision of 87.76%, a recall of 93.30%, and an F1-score of 90.21%. The boosting classifier, when evaluated with the EfficientNetV2S model, produced an accuracy of 93.43%, a precision of 90.51%, a recall of 93.43%, and an F1-score of 90.29%. Considering all four evaluation metrics, the highest overall classification performance was obtained with the EfficientNetV2XL model using the stacking classifier, as illustrated in Figure 5b.

4.3. Performance Discussion and Contribution of Phase-Based Features

This subsection discusses the proposed method’s performance, focusing on three key aspects. First, the effectiveness of the compact statistical descriptor used for dimensionality reduction is discussed. Second, confusion matrices are presented and discussed to provide a detailed view of classification performance across different datasets. Finally, the contribution of the proposed phase-based image representation is evaluated by comparing results with and without phase features.

4.3.1. Effectiveness of the Compact Statistical Descriptor

It should be noted that despite the significant dimensionality reduction achieved by using the proposed compact statistical descriptor given in Equation (15), the most relevant information for classification is preserved. This is confirmed by the consistent performance shown in Table 6 and Table 7, even in the absence of data augmentation or extensive hyperparameter tuning. Furthermore, this low-dimensional representation offers practical advantages. For instance, it reduces computational complexity during both training and prediction, which is particularly suitable for implementation in resource-constrained environments. Moreover, by reducing the number of features, the model reduces the risk of overfitting, especially given the relatively small size of the available datasets.

4.3.2. Detailed Analysis of Classification Errors and Correct Predictions

Table 8 presents the confusion matrices obtained from an evaluation of the proposed method for both the combined dataset (DDSM, MIAS, and CSAW-M) and the CSAW-M dataset. The matrix for the combined dataset is based on 4651 test images, including 440 cancer and 4211 no-cancer cases. The CSAW-M matrix corresponds to 3006 test images, with 261 cancer and 2745 no-cancer samples. These results provide a detailed breakdown of classification outcomes, including true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs). For the combined dataset, the model correctly classified most samples, with a moderate number of false positives and false negatives. In the case of the CSAW-M dataset, the matrix shows a higher count of correct predictions for both classes, reflecting improved sensitivity and specificity. These results are consistent with the reported performance metrics and highlight the reliability of the proposed method across datasets with different characteristics.

4.3.3. Impact of Phase-Based Features on Classification Performance

To assess the contribution of the proposed phase-based image representation detailed in Section 3.3, we compared the classification performance of the proposed method with and without the use of phase features. These features, namely local phase, orientation, and phase congruency, are used to emphasize structural patterns in mammographic tissue that are not easily distinguishable in intensity-based representations. On the combined dataset (DDSM, MIAS, and CSAW-M), the EfficientNetV2M model combined with the stacking classifier yielded lower performance when excluding the phase features, with 84.65% accuracy, 74.53% precision, 84.65% recall, and an F1-score of 78.95%, compared with the results obtained with phase features, as shown in Figure 5a. On the CSAW-M dataset, the EfficientNetV2XL model with the stacking classifier yielded 91.16% accuracy, 84.97% precision, 91.16% recall, and an F1-score of 87.93% when phase features were excluded, compared with the improved performance obtained with phase features, as shown in Figure 5b. These results confirm that phase-based features provide complementary structural information that improves the model’s ability to distinguish between cancerous and non-cancerous regions. In particular, they enhance the representation of edges and texture patterns, which are often important for clinical interpretation. Therefore, the inclusion of phase features strengthens the overall discriminative capability of the proposed method.

4.4. Performance Comparisons with Existing Methods

One of the biggest challenges in artificial intelligence and image processing is the accurate diagnosis of breast cancer in digital mammograms. Different strategies based on machine learning, CNNs, and hybrid methods have been developed for this purpose. Despite the importance of making fair performance comparisons among the different existing state-of-the-art methods, it should be taken into account that making a direct comparison between these methods is not always feasible. This is due to several factors, such as the number of classes, training datasets, and evaluation criteria, among others. In addition, implementation details and validation specifications are not always available.
Developing a machine learning model for medical diagnosis without using data augmentation is desirable, as it ensures that the model is trained on unaltered training images, thereby preserving clinical relevance. This approach allows for more accurate validation by medical experts. Moreover, avoiding data augmentation helps reduce the risk of overfitting, improving robustness, computational efficiency, and design simplicity. Table 9 presents a performance comparison between the proposed method and six different existing methods, all of which consider the same number of classes and do not use data augmentation. The main similarities and differences between the proposed methodology and these state-of-the-art methods are briefly discussed below.
In Muduli [10], 2236 images were used from the MIAS, DDSM, and INbreast datasets. A manually extracted ROI was employed for each mammogram, and feature extraction was performed using the fast discrete curvelet transform with wrapping. Dimensionality reduction was carried out using PCA and LDA. Additionally, classification was performed using an ELM with parameters optimized by PSO. In our approach, the ROI is given by the central region of the image, and similar to [10], the extracted features are reduced, but using a simplified approach with minimal computational cost. Without specifying the training/testing ratio, the best reported result in Muduli [10] is an accuracy of 98.94%. However, compared with our approach, this model addresses a less complex problem, as it used only about 14% of the images considered in our methodology.
Bacha [11] uses 991 images, which represents only 7% of the total number of images utilized in the development of the proposed model. Feature extraction is performed using the discrete Chebyshev transform on an extracted ROI of size 127 × 127 pixels. Without specifying the training/testing ratio, the reported accuracy is nearly 100%, and the AUC is close to unity.
In Thawkar [12], 651 images are considered for constructing the model. For ROI extraction, this approach utilizes watershed segmentation in conjunction with morphological filtering. For dimensionality reduction, this method employs an evolutionary algorithm called crow search with Harris Hawks optimization (CSHHO). For classification, the ANN and SVM methods are used. Without specifying the training/testing ratio, the best reported results are 97.85% accuracy, 97.45% specificity, and 98.22% sensitivity. It is noteworthy that this number of images represents only 4.2% of the total number of images utilized in our proposed methodology.
Elmoufidi [14] utilized 1923 images from three combined datasets: MIAS, DDSM, and INbreast. In this approach, multiple ROIs are extracted from each image using a bidimensional empirical mode decomposition (BEMD) algorithm. For feature extraction, the GLCM and LBP algorithms were used. In contrast to our approach, the feature vectors are classified using SVM without any dimensionality reduction. Using k-fold cross-validation, this work reports 98.62% accuracy, 98.65% specificity, 98.60% sensitivity, and an AUC of 0.9818 on the DDSM dataset. For the MIAS dataset, it reports 98.04% accuracy, 98.31% specificity, 98.12% sensitivity, and an AUC of 0.9817. Finally, for the INbreast dataset, the reported values are 98.26% accuracy, 98.21% specificity, 97.60% sensitivity, and an AUC of 0.9823. Notice that these high-performance results were obtained using only 12.6% of the total images considered in the proposed method, which limits a proper evaluation of its generality.
In Amin [13], a total of 633 images were considered for the development of the model. A single ROI per image is extracted and enhanced using the CLAHE algorithm. Dimensionality reduction is performed using CFS, while classification is carried out using SVM. Additionally, a training/testing/validation split of 60/20/20 is considered, without specifying the number of repetitions. This method reports 96.87% accuracy, 97.23% sensitivity, 95.47% specificity, and an AUC of 0.98. It should be noted that the number of considered images is very small compared with that utilized in the development of the proposed methodology.
Chakravarthy [27] uses 797 images from the MIAS, INbreast, and WDBC datasets. A hybrid methodology is utilized that combines CNN features with a machine learning approach. Similar to our approach, the mammograms are preprocessed to reduce noise and artifacts. Feature extraction is performed using the ResNet18 model. Dimensionality reduction is carried out using the PSO, DFOA, and CSOA algorithms. The feature vectors are classified using the weighted KNN algorithm. For performance evaluation, a 75/25 train–test ratio and 5-fold cross-validation were considered. This approach achieved an accuracy of 84.35% on the MIAS dataset, 83.19% on the INbreast dataset, and 97.36% on the WDBC dataset. It is noteworthy that, despite employing evolutionary algorithms with high computational complexity for dimensionality reduction, the performance improvement is moderate. In contrast, the proposed approach utilizes a simple statistical approach for feature reduction, obtaining similar performance.
The proposed methodology introduces a preprocessing stage to remove artifacts from digital mammograms, followed by a monogenic signal-based technique for image representation and enhancement of delicate breast tissue to improve feature extraction. A lightweight CNN-based model for feature extraction was constructed using EfficientNetV2 variants, without applying data augmentation. Dimensionality reduction is performed using simple statistical measures. High classification performance for breast cancer detection is achieved through ensemble classifiers, demonstrating the effectiveness of the proposed feature extraction strategy across several well-known datasets. By performing 100 trials for each experiment, we ensured statistically reliable results, providing robust validation of the evaluation methodology.
The consistent performance of the proposed method on substantially larger datasets (Table 9) and its effective feature extraction model, straightforward dimensionality reduction, absence of data augmentation, and rigorous statistical validation through repeated trials make it a more robust, generalizable, and clinically applicable alternative to existing approaches, which are often developed using limited datasets.

5. Conclusions

Digital mammography is one of the best diagnostic tools for breast cancer. In general, breast cancer detection using digital mammography can achieve a sensitivity of 86.9% and a specificity of 88.9% [58]. High-quality mammograms and experienced radiologists are fundamental for breast cancer detection [59,60,61]. Accurate interpretation of mammograms is crucial for minimizing false positives, consequently reducing patient anxiety [62] and the need for frequent mammograms [63]. This work presented a methodology for classifying mammography images into cancer and non-cancer categories by extracting features using EfficientNetV2 convolutional neural network models. The preprocessing of mammograms, along with the use of phase congruency, local phase, and local orientation, is an efficient approach to highlight important details of breast tissue for feature extraction. Dimensionality reduction of the feature descriptors using simple statistics increased the overall efficiency of the proposed methodology without sacrificing classification performance across different ensemble classifiers. By combining the MIAS, DDSM, and CSAW-M datasets, a total of 15,506 images were considered for the development of the proposed method. Using EfficientNetV2M and the stacking classifier, a performance of 86.28% accuracy, 78.75% precision, 86.14% recall, and 80.09% F1-score was obtained on the three combined datasets. Considering only the CSAW-M dataset and utilizing EfficientNetV2XL with the stacking classifier, a performance of 93.47% accuracy, 87.61% precision, 93.19% recall, and 90.32% F1-score was obtained, confirming the effectiveness of the proposed approach for feature extraction. A limitation of the proposed method is its design for only two categories, malignant (cancer) and benign, which can restrict its usability for broader diagnosis. Furthermore, although the proposed method was trained on a limited database with some unbalanced categories, it demonstrates strong performance in breast cancer detection. These limitations could be addressed by incorporating newer image databases for training and considering additional classification categories. For future work, we will explore state-of-the-art deep learning techniques, employ advanced hyperparameter optimization to develop an improved CNN for feature extraction, and investigate classification ensembles better suited for multiclass breast cancer classification. This work concludes that it is necessary to construct methodologies with a sufficient number of images from different datasets and to define consistent schemes for comparing methodologies, including the number of classes, images per dataset, training/testing split ratio, number of experiment repetitions, and performance measures.

Author Contributions

Conceptualization, E.O.M.M.; methodology, E.O.M.M.; software, E.O.M.M.; validation, V.H.D.-R.; formal analysis, V.H.D.-R.; investigation, E.O.M.M. and V.H.D.-R.; data curation, E.O.M.M.; writing—original draft preparation, E.O.M.M.; writing—review and editing, V.H.D.-R.; visualization, E.O.M.M.; funding acquisition, V.H.D.-R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Instituto Politécnico Nacional, Secretaría de Investigación y Posgrado, through project SIP20253728.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: MIAS dataset—https://www.repository.cam.ac.uk/handle/1810/250394; CSAW-M dataset—https://github.com/yueliukth/CSAW-M; mini-DDSM dataset—https://ardisdataset.github.io/MiniDDSM/ (accessed on 19 May 2025).

Acknowledgments

Edgar Omar Molina Molina acknowledges the support of Consejo Nacional de Humanidades Ciencias y Tecnologías (CONAHCYT) [Estancias Posdoctorales por México 2022(1)].

Conflicts of Interest

The authors declare no conflicts of interest.

List of Abbreviations

CNNs – Convolutional neural networks
ROI – Region of interest
LBPs – Local binary patterns
ML – Machine learning
RNNs – Recurrent neural networks
BLSTM – Bidirectional long short-term memory
MIAS – Mammographic Image Analysis Society
DDSM – Digital Database for Screening Mammography
CSAW-M – Classification Dataset for Benchmarking Mammographic Masking of Cancer
CAD – Computer-aided diagnosis
CDTM – Cross-diagonal texture matrix
KPCA – Kernel principal component analysis
GOA – Grasshopper optimization algorithm
AUC – Area under curve
INbreast – Imaging network in breast disease
FDCT-WRP – Fast discrete curvelet transform with wrapping
PCA – Principal component analysis
LDA – Linear discriminant analysis
PSO – Particle swarm optimization
DCT – Discrete Chebyshev transform
WDBC – Wisconsin Diagnostic Breast Cancer
CLAHE – Contrast-limited adaptive histogram equalization
DE – Differential evolution
CSA – Crow search algorithm
HHO – Harris Hawks optimization
ANNs – Artificial neural networks
SVM – Support vector machine
KNN – k-nearest neighbors
CESM – Contrast-enhanced spectral mammography
CFS – Correlation-based feature selection
BEMD – Bidimensional empirical mode decomposition
GLCM – Gray level co-occurrence matrix
GLRLM – Gray level run length matrix
RBF – Radial basis function
YOLO – You Only Look Once
MSANet – Multi-scale attention-guided network
MSA – Multi-scale attention
MSAM – Multi-scale attention module
FL – Focal loss
MBConv – Mobile inverted bottleneck convolution
CC – Craniocaudal
MLO – Mediolateral oblique
CBIS-DDSM – Curated Breast Imaging Subset of DDSM
MIB-Net – Multitask information bottleneck network
LE – Low energy
LES – Local Enhanced Set
DES – Denoised Enhanced Set
BUSI – Breast Ultrasound Image dataset
MEWOA – Modified entropy whale optimization algorithm
BiLSTM – Bidirectional long short-term memory
DFOA – Dragonfly optimization algorithm
CSOA – Crow search optimization algorithm
SI-CSO – Self-improved cat swarm optimization
ELM – Extreme learning machine
NR – Not reported
FC – Fully connected
TP – True positive
TN – True negative
FP – False positive
FN – False negative
CSHHO – Crow search with Harris Hawks optimization

References

  1. Sun, Y.S.; Zhao, Z.; Yang, Z.N.; Xu, F.; Lu, H.J.; Zhu, Z.Y.; Shi, W.; Jiang, J.; Yao, P.P.; Zhu, H.P. Risk factors and preventions of breast cancer. Int. J. Biol. Sci. 2017, 13, 1387. [Google Scholar] [CrossRef]
  2. Xiao, Z.; Li, L. Breast cancer mortality in Chinese women and men from 1990 to 2019: Analysis of trends in risk factors. J. Obstet. Gynaecol. Res. 2024, 50, 970–981. [Google Scholar] [CrossRef] [PubMed]
  3. McGuire, A.; Brown, J.A.; Malone, C.; McLaughlin, R.; Kerin, M.J. Effects of age on the detection and management of breast cancer. Cancers 2015, 7, 908–929. [Google Scholar] [CrossRef] [PubMed]
  4. Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA A Cancer J. Clin. 2021, 71, 209–249. [Google Scholar] [CrossRef]
  5. Khamparia, A.; Bharati, S.; Podder, P.; Gupta, D.; Khanna, A.; Phung, T.K.; Thanh, D.N. Diagnosis of breast cancer based on modern mammography using hybrid transfer learning. Multidimens. Syst. Signal Process. 2021, 32, 747–765. [Google Scholar] [CrossRef] [PubMed]
  6. Florindo, J.B.; Bruno, O.M. Fractal descriptors of texture images based on the triangular prism dimension. J. Math. Imaging Vis. 2019, 61, 140–159. [Google Scholar] [CrossRef]
  7. Haralick, R.M.; Shanmugam, K.; Dinstein, I. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 1973, SMC-3, 610–621. [Google Scholar]
  8. Ojala, T.; Pietikäinen, M.; Mäenpää, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987. [Google Scholar] [CrossRef]
  9. Mohanty, F.; Rup, S.; Dash, B. Automated diagnosis of breast cancer using parameter optimized kernel extreme learning machine. Biomed. Signal Process. Control 2020, 62, 102108. [Google Scholar] [CrossRef]
  10. Muduli, D.; Dash, R.; Majhi, B. Fast discrete curvelet transform and modified PSO based improved evolutionary extreme learning machine for breast cancer detection. Biomed. Signal Process. Control 2021, 70, 102919. [Google Scholar] [CrossRef]
  11. Bacha, S.; Taouali, O. A novel machine learning approach for breast cancer diagnosis. Measurement 2022, 187, 110233. [Google Scholar] [CrossRef]
  12. Thawkar, S. Feature selection and classification in mammography using hybrid crow search algorithm with Harris Hawks optimization. Biocybern. Biomed. Eng. 2022, 42, 1094–1111. [Google Scholar] [CrossRef]
  13. Amin, M.N.; Kamal, R.; Farouk, A.; Gomaa, M.; Rushdi, M.A.; Mahmoud, A.M. An efficient hybrid computer-aided breast cancer diagnosis system with wavelet packet transform and synthetically-generated contrast-enhanced spectral mammography images. Biomed. Signal Process. Control 2023, 85, 104808. [Google Scholar] [CrossRef]
  14. Elmoufidi, A. Deep multiple instance learning for automatic breast cancer assessment using digital mammography. IEEE Trans. Instrum. Meas. 2022, 71, 1–13. [Google Scholar] [CrossRef]
  15. Hirra, I.; Ahmad, M.; Hussain, A.; Ashraf, M.U.; Saeed, I.A.; Qadri, S.F.; Alghamdi, A.M.; Alfakeeh, A.S. Breast cancer classification from histopathological images using patch-based deep learning modeling. IEEE Access 2021, 9, 24273–24287. [Google Scholar] [CrossRef]
  16. Al-Antari, M.A.; Han, S.M.; Kim, T.S. Evaluation of deep learning detection and classification towards computer-aided diagnosis of breast lesions in digital X-ray mammograms. Comput. Methods Programs Biomed. 2020, 196, 105584. [Google Scholar] [CrossRef]
  17. Lou, Q.; Li, Y.; Qian, Y.; Lu, F.; Ma, J. Mammogram classification based on a novel convolutional neural network with efficient channel attention. Comput. Biol. Med. 2022, 150, 106082. [Google Scholar] [CrossRef]
  18. Karthiga, R.; Narasimhan, K.; Amirtharajan, R. Diagnosis of breast cancer for modern mammography using artificial intelligence. Math. Comput. Simul. 2022, 202, 316–330. [Google Scholar] [CrossRef]
  19. Wang, J.; Zheng, Y.; Ma, J.; Li, X.; Wang, C.; Gee, J.; Wang, H.; Huang, W. Information bottleneck-based interpretable multitask network for breast cancer classification and segmentation. Med. Image Anal. 2023, 83, 102687. [Google Scholar] [CrossRef]
  20. Wei, T.; Aviles-Rivero, A.I.; Wang, S.; Huang, Y.; Gilbert, F.J.; Schönlieb, C.-B.; Chen, C.W. Beyond fine-tuning: Classifying high resolution mammograms using function-preserving transformations. Med. Image Anal. 2022, 82, 102618. [Google Scholar] [CrossRef]
  21. Ragab, D.A.; Attallah, O.; Sharkas, M.; Ren, J.; Marshall, S. A framework for breast cancer classification using multi-DCNNs. Comput. Biol. Med. 2021, 131, 104245. [Google Scholar] [CrossRef]
  22. Zahoor, S.; Shoaib, U.; Lali, I.U. Breast cancer mammograms classification using deep neural network and entropy-controlled whale optimization algorithm. Diagnostics 2022, 12, 557. [Google Scholar] [CrossRef] [PubMed]
  23. Haq, I.U.; Ali, H.; Wang, H.Y.; Lei, C.; Ali, H. Feature fusion and ensemble learning-based CNN model for mammographic image classification. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 3310–3318. [Google Scholar] [CrossRef]
  24. Chaki, J.; Woźniak, M. Deep learning for neurodegenerative disorder (2016 to 2022): A systematic review. Biomed. Signal Process. Control 2023, 80, 104223. [Google Scholar] [CrossRef]
  25. Xia, L.; An, J.; Ma, C.; Hou, H.; Hou, Y.; Cui, L.; Jiang, X.; Li, W.; Gao, Z. Neural network model based on global and local features for multi-view mammogram classification. Neurocomputing 2023, 536, 21–29. [Google Scholar] [CrossRef]
  26. Ding, W.; Zhang, H.; Zhuang, S.; Zhuang, Z.; Gao, Z. Multi-view stereoscopic attention network for 3D tumor classification in automated breast ultrasound. Expert Syst. Appl. 2023, 234, 120969. [Google Scholar] [CrossRef]
  27. Chakravarthy, S.S.; Bharanidharan, N.; Rajaguru, H. Deep learning-based metaheuristic weighted k-nearest neighbor algorithm for the severity classification of breast cancer. IRBM 2023, 44, 100749. [Google Scholar] [CrossRef]
  28. Vidivelli, S.; Devi, S. Breast cancer detection model using fuzzy entropy segmentation and ensemble classification. Biomed. Signal Process. Control 2023, 80, 104236. [Google Scholar] [CrossRef]
  29. Suckling, J. The Mammographic Image Analysis Society digital mammogram database. Excerpta Medica Int. Congr. Ser. 1994, 1069, 375–378. [Google Scholar]
  30. Heath, M.; Bowyer, K.; Kopans, D.; Kegelmeyer, P.; Moore, R.; Chang, K.; Munishkumaran, S. Current status of the digital database for screening mammography. In Digital Mammography: Nijmegen; Springer: Dordrecht, The Netherlands, 1998; pp. 457–460. [Google Scholar]
  31. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  32. El Houby, E.; Yassin, N.I. Malignant and nonmalignant classification of breast lesions in mammograms using convolutional neural networks. Biomed. Signal Process. Control 2021, 70, 102954. [Google Scholar] [CrossRef]
  33. Huang, M.-L.; Lin, T.Y. Considering breast density for the classification of benign and malignant mammograms. Biomed. Signal Process. Control 2021, 67, 102564. [Google Scholar] [CrossRef]
  34. Xu, C.; Lou, M.; Qi, Y.; Wang, Y.; Pi, J.; Ma, Y. Multi-scale attention-guided network for mammograms classification. Biomed. Signal Process. Control 2021, 68, 102730. [Google Scholar] [CrossRef]
  35. Diaz-Escobar, J.; Kober, V.; Díaz-Ramírez, A. Breast cancer detection in digital mammography using phase features and machine learning approach. Appl. Mach. Learn. 2022, 12227, 197–202. [Google Scholar]
  36. Petrini, D.G.P.; Shimizu, C.; Roela, R.A.; Valente, G.V.; Folgueira, M.A.A.K.; Kim, H.Y. Breast cancer diagnosis in two-view mammography using end-to-end trained efficientnet-based convolutional network. IEEE Access 2022, 10, 77723–77731. [Google Scholar] [CrossRef]
  37. Aslan, M.F. A hybrid end-to-end learning approach for breast cancer diagnosis: Convolutional recurrent network. Comput. Electr. Eng. 2023, 105, 108562. [Google Scholar] [CrossRef]
  38. Lekamlage, C.D.; Afzal, F.; Westerberg, E.; Cheddad, A. Mini-DDSM: Mammography-based automatic age estimation. In Proceedings of the 2020 3rd International Conference on Digital Medicine and Image Processing, Kyoto, Japan, 6–9 November 2020; pp. 1–6. [Google Scholar]
  39. Sorkhei, M.; Liu, Y.; Azizpour, H.; Azavedo, E.; Dembrower, K.; Ntoula, D.; Zouzos, A.; Strand, F.; Smith, K. CSAW-M: An ordinal classification dataset for benchmarking mammographic masking of cancer. arXiv 2021, arXiv:2112.01330. [Google Scholar]
  40. Zuiderveld, K. Contrast Limited Adaptive Histogram Equalization. In Graphics Gems IV; Heckbert, P.S., Ed.; Academic Press: San Diego, CA, USA, 1994; pp. 474–485. [Google Scholar]
  41. Morrone, M.C.; Ross, J.; Burr, D.C.; Owens, R. Mach bands are phase dependent. Nature 1986, 324, 250–253. [Google Scholar] [CrossRef]
  42. Morrone, M.; Owens, R. Feature detection from local energy. Pattern Recognit. Lett. 1987, 6, 303–313. [Google Scholar] [CrossRef]
  43. Morrone, M.C.; Burr, D. Feature detection in human vision: A phase-dependent energy model. Proc. R. Soc. Lond. Ser. B Biol. Sci. 1988, 235, 221–245. [Google Scholar]
  44. Felsberg, M.; Sommer, G. The monogenic scale-space: A unifying approach to phase-based image processing in scale-space. J. Math. Imaging Vis. 2004, 21, 5–26. [Google Scholar] [CrossRef]
  45. Kovesi, P. Image features from phase congruency. Videre J. Comput. Vis. Res. 1999, 1, 1–26. [Google Scholar]
  46. Aljuaid, H.; Alturki, N.; Alsubaie, N.; Cavallaro, L.; Liotta, A. Computer-aided diagnosis for breast cancer classification using deep neural networks and transfer learning. Comput. Methods Programs Biomed. 2022, 223, 106951. [Google Scholar] [CrossRef]
  47. Tan, M.; Le, Q. EfficientNetV2: Smaller models and faster training. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 10096–10106. [Google Scholar]
  48. Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  49. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
  50. Gupta, S.; Akin, B. Accelerator-aware Neural Network Design Using AutoML. arXiv 2020, arXiv:2003.02838. [Google Scholar] [CrossRef]
  51. Li, H.; Cui, J.; Zhang, X.; Han, Y.; Cao, L. Dimensionality Reduction and Classification of Hyperspectral Remote Sensing Image Feature Extraction. Remote Sens. 2022, 14, 4579. [Google Scholar] [CrossRef]
  52. Ahmad, N.A. Numerically stable locality-preserving partial least squares discriminant analysis for efficient dimensionality reduction and classification of high-dimensional data. Heliyon 2024, 10, e26157. [Google Scholar] [CrossRef]
  53. Yaniv, A.; Beck, Y. Enhancing NILM classification via robust principal component analysis dimension reduction. Heliyon 2024, 10, e30607. [Google Scholar] [CrossRef]
  54. Subasi, A.; Kadasa, B.; Kremic, E. Classification of the cardiotocogram data for anticipation of fetal risks using bagging ensemble classifier. Procedia Comput. Sci. 2020, 168, 34–39. [Google Scholar] [CrossRef]
  55. Jafarzadeh, H.; Mahdianpari, M.; Gill, E.; Mohammadimanesh, F.; Homayouni, S. Bagging and boosting ensemble classifiers for classification of multispectral, hyperspectral and polsar data: A comparative evaluation. Remote Sens. 2021, 13, 4405. [Google Scholar] [CrossRef]
  56. Fatima, N.; Liu, L.; Hong, S.; Ahmed, H. Prediction of breast cancer, comparative review of machine learning techniques, and their analysis. IEEE Access 2020, 8, 150360–150376. [Google Scholar] [CrossRef]
  57. Ibrahim, S.; Nazir, S.; Velastin, S.A. Feature selection using correlation analysis and principal component analysis for accurate breast cancer diagnosis. J. Imaging 2021, 7, 225. [Google Scholar] [CrossRef] [PubMed]
  58. Lehman, C.D.; Arao, R.F.; Sprague, B.L.; Lee, J.M.; Buist, D.S.; Kerlikowske, K.; Henderson, L.M.; Onega, T.; Tosteson, A.N.; Rauscher, G.H. National performance benchmarks for modern screening digital mammography: Update from the breast cancer surveillance consortium. Radiology 2017, 283, 49–58. [Google Scholar] [CrossRef] [PubMed]
  59. Rawashdeh, M.A.; Lee, W.B.; Bourne, R.M.; Ryan, E.A.; Pietrzyk, M.W.; Reed, W.M.; Heard, R.C.; Black, D.A.; Brennan, P.C. Markers of good performance in mammography depend on number of annual readings. Radiology 2013, 269, 61–67. [Google Scholar] [CrossRef]
  60. Theberge, I.; Chang, S.L.; Vandal, N.; Daigle, J.M.; Guertin, M.H.; Pelletier, E.; Brisson, J. Radiologist interpretive volume and breast cancer screening accuracy in a Canadian organized screening program. J. Natl. Cancer Inst. 2014, 106, 461. [Google Scholar] [CrossRef] [PubMed]
  61. Giess, C.S.; Wang, A.; Ip, I.K.; Lacson, R.; Pourjabbar, S.; Khorasani, R. Patient, radiologist, and examination characteristics affecting screening mammography recall rates in a large academic practice. J. Am. Coll. Radiol. 2019, 16, 411–418. [Google Scholar] [CrossRef] [PubMed]
  62. Nelson, H.D.; Pappas, M.; Cantor, A.; Griffin, J.; Daeges, M.; Humphrey, L. Harms of breast cancer screening: Systematic review to update the 2009 US Preventive Services Task Force recommendation. Ann. Intern. Med. 2016, 164, 256–267. [Google Scholar] [CrossRef]
  63. Tosteson, A.; Fryback, D.; Hammond, C.; Hanna, L.; Grove, M.; Brown, M.; Wang, Q.; Lindfors, K.; Pisano, E. Consequences of false-positive screening mammograms. JAMA Intern. Med. 2014, 174, 954–961. [Google Scholar] [CrossRef]
Figure 1. Block diagram of the proposed CNN-based scheme for feature extraction and breast cancer detection using EfficientNetV2 variants.
Figure 2. Mammogram processing steps. (a) Original mammogram. (b) Breast outer contour selection. (c) Image intensity level range adjustment. (d) Application of the CLAHE algorithm for contrast stretching. (e) Breast center-of-mass location (intersection of green lines) and ROI area (red box). (f) ROI of the central part of the image.
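For readers who wish to reproduce the pipeline in Figure 2, the following is a minimal sketch using OpenCV and NumPy. It assumes 8-bit grayscale input; the Otsu threshold, CLAHE settings, and the roi_size parameter are illustrative choices, not the exact values used in this work.

```python
# Hedged sketch of the Figure 2 preprocessing steps (assumed parameters).
import cv2
import numpy as np

def extract_roi(mammogram: np.ndarray, roi_size: int = 512) -> np.ndarray:
    # (b) Outer contour selection: keep the largest connected region
    # (the breast), suppressing labels and background artifacts.
    _, mask = cv2.threshold(mammogram, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    breast = max(contours, key=cv2.contourArea)
    clean = np.zeros_like(mask)
    cv2.drawContours(clean, [breast], -1, 255, thickness=cv2.FILLED)
    img = cv2.bitwise_and(mammogram, mammogram, mask=clean)

    # (c) Intensity level range adjustment: stretch to the full 8-bit range.
    img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # (d) CLAHE contrast stretching [40] (clip limit and tile size assumed).
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    img = clahe.apply(img)

    # (e) Center of mass of the breast region via image moments.
    m = cv2.moments(clean, binaryImage=True)
    cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])

    # (f) Square ROI centered on the center of mass, clipped to the borders.
    h, w = img.shape
    half = roi_size // 2
    x0, y0 = max(cx - half, 0), max(cy - half, 0)
    x1, y1 = min(cx + half, w), min(cy + half, h)
    return img[y0:y1, x0:x1]
```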
Figure 3. Proposed mammogram representation using phase-image features. (a) ROI of breast cancer image. (b) Local phase of ROI. (c) Local orientation of ROI. (d) Phase congruency of ROI.
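The phase features shown in Figure 3 derive from the monogenic signal [44] and phase congruency [45]. The sketch below, assuming a single log-Gabor scale with illustrative center frequency f0 and bandwidth sigma, computes local phase, local orientation, and local amplitude via the Riesz transform; phase congruency [45] additionally combines such responses over several scales. The three maps can then be normalized to [0, 255] and stacked into a three-channel input image.

```python
# Hedged single-scale monogenic-signal sketch (assumed f0 and sigma).
import numpy as np

def monogenic_features(img: np.ndarray, f0: float = 0.1,
                       sigma: float = 0.55):
    rows, cols = img.shape
    u1, u2 = np.meshgrid(np.fft.fftfreq(cols), np.fft.fftfreq(rows))
    radius = np.sqrt(u1**2 + u2**2)
    radius[0, 0] = 1.0  # avoid division by zero at the DC term

    # Riesz-transform filters (odd, vector-valued part of the signal).
    H1, H2 = 1j * u1 / radius, 1j * u2 / radius

    # Log-Gabor band-pass filter with zero DC response.
    lg = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma) ** 2))
    lg[0, 0] = 0.0

    F = np.fft.fft2(img)
    even = np.real(np.fft.ifft2(F * lg))       # band-passed (even) part
    odd1 = np.real(np.fft.ifft2(F * lg * H1))  # Riesz component 1
    odd2 = np.real(np.fft.ifft2(F * lg * H2))  # Riesz component 2

    odd = np.sqrt(odd1**2 + odd2**2)
    amplitude = np.sqrt(even**2 + odd**2)      # local amplitude (energy)
    phase = np.arctan2(odd, even)              # local phase
    orientation = np.arctan2(odd2, odd1)       # local orientation
    return phase, orientation, amplitude
```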
Figure 4. Structure of MBConv and Fused-MBConv. (a) MBConv layer; (b) Fused-MBConv layer.
Figure 5. Best results of the proposed method. (a) EfficientNetV2M combined with the stacking classifier, tested on the combined mini-DDSM, MIAS, and CSAW-M datasets. (b) EfficientNetV2XL combined with the stacking classifier, tested on the CSAW-M dataset.
Table 1. Reported results of recent machine learning-based methods for breast cancer classification. "NR" = Not reported; other acronyms are listed in the abbreviation list.

| Reference | Images | Data Balanced? | No. of Classes | Train–Test Split (%) | Testing Results (%) |
|---|---|---|---|---|---|
| Mohanty [9] | 319, 1500 | No | 3 | NR | 97.49 (Acc, MIAS); 92.61 (Acc, DDSM) |
| Muduli [10] | 326, 1500, 410 | No | 2 | NR | 100 (Acc, MIAS); 98.94 (Acc, DDSM); 98.76 (Acc, INbreast) |
| Bacha [11] | 322, 569 | No | 2 | NR | 100.00 (Acc, MIAS); 91.13 (Acc, WDBC) |
| Thawkar [12] | 651 | No | 2 | 70/30 | 97.85 (Acc, DDSM) |
| Amin [13] | 633 | No | 2 | 60/20/20 | 96.34 (Acc, CESM) |
| Elmoufidi [14] | 1923 | No | 2 | NR | 98.62 (Acc, DDSM); 98.04 (Acc, MIAS); 98.26 (Acc, INbreast) |
Table 2. Literature review of recent convolutional neural network-based methods for breast cancer classification. "NR" = Not reported; other acronyms are listed in the abbreviation list.

| Reference | Images | Data Balanced? | No. of Classes | Train–Test Split (%) | CNN Model | Testing Results (%) |
|---|---|---|---|---|---|---|
| Al-Antari [16] | 600, 103 | Yes | 2 | 70/20/10 | ResNet-50, Inception-ResNet-V2 | 97.50 (Acc, DDSM); 95.32 (Acc, INbreast) |
| El Houby [32] | 322, 1592, 387 | No | 2 | NR | Custom | 95.30 (Acc, MIAS); 91.20 (Acc, DDSM); 96.52 (Acc, INbreast) |
| Huang [33] | 410 | No | 2 | NR | AlexNet, DenseNet, ShuffleNet | 99.72 (Acc, INbreast) |
| Xu [34] | 10,480, 410 | No | 2 | 80/20 | MSANet | 94.2 (AUC, DDSM); 92.85 (AUC, INbreast) |
| Diaz-Escobar [35] | 322, 410, 2620, 10,020 | Yes | 2 | NR | ResNet50 | 82.20 (Acc, MIAS, INbreast, DDSM, CSAW-M) |
| Karthiga [18] | 53, 2188, 106 | No | 2 | 80/20 | Custom | 95.95 (Acc, MIAS); 99.39 (Acc, DDSM); 96.53 (Acc, INbreast) |
| Petrini [36] | 3103 | No | 2 | 79.21/20.79 | EfficientNet-B4 | 93.44 (AUC, DDSM) |
| Wei [20] | 3103 | No | 2 | 85/15 | MorphHR | 83.13 (AUC, DDSM) |
| Lou [17] | 410 | No | 2 | NR | ECA-Net50, ResNet50 | 92.29 (Acc, INbreast) |
| Wang [19] | 378, 378 | No | 2 | 80/10/10 | VGG16 | 91.28 (Acc, LES, DES) |
Table 3. Literature review of recent hybrid convolutional neural network-based methods for breast cancer classification. "NR" = Not reported; other acronyms are listed in the abbreviation list.

| Reference | Images | Class Type | Balanced? | Train–Test Split (%) | Hybrid CNN Model | Testing Results (%) |
|---|---|---|---|---|---|---|
| Ragab [21] | 891, 322 | Binary | No | NR | AlexNet, GoogleNet, ResNet-18 | 97.9 (Acc, DDSM); 95.4 (Acc, MIAS) |
| Zahoor [22] | 108, 300, 1696 | Binary/Multiclass | No | 50/50 | MobileNetV2, NasNet Mobile | 99.7 (Acc, INbreast); 99.8 (Acc, MIAS); 93.8 (Acc, DDSM) |
| Haq [23] | 322, 70 | Binary | No | 70/20/10 | Custom | 99.4 (Acc, MIAS); 98.5 (Acc, BCDR) |
| Chakravarthy [27] | 115, 113, 569 | Binary | No | 70/25 | Custom | 84.4 (Acc, MIAS); 83.2 (Acc, INbreast); 97.4 (Acc, WDBC) |
| Aslan [37] | 322, 336 | Multiclass | No | 80/20 | Custom, BiLSTM | 97.6 (Acc, MIAS); 98.6 (Acc, INbreast) |
| Xia [25] | 536 | Binary | No | 80/20 | ResNeXt | 90.6 (Acc); 94.9 (AUC) |
| Vidivelli [28] | NR | Multiclass | No | 50/20/30 | Custom | 93.5 (Acc, MIAS); 91.4 (Acc, DDSM) |
Table 4. Summary of principal approaches for breast cancer classification in mammography images.

| Approach | Advantages | Disadvantages | References |
|---|---|---|---|
| Machine Learning | Easy implementation; accurate results; does not require data augmentation | Sensitive to preprocessing and classifier configuration | [9,10,11,12,13,14] |
| Deep Learning | Accurate results; suitable for real-time operation; supports detection of multiple ROIs | Requires a large amount of data; sensitive to hyperparameter configuration; requires GPU for training and inference | [16,17,18,19,20,32,33,34,35,36] |
| Hybrid Methods | Accurate results; combines low- and high-level features from both machine learning and deep learning | Complex implementation; high computational cost; requires large datasets; demands classifier optimization | [21,22,23,25,27,28,37] |
Table 5. Specifications of EfficientNetV2 model variants, including input image shape and the output vector length of the final fully connected layer.

| Model Variant | Input Shape | Output Vector Length |
|---|---|---|
| V2S | 384 × 384 | 1280 |
| V2M | 480 × 480 | 1280 |
| V2L | 480 × 480 | 1280 |
| V2XL | 512 × 512 | 1280 |
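As a hedged illustration of Table 5, the sketch below builds fixed feature extractors from the EfficientNetV2 variants bundled in tf.keras.applications; global average pooling over the last convolutional stage yields the 1280-dimensional vectors listed above. Note that EfficientNetV2-XL is not included in keras.applications and would have to be obtained from an external source such as TensorFlow Hub; preprocessing details are illustrative, not the paper's exact setup.

```python
# Hedged sketch: EfficientNetV2 backbones as 1280-D feature extractors.
import tensorflow as tf
from tensorflow.keras import applications

BACKBONES = {
    "V2S": (applications.EfficientNetV2S, 384),
    "V2M": (applications.EfficientNetV2M, 480),
    "V2L": (applications.EfficientNetV2L, 480),
    # EfficientNetV2-XL must be loaded externally (e.g., TensorFlow Hub).
}

def build_feature_extractor(variant: str = "V2S") -> tf.keras.Model:
    ctor, size = BACKBONES[variant]
    # include_top=False with pooling="avg" returns one 1280-dimensional
    # vector per image, matching the output lengths listed in Table 5.
    backbone = ctor(include_top=False, weights="imagenet",
                    input_shape=(size, size, 3), pooling="avg")
    backbone.trainable = False  # used as a fixed feature extractor
    return backbone

# Example: a batch of one 480x480 image yields a (1, 1280) feature tensor.
features = build_feature_extractor("V2M")(tf.zeros((1, 480, 480, 3)))
```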
Table 6. Classification results of EfficientNetV2S, V2M, V2L, and V2XL on the combined mini-DDSM, MIAS, and CSAW-M datasets in terms of accuracy (Acc), precision (Pre), recall (Rec), and F1-score.

| Model | Classifier | Acc (%) | Pre (%) | Rec (%) | F1 (%) |
|---|---|---|---|---|---|
| EfficientNetV2S | Voting KNN | 86.17 | 77.58 | 86.17 | 79.86 |
| | Stacking | 86.28 | 76.24 | 86.24 | 79.87 |
| | Bagging | 86.24 | 76.54 | 85.24 | 79.80 |
| | Boosting | 86.24 | 77.23 | 86.24 | 79.85 |
| EfficientNetV2M | Voting KNN | 86.13 | 78.75 | 86.14 | 80.09 |
| | Stacking | 86.28 | 78.75 | 86.14 | 80.09 |
| | Bagging | 84.88 | 77.70 | 84.88 | 80.11 |
| | Boosting | 86.25 | 84.12 | 85.20 | 84.65 |
| EfficientNetV2L | Voting KNN | 86.10 | 78.50 | 86.10 | 79.95 |
| | Stacking | 86.24 | 86.24 | 86.24 | 79.87 |
| | Bagging | 84.87 | 76.99 | 84.87 | 79.82 |
| | Boosting | 86.25 | 84.00 | 85.05 | 84.50 |
| EfficientNetV2XL | Voting KNN | 85.95 | 78.60 | 85.95 | 80.20 |
| | Stacking | 86.21 | 85.97 | 86.21 | 79.83 |
| | Bagging | 84.88 | 78.08 | 84.88 | 80.28 |
| | Boosting | 86.18 | 78.90 | 86.18 | 79.89 |
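The four ensemble strategies compared in Tables 6 and 7 can be instantiated along the following lines with scikit-learn (version 1.2 or later for the `estimator` argument of BaggingClassifier). The base learners and hyperparameters shown are hypothetical placeholders, not the configuration used in this work.

```python
# Hedged sketch of the four classifier ensembles (assumed base learners).
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

ensembles = {
    # Soft voting over several KNN models with different neighborhoods.
    "Voting KNN": VotingClassifier(
        estimators=[(f"knn{k}", KNeighborsClassifier(n_neighbors=k))
                    for k in (3, 5, 7)],
        voting="soft"),
    # Stacking: base predictions fed to a logistic-regression meta-learner.
    "Stacking": StackingClassifier(
        estimators=[("knn", KNeighborsClassifier()),
                    ("svm", SVC(probability=True))],
        final_estimator=LogisticRegression()),
    # Bagging: bootstrap-resampled copies of one base learner.
    "Bagging": BaggingClassifier(
        estimator=KNeighborsClassifier(), n_estimators=10),
    # Boosting: sequentially reweighted weak learners (decision stumps).
    "Boosting": AdaBoostClassifier(n_estimators=100),
}

# X_train, y_train: reduced CNN feature vectors with benign/malignant labels.
# for name, clf in ensembles.items():
#     clf.fit(X_train, y_train)
#     print(name, clf.score(X_test, y_test))
```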
Table 7. Classification results of EfficientNetV2S, V2M, V2L, and V2XL on the CSAW-M dataset in terms of accuracy (Acc), precision (Pre), recall (Rec), and F1-score.

| Model | Classifier | Acc (%) | Pre (%) | Rec (%) | F1 (%) |
|---|---|---|---|---|---|
| EfficientNetV2S | Voting KNN | 93.41 | 93.24 | 93.41 | 90.23 |
| | Stacking | 93.43 | 93.43 | 93.43 | 90.26 |
| | Bagging | 93.30 | 87.76 | 93.30 | 90.21 |
| | Boosting | 93.43 | 90.51 | 93.43 | 90.29 |
| EfficientNetV2M | Voting KNN | 93.45 | 93.27 | 93.45 | 90.29 |
| | Stacking | 93.38 | 93.38 | 93.38 | 90.18 |
| | Bagging | 93.22 | 88.10 | 93.22 | 90.21 |
| | Boosting | 93.41 | 89.63 | 93.41 | 90.30 |
| EfficientNetV2L | Voting KNN | 93.43 | 93.38 | 87.33 | 90.26 |
| | Stacking | 93.40 | 87.61 | 93.19 | 90.22 |
| | Bagging | 93.28 | 87.76 | 93.28 | 90.17 |
| | Boosting | 93.34 | 89.34 | 93.34 | 90.18 |
| EfficientNetV2XL | Voting KNN | 93.31 | 93.25 | 93.31 | 90.08 |
| | Stacking | 93.47 | 87.61 | 93.19 | 90.32 |
| | Bagging | 93.29 | 87.86 | 93.29 | 90.20 |
| | Boosting | 93.34 | 89.45 | 93.33 | 90.16 |
Table 8. Confusion matrices of the proposed method on (a) the combined dataset (mini-DDSM, MIAS, and CSAW-M; 4651 test images) and (b) the CSAW-M dataset (3006 test images). Rows are true labels; columns are predicted labels.

(a) Combined dataset

| True Label | Cancer | No Cancer |
|---|---|---|
| Cancer | 379 (TP) | 61 (FN) |
| No Cancer | 102 (FP) | 4109 (TN) |

(b) CSAW-M dataset

| True Label | Cancer | No Cancer |
|---|---|---|
| Cancer | 243 (TP) | 18 (FN) |
| No Cancer | 34 (FP) | 2711 (TN) |
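As a consistency check, the malignant-class precision, recall, and F1-score can be recomputed from confusion matrix (b); the values agree with the CSAW-M stacking results in Table 7 up to small rounding and averaging differences.

```python
# Worked example: metrics recovered from confusion matrix (b) above.
tp, fn, fp, tn = 243, 18, 34, 2711  # CSAW-M test set, 3006 images

precision = tp / (tp + fp)                          # 243/277 ≈ 0.8773
recall = tp / (tp + fn)                             # 243/261 ≈ 0.9310
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.9033

print(f"Pre={precision:.2%}  Rec={recall:.2%}  F1={f1:.2%}")
# Pre=87.73%  Rec=93.10%  F1=90.33%
```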
Table 9. Comparison of classification performance with state-of-the-art methodologies. "RepVal" indicates whether repetitions on validation were reported.

| Reference | Images | RepVal | Acc (%) | Pre (%) | Rec (%) | F1 (%) | AUC |
|---|---|---|---|---|---|---|---|
| Muduli [10] | 2236 | No | 98.94 | – | – | – | – |
| Bacha & Taouali [11] | 991 | No | 100.00 | – | – | – | 1.000 |
| Thawkar [12] | 651 | No | 97.85 | – | 98.22 | – | – |
| Elmoufidi [14] | 1923 | No | 98.62 | – | 98.60 | – | 0.9817 |
| Amin [13] | 633 | No | 96.87 | – | 97.23 | – | 0.980 |
| Chakravarthy [27] | 797 | No | 97.90 | – | 97.36 | – | – |
| Proposed (CSAW-M) | 10,020 | Yes | 93.47 | 87.61 | 93.19 | 90.32 | – |
| Proposed (mini-DDSM, MIAS, CSAW-M) | 15,506 | Yes | 86.28 | 78.75 | 86.14 | 80.09 | – |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
