Breast Cancer Diagnosis Method Based on Phase Congruency and Dual-Branch Feature Modeling

Shi, Yurui; Wang, Enlin; Zhao, Mengda; Zhang, Jianxin

doi:10.3390/app16115280


¹ College of Computer Science and Engineering, Dalian Minzu University, Dalian 116650, China

² Research Center of Multimodal Information Perception and Intelligent Processing, Dalian Minzu University, Dalian 116650, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(11), 5280; https://doi.org/10.3390/app16115280

Submission received: 8 April 2026 / Revised: 5 May 2026 / Accepted: 21 May 2026 / Published: 25 May 2026


Abstract

Breast cancer histopathological image classification remains a challenging task because reliable diagnosis depends on both fine-grained local lesion characteristics and multi-scale global tissue structures. However, current deep learning approaches often face challenges in effectively integrating these complementary cues, particularly in the presence of staining variations, ambiguous lesion boundaries, and limited annotated datasets. To address these challenges, we propose a novel method called UNI-Phase-Dual Network (UPDNet). This approach enhances the detection of stable lesion boundaries and subtle patterns by incorporating phase congruency, while combining it with global tissue information using the UNI foundation model. The method utilizes two branches to process features from different perspectives, one focusing on fine details and the other capturing broader context. Additionally, we apply a fine-tuning strategy that improves generalization and reduces overfitting in scenarios with small datasets. Experiments on three widely used breast cancer datasets, BRACS, BreakHis, and BACH, demonstrate that UPDNet significantly outperforms existing methods. Specifically, on the 7-class BRACS task, UPDNet achieves 68.58% accuracy, which is a 2.21% improvement over previous methods, and an increase of 1.48% in the weighted F1 score. These results demonstrate the strong potential of UPDNet in breast cancer histopathological image classification.

Keywords: breast cancer histopathological image classification; phase congruency; UNI foundation model; fine-tuning strategy

1. Introduction

Pathological image analysis is essential in medical diagnosis, particularly in the early detection of cancers and the precise localization of lesion areas. Breast cancer ranks among the most prevalent cancers affecting women worldwide. Early and precise detection can greatly improve survival rates and reduce treatment costs [1]. Traditional diagnostic methods include clinical evaluations, imaging techniques like mammography and ultrasound, and tissue biopsies [2]. Although these methods are essential for early breast cancer screening, they have practical limitations. The complexity of lesions, small lesion sizes, and image quality variations due to uneven staining in pathological images can reduce diagnostic accuracy and increase reliance on the physician’s experience. This has led to growing interest in computer-aided diagnostic methods, which provide a more objective and accurate approach to detecting lesions, improving diagnostic precision in clinical practice [3].

Recent advances in deep learning have greatly impacted pathological image analysis, with current methods falling into three main categories: Convolutional Neural Networks (CNNs), Transformer-based models, and emerging foundation models for pathology. CNN-based approaches, including ResNet and Inception, have demonstrated strong performance in a range of pathological image classification tasks. These models automatically extract hierarchical features from images through stacked convolutional layers, showing effectiveness in tasks like tumor detection, tissue segmentation, and cancer classification. ResNet [4], for example, introduces residual connections to mitigate the vanishing gradient problem, which allows deeper models to be trained. However, this does not address the lack of global context or the difficulty of capturing the subtle, fine-grained lesions that are often present in pathological images. Aresta et al. [5] demonstrated CNNs’ effectiveness in breast cancer diagnosis on the BACH dataset, but their models still fall short in capturing multi-scale lesion features, which are essential for reliable diagnosis. Similarly, Jiang et al. [6] proposed an SE-ResNet module that reduces the parameter count while improving performance, but the model still struggles with fine-grained lesion detection and does not effectively combine global tissue structures with local lesion details. These limitations indicate that CNN-based methods alone are insufficient for handling the complexity and multi-dimensional characteristics of histopathological images, particularly in breast cancer diagnosis.

As deep learning continues to advance in computer vision, Transformer-based architectures have emerged as an alternative. Transformer models, originally successful in natural language processing (NLP), use self-attention mechanisms to capture long-range dependencies and effectively handle global information, showing strong potential in image classification. Vision Transformer (ViT) divides images into patches and models relationships between them through self-attention, achieving strong performance. Similarly, the Swin Transformer [7], with its shifted-window attention mechanism, improves computational efficiency and enhances classification performance, but still faces limitations in capturing local fine details, such as lesion boundaries, which are essential in medical imaging. These approaches, while promising in some contexts, do not fully resolve the issues of computational efficiency and fine-grained feature extraction, making them less suitable for practical medical applications, where annotated data is scarce and computational resources are limited.

Foundation Models for pathology, trained on large-scale multi-modal datasets, have recently become a promising approach in pathological image analysis. A key feature of these models is their pre-training on vast amounts of unlabeled data, which helps them learn general visual features through self-supervised learning. The main benefit is their ability to be fine-tuned for specific tasks, reducing the reliance on labeled data. Models like CTransPath [8], despite a 10% improvement in breast cancer classification over CNNs, require large labeled datasets and significant computational resources for fine-tuning, making them impractical for many medical applications. Similarly, the UNI model [9], trained on over 100,000 whole slide images, has achieved strong performance across various pathology tasks. However, its success is heavily reliant on massive amounts of annotated data and computational power, posing a challenge for clinical environments where labeled data is limited. The MS2M model [10] faces similar issues, demonstrating strong performance but still needing large-scale pre-training and extensive fine-tuning. While these models offer significant improvements through generalization, their high computational cost and need for large datasets make them less efficient than CNNs in practical medical settings, especially when labeled data is scarce.

Although deep learning methods like CNNs, Transformers, and pathology foundation models have advanced breast cancer histopathological image analysis, they still struggle with one major issue: balancing fine-grained details of local lesions with the broader, multi-scale tissue structure. In breast histopathology, a reliable diagnosis depends not only on subtle features such as lesion boundaries, micro-lesions, and texture variations [11], but also on the overall tissue organization and contextual information. While CNN-based methods excel at extracting local features, they are less effective at modeling global structures. On the other hand, Transformer-based and foundation models, though strong at capturing global semantics, may fall short in handling subtle morphological details and often require high computational costs or task-specific adjustments [12]. As a result, current methods still show limited ability in capturing discriminative structural information across scales [13], especially under staining variation, fuzzy boundaries, and small-sample conditions.

To address these challenges, we propose the UNI-Phase-Dual Network (UPDNet), a novel framework that uniquely integrates phase congruency (PC) with the UNI foundation model. The core innovation of UPDNet lies in its ability to combine fine-grained local features with global semantic context through a dual-branch feature modeling module. One branch, based on DConv, enhances local lesion details, while the other, using ATConv, captures multi-scale tissue context. The branches are adaptively fused using a learnable spatial gating mechanism, allowing UPDNet to effectively handle both local and global features without interference. This synergistic integration enables UPDNet to address the limitations of existing models, such as poor boundary detection and the inability to handle fine details in small-sample situations. Furthermore, UPDNet introduces a parameter-efficient fine-tuning (PEFT) strategy, which updates only a small number of task-specific parameters while freezing the UNI backbone. This approach reduces computational overhead and mitigates overfitting, making UPDNet highly efficient for scenarios with limited annotated data.

Our contributions are summarized as follows:

We propose UPDNet, a novel framework that uniquely combines PC with the UNI foundation model. This integration enables joint modeling of fine-grained local features and global semantic information, addressing the key limitations of existing methods that struggle to effectively combine these complementary aspects.
We design a dual-branch feature refinement module to improve feature representation from two complementary aspects: one branch (DConv) focuses on fine-grained local texture, while the other (ATConv) captures multi-scale context. The branches are adaptively fused via a learnable spatial gating mechanism, reducing feature interference and improving the overall representation.
We introduce a PEFT strategy, which updates only a small number of task-specific parameters while freezing the UNI backbone. This dramatically reduces computational cost and alleviates the problem of overfitting, especially in small-sample scenarios where traditional models would struggle.

2. Methods

In this section, we describe the proposed UPDNet for breast cancer diagnosis. We begin with an overview of the network architecture. Then, we describe the phase congruency-based feature extraction module and the dual-branch feature modeling module used in UPDNet. Finally, we outline the training procedure and evaluation metrics used to assess the model’s performance.

2.1. UPDNet

As illustrated in Figure 1, we detail the architecture of the proposed UPDNet, which is specifically designed for breast cancer pathological image diagnosis. The overall framework of UPDNet mainly consists of three key parts: UNI pre-trained backbone [9] for global semantic feature extraction; PC [14] module for complementary structural prior feature extraction; Dual-branch feature modeling module for adaptive feature refinement and fusion.

The overall process in UPDNet can be represented by the following simplified equation:

$$ F_{\mathrm{output}} = \mathrm{Classifier}(\mathrm{GatingNetwork}(\mathrm{Branch1}(X), \mathrm{Branch2}(X))), \tag{1} $$

where $X$ represents the input breast cancer pathological image, Branch1 focuses on extracting fine-grained features using DConv (depthwise separable convolution), Branch2 captures long-range context using ATConv (dilated convolution), and GatingNetwork adaptively combines the outputs from both branches. To improve training efficiency and alleviate overfitting in small-sample scenarios, a PEFT strategy is adopted for lightweight adaptation.

UPDNet is built upon the UNI pre-trained foundation model [9], which yields strong global semantic representation from large-scale pre-training. To strengthen the model’s sensitivity to lesion boundaries and microstructures, PC is introduced as a structural prior [14]. PC significantly improves lesion detection by stabilizing structural cues, especially for tiny lesions and fuzzy boundaries.

The Dual-branch feature modeling module does not extract new features, but performs adaptive refinement on the fused UNI + PC features. Branch1 and Branch2 focus on different enhancement targets: Branch1 uses DConv [15] to focus on fine-grained local features, such as small lesion details and tissue textures, with a low computational cost, while Branch2 uses ATConv [16] to capture multi-scale context by expanding the receptive field, making it capable of modeling larger tissue structures. Unlike traditional methods that simply concatenate or average feature maps from different sources, our gating mechanism dynamically assigns different weights to the features based on their relevance to regional lesion characteristics, effectively resolving feature interference and enhancing the robustness of lesion detection, especially in complex cases like fuzzy boundaries and subtle lesions.

This adaptive feature fusion, which allows the model to simultaneously capture both local fine-grained details and global tissue context, represents a substantial improvement over existing methods that are either too focused on local textures or too reliant on global structures. By combining the strengths of both approaches, UPDNet is significantly more efficient in handling real-world challenges such as small-sample data and staining variations, making it a powerful tool for practical breast cancer image diagnosis.

2.2. Phase Congruency

Phase congruency (PC) is introduced in UPDNet as a structure-prior module to enhance the representation of lesion boundaries and subtle morphological patterns in breast cancer pathological images. Unlike traditional intensity-based features, which can be sensitive to variations in illumination or contrast, PC is more robust to these changes because it focuses on phase alignment across scales. In Figure 2, the input image undergoes an initial transformation to the frequency domain using the Fast Fourier Transform (FFT), followed by filtering through a series of Log-Gabor filters at various scales [17,18]. The filtered outputs are reverted to the spatial domain through inverse FFT (IFFT), and the phase-based structural responses are subsequently computed.

Formally, given an input image $I(x)$, its response at scale $s$ and orientation $o$ can be obtained by convolving the image with the corresponding Log-Gabor filter. The resulting complex response is decomposed into an even-symmetric component $e_{s,o}(x)$ and an odd-symmetric component $o_{s,o}(x)$. Based on these two components, the local amplitude [19] at each scale and orientation is defined as:

$$ A_{s,o}(x) = \sqrt{e_{s,o}^{2}(x) + o_{s,o}^{2}(x)}, \tag{2} $$

where $A_{s,o}(x)$ reflects the local energy magnitude of the image structure at position $x$. To aggregate structural responses across multiple scales under the same orientation, the orientation-dependent energy [20] is computed as:

$$ E_{o}(x) = \sqrt{\Big(\sum_{s} e_{s,o}(x)\Big)^{2} + \Big(\sum_{s} o_{s,o}(x)\Big)^{2}}. \tag{3} $$

This energy reflects the degree of phase alignment across scales and therefore highlights perceptually significant structures such as edges, corners, and lesion boundaries. When Fourier components at different scales are in phase, the corresponding location typically corresponds to stable, reliable structures critical for accurate diagnosis.

Based on the above quantities, the phase congruency map [21] is defined as:

$$ PC(x) = \frac{\sum_{o} W_{o}(x) [E_{o}(x) - T_{o}]}{\sum_{s} \sum_{o} A_{s,o}(x) + \varepsilon}, \tag{4} $$

where $W_{o}(x)$ represents the weighting factor at orientation $o$, $T_{o}$ is a noise compensation term that reduces unstable low-energy responses, and $\varepsilon$ is a small constant to avoid numerical instability. The numerator reflects the useful phase-consistent structural energy after noise suppression, while the denominator normalizes the response by the total local amplitude across all scales and orientations.
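As an illustration, Equations (2) to (4) can be evaluated at a single pixel once the even and odd Log-Gabor responses are available. The sketch below is not the authors' implementation: the function name `phase_congruency_at_pixel`, the uniform default weights $W_o = 1$, and the clipping of negative values of $E_o - T_o$ to zero (the positive-part convention commonly used with Kovesi's formulation) are assumptions made for the example.

```python
import math


def phase_congruency_at_pixel(even, odd, weights=None, noise_t=0.0, eps=1e-8):
    """Evaluate Eqs. (2)-(4) at one pixel.

    even[s][o] and odd[s][o] hold the even- and odd-symmetric filter
    responses at scale s and orientation o (hypothetical input layout).
    """
    n_scales = len(even)
    n_orient = len(even[0])
    if weights is None:
        weights = [1.0] * n_orient  # assumption: uniform W_o

    numerator = 0.0
    total_amplitude = 0.0
    for o in range(n_orient):
        sum_even = sum(even[s][o] for s in range(n_scales))
        sum_odd = sum(odd[s][o] for s in range(n_scales))
        energy = math.hypot(sum_even, sum_odd)  # Eq. (3)
        # Assumption: negative values after noise compensation are clipped to 0.
        numerator += weights[o] * max(energy - noise_t, 0.0)
        # Eq. (2), summed over scales for this orientation.
        total_amplitude += sum(
            math.hypot(even[s][o], odd[s][o]) for s in range(n_scales)
        )
    return numerator / (total_amplitude + eps)  # Eq. (4)


# Two scales, one orientation: responses in phase vs. in opposite phase.
pc_aligned = phase_congruency_at_pixel([[1.0], [2.0]], [[0.0], [0.0]])
pc_opposed = phase_congruency_at_pixel([[1.0], [-1.0]], [[0.0], [0.0]])
```

When the responses at all scales share the same phase, the energy equals the summed amplitude and the measure approaches 1; when they cancel, it drops to 0, independently of the overall response magnitude.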

Unlike traditional intensity- or gradient-based features, PC is largely insensitive to changes in illumination or contrast because it relies on phase alignment rather than absolute intensity [18,19]. This makes it especially useful for pathological images, where factors like staining variation, local contrast differences, and complex tissue structures can distort intensity-based features [22]. By incorporating PC as a structural prior, the network is guided to focus on more stable morphological features, improving its sensitivity to fine lesion details and irregular tissue boundaries.

2.3. Dual-Branch Feature Modeling Module

After extracting semantic features from the UNI backbone and structural priors from the PC module, UPDNet refines the fused representation with a dual-branch feature modeling module. One branch focuses on fine local textures, while the other captures multi-scale context. A learnable gating mechanism is used to adaptively balance the two branches at different spatial locations [23].

Let $F_{\mathrm{uni}} \in \mathbb{R}^{H \times W \times C}$ denote the semantic feature map obtained from the UNI backbone, and $F_{\mathrm{pc}} \in \mathbb{R}^{H \times W \times C}$ represent the structural prior generated by the PC module. As detailed in Section 2.2, the two feature streams are initially aligned and subsequently fused to form the input to the dual-branch refinement module. The fused representation is written as:

$$ U_{\mathrm{fusion}} = \phi_{u}(F_{\mathrm{uni}}) \odot (1 + \gamma(\phi_{p}(F_{\mathrm{pc}}))) + \beta(\phi_{p}(F_{\mathrm{pc}})), \tag{5} $$

where $\phi_{u}(\cdot)$ and $\phi_{p}(\cdot)$ denote learnable $1 \times 1$ projection operators, and $\odot$ denotes element-wise multiplication. The channel-wise modulation functions $\gamma(\cdot)$ and $\beta(\cdot)$ explicitly generate the scale and shift terms for feature adaptation, respectively. In our architecture, $\gamma(\cdot)$ consists of a $1 \times 1$ convolution followed by a Sigmoid activation, while $\beta(\cdot)$ is implemented as a $1 \times 1$ convolution without non-linearity. This step adjusts the global semantic features by incorporating structural saliency cues, thereby establishing a strong shared feature foundation.
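The modulation in Equation (5) can be traced at a single spatial position, where a $1 \times 1$ convolution reduces to a linear map over channels. The following sketch assumes the projected vectors $\phi_u(F_{\mathrm{uni}})$ and $\phi_p(F_{\mathrm{pc}})$ are already given; the helper names `linear` and `fuse` and the toy weights are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(weight, bias, x):
    """A 1x1 convolution at one spatial position is a linear map over channels."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def fuse(u, p, gamma_w, gamma_b, beta_w, beta_b):
    """Eq. (5) at one position: u * (1 + gamma(p)) + beta(p).

    u: projected UNI feature vector, p: projected PC feature vector.
    gamma is a 1x1 conv followed by Sigmoid; beta is a 1x1 conv without
    non-linearity, as described in the text.
    """
    gamma = [sigmoid(g) for g in linear(gamma_w, gamma_b, p)]
    beta = linear(beta_w, beta_b, p)
    return [ui * (1.0 + gi) + bi for ui, gi, bi in zip(u, gamma, beta)]


# Toy example with two channels and all-zero modulation weights:
# gamma = sigmoid(0) = 0.5 and beta = 0, so the output is 1.5 * u.
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
fused = fuse([2.0, -4.0], [1.0, 1.0], zeros_w, zeros_b, zeros_w, zeros_b)
```

Because the Sigmoid output lies in (0, 1), the multiplicative factor $1 + \gamma$ stays between 1 and 2 in this formulation, so the structural prior can amplify but not suppress the semantic features; any reduction has to come from the shift term $\beta$.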

For each specified transformer block, the adapter is inserted with the following configuration: the down-projection maps the ViT block’s native feature dimension, which is inferred dynamically, to a configurable bottleneck dimension (default: 64), and the up-projection maps this bottleneck dimension back to the original ViT block feature dimension. A GELU activation function, rather than ReLU, is used between the down-projection and up-projection layers. The adapter parameters are trained alongside the classification head (and optional local MoE branch) using a shared Adam optimizer with a global learning rate (default: $1 \times 10^{-3}$) and a weight decay of $1 \times 10^{-4}$.

2.3.1. DConv Branch for Fine-Grained Refinement

The DConv branch is designed to enhance subtle lesion boundaries, morphological details, and fine-grained textural patterns. To efficiently extract local features with minimal parameter overhead, we use depthwise separable convolution, which separates spatial filtering from channel transformation [24]. To maintain the original fused information while emphasizing local details, we implement a residual calibration strategy:

$$ V_{\mathrm{fine}} = U_{\mathrm{fusion}} + \theta_{d} \odot C_{\mathrm{dconv}}(U_{\mathrm{fusion}}), \tag{6} $$

where $C_{\mathrm{dconv}}$ refers to the depthwise separable convolution module, which includes batch normalization and nonlinear activation, and $\theta_{d}$ is a learnable weight tensor used to highlight important local regions.

This residual calibration allows the network to selectively enhance fine-grained texture cues while preserving the original fused semantic-structural representation [4], ensuring that the DConv branch refines local details without redundantly re-extracting features.
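The parameter saving behind the phrase "minimal parameter overhead" can be made concrete by counting weights. The sketch below compares a standard convolution with a depthwise separable one; the 64-channel width matches the stem layer reported in the implementation details, while the 3 × 3 kernel size and the omission of bias and batch-normalization parameters are assumptions for illustration.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k filter per input channel plus a 1x1 pointwise mixing layer."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Assumed setting: 64 -> 64 channels with a 3 x 3 kernel.
standard = standard_conv_params(64, 64, 3)          # 36864
separable = depthwise_separable_params(64, 64, 3)   # 4672
reduction = standard / separable
```

Under these assumptions the separable layer needs roughly one eighth of the weights of the standard layer, which is the usual motivation for splitting spatial filtering from channel mixing.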

2.3.2. ATConv Branch for Multi-Scale Context Modeling

While enhancing local textures is essential for detecting micro-lesions, pathological diagnosis also depends on structural information from larger tissue areas. To model long-range structural dependencies effectively, the ATConv branch uses multi-scale dilated convolutions to enlarge the receptive field [25], capturing broader tissue context without increasing the number of kernel parameters.

This branch adaptively weights and fuses convolutional responses at different scales to highlight effective structural information related to lesions and suppress interference from irrelevant scales. The multi-scale structural feature aggregation can be formulated as:

$$ M_{\mathrm{scale}} = \sum_{k=1}^{K} \alpha_{k} \cdot D_{k}(U_{\mathrm{fusion}}), \tag{7} $$

where $D_{k}$ denotes the dilated convolution operation with the $k$-th dilation rate, $\alpha_{k}$ is the learnable scale weight, and $K$ is the total number of scales used.

By adopting this approach, the ATConv branch can effectively capture multi-scale context while maintaining computational efficiency, making it a perfect complement to the DConv branch. The branch also utilizes a residual structure to preserve the original fused representation while strengthening the global structural context, ensuring the model captures both fine-grained localization and large-scale tissue understanding.
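To make the effect of dilation tangible, the following one-dimensional sketch shows how a fixed three-tap kernel covers a wider span as the dilation rate grows, and how the responses are combined as in Equation (7). The dilation rates, the kernel values, the zero padding, and the function names are assumptions for illustration; the text does not specify the rates used in ATConv.

```python
def effective_kernel_size(k, dilation):
    """Span covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated cross-correlation (odd kernel length)."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_scale(signal, kernels, dilations, alphas):
    """Eq. (7): weighted sum of dilated responses, one term per dilation rate."""
    responses = [dilated_conv1d(signal, kern, d) for kern, d in zip(kernels, dilations)]
    return [
        sum(a * r[i] for a, r in zip(alphas, responses))
        for i in range(len(signal))
    ]


impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
box = [1.0, 1.0, 1.0]
spans = [effective_kernel_size(3, d) for d in (1, 2, 4)]   # [3, 5, 9]
mixed = multi_scale(impulse, [box, box], [1, 2], [0.5, 0.5])
```

The kernel keeps three weights at every rate while its span grows from 3 to 9 positions, which is why dilation enlarges the receptive field without adding kernel parameters; running several rates in parallel still adds one convolution per rate.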

2.3.3. Spatial Gating Fusion

After obtaining the fine-grained refinement feature $V_{\mathrm{fine}}$ and the multi-scale contextual feature $M_{\mathrm{scale}}$, UPDNet does not directly concatenate them. Instead, a learnable spatial gating mechanism is introduced to adaptively determine the relative contribution of the two branches at each spatial position. The gate weights are computed as:

$$ [\omega_{d}, \omega_{a}] = \mathrm{Softmax}(\Psi([V_{\mathrm{fine}} \,\Vert\, M_{\mathrm{scale}}])), \tag{8} $$

where $[\cdot \,\Vert\, \cdot]$ denotes channel-wise concatenation, $\Psi(\cdot)$ is a lightweight gating function implemented by stacked $1 \times 1$ convolutions, and $\omega_{d}, \omega_{a} \in \mathbb{R}^{H \times W \times 1}$ are normalized spatial coefficients corresponding to the DConv and ATConv branches, respectively.

The final refined representation is then obtained through gated residual fusion:

$$ Z_{\mathrm{fused}} = U_{\mathrm{fusion}} + \omega_{d} \odot V_{\mathrm{fine}} + \omega_{a} \odot M_{\mathrm{scale}}. \tag{9} $$

Compared with plain concatenation, this position-wise fusion strategy allows the network to emphasize texture-sensitive responses in subtle lesion regions while relying more on contextual dependency modeling in structurally complex areas [26]. Therefore, feature interference between heterogeneous branches can be alleviated, and the discriminability of the final representation can be improved.
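The gated residual fusion of Equations (8) and (9) can be followed at one spatial position with a few lines of code. In this sketch the two gate logits stand in for the output of $\Psi$ at that position; the function names and the toy feature vectors are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(u, v_fine, m_scale, gate_logits):
    """Eq. (9) at one position; gate_logits are the two outputs of Psi there."""
    w_d, w_a = softmax(gate_logits)  # Eq. (8): the two weights sum to 1
    return [ui + w_d * vi + w_a * mi for ui, vi, mi in zip(u, v_fine, m_scale)]


u = [1.0, 1.0]
v_fine = [2.0, 0.0]
m_scale = [0.0, 2.0]
balanced = gated_fusion(u, v_fine, m_scale, [0.0, 0.0])      # equal weights
texture_led = gated_fusion(u, v_fine, m_scale, [50.0, -50.0])  # DConv dominates
```

Equal logits give each branch a weight of 0.5, whereas a strongly positive first logit lets the fine-grained branch dominate at that position. The residual term keeps $U_{\mathrm{fusion}}$ in the output regardless of the gate values.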

Finally, the fused feature is sent to the classification head for category prediction:

$$ \hat{y} = \mathrm{Softmax}(W_{c} \cdot \mathrm{GAP}(Z_{\mathrm{fused}}) + b_{c}), \tag{10} $$

where $\mathrm{GAP}(\cdot)$ represents global average pooling, and $W_{c}$ and $b_{c}$ are the classifier parameters.

2.4. Parameter-Efficient Fine-Tuning Strategy

To improve generalization under limited annotated data, UPDNet adopts a PEFT strategy. Instead of updating all parameters of the large pre-trained UNI backbone, the method freezes the backbone and optimizes only a small number of task-specific parameters. This design reduces the optimization burden and helps alleviate overfitting in small-sample breast cancer histopathological image classification.

For each selected transformer block, a bottleneck-style Adapter is inserted to provide lightweight task adaptation [27]. The adapted feature representation is expressed as:

$$ \tilde{H}_{l} = H_{l} + \alpha_{l} W_{l}^{\uparrow} \delta(W_{l}^{\downarrow} \mathrm{LN}(H_{l})), \tag{11} $$

where $H_{l}$ denotes the feature representation of the $l$-th transformer block, $W_{l}^{\downarrow}$ and $W_{l}^{\uparrow}$ are the down-projection and up-projection matrices of the Adapter, $\mathrm{LN}(\cdot)$ denotes layer normalization, $\delta(\cdot)$ is a nonlinear activation function, and $\alpha_{l}$ is a learnable scaling factor. With this residual bottleneck design, only a small number of additional parameters are introduced.

During optimization, the backbone is fixed, while the Adapter, dual-branch module, and classifier are jointly updated. This PEFT strategy efficiently exploits UNI’s prior knowledge, ensuring sufficient capacity for task-specific learning with high parameter efficiency and robustness.
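A rough count shows how few parameters the adapters in Equation (11) add. The sketch assumes a ViT-L backbone with a hidden width of 1024 and 24 transformer blocks, a bottleneck dimension of 64, bias terms on both projections, and one scalar $\alpha_l$ per block; any parameters of the layer normalization are left out. These choices are assumptions for the estimate, not figures reported by the authors.

```python
def adapter_params(d_model, bottleneck, with_bias=True, with_scale=True):
    """Trainable parameters of one bottleneck adapter (layer norm excluded)."""
    params = 2 * d_model * bottleneck       # down- and up-projection weights
    if with_bias:
        params += bottleneck + d_model      # one bias per projection (assumed)
    if with_scale:
        params += 1                         # learnable scaling factor alpha_l
    return params


# Assumed ViT-L configuration: width 1024, 24 blocks, bottleneck 64.
per_block = adapter_params(1024, 64)
total_adapter_params = 24 * per_block
```

With these assumptions each adapter has 132,161 parameters and all 24 adapters together have about 3.2 million, on the order of 1% of a ViT-L backbone with roughly 300 million parameters.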

3. Experimental Setup

In this section, we perform experiments on BRACS, BreakHis, and BACH to evaluate the effectiveness of the proposed method. The following subsections provide details on the datasets, evaluation protocol, implementation, and result analysis.

3.1. Datasets

3.1.1. BRACS Dataset

The BRACS dataset, which stands for BReAst Carcinoma Subtyping, is used for classifying breast lesions through H&E-stained histology images. It consists of 547 whole-slide images and 4539 regions of interest, all of which have been annotated by expert pathologists. The dataset is divided into seven categories (Figure 3): Normal (N), Pathological Benign (PB), Usual Ductal Hyperplasia (UDH), Flat Epithelial Atypia (FEA), Atypical Ductal Hyperplasia (ADH), Ductal Carcinoma in Situ (DCIS), and Invasive Carcinoma (IC). With its inclusion of atypical lesions and detailed subtypes, BRACS presents a challenging and clinically meaningful classification task, which is why it was chosen as the primary dataset for this study.

3.1.2. BreakHis Dataset

The BreakHis dataset is commonly used for breast histopathological image classification. It includes 7909 images from 82 patients, with benign and malignant samples taken at four magnification levels: 40×, 100×, 200×, and 400×. Beyond the basic classification of benign versus malignant, BreakHis offers eight histological subtypes: four benign (adenosis, fibroadenoma, phyllodes tumor, tubular adenoma) and four malignant (ductal carcinoma, lobular carcinoma, mucinous carcinoma, papillary carcinoma). With its use of multiple magnifications and diverse subtypes, BreakHis is a useful dataset for testing the robustness of the proposed method across different scales.

3.1.3. BACH Dataset

The BACH Dataset is a public benchmark introduced in the ICIAR 2018 Grand Challenge on Breast Cancer Histology Images. Its microscopy subset includes 400 H&E-stained breast histology images, evenly distributed across four categories: Normal (N), Benign (B), In Situ Carcinoma (IS), and Invasive Carcinoma (I). Compared to BRACS, BACH offers a more compact and balanced multi-class setting, and compared to BreakHis, it provides a cleaner image-level benchmark for classification performance. For these reasons, BACH was used as an additional dataset to validate the generalization capability of the proposed method.

3.2. Evaluation Metrics

To evaluate the proposed method on BRACS, BreakHis, and BACH, four standard metrics were used: accuracy (ACC), precision (Pre), F1-score, and AUC. These metrics measure the model’s performance from various angles, including overall accuracy, prediction reliability, the balance between precision and recall, and class separability.

The overall accuracy is defined as:

$$ \mathrm{ACC} = \frac{1}{N_{t}} \sum_{i=1}^{N_{t}} \mathbb{I}(\hat{y}_{i} = y_{i}), \tag{12} $$

where $N_{t}$ denotes the total number of test samples, $y_{i}$ is the ground-truth label of the $i$-th sample, $\hat{y}_{i}$ is the model’s predicted label, and $\mathbb{I}(\cdot)$ is the indicator function (equal to 1 if the prediction matches the ground truth, 0 otherwise).

For multi-class classification, AUC was computed using the one-versus-rest strategy and averaged over all categories. Precision and F1-score were also reported to provide a more comprehensive evaluation of the proposed method.
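For concreteness, the accuracy of Equation (12) and a support-weighted F1-score can be computed from label lists as sketched below. The weighting of per-class F1 by class frequency is the common definition and is assumed here to match the weighted F1 score reported for BRACS; the function names and the toy labels are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Eq. (12): fraction of samples whose prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Toy example with two of the BRACS class names.
labels = ["N", "N", "IC", "IC"]
preds = ["N", "IC", "IC", "IC"]
acc = accuracy(labels, preds)        # 3 of 4 correct
wf1 = weighted_f1(labels, preds)     # (2 * 2/3 + 2 * 0.8) / 4
```

In the toy example the class N has an F1 of 2/3 and the class IC an F1 of 0.8, so the weighted score is about 0.733 while the accuracy is 0.75, which illustrates why the two metrics can differ.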

3.3. Implementation Details

All experiments were implemented using Python 3.11.5 and PyTorch 2.0.1 with CUDA 12.2, and run on a single NVIDIA RTX 3090 Ti GPU. To ensure a fair, rigorous, and highly reproducible comparison, all experiments including those for the baseline models were conducted using the official dataset partitions provided by the original dataset creators. The BRACS dataset contains 4539 images, with 3657 images used for training, 312 images for validation, and 570 images for testing. The BreakHis dataset consists of 7909 images, with each magnification-level subset split into training and test sets in a 7:3 ratio. The training set includes 5536 images, while the test set includes 2373 images. The BACH dataset contains 400 images, which are randomly split into training and test sets in an 8:2 ratio, with 320 images in the training set and 80 images in the test set. All evaluated methods strictly follow the exact same official data splits and preprocessing protocols.

All images are resized to 224 × 224 and preprocessed according to the UNI model’s official pipeline. We use the UNI (ViT-L/16) foundation model as a frozen backbone. For parameter-efficient fine-tuning, we insert adapters into all Transformer blocks, with a bottleneck dimension of 64. Only the adapters, proposed modules, and classification head are trained. The PC module uses a Log-Gabor filter bank with 6 orientations and 5 scales. The noise threshold T is estimated adaptively as in Kovesi’s method. The PC map is concatenated with UNI features and fed into the dual-branch module. The dual-branch module includes a 64-channel stem layer, a DConv branch, and an ATConv branch with multi-scale dilation. A spatial gating mechanism dynamically fuses the two branches. The final classifier uses global average pooling and a fully connected layer with a dropout rate of 0.5 for regularization. We train using the Adam optimizer with a learning rate of 0.001, a batch size of 64, a weight decay of 0.0001, and a cosine annealing learning rate schedule. For data augmentation, we apply random horizontal flipping and random cropping. All reported results are evaluated on the test set; the validation set is used only for model selection. AUC is computed with the one-versus-rest strategy for multi-class tasks. We have made our code publicly available at https://github.com/flipped123-wq/UPDNet (accessed on 13 May 2026).
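The shape of a cosine annealing schedule starting from the stated learning rate of 0.001 can be sketched as follows. The total number of steps and the minimum learning rate of 0 are assumptions, since the training length and the floor value are not given in the text; the function name is hypothetical.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max=1e-3, lr_min=0.0):
    """Learning rate after `step` of `total_steps` under cosine annealing."""
    cosine = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


# Assumed schedule length of 100 steps (for example, epochs).
lr_start = cosine_annealing_lr(0, 100)     # equals lr_max
lr_middle = cosine_annealing_lr(50, 100)   # halfway between lr_max and lr_min
lr_end = cosine_annealing_lr(100, 100)     # equals lr_min
```

The rate decays slowly at first, fastest around the midpoint, and flattens again near the end, which avoids the abrupt drops of a step schedule.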

3.4. Experiment Result Analysis

To thoroughly evaluate the effectiveness of UPDNet, comparative experiments were conducted on the BRACS, BACH, and BreakHis datasets. BRACS served as the primary benchmark to assess the fine-grained classification capability, while BACH was used to evaluate generalization performance under balanced class settings. BreakHis was employed to examine UPDNet’s robustness under different magnification levels. As shown in Table 1, the method’s performance was evaluated based on Accuracy, Precision, Recall, F1-score, and AUC. The results show that the proposed method performs consistently well across all three datasets, demonstrating strong classification accuracy, prediction reliability, and class separation. These findings highlight UPDNet’s competitiveness and stability across various classification tasks and data distributions.

3.4.1. Comparison on BRACS 7-Class Dataset

The BRACS (BReAst Carcinoma Subtyping) dataset includes seven categories of breast histopathological images: N, PB, UDH, FEA, ADH, DCIS, and IC. It presents a challenge for fine-grained classification, particularly in detecting micro-lesions with unclear boundaries, often affected by staining and contrast variations. As shown in Table 2, UPDNet outperforms all other methods, achieving the highest weighted F1-score of 67.46%. It also reached 92.8% in the IC category, significantly surpassing other methods. This demonstrates UPDNet’s strong ability to detect micro-lesions and subtle structural details, making it highly effective for breast cancer diagnosis.

In the comparison of methods, UPDNet clearly surpasses several earlier approaches. For instance, CLAM performs well in the IC category (86.8%) but struggles with harder categories such as ADH (20.1%) and UDH (31.7%). While Patch-GNN and TransMIL perform reasonably in some categories, UPDNet shows more balanced results across classes.

Figure 4 (left) presents the confusion matrix for UPDNet on the BRACS dataset. Correctly classified samples are shown along the diagonal, while misclassifications appear off the diagonal. As seen in the figure, most samples fall on the diagonal, and the IC category is recognized most reliably, consistent with its per-class result in Table 2.
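The relationship between a confusion matrix and the summary metrics used in this section (accuracy and weighted F1-score) can be sketched as follows. The counts below are hypothetical and are not read from Figure 4; the function names are likewise illustrative. Rows are assumed to index the true class and columns the predicted class.

```python
def per_class_f1(cm):
    """Return (F1 per class, support per class) for a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    k = len(cm)
    f1s, supports = [], []
    for c in range(k):
        tp = cm[c][c]
        support = sum(cm[c])                         # row sum: true samples of class c
        predicted = sum(cm[r][c] for r in range(k))  # column sum: samples predicted as c
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        supports.append(support)
    return f1s, supports


def weighted_f1(cm):
    """Average the per-class F1 scores, weighting each class by its support."""
    f1s, supports = per_class_f1(cm)
    return sum(f * s for f, s in zip(f1s, supports)) / sum(supports)


def accuracy(cm):
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Hypothetical 3-class counts (not taken from Figure 4).
cm = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 2, 18],
]
print(f"accuracy    = {accuracy(cm):.4f}")
print(f"weighted F1 = {weighted_f1(cm):.4f}")
```

Because the weighted F1-score averages per-class scores by support, it can differ from accuracy when errors are spread unevenly across classes, which is why both are reported.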

In addition, Figure 5 shows bar charts comparing the classification accuracy of UPDNet with other methods on the BRACS (left) and BACH (right) datasets. UPDNet achieves the highest accuracy on the BRACS dataset, which demonstrates its strong performance in fine-grained breast cancer classification.

In conclusion, UPDNet delivers the best overall performance on the BRACS dataset, outperforming other methods in accuracy and weighted F1-score and leading in the N, PB, UDH, and FEA categories. In the IC category it remains highly competitive at 92.8%, which supports its usefulness for breast cancer diagnosis.

3.4.2. Comparison on BreakHis Dataset

Besides the BRACS dataset, we also assessed UPDNet’s performance on the BreakHis dataset, a widely used benchmark for breast histopathological image classification. We compared UPDNet with several state-of-the-art methods, including DenseNet, DRDA-Net, WPE-ViT, and ResViT-GANNet, across four magnification levels. Table 3 presents the classification accuracy comparison. UPDNet achieved the highest accuracy at every magnification: 99.60% at 40×, 99.35% at 100×, 99.81% at 200×, and 99.22% at 400×. These results indicate that UPDNet is robust to magnification changes and well suited to clinical scenarios where different magnification levels are used.

As illustrated in Figure 6, we visualize the attention of UPDNet at four magnification levels (40×, 100×, 200×, 400×). The attention heatmaps show the regions the model attends to in pathological images at each scale. UPDNet focuses stably on lesion regions, tissue edges, and fine-grained structures at all magnification levels, while remaining sensitive to micro-lesions and key cellular morphologies. The model also responds consistently in both low-magnification global fields of view and high-magnification detail fields of view, demonstrating multi-scale robustness that matches the requirements of breast cancer pathological diagnosis at different imaging magnifications in clinical practice.

3.4.3. Comparison on BACH Dataset

To further assess the generalization capability of UPDNet, we conducted experiments on the BACH dataset, a public benchmark released as part of the ICIAR 2018 Grand Challenge on Breast Cancer Histology Images.

In this experiment, we compared UPDNet with several state-of-the-art methods, including DeiT, Swin Transformer, and ResViT-GANNet, in both 2-class and 4-class classification tasks. Table 4 shows the classification accuracy for both tasks. UPDNet achieved 98.75% accuracy in the 2-class task and 97.50% in the 4-class task, outperforming all other methods. These results demonstrate UPDNet’s strong generalization ability in balanced classification scenarios.

Additionally, Figure 4 (right) shows the confusion matrix for UPDNet on the BACH dataset. The diagonal values represent correctly classified samples, while the off-diagonal values indicate misclassifications. As seen in the figure, the large majority of samples fall on the diagonal, indicating that UPDNet distinguishes the four diagnostic categories well.

Figure 5 (right) shows a bar chart comparing UPDNet’s classification accuracy with that of other methods on the BACH dataset. As illustrated, UPDNet achieves the highest accuracy on BACH as well, further highlighting its strong performance in breast cancer classification tasks.

3.4.4. Convergence Analysis

To further assess UPDNet’s optimization and training stability, Figure 7 shows the validation performance curves on the BreakHis and BACH datasets over 30 epochs. As seen in Figure 7a, the accuracy curves on BreakHis at four magnification levels (40×, 100×, 200×, and 400×) rise quickly in the early stages of training and then stabilize, indicating that the model converges rapidly while maintaining strong performance across different image scales. This aligns with BreakHis’s role in our experiments, which is to test the model’s robustness across varying magnification factors.

As shown in Figure 7b, on the BACH dataset, both the ACC and AUC curves of the 2-class and 4-class classification tasks exhibit a clear upward trend during the first several epochs and remain stable in the later stage. In particular, the AUC values quickly approach a high level and show only minor fluctuations afterward, suggesting that UPDNet has good class discrimination ability and stable optimization performance under both binary and multi-class settings. Since BACH is used in this paper to further verify the generalization ability of the proposed method under a relatively balanced image-level benchmark, the convergence behavior shown in Figure 7 further supports the effectiveness and reliability of UPDNet on this dataset.

3.5. Ablation Study

To thoroughly evaluate the contribution of each component in UPDNet, we conducted an ablation study to investigate the impact of key modules on the model’s overall performance. Specifically, we examined the effects of removing or modifying various components, including the baseline model (UNI), the PC module, the dual-branch feature learning structure, DConv, ATConv, and PEFT. This ablation study was conducted on the BRACS, BreakHis, and BACH datasets to evaluate the model’s generalization across different types of histopathological data.

Table 5 compares the performance of different model configurations across three breast histopathology datasets to validate the contribution of each module. Taking the BRACS dataset as an example, the baseline model (UNI), which excludes advanced components, achieved an accuracy of 59.30%. The full UPDNet model, incorporating all proposed modules, significantly improved this score to 68.58%, demonstrating the effectiveness of our architecture.

Integrating the PC module into the baseline increased the accuracy from 59.30% to 62.75%, highlighting its importance in capturing fine-grained structural details in histopathological images. Furthermore, introducing the dual-branch feature learning structure (combining both DConv and ATConv) boosted the performance to 66.87%. Specifically, adding only DConv or ATConv to the PC-enhanced baseline yielded 63.98% and 64.53%, respectively. This demonstrates their crucial and complementary roles in local feature extraction and contextual understanding.

Finally, incorporating the PEFT module brought the overall performance to 68.58%. This gain of 1.71 percentage points over the 66.87% configuration indicates its beneficial impact on model generalization and representation learning. These results confirm that each component of UPDNet contributes to its overall effectiveness, and that the components work together to improve classification performance.
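The incremental gains discussed in this ablation can be recomputed directly from the BRACS column of Table 5. The short sketch below does this arithmetic; the configuration labels are shorthand introduced here for illustration, while the accuracy values are copied from the table.

```python
# BRACS accuracy (%) for each configuration, copied from Table 5.
bracs = {
    "UNI": 59.30,
    "UNI+PC": 62.75,
    "UNI+PC+DConv": 63.98,
    "UNI+PC+ATConv": 64.53,
    "UNI+PC+DConv+ATConv": 66.87,
    "UNI+PC+DConv+ATConv+PEFT": 68.58,
}


def gain(after, before):
    """Accuracy difference in percentage points, rounded to two decimals."""
    return round(bracs[after] - bracs[before], 2)


gains = {
    "PC": gain("UNI+PC", "UNI"),
    "DConv only": gain("UNI+PC+DConv", "UNI+PC"),
    "ATConv only": gain("UNI+PC+ATConv", "UNI+PC"),
    "DConv+ATConv": gain("UNI+PC+DConv+ATConv", "UNI+PC"),
    "PEFT": gain("UNI+PC+DConv+ATConv+PEFT", "UNI+PC+DConv+ATConv"),
    "total": gain("UNI+PC+DConv+ATConv+PEFT", "UNI"),
}

for name, value in gains.items():
    print(f"{name:>13}: +{value:.2f} points")
```

On these numbers, the joint gain of the two branches over the PC-enhanced baseline (4.12 points) exceeds the sum of their individual gains (1.23 + 1.78 = 3.01 points), which is consistent with the complementary roles described above.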

4. Discussion

This work proposes UPDNet, a novel framework for breast cancer histopathology image classification. As the ablation study (Table 5) reveals, integrating the PC module, dual-branch refinement, and PEFT raises the accuracy of the UNI baseline on BRACS from 59.30% to 68.58%. This improvement stems from addressing a fundamental gap in the current literature: while CNN-based approaches (e.g., ResNet, CLAM) often miss global context, Transformer-based models (e.g., TransMIL) frequently overlook subtle morphological details. UPDNet bridges this gap. The PC module provides a contrast-invariant structural prior that is highly robust to staining variations. Subsequently, the dual-branch architecture explicitly decouples feature learning: the DConv branch captures fine-grained micro-lesions, while the ATConv branch models multi-scale tissue structures.
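The contrast invariance attributed to phase congruency can be illustrated with a deliberately simplified 1D sketch. It is not the multi-scale, multi-orientation PC module of UPDNet; it assumes a Fourier-based measure in which PC at position x is the magnitude of the summed positive-frequency components divided by the sum of their amplitudes. Under that assumption, scaling the signal by a constant scales numerator and denominator equally, so the measure is unchanged, and it peaks where the components agree in phase, such as at a step edge. All function names are hypothetical.

```python
import cmath
import math


def dft_positive(signal):
    """Naive DFT coefficients for the positive frequencies 1 .. N/2 - 1."""
    n = len(signal)
    return {
        k: sum(v * cmath.exp(-2j * math.pi * k * x / n) for x, v in enumerate(signal))
        for k in range(1, n // 2)
    }


def phase_congruency_1d(signal):
    """PC(x) = |sum_k c_k e^{i 2 pi k x / N}| / sum_k |c_k|, a value in [0, 1]."""
    n = len(signal)
    coeffs = dft_positive(signal)
    total_amplitude = sum(abs(c) for c in coeffs.values())
    scale = sum(abs(v) for v in signal)
    if total_amplitude <= 1e-9 * scale:
        return [0.0] * n  # (near-)constant signal: no structure to measure
    pc = []
    for x in range(n):
        local = sum(c * cmath.exp(2j * math.pi * k * x / n) for k, c in coeffs.items())
        pc.append(abs(local) / total_amplitude)
    return pc


# A step edge between samples 31 and 32, at full and at one-tenth contrast.
step = [0.0] * 32 + [1.0] * 32
faint_step = [0.1 * v for v in step]

pc_full = phase_congruency_1d(step)
pc_faint = phase_congruency_1d(faint_step)

# Largest pointwise difference between the two PC profiles.
max_diff = max(abs(a - b) for a, b in zip(pc_full, pc_faint))
print(f"max |PC_full - PC_faint| = {max_diff:.2e}")
print(f"PC next to the edge = {pc_full[31]:.3f}, PC in the flat region = {pc_full[16]:.3f}")
```

The sketch only demonstrates invariance to a global change in contrast. Real staining variation is more complex, so this should be read as intuition for the structural prior rather than as evidence for robustness in practice.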

Furthermore, we observed distinct dataset-specific behaviors. On the complex 7-class BRACS dataset, UPDNet achieves 92.8% on IC (Table 2), which we attribute to the DConv branch’s sensitivity to subtle cellular atypia. On the BreakHis dataset, the model maintains accuracy above 99.2% across magnifications (40× to 400×), which we attribute to the ATConv branch’s scale robustness. Despite these advantages, UPDNet has limitations. The PC computation and dual-branch feature modeling increase computational overhead and inference time, limiting deployment in resource-constrained environments. Additionally, it relies on fully annotated datasets and has not been validated on real-world clinical data, which limits its use in clinical decision-making. Future work will explore data-efficient paradigms such as weakly supervised or self-supervised learning.

5. Conclusions

This paper introduces UPDNet, a novel multi-component fusion model aimed at improving the accuracy and robustness of breast cancer classification. By integrating PC, dual-branch feature refinement, and PEFT, UPDNet tackles key challenges such as fine-grained lesion detection, multi-scale feature fusion, and small-sample learning. UPDNet improves local feature extraction through the PC module, while the dual-branch feature refinement module combines global semantic and local detail information. The DConv branch enhances fine-grained feature extraction, and the ATConv branch strengthens multi-scale contextual modeling. Meanwhile, the PEFT strategy fine-tunes only a small number of parameters, reducing the cost of fine-tuning and enhancing the model’s generalization ability. Experimental results show that UPDNet surpasses existing methods on the BRACS, BreakHis, and BACH datasets, particularly in fine-grained lesion classification and small-sample learning. Its robustness across different datasets and magnification levels, combined with an interpretable attention mechanism, makes it a promising candidate for clinical use. Overall, UPDNet offers efficient and reliable support for breast cancer diagnosis.

Author Contributions

J.Z. supervised the research and provided the core concept. Y.S. and E.W. designed the methodology and performed validation. M.Z. wrote the original draft, with critical revisions by Y.S. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Liaoning Province Science and Technology Plan Joint Program (Key Science and Technology Program) under Grant 2024JH2/102600089, and Key R&D Program of Liaoning Province under Grant 2025JH2/102800010.

Data Availability Statement

The datasets employed in this work are publicly accessible via the links below: https://www.bracs.icar.cnr.it/ (BRACS dataset, accessed on 13 January 2026), https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/ (BreakHis dataset, accessed on 13 January 2026), and https://iciar2018-challenge.grand-challenge.org/Dataset/ (BACH dataset, accessed on 13 January 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sharma, B.P.; Purwar, R.K. Computer-Aided Detection and Diagnosis of Breast Cancer: A Review. Adv. Distrib. Comput. Artif. Intell. J. 2024, 13, e31412.
2. Labrada, A.; Barkana, B.D. A comprehensive review of computer-aided models for breast cancer diagnosis using histopathology images. Bioengineering 2023, 10, 1289.
3. Liew, X.Y.; Hameed, N.; Clos, J. A review of computer-aided expert systems for breast cancer diagnosis. Cancers 2021, 13, 2764.
4. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
5. Aresta, G.; Araújo, T.; Kwok, S.; Chennamsetty, S.S.; Safwan, M.; Alex, V.; Marami, B.; Prastawa, M.; Chan, M.; Donovan, M.; et al. BACH: Grand challenge on breast cancer histology images. Med. Image Anal. 2019, 56, 122–139.
6. Jiang, Y.; Chen, L.; Zhang, H.; Xiao, X. Breast cancer histopathological image classification using convolutional neural networks with small SE-ResNet module. PLoS ONE 2019, 14, e0214587.
7. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
8. Wang, X.; Yang, S.; Zhang, J.; Wang, M.; Zhang, J.; Yang, W.; Huang, J.; Han, X. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 2022, 81, 102559.
9. Chen, R.J.; Ding, T.; Lu, M.Y.; Williamson, D.F.; Jaume, G.; Song, A.H.; Chen, B.; Zhang, A.; Shao, D.; Shaban, M.; et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 2024, 30, 850–862.
10. Wang, Z.; Sun, Y.; Yao, H.; Han, G.; Chen, B.; Wang, P.; Zhang, J. MS2M: Multi-granularity Self-supervised Second-order Multiple Instance Learning for Breast Cancer Pathology Image. IEEE Trans. Big Data 2025, 12, 869–880.
11. Haq, I.; Gong, Z.; Liang, H.; Zhang, W.; Khan, R.; Gu, L.; Eils, R.; Kang, Y.; Huang, B. A review of breast cancer histopathology image analysis with deep learning: Challenges, innovations, and clinical integration. Image Vis. Comput. 2025, 162, 105708.
12. Xu, C.; Yi, K.; Jiang, N.; Li, X.; Zhong, M.; Zhang, Y. MDFF-Net: A multi-dimensional feature fusion network for breast histopathology image classification. Comput. Biol. Med. 2023, 165, 107385.
13. Jiang, B.; Bao, L.; He, S.; Chen, X.; Jin, Z.; Ye, Y. Deep learning applications in breast cancer histopathological imaging: Diagnosis, treatment, and prognosis. Breast Cancer Res. 2024, 26, 137.
14. Cao, S.Y.; Yu, B.; Luo, L.; Zhang, R.; Chen, S.J.; Li, C.; Shen, H.L. PCNet: A structure similarity enhancement method for multispectral and multimodal image registration. Inf. Fusion 2023, 94, 200–214.
15. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
16. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
17. Jia, L.; Dong, J.; Huang, S.; Liu, L.; Zhang, J. Optical and SAR image registration based on multi-scale orientated map of phase congruency. Electronics 2023, 12, 1635.
18. Xie, Z.; Zhang, W.; Wang, L.; Zhou, J.; Li, Z. Optical and SAR image registration based on the phase congruency framework. Appl. Sci. 2023, 13, 5887.
19. Tian, Y.; Wen, M.; Lu, D.; Zhong, X.; Wu, Z. Biological basis and computer vision applications of image phase congruency: A comprehensive survey. Biomimetics 2024, 9, 422.
20. Forero, M.G.; Jacanamejoy, C.A. Unified mathematical formulation of monogenic phase congruency. Mathematics 2021, 9, 3080.
21. Forero, M.G.; Jacanamejoy, C.A.; Machado, M.; Penagos, K.L. Generalized Quantification Function of Monogenic Phase Congruency. Mathematics 2023, 11, 3795.
22. Kim, R.; Kim, K.; Lee, Y. A multiscale deep encoder–decoder with phase congruency algorithm based on deep learning for improving diagnostic ultrasound image quality. Appl. Sci. 2023, 13, 12928.
23. Guo, M.H.; Xu, T.X.; Liu, J.J.; Liu, Z.N.; Jiang, P.T.; Mu, T.J.; Zhang, S.H.; Martin, R.R.; Cheng, M.M.; Hu, S.M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368.
24. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
25. Chen, L.; Gu, L.; Zheng, D.; Fu, Y. Frequency-adaptive dilated convolution for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 3414–3425.
26. Wang, S.; Li, H.; Wang, Z.; Ouyang, W. Dynamic position-aware network for fine-grained image recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 2791–2799.
27. Xie, T.; Dai, K.; Jiang, Z.; Li, R.; Mao, S.; Wang, K.; Zhao, L. ViT-MVT: A unified vision transformer network for multiple vision tasks. IEEE Trans. Neural Netw. Learn. Syst. 2023, 36, 3027–3041.
28. Aygüneş, B.; Aksoy, S.; Cinbiş, R.G.; Kösemehmetoğlu, K.; Önder, S.; Üner, A. Graph convolutional networks for region of interest classification in breast histopathology. In Proceedings of the Medical Imaging 2020: Digital Pathology, SPIE, Houston, TX, USA, 19–20 February 2020; Volume 11320, pp. 134–141.
29. Lu, M.Y.; Williamson, D.F.; Chen, T.Y.; Chen, R.J.; Barbieri, M.; Mahmood, F. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 2021, 5, 555–570.
30. Shao, Z.; Bian, H.; Chen, Y.; Wang, Y.; Zhang, J.; Ji, X.; Zhang, Y. TransMIL: Transformer based correlated multiple instance learning for whole slide image classification. Adv. Neural Inf. Process. Syst. 2021, 34, 2136–2147.
31. Pati, P.; Jaume, G.; Foncubierta-Rodriguez, A.; Feroce, F.; Anniciello, A.M.; Scognamiglio, G.; Brancati, N.; Fiche, M.; Dubruc, E.; Riccio, D.; et al. Hierarchical graph representations in digital pathology. Med. Image Anal. 2022, 75, 102264.
32. Stegmüller, T.; Bozorgtabar, B.; Spahr, A.; Thiran, J.P. ScoreNet: Learning non-uniform attention and augmentation for transformer-based histopathological image classification. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–7 January 2023; pp. 6170–6179.
33. Hao, J.; Liu, Y.; Zeng, S.; He, Y. FECT: Classification of Breast Cancer Pathological Images Based on Fusion Features. arXiv 2025, arXiv:2501.10128.
34. Sheeraz, G.; Chen, Q.; Feiyu, L.; Zhou, F. Adaptive Deep Learning for Multiclass Breast Cancer Classification via Misprediction Risk Analysis. arXiv 2025, arXiv:2503.12778.
35. Chukwu, J.K.; Bala, F.S.; Nuhu, A.S. Breast cancer classification using deep convolutional neural networks. FUOYE J. Eng. Technol. 2021, 6, 35–38.
36. Hao, Y.; Zhang, L.; Qiao, S.; Bai, Y.; Cheng, R.; Xue, H.; Hou, Y.; Zhang, W.; Zhang, G. Breast cancer histopathological images classification based on deep semantic features and gray level co-occurrence matrix. PLoS ONE 2022, 17, e0267955.
37. Chattopadhyay, S.; Dey, A.; Singh, P.K.; Sarkar, R. DRDA-Net: Dense residual dual-shuffle attention network for breast cancer classification using histopathological images. Comput. Biol. Med. 2022, 145, 105437.
38. Ding, M.; Qu, A.; Zhong, H.; Lai, Z.; Xiao, S.; He, P. An enhanced vision transformer with wavelet position embedding for histopathological image classification. Pattern Recognit. 2023, 140, 109532.
39. Atban, F.; Ekinci, E.; Garip, Z. Traditional machine learning algorithms for breast cancer image classification with optimized deep features. Biomed. Signal Process. Control 2023, 81, 104534.
40. Sengodan, N. EfficientNet with Hybrid Attention Mechanisms for Enhanced Breast Histopathology Classification: A Comprehensive Approach. arXiv 2024, arXiv:2410.22392.
41. Xiao, M.; Li, Y.; Yan, X.; Gao, M.; Wang, W. Convolutional neural network classification of cancer cytopathology images: Taking breast cancer as an example. In Proceedings of the 2024 7th International Conference on Machine Vision and Applications, Singapore, 12–14 March 2024; pp. 145–149.
42. Ashraf, F.B.; Alam, S.M.; Sakib, S.M. Enhancing breast cancer classification via histopathological image analysis: Leveraging self-supervised contrastive learning and transfer learning. Heliyon 2024, 10, e24094.
43. Zhou, Y.; Jin, F.; Suo, G.; Yang, J. ResViT-GANNet: A deep learning framework for classifying breast cancer histopathology images using multimodal attention and GAN-based augmentation. BMC Med. Imaging 2025, 25, 401.
44. Maurya, R.; Pandey, N.N.; Mahapatra, S. BMEA-ViT: Breast cancer classification using lightweight customised vision transformer architecture with multi-head external attention. IEEE Access 2025, 13, 44317–44329.
45. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, Virtual Event, 18–24 July 2021; pp. 10347–10357.
46. He, Z.; Lin, M.; Xu, Z.; Yao, Z.; Chen, H.; Alhudhaif, A.; Alenezi, F. Deconv-transformer (DecT): A histopathological image classification model for breast cancer based on color deconvolution and transformer architecture. Inf. Sci. 2022, 608, 1093–1112.
47. Hassani, A.; Walton, S.; Li, J.; Li, S.; Shi, H. Neighborhood attention transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 6185–6194.
48. Liu, Y.; Liu, X.; Qi, Y. Adaptive threshold learning in frequency domain for classification of breast cancer histopathological images. Int. J. Intell. Syst. 2024, 2024, 9199410.

Figure 1. Overall architecture of UPDNet. The network combines a UNI backbone with phase congruency (PC) for global-local feature fusion, followed by a dual-branch feature modeling module consisting of a DConv branch, an ATConv branch, and a spatial gating network for adaptive refinement. Arrows indicate the direction of data flow.

Figure 2. Detailed schematic of the PC module. Using multi-scale and multi-orientation phase information, the PC module extracts invariant structural features, which are fused with UNI global features via feature-wise modulation to enhance subtle lesion detection.

Figure 3. Sample histopathology images of seven breast lesion types from the BRACS dataset: N, PB, UDH, FEA, ADH, DCIS, and IC.

Figure 4. Confusion matrices of our proposed UPDNet on two breast histopathology datasets. (Left) BRACS dataset with 7 lesion subtypes; (Right) BACH dataset with 4 diagnostic categories.

Figure 5. Accuracy comparison on BRACS and BACH datasets. The bar charts illustrate the image classification accuracy (%) of our proposed UPDNet compared with baseline and state-of-the-art methods on the BRACS (left) and BACH (right) datasets.

Figure 6. Attention visualization of our UPDNet on BreakHis histopathology images across four magnification factors. The colored regions indicate the key tissue areas the model uses to distinguish between benign and malignant breast lesions.

Figure 7. Training curves of accuracy (ACC) and area under the curve (AUC) for the proposed model on the BreakHis and BACH datasets over 30 epochs. (a) Accuracy curves of the model on BreakHis dataset with four magnification factors (40×, 100×, 200×, 400×); (b) Accuracy and AUC curves of the model on BACH dataset for 2-class and 4-class classification tasks.

Table 1. Classification performance metrics of the proposed method on the BRACS, BreakHis, and BACH datasets under different magnifications and class settings.

Metric	BRACS (%)	BreakHis 40× (%)	BreakHis 100× (%)	BreakHis 200× (%)	BreakHis 400× (%)	BACH 2-Class (%)	BACH 4-Class (%)
Accuracy	68.58 ± 0.04	99.60 ± 0.07	99.35 ± 0.09	99.81 ± 0.03	99.22 ± 0.08	98.75 ± 0.05	97.50 ± 0.06
Precision	68.68 ± 0.02	99.61 ± 0.05	99.54 ± 0.07	99.76 ± 0.03	99.12 ± 0.06	98.78 ± 0.03	97.62 ± 0.05
Recall	68.58 ± 0.03	99.61 ± 0.05	99.96 ± 0.05	99.76 ± 0.03	99.37 ± 0.06	98.75 ± 0.03	97.50 ± 0.03
F1-Score	68.34 ± 0.05	99.61 ± 0.05	99.24 ± 0.07	99.76 ± 0.03	99.25 ± 0.05	98.75 ± 0.03	97.46 ± 0.03
AUC	91.53 ± 0.03	99.99 ± 0.01	99.94 ± 0.03	99.97 ± 0.02	99.97 ± 0.02	100.00 ± 0.00	99.86 ± 0.05

Table 2. Per-class accuracy (%) and weighted F1-score comparison on the BRACS 7-class breast histopathology dataset. Best results are shown in bold.

Model	N	PB	UDH	FEA	ADH	DCIS	IC	Weighted F1
Patch-GNN [28]	52.5 ± 3.3	47.6 ± 2.2	23.7 ± 4.6	60.7 ± 5.3	30.7 ± 1.8	58.8 ± 1.1	81.6 ± 2.2	52.1 ± 0.6
CLAM [29]	59.4 ± 2.0	47.7 ± 1.2	31.7 ± 0.7	68.3 ± 0.4	20.1 ± 3.4	59.9 ± 1.7	86.8 ± 0.6	54.8 ± 1.0
TransMIL [30]	47.6 ± 9.8	42.9 ± 3.6	41.5 ± 5.3	72.7 ± 2.6	38.4 ± 5.9	62.7 ± 2.9	87.1 ± 3.9	57.5 ± 0.7
CG-GNN [31]	63.6 ± 4.9	47.7 ± 2.9	39.4 ± 4.7	72.1 ± 1.3	28.5 ± 4.3	54.6 ± 2.2	82.2 ± 4.0	56.6 ± 1.3
TG-GNN [31]	58.8 ± 6.8	40.9 ± 3.0	46.8 ± 1.9	63.7 ± 10.5	40.0 ± 3.6	53.8 ± 3.9	81.1 ± 3.3	55.9 ± 1.0
CTransPath [8]	60.0 ± 1.4	47.1 ± 2.7	37.9 ± 2.3	72.9 ± 3.1	43.3 ± 3.8	65.9 ± 1.4	91.0 ± 2.6	59.7 ± 1.1
HACT-Net [31]	61.6 ± 2.1	47.5 ± 2.9	43.6 ± 1.9	74.2 ± 1.4	40.4 ± 2.5	66.4 ± 2.6	88.4 ± 0.2	61.5 ± 0.9
ScoreNet [32]	64.3 ± 1.5	54.0 ± 2.2	45.3 ± 3.4	78.1 ± 2.8	46.7 ± 1.0	62.9 ± 2.0	91.0 ± 1.4	64.4 ± 0.9
FECT [33]	75.5 ± 2.6	51.8 ± 1.3	47.0 ± 1.7	79.6 ± 2.6	45.2 ± 2.2	66.7 ± 2.4	94.3 ± 1.3	65.8 ± 0.8
MultiRisk [34]	78.30 ± 4.3	54.71 ± 2.6	48.40 ± 2.3	77.91 ± 3.9	47.62 ± 2.1	63.66 ± 2.1	91.23 ± 2.7	65.98 ± 1.0
UPDNet (ours)	78.9 ± 3.7	55.34 ± 2.4	51.70 ± 1.9	79.8 ± 3.1	46.9 ± 1.8	64.9 ± 2.5	92.8 ± 2.3	67.46 ± 1.1

Table 3. Comparison of image-level classification accuracy (%) of different methods on the BreakHis dataset at four magnification levels. Best results are highlighted in bold.

References	Methods	40×	100×	200×	400×
Chukwu et al. [35]	DenseNet	93.64	97.42	95.87	94.67
Hao et al. [36]	CNN + GLCM	96.75	95.21	96.57	93.15
Chattopadhyay et al. [37]	DRDA-Net	95.72	94.41	97.43	96.84
Ding et al. [38]	WPE-ViT	97.24	97.84	99.01	98.96
Atban et al. [39]	CNN + FS + ML	94.32	94.32	97.73	95.45
Sengodan [40]	EfficientNet-HA	97.11	98.04	98.25	98.42
Xiao et al. [41]	Inception-V3	95.00	95.10	93.80	92.20
Ashraf et al. [42]	SSCL + TL	93.46	94.96	91.56	93.68
Zhou et al. [43]	ResViT-GANNet	97.75	98.38	99.12	97.62
Maurya et al. [44]	BMEA-ViT	95.74	96.96	98.18	97.25
Ours	UPDNet	99.60	99.35	99.81	99.22

Table 4. Classification accuracy (%) comparison on the BACH breast histopathology dataset under 2-class and 4-class settings. Best results are highlighted in bold.

References	Methods	2-Class	4-Class
Touvron et al. [45]	DeiT	88.75	76.25
Liu et al. [7]	Swin Transformer	91.25	86.25
He et al. [46]	DecT	79.06	-
Wang et al. [8]	CTransPath	97.50	95.30
Hassani et al. [47]	NAT	91.25	76.25
Liu et al. [48]	ATL-FD	-	91.25
Chen et al. [9]	UNI	97.56	95.60
Zhou et al. [43]	ResViT-GANNet	-	96.40
Ours	UPDNet	98.75	97.50

Table 5. Ablation study accuracy (%) to validate the contribution of each module in UPDNet across three breast histopathology datasets, with Baseline (UNI) as the foundational model for progressive module integration. DConv and ATConv are the two branches of the dual-branch module.

Baseline (UNI)	PC	DConv	ATConv	PEFT	BRACS	BreakHis	BACH
✓					59.30	98.51	96.58
✓	✓				62.75	98.63	96.93
✓	✓	✓			63.98	98.83	97.15
✓	✓		✓		64.53	98.97	97.21
✓	✓	✓	✓		66.87	99.25	97.69
✓	✓	✓	✓	✓	68.58	99.54	98.13

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Shi, Y.; Wang, E.; Zhao, M.; Zhang, J. Breast Cancer Diagnosis Method Based on Phase Congruency and Dual-Branch Feature Modeling. Appl. Sci. 2026, 16, 5280. https://doi.org/10.3390/app16115280

