N-Unet: An Efficient Multi-Task Model for Precise Classification and Segmentation of Breast Ultrasound Images
Abstract
1. Introduction
1. We introduce a multi-task loss function, AMTL, which balances optimization across the classification and segmentation tasks and improves overall performance.
2. We propose a medical image analysis model, N-Unet, which includes two novel modules: the Adaptive Feature Fusion (AFF) module and the Cross-Level Attention Enhancement (CLAE) module. By combining attention mechanisms with feature fusion, these modules improve feature extraction and information flow.
3. We further propose an inference-stage optimization strategy, Conditional Segmentation Boosting (CSB), which uses the classification result to assist and refine segmentation. This improves model reliability and robustness in practical applications.
4. We evaluate N-Unet on the BUSI and BUS-UCLM datasets, where it achieves strong classification and segmentation performance with only 8.95 M parameters and 14.74 GFLOPs.
2. Related Work
2.1. Classification
2.2. Segmentation
2.3. Hybrid Tasks
3. Method
3.1. Overall Architecture of N-Unet
3.2. Adaptive Feature Fusion Mechanism (AFF)
1. Depth-wise separable convolution (DS) [36]: DS is an efficient alternative to standard convolution. It decomposes the operation into two consecutive steps: depth-wise convolution and point-wise convolution. In the depth-wise stage, one convolution kernel is applied independently to each input channel, so each channel is filtered only within its own spatial domain. In the point-wise stage, the resulting feature maps are combined across channels by a 1 × 1 convolution, which integrates channel-wise information.

   Let $X \in \mathbb{R}^{C \times H \times W}$ denote the output feature map from the previous AFF module. The DS block first applies spatial filtering to each channel independently through depth-wise convolution:

   $$\hat{X}_c = K_c \ast X_c, \qquad c = 1, \dots, C, \tag{1}$$

   where $K_c$ is the kernel for the $c$-th channel and $\ast$ denotes convolution. The resulting feature map $\hat{X}$ is then processed by point-wise convolution to fuse channel-wise information:

   $$X_{\mathrm{ds}} = W_{\mathrm{pw}} \ast \hat{X}, \tag{2}$$

   where $W_{\mathrm{pw}}$ is the 1 × 1 point-wise kernel.

   To clarify the motivation for using depth-wise separable convolution in the AFF module, Table 1 compares it with standard convolution in terms of structure, computational cost, and functional advantages. As shown in Table 1, DS convolution separates spatial filtering from channel mixing, which substantially reduces both the parameter count and the computational cost. This design makes the AFF module more lightweight while preserving essential feature representations. By learning spatial and channel-wise information separately, DS convolution also offers a more flexible feature-extraction mechanism.

   Standard convolution may also yield less accurate results than DS convolution in some scenarios, despite its larger parameter space. In medical image analysis, training datasets are often small compared with natural-image benchmarks. A standard $K \times K$ convolution with $C_{\mathrm{in}}$ input and $C_{\mathrm{out}}$ output channels learns spatial and channel-wise features jointly with $K^2 C_{\mathrm{in}} C_{\mathrm{out}}$ parameters, which substantially increases the risk of overfitting when training data is scarce [36]. In contrast, DS convolution factorizes this operation into separate spatial and channel-wise stages with only $K^2 C_{\mathrm{in}} + C_{\mathrm{in}} C_{\mathrm{out}}$ parameters, effectively acting as a structural regularizer that constrains the hypothesis space. This factorization has been shown to improve generalization on small-scale datasets [37]. Furthermore, prior studies on lightweight architectures have demonstrated that models employing DS convolution can match or even surpass the accuracy of their standard-convolution counterparts when the training set is limited [38]. In breast ultrasound analysis, where annotated data are inherently scarce, this regularization effect is particularly beneficial: it helps the AFF module learn more robust and transferable feature representations with less overfitting to training-specific noise.
2. SE module: The AFF module incorporates the Squeeze-and-Excitation (SE) module [39], which adaptively reweights channels by modeling global inter-channel dependencies. The SE module consists of a squeeze operation, which produces a global descriptor of each feature channel via global average pooling, and an excitation operation, which learns a non-linear mapping through fully connected layers and outputs a weight for each channel. These weights amplify or suppress specific channel features, thereby improving the accuracy of feature fusion.

   The output of the DS block, denoted $X_{\mathrm{ds}} \in \mathbb{R}^{C \times H \times W}$, is then refined by an SE block to emphasize informative channels and suppress less relevant ones. First, global average pooling over the $H \times W$ spatial positions aggregates context into a channel descriptor $z \in \mathbb{R}^{C}$:

   $$z_c = \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{\mathrm{ds},c}(i, j). \tag{3}$$

   Then, two fully connected layers with ReLU and sigmoid activations generate channel-wise attention weights:

   $$s = \sigma\big(W_2\, \delta(W_1 z)\big), \tag{4}$$

   where $\delta$ and $\sigma$ denote the ReLU and sigmoid functions, $W_1 \in \mathbb{R}^{(C/r) \times C}$, $W_2 \in \mathbb{R}^{C \times (C/r)}$, and $r$ is the reduction ratio. The recalibrated feature map is obtained by reweighting each channel:

   $$\tilde{X}_c = s_c \cdot X_{\mathrm{ds},c}. \tag{5}$$
3. Feature concatenation: The feature map produced by the depth-wise separable convolution branch is merged with the corresponding feature map on the main pathway. This combines features from different levels and enriches the model's feature representation. Let $F_{\mathrm{m}}$ denote the main-branch feature from the downsampling path. It is concatenated along the channel dimension with the recalibrated side-branch feature $\tilde{X}$:

   $$X_{\mathrm{cat}} = \mathrm{Concat}\big(F_{\mathrm{m}}, \tilde{X}\big). \tag{6}$$

   To fuse the concatenated features, a convolutional layer followed by batch normalization and ReLU is applied:

   $$F_{\mathrm{fuse}} = \mathrm{ReLU}\big(\mathrm{BN}(\mathrm{Conv}(X_{\mathrm{cat}}))\big). \tag{7}$$

   An additional SE block is applied to $F_{\mathrm{fuse}}$ to further enhance channel-wise discriminability, and the final output is produced by a nonlinear mapping:

   $$Y = \phi\big(\mathrm{Conv}(\mathrm{SE}(F_{\mathrm{fuse}}))\big), \tag{8}$$

   where $\phi$ denotes the nonlinear activation.

   In the actual implementation, the AFF side branch uses a depth-wise convolution followed by a point-wise (1 × 1) convolution. The fusion operations in Equations (7) and (8) use kernels. In addition, the AFF branch applies dropout with a rate of 0.2, and each embedded SE unit uses two fully connected layers with a reduction ratio of 16. Equivalently, for an AFF feature with channel dimension $C$, the two fully connected layers in each SE unit map the channel descriptor from $C$ to $C/16$ and then back from $C/16$ to $C$.
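The SE recalibration used inside AFF (squeeze by global average pooling, excitation through a C → C/r → C bottleneck with ReLU and sigmoid, then channel reweighting) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`global_avg_pool`, `matvec`, `se_block`, `random_fc`) are hypothetical, bias terms are omitted, and feature maps are nested lists rather than framework tensors. Only the reduction ratio of 16 comes from the text.

```python
import math
import random
from typing import List, Tuple

Channel = List[List[float]]   # one H x W feature map
Feature = List[Channel]       # C channels
Matrix = List[List[float]]    # fully connected weights, shape (out, in)


def global_avg_pool(x: Feature) -> List[float]:
    """Squeeze: average each H x W channel to a single scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Bias-free fully connected layer."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in m]


def se_block(x: Feature, w1: Matrix, w2: Matrix) -> Tuple[Feature, List[float]]:
    """Excitation (C -> C/r with ReLU, C/r -> C with sigmoid), then reweighting."""
    z = global_avg_pool(x)
    hidden = [max(0.0, a) for a in matvec(w1, z)]                 # ReLU
    s = [1.0 / (1.0 + math.exp(-a)) for a in matvec(w2, hidden)]  # sigmoid
    recalibrated = [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, x)]
    return recalibrated, s


def random_fc(n_out: int, n_in: int, rng: random.Random) -> Matrix:
    """Hypothetical random initialization, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


if __name__ == "__main__":
    rng = random.Random(0)
    C, H, W, r = 32, 4, 4, 16   # reduction ratio 16 -> hidden width C // r = 2
    x = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
    w1, w2 = random_fc(C // r, C, rng), random_fc(C, C // r, rng)
    y, s = se_block(x, w1, w2)
    print(len(w1), len(s))      # 2 32
```

With a reduction ratio of 16, a 32-channel feature passes through a 2-unit bottleneck, which is why SE adds only about $2C^2/r$ weights per unit.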
3.3. Cross-Level Attention Enhancement Module (CLAE Module)
1. CBAM (Convolutional Block Attention Module): The first component of the CLAE module is CBAM, which applies attention in both the channel and spatial domains of the feature map. In the channel attention step, CBAM captures global context with global average pooling and global max pooling, and then generates a channel attention map with a shared two-layer fully connected network. This lets the model emphasize the feature channels most important for the current task. In the spatial attention step, CBAM highlights important regions of the channel-refined feature map, using a small convolutional kernel to produce a spatial attention map that focuses the model on key positions.

   For the channel attention step, average pooling and max pooling over the spatial dimensions of the input feature $U$ generate two channel descriptors:

   $$U^{c}_{\mathrm{avg}} = \mathrm{AvgPool}(U), \qquad U^{c}_{\max} = \mathrm{MaxPool}(U). \tag{9}$$

   These are passed through a shared multilayer perceptron (MLP) composed of two fully connected layers, summed, and activated by a sigmoid:

   $$M_c(U) = \sigma\big(\mathrm{MLP}(U^{c}_{\mathrm{avg}}) + \mathrm{MLP}(U^{c}_{\max})\big). \tag{10}$$

   The channel-refined feature is computed as:

   $$U' = M_c(U) \otimes U, \tag{11}$$

   where $\otimes$ denotes element-wise multiplication, broadcast over spatial positions. For spatial attention, CBAM applies average pooling and max pooling along the channel axis, concatenates the two maps, and passes them through a convolutional layer $f$:

   $$M_s(U') = \sigma\big(f([\mathrm{AvgPool}(U');\, \mathrm{MaxPool}(U')])\big). \tag{12}$$

   The final CBAM output is obtained by:

   $$U'' = M_s(U') \otimes U'. \tag{13}$$
2. AG (Attention Gate): As the second component of the CLAE module, AG gates the feature maps on the upsampling path. AG computes attention coefficients from the feature map of the previous layer and the feature map in the skip connection, and uses them to dynamically reweight each feature. This mechanism allows the model to suppress irrelevant regions while emphasizing target regions. Let $x$ be the output of CBAM and $g$ the upsampled feature. These are projected by convolutional layers, summed, and mapped to attention coefficients:

   $$\alpha = \sigma_2\Big(\psi^{\top} \sigma_1\big(W_x^{\top} x + W_g^{\top} g\big)\Big), \tag{14}$$

   where $W_x \in \mathbb{R}^{C_x \times F}$ and $W_g \in \mathbb{R}^{C_g \times F}$ are learnable convolutional weights, $C_x$ and $C_g$ are the channel numbers of $x$ and $g$, $F$ denotes the intermediate channel number, and $\psi \in \mathbb{R}^{F \times 1}$ projects the intermediate features to a single coefficient map. $\sigma_1$ and $\sigma_2$ denote ReLU and Sigmoid activations, respectively. The gated feature is obtained by:

   $$\hat{x} = \alpha \cdot x. \tag{15}$$

   In implementation, CBAM channel attention is generated by a shared two-layer MLP with a reduction ratio of 16, which maps the channel descriptor from $C$ to $C/16$, applies a ReLU activation, and then projects it back to $C$. The same MLP is shared by the average-pooled and max-pooled channel descriptors. Its spatial attention branch uses a convolution. In the attention gate, the projections $W_x$, $W_g$, and $\psi$ are implemented with convolutions, matching the practical decoder-side realization of Equation (14).

   As a decoder-side refinement block, the CLAE module bridges high-level semantic and low-level spatial features. Its dual-attention design and feature-fusion pathway enhance detail preservation and foreground focus, especially in complex medical image regions.
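The two CLAE components can be illustrated with a minimal plain-Python sketch. It covers only the CBAM channel branch, with one MLP shared by the average- and max-pooled descriptors, and the attention gate at a single pixel. It assumes the gate's projections are 1 × 1 (as in the original Attention U-Net), so each pixel can be treated independently, and it omits bias terms. The function names are hypothetical. This is a sketch, not the authors' code.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # shape (out, in)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Bias-free linear map."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in m]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def cbam_channel_attention(feat: List[Vector], w1: Matrix, w2: Matrix) -> Vector:
    """feat holds C channels, each flattened to H*W values.
    The SAME two-layer MLP (w1: C -> C/r, w2: C/r -> C) is applied to the
    average-pooled and max-pooled descriptors; the two outputs are summed
    and passed through a sigmoid, giving one weight per channel."""
    avg = [sum(ch) / len(ch) for ch in feat]
    mx = [max(ch) for ch in feat]

    def mlp(d: Vector) -> Vector:
        return matvec(w2, relu(matvec(w1, d)))

    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def attention_gate_pixel(x: Vector, g: Vector, w_x: Matrix, w_g: Matrix,
                         psi: Vector) -> Tuple[float, Vector]:
    """Attention gate at one pixel, assuming 1x1 projections.
    w_x: (F, C_x), w_g: (F, C_g), psi: length F.
    alpha = sigmoid(psi . relu(w_x x + w_g g)); the gated feature is alpha * x."""
    q = [a + b for a, b in zip(matvec(w_x, x), matvec(w_g, g))]
    alpha = sigmoid(sum(p * h for p, h in zip(psi, relu(q))))
    return alpha, [alpha * xi for xi in x]


if __name__ == "__main__":
    feat = [[0.1, 0.9, 0.4, 0.2], [0.5, 0.5, 0.5, 0.5]]  # C = 2, H*W = 4
    weights = cbam_channel_attention(feat, w1=[[1.0, -1.0]], w2=[[1.0], [0.5]])
    alpha, gated = attention_gate_pixel(
        x=[1.0, 2.0], g=[1.0],
        w_x=[[1.0, 0.0], [0.0, 1.0]], w_g=[[1.0], [-1.0]], psi=[1.0, 1.0])
    print([round(w, 4) for w in weights], round(alpha, 4), [round(v, 4) for v in gated])
```

Sharing one MLP across both pooled descriptors keeps the channel branch's parameter count the same as a single SE bottleneck. The gate produces one scalar per pixel, so it scales all channels of $x$ at that location together.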
3.4. Conditional Segmentation Boosting (CSB Module)
3.5. Adaptive Multi-Task Learning Loss (AMTL Loss)
3.5.1. Classification Loss Function
3.5.2. Segmentation Loss Function
3.5.3. Overall Loss Function
3.5.4. Summary of the Loss Function
4. Results
4.1. Dataset
4.2. Image Preprocessing
4.3. Experimental Platform
4.4. Parameter Settings
4.5. Evaluation Metrics
4.5.1. Evaluation Metrics for Classification
4.5.2. Evaluation Metrics for Segmentation
4.6. Comparative Experiments
4.6.1. Performance Analysis for Classification Task
4.6.2. Performance Analysis for Segmentation Task
4.6.3. Failure Case Analysis
- (a) Classification false positive (FP). A normal sample is incorrectly classified as nodule-containing. This type of error typically arises from ambiguous tissue textures that visually resemble nodule echogenicity. Because CSB gates the segmentation output with the image-level classification decision, a false-positive prediction (predicted label: nodule) leaves the segmentation output fully intact. Whether any spurious activation follows depends on the segmentation sub-network's own behavior, not on the CSB gate.
- (b) Classification false negative (FN). A nodule-containing sample is incorrectly classified as normal. The annotated lesion region has weak discriminability and shares substantial texture similarity with normal tissue, without a prominent or typical nodule-like appearance. As a result, the classification branch fails to assign sufficient lesion confidence at the image level.
- (c) Segmentation failure with IOU = 0. Classification is correct, but the predicted segmentation region is completely displaced from the ground-truth lesion. The predicted contour latches onto a high-contrast non-lesion structure, likely because of ambiguous boundary cues in images containing multiple hyperechoic regions.
- (d) Segmentation failure with IOU ≈ 0. Classification is correct, but the predicted region is substantially misaligned with the ground-truth annotation. The model captures a partial shape near the lesion but fails to delineate the true lesion boundary, likely because of the large size and irregular morphology of the target lesion.
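The behavior in cases (a) and (b) depends on how CSB couples the two heads. The sketch below assumes CSB acts as a hard binary gate that multiplies the predicted mask by the image-level decision. The function name `csb_gate` and the 0.5 threshold are hypothetical, and suppressing the mask on a "normal" prediction is an assumption rather than a detail stated here. Under that assumption, the sketch reproduces case (a) and shows why case (b) is the riskier direction.

```python
from typing import List

Mask = List[List[int]]  # binary segmentation mask, 1 = lesion pixel


def csb_gate(seg_mask: Mask, p_nodule: float, threshold: float = 0.5) -> Mask:
    """Hypothetical CSB-style gate (assumption): keep the segmentation mask
    when the classifier predicts 'nodule', otherwise return all background."""
    y_hat = 1 if p_nodule >= threshold else 0
    return [[y_hat * v for v in row] for row in seg_mask]


if __name__ == "__main__":
    mask = [[0, 1, 1], [0, 1, 0]]
    # Case (a), classification FP on a normal image: the mask passes through
    # unchanged, so any spurious pixels come from the segmentation branch itself.
    print(csb_gate(mask, p_nodule=0.8))
    # Case (b), classification FN on a nodule image: under this assumed gate,
    # even a correct lesion mask is erased, so IOU drops to 0 for that image.
    print(csb_gate(mask, p_nodule=0.3))
```

With a hard gate of this kind, a classification FN removes the lesion prediction entirely, while a classification FP can only leave an existing segmentation error in place.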
4.7. Summary of Comparative Experiments
4.8. Analysis of Ablation Experiment Results
4.8.1. Impact of Individual Modules
4.8.2. Impact of Two-Module Combinations
4.8.3. Impact of Three-Module Combinations
4.8.4. Impact of Full Module Integration
4.8.5. Comparison with Single-Task Counterparts
4.9. Overall Summary
5. Discussion
5.1. Overall Performance Summary
5.2. Component Interaction and Why the Method Works
5.3. Task-Specific Interpretation and Dataset Characteristics
5.4. Failure Case Behavior and CSB Risk Asymmetry
5.5. Limitations
5.6. Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| BUS | Breast ultrasound |
| BUSI | Breast Ultrasound Image Dataset |
| MTL | Multi-task learning |
| AFF | Adaptive Feature Fusion |
| CLAE | Cross-Level Attention Enhancement |
| CSB | Conditional Segmentation Boosting |
| AMTL | Adaptive Multi-Task Loss |
| GT | Ground truth |
| IOU | Intersection over Union |
| DC | Dice coefficient |
References
1. Abhisheka, B.; Biswas, S.K.; Purkayastha, B. A comprehensive review on breast cancer detection, classification and segmentation using deep learning. Arch. Comput. Methods Eng. 2023, 30, 5023–5052.
2. Lei, S.; Zheng, R.; Zhang, S.; Wang, S.; Chen, R.; Sun, K.; Zeng, H.; Zhou, J.; Wei, W. Global patterns of breast cancer incidence and mortality: A population-based cancer registry data analysis from 2000 to 2020. Cancer Commun. 2021, 41, 1183–1194.
3. Nassif, A.B.; Talib, M.A.; Nasir, Q.; Afadar, Y.; Elgendy, O. Breast cancer detection using artificial intelligence techniques: A systematic literature review. Artif. Intell. Med. 2022, 127, 102276.
4. Ilesanmi, A.E.; Chaumrattanakul, U.; Makhanov, S.S. A method for segmentation of tumors in breast ultrasound images using the variant enhanced deep learning. Biocybern. Biomed. Eng. 2021, 41, 802–818.
5. Shia, W.C.; Lin, L.S.; Chen, D.R. Classification of malignant tumours in breast ultrasound using unsupervised machine learning approaches. Sci. Rep. 2021, 11, 1418.
6. Zhou, Y.; Chen, H.; Li, Y.; Liu, Q.; Xu, X.; Wang, S.; Yap, P.T.; Shen, D. Multi-task learning for segmentation and classification of tumors in 3D automated breast ultrasound images. Med. Image Anal. 2021, 70, 101918.
7. Zhang, G.; Zhao, K.; Hong, Y.; Qiu, X.; Zhang, K.; Wei, B. SHA-MTL: Soft and hard attention multi-task learning for automated breast cancer ultrasound image segmentation and classification. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 1719–1725.
8. Chowdary, J.; Yogarajah, P.; Chaurasia, P.; Guruviah, V. A multi-task learning framework for automated segmentation and classification of breast tumors from ultrasound images. Ultrason. Imaging 2022, 44, 3–12.
9. Xu, M.; Huang, K.; Qi, X. Multi-task learning with context-oriented self-attention for breast ultrasound image classification and segmentation. In Proceedings of the 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI); IEEE: Piscataway, NJ, USA, 2022; pp. 1–5.
10. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Munich, Germany, 2018; pp. 3–19.
11. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999.
12. Kendall, A.; Gal, Y.; Cipolla, R. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 7482–7491.
13. Chen, Z.; Badrinarayanan, V.; Lee, C.Y.; Rabinovich, A. GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In Proceedings of the International Conference on Machine Learning; PMLR: Stockholm, Sweden, 2018; pp. 794–803.
14. Tanaka, H.; Chiu, S.W.; Watanabe, T.; Kaoku, S.; Yamaguchi, T. Computer-aided diagnosis system for breast ultrasound images using deep learning. Phys. Med. Biol. 2019, 64, 235013.
15. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
16. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2016; pp. 770–778.
17. Moon, W.K.; Lee, Y.W.; Ke, H.H.; Lee, S.H.; Huang, C.S.; Chang, R.F. Computer-aided diagnosis of breast ultrasound images using ensemble learning from convolutional neural networks. Comput. Methods Programs Biomed. 2020, 190, 105361.
18. Huang, Q.; Zhang, F.; Li, X. Machine learning in ultrasound computer-aided diagnostic systems: A survey. BioMed. Res. Int. 2018, 2018, 4243761.
19. Yang, S.; Gao, X.; Liu, L.; Shu, R.; Yan, J.; Zhang, G.; Xiao, Y.; Ju, Y.; Zhao, N.; Song, H. Performance and reading time of automated breast US with or without computer-aided detection. Radiology 2019, 292, 540–549.
20. Wang, K.; Liang, S.; Zhong, S.; Feng, Q.; Ning, Z.; Zhang, Y. Breast ultrasound image segmentation: A coarse-to-fine fusion convolutional neural network. Med. Phys. 2021, 48, 4262–4278.
21. Singh, V.K.; Rashwan, H.A.; Abdel-Nasser, M.; Sarker, M.M.K.; Akram, F.; Pandey, N.; Romani, S.; Puig, D. An efficient solution for breast tumor segmentation and classification in ultrasound images using deep adversarial learning. arXiv 2019, arXiv:1907.00887.
22. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18; Springer: Munich, Germany, 2015; pp. 234–241.
23. Li, C.; Tan, Y.; Chen, W.; Luo, X.; He, Y.; Gao, Y.; Li, F. ANU-Net: Attention-based Nested U-Net to exploit full resolution features for medical image segmentation. Comput. Graph. 2020, 90, 11–20.
24. Ibtehaz, N.; Rahman, M.S. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020, 121, 74–87.
25. Hatamizadeh, A.; Tang, Y.; Nath, V.; Yang, D.; Myronenko, A.; Landman, B.; Roth, H.R.; Xu, D. UNETR: Transformers for 3D medical image segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; IEEE: Piscataway, NJ, USA, 2022; pp. 574–584.
26. He, X.; Zhou, Y.; Zhao, J.; Zhang, D.; Yao, R.; Xue, Y. Swin transformer embedding UNet for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5623215.
27. Dai, S.; Liu, X.; Wei, W.; Yin, X.; Qiao, L.; Wang, J.; Zhang, Y.; Hou, Y. A multi-scale, multi-task fusion UNet model for accurate breast tumor segmentation. Comput. Methods Programs Biomed. 2025, 258, 108484.
28. Aumente-Maestro, C.; Díez, J.; Remeseiro, B. A multi-task framework for breast cancer segmentation and classification in ultrasound imaging. Comput. Methods Programs Biomed. 2025, 260, 108540.
29. He, Q.; Yang, Q.; Su, H.; Wang, Y. Multi-task learning for segmentation and classification of breast tumors from ultrasound images. Comput. Biol. Med. 2024, 173, 108319.
30. Yang, K.; Dong, X.; Tang, F.; Ye, F.; Chen, B.; Liang, S.; Zhang, Y.; Xu, Y. A transformer-based multi-task deep learning model for simultaneous T-stage identification and segmentation of nasopharyngeal carcinoma. Front. Oncol. 2024, 14, 1377366.
31. Xu, J.; Shi, L.; Li, S.; Zhang, Y.; Zhao, G.; Shi, Y.; Li, J.; Gao, Y. Pointformer: Keypoint-guided transformer for simultaneous nuclei segmentation and classification in multi-tissue histology images. IEEE Trans. Image Process. 2025, 34, 1489–1504.
32. Nath, A.; Shukla, S.; Gupta, P. MTMedFormer: Multi-task vision transformer for medical imaging with federated learning. Med. Biol. Eng. Comput. 2025, 63, 3421–3434.
33. Tagnamas, J.; Ramadan, H.; Yahyaouy, A.; Tairi, H. Multi-task approach based on combined CNN-transformer for efficient segmentation and classification of breast tumors in ultrasound images. Vis. Comput. Ind. Biomed. Art. 2024, 7, 2.
34. Chennupati, S.; Sistu, G.; Yogamani, S.; A Rawashdeh, S. MultiNet++: Multi-stream feature aggregation and geometric loss strategy for multi-task learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops; IEEE: Piscataway, NJ, USA, 2019; pp. 233–241.
35. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2017; pp. 2980–2988.
36. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2017; pp. 1251–1258.
37. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
38. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 4510–4520.
39. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 7132–7141.
40. Al-Dhabyani, W.; Gomaa, M.; Khaled, H.; Fahmy, A. Dataset of breast ultrasound images. Data Brief 2020, 28, 104863.
41. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
42. Zhu, C.; Byrd, R.H.; Lu, P.; Nocedal, J. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Softw. (TOMS) 1997, 23, 550–560.
43. Ruan, J.; Xiang, S.; Xie, M.; Liu, T.; Fu, Y.M. A Multi-Attention and Light-weight UNet for Skin Lesion Segmentation. arXiv 2022, arXiv:2211.01784.
44. Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS J. Photogramm. Remote Sens. 2020, 162, 94–114.

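Table 1. Comparison of standard convolution and depth-wise separable (DS) convolution ($K$: kernel size; $C_{\mathrm{in}}$, $C_{\mathrm{out}}$: input and output channels; bias terms ignored).

| Aspect | Standard Convolution | DS Convolution |
|---|---|---|
| Operation | Full $K \times K$ kernel over all input channels | Depth-wise $K \times K$ per channel + point-wise $1 \times 1$ |
| Parameters | $K^2 C_{\mathrm{in}} C_{\mathrm{out}}$ | $K^2 C_{\mathrm{in}} + C_{\mathrm{in}} C_{\mathrm{out}}$ |
| Computation | High | Low |
| Efficiency | Large, slow | Small, fast |
| Feature learning | Spatial + channel jointly | Spatial + channel separately |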
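The Parameters row of Table 1 follows from simple counting. The sketch below computes parameter and multiply-accumulate (MAC) counts for both layer types, ignoring biases. The layer shape (3 × 3 kernel, 64 → 128 channels, 56 × 56 output) is a hypothetical example, not an N-Unet layer, and the MAC formula assumes a stride-1 convolution with "same" padding, so every weight is used once per output pixel. The function names are illustrative only.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """One K x K x C_in filter per output channel (bias ignored)."""
    return k * k * c_in * c_out


def ds_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depth-wise stage (one K x K filter per input channel) + 1x1 point-wise stage."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def macs(n_weights: int, h: int, w: int) -> int:
    """Stride-1, 'same' padding: every weight is applied once per output pixel."""
    return n_weights * h * w


if __name__ == "__main__":
    k, c_in, c_out, h, w = 3, 64, 128, 56, 56  # hypothetical layer shape
    std = standard_conv_params(k, c_in, c_out)
    ds = ds_conv_params(k, c_in, c_out)
    print(f"standard: {std} params, {macs(std, h, w)} MACs")
    print(f"DS:       {ds} params, {macs(ds, h, w)} MACs")
    # The DS/standard cost ratio equals 1/C_out + 1/K^2.
    print(f"ratio {ds / std:.4f} vs 1/C_out + 1/K^2 = {1 / c_out + 1 / k**2:.4f}")
```

For this shape, DS uses 8,768 weights instead of 73,728, about 11.9% of the cost. Because the ratio is $1/C_{\mathrm{out}} + 1/K^2$, the savings grow with wider layers and larger kernels.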

| Model | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|
| Unet [22] | 0.8658 ± 0.0124 | 0.9556 ± 0.0085 | 0.9085 ± 0.0102 | 0.8344 ± 0.0156 |
| M-Unet [43] | 0.9762 ± 0.0054 | 0.9462 ± 0.0091 | 0.9609 ± 0.0068 | 0.9363 ± 0.0074 |
| ResUnet [44] | 0.9197 ± 0.0112 | 0.9767 ± 0.0063 | 0.9474 ± 0.0089 | 0.9108 ± 0.0105 |
| AGUnet [11] | 0.9618 ± 0.0072 | 0.9767 ± 0.0051 | 0.9692 ± 0.0061 | 0.9490 ± 0.0065 |
| ANUnet [23] | 0.9764 ± 0.0048 | 0.9538 ± 0.0077 | 0.9650 ± 0.0059 | 0.9427 ± 0.0062 |
| BTS-Unet [28] | 0.9694 ± 0.0081 | 0.9500 ± 0.0088 | 0.9596 ± 0.0083 | 0.9333 ± 0.0091 |
| UNETR [25] | 0.9078 ± 0.0153 | 0.9922 ± 0.0034 | 0.9481 ± 0.0094 | 0.9067 ± 0.0128 |
| SwinUnet [26] | 0.8909 ± 0.0141 | 0.9899 ± 0.0042 | 0.9378 ± 0.0098 | 0.8917 ± 0.0113 |
| MTF-Unet [27] | 0.9343 ± 0.0096 | 0.9846 ± 0.0055 | 0.9588 ± 0.0071 | 0.9299 ± 0.0084 |
| ACSNet [29] | 0.9500 ± 0.0087 | 0.9596 ± 0.0079 | 0.9548 ± 0.0082 | 0.9250 ± 0.0088 |
| N-Unet (Ours) | 0.9843 ± 0.0041 | 0.9615 ± 0.0064 | 0.9728 ± 0.0050 | 0.9654 ± 0.0049 |

| Model | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|
| Unet [22] | 0.7879 ± 0.0185 | 0.6500 ± 0.0285 | 0.7123 ± 0.0212 | 0.7900 ± 0.0194 |
| M-Unet [43] | 0.7931 ± 0.0164 | 0.6216 ± 0.0242 | 0.6970 ± 0.0188 | 0.7917 ± 0.0156 |
| ResUnet [44] | 0.8125 ± 0.0152 | 0.7027 ± 0.0198 | 0.7536 ± 0.0164 | 0.8229 ± 0.0142 |
| AGUnet [11] | 0.8333 ± 0.0148 | 0.6944 ± 0.0215 | 0.7576 ± 0.0159 | 0.8333 ± 0.0135 |
| ANUnet [23] | 0.9697 ± 0.0072 | 0.8421 ± 0.0142 | 0.9014 ± 0.0098 | 0.9300 ± 0.0084 |
| BTS-Unet [28] | 0.9688 ± 0.0078 | 0.8158 ± 0.0156 | 0.8857 ± 0.0112 | 0.9200 ± 0.0091 |
| UNETR [25] | 0.9074 ± 0.0124 | 0.9899 ± 0.0038 | 0.9469 ± 0.0074 | 0.9083 ± 0.0108 |
| SwinUnet [26] | 0.9074 ± 0.0119 | 0.9800 ± 0.0042 | 0.9423 ± 0.0082 | 0.9000 ± 0.0115 |
| MTF-Unet [27] | 0.9630 ± 0.0086 | 0.7027 ± 0.0224 | 0.8125 ± 0.0148 | 0.8800 ± 0.0122 |
| ACSNet [29] | 1.0000 ± 0.0000 | 0.8286 ± 0.0135 | 0.9062 ± 0.0104 | 0.9400 ± 0.0081 |
| N-Unet (Ours) | 0.9667 ± 0.0082 | 0.9062 ± 0.0114 | 0.9355 ± 0.0089 | 0.9583 ± 0.0048 |
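The classification tables report precision, recall, F1-score and accuracy. The sketch below computes these from confusion-matrix counts, treating the nodule class as positive (an assumption). The function names and the counts in the usage line are hypothetical. As a sanity check, it recomputes F1 as the harmonic mean of the mean precision (0.9843) and recall (0.9615) in the N-Unet row of the first classification table. Because the tables report means over repeated runs, F1 computed from the mean values need not equal the mean F1 exactly, but here the two agree to four decimals.

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float, float, float]:
    """Precision, recall, F1 and accuracy for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1_score(precision, recall), accuracy


if __name__ == "__main__":
    print(round(f1_score(0.9843, 0.9615), 4))  # 0.9728, matching the tabulated F1
    print(classification_metrics(tp=45, fp=1, fn=2, tn=12))  # hypothetical counts
```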

| Model | SE | PC | DC | IOU | Acc |
|---|---|---|---|---|---|
| Unet [22] | 0.6137 ± 0.0452 | 0.5937 ± 0.0418 | 0.5421 ± 0.0487 | 0.4893 ± 0.0496 | 0.9261 ± 0.0125 |
| M-Unet [43] | 0.7842 ± 0.0215 | 0.7631 ± 0.0234 | 0.7510 ± 0.0248 | 0.6827 ± 0.0312 | 0.9572 ± 0.0084 |
| ResUnet [44] | 0.6645 ± 0.0385 | 0.6358 ± 0.0354 | 0.6234 ± 0.0398 | 0.5569 ± 0.0415 | 0.9423 ± 0.0102 |
| AGUnet [11] | 0.6472 ± 0.0364 | 0.6521 ± 0.0321 | 0.6064 ± 0.0372 | 0.5227 ± 0.0394 | 0.9482 ± 0.0095 |
| ANUnet [23] | 0.7853 ± 0.0198 | 0.8152 ± 0.0165 | 0.7854 ± 0.0184 | 0.7226 ± 0.0215 | 0.9617 ± 0.0072 |
| BTS-Unet [28] | 0.7153 ± 0.0284 | 0.6763 ± 0.0292 | 0.6767 ± 0.0275 | 0.6042 ± 0.0324 | 0.9441 ± 0.0098 |
| UNETR [25] | 0.7032 ± 0.0256 | 0.6427 ± 0.0315 | 0.6998 ± 0.0242 | 0.6185 ± 0.0284 | 0.9554 ± 0.0089 |
| SwinUnet [26] | 0.7473 ± 0.0224 | 0.7265 ± 0.0218 | 0.7208 ± 0.0235 | 0.6531 ± 0.0267 | 0.9703 ± 0.0065 |
| MTF-Unet [27] | 0.7895 ± 0.0182 | 0.8119 ± 0.0154 | 0.7838 ± 0.0176 | 0.7166 ± 0.0208 | 0.9736 ± 0.0051 |
| ACSNet [29] | 0.7937 ± 0.0175 | 0.8004 ± 0.0168 | 0.7846 ± 0.0182 | 0.7201 ± 0.0195 | 0.9674 ± 0.0068 |
| N-Unet (Ours) | 0.8114 ± 0.0125 | 0.8400 ± 0.0112 | 0.8070 ± 0.0134 | 0.7404 ± 0.0158 | 0.9675 ± 0.0062 |

| Model | SE | PC | DC | IOU | Acc |
|---|---|---|---|---|---|
| Unet [22] | 0.6705 ± 0.0342 | 0.6401 ± 0.0318 | 0.6417 ± 0.0365 | 0.6065 ± 0.0384 | 0.9784 ± 0.0058 |
| M-Unet [43] | 0.6895 ± 0.0315 | 0.6923 ± 0.0284 | 0.6716 ± 0.0302 | 0.6332 ± 0.0335 | 0.9727 ± 0.0064 |
| ResUnet [44] | 0.7950 ± 0.0248 | 0.8126 ± 0.0215 | 0.7849 ± 0.0234 | 0.7505 ± 0.0268 | 0.9852 ± 0.0042 |
| AGUnet [11] | 0.8199 ± 0.0205 | 0.8177 ± 0.0198 | 0.8143 ± 0.0182 | 0.7879 ± 0.0215 | 0.9865 ± 0.0039 |
| ANUnet [23] | 0.7848 ± 0.0264 | 0.7591 ± 0.0287 | 0.7654 ± 0.0251 | 0.7372 ± 0.0294 | 0.9781 ± 0.0052 |
| BTS-Unet [28] | 0.7897 ± 0.0252 | 0.7974 ± 0.0224 | 0.7712 ± 0.0241 | 0.7391 ± 0.0275 | 0.9868 ± 0.0038 |
| UNETR [25] | 0.7532 ± 0.0298 | 0.7398 ± 0.0312 | 0.7283 ± 0.0289 | 0.6985 ± 0.0324 | 0.9766 ± 0.0061 |
| SwinUnet [26] | 0.7602 ± 0.0275 | 0.7803 ± 0.0241 | 0.7564 ± 0.0262 | 0.6879 ± 0.0308 | 0.9692 ± 0.0075 |
| MTF-Unet [27] | 0.8582 ± 0.0164 | 0.8702 ± 0.0142 | 0.8547 ± 0.0158 | 0.8243 ± 0.0184 | 0.9851 ± 0.0045 |
| ACSNet [29] | 0.9239 ± 0.0115 | 0.9081 ± 0.0124 | 0.9115 ± 0.0108 | 0.8857 ± 0.0142 | 0.9913 ± 0.0028 |
| N-Unet (Ours) | 0.9329 ± 0.0098 | 0.9215 ± 0.0105 | 0.9216 ± 0.0092 | 0.8974 ± 0.0124 | 0.9917 ± 0.0025 |
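The segmentation columns can be computed from pixel-level counts on a binary mask pair. The sketch below reads SE as sensitivity, TP/(TP+FN), and PC as precision, TP/(TP+FP). Both readings are assumptions, since the abbreviation list does not define them. The function name is hypothetical, and conventions for empty masks vary; here 1.0 is assumed.

```python
from typing import Dict, List

Mask = List[List[int]]  # binary mask, 1 = lesion pixel


def segmentation_metrics(pred: Mask, gt: Mask) -> Dict[str, float]:
    """Pixel-level SE, PC, DC, IOU and accuracy for one image."""
    tp = fp = fn = tn = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            if p and g:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
            else:
                tn += 1
    total = tp + fp + fn + tn
    # Assumed convention: when a denominator is empty (e.g. both masks are
    # empty), the corresponding score is reported as 1.0.
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    se = tp / (tp + fn) if tp + fn else 1.0
    pc = tp / (tp + fp) if tp + fp else 1.0
    return {"SE": se, "PC": pc, "DC": dice, "IOU": iou, "Acc": (tp + tn) / total}


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    gt = [[1, 0, 0], [0, 1, 1]]
    print(segmentation_metrics(pred, gt))  # TP=2, FP=1, FN=1, TN=2
```

For a single image, DC = 2·IOU/(1 + IOU). When scores are averaged over images, the mean DC can fall below this map evaluated at the mean IOU, because the map is concave.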

| Model | Params (M) | FLOPs (G) |
|---|---|---|
| N-Unet (Ours) | 8.95 | 14.74 |
| M-Unet [43] | 18.50 | 16.40 |
| SwinUnet [26] | 86.09 | 18.22 |
| ANUnet [23] | 24.60 | 21.70 |
| Unet [22] | 29.38 | 25.69 |
| AGUnet [11] | 31.20 | 27.40 |
| ACSNet [29] | 42.99 | 30.96 |
| ResUnet [44] | 40.23 | 29.88 |
| MTF-Unet [27] | 27.31 | 109.12 |
| UNETR [25] | 113.12 | 113.36 |
| BTS-Unet [28] | 25.02 | 142.87 |

| Model Name | AMTL | CSB | AFF | CLAE | Classification Acc | Segmentation Pixel Acc | Segmentation IOU | Segmentation DC |
|---|---|---|---|---|---|---|---|---|
| Unet | | | | | 0.8217 ± 0.0165 | 0.9261 ± 0.0125 | 0.4893 ± 0.0496 | 0.5421 ± 0.0487 |
| AMTLUnet | ✓ | | | | 0.9167 ± 0.0112 | 0.9173 ± 0.0134 | 0.5252 ± 0.0415 | 0.6056 ± 0.0392 |
| CSBUnet | | ✓ | | | 0.8185 ± 0.0172 | 0.8943 ± 0.0158 | 0.4799 ± 0.0512 | 0.5374 ± 0.0504 |
| AFFUnet | | | ✓ | | 0.8723 ± 0.0138 | 0.9356 ± 0.0118 | 0.5301 ± 0.0428 | 0.5927 ± 0.0385 |
| CLAEUnet | | | | ✓ | 0.8896 ± 0.0125 | 0.9332 ± 0.0121 | 0.5439 ± 0.0406 | 0.6088 ± 0.0374 |
| A_CUnet | ✓ | ✓ | | | 0.9154 ± 0.0104 | 0.9568 ± 0.0092 | 0.6712 ± 0.0284 | 0.7318 ± 0.0245 |
| A_AFFUnet | ✓ | | ✓ | | 0.8923 ± 0.0118 | 0.9384 ± 0.0112 | 0.5705 ± 0.0356 | 0.6358 ± 0.0312 |
| A_CLAEUnet | ✓ | | | ✓ | 0.9102 ± 0.0109 | 0.9396 ± 0.0108 | 0.5859 ± 0.0342 | 0.6432 ± 0.0305 |
| CSB_AFFUnet | | ✓ | ✓ | | 0.8859 ± 0.0122 | 0.9484 ± 0.0105 | 0.6635 ± 0.0298 | 0.7398 ± 0.0264 |
| CSB_CLAEUnet | | ✓ | | ✓ | 0.8832 ± 0.0128 | 0.9403 ± 0.0114 | 0.6459 ± 0.0315 | 0.7214 ± 0.0278 |
| AFF_CLAEUnet | | | ✓ | ✓ | 0.8904 ± 0.0121 | 0.9317 ± 0.0119 | 0.6011 ± 0.0334 | 0.6650 ± 0.0298 |
| A_CSB_AFFUnet | ✓ | ✓ | ✓ | | 0.9497 ± 0.0075 | 0.9612 ± 0.0084 | 0.7254 ± 0.0215 | 0.7798 ± 0.0182 |
| A_CSB_CLAEUnet | ✓ | ✓ | | ✓ | 0.9436 ± 0.0078 | 0.9581 ± 0.0088 | 0.7230 ± 0.0221 | 0.7939 ± 0.0175 |
| A_AFF_CLAEUnet | ✓ | | ✓ | ✓ | 0.9273 ± 0.0092 | 0.9378 ± 0.0104 | 0.6602 ± 0.0284 | 0.7101 ± 0.0256 |
| CSB_AFF_CLAEUnet | | ✓ | ✓ | ✓ | 0.9218 ± 0.0098 | 0.9505 ± 0.0091 | 0.6909 ± 0.0242 | 0.7583 ± 0.0218 |
| N-Unet (Ours) | ✓ | ✓ | ✓ | ✓ | 0.9654 ± 0.0049 | 0.9675 ± 0.0062 | 0.7404 ± 0.0158 | 0.8070 ± 0.0134 |

| Model | Precision | Recall | F1-Score | Accuracy | Pixel Acc | IOU | DC |
|---|---|---|---|---|---|---|---|
| N-Unet-Cls | 0.9589 ± 0.0084 | 0.9642 ± 0.0076 | 0.9615 ± 0.0079 | 0.9407 ± 0.0091 | - | - | - |
| N-Unet-Seg | - | - | - | - | 0.9643 ± 0.0082 | 0.7198 ± 0.0215 | 0.7847 ± 0.0194 |
| N-Unet (Ours) | 0.9843 ± 0.0041 | 0.9615 ± 0.0064 | 0.9728 ± 0.0050 | 0.9654 ± 0.0049 | 0.9675 ± 0.0062 | 0.7404 ± 0.0158 | 0.8070 ± 0.0134 |
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yang, Y.; Zhu, Z. N-Unet: An Efficient Multi-Task Model for Precise Classification and Segmentation of Breast Ultrasound Images. J. Imaging 2026, 12, 194. https://doi.org/10.3390/jimaging12050194
Chicago/Turabian Style: Yang, Yafeng, and Zhengwei Zhu. 2026. "N-Unet: An Efficient Multi-Task Model for Precise Classification and Segmentation of Breast Ultrasound Images" Journal of Imaging 12, no. 5: 194. https://doi.org/10.3390/jimaging12050194
