MV-UNet: MambaVision U-Net for Breast Cancer Ultrasound Image Segmentation
Abstract
1. Introduction
- Proposes MV-UNet, a new hybrid segmentation paradigm built on Mamba. The key contribution is the end-to-end integration of two advanced architectures, an improved MambaVision and UNetMamba, into a tightly coupled encoder–decoder Mamba model. This design balances segmentation accuracy against model efficiency on public datasets.
- Exploiting the linear computational complexity of Mamba, the model uses only 14.7% of the parameters of the advanced baseline EMGANet (48.73 M vs. 331.43 M) and runs inference about 3.2 times faster (32.1 vs. 9.9 FPS).
- A plug-and-play Local Supervision Module (LSM) is introduced to improve boundary segmentation accuracy without increasing inference cost.
- Evaluations on the public BUSI_WHU and BUSI datasets show that the model balances segmentation accuracy and efficiency. Furthermore, experiments with five distinct random seeds on BUSI_WHU provide preliminary evidence of the model's robustness. This work offers a feasible design for medical image segmentation models that aim for both high accuracy and high efficiency.
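
The parameter and speed figures in the contributions above can be checked against the efficiency comparison table later in this article. The short sketch below does that arithmetic. It assumes that the "existing advanced model" used as the reference is EMGANet, because its table row reproduces both numbers; the function name `relative_cost` and the dictionary keys are illustrative.

```python
# Values copied from the efficiency comparison table of this article.
# Assumption: EMGANet is the reference model behind the 14.7% and 3.2x claims.
EMGANET = {"params_m": 331.43, "fps": 9.9}
MV_UNET = {"params_m": 48.73, "fps": 32.1}


def relative_cost(model, baseline):
    """Return (parameter share in percent, speed-up factor) of model vs. baseline."""
    param_share = 100.0 * model["params_m"] / baseline["params_m"]
    speedup = model["fps"] / baseline["fps"]
    return param_share, speedup


param_share, speedup = relative_cost(MV_UNET, EMGANET)
print(f"parameter share: {param_share:.1f}%")  # 14.7%
print(f"speed-up: {speedup:.1f}x")             # 3.2x
```

Both ratios depend on the measurement setup (hardware, input size, batch size), so they describe this comparison only.
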
2. Related Work
3. Methods
3.1. Overall Architecture
3.2. Improved MambaVision Encoder
3.2.1. Convolutional Block (Conv Block)
3.2.2. SE-MambaVision Stage (SES)
- The input is first normalized by a normalization layer, yielding an intermediate normalized feature.
- The normalized feature is then processed by the Mixer (the SE-Mixer), yielding the Mixer output.
- The output of the Mixer is combined with the original input via a residual connection, yielding an intermediate feature.
- This intermediate feature subsequently passes through a second normalization layer, followed by the MLP.
- The output of the MLP is finally combined with the intermediate feature through another residual connection to produce the final output of this layer.
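
The five steps above follow the familiar pre-norm residual pattern, which can be sketched in a few lines of plain Python. This is a minimal illustration of the data flow only: the normalization is assumed to be a layer normalization without learned parameters, features are flat lists of floats, and `toy_mixer` and `toy_mlp` are hypothetical stand-ins rather than the SE-Mixer or MLP of the paper.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance (assumed norm type)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def ses_layer(x, mixer, mlp):
    """One layer: x -> x + mixer(norm(x)) -> hidden + mlp(norm(hidden))."""
    normed = layer_norm(x)                           # step 1: normalize the input
    mixed = mixer(normed)                            # step 2: token mixing
    hidden = [a + b for a, b in zip(x, mixed)]       # step 3: first residual connection
    mlp_out = mlp(layer_norm(hidden))                # step 4: second norm, then MLP
    return [a + b for a, b in zip(hidden, mlp_out)]  # step 5: second residual connection


# Hypothetical placeholder sub-modules, used only to make the sketch runnable.
def toy_mixer(v):
    return [0.5 * t for t in v]


def toy_mlp(v):
    return [max(0.0, t) for t in v]


out = ses_layer([1.0, 2.0, 3.0, 4.0], toy_mixer, toy_mlp)
print(out)
```

Because both sub-modules sit on residual branches, a layer whose Mixer and MLP output zeros returns its input unchanged, which is a convenient sanity check when wiring up such a block.
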
3.2.3. Architectural Adaptation Modifications
3.3. Mamba Segmentation Decoder (MSD)
3.3.1. VSS Block
3.3.2. Architectural Adaptation Modification
3.4. Local Supervision Module (LSM)
3.5. Loss Function
4. Experimental Results and Analysis
4.1. Dataset
4.2. Evaluation Metrics
4.3. Implementation Details
4.4. Qualitative Analysis
4.5. Quantitative Analysis
5. Conclusions and Limitations
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CNN | Convolutional Neural Network |
| mIoU | mean Intersection over Union |
| MSD | Mamba Segmentation Decoder |
| LSM | Local Supervision Module |
| VSS | Visual State Space |
| SES | SE-MambaVision Stage |
| SE-Mixer | Single Expansion Mamba Mixer |
| ASSD | Average Symmetric Surface Distance |
| MLP | Multilayer Perceptron |
| MHSA | Multi-Head Self-Attention |
| S6 | Selective Scan State Space Model |
References
1. Liu, M.; Hu, L.; Tang, Y.; Wang, C.; He, Y.; Zeng, C. A deep learning method for breast cancer classification in the pathology images. IEEE J. Biomed. Health Inform. 2022, 26, 5025–5032.
2. Xue, C.; Zhu, L.; Fu, H.; Hu, X.; Li, X.; Zhang, H.; Heng, P. Global guidance network for breast lesion segmentation in ultrasound images. Med. Image Anal. 2021, 70, 101989.
3. Huang, R.; Lin, M.; Dou, H.; Lin, Z.; Ying, Q.; Jia, X.; Xu, W.; Mei, Z.; Yang, X.; Dong, Y.; et al. Boundary-rendering network for breast lesion segmentation in ultrasound images. Med. Image Anal. 2022, 80, 102478.
4. He, Q.; Yang, Q.; Xie, M. HCTNet: A hybrid CNN-transformer network for breast ultrasound image segmentation. Comput. Biol. Med. 2023, 155, 106629.
5. Sun, S.; Fu, C.; Xu, S.; Wen, Y.; Ma, T. GLFNet: Global-local fusion network for the segmentation in ultrasound images. Comput. Biol. Med. 2024, 171, 108103.
6. Chen, G.; Li, L.; Dai, Y.; Zhang, J.; Yap, M.H. AAU-net: An adaptive attention U-net for breast lesions segmentation in ultrasound images. IEEE Trans. Med. Imaging 2022, 42, 1289–1300.
7. Iqbal, A.; Sharif, M. MDA-Net: Multiscale dual attention-based network for breast lesion segmentation using ultrasound images. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 7283–7299.
8. Ruan, J.; Xie, M.; Gao, J.; Liu, T.; Fu, Y. Ege-unet: An efficient group enhanced unet for skin lesion segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer Nature: Cham, Switzerland, 2023; pp. 481–490.
9. Chen, J.; Mei, J.; Li, X.; Lu, Y.; Yu, Q.; Wei, Q.; Luo, X.; Xie, Y.; Adeli, E.; Wang, Y.; et al. TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Med. Image Anal. 2024, 97, 103280.
10. Qu, X.; Zhou, J.; Jiang, J.; Wang, W.; Wang, H.; Wang, S.; Tang, W.; Lin, X. EH-former: Regional easy-hard-aware transformer for breast lesion segmentation in ultrasound images. Inf. Fusion 2024, 109, 102430.
11. Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752.
12. Ma, J.; Li, F.; Wang, B. U-mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv 2024, arXiv:2401.04722.
13. Ruan, J.; Li, J.; Xiang, S. Vm-unet: Vision mamba unet for medical image segmentation. ACM Trans. Multimed. Comput. Commun. Appl. 2024.
14. Wang, Z.; Zheng, J.Q.; Zhang, Y.; Cui, G.; Li, L. Mamba-unet: Unet-like pure visual mamba for medical image segmentation. arXiv 2024, arXiv:2402.05079.
15. Xing, Z.; Ye, T.; Yang, Y.; Liu, G.; Zhu, L. Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation. arXiv 2024, arXiv:2401.13560.
16. Xie, B.; Yan, Y.; Agam, G. MM-UNet: Meta Mamba UNet for Medical Image Segmentation. arXiv 2025, arXiv:2503.17540.
17. Chen, Y.; Liu, Z.; He, X. MambaVesselNet: A hybrid CNN-Mamba architecture for 3D cerebrovascular segmentation. In Proceedings of the 6th ACM International Conference on Multimedia in Asia; Association for Computing Machinery: New York, NY, USA, 2024; pp. 1–7.
18. Hatamizadeh, A.; Kautz, J. Mambavision: A hybrid mamba-transformer vision backbone. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 10–17 June 2025; pp. 25261–25270.
19. Zhu, E.; Chen, Z.; Wang, D.; Shi, H.; Liu, X.; Wang, L. Unetmamba: An efficient unet-like mamba for semantic segmentation of high-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2024, 22, 6001205.
20. Huang, J.; Mao, Y.; Deng, J.; Ye, Z.; Zhang, Y.; Zhang, J. Emganet: Edge-aware multi-scale group-mix attention network for breast cancer ultrasound image segmentation. IEEE J. Biomed. Health Inform. 2025, 29, 5631–5641.
21. Al-Dhabyani, W.; Gomaa, M.; Khaled, H.; Fahmy, A. Dataset of breast ultrasound images. Data Brief 2020, 28, 104863.
22. Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Jiao, J.; Liu, Y. Vmamba: Visual state space model. Adv. Neural Inf. Process. Syst. 2024, 37, 103031–103063.
23. Liu, J.; Yang, H.; Zhou, H.Y.; Xi, Y.; Yu, L.; Li, C.; Liang, Y.; Shi, G.; Yu, Y.; Zhang, H.; et al. Swin-umamba: Mamba-based unet with imagenet-based pretraining. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer Nature: Cham, Switzerland, 2024; pp. 615–625.
24. Wang, X.; Zhang, T.; Shao, Y.; Zhang, Y. Crop Disease Detection Based on Unetmamba with Hierarchical Feature Fusion. In Proceedings of the 2025 IEEE 8th International Conference on Signal Processing and Machine Learning (SPML); IEEE: Piscataway, NJ, USA, 2025; pp. 124–129.
25. Pang, H.; Wu, Y.; Qi, S.; Li, C.; Shen, J.; Yue, Y.; Qian, W.; Wu, J. A fully automatic segmentation pipeline of pulmonary lobes before and after lobectomy from computed tomography images. Comput. Biol. Med. 2022, 147, 105792.
26. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
27. Doc, Y.Z.; Doc, S.W. DualA-Net: A generalizable and adaptive network with dual-branch encoder for medical image segmentation. Comput. Methods Programs Biomed. 2024, 243, 107877.
28. Tang, F.; Ding, J.; Wang, L.; Xian, M.; Ning, C. Multi-level global context cross consistency model for semi-supervised ultrasound image segmentation with diffusion model. arXiv 2023, arXiv:2305.09447.
29. Li, Z.; Zheng, Y.; Shan, D.; Yang, S.; Li, Q.; Wang, B. Scribformer: Transformer makes cnn work better for scribble-based medical image segmentation. IEEE Trans. Med. Imaging 2024, 43, 2254–2265.
30. Carrilero-Mardones, M.; Parras-Jurado, M.; Nogales, A.; Perez-Martin, J.; Diez, F.J. Deep learning for describing breast ultrasound images with BI-RADS terms. J. Imaging Inform. Med. 2024, 37, 2940–2954.
31. Cao, Z.; Yang, G.; Chen, Q.; Chen, X.; Lv, F. Breast tumor classification through learning from noisy labeled ultrasound images. Med. Phys. 2020, 47, 1048–1057.

| Augmentation Operation | Parameters / Probability | Notes and Implementation Details |
|---|---|---|
| Random Horizontal Flip | Probability: 50% | The image is flipped left-right along the vertical axis. |
| Random Vertical Flip | Probability: 50% | The image is flipped up-down along the horizontal axis. |
| Random Rotation | Probability: 75%; Angles: {90°, 180°, 270°} | Randomly selects one of the three fixed angles for rotation. |
| Random Gaussian Blur | Probability: 50%; Blur radius (σ): sampled uniformly from [0, 1] | This intensity transformation is applied only to the input image; the corresponding ground truth segmentation mask remains unchanged. Blur is implemented using a Gaussian kernel. |
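
The augmentation table above can be read as a small paired-transform pipeline: geometric operations are applied identically to image and mask, while the blur touches the image only. The sketch below illustrates that logic on single-channel images stored as nested lists. It is not the authors' implementation; the function names, the 3σ kernel truncation, the border replication, and the reading of the "blur radius" as the Gaussian standard deviation are all assumptions.

```python
import math
import random


def hflip(a):
    """Flip left-right."""
    return [row[::-1] for row in a]


def vflip(a):
    """Flip up-down."""
    return a[::-1]


def rot90(a, k):
    """Rotate a 2D list counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        a = [list(col) for col in zip(*a)][::-1]
    return a


def _blur_rows(a, kernel, radius):
    """Convolve every row with a 1D kernel, replicating border pixels."""
    width = len(a[0])
    return [[sum(k * row[min(max(x + j - radius, 0), width - 1)]
                 for j, k in enumerate(kernel))
             for x in range(width)] for row in a]


def gaussian_blur(img, sigma):
    """Separable Gaussian blur; sigma <= 0 returns an unblurred copy."""
    if sigma <= 0:
        return [row[:] for row in img]
    radius = max(1, math.ceil(3 * sigma))  # assumed truncation at 3 sigma
    kernel = [math.exp(-(i * i) / (2 * sigma * sigma))
              for i in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]
    horizontal = _blur_rows(img, kernel, radius)
    transposed = [list(c) for c in zip(*horizontal)]
    vertical = _blur_rows(transposed, kernel, radius)
    return [list(c) for c in zip(*vertical)]


def augment(image, mask, rng):
    """Apply the tabulated augmentations to one image/mask pair."""
    if rng.random() < 0.5:                 # horizontal flip, p = 50%
        image, mask = hflip(image), hflip(mask)
    if rng.random() < 0.5:                 # vertical flip, p = 50%
        image, mask = vflip(image), vflip(mask)
    if rng.random() < 0.75:                # rotation, p = 75%
        k = rng.choice([1, 2, 3])          # 90, 180 or 270 degrees
        image, mask = rot90(image, k), rot90(mask, k)
    if rng.random() < 0.5:                 # blur, p = 50%, image only
        image = gaussian_blur(image, rng.uniform(0.0, 1.0))
    return image, mask


rng = random.Random(123)
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
mask = [[1 if r < 2 else 0 for _ in range(4)] for r in range(4)]
aug_image, aug_mask = augment(image, mask, rng)
```

Keeping the blur off the mask preserves its binary labels, which is why only the geometric transforms are shared between the two inputs.
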

| Parameter | Setting/Value |
|---|---|
| Batch Size | 4 |
| Epochs | 1000 |
| Random Seed | 123 |
| Optimizer | AdamW |
| Weight Decay | 0.001 |
| Pretrained Weights | Not used |
| Learning-Rate Schedule | Cosine Annealing (no warm-up) |
| Thresholding Strategy | Argmax |
| Preprocessing Normalization | Mean = [0.485, 0.456, 0.406]; Standard Deviation = [0.229, 0.224, 0.225] |
| Learning Rate for MV-UNet, UNetMamba, and Swin-UMamba (via grid search) | 6 × 10⁻⁴ |
| Learning Rate for EMGANet and other models | As per the original publication |
| Model Selection Criterion (MV-UNet, UNetMamba, and Swin-UMamba) | Highest mIoU on the validation set |
| Model Selection Criterion (EMGANet and other models) | As per the original publication, the highest F1-score on the validation set |
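
To make the "Cosine Annealing (no warm-up)" entry concrete, the sketch below evaluates the standard cosine annealing formula with the tabulated initial learning rate (6 × 10⁻⁴) and epoch count (1000). The minimum learning rate of 0 and the per-epoch update are assumptions, since the table does not state them; `cosine_annealing_lr` is an illustrative name.

```python
import math


def cosine_annealing_lr(epoch, total_epochs=1000, base_lr=6e-4, min_lr=0.0):
    """Learning rate at a given epoch for cosine annealing without warm-up.

    min_lr = 0.0 is an assumption; the source table does not give it.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 250, 500, 750, 1000):
    print(epoch, f"{cosine_annealing_lr(epoch):.2e}")
```

Under these assumptions the rate starts at the full 6 × 10⁻⁴, reaches half of it at epoch 500, and decays to zero at epoch 1000.
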

| Component | Specification/Version |
|---|---|
| Operating System | Linux Ubuntu 22.04 |
| CUDA Version | 11.8 |
| Python Version | 3.10.19 |
| Deep Learning Framework | PyTorch 2.0.1 + cu11 |
| CPU | Intel Xeon Silver 4210R |
| GPU | NVIDIA RTX A5000 (24 GB GDDR6 VRAM, 8192 CUDA Cores) |

| Methods | Kappa (%) | Precision (%) | Recall (%) | mIoU (%) | ASSD (pixel) | OA (%) |
|---|---|---|---|---|---|---|
| U-Net (2015) | 88.33 | 92.75 | 85.80 | 89.39 | 5.69 | 98.48 |
| SegNet (2017) [26] | 89.18 | 92.97 | 87.10 | 90.10 | 6.37 | 98.59 |
| TransUNet (2021) [9] | 86.83 | 92.20 | 83.69 | 88.17 | 11.14 | 98.30 |
| MDA-Net (2022) [7] | 82.23 | 89.91 | 77.81 | 84.59 | 5.29 | 97.76 |
| DualA-Net (2023) [27] | 88.55 | 92.06 | 86.81 | 89.58 | 6.35 | 98.50 |
| EGEUNet (2023) [8] | 87.86 | 92.17 | 85.49 | 89.01 | 6.73 | 98.42 |
| MGCC (2023) [28] | 89.22 | 92.74 | 87.38 | 90.14 | 5.39 | 98.59 |
| EH-Former (2024) [10] | 89.41 | 92.95 | 87.51 | 90.29 | 5.53 | 98.61 |
| ScribFormer (2024) [29] | 89.19 | 91.11 | 88.84 | 90.11 | 5.90 | 98.56 |
| EMGANet (2025) [20] | 89.45 | 88.57 | 91.95 | 90.32 | 6.10 | 98.56 |
| UNetMamba (2025) [19] | 88.62 | 89.46 | 90.47 | 90.16 | 4.59 | 98.77 |
| Swin-UMamba (2024) [23] | 88.89 | 90.35 | 90.14 | 90.25 | 4.91 | 98.71 |
| MV-UNet (Ours) | 89.19 (89.34 ± 0.80) | 90.10 (90.06 ± 0.44) | 90.85 (91.11 ± 0.79) | 90.51 (90.57 ± 0.61) | 4.59 (4.73 ± 0.39) | 98.87 (98.89 ± 0.12) |
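
Most columns of the table above can be derived from a pixel-level confusion matrix. The sketch below uses the textbook definitions for binary (lesion vs. background) segmentation. It is an illustration under assumptions: precision and recall are taken for the lesion class, mIoU is taken as the mean of lesion and background IoU, and the counts in the example are made up. ASSD is omitted because it needs boundary distances rather than pixel counts.

```python
def binary_metrics(tp, fp, fn, tn):
    """Kappa, precision, recall, mIoU and OA from binary confusion counts."""
    total = tp + fp + fn + tn
    oa = (tp + tn) / total                      # overall accuracy
    precision = tp / (tp + fp)                  # lesion class (assumption)
    recall = tp / (tp + fn)                     # lesion class (assumption)
    iou_lesion = tp / (tp + fp + fn)
    iou_background = tn / (tn + fp + fn)
    miou = 0.5 * (iou_lesion + iou_background)  # two-class mean (assumption)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (oa - chance) / (1.0 - chance)
    return {"kappa": kappa, "precision": precision, "recall": recall,
            "miou": miou, "oa": oa}


# Made-up counts for a 1000-pixel image, for illustration only.
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

With a small lesion and a large background, OA stays high even when the lesion IoU is modest, which is one reason Kappa and mIoU are reported alongside it.
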

| Methods | Model Size (MB) | Total Parameters (M) | FPS (frames/s) | CPU Memory Usage (MB) | GPU Memory Usage (MB) | GFLOPs | Latency (ms) |
|---|---|---|---|---|---|---|---|
| EMGANet (2025) [20] | 1264.32 | 331.43 | 9.9 | 122.53 | 2784.27 | 68.08 | 101.01 |
| UNetMamba (2025) [19] | 98.38 | 25.79 | 49.96 | 373.21 | 941.29 | 8.73 | 20.02 |
| Swin-UMamba (2024) [23] | 239.7 | 59.89 | 32.23 | 243.24 | 733.10 | 87.87 | 31.03 |
| Ours-part1_5 | 141.57 | 37.11 | 73.80 | 301 | 942.85 | 32.00 | 13.55 |
| Ours-part1_5_dim | 197.7 | 49.47 | 24.53 | 338.48 | 1271.36 | 42.70 | 40.77 |
| Ours-part2 | 162.89 | 42.70 | 44.63 | 345.76 | 447.33 | 29.28 | 22.41 |
| Ours-part3 | 181.3 | 45.25 | 32.1 | 290.88 | 648.16 | 28.81 | 31.15 |
| Ours-part4 | 188.15 | 49.32 | 31.9 | 296.94 | 767.67 | 30.70 | 31.35 |
| Ours-part5 | 142.74 | 37.42 | 30.2 | 335.68 | 628.77 | 23.12 | 33.11 |
| MV-UNet (Ours) | 185.90 | 48.73 | 32.1 | 295.97 | 765.42 | 30.16 | 31.15 |
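
Two columns of the efficiency table above are closely related to others, which the following sketch illustrates with the MV-UNet row. It assumes that FPS is the reciprocal of the per-image latency and that the model size roughly equals the parameter count stored as 32-bit floats and expressed in MiB; neither relation is stated in the source, and the function names are illustrative.

```python
def fps_from_latency(latency_ms):
    """Frames per second implied by a per-image latency in milliseconds."""
    return 1000.0 / latency_ms


def float32_size_mib(params_millions):
    """Approximate weight size in MiB if every parameter is a 4-byte float (assumption)."""
    return params_millions * 1e6 * 4 / 2 ** 20


# MV-UNet row: 31.15 ms latency, 48.73 M parameters.
print(round(fps_from_latency(31.15), 1))   # 32.1
print(round(float32_size_mib(48.73), 2))   # 185.89
```

The estimates agree with the tabulated 32.1 FPS and 185.90 MB to within rounding. Not every row matches the size estimate (Swin-UMamba, with 59.89 M parameters, is listed at 239.7 MB against an estimate of about 228 MiB), so the size column may include more than float32 weights.
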

| Methods | Kappa (%) | Precision (%) | Recall (%) | mIoU (%) | ASSD (pixel) | OA (%) |
|---|---|---|---|---|---|---|
| U-Net (2015) | 74.40 | 80.83 | 72.81 | 78.88 | 20.58 | 95.96 |
| SegNet (2017) [26] | 73.67 | 80.41 | 71.93 | 78.38 | 24.07 | 95.86 |
| TransUNet (2021) [9] | 72.58 | 78.02 | 72.16 | 77.64 | 33.07 | 95.62 |
| MDA-Net (2022) [7] | 68.71 | 73.67 | 69.39 | 75.11 | 19.81 | 94.96 |
| DualA-Net (2023) [27] | 65.32 | 65.60 | 71.93 | 72.91 | 35.12 | 94.02 |
| EGEUNet (2023) [8] | 72.92 | 78.54 | 72.27 | 77.87 | 24.07 | 95.69 |
| MGCC (2023) [28] | 74.35 | 78.28 | 75.06 | 78.83 | 23.89 | 95.84 |
| EH-Former (2024) [10] | 73.99 | 79.46 | 73.30 | 78.59 | 24.49 | 95.85 |
| ScribFormer (2024) [29] | 68.71 | 69.84 | 73.50 | 75.06 | 26.67 | 94.71 |
| EMGANet (2025) [20] | 78.03 | 76.92 | 83.58 | 81.37 | 18.27 | 96.23 |
| UNetMamba (2025) [19] | 67.76 | 65.13 | 63.95 | 75.58 | 21.53 | 96.67 |
| MV-UNet (Ours) | 72.79 | 66.09 | 65.35 | 76.23 | 14.94 | 96.63 |

| Methods | Kappa (%) | Precision (%) | Recall (%) | mIoU (%) | ASSD (pixel) | OA (%) |
|---|---|---|---|---|---|---|
| U-Net (2015) (Baseline) | 88.33 | 92.75 | 85.80 | 89.39 | 5.69 | 98.48 |
| Ours-part1_4 (U-Net encoder + 4 layers of MSD and LSM) | 88.27 | 89.98 | 89.25 | 89.94 | 4.99 | 98.76 |
| Ours-part1_5 (U-Net encoder + 5 layers of MSD and LSM) | 88.73 | 89.36 | 90.74 | 90.20 | 5.12 | 98.76 |
| Ours-part1_5_dim (change dim 64 → 74) | 88.52 | 88.74 | 89.93 | 90.36 | 5.24 | 98.78 |
| Ours-part2 (improved MambaVision encoder + U-Net decoder) | 85.74 | 87.63 | 87.51 | 88.45 | 6.33 | 98.58 |
| Ours-part3 (improved MambaVision encoder + MSD (without LSM)) | 87.52 | 88.56 | 89.53 | 89.37 | 5.49 | 98.74 |
| Ours-part4 (Complete model with MambaVision Mixer) | 89.20 | 90.08 | 90.82 | 90.46 | 4.61 | 98.88 |
| Ours-part5 (Complete model, change dim 64 → 56) | 88.81 | 89.74 | 90.77 | 90.32 | 4.83 | 98.81 |
| Ours (Complete model) | 89.19 | 90.10 | 90.85 | 90.51 | 4.59 | 98.87 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Lin, J.; Cao, C.; Wu, X.; Liu, J.; Liu, L.; Yao, B.; Zheng, J. MV-UNet: MambaVision U-Net for Breast Cancer Ultrasound Image Segmentation. Electronics 2026, 15, 2274. https://doi.org/10.3390/electronics15112274

