RPFeaNet: Rethinking Deep Progressive Prompt-Guided Feature Interaction Fusion Network for Medical Ultrasound Image Segmentation
Abstract
1. Introduction
- The progressive prompt generation module effectively enriches multi-level spatial prior information for low-contrast ultrasound images from a progressive prompt-driven perspective.
- The high-level prompt-guided feature interaction fusion module achieves progressive prompt interaction fusion via a visual Mamba network and stage-wise conditioning injection.
- Extensive experiments on six ultrasound image benchmark datasets demonstrate that the proposed RPFeaNet outperforms comparative state-of-the-art methods.
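The exact injection mechanism is defined in Section 3.3; purely as an illustration of what "stage-wise conditioning injection" can mean, the sketch below uses a common FiLM-style form in which a prompt embedding is projected to a per-channel scale and shift that modulate a stage's feature map. The shapes, projection matrices, and the FiLM choice itself are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def condition_stage(features, prompt, w_gamma, w_beta):
    """FiLM-style conditioning: project the prompt embedding to a
    per-channel scale (gamma) and shift (beta) and modulate the features."""
    gamma = prompt @ w_gamma                      # (C,)
    beta = prompt @ w_beta                        # (C,)
    # features: (C, H, W); broadcast the per-channel modulation spatially
    return features * (1.0 + gamma[:, None, None]) + beta[:, None, None]

C, H, W, D = 8, 16, 16, 4                         # channels, height, width, prompt dim
features = rng.standard_normal((C, H, W))
prompt = rng.standard_normal(D)
w_gamma = rng.standard_normal((D, C)) * 0.1       # hypothetical learned projections
w_beta = rng.standard_normal((D, C)) * 0.1

out = condition_stage(features, prompt, w_gamma, w_beta)
print(out.shape)  # (8, 16, 16)
```

With a zero prompt the modulation reduces to the identity, so the injection can only refine, never discard, the stage features.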
2. Related Works
2.1. CNN/Transformer for Ultrasound Image Segmentation
2.2. Prompt-Guided Ultrasound Image Segmentation
3. The Proposed RPFeaNet
3.1. The Overview of RPFeaNet
- PPTB: Given input I, a frozen Transformer backbone outputs stable structural prompts.
- PPSB: Based on I, patch embeddings are synthesized into dense prompts.
- LFE: A lightweight encoder extracts semantic features X.
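The three branches above can be sketched as independent mappings from the same input image, under assumed toy shapes (a 64×64 single-channel image, 32-dim embeddings, 8×8 patches); the branch bodies here are placeholder projections standing in for the real pretrained/lightweight networks, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 64          # assumed toy image size
DIM, PATCH = 32, 8  # assumed embedding dim and patch size

def pptb(image):
    """Frozen-teacher stand-in: a fixed (non-trainable) projection of global
    image statistics into a small set of structural prompt tokens."""
    stats = np.array([image.mean(), image.std(), image.min(), image.max()])
    return np.outer(stats, np.ones(DIM))                     # (4, DIM) tokens

def ppsb(image):
    """Synthesis-branch stand-in: one dense prompt embedding per image patch."""
    patches = image.reshape(H // PATCH, PATCH, W // PATCH, PATCH).mean(axis=(1, 3))
    proj = rng.standard_normal((1, DIM))                     # hypothetical projection
    return patches.reshape(-1, 1) @ proj                     # (64, DIM) dense prompts

def lfe(image):
    """Lightweight-encoder stand-in: a downsampled semantic feature map."""
    return image.reshape(H // 4, 4, W // 4, 4).mean(axis=(1, 3))  # (16, 16)

image = rng.random((H, W))
prompts_low = pptb(image)   # PPTB: stable structural prompts
prompts_mid = ppsb(image)   # PPSB: dense patch-level prompts
features = lfe(image)       # LFE: semantic features X
print(prompts_low.shape, prompts_mid.shape, features.shape)
```

The point of the sketch is the data flow: all three branches consume the same image I, and their outputs are what HPGFIM later fuses.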
3.2. Progressive Prompt Generation Module (PPGM)
- Lightweight Feature Extractor (LFE): A lightweight pretrained encoder that produces semantic features, termed “High-Level Knowledge”.
- Prior Prompt Synthesis Branch (PPSB): An encoder–generator pipeline that synthesizes patch-level cues and dense embeddings, termed “Middle-Level Knowledge”.
- Pretrained Prompt Teacher Branch (PPTB): A frozen Transformer backbone that utilizes trainable prompt tokens to provide stable representation evolution and strong structural priors, termed “Low-Level Knowledge”.
3.2.1. Lightweight Feature Extractor (LFE)
3.2.2. Pretrained Prompt Teacher Branch (PPTB)
3.2.3. Prior Prompt Synthesis Branch (PPSB)
3.3. High-Level Prompt-Guided Feature Interaction Module (HPGFIM)
3.4. Dynamic Selective-Frequency Decoder (DSFD)
3.5. Training Objective
4. Experiment
4.1. Experiment Settings
4.2. Comparative Methods
- U-Net [45] enabled accurate pixel-wise segmentation by combining contextual information and precise localization via multi-scale skip connections.
- UNet++ [46] designed a method for medical image semantic and instance segmentation based on redesigned skip connections and multi-depth network ensemble.
- AttUNet [47] boosted medical image segmentation accuracy by suppressing irrelevant background features and highlighting salient target organ features.
- nnU-Net [48] integrated fixed parameters, interdependent rules and empirical decisions to realize automated preprocessing, network construction, training and post-processing.
- Swin-Unet [49] integrated Swin Transformer blocks and symmetric encoder-decoder structure to model global contextual and multi-scale hierarchical features.
- TransFuse [50] captured global dependencies and low-level spatial details via parallel CNN and Transformer branches.
- H2Former [51] integrated CNNs, multi-scale channel attention and Transformers to capture long-range dependencies, fuse multi-scale features and ensure computational efficiency for medical image segmentation.
- ScribFormer [52] enabled high-performance scribble-based medical image segmentation by capturing global shape information and fusing local and global features via a triple-branch structure comprising a CNN, Transformer and attention-guided class activation map.
- LGFFM [35] integrated a parallel bi-encoder, a frequency-domain mapping module and multi-domain fusion to capture local–global and frequency-domain features for ultrasound image segmentation, addressing the low resolution, noise and poor generalization of existing methods.
4.3. Result Analysis
4.3.1. The Results of CCAUI and CAMUS
4.3.2. The Results of DDTI and TN3K
4.3.3. The Results of HC-18 and JNU-IFM
4.3.4. Complexity Analysis
4.4. Ablation Studies
4.5. Limitations
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Xiao, X.; Zhang, J.; Shao, Y.; Liu, J.; Shi, K.; He, C.; Kong, D. Deep learning-based medical ultrasound image and video segmentation methods: Overview, frontiers, and challenges. Sensors 2025, 25, 2361.
- Hussain, S.I.; Toscano, E. Optimized deep learning for mammography: Augmentation and tailored architectures. Information 2025, 16, 359.
- Zhu, X.; Cheng, B.; Shen, Y.; Xia, B.; Yue, G. Boundary-Supplementary Network for Carotid Plaque Segmentation in Ultrasound Images. IEEE Signal Process. Lett. 2025, 32, 1765–1769.
- Huang, Q.; Tian, H.; Jia, L.; Li, Z.; Zhou, Z. A review of deep learning segmentation methods for carotid artery ultrasound images. Neurocomputing 2023, 545, 126298.
- Huang, Q.; Luo, Y.; Zhang, Q. Breast ultrasound image segmentation: A survey. Int. J. Comput. Assist. Radiol. Surg. 2017, 12, 493–507.
- Horng, M.H.; Yang, C.W.; Sun, Y.N.; Yang, T.H. DeepNerve: A new convolutional neural network for the localization and segmentation of the median nerve in ultrasound image sequences. Ultrasound Med. Biol. 2020, 46, 2439–2452.
- Li, H.; Xu, X.; Liu, Z.; Xia, Q.; Xia, M. Low-quality sensor data-based semi-supervised learning for medical image segmentation. Sensors 2024, 24, 7799.
- Ansari, M.Y.; Mangalote, I.A.C.; Meher, P.K.; Aboumarzouk, O.; Al-Ansari, A.; Halabi, O.; Dakua, S.P. Advancements in deep learning for B-mode ultrasound segmentation: A comprehensive review. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 2126–2149.
- Meiburger, K.M.; Acharya, U.R.; Molinari, F. Automated localization and segmentation techniques for B-mode ultrasound images: A review. Comput. Biol. Med. 2018, 92, 210–235.
- Ma, J.; Wu, F.; Jiang, T.; Zhao, Q.; Kong, D. Ultrasound image-based thyroid nodule automatic segmentation using convolutional neural networks. Int. J. Comput. Assist. Radiol. Surg. 2017, 12, 1895–1910.
- Jiang, T.; Xing, W.; Yu, M.; Ta, D. A hybrid enhanced attention transformer network for medical ultrasound image segmentation. Biomed. Signal Process. Control 2023, 86, 105329.
- Xiong, Y.; Shu, X.; Liu, Q.; Yuan, D. HCMNet: A Hybrid CNN-Mamba Network for Breast Ultrasound Segmentation for Consumer Assisted Diagnosis. IEEE Trans. Consum. Electron. 2025, 71, 8045–8054.
- Zhai, D.; Hu, B.; Gong, X.; Zou, H.; Luo, J. ASS-GAN: Asymmetric semi-supervised GAN for breast ultrasound image segmentation. Neurocomputing 2022, 493, 204–216.
- Chen, G.; Li, L.; Zhang, J.; Dai, Y. Rethinking the unpretentious U-net for medical ultrasound image segmentation. Pattern Recognit. 2023, 142, 109728.
- Zhu, C.; Chai, X.; Xiao, Y.; Liu, X.; Zhang, R.; Yang, Z.; Wang, Z. Swin-Net: A swin-transformer-based network combing with multi-scale features for segmentation of breast tumor ultrasound images. Diagnostics 2024, 14, 269.
- Mishra, D.; Chaudhury, S.; Sarkar, M.; Soin, A.S. Ultrasound image segmentation: A deeply supervised network with attention to boundaries. IEEE Trans. Biomed. Eng. 2018, 66, 1637–1648.
- Lyu, Y.; Xu, Y.; Jiang, X.; Liu, J.; Zhao, X.; Zhu, X. AMS-PAN: Breast ultrasound image segmentation model combining attention mechanism and multi-scale features. Biomed. Signal Process. Control 2023, 81, 104425.
- Ma, L.; Lin, X.; Liang, W.; Liu, W. EAMSHA-UNet: A breast cancer ultrasound image segmentation model based on edge-aware multi-scale hybrid attention. Biomed. Signal Process. Control 2026, 120, 110160.
- Chen, R.; Zheng, X.; Su, H.; Wu, K. DeNAS-ViT: Data Efficient NAS-Optimized Vision Transformer for Ultrasound Image Segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence; Association for the Advancement of Artificial Intelligence: Washington, DC, USA, 2026; Volume 40, pp. 3002–3010.
- Chen, H.; Cai, Y.; Wang, C.; Chen, L.; Zhang, B.; Han, H.; Guo, Y.; Ding, H.; Zhang, Q. Multi-Organ Foundation Model for Universal Ultrasound Image Segmentation With Task Prompt and Anatomical Prior. IEEE Trans. Med. Imaging 2025, 44, 1005–1018.
- Spiegler, P.; Koleilat, T.; Harirpoush, A.; Miller, C.S.; Rivaz, H.; Kersten-Oertel, M.; Xiao, Y. TextSAM-EUS: Text prompt learning for SAM to accurately segment pancreatic tumor in endoscopic ultrasound. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: New York, NY, USA, 2025; pp. 948–957.
- Gowda, S.N.; Clifton, D.A. CC-SAM: SAM with cross-feature attention and context for ultrasound image segmentation. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2024; pp. 108–124.
- Zhang, Y.; Xu, Q.; Li, Y.; He, X.; Zhang, Q.; Haque, M.; Qu, R.; Duan, W.; Chen, Z. FreqDINO: Frequency-Guided Adaptation for Generalized Boundary-Aware Ultrasound Image Segmentation. arXiv 2025, arXiv:2512.11335.
- Yin, D.; Zheng, Q.; Chen, L.; Hu, Y.; Wang, Q. APG-SAM: Automatic prompt generation for SAM-based breast lesion segmentation with boundary-aware optimization. Expert Syst. Appl. 2025, 276, 127048.
- Lin, X.; Xiang, Y.; Yu, L.; Yan, Z. Beyond adapting SAM: Towards end-to-end ultrasound image segmentation via auto prompting. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2024; pp. 24–34.
- Guo, A.; Fei, G.; Pasupuleti, H.; Wang, J. ClickSAM: Fine-tuning Segment Anything Model using click prompts for ultrasound image segmentation. In Proceedings of the Medical Imaging 2024: Ultrasonic Imaging and Tomography; SPIE: Bellingham, WA, USA, 2024; Volume 12932, pp. 240–244.
- Ryali, C.; Hu, Y.T.; Bolya, D.; Wei, C.; Fan, H.; Huang, P.Y.; Aggarwal, V.; Chowdhury, A.; Poursaeed, O.; Hoffman, J.; et al. Hiera: A hierarchical vision transformer without the bells-and-whistles. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; PMLR 202, pp. 29441–29454.
- Chen, J.; Mei, J.; Li, X.; Lu, Y.; Yu, Q.; Wei, Q.; Luo, X.; Xie, Y.; Adeli, E.; Wang, Y.; et al. TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Med. Image Anal. 2024, 97, 103280.
- He, Q.; Yang, Q.; Xie, M. HCTNet: A hybrid CNN-transformer network for breast ultrasound image segmentation. Comput. Biol. Med. 2023, 155, 106629.
- Zhang, H.; Lian, J.; Yi, Z.; Wu, R.; Lu, X.; Ma, P.; Ma, Y. HAU-Net: Hybrid CNN-transformer for breast ultrasound image segmentation. Biomed. Signal Process. Control 2024, 87, 105427.
- Dialameh, M.; Rajabzadeh, H.; Sadeghi-Goughari, M.; Sim, J.S.; Kwon, H.J. DualSwinUnet++: An enhanced Swin-Unet architecture with dual decoders for PTMC segmentation. Comput. Biol. Med. 2025, 196, 110716.
- Yang, H.; Yang, D. CSwin-PNet: A CNN-Swin Transformer combined pyramid network for breast lesion segmentation in ultrasound images. Expert Syst. Appl. 2023, 213, 119024.
- Tian, C.; Hu, Y.; Zhang, M.; Liao, X.; Lv, J.; Si, W. Self-prompt contextual learning with AxialMamba for multi-label segmentation in carotid ultrasound. Expert Syst. Appl. 2025, 274, 126749.
- Zhao, J.; Sun, L.; Fan, D.; Wang, K.; Si, H.; Fu, H.; Zhang, D. Uncertainty-driven edge prompt generation network for medical image segmentation. IEEE Trans. Med. Imaging 2025, 44, 3950–3961.
- Luo, X.; Wang, Y.; Ou-Yang, L. LGFFM: A localized and globalized frequency fusion model for ultrasound image segmentation. IEEE Trans. Med. Imaging 2025, 45, 515–527.
- Jia, M.; Tang, L.; Chen, B.C.; Cardie, C.; Belongie, S.; Hariharan, B.; Lim, S.N. Visual prompt tuning. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 709–727.
- Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Jiao, J.; Liu, Y. VMamba: Visual state space model. Adv. Neural Inf. Process. Syst. 2024, 37, 103031–103063.
- Zhai, X.; Mustafa, B.; Kolesnikov, A.; Beyer, L. Sigmoid loss for language image pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: New York, NY, USA, 2023; pp. 11975–11986.
- Gagan, J.; Shirsat, H.S.; Mathias, G.P.; Mallya, B.V.; Andrade, J.; Rajagopal, K.; Kumar, J.H. Automated segmentation of common carotid artery in ultrasound images. IEEE Access 2022, 10, 58419–58430.
- Leclerc, S.; Smistad, E.; Pedrosa, J.; Østvik, A.; Cervenansky, F.; Espinosa, F.; Espeland, T.; Berg, E.A.R.; Jodoin, P.M.; Grenier, T.; et al. Deep learning for segmentation using an open large-scale dataset in 2D echocardiography. IEEE Trans. Med. Imaging 2019, 38, 2198–2210.
- Pedraza, L.; Vargas, C.; Narváez, F.; Durán, O.; Muñoz, E.; Romero, E. An open access thyroid ultrasound image database. In Proceedings of the 10th International Symposium on Medical Information Processing and Analysis; SPIE: Bellingham, WA, USA, 2015; Volume 9287, pp. 188–193.
- van den Heuvel, T.L.; de Bruijn, D.; de Korte, C.L.; van Ginneken, B. Automated measurement of fetal head circumference using 2D ultrasound images. PLoS ONE 2018, 13, e0200412.
- Lu, Y.; Zhou, M.; Zhi, D.; Zhou, M.; Jiang, X.; Qiu, R.; Ou, Z.; Wang, H.; Qiu, D.; Zhong, M.; et al. The JNU-IFM dataset for segmenting pubic symphysis-fetal head. Data Brief 2022, 41, 107904.
- Gong, H.; Chen, G.; Wang, R.; Xie, X.; Mao, M.; Yu, Y.; Chen, F.; Li, G. Multi-task learning for thyroid nodule segmentation with thyroid region prior. In Proceedings of the 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI); IEEE: New York, NY, USA, 2021; pp. 257–261.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 2019, 39, 1856–1867.
- Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999.
- Isensee, F.; Jaeger, P.F.; Kohl, S.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2021, 18, 203–211.
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 205–218.
- Zhang, Y.; Liu, H.; Hu, Q. TransFuse: Fusing transformers and CNNs for medical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2021; pp. 14–24.
- He, A.; Wang, K.; Li, T.; Du, C.; Xia, S.; Fu, H. H2Former: An efficient hierarchical hybrid transformer for medical image segmentation. IEEE Trans. Med. Imaging 2023, 42, 2763–2775.
- Li, Z.; Zheng, Y.; Shan, D.; Yang, S.; Li, Q.; Wang, B.; Zhang, Y.; Hong, Q.; Shen, D. ScribFormer: Transformer makes CNN work better for scribble-based medical image segmentation. IEEE Trans. Med. Imaging 2024, 43, 2254–2265.
- Margolin, R.; Zelnik-Manor, L.; Tal, A. How to evaluate foreground maps? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2014; pp. 248–255.
- Fan, D.P.; Cheng, M.M.; Liu, Y.; Li, T.; Borji, A. Structure-measure: A new way to evaluate foreground maps. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: New York, NY, USA, 2017; pp. 4548–4557.
- Fan, D.P.; Ji, G.P.; Qin, X.; Cheng, M.M. Cognitive vision inspired object segmentation metric and loss function. Sci. Sin. Informationis 2021, 6, 5.
| Method | Venue | Prompt Application Position | Prior Modeling Strategy | Frequency-Domain Processing |
|---|---|---|---|---|
| APG-SAM [24] | ESWA 2025 | Box Prompt, Point Prompt | Yolov9 Detection | - |
| CC-SAM [22] | ECCV 2024 | LLM Text Prompt | Grounding DINO | - |
| FreqDINO [23] | ArXiv 2025 | Spatial Semantic Prompt | DINOv3 | Fixed high-/low-frequency fusion |
| LGFFM [35] | IEEE TMI 2025 | - | SAM2 | Static Wavelet frequency fusion |
| RPFeaNet (Ours) | - | Low-to-high level progressive prompt | Structural (CLIP), patch-level encoder, semantic (SAM2) | Prompt-guided dynamic selective-frequency fusion |
| Dataset | Method | Jaccard | Dice | Eφ | Sα | Fβw | ASSD | MSE |
|---|---|---|---|---|---|---|---|---|
| CCAUI | U-Net [45] | 0.7025 ± 0.0123 * | 0.8214 ± 0.0095 * | 0.9856 ± 0.0032 * | 0.9512 ± 0.0076 * | 0.8023 ± 0.0142 * | 0.4258 ± 0.0215 * | 0.0095 ± 0.0012 * |
| UNet++ [46] | 0.7218 ± 0.0108 * | 0.8386 ± 0.0087 * | 0.9879 ± 0.0028 * | 0.9587 ± 0.0069 * | 0.8195 ± 0.0131 * | 0.3964 ± 0.0198 * | 0.0088 ± 0.0010 * | |
| AttUNet [47] | 0.7352 ± 0.0095 * | 0.8491 ± 0.0076 * | 0.9892 ± 0.0025 * | 0.9623 ± 0.0062 * | 0.8317 ± 0.0118 * | 0.3725 ± 0.0182 * | 0.0081 ± 0.0009 * | |
| nnU-Net [48] | 0.7486 ± 0.0089 * | 0.8593 ± 0.0071 * | 0.9905 ± 0.0022 * | 0.9668 ± 0.0058 * | 0.8429 ± 0.0109 * | 0.3682 ± 0.0175 * | 0.0083 ± 0.0009 * | |
| Swin-Unet [49] | 0.7154 ± 0.0112 * | 0.8327 ± 0.0092 * | 0.9871 ± 0.0030 * | 0.9564 ± 0.0072 * | 0.8136 ± 0.0135 * | 0.4059 ± 0.0203 * | 0.0091 ± 0.0011 * | |
| TransFuse [50] | 0.7289 ± 0.0102 * | 0.8435 ± 0.0082 * | 0.9885 ± 0.0026 * | 0.9605 ± 0.0065 * | 0.8258 ± 0.0125 * | 0.3847 ± 0.0190 * | 0.0085 ± 0.0010 * | |
| H2Former [51] | 0.7395 ± 0.0091 * | 0.8526 ± 0.0074 * | 0.9898 ± 0.0024 * | 0.9641 ± 0.0060 * | 0.8362 ± 0.0115 * | 0.3659 ± 0.0172 * | 0.0080 ± 0.0008 * | |
| ScribFormer [52] | 0.7451 ± 0.0087 * | 0.8572 ± 0.0070 * | 0.9912 ± 0.0021 * | 0.9675 ± 0.0057 * | 0.8405 ± 0.0110 * | 0.3671 ± 0.0173 * | 0.0082 ± 0.0009 * | |
| LGFFM [35] | 0.7525 ± 0.0085 * | 0.8586 ± 0.0068 * | 0.9923 ± 0.0020 * | 0.9736 ± 0.0052 * | 0.8586 ± 0.0105 * | 0.3666 ± 0.0170 * | 0.0082 ± 0.0009 * | |
| RPFeaNet (Ours) | 0.8783 ± 0.0062 | 0.9300 ± 0.0045 | 0.9967 ± 0.0015 | 0.9879 ± 0.0038 | 0.9300 ± 0.0065 | 0.1507 ± 0.0085 | 0.0031 ± 0.0005 | |
| CAMUS | U-Net [45] | 0.9093 ± 0.0026 * | 0.9185 ± 0.0021 * | 0.8512 ± 0.0030 * | 0.9215 ± 0.0019 * | 0.2216 ± 0.0041 * | 0.2854 ± 0.0053 * | 0.0356 ± 0.0012 * |
| UNet++ [46] | 0.9101 ± 0.0017 * | 0.9324 ± 0.0015 * | 0.8726 ± 0.0022 * | 0.9308 ± 0.0014 * | 0.2305 ± 0.0035 * | 0.2469 ± 0.0044 * | 0.0321 ± 0.0010 * | |
| AttUNet [47] | 0.9079 ± 0.0032 * | 0.9432 ± 0.0028 * | 0.8915 ± 0.0035 * | 0.9386 ± 0.0026 * | 0.2387 ± 0.0039 * | 0.2158 ± 0.0038 * | 0.0298 ± 0.0009 * | |
| nnU-Net [48] | 0.9207 ± 0.0056 * | 0.9506 ± 0.0049 * | 0.9058 ± 0.0051 * | 0.9459 ± 0.0045 * | 0.2423 ± 0.0042 * | 0.1987 ± 0.0035 * | 0.0292 ± 0.0008 * | |
| Swin-Unet [49] | 0.8751 ± 0.0017 * | 0.9397 ± 0.0020 * | 0.8864 ± 0.0024 * | 0.9412 ± 0.0018 * | 0.2365 ± 0.0037 * | 0.2241 ± 0.0040 * | 0.0305 ± 0.0009 * | |
| TransFuse [50] | 0.8589 ± 0.0109 * | 0.9461 ± 0.0063 * | 0.8967 ± 0.0071 * | 0.9438 ± 0.0058 * | 0.2401 ± 0.0045 * | 0.2095 ± 0.0039 * | 0.0296 ± 0.0009 * | |
| H2Former [51] | 0.9182 ± 0.0076 * | 0.9528 ± 0.0052 * | 0.9102 ± 0.0058 * | 0.9486 ± 0.0048 * | 0.2445 ± 0.0043 * | 0.1896 ± 0.0033 * | 0.0289 ± 0.0008 * | |
| ScribFormer [52] | 0.9055 ± 0.0024 * | 0.9552 ± 0.0026 * | 0.9185 ± 0.0029 * | 0.9507 ± 0.0024 * | 0.2469 ± 0.0040 * | 0.1789 ± 0.0031 * | 0.0287 ± 0.0007 * | |
| LGFFM [35] | 0.9204 ± 0.0041 * | 0.9559 ± 0.0023 * | 0.9204 ± 0.0025 * | 0.9518 ± 0.0021 * | 0.2482 ± 0.0038 * | 0.1716 ± 0.0029 * | 0.0288 ± 0.0006 * | |
| RPFeaNet (Ours) | 0.9216 ± 0.0025 | 0.9565 ± 0.0018 | 0.9217 ± 0.0020 | 0.9519 ± 0.0017 | 0.2488 ± 0.0032 | 0.1778 ± 0.0030 | 0.0291 ± 0.0007 |
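The Jaccard, Dice and MSE columns in the tables follow the standard binary-mask definitions; a minimal sketch (the 8×8 toy masks are illustrative, and the surface-distance metric ASSD is omitted because it additionally needs boundary extraction and a distance transform):

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient = 2|P∩G| / (|P| + |G|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def jaccard(pred, gt):
    """Jaccard index (IoU) = |P∩G| / |P∪G|."""
    inter = np.logical_and(pred, gt).sum()
    return inter / np.logical_or(pred, gt).sum()

def mse(pred, gt):
    """Mean squared error between the binary maps."""
    return np.mean((pred.astype(float) - gt.astype(float)) ** 2)

gt = np.zeros((8, 8), dtype=bool); gt[2:6, 2:6] = True      # 4x4 target
pred = np.zeros((8, 8), dtype=bool); pred[3:7, 3:7] = True  # shifted by 1 px
print(round(dice(pred, gt), 3), round(jaccard(pred, gt), 3))  # 0.562 0.391
```

For a single mask pair the two overlap metrics are linked by Dice = 2J/(1+J); the table values deviate slightly from this identity because scores are averaged per image before reporting.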
| Dataset | Method | Jaccard | Dice | Eφ | Sα | Fβw | ASSD | MSE |
|---|---|---|---|---|---|---|---|---|
| DDTI | U-Net [45] | 0.7625 ± 0.0123 * | 0.8614 ± 0.0095 * | 0.9456 ± 0.0032 * | 0.8712 ± 0.0076 * | 0.8123 ± 0.0142 * | 0.5258 ± 0.0215 * | 0.0495 ± 0.0012 * |
| UNet++ [46] | 0.7818 ± 0.0108 * | 0.8786 ± 0.0087 * | 0.9579 ± 0.0028 * | 0.8887 ± 0.0069 * | 0.8295 ± 0.0131 * | 0.4964 ± 0.0198 * | 0.0488 ± 0.0010 * | |
| AttUNet [47] | 0.7952 ± 0.0095 * | 0.8891 ± 0.0076 * | 0.9692 ± 0.0025 * | 0.8923 ± 0.0062 * | 0.8417 ± 0.0118 * | 0.4725 ± 0.0182 * | 0.0481 ± 0.0009 * | |
| nnU-Net [48] | 0.8086 ± 0.0089 * | 0.8993 ± 0.0071 * | 0.9705 ± 0.0022 * | 0.8968 ± 0.0058 * | 0.8529 ± 0.0109 * | 0.4682 ± 0.0175 * | 0.0483 ± 0.0009 * | |
| Swin-Unet [49] | 0.7754 ± 0.0112 * | 0.8727 ± 0.0092 * | 0.9571 ± 0.0030 * | 0.8864 ± 0.0072 * | 0.8236 ± 0.0135 * | 0.5059 ± 0.0203 * | 0.0491 ± 0.0011 * | |
| TransFuse [50] | 0.7889 ± 0.0102 * | 0.8835 ± 0.0082 * | 0.9685 ± 0.0026 * | 0.8905 ± 0.0065 * | 0.8358 ± 0.0125 * | 0.4847 ± 0.0190 * | 0.0485 ± 0.0010 * | |
| H2Former [51] | 0.8095 ± 0.0091 * | 0.8926 ± 0.0074 * | 0.9798 ± 0.0024 * | 0.8941 ± 0.0060 * | 0.8462 ± 0.0115 * | 0.4659 ± 0.0172 * | 0.0480 ± 0.0008 * | |
| ScribFormer [52] | 0.8151 ± 0.0087 * | 0.8972 ± 0.0070 * | 0.9812 ± 0.0021 * | 0.8975 ± 0.0057 * | 0.8505 ± 0.0110 * | 0.4671 ± 0.0173 * | 0.0482 ± 0.0009 * | |
| LGFFM [35] | 0.8502 ± 0.0085 * | 0.9174 ± 0.0068 * | 0.9539 ± 0.0020 * | 0.9007 ± 0.0052 * | 0.9174 ± 0.0105 * | 0.4760 ± 0.0170 * | 0.0461 ± 0.0009 * | |
| RPFeaNet (Ours) | 0.8536 ± 0.0062 | 0.9198 ± 0.0045 | 0.9555 ± 0.0015 | 0.9021 ± 0.0038 | 0.9198 ± 0.0065 | 0.4235 ± 0.0085 | 0.0448 ± 0.0005 | |
| TN3K | U-Net [45] | 0.6276 ± 0.0171 * | 0.7298 ± 0.0164 * | 0.9285 ± 0.0032 * | 0.8815 ± 0.0076 * | 0.7716 ± 0.0142 * | 0.3880 ± 0.0026 * | 0.0356 ± 0.0012 * |
| UNet++ [46] | 0.6424 ± 0.0027 * | 0.7487 ± 0.0042 * | 0.9324 ± 0.0028 * | 0.8908 ± 0.0069 * | 0.7805 ± 0.0131 * | 0.4070 ± 0.0058 * | 0.0321 ± 0.0010 * | |
| AttUNet [47] | 0.6359 ± 0.0161 * | 0.7393 ± 0.0164 * | 0.9432 ± 0.0025 * | 0.8986 ± 0.0062 * | 0.7887 ± 0.0118 * | 0.4100 ± 0.0070 * | 0.0298 ± 0.0009 * | |
| nnU-Net [48] | 0.7102 ± 0.0148 * | 0.8062 ± 0.0122 * | 0.9506 ± 0.0022 * | 0.9059 ± 0.0058 * | 0.7923 ± 0.0109 * | 0.3700 ± 0.0068 * | 0.0292 ± 0.0009 * | |
| Swin-Unet [49] | 0.6599 ± 0.0117 * | 0.7576 ± 0.0112 * | 0.9397 ± 0.0030 * | 0.9012 ± 0.0072 * | 0.7865 ± 0.0135 * | 0.3420 ± 0.0092 * | 0.0305 ± 0.0011 * | |
| TransFuse [50] | 0.7118 ± 0.0118 | 0.8163 ± 0.0089 | 0.9461 ± 0.0026 | 0.9038 ± 0.0065 | 0.7901 ± 0.0125 | 0.2820 ± 0.0009 | 0.0296 ± 0.0010 | |
| H2Former [51] | 0.7111 ± 0.0122 | 0.8072 ± 0.0125 | 0.9528 ± 0.0024 | 0.9086 ± 0.0060 | 0.7945 ± 0.0115 | 0.2840 ± 0.0067 | 0.0289 ± 0.0008 | |
| ScribFormer [52] | 0.6730 ± 0.0147 * | 0.7757 ± 0.0129 * | 0.9552 ± 0.0021 * | 0.9107 ± 0.0057 * | 0.7969 ± 0.0110 * | 0.3360 ± 0.0056 * | 0.0287 ± 0.0009 * | |
| LGFFM [35] | 0.7243 ± 0.0085 * | 0.8210 ± 0.0068 * | 0.9693 ± 0.0020 * | 0.9331 ± 0.0052 * | 0.8265 ± 0.0105 * | 0.2438 ± 0.0170 * | 0.0297 ± 0.0009 * | |
| RPFeaNet (Ours) | 0.7342 ± 0.0075 | 0.8306 ± 0.0065 | 0.9703 ± 0.0015 | 0.9344 ± 0.0038 | 0.8142 ± 0.0065 | 0.1924 ± 0.0027 | 0.0289 ± 0.0005 |
| Dataset | Method | Jaccard | Dice | Eφ | Sα | Fβw | ASSD | MSE |
|---|---|---|---|---|---|---|---|---|
| HC-18 | U-Net [45] | 0.7815 ± 0.0128 * | 0.8752 ± 0.0106 * | 0.9512 ± 0.0085 * | 0.8825 ± 0.0092 * | 0.8217 ± 0.0114 * | 0.6128 ± 0.0412 * | 0.0523 ± 0.0035 * |
| UNet++ [46] | 0.8028 ± 0.0114 * | 0.8906 ± 0.0098 * | 0.9635 ± 0.0076 * | 0.8947 ± 0.0083 * | 0.8389 ± 0.0106 * | 0.5814 ± 0.0386 * | 0.0501 ± 0.0032 * | |
| AttUNet [47] | 0.8165 ± 0.0107 * | 0.8987 ± 0.0092 * | 0.9718 ± 0.0069 * | 0.9012 ± 0.0078 * | 0.8495 ± 0.0099 * | 0.5578 ± 0.0364 * | 0.0489 ± 0.0030 * | |
| nnU-Net [48] | 0.8302 ± 0.0099 * | 0.9075 ± 0.0085 * | 0.9786 ± 0.0063 * | 0.9085 ± 0.0072 * | 0.8603 ± 0.0092 * | 0.5492 ± 0.0348 * | 0.0485 ± 0.0028 * | |
| Swin-Unet [49] | 0.7946 ± 0.0119 * | 0.8834 ± 0.0102 * | 0.9589 ± 0.0080 * | 0.8893 ± 0.0087 * | 0.8312 ± 0.0111 * | 0.5947 ± 0.0398 * | 0.0512 ± 0.0033 * | |
| TransFuse [50] | 0.8098 ± 0.0103 * | 0.8942 ± 0.0088 * | 0.9674 ± 0.0071 * | 0.8978 ± 0.0079 * | 0.8426 ± 0.0098 * | 0.5735 ± 0.0357 * | 0.0496 ± 0.0029 * | |
| H2Former [51] | 0.8215 ± 0.0094 * | 0.9028 ± 0.0081 * | 0.9759 ± 0.0060 * | 0.9051 ± 0.0068 * | 0.8538 ± 0.0087 * | 0.5467 ± 0.0331 * | 0.0482 ± 0.0027 * | |
| ScribFormer [52] | 0.8279 ± 0.0089 * | 0.9061 ± 0.0077 * | 0.9803 ± 0.0057 * | 0.9097 ± 0.0065 * | 0.8584 ± 0.0083 * | 0.5481 ± 0.0324 * | 0.0484 ± 0.0026 * | |
| LGFFM [35] | 0.8427 ± 0.0076 * | 0.9158 ± 0.0064 * | 0.9618 ± 0.0051 * | 0.9125 ± 0.0058 * | 0.8716 ± 0.0072 * | 0.5398 ± 0.0298 * | 0.0472 ± 0.0023 * | |
| RPFeaNet (Ours) | 0.8519 ± 0.0042 | 0.9214 ± 0.0035 | 0.9645 ± 0.0028 | 0.9158 ± 0.0031 | 0.8789 ± 0.0039 | 0.4876 ± 0.0186 | 0.0458 ± 0.0017 | |
| JNU-IFM | U-Net [45] | 0.7128 ± 0.0156 * | 0.8315 ± 0.0123 * | 0.9326 ± 0.0092 * | 0.8517 ± 0.0105 * | 0.7824 ± 0.0131 * | 1.8254 ± 0.1286 * | 0.0615 ± 0.0042 * |
| UNet++ [46] | 0.7345 ± 0.0142 * | 0.8478 ± 0.0115 * | 0.9418 ± 0.0085 * | 0.8639 ± 0.0097 * | 0.7987 ± 0.0124 * | 1.6872 ± 0.1175 * | 0.0589 ± 0.0039 * | |
| AttUNet [47] | 0.7519 ± 0.0133 * | 0.8592 ± 0.0108 * | 0.9503 ± 0.0078 * | 0.8725 ± 0.0091 * | 0.8094 ± 0.0118 * | 1.5786 ± 0.1063 * | 0.0567 ± 0.0036 * | |
| nnU-Net [48] | 0.7684 ± 0.0125 * | 0.8687 ± 0.0101 * | 0.9578 ± 0.0072 * | 0.8802 ± 0.0085 * | 0.8189 ± 0.0112 * | 1.4963 ± 0.0984 * | 0.0559 ± 0.0034 * | |
| Swin-Unet [49] | 0.7427 ± 0.0147 * | 0.8536 ± 0.0112 * | 0.9465 ± 0.0089 * | 0.8684 ± 0.0099 * | 0.8031 ± 0.0127 * | 1.6348 ± 0.1132 * | 0.0578 ± 0.0038 * | |
| TransFuse [50] | 0.7586 ± 0.0137 * | 0.8631 ± 0.0105 * | 0.9537 ± 0.0075 * | 0.8758 ± 0.0088 * | 0.8146 ± 0.0121 * | 1.5279 ± 0.1035 * | 0.0562 ± 0.0037 * | |
| H2Former [51] | 0.7725 ± 0.0121 * | 0.8714 ± 0.0097 * | 0.9605 ± 0.0069 * | 0.8827 ± 0.0082 * | 0.8225 ± 0.0109 * | 1.4785 ± 0.0957 * | 0.0554 ± 0.0033 * | |
| ScribFormer [52] | 0.7798 ± 0.0116 * | 0.8759 ± 0.0093 * | 0.9632 ± 0.0066 * | 0.8859 ± 0.0079 * | 0.8271 ± 0.0105 * | 1.4692 ± 0.0942 * | 0.0551 ± 0.0032 * | |
| LGFFM [35] | 0.7865 ± 0.0108 * | 0.8807 ± 0.0089 * | 0.9714 ± 0.0059 * | 0.8913 ± 0.0074 * | 0.8358 ± 0.0098 * | 1.4287 ± 0.0896 * | 0.0543 ± 0.0030 * | |
| RPFeaNet (Ours) | 0.7982 ± 0.0053 | 0.8889 ± 0.0041 | 0.9738 ± 0.0032 | 0.8947 ± 0.0038 | 0.8426 ± 0.0046 | 1.2854 ± 0.0572 | 0.0527 ± 0.0021 |
| Method | FPS | Params (M) | FLOPs (G) |
|---|---|---|---|
| U-Net | 89.45 | 34.53 | 65.52 |
| UNet++ | 51.74 | 36.63 | 138.66 |
| AttUNet | 87.28 | 34.88 | 66.63 |
| nnU-Net | 89.63 | 34.56 | 65.72 |
| Swin-Unet | 148.99 | 27.15 | 5.92 |
| TransFuse | 115.63 | 26.17 | 8.65 |
| H2Former | 107.03 | 33.63 | 32.25 |
| ScribFormer | 59.71 | 50.42 | 54.46 |
| LGFFM | 32.22 | 15.02 | 64.42 |
| RPFeaNet (Ours) | 10.54 | 29.31 | 108.12 |
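The complexity numbers above are typically obtained as follows: Params (M) is the total count of trainable scalars, and FPS is forward passes per second after warm-up; FLOPs counting usually relies on a profiler and is omitted here. The sketch below illustrates the first two with a hypothetical two-layer toy model, not any of the networks in the table.

```python
import time
import numpy as np

rng = np.random.default_rng(0)

def count_params(weights):
    """Params (M): total number of trainable scalars, in millions."""
    return sum(w.size for w in weights) / 1e6

def measure_fps(model, x, warmup=3, runs=20):
    """FPS: forward passes per second, timed after a short warm-up."""
    for _ in range(warmup):
        model(x)
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return runs / (time.perf_counter() - start)

# Hypothetical stand-in "model": two dense layers on a flattened 64x64 input.
w1 = rng.standard_normal((4096, 256))
w2 = rng.standard_normal((256, 2))
model = lambda x: np.maximum(x @ w1, 0) @ w2
x = rng.standard_normal((1, 4096))

print(f"{count_params([w1, w2]):.2f} M params, {measure_fps(model, x):.0f} FPS")
```

On this toy model the parameter count is 4096·256 + 256·2 ≈ 1.05 M; FPS is hardware-dependent, which is why the table's FPS comparison is only meaningful on a fixed device.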
| PPGM | HPGFIM | DSFD | Dice ↑ | Jaccard ↑ | ASSD ↓ |
|---|---|---|---|---|---|
| ✓ | | | 0.852 | 0.741 | 0.321 |
| | ✓ | | 0.867 | 0.763 | 0.305 |
| | | ✓ | 0.881 | 0.785 | 0.289 |
| ✓ | ✓ | | 0.905 | 0.827 | 0.242 |
| ✓ | | ✓ | 0.912 | 0.839 | 0.231 |
| | ✓ | ✓ | 0.918 | 0.848 | 0.224 |
| ✓ | ✓ | ✓ | 0.930 | 0.878 | 0.151 |
| PPGM | HPGFIM | DSFD | Dice ↑ | Jaccard ↑ | ASSD ↓ |
|---|---|---|---|---|---|
| ✓ | | | 0.921 | 0.853 | 0.285 |
| | ✓ | | 0.934 | 0.872 | 0.268 |
| | | ✓ | 0.942 | 0.889 | 0.251 |
| ✓ | ✓ | | 0.948 | 0.901 | 0.224 |
| ✓ | | ✓ | 0.951 | 0.907 | 0.216 |
| | ✓ | ✓ | 0.953 | 0.912 | 0.209 |
| ✓ | ✓ | ✓ | 0.957 | 0.922 | 0.178 |
| Sub-Branch Setup | Dice ↑ | Jaccard ↑ | ASSD ↓ |
|---|---|---|---|
| w/LFE only | 0.9015 | 0.8207 | 0.2217 |
| w/PPTB only | 0.9083 | 0.8324 | 0.2089 |
| w/PPSB only | 0.8957 | 0.8116 | 0.2356 |
| w/LFE + PPTB | 0.9246 | 0.8598 | 0.1783 |
| w/LFE + PPSB | 0.9192 | 0.8503 | 0.1895 |
| w/PPTB + PPSB | 0.9297 | 0.8685 | 0.1672 |
| w/LFE + PPTB + PPSB (Full) | 0.9300 | 0.8783 | 0.1507 |
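The seven rows of the sub-branch ablation correspond to the seven non-empty subsets of the three PPGM sub-branches; the configurations can be enumerated mechanically, which is how such ablation grids are usually driven:

```python
from itertools import combinations

branches = ["LFE", "PPTB", "PPSB"]

# Every non-empty subset, in size order: three singletons, three pairs,
# then the full combination -- matching the seven table rows.
setups = [combo for r in range(1, len(branches) + 1)
          for combo in combinations(branches, r)]
for setup in setups:
    print("w/" + " + ".join(setup))
```

The lexicographic order produced by `itertools.combinations` reproduces the row order of the table exactly.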
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Zhu, L.; Du, Y. RPFeaNet: Rethinking Deep Progressive Prompt-Guided Feature Interaction Fusion Network for Medical Ultrasound Image Segmentation. Sensors 2026, 26, 2394. https://doi.org/10.3390/s26082394