A Multi-Scale Structure with Improved Reverse Attention for Polyp Segmentation
Abstract
1. Introduction
- (1) We introduce a Multiscale Attention (MA) module at the encoder's lower stages. The module combines multiscale attention with dilated convolutions at multiple dilation rates to enlarge the receptive field, improving feature extraction in the low-level feature maps.
- (2) We design an integrated decoder that couples an Attention Gate (AG) module with multi-branch convolution modules, leveraging hierarchical feature supplementation and refinement to improve segmentation accuracy for polyps of varying sizes and shapes.
- (3) We integrate an Improved Reverse Attention module, which specializes in polyp boundary segmentation through enhanced feature extraction and reverse attention mechanisms.
- (4) To verify the practical effectiveness of the proposed model, we conduct experiments on five challenging benchmark datasets. The results demonstrate reliable performance and strong versatility as a segmentation tool.
2. Related Work
3. Methodology
3.1. Overall Architecture
3.2. Multiscale Attention Module
| Algorithm 1 Multiscale Attention (MA) Module |
|---|
| Input: feature map of size H × W × C (H = height, W = width, C = channels) |
| Output: enhanced feature map |
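To illustrate the receptive-field expansion that multi-rate dilated convolutions provide (the mechanism the MA module builds on), here is a minimal single-channel NumPy sketch. The averaging kernels, function names, and the simple branch-averaging fusion are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """'Same'-padded single-channel 2-D convolution with dilation `rate`.
    A 3x3 kernel at rate r covers a (2r+1) x (2r+1) window, so larger
    rates see a wider context at the same parameter cost."""
    kh, kw = kernel.shape
    pad = rate * (kh // 2)
    xp = np.pad(x, pad)
    H, W = x.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Sample the input on a dilated grid and accumulate.
            out += kernel[i, j] * xp[i * rate:i * rate + H, j * rate:j * rate + W]
    return out

def multiscale_branch(x, rates=(1, 2, 3)):
    """Parallel dilated branches (mean filters here) fused by averaging,
    mimicking multi-rate receptive-field aggregation."""
    k = np.full((3, 3), 1.0 / 9.0)
    return sum(dilated_conv2d(x, k, r) for r in rates) / len(rates)
```

Applying `dilated_conv2d` with rate 2 to an impulse shows the response spreading to a 5 × 5 dilated support, which is the receptive-field gain the text describes.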
3.3. Parallel Multi-Level Aggregation Decoder
3.3.1. Attention Gate (AG)
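The AG module follows the additive attention-gate idea of Attention U-Net [14]: a gating signal from a coarser decoder level re-weights the skip feature pixel-wise. A minimal NumPy sketch, treating 1 × 1 convolutions as per-pixel linear maps; the weight shapes and names are assumptions for illustration, and the paper's exact AG may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_gate(x, g, w_x, w_g, psi):
    """Additive attention gate: alpha = sigmoid(psi^T ReLU(W_x x + W_g g)).
    x: skip feature (H, W, C); g: gating feature (H, W, Cg).
    Returns the skip feature scaled by the attention map alpha in (0, 1)."""
    q = np.maximum(x @ w_x + g @ w_g, 0.0)       # ReLU of the joint projection
    alpha = 1.0 / (1.0 + np.exp(-(q @ psi)))     # (H, W, 1) attention coefficients
    return x * alpha                             # gate the skip connection

H, W, C, Cg, Ci = 8, 8, 4, 6, 5
x = rng.standard_normal((H, W, C))
g = rng.standard_normal((H, W, Cg))
w_x = rng.standard_normal((C, Ci))
w_g = rng.standard_normal((Cg, Ci))
psi = rng.standard_normal((Ci, 1))
y = attention_gate(x, g, w_x, w_g, psi)
```

Because alpha lies strictly in (0, 1), the gate can only attenuate skip activations, suppressing irrelevant background responses before fusion.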
3.3.2. Parallel Multi-Level Aggregation (PMA)
3.3.3. Adjacent Supplement Partial Decoder
| Algorithm 2 Multi-branch Feature Aggregation (MFA) Module |
|---|
| Input: input feature map |
| Output: aggregated and refined feature map |
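The multi-branch idea above can be sketched as parallel branches with different kernel sizes fused and added back residually. This NumPy toy uses mean filters as stand-ins for the learned convolution branches; the kernel sizes, averaging fusion, and function names are illustrative assumptions rather than the paper's MFA specification.

```python
import numpy as np

def avg_conv(x, k):
    """'Same'-padded k x k mean filter: a stand-in for one conv branch."""
    pad = k // 2
    xp = np.pad(x, pad)
    H, W = x.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + H, j:j + W]
    return out / (k * k)

def mfa(x, kernel_sizes=(1, 3, 5)):
    """Multi-branch aggregation sketch: parallel branches with different
    receptive fields, fused by averaging, plus a residual connection."""
    fused = sum(avg_conv(x, k) for k in kernel_sizes) / len(kernel_sizes)
    return x + fused
```

The residual addition keeps the original feature intact while the fused branches contribute context at several scales, which is the refinement role the module plays in the decoder.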
3.4. Improved Reverse Attention Module
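The core reverse-attention operation, as introduced in PraNet [31], inverts a coarse prediction so that subsequent layers attend to what the coarse map missed (typically the polyp boundary). A minimal NumPy sketch of that base mechanism; the improvements this section adds on top of it are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reverse_attention(feat, coarse_logits):
    """Reverse attention: erase confidently predicted foreground so the
    network focuses on residual regions such as ambiguous boundaries.
    feat: feature map (H, W, C); coarse_logits: coarse prediction (H, W)."""
    rev = 1.0 - sigmoid(coarse_logits)   # high where the coarse map says background
    return feat * rev[..., None]         # broadcast the reverse map over channels
```

Where the coarse logits are strongly positive (confident polyp interior) the features are zeroed out, and where they are strongly negative the features pass through almost unchanged.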
3.5. Loss Function
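Polyp segmentation networks in this line of work (F3Net [44], PraNet [31]) typically combine a BCE term with an IoU term [45]. As a plain, unweighted NumPy sketch of those two components (the paper's actual loss, including any pixel weighting and deep supervision, may differ):

```python
import numpy as np

def structure_loss(pred_logits, mask, eps=1e-8):
    """BCE + IoU segmentation loss sketch.
    pred_logits, mask: (H, W) arrays, mask binary in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-pred_logits))                # sigmoid probabilities
    # Pixel-wise binary cross-entropy, averaged over the map.
    bce = -(mask * np.log(p + eps) + (1 - mask) * np.log(1 - p + eps)).mean()
    # Soft IoU loss: penalizes region-level disagreement.
    inter = (p * mask).sum()
    union = (p + mask - p * mask).sum()
    iou_loss = 1.0 - (inter + eps) / (union + eps)
    return bce + iou_loss
```

The BCE term supervises every pixel independently, while the IoU term supervises the predicted region as a whole, which helps with the large foreground/background imbalance typical of polyp images.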
4. Experiments
4.1. Dataset
4.2. Implementation Details
4.3. Evaluation Metrics
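Three of the reported metrics have simple closed forms; a small NumPy sketch (the weighted F-measure [53], S-measure [54], and E-measure [55] are more involved and are omitted here):

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Dice coefficient: 2|P∩G| / (|P| + |G|) for binary maps."""
    inter = (pred * gt).sum()
    return (2 * inter + eps) / (pred.sum() + gt.sum() + eps)

def iou(pred, gt, eps=1e-8):
    """Intersection over union: |P∩G| / |P∪G|."""
    inter = (pred * gt).sum()
    return (inter + eps) / (pred.sum() + gt.sum() - inter + eps)

def mae(pred, gt):
    """Mean absolute error between a (soft) prediction and the binary mask."""
    return np.abs(pred - gt).mean()
```

For example, a prediction covering two pixels of which one overlaps a one-pixel ground truth gives Dice 2/3, IoU 1/2, and MAE 1/4 on a 2 × 2 map.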
4.4. Results
4.5. Ablation Study
5. Conclusions
5.1. Discussion
- (1) The encoder employs PVTv2, leveraging spatial and multi-scale feature attention for visual tasks. A Multiscale Attention module at the bottom stage of the encoder optimizes multi-scale feature extraction by fusing local and global information, improving polyp tissue recognition accuracy.
- (2) The parallel multi-level aggregation decoder is designed for pyramid features, combining attention mechanisms with parallel multi-convolutional blocks through residual convolutions to efficiently integrate and re-extract the multi-scale feature maps produced by the encoder.
- (3) The Scale Aggregation Reverse Attention (SARA) module first processes decoder outputs via scale aggregation and then reinforces feature extraction with reverse attention, optimizing boundary recognition accuracy for polyp segmentation.
5.2. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Bray, F.; Laversanne, M.; Sung, H.; Ferlay, J.; Siegel, R.L.; Soerjomataram, I.; Jemal, A. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2024, 74, 229–263.
- Li, N.; Lu, B.; Luo, C.; Cai, J.; Lu, M.; Zhang, Y.; Chen, H.; Dai, M. Incidence, mortality, survival, risk factor and screening of colorectal cancer: A comparison among China, Europe, and northern America. Cancer Lett. 2021, 522, 255–268.
- Bernal, J.; Tajkbaksh, N.; Sánchez, F.J.; Matuszewski, B.J.; Chen, H.; Yu, L.; Angermann, Q.; Romain, O.; Rustad, B.; Balasingham, I.; et al. Comparative Validation of Polyp Detection Methods in Video Colonoscopy: Results From the MICCAI 2015 Endoscopic Vision Challenge. IEEE Trans. Med. Imaging 2017, 36, 1231–1249.
- Jia, X.; Xing, X.; Yuan, Y.; Xing, L.; Meng, M.Q.H. Wireless Capsule Endoscopy: A New Tool for Cancer Screening in the Colon With Deep-Learning-Based Polyp Recognition. Proc. IEEE 2020, 108, 178–197.
- Gupta, M.; Mishra, A. A systematic review of deep learning based image segmentation to detect polyp. Artif. Intell. Rev. 2024, 57.
- Bang, C.S.; Lee, J.J.; Baik, G.H. Computer-Aided Diagnosis of Diminutive Colorectal Polyps in Endoscopic Images: Systematic Review and Meta-analysis of Diagnostic Test Accuracy. J. Med. Internet Res. 2021, 23, e29682.
- Sánchez-González, A.; García-Zapirain, B.; Sierra-Sosa, D.; Elmaghraby, A. Automatized colon polyp segmentation via contour region analysis. Comput. Biol. Med. 2018, 100, 152–164.
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. PVT v2: Improved baselines with Pyramid Vision Transformer. Comput. Vis. Media 2022, 8, 415–424.
- Jin, E.H.; Lee, D.; Bae, J.H.; Kang, H.Y.; Kwak, M.S.; Seo, J.Y.; Yang, J.I.; Yang, S.Y.; Lim, S.H.; Yim, J.Y.; et al. Improved Accuracy in Optical Diagnosis of Colorectal Polyps Using Convolutional Neural Networks with Visual Explanations. Gastroenterology 2020, 158, 2169–2179.e8.
- Choi, S.J.; Kim, E.S.; Choi, K. Prediction of the histology of colorectal neoplasm in white light colonoscopic images using deep learning algorithms. Sci. Rep. 2021, 11, 5311.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Volume 9351, pp. 234–241.
- Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 3–11.
- Oktay, O.; Schlemper, J.; Le Folgoc, L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
- Jin, Q.; Meng, Z.; Sun, C.; Cui, H.; Su, R. RA-UNet: A Hybrid Deep Attention-Aware Network to Extract Liver and Tumor in CT Scans. Front. Bioeng. Biotechnol. 2020, 8, 605132.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 6000–6010.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2021, arXiv:2010.11929.
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306.
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 548–558.
- Shi, W.; Xu, J.; Gao, P. SSformer: A Lightweight Transformer for Semantic Segmentation. In Proceedings of the 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), Shanghai, China, 26–28 September 2022; pp. 1–5.
- Sanderson, E.; Matuszewski, B.J. FCN-Transformer Feature Fusion for Polyp Segmentation. In Proceedings of the Medical Image Understanding and Analysis: 26th Annual Conference, MIUA 2022, Cambridge, UK, 27–29 July 2022; Volume 13413, pp. 892–907.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Valanarasu, J.M.J.; Patel, V.M. UNeXt: MLP-Based Rapid Medical Image Segmentation Network. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2022, Singapore, 18–22 September 2022; Springer Nature: Cham, Switzerland, 2022; pp. 23–33.
- Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A General U-Shaped Transformer for Image Restoration. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 17662–17672.
- Dong, X.Y.; Bao, J.M.; Chen, D.D.; Zhang, W.M.; Yu, N.H.; Yuan, L.; Chen, D.; Guo, B.N. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12114–12124.
- Yuan, Y.; Fu, R.; Huang, L.; Lin, W.; Zhang, C.; Chen, X.; Wang, J. HRFormer: High-resolution transformer for dense prediction. In Proceedings of the 35th International Conference on Neural Information Processing Systems, Online, 6–14 December 2021; Curran Associates Inc.: Red Hook, NY, USA, 2021; p. 557.
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 3349–3364.
- Hatamizadeh, A.; Nath, V.; Tang, Y.; Yang, D.; Roth, H.R.; Xu, D. Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries; Springer International Publishing: Berlin/Heidelberg, Germany, 2022; pp. 272–284.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 9992–10002.
- Fan, D.P.; Ji, G.P.; Zhou, T.; Chen, G.; Fu, H.; Shen, J.; Shao, L. PraNet: Parallel Reverse Attention Network for Polyp Segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2020, Lima, Peru, 4–8 October 2020; Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 263–273.
- Kim, T.; Lee, H.; Kim, D. UACANet: Uncertainty Augmented Context Attention for Polyp Segmentation. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, 20–24 October 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 2167–2175.
- Lou, A.G.; Guan, S.Y.; Ko, H.; Loew, M. CaraNet: Context Axial Reverse Attention Network for Segmentation of Small Medical Objects. J. Med. Imaging 2022, 10, 014005.
- Yang, Y.; Dasmahapatra, S.; Mahmoodi, S. ADS_UNet: A nested UNet for histopathology image segmentation. Expert Syst. Appl. 2023, 226, 120128.
- Zhao, X.; Jia, H.; Pang, Y.; Lv, L.; Tian, F.; Zhang, L.; Sun, W.; Lu, H. M2SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation. arXiv 2023, arXiv:2303.10894.
- Zhou, T.; Zhou, Y.; Li, G.; Chen, G.; Shen, J. Uncertainty-Aware Hierarchical Aggregation Network for Medical Image Segmentation. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 7440–7453.
- Zhou, T.; Zhang, Y.; Chen, G.; Zhou, Y.; Wu, Y.; Fan, D.P. Edge-aware Feature Aggregation Network for Polyp Segmentation. Mach. Intell. Res. 2025, 22, 101–116.
- Lee, H.J.; Kim, J.U.; Lee, S.; Kim, H.G.; Ro, Y.M. Structure Boundary Preserving Segmentation for Medical Image With Ambiguous Boundary. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4816–4825.
- Luong, M.T.; Pham, H.; Manning, C.D. Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1412–1421.
- Liu, H.; Liu, F.; Fan, X.; Huang, D. Polarized self-attention: Towards high-quality pixel-wise mapping. Neurocomputing 2022, 506, 158–167.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–19.
- Liu, S.T.; Huang, D.; Wang, Y.H. Receptive Field Block Net for Accurate and Fast Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Volume 11215, pp. 404–419.
- Wu, Z.; Su, L.; Huang, Q. Cascaded Partial Decoder for Fast and Accurate Salient Object Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3902–3911.
- Wei, J.; Wang, S.; Huang, Q. F3Net: Fusion, feedback and focus for salient object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12321–12328.
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An Advanced Object Detection Network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 516–520.
- Jha, D.; Smedsrud, P.H.; Riegler, M.A.; Halvorsen, P.; Lange, T.d.; Johansen, D.; Johansen, H.D. Kvasir-SEG: A Segmented Polyp Dataset. In Proceedings of the MultiMedia Modeling: 26th International Conference, MMM 2020, Daejeon, Republic of Korea, 5–8 January 2020; Proceedings, Part II; Springer: Berlin/Heidelberg, Germany, 2020; pp. 451–462.
- Bernal, J.; Sánchez, F.J.; Fernández-Esparrach, G.; Gil, D.; Rodríguez, C.; Vilariño, F. WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians. Comput. Med. Imaging Graph. 2015, 43, 99–111.
- Tajbakhsh, N.; Gurudu, S.R.; Liang, J. Automated Polyp Detection in Colonoscopy Videos Using Shape and Context Information. IEEE Trans. Med. Imaging 2016, 35, 630–644.
- Vázquez, D.; Bernal, J.; Sánchez, F.J.; Fernández-Esparrach, G.; López, A.M.; Romero, A.; Drozdzal, M.; Courville, A. A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images. J. Healthc. Eng. 2017, 2017, 4037190.
- Silva, J.; Histace, A.; Romain, O.; Dray, X.; Granado, B. Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer. Int. J. Comput. Assist. Radiol. Surg. 2014, 9, 283–293.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Kai, L.; Li, F.F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Milletari, F.; Navab, N.; Ahmadi, S. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571.
- Margolin, R.; Zelnik-Manor, L.; Tal, A. How to Evaluate Foreground Maps. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 248–255.
- Fan, D.P.; Cheng, M.M.; Liu, Y.; Li, T.; Borji, A. Structure-Measure: A New Way to Evaluate Foreground Maps. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4558–4567.
- Fan, D.P.; Gong, C.; Cao, Y.; Ren, B.; Cheng, M.M.; Borji, A. Enhanced-alignment measure for binary foreground map evaluation. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 9–19 July 2018; AAAI Press: Washington, DC, USA, 2018; pp. 698–704.
- Fang, Y.; Chen, C.; Yuan, Y.; Tong, K.Y. Selective Feature Aggregation Network with Area-Boundary Constraints for Polyp Segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2019, Shenzhen, China, 13–17 October 2019; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 302–310.
- Zhao, X.; Zhang, L.; Lu, H. Automatic Polyp Segmentation via Multi-scale Subtraction Network. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2021, Strasbourg, France, 27 September–1 October 2021; Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 120–130.
- Patel, K.; Bur, A.M.; Wang, G. Enhanced U-Net: A Feature Enhancement Network for Polyp Segmentation. Proc. Int. Robot Vis. Conf. 2021, 2021, 181–188.
- Zhou, T.; Zhou, Y.; He, K.; Gong, C.; Yang, J.; Fu, H.; Shen, D. Cross-level Feature Aggregation Network for Polyp Segmentation. Pattern Recognit. 2023, 140, 109555.
- Bui, N.T.; Dinh-Hieu, H.; Quang-Thuc, N.; Minh-Triet, T.; Le, N. MEGANet: Multi-Scale Edge-Guided Attention Network for Weak Boundary Polyp Segmentation. In Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024; pp. 7970–7979.
- Shu, X.; Wang, J.; Zhang, A.; Shi, J.; Wu, X.J. CSCA U-Net: A channel and space compound attention CNN for medical image segmentation. Artif. Intell. Med. 2024, 150, 102800.
| Dataset | Number | Size (Pixels) | Train | Test |
|---|---|---|---|---|
| Kvasir-SEG [46] | 1000 | range from 332 × 487 to 1920 × 1072 | 900 | 100 |
| CVC-ClinicDB [47] | 612 | 384 × 288 | 548 | 64 |
| CVC-ColonDB [48] | 380 | 574 × 500 | - | 380 |
| Endoscene [49] | 60 | 1225 × 966 | - | 60 |
| ETIS-LaribPolypDB [50] | 196 | 1226 × 996 | - | 196 |
Results on Kvasir-SEG:

| Methods | Year | mDice | mIoU | Fβw | Sα | Eφ | MAE |
|---|---|---|---|---|---|---|---|
| U-Net [12] | 2015 | 0.818 | 0.746 | 0.794 | 0.858 | 0.893 | 0.055 |
| U-Net++ [13] | 2018 | 0.821 | 0.743 | 0.808 | 0.862 | 0.910 | 0.048 |
| SFA [56] | 2019 | 0.723 | 0.611 | 0.670 | 0.782 | 0.849 | 0.075 |
| PraNet [31] | 2020 | 0.898 | 0.840 | 0.885 | 0.915 | 0.948 | 0.030 |
| MSNet [57] | 2021 | 0.907 | 0.862 | 0.893 | 0.922 | 0.944 | 0.028 |
| Enhanced U-Net [58] | 2021 | 0.908 | 0.854 | 0.893 | 0.917 | 0.954 | 0.028 |
| CFA-Net [59] | 2023 | 0.915 | 0.861 | 0.903 | 0.924 | - | 0.023 |
| MEGANet(Res2Net-50) [60] | 2024 | 0.913 | 0.863 | 0.907 | 0.918 | 0.959 | 0.025 |
| CSCA U-Net [61] | 2024 | 0.903 | 0.846 | 0.890 | 0.918 | 0.951 | 0.031 |
| Proposed | - | 0.923 | 0.870 | 0.917 | 0.925 | 0.967 | 0.023 |
Results on CVC-ClinicDB:

| Methods | Year | mDice | mIoU | Fβw | Sα | Eφ | MAE |
|---|---|---|---|---|---|---|---|
| U-Net [12] | 2015 | 0.823 | 0.755 | 0.811 | 0.889 | 0.954 | 0.019 |
| U-Net++ [13] | 2018 | 0.794 | 0.729 | 0.785 | 0.873 | 0.931 | 0.022 |
| SFA [56] | 2019 | 0.700 | 0.607 | 0.647 | 0.793 | 0.885 | 0.042 |
| PraNet [31] | 2020 | 0.899 | 0.849 | 0.896 | 0.936 | 0.979 | 0.009 |
| MSNet [57] | 2021 | 0.921 | 0.879 | 0.914 | 0.941 | 0.972 | 0.008 |
| Enhanced U-Net [58] | 2021 | 0.902 | 0.846 | 0.891 | 0.936 | 0.965 | 0.011 |
| CFA-Net [59] | 2023 | 0.933 | 0.883 | 0.924 | 0.950 | - | 0.007 |
| MEGANet(Res2Net-50) [60] | 2024 | 0.938 | 0.894 | 0.940 | 0.950 | 0.986 | 0.006 |
| CSCA U-Net [61] | 2024 | 0.915 | 0.864 | 0.912 | 0.942 | 0.975 | 0.010 |
| Proposed | - | 0.939 | 0.892 | 0.942 | 0.952 | 0.989 | 0.006 |
Results on CVC-ColonDB:

| Methods | Year | mDice | mIoU | Fβw | Sα | Eφ | MAE |
|---|---|---|---|---|---|---|---|
| U-Net [12] | 2015 | 0.512 | 0.444 | 0.498 | 0.712 | 0.776 | 0.061 |
| U-Net++ [13] | 2018 | 0.483 | 0.410 | 0.467 | 0.691 | 0.760 | 0.064 |
| SFA [56] | 2019 | 0.456 | 0.337 | 0.366 | 0.629 | 0.765 | 0.094 |
| PraNet [31] | 2020 | 0.709 | 0.640 | 0.696 | 0.819 | 0.869 | 0.045 |
| MSNet [57] | 2021 | 0.755 | 0.678 | 0.737 | 0.836 | 0.883 | 0.041 |
| Enhanced U-Net [58] | 2021 | 0.756 | 0.681 | 0.730 | 0.831 | 0.872 | 0.045 |
| CFA-Net [59] | 2023 | 0.743 | 0.665 | 0.728 | 0.835 | - | 0.039 |
| MEGANet(Res2Net-50) [60] | 2024 | 0.793 | 0.714 | 0.779 | 0.854 | 0.895 | 0.040 |
| CSCA U-Net [61] | 2024 | 0.788 | 0.703 | 0.760 | 0.857 | 0.906 | 0.036 |
| Proposed | - | 0.820 | 0.738 | 0.805 | 0.869 | 0.927 | 0.028 |
Results on Endoscene:

| Methods | Year | mDice | mIoU | Fβw | Sα | Eφ | MAE |
|---|---|---|---|---|---|---|---|
| U-Net [12] | 2015 | 0.710 | 0.627 | 0.684 | 0.843 | 0.875 | 0.022 |
| U-Net++ [13] | 2018 | 0.707 | 0.624 | 0.687 | 0.839 | 0.898 | 0.018 |
| SFA [56] | 2019 | 0.467 | 0.329 | 0.341 | 0.640 | 0.817 | 0.065 |
| PraNet [31] | 2020 | 0.871 | 0.797 | 0.843 | 0.925 | 0.972 | 0.010 |
| MSNet [57] | 2021 | 0.869 | 0.807 | 0.849 | 0.925 | 0.943 | 0.010 |
| Enhanced U-Net [58] | 2021 | 0.837 | 0.765 | 0.805 | 0.904 | 0.933 | 0.015 |
| CFA-Net [59] | 2023 | 0.893 | 0.827 | 0.875 | 0.938 | - | 0.008 |
| MEGANet(Res2Net-50) [60] | 2024 | 0.899 | 0.834 | 0.882 | 0.935 | 0.969 | 0.007 |
| CSCA U-Net [61] | 2024 | 0.868 | 0.789 | 0.835 | 0.920 | 0.969 | 0.012 |
| Proposed | - | 0.895 | 0.829 | 0.878 | 0.932 | 0.972 | 0.007 |
Results on ETIS-LaribPolypDB:

| Methods | Year | mDice | mIoU | Fβw | Sα | Eφ | MAE |
|---|---|---|---|---|---|---|---|
| U-Net [12] | 2015 | 0.398 | 0.335 | 0.366 | 0.684 | 0.740 | 0.036 |
| U-Net++ [13] | 2018 | 0.401 | 0.344 | 0.390 | 0.683 | 0.776 | 0.035 |
| SFA [56] | 2019 | 0.297 | 0.217 | 0.231 | 0.557 | 0.633 | 0.109 |
| PraNet [31] | 2020 | 0.628 | 0.567 | 0.600 | 0.794 | 0.841 | 0.031 |
| MSNet [57] | 2021 | 0.719 | 0.664 | 0.678 | 0.840 | 0.830 | 0.020 |
| Enhanced U-Net [58] | 2021 | 0.687 | 0.609 | 0.636 | 0.793 | 0.841 | 0.068 |
| CFA-Net [59] | 2023 | 0.732 | 0.655 | 0.693 | 0.845 | - | 0.014 |
| MEGANet(Res2Net-50) [60] | 2024 | 0.739 | 0.665 | 0.702 | 0.836 | 0.858 | 0.037 |
| CSCA U-Net [61] | 2024 | 0.688 | 0.608 | 0.644 | 0.814 | 0.882 | 0.026 |
| Proposed | - | 0.756 | 0.673 | 0.709 | 0.844 | 0.894 | 0.020 |
Computational efficiency comparison:

| Methods | Year | FLOPs (G) | Speed (FPS) | Params (M) |
|---|---|---|---|---|
| U-Net [12] | 2015 | 123.87 | 123.11 | 34.52 |
| U-Net++ [13] | 2018 | 262.16 | 82.51 | 36.63 |
| SFA [56] | 2019 | - | - | - |
| PraNet [31] | 2020 | 13.15 | 25.05 | 32.55 |
| MSNet [57] | 2021 | 17.00 | 31.08 | 29.74 |
| Enhanced U-Net [58] | 2021 | - | - | - |
| CFA-Net [59] | 2023 | 55.36 | 23.50 | 25.24 |
| MEGANet(Res2Net-50) [60] | 2024 | - | - | 44.19 |
| CSCA U-Net [61] | 2024 | - | 72.79 (Kvasir-SEG) | - |
| Proposed (PVTv2-B2) | - | 9.8 (256 × 256) | 170 | 45.2 |
Ablation study on module combinations:

| MA | PMA | SARA | Kvasir-SEG mDice | Kvasir-SEG mIoU | CVC-ClinicDB mDice | CVC-ClinicDB mIoU |
|---|---|---|---|---|---|---|
| ✓ | ✓ | | 0.913 | 0.855 | 0.928 | 0.880 |
| ✓ | | ✓ | 0.904 | 0.845 | 0.922 | 0.876 |
| | ✓ | ✓ | 0.910 | 0.852 | 0.925 | 0.877 |
| ✓ | ✓ | ✓ | 0.915 | 0.858 | 0.930 | 0.883 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yan, R.; Zhou, D.; Wan, Y. A Multi-Scale Structure with Improved Reverse Attention for Polyp Segmentation. Mathematics 2025, 13, 3794. https://doi.org/10.3390/math13233794