An Integrated Architecture for Colorectal Polyp Segmentation: The µ-Net Framework with Explainable AI
Abstract
1. Introduction
- We analyze several segmentation model architectures and identify opportunities for improvement.
- We propose µ-Net, a substantially improved variant of the traditional U-Net architecture.
- We evaluate µ-Net on CRC polyp image datasets acquired with different imaging modalities, where it exhibits superior performance.
- We qualitatively examine challenging images and observe clear improvements with µ-Net compared with U-Net.
2. Related Work
2.1. Convolutional Neural Networks
2.2. Transformer
3. Methodology
3.1. Proposed Segmentation Pipeline
3.2. Model Architecture
3.3. Block Components
3.3.1. CBRes Block
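The block definition itself is not reproduced in this excerpt. Purely as an illustration, assuming "CBRes" denotes a Convolution–BatchNorm–ReLU unit wrapped in a residual connection (a reading of the name, not the authors' stated design), a minimal PyTorch sketch might look like this:

```python
import torch
import torch.nn as nn

class CBResBlock(nn.Module):
    """Illustrative Conv-BatchNorm-ReLU block with a residual (skip) connection.
    This is an assumption based on the block's name, not the authors' definition."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the skip path matches the output channel count.
        self.skip = (nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
                     if in_ch != out_ch else nn.Identity())
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.body(x) + self.skip(x))
```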

3.3.2. Dual Dilate-CB Block
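As before, the block body is not included in this excerpt. Assuming "Dual Dilate-CB" refers to two parallel convolution–BatchNorm branches with different dilation rates whose outputs are fused (an assumption from the name only), a minimal sketch:

```python
import torch
import torch.nn as nn

class DualDilateCB(nn.Module):
    """Illustrative block: two parallel Conv-BN-ReLU branches with different
    dilation rates, concatenated and fused by a 1x1 convolution. An assumption
    based on the block name, not the authors' exact design."""

    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```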
3.3.3. TriDilation Attention Block
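Assuming "TriDilation Attention" combines three dilated-convolution branches with a channel-attention reweighting (again an assumption inferred from the name only), an illustrative sketch:

```python
import torch
import torch.nn as nn

class TriDilationAttention(nn.Module):
    """Illustrative block: three dilated conv branches followed by squeeze-and-
    excitation style channel attention. An assumption from the block name."""

    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4), reduction: int = 4):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        fused = out_ch * len(dilations)
        self.attn = nn.Sequential(            # channel-attention weights in [0, 1]
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, 1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(fused, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.project(feats * self.attn(feats))
```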
3.3.4. SplitFusion Block
3.3.5. µ (mu) Block
3.4. Model Evaluation
- (1) Dice coefficient: $\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN}$
- (2) Intersection over union: $\mathrm{IoU} = \frac{TP}{TP + FP + FN}$
- (3) Pixel accuracy: $\mathrm{PA} = \frac{TP + TN}{TP + TN + FP + FN}$
- (4) Precision: $\mathrm{Precision} = \frac{TP}{TP + FP}$
- (5) Recall: $\mathrm{Recall} = \frac{TP}{TP + FN}$

Here $TP$, $FP$, $FN$, and $TN$ denote pixel-level true-positive, false-positive, false-negative, and true-negative counts, respectively (see the sketch after this list).
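As a minimal sketch of how these pixel-level metrics are commonly computed for binary masks (the function name and the `eps` smoothing term are illustrative additions, not taken from the paper):

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> dict:
    """Compute Dice, IoU, pixel accuracy, precision, and recall for binary masks.
    `pred` and `gt` are boolean (or 0/1) arrays of identical shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)

    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()

    return {
        "dice": (2 * tp) / (2 * tp + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
        "pixel_accuracy": (tp + tn) / (tp + tn + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
    }
```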
4. Experimental Results
4.1. Implementation Details
4.2. Dataset Overview
4.3. Structure and Annotation of the Dataset
4.4. Dataset Augmentation
4.5. Explainable AI
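The body of this subsection is not reproduced here. Grad-CAM is a common choice for visualizing which regions drive a segmentation network's predictions; the sketch below is a generic PyTorch implementation for a binary-mask model, not the authors' code, and assumes the model returns mask logits of shape (1, 1, H, W):

```python
import torch
import torch.nn.functional as F

def grad_cam_segmentation(model, image, target_layer):
    """Illustrative Grad-CAM heatmap for a binary segmentation model.
    `image` is a (1, C, H, W) tensor; `target_layer` is the conv layer to explain."""
    feats, grads = {}, {}

    def fwd_hook(_m, _inp, out):
        feats["v"] = out

    def bwd_hook(_m, _gin, gout):
        grads["v"] = gout[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        logits = model(image)              # assumed (1, 1, H, W) mask logits
        score = logits.sigmoid().sum()     # aggregate score over the predicted mask
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()

    weights = grads["v"].mean(dim=(2, 3), keepdim=True)          # per-channel weights
    cam = F.relu((weights * feats["v"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)    # normalize to [0, 1]
```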
5. Results
Ablation Study
| Parameter | Value |
|---|---|
| Img_size | 352 |
| Dataset_type | Kvasir |
| Filters | 17 |
| Seed_value | 42 |
| Learning_rate | 1 × 10⁻⁴ |
| Epochs | 300 |
| Min_loss_for_saving | np.inf |
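A brief sketch of how the hyperparameters above might be wired into a training loop; apart from the values taken from the table, every identifier below (function names, checkpoint logic) is an illustrative placeholder rather than the authors' code:

```python
import numpy as np

# Values from the table above; the keys mirror the table's parameter names.
CONFIG = {
    "img_size": 352,
    "dataset_type": "Kvasir",
    "filters": 17,            # base filter count of the network
    "seed_value": 42,
    "learning_rate": 1e-4,
    "epochs": 300,
}

def train(train_one_epoch, validate, save_checkpoint):
    """Skeleton loop showing how Min_loss_for_saving = np.inf is typically used:
    the best checkpoint is overwritten only when the validation loss improves."""
    min_loss_for_saving = np.inf
    for epoch in range(CONFIG["epochs"]):
        train_one_epoch(lr=CONFIG["learning_rate"])   # placeholder callable
        val_loss = validate()                          # placeholder callable
        if val_loss < min_loss_for_saving:
            min_loss_for_saving = val_loss
            save_checkpoint(epoch=epoch, val_loss=val_loss)
```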
| Variant | Dice (%) ± std | IoU | FLOPs (G) | Latency (ms) | p-Value |
|---|---|---|---|---|---|
| U-Net (Baseline) | 93.12 ± 0.45 | 0.881 | 48.7 | 22 | 0.010 |
| U-Net + CBRes | 92.87 ± 0.52 | 0.862 | 47.5 | 21 | 0.012 |
| U-Net + Dual Dilate-CB | 91.34 ± 0.60 | 0.842 | 45.2 | 20 | 0.008 |
| U-Net + TriDilation | 90.75 ± 0.58 | 0.831 | 44.8 | 20 | 0.004 |
| U-Net + SplitFusion | 89.61 ± 0.63 | 0.815 | 43.5 | 19 | 0.003 |
| Proposed µ-Net | 94.02 ± 0.38 | 0.877 | 39.04 | 15 | 0.001 |
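The statistical test behind the p-values is not stated in this excerpt. As one common choice, per-image Dice scores of two variants evaluated on the same test images can be compared with a paired t-test; the sketch below uses hypothetical scores purely for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical per-image Dice scores of two variants on the same test images.
dice_baseline = np.array([0.93, 0.91, 0.95, 0.92, 0.94])
dice_proposed = np.array([0.95, 0.93, 0.96, 0.94, 0.95])

# Paired t-test: each image is scored by both models, so the samples are paired.
t_stat, p_value = stats.ttest_rel(dice_proposed, dice_baseline)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```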
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| SL | Augmentation | Probability/Range |
|---|---|---|
| 01 | HorizontalFlip | 0.5 |
| 02 | VerticalFlip | 0.5 |
| 03 | RandomRotate90 | 0.3 |
| 04 | ColorJitter | 0.5 |
| 05 | Brightness | 0.6 to 1.6× |
| 06 | Contrast | ±20% |
| 07 | Saturation | ±10% |
| 08 | Hue | ±0.01 |
| 09 | Affine Transform | 0.7 |
| 10 | Scale | 90% to 120% |
| 11 | Translation | ±10% |
| 12 | Rotation | ±30° |
| 13 | Shear | ±10° |
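The augmentation library is not named in this excerpt; the transform names and probabilities above closely match the Albumentations API, so the following pipeline is a sketch under that assumption, with our own mapping of the listed ranges onto its arguments:

```python
import albumentations as A

# Illustrative mapping of the table above onto an Albumentations pipeline;
# the library choice and exact argument mapping are assumptions.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.3),
    A.ColorJitter(
        brightness=(0.6, 1.6),          # 0.6 to 1.6x brightness
        contrast=0.2,                   # ±20%
        saturation=0.1,                 # ±10%
        hue=0.01,                       # ±0.01
        p=0.5,
    ),
    A.Affine(
        scale=(0.9, 1.2),               # 90% to 120%
        translate_percent=(-0.1, 0.1),  # ±10%
        rotate=(-30, 30),               # ±30°
        shear=(-10, 10),                # ±10°
        p=0.7,
    ),
])

# Usage: augmented = augment(image=image, mask=mask)
```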
| Method | Dice (%) | Jaccard (%) | Precision (%) | Recall (%) | Params (M) | FLOPs (G) | Inference Time (ms) |
|---|---|---|---|---|---|---|---|
| ResUNet++ | 90.48 | 82.62 | 91.46 | 89.52 | 29.7 | 115.88 | 16 |
| Attention U-Net | 88.52 | 79.40 | 87.55 | 89.51 | 34.0 | 48.5 | 22 |
| Polyp-PVT | 84.60 | 77.60 | 85.50 | 89.50 | 37.0 | 41.58 | 19 |
| SANet | 90.80 | 85.40 | 89.30 | 91.17 | 27.7 | 44.40 | 21 |
| µ-Net (Proposed) | 94.02 | 88.72 | 94.75 | 93.31 | 39.18 | 39.04 | 15 |