EGSDK-Net: Edge-Guided Stepwise Dual Kernel Update Network for Panoptic Segmentation
Abstract
1. Introduction
- (1) We propose EGSDK-Net, a novel architecture for panoptic segmentation. To the best of our knowledge, it is the first to free panoptic segmentation from the constraints of separate semantic and instance segmentation, and it offers valuable structural contributions to the field.
- (2) We design a real-time edge guidance module (RTEGM). This module introduces a novel perspective on the relationship between edge detection and panoptic segmentation, and its lightweight structure enhances segmentation performance.
- (3) We propose a stepwise dual kernel update module (SDKUM). Addressing the limited use of intermediate information in previous methods, SDKUM exploits this information more effectively through a targeted design and thereby improves model capability.
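Section 3.3 details SDKUM; as context, EGSDK-Net builds on the K-Net family [2,5], whose core operation updates a set of mask kernels from mask-weighted group features. The sketch below is a minimal NumPy illustration of that generic kernel-update step, not the paper's SDKUM: the gated update rule, the shapes, and all names here are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def kernel_update_step(kernels, features):
    """One generic K-Net-style kernel update step (illustrative).

    kernels:  (N, C)    -- N mask kernels with C channels each
    features: (C, H, W) -- backbone feature map
    Returns the updated kernels and the predicted mask logits.
    """
    C, H, W = features.shape
    flat = features.reshape(C, H * W)             # (C, HW)
    mask_logits = kernels @ flat                  # (N, HW): each kernel scores every pixel
    masks = sigmoid(mask_logits)                  # soft assignment of pixels to kernels
    # Group feature: mask-weighted average of pixel features per kernel
    group = (masks @ flat.T) / (masks.sum(axis=1, keepdims=True) + 1e-6)  # (N, C)
    # Gated update -- a stand-in for the learned update used in K-Net
    gate = sigmoid(kernels * group)
    new_kernels = gate * group + (1.0 - gate) * kernels
    return new_kernels, mask_logits.reshape(-1, H, W)

rng = np.random.default_rng(0)
k = rng.standard_normal((4, 8))       # 4 toy kernels, 8 channels
f = rng.standard_normal((8, 16, 16))  # toy feature map
k2, logits = kernel_update_step(k, f)
print(k2.shape, logits.shape)         # (4, 8) (4, 16, 16)
```

In the K-Net family this step is iterated several times, with each round refining both the kernels and the masks they predict; a "stepwise" variant would interleave additional updates between rounds.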
2. Related Work
3. Method
3.1. Overall Structure
3.2. Real-Time Edge Guidance Module
3.3. Stepwise Dual Kernel Update Module
4. Experiments
4.1. The Dataset and Metrics
4.2. Implementation Details
4.3. Comparison Experiment
4.4. Ablation Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Kirillov, A.; He, K.; Girshick, R.; Rother, C.; Dollár, P. Panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA, 15–20 June 2019; pp. 9404–9413.
- Zhang, W.; Pang, J.; Chen, K.; Loy, C.C. K-net: Towards unified image segmentation. Adv. Neural Inf. Process. Syst. 2021, 34, 10326–10338.
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1290–1299.
- Yu, Q.; Wang, H.; Qiao, S.; Collins, M.; Zhu, Y.; Adam, H.; Yuille, A.; Chen, L.C. k-means Mask Transformer. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 288–307.
- Schön, M.; Buchholz, M.; Dietmayer, K. Rt-k-net: Revisiting k-net for real-time panoptic segmentation. In Proceedings of the 2023 IEEE Intelligent Vehicles Symposium (IV), Anchorage, AK, USA, 4–7 June 2023; pp. 1–7.
- Hu, J.; Huang, L.; Ren, T.; Zhang, S.; Ji, R.; Cao, L. You only segment once: Towards real-time panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023, Vancouver, BC, Canada, 17–24 June 2023; pp. 17819–17829.
- Šarić, J.; Oršić, M.; Šegvić, S. Panoptic SwiftNet: Pyramidal Fusion for Real-Time Panoptic Segmentation. Remote Sens. 2023, 15, 1968.
- Wang, F.; Wang, Z.; Chen, Z.; Zhu, D.; Gong, X.; Cong, W. An edge-guided deep learning solar panel hotspot thermal image segmentation algorithm. Appl. Sci. 2023, 13, 11031.
- Jin, J.; Zhou, W.; Yang, R.; Ye, L.; Yu, L. Edge detection guide network for semantic segmentation of remote-sensing images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5.
- Schön, M.; Buchholz, M.; Dietmayer, K. Mgnet: Monocular geometric scene understanding for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada, 11–17 October 2021; pp. 15804–15815.
- Chen, L.C.; Wang, H.; Qiao, S. Scaling wide residual networks for panoptic segmentation. arXiv 2020, arXiv:2011.11675.
- Petrovai, A.; Nedevschi, S. Real-time panoptic segmentation with prototype masks for automated driving. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 19 October–13 November 2020; pp. 1400–1406.
- Hou, R.; Li, J.; Bhargava, A.; Raventos, A.; Guizilini, V.; Fang, C.; Lynch, J.; Gaidon, A. Real-time panoptic segmentation from dense detections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020, Virtual, 14–19 June 2020; pp. 8523–8532.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
- Mohan, R.; Valada, A. Efficientps: Efficient panoptic segmentation. Int. J. Comput. Vis. 2021, 129, 1551–1579.
- Porzi, L.; Bulo, S.R.; Colovic, A.; Kontschieder, P. Seamless scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA, 15–20 June 2019; pp. 8277–8286.
- Gao, N.; Shan, Y.; Wang, Y.; Zhao, X.; Yu, Y.; Yang, M.; Huang, K. Ssap: Single-shot instance segmentation with affinity pyramid. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2019, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 642–651.
- Cheng, B.; Collins, M.D.; Zhu, Y.; Liu, T.; Huang, T.S.; Adam, H.; Chen, L.C. Panoptic-deeplab: A simple, strong, and fast baseline for bottom-up panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020, Seattle, WA, USA, 13–19 June 2020; pp. 12475–12485.
- Yu, Q.; Wang, H.; Kim, D.; Qiao, S.; Collins, M.; Zhu, Y.; Adam, H.; Yuille, A.; Chen, L.C. Cmt-deeplab: Clustering mask transformers for panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 2560–2570.
- Zhou, X.; Shen, K.; Weng, L.; Cong, R.; Zheng, B.; Zhang, J.; Yan, C. Edge-Guided Recurrent Positioning Network for Salient Object Detection in Optical Remote Sensing Images. IEEE Trans. Cybern. 2023, 53, 539–552.
- Zheng, X.; Wang, B.; Ai, L.; Tang, P.; Liu, D. EDGE-Net: An edge-guided enhanced network for RGB-T salient object detection. J. Electron. Imaging 2023, 32, 063032.
- Fang, F.; Li, J.; Yuan, Y.; Zeng, T.; Zhang, G. Multilevel Edge Features Guided Network for Image Denoising. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 3956–3970.
- Wang, D.; Xie, C.; Liu, S.; Niu, Z.; Zuo, W. Image inpainting with edge-guided learnable bidirectional attention maps. arXiv 2021, arXiv:2104.12087.
- Lin, H.; Pagnucco, M.; Song, Y. Edge guided progressively generative image outpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021, Nashville, TN, USA, 20–25 June 2021; pp. 806–815.
- Dai, Q.; Fang, F.; Li, J.; Zhang, G.; Zhou, A. Edge-guided composition network for image stitching. Pattern Recognit. 2021, 118, 108019.
- Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, 8, 679–698.
- Wang, J.; Gou, C.; Wu, Q.; Feng, H.; Han, J.; Ding, E.; Wang, J. RTFormer: Efficient design for real-time semantic segmentation with transformer. Adv. Neural Inf. Process. Syst. 2022, 35, 7423–7436.
- Hong, Y.; Pan, H.; Sun, W.; Jia, Y. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes. arXiv 2021, arXiv:2101.06085.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV) 2018, Munich, Germany, 8–14 September 2018; pp. 3–19.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–26 June 2009; pp. 248–255.
Method | Backbone | PQ | PQ_Th | PQ_St | GPU
---|---|---|---|---|---
MGNet [10] | ResNet-18 | 55.7 | 45.3 | 63.1 | Titan RTX
K-Net [2] | ResNet50-FPN | 56.9 | 46.0 | 64.8 | -
Petrovai and Nedevschi [12] | VoVNet2-39 | 57.3 | 50.4 | 62.4 | V100
Panoptic-DeepLab [11] | SWideRNet-(0.25,0.25,0.75) | 58.4 | - | - | V100
Hou et al. [13] | ResNet50-FPN | 58.8 | 52.1 | 63.7 | V100
Panoptic SwiftNet [7] | ResNet-18 | 55.9 | - | - | RTX 3090
YOSO [6] | ResNet50 | 59.7 | 51.0 | 66.1 | V100
RT-K-Net [5] | RTFormer | 59.3 | 48.9 | 66.7 | V100
Ours | RTFormer | 60.6 | 51.9 | 66.8 | V100
Method | PQ | PQ_Th | PQ_St
---|---|---|---
Baseline | 59.3 | 48.9 | 66.7 |
Baseline w/SDKUM | 60.0 | 50.0 | 67.3 |
Ours | 60.6 | 51.9 | 66.8 |
Method | PQ | PQ_Th | PQ_St
---|---|---|---
Convs | 59.3 | 48.9 | 66.7 |
only MaxPool | 59.6 | 50.0 | 66.7 |
only AvgPool | 59.4 | 49.3 | 66.7 |
Ours | 60.6 | 51.9 | 66.8 |
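The ablation above indicates that the module benefits from combining max and average pooling rather than using either alone ("only MaxPool" and "only AvgPool" both trail "Ours"). A common way to combine the two pooled descriptors is the CBAM channel-attention pattern [55], sketched below in NumPy. This is a hedged illustration of the dual-pooling idea, not the paper's exact module; the weights `w1`/`w2` and the reduction ratio are placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_pool_attention(x, w1, w2):
    """CBAM-style channel attention from both pooled descriptors (illustrative).

    x:  (C, H, W) feature map
    w1: (C//r, C), w2: (C, C//r) -- a shared two-layer MLP with reduction r
    """
    avg = x.mean(axis=(1, 2))                         # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))                           # (C,) max-pooled descriptor
    shared = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # shared MLP with ReLU
    att = sigmoid(shared(avg) + shared(mx))           # fuse both descriptors
    return x * att[:, None, None]                     # reweight channels

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8)) * 0.1   # reduction ratio r = 4
w2 = rng.standard_normal((8, 2)) * 0.1
y = dual_pool_attention(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

Average pooling summarizes the overall response of a channel while max pooling keeps its strongest activation; feeding both through a shared MLP lets the attention weights reflect both statistics, which is consistent with the gap the ablation shows over either pooling alone.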
Method | PQ | PQ_Th | PQ_St
---|---|---|---
Baseline w/RTEGM | 59.7 | 49.4 | 67.1 |
Using Addition for Fusion | 59.5 | 50.0 | 66.7 |
Using Multiplication for Fusion | 59.7 | 50.3 | 66.5 |
Ours | 60.6 | 51.9 | 66.8 |
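The last ablation compares ways of fusing the edge features into the main branch, with plain addition and plain multiplication both below the paper's chosen fusion. The NumPy sketch below shows the two ablated variants plus concatenation followed by a 1x1 convolution, a common third option; which fusion "Ours" actually uses is specified in Section 3.2, so variant (c) here is only an assumed stand-in, and all shapes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
feat = rng.standard_normal((C, H, W))  # main-branch feature map
edge = rng.standard_normal((C, H, W))  # edge feature map, projected to C channels

# (a) additive fusion: edge features shift the main features
add_fused = feat + edge

# (b) multiplicative fusion: edge features act as a gate on the main features
mul_fused = feat * (1.0 + edge)

# (c) concatenation + 1x1 convolution, written as a per-pixel matrix multiply
w = rng.standard_normal((C, 2 * C)) * 0.1      # 1x1 conv weights (placeholder)
cat = np.concatenate([feat, edge], axis=0)     # (2C, H, W)
cat_fused = (w @ cat.reshape(2 * C, H * W)).reshape(C, H, W)

print(add_fused.shape, mul_fused.shape, cat_fused.shape)
```

Addition and multiplication force a fixed, parameter-free interaction between the two streams, whereas a learned fusion such as (c) lets the network weight edge evidence per channel, which may explain the PQ gap in the table.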
Share and Cite
Mu, P.; Zhao, H.; Ma, K. EGSDK-Net: Edge-Guided Stepwise Dual Kernel Update Network for Panoptic Segmentation. Algorithms 2025, 18, 71. https://doi.org/10.3390/a18020071