EfficientSegNet: Lightweight Semantic Segmentation with Multi-Scale Feature Fusion and Boundary Enhancement
Abstract
1. Introduction
- (1) Cascade Attention Dense Field (CADF) Module: the CADF module captures deep semantic information through progressively enlarged multi-scale receptive fields and shares information across scales, maintaining high-resolution representations and producing finer segmentation outputs (a minimal sketch follows this list).
- (2) Dynamic Weighting Fusion (DWF) Module: the DWF module supervises the fusion of deep and shallow features by dynamically weighting their contributions, compensating for resolution-induced information loss and sharpening object boundaries (see the second sketch after this list).
- (3) Lightweight Architecture: EfficientSegNet adopts a resource-efficient design that substantially reduces computational cost, enabling real-time inference on mobile and other resource-constrained devices without sacrificing segmentation accuracy.
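To make contribution (1) concrete, here is a minimal PyTorch sketch of a cascade of dilated convolutions with dense connections and a channel-attention gate. It is an illustration under assumptions — the channel sizes, the SE-style gate, and the class name are ours, and the rate set (3, 6, 12, 18) is taken from the ablation in Section 5.1 — not the authors' exact CADF implementation:

```python
import torch
import torch.nn as nn

class CADFSketch(nn.Module):
    """Illustrative cascade of dilated depthwise-separable convolutions with
    dense (concatenative) connections and an SE-style channel-attention gate.
    Names, channel sizes, and the gate design are assumptions; the paper's
    CADF module may differ in detail."""
    def __init__(self, in_ch=256, growth=64, rates=(3, 6, 12, 18)):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for r in rates:
            self.stages.append(nn.Sequential(
                # Dilated depthwise conv: enlarges the receptive field cheaply.
                nn.Conv2d(ch, ch, 3, padding=r, dilation=r, groups=ch, bias=False),
                nn.Conv2d(ch, growth, 1, bias=False),  # pointwise projection
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True),
            ))
            ch += growth  # dense connection: next stage sees all earlier outputs
        # Channel attention over the concatenated multi-scale features.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // 8, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // 8, ch, 1), nn.Sigmoid(),
        )
        self.project = nn.Conv2d(ch, in_ch, 1, bias=False)

    def forward(self, x):
        feats = [x]
        for stage in self.stages:
            feats.append(stage(torch.cat(feats, dim=1)))
        fused = torch.cat(feats, dim=1)
        return self.project(fused * self.attn(fused))
```

The dense connections let each later stage reuse all earlier scales, which is one plausible reading of "information sharing across scales is facilitated".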
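Similarly, contribution (2) can be pictured as a gated, per-pixel convex combination of upsampled deep features and shallow features. The gating branch below is our assumption — a sketch of the idea rather than the published DWF module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DWFSketch(nn.Module):
    """Illustrative dynamic-weighting fusion: a small gating branch predicts a
    per-pixel weight balancing upsampled deep features against shallow,
    high-resolution features. Gate design and names are assumptions."""
    def __init__(self, deep_ch, shallow_ch, out_ch):
        super().__init__()
        self.deep_proj = nn.Conv2d(deep_ch, out_ch, 1, bias=False)
        self.shallow_proj = nn.Conv2d(shallow_ch, out_ch, 1, bias=False)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.Sigmoid(),  # w in (0, 1): how much to trust the deep branch
        )

    def forward(self, deep, shallow):
        deep = F.interpolate(self.deep_proj(deep), size=shallow.shape[-2:],
                             mode="bilinear", align_corners=False)
        shallow = self.shallow_proj(shallow)
        w = self.gate(torch.cat([deep, shallow], dim=1))
        # Convex combination: boundary pixels can lean on shallow detail.
        return w * deep + (1 - w) * shallow
```

A weighted sum keeps the channel count fixed, whereas plain concatenation doubles it; this distinction matters for the parameter counts reported in Section 5.2.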
2. Related Work
3. Methodology
3.1. Network Architecture
3.2. Cascade Attention Dense Field
3.3. Dynamic Weighting Feature Fusion
4. Experiments
4.1. Experimental Equipment
4.2. Experimental Environment
4.3. Datasets
4.4. Evaluation Metrics
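The results in Section 5 report two standard metrics. Assuming the usual definitions (with $k+1$ classes and $p_{ij}$ the number of pixels of class $i$ predicted as class $j$), mean intersection over union (mIoU) and mean pixel accuracy (MPA) are:

$$\mathrm{mIoU} = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}, \qquad \mathrm{MPA} = \frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k} p_{ij}}$$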
5. Results
5.1. Ablation Experiment
5.2. Module Ablation Experiment
5.3. Comparison Experiment
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Elhassan, M.A.M.; Zhou, C.; Khan, A.; Benabid, A.; Adam, A.B.M.; Mehmood, A.; Wambugu, N. Real-time semantic segmentation for autonomous driving: A review of CNNs, Transformers, and beyond. J. King Saud Univ. Comput. Inf. Sci. 2024, 36, 102226.
- Lyu, J.; Meng, J.; Zhang, Y.; Ling, S.H. MUF-Net: A Novel Self-Attention Based Dual-Task Learning Approach for Automatic Left Ventricle Segmentation in Echocardiography. Sensors 2025, 25, 2704.
- Zeng, F.; Wang, R.; Jiang, Y.; Liu, Z.; Ding, Y.; Dong, W.; Xu, C.; Zhang, D.; Wang, J. Growth monitoring of rapeseed seedlings in multiple growth stages based on low-altitude remote sensing and semantic segmentation. Comput. Electron. Agric. 2025, 232, 110135.
- Zhang, P.; Zhang, S.; Wang, J.; Sun, X. Identifying rice lodging based on semantic segmentation architecture optimization with UAV remote sensing imaging. Comput. Electron. Agric. 2024, 227, 109570.
- Acharya, D.; Tennakoon, R.; Muthu, S.; Khoshelham, K.; Hoseinnezhad, R.; Bab-Hadiashar, A. Single-image localisation using 3D models: Combining hierarchical edge maps and semantic segmentation for domain adaptation. Autom. Constr. 2022, 136, 104152.
- Feng, H.; Liu, W.; Xu, H.; He, J. A lightweight dual-branch semantic segmentation network for enhanced obstacle detection in ship navigation. Eng. Appl. Artif. Intell. 2024, 136, 108982.
- Kos, A.; Majek, K.; Belter, D. SegTrackDetect: A window-based framework for tiny object detection via semantic segmentation and tracking. SoftwareX 2025, 30, 102110.
- Zhang, L.; Wei, Z.; Xiao, Z.; Ji, A.; Wu, B. Dual hierarchical attention-enhanced transfer learning for semantic segmentation of point clouds in building scene understanding. Autom. Constr. 2024, 168, 105799.
- Zhang, H.; Li, L.; Xie, X.; He, Y.; Ren, J.; Xie, G. Entropy guidance hierarchical rich-scale feature network for remote sensing image semantic segmentation of high resolution. Appl. Intell. 2025, 55, 528.
- Wang, X.; Zhang, Y.; Cai, J.; Qin, Q.; Feng, Y.; Yan, J. Semantic segmentation network for mangrove tree species based on UAV remote sensing images. Sci. Rep. 2024, 14, 29860.
- Li, Z.R.; Yan, L.; Leng, Z.; Liu, X.; Zhaodong, F.; Zhou, Q. Enhancing precision in automated segmentation of ambiguous boundaries based on multi-model deep learning and boundary confidence computation. Int. J. Radiat. Oncol. 2024, 120, 640.
- Yang, M.; Yu, K.; Zhang, C.; Li, Z.; Yang, K. DenseASPP for Semantic Segmentation in Street Scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3684–3692.
- Zhao, S.; Wu, X.; Tian, K.; Yuan, Y. A Transformer-Based Hierarchical Hybrid Encoder Network for Semantic Segmentation. Neural Process. Lett. 2025, 57, 66.
- Zhang, Z.; Li, G. UAV Imagery Real-Time Semantic Segmentation with Global–Local Information Attention. Sensors 2025, 25, 1786.
- Lian, X.; Kang, M.; Tan, L.; Sun, X.; Wang, Y. LSSMask: A lightweight semantic segmentation network for dynamic object. Signal Image Video Process. 2025, 19, 216.
- Mo, X.; Feng, Y.; Liu, Y. Deep semantic segmentation for drivable area detection on unstructured roads. Comput. Vis. Image Underst. 2025, 259, 104420.
- Li, T.; Yang, X.; Zhang, Z.; Cui, Z.; Maoxia, Z. Mix-layers semantic extraction and multi-scale aggregation transformer for semantic segmentation. Complex Intell. Syst. 2024, 11, 36.
- Matić, T.; Aleksi, I.; Hocenski, Z.; Kraus, D. Real-time biscuit tile image segmentation method based on edge detection. ISA Trans. 2018, 76, 246–254.
- Li, J.; Tang, W.; Wang, J.; Zhang, X. Multilevel thresholding selection based on variational mode decomposition for image segmentation. Signal Process. 2018, 147, 80–91.
- Cheng, Z.; Wang, J. Improved region growing method for image segmentation of three-phase materials. Powder Technol. 2020, 368, 80–89.
- Azam, S.; Rony, M.A.H.; Raiaan, M.A.K.; Fatema, K.; Karim, A.; Jonkman, M.; Beissbarth, J.; Leach, A.; De Boer, F. Reimagining otitis media diagnosis: A fusion of nested U-Net segmentation with graph theory-inspired feature set. Array 2024, 23, 100362.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 234–241.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
- Wang, J.; Sun, K.; Chen, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution Representation Learning for Visual Recognition. arXiv 2020, arXiv:1904.04514.
- Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv 2016, arXiv:1412.7062.
- Chen, L.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv 2017, arXiv:1606.00915.
- Chen, L.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2018, arXiv:1706.05587.
- Chen, L.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv 2018, arXiv:1802.02611.
- Tang, H.; Li, Z.; Zhang, D.; He, S.; Tang, J. Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 47, 1958–1974.
- Khan, A.; Khan, S.; Elhassan, M.A.M.; Deng, H. Advancing Road Safety: A Lightweight Feature Fusion Model for Robust Road Crack Segmentation. Adv. Eng. Inform. 2025, 65, 103262.
- Wei, X.; Li, J.; Zhang, X.; Gu, H.; Van de Weghe, N.; Huang, H. An Innovative Framework Combining a CNN-Transformer Multiscale Fusion Network and Spatial Analysis for Cycleway Extraction Using Street View Images. Sustain. Cities Soc. 2025, 126, 106384.
- Song, M.; Wang, Z.; Chen, Y.; Li, Y.; Jin, Y.; Jia, B. A lightweight multi scale fusion network for IGBT ultrasonic tomography image segmentation. Sci. Rep. 2025, 15, 888.
- Ma, T.; Wang, K.; Hu, F. LMU-Net: Lightweight U-shaped network for medical image segmentation. Med. Biol. Eng. Comput. 2024, 62, 61–70.
- Lu, S.; Chen, Y.; Chen, Y.; Li, P.; Sun, J.; Zheng, C.; Zou, Y.; Liang, B.; Li, M.; Jin, Q.; et al. General lightweight framework for vision foundation model supporting multi-task and multi-center medical image analysis. Nat. Commun. 2025, 16, 2097.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. arXiv 2019, arXiv:1905.02244.
- Ma, N.; Zhang, X.; Zheng, H.; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Volume 11218, pp. 122–138.
- Singha, T.; Pham, D.; Krishna, A. A real-time semantic segmentation model using iteratively shared features in multiple sub-encoders. Pattern Recognit. 2023, 140, 109557.
- Tsai, T.; Tseng, Y. BiSeNet V3: Bilateral segmentation network with coordinate attention for real-time semantic segmentation. Neurocomputing 2023, 532, 33–42.
- Dai, Y.; Wang, Y.; Meng, L.; Ren, J. A Multiple Resolution Branch Attention Neural Network for Scene Understanding of Intelligent Autonomous Platform. Expert Syst. Appl. 2025, 267, 126253.
- Dai, Y.; Meng, L.; Sun, F.; Wang, S. Lightweight Multi-Scale Feature Dense Cascade Neural Network for Scene Understanding of Intelligent Autonomous Platform. Expert Syst. Appl. 2025, 259, 125354.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with Transformers. arXiv 2021, arXiv:2105.15203.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 1290–1299.
- Zhang, L.; Huang, W.; Fan, B. SARFormer: Segmenting Anything Guided Transformer for semantic segmentation. Neurocomputing 2025, 635, 129915.
- Liu, W.; Zhang, A. Plant disease detection algorithm based on efficient Swin Transformer. Comput. Mater. Contin. 2025, 82, 3045–3068.
- Zhu, P.; Liu, J. Joint U-Nets with hierarchical graph structure and sparse Transformer for hyperspectral image classification. Expert Syst. Appl. 2025, 275, 127046.
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Mottaghi, R.; Chen, X.; Liu, X.; Cho, N.; Lee, S.W.; Fidler, S.; Urtasun, R.; Yuille, A.L. The role of context for object detection and semantic segmentation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 891–898.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
Table: Ablation over dilation-rate combinations in the CADF module.

| Metric | Dataset | (6, 12, 18) | (3, 6, 12, 18) | (6, 12, 18, 24) | (3, 6, 12, 18, 24) |
|---|---|---|---|---|---|
| Parameters (M) | – | 5.05 | 5.75 | 5.75 | 6.48 |
| FLOPs (G) | – | 76.17 | 88.85 | 88.85 | 101.67 |
| mIoU (%) | PASCAL VOC | 76.81 | 78.60 | 78.43 | 78.79 |
| mIoU (%) | PASCAL Context | 67.14 | 69.93 | 69.06 | 70.26 |
| mIoU (%) | Cityscapes | 66.84 | 68.60 | 68.33 | 68.85 |
| MPA (%) | PASCAL VOC | 87.50 | 88.30 | 88.28 | 89.05 |
| MPA (%) | PASCAL Context | 78.57 | 80.34 | 80.45 | 80.65 |
| MPA (%) | Cityscapes | 77.72 | 78.71 | 78.50 | 79.06 |
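As a reading aid for the rate choices above: by the standard formula, a dilated $k \times k$ convolution with rate $d$ has effective kernel size

$$k_{\mathrm{eff}} = k + (k-1)(d-1),$$

so with $k=3$ the rates 3, 6, 12, 18, and 24 give effective kernels of 7, 13, 25, 37, and 49 pixels. Since parameters and FLOPs depend on $k$ and the channel counts but not on $d$, each extra branch adds a roughly constant cost (here about 0.7 M parameters and 13 G FLOPs) while mainly enlarging context, which matches the pattern in the table.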
Table: Ablation over convolution type and attention in the CADF module (D-Conv vs. DDW Conv, with and without attention).

| Metric | Dataset | D-Conv | DDW Conv | D-Conv + Attention | DDW Conv + Attention |
|---|---|---|---|---|---|
| Parameters (M) | – | 5.75 | 4.94 | 5.87 | 5.07 |
| mIoU (%) | PASCAL VOC | 78.60 | 78.82 | 80.51 | 80.73 |
| mIoU (%) | PASCAL Context | 69.93 | 69.85 | 71.77 | 71.69 |
| mIoU (%) | Cityscapes | 68.60 | 68.68 | 70.30 | 70.38 |
| MPA (%) | PASCAL VOC | 88.30 | 89.07 | 89.68 | 90.45 |
| MPA (%) | PASCAL Context | 80.34 | 80.77 | 81.32 | 81.75 |
| MPA (%) | Cityscapes | 78.71 | 79.12 | 80.38 | 80.79 |
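The parameter gap between the D-Conv and DDW Conv columns is consistent with the standard factorization argument, assuming DDW Conv denotes a dilated depthwise-separable convolution: a standard $k \times k$ convolution from $C_{in}$ to $C_{out}$ channels costs $k^{2} C_{in} C_{out}$ weights, while its depthwise-separable counterpart costs $k^{2} C_{in} + C_{in} C_{out}$, for a reduction factor of

$$\frac{k^{2} C_{in} + C_{in} C_{out}}{k^{2} C_{in} C_{out}} = \frac{1}{C_{out}} + \frac{1}{k^{2}},$$

i.e., roughly a ninefold reduction per $3 \times 3$ layer when $C_{out}$ is large.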
Table: Ablation over the transition layer between deep and shallow features.

| Metric | Dataset | Without Transition Layer | Concatenation Transition Layer | DWF |
|---|---|---|---|---|
| Parameters (M) | – | 4.84 | 6.20 | 3.07 |
| mIoU (%) | PASCAL VOC | 76.22 | 77.24 | 77.74 |
| mIoU (%) | PASCAL Context | 67.63 | 69.20 | 69.50 |
| mIoU (%) | Cityscapes | 66.56 | 68.32 | 67.97 |
| MPA (%) | PASCAL VOC | 89.34 | 90.26 | 89.99 |
| MPA (%) | PASCAL Context | 78.52 | 78.50 | 79.33 |
| MPA (%) | Cityscapes | 77.14 | 76.96 | 77.71 |
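One plausible reading of the parameter column (our inference, not a claim stated in the table): concatenation doubles the channel count feeding the transition convolution, so a $3 \times 3$ transition over two $C$-channel branches costs about $3^{2} \cdot (2C) \cdot C$ weights, whereas a dynamically weighted sum keeps $C$ channels and needs only a small gating branch:

$$\underbrace{3^{2} \cdot (2C) \cdot C}_{\text{concatenation transition}} \;\gg\; \underbrace{3^{2} \cdot C \cdot C + \text{gate}}_{\text{weighted sum (DWF)}}$$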
Table: Module ablation of EfficientSegNet components.

| Metric | Dataset | Baseline | Baseline + V3 | Baseline + V3 + CADF | Baseline + V3 + DWF | Baseline + V3 + CADF + DWF |
|---|---|---|---|---|---|---|
| Parameters (M) | – | 54.71 | 4.84 | 5.07 | 3.07 | 2.92 |
| FLOPs (G) | – | 167.00 | 48.48 | 87.30 | 5.72 | 5.23 |
| mIoU (%) | PASCAL VOC | 76.95 | 76.22 | 80.73 | 77.74 | 81.13 |
| mIoU (%) | PASCAL Context | 70.71 | 67.63 | 71.69 | 69.50 | 74.90 |
| mIoU (%) | Cityscapes | 68.09 | 66.56 | 70.38 | 67.97 | 71.39 |
| MPA (%) | PASCAL VOC | 79.95 | 89.34 | 90.45 | 89.99 | 90.46 |
| MPA (%) | PASCAL Context | 80.10 | 78.52 | 81.75 | 79.33 | 90.20 |
| MPA (%) | Cityscapes | 76.96 | 77.14 | 80.79 | 77.71 | 82.04 |
Table: Comparison with representative segmentation models (mIoU per dataset, parameter count, and average FPS on a desktop GPU and a Jetson Nano).

| Model | mIoU, PASCAL VOC (%) | mIoU, Cityscapes (%) | mIoU, PASCAL Context (%) | Parameters (M) | Average FPS (GPU) | Average FPS (Jetson Nano) |
|---|---|---|---|---|---|---|
| U-Net | 67.53 | 60.60 | 56.59 | 43.93 | 23.33 | 5.55 |
| PSPNet | 82.60 | 78.60 | 76.57 | 46.71 | 31.68 | 4.86 |
| DeepLab V3+ | 76.95 | 68.09 | 70.71 | 54.71 | 11.30 | 2.58 |
| Mask2Former | 74.10 | 62.10 | 59.70 | 43.96 | 33.91 | 1.20 |
| BiSeNet V3 | 80.74 | 72.20 | 69.72 | 39.37 | 56.49 | 10.51 |
| SFRSeg | 79.30 | 70.60 | 61.40 | 1.60 | 194.00 | 41.10 |
| EfficientSegNet | 81.13 | 71.39 | 74.90 | 2.92 | 76.56 | 31.88 |
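For context on the Average FPS columns, a typical measurement loop looks like the sketch below (PyTorch; the warm-up count, iteration count, and input resolution are our assumptions, since the paper's exact protocol is not reproduced here). The same loop would be run on the GPU workstation and on the Jetson Nano:

```python
import time
import torch

@torch.no_grad()
def average_fps(model, input_size=(1, 3, 512, 512), warmup=10, iters=100,
                device="cuda"):
    """Rough average-FPS benchmark; protocol details are assumptions."""
    model = model.eval().to(device)
    x = torch.randn(*input_size, device=device)
    for _ in range(warmup):              # warm-up stabilizes clocks and caches
        model(x)
    if device.startswith("cuda"):
        torch.cuda.synchronize()         # make sure queued kernels finish
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```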