LGD-DeepLabV3+: An Enhanced Framework for Remote Sensing Semantic Segmentation via Multi-Level Feature Fusion and Global Modeling
Abstract
1. Introduction
2. Methods
2.1. Overall Framework of LGD-DeepLabV3+
2.2. LISModule
2.3. GCRModule
2.4. DPFModule
2.5. Loss Function and Training Objective
2.6. Experimental Setup
3. Experiment
3.1. Dataset and Evaluation Protocol
3.1.1. LoveDA Dataset
3.1.2. ISPRS Potsdam Dataset
3.1.3. Evaluation Setup and Metrics
3.2. Comprehensive Comparison Experiments on the LoveDA Dataset
3.3. Comprehensive Comparison Experiments on the ISPRS Potsdam Dataset
3.4. Ablation Study and Complexity Trade-Off
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Li, J.; Cai, Y.; Li, Q.; Kou, M.; Zhang, T. A review of remote sensing image segmentation by deep learning methods. Int. J. Digit. Earth 2024, 17, 2328827.
- Chen, B.; Tong, A.; Wang, Y.; Zhang, J.; Yang, X.; Im, S.-K. LKAFFNet: A Novel Large-Kernel Attention Feature Fusion Network for Land Cover Segmentation. Sensors 2024, 25, 54.
- Song, X.; Chen, M.; Rao, J.; Luo, Y.; Lin, Z.; Zhang, X.; Li, S.; Hu, X. MFPI-Net: A Multi-Scale Feature Perception and Interaction Network for Semantic Segmentation of Urban Remote Sensing Images. Sensors 2025, 25, 4660.
- Yang, N.; Tian, C.; Gu, X.; Zhang, Y.; Li, X.; Zhang, F. RST-Net: A Semantic Segmentation Network for Remote Sensing Images Based on a Dual-Branch Encoder Structure. Sensors 2025, 25, 5531.
- Wang, L.; Li, R.; Zhang, C.; Fang, S.; Duan, C.; Meng, X.; Atkinson, P.M. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS J. Photogramm. Remote Sens. 2022, 190, 196–214.
- Liu, W.; Lin, Y.; Liu, W.; Yu, Y.; Li, J. An attention-based multiscale transformer network for remote sensing image change detection. ISPRS J. Photogramm. Remote Sens. 2023, 202, 599–609.
- Liu, X.; Gao, P.; Yu, T.; Wang, F.; Yuan, R.-Y. CSWin-UNet: Transformer UNet with cross-shaped windows for medical image segmentation. Inf. Fusion 2025, 113, 102634.
- Fan, J.; Shi, Z.; Ren, Z.; Zhou, Y.; Ji, M. DDPM-SegFormer: Highly refined feature land use and land cover segmentation with a fused denoising diffusion probabilistic model and transformer. Int. J. Appl. Earth Obs. Geoinf. 2024, 133, 104093.
- Chen, X.; Li, D.; Liu, M.; Jia, J. CNN and transformer fusion for remote sensing image semantic segmentation. Remote Sens. 2023, 15, 4455.
- He, Y.; Li, C.; Li, X.; Bai, T. A Lightweight CNN Based on Axial Depthwise Convolution and Hybrid Attention for Remote Sensing Image Dehazing. Remote Sens. 2024, 16, 2822.
- Ma, X.; Zhang, X.; Ding, X.; Pun, M.-O.; Ma, S. Decomposition-based unsupervised domain adaptation for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5645118.
- Xu, Q.; Zhang, R.; Fan, Z.; Wang, Y.; Wu, Y.-Y.; Zhang, Y.J. Fourier-based augmentation with applications to domain generalization. Pattern Recognit. 2023, 139, 109474.
- Tang, Q.; Zhang, B.; Liu, J.; Liu, F.; Liu, Y. Dynamic token pruning in plain vision transformers for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 777–786.
- Qiu, J.; Chang, W.; Ren, W.; Hou, S.; Yang, R. MMFNet: A Mamba-Based Multimodal Fusion Network for Remote Sensing Image Semantic Segmentation. Sensors 2025, 25, 6225.
- Xiao, X.; Zhao, Y.; Zhang, F.; Luo, B.; Yu, L.; Chen, B.; Yang, C. BASeg: Boundary aware semantic segmentation for autonomous driving. Neural Netw. 2023, 157, 460–470.
- Li, M.; Long, J.; Stein, A.; Wang, X. Using a semantic edge-aware multi-task neural network to delineate agricultural parcels from remote sensing images. ISPRS J. Photogramm. Remote Sens. 2023, 200, 24–40.
- Qu, S.; Wang, Z.; Wu, J.; Feng, Y. FBRNet: A feature fusion and border refinement network for real-time semantic segmentation. Pattern Anal. Appl. 2024, 27, 2.
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Chen, Y. Semantic image segmentation with feature fusion based on Laplacian pyramid. Neural Process. Lett. 2022, 54, 4153–4170.
- Tong, L.; Li, W.; Yang, Q.; Chen, L.; Chen, P. Vision Transformer with Key-Select Routing Attention for Single Image Dehazing. IEICE Trans. Inf. Syst. 2024, 107, 1472–1475.
- Cheng, H.; Wu, H.; Zheng, J.; Qi, K.; Liu, W. A hierarchical self-attention augmented Laplacian pyramid expanding network for change detection in high-resolution remote sensing images. ISPRS J. Photogramm. Remote Sens. 2021, 182, 52–66.
- Yin, X.; Yu, Z.; Fei, Z.; Lv, W.; Gao, X. PE-YOLO: Pyramid enhancement network for dark object detection. In Proceedings of the International Conference on Artificial Neural Networks, Heraklion, Greece, 26–29 September 2023; pp. 163–174.
- Bodner, A.D.; Tepsich, A.S.; Spolski, J.N.; Pourteau, S. Convolutional Kolmogorov-Arnold networks. arXiv 2024, arXiv:2406.13155.
- Zhu, L.; Wang, X.; Ke, Z.; Zhang, W.; Lau, R.W. BiFormer: Vision transformer with bi-level routing attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 10323–10333.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 13713–13722.
- Narayanan, M. SENetV2: Aggregated dense layer for channelwise and global representations. arXiv 2023, arXiv:2311.10807.
- Wang, J.; Zheng, Z.; Ma, A.; Lu, X.; Zhong, Y. LoveDA: A remote sensing land-cover dataset for domain adaptive semantic segmentation. arXiv 2021, arXiv:2110.08733.
- Song, A.; Kim, Y. Semantic segmentation of remote-sensing imagery using heterogeneous big data: International Society for Photogrammetry and Remote Sensing Potsdam and Cityscape datasets. ISPRS Int. J. Geo-Inf. 2020, 9, 601.
- Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation. Int. J. Comput. Vis. 2021, 129, 3051–3068.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Zhang, Z.; Liu, Q.; Wang, Y. Road extraction by deep residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
- Wu, D.; Guo, Z.; Yu, L.; Sang, N.; Gao, C. Structural pruning via spatial-aware information redundancy for semantic segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; pp. 8368–8376.
- Selvarajan, N.P.; Megalingam, R.K.; Raghavan, D.; Sudheesh, S.K. Optimizing semantic segmentation for autonomous vehicles: A quantization approach. In Proceedings of the 2024 IEEE 9th International Conference for Convergence in Technology (I2CT), Pune, India, 5–7 August 2024; pp. 1–6.
- Yang, C.; Zhou, H.; An, Z.; Jiang, X.; Xu, Y.; Zhang, Q. Cross-image relational knowledge distillation for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12319–12328.

| Parameter | LoveDA | ISPRS Potsdam |
|---|---|---|
| Training schedule | Iteration-based (50,000 iterations) | Iteration-based (50,000 iterations) |
| Batch size | 4 | 4 |
| Image size | Train crop: 512 × 512; val/test source: 1024 × 1024 | Train crop: 512 × 512; val/test source: 6000 × 6000 |
| Optimizer | Adam | Adam |
| Learning rate | 0.001 | 0.001 |
| Weight decay | 0.005 | 0.005 |
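Since the schedule above is iteration-based rather than epoch-based, the number of passes over the data depends on the dataset size. The sketch below is a minimal illustration of how the shared settings could be recorded and how an iteration budget maps onto an approximate epoch count; the `TRAIN_CFG` layout and the helper name are our own, not taken from the paper's code.

```python
# Hypothetical training configuration mirroring the table above;
# key names are illustrative, not from the authors' implementation.
TRAIN_CFG = {
    "schedule": {"type": "iteration", "max_iters": 50_000},
    "batch_size": 4,
    "train_crop": (512, 512),  # shared by LoveDA and ISPRS Potsdam
    "optimizer": {"type": "Adam", "lr": 1e-3, "weight_decay": 5e-3},
}

def effective_epochs(max_iters: int, batch_size: int, dataset_size: int) -> float:
    """Approximate number of passes over the data implied by an
    iteration-based schedule: images seen / dataset size."""
    return max_iters * batch_size / dataset_size
```

For example, 50,000 iterations at batch size 4 would traverse a hypothetical pool of 20,000 training crops about 10 times.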
| Ex | Model | Model Variant | OA (%) | mIoU (%) | mF1 (%) |
|---|---|---|---|---|---|
| 1 | BiSeNetV2 | - | 73.58 | 54.36 | 69.91 |
| 2 | ConvNeXt | Tiny + UPerNet | 67.31 | 42.41 | 58.32 |
| 3 | PSPNet | ResNet-18 | 71.17 | 50.72 | 66.68 |
| 4 | ResUNet | - | 68.03 | 46.57 | 62.35 |
| 5 | SegFormer | B0 | 73.37 | 53.01 | 68.50 |
| 6 | Swin | Tiny + UPerNet | 63.95 | 39.14 | 55.23 |
| 7 | DeepLabV3+ | ResNet-18 | 71.39 | 49.65 | 65.48 |
| 8 | LGD-DeepLabV3+ | - | 76.32 | 58.48 | 73.37 |
| Class (K = 7) | Baseline IoU (%) | LGD-DeepLabV3+ IoU (%) | Baseline F1 (%) | LGD-DeepLabV3+ F1 (%) |
|---|---|---|---|---|
| background | 57.36 | 60.17 | 72.91 | 75.13 |
| building | 52.42 | 59.22 | 68.78 | 74.39 |
| road | 45.68 | 56.96 | 62.72 | 72.58 |
| water | 62.57 | 72.52 | 76.98 | 84.07 |
| barren | 25.81 | 43.68 | 41.03 | 60.80 |
| forest | 44.17 | 49.14 | 61.28 | 65.90 |
| agriculture | 59.55 | 67.56 | 74.64 | 80.70 |
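The per-class scores are standard IoU and F1 values computed from per-class pixel counts, and mIoU/mF1 are their unweighted means over the K classes. A minimal sketch of that computation follows; the example counts in the usage note are made up for illustration, not taken from the paper.

```python
def iou_f1(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Per-class IoU = TP / (TP + FP + FN) and F1 = 2*TP / (2*TP + FP + FN).
    The two are linked by the identity F1 = 2*IoU / (1 + IoU)."""
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1

def class_mean(values: list[float]) -> float:
    """Unweighted class mean, as used for mIoU and mF1."""
    return sum(values) / len(values)
```

For instance, `iou_f1(60, 20, 20)` yields an IoU of 0.60 and an F1 of 0.75, consistent with the identity 2 × 0.60 / (1 + 0.60) = 0.75.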
| Ex | Model | Model Variant | OA (%) | mIoU (%) | mF1 (%) |
|---|---|---|---|---|---|
| 1 | BiSeNetV2 | - | 88.16 | 76.16 | 86.16 |
| 2 | ConvNeXt | Tiny + UPerNet | 77.02 | 58.13 | 72.11 |
| 3 | PSPNet | ResNet-18 | 84.91 | 69.96 | 81.85 |
| 4 | ResUNet | - | 78.14 | 59.70 | 72.54 |
| 5 | SegFormer | B0 | 85.01 | 71.76 | 83.19 |
| 6 | Swin | Tiny + UPerNet | 79.67 | 63.53 | 76.88 |
| 7 | DeepLabV3+ | ResNet-18 | 86.72 | 74.07 | 84.60 |
| 8 | LGD-DeepLabV3+ | - | 90.34 | 80.79 | 89.16 |
| Class (K = 6) | Baseline IoU (%) | LGD-DeepLabV3+ IoU (%) | Baseline F1 (%) | LGD-DeepLabV3+ F1 (%) |
|---|---|---|---|---|
| Background | 54.16 | 70.73 | 70.26 | 82.85 |
| Impervious surfaces | 82.72 | 88.50 | 90.54 | 93.90 |
| Building | 89.99 | 93.51 | 94.73 | 96.65 |
| Low vegetation | 71.32 | 76.54 | 83.26 | 86.71 |
| Tree | 69.25 | 74.47 | 81.83 | 85.37 |
| Car | 76.99 | 81.02 | 87.00 | 89.51 |
| Ex | Method | OA (%) | mIoU (%) | mF1 (%) | ΔmIoU (pp) | ΔmF1 (pp) |
|---|---|---|---|---|---|---|
| 1 | DeepLabV3+ | 71.39 | 49.65 | 65.48 | - | - |
| 2 | +LISModule | 74.62 | 55.48 | 70.76 | +5.83 | +5.28 |
| 3 | +GCRModule | 74.73 | 56.07 | 71.32 | +6.42 | +5.84 |
| 4 | +DPFModule | 75.05 | 56.78 | 71.94 | +7.13 | +6.46 |
| 5 | +LISModule+GCRModule | 74.72 | 55.91 | 70.72 | +6.26 | +5.24 |
| 6 | +LISModule+DPFModule | 75.62 | 55.72 | 71.59 | +6.07 | +6.11 |
| 7 | +GCRModule+DPFModule | 75.93 | 57.54 | 72.76 | +7.89 | +7.28 |
| 8 | LGD-DeepLabV3+ | 76.32 | 58.48 | 73.37 | +8.83 | +7.89 |
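The Δ columns in the ablation tables are plain percentage-point differences against the DeepLabV3+ baseline in row 1. A one-line check, using the LoveDA baseline values from the table above (`delta_pp` is an illustrative helper, not from the paper's code):

```python
# LoveDA baseline scores from row 1 of the ablation table.
BASE_MIOU, BASE_MF1 = 49.65, 65.48

def delta_pp(score: float, base: float) -> float:
    """Gain over the baseline in percentage points (pp), rounded as reported."""
    return round(score - base, 2)
```

Applied to the full model's 58.48 mIoU and 73.37 mF1, this reproduces the reported +8.83 pp and +7.89 pp.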
| Ex | Method | OA (%) | mIoU (%) | mF1 (%) | ΔmIoU (pp) | ΔmF1 (pp) |
|---|---|---|---|---|---|---|
| 1 | DeepLabV3+ | 86.72 | 74.07 | 84.60 | - | - |
| 2 | +LISModule | 88.75 | 77.54 | 86.98 | +3.47 | +2.38 |
| 3 | +GCRModule | 89.44 | 79.15 | 88.10 | +5.08 | +3.50 |
| 4 | +DPFModule | 89.39 | 78.97 | 87.97 | +4.90 | +3.37 |
| 5 | +LISModule+GCRModule | 88.73 | 79.65 | 87.65 | +5.58 | +3.05 |
| 6 | +LISModule+DPFModule | 89.43 | 79.52 | 88.97 | +5.45 | +4.37 |
| 7 | +GCRModule+DPFModule | 89.51 | 80.26 | 88.95 | +6.19 | +4.35 |
| 8 | LGD-DeepLabV3+ | 90.34 | 80.79 | 89.16 | +6.72 | +4.56 |
| Dataset | Method | Params (M) | FLOPs (G) | Latency (ms) | FPS | Peak Mem. (GB) |
|---|---|---|---|---|---|---|
| LoveDA | Baseline | 24.90 | 23.34 | 4.53 | 220.79 | 0.280 |
| LoveDA | +GCRModule | 103.66 | 43.50 | 6.15 | 162.51 | 0.575 |
| LoveDA | LGD-DeepLabV3+ | 110.18 | 136.12 | 26.65 | 37.53 | 1.541 |
| Potsdam | Baseline | 24.89 | 23.33 | 4.50 | 222.12 | 0.277 |
| Potsdam | +GCRModule | 103.66 | 43.49 | 5.62 | 178.05 | 0.575 |
| Potsdam | LGD-DeepLabV3+ | 110.18 | 136.11 | 26.70 | 37.46 | 1.541 |
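For a single-image pipeline, throughput is simply the reciprocal of per-image latency, so the FPS column can be sanity-checked from the latency column; the small gaps (e.g., 220.75 computed vs. 220.79 reported for the LoveDA baseline) presumably reflect independent timing runs. A quick arithmetic check:

```python
def fps_from_latency(latency_ms: float) -> float:
    """Frames per second implied by a per-image latency in milliseconds."""
    return 1000.0 / latency_ms
```

For instance, `fps_from_latency(4.53)` gives about 220.75, and `fps_from_latency(26.65)` gives about 37.52, close to the reported 220.79 and 37.53 FPS.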
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Wang, X.; Liu, X.; Mahmood, A.; Yang, Y.; Li, X. LGD-DeepLabV3+: An Enhanced Framework for Remote Sensing Semantic Segmentation via Multi-Level Feature Fusion and Global Modeling. Sensors 2026, 26, 1008. https://doi.org/10.3390/s26031008

