LMVMamba: A Hybrid U-Shape Mamba for Remote Sensing Segmentation with Adaptation Fine-Tuning
Abstract
Highlights
- A novel hybrid model for segmentation of remote sensing images is proposed.
- Our model achieves high-precision land-cover classification.
- The model enables more reliable mapping of complex land-cover patterns.
- It provides a robust foundation for environmental monitoring and resource management.
1. Introduction
- (1) An innovative hybrid model combining ViT, Mamba, and CNN architectures is constructed for remote sensing semantic segmentation. The model inherits the global modeling ability of ViT, the long-sequence dependency handling of Mamba, and the local feature extraction of CNN. On land-cover segmentation, LMVMamba outperforms several state-of-the-art segmentation models.
- (2) A low-rank adaptation (LoRA) fine-tuning strategy is incorporated into the ResT encoder. It preserves the pretrained capabilities of the ViT while adapting it to remote sensing segmentation by updating only a small number of trainable parameters (a minimal sketch of the technique follows this list).
- (3) Two multi-scale feature fusion modules, the Multi-Scale Post Block (MPB) and Local Attention Supervision (LAS), are designed. Together they strengthen local feature representation and enable more efficient fusion across scales, which is particularly beneficial for remote sensing images rich in semantic information.
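To make contribution (2) concrete, the snippet below illustrates the standard LoRA idea applied to a single linear layer: the pretrained weight is frozen and only two low-rank matrices are trained, so the effective weight becomes W + (α/r)·BA. This is a minimal PyTorch sketch of the general technique; the rank, scaling, and the layers it would wrap inside the ResT encoder are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer augmented with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)          # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank factors: only these are updated during fine-tuning.
        # B is initialized to zero so the adapted layer starts identical to the pretrained one.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + scaling * x A^T B^T, i.e. W is effectively replaced by W + scaling * B A
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

# Example: wrap one attention projection (layer names and sizes are illustrative).
qkv = nn.Linear(256, 768)
qkv_lora = LoRALinear(qkv, rank=8, alpha=16.0)
out = qkv_lora(torch.randn(4, 196, 256))                # (batch, tokens, channels)
```

Wrapping only projections like this leaves the vast majority of the encoder parameters frozen, which is the source of the parameter efficiency described in contribution (2).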
2. Related Works
2.1. Deep Learning Methods in Semantic Segmentation
2.2. Efficient Fine-Tuning with Low-Rank Adaptation
2.3. Mamba Models
3. Method
3.1. Overview
3.2. LoRA Fine-Tuning Technology
3.3. Multi-Scale Post Block
3.4. Local Attention Supervision
4. Experimental Results
4.1. Experimental Dataset Description
4.2. Evaluation Metrics
4.3. Comparison Results
4.4. Ablation Experiments and Results
5. Discussion and Future Work
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Yuan, X.; Shi, J.; Gu, L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst. Appl. 2021, 169, 114417.
2. Yang, H.; Yu, B.; Luo, J.; Chen, F. Semantic segmentation of high spatial resolution images with deep neural networks. GISci. Remote Sens. 2019, 56, 749–768.
3. Zhang, F.; Guo, A.; Hu, Z.; Liang, Y. A novel image fusion method based on UAV and Sentinel-2 for environmental monitoring. Sci. Rep. 2025, 15, 27256.
4. Zhu, H.; Yao, J.; Meng, J.; Cui, C.; Wang, M.; Yang, R. A method to construct an environmental vulnerability model based on multi-source data to evaluate the hazard of short-term precipitation-induced flooding. Remote Sens. 2023, 15, 1609.
5. Tamás, J.; Louis, A.; Fehér, Z.Z.; Nagy, A. Land Cover Mapping Using High-Resolution Satellite Imagery and a Comparative Machine Learning Approach to Enhance Regional Water Resource Management. Remote Sens. 2025, 17, 2591.
6. Guo, Y.; Jia, X.; Paull, D. Effective sequential classifier training for SVM-based multitemporal remote sensing image classification. IEEE Trans. Image Process. 2018, 27, 3036–3048.
7. Adugna, T.; Xu, W.; Fan, J. Comparison of random forest and support vector machine classifiers for regional land cover mapping using coarse resolution FY-3C images. Remote Sens. 2022, 14, 574.
8. Song, X.; Chen, M.; Rao, J.; Luo, Y.; Lin, Z.; Zhang, X.; Li, S.; Hu, X. MFPI-Net: A Multi-Scale Feature Perception and Interaction Network for Semantic Segmentation of Urban Remote Sensing Images. Sensors 2025, 25, 4660.
9. Wang, F.; Zhang, L.; Jiang, T.; Li, Z.; Wu, W.; Kuang, Y. An Improved Segformer for Semantic Segmentation of UAV-Based Mine Restoration Scenes. Sensors 2025, 25, 3827.
10. Wang, H.; Shi, J.; Karimian, H.; Liu, F.; Wang, F. YOLOSAR-Lite: A lightweight framework for real-time ship detection in SAR imagery. Int. J. Digit. Earth 2024, 17, 2405525.
11. Chen, J.; Yuan, Z.; Peng, J.; Chen, L.; Huang, H.; Zhu, J.; Liu, Y.; Li, H. DASNet: Dual attentive fully convolutional Siamese networks for change detection in high-resolution satellite images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 1194–1206.
12. Tao, Y.; Karimian, H.; Shi, J.; Wang, H.; Yang, X.; Xu, Y.; Yang, Y. MobileYOLO-Cyano: An Enhanced Deep Learning Approach for Precise Classification of Cyanobacterial Genera in Water Quality Monitoring. Water Res. 2025, 285, 124081.
13. Song, J.; Gao, S.; Zhu, Y.; Ma, C. A survey of remote sensing image classification based on CNNs. Big Earth Data 2019, 3, 232–254.
14. Peng, H.; Xue, C.; Shao, Y.; Chen, K.; Xiong, J.; Xie, Z.; Zhang, L. Semantic segmentation of litchi branches using DeepLabV3+ model. IEEE Access 2020, 8, 164546–164555.
15. Li, R.; Wang, L.; Zhang, C.; Duan, C.; Zheng, S. A2-FPN for semantic segmentation of fine-resolution remotely sensed images. Int. J. Remote Sens. 2022, 43, 1131–1155.
16. Li, R.; Zheng, S.; Zhang, C.; Duan, C.; Su, J.; Wang, L.; Atkinson, P.M. Multiattention network for semantic segmentation of fine-resolution remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5607713.
17. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
18. Panboonyuen, T.; Jitkajornwanich, K.; Lawawirojwong, S.; Srestasathiern, P.; Vateekul, P. Transformer-based decoder designs for semantic segmentation on remotely sensed images. Remote Sens. 2021, 13, 5100.
19. Wang, L.; Li, R.; Duan, C.; Zhang, C.; Meng, X.; Fang, S. A novel transformer based semantic segmentation scheme for fine-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6506105.
20. Wang, L.; Li, R.; Zhang, C.; Fang, S.; Duan, C.; Meng, X.; Atkinson, P.M. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS J. Photogramm. Remote Sens. 2022, 190, 196–214.
21. Xiao, T.; Liu, Y.; Huang, Y.; Li, M.; Yang, G. Enhancing multiscale representations with transformer for remote sensing image semantic segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5605116.
22. Wei, T.; Chen, H.; Liu, W.; Chen, L.; Wang, J. Retain and Enhance Modality-Specific Information for Multimodal Remote Sensing Image Land Use/Land Cover Classification. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5635318.
23. Wang, Z.; Li, J.; Xu, N.; You, Z. Combining feature compensation and GCN-based reconstruction for multimodal remote sensing image semantic segmentation. Inf. Fusion 2025, 122, 103207.
24. Hatamizadeh, A.; Heinrich, G.; Yin, H.; Tao, A.; Alvarez, J.M.; Kautz, J.; Molchanov, P. FasterViT: Fast vision transformers with hierarchical attention. arXiv 2023, arXiv:2306.06189.
25. Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS J. Photogramm. Remote Sens. 2020, 162, 94–114.
26. Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752.
27. Zhao, S.; Chen, H.; Zhang, X.; Xiao, P.; Bai, L.; Ouyang, W. RS-Mamba for large remote sensing image dense prediction. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5633314.
28. Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision Mamba: Efficient visual representation learning with bidirectional state space model. arXiv 2024, arXiv:2401.09417.
29. Ma, X.; Zhang, X.; Pun, M.-O. RS3Mamba: Visual state space model for remote sensing image semantic segmentation. IEEE Geosci. Remote Sens. Lett. 2024, 21, 6011405.
30. Wang, L.; Li, D.; Dong, S.; Meng, X.; Zhang, X.; Hong, D. PyramidMamba: Rethinking pyramid feature fusion with selective space state model for semantic segmentation of remote sensing imagery. arXiv 2024, arXiv:2406.10828.
31. Ding, H.; Xia, B.; Liu, W.; Zhang, Z.; Zhang, J.; Wang, X.; Xu, S. A novel mamba architecture with a semantic transformer for efficient real-time remote sensing semantic segmentation. Remote Sens. 2024, 16, 2620.
32. He, H.; Zhang, J.; Cai, Y.; Chen, H.; Hu, X.; Gan, Z.; Wang, Y.; Wang, C.; Wu, Y.; Xie, L. MobileMamba: Lightweight multi-receptive visual mamba network. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; pp. 4497–4507.
33. Zhang, Q.; Yang, Y.-B. ResT: An efficient transformer for visual recognition. Adv. Neural Inf. Process. Syst. 2021, 34, 15475–15485.
34. Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. ICLR 2022, 1, 3.
35. Yang, L.; Zhang, R.-Y.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Nashville, TN, USA, 19–25 June 2021; pp. 11863–11874.
36. Deng, P.; Xu, K.; Huang, H. When CNNs meet vision transformer: A joint framework for remote sensing scene classification. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8020305.
37. Fang, L.; Zhou, P.; Liu, X.; Ghamisi, P.; Chen, S. Context enhancing representation for semantic segmentation in remote sensing images. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 4138–4152.
38. Meng, X.; Yang, Y.; Wang, L.; Wang, T.; Li, R.; Zhang, C. Class-guided swin transformer for semantic segmentation of remote sensing imagery. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6517505.
39. Huang, Z.; Wang, X.; Huang, L.; Huang, C.; Wei, Y.; Liu, W. CCNet: Criss-cross attention for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 603–612.
40. Xue, B.; Cheng, H.; Yang, Q.; Wang, Y.; He, X. Adapting segment anything model to aerial land cover classification with low-rank adaptation. IEEE Geosci. Remote Sens. Lett. 2024, 21, 2502605.
41. Xu, C.; Guo, H.; Cen, C.; Chen, M.; Tao, X.; He, J. Efficient program optimization through knowledge-enhanced LoRA fine-tuning of large language models. J. Supercomput. 2025, 81, 1006.
42. Xiong, J.; Pan, L.; Liu, Y.; Zhu, L.; Zhang, L.; Tan, S. Enhancing Plant Protection Knowledge with Large Language Models: A Fine-Tuned Question-Answering System Using LoRA. Appl. Sci. 2025, 15, 3850.
43. Hu, Y.; Xie, Y.; Wang, T.; Chen, M.; Pan, Z. Structure-aware low-rank adaptation for parameter-efficient fine-tuning. Mathematics 2023, 11, 4317.
44. Chavan, A.; Liu, Z.; Gupta, D.; Xing, E.; Shen, Z. One-for-all: Generalized LoRA for parameter-efficient fine-tuning. arXiv 2023, arXiv:2306.07967.
45. Huang, C.; Liu, Q.; Lin, B.Y.; Pang, T.; Du, C.; Lin, M. LoraHub: Efficient cross-task generalization via dynamic LoRA composition. arXiv 2023, arXiv:2307.13269.
46. Wang, Y.; Cao, L.; Deng, H. MFMamba: A mamba-based multi-modal fusion network for semantic segmentation of remote sensing images. Sensors 2024, 24, 7266.
47. Zhu, E.; Chen, Z.; Wang, D.; Shi, H.; Liu, X.; Wang, L. UNetMamba: An efficient UNet-like mamba for semantic segmentation of high-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2024, 22, 6001205.
48. Li, M.; Xing, Z.; Wang, H.; Jiang, H.; Xie, Q. SF-Mamba: A Semantic-flow Foreground-aware Mamba for Semantic Segmentation of Remote Sensing Images. IEEE Multimed. 2025, 32, 85–95.
49. Mu, J.; Zhou, S.; Sun, X. PPMamba: Enhancing Semantic Segmentation in Remote Sensing Imagery by SS2D. IEEE Geosci. Remote Sens. Lett. 2024, 22, 6001705.
50. Du, F.; Wu, S. ECMNet: Lightweight Semantic Segmentation with Efficient CNN-Mamba Network. arXiv 2025, arXiv:2506.08629.
51. Xia, J.; Yokoya, N.; Adriano, B.; Broni-Bediako, C. OpenEarthMap: A benchmark dataset for global high-resolution land cover mapping. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–7 January 2023; pp. 6254–6264.
52. Wang, J.; Zheng, Z.; Ma, A.; Lu, X.; Zhong, Y. LoveDA: A remote sensing land-cover dataset for domain adaptive semantic segmentation. arXiv 2021, arXiv:2110.08733.
Dataset | Training Set | Validation Set | Testing Set
---|---|---|---
OpenEarthMap | 3500 | 500 | 1500
LoveDA | 2522 | 1669 | 1796
Metric | Definition | Equation
---|---|---
Mean intersection over union (mIoU) | Mean of the overlap between predicted and ground-truth labels over all classes | $\mathrm{mIoU}=\frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TP_i+FP_i+FN_i}$
Mean F1 score (mF1) | Average of the per-class F1 scores | $\mathrm{mF1}=\frac{1}{N}\sum_{i=1}^{N}\frac{2P_iR_i}{P_i+R_i}$, with $P_i=\frac{TP_i}{TP_i+FP_i}$ and $R_i=\frac{TP_i}{TP_i+FN_i}$
Overall accuracy (OA) | Percentage of correctly predicted pixels among all pixels | $\mathrm{OA}=\frac{\sum_{i=1}^{N}TP_i}{\text{total pixels}}$

Here N is the number of classes and $TP_i$, $FP_i$, $FN_i$ denote the true positives, false positives, and false negatives of class i.
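For reference, all three metrics can be computed directly from a per-class confusion matrix. The following NumPy sketch uses the standard definitions above; the function and variable names are illustrative and not taken from the paper's code.

```python
import numpy as np

def segmentation_metrics(conf: np.ndarray):
    """Compute mIoU, mF1, and OA from a confusion matrix.

    conf[i, j] counts pixels whose ground-truth class is i and predicted class is j.
    """
    tp = np.diag(conf).astype(float)               # true positives per class
    fp = conf.sum(axis=0) - tp                     # predicted as class i but labelled otherwise
    fn = conf.sum(axis=1) - tp                     # labelled as class i but predicted otherwise

    iou = tp / np.maximum(tp + fp + fn, 1e-12)     # per-class intersection over union
    precision = tp / np.maximum(tp + fp, 1e-12)
    recall = tp / np.maximum(tp + fn, 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)

    miou = iou.mean()                              # mean IoU over classes
    mf1 = f1.mean()                                # mean F1 over classes
    oa = tp.sum() / conf.sum()                     # overall pixel accuracy
    return miou, mf1, oa

# Tiny 2-class example: 85 of 100 pixels are on the diagonal, so OA = 0.85.
print(segmentation_metrics(np.array([[50, 10], [5, 35]])))
```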
Comparison results on the OpenEarthMap dataset (%):

Models | Barren | Rangeland | Developed | Road | Tree | Water | Agriculture | Building | mIoU (%) | OA (%) | mF1 (%)
---|---|---|---|---|---|---|---|---|---|---|---|
MANet [16] | 48.2 | 52.4 | 49.8 | 56.3 | 68.0 | 65.8 | 72.5 | 72.5 | 60.7 | 75.1 | 75.1 |
A2FPN [15] | 54.2 | 54.6 | 51.8 | 60.7 | 69.3 | 72.1 | 75.9 | 75.4 | 64.2 | 77.8 | 77.8 |
DC-Swin [19] | 45.3 | 45.4 | 38.5 | 41.8 | 59.4 | 53.8 | 61.9 | 58.6 | 50.6 | 67.1 | 66.8 |
UNetFormer [20] | 51.7 | 53.6 | 52.6 | 61.1 | 69.8 | 70.9 | 72.8 | 76.0 | 63.6 | 77.5 | 77.3 |
Efficient PyramidMamba [30] | 55.2 | 54.2 | 52.0 | 60.2 | 69.7 | 71.3 | 75.4 | 75.4 | 64.2 | 77.7 | 77.8 |
RS3Mamba [29] | 50.6 | 53.3 | 51.6 | 61.0 | 69.1 | 62.2 | 75.0 | 76.4 | 62.4 | 77.2 | 76.4 |
LMVMamba (Ours) | 57.9 | 57.4 | 56.4 | 65.1 | 72.0 | 77.5 | 77.1 | 79.9 | 67.9 | 80.3 | 80.5 |
Comparison results on the LoveDA dataset (%):

Models | Background | Building | Road | Water | Barren | Forest | Agriculture | mIoU (%) | OA (%) | mF1 (%)
---|---|---|---|---|---|---|---|---|---|---|
MANet [16] | 52.2 | 61.8 | 50.8 | 58.8 | 27.8 | 38.6 | 51.9 | 48.9 | 68.1 | 64.9 |
A2FPN [15] | 48.1 | 60.2 | 53.4 | 64.0 | 12.3 | 38.8 | 47.4 | 46.2 | 63.9 | 61.3 |
DC-Swin [19] | 48.0 | 47.6 | 47.9 | 45.2 | 27.3 | 34.2 | 41.3 | 41.7 | 61.3 | 58.4 |
UNetFormer [20] | 52.3 | 58.9 | 52.9 | 66.4 | 23.6 | 38.1 | 46.6 | 48.4 | 67.4 | 64.1 |
Efficient PyramidMamba [30] | 49.1 | 62.0 | 52.8 | 64.3 | 15.1 | 39.8 | 47.6 | 47.2 | 65.2 | 62.5 |
RS3Mamba [29] | 52.7 | 61.3 | 53.6 | 64.8 | 26.5 | 37.0 | 46.2 | 48.9 | 67.4 | 64.7 |
LMVMamba (Ours) | 54.3 | 61.7 | 55.1 | 68.9 | 33.3 | 43.2 | 49.5 | 52.3 | 69.8 | 68.0 |
Ablation results on the OpenEarthMap dataset (%):

Baseline | MPB | LoRA | LAS | mIoU | OA | mF1
---|---|---|---|---|---|---
✓ | | | | 66.1 | 78.3 | 76.5
✓ | ✓ | | | 66.7 | 80.0 | 79.6
✓ | ✓ | ✓ | | 67.7 | 80.1 | 80.4
✓ | ✓ | ✓ | ✓ | 67.9 | 80.3 | 80.5
Ablation results on the LoveDA dataset (%):

Baseline | MPB | LoRA | LAS | mIoU | OA | mF1
---|---|---|---|---|---|---
✓ | | | | 50.5 | 69.7 | 66.2
✓ | ✓ | | | 50.5 | 69.0 | 66.5
✓ | ✓ | ✓ | | 51.9 | 69.8 | 67.4
✓ | ✓ | ✓ | ✓ | 52.3 | 69.8 | 68.0
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, F.; Wang, X.; Wang, H.; Karimian, H.; Shi, J.; Zha, G. LMVMamba: A Hybrid U-Shape Mamba for Remote Sensing Segmentation with Adaptation Fine-Tuning. Remote Sens. 2025, 17, 3367. https://doi.org/10.3390/rs17193367