LMFusion: Breaking the Computational Barrier for Multimodal Classification in Remote Sensing

Zhou, Shenbo; He, Sibo; Li, Daixun; Xie, Weiying; Li, Yunsong

doi:10.3390/rs18121972


Open Access Article

by Shenbo Zhou, Sibo He, Daixun Li, Weiying Xie and Yunsong Li *

State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an 710071, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(12), 1972; https://doi.org/10.3390/rs18121972 (registering DOI)

Submission received: 30 April 2026 / Revised: 6 June 2026 / Accepted: 9 June 2026 / Published: 13 June 2026

(This article belongs to the Special Issue Image Fusion and Object Detection Using Multi-Modal Remote Sensing Data)


Abstract

Multimodal land cover classification plays an important role in remote sensing applications such as urban monitoring and environmental analysis. By integrating complementary information from hyperspectral imagery (HSI) and LiDAR data, multimodal learning can significantly improve classification performance. However, existing Transformer-based fusion methods often suffer from high computational complexity and inefficient cross-modal interaction modeling, which limits their applicability in resource-constrained scenarios. To address these challenges, we propose LMFusion, an efficient framework for multimodal feature learning. Specifically, LMFusion performs bidirectional feature interaction through a linear-complexity cross-attention mechanism and strengthens long-range spatial-spectral representation learning with Mamba-based state space modeling, so that multimodal dependencies are modeled at linear computational cost. In addition, a selective quantization-aware optimization strategy supports multiple bit-width settings (down to 1-bit), yielding a more compact and efficient model while improving representation robustness under low-bit constraints. Extensive experiments on the Houston2013, MUUFL, and Augsburg datasets demonstrate the effectiveness of LMFusion: it achieves overall accuracies of 95.84%, 94.95%, and 99.05%, respectively, consistently outperforming representative multimodal classification methods and showing strong potential for accurate and efficient multimodal remote sensing classification.
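The abstract does not specify how the linear-complexity cross-attention is built. As a rough illustration only, the sketch below uses a well-known kernelized formulation (a positive feature map, here elu(x) + 1, applied to queries and keys) so that the key/value summary of one modality is computed once and reused for every query of the other. The function names, the choice of feature map, and the absence of learned projections are all assumptions made for this example, not details taken from LMFusion.

```python
import math


def feature_map(x):
    # elu(x) + 1: a positive feature map often used in kernelized linear attention.
    # (Assumed here; the abstract does not name the kernel LMFusion uses.)
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_cross_attention(queries, keys, values):
    """Queries come from one modality, keys/values from the other.

    queries: N_q x d, keys: N_k x d, values: N_k x d_v (lists of lists).
    Cost grows linearly with N_q and N_k; no N_q x N_k score matrix is formed.
    """
    d = len(keys[0])
    d_v = len(values[0])

    # One pass over the key/value modality builds two summaries:
    # kv = sum_j phi(k_j) v_j^T  (d x d_v)  and  k_sum = sum_j phi(k_j)  (d).
    kv = [[0.0] * d_v for _ in range(d)]
    k_sum = [0.0] * d
    for k, v in zip(keys, values):
        pk = feature_map(k)
        for a in range(d):
            k_sum[a] += pk[a]
            for b in range(d_v):
                kv[a][b] += pk[a] * v[b]

    # Each query reads the shared summaries and normalizes its own output.
    out = []
    for q in queries:
        pq = feature_map(q)
        norm = sum(pq[a] * k_sum[a] for a in range(d))
        out.append(
            [sum(pq[a] * kv[a][b] for a in range(d)) / norm for b in range(d_v)]
        )
    return out


def bidirectional_fusion(hsi_tokens, lidar_tokens):
    # Hypothetical helper: each modality attends to the other once.
    hsi_from_lidar = linear_cross_attention(hsi_tokens, lidar_tokens, lidar_tokens)
    lidar_from_hsi = linear_cross_attention(lidar_tokens, hsi_tokens, hsi_tokens)
    return hsi_from_lidar, lidar_from_hsi
```

Because the feature map is strictly positive, every output row is a weighted average of the value rows, which mirrors softmax attention while avoiding the quadratic score matrix. A real model would add learned query, key, and value projections and multiple heads.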

Keywords: remote sensing; multimodal learning; hyperspectral imagery; LiDAR; cross-attention; state space models; quantization-aware training
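To make the quantization-aware training keyword concrete, the following is a minimal sketch of the forward "fake quantization" step that such training typically simulates, including a 1-bit case. It is a generic illustration under stated assumptions: symmetric per-tensor scaling, a sign-times-mean-magnitude rule for 1-bit weights, and a hypothetical `bit_plan` dictionary standing in for whatever selection rule LMFusion actually applies. None of these details come from the abstract.

```python
def fake_quantize(weights, bits):
    """Quantize then dequantize a flat list of weights (forward pass only).

    During training, gradients are usually passed straight through this
    rounding step; that part is omitted here.
    """
    if bits < 1:
        raise ValueError("bits must be >= 1")
    if bits == 1:
        # Assumed 1-bit rule: keep the sign, scale by the mean magnitude.
        scale = sum(abs(w) for w in weights) / len(weights)
        return [scale if w >= 0 else -scale for w in weights]

    # Symmetric uniform grid with integer levels in [-qmax, qmax].
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return [0.0] * len(weights)
    step = max_abs / qmax
    return [max(-qmax, min(qmax, round(w / step))) * step for w in weights]


def quantize_selected(layers, bit_plan):
    """Hypothetical 'selective' wrapper: only layers named in bit_plan are
    quantized; all others keep their full-precision weights."""
    return {
        name: fake_quantize(w, bit_plan[name]) if name in bit_plan else list(w)
        for name, w in layers.items()
    }
```

For example, `quantize_selected({"fusion": [0.5, -1.5, 1.0, -1.0], "head": [0.3, -0.1]}, {"fusion": 1})` binarizes only the `fusion` weights and leaves `head` untouched; the layer names are placeholders.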

