Multispectral Sparse Cross-Attention Guided Mamba Network for Small Object Detection in Remote Sensing

Wen Xiang; Yamin Li; Liu Duan; Qifeng Wu; Jiaqi Ruan; Yucheng Wan; Sihan Wu

doi:10.3390/rs18030381

¹ School of Computer Science, Hubei University, Wuhan 430062, China

² Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University, Wuhan 430062, China

³ Key Laboratory of Intelligent Sensing System and Security, Ministry of Education, Hubei University, Wuhan 430062, China

⁴ Department of Electronic Engineering, Tsinghua University, Haidian District, Beijing 100084, China

Remote Sens. 2026, 18(3), 381; https://doi.org/10.3390/rs18030381

This article belongs to the Special Issue Image Fusion and Object Detection Using Multi-Modal Remote Sensing Data

Abstract

Small object detection in remote sensing remains challenging because small objects offer limited feature representation and complex backgrounds introduce interference. Existing methods that rely exclusively on either the visible or the infrared modality often fail to achieve both accuracy and robustness, and integrating cross-modal information effectively is still an open problem. To address this issue, we propose a novel Multispectral Sparse Cross-Attention Guided Mamba Network (MSCGMN) for small object detection in remote sensing. The MSCGMN architecture comprises three key components: the Multispectral Sparse Cross-Attention Guidance Module (MSCAG), the Dynamic Grouped Mamba Block (DGMB), and the Gated Enhanced Attention Module (GEAM). Specifically, the MSCAG module selectively fuses RGB and infrared (IR) features using sparse cross-modal attention, capturing complementary information across modalities while suppressing redundancy. The DGMB introduces a dynamic grouping strategy that improves the computational efficiency of Mamba while enabling effective global context modeling. Because small objects occupy limited areas in remote sensing images, their critical features are difficult to capture; the GEAM module is therefore designed to enhance both global and local feature representations for small object detection. Experiments on the VEDAI and DroneVehicle datasets show that MSCGMN achieves mAP50 scores of 83.9% and 84.4%, respectively, outperforming existing state-of-the-art methods.

Keywords: small object detection; remote sensing; feature fusion; attention; Mamba
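
The abstract describes the MSCAG module as fusing RGB and IR features with sparse cross-modal attention, but it does not say how the sparsity is obtained. As a rough illustration only, the sketch below assumes a top-k rule: each RGB token keeps only its `top_k` highest-scoring IR tokens and applies a softmax over those. The function name `sparse_cross_attention`, the `top_k` parameter, and the toy token values are hypothetical and are not taken from the paper.

```python
import math


def sparse_cross_attention(queries, keys, values, top_k):
    """Top-k sparse cross-attention (illustrative assumption, not the paper's MSCAG).

    Each query (e.g. an RGB token) attends only to its top_k highest-scoring
    keys (e.g. IR tokens); all other keys receive zero weight.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    fused = []
    for q in queries:
        # Scaled dot-product score between this query and every key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        # Indices of the top_k largest scores; the remaining keys are dropped.
        keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:top_k]
        # Softmax over the kept scores only (shifted by the max for stability).
        peak = max(scores[i] for i in keep)
        weights = {i: math.exp(scores[i] - peak) for i in keep}
        total = sum(weights.values())
        # Weighted sum of the kept value vectors.
        out = [0.0] * len(values[0])
        for i, w in weights.items():
            for d, v in enumerate(values[i]):
                out[d] += (w / total) * v
        fused.append(out)
    return fused


# Toy 2-D tokens: two RGB queries, three IR keys/values (made-up numbers).
rgb_tokens = [[1.0, 0.0], [0.0, 1.0]]
ir_tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
fused = sparse_cross_attention(rgb_tokens, ir_tokens, ir_tokens, top_k=1)
```

With `top_k=1` each RGB token simply copies its best-matching IR token; larger values of `top_k` blend more IR tokens, and setting it to the number of keys recovers dense cross-attention.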
