Remote Sensing
  • This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
  • Article
  • Open Access

23 January 2026

Multispectral Sparse Cross-Attention Guided Mamba Network for Small Object Detection in Remote Sensing

1 School of Computer Science, Hubei University, Wuhan 430062, China
2 Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University, Wuhan 430062, China
3 Key Laboratory of Intelligent Sensing System and Security, Ministry of Education, Hubei University, Wuhan 430062, China
4 Department of Electronic Engineering, Tsinghua University, Haidian District, Beijing 100084, China
This article belongs to the Special Issue Image Fusion and Object Detection Using Multi-Modal Remote Sensing Data

Abstract

Small object detection in remote sensing remains a challenging task because small targets provide limited feature representation and are easily masked by complex backgrounds. Existing methods that rely exclusively on either the visible or the infrared modality often fail to achieve both accuracy and robustness, and effectively integrating cross-modal information to enhance detection performance remains a critical challenge. To address this issue, we propose a novel Multispectral Sparse Cross-Attention Guided Mamba Network (MSCGMN) for small object detection in remote sensing. The proposed MSCGMN architecture comprises three key components: a Multispectral Sparse Cross-Attention Guidance Module (MSCAG), a Dynamic Grouped Mamba Block (DGMB), and a Gated Enhanced Attention Module (GEAM). Specifically, the MSCAG module selectively fuses RGB and infrared (IR) features using sparse cross-modal attention, capturing complementary information across modalities while suppressing redundancy. The DGMB introduces a dynamic grouping strategy that improves the computational efficiency of Mamba, enabling effective global context modeling. Because small objects in remote sensing images occupy only limited areas, their critical features are difficult to capture; the GEAM module is therefore designed to enhance both global and local feature representations for small object detection. Experiments on the VEDAI and DroneVehicle datasets show that MSCGMN achieves mAP50 scores of 83.9% and 84.4%, respectively, outperforming existing state-of-the-art methods and demonstrating strong competitiveness in small object detection tasks.
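The sparse cross-modal attention idea behind MSCAG can be pictured with a short, self-contained sketch. The code below is an illustration only, written under assumptions of our own (a single attention head, top-k sparsification of the attention map, residual fusion into the RGB stream, and the hypothetical class name SparseCrossAttentionFusion); it is not the paper's MSCAG implementation, whose details appear in the full article.

```python
# Minimal sketch of sparse cross-modal attention fusion between RGB and IR
# feature maps. Illustrative only; module name, top-k scheme, and single-head
# design are assumptions, not the authors' MSCAG implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseCrossAttentionFusion(nn.Module):
    def __init__(self, channels: int, topk: int = 16):
        super().__init__()
        self.q_proj = nn.Conv2d(channels, channels, kernel_size=1)  # queries from RGB
        self.k_proj = nn.Conv2d(channels, channels, kernel_size=1)  # keys from IR
        self.v_proj = nn.Conv2d(channels, channels, kernel_size=1)  # values from IR
        self.out = nn.Conv2d(channels, channels, kernel_size=1)
        self.topk = topk

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        b, c, h, w = rgb.shape
        q = self.q_proj(rgb).flatten(2).transpose(1, 2)  # (B, HW, C)
        k = self.k_proj(ir).flatten(2)                   # (B, C, HW)
        v = self.v_proj(ir).flatten(2).transpose(1, 2)   # (B, HW, C)

        attn = torch.bmm(q, k) / (c ** 0.5)              # (B, HW, HW) cross-modal scores
        # Sparsify: keep only the top-k IR positions per RGB query, mask the rest.
        k_eff = min(self.topk, attn.shape[-1])
        topk_val, _ = attn.topk(k_eff, dim=-1)
        threshold = topk_val[..., -1:].expand_as(attn)
        attn = attn.masked_fill(attn < threshold, float("-inf"))
        attn = F.softmax(attn, dim=-1)

        fused = torch.bmm(attn, v).transpose(1, 2).reshape(b, c, h, w)
        return rgb + self.out(fused)                     # residual fusion into RGB stream


# Usage: fuse 64-channel RGB and IR feature maps of size 32x32.
if __name__ == "__main__":
    rgb_feat = torch.randn(2, 64, 32, 32)
    ir_feat = torch.randn(2, 64, 32, 32)
    fused = SparseCrossAttentionFusion(channels=64, topk=16)(rgb_feat, ir_feat)
    print(fused.shape)  # torch.Size([2, 64, 32, 32])
```

Restricting each RGB query to its top-k IR positions is one simple way to realize the "suppressing redundancy" behavior described in the abstract, since it discards low-scoring cross-modal correspondences before the softmax.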
