This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

MobileMamba-DETR: Efficient Dual-Modal Vehicle Detection for Autonomous Driving via Multi-Scale Selective State Space Fusion

1 School of Computer Science, Hainan University, Haikou 570228, China
2 Key Laboratory of Internet Information Retrieval of Hainan Province, Hainan University, Haikou 570228, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2026, 16(12), 5998; https://doi.org/10.3390/app16125998 (registering DOI)
Submission received: 22 May 2026 / Revised: 8 June 2026 / Accepted: 10 June 2026 / Published: 13 June 2026
(This article belongs to the Special Issue AI-Based Methods for Object Detection and Path Planning)

Abstract

Robust autonomous-driving detection requires exploiting both RGB texture and infrared (IR) thermal cues without sacrificing real-time inference. Existing RGB-IR detectors often rely on static feature concatenation or quadratic-complexity attention, which makes them sensitive to modality imbalance and small spatial misalignments, or expensive to deploy. We propose MobileMamba-DETR, a lightweight DETR-style detector that treats dual-modal fusion as a selective state-space process. Its core component is an SS2D-based (2D selective scan) cross-modal interaction module guided by the normalized RGB-IR contrast, while a MobileMamba backbone, a spectral–spatial encoder, and a dynamic convolutional decoder provide efficient multi-scale representation and query localization. On M3FD and FLIR-ADAS, MobileMamba-DETR achieves an mAP50 of 83.6% and 78.3%, respectively, with 38.7M parameters and 42 FPS inference at 640×640 resolution on an RTX 3090 GPU. The main results, ablation studies, and multi-seed validation show that selective state-space fusion improves accuracy while retaining real-time throughput.
Keywords: autonomous driving; vehicle detection; multimodal fusion; Mamba; infrared; RGB; transformer; object detection; state space model
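The abstract frames fusion as a selective state-space process built on SS2D. As background, SS2D (the 2D selective scan introduced with VMamba) flattens a feature map along four routes (row-major, column-major, and both reversed), runs a causal 1D selective scan over each sequence, and sums the outputs back onto the grid. The pure-Python sketch below illustrates only that generic mechanism for a single channel, under stated assumptions: the parameter names (`a`, `w_delta`, `w_b`, `w_c`, `d`) are hypothetical, one parameter set is shared across all four routes, and the normalized RGB-IR contrast guidance described above is not modeled because the abstract does not specify how it enters the scan. It is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Route = List[Tuple[int, int]]


def softplus(v: float) -> float:
    # Numerically stable log(1 + exp(v)); keeps the step size positive.
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def selective_scan(xs: Sequence[float], a: Sequence[float], w_delta: float,
                   w_b: Sequence[float], w_c: Sequence[float],
                   d: float = 1.0) -> List[float]:
    """Causal 1D selective scan over one input channel.

    The step size delta_t and the projections B_t, C_t are computed from
    the current input x_t, so the recurrence can decide per token what to
    keep or forget. Entries of the diagonal state matrix `a` should be
    negative so that exp(delta_t * a) lies in (0, 1).
    """
    h = [0.0] * len(a)                        # hidden state, one entry per state dim
    ys: List[float] = []
    for x in xs:
        delta = softplus(w_delta * x)         # input-dependent step size
        b = [wb * x for wb in w_b]            # input-dependent B_t
        c = [wc * x for wc in w_c]            # input-dependent C_t
        for k in range(len(a)):
            a_bar = math.exp(delta * a[k])    # zero-order-hold discretization of A
            h[k] = a_bar * h[k] + delta * b[k] * x  # simplified B_bar = delta * B
        ys.append(sum(ck * hk for ck, hk in zip(c, h)) + d * x)  # y_t = C_t h_t + D x_t
    return ys


def scan_routes(height: int, width: int) -> List[Route]:
    # Four SS2D traversal orders: row-major, column-major, and both reversed.
    row_major = [(i, j) for i in range(height) for j in range(width)]
    col_major = [(i, j) for j in range(width) for i in range(height)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def ss2d(grid: Grid, **params) -> Grid:
    # Scan the map along every route and sum the outputs back per position.
    # Assumption: one shared parameter set for all four routes (SS2D
    # usually learns separate parameters per direction).
    height, width = len(grid), len(grid[0])
    out = [[0.0] * width for _ in range(height)]
    for route in scan_routes(height, width):
        seq = [grid[i][j] for i, j in route]
        for (i, j), y in zip(route, selective_scan(seq, **params)):
            out[i][j] += y
    return out


if __name__ == "__main__":
    params = dict(a=[-1.0, -0.5], w_delta=1.0, w_b=[0.5, 0.3],
                  w_c=[1.0, 0.2], d=1.0)
    feature_map = [[0.1, 0.4, 0.2], [0.3, 0.0, 0.5]]
    fused = ss2d(feature_map, **params)
    print([[round(v, 3) for v in row] for row in fused])
```

Each route is scanned in time linear in the number of pixels, so the four-route scan grows linearly with feature-map size rather than quadratically as full self-attention does; this is the efficiency argument the abstract makes against quadratic attention.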

Share and Cite

MDPI and ACS Style

Li, B.; Li, C.; Li, Y. MobileMamba-DETR: Efficient Dual-Modal Vehicle Detection for Autonomous Driving via Multi-Scale Selective State Space Fusion. Appl. Sci. 2026, 16, 5998. https://doi.org/10.3390/app16125998

AMA Style

Li B, Li C, Li Y. MobileMamba-DETR: Efficient Dual-Modal Vehicle Detection for Autonomous Driving via Multi-Scale Selective State Space Fusion. Applied Sciences. 2026; 16(12):5998. https://doi.org/10.3390/app16125998

Chicago/Turabian Style

Li, Bo, Chunhao Li, and Yuheng Li. 2026. "MobileMamba-DETR: Efficient Dual-Modal Vehicle Detection for Autonomous Driving via Multi-Scale Selective State Space Fusion." Applied Sciences 16, no. 12: 5998. https://doi.org/10.3390/app16125998

APA Style

Li, B., Li, C., & Li, Y. (2026). MobileMamba-DETR: Efficient Dual-Modal Vehicle Detection for Autonomous Driving via Multi-Scale Selective State Space Fusion. Applied Sciences, 16(12), 5998. https://doi.org/10.3390/app16125998

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
