MobileMamba-DETR: Efficient Dual-Modal Vehicle Detection for Autonomous Driving via Multi-Scale Selective State Space Fusion

Li, Bo; Li, Chunhao; Li, Yuheng

doi:10.3390/app16125998

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

by Bo Li ¹, Chunhao Li ²,* and Yuheng Li ²,*

¹ School of Computer Science, Hainan University, Haikou 570228, China

² Key Laboratory of Internet Information Retrieval of Hainan Province, Hainan University, Haikou 570228, China

* Authors to whom correspondence should be addressed.

Appl. Sci. 2026, 16(12), 5998; https://doi.org/10.3390/app16125998 (registering DOI)

Submission received: 22 May 2026 / Revised: 8 June 2026 / Accepted: 10 June 2026 / Published: 13 June 2026

(This article belongs to the Special Issue AI-Based Methods for Object Detection and Path Planning)

Abstract

Robust autonomous-driving detection requires exploiting both RGB texture and infrared thermal cues without sacrificing real-time inference. Existing RGB-IR detectors often rely on static feature concatenation or quadratic-complexity attention, which makes them sensitive to modality imbalance and small spatial offsets, and costly to deploy. We propose MobileMamba-DETR, a lightweight DETR-style detector that treats dual-modal fusion as a selective state-space process. Its core component is an SS2D-based cross-modal interaction module guided by normalized RGB-IR contrast, while a MobileMamba backbone, a spectral–spatial encoder, and a dynamic convolutional decoder provide efficient multi-scale representation and query localization. On M3FD and FLIR-ADAS, MobileMamba-DETR achieves mAP$_{50}$ of 83.6% and 78.3%, respectively, with 38.7M parameters and 42 FPS inference at $640 \times 640$ resolution on an RTX 3090. The main results, ablation studies, and seed-based validation show that selective state-space fusion improves accuracy while retaining real-time throughput.
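The abstract does not give the fusion equations, so the following pure-Python sketch is only an illustration of how a contrast-guided selective scan could work. It is not the authors' code. Several details are assumptions made for illustration. The contrast cue is taken to be (rgb - ir) / (|rgb| + |ir| + eps). Its magnitude sets the input-dependent step size Δ through a softplus, following Mamba's discretization (Ā = exp(ΔA), B̄ ≈ ΔB). Each position carries a scalar state, and the scanned input is the mean of the two modalities. The four traversals (row-major, column-major, and their reverses) follow the cross-scan pattern of VMamba's SS2D, and their outputs are merged by averaging. All function names (`normalized_contrast`, `selective_scan`, `scan_orders`, `ss2d_fuse`) are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def normalized_contrast(rgb: float, ir: float, eps: float = 1e-6) -> float:
    """Hypothetical RGB-IR contrast cue in [-1, 1]; 0 means the modalities agree."""
    return (rgb - ir) / (abs(rgb) + abs(ir) + eps)


def selective_scan(xs: List[float], guides: List[float],
                   a: float = -1.0, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Scalar selective state-space recurrence (Mamba-style discretization).

    The step size delta_t is input-dependent: here it is softplus(guide_t).
      h_t = exp(delta_t * a) * h_{t-1} + delta_t * b * x_t
      y_t = c * h_t
    With a < 0, a larger delta forgets the past faster and weights x_t more.
    """
    h = 0.0
    ys = []
    for x, g in zip(xs, guides):
        delta = math.log1p(math.exp(g))  # softplus keeps delta > 0
        h = math.exp(delta * a) * h + delta * b * x
        ys.append(c * h)
    return ys


def scan_orders(height: int, width: int) -> List[List[Tuple[int, int]]]:
    """Four SS2D-style traversals: row-major, reversed row-major, column-major, reversed."""
    rows = [(i, j) for i in range(height) for j in range(width)]
    cols = [(i, j) for j in range(width) for i in range(height)]
    return [rows, rows[::-1], cols, cols[::-1]]


def ss2d_fuse(rgb: Grid, ir: Grid) -> Grid:
    """Hypothetical contrast-guided fusion of two same-sized single-channel maps."""
    height, width = len(rgb), len(rgb[0])
    out = [[0.0] * width for _ in range(height)]
    orders = scan_orders(height, width)
    for order in orders:
        # Scanned input: mean of the two modalities (an assumption).
        xs = [0.5 * (rgb[i][j] + ir[i][j]) for i, j in order]
        # Guide: strong disagreement between modalities -> larger step size.
        gs = [abs(normalized_contrast(rgb[i][j], ir[i][j])) for i, j in order]
        ys = selective_scan(xs, gs)
        # Map each output back to its spatial position and average the directions.
        for (i, j), y in zip(order, ys):
            out[i][j] += y / len(orders)
    return out


if __name__ == "__main__":
    rgb = [[0.9, 0.1, 0.4], [0.2, 0.8, 0.5]]
    ir = [[0.1, 0.1, 0.6], [0.7, 0.8, 0.0]]
    for row in ss2d_fuse(rgb, ir):
        print([round(v, 3) for v in row])
```

In this sketch each scan direction is a single linear pass, so fusion costs O(HW) per direction. Full pairwise attention, the quadratic alternative mentioned in the abstract, costs O((HW)²). Positions where the two modalities disagree receive a larger Δ, so the recurrence leans more on the current input there.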

Keywords: autonomous driving; vehicle detection; multimodal fusion; Mamba; infrared; RGB; transformer; object detection; state space model

Share and Cite

MDPI and ACS Style

Li, B.; Li, C.; Li, Y. MobileMamba-DETR: Efficient Dual-Modal Vehicle Detection for Autonomous Driving via Multi-Scale Selective State Space Fusion. Appl. Sci. 2026, 16, 5998. https://doi.org/10.3390/app16125998

AMA Style

Li B, Li C, Li Y. MobileMamba-DETR: Efficient Dual-Modal Vehicle Detection for Autonomous Driving via Multi-Scale Selective State Space Fusion. Applied Sciences. 2026; 16(12):5998. https://doi.org/10.3390/app16125998

Chicago/Turabian Style

Li, Bo, Chunhao Li, and Yuheng Li. 2026. "MobileMamba-DETR: Efficient Dual-Modal Vehicle Detection for Autonomous Driving via Multi-Scale Selective State Space Fusion" Applied Sciences 16, no. 12: 5998. https://doi.org/10.3390/app16125998

APA Style

Li, B., Li, C., & Li, Y. (2026). MobileMamba-DETR: Efficient Dual-Modal Vehicle Detection for Autonomous Driving via Multi-Scale Selective State Space Fusion. Applied Sciences, 16(12), 5998. https://doi.org/10.3390/app16125998

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

