This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

VI-MSFFN: A Visible-Infrared Multi-Scale Feature Fusion Network for Cross-Modal Detection in Remote Sensing

Rocket Force University of Engineering, Xi’an 710025, China
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(12), 1938; https://doi.org/10.3390/rs18121938
Submission received: 25 April 2026 / Revised: 3 June 2026 / Accepted: 9 June 2026 / Published: 11 June 2026

Abstract

To address insufficient single-modality robustness and limited multi-scale detection accuracy in remote sensing image detection (RSID) under complex environments, this paper proposes VI-MSFFN, a multimodal RSID network. The model adopts a symmetric parallel dual-branch architecture that extracts visible and infrared features independently and models them collaboratively. A cross-modal multi-scale sparse cross-attention fusion module is applied to the P4 and P5 feature layers, and a cross-modal fusion strategy that coordinates high-level and low-level features is constructed; together they achieve efficient, robust cross-modal fusion, strengthen multi-scale object modeling, and suppress feature redundancy and noise. In addition, a progressive feature interaction and fusion architecture combines spatial-domain and frequency-domain information to strengthen deep object representation. Experiments on the VEDAI and Drone Vehicle datasets show that VI-MSFFN achieves state-of-the-art (SOTA) detection accuracy, robustness, and generalization ability. The proposed method addresses these detection challenges and has significant application value for multimodal RSID.
Keywords: remote sensing image detection; VI-MSFFN; symmetric parallel dual-branch; progressive feature interaction and fusion architecture; joint spatial-frequency domain
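
The abstract names a sparse cross-attention fusion module but gives no implementation details, so the following is only a minimal sketch of the general idea and not the authors' module. It assumes that "sparse" means each query token attends only to its `top_k` best-matching tokens in the other modality, and it uses flat lists of feature vectors in place of P4 or P5 feature maps. The function names `sparse_cross_attention` and `fuse_visible_infrared`, the `top_k` parameter, and the residual-sum fusion are all illustrative assumptions.

```python
import math


def sparse_cross_attention(queries, keys, values, top_k):
    """Scaled dot-product cross-attention restricted to the top_k keys per query.

    Assumption: top-k selection stands in for the paper's unspecified
    sparsity mechanism.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    fused = []
    for q in queries:
        # Similarity of this query to every key from the other modality.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        # Keep only the indices of the top_k highest scores.
        keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
        # Numerically stable softmax over the kept scores only.
        peak = max(scores[i] for i in keep)
        exps = {i: math.exp(scores[i] - peak) for i in keep}
        total = sum(exps.values())
        out = [0.0] * len(values[0])
        for i, e in exps.items():
            weight = e / total
            for d, v in enumerate(values[i]):
                out[d] += weight * v
        fused.append(out)
    return fused


def fuse_visible_infrared(visible, infrared, top_k=2):
    """Hypothetical bidirectional fusion: each modality queries the other,
    then the residual-enhanced features are summed token by token.
    Requires the same number of tokens in both modalities."""
    vis_from_ir = sparse_cross_attention(visible, infrared, infrared, top_k)
    ir_from_vis = sparse_cross_attention(infrared, visible, visible, top_k)
    return [
        [v + a + r + b for v, a, r, b in zip(vt, at, rt, bt)]
        for vt, at, rt, bt in zip(visible, vis_from_ir, infrared, ir_from_vis)
    ]
```

With `top_k` much smaller than the number of tokens, each output mixes only a few cross-modal features, which is one plausible way to limit redundancy and noise; the actual mechanism in VI-MSFFN may differ.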

Share and Cite

MDPI and ACS Style

Yue, Y.; Qin, W.; Chi, H.; An, B.; Wu, D.; Guo, W.; Xiong, J. VI-MSFFN: A Visible-Infrared Multi-Scale Feature Fusion Network for Cross-Modal Detection in Remote Sensing. Remote Sens. 2026, 18, 1938. https://doi.org/10.3390/rs18121938

Chicago/Turabian Style

Yue, Yurong, Weiwei Qin, Hao Chi, Baiwei An, Dingyi Wu, Wenxin Guo, and Jingyi Xiong. 2026. "VI-MSFFN: A Visible-Infrared Multi-Scale Feature Fusion Network for Cross-Modal Detection in Remote Sensing" Remote Sensing 18, no. 12: 1938. https://doi.org/10.3390/rs18121938
