- Article
Asymmetric Spatial–Frequency Fusion Network for Infrared and Visible Object Detection
- Jing Liu,
- Jing Gao,
- Xiaoyong Liu
- and 5 other authors
Infrared and visible image fusion-based object detection is critical for robust environmental perception under adverse conditions, yet existing methods still suffer from insufficient modeling of modality discrepancies and limited adaptivity in their fusion mechanisms. This work proposes an asymmetric spatial–frequency fusion network, AsyFusionNet. The network adopts an asymmetric dual-branch backbone that extends the RGB branch to P5 while truncating the infrared branch at P4, thereby better aligning with the physical characteristics of the two modalities, enhancing feature complementarity, and enabling fine-grained modeling of modality differences. On top of this backbone, a local–global attention fusion (LGAF) module is introduced that models local and global attention in parallel and reorganizes the two streams through lightweight convolutions, achieving joint spatial–channel selective enhancement. Modality-specific feature enhancement is further realized via a hierarchical attention module (HAM) in the RGB branch, which employs dynamic kernel selection to emphasize multi-level texture details, and a Fourier spatial–spectral modulation (FS2M) module in the infrared branch, which more effectively captures global thermal radiation patterns. Extensive experiments on the FD and VEDAI datasets demonstrate that AsyFusionNet attains and , respectively, surpassing the baseline by and points (approximately and relative gains) while maintaining real-time inference speed.
17 December 2025
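The abstract describes FS2M only at a high level. Purely as an illustration of how a Fourier-domain modulation over infrared features might look, the PyTorch sketch below reweights the frequency components of a feature map with learned 1×1 convolutions before returning to the spatial domain. The module name, layer choices, and shapes here are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch in the spirit of FS2M (Fourier spatial-spectral
# modulation); NOT the paper's implementation.
import torch
import torch.nn as nn


class FourierModulation(nn.Module):
    """Reweights the frequency components of a feature map with learned
    1x1 convolutions, then returns to the spatial domain. Because every
    frequency bin aggregates information from the whole image, a
    frequency-domain operator has a global receptive field, which is one
    plausible way to capture global thermal-radiation patterns."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs acting on the stacked (real, imag) frequency channels.
        self.freq_mix = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Real-to-complex FFT over spatial dims: (B, C, H, W//2 + 1) complex.
        spec = torch.fft.rfft2(x, norm="ortho")
        # Stack real and imaginary parts as channels and mix them.
        z = torch.cat([spec.real, spec.imag], dim=1)
        z = self.freq_mix(z)
        real, imag = z.chunk(2, dim=1)
        spec = torch.complex(real, imag)
        # Back to the spatial domain at the original resolution.
        return torch.fft.irfft2(spec, s=(h, w), norm="ortho")


if __name__ == "__main__":
    ir_feat = torch.randn(2, 64, 40, 40)   # e.g. infrared P4-level features
    out = FourierModulation(64)(ir_feat)
    print(out.shape)  # torch.Size([2, 64, 40, 40])
```

Since the infrared branch is truncated at P4 in the proposed backbone, a module of this kind would operate on mid-level infrared features before fusion with the deeper RGB branch; the exact placement in AsyFusionNet is not specified in the abstract.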