J. Imaging, Volume 12, Issue 6 (June 2026) – 1 article

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click the "PDF Full-text" link and open it with the free Adobe Reader.
21 pages, 3429 KB  
Article
Visible–Infrared Fusion Based on CNN and Deformable Transformer
by Xiaoyi Wang, Xiansong Gu, Bin Li, Mingqiang Zhang, Panpan Yang and Qiang Fu
J. Imaging 2026, 12(6), 219; https://doi.org/10.3390/jimaging12060219 - 22 May 2026
Abstract
To address the limitations of traditional methods in feature extraction and multi-modal information fusion, this paper proposes an infrared–visible image object detection architecture that integrates Convolutional Neural Networks (CNNs) and Deformable Transformers. The method combines the strength of CNNs in local feature modeling with the ability of Transformers to capture global contextual information, facilitating the fusion of semantic consistency and structural details across modalities. By introducing a detection-aware multi-task optimization mechanism, the model improves object detection in challenging scenarios such as low-light conditions, occlusion, and complex backgrounds. Experiments on multiple standard datasets, including M3FD and LLVIP, indicate that the proposed method achieves competitive or better performance than the compared methods in key metrics such as mAP. Specifically, the method obtains the best mAP50 among the evaluated methods, 74.2% on the M3FD dataset and 98.6% on the LLVIP dataset, surpassing the second-best method, PIAFusion, by 4.3% and 2.5%, respectively. These quantitative results support the practicality and effectiveness of the approach in the evaluated complex environments.
(This article belongs to the Section Computer Vision and Pattern Recognition)
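The mAP50 figures quoted in the abstract rest on the usual convention that a detection counts as correct when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5. The following is a minimal sketch of that matching criterion only. It is not the authors' evaluation code, and the function names `iou` and `is_true_positive` are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: the IoU >= 0.5 test that the '50' in mAP50 refers to."""
    return iou(pred_box, gt_box) >= threshold
```

A full mAP50 computation would additionally rank detections by confidence, match each ground-truth box at most once, and average the resulting per-class precision over recall; those steps are omitted here.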