Infrared and Visible Image Fusion Using a State-Space Adversarial Model with Cross-Modal Dependency Learning
Abstract
1. Introduction
- (1) A state-space-based feature extraction module is proposed, which enables global feature perception of infrared and visible images while maintaining low computational complexity. This design mitigates both the limited feature-representation capacity of traditional CNNs and the high computational cost of Transformer-based architectures.
- (2) A cross state-space-based multimodal feature fusion module is constructed, which introduces a state-space modeling mechanism to capture long-range dependencies between modalities. This enables efficient, complementary integration of infrared and visible information, improving the structural consistency and detail preservation of the fused images (a minimal code sketch of this mechanism, together with the state-space block of (1), follows this list).
- (3) A multi-class discriminator-based adversarial learning mechanism is designed to guide the fusion network toward finer visual optimization during multimodal image generation. The more discriminative multi-class discriminator improves both the subjective visual quality and the objective evaluation metrics of the fused images.
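To make contributions (1) and (2) concrete, the following is a minimal, self-contained sketch of a selective state-space (Mamba-style) block and a cross state-space fusion step in PyTorch. The module and parameter names (SimpleSSMBlock, CrossSSMFusion, d_model, d_state) are illustrative assumptions rather than the paper's implementation, and the scan is written as an explicit loop for readability instead of the hardware-efficient parallel scan used in practice.

```python
# Illustrative sketch only: a simplified selective SSM block and a cross
# state-space fusion step; names and design details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleSSMBlock(nn.Module):
    """Selective scan over flattened image tokens of shape (batch, L, D)."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A_log = nn.Parameter(torch.zeros(d_model, d_state))  # A = -exp(A_log) < 0
        self.to_B = nn.Linear(d_model, d_state)   # input-dependent input matrix
        self.to_C = nn.Linear(d_model, d_state)   # input-dependent output matrix
        self.to_dt = nn.Linear(d_model, d_model)  # input-dependent step size
        self.out = nn.Linear(d_model, d_model)

    def scan(self, x, B, C, dt):
        # x: (b, L, D); B, C: (b, L, N); dt: (b, L, D)
        A = -torch.exp(self.A_log)                  # (D, N), stable decay
        dA = torch.exp(dt.unsqueeze(-1) * A)        # (b, L, D, N) discretized A
        dB = dt.unsqueeze(-1) * B.unsqueeze(2)      # (b, L, D, N) discretized B
        h = x.new_zeros(x.size(0), x.size(2), A.size(1))
        ys = []
        for t in range(x.size(1)):                  # linear-time recurrence
            h = dA[:, t] * h + dB[:, t] * x[:, t].unsqueeze(-1)
            ys.append((h * C[:, t].unsqueeze(1)).sum(dim=-1))
        return torch.stack(ys, dim=1)               # (b, L, D)

    def forward(self, x):
        dt = F.softplus(self.to_dt(x))
        return self.out(self.scan(x, self.to_B(x), self.to_C(x), dt)) + x


class CrossSSMFusion(nn.Module):
    """Cross state-space fusion: visible features drive the scan parameters
    while infrared features drive the hidden state (one plausible reading of
    the cross-modal dependency mechanism, not the paper's exact design)."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.ssm = SimpleSSMBlock(d_model, d_state)

    def forward(self, feat_ir, feat_vis):
        dt = F.softplus(self.ssm.to_dt(feat_vis))
        y = self.ssm.scan(feat_ir, self.ssm.to_B(feat_vis),
                          self.ssm.to_C(feat_vis), dt)
        return self.ssm.out(y) + feat_ir + feat_vis


# Example: fuse two 32x32 feature maps with 64 channels.
ir = torch.randn(1, 32 * 32, 64)
vis = torch.randn(1, 32 * 32, 64)
fused = CrossSSMFusion(d_model=64)(ir, vis)   # -> (1, 1024, 64)
```

Conditioning the input-dependent parameters (B, C, Δ) on one modality while scanning the other is one plausible way to realize the cross-modal long-range dependency modeling of contribution (2); the paper's actual Cross Mamba module may differ in detail.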
2. Related Work
2.1. Image Fusion Based on Traditional Methods
2.2. Deep Learning-Based Methods
3. Methodology
3.1. Overall Network Architecture
3.2. Feature Extraction Unit Based on State-Space Model (Mamba)
3.3. Fusion Module Based on Cross State-Space Model (Cross Mamba)
3.4. Design of Loss Function
3.4.1. Design of the Generator’s Loss Function
3.4.2. Design of the Discriminator’s Loss
4. Experiments
4.1. Experimental Setup
4.2. Comparative Experiments
4.3. Ablation Study
4.4. Generalization Experiments
4.4.1. Generalization Experiments on the LLVIP Dataset
4.4.2. Generalization Experiments on the TNO Dataset
4.5. Model Complexity Analysis
5. Discussion
5.1. Discussion and Comparative Analysis
5.2. Limitations
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Yi, X.; Tang, L.; Zhang, H.; Xu, H.; Ma, J. Diff-IF: Multi-modality image fusion via diffusion model with fusion knowledge prior. Inf. Fusion 2024, 110, 102450.
- Zhang, X.; Liu, A.; Yang, G.; Liu, Y.; Chen, X. SIMFusion: A semantic information-guided modality-specific fusion network for MR images. Inf. Fusion 2024, 112, 102560.
- Cheng, C.; Xu, T.; Wu, X.J.; Li, H.; Li, X.; Tang, Z.; Kittler, J. TextFusion: Unveiling the power of textual semantics for controllable image fusion. Inf. Fusion 2025, 117, 102790.
- Meng, B.; Liu, H.; Ding, Z. Multi-scene image fusion via memory aware synapses. Sci. Rep. 2025, 15, 14280.
- Liu, X.; Huo, H.; Li, J.; Pang, S.; Zheng, B. A semantic-driven coupled network for infrared and visible image fusion. Inf. Fusion 2024, 108, 102352.
- Yuan, M.; Shi, X.; Wang, N.; Wang, Y.; Wei, X. Improving RGB-infrared object detection with cascade alignment-guided transformer. Inf. Fusion 2024, 105, 102246.
- Luo, Y.; Zhang, J.; Li, C. CPIFuse: Toward realistic color and enhanced textures in color polarization image fusion. Inf. Fusion 2025, 120, 103111.
- Chen, X.; Xu, S.; Hu, S.; Ma, X. DGFD: A dual-graph convolutional network for image fusion and low-light object detection. Inf. Fusion 2025, 119, 103025.
- Yang, B.; Jiang, Z.; Pan, D.; Yu, H.; Gui, G.; Gui, W. LFDT-Fusion: A latent feature-guided diffusion Transformer model for general image fusion. Inf. Fusion 2025, 113, 102639.
- Wang, Q.; Li, Z.; Zhang, S.; Luo, Y.; Chen, W.; Wang, T.; Chi, N.; Dai, Q. SMAE-Fusion: Integrating saliency-aware masked autoencoder with hybrid attention transformer for infrared–visible image fusion. Inf. Fusion 2025, 117, 102841.
- Yang, C.; Luo, X.; Zhang, Z.; Chen, Z.; Wu, X.J. KDFuse: A high-level vision task-driven infrared and visible image fusion method based on cross-domain knowledge distillation. Inf. Fusion 2025, 118, 102944.
- Li, Y.; Zhou, P.; Zhou, G.; Wang, H.; Lu, Y.; Peng, Y. A comprehensive survey of visible and infrared imaging in complex environments: Principle, degradation and enhancement. Inf. Fusion 2025, 119, 103036.
- Liu, J.; Li, S.; Dian, R.; Song, Z. DT-F Transformer: Dual transpose fusion transformer for polarization image fusion. Inf. Fusion 2024, 106, 102274.
- Sun, H.; Wu, S.; Ma, L. Adversarial attacks on GAN-based image fusion. Inf. Fusion 2024, 108, 102389.
- Huang, Q.; Wu, G.; Jiang, Z.; Fan, W.; Xu, B.; Liu, J. Leveraging a self-adaptive mean teacher model for semi-supervised multi-exposure image fusion. Inf. Fusion 2024, 112, 102534.
- Wang, X.; Guan, Z.; Qian, W.; Cao, J.; Ma, R.; Bi, C. A degradation-aware guided fusion network for infrared and visible image. Inf. Fusion 2025, 118, 102931.
- Yang, Z.; Yu, H.; Zhang, J.; Tang, Q.; Mian, A. Deep learning based infrared small object segmentation: Challenges and future directions. Inf. Fusion 2025, 118, 103007.
- Long, J.; Fang, Z.; Wang, L. SK-MMFMNet: A multi-dimensional fusion network of remote sensing images and EEG signals for multi-scale marine target recognition. Inf. Fusion 2024, 108, 102402.
- Huang, M.; Yu, W.; Zhang, L. DF3Net: Dual frequency feature fusion network with hierarchical transformer for image inpainting. Inf. Fusion 2024, 111, 102487.
- Ding, W.; Geng, S.; Wang, H.; Huang, J.; Zhou, T. FDiff-Fusion: Denoising diffusion fusion network based on fuzzy learning for 3D medical image segmentation. Inf. Fusion 2024, 112, 102540.
- Liu, Y.; Liu, Y.; Wang, X.; Zhang, L.; Jiang, Z.; Li, Y.; Yan, C.; Fu, Y.; Zhang, T. A cross-modal fusion method for multispectral small ship detection. In Proceedings of the 2024 27th International Conference on Information Fusion (FUSION), Venice, Italy, 8–11 July 2024; pp. 1–6.
- Tang, L.; Yuan, J.; Zhang, H.; Jiang, X.; Ma, J. PIAFusion: A progressive infrared and visible image fusion network based on illumination aware. Inf. Fusion 2022, 83, 79–92.
- Li, H.; Wu, X.J.; Durrani, T. NestFuse: An infrared and visible image fusion architecture based on nest connection and spatial/channel attention models. IEEE Trans. Instrum. Meas. 2020, 69, 9645–9656.
- Li, H.; Wu, X.J. DenseFuse: A fusion approach to infrared and visible images. IEEE Trans. Image Process. 2018, 28, 2614–2623.
- Ma, J.; Yu, W.; Liang, P.; Li, C.; Jiang, J. FusionGAN: A generative adversarial network for infrared and visible image fusion. Inf. Fusion 2019, 48, 11–26.
- Zhang, H.; Yuan, J.; Tian, X.; Ma, J. GAN-FM: Infrared and visible image fusion using GAN with full-scale skip connection and dual Markovian discriminators. IEEE Trans. Comput. Imaging 2021, 7, 1134–1147.
- Tang, H.; Liu, G.; Qian, Y.; Wang, J.; Xiong, J. EgeFusion: Towards edge gradient enhancement in infrared and visible image fusion with multi-scale transform. IEEE Trans. Comput. Imaging 2024, 10, 385–398.
- Wang, J.; Qu, H.; Zhang, Z.; Xie, M. New insights into multi-focus image fusion: A fusion method based on multi-dictionary linear sparse representation and region fusion model. Inf. Fusion 2024, 105, 102230.
- Ye, Y.; Zhang, J.; Zhou, L.; Li, J.; Ren, X.; Fan, J. Optical and SAR image fusion based on complementary feature decomposition and visual saliency features. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5205315.
- Li, H.; Ma, H.; Cheng, C.; Shen, Z.; Song, X.; Wu, X.J. Conti-Fuse: A novel continuous decomposition-based fusion framework for infrared and visible images. Inf. Fusion 2025, 117, 102839.
- Yue, J.; Hong, X.; Zhang, B. A damage imaging method based on particle swarm optimization for composites nondestructive testing using ultrasonic guided waves. Appl. Acoust. 2024, 218, 109878.
- Cheng, Y.; Zhang, L.; Guo, D.; Wang, N.; Qi, J.; Qiu, J. Subregional polarization fusion via Stokes parameters in passive millimeter-wave imaging. IEEE Trans. Ind. Inform. 2024, 20, 8585–8595.
- Wang, Q.; Li, Z.; Zhang, S.; Chi, N.; Dai, Q. WaveFusion: A novel wavelet vision Transformer with saliency-guided enhancement for multimodal image fusion. IEEE Trans. Circuits Syst. Video Technol. 2025, early access.
- Gong, X.; Hou, Z.; Wan, Y.; Zhong, Y.; Zhang, M.; Lv, K. Multispectral and SAR image fusion for multiscale decomposition based on least squares optimization rolling guidance filtering. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–20.
- Luo, Y.; Yang, B. Progressive fusion of hyperspectral and multispectral images based on joint bilateral filtering. Infrared Phys. Technol. 2025, 145, 105676.
- Aymaz, S.; Köse, C. A novel image decomposition-based hybrid technique with super-resolution method for multi-focus image fusion. Inf. Fusion 2019, 45, 113–127.
- Liu, M.; Jiao, L.; Liu, X.; Li, L.; Liu, F.; Yang, S. C-CNN: Contourlet convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 2636–2649.
- Liu, D.; Zhang, Y.; Bai, L.; Zhang, P.; Gao, F. Combining two-layer semi-three-dimensional reconstruction and multi-wavelength image fusion for functional diffuse optical tomography. IEEE Trans. Comput. Imaging 2021, 7, 1055–1068.
- Li, L.; Lv, M.; Jia, Z.; Jin, Q.; Liu, M.; Chen, L.; Ma, H. An effective infrared and visible image fusion approach via rolling guidance filtering and gradient saliency map. Remote Sens. 2023, 15, 2486.
- Zhang, Y.; Zhang, L.; Song, R.; Huang, C.; Tong, Q. Considering nonoverlapped bands construction: A general dictionary learning framework for hyperspectral and multispectral image fusion. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5505215.
- Liu, J.; Dian, R.; Li, S.; Liu, H. SGFusion: A saliency guided deep-learning framework for pixel-level image fusion. Inf. Fusion 2023, 91, 205–214.
- Zhang, X. Deep learning-based multi-focus image fusion: A survey and a comparative study. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4819–4838.
- Zhu, Y.; Newsam, S. DenseNet for dense flow. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 790–794.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Computer Vision—ECCV 2014, Proceedings of the 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V; Springer: Cham, Switzerland, 2014; pp. 740–755.
- Li, H.; Wu, X.J.; Kittler, J. RFN-Nest: An end-to-end residual fusion network for infrared and visible images. Inf. Fusion 2021, 73, 72–86.
- Tang, L.; Yuan, J.; Ma, J. Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network. Inf. Fusion 2022, 82, 28–42.
- Ma, J.; Tang, L.; Xu, M.; Zhang, H.; Xiao, G. STDFusionNet: An infrared and visible image fusion network based on salient target detection. IEEE Trans. Instrum. Meas. 2021, 70, 5009513.
- Zheng, N.; Zhou, M.; Huang, J.; Hou, J.; Li, H.; Xu, Y.; Zhao, F. Probing synergistic high-order interaction in infrared and visible image fusion. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 26384–26395.
- Wang, C.; Sun, D.; Gao, Q.; Wang, L.; Yan, Z.; Wang, J.; Wang, E.; Wang, T. MLFFusion: Multi-level feature fusion network with region illumination retention for infrared and visible image fusion. Infrared Phys. Technol. 2023, 134, 104916.
- Tang, L.; Xiang, X.; Zhang, H.; Gong, M.; Ma, J. DIVFusion: Darkness-free infrared and visible image fusion. Inf. Fusion 2023, 91, 477–493.
- Yang, Q.; Zhang, Y.; Zhao, Z.; Zhang, J.; Zhang, S. IAIFNet: An illumination-aware infrared and visible image fusion network. IEEE Signal Process. Lett. 2024, 31, 1374–1378.
- Ma, J.; Zhang, H.; Shao, Z.; Liang, P.; Xu, H. GANMcC: A generative adversarial network with multiclassification constraints for infrared and visible image fusion. IEEE Trans. Instrum. Meas. 2020, 70, 5005014.
- Liu, J.; Fan, X.; Huang, Z.; Wu, G.; Liu, R.; Zhong, W.; Luo, Z. Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5802–5811.
- Rao, Y.; Wu, D.; Han, M.; Wang, T.; Yang, Y.; Lei, T.; Zhou, C.; Bai, H.; Xing, L. AT-GAN: A generative adversarial network with attention and transition for infrared and visible image fusion. Inf. Fusion 2023, 92, 336–349.
Quantitative comparison with representative fusion methods (Section 4.2).

| Method | AG | EN | SF | SD | VIF | MI |
|---|---|---|---|---|---|---|
| DenseFuse | 1.8922 | 5.5640 | 5.7571 | 22.0265 | 0.7521 | 2.3586 |
| DDFM | 2.0784 | 5.3739 | 6.7840 | 21.6552 | 0.6272 | 2.1146 |
| FusionGAN | 1.2560 | 5.2837 | 3.9676 | 17.2470 | 0.4150 | 1.8377 |
| CDDFuse | 3.0554 | 6.0569 | 9.5884 | 33.1917 | 1.0627 | 3.9625 |
| GANMcC | 1.8494 | 5.8559 | 5.2707 | 24.6880 | 0.6863 | 2.3853 |
| RFN-Nest | 1.2794 | 5.4081 | 4.3961 | 21.9074 | 0.5786 | 2.1504 |
| UMF-CMGR | 1.7462 | 5.0635 | 6.0580 | 16.9096 | 0.3096 | 1.5921 |
| Ours | 3.1824 | 6.0550 | 10.1023 | 32.7964 | 1.0673 | 3.8239 |
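The columns denote standard fusion quality measures: AG (average gradient), EN (information entropy), SF (spatial frequency), SD (standard deviation), VIF (visual information fidelity), and MI (mutual information). As a reference point, the snippet below is a minimal NumPy sketch of the four single-image metrics under their commonly used definitions; VIF and MI are omitted because they additionally require the source infrared and visible images, and normalization details may differ from the evaluation code used in the paper.

```python
# Illustrative metric sketches (common definitions), not the paper's code.
import numpy as np


def average_gradient(img: np.ndarray) -> float:
    """AG: mean magnitude of the local intensity gradient."""
    f = img.astype(np.float64)
    gx = np.diff(f, axis=1)[:-1, :]   # horizontal differences, cropped to align
    gy = np.diff(f, axis=0)[:, :-1]   # vertical differences, cropped to align
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))


def entropy(img: np.ndarray) -> float:
    """EN: Shannon entropy (bits) of the 8-bit intensity histogram."""
    hist, _ = np.histogram(img, bins=256, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def spatial_frequency(img: np.ndarray) -> float:
    """SF: combined row/column frequency of intensity changes."""
    f = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(f, axis=1) ** 2))   # row frequency
    cf = np.sqrt(np.mean(np.diff(f, axis=0) ** 2))   # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))


def standard_deviation(img: np.ndarray) -> float:
    """SD: global contrast as the intensity standard deviation."""
    return float(np.std(img.astype(np.float64)))
```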
Ablation study results (Section 4.3); wo_fusion and wo_dis denote the model without the cross state-space fusion module and without the multi-class discriminator, respectively.

| Variant | AG | EN | SF | SD | VIF | MI |
|---|---|---|---|---|---|---|
| wo_fusion | 2.8667 | 6.0712 | 9.1145 | 33.9747 | 1.0702 | 3.7966 |
| wo_dis | 2.7229 | 6.0351 | 8.9257 | 33.0519 | 1.0619 | 3.5104 |
| Ours | 3.1824 | 6.0550 | 10.1023 | 32.7964 | 1.0673 | 3.8239 |
Generalization results on the LLVIP dataset (Section 4.4.1).

| Method | AG | EN | SF | SD | VIF | MI |
|---|---|---|---|---|---|---|
| DenseFuse | 3.4760 | 6.8699 | 11.7854 | 33.9208 | 0.5992 | 2.6486 |
| DDFM | 3.5887 | 6.8900 | 11.5380 | 34.9923 | 0.5623 | 2.5703 |
| FusionGAN | 2.4589 | 6.2919 | 8.9024 | 24.6102 | 0.4311 | 2.8443 |
| CDDFuse | 3.5450 | 7.1858 | 13.3031 | 44.9629 | 0.9368 | 4.2444 |
| GANMcC | 2.9430 | 6.6834 | 9.7521 | 31.9885 | 0.5277 | 2.6618 |
| RFN-Nest | 3.1834 | 6.8629 | 10.1824 | 34.5664 | 0.5738 | 2.5417 |
| UMF-CMGR | 3.1107 | 6.4514 | 12.0271 | 29.0176 | 0.4420 | 2.6725 |
| Ours | 4.1607 | 7.2059 | 14.4068 | 44.4945 | 1.0068 | 4.2107 |
Generalization results on the TNO dataset (Section 4.4.2).

| Method | AG | EN | SF | SD | VIF | MI |
|---|---|---|---|---|---|---|
| DenseFuse | 3.5432 | 6.8206 | 8.9475 | 34.8425 | 0.6630 | 2.3167 |
| DDFM | 3.4364 | 6.8226 | 8.5140 | 33.6925 | 0.2884 | 1.7338 |
| FusionGAN | 2.4184 | 6.5572 | 6.2719 | 30.7810 | 0.4244 | 2.3327 |
| CDDFuse | 4.6453 | 7.0682 | 12.3437 | 44.6297 | 0.7937 | 3.1483 |
| GANMcC | 2.5204 | 6.7338 | 6.1114 | 33.4233 | 0.5324 | 2.2798 |
| RFN-Nest | 2.6541 | 6.9661 | 5.8461 | 36.9403 | 0.5614 | 2.1286 |
| UMF-CMGR | 2.9691 | 6.5371 | 8.1779 | 30.1167 | 0.5981 | 2.2287 |
| Ours | 4.7447 | 6.9573 | 13.023 | 39.1446 | 0.7699 | 3.8196 |
Model complexity comparison (Section 4.5).

| Model | Params (MB) | Test Time (s) |
|---|---|---|
| DenseFuse | 1.02 | 0.05 |
| DDFM | 52.71 | 23.50 |
| FusionGAN | 7.24 | 0.07 |
| CDDFuse | 4.52 | 0.08 |
| GANMcC | 1.09 | 0.05 |
| RFN-Nest | 28.70 | 0.06 |
| UMF-CMGR | 7.22 | 0.11 |
| Ours | 0.87 | 0.06 |
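The two columns above can be reproduced along the following lines for a two-input PyTorch fusion model; the float32 assumption (4 bytes per parameter), the dummy 480×640 inputs, and the warm-up and run counts are illustrative choices, not the paper's measurement protocol.

```python
# Hedged sketch for reproducing the "Params (MB)" and "Test Time (s)" columns.
import time
import torch


def params_in_mb(model: torch.nn.Module) -> float:
    n = sum(p.numel() for p in model.parameters())
    return n * 4 / (1024 ** 2)               # float32: 4 bytes per parameter


@torch.no_grad()
def mean_test_time(model, device="cuda", shape=(1, 1, 480, 640), runs=20):
    model = model.to(device).eval()
    ir = torch.rand(shape, device=device)    # dummy infrared input
    vis = torch.rand(shape, device=device)   # dummy visible input
    for _ in range(3):                       # warm-up excludes CUDA start-up cost
        model(ir, vis)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(ir, vis)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs
```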