Infrared and Visible Image Fusion Network Based on Self-Compensating Lightweight Convolution
Abstract
1. Introduction
- A self-compensating lightweight convolution (LWC) module is proposed to address the inherent loss of complementary information caused by lightweight convolutional operations. Unlike conventional lightweight designs that focus primarily on computational reduction, the proposed LWC introduces an information compensation mechanism to preserve and enhance discriminative feature interactions during feature extraction, thereby improving the representation capability of lightweight fusion networks.
- A novel lightweight infrared–visible image fusion framework, termed LWC-DenseFuse, is developed based on the proposed LWC module. By embedding self-compensating feature interaction into both feature extraction and reconstruction stages, the proposed framework establishes an information-compensation-guided lightweight fusion paradigm, enabling substantial model compression while maintaining effective cross-modal information preservation. In addition, a staged training strategy with progressive loss weighting is designed to further enhance optimization stability and fusion performance.
- Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of the proposed framework. The method achieves a favorable balance between fusion quality and computational efficiency: on MSRS and LLVIP it attains the highest EN among the compared lightweight fusion approaches while using the fewest parameters (0.0212 M), and its FLOPs are lower than those of every compared method except LUT-Fuse. Moreover, ablation studies verify the effectiveness of the proposed training strategy in improving reconstruction quality and visual fidelity.
2. Related Work
2.1. Infrared and Visible Image Fusion
2.2. Current Status of Model Lightweighting for Image Fusion
2.2.1. Current Status of Model Lightweighting Research
2.2.2. Current Status of Lightweighting Methods in the Field of Image Fusion
- (1) Structural decoupling in Ghost- and depthwise-based methods inevitably weakens cross-channel and cross-modal feature interactions, which are essential for preserving complementary thermal and texture information. This limitation also makes feature optimization more sensitive to training dynamics under lightweight constraints.
- (2) LUT-based methods rely on highly compact discrete representations, which reduce modeling flexibility and hinder the preservation of fine-grained structural details during fusion, thereby increasing the difficulty of stable reconstruction during training.
- (3) Frequency-domain-based methods rely on explicit spectral transforms, which constrain joint spatial–channel representation learning and weaken cross-modal interaction modeling, leading to suboptimal convergence behavior in end-to-end optimization.
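The decoupling described in item (1) can be illustrated with a toy example. The sketch below is not the paper's code: it assumes hypothetical 1-D signals standing in for an infrared channel and a visible channel, with hand-picked kernels, and shows that a depthwise convolution never lets one channel influence another channel's output, whereas a standard convolution mixes them.

```python
def depthwise_conv1d(x, kernels):
    """Valid cross-correlation with one kernel per channel (no channel mixing)."""
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        out.append([
            sum(channel[i + j] * kernel[j] for j in range(k))
            for i in range(len(channel) - k + 1)
        ])
    return out


def standard_conv1d(x, weights):
    """weights[o][c] is the kernel linking input channel c to output channel o."""
    out = []
    for per_input_kernels in weights:
        k = len(per_input_kernels[0])
        length = len(x[0]) - k + 1
        out.append([
            sum(x[c][i + j] * per_input_kernels[c][j]
                for c in range(len(x)) for j in range(k))
            for i in range(length)
        ])
    return out


def changed_channels(a, b):
    """Indices of output channels that differ between two results."""
    return [idx for idx, (u, v) in enumerate(zip(a, b)) if u != v]


# Hypothetical toy inputs: channel 0 plays the infrared role, channel 1 the visible role.
ir = [1.0, 2.0, 3.0, 4.0]
vis = [0.5, 0.5, 0.5, 0.5]
x = [ir, vis]
x_perturbed = [[v + 1.0 for v in ir], vis]  # perturb only the infrared channel

dw_kernels = [[1.0, 2.0], [0.5, 0.5]]
full_weights = [
    [[1.0, 2.0], [0.5, 0.5]],    # output channel 0 reads both inputs
    [[0.25, 0.25], [1.0, 0.0]],  # output channel 1 reads both inputs
]

dw_changed = changed_channels(
    depthwise_conv1d(x, dw_kernels), depthwise_conv1d(x_perturbed, dw_kernels)
)
full_changed = changed_channels(
    standard_conv1d(x, full_weights), standard_conv1d(x_perturbed, full_weights)
)
print("depthwise outputs affected:", dw_changed)
print("standard outputs affected:", full_changed)
```

In this toy setting the infrared perturbation reaches only its own output channel under the depthwise operator, which is the kind of lost interaction a compensation mechanism would need to restore.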
3. Materials and Methods
3.1. Overall Network Architecture
3.2. Lightweight Convolution Module Design
3.3. Loss Function
4. Experiment
4.1. Experimental Setup
4.2. Ablation Study
4.2.1. Module Ablation
4.2.2. Training Strategy Ablation
- (A) Staged: the proposed progressive scheme, in which the SSIM weight λ is set to 0 during epochs 0–19, raised to 10 during epochs 20–29, and raised to 100 during epochs 30–59, following a reconstruction-first, structure-aware optimization paradigm;
- (B) Constant-Low: λ = 10 throughout training;
- (C) Constant-High: λ = 100 for all epochs;
- (D) Linear Annealing: λ increases linearly from 0 to 100;
- (E) Cosine Annealing: λ follows a cosine progression from 0 to 100.

All strategies employ the same DenseFuse architecture enhanced with GhostModules, so that performance differences can be attributed solely to the weight scheduling mechanism.
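The five schedules above can be sketched as plain functions of the epoch index. Several details here are assumptions rather than statements from the paper: the 60-epoch horizon is taken from the Staged scheme and applied to all strategies, the exact linear and cosine formulas are one common choice, and `total_loss` assumes a DenseFuse-style objective in which λ weights an SSIM term added to a pixel (MSE) term.

```python
import math

TOTAL_EPOCHS = 60   # assumed: epochs 0-59, as in the Staged scheme
LAMBDA_MAX = 100.0  # final SSIM weight stated for the annealed schedules


def staged(epoch):
    """(A) Reconstruction first, then progressively stronger SSIM weighting."""
    if epoch < 20:
        return 0.0
    if epoch < 30:
        return 10.0
    return 100.0


def constant_low(epoch):
    """(B) Fixed weight of 10."""
    return 10.0


def constant_high(epoch):
    """(C) Fixed weight of 100."""
    return 100.0


def linear_annealing(epoch, total=TOTAL_EPOCHS):
    """(D) Assumed form: straight line from 0 at epoch 0 to 100 at the last epoch."""
    return LAMBDA_MAX * epoch / (total - 1)


def cosine_annealing(epoch, total=TOTAL_EPOCHS):
    """(E) Assumed form: half-cosine ramp from 0 to 100."""
    return LAMBDA_MAX * 0.5 * (1.0 - math.cos(math.pi * epoch / (total - 1)))


SCHEDULES = {
    "staged": staged,
    "constant_low": constant_low,
    "constant_high": constant_high,
    "linear": linear_annealing,
    "cosine": cosine_annealing,
}


def total_loss(pixel_loss, ssim_value, lam):
    """Assumed objective: pixel loss plus lam times (1 - SSIM)."""
    return pixel_loss + lam * (1.0 - ssim_value)


if __name__ == "__main__":
    for name, schedule in SCHEDULES.items():
        samples = [round(schedule(e), 2) for e in (0, 19, 20, 29, 30, 59)]
        print(f"{name:14s}", samples)
```

With λ = 0 the assumed objective reduces to the pixel term alone, which matches the reconstruction-first reading of the first stage.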
4.3. Comparative Experiment on Different Lightweight Convolution Strategies
4.4. Comparative Experiment with Advanced Lightweight Methods
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| AG | Average Gradient |
| CNN | Convolutional Neural Network |
| DW | Depthwise Separable Convolution |
| EN | Information Entropy |
| LWC | Lightweight Convolution |
| MI | Mutual Information |
| MSE | Mean Squared Error |
| SSIM | Structural Similarity |
| VIF | Visual Information Fidelity |
References
- Li, H.; Wu, X.-J. DenseFuse: A Fusion Approach to Infrared and Visible Images. IEEE Trans. Image Process. 2019, 28, 2614–2624.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. arXiv 2019, arXiv:1911.11907.
- Wang, J.; Xi, X.; Li, D.; Li, F. FusionGRAM: An Infrared and Visible Image Fusion Framework Based on Gradient Residual and Attention Mechanism. IEEE Trans. Instrum. Meas. 2023, 72, 5005412.
- Li, H.; Xu, T.; Wu, X.-J.; Lu, J.; Kittler, J. LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 11040–11052.
- Li, H.; Wu, X.-J.; Durrani, T. NestFuse: An Infrared and Visible Image Fusion Architecture Based on Nest Connection and Spatial/Channel Attention Models. IEEE Trans. Instrum. Meas. 2020, 69, 9645–9656.
- Ma, J.; Xu, H.; Jiang, J.; Mei, X.; Zhang, X.-P. DDcGAN: A Dual-Discriminator Conditional Generative Adversarial Network for Multi-Resolution Image Fusion. IEEE Trans. Image Process. 2020, 29, 4980–4995.
- Zhang, H.; Yuan, J.; Tian, X.; Ma, J. GAN-FM: Infrared and Visible Image Fusion Using GAN with Full-Scale Skip Connection and Dual Markovian Discriminators. IEEE Trans. Comput. Imaging 2021, 7, 1134–1147.
- Valanarasu, J.M.J.; Oza, P.; Hacihaliloglu, I.; Patel, V.M. Image Fusion Transformer. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 3566–3570.
- Ma, J.; Chen, C.; Li, C.; Huang, J. Infrared and Visible Image Fusion via Gradient Transfer and Total Variation Minimization. Inf. Fusion 2016, 31, 100–109.
- Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2017; pp. 2261–2269.
- Tang, L.; Yuan, J.; Ma, J. Image Fusion in the Loop of High-Level Vision Tasks: A Semantic-Aware Real-Time Infrared and Visible Image Fusion Network. Inf. Fusion 2022, 82, 28–42.
- Li, G.; Qian, X.; Qu, X. SOSMaskFuse: An Infrared and Visible Image Fusion Architecture Based on Salient Object Segmentation Mask. IEEE Trans. Intell. Transp. Syst. 2023, 24, 10118–10137.
- Zhao, F.; Zhao, W.; Yao, L.; Liu, Y. Self-Supervised Feature Adaptation for Infrared and Visible Image Fusion. Inf. Fusion 2021, 76, 189–203.
- Hu, T.; Nan, X.; Zhou, Q.; Lin, R.; Shen, Y. A Model-Based Infrared and Visible Image Fusion Network with Cooperative Optimization. Expert Syst. Appl. 2025, 263, 125639.
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size. arXiv 2016, arXiv:1602.07360.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2018; pp. 4510–4520.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2019; pp. 1314–1324.
- Tang, Y.; Han, K.; Guo, J.; Xu, C.; Xu, C.; Wang, Y. GhostNetV2: Enhance Cheap Operation with Long-Range Attention. arXiv 2022, arXiv:2201.03297.
- Liu, Z.; Hao, Z.; Han, K.; Tang, Y.; Wang, Y. GhostNetV3: Exploring the Training Strategies for Compact Models. arXiv 2024, arXiv:2404.11202.
- Cheng, C.; Wu, X.-J.; Xu, T.; Chen, G. UNIFusion: A Lightweight Unified Image Fusion Network. IEEE Trans. Instrum. Meas. 2021, 70, 5016614.
- Chi, H.; Luo, D.; Wang, S. LMDFusion: A Lightweight Infrared and Visible Image Fusion Network for Substation Equipment Based on Mask and Residual Dense Connection. Infrared Phys. Technol. 2024, 138, 105218.
- Yi, X.; Zhang, Y.; Xiang, X.; Yan, Q.; Xu, H.; Ma, J. LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2025; pp. 14559–14568.
- Zhao, H.; Jiang, T.; Li, X.; Jin, J. SFDFuse: A lightweight spatial-frequency fusion network for infrared-visible images. Pattern Recognit. 2026, 179, 113634.
- Wang, Y.; Liu, J.; Wang, J.; Yang, L.; Dong, B.; Li, Z. HaarFuse: A dual-branch infrared and visible light image fusion network based on Haar wavelet transform. Pattern Recognit. 2025, 164, 111594.
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. arXiv 2014, arXiv:1405.0312.
- Li, A.; Yin, G.; Wang, Z.; Liang, J.; Wang, F.; Bai, X.; Liu, Z. RCAFusion: Cross Rubik Cube Attention Network for Multi-Modal Image Fusion of Intelligent Vehicles. In Proceedings of the 2024 IEEE Intelligent Vehicles Symposium (IV), Jeju Island, Republic of Korea, 2–5 June 2024; pp. 2848–2854.
- Wu, G.; Liu, H.; Fu, H.; Peng, Y.; Liu, J.; Fan, X.; Liu, R. Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2025; pp. 17882–17891.










| Model (TNO) | AG (↑) | EN (↑) | VIF (↑) | Qabf (↑) | MI (↑) | Param/M (↓) | Avg_Time/s (↓) |
|---|---|---|---|---|---|---|---|
| M1 | 3.3433 | 6.4986 | 0.6949 | 0.3613 | 2.2539 | 0.0194 | 0.0169 |
| M2 | 3.7295 | 6.5443 | 0.6694 | 0.3654 | 2.0457 | 0.0195 | 0.0110 |
| M3 | 3.7295 | 6.0730 | 0.5088 | 0.2826 | 1.6519 | 0.0211 | 0.0087 |
| M4 | 4.7145 | 7.0764 | 0.7895 | 0.3945 | 1.9464 | 0.0212 | 0.0112 |

| Model (MSRS) | AG (↑) | EN (↑) | VIF (↑) | Qabf (↑) | MI (↑) | Param/M (↓) | Avg_Time/s (↓) |
|---|---|---|---|---|---|---|---|
| DenseFuse | 2.1084 | 5.9797 | 0.7199 | 0.4061 | 2.7603 | 0.0740 | 0.0026 |
| DW-DenseFuse | 2.2380 | 6.0436 | 0.7466 | 0.4647 | 2.8583 | 0.0150 | 0.0028 |
| Ghost-DenseFuse | 4.7007 | 6.7492 | 0.6547 | 0.3509 | 2.1305 | 0.0550 | 0.0044 |
| Star-DenseFuse | 2.0641 | 5.9446 | 0.7007 | 0.3804 | 2.7000 | 0.1491 | 0.0080 |
| LWC-DenseFuse | 5.1644 | 7.0520 | 0.8530 | 0.3542 | 2.0939 | 0.0212 | 0.0106 |

| Model (LLVIP) | AG (↑) | EN (↑) | VIF (↑) | Qabf (↑) | MI (↑) | Param/M (↓) | Avg_Time/s (↓) |
|---|---|---|---|---|---|---|---|
| DenseFuse | 3.4885 | 6.8486 | 0.7405 | 0.4190 | 2.5696 | 0.0740 | 0.0028 |
| DW-DenseFuse | 3.6312 | 6.9010 | 0.7589 | 0.4652 | 2.6445 | 0.0150 | 0.0023 |
| Ghost-DenseFuse | 5.4603 | 7.1307 | 0.7440 | 0.4750 | 2.1661 | 0.0550 | 0.0045 |
| Star-DenseFuse | 3.0758 | 6.8457 | 0.7280 | 0.3068 | 2.7117 | 0.1491 | 0.0084 |
| LWC-DenseFuse | 5.2757 | 7.3922 | 0.8721 | 0.5446 | 2.5154 | 0.0212 | 0.0092 |

| Model (MSRS) | AG (↑) | EN (↑) | Qabf (↑) | MI (↑) | Param/M (↓) | FLOPs/G (↓) |
|---|---|---|---|---|---|---|
| UNIFusion | 4.7855 | 5.8130 | 0.4769 | 3.6204 | 0.3900 | 70.214 |
| RCAFusion | 5.5645 | 6.942 | 0.6733 | 2.6308 | 0.1823 | 59.382 |
| SAGE | 4.7569 | 6.900 | 0.4001 | 2.4063 | 0.1356 | 20.337 |
| LUT-Fuse | 4.9828 | 6.920 | 0.5650 | 2.7150 | 0.0730 | 2.449 |
| ours | 5.1644 | 7.0520 | 0.3542 | 2.0939 | 0.0212 | 8.842 |

| Model (LLVIP) | AG (↑) | EN (↑) | Qabf (↑) | MI (↑) | Param/M (↓) | FLOPs/G (↓) |
|---|---|---|---|---|---|---|
| UNIFusion | 7.0098 | 6.5163 | 0.4718 | 3.0997 | 0.3900 | 386.543 |
| RCAFusion | 3.8596 | 6.7008 | 0.5851 | 4.6500 | 0.1823 | 253.362 |
| SAGE | 3.1630 | 6.0038 | 0.4363 | 3.2402 | 0.1356 | 86.770 |
| LUT-Fuse | 3.8150 | 6.5649 | 0.4999 | 3.6483 | 0.0730 | 10.449 |
| ours | 5.2757 | 7.3922 | 0.5446 | 2.5154 | 0.0212 | 37.728 |
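
As a reading aid for the comparison with advanced lightweight methods, the efficiency columns can be turned into ratios relative to the proposed model. The snippet below is simple arithmetic on the reported Param/M and FLOPs/G values; the dictionary layout and the function name `relative_to_ours` are illustrative choices, not part of the paper.

```python
# (Param/M, FLOPs/G) as reported in the comparison tables.
MSRS = {
    "UNIFusion": (0.3900, 70.214),
    "RCAFusion": (0.1823, 59.382),
    "SAGE": (0.1356, 20.337),
    "LUT-Fuse": (0.0730, 2.449),
    "ours": (0.0212, 8.842),
}
LLVIP = {
    "UNIFusion": (0.3900, 386.543),
    "RCAFusion": (0.1823, 253.362),
    "SAGE": (0.1356, 86.770),
    "LUT-Fuse": (0.0730, 10.449),
    "ours": (0.0212, 37.728),
}


def relative_to_ours(table):
    """Return {model: (param_ratio, flops_ratio)}; a ratio above 1 means the
    competitor is heavier than the proposed model on that axis."""
    ours_params, ours_flops = table["ours"]
    return {
        name: (params / ours_params, flops / ours_flops)
        for name, (params, flops) in table.items()
        if name != "ours"
    }


if __name__ == "__main__":
    for dataset, table in (("MSRS", MSRS), ("LLVIP", LLVIP)):
        print(dataset)
        for name, (p_ratio, f_ratio) in relative_to_ours(table).items():
            print(f"  {name:10s} params x{p_ratio:.2f}  FLOPs x{f_ratio:.2f}")
```

Every parameter ratio comes out above 1, while the FLOPs ratio for LUT-Fuse falls below 1 on both datasets, so the proposed model is the smallest in parameters but not the cheapest in FLOPs.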
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Li, R.; Wang, H.; Wu, Q.; Liang, C.; Li, H.; Wang, J. Infrared and Visible Image Fusion Network Based on Self-Compensating Lightweight Convolution. Sensors 2026, 26, 3748. https://doi.org/10.3390/s26123748

