Infrared/Visible Light Fire Image Fusion Method Based on Generative Adversarial Network of Wavelet-Guided Pooling Vision Transformer
Abstract
1. Introduction
- (1) The VTW-GAN model adopts a GAN as the basic network and embeds an improved Transformer module in the generator, addressing the poor fine-grained quality of fused forest-fire images. The improved Transformer module combines the efficient global representation capability of the Transformer with the detail enhancement of wavelet-guided pooling, so the fusion model retains more detailed texture information while learning the global fusion relationship, laying the foundation for accurate forest-fire detection (a minimal sketch of wavelet-guided pooling is given after this list).
- (2) The VTW-GAN model uses transfer learning to mitigate the shortage of forest-fire image data: a model pre-trained on the KAIST dataset is fine-tuned on the Corsican Fire dataset, which improves forest-fire fusion performance under limited data (see the fine-tuning sketch after this list).
- (3) Experiments on the KAIST and Corsican Fire datasets show that VTW-GAN achieves excellent fusion performance on both pedestrian street images and forest-fire images, reflecting the model's strong generalization ability.
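The wavelet-guided pooling referred to in contribution (1) follows the idea of Yoo et al. [17]: ordinary pooling is replaced by a Haar wavelet decomposition whose low-frequency band is passed down the network, while the high-frequency bands are kept and re-injected during unpooling. The paper's exact module is not reproduced here; the following is a minimal PyTorch sketch of such a pooling/unpooling pair (class names and tensor layout are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def haar_filters(channels: int) -> torch.Tensor:
    """Fixed 2x2 Haar analysis filters (LL, LH, HL, HH), repeated per channel."""
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[-0.5, -0.5], [0.5, 0.5]])
    hl = torch.tensor([[-0.5, 0.5], [-0.5, 0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    bank = torch.stack([ll, lh, hl, hh])                 # (4, 2, 2)
    return bank.repeat(channels, 1, 1).unsqueeze(1)      # (4*C, 1, 2, 2), depthwise

class WaveletPool2d(nn.Module):
    """Downsampling by Haar DWT: the LL band replaces the pooled map, while
    LH/HL/HH are returned so that unpooling can restore high-frequency texture."""
    def __init__(self, channels: int):
        super().__init__()
        self.channels = channels
        self.register_buffer("weight", haar_filters(channels))

    def forward(self, x):
        out = F.conv2d(x, self.weight, stride=2, groups=self.channels)
        b, _, h, w = out.shape
        out = out.view(b, self.channels, 4, h, w)
        return out[:, :, 0], out[:, :, 1:]               # LL, (LH, HL, HH)

class WaveletUnpool2d(nn.Module):
    """Inverse Haar transform: recombines LL with the stored high-frequency bands."""
    def __init__(self, channels: int):
        super().__init__()
        self.channels = channels
        self.register_buffer("weight", haar_filters(channels))

    def forward(self, ll, highs):
        coeffs = torch.cat([ll.unsqueeze(2), highs], dim=2)       # (B, C, 4, h, w)
        b, c, _, h, w = coeffs.shape
        return F.conv_transpose2d(coeffs.view(b, c * 4, h, w),
                                  self.weight, stride=2, groups=self.channels)

# Round trip on a feature map: pooling followed by unpooling is numerically lossless.
x = torch.randn(1, 8, 64, 64)
pool, unpool = WaveletPool2d(8), WaveletUnpool2d(8)
ll, highs = pool(x)
assert torch.allclose(unpool(ll, highs), x, atol=1e-5)
```

Because the Haar filters form an orthonormal basis, the pooling/unpooling round trip is lossless, which is what allows fine texture to survive downsampling; how the bands are combined with the Transformer blocks inside the generator is described in Section 2.1 and is not shown here.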
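For the transfer-learning step in contribution (2), a model pre-trained on KAIST is fine-tuned on the much smaller Corsican Fire dataset. The sketch below only illustrates that workflow: the stand-in Generator, checkpoint path, learning rate, and the decision to freeze the encoder are assumptions for illustration, not the paper's actual settings.

```python
import torch
import torch.nn as nn
from pathlib import Path
from torch.optim import Adam

class Generator(nn.Module):
    """Tiny stand-in for the VTW-GAN generator described in Section 2.1."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(2, 16, 3, padding=1)   # placeholder layers only
        self.decoder = nn.Conv2d(16, 1, 3, padding=1)

    def forward(self, ir, vis):
        return self.decoder(torch.relu(self.encoder(torch.cat([ir, vis], dim=1))))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = Generator().to(device)

# 1. Load the weights obtained by pre-training on the KAIST dataset
#    (the checkpoint path is illustrative).
ckpt = Path("checkpoints/kaist_pretrained.pth")
if ckpt.exists():
    generator.load_state_dict(torch.load(ckpt, map_location=device))

# 2. Optionally freeze the generic early layers and fine-tune the rest on the
#    Corsican Fire dataset with a reduced learning rate (illustrative choice).
for p in generator.encoder.parameters():
    p.requires_grad = False
optimizer = Adam((p for p in generator.parameters() if p.requires_grad), lr=1e-5)

# 3. Fine-tuning loop over Corsican infrared/visible pairs (data loading and the
#    paper's loss terms are omitted here).
# for ir, vis in corsican_loader:
#     fused = generator(ir.to(device), vis.to(device))
#     loss = fusion_loss(fused, ir, vis)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```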
2. Materials and Methods
2.1. Generator
2.2. Discriminator
2.3. Loss Function
2.4. Transfer Learning
3. Experiment and Discussion
3.1. Evaluation Metrics
3.2. Comparison and Analysis of Experimental Results on the KAIST Dataset
3.3. Comparison and Analysis of Experimental Results on the Corsican Fire Dataset
3.4. Ablation Study
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Li, S.; Kang, X.; Fang, L.; Hu, J.; Yin, H. Pixel-level image fusion: A survey of the state of the art. Inf. Fusion 2017, 33, 100–112. [Google Scholar] [CrossRef]
- Ma, J.; Ma, Y.; Li, C. Infrared and Visible Image Fusion Methods and Applications: A Survey. Inf. Fusion 2019, 45, 153–178. [Google Scholar] [CrossRef]
- Yin, H.; Xiao, J. Laplacian pyramid generative adversarial network for infrared and visible image fusion. IEEE Signal Process. Lett. 2022, 29, 1988–1992. [Google Scholar] [CrossRef]
- Mallat, S.G. A Theory for Multiresolution Signal Decomposition—The Wavelet Representation. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 674–693. [Google Scholar] [CrossRef]
- Li, L.; Ma, H. Pulse coupled neural network-based multimodal medical image fusion via guided filtering and WSEML in NSCT domain. Entropy 2021, 23, 591. [Google Scholar] [CrossRef] [PubMed]
- Liu, Y.; Liu, S.P.; Wang, Z.F. A General Framework for Image Fusion Based on Multi-Scale Transform and Sparse Representation. Inf. Fusion 2015, 24, 147–164. [Google Scholar] [CrossRef]
- Liu, Y.; Chen, X.; Wang, Z.; Wang, Z.J.; Ward, R.K.; Wang, X. Deep learning for pixel-level image fusion: Recent advances and future prospects. Inf. Fusion 2018, 42, 158–173. [Google Scholar] [CrossRef]
- Pang, S.; Huo, H.; Yang, X.; Li, J.; Liu, X. Infrared and visible image fusion based on double fluid pyramids and multi-scale gradient residual block. Infrared Phys. Technol. 2023, 131, 104702. [Google Scholar] [CrossRef]
- Li, G.; Qian, X.; Qu, X. SOSMaskFuse: An infrared and visible image fusion architecture based on salient object segmentation mask. IEEE Trans. Intell. Transp. Syst. 2023, 24, 10118–10137. [Google Scholar] [CrossRef]
- Ding, Z.; Li, H.; Zhou, D.; Liu, Y.; Hou, R. A robust infrared and visible image fusion framework via multi-receptive-field attention and color visual perception. Appl. Intell. 2023, 53, 8114–8132. [Google Scholar] [CrossRef]
- Jin, Q.; Tan, S.; Zhang, G.; Yang, Z.; Wen, Y.; Xiao, H.; Wu, X. Visible and Infrared Image Fusion of Forest Fire Scenes Based on Generative Adversarial Networks with Multi-Classification and Multi-Level Constraints. Forests 2023, 14, 1952. [Google Scholar] [CrossRef]
- Rao, Y.; Wu, D.; Han, M.; Wang, T.; Yang, Y.; Lei, T.; Zhou, C.; Bai, H.; Xing, L. AT-GAN: A generative adversarial network with attention and transition for infrared and visible image fusion. Inf. Fusion 2023, 92, 336–349. [Google Scholar] [CrossRef]
- Huang, S.; Song, Z.; Yang, Y.; Wan, W.; Kong, X. MAGAN: Multi-Attention Generative Adversarial Network for Infrared and Visible Image Fusion. IEEE Trans. Instrum. Meas. 2023, 72, 1–14. [Google Scholar]
- Wang, Z.; Chen, Y.; Shao, W.; Li, H.; Zhang, L. SwinFuse: A residual swin transformer fusion network for infrared and visible images. IEEE Trans. Instrum. Meas. 2022, 71, 1–12. [Google Scholar] [CrossRef]
- Tang, W.; He, F.; Liu, Y.; Duan, Y. MATR: Multimodal medical image fusion via multiscale adaptive transformer. IEEE Trans. Image Process. 2022, 31, 5134–5149. [Google Scholar] [CrossRef]
- Rao, D.; Xu, T.; Wu, X.J. TGFuse: An infrared and visible image fusion approach based on transformer and generative adversarial network [Early Access]. IEEE Trans. Image Process. 2023. [Google Scholar] [CrossRef]
- Yoo, J.; Uh, Y.; Chun, S.; Kang, B.; Ha, J. Photorealistic style transfer via wavelet transforms. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Hwang, J.; Yu, C.; Shin, Y. SAR-to-optical image translation using SSIM and perceptual loss based cycle-consistent GAN. In Proceedings of the 2020 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 21–23 October 2020. [Google Scholar]
- Hwang, S.; Park, J.; Kim, N.; Choi, Y.; Kweon, I.S. Multispectral pedestrian detection: Benchmark dataset and baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Toulouse, T.; Rossi, L.; Campana, A.; Celik, T.; Akhloufi, M. Computer vision for wildfire research: An evolving image dataset for processing and analysis. Fire Saf. J. 2017, 92, 188–194. [Google Scholar] [CrossRef]
- Li, H.; Wu, X.J. DenseFuse: A fusion approach to infrared and visible images. IEEE Trans. Image Process. 2018, 28, 2614–2623. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Y.; Liu, Y.; Sun, P.; Yan, H.; Zhao, X.; Zhang, L. IFCNN: A general image fusion framework based on convolutional neural network. Inf. Fusion 2020, 54, 99–118. [Google Scholar] [CrossRef]
- Xu, H.; Ma, J.; Jiang, J.; Guo, X.; Ling, H. U2Fusion: A unified unsupervised image fusion network. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 502–518. [Google Scholar] [CrossRef]
- Ma, J.; Tang, L.; Fan, F.; Huang, J.; Mei, X.; Ma, Y. SwinFusion: Cross-domain long-range learning for general image fusion via swin transformer. IEEE/CAA J. Autom. Sin. 2022, 9, 1200–1217. [Google Scholar] [CrossRef]
Objective evaluation of different fusion methods on the KAIST dataset (↑: higher is better):

| Method | EN ↑ | SF ↑ | AG ↑ | SD ↑ | MI ↑ | Qabf ↑ |
|---|---|---|---|---|---|---|
| DenseFuse [21] | 6.949 | 8.098 | 2.941 | 47.442 | 2.538 | 0.476 |
| IFCNN [22] | 6.867 | 9.604 | 3.464 | 47.118 | 3.086 | 0.622 |
| U2Fusion [23] | 6.699 | 10.028 | 3.536 | 40.493 | 3.174 | 0.536 |
| SwinFusion [24] | 6.700 | 9.345 | 3.193 | 42.048 | 3.193 | 0.544 |
| TGFuse [16] | 6.987 | 10.485 | 3.586 | 60.625 | 3.541 | 0.592 |
| VTW-GAN | 7.187 | 10.517 | 3.610 | 61.563 | 3.962 | 0.687 |
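The metric columns in these tables are the usual objective fusion measures: information entropy (EN), spatial frequency (SF), average gradient (AG), standard deviation (SD), mutual information (MI), and the edge-information preservation index Qabf; the arrows indicate that higher values are better. As a rough illustration (normalizations vary slightly between papers), three of them can be computed as follows:

```python
import numpy as np

def entropy(img: np.ndarray) -> float:
    """EN: Shannon entropy of the 8-bit intensity histogram."""
    hist, _ = np.histogram(img.astype(np.uint8), bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def spatial_frequency(img: np.ndarray) -> float:
    """SF: combines row-wise and column-wise first-difference energy."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))   # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))   # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))

def average_gradient(img: np.ndarray) -> float:
    """AG: mean magnitude of local intensity gradients (forward differences)."""
    img = img.astype(np.float64)
    dx = np.diff(img, axis=1)[:-1, :]
    dy = np.diff(img, axis=0)[:, :-1]
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))

# Example with a random "fused" image; real use would load the fusion result.
fused = np.random.randint(0, 256, (256, 256))
print(entropy(fused), spatial_frequency(fused), average_gradient(fused))
```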
Comparison between the fusion model without transfer learning and the transfer-learning model on forest-fire images (↑: higher is better):

| Framework | EN ↑ | SF ↑ | AG ↑ | SD ↑ | MI ↑ | Qabf ↑ |
|---|---|---|---|---|---|---|
| Fusion Model | 6.689 | 9.842 | 3.604 | 38.686 | 3.749 | 0.586 |
| Transfer Learning Model | 6.867 | 10.894 | 3.827 | 43.223 | 4.085 | 0.660 |
Objective evaluation of different fusion methods on the Corsican Fire dataset (↑: higher is better):

| Method | EN ↑ | SF ↑ | AG ↑ | SD ↑ | MI ↑ | Qabf ↑ |
|---|---|---|---|---|---|---|
| DenseFuse [21] | 6.548 | 6.375 | 2.517 | 36.300 | 3.280 | 0.367 |
| IFCNN [22] | 6.599 | 10.103 | 3.633 | 40.824 | 3.513 | 0.603 |
| U2Fusion [23] | 5.969 | 9.598 | 3.371 | 33.555 | 2.743 | 0.497 |
| SwinFusion [24] | 6.467 | 10.603 | 3.501 | 41.164 | 3.115 | 0.582 |
| TGFuse [16] | 6.765 | 10.691 | 3.700 | 41.227 | 3.920 | 0.642 |
| VTW-GAN | 6.867 | 10.894 | 3.827 | 43.223 | 4.085 | 0.660 |
Average runtime of each method and the corresponding frame rate (FPS is the reciprocal of the mean runtime):

| Method | DenseFuse [21] | IFCNN [22] | U2Fusion [23] | SwinFusion [24] | TGFuse [16] | VTW-GAN |
|---|---|---|---|---|---|---|
| Mean (s) | 0.690 | 2.974 | 1.500 | 0.765 | 0.042 | 0.150 |
| STD (s) | 0.025 | 0.243 | 0.007 | 0.052 | 0.026 | 0.027 |
| FPS | 1.449 | 0.336 | 0.667 | 1.307 | 23.810 | 6.667 |
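The runtime rows above report the mean and standard deviation of the per-image fusion time together with the resulting frame rate. A simple way to collect such numbers is sketched below; the `fuse` callable is a placeholder for any of the compared methods, and GPU-based methods would additionally need a torch.cuda.synchronize() before each clock read:

```python
import time
import numpy as np

def benchmark(fuse, pairs, warmup=3):
    """Time fuse(ir, vis) over a list of image pairs; returns (mean, std, fps)."""
    for ir, vis in pairs[:warmup]:          # warm-up runs are excluded
        fuse(ir, vis)
    times = []
    for ir, vis in pairs:
        t0 = time.perf_counter()
        fuse(ir, vis)
        times.append(time.perf_counter() - t0)
    times = np.asarray(times)
    return times.mean(), times.std(), 1.0 / times.mean()
```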
Ablation study results (√ indicates which components are enabled; ↑: higher is better):

| Wavelet-Guided Pooling and Unpooling | Channel | Spatial | EN ↑ | SF ↑ | AG ↑ | SD ↑ | MI ↑ | Qabf ↑ |
|---|---|---|---|---|---|---|---|---|
|  | √ |  | 6.803 | 10.719 | 3.752 | 41.801 | 3.536 | 0.643 |
|  |  | √ | 6.837 | 10.809 | 3.812 | 42.834 | 3.825 | 0.651 |
|  | √ | √ | 6.832 | 10.714 | 3.786 | 42.572 | 3.745 | 0.634 |