GLFuse: A Global and Local Four-Branch Feature Extraction Network for Infrared and Visible Image Fusion
Abstract
1. Introduction
- A four-branch Transformer-CNN network is constructed for infrared and visible image fusion, and it supports end-to-end training and testing. The network takes the original infrared and visible images together with their preprocessed feature maps as input, then applies global and local feature extraction followed by fusion to improve the quality of the fused image.
- An Attention-based Feature Selection and Fusion Module (ASFM) is developed so that both unique and common features from different modalities can be integrated through an addition–multiplication strategy. This approach increases the richness of information in the fused image.
- A Dual Attention Fusion Module (DAFM) is proposed that combines channel and spatial attention mechanisms. It selectively filters and fuses global and local features to reduce feature redundancy in the fused image.
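The two fusion ideas named in the contributions above can be illustrated with a minimal NumPy sketch. The contribution list does not give the modules' equations, so everything here is an assumption made for illustration: the function names (`add_mul_fuse`, `channel_attention`, `spatial_attention`, `dual_attention_fuse`) are hypothetical, the attention weights are computed from plain pooling with no learned layers, and the way the sum and the product are combined is only one plausible reading of an "addition–multiplication strategy". It is not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, used here to squash pooled statistics into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def add_mul_fuse(feat_ir, feat_vis):
    """Hypothetical addition-multiplication fusion of two (C, H, W) feature maps.

    Assumption: the sum keeps responses present in either modality, the
    element-wise product emphasizes responses present in both, and the two
    terms are simply added together.
    """
    added = feat_ir + feat_vis
    multiplied = feat_ir * feat_vis
    return added + multiplied


def channel_attention(feat):
    """Reweight each channel by a gate derived from its global average.

    Simplification: no learned layers, just average pooling plus a sigmoid.
    """
    weights = sigmoid(feat.mean(axis=(1, 2), keepdims=True))  # shape (C, 1, 1)
    return feat * weights


def spatial_attention(feat):
    """Reweight each spatial position using channel-wise mean and max pooling."""
    pooled = 0.5 * (feat.mean(axis=0, keepdims=True) + feat.max(axis=0, keepdims=True))
    return feat * sigmoid(pooled)  # (1, H, W) gate broadcast over channels


def dual_attention_fuse(global_feat, local_feat):
    """Concatenate global and local features, then apply channel and spatial gates."""
    stacked = np.concatenate([global_feat, local_feat], axis=0)  # shape (2C, H, W)
    return spatial_attention(channel_attention(stacked))
```

In this sketch, two feature maps of shape `(C, H, W)` passed to `dual_attention_fuse` give an output of shape `(2C, H, W)`. A trained network would normally replace the fixed pooling gates with learned convolutional or fully connected layers.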
2. Related Work
3. Proposed Method
3.1. Overall Framework
3.2. Four-Branch Feature Extraction
3.2.1. Global Feature Extraction Branches
3.2.2. Local Feature Extraction Branches
3.3. Feature Fusion Module
3.3.1. Attention-Based Feature Selection Fusion Module (ASFM)
3.3.2. Dual Attention Fusion Module (DAFM)
3.4. Loss Function
3.4.1. Content Loss
3.4.2. Perceptual Loss
4. Experiments
4.1. Experimental Settings
4.1.1. Training and Test Datasets
4.1.2. Implementation Details
4.2. Results Analysis
4.2.1. Subjective Evaluation
4.2.2. Objective Evaluation
4.3. Ablation Studies
4.3.1. ASFM
4.3.2. DAFM
4.3.3. Perceptual Loss
4.3.4. Gaussian Smoothing Preprocessing
4.4. Object Detection Performance
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Split | Train | Test | Test | Test |
---|---|---|---|---|
Dataset | MSRS | MSRS | TNO | RoadScene |
Number of image pairs | 16,245 | 361 | 42 | 50 |
Datasets | Methods | EN | SF | MI | SD | VIF | SCD | PSNR | Qabf |
---|---|---|---|---|---|---|---|---|---|
TNO | DenseFuse | 6.8193 | 8.9854 | 2.3019 | 34.8250 | 0.6584 | 1.7838 | 62.5774 | 0.4463 |
TNO | FusionGAN | 6.5580 | 6.2753 | 2.3352 | 30.6632 | 0.4220 | 1.3793 | 60.9794 | 0.2344 |
TNO | IFCNN | 6.8539 | 12.2266 | 2.0572 | 37.0808 | 0.6286 | 1.7777 | 63.4661 | 0.4813 |
TNO | U2Fusion | 6.9967 | 11.8638 | 2.0102 | 43.5316 | 0.6172 | 1.7839 | 62.8082 | 0.4267 |
TNO | SDNet | 6.6948 | 11.6428 | 2.2606 | 33.6693 | 0.5779 | 1.5590 | 62.1501 | 0.4298 |
TNO | RFN-Nest | 6.9632 | 5.8748 | 2.1184 | 36.8970 | 0.5593 | 1.7843 | 62.1930 | 0.3346 |
TNO | FLFuse | 6.3617 | 6.6319 | 2.1863 | 25.7225 | 0.6058 | 1.5817 | 63.8388 | 0.3961 |
TNO | SeAFusion | 7.1335 | 12.2525 | 2.8382 | 44.2436 | 0.7042 | 1.7232 | 61.3918 | 0.4879 |
TNO | DATFuse | 6.4531 | 9.6056 | 2.7322 | 27.5744 | 0.6830 | 1.4957 | 61.7734 | 0.4997 |
TNO | MetaFusion | 6.9092 | 12.8126 | 2.3031 | 41.2381 | 0.5878 | 1.6224 | 60.4842 | 0.4558 |
TNO | CoCoNet | 6.9626 | 15.3051 | 2.1150 | 43.2736 | 0.6527 | 1.7154 | 59.9634 | 0.2988 |
TNO | MMIF-DDFM | 6.2556 | 10.8785 | 2.6921 | 36.5776 | 0.6625 | 1.6233 | 61.2347 | 0.3774 |
TNO | TarDAL | 6.8405 | 7.9591 | 2.6093 | 45.2115 | 0.5388 | 1.5485 | 62.3043 | 0.3017 |
TNO | Ours | 7.1586 | 14.0076 | 2.7605 | 46.6204 | 0.7397 | 1.8081 | 62.7759 | 0.5293 |
RoadScene | DenseFuse | 6.8528 | 9.7177 | 2.6799 | 33.2557 | 0.5256 | 1.6137 | 62.4043 | 0.4033 |
RoadScene | FusionGAN | 6.9840 | 8.7165 | 2.7843 | 40.6827 | 0.3852 | 1.3024 | 59.4507 | 0.2670 |
RoadScene | IFCNN | 7.1978 | 14.8602 | 3.0816 | 33.0118 | 0.6445 | 1.6774 | 62.5812 | 0.5476 |
RoadScene | U2Fusion | 6.6371 | 7.1690 | 2.8250 | 28.5052 | 0.5517 | 1.4136 | 64.5245 | 0.3714 |
RoadScene | SDNet | 7.3058 | 15.3434 | 3.2655 | 46.0864 | 0.6269 | 1.6590 | 62.1423 | 0.5002 |
RoadScene | RFN-Nest | 7.2215 | 8.2024 | 2.7284 | 46.9421 | 0.5404 | 1.6633 | 60.4627 | 0.3294 |
RoadScene | FLFuse | 7.0051 | 11.8355 | 3.0883 | 37.5425 | 0.6722 | 1.6248 | 63.0037 | 0.5522 |
RoadScene | SeAFusion | 7.3509 | 15.3495 | 3.2368 | 50.1824 | 0.6708 | 1.6821 | 61.6643 | 0.5262 |
RoadScene | DATFuse | 6.7240 | 11.3166 | 3.0747 | 32.2687 | 0.6249 | 1.2876 | 62.3904 | 0.5191 |
RoadScene | MetaFusion | 7.0793 | 13.5665 | 2.7918 | 46.2135 | 0.6589 | 1.6435 | 60.5586 | 0.3253 |
RoadScene | CoCoNet | 7.2792 | 14.5424 | 2.6615 | 51.9432 | 0.5594 | 1.6832 | 59.7690 | 0.3363 |
RoadScene | MMIF-DDFM | 6.9785 | 13.9568 | 2.8998 | 47.2880 | 0.6118 | 1.6340 | 60.9558 | 0.4788 |
RoadScene | TarDAL | 7.1443 | 9.7689 | 3.1626 | 47.4445 | 0.5324 | 1.5534 | 62.8829 | 0.3745 |
RoadScene | Ours | 7.3767 | 15.4976 | 3.2819 | 49.2160 | 0.6923 | 1.7092 | 62.1296 | 0.5387 |
MSRS | DenseFuse | 5.8762 | 5.5566 | 2.7573 | 23.9138 | 0.6769 | 1.3844 | 65.4523 | 0.3531 |
MSRS | FusionGAN | 5.5425 | 4.4791 | 2.0483 | 21.9577 | 0.3908 | 1.1554 | 65.3521 | 0.1469 |
MSRS | IFCNN | 6.3071 | 11.6410 | 2.6515 | 37.1684 | 0.7697 | 1.7623 | 66.1758 | 0.5641 |
MSRS | U2Fusion | 5.3812 | 4.7367 | 2.6663 | 21.0356 | 0.5178 | 1.1844 | 66.7435 | 0.2126 |
MSRS | SDNet | 5.1584 | 8.0930 | 1.8228 | 19.6555 | 0.5032 | 1.0192 | 61.4973 | 0.3292 |
MSRS | RFN-Nest | 5.1095 | 5.1106 | 2.2275 | 27.2586 | 0.5002 | 1.4883 | 64.9407 | 0.2472 |
MSRS | FLFuse | 5.5974 | 6.5804 | 2.2434 | 22.3773 | 0.6517 | 1.4371 | 66.1722 | 0.3722 |
MSRS | SeAFusion | 6.3866 | 10.0465 | 3.2239 | 41.8989 | 0.9517 | 1.8347 | 64.5711 | 0.6486 |
MSRS | DATFuse | 6.5018 | 10.0867 | 3.4611 | 36.9058 | 0.9115 | 1.6701 | 63.4526 | 0.6097 |
MSRS | MetaFusion | 6.3251 | 8.3765 | 2.4669 | 38.2445 | 0.5833 | 1.6770 | 60.5766 | 0.3112 |
MSRS | CoCoNet | 6.6009 | 9.3032 | 2.5117 | 46.5309 | 0.6001 | 1.5017 | 57.8727 | 0.3109 |
MSRS | MMIF-DDFM | 6.2058 | 9.4601 | 2.5652 | 34.3546 | 0.6823 | 1.5631 | 62.1245 | 0.5433 |
MSRS | TarDAL | 5.7889 | 6.6258 | 2.1589 | 33.8457 | 0.3901 | 0.6827 | 62.8708 | 0.1737 |
MSRS | Ours | 6.6151 | 9.9702 | 3.7839 | 43.5509 | 0.9826 | 1.8599 | 64.3130 | 0.6341 |
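For readers unfamiliar with the column abbreviations, three of the no-reference metrics in the table (EN, SF and SD) have widely used textbook definitions that can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function names are hypothetical, the input is taken to be a single-channel 8-bit image, and the normalization of spatial frequency (averaging over the available neighbor differences) is one common convention. The evaluation code behind the reported numbers is not shown here and may differ.

```python
import numpy as np


def entropy(img):
    """EN: Shannon entropy (in bits) of the 256-bin gray-level histogram.

    Assumes `img` is a 2-D array of dtype uint8.
    """
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]  # skip empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum())


def spatial_frequency(img):
    """SF: square root of the summed mean squared row and column differences."""
    x = img.astype(np.float64)
    row_freq_sq = np.mean((x[:, 1:] - x[:, :-1]) ** 2)  # horizontal neighbors
    col_freq_sq = np.mean((x[1:, :] - x[:-1, :]) ** 2)  # vertical neighbors
    return float(np.sqrt(row_freq_sq + col_freq_sq))


def standard_deviation(img):
    """SD: population standard deviation of the pixel intensities."""
    return float(img.astype(np.float64).std())
```

Higher values of all three are conventionally read as more information content, sharper detail and stronger contrast in the fused image, respectively.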
Configuration | EN | SF | MI | SD | VIF | SCD | PSNR | Qabf |
---|---|---|---|---|---|---|---|---|
w/o ASFM | 7.0032 | 12.3035 | 2.1068 | 43.4166 | 0.6635 | 1.7509 | 58.3538 | 0.4408 |
w/Addition | 7.1437 | 12.1670 | 2.6934 | 48.5032 | 0.6973 | 1.8208 | 61.3992 | 0.4751 |
w/Multiplication | 7.0664 | 13.0366 | 2.4189 | 45.5905 | 0.7161 | 1.7642 | 62.5519 | 0.5382 |
w/ASFM | 7.1586 | 14.0076 | 2.7605 | 46.6204 | 0.7397 | 1.8081 | 62.7759 | 0.5293 |
Configuration | EN | SF | MI | SD | VIF | SCD | PSNR | Qabf |
---|---|---|---|---|---|---|---|---|
w/o DAFM | 7.3032 | 15.2235 | 3.0068 | 46.3773 | 0.6721 | 1.7109 | 61.8674 | 0.5108 |
w/DAFM | 7.3767 | 15.4976 | 3.2819 | 49.2160 | 0.6923 | 1.7092 | 62.1296 | 0.5387 |
Configuration | EN | SF | MI | SD | VIF | SCD | PSNR | Qabf |
---|---|---|---|---|---|---|---|---|
w/o Dw | 6.5098 | 10.9142 | 3.6716 | 42.2723 | 0.9696 | 1.8374 | 65.8869 | 0.6282 |
w/Dw | 6.6151 | 9.9702 | 3.7839 | 43.5509 | 0.9826 | 1.8599 | 64.3130 | 0.6341 |
Methods | [email protected] | | | [email protected] | | | [email protected] | | | mAP@[0.5:0.95] | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
 | Person | Car | All | Person | Car | All | Person | Car | All | Person | Car | All |
Infrared | 0.949 | 0.683 | 0.816 | 0.865 | 0.589 | 0.727 | 0.212 | 0.157 | 0.184 | 0.671 | 0.47 | 0.571 |
Visible | 0.681 | 0.933 | 0.807 | 0.425 | 0.842 | 0.634 | 0.0136 | 0.389 | 0.201 | 0.35 | 0.717 | 0.533 |
DenseFuse | 0.927 | 0.948 | 0.937 | 0.799 | 0.911 | 0.855 | 0.0803 | 0.488 | 0.284 | 0.597 | 0.742 | 0.669 |
FusionGAN | 0.879 | 0.916 | 0.898 | 0.755 | 0.836 | 0.796 | 0.143 | 0.464 | 0.304 | 0.594 | 0.722 | 0.658 |
IFCNN | 0.928 | 0.917 | 0.922 | 0.833 | 0.915 | 0.874 | 0.118 | 0.49 | 0.304 | 0.62 | 0.746 | 0.683 |
U2Fusion | 0.943 | 0.939 | 0.941 | 0.804 | 0.892 | 0.848 | 0.0809 | 0.435 | 0.258 | 0.603 | 0.732 | 0.667 |
SDNet | 0.961 | 0.927 | 0.944 | 0.83 | 0.886 | 0.858 | 0.124 | 0.526 | 0.325 | 0.639 | 0.753 | 0.696 |
RFN-Nest | 0.801 | 0.868 | 0.835 | 0.666 | 0.77 | 0.718 | 0.0466 | 0.436 | 0.241 | 0.487 | 0.657 | 0.572 |
FLFuse | 0.869 | 0.878 | 0.873 | 0.798 | 0.767 | 0.782 | 0.14 | 0.529 | 0.334 | 0.599 | 0.686 | 0.642 |
SeAFusion | 0.915 | 0.905 | 0.91 | 0.833 | 0.83 | 0.831 | 0.101 | 0.456 | 0.278 | 0.601 | 0.709 | 0.655 |
DATFuse | 0.922 | 0.907 | 0.915 | 0.815 | 0.856 | 0.836 | 0.0972 | 0.456 | 0.276 | 0.604 | 0.715 | 0.659 |
MetaFusion | 0.888 | 0.91 | 0.899 | 0.686 | 0.832 | 0.759 | 0.099 | 0.42 | 0.259 | 0.549 | 0.701 | 0.625 |
CoCoNet | 0.813 | 0.702 | 0.757 | 0.668 | 0.649 | 0.659 | 0.107 | 0.236 | 0.172 | 0.512 | 0.52 | 0.516 |
MMIF-DDFM | 0.916 | 0.946 | 0.931 | 0.797 | 0.861 | 0.829 | 0.072 | 0.414 | 0.243 | 0.594 | 0.712 | 0.653 |
TarDAL | 0.872 | 0.913 | 0.893 | 0.717 | 0.817 | 0.767 | 0.118 | 0.357 | 0.237 | 0.552 | 0.674 | 0.613 |
Ours | 0.945 | 0.926 | 0.936 | 0.862 | 0.889 | 0.876 | 0.168 | 0.502 | 0.335 | 0.657 | 0.738 | 0.698 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, G.; Hu, Z.; Feng, S.; Wang, Z.; Wu, H. GLFuse: A Global and Local Four-Branch Feature Extraction Network for Infrared and Visible Image Fusion. Remote Sens. 2024, 16, 3246. https://doi.org/10.3390/rs16173246