DFA-Net: Multi-Scale Dense Feature-Aware Network via Integrated Attention for Unmanned Aerial Vehicle Infrared and Visible Image Fusion
Abstract
1. Introduction
- We design a novel multi-scale dense feature-aware network with integrated attention that extracts target features from infrared images and detail and texture features from visible images, and fuses them through a multi-scale nested structure.
- We develop an integrated attention module that enhances the complementary features of the infrared and visible images, so that the fused image retains richer detail and emphasizes salient features.
- We combine an intensity loss and a gradient loss to refine the fused multi-source information and generate high-quality fused infrared and visible images.
- We demonstrate strong fusion performance on UAV infrared and visible images.
2. Materials and Methods
2.1. Network Architecture
2.2. Integrated Attention Fusion
2.3. Fusion Loss
3. Experiment and Analysis
3.1. Data Preparation and Baselines
3.2. Evaluation Metric
- (1) Entropy (EN) is the average amount of information carried by an image, also known as information entropy, source entropy, or average self-information. It reflects only the amount of information carried by the fused image.
- (2) Mutual Information (MI) measures the amount of information transferred from the source images to the fused image. A higher MI value indicates a better fusion result, because the fused image retains more of the information in the source images. For a source image $X$ and the fused image $F$, MI is calculated from the joint entropy $H(X,F)$ and the marginal entropies $H(X)$ and $H(F)$:
  $$MI(X,F) = H(X) + H(F) - H(X,F)$$
- (3) Visual Information Fidelity (VIF) measures the quality of the fused image in terms of the fidelity of its visual information. A higher VIF value corresponds to a better perceived visual quality of the fused image.
- (4) Spatial Frequency (SF) is calculated from the row frequency (RF) and the column frequency (CF) and measures the spatial frequency content of the fused image. A sharper image has a higher SF. In its standard form, for an $M \times N$ fused image $F$:
  $$SF = \sqrt{RF^2 + CF^2}$$
  $$RF = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=2}^{N}\left(F(i,j)-F(i,j-1)\right)^2}, \qquad CF = \sqrt{\frac{1}{MN}\sum_{i=2}^{M}\sum_{j=1}^{N}\left(F(i,j)-F(i-1,j)\right)^2}$$
- (5) Standard Deviation (SD) measures how far the pixel values of an image spread around their mean.
- (6) Gradient-based Fusion Performance ($Q^{AB/F}$) is an objective no-reference quality measure for fused images. It uses local metrics to estimate how well the important information of the input images is represented in the fused image.
- (7) Average Gradient (AG) quantifies the sharpness of an image and its ability to express detail. A larger AG indicates a sharper image and a better fusion result. It is commonly defined as:
  $$AG = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\sqrt{\frac{\nabla F_x^2(i,j) + \nabla F_y^2(i,j)}{2}}$$
  where $\nabla F_x$ and $\nabla F_y$ are the horizontal and vertical differences of $F$.
- (8) Sum of Correlation Differences (SCD) measures the quality of the fused image through difference images. Difference images are calculated between the fused image and the source images, and their correlation with the source images is evaluated. Rather than correlating the source images with the fused image directly, SCD scores the fused image by how much of each source image's contribution it carries.
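The histogram- and gradient-based metrics in the list above (EN, MI, SF, SD, AG) can be sketched in a few lines of NumPy. This is a minimal illustration of the textbook definitions, not the evaluation code used for the experiments. The function names, the 256-bin histograms, the assumption of 8-bit grayscale inputs, and the forward-difference gradients are all choices made for this sketch. Differences are averaged over the number of valid neighbor pairs rather than over all pixels, so absolute values depend on intensity scaling and normalization and will not necessarily match the tables.

```python
import numpy as np


def entropy(img, bins=256):
    """EN: Shannon entropy (in bits) of the grey-level histogram (8-bit range assumed)."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so that log2 is defined
    return float(-np.sum(p * np.log2(p)))


def mutual_information(a, b, bins=256):
    """MI(A, B) = H(A) + H(B) - H(A, B), estimated from the joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins,
                                 range=[[0, 256], [0, 256]])
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1)  # marginal distribution of a
    p_b = p_ab.sum(axis=0)  # marginal distribution of b

    def h(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    return float(h(p_a) + h(p_b) - h(p_ab.ravel()))


def spatial_frequency(img):
    """SF = sqrt(RF^2 + CF^2) from horizontal and vertical first differences."""
    f = img.astype(np.float64)
    rf2 = np.mean((f[:, 1:] - f[:, :-1]) ** 2)  # squared row frequency
    cf2 = np.mean((f[1:, :] - f[:-1, :]) ** 2)  # squared column frequency
    return float(np.sqrt(rf2 + cf2))


def standard_deviation(img):
    """SD: spread of the pixel values around their mean."""
    return float(np.std(img.astype(np.float64)))


def average_gradient(img):
    """AG: mean of sqrt((gx^2 + gy^2) / 2) using forward differences."""
    f = img.astype(np.float64)
    gx = f[:-1, 1:] - f[:-1, :-1]  # horizontal difference
    gy = f[1:, :-1] - f[:-1, :-1]  # vertical difference
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))


if __name__ == "__main__":
    # Hypothetical stand-ins for a source image and a fused image.
    rng = np.random.default_rng(0)
    source = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    fused = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    print("EN:", entropy(fused))
    print("MI:", mutual_information(source, fused))
    print("SF:", spatial_frequency(fused))
    print("SD:", standard_deviation(fused))
    print("AG:", average_gradient(fused))
```

In a fusion setting, MI would typically be computed once per source image against the fused image and the two values summed. VIF, $Q^{AB/F}$, and SCD need more machinery (multi-scale decompositions, edge-strength and orientation maps, correlation of difference images) and are left out of this sketch.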
3.3. Experimental Result
3.3.1. Visual Performance
3.3.2. Quantitative Comparison
3.4. Ablation Study
3.5. Generalization Analysis
4. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Method | EN | SF | SD | MI | VIF | AG | SCD | QAB/F |
|---|---|---|---|---|---|---|---|---|
| MST-SR | 6.274500 | 0.042881 | 8.1 | 2.615585 | 0.805536 | 3.379710 | 1.302535 | 0.528150 | 
| GTF | 5.488868 | 0.031165 | 6.3 | 1.703488 | 0.472943 | 2.369413 | 0.711759 | 0.401289 | 
| FusionGAN | 5.549333 | 0.019308 | 6.3 | 1.879273 | 0.595634 | 1.642359 | 1.065234 | 0.159680 | 
| U2Fusion | 4.819441 | 0.039653 | 6.5 | 1.813905 | 0.547874 | 2.976654 | 1.334482 | 0.390667 | 
| IFCNN | 6.031980 | 0.039803 | 7.4 | 2.330893 | 0.694640 | 3.169491 | 1.291991 | 0.543784 | 
| RFN-Nest | 5.586484 | 0.027077 | 7.7 | 2.183343 | 0.680637 | 2.264621 | 1.546296 | 0.392791 | 
| SDNet | 5.428818 | 0.037694 | 6.1 | 1.754339 | 0.484199 | 2.982137 | 1.122136 | 0.406097 | 
| GANMcC | 5.877195 | 0.024687 | 8.4 | 2.423083 | 0.694944 | 2.141018 | 1.459269 | 0.312746 | 
| SeAFusion | 6.651394 | 0.043554 | 8.4 | 4.037259 | 0.985942 | 3.696791 | 1.685249 | 0.662335 | 
| Ours | 6.741801 | 0.048172 | 8.5 | 3.75237 | 1.041419 | 3.837578 | 1.718825 | 0.689588 |

| | EN | SF | SD | MI | VIF | AG | SCD | QAB/F |
|---|---|---|---|---|---|---|---|---|
| 1 | 6.656426 | 0.045487 | 8.4 | 4.170775 | 0.988418 | 3.646758 | 1.621753 | 0.669267 | 
| 5 | 6.705488 | 0.046563 | 8.5 | 3.844076 | 1.030301 | 3.758284 | 1.663681 | 0.690490 | 
| 10 | 6.741384 | 0.048109 | 8.5 | 3.723383 | 1.044102 | 3.831187 | 1.726796 | 0.690936 | 
| 15 | 6.773159 | 0.048611 | 8.5 | 3.764401 | 1.051541 | 3.831952 | 1.727661 | 0.691273 | 
| 20 | 6.797420 | 0.049807 | 8.6 | 3.642573 | 1.060563 | 3.905604 | 1.751419 | 0.690326 | 
| 25 | 6.788444 | 0.049672 | 8.5 | 3.588535 | 1.045185 | 3.887107 | 1.752573 | 0.691391 | 
| 30 | 6.788444 | 0.049672 | 8.5 | 3.588535 | 1.045185 | 3.887107 | 1.752573 | 0.691391 |

| Metric | DFA-Net(-IAF) | DFA-Net(+IAF) |
|---|---|---|
| EN | 6.01 ± 0.43 | 6.74 ± 0.45 | 
| SF | 0.051385 ± 0.000089 | 0.048172 ± 0.000184 | 
| SD | 7.4 ± 2.1 | 8.5 ± 2.7 | 
| MI | 2.75 ± 0.47 | 3.75 ± 0.86 | 
| VIF | 0.643 ± 0.01 | 1.0414 ± 0.0057 | 
| AG | 2.18 ± 0.34 | 3.8 ± 2.0 | 
| SCD | 1.294 ± 0.040 | 1.719 ± 0.018 | 
| QAB/F | 0.4052 ± 0.0038 | 0.6896 ± 0.0025 |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Shen, S.; Li, D.; Mei, L.; Xu, C.; Ye, Z.; Zhang, Q.; Hong, B.; Yang, W.; Wang, Y. DFA-Net: Multi-Scale Dense Feature-Aware Network via Integrated Attention for Unmanned Aerial Vehicle Infrared and Visible Image Fusion. Drones 2023, 7, 517. https://doi.org/10.3390/drones7080517