# PFFNET: A Fast Progressive Feature Fusion Network for Detecting Drones in Infrared Images

## Abstract


## 1. Introduction

- (1) The influence of the target itself: because the flight altitude is usually below 500 m, drone targets often occupy only a few to several tens of pixels in an infrared image. In addition, drones usually have a low signal-to-clutter ratio (SCR) and are easily submerged in strong noise and cluttered backgrounds [3,4,5]. The radiation intensity of the target is therefore low, and the target lacks significant morphological features, which makes target detection in infrared images difficult [6,7].
- (2) The contradiction between the target and the detection algorithm: compared with RGB images, detecting drones in infrared images presents additional problems, such as the lack of shape and texture features. After filtering and convolution calculations, the representative features of drones (such as wings) are easily weakened or even lost [8,9]. Moreover, although building shallow networks can improve performance in deep learning algorithms, the contradiction between high-level semantic features and high resolution remains unresolved [10].

## 2. Methods

The network first extracts the feature maps $b_a$ ($a$ = 2, 3, 4). The low-level spatial position information of the target's salient features is obtained from $b_a$ ($a$ = 2, 3) by the FSM, which locates the high-frequency response area to reduce the influence of redundant signals on the target position information and outputs the feature maps $f_a$ ($a$ = 2, 3). The deepest map $b_4$ is used as the input of the PFM to output the decoded map $p$. The PFM is composed of four different pooling structures in parallel that form a pyramid network; the high-frequency response amplitude of the deep target features is enhanced and then passed to the FSM after upsampling. The FSM and PFM extract local features of targets and use the progressive fusion method to calculate the stage output feature maps $y_a$ ($a$ = 1, 2, 3). After being processed by the Ghost Module [33], $y_a$ is doubled in size and element-wise added. This design greatly simplifies small target detection by sharing the same weights across all convolution blocks, and the element-wise addition reduces the parameters of the network while also reducing inference time.
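The Ghost Module mentioned above comes from GhostNet [33], whose idea is to produce half of a layer's output channels with an ordinary convolution and the other half with "cheap operations" on those primary features. The following is a minimal NumPy sketch of that idea, not the paper's exact module: the 1 × 1 weights are random placeholders, and a 3 × 3 box blur stands in for the learned depthwise convolution.

```python
import numpy as np

def ghost_module(x, rng=None):
    """Ghost-module-style feature expansion (sketch after GhostNet [33]).

    Half of the output channels come from an ordinary 1x1 convolution;
    the other half are 'ghost' features produced by a cheap per-channel
    operation on the primary features. Weights are random placeholders.
    """
    c, h, w = x.shape
    rng = rng or np.random.default_rng(0)
    # Primary features: a 1x1 convolution is a channel-mixing matmul.
    w1 = rng.standard_normal((c, c)) * 0.1
    primary = np.einsum('oc,chw->ohw', w1, x)
    # Ghost features: cheap depthwise op (here a 3x3 box blur per channel,
    # standing in for a learned depthwise convolution).
    padded = np.pad(primary, ((0, 0), (1, 1), (1, 1)), mode='edge')
    ghost = np.zeros_like(primary)
    for dy in range(3):
        for dx in range(3):
            ghost += padded[:, dy:dy + h, dx:dx + w]
    ghost /= 9.0
    # Concatenate: 2c channels for roughly the cost of one c-channel conv.
    return np.concatenate([primary, ghost], axis=0)
```

The appeal is purely arithmetic: the expensive channel-mixing convolution runs on only half of the output channels, while the other half costs a depthwise pass.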

#### 2.1. Feature Selection Module

In the FSM, $X_H$ is the deep feature that includes high-level semantic information, and $X_L$ is the shallow feature that contains rich image contour and position information; ⊗ and ⊕ represent element-wise multiplication and addition of vectors, and $C$ and $L$ represent the CSM and LSM modules, respectively. $E$ denotes the convolution calculation, $\sigma$ represents the Rectified Linear Unit (ReLU) activation function, and $\mu$, a positive integer, is used to enhance the feature representation.

#### 2.2. Channel Selection Model

Average pooling and maximum pooling are first applied to $X_H$ to generate different 3D tensors $x_{h1}$ and $x_{h2}$, coupling the global information of the feature map within its internal channels. Then, a 1 × 1 convolution is used to evaluate the importance of each channel and calculate the corresponding weight. The aggregated output $H\left(X\right)\in {\mathbb{R}}^{C\times H\times W}$ can be represented as:

Here, $x_{h1}$ and $x_{h2}$ are the feature vectors calculated by average pooling and maximum pooling, and $W$ and $H$ represent the width and height of the feature map, respectively. The output of the CSM, $C\left({X}_{H}\right)\in {\mathbb{R}}^{C\times H\times W}$, is

where the two convolution kernels have sizes $(c/r_f, 1, 1)$ and $(c/r_f, c, 1, 1)$, respectively, and $r_f$ is the channel descent ratio.
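The CSM follows the familiar channel-attention pattern (cf. CBAM [38]): pooled channel descriptors pass through a reduced bottleneck whose sigmoid output reweights the channels. A minimal NumPy sketch under that assumption; the bottleneck weights are random placeholders, not the paper's learned parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, r_f=8, rng=None):
    """CBAM-style channel attention as a stand-in for the CSM (sketch).

    Global average and max pooling give two C-dim descriptors x_h1, x_h2;
    a shared two-layer bottleneck with reduction ratio r_f scores each
    channel; the sigmoid-gated sum reweights the input feature map.
    """
    c, h, w = x.shape
    rng = rng or np.random.default_rng(0)
    # x_h1, x_h2: globally pooled channel descriptors.
    x_h1 = x.mean(axis=(1, 2))
    x_h2 = x.max(axis=(1, 2))
    # Shared bottleneck: C -> C/r_f -> C (1x1 convs on 1x1 maps are matmuls).
    w_down = rng.standard_normal((c // r_f, c)) * 0.1
    w_up = rng.standard_normal((c, c // r_f)) * 0.1
    def mlp(v):
        return w_up @ np.maximum(w_down @ v, 0.0)   # ReLU in between
    weights = sigmoid(mlp(x_h1) + mlp(x_h2))        # per-channel gate in (0, 1)
    return x * weights[:, None, None]
```

Because the gate lies in (0, 1), the module can only attenuate channels, never amplify them; the reduction ratio $r_f$ controls the bottleneck width and hence the parameter cost.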

#### 2.3. Location Selection Model

Mean and maximum calculations along the channel dimension are applied to $X_L$, respectively; $x_{l1}$ and $x_{l2}$ denote the resulting mean and maximum maps. They are cascaded in the channel direction before the convolution operation. Here, a 7 × 7 convolution further expands the receptive field of the convolution kernel and captures areas with higher local response amplitudes from the lower-level network; in addition, it ensures the accurate position of the drone target in the feature map. The output $L\left(X\right)\in {\mathbb{R}}^{C\times H\times W}$ can be calculated using the following equation:
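The LSM has the shape of CBAM's spatial attention [38]. Below is a minimal NumPy sketch under that assumption; a fixed averaging kernel stands in for the learned 7 × 7 convolution, which is a placeholder choice, not the paper's trained filter.

```python
import numpy as np

def spatial_attention(x, k=7):
    """CBAM-style spatial attention as a stand-in for the LSM (sketch).

    Channel-wise mean and max maps (x_l1, x_l2) are concatenated and passed
    through a k x k 'same' convolution whose sigmoid output gates every
    spatial location of the input.
    """
    c, h, w = x.shape
    # x_l1, x_l2: mean and max over the channel dimension.
    x_l1 = x.mean(axis=0)
    x_l2 = x.max(axis=0)
    stacked = np.stack([x_l1, x_l2])          # (2, H, W)
    # Naive k x k convolution with an averaging kernel (assumption: the real
    # model learns this kernel; a box filter keeps the sketch simple).
    p = k // 2
    padded = np.pad(stacked, ((0, 0), (p, p), (p, p)), mode='edge')
    response = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            response += padded[:, dy:dy + h, dx:dx + w].sum(axis=0)
    response /= 2 * k * k
    gate = 1.0 / (1.0 + np.exp(-response))    # sigmoid map, shape (H, W)
    return x * gate[None, :, :]
```

The 7 × 7 window is what gives the module its enlarged receptive field: each gate value summarizes a neighbourhood of the mean/max response maps rather than a single pixel.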

#### 2.4. Pooling Pyramid Fusion Module

$r_p$ is the channel descent ratio. The four feature maps of different sizes are upsampled by bilinear interpolation and then concatenated with the input feature map in the channel dimension. Finally, a 3 × 3 convolution is performed to output the feature map $O\in {\mathbb{R}}^{C\times H\times W}$, forming a contextual pyramid from five feature maps of the same dimension but different scales.
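The pooling pyramid resembles the PSPNet scheme [39]. Here is a minimal NumPy sketch of that pattern, with two simplifications that are mine, not the paper's: nearest-neighbour repetition stands in for bilinear upsampling, and the final 3 × 3 fusion convolution is omitted, so the function stops at the five concatenated maps.

```python
import numpy as np

def adaptive_avg_pool(x, s):
    """Average-pool each channel of a (C, H, W) map down to (C, s, s)."""
    c, h, w = x.shape
    out = np.zeros((c, s, s))
    for i in range(s):
        for j in range(s):
            y0, y1 = (i * h) // s, ((i + 1) * h) // s
            x0, x1 = (j * w) // s, ((j + 1) * w) // s
            out[:, i, j] = x[:, y0:y1, x0:x1].mean(axis=(1, 2))
    return out

def pyramid_pool(x, bins=(1, 2, 3, 6)):
    """PSPNet-style pooling pyramid as a sketch of the PFM.

    Pools the input to several scales, upsamples each back to (H, W)
    (nearest neighbour here; the paper uses bilinear interpolation), and
    concatenates with the input: five same-sized maps along the channels.
    """
    c, h, w = x.shape
    maps = [x]
    for s in bins:
        pooled = adaptive_avg_pool(x, s)
        up = pooled.repeat(h // s + (h % s > 0), axis=1)[:, :h, :]
        up = up.repeat(w // s + (w % s > 0), axis=2)[:, :, :w]
        maps.append(up)
    return np.concatenate(maps, axis=0)   # (5C, H, W); a 3x3 conv would fuse
```

Each pyramid level summarizes context at a different granularity, so the concatenated stack lets the fusion convolution weigh global against local evidence per pixel.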

#### 2.5. Segmentation Head

## 3. Experiments

#### 3.1. Datasets

#### 3.2. Experimental Preparation and Evaluation Method

The evaluation metrics include the probability of detection ($P_d$) and the false alarm rate ($P_f$). Precision, recall, $P_d$, and $P_f$ are defined as follows:

$T_P$ represents the target pixels that are correctly matched with the true label by the predicted pixels, $F_P$ represents the background pixels that are incorrectly predicted as targets, and $F_N$ represents the number of target pixels that are incorrectly classified as background. $N$ represents the total number of pixels in the image. The F1-score and IoU can be defined as:
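The pixel-level metrics follow directly from the $T_P$/$F_P$/$F_N$ counts. A small Python helper using the standard definitions (the target-level $P_d$ and $P_f$ depend on a matching rule and are not reproduced here):

```python
import numpy as np

def pixel_metrics(pred, label):
    """Pixel-level segmentation metrics from Section 3.2.

    pred and label are boolean arrays of the same shape (True = target).
    T_P: target pixels predicted as target; F_P: background pixels
    predicted as target; F_N: target pixels predicted as background.
    """
    tp = np.logical_and(pred, label).sum()
    fp = np.logical_and(pred, ~label).sum()
    fn = np.logical_and(~pred, label).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {'precision': precision, 'recall': recall, 'iou': iou, 'f1': f1}
```

Note that IoU is strictly tighter than F1 for the same counts, which is why the IoU columns in the result tables sit below the F1-score columns.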

#### 3.3. Comparative Experiments

- (1)
- Number of data used for model training

- (2)
- The complexity of the data

#### 3.4. Ablation Experiments

We further analyze the effect of the channel descent ratios ($r_f$, $r_p$), choosing different dimensionality-reduction ratios to explore the best configuration for segmenting small infrared targets. According to [38], we set $r_f = 8$. As shown in Table 4, the best result, IoU (73.7%, 64.2%), was obtained when $r_f = 8$ and $r_p = 4$.

Performance also degrades when $r_p = 1$ or 2. Alternatively, choosing a larger dimensionality-reduction ratio may leave the model unable to learn more complex features, such as $r_p = 8$, which is clearly not what we expected. Therefore, an appropriate ratio should be chosen for different kinds of targets to enhance the expressive ability of the model.
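The reduction ratio is a straightforward parameters-versus-capacity trade. A quick back-of-the-envelope helper, assuming a standard two-layer bottleneck consistent with the kernel shapes $(c/r_f, 1, 1)$ and $(c/r_f, c, 1, 1)$ given in Section 2.2 (biases ignored):

```python
# Weight count of a C -> C/r -> C bottleneck: roughly 2 * C^2 / r.
# Larger r cuts parameters but also the bottleneck's representational
# capacity, which is the tension behind the (r_f, r_p) ablation.
def bottleneck_params(c, r):
    return c * (c // r) + (c // r) * c

for r in (1, 2, 4, 8):
    print(r, bottleneck_params(256, r))
```

For 256 channels the count falls from 131,072 weights at $r = 1$ to 16,384 at $r = 8$, an 8× reduction, which is why the sweet spot in Table 4 sits at an intermediate ratio rather than either extreme.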

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Kapoulas, I.K.; Hatziefremidis, A.; Baldoukas, A.K.; Valamontes, E.S.; Statharas, J.C. Small Fixed-Wing UAV Radar Cross-Section Signature Investigation and Detection and Classification of Distance Estimation Using Realistic Parameters of a Commercial Anti-Drone System. *Drones* **2023**, 7, 39.
2. Li, B.; Song, C.; Bai, S.; Huang, J.; Ma, R.; Wan, K.; Neretin, E. Multi-UAV Trajectory Planning during Cooperative Tracking Based on a Fusion Algorithm Integrating MPC and Standoff. *Drones* **2023**, 7, 196.
3. Zhao, M.; Cheng, L.; Yang, X.; Feng, P.; Liu, L.; Wu, N. TBC-Net: A Real-Time Detector for Infrared Small Target Detection Using Semantic Constraint. arXiv **2019**, arXiv:2001.05852.
4. Hou, Q.; Wang, Z.; Tan, F.; Zhao, Y.; Zheng, H.; Zhang, W. RISTDnet: Robust Infrared Small Target Detection Network. *IEEE Geosci. Remote Sens. Lett.* **2022**, 19, 1–5.
5. Liao, K.-C.; Wu, H.-Y.; Wen, H.-T. Using Drones for Thermal Imaging Photography and Building 3D Images to Analyze the Defects of Solar Modules. *Inventions* **2022**, 7, 67.
6. Li, B.; Yang, Z.-P.; Chen, D.-Q.; Liang, S.-Y.; Ma, H. Maneuvering target tracking of UAV based on MN-DDPG and transfer learning. *Def. Technol.* **2021**, 17, 457–466.
7. Fernández, A.; Usamentiaga, R.; de Arquer, P.; Fernández, M.; Fernández, D.; Carús, J.L.; Fernández, M. Robust Detection, Classification and Localization of Defects in Large Photovoltaic Plants Based on Unmanned Aerial Vehicles and Infrared Thermography. *Appl. Sci.* **2020**, 10, 5948.
8. Chen, F.; Gao, C.; Liu, F.; Zhao, Y.; Zhou, Y.; Meng, D.; Zuo, W. Local Patch Network with Global Attention for Infrared Small Target Detection. *IEEE Trans. Aerosp. Electron. Syst.* **2022**, 58, 3979–3991.
9. Ying, X.; Wang, Y.; Wang, L.; Sheng, W.; Liu, L.; Lin, Z.; Zhou, S. Local Motion and Contrast Priors Driven Deep Network for Infrared Small Target Superresolution. *IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.* **2022**, 15, 5480–5495.
10. Chen, Y.; Li, L.; Liu, X.; Su, X. A Multi-Task Framework for Infrared Small Target Detection and Segmentation. *IEEE Trans. Geosci. Remote Sens.* **2022**, 60, 1–9.
11. Wang, C.; Meng, L.; Gao, Q.; Wang, J.; Wang, T.; Liu, X.; Du, F.; Wang, L.; Wang, E. A Lightweight UAV Swarm Detection Method Integrated Attention Mechanism. *Drones* **2022**, 7, 13.
12. Li, B.; Xiao, C.; Wang, L.; Wang, Y.; Lin, Z.; Li, M.; An, W.; Guo, Y. Dense Nested Attention Network for Infrared Small Target Detection. *IEEE Trans. Image Process.* **2022**, 32, 1745–1758.
13. Bai, X.; Zhou, F. Analysis of new top-hat transformation and the application for infrared dim small target detection. *Pattern Recognit.* **2010**, 43, 2145–2156.
14. Chang, B.; Meng, L.; Haber, E.; Ruthotto, L.; Begert, D.; Holtham, E. Reversible Architectures for Arbitrarily Deep Residual Neural Networks. *Proc. AAAI Conf. Artif. Intell.* **2018**, 32, 2811–2818.
15. Rivest, J.; Fortin, R. Detection of dim targets in digital infrared imagery by morphological image processing. *Opt. Eng.* **1996**, 35, 1886–1893.
16. Chen, C.L.P.; Li, H.; Wei, Y.; Xia, T.; Tang, Y.Y. A Local Contrast Method for Small Infrared Target Detection. *IEEE Trans. Geosci. Remote Sens.* **2014**, 52, 574–581.
17. Wei, Y.; You, X.; Li, H. Multiscale patch-based contrast measure for small infrared target detection. *Pattern Recognit.* **2016**, 58, 216–226.
18. Han, J.; Ma, Y.; Zhou, B.; Fan, F.; Liang, K.; Fang, Y. A Robust Infrared Small Target Detection Algorithm Based on Human Visual System. *IEEE Geosci. Remote Sens. Lett.* **2014**, 11, 2168–2172.
19. Han, J.; Moradi, S.; Faramarzi, I.; Liu, C.; Zhang, H.; Zhao, Q. A Local Contrast Method for Infrared Small-Target Detection Utilizing a Tri-Layer Window. *IEEE Geosci. Remote Sens. Lett.* **2019**, 17, 1822–1826.
20. Zhang, L.; Peng, Z. Infrared Small Target Detection Based on Partial Sum of the Tensor Nuclear Norm. *Remote Sens.* **2019**, 11, 382.
21. Zhu, H.; Liu, S.; Deng, L.; Li, Y.; Xiao, F. Infrared Small Target Detection via Low-Rank Tensor Completion with Top-Hat Regularization. *IEEE Trans. Geosci. Remote Sens.* **2020**, 58, 1004–1016.
22. Dai, Y.; Wu, Y.; Song, Y.; Guo, J. Non-negative infrared patch-image model: Robust target-background separation via partial sum minimization of singular values. *Infrared Phys. Technol.* **2017**, 81, 182–194.
23. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. arXiv **2019**, arXiv:1809.02983.
24. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. arXiv **2019**, arXiv:1911.08287.
25. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. *Eur. Conf. Comput. Vis.* **2016**, 9905, 21–37.
26. Wang, C.; Shi, Z.; Meng, L.; Wang, J.; Wang, T.; Gao, Q.; Wang, E. Anti-Occlusion UAV Tracking Algorithm with a Low-Altitude Complex Background by Integrating Attention Mechanism. *Drones* **2022**, 6, 149.
27. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. arXiv **2018**, arXiv:1803.01534.
28. Dai, Y.; Wu, Y.; Zhou, F.; Barnard, K. Asymmetric Contextual Modulation for Infrared Small Target Detection. arXiv **2020**, arXiv:2009.14530.
29. Dai, Y.; Wu, Y.; Zhou, F.; Barnard, K. Attentional Local Contrast Networks for Infrared Small Target Detection. *IEEE Trans. Geosci. Remote Sens.* **2021**, 59, 9813–9824.
30. Zhang, T.; Cao, S.; Pu, T.; Peng, Z. AGPCNet: Attention-Guided Pyramid Context Networks for Infrared Small Target Detection. arXiv **2021**, arXiv:2111.03580.
31. Wang, H.; Zhou, L.; Wang, L. Miss Detection vs. False Alarm: Adversarial Learning for Small Object Segmentation in Infrared Images. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8508–8517.
32. Cheng, Q.; Wang, H.; Zhu, B.; Shi, Y.; Xie, B. A Real-Time UAV Target Detection Algorithm Based on Edge Computing. *Drones* **2023**, 7, 95.
33. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. arXiv **2020**, arXiv:1911.11907.
34. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
35. Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. arXiv **2018**, arXiv:1807.11164.
36. Xiong, Y.; Liu, H.; Gupta, S.; Akin, B.; Bender, G.; Wang, Y.; Kindermans, P.J.; Tan, M.; Singh, V.; Chen, B. MobileDets: Searching for Object Detection Architectures for Mobile Accelerators. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 3824–3833.
37. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. arXiv **2018**, arXiv:1608.06993.
38. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv **2018**, arXiv:1807.06521.
39. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
40. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
41. Zhang, M.; Zhang, R.; Yang, Y.; Bai, H.; Zhang, J.; Guo, J. ISNet: Shape Matters for Infrared Small Target Detection. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 867–876.
42. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. arXiv **2021**, arXiv:2103.14030.
43. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv **2015**, arXiv:1512.03385.
44. Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv **2017**, arXiv:1706.05587.

**Figure 3.** Structure and composition of the CSM and LSM: (**a**) channel selection model; (**b**) location selection model.

**Figure 6.** ROC curves compared with other methods on different datasets: (**a**) SIRST Aug dataset; (**b**) IRSTD 1k dataset.

**Figure 8.** Visual results of different methods. The close-ups display the zoomed-in targets or detection results. Boxes in red, yellow, and blue represent correctly detected targets, missed targets, and falsely detected targets, respectively. "After-(a,b,c,d)" indicates image (a,b,c,d) after enhancement.

| Methods | SIRST Aug Precision (%) | SIRST Aug Recall (%) | SIRST Aug IoU (%) | SIRST Aug F1-Score (%) | IRSTD 1k Precision (%) | IRSTD 1k Recall (%) | IRSTD 1k IoU (%) | IRSTD 1k F1-Score (%) | Time on GPU (s) |
|---|---|---|---|---|---|---|---|---|---|
| ACM | 87.2 | 70.7 | 64.1 | 78.1 | 76.6 | 74.8 | 60.9 | 75.7 | 0.005 |
| ALC | 88.0 | 74.9 | 71.9 | 81.0 | 80.3 | 73.4 | 62.2 | 76.7 | 0.058 |
| MDFA | 81.1 | 65.4 | 56.7 | 72.4 | 66.5 | 70.0 | 51.7 | 68.2 | 0.064 |
| AGPCNet | 87.7 | 74.7 | 67.6 | 80.7 | 76.1 | 78.0 | 62.6 | 77.0 | 0.052 |
| PFFNet-S | 81.0 | 89.0 | 73.7 | 84.8 | 78.4 | 77.9 | 64.2 | 78.2 | 0.011 |
| PFFNet-R | 88.7 | 81.3 | 73.7 | 84.9 | 81.8 | 77.4 | 66.1 | 79.6 | 0.023 |

| Segmentation Head | FSM | PFM | SIRST Aug IoU (%) | SIRST Aug F1-Score (%) | IRSTD 1k IoU (%) | IRSTD 1k F1-Score (%) |
|---|---|---|---|---|---|---|
|  |  |  | 47.5 | 55.0 | 40.4 | 48.7 |
| √ |  |  | 71.3 | 82.4 | 60.3 | 75.3 |
| √ | √ |  | 73.4 | 84.7 | 63.4 | 77.6 |
| √ |  | √ | 71.6 | 83.1 | 61.3 | 76.0 |

| CSM | LSM | SIRST Aug IoU (%) | F1-Score (%) | Precision (%) | Recall (%) | IRSTD 1k IoU (%) | F1-Score (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|---|---|---|
|  |  | 71.3 | 82.4 | 77.1 | 88.6 | 60.3 | 75.3 | 69.8 | 81.7 |
| √ |  | 72.5 | 83.6 | 78.6 | 90.3 | 62.0 | 76.5 | 72.4 | 81.2 |
|  | √ | 72.8 | 84.3 | 78.0 | 91.7 | 62.6 | 77.3 | 75.5 | 79.1 |

| Reduction Ratios ($r_f$, $r_p$) | SIRST Aug IoU (%) | F1-Score (%) | Precision (%) | Recall (%) | IRSTD 1k IoU (%) | F1-Score (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|---|---|---|---|
| (8,1) | 71.8 | 83.6 | 77.8 | 90.3 | 63.8 | 77.9 | 76.6 | 79.2 |
| (8,2) | 73.1 | 84.4 | 80.1 | 89.3 | 64.0 | 78.1 | 75.7 | 80.6 |
| (8,4) | 73.7 | 84.8 | 81.0 | 89.0 | 64.2 | 78.2 | 78.4 | 77.9 |
| (8,8) | 70.6 | 82.8 | 74.8 | 92.7 | 62.8 | 77.1 | 74.2 | 80.3 |

| Methods | Advantage | Disadvantage |
|---|---|---|
| ACM | L-time | H-FA&MD, L-accuracy |
| ALC | H-accuracy | H-time, L-recall |
| MDFA | H-robustness | H-FA&MD, H-time |
| AGPCNet | H-recall | H-time, L-accuracy |
| Ours | H-robustness and accuracy | L-FA |

H: high; L: low; FA: false alarm; MD: miss detection.


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Han, Z.; Zhang, C.; Feng, H.; Yue, M.; Quan, K.
PFFNET: A Fast Progressive Feature Fusion Network for Detecting Drones in Infrared Images. *Drones* **2023**, *7*, 424.
https://doi.org/10.3390/drones7070424
