Infrared Target Detection Based on Interval Sampling Weighting and 3D Attention Head in Complex Scenario
Abstract
1. Introduction
- We propose a novel interval sampling weighting (ISW) module to strengthen the feature fusion network in the neck. Through interval sampling, the ISW module rearranges feature information from the plane (spatial) dimension into the channel dimension, which preserves more complete positional information. A multidimensional collaborative attention mechanism then screens and strengthens the features before dimensionality reduction, so that the network focuses on target-related regions of the image, enlarging the perception range, improving accuracy, and suppressing redundant information (a minimal sketch of the interval-sampling rearrangement is given after this list).
- We propose a detection head based on 3D attention. TAHNet increases the model's sensitivity to targets so that they can be located and recognized correctly; in particular, in complex scenes or under occlusion, it weakens the interference of background noise and improves detection precision (a sketch of the parameter-free 3D attention weighting is also given after this list).
- The C2f block is employed as the core feature extraction module of the model. By running more gradient-flow branches in parallel, the model obtains richer and more hierarchical feature representations, which accelerates feature extraction and handles objects of diverse scales and shapes, thereby improving the model's robustness (a compact sketch of the C2f structure follows as well).
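The following PyTorch snippet is a minimal sketch of the interval-sampling idea only, not the authors' ISW implementation: stride-2 interval sampling rearranges the feature map from the plane (spatial) dimension into the channel dimension, and a simple squeeze-and-excitation style weighting stands in for the multidimensional collaborative attention applied before dimensionality reduction. The class name, layer widths, and the 1×1 fusion convolution are assumptions for illustration.

```python
import torch
import torch.nn as nn

class IntervalSamplingWeighting(nn.Module):
    """Illustrative sketch (not the paper's code): stride-2 interval sampling turns an
    (H, W, C) feature map into (H/2, W/2, 4C), so spatial positions survive as extra
    channels, then a lightweight channel weighting precedes a 1x1 fusion convolution."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.fuse = nn.Conv2d(4 * in_channels, out_channels, kernel_size=1)
        # Simple channel attention standing in for the multidimensional
        # collaborative attention described in the paper (assumption).
        self.weight = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(4 * in_channels, 4 * in_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Interval (every-other-pixel) sampling: four interleaved sub-grids go to channels.
        tl = x[..., 0::2, 0::2]
        tr = x[..., 0::2, 1::2]
        bl = x[..., 1::2, 0::2]
        br = x[..., 1::2, 1::2]
        y = torch.cat([tl, tr, bl, br], dim=1)   # (B, 4C, H/2, W/2); H and W must be even
        y = y * self.weight(y)                   # attention-based re-weighting
        return self.fuse(y)

if __name__ == "__main__":
    m = IntervalSamplingWeighting(64, 128)
    print(m(torch.randn(1, 64, 80, 80)).shape)   # torch.Size([1, 128, 40, 40])
```

Unlike pooling or strided convolution, this rearrangement is lossless: every pixel survives as a channel, which is why positional information is preserved rather than averaged away.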
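For the 3D-attention detection head, the underlying operator reviewed in Section 2.2 is SimAM, which assigns a weight to every neuron jointly over the channel and spatial dimensions without adding parameters. The sketch below implements the published SimAM energy-based weighting; how TAHNet embeds it in the detection head is described in Section 3.4 and is not reproduced here.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free 3D attention (Yang et al., ICML 2021): each activation is weighted
    by how strongly it stands out from its channel's mean over all spatial positions."""
    def __init__(self, eps=1e-4):
        super().__init__()
        self.eps = eps

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # squared deviation per neuron
        v = d.sum(dim=(2, 3), keepdim=True) / n              # channel-wise variance estimate
        e_inv = d / (4 * (v + self.eps)) + 0.5                # inverse energy of each neuron
        return x * torch.sigmoid(e_inv)                       # 3D attention weighting
```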
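The C2f block referred to above follows the structure popularized by YOLOv8: the input is split, a chain of bottlenecks is applied, and every intermediate output is concatenated so that additional gradient-flow branches reach the final fusion convolution. The sketch below is an illustrative reimplementation under that assumption; channel ratios, normalization, and activation choices are not taken from the paper.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Two 3x3 convolutions with an optional residual connection."""
    def __init__(self, c1, c2, shortcut=True):
        super().__init__()
        self.cv1 = nn.Sequential(nn.Conv2d(c1, c2, 3, padding=1), nn.BatchNorm2d(c2), nn.SiLU())
        self.cv2 = nn.Sequential(nn.Conv2d(c2, c2, 3, padding=1), nn.BatchNorm2d(c2), nn.SiLU())
        self.add = shortcut and c1 == c2

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

class C2f(nn.Module):
    """C2f: split the input, run n bottlenecks in sequence, and concatenate every
    intermediate branch so extra gradient paths reach the fusion convolution."""
    def __init__(self, c1, c2, n=1, shortcut=False, e=0.5):
        super().__init__()
        self.c = int(c2 * e)
        self.cv1 = nn.Conv2d(c1, 2 * self.c, 1)
        self.cv2 = nn.Conv2d((2 + n) * self.c, c2, 1)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut) for _ in range(n))

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))
        y.extend(m(y[-1]) for m in self.m)     # keep every bottleneck output
        return self.cv2(torch.cat(y, dim=1))
```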
2. Related Studies
2.1. Multidimensional Collaborative Attention
2.2. SimAM Module
3. Proposed Methods and Model Architecture
3.1. Overall Framework of the Proposed Model
3.2. SPPF Module
3.3. Interval Sampling Weighted Module
3.4. TAHNet
4. Experiment and Analysis
4.1. Dataset and Experimental Environment
4.2. Performance Evaluation Metrics
4.3. Analysis of Ablation Experiment
4.4. Comparison of Diverse Models on FLIR Dataset
4.5. Comparison of Various Models on KAIST Dataset
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Su, Y.; Tan, W.; Dong, Y.; Xu, W.; Huang, P.; Zhang, J.; Zhang, D. Enhancing concealed object detection in Active Millimeter Wave Images using wavelet transform. Signal Process. 2024, 216, 109303. [Google Scholar] [CrossRef]
- Pramanik, R.; Pramanik, P.; Sarkar, R. Breast cancer detection in thermograms using a hybrid of GA and GWO based deep feature selection method. Expert Syst. Appl. 2023, 219, 119643. [Google Scholar] [CrossRef]
- Kieu, M.; Bagdanov, A.D.; Bertini, M. Bottom-up and layerwise domain adaptation for pedestrian detection in thermal images. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2021, 17, 1–19. [Google Scholar] [CrossRef]
- Zhang, M.; Zhang, R.; Yang, Y.; Bai, H.; Zhang, J.; Guo, J. ISNet: Shape matters for infrared small target detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 877–886. [Google Scholar]
- Wang, K.; Du, S.; Liu, C.; Cao, Z. Interior attention-aware network for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5002013. [Google Scholar] [CrossRef]
- Goddijn-Murphy, L.; Williamson, B.J.; McIlvenny, J.; Corradi, P. Using a UAV thermal infrared camera for monitoring floating marine plastic litter. Remote Sens. 2022, 14, 3179. [Google Scholar] [CrossRef]
- Zhao, X.; Xia, Y.; Zhang, W.; Zheng, C.; Zhang, Z. YOLO-ViT-Based Method for Unmanned Aerial Vehicle Infrared Vehicle Target Detection. Remote Sens. 2023, 15, 3778. [Google Scholar] [CrossRef]
- Jiang, C.; Ren, H.; Ye, X.; Zhu, J.; Zeng, H.; Nan, Y.; Sun, M.; Ren, X.; Huo, H. Object detection from UAV thermal infrared images and videos using YOLO models. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102912. [Google Scholar] [CrossRef]
- Yu, J.; Li, S.; Zhou, S.; Wang, H. MSIA-Net: A Lightweight Infrared Target Detection Network with Efficient Information Fusion. Entropy 2023, 25, 808. [Google Scholar] [CrossRef]
- Sui, L.; Sun, W.; Gao, X. Near-infrared maritime target detection based on Swin-Transformer model. In Proceedings of the 2022 5th International Conference on Signal Processing and Machine Learning, Dalian, China, 4–6 August 2022; pp. 218–225. [Google Scholar]
- Luo, F.; Li, Y.; Zeng, G.; Peng, P.; Wang, G.; Li, Y. Thermal infrared image colorization for nighttime driving scenes with top-down guided attention. IEEE Trans. Intell. Transp. Syst. 2022, 23, 15808–15823. [Google Scholar] [CrossRef]
- Chen, Y.T.; Shi, J.; Ye, Z.; Mertz, C.; Ramanan, D.; Kong, S. Multimodal object detection via probabilistic ensembling. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 139–158. [Google Scholar]
- Chen, Y.; Shin, H. Pedestrian detection at night in infrared images using an attention-guided encoder-decoder convolutional neural network. Appl. Sci. 2020, 10, 809. [Google Scholar] [CrossRef]
- Zhao, C.; Wang, J.; Su, N.; Yan, Y.; Xing, X. Low contrast infrared target detection method based on residual thermal backbone network and weighting loss function. Remote Sens. 2022, 14, 177. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 8–16 October 2016; Lecture Notes in Computer Science. Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar] [CrossRef]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar] [CrossRef]
- Kieu, M.; Bagdanov, A.D.; Bertini, M.; Del Bimbo, A. Task-conditioned domain adaptation for pedestrian detection in thermal imagery. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 546–562. [Google Scholar]
- Xu, S.; Wang, X.; Lv, W.; Chang, Q.; Cui, C.; Deng, K.; Wang, G.; Dang, Q.; Wei, S.; Du, Y.; et al. PP-YOLOE: An evolved version of YOLO. arXiv 2022, arXiv:2203.16250. [Google Scholar]
- Wen, Z.; Su, J.; Zhang, Y.; Li, M.; Gan, G.; Zhang, S.; Fan, D. A lightweight small object detection algorithm based on improved YOLOv5 for driving scenarios. Int. J. Multimed. Inf. Retr. 2023, 12, 38. [Google Scholar] [CrossRef]
- Du, F.; Jiao, S.; Chu, K. Application research of bridge damage detection based on the improved lightweight convolutional neural network model. Appl. Sci. 2022, 12, 6225. [Google Scholar] [CrossRef]
- Wei, X.; Wei, Y.; Lu, X. HD-YOLO: Using radius-aware loss function for head detection in top-view fisheye images. J. Vis. Commun. Image Represent. 2023, 90, 103715. [Google Scholar] [CrossRef]
- Chen, W.; Li, Y.; Tian, Z.; Zhang, F. 2D and 3D object detection algorithms from images: A Survey. Array 2023, 19, 100305. [Google Scholar] [CrossRef]
- Guo, M.H.; Xu, T.X.; Liu, J.J.; Liu, Z.N.; Jiang, P.T.; Mu, T.J.; Zhang, S.H.; Martin, R.R.; Cheng, M.M.; Hu, S.M. Attention Mechanisms in Computer Vision: A Survey. Comput. Vis. Media 2022, 8, 331–368. [Google Scholar] [CrossRef]
- Li, X.; Hu, X.; Yang, J. Spatial group-wise enhance: Improving semantic feature learning in convolutional networks. arXiv 2019, arXiv:1905.09646. [Google Scholar]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Lecture Notes in Computer Science. Springer: Cham, Switzerland, 2018; pp. 3–19. [Google Scholar] [CrossRef]
- Yu, Y.; Zhang, Y.; Cheng, Z.; Song, Z.; Tang, C. MCA: Multidimensional collaborative attention in deep convolutional neural networks for image recognition. Eng. Appl. Artif. Intell. 2023, 126, 107079. [Google Scholar] [CrossRef]
- Yang, L.; Zhang, R.; Li, L.; Xie, X. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021. [Google Scholar]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 12993–13000. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
- Yu, J.; Wu, T.; Zhou, S.; Pan, H.; Zhang, X.; Zhang, W. An SAR Ship Object Detection Algorithm Based on Feature Information Efficient Representation Network. Remote Sens. 2022, 14, 3489. [Google Scholar] [CrossRef]
- Hwang, S.; Park, J.; Kim, N.; Choi, Y.; So Kweon, I. Multispectral pedestrian detection: Benchmark dataset and baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1037–1045. [Google Scholar]
- Devaguptapu, C.; Akolekar, N.; Sharma, M.M.; Balasubramanian, V.N. Borrow From Anywhere: Pseudo Multi-Modal Object Detection in Thermal Imagery. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019. [Google Scholar] [CrossRef]
- Li, Q.; Zhang, C.; Hu, Q.; Fu, H.; Zhu, P. Confidence-aware fusion using dempster-shafer theory for multispectral pedestrian detection. IEEE Trans. Multimed. 2022, 25, 3420–3431. [Google Scholar] [CrossRef]
- Jiang, X.; Cai, W.; Yang, Z.; Xu, P.; Jiang, B. IARet: A Lightweight Multiscale Infrared Aerocraft Recognition Algorithm. Arab. J. Sci. Eng. 2022, 47, 2289–2303. [Google Scholar] [CrossRef]
- Chen, Q.; Wang, Y.; Yang, T.; Zhang, X.; Cheng, J.; Sun, J. You Only Look One-level Feature. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar] [CrossRef]
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October 2019–2 November 2019. [Google Scholar] [CrossRef]
- Cao, Y.; Zhou, T.; Zhu, X.; Su, Y. Every Feature Counts: An Improved One-Stage Detector in Thermal Imagery. In Proceedings of the 2019 IEEE 5th International Conference on Computer and Communications (ICCC), Chengdu, China, 6–9 December 2019. [Google Scholar] [CrossRef]
- Li, S.; Li, Y.; Li, Y.; Li, M.; Xu, X. YOLO-FIRI: Improved YOLOv5 for Infrared Image Object Detection. IEEE Access 2021, 9, 141861–141875. [Google Scholar] [CrossRef]
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into High Quality Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar] [CrossRef]
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
Name | Configuration |
---|---|
CPU | Intel(R) Core(TM) i7-12700K |
Operating system | Windows 10 |
GPU | NVIDIA GeForce RTX 3060 |
GPU memory size | 12 GB |
Deep learning framework | PyTorch 1.12.0 |
Programming language | Python 3.7 |
Dependent packages | CUDA 11.7 + cuDNN 8.3.2 |
Hyperparameter | Initialization |
---|---|
Image size | 640 |
Learning rate | 0.01 |
Momentum | 0.9 |
Weight decay | 0.005 |
Batch size | 16 |
Training epochs | 100 |
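Purely as an illustrative aid (the authors' training script is not given in the paper), the settings above can be collected into a single training configuration; the key names below are assumptions, not identifiers from the authors' code.

```python
# Hypothetical training configuration mirroring the hyperparameter table above.
train_cfg = {
    "img_size": 640,        # input resolution (pixels)
    "lr0": 0.01,            # initial learning rate
    "momentum": 0.9,        # SGD momentum
    "weight_decay": 0.005,  # L2 regularization
    "batch_size": 16,
    "epochs": 100,
}
```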
| ISW | TAH | C2f | Precision | Recall | F1 | mAP |
|---|---|---|---|---|---|---|
|  |  |  | 85.5 | 71.3 | 78 | 80.9 |
| 🗸 |  |  | 86.8 | 72.8 | 79 | 81.9 |
|  | 🗸 |  | 86.2 | 73.3 | 79 | 82.0 |
|  |  | 🗸 | 85.7 | 72.1 | 78 | 81.4 |
| 🗸 | 🗸 |  | 86.4 | 72.6 | 79 | 81.8 |
| 🗸 |  | 🗸 | 86.5 | 73.2 | 79 | 82.3 |
|  | 🗸 | 🗸 | 85.6 | 74.6 | 80 | 82.8 |
| 🗸 | 🗸 | 🗸 | 86.8 | 75.7 | 81 | 84.0 |
Method | Size (MB) | Params (M) | mAP | F1 | Recall | Precision |
---|---|---|---|---|---|---|
SSD | 181.3 | 91.7 | 67.9 | 68 | 66.2 | 69.8 |
YOLOv3-tiny | 17.4 | 8.7 | 68.4 | 69 | 61.5 | 78.0 |
YOLOv5n | 3.7 | 1.8 | 80.9 | 78 | 71.3 | 85.5 |
YOLOv7-tiny | 11.7 | 6.0 | 83.1 | 79 | 77.3 | 81.6 |
Ours | 4.4 | 2.1 | 84.0 | 81 | 75.7 | 86.8 |
Method | Person | Bicycle | Car | mAP |
---|---|---|---|---|
PearlGAN [11] | 54.0 | 23.0 | 75.5 | 50.8 |
MMTOD-CG [37] | 50.3 | 63.3 | 70.6 | 61.4 |
CMPD [38] | 69.6 | 59.8 | 78.1 | 69.2 |
IARet [39] | 77.2 | 48.7 | 85.8 | 70.6 |
YOLOF [40] | 67.8 | 68.1 | 79.4 | 71.8 |
FCOS [41] | 69.7 | 67.4 | 79.7 | 72.3 |
BU(AT,T) [3] | 76.1 | 56.1 | 87.0 | 73.1 |
BU(LT,T) [3] | 75.6 | 57.4 | 86.5 | 73.2 |
ThermalDet [42] | 78.2 | 60.0 | 85.5 | 74.6 |
YOLO-FIR [43] | 85.2 | 70.7 | 84.3 | 80.1 |
Cascade R-CNN [44] | 77.3 | 84.3 | 79.8 | 80.5 |
YOLOX [45] | 78.2 | 85.4 | 80.2 | 81.2 |
Ours | 85.1 | 76.8 | 90.1 | 84.0 |
Method | Precision | Recall | F1 | mAP | Time/ms |
---|---|---|---|---|---|
YOLOv3 * | 73.5 | 76.9 | 75 | 79.6 | 25 |
YOLOv4 * | 76.9 | 75.8 | 76 | 81.0 | 37 |
YOLO-ACN * | 76.2 | 87.9 | 82 | 82.3 | 20 |
YOLO-FIR * | 92.1 | 88.1 | 90 | 93.1 | 12 |
YOLOv5n | 94.2 | 89.7 | 92 | 94.8 | 5 |
Ours | 96.6 | 92.2 | 94 | 97.5 | 6 |