Fine-Grained Feature Perception for Unmanned Aerial Vehicle Target Detection Algorithm
Abstract
1. Introduction
- (1) Slicing assistance is introduced for the first time in both the training and inference phases of YOLOv8s-P2, enriching the pixel information available for small targets.
- (2) The backbone network is improved for high-quality feature map extraction:
  - (a) A Large Kernel Spatial Pyramid Pooling Fast module is designed, enabling high-level feature maps to capture long-range dependencies, local dependencies, and channel adaptability, improving the model's understanding of complex scenes.
  - (b) A feature extraction module with deformable convolutions is designed, decoupling the learning of the offset and the modulation scalar to improve target localization and adaptability to targets of different scales and shapes.
- (3) A Random FasterNet Block is designed and applied to the neck network. It introduces randomness into the convolution operations and adds non-linear transformations via depth-wise convolutions followed by point-wise convolutions, improving the model's robustness. This accelerates detection while preserving the original feature representation of the convolution operations.
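The slicing assistance in contribution (1) can be illustrated with a minimal tiling routine that partitions a large aerial image into overlapping fixed-size slices, each of which is then processed at full resolution (in the spirit of the slicing-aided hyper inference of Akyon et al.). This is a sketch under stated assumptions: the function name, the 640-pixel slice size, and the 20% overlap ratio are illustrative choices, not the paper's actual implementation.

```python
def slice_coords(img_w, img_h, slice_w=640, slice_h=640, overlap=0.2):
    """Return (x1, y1, x2, y2) boxes of overlapping slices covering an image.

    Edge slices are shifted inward so every box keeps the full slice size
    whenever the image is large enough; overlap controls the stride.
    """
    step_x = max(1, int(slice_w * (1 - overlap)))
    step_y = max(1, int(slice_h * (1 - overlap)))
    boxes = []
    y = 0
    while True:
        y2 = min(y + slice_h, img_h)
        x = 0
        while True:
            x2 = min(x + slice_w, img_w)
            # Anchor the box at its bottom-right corner so edge slices
            # stay full-sized instead of shrinking at the image border.
            boxes.append((max(0, x2 - slice_w), max(0, y2 - slice_h), x2, y2))
            if x2 >= img_w:
                break
            x += step_x
        if y2 >= img_h:
            break
        y += step_y
    return boxes
```

At inference time, detections from each slice would be mapped back to full-image coordinates and merged (e.g., with NMS), which is why a non-zero overlap matters: targets cut by a slice border are seen whole in the neighboring slice.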
2. Related Work
2.1. Multi-Scale Feature Fusion
2.2. Data Augmentation
2.3. Attention Mechanism
3. Improved Unmanned Aerial Vehicle Target Detection Algorithm
3.1. Data Processing
3.2. Backbone
3.2.1. DC2-DCNv3-C2f
3.2.2. Large Kernel Spatial Pyramid Pooling Fast Module
3.3. Neck
4. Experiments and Analysis
4.1. Dataset Setting
4.2. Experimental Setup
4.3. Evaluation Metrics
4.4. Ablation Study
4.4.1. Impact of DC2-DCNv3
4.4.2. Impact of RFC2f
4.4.3. Impact of Integrated Enhancements
4.5. Comparative Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Butilă, E.V.; Boboc, R.G. Urban traffic monitoring and analysis using unmanned aerial vehicles (uavs): A systematic literature review. Remote Sens. 2022, 14, 620. [Google Scholar] [CrossRef]
- Vasilopoulos, E.; Vosinakis, G.; Krommyda, M.; Karagiannidis, L.; Ouzounoglou, E.; Amditis, A. A comparative study of autonomous object detection algorithms in the maritime environment using a UAV platform. Computation 2022, 10, 42. [Google Scholar] [CrossRef]
- Talaat, F.M.; ZainEldin, H. An improved fire detection approach based on YOLO-v8 for smart cities. Neural Comput. Appl. 2023, 35, 20939–20954. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Part V, pp. 740–755. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475. [Google Scholar]
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Guan, W.; Zou, Y.X.; Zhou, X. Multi-scale object detection with feature fusion and region objectness network. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 2596–2600. [Google Scholar]
- Zeng, N.; Wu, P.; Wang, Z.; Li, H.; Liu, W.; Liu, X. A small-sized object detection oriented multi-scale feature fusion approach with application to defect detection. IEEE Trans. Instrum. Meas. 2022, 71, 3507014. [Google Scholar] [CrossRef]
- Deng, C.; Wang, M.; Liu, L.; Liu, Y.; Jiang, Y. Extended feature pyramid network for small object detection. IEEE Trans. Multimed. 2021, 24, 1968–1979. [Google Scholar] [CrossRef]
- Sun, W.; Dai, L.; Zhang, X.; Chang, P.; He, X. RSOD: Real-time small object detection algorithm in UAV-based traffic monitoring. Appl. Intell. 2022, 52, 8448–8463. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems 25 (NIPS 2012), Lake Tahoe, NV, USA, 3–6 December 2012. [Google Scholar]
- Wang, S. An augmentation small object detection method based on NAS-FPN. In Proceedings of the 2020 7th International Conference on Information Science and Control Engineering (ICISCE), Changsha, China, 18–20 December 2020; pp. 213–218. [Google Scholar]
- Ali-Gombe, A.; Elyan, E. MFC-GAN: Class-imbalanced dataset classification using multiple fake class generative adversarial network. Neurocomputing 2019, 361, 212–221. [Google Scholar] [CrossRef]
- Bosquet, B.; Cores, D.; Seidenari, L.; Brea, V.M.; Mucientes, M.; Del Bimbo, A. A full data augmentation pipeline for small object detection based on generative adversarial networks. Pattern Recognit. 2023, 133, 108998. [Google Scholar] [CrossRef]
- Kisantal, M.; Wojna, Z.; Murawski, J.; Naruniec, J.; Cho, K. Augmentation for Small Object Detection. In Proceedings of the 9th International Conference on Advances in Computing and Information Technology (ACITY 2019), Sydney, Australia, 21–22 December 2019; Aircc Publishing Corporation: Chennai, India, 2019; pp. 119–133. [Google Scholar]
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Lim, J.-S.; Astrid, M.; Yoon, H.-J.; Lee, S.-I. Small object detection using context and attention. In Proceedings of the 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Jeju Island, Republic of Korea, 13–16 April 2021; pp. 181–186. [Google Scholar]
- Zhan, W.; Sun, C.; Wang, M.; She, J.; Zhang, Y.; Zhang, Z.; Sun, Y. An improved Yolov5 real-time detection method for small objects captured by UAV. Soft Comput. 2022, 26, 361–373. [Google Scholar] [CrossRef]
- Lu, X.; Ji, J.; Xing, Z.; Miao, Q. Attention and feature fusion SSD for remote sensing object detection. IEEE Trans. Instrum. Meas. 2021, 70, 5501309. [Google Scholar] [CrossRef]
- Fang, Y.; Liao, B.; Wang, X.; Fang, J.; Qi, J.; Wu, R.; Niu, J.; Liu, W. You only look at one sequence: Rethinking transformer in vision through object detection. Adv. Neural Inf. Process. Syst. 2021, 34, 26183–26197. [Google Scholar]
- Zhang, J.; Xia, K.; Huang, Z.; Wang, S.; Akindele, R.G. ETAM: Ensemble transformer with attention modules for detection of small objects. Expert Syst. Appl. 2023, 224, 119997. [Google Scholar] [CrossRef]
- Akyon, F.C.; Altinuc, S.O.; Temizel, A. Slicing aided hyper inference and fine-tuning for small object detection. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 966–970. [Google Scholar]
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar]
- Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable convnets v2: More deformable, better results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9308–9316. [Google Scholar]
- Wang, W.; Dai, J.; Chen, Z.; Huang, Z.; Li, Z.; Zhu, X.; Hu, X.; Lu, T.; Lu, L.; Li, H. InternImage: Exploring large-scale vision foundation models with deformable convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 14408–14419. [Google Scholar]
- Chen, J.; Kao, S.-H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, don’t walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 12021–12031. [Google Scholar]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
- Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar]
- Hong, M.; Li, S.; Yang, Y.; Zhu, F.; Zhao, Q.; Lu, L. Sspnet: Scale selection pyramid network for tiny person detection from uav images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8018505. [Google Scholar] [CrossRef]
- Shahin, A.I.; Almotairi, S. SVA-SSD: Saliency visual attention single shot detector for building detection in low contrast high-resolution satellite images. PeerJ Comput. Sci. 2021, 7, e772. [Google Scholar] [CrossRef] [PubMed]
- Chai, E.; Chen, L.; Hao, X.; Zhou, W. Mitigate the scale imbalance via multi-scale information interaction in small object detection. Neural Comput. Appl. 2024, 36, 1699–1712. [Google Scholar] [CrossRef]
- Ruiz-Ponce, P.; Ortiz-Perez, D.; Garcia-Rodriguez, J.; Kiefer, B. Poseidon: A data augmentation tool for small object detection datasets in maritime environments. Sensors 2023, 23, 3691. [Google Scholar] [CrossRef]
- Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual generative adversarial networks for small object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1222–1230. [Google Scholar]
- Wan, X.; Yu, J.; Tan, H.; Wang, J. LAG: Layered objects to generate better anchors for object detection in aerial images. Sensors 2022, 22, 3891. [Google Scholar] [CrossRef]
- Mnih, V.; Heess, N.; Graves, A. Recurrent models of visual attention. In Proceedings of the Advances in Neural Information Processing Systems 27 2014, Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
- Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. In Proceedings of the Advances in Neural Information Processing Systems 28 2015, Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Vedaldi, A. Gather-excite: Exploiting feature context in convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems 31 2018, Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
- Gao, Z.; Xie, J.; Wang, Q.; Li, P. Global second-order pooling convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3024–3033. [Google Scholar]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
- Lee, H.J.; Kim, H.E.; Nam, H. SRM: A style-based recalibration module for convolutional neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1854–1862. [Google Scholar]
- Yang, Z.; Zhu, L.; Wu, Y.; Yang, Y. Gated channel transformation for visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11794–11803. [Google Scholar]
- Liu, X.; Leng, C.; Niu, X.; Pei, Z.; Cheng, I.; Basu, A. Find small objects in UAV images by feature mining and attention. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6517905. [Google Scholar] [CrossRef]
- Yang, L.; Zhong, J.; Zhang, Y.; Bai, S.; Li, G.; Yang, Y.; Zhang, J. An improving faster-RCNN with multi-attention ResNet for small target detection in intelligent autonomous transport with 6G. IEEE Trans. Intell. Transp. Syst. 2022, 24, 7717–7725. [Google Scholar] [CrossRef]
- Lau, K.W.; Po, L.M.; Rehman, Y.A.U. Large Separable Kernel Attention: Rethinking the Large Kernel Attention Design in CNN. Expert Syst. Appl. 2024, 236, 121352. [Google Scholar] [CrossRef]
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
- Xia, G.-S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983. [Google Scholar]
- Wu, J.; Cai, N.; Chen, W.; Wang, H.; Wang, G. Automatic detection of hardhats worn by construction personnel: A deep learning approach and benchmark dataset. Autom. Constr. 2019, 106, 102894. [Google Scholar] [CrossRef]
- Jiang, L.; Yuan, B.; Du, J.; Chen, B.; Xie, H.; Tian, J.; Yuan, Z. MFFSODNet: Multi-Scale Feature Fusion Small Object Detection Network for UAV Aerial Images. IEEE Trans. Instrum. Meas. 2024, 73, 5015214. [Google Scholar] [CrossRef]
- Wang, M.; Yang, W.; Wang, L.; Chen, D.; Wei, F.; KeZiErBieKe, H.; Liao, Y. FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection. J. Vis. Commun. Image Represent. 2023, 90, 103752. [Google Scholar] [CrossRef]
- Ma, Y.; Chai, L.; Jin, L.; Yu, Y.; Yan, J. AVS-YOLO: Object detection in aerial visual scene. Int. J. Pattern Recognit. Artif. Intell. 2022, 36, 2250004. [Google Scholar] [CrossRef]
- Deng, S.; Li, S.; Xie, K.; Song, W.; Liao, X.; Hao, A.; Qin, H. A global-local self-adaptive network for drone-view object detection. IEEE Trans. Image Process. 2020, 30, 1556–1569. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Q.; Zhang, H.; Lu, X. Adaptive Feature Fusion for Small Object Detection. Appl. Sci. 2022, 12, 11854. [Google Scholar] [CrossRef]
- Li, Y.; Fan, Q.; Huang, H.; Han, Z.; Gu, Q. A Modified YOLOv8 Detection Network for UAV Aerial Image Recognition. Drones 2023, 7, 304. [Google Scholar] [CrossRef]
- Chen, Z.; Ji, H.; Zhang, Y.; Zhu, Z.; Li, Y. High-Resolution Feature Pyramid Network for Small Object Detection on Drone View. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 475–489. [Google Scholar] [CrossRef]
| Parameter | Setting |
|---|---|
| Epochs | 300 |
| Batch size | 8 |
| Optimizer | SGD |
| Initial learning rate | 1 × 10⁻² |
| Final learning rate | 1 × 10⁻⁴ |
| NMS IoU | 0.7 |
| Weight decay | 0.0005 |
| Close Mosaic | 0 |
| Optimizer momentum | 0.937 |
| Warmup epochs | 5 |
| Patience | 50 |
| Module | Precision [%] | Recall [%] | mAP_50 [%] | mAP_50:95 [%] |
|---|---|---|---|---|
| C2f | 75.1 | 65.4 | 69.5 | 46.1 |
| DCNv3-C2f | 76.7 | 67.9 | 71.0 | 48.2 |
| DC2-DCNv3-C2f | 77.5 | 68.5 | 72.2 | 49.2 |
| Module | Precision [%] | Recall [%] | mAP_50 [%] | mAP_50:95 [%] | GFLOPs | FPS |
|---|---|---|---|---|---|---|
| C2f | 75.1 | 65.4 | 69.5 | 46.1 | 36.7 | 161.3 |
| FC2f | 73.6 | 63.4 | 66.9 | 44.5 | 32.5 | 175.6 |
| RFC2f | 74.9 | 66.0 | 69.4 | 46.5 | 32.7 | 170.7 |
| Model | Precision [%] | Recall [%] | mAP_50 [%] | mAP_50:95 [%] |
|---|---|---|---|---|
| Baseline | 48.6 | 37.3 | 36.2 | 20.8 |
| a | 49.9 | 38.6 | 38.0 | 21.7 |
| b | 49.7 | 38.1 | 37.6 | 21.4 |
| c | 50.1 | 38.9 | 38.3 | 21.8 |
| d | 48.5 | 37.6 | 36.1 | 20.9 |
| a + b + c | 51.6 | 39.0 | 40.9 | 23.0 |
| a + b + d | 50.6 | 38.8 | 38.5 | 22.3 |
| a + b + c + d (ours) | 51.4 | 39.2 | 40.7 | 23.1 |
| Model | GFLOPs | FPS | Params [M] |
|---|---|---|---|
| Baseline | 36.7 | 137.0 | 10.6 |
| a + b + c | 36.1 | 128.4 | 11.9 |
| d | 32.7 | 148.2 | 9.2 |
| a + b + c + d (ours) | 31.8 | 136.4 | 10.4 |
| Model | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awning-Tricycle | Bus | Motorcycle |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 0.69 | 0.85 | 0.9 | 0.26 | 0.67 | 0.63 | 0.75 | 0.86 | 0.51 | 0.64 |
| Our method (decrease [%]) | 0.65 (5.8) | 0.82 (3.5) | 0.86 (4.4) | 0.24 (7.7) | 0.64 (4.5) | 0.56 (11.1) | 0.71 (5.3) | 0.84 (2.3) | 0.49 (3.9) | 0.59 (7.8) |
| Method | Image Size | mAP_50:95 | mAP_50 | FPS | GPU |
|---|---|---|---|---|---|
| Ours | 640 × 640 | 29.8 | 48.3 | 136.4 | GeForce RTX 2080Ti |
| YOLOv8m | 640 × 640 | 26.8 | 43.8 | 75.0 | GeForce RTX 2080Ti |
| MFFSODNet [52] | 640 × 640 | - | 45.5 | 70 | TITAN RTX |
| FE-YOLOv5 [53] | 640 × 640 | 21.0 | 37.0 | - | GeForce RTX 2080Ti |
| AVS-YOLO [54] | 416 × 640 | 22.19 | 43.4 | 31.8 | GeForce RTX 2080Ti |
| FPN+SARSA+TDA+LSRN [55] | 600 × 1000 | 25.8 | 51.5 | 1.3 | TITAN Xp |
| MMF-YOLO [56] | 640 × 640 | - | 42.2 | - | GeForce RTX 3080Ti |
| Li et al. [57] | 640 × 640 | - | 42.2 | 167.0 | GeForce RTX 3090Ti |
| HR-FPN [58] | 1024 × 1024 | - | 50.8 | 23.9 | GeForce RTX 3090 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, S.; Zhu, M.; Tao, R.; Ren, H. Fine-Grained Feature Perception for Unmanned Aerial Vehicle Target Detection Algorithm. Drones 2024, 8, 181. https://doi.org/10.3390/drones8050181