A Dense Small Object Detection Algorithm Based on a Global Normalization Attention Mechanism
Abstract
1. Introduction
- (1) A new Global Normalization Attention Mechanism (GNAM) is proposed, which suppresses irrelevant information in the input features and enhances the richness of detail and semantic information in the output features;
- (2) We demonstrate that GNYL can efficiently handle UAV images, enhancing robustness for small object detection;
- (3) We propose a new small object detection algorithm, GNYL, which can be applied more efficiently in practice.
2. GNYL (Global Normalization Attention Mechanism You Only Look Once)
- (1) Adding the Global Normalization Attention Mechanism (GNAM): The GNAM first processes the input features with a channel attention unit, which uses the scale factors from batch normalization (BN) to highlight channels according to the variance they contribute to the trained model weights. The result is then passed to a spatial attention unit, which incorporates spatial context through two convolutional layers, further emphasizing spatially informative regions. This design improves the interaction between channel and spatial information, fully exploits the information useful for classification and localization within the input features, and enriches the detail in the GNAM's output features. To maximize this detail richness, the GNAM is inserted at the penultimate, semantically rich layer of the backbone network;
- (2) Redesigning the network architecture, including the feature enhancement network and the detection heads: A high-resolution feature enhancement network preserves more detailed information, while adding a larger-scale detection head increases the number and density of the anchor boxes. The large, dense anchor boxes of this head improve the fit between the predicted boxes and the target boxes, thereby improving localization accuracy for small objects.
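The channel attention unit described above can be sketched numerically. This is a minimal NumPy sketch of the NAM-style weighting the GNAM builds on, not the paper's exact implementation: the function name `gnam_channel_attention` and the placement of the sigmoid gate are illustrative assumptions. Each channel is weighted by its BN scale factor, normalized over all channels, so channels with a small learned variance contribution are suppressed:

```python
import numpy as np

def gnam_channel_attention(x, gamma):
    """Channel attention sketch: re-weight channels of x (C, H, W)
    by their batch-normalization scale factors gamma (C,).

    Channels whose BN scale factor is small (low variance contribution
    in the trained weights) are suppressed; a sigmoid gate keeps the
    modulation in (0, 1) before it multiplies the input features.
    """
    w = gamma / gamma.sum()            # normalized per-channel importance
    y = x * w[:, None, None]           # broadcast weights over H and W
    gate = 1.0 / (1.0 + np.exp(-y))    # sigmoid gating
    return gate * x                    # gated output features

# Toy example: 3 channels with increasing BN scale factors
x = np.ones((3, 2, 2))
gamma = np.array([0.1, 0.3, 0.6])
out = gnam_channel_attention(x, gamma)
# Channels with a larger gamma pass through more strongly
```

The spatial unit would follow analogously: two convolutional layers over the gated features produce a spatial weight map that is applied to the features in the same multiplicative way.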
2.1. Global Normalization Attention Mechanism
2.2. High-Resolution Feature Enhancement Network
3. Experiments
3.1. Datasets and Implementation Details
- (1) Random changes in object size and shape;
- (2) Objects are often occluded by other objects, leaving only partial object information visible;
- (3) Images typically have large scales and high resolutions, requiring more computational power;
- (4) Scenes contain various object types and complex background environments.
3.2. Ablation Studies
- (1) The scheme combining the feature enhancement network shows a significant accuracy advantage in detecting dense small objects in aerial images;
- (2) The GNAM fully mines the channel and spatial information of the input features, effectively improving how much of that information is utilized.
3.3. Comparison of Detection Results of Different Object Detection Algorithms on VisDrone2019
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Jiang, B.; Qu, R.K.; Li, Y.D.; Li, C. Object detection in UAV imagery based on deep learning: Review. Acta Aeronaut. Astronaut. Sin. 2021, 42, 137–151. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot MultiBox detector. In Proceedings of the 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475. [Google Scholar]
- Liu, Y.; Sun, P.; Wergeles, N.; Shang, Y. A survey and performance evaluation of deep learning methods for small object detection. Expert Syst. Appl. 2021, 172, 114602. [Google Scholar] [CrossRef]
- Tong, K.; Wu, Y.; Zhou, F. Recent advances in small object detection based on deep learning: A review. Image Vis. Comput. 2020, 97, 103910. [Google Scholar] [CrossRef]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Liu, Y.; Shao, Z.; Teng, Y.; Hoffmann, N. NAM: Normalization-based attention module. arXiv 2021, arXiv:2111.12419. [Google Scholar]
- Kim, M.; Jeong, J.; Kim, S. ECAP-YOLO: Efficient channel attention pyramid YOLO for small object detection in aerial image. Remote Sens. 2021, 13, 4851. [Google Scholar] [CrossRef]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Chen, Y.; Zhang, P.; Li, Z.; Li, Y.; Zhang, X.; Meng, G.; Jia, J. Stitcher: Feedback-driven data provider for object detection. arXiv 2020, arXiv:2004.12432. [Google Scholar]
- Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6023–6032. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
- Park, J.; Woo, S.; Lee, J.Y.; Kweon, I.S. BAM: Bottleneck attention module. arXiv 2018, arXiv:1807.06514. [Google Scholar]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, L.; et al. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
- Yu, W.; Yang, T.; Chen, C. Towards resolving the challenge of long-tail distribution in UAV images for object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 3258–3267. [Google Scholar]
- Ali, S.; Siddique, A.; Ateş, H.F.; Güntürk, B.K. Improved YOLOv4 for aerial object detection. In Proceedings of the 29th Signal Processing and Communications Applications Conference (SIU), Istanbul, Turkey, 9–11 June 2021; pp. 1–4. [Google Scholar]
- Cao, Y.; He, Z.; Wang, L.; Wang, W.; Yuan, Y.; Zhang, D.; Zhang, D.; Zhang, J.; Zhu, P.; Liu, M.; et al. VisDrone-DET2021: The vision meets drone object detection challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2847–2854. [Google Scholar]
Model | Parameters | FLOPs | Head Sizes
---|---|---|---
YOLOv8l | 43.6 M | 164.9 G | 20/40/80
YOLOv8l_rec | 53.5 M | 217.4 G | 40/80/160
Method | mAP@0.5 | mAP@0.5:0.95 | Parameters | FLOPs
---|---|---|---|---
Baseline | 41.6 | 25.0 | 43.6 M | 164.9 G
YOLOv8l + GNAM | 42.5 | 25.8 | 45.2 M | 166.1 G
YOLOv8l + SE | 41.5 | 25.1 | 43.7 M | 165.5 G
YOLOv8l + CA | 41.6 | 25.3 | 43.7 M | 165.5 G
YOLOv8l + CBAM | 41.6 | 25.2 | 43.6 M | 165.4 G
YOLOv8l + GAM | 41.8 | 25.2 | 48.8 M | 194.1 G
YOLOv8l + NAM | 42.1 | 25.5 | 48.2 M | 189.9 G
YOLOv8l_rec | 48.0 | 29.5 | 53.5 M | 217.4 G
GNYL (Ours) | 48.8 | 30.0 | 59.9 M | 222.6 G
Method | Backbone | Pedestrian | People | Bicycle | Car | Van | Truck | Tri | Awn-tri | Bus | Motor | mAP@0.5
---|---|---|---|---|---|---|---|---|---|---|---|---
Faster R-CNN [21] | ResNet-50 | 21.4 | 15.6 | 6.7 | 51.7 | 29.5 | 19.0 | 13.1 | 7.7 | 31.4 | 20.7 | 21.7
Faster R-CNN [21] | ResNet-101 | 20.9 | 14.8 | 7.3 | 51.0 | 29.7 | 19.5 | 14.0 | 8.8 | 30.5 | 21.2 | 21.8
YOLOv4 [22] | CSPDarknet | 24.8 | 12.6 | 8.6 | 64.3 | 22.4 | 22.7 | 11.4 | 7.6 | 44.3 | 21.7 | 30.7
CenterNet [23] | Hourglass-104 | 33.3 | 15.2 | 12.1 | 55.2 | 40.5 | 34.1 | 29.2 | 21.6 | 42.2 | 27.5 | 31.1
HR-Cascade++ [23] | HRNet-W40 | 32.6 | 17.3 | 11.1 | 54.7 | 42.4 | 35.3 | 32.7 | 24.1 | 46.5 | 28.2 | 32.5
CDNet [23] | ResNeXt-101 | 35.6 | 19.2 | 13.8 | 55.8 | 42.1 | 38.2 | 33.0 | 25.4 | 49.5 | 29.3 | 34.2
YOLOv5 | CSPDarknet | 44.4 | 36.7 | 18.5 | 74.2 | 37.7 | 37.4 | 25.3 | 12.7 | 48.6 | 43.3 | 37.9
GNYL (Ours) | CSPDarknet | 57.9 | 46.3 | 22.0 | 86.2 | 53.1 | 42.3 | 37.6 | 21.0 | 64.5 | 57.4 | 48.8
Share and Cite
Wu, H.; Zhu, Y.; Wang, L. A Dense Small Object Detection Algorithm Based on a Global Normalization Attention Mechanism. Appl. Sci. 2023, 13, 11760. https://doi.org/10.3390/app132111760