Small-Target Detection Algorithm Based on STDA-YOLOv8
Abstract
1. Introduction
- (1) The STDA-YOLOv8 algorithm is proposed, which significantly improves small-target detection performance by combining a Context Augmentation Module (CAM) and a Feature Refinement Module (FRM).
- (2) A new data augmentation method, Copy–Reduce–Paste, is proposed, which addresses the shortage of small targets in the training data and makes the training process more balanced.
- (3) Experiments show that STDA-YOLOv8 reaches an mAP of 93.5% on the VisDrone dataset and 94.2% on the PASCAL VOC dataset, improvements of 5.3% and 5.7%, respectively, over the original YOLOv8, with no significant increase in model complexity.
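To make the Copy–Reduce–Paste idea concrete, the following is a minimal sketch, not the paper's implementation: the exact cropping, scaling range, and placement rules are assumptions, as is the nearest-neighbour shrink used here to avoid external dependencies.

```python
# Hedged sketch of a Copy-Reduce-Paste (CRP) style augmentation:
# copy each object patch, reduce (shrink) it, and paste it back into
# the image, so small-object instances become more numerous.
import random
import numpy as np

def copy_reduce_paste(image, boxes, scale_range=(0.5, 0.8), rng=None):
    """Return an augmented image and an extended box list.

    image: HxWxC uint8 array; boxes: list of (x1, y1, x2, y2) in pixels.
    scale_range and random placement are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    h, w = image.shape[:2]
    out = image.copy()
    new_boxes = list(boxes)
    for (x1, y1, x2, y2) in boxes:
        patch = image[y1:y2, x1:x2]
        s = rng.uniform(*scale_range)
        ph = max(1, int((y2 - y1) * s))
        pw = max(1, int((x2 - x1) * s))
        # Nearest-neighbour shrink via index sampling (no cv2 needed).
        ys = np.linspace(0, patch.shape[0] - 1, ph).astype(int)
        xs = np.linspace(0, patch.shape[1] - 1, pw).astype(int)
        small = patch[ys][:, xs]
        # Paste at a random location fully inside the image.
        px = rng.randint(0, w - pw)
        py = rng.randint(0, h - ph)
        out[py:py + ph, px:px + pw] = small
        new_boxes.append((px, py, px + pw, py + ph))
    return out, new_boxes
```

A real pipeline would additionally avoid pasting over existing objects and convert the new boxes into the training label format.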
2. Related Work
2.1. Target Detection Based on Deep Learning
2.2. Multi-Scale Feature Fusion
3. Methodology
3.1. YOLOv8 Model
3.2. YOLOv8 Model Enhancements
3.2.1. CAM Module
3.2.2. FRM
- (1) Unlike a traditional FPN or PANet, the proposed CAM uses multi-scale dilated convolutions to overcome those structures' limited ability to capture small-target context;
- (2) The FRM performs adaptive fusion in both the channel and spatial dimensions, resolving the cross-scale feature conflicts of traditional feature fusion methods;
- (3) The theoretical contribution lies not only in the module designs but also in the proposed Copy–Reduce–Paste data augmentation method, which addresses training data imbalance and has clear theoretical and practical implications.
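The multi-scale dilated-convolution idea behind CAM can be sketched in PyTorch as parallel 3×3 branches with dilation rates (1, 3, 5). This is an illustrative sketch only: the channel counts, the summation fusion, and the omission of the FRM's adaptive weighting are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ContextAugmentation(nn.Module):
    """Parallel 3x3 convolutions with dilation rates (1, 3, 5).

    Padding equals the dilation rate, so each branch keeps the spatial
    size while enlarging the receptive field; branch outputs are fused
    here by simple summation (an assumed fusion rule).
    """
    def __init__(self, channels, rates=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=r, dilation=r)
            for r in rates
        )

    def forward(self, x):
        # Each branch sees context at a different scale; summing them
        # injects multi-scale context without changing tensor shape.
        return sum(branch(x) for branch in self.branches)
```

Because every branch preserves the feature-map shape, the module can be dropped between neck stages without altering downstream layers.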
4. Results and Analysis
4.1. Experimental Dataset and Experimental Setup
4.1.1. Experimental Dataset
4.1.2. Data Augmentation Method: Copy–Reduce–Paste (CRP)
4.1.3. Experimental Setup
4.2. Evaluation Metrics
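The P/%, R/%, and mAP/% figures reported below follow the standard detection definitions; as a minimal illustration (the box format and the IoU matching threshold are conventions assumed here, not taken from the paper):

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN).

    A detection counts as a true positive when its IoU with a matched
    ground-truth box exceeds the chosen threshold (commonly 0.5).
    """
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

mAP then averages, over classes, the area under the precision-recall curve obtained by sweeping the detection confidence threshold.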
4.3. Results and Comparison
4.3.1. Ablation Study
4.3.2. Comparative Experiment
4.3.3. Evaluation of Computational Efficiency on Different Hardware Platforms
- (1) CPU platform: Intel i7-12700K with 16 GB of RAM;
- (2) GPU platforms: NVIDIA RTX 2060 with 8 GB of VRAM and NVIDIA RTX 3080 with 10 GB of VRAM.
5. Summary and Outlook
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Sui, J.; Chen, D.; Zheng, X.; Wang, H. A New Algorithm for Small Target Detection From the Perspective of Unmanned Aerial Vehicles. IEEE Access 2024, 12, 29690–29697.
2. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 142–158.
3. Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective Search for Object Recognition. Int. J. Comput. Vis. 2013, 104, 154–171.
4. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
5. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
6. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
7. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
8. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
9. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Lecture Notes in Computer Science; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016; Volume 9905.
10. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007.
11. Kisantal, M.; Wojna, Z.; Murawski, J.; Naruniec, J.; Cho, K. Augmentation for Small Object Detection. arXiv 2019, arXiv:1902.07296.
12. Liu, Y.; Sun, P.; Wergeles, N.; Shang, Y. A Survey and Performance Evaluation of Deep Learning Methods for Small Object Detection. Expert Syst. Appl. 2021, 172, 114602.
13. Zhang, F.; Jiao, L.; Li, L.; Liu, F.; Liu, X. MultiResolution Attention Extractor for Small Object Detection. arXiv 2020, arXiv:2006.05941.
14. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
15. Liu, S.; Huang, D.; Wang, Y. Learning Spatial Fusion for Single-Shot Object Detection. arXiv 2019, arXiv:1911.09516.
16. Ghiasi, G.; Lin, T.-Y.; Le, Q.V. NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7029–7038.
17. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787.
18. Chen, Y.; Zhang, P.; Li, Z.; Li, Y.; Zhang, X.; Meng, G.; Xiang, S.; Sun, J.; Jia, J. Stitcher: Feedback-Driven Data Provider for Object Detection. arXiv 2020, arXiv:2004.12432.
19. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
20. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
21. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
22. Wang, J.; Yu, N. UTD-Yolov5: A Real-Time Underwater Targets Detection Method Based on Attention Improved YOLOv5. arXiv 2022, arXiv:2207.00837.
23. Feng, J.; Wang, J.; Qin, R. Lightweight Detection Network for Arbitrary-Oriented Vehicles in UAV Imagery via Precise Positional Information Encoding and Bidirectional Feature Fusion. Int. J. Remote Sens. 2023, 44, 4529–4558.
24. Wang, C.; Sun, W.; Wu, H.; Zhao, C.; Teng, G.; Yang, Y.; Du, P. A Low-Altitude Remote Sensing Inspection Method on Rural Living Environments Based on a Modified YOLOv5s-ViT. Remote Sens. 2022, 14, 4784.
25. Zhai, X.; Wei, H.; He, Y.; Shang, Y.; Liu, C. Underwater Sea Cucumber Identification Based on Improved YOLOv5. Appl. Sci. 2022, 12, 9105.
26. Xu, J.; Zou, Y.; Tan, Y.; Yu, Z. Chip Pad Inspection Method Based on an Improved YOLOv5 Algorithm. Sensors 2022, 22, 6685.
27. Sohan, M.; Sai Ram, T.; Rami Reddy, C.V. A Review on YOLOv8 and Its Advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 18–20 November 2024; Jacob, I.J., Piramuthu, S., Falkowski-Gilski, P., Eds.; Springer: Singapore, 2024.
28. Ashraf, I.; Hur, S.; Kim, G.; Park, Y. Analyzing Performance of YOLOx for Detecting Vehicles in Bad Weather Conditions. Sensors 2024, 24, 522.
29. Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6.
30. Chai, W.; Han, D.; Zhou, H.; Wang, S.; Zhou, F. FDW-YOLOv8: A Lightweight Unmanned Aerial Vehicle Small Target Detection Algorithm Based on Enhanced YOLOv8. In Proceedings of the 2024 IEEE International Workshop on Radio Frequency and Antenna Technologies (iWRF&AT), Shenzhen, China, 31 May–3 June 2024; pp. 368–373.
31. Hussain, M. YOLOv1 to v8: Unveiling Each Variant–A Comprehensive Review of YOLO. IEEE Access 2024, 12, 42816–42833.
32. Cai, W.; Liang, Y.; Liu, X.; Feng, J.; Wu, Y. MSGNet: Learning Multi-Scale Inter-Series Correlations for Multivariate Time Series Forecasting. arXiv 2023, arXiv:2401.00423.
33. Zeng, T.; Wu, B.; Zhou, J.; Davidson, I.; Ji, S. Recurrent Encoder-Decoder Networks for Time-Varying Dense Prediction. In Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), New Orleans, LA, USA, 18–21 November 2017; pp. 1165–1170.
34. Alshawi, R.; Hoque, T.; Ferdaus, M.; Abdelguerfi, M.; Niles, K.N.; Prathak, K.; Tom, J.; Klein, J.; Mousa, M.; Lopez, J.J. Dual Attention U-Net with Feature Infusion: Pushing the Boundaries of Multiclass Defect Segmentation. arXiv 2023, arXiv:2312.14053.
35. Jiao, J.; Tang, Y.M.; Lin, K.Y.; Gao, Y.; Ma, A.J.; Wang, Y.; Zheng, W.S. DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition. IEEE Trans. Multimed. 2023, 25, 8906–8919.
36. Quan, Y.; Zhang, D.; Zhang, L.; Tang, J. Centralized Feature Pyramid for Object Detection. IEEE Trans. Image Process. 2023, 32, 4341–4354.
37. Benjumea, A.; Teeti, I.; Cuzzolin, F.; Bradley, A. YOLO-Z: Improving Small Object Detection in YOLOv5 for Autonomous Vehicles. arXiv 2021, arXiv:2112.11798.
38. Spadaro, G.; Vetrano, G.; Penna, B.; Serena, A.; Fiandrotti, A. Towards One-Shot PCB Component Detection with YOLO. In Image Analysis and Processing–ICIAP 2023 Workshops, Proceedings of the 22nd International Conference on Image Analysis and Processing, Udine, Italy, 11–15 September 2023; Springer: Cham, Switzerland, 2024; Volume 14365, pp. 51–61.
| Dataset | Scene Type | Quantity (Before) | Quantity (After) | Proportion (Before) | Proportion (After) | P/% (Before) | P/% (After) | R/% (Before) | R/% (After) | mAP/% (Before) | mAP/% (After) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| VisDrone | Urban scene | 200 | 400 | 1:5 | 1:2 | 77.6 | 82.1 | 83.3 | 85.3 | 88.2 | 89.6 |
| VisDrone | Rural scene | 150 | 300 | 1:4 | 1:2 | 79.8 | 84.2 | 84.2 | 86.8 | 89.0 | 90.0 |
| VisDrone | Sparse scene | 120 | 240 | 1:6 | 1:3 | 81.0 | 86.5 | 84.7 | 88.4 | 89.8 | 91.3 |
| VisDrone | Crowded scene | 180 | 360 | 1:5 | 1:2 | 80.2 | 85.9 | 83.5 | 87.6 | 90.5 | 91.9 |
| PASCAL VOC | Universal scene | 200 | 400 | 1:3 | 1:1.5 | 82.3 | 86.8 | 85.4 | 88.5 | 90.0 | 92.2 |
| Overall average | — | 850 | 1700 | 1:4.6 | 1:2.1 | 80.2 | 85.1 | 84.2 | 87.3 | 89.5 | 91.0 |
| Parameter | Value |
|---|---|
| Epochs | 600 |
| Patience | 30 |
| Batch | 8 |
| Imgsz | 8 |
| Workers | 4 |
| Optimizer | SGD |
| Weight_decay | 0.0005 |
| Lrf | 0.05 |
| Momentum | 0.937 |
| Warmup_momentum | 0.8 |
| Close_mosaic | 10 |
| Patience | 50 |
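For reproducibility, the hyperparameters above map onto a training configuration as follows. This is an assumed usage sketch of the Ultralytics YOLOv8 API, not taken from the paper; note that the table lists two Patience values (30 and 50, of which 30 is used here) and an Imgsz of 8, both transcribed as given, although YOLOv8 commonly trains at an image size of 640.

```python
# Training hyperparameters transcribed from the experimental-setup table.
train_cfg = dict(
    epochs=600,
    patience=30,           # the table also lists a second value, 50
    batch=8,
    imgsz=8,               # as given in the table; 640 is the usual default
    workers=4,
    optimizer="SGD",
    weight_decay=0.0005,
    lrf=0.05,              # final learning-rate fraction
    momentum=0.937,
    warmup_momentum=0.8,
    close_mosaic=10,       # disable mosaic for the last 10 epochs
)

# Assumed usage with the Ultralytics package (not run here):
#   from ultralytics import YOLO
#   YOLO("yolov8n.pt").train(data="VisDrone.yaml", **train_cfg)
```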
| YOLOv8 | CAM (1, 3, 5) | CAM (2, 4, 6) | CAM (1, 2, 3) | FRM | VisDrone P/% | VisDrone R/% | VisDrone mAP/% | PASCAL VOC P/% | PASCAL VOC R/% | PASCAL VOC mAP/% | Params/M | FLOPs/G | Model Size/MB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| √ | | | | | 77.6 | 83.3 | 88.2 | 81.0 | 84.0 | 88.5 | 3.1 | 8.1 | 6.2 |
| √ | √ | | | | 82.1 | 85.3 | 89.6 | 85.5 | 87.0 | 90.3 | 2.5 | 7.4 | 6.3 |
| √ | | √ | | | 83.5 | 87.0 | 90.2 | 86.8 | 87.5 | 91.0 | 2.6 | 7.6 | 6.4 |
| √ | | | √ | | 84.2 | 86.8 | 90.0 | 86.5 | 87.2 | 91.2 | 2.5 | 7.5 | 6.4 |
| √ | √ | | | √ | 85.9 | 88.0 | 91.3 | 87.9 | 88.5 | 92.0 | 2.6 | 7.6 | 6.3 |
| √ | | √ | | √ | 87.2 | 89.2 | 92.0 | 88.5 | 89.5 | 92.8 | 2.6 | 7.5 | 6.3 |
| √ | | | √ | √ | 87.0 | 88.9 | 91.8 | 88.2 | 89.0 | 92.5 | 2.5 | 7.4 | 6.2 |
| √ | √ | √ | | √ | 88.6 | 91.0 | 92.8 | 89.5 | 90.5 | 93.2 | 2.6 | 7.5 | 6.2 |
| √ | √ | | √ | √ | 88.3 | 90.7 | 92.5 | 89.1 | 90.0 | 93.3 | 2.6 | 7.5 | 6.2 |
| √ | √ | √ | √ | √ | 89.6 | 93.8 | 93.5 | 90.3 | 92.5 | 94.2 | 2.6 | 7.5 | 6.1 |
| Model (Size: Small) | VisDrone P/% | VisDrone R/% | VisDrone mAP/% | PASCAL VOC P/% | PASCAL VOC R/% | PASCAL VOC mAP/% | Params/M | FLOPs/G |
|---|---|---|---|---|---|---|---|---|
| YOLOv5 | 77.8 | 78.6 | 84.3 | 80.9 | 83.4 | 85.5 | 2.4 | 7.5 |
| QueryDet | 72.1 | 78.2 | 82.5 | 80.5 | 82.6 | 83.8 | 4.1 | 10.8 |
| YOLOv7 | 76.8 | 83.2 | 85.6 | 80.7 | 83.1 | 85.8 | 5.7 | 11.3 |
| YOLOv8 | 77.6 | 83.3 | 88.2 | 81.0 | 84.0 | 88.5 | 3.1 | 8.1 |
| Ref. [37] | 80.5 | 85.2 | 85.7 | 82.3 | 85.4 | 88.9 | 4.3 | 9.7 |
| Ref. [38] | 78.9 | 82.1 | 83.9 | 81.2 | 83.6 | 86.7 | 4.7 | 10.2 |
| STDA-YOLOv8 | 89.6 | 93.8 | 93.5 | 90.3 | 92.5 | 94.2 | 2.6 | 7.5 |
| Hardware Platform | Model | Average Inference Time (ms) | Memory Usage (MB) | GPU Memory Usage (MB) |
|---|---|---|---|---|
| Intel i7-12700K | YOLOv8 | 420 | 2900 | — |
| Intel i7-12700K | STDA-YOLOv8 | 510 | 3100 | — |
| NVIDIA RTX 2060 | YOLOv8 | 38 | — | 2200 |
| NVIDIA RTX 2060 | STDA-YOLOv8 | 48 | — | 2600 |
| NVIDIA RTX 3080 | YOLOv8 | 21 | — | 1600 |
| NVIDIA RTX 3080 | STDA-YOLOv8 | 29 | — | 1800 |
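The paper does not specify its timing protocol; a generic latency-measurement sketch such as the following is one way such per-platform figures are typically obtained (warm-up count and run count are assumptions; GPU timing would additionally require device synchronization before reading the clock).

```python
import time

def average_latency_ms(fn, x, warmup=3, runs=10):
    """Average wall-clock latency of fn(x) in milliseconds.

    A few warm-up calls are made first so one-off costs (allocation,
    kernel compilation, caches) do not skew the average.
    """
    for _ in range(warmup):
        fn(x)
    t0 = time.perf_counter()
    for _ in range(runs):
        fn(x)
    return (time.perf_counter() - t0) / runs * 1000.0
```

Memory and VRAM columns would come from separate process-level and driver-level monitors (e.g. OS tools or the GPU vendor's utilities), not from this timer.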
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, C.; Jiang, S.; Cao, X. Small-Target Detection Algorithm Based on STDA-YOLOv8. Sensors 2025, 25, 2861. https://doi.org/10.3390/s25092861