YOLO-DSNet for Small Target Detection
Abstract
1. Introduction
- We propose a lightweight object detection model, YOLO-DSNet, which integrates a dual-stream attention mechanism and a multi-scale feature enhancement module to achieve efficient feature learning;
- We develop a multi-stage data augmentation strategy for small targets that rebalances the training sample distribution and improves model generalization;
- We conduct systematic experiments on the VisDrone2019 dataset [14], showing that YOLO-DSNet maintains real-time performance while improving mAP@0.5 by 9.3 percentage points over the YOLOv13n baseline (30.8 to 40.1) and outperforming several mainstream lightweight models.
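
The 9.3-point gain quoted in the last contribution is an absolute difference in mAP@0.5, not a relative one. The following is a minimal sketch of that arithmetic, using the YOLOv13n and YOLO-DSNet values reported in the model comparison table of this paper. The function names are illustrative and do not come from the authors' code.

```python
def percentage_point_gain(baseline: float, improved: float) -> float:
    """Absolute difference between two scores that are already in percent."""
    return round(improved - baseline, 1)


def relative_gain(baseline: float, improved: float) -> float:
    """Improvement expressed as a percentage of the baseline score."""
    return round((improved - baseline) / baseline * 100, 1)


# mAP@0.5 values copied from the comparison table in this paper.
YOLOV13N_MAP50 = 30.8
YOLO_DSNET_MAP50 = 40.1

print(percentage_point_gain(YOLOV13N_MAP50, YOLO_DSNET_MAP50))  # 9.3
print(relative_gain(YOLOV13N_MAP50, YOLO_DSNET_MAP50))
```

Keeping the two quantities separate avoids reading the 9.3-point gain as a 9.3% relative improvement.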
2. Related Work
2.1. The Development and Evolution of Classical Object Detection Methods
2.2. Limitations of Existing Small Target Detection Methods
2.3. Baseline YOLOv13
3. Methodology
3.1. Image Enhancement for Small Targets
3.2. Small Target Enhancer—MSA-C2f
3.2.1. MSA-C2f Architecture
3.2.2. Pzconv Module
3.2.3. STE Architecture
3.3. Feature Enhancement Module DSAM
4. Experiments
4.1. Experimental Environment
4.2. Dataset
4.3. Comparative Experiment
4.4. Ablation Experiment
4.5. Analysis of Experimental Results
4.6. Comparison Results of Different Models
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| DSAM | Dual-Stream Attention Module |
| MSA-C2f | Multi-scale Attention C2f |
| UAV | Unmanned Aerial Vehicle |
| FPN | Feature Pyramid Networks |
| SE | Squeeze-and-Excitation |
| CBAM | Convolutional Block Attention Module |
| ECA | Efficient Channel Attention |
| ASPP | Atrous Spatial Pyramid Pooling |
| R-CNN | Region-based Convolutional Neural Networks |
| HRNet | High-Resolution Network |
| CSPDarknet | Cross Stage Partial Darknet |
| DSConv | Depthwise Separable Convolution |
| STE | SmallTargetEnhancer |
| MSFP | MultiScaleFeaturePyramid |
| SOFA | SmallObjectFocusAttention |
| SGD | Stochastic Gradient Descent |
| R | Recall rate |
| TP | True Positive |
| FN | False Negative |
| P | Precision |
| FP | False Positive |
| AP | Average Precision |
| mAP | Mean Average Precision |
| GAN | Generative Adversarial Networks |
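
The evaluation abbreviations above (P, R, TP, FP, FN, AP, mAP) are related by the standard definitions P = TP / (TP + FP) and R = TP / (TP + FN), with mAP being the mean of the per-class AP values. A minimal sketch of these relations is given below. The counts and AP values in the example are hypothetical, and returning 0.0 for an empty denominator is a convention assumed here, not something stated in the paper.

```python
def precision(tp: int, fp: int) -> float:
    """P = TP / (TP + FP). Returns 0.0 when there are no detections (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp: int, fn: int) -> float:
    """R = TP / (TP + FN). Returns 0.0 when there are no ground-truth objects (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def mean_average_precision(ap_per_class: list[float]) -> float:
    """mAP is the arithmetic mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Hypothetical counts for one class: 8 correct detections,
# 2 false alarms, and 8 missed objects.
print(precision(tp=8, fp=2))  # 0.8
print(recall(tp=8, fn=8))     # 0.5

# Hypothetical per-class AP values.
print(mean_average_precision([0.5, 0.25, 0.75]))  # 0.5
```
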
References
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2980–2988.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2117–2125.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 8759–8768.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 7132–7141.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Computer Vision – ECCV 2018; Springer: Cham, Switzerland, 2018; pp. 3–19.
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 11531–11539.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Yaseen, M. What is YOLOv8: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector. arXiv 2024, arXiv:2408.15857.
- Lei, M.; Li, S.; Wu, Y.; Yang, H.; Peng, L. YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception. arXiv 2025, arXiv:2506.17733.
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Zhu, P.; Wen, L.; Bian, X.; Ling, H.; Hu, Q. Vision meets drones: A challenge. arXiv 2018, arXiv:1804.07437.
- Xu, W.; Sun, L.; Zhen, C.; Liu, B.; Yang, Z.; Yang, W. Deep Learning-Based Image Recognition of Agricultural Pests. Appl. Sci. 2022, 12, 12896.
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3349–3364.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; ACM Digital Library: New York, NY, USA, 2015; pp. 448–456.
- Yang, F.; He, M.; Liu, J.; Jin, H. RMH-YOLO: A Refined Multi-Scale Architecture for Small-Target Detection in UAV Aerial Imagery. Sensors 2025, 25, 7088.
- Wu, J.; Tang, X.; Yang, Z.; Hao, K.; Lai, L.; Liu, Y. An Experimental Evaluation of LLM on Image Classification. In Australasian Database Conference; Springer: Singapore, 2024; pp. 506–518.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-time Object Detection. arXiv 2023, arXiv:2304.08069.
- Huang, S.; Xu, Z.; Sun, X.; Wang, Z.; Jin, X.; Li, X.; Zhang, X. DEIM: DETR with Improved Matching for Fast Convergence. arXiv 2024, arXiv:2412.04234.
- Yuan, Z.; Gong, J.; Guo, B.; Wang, C.; Liao, N.; Song, J.; Wu, Q. Small Object Detection in UAV Remote Sensing Images Based on Intra-Group Multi-Scale Fusion Attention and Adaptive Weighted Feature Fusion Mechanism. Remote Sens. 2024, 16, 4265.
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; ACM Digital Library: New York, NY, USA, 2019; pp. 6105–6114.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems; Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2014; Volume 27.
- Deng, L.; Li, G.; Han, S.; Shi, L.; Xie, Y. Model compression and hardware acceleration for neural networks: A comprehensive survey. Proc. IEEE 2020, 108, 485–532.

| Model | mAP@0.5 | Params (M) | GFLOPs |
|---|---|---|---|
| Faster R-CNN [1] | 25.9 | 41.4 | 292.3 |
| RT-DETR-R18 [20] | 33.3 | 20 | 60 |
| RetinaNet-R50-FPN [2] | 27.6 | 36.5 | 210 |
| DEIM-D-Fine-N [21] | 32.2 | 3.73 | 7.12 |
| D-Fine-N [22] | 33.4 | 3.73 | 7.12 |
| YOLOv8n [11] | 25.9 | 3.0 | 8.1 |
| YOLOv10n | 26.1 | 2.28 | 6.5 |
| YOLO12n | 25.9 | 2.56 | 6.3 |
| YOLOv13n [12] | 30.8 | 2.45 | 6.2 |
| YOLOv13s [12] | 29.7 | 9.0 | 20.1 |
| YOLOv13n + MSA-C2f | 37.5 | 2.72 | 11.3 |
| YOLOv13n + DSAM | 36.2 | 2.98 | 13.1 |
| YOLO-DSNet (ours) | 40.1 | 3.23 | 17.8 |
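
The table above reports accuracy together with model size and compute, so the cost of the accuracy gain can be read off directly. The sketch below computes how much larger and more expensive YOLO-DSNet is than the YOLOv13n baseline, using only the numbers in the two corresponding table rows. The dictionary layout and the `cost_ratio` helper are illustrative names introduced here.

```python
# (mAP@0.5, parameters in millions, GFLOPs) copied from the comparison table.
BASELINE = {"map50": 30.8, "params_m": 2.45, "gflops": 6.2}   # YOLOv13n
PROPOSED = {"map50": 40.1, "params_m": 3.23, "gflops": 17.8}  # YOLO-DSNet


def cost_ratio(key: str) -> float:
    """Ratio of the proposed model's value to the baseline's for one column."""
    return round(PROPOSED[key] / BASELINE[key], 2)


print("params ratio:", cost_ratio("params_m"))
print("GFLOPs ratio:", cost_ratio("gflops"))
print("mAP@0.5 gain (points):", round(PROPOSED["map50"] - BASELINE["map50"], 1))
```

Read this way, the parameter count grows by roughly a third while the GFLOPs nearly triple, which is the trade-off to weigh against the mAP@0.5 gain.
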

| Group | YOLOv13n | DSAM | MSA-C2f | P (%) | R (%) | mAP@0.5 | mAP@0.7 |
|---|---|---|---|---|---|---|---|
| 1 | ✓ | | | 41.6 | 31.3 | 30.8 | 17.7 |
| 2 | ✓ | ✓ | | 42.0 | 31.9 | 31.5 | 17.9 |
| 3 | ✓ | ✓ | ✓ | 44.5 | 36.1 | 35.3 | 20.9 |

| Group | YOLOv13n | DSAM | MSA-C2f | P (%) | R (%) | mAP@0.5 | mAP@0.7 |
|---|---|---|---|---|---|---|---|
| 1 | ✓ | | | 45.7 | 33.7 | 34.0 | 19.8 |
| 2 | ✓ | ✓ | | 47.6 | 35.8 | 36.2 | 21.4 |
| 3 | ✓ | ✓ | ✓ | 52.9 | 38.8 | 40.1 | 24.9 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Xu, H.; He, H.; Zhi, Q.; Yang, Z.; Han, B. YOLO-DSNet for Small Target Detection. Appl. Sci. 2026, 16, 1493. https://doi.org/10.3390/app16031493