GLDS-YOLO: An Improved Lightweight Model for Small Object Detection in UAV Aerial Imagery
Abstract
1. Introduction
2. Related Work
- Existing UAV small-object detectors often lack lightweight designs, making them unsuitable for real-time deployment on aerial platforms.
- Prior studies typically focus on either global semantic context or local detail features, but effective integration of both remains underexplored.
- Robustness under practical UAV conditions, such as electromagnetic interference, image degradation, and onboard hardware constraints, is rarely investigated.
- Detail-preserving modules have received limited attention, and existing approaches often increase parameters significantly, conflicting with lightweight requirements.
- We propose GLDS-YOLO, a lightweight detection model tailored for UAV small-object detection, integrating four complementary modules: GSA, LSKA-SPPF, DCNv4, and SMSDE.
- We design SMSDE, a novel lightweight edge-detail enhancement module that reduces parameters by nearly one-third compared with MSDE while maintaining detection accuracy.
- We conduct extensive experiments on VisDrone2019 and DOTA1.0, demonstrating consistent improvements over state-of-the-art baselines.
- We provide comprehensive ablation studies, including module-wise contributions and a direct comparison between MSDE and SMSDE.
- We discuss UAV deployment considerations, highlighting both the potential applications and the current limitations of our approach.
3. Method
3.1. Small-Object Perception Module
3.2. Large-Kernel Attention for Multi-Scale Fusion Module
3.3. Deformable Convolution v4 Module
- Dynamic Weight Enhancement: The traditional softmax normalization is removed, so the aggregation weights are no longer confined to [0, 1] and can take unbounded values. This broader dynamic range improves the expressiveness of the convolution operation.
- Memory Access and Thread Allocation Optimization: DCNv4 reduces redundant memory operations by optimizing access patterns for deformable sampling and introduces an adaptive thread allocation strategy that determines the number of threads used for parallel computation.
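To make the effect of dropping softmax concrete, the following minimal sketch aggregates the values sampled at one output location with and without softmax normalization. It is illustrative only: the function names, the sampled values, and the raw modulation scalars are assumptions and are not taken from the DCNv4 implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax; outputs lie in [0, 1] and sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(samples, weights):
    """Weighted sum of sampled feature values at a single output location."""
    return sum(s * w for s, w in zip(samples, weights))

# Hypothetical feature values sampled at K = 4 deformable offsets.
samples = [0.2, -1.0, 0.5, 3.0]
# Hypothetical raw modulation scalars predicted by the network.
raw = [1.5, -2.0, 0.0, 0.8]

normalized = aggregate(samples, softmax(raw))  # softmax-bounded weights
unbounded = aggregate(samples, raw)            # raw weights, no softmax

lo, hi = min(samples), max(samples)
print(f"softmax-weighted output: {normalized:.4f} (stays within [{lo}, {hi}])")
print(f"unbounded output:        {unbounded:.4f}")
```

With softmax, the output is a convex combination of the samples and therefore cannot leave their range. Without it, the same operator can amplify or negate individual samples, which is the wider dynamic range described above.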
3.4. Small-Object-Enhanced Multi-Scale and Structure Detail Enhancement Module
- (1) Lightweight input feature compression
- (2) Multi-scale edge detail enhancement
- (3) Feature fusion and residual optimization strategy
- (4) Efficient feature upsampling strategy
4. Experimental Results
4.1. Datasets and Experimental Setup
4.2. Training Process and Convergence Analysis
- TP: The number of correctly detected targets.
- FP: The number of incorrectly detected targets.
- FN: The number of missed targets.
- mAP@0.5 represents the average detection precision when the Intersection over Union (IoU) threshold is fixed at 0.5;
- mAP@0.5:0.95 is calculated by averaging AP values over IoU thresholds ranging from 0.5 to 0.95 with a step size of 0.05.
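The quantities listed above can be sketched in a few lines. The precision and recall formulas are the standard definitions, P = TP / (TP + FP) and R = TP / (TP + FN); the function names, the example counts, and the box coordinates below are hypothetical and serve only as illustration.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_thresholds():
    """The ten thresholds 0.50, 0.55, ..., 0.95 averaged by mAP@0.5:0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Hypothetical counts: 8 correct detections, 2 false alarms, 4 misses.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f}")

# Hypothetical prediction and ground-truth boxes that overlap by half a box.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
matched_at = [t for t in iou_thresholds() if overlap >= t]
print(f"IoU={overlap:.3f}, counted as a match at {len(matched_at)} of 10 thresholds")
```

In this example the IoU is one-third, so the prediction is not a true positive even at the most lenient threshold of 0.5. This illustrates why small localization errors on small objects weigh heavily on both mAP variants.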
4.3. Ablation Studies
4.4. Comparison with State-of-the-Art Methods
4.5. Visualization Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Parameter | Value |
---|---|
Epochs | 200 |
Batch size | 16 |
Image size | 640 × 640 |
Learning rate | 0.01 |
Optimizer | SGD |
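The settings above could be captured in a small configuration object, as sketched below. The class, its field names, and the mapping to Ultralytics-style keyword arguments (`epochs`, `batch`, `imgsz`, `lr0`, `optimizer`) are assumptions: the source does not state which training framework or launch script was used, nor whether 0.01 is the initial learning rate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Training hyperparameters as listed in the setup table."""
    epochs: int = 200
    batch_size: int = 16
    image_size: int = 640        # square input, 640 x 640
    learning_rate: float = 0.01  # treated here as the initial rate (assumption)
    optimizer: str = "SGD"

    def to_train_kwargs(self):
        """Map fields to Ultralytics-style argument names (assumed framework)."""
        return {
            "epochs": self.epochs,
            "batch": self.batch_size,
            "imgsz": self.image_size,
            "lr0": self.learning_rate,
            "optimizer": self.optimizer,
        }

cfg = TrainConfig()
print(cfg.to_train_kwargs())
```

Keeping the values in one frozen object makes it harder for ablation runs to drift apart in their settings, which matters when the compared variants differ by a single module.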
Module | P | R | mAP@0.5 | mAP@0.5:0.95 | Params (M) |
---|---|---|---|---|---|
YOLOv11n | 0.427 | 0.318 | 0.315 | 0.182 | 2.5 |
YOLOv11n + GSA | 0.452 | 0.401 | 0.366 | 0.206 | 2.9 |
YOLOv11n + GSA + LSKA | 0.476 | 0.409 | 0.394 | 0.217 | 3.2 |
YOLOv11n + GSA + LSKA + DCNv4 | 0.503 | 0.417 | 0.411 | 0.235 | 3.5 |
YOLOv11n + LSKA + DCNv4 + SMSDE | 0.521 | 0.468 | 0.421 | 0.238 | 3.7 |
YOLOv11n + GSA + LSKA + DCNv4 + SMSDE | 0.557 | 0.421 | 0.436 | 0.252 | 3.9 |
YOLOv11n + GSA + LSKA + DCNv4 + MSDE | 0.546 | 0.420 | 0.434 | 0.251 | 5.8 |
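As a quick arithmetic check on the ablation table, the sketch below computes the accuracy gain of the full model over the YOLOv11n baseline and the parameter saving of the SMSDE variant relative to the MSDE variant. The numbers are copied from the table; the helper name and dictionary keys are hypothetical. Note that the table reports whole-model parameter counts, so the saving computed here is for the whole model, not for the module in isolation.

```python
def relative_change(baseline, value):
    """Relative change of value with respect to baseline (0.1 means +10%)."""
    return (value - baseline) / baseline

# Values copied from the ablation table above.
baseline = {"map50": 0.315, "map50_95": 0.182, "params_m": 2.5}
full_smsde = {"map50": 0.436, "map50_95": 0.252, "params_m": 3.9}
full_msde = {"map50": 0.434, "map50_95": 0.251, "params_m": 5.8}

map50_gain = full_smsde["map50"] - baseline["map50"]
map50_rel = relative_change(baseline["map50"], full_smsde["map50"])
param_saving = -relative_change(full_msde["params_m"], full_smsde["params_m"])

print(f"mAP@0.5 gain over YOLOv11n: {map50_gain:+.3f} absolute, {map50_rel:+.1%} relative")
print(f"Parameters saved by SMSDE vs. MSDE (whole model): {param_saving:.1%}")
```

The whole-model saving comes out at roughly 32.8%, which is in line with the "nearly one-third" reduction stated in the contributions, while mAP@0.5 is marginally higher with SMSDE (0.436 vs. 0.434).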
Model | APs | APm | APl | ARs | ARm | ARl |
---|---|---|---|---|---|---|
YOLOv11n | 0.106 | 0.271 | 0.344 | 0.201 | 0.447 | 0.499 |
Ours | 0.194 | 0.349 | 0.387 | 0.295 | 0.518 | 0.514 |
Model | Pedestrian | People | Car | Van | Truck | Tricycle | Awning-Tricycle | Bus | Motor |
---|---|---|---|---|---|---|---|---|---|
YOLOv11n | 33.9 | 26.6 | 75.2 | 37.7 | 26.5 | 18.6 | 11.6 | 43.4 | 34.7 |
Ours | 51.3 | 41.7 | 85.1 | 50.9 | 37.9 | 29.0 | 17.7 | 59.6 | 49.9 |
Model | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | Model Size/MB |
---|---|---|---|---|---|
GBS-YOLOv5S | 49.7 | 36.8 | 35.3 | 20.1 | 14.5 |
YOLOv5S | 47.3 | 35.1 | 34.8 | 19.1 | 13.4 |
YOLOv6 | 45.2 | 32.4 | 30.8 | 17.8 | 9.94 |
YOLOv7-tiny | 42.0 | 30.7 | 32.8 | 16.7 | 12.3 |
YOLOv8n | 50.6 | 33.2 | 33.3 | 19.3 | 6.3 |
MDH-YOLOv8 | 54.9 | 34.1 | 37.5 | 22.7 | 6.0 |
YOLOv10 | 52.8 | 36.0 | 31.7 | 23.3 | 9.2 |
RT-DETR | 58.7 | 41.6 | 45.3 | 27.5 | 40 |
PS-YOLO-M | 50.2 | 38.9 | 37.6 | 22.3 | 7.9 |
YOLOv10-S | 50.5 | 38.3 | 39.1 | 23.5 | 9.8 |
MASF-YOLO-n | 56.3 | 40.7 | 43.2 | 28.2 | 13.2 |
RemDet-Tiny | 57.9 | 39.7 | 37.1 | 21.8 | 12.8 |
RemDet-M | 62.7 | 37 | 46.1 | 28.2 | 93.2 |
YOLOv11n | 50.3 | 41.7 | 31.5 | 18.2 | 5.4 |
GLDS-YOLO | 55.7 | 42.1 | 43.6 | 25.2 | 7.2 |
Model | mAP@0.5/% | mAP@0.5:0.95/% |
---|---|---|
YOLOv8n | 60.4 | 38.1 |
TPH-YOLO | 69.1 | 44.7 |
Drone-DETR | 67.9 | 45.2 |
Oriented R-CNN | 75.8 | — |
GRA | 77.6 | — |
GLDS-YOLO | 74.5 | 49.7 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ju, Z.; Shui, J.; Huang, J. GLDS-YOLO: An Improved Lightweight Model for Small Object Detection in UAV Aerial Imagery. Electronics 2025, 14, 3831. https://doi.org/10.3390/electronics14193831