YOLOv10-DSNet: A Lightweight and Efficient UAV-Based Detection Framework for Real-Time Small Target Monitoring in Smart Cities
Highlights
- A new lightweight detection model is designed that improves the accuracy of UAV-based real-time urban monitoring while significantly reducing computational load.
- The framework integrates a parallel dual-attention mechanism and a lightweight feature extraction module, enabling robust detection of small and densely packed targets in complex urban scenes.
- The proposed framework balances accuracy and efficiency, enabling the practical deployment of advanced deep learning models on edge devices for large-scale smart city applications.
Abstract
1. Introduction
- A small target detection layer is added to the neck network of the YOLOv10 model to enhance its small-target detection performance.
- A new attention module, CBAM-P (Parallel CBAM), is introduced to increase the weight given to detection targets within the whole image and to strengthen feature representation; a minimal sketch of one possible parallel arrangement is given after this list.
- A lightweight feature extraction module, C2f-LW (Light Weight), is introduced into the neck network; its bottleneck block is replaced by a Group Fusion module, which reduces the model's computation and simplifies its structure.
- Depthwise separable convolution is incorporated into the C2f-LW module [20]; the improved module aims to strengthen the model's feature extraction for small targets. An illustrative sketch of such a lightweight block also follows this list.
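To make the parallel arrangement concrete, the following PyTorch sketch shows one plausible CBAM-P layout: the channel- and spatial-attention branches of the original CBAM are computed from the same input in parallel and their refined feature maps are fused by addition. The branch structure follows the original CBAM design; the additive fusion, the module names, and the hyperparameters (reduction ratio, kernel size) are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of a "parallel CBAM" (CBAM-P) style module.
# Channel and spatial attention are applied to the same input in parallel
# and the two refined branches are fused, instead of being applied sequentially.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))   # GAP branch
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))    # GMP branch
        return torch.sigmoid(avg + mx)                            # (B, C, 1, 1)


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # (B, 1, H, W)


class CBAMParallel(nn.Module):
    """Channel- and spatial-refined branches computed in parallel, then fused."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x_ca = x * self.ca(x)   # channel-refined branch
        x_sa = x * self.sa(x)   # spatial-refined branch
        return x_ca + x_sa      # simple additive fusion (one possible choice)


if __name__ == "__main__":
    feat = torch.randn(1, 64, 80, 80)
    print(CBAMParallel(64)(feat).shape)   # torch.Size([1, 64, 80, 80])
```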
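Similarly, the sketch below illustrates a C2f-style block whose bottleneck is replaced by a grouped, depthwise-separable unit. The `GroupFusion` layout (channel split, per-group depthwise separable convolution, 1×1 fusion with a residual connection) and the omission of the LSKA branch are assumptions made for illustration; the paper's actual Group Fusion and C2f-LW structures may differ.

```python
# Illustrative sketch of a lightweight C2f-style block ("C2f-LW") whose
# bottleneck is replaced by a grouped, depthwise-separable unit.
import torch
import torch.nn as nn


def dw_separable(c_in: int, c_out: int, k: int = 3) -> nn.Sequential:
    """Depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in, bias=False),
        nn.BatchNorm2d(c_in),
        nn.SiLU(inplace=True),
        nn.Conv2d(c_in, c_out, 1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )


class GroupFusion(nn.Module):
    """Hypothetical layout: split channels into groups, process each group with a
    depthwise-separable conv, then fuse the groups with a 1 x 1 convolution."""

    def __init__(self, channels: int, groups: int = 2):
        super().__init__()
        assert channels % groups == 0
        gc = channels // groups
        self.branches = nn.ModuleList(dw_separable(gc, gc) for _ in range(groups))
        self.fuse = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        chunks = torch.chunk(x, len(self.branches), dim=1)
        out = torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)
        return x + self.fuse(out)   # residual connection keeps gradients stable


class C2fLW(nn.Module):
    """C2f-like block: split, run n GroupFusion units, concatenate, project."""

    def __init__(self, c_in: int, c_out: int, n: int = 1):
        super().__init__()
        self.cv1 = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.blocks = nn.ModuleList(GroupFusion(c_out // 2) for _ in range(n))
        self.cv2 = nn.Conv2d((2 + n) * (c_out // 2), c_out, 1, bias=False)

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))
        y.extend(blk(y[-1]) for blk in self.blocks)   # chain the lightweight units
        return self.cv2(torch.cat(y, dim=1))


if __name__ == "__main__":
    print(C2fLW(64, 64, n=2)(torch.randn(1, 64, 40, 40)).shape)  # (1, 64, 40, 40)
```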
2. Related Work
2.1. Attention Mechanism
2.2. Small Target Detection in UAV Imagery
3. Proposed Methodology
3.1. Overall Framework
3.2. Design of the Attention Mechanism
3.2.1. CBAM Attention Mechanism
3.2.2. CBAM-P Attention Mechanism
3.3. C2f-LW: Lightweight Feature Extraction Module
3.3.1. Group Fusion Module
3.3.2. LSKA Module
3.3.3. C2f-LW Structure
3.4. Small Target Detection Layer
4. Experiments and Results
4.1. Experimental Setup
4.1.1. Dataset for Urban Scene Analysis
4.1.2. Evaluation Metrics
4.1.3. Implementation Details
4.2. Results and Analysis
4.2.1. Evaluation of the Small Target Detection Layer
4.2.2. Analysis of the Effectiveness of the CBAM-P Attention Mechanism
4.2.3. Ablation Experiment
4.2.4. Model Comparison Test
4.2.5. Algorithm Effectiveness Test
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Javed, A.R.; Shahzad, F.; ur Rehman, S.U.; Zikria, Y.B.; Razzak, I.; Jalil, Z.; Xu, G. Future smart cities: Requirements, emerging technologies, applications, challenges, and future aspects. Cities 2022, 129, 103794. [Google Scholar] [CrossRef]
- Heidari, A.; Navimipour, N.J.; Unal, M. Applications of ML/DL in the management of smart cities and societies based on new trends in information technologies: A systematic literature review. Sustain. Cities Soc. 2022, 85, 104089. [Google Scholar] [CrossRef]
- Abbas, N.; Abbas, Z.; Liu, X.; Khan, S.S.; Foster, E.D.; Larkin, S. A survey: Future smart cities based on advance control of Unmanned Aerial Vehicles (UAVs). Appl. Sci. 2023, 13, 9881. [Google Scholar] [CrossRef]
- Xu, H.; Wang, L.; Han, W.; Yang, Y.; Li, J.; Lu, Y.; Li, J. A survey on UAV applications in smart city management: Challenges, advances, and opportunities. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 8982–9010. [Google Scholar] [CrossRef]
- Zuo, G.; Zhou, K.; Wang, Q. UAV-to-UAV small target detection method based on deep learning in complex scenes. IEEE Sens. J. 2024, 25, 3806–3820. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
- Jocher, G. Ultralytics YOLOv5. GitHub Repository. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 18 February 2025).
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
- Ultralytics. Ultralytics YOLOv8. GitHub Repository. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 18 February 2025).
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. arXiv 2019, arXiv:1904.01355. [Google Scholar] [CrossRef]
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
- Taye, M.M. Theoretical understanding of convolutional neural network: Concepts, architectures, applications, future directions. Computation 2023, 11, 52. [Google Scholar] [CrossRef]
- Wang, W.; Wang, S.; Li, Y.; Jin, Y. Adaptive multi-scale dual attention network for semantic segmentation. Neurocomputing 2021, 460, 39–49. [Google Scholar] [CrossRef]
- Soydaner, D. Attention mechanism in neural networks: Where it comes and where it goes. Neural Comput. Appl. 2022, 34, 13371–13385. [Google Scholar] [CrossRef]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Howard, A.G. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62. [Google Scholar] [CrossRef]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
- Park, J.; Woo, S.; Lee, J.Y.; Kweon, I.S. BAM: Bottleneck attention module. In Proceedings of the British Machine Vision Conference (BMVC), Newcastle, UK, 3–6 September 2018. [Google Scholar]
- Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar]
- Fu, J.; Liu, J.; Tian, H.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar]
- Chen, L.; Zhang, H.; Xiao, J.; Nie, L.; Shao, J.; Liu, W.; Chua, T.S. SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 565–578. [Google Scholar]
- Zhang, Y.; Chen, Y.; Huang, C.; Gao, M. Object detection network based on feature fusion and attention mechanism. Future Internet 2019, 11, 9. [Google Scholar] [CrossRef]
- Wang, C.; Wang, H. Cascaded feature fusion with multi-level self-attention mechanism for object detection. Pattern Recognit. 2023, 138, 109377. [Google Scholar] [CrossRef]
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable detr: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159. [Google Scholar]
- Chen, G.; Wang, H.; Chen, K.; Li, Z.; Song, Z.; Liu, Y.; Chen, W.; Knoll, A. A Survey of the Four Pillars for Small Object Detection: Multiscale Representation, Contextual Information, Super-Resolution, and Region Proposal. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 936–953. [Google Scholar] [CrossRef]
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 2778–2788. [Google Scholar]
- Zhao, L.; Zhu, M. MS-YOLOv7: YOLOv7 based on multi-scale for object detection on UAV aerial photography. Drones 2023, 7, 188. [Google Scholar] [CrossRef]
- Liu, Z.; Abeyrathna, R.R.D.; Sampurno, R.M.; Nakaguchi, V.M.; Ahamed, T. Faster-YOLO-AP: A lightweight apple detection algorithm based on improved YOLOv8 with a new efficient PDWConv in orchard. Comput. Electron. Agric. 2024, 223, 109118. [Google Scholar] [CrossRef]
- Benjumea, A.; Teeti, I.; Cuzzolin, F.; Bradley, A. YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles. arXiv 2021, arXiv:2112.11798. [Google Scholar]
- Liu, Z.L.; Shen, X.F. A fusion of Lite-HRNet with YOLO v5 for small target detection in autonomous driving. Automot. Eng. 2022, 44, 1511–1520. [Google Scholar]
- Peng, H.; Xie, H.; Liu, H.; Guan, X. LGFF-YOLO: Small object detection method of UAV images based on efficient local–global feature fusion. J. Real-Time Image Process. 2024, 21, 167. [Google Scholar] [CrossRef]
- Wang, Z.; Xu, H.; Zhu, X.; Li, C.; Liu, Z.; Wang, Z.Y. An improved dense pedestrian detection algorithm based on YOLOv8: MER-YOLO. Comput. Eng. Sci. 2024, 46, 1050. [Google Scholar]
- Pan, Y.; Yang, Z. Optimization model for small object detection based on multi-level feature bidirectional fusion. J. Comput. Appl. 2023, 43, 1985–1992. [Google Scholar]
- Guan, T.; Chang, S.; Deng, Y.; Xue, F.; Wang, C.; Jia, X. Oriented SAR Ship Detection Based on Edge Deformable Convolution and Point Set Representation. Remote Sens. 2025, 17, 1612. [Google Scholar] [CrossRef]
- Zhang, H.; Wang, W.; Deng, J.; Guo, Y.; Liu, S.; Zhang, J. MASFF-Net: Multi-azimuth scattering feature fusion network for SAR target recognition. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 19425–19440. [Google Scholar] [CrossRef]
- Chen, E.; Lusi, A.; Gao, Q.; Bian, S.; Li, B.; Guo, J.; Zhang, D.; Yang, C.; Hu, W.; Huang, F. CB-YOLO: Dense object detection of YOLO for crowded wheat head identification and localization. J. Circuits Syst. Comput. 2024, 34, 2550079. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
- Guo, M.H.; Lu, C.Z.; Liu, Z.N.; Cheng, M.M.; Hu, S.M. Visual attention network. Comput. Vis. Media 2023, 9, 733–752. [Google Scholar] [CrossRef]
- Lau, K.W.; Po, L.M.; Rehman, Y.A.U. Large separable kernel attention: Rethinking the large kernel attention design in CNN. Expert Syst. Appl. 2024, 236, 121352. [Google Scholar] [CrossRef]
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 13713–13722. [Google Scholar]
- Wang, H.; Yun, L.; Yang, C.; Wu, M.; Wang, Y.; Chen, Z. Ow-yolo: An improved yolov8s lightweight detection method for obstructed walnuts. Agriculture 2025, 15, 159. [Google Scholar] [CrossRef]
- Wang, C.; Wang, L.; Ma, G.; Zhu, L. CSF-YOLO: A Lightweight Model for Detecting Grape Leafhopper Damage Levels. Agronomy 2025, 15, 741. [Google Scholar] [CrossRef]
Augmentation Method | Hyperparameter | Probability/Range |
---|---|---|
Composite Augmentations | Mosaic | 1.0 |
Geometric Augmentations | Random Scaling | [0.5, 1.5] |
Geometric Augmentations | Random Rotation | [−10, 10] |
Geometric Augmentations | Random Horizontal Flip | 0.5 |
Color Space Augmentations | Hue | [−0.015, 0.015] |
Color Space Augmentations | Saturation | [−0.7, 0.7] |
Color Space Augmentations | Brightness | [−0.4, 0.4] |
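For reference, the augmentation settings in the table above map directly onto standard Ultralytics training hyperparameters. The snippet below is a minimal sketch of such a training call; the dataset YAML path, image size, and epoch count are placeholders and are not taken from the paper.

```python
# Sketch of a training call that reproduces the augmentation settings above.
# Requires an Ultralytics version with YOLOv10 support; dataset path and
# epoch/image-size values are placeholders, not values from the paper.
from ultralytics import YOLO

model = YOLO("yolov10n.pt")            # baseline model; the authors' modified network is not used here
model.train(
    data="uav_urban.yaml",             # placeholder dataset config
    imgsz=640,                         # placeholder input size
    epochs=300,                        # placeholder schedule
    mosaic=1.0,                        # composite: Mosaic probability
    scale=0.5,                         # geometric: random scaling gain, i.e. [0.5, 1.5]
    degrees=10.0,                      # geometric: random rotation in [-10, 10]
    fliplr=0.5,                        # geometric: horizontal flip probability
    hsv_h=0.015,                       # color: hue gain in [-0.015, 0.015]
    hsv_s=0.7,                         # color: saturation gain in [-0.7, 0.7]
    hsv_v=0.4,                         # color: brightness (value) gain in [-0.4, 0.4]
)
```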
Model | Precision | Recall | mAP50 | mAP50-95 | FLOPs (G) | Para (M) |
---|---|---|---|---|---|---|
YOLOv10n | 69.7% | 58.9% | 60.8% | 40.7% | 8.4 | 3.1 |
YOLOv10n-tiny | 71.8% | 57.1% | 62.7% | 43.1% | 8.9 | 3.2 |
Model | Precision | Recall | mAP50 | mAP50-95 | FLOPs (G) | Para (M) |
---|---|---|---|---|---|---|
YOLOv10n | 69.7% | 58.9% | 60.8% | 40.7% | 8.4 | 3.1 |
YOLOv10-DSNet | 72.9% | 61.7% | 64.5% | 44.8% | 6.8 | 2.4 |
YOLOv10-DSNet-F | 63.5% | 52.8% | 54.9% | 36.7% | 6.8 | 2.4 |
YOLOv10-DSNet-B | 65.2% | 55.3% | 57.3% | 38.5% | 6.8 | 2.4 |
LSKA | Group Fusion | Precision | Recall | mAP50 | mAP50-95 | FLOPs (G) | Para (M) |
---|---|---|---|---|---|---|---|
× | × | 69.7% | 58.9% | 60.8% | 40.7% | 8.4 | 3.1 |
✓ | × | 70.6% | 59.4% | 61.7% | 41.9% | 7.6 | 2.9 |
× | ✓ | 70.1% | 58.2% | 60.9% | 41.3% | 6.9 | 2.5 |
✓ | ✓ | 70.8% | 59.8% | 61.9% | 42.2% | 6.2 | 2.4 |
Detection Layer | C2f-LW | CBAM-P | Precision | Recall | mAP50 | mAP50-95 | FLOPs (G) | Para (M) |
---|---|---|---|---|---|---|---|---|
× | × | × | 69.7% | 58.9% | 60.8% | 40.7% | 8.4 | 3.1 |
✓ | × | × | 71.8% | 57.1% | 62.7% | 43.1% | 8.9 | 3.2 |
× | ✓ | × | 70.8% | 59.8% | 61.9% | 42.2% | 6.2 | 2.3 |
× | × | ✓ | 71.2% | 58.9% | 62.5% | 43.2% | 8.5 | 3.1 |
✓ | ✓ | × | 72.3% | 59.3% | 63.8% | 44.2% | 6.7 | 2.4 |
✓ | × | ✓ | 72.6% | 58.4% | 64.2% | 44.3% | 9.1 | 3.3 |
× | ✓ | ✓ | 72.0% | 61.0% | 63.4% | 43.6% | 6.4 | 2.3 |
✓ | ✓ | ✓ | 72.9% | 61.7% | 64.5% | 44.8% | 6.8 | 2.4 |
Model | Precision | Recall | mAP50 | mAP50-95 | FLOPs (G) | Para (M) |
---|---|---|---|---|---|---|
YOLOv5-lite | 52.5% | 39.7% | 42.1% | 28.8% | 5.6 | 2.1 |
YOLOv8n | 68.3% | 54.9% | 57.9% | 37.8% | 9.2 | 3.6 |
TPH-YOLOv5 | 71.2% | 61.2% | 63.9% | 43.6% | 19.3 | 7.8 |
YOLO-Z | 67.7% | 59.0% | 61.0% | 42.3% | 12.8 | 4.7 |
Faster-YOLO-AP | 70.1% | 60.8% | 62.8% | 44.2% | 7.2 | 2.6 |
OW-YOLO | 71.8% | 61.4% | 64.2% | 44.3% | 7.4 | 2.9 |
CSF-YOLO | 70.6% | 60.7% | 63.5% | 42.9% | 7.2 | 2.7 |
YOLOv10n | 69.7% | 58.9% | 60.8% | 40.7% | 8.4 | 3.1 |
YOLOv10-DSNet | 72.9% | 61.7% | 64.5% | 44.8% | 6.8 | 2.4 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Guo, G.; Qiu, X.; Pan, Z.; Yang, Y.; Xu, L.; Cui, J.; Zhang, D. YOLOv10-DSNet: A Lightweight and Efficient UAV-Based Detection Framework for Real-Time Small Target Monitoring in Smart Cities. Smart Cities 2025, 8, 158. https://doi.org/10.3390/smartcities8050158