DGSS-YOLOv8s: A Real-Time Model for Small and Complex Object Detection in Autonomous Vehicles
Abstract
1. Introduction
2. Related Works
3. Methodology
3.1. Architecture Overview
3.1.1. DCNv3_LKA_C2f Module
- $G$ denotes the total number of aggregation groups, used to enrich the feature representation by partitioning the spatial aggregation process into multiple subspaces.
- $w_g$ represents the position-independent projection weights, which are shared within each group to reduce the number of parameters and the computational complexity.
- $m_{gk}$ is the modulation scalar of the $k$-th sampling point in the $g$-th group, normalized by the Softmax function to ensure gradient stability during training.
- $x_g$ is the sliced input feature map corresponding to the $g$-th aggregation group.
- $\Delta p_{gk}$ refers to the offset of the sampling position of the $k$-th grid point in the $g$-th group, dynamically adjusting the position of the convolution kernel to adapt to complex target shapes.
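For reference, the symbols above match the standard DCNv3 aggregation used in InternImage; a minimal LaTeX reconstruction (the notation is assumed to follow the article's original equation) is:

```latex
y(p_0) = \sum_{g=1}^{G} \sum_{k=1}^{K} w_g \, m_{gk} \, x_g\!\left(p_0 + p_k + \Delta p_{gk}\right)
```

where $p_0$ is the current pixel location, $K$ is the number of sampling points per group, and $p_k$ is the $k$-th location of the regular sampling grid.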
3.1.2. Optimized-GFPN Module
3.1.3. Detect_SA Module
3.2. Inner-Shape IoU Loss Function
4. Experiments and Results
4.1. Datasets
4.2. Implementation
- Hardware platform: Intel® Xeon® Silver 4214R CPU @2.40 GHz, 90 GB DDR4 RAM, NVIDIA GeForce RTX 4090 (24 GB GDDR6X Graphics Processing Unit (GPU)).
- Software stack:
  - PyTorch 1.11.0 + Torchvision 0.12.0 (CUDA 11.3 backend);
  - Ultralytics YOLOv8 reference implementation (v8.1.0) [18].
4.3. Measurement Index
- Tensor preprocessing (normalization + resizing);
- Inference computation;
- Non-maximum suppression (NMS) postprocessing.
- mAP@50: IoU threshold fixed at 0.5.
- mAP@50:95: Average over IoU thresholds [0.5, 0.95] with 0.05 increments [4].
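Assuming the reported FPS covers the three stages listed above end to end, the sketch below illustrates one way to time them with PyTorch and the Ultralytics API. This is an illustration only, not the authors' benchmarking script; the weights path, image list, input size, and warm-up count are placeholders.

```python
import time
import torch
from ultralytics import YOLO

# Assumed checkpoint and evaluation images; replace with the actual files.
model = YOLO("dgss_yolov8s.pt")
images = [f"val_{i:04d}.jpg" for i in range(200)]

# Warm-up so CUDA kernel compilation and cuDNN autotuning do not skew timing.
for _ in range(10):
    model.predict(images[0], imgsz=640, verbose=False)

torch.cuda.synchronize()
start = time.perf_counter()
for img in images:
    # predict() covers preprocessing (resize + normalize), the forward pass,
    # and NMS postprocessing, matching the three stages listed above.
    model.predict(img, imgsz=640, conf=0.25, iou=0.7, verbose=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

print(f"mean latency: {1000 * elapsed / len(images):.2f} ms  |  FPS: {len(images) / elapsed:.1f}")
```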
4.4. Ablation Experiments and Analysis
4.5. Algorithm Performance Analysis
4.6. Visualizations
- Object detection results: Figure 8 and Figure 9 present the detection result comparison between YOLOv8s and DGSS-YOLOv8s on the BDD100K and KITTI test datasets, respectively. Compared to YOLOv8s, DGSS-YOLOv8s demonstrates improved target localization in complex backgrounds, significantly reducing both false detections and misidentifications. Furthermore, DGSS-YOLOv8s can detect targets that YOLOv8s misses, including pedestrians in challenging backgrounds, occluded cyclists, and distant pedestrians. This demonstrates that DGSS-YOLOv8s offers enhanced performance in recognizing complex shapes, small objects, and occluded targets compared to YOLOv8s. However, it is important to note that DGSS-YOLOv8s cannot completely eliminate false positives and false negatives.
- Grad-CAM Results: We used Grad-CAM [67] to visualize the CNN decision-making process and to identify the image regions that most strongly influence the predictions. The technique builds coarse localization maps from the gradients that flow from object-related outputs back into the final convolutional layer. As shown in Figure 10 and Figure 11, DGSS-YOLOv8s allocates attention more effectively than the baseline: YOLOv8s attends only weakly to traffic elements, whereas the enhanced architecture concentrates on target-centric features, reduces boundary ambiguity, and suppresses irrelevant background responses. In occlusion-heavy scenes, the modified framework detects objects more consistently through better spatial reasoning, and for small objects it identifies and amplifies discriminative features, which translates into measurable accuracy gains in complex traffic environments.
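For readers who wish to reproduce this kind of visualization, a minimal hook-based Grad-CAM sketch in PyTorch follows (not the authors' exact script); the choice of target layer and the scalar score to backpropagate are assumptions that must be adapted to the YOLOv8 detection head.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, score_fn):
    """Return a Grad-CAM heatmap (H, W) for an input tensor x of shape (1, C, H, W).

    target_layer: convolutional module whose activations are visualized
                  (for YOLOv8, a late backbone/neck conv is a typical choice).
    score_fn:     maps the raw model output to a scalar to backpropagate,
                  e.g. the class/objectness score of one detection.
    """
    activations, gradients = {}, {}

    def fwd_hook(module, inputs, output):
        activations["value"] = output

    def bwd_hook(module, grad_input, grad_output):
        gradients["value"] = grad_output[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        model.zero_grad()
        score = score_fn(model(x))
        score.backward()
    finally:
        h1.remove()
        h2.remove()

    acts, grads = activations["value"], gradients["value"]        # (1, C, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)                 # GAP of gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))        # weighted activation sum
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)       # normalize to [0, 1]
    return cam[0, 0].detach()                                      # heatmap to overlay on the image
```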
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Boukerche, A.; Hou, Z. Object detection using deep learning methods in traffic scenarios. ACM Comput. Surv. (CSUR) 2021, 54, 1–35. [Google Scholar] [CrossRef]
- Wu, B.; Iandola, F.; Jin, P.H.; Keutzer, K. SqueezeDet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 129–137. [Google Scholar] [CrossRef]
- Tong, K.; Lyu, Y.; Li, Y.; Wu, Y.; Zhang, L. Deep learning-based small object detection: A survey and benchmark. Appl. Intell. 2023, 53, 15861–15883. [Google Scholar]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Computer Vision—ECCV 2014; Springer: Cham, Switzerland, 2014; Volume 8693, Lecture Notes in Computer Science; pp. 740–755. [Google Scholar] [CrossRef]
- Teixeira, F.; Georgiou, T.; Mylonas, P.; Lalas, A.; Chrysostomou, D.; Stivaros, S.; Aksoy, E. A Review on Vision-Based Pedestrian and Vehicle Detection Systems and Their Performance on Benchmark Datasets. Sensors 2021, 21, 7267. [Google Scholar] [CrossRef]
- Hosseini, P.; Rezghi, M.; Etemad, A.; Kasaei, S. Occlusion Handling in Generic Object Detection: A Review. IEEE Trans. Intell. Transp. Syst. 2022, 23, 6018–6035. [Google Scholar]
- Badue, C.; Guidolini, R.; Carneiro, R.V.; Azevedo, P.; Carvalho, V.B.; Forechi, A.; Jesus, L.; Berriel, R.; Paixao, T.M.; DeSouza, F.; et al. Self-Driving Cars: A Survey. Expert Syst. Appl. 2021, 175, 114816. [Google Scholar] [CrossRef]
- Liu, R.; Feng, Y.; Wang, X.; Wang, C.J. RT-BEV: A Real-Time Framework for Vision-Centric Bird’s-Eye-View Perception. In Proceedings of the 45th IEEE Real-Time Systems Symposium (RTSS), York, UK, 10–13 December 2024. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep Learning for Generic Object Detection: A Survey. Int. J. Comput. Vis. 2020, 128, 261–318. [Google Scholar] [CrossRef]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Wong, C.; Yifu, Z.; Montes, D.; et al. ultralytics/yolov5: V6.2—YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai integrations; Zenodo: Geneva, Switzerland, 2022. [Google Scholar] [CrossRef]
- Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object Detection with Deep Learning: A Review. IEEE Trans. Neural Networks Learn. Syst. 2019, 30, 3212–3232. [Google Scholar] [CrossRef]
- Terven, J.; Cordova-Esparza, D.M. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. arXiv 2023, arXiv:2304.00501. [Google Scholar]
- Jocher, G.; Chaurasia, A.; Qiu, J. YOLO by Ultralytics. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 10 June 2024).
- Jiang, R.; Xie, T.; Li, A.; Hu, R. YOLOv8-Based Improved Model for Traffic Object Detection. Sensors 2023, 23, 6958. [Google Scholar] [CrossRef]
- Zhang, H.; Zhang, S. Shape-IOU: More Accurate Metric considering Bounding Box Shape and Scale. arXiv 2023, arXiv:2312.17663. [Google Scholar]
- Zhang, H.; Xu, C.; Zhang, S. Inner-IoU: More Effective Intersection over Union Loss with Auxiliary Bounding Box. arXiv 2023, arXiv:2311.02877. [Google Scholar]
- Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2636–2645. [Google Scholar] [CrossRef]
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef]
- Song, C.; Kim, J. SJ+LPD: Selective Jittering and Learnable Point Drop for Robust LiDAR Semantic Segmentation under Adverse Weather. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024. [Google Scholar]
- Khan, M.A.; Al-Maadeed, S.A.; Bouridane, A. Advancing vehicle detection for autonomous driving: Integrating computer vision and machine learning for lightweight, real-time performance. Connect. Sci. 2024, 36, 2358145. [Google Scholar]
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar] [CrossRef]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar] [CrossRef]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
- Parisapogu, S.A.B.; Narla, N.; Juryala, A.; Ramavath, S. YOLO based Object Detection Techniques for Autonomous Driving. In Proceedings of the 2024 Second International Conference on Inventive Computing and Informatics (ICICI), Bangalore, India, 11–12 June 2024; pp. 249–256. [Google Scholar] [CrossRef]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar] [CrossRef]
- Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar] [CrossRef]
- Wang, W.; Dai, J.; Chen, Z.; Huang, Z.; Li, Z.; Zhu, X.; Hu, X.; Lu, T.; Lu, L.; Li, H.; et al. Internimage: Exploring large-scale vision foundation models with Deformable Convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 14408–14419. [Google Scholar] [CrossRef]
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable Convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar] [CrossRef]
- Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable convnets v2: More deformable, better results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9308–9316. [Google Scholar] [CrossRef]
- Guo, M.H.; Lu, C.Z.; Liu, Z.N.; Cheng, M.M.; Hu, S.M. Visual attention network. Comput. Vis. Media 2023, 9, 733–752. [Google Scholar] [CrossRef]
- Zhang, T.; Zhang, X.; Ke, X.; Liu, C.; Xu, X.; Wang, L.; Zhan, D.; Chen, Z. A High-Resolution Feature Pyramid Network With Attention-Based Multi-Level Feature Fusion Module for Object Detection in Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Hu, J.; Zhou, Y.; Wang, H.; Qiao, P.; Wan, W. Research on Deep Learning Detection Model for Pedestrian Objects in Complex Scenes Based on Improved YOLOv7. Sensors 2024, 24, 6922. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30 (NIPS 2017); Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates: Red Hook, NY, USA, 2017; pp. 5998–6008. [Google Scholar]
- Saha, S.; Xu, L. Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies. Neurocomputing 2024, 643, 130417. [Google Scholar] [CrossRef]
- Chen, Y.; Cai, G.; Song, Z.; Li, J.; Wang, X. LVP: Leverage Virtual Points in Multi-Modal Early Fusion for 3-D Object Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15. [Google Scholar] [CrossRef]
- Wu, H.; Wen, C.; Shi, S.; Li, X.; Wang, C. Virtual Sparse Convolution for Multimodal 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 21654–21663. [Google Scholar]
- Yeong, D.J.; Velasco-Hernandez, G.; Barry, J.; Walsh, J. Sensor and sensor fusion technology in autonomous vehicles: A review. Sensors 2021, 21, 2140. [Google Scholar] [CrossRef]
- Zhao, J.; Li, L.; Dai, J. A review of multi-sensor fusion 3D object detection for autonomous driving. In Eleventh International Symposium on Precision Mechanical Measurements; SPIE: Philadelphia, PA, USA, 2024; Volume 13178, p. 1317826. [Google Scholar] [CrossRef]
- Wang, C.Y.; Yeh, I.H.; Mark Liao, H.Y. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
- Wang, A.; Chen, H.; Chen, L.; Wang, Z.; Liu, J.; Luo, H.; Han, X.; Tao, Q. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
- Tian, D.; Yan, X.; Zhou, D.; Wang, C.; Zhang, W. IV-YOLO: A Lightweight Dual-Branch Object Detection Network. Sensors 2024, 24, 6181. [Google Scholar] [CrossRef]
- Lv, Y.; Chen, B.; Zheng, Z.; He, Z.; Zhang, Z.; Shan, S. DETRs with Improved DeNoising Anchor Boxes for Real-Time Object Detection. arXiv 2023, arXiv:2304.08069. [Google Scholar]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666. [Google Scholar]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000. [Google Scholar]
- Azad, R.; Niggemeier, L.; Hüttemann, M.; Kazerouni, A.; Aghdam, E.K.; Velichko, Y.; Bagci, U.; Merhof, D. Beyond Self-Attention: Deformable Large Kernel Attention for medical image segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024; pp. 1287–1296. [Google Scholar] [CrossRef]
- Jiang, Y.; Tan, Z.; Wang, J.; Sun, X.; Lin, M.; Li, H. GiraffeDet: A heavy-neck paradigm for object detection. arXiv 2022, arXiv:2202.04256. [Google Scholar]
- Zhao, G.; Ge, W.; Yu, Y. GraphFPN: Graph Feature Pyramid Network for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 2763–2772. [Google Scholar]
- Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. Repvgg: Making vgg-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13733–13742. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar] [CrossRef]
- Hu, Y.H.; Chu, L.H.; Chen, X.Y.; Huang, L.Z.; Li, B. Demystifying Deep Neural Network Operators towards Better Network Designs. arXiv 2024, arXiv:2211.05781. [Google Scholar]
- Yu, H.; Wan, C.; Liu, M.; Chen, D.; Xiao, B.; Dai, X. Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search. arXiv 2024, arXiv:2403.10413. [Google Scholar]
- Gevorgyan, Z. SIoU Loss: More Powerful Learning for Bounding Box Regression. arXiv 2022, arXiv:2205.12740. [Google Scholar]
- Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, X.; Lu, H. Focal and Efficient IOU Loss for Accurate Bounding Box Regression. arXiv 2021, arXiv:2101.08158. [Google Scholar] [CrossRef]
- Krawczyk, B. Learning from imbalanced data: Open challenges and future directions. Prog. Artif. Intell. 2016, 5, 221–232. [Google Scholar] [CrossRef]
- Pech-Pacheco, J.L.; Cristobal, G.; Alvarez-Borrego, J.; Fernandez-Valdivia, J.J. Diatom autofocusing in brightfield microscopy: A comparative study. In Proceedings of the 15th International Conference on Pattern Recognition (ICPR-2000), Barcelona, Spain, 3–7 September 2000; Volume 3, pp. 314–317. [Google Scholar] [CrossRef]
- Ghiasi, G.; Cui, Y.; Srinivas, A.; Qian, R.; Lin, T.Y.; Cubuk, E.D.; Le, Q.V.; Zoph, B. Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2918–2928. [Google Scholar]
- Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random Erasing Data Augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13001–13008. [Google Scholar] [CrossRef]
- Zhao, R.; Tang, S.H.; Supeni, E.E.B.; Rahim, S.A.; Fan, L. Z-YOLOv8s-based approach for road object recognition in complex traffic scenarios. Alex. Eng. J. 2024, 106, 298–311. [Google Scholar] [CrossRef]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Int. J. Comput. Vis. 2020, 128, 336–359. [Google Scholar] [CrossRef]
Augmentation Technique | Parameters/Probability |
---|---|
Random Horizontal Flip | Probability = 0.5 |
Mosaic (applied for first 150 epochs) | Probability = 1.0 |
MixUp [64] (applied after Mosaic disabled) | Probability = 0.15 |
Random Affine: | |
- Rotation | Range = ±10.0 degrees |
- Scale | Range = 0.5–1.5 |
- Translation | Range = ±0.1 × image size |
- Shear | Range = ±2.0 degrees |
HSV Color Space Jitter | H = 0.015, S = 0.7, V = 0.4 |
Copy-Paste [63] | Probability = 0.1 |
Random Erasing [65] | Probability = 0.2, Area = 0.02–0.4 |
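These settings map closely onto the augmentation hyperparameters exposed by the Ultralytics trainer. The sketch below shows one plausible training call, not the authors' exact configuration: the dataset YAML, image size, and total epoch budget (and hence the close_mosaic value that turns Mosaic off after epoch 150) are assumptions.

```python
from ultralytics import YOLO

# Hedged sketch matching the augmentation table above; paths and epoch budget are assumed.
model = YOLO("yolov8s.pt")
model.train(
    data="bdd100k.yaml",   # assumed dataset config
    epochs=200,            # assumed total epochs
    imgsz=640,             # assumed input resolution
    fliplr=0.5,            # random horizontal flip probability
    mosaic=1.0,            # Mosaic probability
    close_mosaic=50,       # with 200 assumed epochs, disables Mosaic after epoch 150
    mixup=0.15,            # MixUp probability (per the table, applied after Mosaic is disabled)
    degrees=10.0,          # random affine: rotation range (+/-10 deg)
    scale=0.5,             # random affine: scale gain, i.e. 0.5-1.5x
    translate=0.1,         # random affine: translation fraction of image size
    shear=2.0,             # random affine: shear range (+/-2 deg)
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,  # HSV color-space jitter
    copy_paste=0.1,        # Copy-Paste probability
    # erasing=0.2,         # Random Erasing (area 0.02-0.4); arg name assumed, newer Ultralytics releases only
)
```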
Algorithm | DL_C2f | Opt_GFPN | Detect_SA | IS_IoU | mAP@50 (%) | mAP@50:95 (%) | P (%) | R (%) | FPS (f/s) | Params (M)
---|---|---|---|---|---|---|---|---|---|---
YOLOv8s | | | | | 47.8 | 22.1 | 61.1 | 44.2 | 136 | 11.2
YOLOv8s | ✓ | | | | 48.5 | 22.3 | 67.3 | 44.4 | 105 | 12.0
YOLOv8s | ✓ | ✓ | | | 48.6 | 22.8 | 62.2 | 45.6 | 98 | 14.4
YOLOv8s | ✓ | ✓ | ✓ | | 49.2 | 23.3 | 65.2 | 45.7 | 107 | 13.2
YOLOv8s | ✓ | ✓ | ✓ | ✓ | 50.2 | 23.3 | 65.2 | 45.7 | 107 | 13.2
Algorithm | DL_C2f | Opt_GFPN | Detect_SA | IS_IoU | mAP@50 (%) | mAP@50:95 (%) | P (%) | R (%) | FPS (f/s) | Params (M)
---|---|---|---|---|---|---|---|---|---|---
YOLOv8s | | | | | 82.3 | 55.6 | 90.1 | 75.9 | 156 | 11.2
YOLOv8s | ✓ | | | | 85.4 | 58.8 | 91.1 | 78.3 | 115 | 12.0
YOLOv8s | ✓ | ✓ | | | 86.4 | 63.1 | 85.9 | 79.6 | 102 | 14.4
YOLOv8s | ✓ | ✓ | ✓ | | 86.1 | 65.4 | 89.2 | 80.5 | 110 | 13.2
YOLOv8s | ✓ | ✓ | ✓ | ✓ | 86.9 | 65.4 | 89.2 | 80.5 | 110 | 13.2
Category | YOLOv8s P (%) | YOLOv8s R (%) | YOLOv8s mAP@50 (%) | YOLOv8s mAP@50:95 (%) | DGSS-YOLOv8s P (%) | DGSS-YOLOv8s R (%) | DGSS-YOLOv8s mAP@50 (%) | DGSS-YOLOv8s mAP@50:95 (%)
---|---|---|---|---|---|---|---|---
All | 61.1 | 44.2 | 48.5 | 22.1 | 65.2 | 45.7 | 50.5 | 23.3
Vehicles | 66.3 | 62.3 | 66.2 | 38.4 | 69.9 | 63.2 | 68.1 | 40.4
Pedestrian | 58.9 | 43.4 | 45.6 | 18.1 | 63.3 | 42.5 | 49.6 | 19.3
Traffic Sign | 58.0 | 34.4 | 41.2 | 17.6 | 63.1 | 37.1 | 41.9 | 19.2
Traffic Light | 61.3 | 36.8 | 41.1 | 14.0 | 64.6 | 39.9 | 42.6 | 14.7
Category | YOLOv8s P (%) | YOLOv8s R (%) | YOLOv8s mAP@50 (%) | YOLOv8s mAP@50:95 (%) | DGSS-YOLOv8s P (%) | DGSS-YOLOv8s R (%) | DGSS-YOLOv8s mAP@50 (%) | DGSS-YOLOv8s mAP@50:95 (%)
---|---|---|---|---|---|---|---|---
All | 90.1 | 75.9 | 82.3 | 55.6 | 89.2 | 80.5 | 86.9 | 65.4
Car | 89.8 | 89.4 | 93.6 | 74.4 | 92.1 | 88.5 | 93.7 | 74.6
Pedestrian | 91.6 | 63.9 | 72.1 | 41.2 | 88.8 | 69.4 | 79.7 | 50.3
Cyclist | 88.0 | 74.5 | 81.2 | 51.2 | 86.6 | 83.6 | 87.3 | 61.4
Model | P (%) | R (%) | mAP@50 (%) | mAP@50:95 (%) | FPS (f/s) |
---|---|---|---|---|---|
Faster-RCNN (ResNet50) | 62.7 | 55.1 | 51.5 | 25.7 | 19.4 |
Cascade-RCNN (ResNet50) | 62.3 | 55.8 | 51.7 | 26.7 | 12.3 |
RetinaNet (ResNet50) | 58.2 | 48.8 | 46.7 | 21.7 | 26.3 |
SSD | 46.3 | 38.4 | 33.9 | 14.8 | 48.6 |
YOLOv4 | 63.0 | 47.6 | 50.9 | 23.6 | 97.4 |
YOLOv5s | 60.8 | 44.3 | 48.2 | 21.6 | 180.2 |
YOLOv6s | 56.2 | 47.9 | 47.3 | 22.0 | 132.8 |
YOLOXs | 62.0 | 47.0 | 49.3 | 22.2 | 140.5 |
YOLOv7-tiny | 59.1 | 42.2 | 44.7 | 18.8 | 225.7 |
YOLOv8s | 61.5 | 44.7 | 49.1 | 22.3 | 125.6 |
YOLOv9 | 67.4 | 48.5 | 53.8 | 26.1 | 92.4 |
YOLOv10 | 66.1 | 47.2 | 51.5 | 25.3 | 96.3 |
RT-DETR (ResNet-50) | 64.7 | 45.8 | 50.1 | 24.5 | 98.1 |
DGSS-YOLOv8s | 65.2 | 46.1 | 51.3 | 24.0 | 103.1 |
Model | P (%) | R (%) | mAP@50 (%) | mAP@50:95 (%) | FPS (f/s) |
---|---|---|---|---|---|
Faster-RCNN (ResNet50) | 89.3 | 72.5 | 79.5 | 48.1 | 24.1 |
Cascade-RCNN (ResNet50) | 75.7 | 63.2 | 66.9 | 35.8 | 35.4 |
RetinaNet (ResNet50) | 87.4 | 75.8 | 78.8 | 48.7 | 17.6 |
SSD | 69.6 | 59.4 | 62.0 | 31.3 | 56.3 |
YOLOv4 | 91.6 | 78.4 | 84.2 | 56.3 | 97.5 |
YOLOv5s | 92.0 | 73.8 | 82.1 | 51.5 | 230.7 |
YOLOv6s | 87.6 | 72.5 | 80.2 | 50.9 | 155.2 |
YOLOXs | 93.1 | 74.3 | 83.2 | 53.8 | 168.3 |
YOLOv7-tiny | 88.7 | 70.2 | 79.0 | 46.3 | 276.5 |
YOLOv8s | 90.1 | 75.9 | 82.3 | 55.6 | 148.7 |
YOLOv9 | 94.2 | 77.1 | 85.3 | 57.2 | 91.2 |
YOLOv10 | 92.8 | 75.5 | 83.1 | 55.9 | 95.5 |
RT-DETR (ResNet-50) | 91.5 | 74.3 | 81.0 | 54.2 | 97.4 |
DGSS-YOLOv8s | 89.2 | 80.5 | 86.9 | 65.4 | 102.5 |
Metric | IoU | Area | maxDets | YOLOv8s (BDD100K) | DGSS-YOLOv8s (BDD100K) | YOLOv8s (KITTI) | DGSS-YOLOv8s (KITTI)
---|---|---|---|---|---|---|---
Average Precision (%) | 0.50:0.95 | all | 100 | 21.7 | 22.7 | 49.8 | 57.1
Average Precision (%) | 0.50 | all | 100 | 47.3 | 49.3 | 79.8 | 86.1
Average Precision (%) | 0.75 | all | 100 | 16.9 | 17.4 | 54.0 | 64.1
Average Precision (%) | 0.50:0.95 | small | 100 | 10.7 | 11.4 | 25.6 | 35.3
Average Precision (%) | 0.50:0.95 | medium | 100 | 35.3 | 36.8 | 55.2 | 60.4
Average Recall (%) | 0.50:0.95 | all | 1 | 6.5 | 6.8 | 25.3 | 30.2
Average Recall (%) | 0.50:0.95 | all | 10 | 26.2 | 26.9 | 54.0 | 61.4
Average Recall (%) | 0.50:0.95 | all | 100 | 33.2 | 34.6 | 57.1 | 63.1
Average Recall (%) | 0.50:0.95 | small | 100 | 22.0 | 23.6 | 38.2 | 44.5
Average Recall (%) | 0.50:0.95 | medium | 100 | 49.8 | 50.7 | 61.4 | 66.8
Average Recall (%) | 0.50:0.95 | large | 100 | 58.8 | 59.9 | 70.1 | 79.9
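This breakdown follows the standard COCO evaluation protocol. As a hedged illustration (the annotation and detection file names are placeholders, not the authors' paths), the same per-IoU, per-area, per-maxDets summary can be produced with pycocotools once detections are exported to COCO JSON:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: COCO-format ground truth and detection results.
coco_gt = COCO("annotations/instances_val.json")
coco_dt = coco_gt.loadRes("dgss_yolov8s_detections.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()    # match detections to ground truth per image and category
evaluator.accumulate()  # aggregate precision/recall over IoU thresholds and area ranges
evaluator.summarize()   # prints AP/AR by IoU, object area, and maxDets, as in the table above
```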