SST-YOLO: An Improved Autonomous Driving Object Detection Algorithm Based on YOLOv8
Abstract
1. Introduction
- (1) An SCC module is proposed that integrates Sobel-based gradient features with conventional convolution, improving edge awareness and feature diversity.
- (2) A SOAPN structure is designed to enhance multi-scale representation, particularly for small-object detection.
- (3) A TADAHead is introduced to achieve task-adaptive feature decoupling and spatial alignment, improving detection accuracy.
- (4) Experiments on the KITTI dataset show consistent improvements over the YOLOv8s baseline: mAP@0.5 rises from 89.2% to 91.7% and mAP@0.5–0.95 from 65.1% to 69.2%, while the parameter count falls from 11.1 M to 8.6 M.
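
The Sobel operator estimates horizontal and vertical intensity derivatives with two fixed 3 × 3 kernels and combines them into a gradient magnitude, which peaks at edges and is zero in flat regions. The dependency-free sketch below shows only that standard computation. It is an illustration under our own assumptions, not the SCC module: how SCC fuses these gradients with ordinary convolution (concatenation, addition, fixed or learned kernels) is not specified here, and the function names are hypothetical.

```python
"""Minimal sketch of Sobel gradient extraction on a 2-D grayscale image.

Illustrative only: it shows the kind of edge signal an SCC-style branch could
supply alongside ordinary convolution; it is NOT the authors' SCC module.
Function names are hypothetical.
"""
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) derivatives.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def correlate3x3(img, kernel):
    """Valid-mode 3x3 cross-correlation (no padding), as conv layers compute it."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            acc = 0
            for di in range(3):
                for dj in range(3):
                    acc += kernel[di][dj] * img[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out


def sobel_magnitude(img):
    """Per-pixel gradient magnitude sqrt(Gx^2 + Gy^2) over the valid region."""
    gx = correlate3x3(img, SOBEL_X)
    gy = correlate3x3(img, SOBEL_Y)
    return [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]


if __name__ == "__main__":
    # A vertical step edge: left half dark (0), right half bright (10).
    image = [[0, 0, 0, 10, 10, 10]] * 4
    print(sobel_magnitude(image))
```

On the step-edge example the magnitude is non-zero only in the two columns whose 3 × 3 window straddles the edge, which is the edge-localizing behavior that motivates adding gradient features to a detector backbone.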
2. Related Work
2.1. Overview of Object Detection
2.2. Two-Stage Object Detection Algorithms
2.3. Single-Stage Object Detection Algorithms
2.4. YOLO Network
2.5. YOLOv8
3. Method
3.1. SCC
3.2. SOAPN
3.3. TADAHead
4. Experiments and Analysis
4.1. Dataset
4.2. Experimental Environment and Training Settings
4.3. Evaluation Metrics
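
The result tables report precision P = TP/(TP + FP), recall R = TP/(TP + FN), and mean average precision. AP is the area under the precision–recall curve of one class, mAP averages AP over classes, mAP@0.5 counts a detection as correct when its IoU with an unmatched ground-truth box is at least 0.5, and mAP@0.5–0.95 additionally averages over the IoU thresholds 0.50, 0.55, …, 0.95. The single-class sketch below illustrates these definitions under assumptions of our own (boxes as (x1, y1, x2, y2), greedy matching by descending confidence, all-point interpolation); the evaluation toolkit used for the reported numbers may differ in such details.

```python
"""Sketch of detection metrics: IoU, P/R, AP, and mAP@0.5:0.95 (single class).

Assumptions (not taken from the paper): boxes are (x1, y1, x2, y2), matching is
greedy by descending confidence, and AP uses all-point interpolation.
"""


def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, thr):
    """AP for one class. preds: list of (score, box); gts: list of boxes."""
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    matched, tp_flags = set(), []
    for _, box in preds:
        # Match to the best-overlapping ground truth not already matched.
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if j not in matched:
                o = iou(box, g)
                if o > best:
                    best, best_j = o, j
        if best_j >= 0 and best >= thr:
            matched.add(best_j)
            tp_flags.append(1)  # true positive
        else:
            tp_flags.append(0)  # false positive
    # Cumulative precision P = TP/(TP+FP) and recall R = TP/(TP+FN).
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / len(gts) if gts else 0.0)
    # All-point interpolation: make precision non-increasing in rank order,
    # then sum precision times each recall increment.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap


def map_50_95(preds, gts):
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95 (single class)."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)
```

A box with IoU 0.6 against its target counts as a hit only at the thresholds 0.50, 0.55 and 0.60, so it scores 1.0 under mAP@0.5 but 0.3 under mAP@0.5–0.95. This is why the stricter metric rewards the tighter localization the proposed modules aim for.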
4.4. Comparative Experiments
4.5. Ablation Experiments
4.6. Model Generalization Experiments
4.7. Visualization Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References










| Experimental Environment | Value |
|---|---|
| Processor | AMD Ryzen 7 7745HX |
| Operating System | Linux |
| System Memory | 16 GB |
| GPU | NVIDIA GeForce RTX 4090 |
| GPU Memory | 24 GB |
| Programming Language | Python 3.8 |
| Deep Learning Framework | PyTorch 2.2.2 |
| GPU Computing Platform | CUDA 12.1 |


| Parameter | Value |
|---|---|
| Input Image Size | 640 × 640 pixels |
| Learning Rate | 0.01 |
| Weight Decay | 0.005 |
| Momentum | 0.9 |
| Optimizer | SGD |
| Batch Size | 32 |
| Training Epochs | 250 |

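
To make the interaction of the listed optimizer settings concrete, the sketch below performs one update of SGD with momentum and L2 weight decay, following the update rule documented for PyTorch's `torch.optim.SGD` with default dampening and no Nesterov momentum. It is an assumption-laden illustration: the learning-rate schedule, warm-up, and any per-parameter-group exclusions from weight decay that a training toolkit may apply are not modeled, and `sgd_step` is a hypothetical helper.

```python
"""One update of SGD with momentum and L2 weight decay (illustrative sketch).

Mirrors the torch.optim.SGD rule with dampening=0 and nesterov=False:
    g   = grad + weight_decay * p
    buf = momentum * buf + g        (buf = g on the first step)
    p   = p - lr * buf
Values come from the training-settings table; schedules are not modeled.
"""
LR = 0.01
MOMENTUM = 0.9
WEIGHT_DECAY = 0.005


def sgd_step(params, grads, bufs, lr=LR, momentum=MOMENTUM, weight_decay=WEIGHT_DECAY):
    """Return (new_params, new_bufs); pass bufs=None before the first step."""
    new_params, new_bufs = [], []
    for k, (p, g) in enumerate(zip(params, grads)):
        g = g + weight_decay * p  # L2 penalty folded into the gradient
        buf = g if bufs is None else momentum * bufs[k] + g  # momentum buffer
        new_bufs.append(buf)
        new_params.append(p - lr * buf)
    return new_params, new_bufs


if __name__ == "__main__":
    # With a zero loss gradient, only weight decay moves the parameter.
    p, b = sgd_step([1.0], [0.0], None)
    p, b = sgd_step(p, [0.0], b)
    print(p, b)
```

With a zero loss gradient the parameter still shrinks (1.0 to 0.99995 after one step), which shows that the weight-decay term acts as a steady pull toward zero whose effective strength is amplified by the momentum buffer.
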

| Model | P/% | R/% | mAP@0.5/% | mAP@0.5–0.95/% |
|---|---|---|---|---|
| SSD | 86.4 | 79.4 | 83.3 | 77.9 |
| Faster R-CNN | 84.5 | 77.7 | 80.9 | 67.5 |
| YOLOv3 | 89.3 | 83.3 | 86.9 | 64.7 |
| YOLOv5 | 90.3 | 84.7 | 88.6 | 65.5 |
| YOLOv7 | 90.5 | 84.1 | 87.4 | 64.5 |
| YOLOv8 | 90.1 | 82.9 | 89.2 | 65.1 |
| YOLOv10 | 91.1 | 85.2 | 88.9 | 65.7 |
| SST-YOLO | 93.4 | 84.4 | 91.7 | 69.2 |


| Model | mAP@0.5/% | mAP@0.5–0.95/% | FPS | GFLOPs | Params (M) |
|---|---|---|---|---|---|
| YOLOv8s | 89.2 | 65.1 | 512.89 | 28.4 | 11.1 |
| YOLOv8s + SCC | 90.2 | 65.5 | 438.58 | 26.8 | 10.5 |
| YOLOv8s + SOAPN | 90.8 | 67.3 | 504.39 | 29.9 | 10.3 |
| YOLOv8s + TADAHead | 90.5 | 67.3 | 599.01 | 25.8 | 9.4 |
| YOLOv8s + SCC + SOAPN | 91.6 | 68.6 | 567.57 | 28.4 | 9.7 |
| YOLOv8s + SCC + TADAHead | 91.4 | 68.5 | 562.58 | 24.3 | 8.8 |
| YOLOv8s + SOAPN + TADAHead | 91.5 | 68.1 | 554.70 | 28.4 | 9.2 |
| YOLOv8s + SCC + SOAPN + TADAHead | 91.7 | 69.2 | 525.13 | 26.8 | 8.6 |

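
As a worked check on the ablation table, the snippet below recomputes the absolute and relative change from the YOLOv8s baseline to the full model (SCC + SOAPN + TADAHead). The values are copied from the table; `ABLATION`, `FIELDS` and `deltas` are hypothetical names used only for this illustration.

```python
"""Recompute baseline-vs-full-model changes from the ablation table."""

# (mAP@0.5 %, mAP@0.5-0.95 %, FPS, GFLOPs, Params M), copied from the table.
ABLATION = {
    "YOLOv8s": (89.2, 65.1, 512.89, 28.4, 11.1),
    "YOLOv8s + SCC + SOAPN + TADAHead": (91.7, 69.2, 525.13, 26.8, 8.6),
}
FIELDS = ("mAP@0.5", "mAP@0.5-0.95", "FPS", "GFLOPs", "Params")


def deltas(base, full):
    """Map each field to (absolute change, relative change in %), 2 decimals."""
    out = {}
    for name, b, f in zip(FIELDS, base, full):
        out[name] = (round(f - b, 2), round(100.0 * (f - b) / b, 2))
    return out


if __name__ == "__main__":
    result = deltas(ABLATION["YOLOv8s"], ABLATION["YOLOv8s + SCC + SOAPN + TADAHead"])
    for name, (abs_change, rel_change) in result.items():
        print(f"{name}: {abs_change:+} ({rel_change:+}%)")
```

With the table values this gives +2.5 points of mAP@0.5 and +4.1 points of mAP@0.5–0.95, about 22.5% fewer parameters, about 5.6% fewer GFLOPs, and roughly 2.4% higher FPS. FPS depends on the hardware and runtime, so that figure is the least portable of the five.
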

| Model | P/% | R/% | mAP@0.5/% | mAP@0.5–0.95/% |
|---|---|---|---|---|
| YOLOv5 | 79.2 | 71.3 | 83.1 | 55.1 |
| YOLOv7 | 81.1 | 73.7 | 82.1 | 54.6 |
| YOLOv8 | 80.9 | 73.8 | 82.3 | 56.4 |
| SST-YOLO | 81.4 | 73.3 | 87.8 | 60.7 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Du, Q.; Zhang, N.; Bi, W.; Zhu, R.; Liu, Y.; Shen, C.; Zhang, S.; Zhao, J. SST-YOLO: An Improved Autonomous Driving Object Detection Algorithm Based on YOLOv8. Appl. Sci. 2026, 16, 3456. https://doi.org/10.3390/app16073456

