A Structurally Optimized and Efficient Lightweight Object Detection Model for Autonomous Driving
Abstract
1. Introduction
2. Related Work
2.1. YOLO Object Detection Algorithm
2.2. Current Research Status of Lightweight Models
3. Methods
3.1. Overall Architecture of FE-YOLOv8
3.2. C2f-Faster Architecture
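C2f-Faster is conventionally built by swapping the Bottleneck inside YOLOv8's C2f module for FasterNet blocks based on partial convolution (PConv), which convolves only a fraction of the channels and forwards the rest unchanged. The following is a minimal PyTorch sketch of that design; the class names, channel-split ratio (n_div) and MLP expansion factor are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: apply a k x k conv to the first dim/n_div
    channels only; the remaining channels pass through untouched."""
    def __init__(self, dim: int, n_div: int = 4, k: int = 3):
        super().__init__()
        self.dim_conv = dim // n_div          # channels that get convolved
        self.dim_keep = dim - self.dim_conv   # channels passed through
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, k,
                              padding=k // 2, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_keep], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)

class FasterBlock(nn.Module):
    """FasterNet-style block: PConv followed by a 1x1 pointwise MLP,
    with a residual connection; a drop-in for the C2f Bottleneck."""
    def __init__(self, dim: int, expansion: int = 2):
        super().__init__()
        hidden = dim * expansion
        self.pconv = PConv(dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, dim, 1, bias=False),
        )

    def forward(self, x):
        return x + self.mlp(self.pconv(x))
```

With n_div = 4, the spatial convolution costs roughly 1/16 of a regular convolution's FLOPs, consistent with the parameter and FLOPs reductions the ablation study attributes to C2f-Faster.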
3.3. Efficient Head Structure
- $C_{out}$: number of output channels,
- $C_{in}$: number of input channels,
- $H_{out}$: height of the output feature map,
- $W_{out}$: width of the output feature map,
- $k$: kernel size of the convolution.
- $C_{in}$: number of input channels,
- $k$: kernel size of the standard convolution,
- $s$: number of feature maps generated per input channel (expansion factor),
- $d$: kernel size of the linear operator in EMSConv,
- $r$: theoretical FLOPs reduction ratio (reconstructed in the sketch below).
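The two symbol groups above define, respectively, the cost of a standard convolution and a cheap-operation cost analysis for EMSConv. A hedged reconstruction of the formulas the definitions imply, following the GhostNet-style derivation they mirror (the paper's exact equations may group terms differently):

$$\mathrm{FLOPs}_{\mathrm{std}} = C_{out} \cdot C_{in} \cdot H_{out} \cdot W_{out} \cdot k^{2}$$

If only $C_{out}/s$ feature maps are produced by the full $k \times k$ convolution and the remaining $(s-1)\,C_{out}/s$ maps come from cheap $d \times d$ linear operators, the theoretical reduction ratio is

$$ r = \frac{C_{out} C_{in} H_{out} W_{out} k^{2}}{\frac{C_{out}}{s} C_{in} H_{out} W_{out} k^{2} + (s-1) \frac{C_{out}}{s} H_{out} W_{out} d^{2}} = \frac{s\, C_{in} k^{2}}{C_{in} k^{2} + (s-1) d^{2}} \approx s \quad (d \approx k,\; s \ll C_{in}). $$

For example, with $s = 2$, $k = d = 3$, and $C_{in} = 256$: $r = 2 \cdot 256 \cdot 9 / (256 \cdot 9 + 9) \approx 1.99$, i.e., close to halving the FLOPs. A minimal PyTorch sketch of a convolution matching this cost model (a Ghost-style construction; the paper's EMSConv may instead split channels across multiple kernel sizes):

```python
import torch
import torch.nn as nn

class CheapExpandConv(nn.Module):
    """Convolution matching the FLOPs analysis above: a k x k conv
    produces C_out/s intrinsic maps; depthwise d x d convs (the 'cheap
    linear operators') generate the remaining (s - 1) * C_out / s maps."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, d: int = 3, s: int = 2):
        super().__init__()
        assert c_out % s == 0, "c_out must be divisible by the expansion factor s"
        c_prim = c_out // s
        self.primary = nn.Conv2d(c_in, c_prim, k, padding=k // 2, bias=False)
        self.cheap = nn.Conv2d(c_prim, c_prim * (s - 1), d, padding=d // 2,
                               groups=c_prim, bias=False)  # depthwise = cheap linear op

    def forward(self, x):
        y = self.primary(x)                      # C_out/s intrinsic feature maps
        return torch.cat([y, self.cheap(y)], 1)  # + (s-1)*C_out/s cheap maps
```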
4. Experiment and Results
4.1. Hardware Platform and Parameters
4.2. Dataset (Shown in Supplementary Materials)
4.3. Experimental Validation and Results Analysis
4.3.1. Model Validation Analysis
4.3.2. Ablation Experiment Results
4.3.3. Comparative Experiments Results
4.3.4. Analysis of Model Lightweighting Detection Performance
4.3.5. Evaluation of Model Generalization and Visual Results
5. Summary
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Hardware platform and software environment.

| Name | Parameter Specifications |
|---|---|
| CPU | Intel® Xeon® CPU E5-2680 v4 |
| GPU | NVIDIA GeForce RTX 3080 Ti (12 GB) |
| Memory | 32 GB |
| CUDA version | 11.6.0 |
| Python | 3.8 |
| PyTorch | 1.13.1 |
Training and test parameter settings.

| Item | Setting |
|---|---|
| Optimizer | SGD |
| Initial learning rate | 0.01 |
| Learning rate schedule | Warm-up at the beginning followed by a decay schedule during training |
| Weight decay | 0.0005 |
| Batch size | 16 |
| Input image size | 640 × 640 |
| Data loading workers | 8 threads |
| Data augmentation | Mosaic + MixUp |
| Mosaic scheduling | Mosaic disabled in the final 20 epochs |
| Precision | Automatic Mixed Precision (AMP) |
| Training epochs | 100 |
| Confidence threshold (test) | 0.25 |
| IoU threshold (test) | 0.70 |
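For readability, the settings above map onto the Ultralytics YOLOv8 training API roughly as follows. This is a sketch under assumptions: the dataset YAML name is hypothetical, FE-YOLOv8 would load its own model YAML, and the MixUp strength is not given in the table.

```python
from ultralytics import YOLO

# Baseline config shown; FE-YOLOv8 would point at its own model YAML.
model = YOLO("yolov8s.yaml")

model.train(
    data="dataset.yaml",     # hypothetical path to the dataset config
    epochs=100,
    imgsz=640,
    batch=16,
    workers=8,
    optimizer="SGD",
    lr0=0.01,                # initial LR; warm-up then decay handled by the trainer
    weight_decay=0.0005,
    amp=True,                # automatic mixed precision
    mosaic=1.0,              # Mosaic augmentation
    mixup=0.1,               # MixUp (strength assumed; the table only says it is used)
    close_mosaic=20,         # disable Mosaic for the final 20 epochs
)

# Test-time thresholds from the table.
metrics = model.val(conf=0.25, iou=0.70)
```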
Ablation results (√ = module enabled). Model 1 is the YOLOv8s baseline; Model 4 is FE-YOLOv8.

| Model | C2f-Faster | EfficientHead | P | R | mAP_0.5 | Param | FLOPs |
|---|---|---|---|---|---|---|---|
| Model 1 | | | 0.758 | 0.601 | 0.677 | 11.13 M | 28.4 G |
| Model 2 | √ | | 0.769 | 0.581 | 0.671 | 8.31 M | 21.4 G |
| Model 3 | | √ | 0.765 | 0.595 | 0.673 | 9.43 M | 24.0 G |
| Model 4 | √ | √ | 0.768 | 0.588 | 0.665 | 7.67 M | 16.1 G |
Parameters and FLOPs of lightweight backbone alternatives versus FE-YOLOv8.

| Model | Param | FLOPs |
|---|---|---|
| YOLOv8-ShuffleNetv2 | 6.39 M | 16.5 G |
| YOLOv8-MobileNetv3 | 6.74 M | 16.4 G |
| YOLOv8-EfficientViT | 8.39 M | 20.4 G |
| FE-YOLOv8 | 7.67 M | 16.1 G |
Comparative experiment results against mainstream detectors.

| Model | P | R | mAP_0.5 | mAP_0.5:0.95 | Param | FLOPs |
|---|---|---|---|---|---|---|
| SSD | 0.693 | 0.515 | 0.608 | 0.345 | 4.58 M | 10.2 G |
| Faster R-CNN | 0.742 | 0.560 | 0.650 | 0.380 | 46.3 M | 138.7 G |
| CenterNet | 0.725 | 0.550 | 0.640 | 0.365 | 32.1 M | 40.2 G |
| YOLOv5s | 0.742 | 0.560 | 0.650 | 0.375 | 7.2 M | 15.0 G |
| YOLOv6s | 0.757 | 0.573 | 0.664 | 0.388 | 11.2 M | 23.7 G |
| YOLOv7s | 0.765 | 0.580 | 0.670 | 0.442 | 19.8 M | 45.5 G |
| YOLOv8s | 0.758 | 0.601 | 0.677 | 0.445 | 11.13 M | 28.4 G |
| YOLOv9s | 0.766 | 0.592 | 0.665 | 0.440 | 8.4 M | 27.6 G |
| YOLOv11s | 0.761 | 0.610 | 0.668 | 0.439 | 10.6 M | 21.4 G |
| FE-YOLOv8 | 0.768 | 0.588 | 0.665 | 0.438 | 7.67 M | 16.1 G |