Enhanced Real-Time Highway Object Detection for Construction Zone Safety Using YOLOv8s-MTAM
Abstract
1. Introduction
2. Neural Network Architecture
2.1. Motion-Temporal Attention Module (MTAM)
2.2. Loss Functions
2.3. Improved YOLOv8s Loss Function with MTAM Module
3. Experimental Results
3.1. Experimental Environment Setup
3.2. Image Collection and Augmentation
3.3. Image Data Classification and Labeling
3.4. 9-Mosaic Parameters
3.5. Data Analysis
3.6. Discussion on SoTA Comparison
3.7. Performance Metrics and Calibration Analysis
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Item | Specification |
---|---|
CPU/Memory | 6-core AMD Ryzen 5 7500F 64-bit processor, 3.7 GHz / 32 GB RAM |
GPU/Memory | NVIDIA GeForce RTX 5080 / 32 GB video RAM |
Deep learning framework | PyTorch 2.1.2 |
CUDA toolkit | 12.4 |
Python | 3.13.2 |
Operating system | Microsoft Windows 11 Home |
Optimizer | SGD (momentum 0.937, weight decay 5 × 10⁻⁴) |
LR schedule | Cosine annealing + 5-epoch warm-up |
Initial LR | 0.01 |
Batch size | 24 |
Epochs | 150 |
Confidence threshold | 0.25 |
NMS IoU | 0.45 |
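The LR schedule in the table (cosine annealing with a 5-epoch warm-up, starting from 0.01 over 150 epochs) can be sketched as a pure function. The linear warm-up shape and the final LR of 1 × 10⁻⁴ are assumptions; the table only states the schedule type and the initial rate.

```python
import math

def lr_at_epoch(epoch, total_epochs=150, lr0=0.01, lr_final=1e-4, warmup_epochs=5):
    """Cosine-annealed learning rate with a linear warm-up phase.

    Mirrors the table's schedule: a 5-epoch warm-up followed by cosine
    annealing from lr0 down to lr_final (lr_final is an assumed value,
    not stated in the paper).
    """
    if epoch < warmup_epochs:
        # Linear ramp up to lr0 over the warm-up epochs.
        return lr0 * (epoch + 1) / warmup_epochs
    # Cosine decay over the remaining epochs.
    t = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return lr_final + 0.5 * (lr0 - lr_final) * (1 + math.cos(math.pi * t))
```

The schedule peaks at the initial LR right as warm-up ends and decays smoothly toward the final value by the last epoch.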
Item | Specification |
---|---|
CPU | Quad-core ARM Cortex-A57 @ 1.43 GHz |
GPU | 128-core NVIDIA Maxwell |
Memory | 8 GB 64-bit LPDDR4, 25.6 GB/s |
Connectivity | Gigabit Ethernet, M.2 Key E |
Interfaces | 4× USB 3.0, USB 2.0 Micro-B |
Operating system | Jetson Linux 36.4.4 (Ubuntu 22.04-based) |
No. | Image Augmentation | Adjustment Range | Probability |
---|---|---|---|
1 | Contrast | Color ranges from 1–21 | 0.3 |
2 | Mixing | 0.5 mix ratio | 0.1 |
3 | Brightness | Between −40% and +40% | 0.3 |
4 | Grayscale | Between −10% and +10% | 0.25 |
5 | Saturation | Colorfulness from 0.5× to 1.5× | 0.3 |
6 | Blur (Gaussian σ) | 0 to 1.5 | 0.2 |
7 | Hue | Between −15° and +15° | 0.2 |
8 | Noise | Added to 0 to 10% of pixels | 0.1 |
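The augmentation policy above can be expressed as a probability-gated list of operations, each firing independently per image. The sketch below encodes only the probabilities and ranges from the table; the operation names and samplers are illustrative placeholders, and the actual image transforms are not shown.

```python
import random

# Augmentation menu from the table: (name, probability, parameter sampler).
# Each sampler draws a strength value from the listed adjustment range.
AUGMENTATIONS = [
    ("contrast",   0.30, lambda: random.uniform(1.0, 21.0)),
    ("mixup",      0.10, lambda: 0.5),                          # fixed 0.5 mix ratio
    ("brightness", 0.30, lambda: random.uniform(-0.40, 0.40)),
    ("grayscale",  0.25, lambda: random.uniform(-0.10, 0.10)),
    ("saturation", 0.30, lambda: random.uniform(0.5, 1.5)),
    ("blur",       0.20, lambda: random.uniform(0.0, 1.5)),     # Gaussian sigma
    ("hue",        0.20, lambda: random.uniform(-15.0, 15.0)),  # degrees
    ("noise",      0.10, lambda: random.uniform(0.0, 0.10)),    # fraction of pixels
]

def sample_augmentations(rng=random):
    """Return the (name, strength) operations to apply to one image.

    Each augmentation fires independently with its table probability,
    so a given image may receive zero, one, or several operations.
    """
    plan = []
    for name, prob, sampler in AUGMENTATIONS:
        if rng.random() < prob:
            plan.append((name, sampler()))
    return plan
```

Because the gates are independent, roughly one to two operations fire per image on average given these probabilities.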
Class | Precision Rate | Recall Rate | mAP(IoU[0.5]) | mAP(IoU[0.5:0.95]) |
---|---|---|---|---|
all | 91.12 ± 0.70% | 89.30 ± 0.69% | 90.09 ± 0.51% | 71.13 ± 0.82% |
construction | 97.91 ± 0.30% | 95.10 ± 0.37% | 94.85 ± 0.25% | 76.30 ± 0.60% |
warning sign | 90.85 ± 0.60% | 88.87 ± 0.36% | 89.63 ± 0.30% | 71.85 ± 0.72% |
person | 84.60 ± 1.20% | 83.96 ± 1.35% | 85.80 ± 0.98% | 65.24 ± 1.15% |
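For context on the mAP columns: average precision is the area under the monotone precision envelope of the precision-recall curve, and mAP(IoU[0.5:0.95]) averages AP over the ten IoU thresholds 0.50 to 0.95 in steps of 0.05. A minimal all-point-interpolation sketch of this standard computation (not necessarily the exact evaluator used in the paper):

```python
def average_precision(recalls, precisions):
    """AP as the area under the precision envelope of a PR curve.

    `recalls` must be sorted ascending; each precision is the value
    observed at the corresponding recall level.
    """
    # Pad with sentinels at recall 0 and 1.
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing (the envelope).
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Integrate precision over recall.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))

def coco_map(ap_per_iou_threshold):
    """mAP(IoU[0.5:0.95]): mean AP over the thresholds 0.50, 0.55, ..., 0.95."""
    return sum(ap_per_iou_threshold) / len(ap_per_iou_threshold)
```

For example, a detector that holds precision 1.0 up to recall 0.5 and precision 0.5 thereafter scores AP = 0.75.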
Methods | Precision Rate | Recall Rate | mAP(IoU[0.5]) | mAP(IoU[0.5:0.95]) |
---|---|---|---|---|
GIoU | 91.12 ± 0.70% | 89.30 ± 0.69% | 90.09 ± 0.51% | 71.13 ± 0.82% |
CIoU | 91.80 ± 0.39% | 88.20 ± 0.25% | 90.77 ± 0.68% | 70.20 ± 0.33% |
EIoU | 88.10 ± 0.56% | 91.40 ± 0.48% | 83.80 ± 0.12% | 70.10 ± 0.46% |
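The GIoU variant compared in the first row can be illustrated for axis-aligned boxes. This is the standard GIoU definition (IoU minus the normalized empty area of the smallest enclosing box), shown as a sketch rather than the paper's implementation:

```python
def giou(box_a, box_b):
    """Generalized IoU for (x1, y1, x2, y2) axis-aligned boxes.

    GIoU = IoU - (enclose - union) / enclose, so disjoint boxes get a
    negative score; the corresponding training loss is 1 - GIoU.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes are disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    enclose = cw * ch
    return inter / union - (enclose - union) / enclose
```

Unlike plain IoU, the penalty term keeps the gradient informative even when predicted and ground-truth boxes do not overlap at all.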
(a)

Model | Params (M) | GFLOPs | mAP(IoU[0.5]) (%) | Latency (ms/Frame) |
---|---|---|---|---|
YOLOv8s baseline | 11.20 | 28.30 | 88.47 ± 0.21 | 20.30 |
CenterNet2 | 11.00 | 26.60 | 86.50 ± 0.59 | 28.00 |
+MTAM only | 12.30 | 29.00 | 90.77 ± 0.68 | 22.10 |
+9-Mosaic only | 11.20 | 28.30 | 89.92 ± 0.45 | 22.80 |
+MTAM + 9-Mosaic | 13.00 | 29.50 | 91.05 ± 0.35 | 23.10 |

(b)

Model | FPS | Latency (ms) | Power (W) |
---|---|---|---|
YOLOv8s baseline | 49.00 | 20.30 | 8.20 |
+MTAM only | 44.00 | 22.10 | 8.90 |
+9-Mosaic only | 43.20 | 22.80 | 9.00 |
+MTAM + 9-Mosaic | 42.00 | 23.10 | 9.30 |
Module | ΔParams (M) | ΔGFLOPs (%) | ΔmAP(IoU[0.5]) (%) | ΔLatency (ms/Frame) |
---|---|---|---|---|
YOLOv8s baseline (absolute values) | 11.20 | 28.30 | 88.47 ± 0.21 | 20.30 |
CBAM (2023) | +1.10 | +3.20 | +2.80 | +3.50 |
NL Block (2023) | +2.80 | +8.40 | +3.90 | +6.10 |
STA-C3DL (2024) | +3.00 | +9.10 | +4.50 | +7.20 |
MTAM + 9-Mosaic (Ours) | +1.80 | +4.24 | +2.91 | +2.80 |
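The GFLOPs deltas are expressed as a percentage of the baseline compute: for the MTAM + 9-Mosaic variant, the +4.24 entry follows from the absolute figures in Table (a) (29.50 vs. 28.30 GFLOPs). A one-line check:

```python
def relative_overhead_pct(baseline, augmented):
    """Percentage increase of `augmented` over `baseline`."""
    return 100.0 * (augmented - baseline) / baseline

# GFLOPs of the YOLOv8s baseline vs. the MTAM + 9-Mosaic variant.
print(round(relative_overhead_pct(28.30, 29.50), 2))  # 4.24
```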
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lin, W.-P.; Wang, C.-C.; Li, E.-C.; Yeh, C.-H. Enhanced Real-Time Highway Object Detection for Construction Zone Safety Using YOLOv8s-MTAM. Sensors 2025, 25, 6420. https://doi.org/10.3390/s25206420