Real-Time Driver Attention Detection in Complex Driving Environments via Binocular Depth Compensation and Multi-Source Temporal Bidirectional Long Short-Term Memory Network
Abstract
1. Introduction
- (1) Binocular Vision Depth-Compensated Head Pose Estimation component
- (2) Multi-source Temporal Bidirectional Long Short-Term Memory Feature Fusion component
- (3) High-Accuracy, Low-Latency Driver Attention State Recognition
2. Method
2.1. Introduction to the Real-Time Driver Attention State Recognition Component
2.2. Introduction to the YOLO11n Pose Detector
2.3. Binocular Vision-Based Feature Point Distance Measurement
2.4. Depth-Compensated Head Pose Estimation
2.5. Driver Gaze Region Partition
2.6. Multi-Source Information Fusion Bidirectional Long Short-Term Memory
3. Experiment, Results and Discussion
3.1. Experimental Environment Settings and Dataset
3.2. Model Training and Testing for Face Detection and Facial Landmark Detection
3.3. Comparative Experiments on Driver Head Posture Estimation
3.4. Model Training and Testing for Multi-Source Temporal Bidirectional Long Short-Term Memory
3.5. Deployment Experiment of BV-DHPE and MSTBi-LSTM
3.6. In-Vehicle Testing of Driver Attention State Recognition Method
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Model | Precision (%) | Recall (%) | AP50-95 (%) | Parameters (M) | GFLOPs |
---|---|---|---|---|---|
YOLOv8n Pose | 99.7 | 99.6 | 89.2 | 3.3 | 9.2 |
YOLO11n Pose | 99.8 | 99.9 | 90.7 | 2.9 | 7.4 |
YOLO12n Pose | 99.6 | 99.8 | 89.2 | 2.8 | 7.4 |

Model | Precision (%) | Recall (%) | AP50-95 (%) | Inference (ms) | Postprocess (ms) |
---|---|---|---|---|---|
YOLOv8n Pose | 99.7 | 99.6 | 93.0 | 1.1 | 1.2 |
YOLO11n Pose | 99.7 | 99.9 | 94.5 | 1.1 | 1.0 |
YOLO12n Pose | 99.5 | 99.7 | 93.1 | 1.7 | 1.2 |

Head Pose | Fixation Region | Truck Speed | Five-Fold Cross-Validation | Accuracy (%) | Inference (ms) |
---|---|---|---|---|---|
√ | | | | 85.2 | 0.1 |
√ | √ | | | 89.7 | 0.1 |
√ | √ | √ | | 93.2 | 0.1 |
√ | √ | √ | √ | 93.5 | 0.1 |

Model | Precision (%) | Recall (%) | AP50-95 (%) | Inference (ms) |
---|---|---|---|---|
YOLO11n Pose | 99.3 | 99.5 | 92.2 | 16.8 |

Model | Five-Fold Cross-Validation | Accuracy (%) | Inference (ms) |
---|---|---|---|
MSTBi-LSTM | | 93.0 | 2.1 |
MSTBi-LSTM | √ | 93.2 | 2.1 |

Method | Accuracy (%) | F1 Score (%) | TPR (%) | FPR (%) | Inference (ms) |
---|---|---|---|---|---|
Monocular | 80.1 | 80.3 | 80.4 | 19.8 | 18.2 |
RT-DASR | 90.4 | 92.3 | 90.7 | 8.8 | 21.5 |
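For readers who want to see how the columns of the comparison table relate to one another, the following is a minimal sketch of the standard definitions of accuracy, F1 score, true positive rate (TPR), and false positive rate (FPR) for a binary classifier. It assumes a two-class setting in which one attention state is treated as the positive class; the function name `binary_metrics`, the `BinaryMetrics` container, and the confusion counts in the usage lines are hypothetical illustrations and are not taken from the experiments reported here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    """Container for the four metrics, each expressed as a fraction in [0, 1]."""
    accuracy: float
    f1: float
    tpr: float
    fpr: float


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Compute standard binary classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")

    accuracy = (tp + tn) / total
    # TPR (recall): share of actual positives that were detected.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    # FPR: share of actual negatives that were flagged as positive.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # Precision: share of predicted positives that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + tpr
    f1 = 2 * precision * tpr / denom if denom else 0.0

    return BinaryMetrics(accuracy=accuracy, f1=f1, tpr=tpr, fpr=fpr)


# Hypothetical confusion counts, chosen only to illustrate the call.
example = binary_metrics(tp=90, fp=10, tn=90, fn=10)
print(f"accuracy={example.accuracy:.3f} f1={example.f1:.3f} "
      f"tpr={example.tpr:.3f} fpr={example.fpr:.3f}")
```

Multiplying each fraction by 100 gives values on the same percentage scale as the table. Because TPR and FPR are computed over different denominators (actual positives and actual negatives), a method can improve one without changing the other, which is why the table reports both.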
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhou, S.; Zhang, W.; Liu, Y.; Chen, X.; Liu, H. Real-Time Driver Attention Detection in Complex Driving Environments via Binocular Depth Compensation and Multi-Source Temporal Bidirectional Long Short-Term Memory Network. Sensors 2025, 25, 5548. https://doi.org/10.3390/s25175548