Occupant-Aware Decision-Making with Large Vision-Language Model for Autonomous Vehicles
Abstract
1. Introduction
- We propose an occupant-aware decision-making paradigm (ODP) for AD systems that perceives, analyzes, and reasons about the occupant’s mental activities and needs, and makes driving decisions that match the occupant’s demands and preferences.
- We develop a large-scale occupant-centric decision-making dataset that includes not only naturalistic driving data but also the occupant’s states, subjective feelings, and needs.
- We evaluate ODP on the developed dataset and show that its decisions effectively match the occupant’s demands and preferences.
1.1. Related Work
1.1.1. Occupant Monitoring System
1.1.2. Mental State Analysis
1.1.3. Occupant-Aware Autonomous Driving
1.1.4. Large Vision-Language Models in Autonomous Driving
2. Materials and Methods
2.1. Occupant-Aware Decision-Making Paradigm
2.2. Dataset
2.3. Model Training
You are a senior expert in autonomous driving decision-making analysis.
Task
Based on the following input information, please analyze the driving scenario the autonomous vehicle is currently in and infer the occupant’s current states.
Input
1. Vehicle Dynamics:
The vehicle’s current speed is ___ km/h, longitudinal acceleration is ___ m/s², lateral acceleration is ___ m/s², and angular velocity is ___ deg/s.
2. Driving Condition Video:
3. Occupant’s Facial Video:
4. Occupant’s States:
The occupant’s current heart rate, heart rate variability, and Baevsky stress index are ___ bpm, ___ ms, and ___, respectively. The facial expression is ___.
Output
Based on the above information, please analyze the current driving scenario of the autonomous vehicle and the occupant’s state, and infer the occupant’s subjective feelings and driving decisions through a clear and logical chain of thought. The specific content includes:
1. Describe the key visual elements in the driving scenario (such as road type, traffic participants, traffic light status, obstacles, lane lines, weather/lighting conditions, etc.), and determine the vehicle’s current behavior.
2. Analyze the occupant’s facial video and physiological states to infer the occupant’s emotional state (e.g., nervous, relaxed, confused, surprised, etc.).
3. Analyze the occupant’s subjective feelings and driving decisions.
Requirements
1. Subjective feelings are divided into three dimensions: sense of safety, comfort, and travel efficiency. Each dimension is divided into five levels: very good, good, medium, poor, and very poor.
2. Driving decisions should be selected from the following options: remain constant speed, move forward slowly, left turn, right turn, acceleration, deceleration, rapid acceleration, rapid deceleration, lane change to the left, lane change to the right, slight left adjustment, slight right adjustment, follow at a greater distance, follow at a closer distance, stop and wait, and overtake.
3. The output should be in JSON format and contain five key-value pairs: “chain of thought”, “sense of safety”, “comfort”, “travel efficiency”, and “driving decision”.
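The three requirements above define a closed output vocabulary. A minimal sketch (ours, not from the paper) of how a model response could be parsed and checked against that vocabulary; the helper name `validate_output` is illustrative:

```python
import json

# Five-level feeling scale and decision options, copied from the prompt spec.
FEELING_LEVELS = {"very good", "good", "medium", "poor", "very poor"}
DECISIONS = {
    "remain constant speed", "move forward slowly", "left turn", "right turn",
    "acceleration", "deceleration", "rapid acceleration", "rapid deceleration",
    "lane change to the left", "lane change to the right",
    "slight left adjustment", "slight right adjustment",
    "follow at a greater distance", "follow at a closer distance",
    "stop and wait", "overtake",
}

def validate_output(raw: str) -> dict:
    """Parse a model response and check every field against the prompt spec."""
    data = json.loads(raw)
    expected = {"chain of thought", "sense of safety", "comfort",
                "travel efficiency", "driving decision"}
    if set(data) != expected:
        raise ValueError(f"unexpected keys: {set(data) ^ expected}")
    for key in ("sense of safety", "comfort", "travel efficiency"):
        if data[key] not in FEELING_LEVELS:
            raise ValueError(f"{key!r} outside five-level scale: {data[key]!r}")
    if data["driving decision"] not in DECISIONS:
        raise ValueError(f"unknown decision: {data['driving decision']!r}")
    return data
```

Rejecting malformed responses at this stage keeps downstream accuracy statistics limited to well-formed outputs.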
You are a senior expert in autonomous driving decision-making analysis.
Task
Based on the following input information, please analyze the current driving scenario of the autonomous vehicle, infer the occupant’s current state, and generate a clear and logically rigorous chain of thought from input to results.
Input
1. Vehicle Dynamics:
The vehicle’s current speed is ___ km/h, longitudinal acceleration is ___ m/s², lateral acceleration is ___ m/s², and angular velocity is ___ deg/s.
2. Driving Condition Video:
3. Occupant’s Facial Video:
4. Occupant’s States:
The occupant’s current heart rate, heart rate variability, and Baevsky stress index are ___ bpm, ___ ms, and ___, respectively. The facial expression is ___.
Results
1. Subjective Feeling Estimation:
The occupant thinks that the current vehicle’s sense of safety is ___, comfort is ___, and travel efficiency is ___.
2. Occupant’s Desired Action:
The occupant believes the action the vehicle should take now is ___.
Output
Based on the above information, please analyze the current driving scenario of the autonomous vehicle and the occupant’s state, infer the underlying reasons for the occupant’s subjective feelings and specific decisions, and generate a clear and logically rigorous chain of thought. The specific content includes:
1. Describe the key visual elements in the driving scenario (such as road type, traffic participants, traffic light status, obstacles, lane lines, weather/lighting conditions, etc.), and determine the vehicle’s current behavior.
2. Analyze occupant facial video and physiological states to infer the occupant’s emotional state (e.g., tense, relaxed, confused, surprised, etc.).
3. Analyze how the occupant’s subjective feelings can be derived from the current vehicle state and occupant expressions and physiological states.
4. Analyze how the occupant’s driving decisions can be derived from the current vehicle state and the occupant’s subjective feelings.
Requirements
1. Output a chain of thought: Present the complete causal path from environmental perception → emotional/cognitive response → decision-making behavior in the form of a logical chain: “Because… therefore… consequently… ultimately…”.
2. Avoid unsubstantiated speculation; all inferences must be based on input information.
3. If certain input is missing or ambiguous, please clearly state this and make cautious inferences based on observable information.
4. Use objective, professional, and concise language, avoiding emotional expressions.
5. Only output the chain of thought; do not include explanatory text.
{
"chain of thought": "",
"sense of safety": "",
"comfort": "",
"travel efficiency": "",
"driving decision": ""
}
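For concreteness, a minimal sketch of how the blank slots in the two prompt templates could be filled from per-sample measurements. The field names (`speed_kmh`, `hr_bpm`, `bsi`, etc.) are our own illustrative choices, not identifiers from the paper:

```python
# Templates mirror the wording of the "Vehicle Dynamics" and
# "Occupant's States" input sections; slot names are hypothetical.
DYNAMICS_TMPL = (
    "The vehicle's current speed is {speed_kmh:.1f} km/h, longitudinal "
    "acceleration is {long_acc:.2f} m/s\u00b2, lateral acceleration is "
    "{lat_acc:.2f} m/s\u00b2, and angular velocity is {yaw_rate:.1f} deg/s."
)
STATES_TMPL = (
    "The occupant's current heart rate, heart rate variability, and Baevsky "
    "stress index are {hr_bpm:.1f} bpm, {hrv_ms:.1f} ms, and {bsi:.1f}, "
    "respectively. The facial expression is {expression}."
)

def build_state_lines(sample: dict) -> tuple[str, str]:
    """Render the two measurement sentences for one training/inference sample."""
    return DYNAMICS_TMPL.format(**sample), STATES_TMPL.format(**sample)
```

The same rendered sentences can serve both the decision prompt and the chain-of-thought annotation prompt, which keeps the two stages consistent.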
3. Results and Discussion
3.1. Comparative Experiments
3.2. Ablation Studies
3.3. Subjective Feeling Estimation Tests
3.4. Case Studies
1. Driving Scenario Analysis:
(Omitted for brevity…)
2. Occupant Emotional State Analysis:
- Facial Photo Analysis: The occupant’s expression is natural, with a slight smile, and their eyes are looking straight ahead, showing no obvious tension or anxiety. The heart rate is 66.0 bpm, which is within the normal range, further supporting a relaxed emotional state.
- Inference: The occupant’s overall emotional state is relaxed and focused, without significant emotional fluctuations.
3. Subjective Perception Derived from Vehicle and Occupant Status:
(Omitted for brevity…)
4. Deriving Driving Decisions from Subjective Perceptions:
- Objective Needs: The vehicle is currently in a complex traffic environment and needs to make minor adjustments to its position based on road conditions to ensure safe passage through obstacles.
- Subjective Perception: Although the occupant experienced some discomfort, he did not show strong emotional fluctuations. Instead, based on his perception of the vehicle’s state, he believed the vehicle should perform more precise positional adjustments.
- Decision Derivation: Considering the medium travel-efficiency rating, the occupant hoped that the vehicle could adapt to the traffic environment more precisely, ultimately suggesting that the vehicle slightly adjust its position to the left to optimize the driving experience.

1. Driving Scenario Analysis:
(Omitted for brevity…)
2. Occupant Emotional State Analysis:
- Facial Photo Analysis: The occupant’s facial expression is calm, eyes slightly closed, showing no obvious signs of tension or surprise. The heart rate is 68.0 bpm, within the normal range, and showed no abnormal fluctuations.
- Inference: The occupant is in a relaxed state of mind and does not feel any obvious stress or discomfort.
3. Subjective Perception Derived from Vehicle and Occupant Status:
(Omitted for brevity…)
4. Deriving Driving Decisions from Subjective Perceptions:
- Objective Needs: The vehicle is located below the overpass, in a mixed traffic section, where overall traffic speed is limited.
- Subjective Perception: The occupant is satisfied with the safety and comfort but perceives room for improvement in the turning path and hopes to further enhance travel efficiency.
- Decision Derivation: The occupant believes there is room for improvement in the turning path. The video shows other vehicles and obstacles (such as the red truck) on the right side of the vehicle. Ultimately, the occupant thinks the vehicle should now make a “slight left adjustment” to improve travel efficiency and optimize the driving experience.

4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| AD | Autonomous Driving |
| AVs | Autonomous Vehicles |
| ODP | Occupant-Aware Decision-making Paradigm |
| OMS | Occupant Monitoring System |
| VLM | Vision-Language Model |
| CoT | Chain of Thought |
| AAA | American Automobile Association |
| LLM | Large Language Model |
| ADAS | Advanced Driver Assistance System |
| HVI | Human–Vehicle Interface |
| IRL | Inverse Reinforcement Learning |
| ECG | Electrocardiogram |
| EMG | Electromyograph |
| EEG | Electroencephalograph |
| HR | Heart Rate |
| HRV | Heart Rate Variability |
| BSI | Baevsky Stress Index |
| PnC | Planning and Control |
| NOA | Navigation on Autopilot |
| SFT | Supervised Fine-tuning |
References
- Sana, F.; Azad, N.L.; Raahemifar, K. Autonomous Vehicle Decision-Making and Control in Complex and Unconventional Scenarios—A Review. Machines 2023, 11, 676. [Google Scholar] [CrossRef]
- Moye, B. AAA: Fear in Self-Driving Vehicles Persists; Technical Report; American Automobile Association: Menlo Park, CA, USA, 2025. [Google Scholar]
- Xiao, J.; Goulias, K.G. Perceived Usefulness and Intentions to Adopt Autonomous Vehicles. Transp. Res. Part Policy Pract. 2022, 161, 170–185. [Google Scholar] [CrossRef]
- Vellenga, K.; Steinhauer, H.J.; Karlsson, A.; Falkman, G.; Rhodin, A.; Koppisetty, A.C. Driver Intention Recognition: State-of-the-art Review. IEEE Open J. Intell. Transp. Syst. 2022, 3, 602–616. [Google Scholar] [CrossRef]
- Karuppasamy, M.; Gangisetty, S.; Rai, S.N.; Masone, C.; Jawahar, C.V. Towards Safer and Understandable Driver Intention Prediction. In Proceedings of the International Conference on Computer Vision, Honolulu, HI, USA, 19–23 October 2025. [Google Scholar]
- Trillo, J.R.; Herrera-Viedma, E.; Morente-Molinera, J.A.; Cabrerizo, F.J. A Group Decision-Making Method Based on the Experts’ Behavior during the Debate. IEEE Trans. Syst. Man, Cybern. Syst. 2023, 53, 5796–5808. [Google Scholar] [CrossRef]
- Koksalmis, E.; Kabak, Ö. Deriving Decision Makers’ Weights in Group Decision Making: An Overview of Objective Methods. Inf. Fusion 2019, 49, 146–160. [Google Scholar] [CrossRef]
- Trillo, J.R.; Herrera-Viedma, E.; Morente-Molinera, J.A.; Cabrerizo, F.J. A Large Scale Group Decision Making System Based on Sentiment Analysis Cluster. Inf. Fusion 2023, 91, 633–643. [Google Scholar] [CrossRef]
- González-Quesada, J.C.; Trillo, J.R.; Porcel, C.; Pérez, I.J.; Cabrerizo, F.J. Modelling Large-Scale Group Decision-Making through Grouping with Large Language Models. Future Internet 2025, 17, 381. [Google Scholar] [CrossRef]
- Carlson, N.R.; Birkett, M.A. Physiology of Behavior, 12th ed.; Pearson: London, UK, 2016. [Google Scholar]
- Chen, S.; Wang, D.; Zuo, A.; Chen, Z.; Li, W.; Zan, J. Vehicle Ride Comfort Analysis and Optimization Using Design of Experiment. In Proceedings of the 2010 Second International Conference on Intelligent Human-Machine Systems and Cybernetics, Nanjing, China, 26–28 August 2010; Volume 1, pp. 14–18. [Google Scholar] [CrossRef]
- Li, J.; Liu, Y.; Ji, X. Identification of Driver’s Braking Intention in Cut-In Scenarios; SAE Technical Paper 2023-01-0852; SAE International: Warrendale, PA, USA, 2023. [Google Scholar] [CrossRef]
- Cheng, Z.; Cheng, Z.Q.; He, J.Y.; Wang, K.; Lin, Y.; Lian, Z.; Peng, X.; Hauptmann, A. Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2024; Volume 37, pp. 110805–110853. [Google Scholar]
- Zhu, Y.; Wang, S.; Zhong, W.; Shen, N.; Li, Y.; Wang, S.; Li, Z.; Wu, C.; He, Z.; Li, L. A Survey on Large Language Model-Powered Autonomous Driving. Engineering, 2025; in press. [Google Scholar] [CrossRef]
- Li, J.; Yang, L.; Lv, C.; Chu, Y.; Liu, Y. GLF-STAF: A Global-Local-Facial Spatio-Temporal Attention Fusion Approach for Driver Emotion Recognition. IEEE Trans. Consum. Electron. 2025, 71, 3486–3497. [Google Scholar] [CrossRef]
- Sun, W.; Si, Y.; Guo, M.; Li, S. Driver Distraction Recognition Using Wearable IMU Sensor Data. Sustainability 2021, 13, 1342. [Google Scholar] [CrossRef]
- Dairi, A.; Harrou, F.; Sun, Y. Efficient Driver Drunk Detection by Sensors: A Manifold Learning-Based Anomaly Detector. IEEE Access 2022, 10, 119001–119012. [Google Scholar] [CrossRef]
- Lea, N.A.; Sharmin, S.; Fime, A.A. Drowsiness and Emotion Detection of Drivers for Improved Road Safety. In Proceedings of the International Conference on Human-Computer Interaction; Springer: Berlin/Heidelberg, Germany, 2024; pp. 13–26. [Google Scholar]
- Ahlström, C.; Kircher, K.; Nyström, M.; Wolfe, B. Eye Tracking in Driver Attention Research—How Gaze Data Interpretations Influence What We Learn. Front. Neuroergonomics 2021, 2, 778043. [Google Scholar] [CrossRef] [PubMed]
- Jiang, T.; Ma, Y.; Zhao, X.; Ji, X.; Liu, Y. NeuralPOS: Physiological Measurement via Remote Photoplethysmography for Driver Monitoring. In Proceedings of the 2024 8th CAA International Conference on Vehicular Control and Intelligence (CVCI), Chongqing, China, 25–27 October 2024; pp. 1–8. [Google Scholar] [CrossRef]
- Capallera, M.; Angelini, L.; Meteier, Q.; Khaled, O.A.; Mugellini, E. Human-Vehicle Interaction to Support Driver’s Situation Awareness in Automated Vehicles: A Systematic Review. IEEE Trans. Intell. Veh. 2023, 8, 2551–2567. [Google Scholar] [CrossRef]
- Zhou, S.; Lan, R.; Sun, X.; Bai, J.; Zhang, Y.; Jiang, X. Emotional Design for In-Vehicle Infotainment Systems: An Exploratory Co-Design Study. In Proceedings of the HCI in Mobility, Transport, and Automotive Systems, Virtual Event, 26 June–1 July 2022; pp. 326–336. [Google Scholar] [CrossRef]
- Li, J.; Liu, Y.; Ji, X.; Tao, S. Detection of Driver’s Cognitive States Based on Lightgbm with Multi-Source Fused Data; SAE Technical Paper 2022-01-0066; SAE International: Warrendale, PA, USA, 2022. [Google Scholar] [CrossRef]
- Lin, C.; Zhu, X.; Wang, R.; Zhou, W.; Li, N.; Xie, Y. Early Driver Fatigue Detection System: A Cost-Effective and Wearable Approach Utilizing Embedded Machine Learning. Vehicles 2025, 7, 3. [Google Scholar] [CrossRef]
- Lin, X.; Huang, Z.; Ma, W.; Tang, W. EEG-based Driver Drowsiness Detection Based on Simulated Driving Environment. Neurocomputing 2025, 616, 128961. [Google Scholar] [CrossRef]
- Xu, X.; Yao, B.; Dong, Y.; Gabriel, S.; Yu, H.; Hendler, J.; Ghassemi, M.; Dey, A.K.; Wang, D. Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data. In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies; Association for Computing Machinery: New York, NY, USA, 2024; Volume 8, pp. 1–32. [Google Scholar] [CrossRef]
- Hu, J.; Dong, T.; Luo, G.; Ma, H.; Zou, P.; Sun, X.; Guo, D.; Yang, X.; Wang, M. PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation. IEEE Trans. Comput. Soc. Syst. 2025, 12, 539–551. [Google Scholar] [CrossRef]
- Hasenjäger, M.; Heckmann, M.; Wersing, H. A Survey of Personalization for Advanced Driver Assistance Systems. IEEE Trans. Intell. Veh. 2020, 5, 335–344. [Google Scholar] [CrossRef]
- Marina Martinez, C.; Heucke, M.; Wang, F.Y.; Gao, B.; Cao, D. Driving Style Recognition for Intelligent Vehicle Control and Advanced Driver Assistance: A Survey. IEEE Trans. Intell. Transp. Syst. 2018, 19, 666–676. [Google Scholar] [CrossRef]
- Wu, J.; Yan, Y.; Liu, Y.; Liu, Y. Research on Anthropomorphic Obstacle Avoidance Trajectory Planning for Adaptive Driving Scenarios Based on Inverse Reinforcement Learning Theory. Engineering 2024, 33, 133–145. [Google Scholar] [CrossRef]
- Yang, H.; Zhou, Y.; Wu, J.; Liu, H.; Yang, L.; Lv, C. Human-Guided Continual Learning for Personalized Decision-Making of Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2025, 26, 5435–5447. [Google Scholar] [CrossRef]
- Wang, X.; Guo, Y.; Ban, J.; Xu, Q.; Bai, C.; Liu, S. Driver Emotion Recognition of Multiple-ECG Feature Fusion Based on BP Network and D–S Evidence. IET Intell. Transp. Syst. 2020, 14, 815–824. [Google Scholar] [CrossRef]
- Jiang, T.; Li, J.; Ma, L.; Ji, X.; Liu, Y. Passenger Comfort Assessment via Motion Complexity Analysis for Autonomous Vehicles. Chin. J. Mech. Eng. 2025, 38, 149. [Google Scholar] [CrossRef]
- Thirunavukkarasu, G.S.; Abdi, H.; Mohajer, N. A Smart HMI for Driving Safety Using Emotion Prediction of EEG Signals. In Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016; pp. 004148–004153. [Google Scholar] [CrossRef]
- Liu, H.; Li, C.; Wu, Q.; Lee, Y.J. Visual Instruction Tuning. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
- OpenAI. GPT-4V(Ision) System Card. Available online: https://cdn.openai.com/papers/GPTV_System_Card.pdf (accessed on 10 January 2026).
- Bai, J.; Bai, S.; Yang, S.; Wang, S.; Tan, S.; Wang, P.; Lin, J.; Zhou, C.; Zhou, J. Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond. arXiv 2023, arXiv:2308.12966. [Google Scholar]
- Sima, C.; Renz, K.; Chitta, K.; Chen, L.; Zhang, H.; Xie, C.; Luo, P.; Geiger, A.; Li, H. DriveLM: Driving with Graph Visual Question Answering. In Proceedings of the European Conference on Computer Vision, Paris, France, 2–3 October 2023. [Google Scholar]
- Tian, X.; Gu, J.; Li, B.; Liu, Y.; Hu, C.; Wang, Y.; Zhan, K.; Jia, P.; Lang, X.; Zhao, H. DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models. In Proceedings of the Conference on Robot Learning, Munich, Germany, 6–9 November 2024. [Google Scholar]
- Xie, S.; Kong, L.; Dong, Y.; Sima, C.; Zhang, W.; Chen, Q.A.; Liu, Z.; Pan, L. Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, HI, USA, 19–23 October 2025; pp. 6585–6597. [Google Scholar]
- Baevsky, R.M.; Chernikova, A.G. Heart Rate Variability Analysis: Physiological Foundations and Main Methods. Cardiometry 2017, 10, 66–76. [Google Scholar] [CrossRef]
- Bai, S.; Cai, Y.; Chen, R.; Chen, K.; Chen, X.; Cheng, Z.; Deng, L.; Ding, W.; Gao, C.; Ge, C.; et al. Qwen3-VL Technical Report. arXiv 2025, arXiv:2511.21631. [Google Scholar] [CrossRef]




| Model | Test Set Accuracy | Safety-Critical Scenarios Accuracy | Comfort-Critical Scenarios Accuracy | Efficiency-Critical Scenarios Accuracy |
|---|---|---|---|---|
| Rule-based | 29.67% | 28.65% | 14.11% | 34.23% |
| DriveLM | 67.48% | 66.64% | 67.54% | 68.43% |
| ODP | 86.46% | 85.77% | 86.40% | 87.32% |
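The four accuracy columns in the tables above can be reproduced from per-sample predictions once each test sample carries a scenario tag. A brief sketch under our own naming conventions (the tag strings and helper name are illustrative):

```python
from collections import defaultdict

def scenario_accuracies(records):
    """records: iterable of (predicted_decision, true_decision, scenario_tag),
    e.g. tags "safety", "comfort", "efficiency" for the critical subsets."""
    hit = defaultdict(int)
    total = defaultdict(int)
    for pred, true, tag in records:
        for key in ("overall", tag):       # every sample counts toward overall
            total[key] += 1
            hit[key] += int(pred == true)
    return {k: hit[k] / total[k] for k in total}
```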
| Fine-Tuning | Occupant Information | CoT | Test Set Accuracy | Safety-Critical Scenarios Accuracy | Comfort-Critical Scenarios Accuracy | Efficiency-Critical Scenarios Accuracy |
|---|---|---|---|---|---|---|
| | | | 18.95% | 17.29% | 17.16% | 21.99% |
| ✓ | ✓ | | 51.37% | 50.11% | 53.01% | 51.91% |
| ✓ | | ✓ | 47.41% | 43.69% | 50.23% | 50.00% |
| ✓ | ✓ | ✓ | 86.46% | 85.77% | 86.40% | 87.32% |
| Subset | Test Set Accuracy | Safety-Critical Scenarios Accuracy | Comfort-Critical Scenarios Accuracy | Efficiency-Critical Scenarios Accuracy |
|---|---|---|---|---|
| Subset-M | 81.15% | 80.03% | 82.38% | 81.75% |
| Subset-A | 77.66% | 75.33% | 77.09% | 81.22% |
| Subset-X | 83.74% | 81.71% | 85.01% | 85.40% |
| Backbone | Test Set Accuracy | Safety-Critical Scenarios Accuracy | Comfort-Critical Scenarios Accuracy | Efficiency-Critical Scenarios Accuracy |
|---|---|---|---|---|
| Qwen2.5-VL-7B-Instruct | 69.02% | 67.52% | 69.00% | 70.97% |
| Qwen3-VL-4B-Instruct | 81.97% | 80.84% | 82.63% | 83.12% |
| Qwen3-VL-8B-Instruct | 83.74% | 81.71% | 85.00% | 85.40% |
| Subjective Feeling | Balanced Accuracy | F1 Score | r |
|---|---|---|---|
| Sense of safety | 0.80 | 0.81 | 0.92 |
| Comfort | 0.73 | 0.75 | 0.89 |
| Travel efficiency | 0.79 | 0.79 | 0.87 |
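The metrics in the table above can be computed without external dependencies. A sketch (our own helpers, not the paper's code) of balanced accuracy and of Pearson's r over the five-level scale mapped to ordinal scores 0–4; macro-F1 is omitted for brevity:

```python
from collections import defaultdict
from math import sqrt

LEVELS = ["very poor", "poor", "medium", "good", "very good"]
SCORE = {lvl: i for i, lvl in enumerate(LEVELS)}   # ordinal mapping 0..4

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, robust to the imbalanced level distribution."""
    per_class = defaultdict(lambda: [0, 0])        # class -> [hits, count]
    for t, p in zip(y_true, y_pred):
        per_class[t][1] += 1
        per_class[t][0] += int(t == p)
    recalls = [h / n for h, n in per_class.values()]
    return sum(recalls) / len(recalls)

def pearson_r(y_true, y_pred):
    """Pearson correlation between predicted and true ordinal scores."""
    xs = [SCORE[t] for t in y_true]
    ys = [SCORE[p] for p in y_pred]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)
```

Balanced accuracy is the appropriate headline number here because mild feeling levels dominate naturalistic data; Pearson's r additionally rewards near-miss predictions on the ordinal scale.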
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Jiang, T.; Zhao, X.; Ji, X.; Liu, Y. Occupant-Aware Decision-Making with Large Vision-Language Model for Autonomous Vehicles. Machines 2026, 14, 257. https://doi.org/10.3390/machines14030257

