Open Access Article
Robust Occupant Behavior Recognition via Multimodal Sequence Modeling: A Comparative Study for In-Vehicle Monitoring Systems
by Jisu Kim 1,* and Byoung-Keon D. Park 2
1 College of Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
2 University of Michigan Transportation Research Institute, Ann Arbor, MI 48109, USA
* Author to whom correspondence should be addressed.
Sensors 2025, 25(20), 6323; https://doi.org/10.3390/s25206323 (registering DOI)
Submission received: 4 September 2025 / Revised: 4 October 2025 / Accepted: 10 October 2025 / Published: 13 October 2025
Abstract
Understanding occupant behavior is critical for enhancing safety and situational awareness in intelligent transportation systems. This study investigates multimodal occupant behavior recognition using sequential inputs extracted from 2D pose, 2D gaze, and facial movements. We conduct a comprehensive comparative study of three distinct architectural paradigms: a static Multi-Layer Perceptron (MLP), a recurrent Long Short-Term Memory (LSTM) network, and an attention-based Transformer encoder. All experiments are performed on the large-scale Occupant Behavior Classification (OBC) dataset, which contains approximately 2.1 million frames across 79 behavior classes collected in a controlled, simulated environment. Our results demonstrate that temporal models significantly outperform the static baseline. The Transformer model, in particular, emerges as the superior architecture, achieving a state-of-the-art Macro F1 score of 0.9570 with a configuration of a 50-frame span and a step size of 10. Furthermore, our analysis reveals that the Transformer provides an excellent balance between high performance and computational efficiency. These findings demonstrate the superiority of attention-based temporal modeling with multimodal fusion and provide a practical framework for developing robust and efficient in-vehicle occupant monitoring systems. Implementation code and supplementary resources are available (see Data Availability Statement).
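To make the windowing configuration and the reported metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "span" means the number of consecutive frames in each input sequence and that "step" means the stride between the starts of successive windows; the paper may define these differently (for example, as a sub-sampling interval within a window). The function names `sliding_windows` and `macro_f1`, and the feature dimension used in the usage example, are hypothetical.

```python
import numpy as np


def sliding_windows(features, span=50, step=10):
    """Cut a (num_frames, num_features) array into overlapping windows.

    Assumption: `span` is the window length in frames and `step` is the
    stride between window starts. Returns shape (num_windows, span, num_features).
    """
    num_frames, num_features = features.shape
    if num_frames < span:
        # Not enough frames for a single window.
        return np.empty((0, span, num_features), dtype=features.dtype)
    starts = range(0, num_frames - span + 1, step)
    return np.stack([features[s:s + span] for s in starts])


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores.

    Convention used here: a class with no true and no predicted samples
    contributes an F1 of 0.0.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


# Usage: 100 frames of an 8-dimensional per-frame feature vector
# (8 is a placeholder, not the paper's feature dimension).
frames = np.zeros((100, 8), dtype=np.float32)
windows = sliding_windows(frames, span=50, step=10)
print(windows.shape)

# Toy two-class example of the metric.
print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Because Macro F1 averages the per-class scores without weighting by class frequency, rare behavior classes count as much as frequent ones, which matters for a dataset with 79 classes.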