Wire harness assembly is a highly manual job performed on formboards. Augmented reality (AR)-assisted wiring operations can improve work efficiency and reduce operator workload. However, investigations into the effects of AR-assisted wiring assembly on operator performance remain in the preliminary stage. To investigate how different AR wire harness modes support novice operators in completing assembly tasks effectively, this exploratory laboratory study examined the impacts of AR instruction modes for single-route conditions on assembly performance (task time and number of assembly errors), gaze behavior using eye-tracking data, and subjective experience measured with the NASA-TLX (Task Load Index) as a post-experiment questionnaire in a controlled laboratory environment. Three wire path visualization modes, i.e., static color mode (SCM), dynamic color mode with flashing display (DCM-FD), and dynamic color mode with segment display (DCM-SD), were implemented for monitor-based, AR-assisted wiring instruction on a formboard. The results reveal a substantial influence of the wire path visualization modes on task time under our controlled experimental conditions: the SCM group achieved an 18% shorter task time than the other two groups, with a statistically significant difference. This finding contradicts the existing observations in the mechanical assembly domain. For gaze behavior, an analysis of the eye-tracking data indicated that the number of switches in the SCM group was the lowest among the three groups, with a marginally significant difference from the DCM-FD group for both low- and high-complexity wiring tasks during the laying phase. Additionally, the total fixation time of the three groups showed a significant difference for low- and high-complexity tasks with a large effect size; the SCM group exhibited the shortest total fixation time across all tasks. 
No significant differences in the number of assembly errors and users’ perceived workload were observed among the three groups. These findings can serve as a reference for guiding the visual style design in AR-assisted wiring systems for training novice operators in human-centric Industry 5.0 and achieving a decrease in overall workload and improved task performance. Full article

(This article belongs to the Section Cognition)

16 pages, 601 KB

Open Access Article

Visual Attention to Emotional Faces in Children: An Eye-Tracking Study of Social Visual Attention

by Thaís de Fátima Bittencourt Oliveira, Erica de Freitas Marques, Guilherme Martins, Milena Fernandes de Oliveira, Leonardo Martins Guimaraes Rossi, Carlucio Gustavo Ribeiro Filho, Camila Fernanda Cunha Brandão, Lucas Rios Drummond, Lucas Túlio Lacerda, Michelle Morelo Pereira and Michael Jackson Oliveira de Andrade

Brain Sci. 2026, 16(7), 683; https://doi.org/10.3390/brainsci16070683 (registering DOI) - 29 Jun 2026

Abstract

Objectives: Visual attention to emotional faces provides a useful framework for investigating orienting, visual exploration, and attentional engagement across development. The present study aimed to characterize the visuospatial organization of attention in neurotypical children and to examine how this pattern is modulated by social and emotional factors. Twenty children (aged 8–12 years) participated in a passive viewing paradigm of facial expressions while their eye movements were recorded using eye tracking (120 Hz). Methods: Oculomotor metrics based on areas of interest (eyes, mouth, nose, face, and non-social regions) were analyzed, including time to first fixation (TTFF), number of fixations (NF), and total fixation duration (TFD), as well as total saccade count as a global index of visual scanning. Results: Results indicated statistically significant AOI-dependent interactions involving emotional expression, observer sex, stimulus sex, and stimulus race/ethnicity, revealing region-specific modulation of visual attention. Consistently, prioritization of the eye region was observed, particularly for angry expressions, and was associated with greater fixation recurrence and duration, whereas happy and surprised expressions were associated with increased attentional allocation to the mouth. Differences related to observer sex and stimulus characteristics reflected region-specific modulations. In contrast, global saccadic dynamics remained relatively stable across experimental conditions and showed no significant effects of observer sex, stimulus sex, race/ethnicity, or emotional expression. Conclusions: Taken together, these findings suggest that visual attention to emotional faces in childhood follows a relatively stable spatial organization characterized by preferential processing of the eye region and region-specific modulation associated with emotional expression and stimulus characteristics. Full article
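
The three AOI-based metrics named in this abstract (TTFF, NF, TFD) can be illustrated with a minimal sketch. It assumes fixations have already been detected and labeled with an area of interest. The `Fixation` record, its field names, and the sample values are hypothetical and do not come from the study.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Fixation:
    onset_ms: float     # fixation start, relative to stimulus onset (assumed convention)
    duration_ms: float  # fixation duration
    aoi: str            # hypothetical AOI label, e.g. "eyes", "mouth", "nose"


def aoi_metrics(fixations: List[Fixation], aoi: str) -> Dict[str, Optional[float]]:
    """Return TTFF, NF, and TFD for one area of interest."""
    hits = [f for f in fixations if f.aoi == aoi]
    # Time to first fixation: earliest onset inside the AOI, or None if never fixated.
    ttff = min((f.onset_ms for f in hits), default=None)
    return {
        "TTFF": ttff,                              # time to first fixation (ms)
        "NF": len(hits),                           # number of fixations
        "TFD": sum(f.duration_ms for f in hits),   # total fixation duration (ms)
    }


# Made-up trial data, for illustration only.
trial = [
    Fixation(120, 200, "nose"),
    Fixation(340, 310, "eyes"),
    Fixation(670, 150, "mouth"),
    Fixation(840, 260, "eyes"),
]
eyes = aoi_metrics(trial, "eyes")
```

Whether the study measures TTFF from stimulus onset or from another reference point is not stated in the abstract; the sketch assumes stimulus onset.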

(This article belongs to the Special Issue Eye-Tracking Monitoring of Neurological and Psychiatric Conditions Across Life Span)

17 pages, 2838 KB

Open Access Article

Consumer Responses to Packaging Materials in E-Commerce: Effects on Visual Attention, Disposal Behavior, and Brand Perception

by Mengmeng Zhao, Shannon Anderson, Rupert Andrew Hurley, Kirsty McLaren, Skylar Sirdashney, Greta Joneson, Leah Ivancic, Carol Pan and Tim Ohnmacht

Sustainability 2026, 18(13), 6568; https://doi.org/10.3390/su18136568 (registering DOI) - 29 Jun 2026

Abstract

As e-commerce expands, packaging increasingly serves as a communication interface in at-home consumer environments, where it may influence how consumers interpret sustainability. Unlike retail settings, where disposal decisions may be externally guided, consumers in e-commerce contexts rely on material cues and on-package information to interpret recyclability and brand intent. This study aims to examine how paper-based and plastic packaging influence visual attention, disposal behavior, and brand perception in apparel e-commerce. A controlled experimental study (n = 91) was conducted using mobile eye-tracking, behavioral observation, post-experience surveys, and follow-up interviews. Participants were randomly assigned to one of three packaging conditions: a low-density polyethylene (LDPE) plastic bag, a translucent paper bag, or a hybrid paper-based bag combining kraft and translucent materials. Results show that paper-based formats generated greater visual engagement than plastic, with translucent paper eliciting longer fixation duration and higher fixation count (p < 0.05). Recycling rates were higher for paper-based formats (70–77%) than plastic (53%), though not statistically significant. Perceived eco-friendliness differed significantly, with the hybrid paper format more strongly associated with environmental responsibility (p < 0.001). Qualitative findings indicate that material statements and disposal instructions improve confidence in interpreting recyclability. These results suggest that packaging material plays a role in shaping consumer attention and perceived eco-friendliness in e-commerce contexts. Full article

10 pages, 262 KB

Open Access Proceeding Paper

Analytical Study of Key Techniques for Cross-Modal Feature Alignment and Decision-Level Fusion in Brain–Computer Interface-Virtual Reality Systems

by Dan Liu

Eng. Proc. 2026, 141(1), 19; https://doi.org/10.3390/engproc2026141019 (registering DOI) - 29 Jun 2026

Abstract

Feature alignment and decision-level fusion in multimodal BCI–VR interaction were investigated using Transformer-based cross-modal embeddings, Lab Streaming Layer time synchronization, attention masks, and wavelet filtering for robust representation. A four-modal acquisition and synchronization platform covering electroencephalography, electromyography, eye-tracking, and speech was constructed, and fusion was achieved by introducing a stacking meta-learner together with a confidence-aware dynamic weighting mechanism. Prototype validation and comparative evaluations were conducted on virtual reality (VR) target-selection, trajectory-following, and object-manipulation tasks. The results showed that the proposed approach outperformed baselines such as weighted voting and independent single-modality classifiers in accuracy, cross-session and cross-subject generalization, and noise robustness, while achieving a measurable reduction in end-to-end response latency, indicating that an integrated semantic alignment–adaptive fusion pipeline enhanced stable outputs and robustness in multimodal interaction. The unified semantic alignment model tailored to BCI–VR can be used for establishing an integrated engineering workflow spanning synchronization, robust representation, and adaptive fusion, and for providing transferable evaluation metrics and application paradigms that offer methodological and technical references for scenarios such as rehabilitation training, virtual education, and intelligent control. Full article

28 pages, 7113 KB

Open Access Article

Optimization of Human–Machine Interface Layout for Mechanical Support Position of Manned Submersibles Based on a Task-Information Network Approach

by Xiyue Wang, Liping Pang, Xiaodong Cao, Yuejie Fan, Bingxu Zhao, Xin Wang and Wentao Wu

J. Mar. Sci. Eng. 2026, 14(13), 1176; https://doi.org/10.3390/jmse14131176 (registering DOI) - 26 Jun 2026

Viewed by 66

Abstract

The human–machine interface (HMI) of the mechanical support (MS) position (MS-HMIs) of manned submersibles features multiple screens, information-rich displays, and complex operational logic, which can reduce operator efficiency, increase cognitive load, and lead to human errors. The layout determines the perception of information density, complexity, and logic, making the optimization of the HMI layout highly significant. To address this issue, a layout optimization approach is proposed based on a task-information network integrating multi-objective optimization. First, the basic MS-HMI elements are decomposed, and Hierarchical Task Analysis (HTA) is used to construct task sequences and element usage sequences. The Space-P and Space-L methods are applied to build the task–information network, based on which element grouping and importance are determined through network topology analysis. Incorporating ergonomic layout principles, a multi-objective optimization model is formulated and solved using the NSGA-II algorithm to generate feasible optimized layouts. Experimental verification results demonstrate that the optimized interfaces significantly outperform the original design in terms of operational performance, eye-tracking metrics, and subjective evaluations. Operation duration and task completion time decreased by over 6%, average saccade speed was reduced by up to 17.1%, and subjective ratings improved substantially. By integrating complex network analysis, typical submersible task sequences, and ergonomic principles, this study presents a systematic, evidence-based, effective, and task-compliant method for optimizing HMI layouts. Full article

(This article belongs to the Section Ocean Engineering)

12 pages, 437 KB

Open Access Article

Visual, Vestibular, and Somatosensory Function in Female Rugby League Athletes

by Riley Brassington, Jocelyn Mara, Nick Ball, Gordon Waddington and Julie Cooke

Sports 2026, 14(7), 265; https://doi.org/10.3390/sports14070265 - 26 Jun 2026

Viewed by 124

Abstract

Female rugby league performance is influenced by multiple interacting sensory and physiological systems; however, the extent to which these factors vary across playing levels and positional groups remains unclear. This study explored differences in visual, vestibular, somatosensory, and autonomic performance according to playing level and position in female rugby league athletes. Elite and sub-elite athletes completed lower-limb proprioception testing using the Active Movement Extent Discrimination Assessment protocol alongside visual-vestibular and autonomic assessments obtained via a virtual reality eye-tracking system. Bayesian hierarchical models examined the effects of playing level, positional group (adjustables, backs, forwards), and their interaction. Few consistent differences were observed between elite and sub-elite athletes across the measures assessed. Posterior estimates suggest selected level-by-position effects for ankle proprioceptive acuity (PD = 0.94), vestibulo-oculomotor time on target (PD = 0.95), and autonomic dilation velocity (PD = 0.98); however, these findings were not consistent across positional groups or outcome measures, and within-group variability was evident. Overall, sensory and autonomic performance did not consistently differentiate elite and sub-elite athletes, suggesting limited utility as cross-sectional markers of playing level but potential value as longitudinal monitoring tools alongside workload, recovery, and performance data. Full article

(This article belongs to the Special Issue From Brain to Movement: Cognitive and Perceptual Roles in Motor Control and Sport Performance)

22 pages, 1501 KB

Open Access Article

Autism Spectrum Disorder Detection Using a Weighted-Average Ensemble of Deep Convolutional Neural Networks on Eye-Tracking Images

by Masroor Ahmed, Sadam Hussain, Ivan Amaya and José Carlos Ortiz-Bayliss

Mach. Learn. Knowl. Extr. 2026, 8(7), 176; https://doi.org/10.3390/make8070176 - 25 Jun 2026

Viewed by 191

Abstract

Autism Spectrum Disorder is a long-term neurodevelopmental disorder. Early diagnosis is crucial for timely rehabilitation and intervention. Recently, machine learning and deep learning techniques have been widely explored and have produced encouraging results using eye-tracking scanpath images for the early detection of ASD. However, these approaches exhibit inconsistent performance and classification error rates, as well as limited generalization, due to differences in learning approaches and architectural designs across individual models. To address these problems, we employed a weighted-average ensemble of deep learning models using eye-tracking scanpath images. In this work, two different pretrained convolutional neural networks were selected, including Xception and VGG16, based on their proven efficacy and performance. Moreover, we fine-tuned each model individually and evaluated them on a dataset containing eye-tracking scanpath images. We implemented a weighted-average ensemble technique to combine the diverse predictions of the two models. It reduces classification errors and improves the model’s generalization and overall performance. The adopted weighted-average ensemble technique achieved an accuracy of 98.18%, with a perfect recall, and a competitive Area Under the Curve (AUC) of 99.59%. These findings highlight that applying a weighted average to integrate multiple models’ predictions strengthens the generalization and reliability of ASD detection. Full article
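
The weighted-average ensembling described in this abstract can be sketched as follows. This is a generic illustration, not the authors' implementation: the probabilities, the weights, and the function names are assumptions, and the abstract does not say how the weights were chosen.

```python
from typing import List, Sequence


def weighted_average(probs_per_model: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Combine per-model class probabilities for one sample using normalized weights."""
    if len(probs_per_model) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probs_per_model[0])
    combined = [0.0] * n_classes
    for probs, w in zip(probs_per_model, weights):
        for k in range(n_classes):
            combined[k] += (w / total) * probs[k]
    return combined


def predict(probs_per_model: Sequence[Sequence[float]],
            weights: Sequence[float]) -> int:
    """Return the index of the class with the highest combined probability."""
    combined = weighted_average(probs_per_model, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical softmax outputs for one scanpath image: [P(class 0), P(class 1)].
model_a_probs = [0.30, 0.70]  # e.g. a fine-tuned Xception (values made up)
model_b_probs = [0.60, 0.40]  # e.g. a fine-tuned VGG16 (values made up)
combined = weighted_average([model_a_probs, model_b_probs], [0.6, 0.4])
label = predict([model_a_probs, model_b_probs], [0.6, 0.4])
```

With these made-up numbers the two models disagree, and the larger weight on the first model decides the final label. Reversing the weights would flip the decision, which is why the weights are typically tuned on held-out validation data.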

(This article belongs to the Section Learning)

25 pages, 6334 KB

Open Access Article

The Influence of Personality Traits on Hazard Recognition in Construction Workers

by Zhizhong Zhao, Huajiao Li, Rongyu Xia, Jianyong Tong, Song Wu, Xinen Pan, Shuhua Cen, Shutong Zhang and Haifeng Wan

Buildings 2026, 16(13), 2495; https://doi.org/10.3390/buildings16132495 - 24 Jun 2026

Viewed by 115

Abstract

Current construction safety research has paid limited attention to the relationship between stable individual differences and hazard-related visual attention. This study combined personality assessment and eye-tracking technology to investigate visual attention allocation and hazard recognition among construction workers in static work-at-height scenarios. Personality traits were assessed using the Chinese Big Five Personality Inventory Brief Version, and 30 participants with extreme trait profiles were selected for eye-tracking experiments in two representative work-at-height scenarios. Eight eye-tracking indicators were analyzed across four dimensions: attentional span, attentional stability, attentional allocation, and attentional shifting. An AHP-based evaluation framework was further developed to assess visual attention efficacy. The results showed descriptive differences in hazard-related visual attention patterns across personality-trait groups. Individuals high in agreeableness and conscientiousness exhibited more hazard-oriented visual allocation and higher visual attention efficacy, whereas those high in openness and extraversion showed stronger exploratory tendencies and lower efficiency in allocating attention to high-risk areas. Individuals high in neuroticism showed intermediate overall performance but relatively weaker attentional organization. Sensitivity analysis indicated that the ranking results remained stable under moderate weight perturbations. These findings provide a quantitative framework for comparing visual attention efficacy across personality-trait groups and offer preliminary support for differentiated safety training, risk communication, and attentional guidance in construction safety management. Full article

(This article belongs to the Section Construction Management, and Computers & Digitization)

24 pages, 45533 KB

Open Access Article

Optimizing Overall Color in Film Posters: A Type-Dependent Study Based on Eye Tracking and Constrained Optimization

by Bin Zhang, Ping Ji, Zhiqiang Wen and Ruixue Zhang

Appl. Sci. 2026, 16(13), 6333; https://doi.org/10.3390/app16136333 - 24 Jun 2026

Viewed by 160

Abstract

Film posters serve as front-end visual communication media that shape viewers’ initial judgments of film genre, emotional tone, and viewing appeal. However, whether the optimal overall color configuration follows a universal rule or varies across poster types remains insufficiently examined. This study investigated how overall lightness and chroma influence the communication effects of film posters and identified type-specific optimal color intervals. Based on a cross-type poster sample library, film posters were classified into four visual grammar types: affable-entertaining, relational-emotional, spectacle-dynamic, and threat-suspenseful. Type-specific quantile thresholds for lightness and chroma were established within each category. Eye-tracking data, subjective ratings, mixed-effects response surface modeling, and constrained desirability optimization were combined to identify optimal regions of overall color configuration. The results show that no single optimal lightness–chroma interval applies across all poster types. The dominant optimal interval was low lightness–high chroma for affable-entertaining and relational-emotional posters, high lightness–low chroma for spectacle-dynamic posters, and medium lightness–high chroma for threat-suspenseful posters. These findings indicate that overall color optimization varies across poster types within the present experimental context and provide practical support for evidence-based, type-specific poster color design. Full article

12 pages, 4675 KB

Open Access Article

Physiology-Driven Inference Using Large Language Models Enables Probabilistic Assessment of Huntington’s Disease from Smartphone Eye-Movement Data

by Leonardo Eleuterio Ariello, Kelvin Wang, David Newman-Toker, Jee Bang and David P. W. Rastall

AI 2026, 7(7), 236; https://doi.org/10.3390/ai7070236 - 24 Jun 2026

Viewed by 212

Abstract

Background: Artificial intelligence in medicine has largely relied on supervised training of disease-specific models, limiting scalability in conditions where labeled data are scarce. Large language models (LLMs), which encode broad medical knowledge through large-scale pretraining, offer an alternative paradigm in which structured physiological measurements can be interpreted directly without task-specific model training. Objective: To evaluate whether smartphone-derived ocular motor biomarkers can be translated into clinically meaningful probabilistic assessments of Huntington’s disease (HD) using general-purpose LLMs operating as inference engines. Methods: In this prospective proof-of-concept study, 26 participants (13 with genetically confirmed HD and 13 age-matched controls) completed a standardized ocular motor assessment using a custom smartphone application. Quantitative eye-movement metrics were validated against expert neurologist ratings. Structured physiological features were then provided to four general-purpose LLMs without task-specific training or diagnostic labels, and the models generated an AI-Assigned HD Probability Score (HAIPS). Discriminative performance and associations with clinical severity measures were evaluated. Results: Smartphone-derived ocular motor metrics showed strong agreement with clinician assessments (Spearman ρ = 0.76–0.95; all p < 0.001), confirming preservation of clinically meaningful physiological signals. LLM-derived HAIPS distinguished HD from controls with high accuracy (AUC 0.879–0.944), with no significant differences across models. Discrimination was statistically equivalent to a supervised logistic regression model trained on the same features. HAIPS correlated strongly with established measures of disease severity, including cognitive (MoCA, ρ = −0.86), functional (TFC, ρ = −0.74), and motor impairment (UHDRS, ρ = 0.85) (all p ≤ 0.003). 
Conclusions: Structured ocular motor biomarkers acquired using a consumer smartphone can be translated into clinically meaningful probabilistic assessments of HD by general-purpose LLMs without disease-specific model training. These findings support a framework in which physiologically grounded digital biomarkers are coupled with general-purpose inference models, potentially enabling scalable assessment in rare neurological diseases where labeled data are limited. Full article

(This article belongs to the Section Medical & Healthcare AI)

28 pages, 13815 KB

Open Access Article

Dual-Stream Fusion of Eye-Tracking and ECG Signals for Fatigue Detection in Remote Tower Air Traffic Controllers

by Dajiang Song, Weijun Pan, Hugo Gamboa, Zirui Yin and Shengjie Wang

Bioengineering 2026, 13(7), 717; https://doi.org/10.3390/bioengineering13070717 - 23 Jun 2026

Viewed by 128

Abstract

Fatigue detection in remote tower air traffic controllers is important for maintaining operational safety under sustained visual monitoring and high cognitive workload. This study proposes MFD-Net, a dual-stream multimodal fusion framework using eye-tracking and electrocardiogram (ECG) signals. The model separately encodes eye-tracking and ECG-derived temporal inputs, incorporates an ECG-derived RMSSD expert feature, and performs lightweight late fusion for fatigue-state classification. Under the mixed-subject random-window protocol, MFD-Net achieved an Accuracy of 85.20%, a Recall of 83.33%, and an AUC of 0.9337. Because overlapping windows from the same participant and scenario could appear in both training and test sets, this result should be interpreted as a potentially optimistic within-distribution estimate. Under the stricter zero-shot leave-one-subject-out (LOSO) protocol, performance decreased substantially, with an Accuracy of 70.95 ± 21.59%, a Recall of 22.98 ± 36.30%, and an AUC of 0.6025 ± 0.2984. This low zero-shot Recall indicates limited subject-independent fatigue-detection capability. Lightweight target-subject calibration and sequential probability aggregation improved adaptation and temporal stability, although the calibration results should be interpreted cautiously because random target-subject windows were used for fine-tuning. These findings suggest that eye-tracking and ECG fusion are promising under controlled conditions, while practical deployment requires deployment-oriented calibration protocols, recall-oriented optimization, and further real-world validation. Full article
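
The RMSSD feature mentioned in this abstract is, in the standard heart-rate-variability sense, the root mean square of successive differences between adjacent RR intervals. The sketch below shows that textbook definition only. How the authors detect R-peaks, choose windows, or feed the feature into MFD-Net is not described in the abstract, so the function name and the sample intervals are assumptions.

```python
import math
from typing import Sequence


def rmssd(rr_intervals_ms: Sequence[float]) -> float:
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each pair of adjacent RR intervals.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Made-up RR intervals in milliseconds for one analysis window.
window_rr = [800.0, 810.0, 790.0, 800.0]
value = rmssd(window_rr)  # successive differences are 10, -20, 10
```

For the sample window, the squared differences are 100, 400, and 100, so the mean is 200 and the RMSSD is its square root, roughly 14.14 ms.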

(This article belongs to the Section Biosignal Processing)

21 pages, 1843 KB

Open Access Article

Eye-Tracking-Based Evaluation of Cognitive Style and Driving Task Effects on AR-HUD Navigation Interfaces

by Jing Li, Xinyu Feng, Min Lin and Hua Zhang

Sensors 2026, 26(13), 3980; https://doi.org/10.3390/s26133980 - 23 Jun 2026

Viewed by 199

Abstract

As augmented reality head-up display (AR-HUD) becomes increasingly integrated into intelligent vehicles, inappropriate interface designs may increase drivers’ cognitive workload and delay hazard responses. This study investigates how cognitive style, driving task type, and AR-HUD navigation design jointly influence drivers’ behavioral performance and visual attention. A total of 55 participants were recruited and screened using the Group Embedded Figures Test, with 38 drivers finally selected for a 2 × 4 × 2 driving-simulation experiment comparing world-fixed (WF) and screen-fixed (SF) interfaces across goal-directed and stimulus-driven tasks. Reaction times and eye-tracking indicators were analyzed using generalized linear models. Results show that stimulus-driven tasks significantly increased reaction times, with rear-vehicle scenarios producing the longest responses (mean = 1.420). During lane-change tasks, WF displays significantly reduced fixation duration (p < 0.001) and fixation counts (p < 0.001), whereas SF displays improved attentional efficiency during pedestrian-warning tasks. In addition, field-dependent drivers exhibited significantly larger pupil diameters, indicating higher cognitive workload. These findings provide sensor-based evidence for AR-HUD systems that dynamically optimize interface presentation according to task context and workload conditions. Full article

(This article belongs to the Section Navigation and Positioning)

16 pages, 2080 KB

Open Access Article

An Eye-Tracking Study on Text Accessibility and Comprehension in University Students

by Sergio Navas-León and Jon Andoni Duñabeitia

Behav. Sci. 2026, 16(6), 1041; https://doi.org/10.3390/bs16061041 - 22 Jun 2026

Viewed by 151

Abstract

Easy-to-Read (E2R) recommendations aim to improve accessibility, but it remains unclear whether some visual and typographic adaptations may also benefit readers without disabilities. This study examined the effects of different text formats on reading comprehension and visual processing in university students using eye-tracking. Twenty-four young adults without cognitive disabilities read texts presented in three formats: hard-to-read, control, and Easy-to-Read. Reading comprehension was assessed with multiple-choice questions, and eye movements were recorded during reading. Data were analyzed using linear mixed-effects models. Text Format significantly affected reading comprehension, with estimated accuracy highest in the E2R format and significantly higher than in the hard-to-read format. The E2R format was also associated with shorter fixation durations and larger saccades than the other formats, suggesting a pattern compatible with a reduced cognitive demand in some eye-movement measures. Fixation count was highest for hard-to-read texts and significantly higher than in the control format, whereas differences involving E2R were not significant. Reading time showed a trend towards significance, with descriptively longer reading times for hard-to-read texts than for the control and E2R formats. These findings suggest that E2R adaptations, originally developed to support populations with cognitive needs, may also facilitate comprehension and reading efficiency in readers without cognitive disabilities. Full article

(This article belongs to the Section Cognition)

10 pages, 477 KB

Open Access Article

Subtitle Engagement Varies with Audio–Subtitle Language–Script Pairing: Evidence from Hindi–English Bilinguals with an English-Medium Instruction Background

by Inka Romero-Ortells, Manuel Perea, Eva Gutierrez-Sigut and Jon Andoni Duñabeitia

Vision 2026, 10(2), 36; https://doi.org/10.3390/vision10020036 - 22 Jun 2026

Viewed by 156

Abstract

Subtitles often attract visual attention even when they are not necessary for comprehension. In the present eye-tracking experiment, we examined whether attention to subtitles in instructional videos varies as a function of audio–subtitle language–script pairing in Hindi–English bilinguals with an English-medium instruction (EMI) background. Native Hindi participants viewed videos in three conditions: English audio with English subtitles (L2–L2), Hindi audio with Hindi subtitles (L1–L1), and English audio with Hindi subtitles (L2–L1). In the L2–L2 condition, gaze was distributed similarly across speakers’ faces and subtitles. In contrast, in both Hindi-subtitle formats, viewers allocated more dwell time to the speakers’ faces than to the subtitles. Comprehension scores did not differ significantly across conditions. These findings suggest that subtitle engagement among EMI bilinguals is not solely determined by the presence of subtitles but is also modulated by the properties and perceived utility of the written channel. More generally, our results caution against the view that subtitle engagement is uniformly automatic across multilingual instructional settings. Full article

34 pages, 8922 KB

Open Access Article

Behavior Recognition of Novice Drivers Based on Bimodal Eye-Tracking Characteristics and a Parallel CNN-Mamba Model

by Jianzhuo Li, Panyu Dai, Jiake Li and Ye Yu

Computers 2026, 15(6), 397; https://doi.org/10.3390/computers15060397 - 21 Jun 2026

Viewed by 123

Abstract

Driving behavior recognition plays a crucial role in intelligent driving systems and road traffic safety. Due to insufficient driving experience and limited ability to allocate visual attention, novice drivers are considered a high-risk group for traffic accidents. Existing approaches primarily focus on experienced drivers and rely on single-modal eye-tracking data, making it difficult to model spatial attention distributions and long-term temporal dependencies simultaneously. Moreover, these methods are often affected by modality asynchrony during multimodal fusion, further limiting performance gains. To address these challenges, this study proposes a novice driver behavior recognition method based on bimodal eye-tracking features and a gated cross-modal attention fusion (GCMAF) mechanism. The model adopts a spatial–temporal dual-branch architecture. The spatial branch employs ResNet34 to extract eye-tracking heatmap features to represent the visual attention distribution. In contrast, the temporal branch integrates a 1D-CNN with the Mamba model to capture local dynamic patterns and long-range temporal dependencies. In the fusion stage, the GCMAF module is introduced to enhance cross-modal interactions, and a gating mechanism is further used to adaptively adjust modality weights, thereby mitigating the adverse effects of modality asynchrony. To validate the effectiveness and generalization ability of the proposed method, repeated experiments and five-fold cross-validation are conducted. The results demonstrate that the model achieves an average classification accuracy of 93.86% across four driving behavior categories, with standard deviations below 0.3%. Compared with baseline methods, paired t-test results show that the performance improvement is statistically significant (p < 0.01). Ablation studies further confirm the independent contribution of each component. 
Overall, the proposed method outperforms existing approaches in terms of accuracy and stability, providing effective support for driving behavior assessment and proactive safety warning systems. Full article
