Search Results (85)

Search Parameters:
Keywords = gaze point tracking

28 pages, 3441 KiB  
Article
Which AI Sees Like Us? Investigating the Cognitive Plausibility of Language and Vision Models via Eye-Tracking in Human-Robot Interaction
by Khashayar Ghamati, Maryam Banitalebi Dehkordi and Abolfazl Zaraki
Sensors 2025, 25(15), 4687; https://doi.org/10.3390/s25154687 - 29 Jul 2025
Viewed by 196
Abstract
As large language models (LLMs) and vision–language models (VLMs) become increasingly used in robotics, a crucial question arises: to what extent do these models replicate human-like cognitive processes, particularly within socially interactive contexts? Whilst these models demonstrate impressive multimodal reasoning and perception capabilities, their cognitive plausibility remains underexplored. In this study, we address this gap by using human visual attention as a behavioural proxy for cognition in a naturalistic human-robot interaction (HRI) scenario. Eye-tracking data were previously collected from participants engaging in social human-human interactions, providing frame-level gaze fixations as a human attentional ground truth. We then prompted a state-of-the-art VLM (LLaVA) to generate scene descriptions, which were processed by four LLMs (DeepSeek-R1-Distill-Qwen-7B, Qwen1.5-7B-Chat, LLaMA-3.1-8b-instruct, and Gemma-7b-it) to infer saliency points. Critically, we evaluated each model in both stateless and memory-augmented (short-term memory, STM) modes to assess the influence of temporal context on saliency prediction. Our results show that whilst stateless LLaVA most closely replicates human gaze patterns, STM confers measurable benefits only for DeepSeek, whose lexical anchoring mirrors human rehearsal mechanisms. Other models exhibited degraded performance with memory due to prompt interference or limited contextual integration. This work introduces a novel, empirically grounded framework for assessing cognitive plausibility in generative models and underscores the role of short-term memory in shaping human-like visual attention in robotic systems. Full article
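
A minimal illustration (not from the paper) of the kind of frame-level comparison this abstract describes: model-predicted saliency points scored against human gaze fixations. The data layout, the normalised-distance metric, and the function name are assumptions for the sketch, not the authors' protocol.

```python
import numpy as np

def gaze_agreement(human_fixations, model_saliency, frame_size=(1920, 1080)):
    """Mean Euclidean distance between human fixations and model-predicted
    saliency points, one pair per frame, normalised by the screen diagonal
    (lower is better)."""
    human = np.asarray(human_fixations, dtype=float)   # shape (n_frames, 2)
    model = np.asarray(model_saliency, dtype=float)    # shape (n_frames, 2)
    diag = np.hypot(*frame_size)
    return float(np.mean(np.linalg.norm(human - model, axis=1)) / diag)

# Toy usage: three frames of human (x, y) gaze vs. predicted saliency points.
human = [(960, 540), (1000, 520), (880, 600)]
stateless = [(940, 560), (1010, 500), (900, 630)]
print(f"stateless agreement: {gaze_agreement(human, stateless):.4f}")
```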

14 pages, 1419 KiB  
Article
GhostBlock-Augmented Lightweight Gaze Tracking via Depthwise Separable Convolution
by Jing-Ming Guo, Yu-Sung Cheng, Yi-Chong Zeng and Zong-Yan Yang
Electronics 2025, 14(15), 2978; https://doi.org/10.3390/electronics14152978 - 25 Jul 2025
Viewed by 155
Abstract
This paper proposes a lightweight gaze-tracking architecture named GhostBlock-Augmented Look to Coordinate Space (L2CS), which integrates GhostNet-based modules and depthwise separable convolution to achieve a better trade-off between model accuracy and computational efficiency. Conventional lightweight gaze-tracking models often suffer from degraded accuracy due to aggressive parameter reduction. To address this issue, we introduce GhostBlocks, a custom-designed convolutional unit that combines intrinsic feature generation with ghost feature recomposition through depthwise operations. Our method enhances the original L2CS architecture by replacing each ResNet block with GhostBlocks, thereby significantly reducing the number of parameters and floating-point operations. The experimental results on the Gaze360 dataset demonstrate that the proposed model reduces FLOPs from 16.527 × 10⁸ to 8.610 × 10⁸ and parameter count from 2.387 × 10⁵ to 1.224 × 10⁵ while maintaining comparable gaze estimation accuracy, with MAE increasing only slightly from 10.70° to 10.87°. This work highlights the potential of GhostNet-augmented designs for real-time gaze tracking on edge devices, providing a practical solution for deployment in resource-constrained environments. Full article
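
For readers unfamiliar with the ghost-feature idea this abstract builds on, here is a hedged PyTorch sketch of a GhostNet-style module: a standard convolution produces intrinsic features, a cheap depthwise convolution produces "ghost" features, and the two are concatenated. It shows only the core idea; the paper's GhostBlocks and their integration into L2CS are not reproduced here, and the channel ratio is an assumption.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Intrinsic features from a standard 1x1 conv plus 'ghost' features from a
    cheap depthwise conv, concatenated along the channel axis."""
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        intrinsic = out_ch // ratio
        ghost = out_ch - intrinsic
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, intrinsic, 1, bias=False),
            nn.BatchNorm2d(intrinsic), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(intrinsic, ghost, dw_kernel, padding=dw_kernel // 2,
                      groups=intrinsic, bias=False),        # depthwise operation
            nn.BatchNorm2d(ghost), nn.ReLU(inplace=True))

    def forward(self, x):
        primary = self.primary(x)
        return torch.cat([primary, self.cheap(primary)], dim=1)

x = torch.randn(1, 64, 56, 56)
print(GhostModule(64, 128)(x).shape)   # torch.Size([1, 128, 56, 56])
```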

28 pages, 3228 KiB  
Article
Examination of Eye-Tracking, Head-Gaze, and Controller-Based Ray-Casting in TMT-VR: Performance and Usability Across Adulthood
by Panagiotis Kourtesis, Evgenia Giatzoglou, Panagiotis Vorias, Katerina Alkisti Gounari, Eleni Orfanidou and Chrysanthi Nega
Multimodal Technol. Interact. 2025, 9(8), 76; https://doi.org/10.3390/mti9080076 - 25 Jul 2025
Viewed by 308
Abstract
Virtual reality (VR) can enrich neuropsychological testing, yet the ergonomic trade-offs of its input modes remain under-examined. Seventy-seven healthy volunteers—young (19–29 y) and middle-aged (35–56 y)—completed a VR Trail Making Test with three pointing methods: eye-tracking, head-gaze, and a six-degree-of-freedom hand controller. Completion time, spatial accuracy, and error counts for the simple (Trail A) and alternating (Trail B) sequences were analysed in 3 × 2 × 2 mixed-model ANOVAs; post-trial scales captured usability (SUS), user experience (UEQ-S), and acceptability. Age dominated behaviour: younger adults were reliably faster, more precise, and less error-prone. Against this backdrop, input modality mattered. Eye-tracking yielded the best spatial accuracy and shortened Trail A time relative to manual control; head-gaze matched eye-tracking on Trail A speed and became the quickest, least error-prone option on Trail B. Controllers lagged on every metric. Subjective ratings were high across the board, with only a small usability dip in middle-aged low-gamers. Overall, gaze-based ray-casting clearly outperformed manual pointing, but the optimal choice depended on task demands: eye-tracking maximised spatial precision, whereas head-gaze, which requires no calibration, offered greater speed and error avoidance under heavier cognitive load. TMT-VR appears to be an accurate, engaging, and ergonomically adaptable assessment, yet it requires age-stratified norms. Full article
(This article belongs to the Special Issue 3D User Interfaces and Virtual Reality—2nd Edition)
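
As a hedged sketch of the 3 × 2 × 2 analysis described above, the snippet below fits a linear mixed model with a random intercept per participant on synthetic data; the column names, toy numbers, and the use of statsmodels (rather than the authors' ANOVA software) are assumptions of the sketch.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for pid in range(20):                              # toy participants
    age = "young" if pid < 10 else "middle"
    for modality in ("eye", "head", "controller"): # within-subject factor (3)
        for trail in ("A", "B"):                   # within-subject factor (2)
            rows.append(dict(participant=pid, age_group=age,
                             modality=modality, trail=trail,
                             time=rng.normal(60, 10)))  # toy completion time (s)
df = pd.DataFrame(rows)

# Random intercept per participant approximates the repeated-measures structure.
model = smf.mixedlm("time ~ C(modality) * C(age_group) * C(trail)",
                    df, groups=df["participant"])
print(model.fit().summary())
```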

43 pages, 190510 KiB  
Article
From Viewing to Structure: A Computational Framework for Modeling and Visualizing Visual Exploration
by Kuan-Chen Chen, Chang-Franw Lee, Teng-Wen Chang, Cheng-Gang Wang and Jia-Rong Li
Appl. Sci. 2025, 15(14), 7900; https://doi.org/10.3390/app15147900 - 15 Jul 2025
Viewed by 254
Abstract
This study proposes a computational framework that transforms eye-tracking analysis from statistical description to cognitive structure modeling, aiming to reveal the organizational features embedded in the viewing process. Using the designers’ observation of a traditional Chinese landscape painting as an example, the study draws on the goal-oriented nature of design thinking to suggest that such visual exploration may exhibit latent structural tendencies, reflected in patterns of fixation and transition. Rather than focusing on traditional fixation hotspots, our four-dimensional framework (Region, Relation, Weight, Time) treats viewing behavior as structured cognitive networks. To operationalize this framework, we developed a data-driven computational approach that integrates fixation coordinate transformation, K-means clustering, extremum point detection, and linear interpolation. These techniques identify regions of concentrated visual attention and define their spatial boundaries, allowing for the modeling of inter-regional relationships and cognitive organization among visual areas. An adaptive buffer zone method is further employed to quantify the strength of connections between regions and to delineate potential visual nodes and transition pathways. Three design-trained participants were invited to observe the same painting while performing a think-aloud task, with one participant selected for the detailed demonstration of the analytical process. The framework’s applicability across different viewers was validated through consistent structural patterns observed across all three participants, while simultaneously revealing individual differences in their visual exploration strategies. These findings demonstrate that the proposed framework provides a replicable and generalizable method for systematically analyzing viewing behavior across individuals, enabling rapid identification of both common patterns and individual differences in visual exploration. This approach opens new possibilities for discovering structural organization within visual exploration data and analyzing goal-directed viewing behaviors. Although this study focuses on method demonstration, it proposes a preliminary hypothesis that designers’ gaze structures are significantly more clustered and hierarchically organized than those of novices, providing a foundation for future confirmatory testing. Full article
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
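
The snippet below is an illustrative sketch, on synthetic fixations, of two steps named in the abstract: K-means clustering of fixation coordinates into attention regions, followed by a region-to-region transition count approximating the "Relation" dimension. The number of regions and the toy data are assumptions, and the extremum-point, interpolation, and buffer-zone steps are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
fixations = rng.uniform([0, 0], [1600, 1200], size=(300, 2))  # toy (x, y) fixations

k = 5                                             # assumed number of attention regions
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(fixations)

# Count transitions between consecutive fixations that cross region boundaries.
transitions = np.zeros((k, k), dtype=int)
for a, b in zip(labels[:-1], labels[1:]):
    if a != b:
        transitions[a, b] += 1
print(transitions)
```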

18 pages, 5112 KiB  
Article
Gaze–Hand Steering for Travel and Multitasking in Virtual Environments
by Mona Zavichi, André Santos, Catarina Moreira, Anderson Maciel and Joaquim Jorge
Multimodal Technol. Interact. 2025, 9(6), 61; https://doi.org/10.3390/mti9060061 - 13 Jun 2025
Viewed by 526
Abstract
As head-mounted displays (HMDs) with eye tracking become increasingly accessible, the need for effective gaze-based interfaces in virtual reality (VR) grows. Traditional gaze- or hand-based navigation often limits user precision or impairs free viewing, making multitasking difficult. We present a gaze–hand steering technique that combines eye tracking with hand pointing: users steer only when gaze aligns with a hand-defined target, reducing unintended actions and enabling free look. Speed is controlled via either a joystick or a waist-level speed circle. We evaluated our method in a user study (n = 20) across multitasking and single-task scenarios, comparing it to a similar technique. Results show that gaze–hand steering maintains performance and enhances user comfort and spatial awareness during multitasking. Our findings support using gaze–hand steering in gaze-dominant VR applications requiring precision and simultaneous interaction. Our method significantly improves VR navigation in gaze–dominant, multitasking-intensive applications, supporting immersion and efficient control. Full article
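
A schematic sketch of the gating rule the abstract describes (steer only while gaze aligns with the hand-defined target); the angular threshold and the direction-vector representation are illustrative assumptions, not the authors' parameters.

```python
import numpy as np

def steering_enabled(gaze_dir, hand_target_dir, threshold_deg=10.0):
    """True if the angle between the gaze ray and the hand-pointing ray is small,
    i.e. steering is engaged; otherwise the user is free to look around."""
    g = np.asarray(gaze_dir, float)
    h = np.asarray(hand_target_dir, float)
    cos_angle = np.dot(g, h) / (np.linalg.norm(g) * np.linalg.norm(h))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))) <= threshold_deg

print(steering_enabled([0, 0, 1], [0.05, 0.02, 1.0]))   # True: gaze aligned, steering on
print(steering_enabled([0, 0, 1], [1.0, 0.0, 0.2]))     # False: looking away, free look
```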

12 pages, 5990 KiB  
Article
Research on the Use of Eye-Tracking in Marking Prohibited Items in Security Screening at the Airport
by Artur Kierzkowski, Tomasz Kisiel, Ewa Mardeusz and Jacek Ryczyński
Appl. Sci. 2025, 15(11), 6161; https://doi.org/10.3390/app15116161 - 30 May 2025
Viewed by 335
Abstract
This article presents the results of an experimental study that highlights a problem with the Hit Rate indicator used to assess the effectiveness of security control operators: there is a significant error in determining whether an operator has correctly identified prohibited items. This is a significant problem because operators are expected to be highly effective, and the measurement error may interfere with their correct assessment. So far, there has been no research on improving the accuracy of estimating this rate. This article examines whether eye-tracking technology can be used to eliminate this error. An experimental test using a 120 Hz eye-tracking system was conducted to determine how effectively gaze location can be identified on two 19″ monitors, which simulate the operator’s actual work environment. The tests were conducted under simulated conditions, using a replica of an actual security control operator’s station. As a contribution, it is demonstrated that eye-tracking technology can be used to significantly increase the accuracy of screening operators’ assessments. This knowledge can serve as a basis for further research in this area. Full article
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
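
As a hedged illustration of how gaze data could tighten the Hit Rate estimate discussed above, the sketch below counts a flagged item as a gaze-supported hit only if a fixation fell inside the item's area of interest. The AOI format, the counting rule, and the toy data are assumptions, not the authors' procedure.

```python
def fixation_in_aoi(fix, aoi):
    """True if the fixation (x, y) lies inside the AOI rectangle (x0, y0, x1, y1)."""
    x, y = fix
    x0, y0, x1, y1 = aoi
    return x0 <= x <= x1 and y0 <= y <= y1

def looked_at_hit_rate(trials):
    """trials: list of dicts with 'fixations' (list of (x, y)), 'aoi' (rectangle of the
    prohibited item), and 'flagged' (operator marked the item)."""
    hits = sum(1 for t in trials
               if t["flagged"] and any(fixation_in_aoi(f, t["aoi"]) for f in t["fixations"]))
    return hits / len(trials)

trials = [
    {"fixations": [(400, 300), (820, 610)], "aoi": (800, 600, 900, 700), "flagged": True},
    {"fixations": [(120, 80)],              "aoi": (800, 600, 900, 700), "flagged": True},
]
print(looked_at_hit_rate(trials))   # 0.5: the second flag is not supported by gaze
```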

28 pages, 1519 KiB  
Systematic Review
Analysis of Teachers’ Visual Behaviour in Classes: A Systematic Review
by Rodrigo Mendes, Mário Pereira, Paulo Nobre and Gonçalo Dias
Eur. J. Investig. Health Psychol. Educ. 2025, 15(4), 54; https://doi.org/10.3390/ejihpe15040054 - 5 Apr 2025
Viewed by 789
Abstract
(1) Background: Teachers’ visual behaviour in classes plays an important role in learning and instruction. Hence, understanding the dynamics of classroom interactions is fundamental in educational research. As mapping evidence on this topic would highlight concepts and knowledge gaps in this area, this systematic review aimed to collect and systematise the analysis of teachers’ visual behaviour in classroom settings through the use of eye-tracking apparatus; (2) Methods: The methodological procedures were registered in the INPLASY database, and this systematic review used the PRISMA criteria for the selection and analysis of studies in this area. We searched six literature databases (B-on, ERIC, ScienceDirect, Scopus, TRC and WoS) between 1 January 2015 and 31 December 2024. Eligible articles used eye-tracking apparatus and analysed teachers’ visual behaviour as a dependent variable in the experiment; (3) Results: The main results of the selected articles (n = 41) point to differences in teachers’ visual behaviour in terms of professional experience and to the relationship between gaze patterns and several classroom variables; (4) Conclusions: A deeper understanding of teachers’ visual behaviour can lead to more effective teacher training and better classroom environments. Scientific research in this area would benefit from more standardised and robust methodologies that allow more reliable analyses of the added value of eye-tracking technology. Full article

27 pages, 5537 KiB  
Article
Real-Time Gaze Estimation Using Webcam-Based CNN Models for Human–Computer Interactions
by Visal Vidhya and Diego Resende Faria
Computers 2025, 14(2), 57; https://doi.org/10.3390/computers14020057 - 10 Feb 2025
Viewed by 3085
Abstract
Gaze tracking and estimation are essential for understanding human behavior and enhancing human–computer interactions. This study introduces an innovative, cost-effective solution for real-time gaze tracking using a standard webcam, providing a practical alternative to conventional methods that rely on expensive infrared (IR) cameras. Traditional approaches, such as Pupil Center Corneal Reflection (PCCR), require IR cameras to capture corneal reflections and iris glints, demanding high-resolution images and controlled environments. In contrast, the proposed method utilizes a convolutional neural network (CNN) trained on webcam-captured images to achieve precise gaze estimation. The developed deep learning model achieves a mean squared error (MSE) of 0.0112 and an accuracy of 90.98% through a novel trajectory-based accuracy evaluation system. This system involves an animation of a ball moving across the screen, with the user’s gaze following the ball’s motion. Accuracy is determined by calculating the proportion of gaze points falling within a predefined threshold based on the ball’s radius, ensuring a comprehensive evaluation of the system’s performance across all screen regions. Data collection is both simplified and effective, capturing images of the user’s right eye while they focus on the screen. Additionally, the system includes advanced gaze analysis tools, such as heat maps, gaze fixation tracking, and blink rate monitoring, which are all integrated into an intuitive user interface. The robustness of this approach is further enhanced by incorporating Google’s Mediapipe model for facial landmark detection, improving accuracy and reliability. The evaluation results demonstrate that the proposed method delivers high-accuracy gaze prediction without the need for expensive equipment, making it a practical and accessible solution for diverse applications in human–computer interactions and behavioral research. Full article
(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)
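
A minimal sketch of the trajectory-based accuracy measure the abstract describes: the proportion of predicted gaze points falling within a radius-based threshold of the moving ball's centre. The slack factor and the toy coordinates are assumptions.

```python
import numpy as np

def trajectory_accuracy(gaze_points, ball_centres, ball_radius, slack=1.5):
    """Proportion of predicted gaze points within slack * ball_radius of the
    ball's centre at the corresponding time step."""
    gaze = np.asarray(gaze_points, float)
    ball = np.asarray(ball_centres, float)
    dist = np.linalg.norm(gaze - ball, axis=1)
    return float(np.mean(dist <= slack * ball_radius))

gaze = [(100, 100), (210, 190), (305, 330)]
ball = [(105, 98), (200, 200), (400, 400)]
print(trajectory_accuracy(gaze, ball, ball_radius=20))   # ~0.67: the last point misses
```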

30 pages, 2719 KiB  
Article
Predicting Shot Accuracy in Badminton Using Quiet Eye Metrics and Neural Networks
by Samson Tan and Teik Toe Teoh
Appl. Sci. 2024, 14(21), 9906; https://doi.org/10.3390/app14219906 - 29 Oct 2024
Cited by 3 | Viewed by 2716
Abstract
This paper presents a novel approach to predicting shot accuracy in badminton by analyzing Quiet Eye (QE) metrics such as QE duration, fixation points, and gaze dynamics. We develop a neural network model that combines visual data from eye-tracking devices with biomechanical data such as body posture and shuttlecock trajectory. Our model is designed to predict shot accuracy, providing insights into the role of QE in performance. The study involved 30 badminton players of varying skill levels from the Chinese Swimming Club in Singapore. Using a combination of eye-tracking technology and motion capture systems, we collected data on QE metrics and biomechanical factors during a series of badminton shots, 750 in total. Key results include: (1) The neural network model achieved 85% accuracy in predicting shot outcomes, demonstrating the potential of integrating QE metrics with biomechanical data. (2) QE duration and onset were identified as the most significant predictors of shot accuracy, followed by racket speed and wrist angle at impact. (3) Elite players exhibited significantly longer QE durations (M = 289.5 ms) compared to intermediate (M = 213.7 ms) and novice players (M = 168.3 ms). (4) A strong positive correlation (r = 0.72) was found between QE duration and shot accuracy across all skill levels. These findings have important implications for badminton training and performance evaluation. The study suggests that QE-based training programs could significantly enhance players’ shot accuracy. Furthermore, the predictive model developed in this study offers a framework for real-time performance analysis and personalized training regimens in badminton. By bridging cognitive neuroscience and sports performance through advanced data analytics, this research paves the way for more sophisticated, individualized training approaches in badminton and potentially other fast-paced sports. Future research directions include exploring the temporal dynamics of QE during matches and developing real-time feedback systems based on QE metrics. Full article
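
For illustration only, the sketch below trains a small neural network on synthetic Quiet Eye and biomechanical features to predict a hit/miss label; the feature set, network size, and data are assumptions and do not reproduce the paper's model or its 85% result.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 750                                               # shots, as in the study
X = np.column_stack([
    rng.normal(220, 50, n),    # QE duration (ms)            -- assumed feature
    rng.normal(-80, 30, n),    # QE onset before impact (ms)  -- assumed feature
    rng.normal(35, 5, n),      # racket speed (m/s)           -- assumed feature
    rng.normal(60, 10, n),     # wrist angle at impact (deg)  -- assumed feature
])
y = (X[:, 0] + rng.normal(0, 40, n) > 220).astype(int)   # toy hit/miss label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
clf.fit(X_tr, y_tr)
print(f"toy accuracy: {clf.score(X_te, y_te):.2f}")
```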

34 pages, 30882 KiB  
Article
Intelligent Evaluation Method for Design Education and Comparison Research between Visualizing Heat-Maps of Class Activation and Eye-Movement
by Jiayi Jia, Tianjiao Zhao, Junyu Yang and Qian Wang
J. Eye Mov. Res. 2024, 17(2), 1-34; https://doi.org/10.16910/jemr.17.2.1 - 10 Oct 2024
Cited by 1 | Viewed by 315
Abstract
The evaluation of design results plays a crucial role in the development of design. This study presents a design work evaluation system for design education that assists design instructors in conducting objective evaluations. An automatic design evaluation model based on convolutional neural networks (CNNs) has been established, which enables intelligent evaluation of student design works. During the evaluation process, the class activation map (CAM) is obtained. Simultaneously, an eye-tracking experiment was designed to collect gaze data and generate eye-tracking heat maps. By comparing these heat maps with the CAM, we explored the correlation between the focus of the evaluator’s attention in human design evaluation and that of the CNN’s intelligent evaluation. The experimental results indicate that there is a certain correlation between humans and the CNN in terms of the key points they focus on when conducting an evaluation. However, there are significant differences in background observation. The research results demonstrate that the CNN-based intelligent evaluation model can automatically evaluate product design works and effectively classify and predict design product images. The comparison shows a correlation in evaluation strategy between artificial intelligence and the subjective evaluation of human eyes. Introducing artificial intelligence into the field of design evaluation for education has strong potential to promote the development of design education. Full article
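
A minimal sketch of the comparison step described above, assuming the class activation map and the eye-tracking heat map are already available as equally sized 2-D arrays; the grid size and the choice of Pearson correlation are illustrative assumptions.

```python
import numpy as np

def heatmap_correlation(cam, gaze_heatmap):
    """Pearson correlation between two equally sized, flattened heat maps."""
    a = np.asarray(cam, float).ravel()
    b = np.asarray(gaze_heatmap, float).ravel()
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
cam = rng.random((14, 14))                     # toy CAM over a 14 x 14 feature grid
gaze = 0.7 * cam + 0.3 * rng.random((14, 14))  # toy gaze map, partly overlapping
print(f"CAM vs gaze correlation: {heatmap_correlation(cam, gaze):.2f}")
```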

26 pages, 3818 KiB  
Article
Human–AI Co-Drawing: Studying Creative Efficacy and Eye Tracking in Observation and Cooperation
by Yuying Pei, Linlin Wang and Chengqi Xue
Appl. Sci. 2024, 14(18), 8203; https://doi.org/10.3390/app14188203 - 12 Sep 2024
Cited by 1 | Viewed by 3896
Abstract
Artificial intelligence (AI) tools are rapidly transforming the field of traditional artistic creation, influencing painting processes and human creativity. This study explores human–AI cooperation in real-time artistic drawing by using the AIGC tool KREA.AI. Participants wear eye trackers and perform drawing tasks by adjusting the AI parameters. The research aims to investigate the impact of cross-screen and non-cross-screen conditions, as well as different viewing strategies, on cognitive load and the degree of creative stimulation during user–AI collaborative drawing. Adopting a mixed design, it examines the influence of different cooperation modes and visual search methods on creative efficacy and visual perception through eye-tracking data and creativity performance scales. The cross-screen type and task type have a significant impact on total interval duration, number of fixation points, average fixation duration, and average pupil diameter in occlusion decision-making and occlusion hand drawing. There are significant differences in the variables of average gaze duration and average pupil diameter among different task types and cross-screen types. In non-cross-screen situations, occlusion and non-occlusion have a significant impact on average gaze duration and pupil diameter. Tasks in non-cross-screen environments are more sensitive to visual processing. The involvement of AI in hand drawing in non-cross-screen collaborative drawing by designers has a significant impact on their visual perception. These results help us to gain a deeper understanding of user behaviour and cognitive load under different visual tasks and cross-screen conditions. The analysis of the creative efficiency scale data reveals significant differences in designers’ ability to supplement and improve AI ideas across different modes. This indicates that the extent of AI participation in the designer’s hand-drawn creative process significantly impacts the designer’s behaviour when negotiating design ideas with the AI. Full article

25 pages, 10627 KiB  
Article
A Study on Differences in Educational Method to Periodic Inspection Work of Nuclear Power Plants
by Yuichi Yashiro, Gang Wang, Fumio Hatori and Nobuyoshi Yabuki
CivilEng 2024, 5(3), 760-784; https://doi.org/10.3390/civileng5030040 - 9 Sep 2024
Viewed by 1215
Abstract
Construction work and regular inspection work at nuclear power plants involve many special tasks, unlike general on-site work. In addition, the opportunity to transfer knowledge from skilled workers to unskilled workers is limited due to the inability to easily enter the plant and various security and radiation exposure issues. Therefore, in this study, we considered the application of virtual reality (VR) as a method to increase opportunities to learn anytime and anywhere and to transfer knowledge more effectively. In addition, as an interactive learning method to improve comprehension, we devised a system that uses hand tracking and eye tracking to allow participants to experience movements and postures that are closer to the real work in a virtual space. For hand-based work, three actions, “pinch”, “grab”, and “hold”, were reproduced depending on the sizes of the parts and tools, and visual confirmation work was reproduced by the movement of the gaze point of the eyes, faithfully reproducing the special actions of the inspection work. We confirmed that a hybrid learning process that appropriately combines the developed active learning method, using experiential VR, with conventional passive learning methods, using paper and video, can improve the comprehension and retention of special work at nuclear power plants. Full article
(This article belongs to the Collection Recent Advances and Development in Civil Engineering)

18 pages, 11425 KiB  
Article
SmartVR Pointer: Using Smartphones and Gaze Orientation for Selection and Navigation in Virtual Reality
by Brianna McDonald, Qingyu Zhang, Aiur Nanzatov, Lourdes Peña-Castillo and Oscar Meruvia-Pastor
Sensors 2024, 24(16), 5168; https://doi.org/10.3390/s24165168 - 10 Aug 2024
Cited by 2 | Viewed by 1757
Abstract
Some of the barriers preventing virtual reality (VR) from being widely adopted are the cost and unfamiliarity of VR systems. Here, we propose that in many cases, the specialized controllers shipped with most VR head-mounted displays can be replaced by a regular smartphone, cutting the cost of the system, and allowing users to interact in VR using a device they are already familiar with. To achieve this, we developed SmartVR Pointer, an approach that uses smartphones to replace the specialized controllers for two essential operations in VR: selection and navigation by teleporting. In SmartVR Pointer, a camera mounted on the head-mounted display (HMD) is tilted downwards so that it points to where the user will naturally be holding their phone in front of them. SmartVR Pointer supports three selection modalities: tracker based, gaze based, and combined/hybrid. In the tracker-based SmartVR Pointer selection, we use image-based tracking to track a QR code displayed on the phone screen and then map the phone’s position to a pointer shown within the field of view of the camera in the virtual environment. In the gaze-based selection modality, the user controls the pointer using their gaze and taps on the phone for selection. The combined technique is a hybrid between gaze-based interaction in VR and tracker-based Augmented Reality. It allows the user to control a VR pointer that looks and behaves like a mouse pointer by moving their smartphone to select objects within the virtual environment, and to interact with the selected objects using the smartphone’s touch screen. The touchscreen is used for selection and dragging. The SmartVR Pointer is simple and requires no calibration and no complex hardware assembly or disassembly. We demonstrate successful interactive applications of SmartVR Pointer in a VR environment with a demo where the user navigates in the virtual environment using teleportation points on the floor and then solves a Tetris-style key-and-lock challenge. Full article
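
As a hedged sketch of the tracker-based modality, the snippet below maps the centre of a detected QR marker (given as pixel corners from the HMD camera image) to a normalised pointer position; marker detection itself (e.g. with an OpenCV QR detector) is omitted, and the corner format and mapping are assumptions, not the paper's implementation.

```python
import numpy as np

def marker_to_pointer(qr_corners, cam_resolution=(1280, 720)):
    """Map the QR code's centre (from its pixel corners) to pointer coordinates
    in [0, 1]^2 of the camera's field of view."""
    corners = np.asarray(qr_corners, float)      # shape (4, 2), pixel coordinates
    centre = corners.mean(axis=0)
    w, h = cam_resolution
    return centre[0] / w, 1.0 - centre[1] / h    # flip y: image rows grow downward

corners = [(600, 400), (700, 400), (700, 500), (600, 500)]
print(marker_to_pointer(corners))   # pointer near the middle of the view
```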

18 pages, 4494 KiB  
Article
Balancing Accuracy and Speed in Gaze-Touch Grid Menu Selection in AR via Mapping Sub-Menus to a Hand-Held Device
by Yang Tian, Yulin Zheng, Shengdong Zhao, Xiaojuan Ma and Yunhai Wang
Sensors 2023, 23(23), 9587; https://doi.org/10.3390/s23239587 - 3 Dec 2023
Cited by 1 | Viewed by 1537
Abstract
Eye gaze can be a potentially fast and ergonomic method for target selection in augmented reality (AR). However, the eye-tracking accuracy of current consumer-level AR systems is limited. While state-of-the-art AR target selection techniques based on eye gaze and touch (gaze-touch), which follow the “eye gaze pre-selects, touch refines and confirms” mechanism, can significantly enhance selection accuracy, their selection speeds are usually compromised. To balance accuracy and speed in gaze-touch grid menu selection in AR, we propose the Hand-Held Sub-Menu (HHSM) technique. HHSM divides a grid menu into several sub-menus and maps the sub-menu pointed to by eye gaze onto the touchscreen of a hand-held device. To select a target item, the user first selects the sub-menu containing it via eye gaze and then confirms the selection on the touchscreen via a single touch action. We derived the HHSM technique’s design space and investigated it through a series of empirical studies. Through an empirical study involving 24 participants recruited from a local university, we found that HHSM can effectively balance accuracy and speed in gaze-touch grid menu selection in AR. The error rate was approximately 2%, and the completion time per selection was around 0.93 s when participants used two thumbs to interact with the touchscreen, and approximately 1.1 s when they used only one finger. Full article
(This article belongs to the Section Intelligent Sensors)
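
A hedged sketch of the two-step HHSM-style selection described above: gaze chooses a sub-menu, and a touch position on the phone is mapped to an item inside it. The grid dimensions and touchscreen resolution are illustrative assumptions.

```python
def select_item(gaze_submenu, touch_xy, submenu_shape=(3, 3), phone_px=(1080, 1920)):
    """Return the gaze-selected sub-menu index and the (row, col) of the item
    chosen by the touch position within that sub-menu."""
    rows, cols = submenu_shape
    x, y = touch_xy
    col = min(int(x / phone_px[0] * cols), cols - 1)
    row = min(int(y / phone_px[1] * rows), rows - 1)
    return gaze_submenu, (row, col)

# Gaze rests on sub-menu 2; a tap in the lower-right of the touchscreen picks item (2, 2).
print(select_item(gaze_submenu=2, touch_xy=(1000, 1800)))
```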

12 pages, 1948 KiB  
Article
Kids Save Lives by Learning through a Serious Game
by Miriam Mendoza López, Petronila Mireia Alcaraz Artero, Carlos Truque Díaz, Manuel Pardo Ríos, Juan José Hernández Morante and Rafael Melendreras Ruiz
Multimodal Technol. Interact. 2023, 7(12), 112; https://doi.org/10.3390/mti7120112 - 1 Dec 2023
Cited by 1 | Viewed by 2359
Abstract
This study focuses on the development and assessment of a serious game for health (SGH) aimed at educating children about cardiopulmonary resuscitation (CPR). A video game was created using the Berkeley Snap platform, which uses block programming. Eye-tracking technology was utilized to validate the graphic design. To assess the tool’s effectiveness, a pre-post analytical study was conducted with primary education children to measure the knowledge acquired. The study involved 52 participants with a mean age of 9 years. The results from a custom questionnaire used to measure their theoretical CPR knowledge showed significant improvements in CPR knowledge after the use of the videogame, and their emotional responses improved as well. The assessment of the knowledge acquired through the video game obtained an average score of 5.25 out of 6. Ten video segments consisting of 500 frames each (20 s of video per segment) were analyzed. Within these segments, specific areas that captured the most relevant interaction elements were selected to measure the child’s attention during game play. The average number of gaze fixations, indicating the points at which the child’s attention was placed within the area of interest, was 361.5 out of 500. In conclusion, the use of SGH may be an effective method for educating kids about CPR, providing them with fundamental knowledge relevant to their age group. Full article
(This article belongs to the Special Issue Designing EdTech and Virtual Learning Environments)
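
As an illustrative sketch of the attention measure reported above, the snippet counts how many gaze samples in a video segment fall inside a rectangular area of interest; the AOI coordinates and toy samples are assumptions.

```python
def fixations_in_aoi(gaze_samples, aoi):
    """Count gaze samples (x, y) inside the AOI rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = aoi
    return sum(1 for x, y in gaze_samples if x0 <= x <= x1 and y0 <= y <= y1)

segment = [(640 + i % 40, 360 + i % 30) for i in range(500)]   # toy 500-frame segment
print(fixations_in_aoi(segment, aoi=(600, 330, 700, 420)))     # 500: all samples inside
```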
