Multimodal Technol. Interact., Volume 9, Issue 3 (March 2025) – 11 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and open it with the free Adobe Reader.
28 pages, 7066 KiB  
Systematic Review
A Systematic Review on Artificial Intelligence-Based Multimodal Dialogue Systems Capable of Emotion Recognition
by Luis Bravo, Ciro Rodriguez, Pedro Hidalgo and Cesar Angulo
Multimodal Technol. Interact. 2025, 9(3), 28; https://doi.org/10.3390/mti9030028 - 14 Mar 2025
Abstract
The use of artificial intelligence in multimodal human–computer dialogue systems with emotion recognition continues to grow rapidly. Consequently, it is challenging for researchers to identify gaps, propose new models, and increase user satisfaction. The objective of this study is to explore and analyze potential artificial intelligence-based applications for multimodal dialogue systems incorporating emotion recognition. Papers were selected in accordance with the PRISMA methodology, yielding 13 scientific articles whose research proposals generally focus on convolutional neural networks (CNNs), Long Short-Term Memory (LSTM) networks, gated recurrent units (GRUs), and BERT. The models proposed in these articles are Mindlink-Eumpy, RHPRnet, Emo Fu-Sense, 3FACRNNN, H-MMER, TMID, DKMD, and MatCR. The datasets used are DEAP, MAHNOB-HCI, SEED-IV, SEED-V, AMIGOS, and DREAMER. In addition, the metrics achieved by the models are presented. It is concluded that emotion recognition models such as Emo Fu-Sense, 3FACRNNN, and H-MMER obtain outstanding results, with accuracies ranging from 92.62% to 98.19%, and that multimodal dialogue models such as TMID and the scene-aware model obtain BLEU-4 scores of 51.59% and 29%, respectively.

24 pages, 4499 KiB  
Article
Human Attitudes in Robotic Path Programming: A Pilot Study of User Experience in Manual and XR-Controlled Robotic Arm Manipulation
by Oscar Escallada, Nagore Osa, Ganix Lasa, Maitane Mazmela, Fatih Doğangün, Yigit Yildirim, Serdar Bahar and Emre Ugur
Multimodal Technol. Interact. 2025, 9(3), 27; https://doi.org/10.3390/mti9030027 - 10 Mar 2025
Abstract
Extended reality (XR) and collaborative robots are reshaping human–robot interaction (HRI) by introducing novel control methods that enhance user experience (UX). However, human factors such as cognitive workload, usability, trust, and task performance are often underexplored. This study evaluated UX during robotic manipulation tasks under three interaction modalities: manual control, XR-based control at real-time speed (RS), and XR-based control at reduced safety speed (SS). Twenty-one participants performed a series of tasks across three scenarios. We measured usability, workload, flow state, trust, and agency using a subjective questionnaire adapted from the SUS, NASA-TLX, FSS, SoAS, and the Trust in Industrial Human–Robot Collaboration Questionnaire, together with objective task metrics (completion time, errors, and attempts). Our results reveal that the XR-based control modes significantly reduced physical workload and improved usability compared to manual control. RS control enhanced task efficiency but increased error rates during complex tasks, while SS mode mitigated errors at the cost of prolonged completion times. Trust and agency remained stable across all modalities, indicating that extended reality technologies do not undermine user confidence. These findings contribute to the field of human–robot collaboration by offering insights regarding efficiency, accuracy, and UX, and are particularly relevant for industries seeking to optimize safety, productivity, and human-centric robotic systems.

25 pages, 2317 KiB  
Article
diaLogic: A Multi-Modal Framework for Automated Team Behavior Modeling Based on Speech Acquisition
by Ryan Duke and Alex Doboli
Multimodal Technol. Interact. 2025, 9(3), 26; https://doi.org/10.3390/mti9030026 - 10 Mar 2025
Abstract
This paper presents diaLogic, a human-in-the-loop system for modeling the behavior of teams during collective problem solving. Team behavior is modeled using multi-modal data about cognition, social interactions, and emotions acquired from speech inputs. The system includes methods for speaker diarization, speaker interaction characterization, speaker emotion recognition, and speech-to-text conversion. Hypotheses about the invariant and differentiated aspects of teams are extracted from the similarities and dissimilarities of their behavior over time. Hypothesis extraction, a novel contribution of this work, uses a method that identifies the clauses and concepts in each spoken sentence. Experiments present system performance for a broad set of cases of team behavior during problem solving; the average errors of the various methods are between 6% and 21%. The system can be used in a broad range of applications, from education to team research and therapy.
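The abstract does not specify how clauses and concepts are extracted from transcribed sentences, but the idea can be illustrated with an off-the-shelf dependency parse: treat each clausal head in the parse as anchoring a clause and each noun phrase as a candidate concept. A minimal sketch under those assumptions (spaCy and the heuristics below are illustrative, not the authors' method):

```python
# Hypothetical sketch of clause/concept extraction from one transcribed
# sentence. spaCy and these heuristics are assumptions for illustration.
import spacy

nlp = spacy.load("en_core_web_sm")

# Dependency labels that typically anchor a clause in spaCy's English models.
CLAUSE_DEPS = {"ROOT", "conj", "advcl", "ccomp", "xcomp", "relcl"}

def extract_clauses_and_concepts(sentence: str):
    doc = nlp(sentence)
    # Each clausal head's subtree approximates one (possibly nested) clause.
    heads = [t for t in doc if t.dep_ in CLAUSE_DEPS]
    clauses = [" ".join(w.text for w in head.subtree) for head in heads]
    # Approximate "concepts" as content-bearing noun phrases.
    concepts = [chunk.text for chunk in doc.noun_chunks]
    return clauses, concepts

clauses, concepts = extract_clauses_and_concepts(
    "We should reuse the sensor data because collecting new samples is slow."
)
print(clauses)   # one span per clausal head, outermost first
print(concepts)  # e.g. ['We', 'the sensor data', 'new samples']
```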

25 pages, 2851 KiB  
Article
Self-Created Film as a Resource in a Multimodal Conversational Narrative
by Mascha Legel, Stijn R. J. M. Deckers, Gloria Soto, Nicola Grove, Annalu Waller, Hans van Balkom, Ronald Spanjers, Christopher S. Norrie and Bert Steenbergen
Multimodal Technol. Interact. 2025, 9(3), 25; https://doi.org/10.3390/mti9030025 - 10 Mar 2025
Abstract
When access to natural speech is limited or challenging, as is the case for people with complex communication needs, self-created digital film can be a practical resource within a multimodal conversation about a personal experience. The detailed and contextual information that such audiovisual media offer with today's technology may support other communication modes, such as (computerized) spoken, written or signed language, in fostering mutual understanding and story growth. To promote the use of self-created film, here named a personal-video-scene (PVS), in the practice of augmentative and alternative communication (AAC), a greater understanding is required of how such media can operate as a resource within social interactions, such as daily conversations. This study therefore introduces a multimodal coding model developed to study the employment of a PVS within a film-elicited conversational narrative, relating to four aspects of conversational control: (a) topic development, (b) conversational structure, (c) conversational repair and (d) conversational maintenance. A case study illustrates how the use of a PVS in story-sharing was instrumental in establishing common ground between narrators, boosting the frequency of comments and questions, mitigating instances of conversational repair and expanding topic development.

20 pages, 1416 KiB  
Article
Effects of Flight Experience or Simulator Exposure on Simulator Sickness in Virtual Reality Flight Simulation
by Alexander Somerville, Keith Joiner and Graham Wild
Multimodal Technol. Interact. 2025, 9(3), 24; https://doi.org/10.3390/mti9030024 - 6 Mar 2025
Abstract
The use of virtual reality (VR) for flight simulation, particularly in the earliest stages of pilot training, is gaining attention in both research and industry. Using the technology for this ab initio training requires suitable consideration of the risks of simulator sickness, risks that are heightened relative to conventional simulators. If simulator sickness results in the development of compensatory skills, or otherwise disrupts the training process, the benefits of the technology may be negated. Enabling the effective integration of VR within flight training requires that, to the extent that simulator sickness is an issue, practical mechanisms are developed to manage it without disrupting existing training structures. The primary objective of this research is thus to evaluate an intervention and a nuisance factor in relation to the reduction of simulator sickness, considering their practicality within existing flight training syllabi. The Total Severity (TS) score of the Simulator Sickness Questionnaire (SSQ) was evaluated within a quasi-experimental, non-equivalent pre-test–post-test design incorporating three groups: a prior flight experience nuisance factor group, a prior personal computer aviation training device (PCATD) exposure intervention group, and a control group with neither prior experience nor prior simulator exposure. The results indicated that TS was significantly reduced for the prior flight experience nuisance factor (r_rb = 0.375), but that the PCATD exposure intervention produced no such reduction (r_rb = 0.016). The findings suggest that VR flight simulation is likely best used as a supplemental tool, introduced after initial airborne experience. Notwithstanding this finding, the relatively low median TS scores (<20) for all groups suggest that the technology may still be used, with caution, earlier in the training process. No other published research has examined this effect in the context of VR flight simulation.
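For context on the outcome measure: the SSQ Total Severity score follows the standard Kennedy et al. (1993) scoring, in which 16 symptom items rated 0–3 are summed into nausea, oculomotor, and disorientation clusters and then weighted. A minimal scoring sketch:

```python
# Standard SSQ scoring (Kennedy et al., 1993). Raw cluster sums come from
# 16 symptom items rated 0-3; some items load on more than one cluster.
WEIGHTS = {"nausea": 9.54, "oculomotor": 7.58, "disorientation": 13.92}
TS_WEIGHT = 3.74

def ssq_scores(n_raw: int, o_raw: int, d_raw: int) -> dict:
    """Return weighted subscale scores and Total Severity (TS)."""
    return {
        "nausea": n_raw * WEIGHTS["nausea"],
        "oculomotor": o_raw * WEIGHTS["oculomotor"],
        "disorientation": d_raw * WEIGHTS["disorientation"],
        "total_severity": (n_raw + o_raw + d_raw) * TS_WEIGHT,
    }

# Example: raw cluster sums of 2, 3, and 1 give TS = 6 * 3.74 = 22.44,
# just above the median TS < 20 reported for the study's groups.
print(ssq_scores(2, 3, 1)["total_severity"])  # 22.44
```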

15 pages, 910 KiB  
Brief Report
Real-Time Norwegian Sign Language Recognition Using MediaPipe and LSTM
by Md. Zia Uddin, Costas Boletsis and Pål Rudshavn
Multimodal Technol. Interact. 2025, 9(3), 23; https://doi.org/10.3390/mti9030023 - 3 Mar 2025
Abstract
The application of machine learning models to sign language recognition (SLR) is a well-researched topic. However, many existing SLR systems focus on widely used sign languages, e.g., American Sign Language, leaving underrepresented sign languages such as Norwegian Sign Language (NSL) relatively underexplored. This work presents a preliminary system for recognizing NSL gestures, focusing on the numbers 0 to 10. MediaPipe is used for feature extraction, and Long Short-Term Memory (LSTM) networks are used for temporal modeling. The system achieves a testing accuracy of 95%, aligning with existing benchmarks and demonstrating its robustness to variations in signing styles, orientations, and speeds. While challenges such as data imbalance and misclassification of similar gestures (e.g., signs 3 and 8) were observed, the results underscore the potential of the proposed approach. Future iterations of the system will prioritize expanding the dataset with additional gestures and environmental variations, as well as integrating additional modalities.
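The pipeline described, per-frame MediaPipe hand landmarks stacked into sequences and classified by an LSTM, can be sketched as follows; the sequence length, layer sizes, and preprocessing are illustrative assumptions rather than the paper's configuration:

```python
# Rough sketch of the described pipeline: MediaPipe hand landmarks per
# frame, stacked into a fixed-length sequence, classified by an LSTM.
# SEQ_LEN, layer sizes, and preprocessing are illustrative assumptions.
import numpy as np
import mediapipe as mp
import tensorflow as tf

hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)

def frame_features(rgb_frame: np.ndarray) -> np.ndarray:
    """Flatten 21 hand landmarks (x, y, z) into a 63-dim vector."""
    result = hands.process(rgb_frame)
    if not result.multi_hand_landmarks:
        return np.zeros(63, dtype=np.float32)  # no hand detected this frame
    lm = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32).ravel()

SEQ_LEN = 30  # assumed number of frames per gesture clip

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, 63)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(11, activation="softmax"),  # NSL numbers 0-10
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```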

23 pages, 2067 KiB  
Article
Choice Vectors: Streamlining Personal AI Alignment Through Binary Selection
by Eleanor Watson, Minh Nguyen, Sarah Pan and Shujun Zhang
Multimodal Technol. Interact. 2025, 9(3), 22; https://doi.org/10.3390/mti9030022 - 3 Mar 2025
Abstract
Value alignment for AI is not "one-size-fits-all": even polite and friendly models can still fail to represent individual user contexts, preferences, and local cultural norms. This paper presents a modular workflow for personal fine-tuning, synthesizing four core components from our previous research: (1) robust vectorization of user values and preferences, (2) a binary choice user interface (UI) approach to capturing those preferences with minimal cognitive load, (3) contrastive activation methods for steering large language models (LLMs) via difference vectors, and (4) knowledge graph integration for more auditable and structured alignment. Our approach, which builds on our earlier work "Towards an End-to-End Personal Fine-Tuning Framework", demonstrates how these elements can be combined to create personalized, context-rich alignment solutions. We report on user studies of the forced-choice UI, describe an experimental pipeline for deriving "control vectors", and propose a "moral graph" method for bridging symbolic and vector-based alignment. Our findings suggest that multi-pronged personalization can significantly reduce user annotation fatigue, improve alignment fidelity, and allow for more flexible, interpretable AI behaviors.
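The "control vectors" mentioned here echo the published contrastive activation steering recipe: average hidden activations over contrastive prompt pairs, take the difference, and add a scaled copy back during generation. A generic sketch of that recipe (the model, layer index, scale, and prompts are illustrative assumptions, not the paper's pipeline):

```python
# Generic sketch of contrastive activation steering ("control vectors").
# Model name, layer index, scale, and prompts are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_hidden_states=True)
LAYER, SCALE = 6, 4.0

def mean_hidden(prompt: str) -> torch.Tensor:
    """Mean hidden state at the output of block LAYER for one prompt."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids).hidden_states[LAYER + 1]  # (1, seq, dim)
    return hs.mean(dim=1).squeeze(0)

# Difference vector between a preferred and a dispreferred phrasing.
steer = mean_hidden("Please answer formally.") - mean_hidden("yo answer casual")

def add_steer(module, inputs, output):
    # GPT-2 blocks return a tuple; steer only the hidden states.
    return (output[0] + SCALE * steer,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steer)
out = model.generate(**tok("The meeting is", return_tensors="pt"),
                     max_new_tokens=20)
handle.remove()
print(tok.decode(out[0]))
```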

17 pages, 3021 KiB  
Article
Perceptions of Parents and Children About Videogame Use
by Michela Franzò, Gaia Maria Olivieri, Anna Salerni and Marco Iosa
Multimodal Technol. Interact. 2025, 9(3), 21; https://doi.org/10.3390/mti9030021 - 28 Feb 2025
Abstract
This study aims to investigate the gap between parents' and children's perceptions of videogame use in childhood. Methods: A survey was conducted with 75 pairs, each formed by a son or daughter and one parent. The data collected contradict the prejudice that playing videogames reduces study time and leads to lower grades at school (R < 0.13). Our results support the idea that playing together fosters bonding and facilitates conversation. The impact of videogames on mood showed the most substantial differences in perception, with parents mainly reporting negative mood changes, while children reported similar frequencies of negative, neutral, and positive ones. In relation to the educational and informative potential of videogames, children had slightly more positive opinions than their parents (p < 0.001). Finally, more than half of the participants potentially agreed with the possibility of using videogames as academic tools. In conclusion, there is a gap between parents' and children's perceptions of videogaming, especially concerning its effects on children's mood. Playing together and developing deeper knowledge about videogames could enhance positive effects on children's development as well as their relationships with peers, with parents, and at school.

19 pages, 1582 KiB  
Article
Designing Digital Escape Rooms with Generative AI in University Contexts: A Qualitative Study
by Paula Rodríguez-Rivera, José M. Rodríguez-Ferrer and Ana Manzano-León
Multimodal Technol. Interact. 2025, 9(3), 20; https://doi.org/10.3390/mti9030020 - 27 Feb 2025
Abstract
The rapid evolution of technology in education highlights the need for methodologies that enhance student engagement and skill development. This study examines students' perceptions of designing educational escape rooms using ICT tools and generative AI (GenAI) as a learning methodology. A total of 47 students participated in creating digital escape rooms with GenAI, Genially, and HeroForge in the course "Mediation in Conflicts and Situations of Violence" within a Social Education degree. A qualitative approach was used, analyzing focus group discussions conducted after the activity. Results indicate that students valued the experience, emphasizing its impact on digital competence, creativity, and problem-solving skills. Collaborative learning helped overcome initial technical challenges, and students recognized the practical applicability of escape room design in mediation contexts. However, they identified areas for improvement, such as the need for more initial training, extended development time, and better access to digital tools. This study contributes to game-based learning and AI-enhanced education research, positioning students as active designers rather than passive users. Future research should explore the long-term impact on knowledge retention and transferable skills in professional settings.

14 pages, 981 KiB  
Article
Sensory Perception During Partial Pseudo-Haptics Applied to Adjacent Fingers
by Satoshi Saga and Kotaro Sakae
Multimodal Technol. Interact. 2025, 9(3), 19; https://doi.org/10.3390/mti9030019 - 26 Feb 2025
Abstract
Pseudo-haptics, the phenomenon of creating a simulated tactile sensation by introducing a discrepancy between a voluntary movement and its visual feedback, is well known. Typically, when inducing pseudo-haptics, the same control-display ratio (C/D ratio) is applied to all effectors. With the aim of expanding the range of illusions that pseudo-haptics can present, we instead examined how perceived sensations change when partial pseudo-haptics are applied to adjacent body parts. Specifically, we investigated the correlation between finger states and the magnitude of the illusory perception during both quasi-static and dynamic movements, and identified which finger experienced discomfort during dynamic movements with pseudo-haptics. Our findings revealed the following: first, the magnitude of the illusion varied with the contact state of adjacent fingers; second, the illusion was more pronounced during dynamic movements than during quasi-static movements; and third, regardless of which finger received the pseudo-haptic stimulus, the discomfort was primarily experienced in the finger exhibiting an overall inhibitory movement. These findings contribute to the practical application of pseudo-haptics as a virtual haptic display technology.
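The C/D ratio manipulation underlying this setup is straightforward to state: each displayed effector moves by the tracked displacement scaled by its own ratio, so a ratio below 1 makes a virtual finger visibly lag its real counterpart. A minimal sketch with per-finger ratios (the ratio values are illustrative):

```python
# Minimal sketch of partial pseudo-haptics via per-finger control-display
# (C/D) ratios: each displayed fingertip advances by the tracked
# displacement scaled by its own ratio. Ratio values are illustrative only.
import numpy as np

CD_RATIOS = {"index": 1.0, "middle": 0.6}  # middle finger visually lags

def update_displayed(prev_displayed: dict, tracked_delta: dict) -> dict:
    """Advance each displayed fingertip by ratio * tracked displacement."""
    return {
        finger: prev_displayed[finger] + CD_RATIOS[finger] * delta
        for finger, delta in tracked_delta.items()
    }

displayed = {"index": np.zeros(3), "middle": np.zeros(3)}
# Both real fingertips move 1 cm along x; the displayed middle finger
# covers only 0.6 cm, producing the visuo-motor discrepancy.
displayed = update_displayed(displayed,
                             {"index": np.array([0.01, 0.0, 0.0]),
                              "middle": np.array([0.01, 0.0, 0.0])})
print(displayed["middle"])  # [0.006 0.    0.   ]
```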

22 pages, 616 KiB  
Article
Influence of Personality Traits and Demographics on Rapport Recognition Using Adversarial Learning
by Wenqing Wei, Sixia Li, Candy Olivia Mawalim, Xiguang Li, Kazunori Komatani and Shogo Okada
Multimodal Technol. Interact. 2025, 9(3), 18; https://doi.org/10.3390/mti9030018 - 20 Feb 2025
Abstract
The automatic recognition of user rapport at the dialogue level is a critical component of effective dialogue management in multimodal dialogue systems (MDSs). Both the dialogue systems and their evaluations need to be based on user expressions. Numerous studies have demonstrated that user personalities and demographic data such as age and gender significantly affect user expression; neglecting them therefore makes user expression modeling and rapport recognition less accurate. To the best of our knowledge, no existing studies have considered the effects of users' personalities and demographic data on the automatic recognition of user rapport in MDSs. To analyze this influence on dialogue-level user rapport recognition, we first used the Hazumi dataset, an online dataset containing users' personal information (personality, age, and gender). Based on this dataset, we analyzed the relationship between user rapport in dialogue systems and users' traits, finding that gender and age significantly influence the recognition of user rapport and could potentially introduce biases into the model. To mitigate the impact of users' traits, we introduced an adversarial-based model. Experimental results showed a significant improvement in user rapport recognition compared to models that do not account for users' traits. To validate our multimodal modeling approach, we compared it to human perception and to instruction-based Large Language Models (LLMs). The results showed that our model outperforms both.
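The abstract does not detail the adversarial architecture; a common realization of "mitigating the impact of users' traits" is a gradient-reversal adversary that predicts the sensitive trait from the shared features, pushing the encoder toward trait-invariant representations. A generic sketch under that assumption (dimensions and heads are illustrative):

```python
# Generic gradient-reversal sketch for trait-invariant rapport features:
# an adversary predicts a user trait (e.g., gender) from the shared
# encoding, and the reversed gradient discourages the encoder from
# retaining that trait. Dimensions and heads are illustrative assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)  # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # reversed, scaled gradient

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # multimodal features in
rapport_head = nn.Linear(64, 3)                         # e.g., low/mid/high rapport
trait_head = nn.Linear(64, 2)                           # adversary: e.g., gender

def losses(x, rapport_y, trait_y, lam=1.0):
    z = encoder(x)
    task_loss = nn.functional.cross_entropy(rapport_head(z), rapport_y)
    adv_loss = nn.functional.cross_entropy(
        trait_head(GradReverse.apply(z, lam)), trait_y)
    # Minimizing the sum trains the adversary while the reversed gradient
    # pushes the encoder toward trait-invariant features.
    return task_loss + adv_loss
```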
