Multimodal Technol. Interact., Volume 9, Issue 5 (May 2025) – 12 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click the "PDF Full-text" link and open it with the free Adobe Reader.
11 pages, 12024 KiB  
Article
Computer Vision-Based Obstacle Detection Mobile System for Visually Impaired Individuals
by Gisel Katerine Bastidas-Guacho, Mario Alejandro Paguay Alvarado, Patricio Xavier Moreno-Vallejo, Patricio Rene Moreno-Costales, Nayely Samanta Ocaña Yanza and Jhon Carlos Troya Cuestas
Multimodal Technol. Interact. 2025, 9(5), 48; https://doi.org/10.3390/mti9050048 - 18 May 2025
Abstract
Traditional tools, such as canes, are no longer enough to support the mobility and orientation of visually impaired people in complex environments. Therefore, technological solutions based on computer vision tasks are promising alternatives for detecting obstacles. Object detection models are easy to couple to mobile systems, do not require large resource consumption on mobile phones, and act in real time to alert users to the presence of obstacles. However, existing object detectors were mostly trained with images from platforms such as Kaggle, and the number of covered object classes is still limited. For this reason, this study implements a mobile system that integrates an object detection model to identify obstacles for visually impaired people. Additionally, the mobile application integrates multimodal feedback through auditory and haptic interaction, ensuring that users receive real-time obstacle alerts via voice guidance and vibrations, further enhancing accessibility and responsiveness in different navigation contexts. The chosen scenario for developing the obstacle detection application is the Specialized Educational Unit Dr. Luis Benavides for impaired people, which provided the images for building the model's dataset and for evaluating it with impaired individuals. To determine the best model, the performance of YOLO was evaluated by varying the number of training epochs, using a proprietary dataset of 7600 diverse images. The YOLO-300 model performed best, with a mAP of 0.42. Full article
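The detection-to-alert logic such a system might use can be sketched as follows. This is a minimal illustration, not the authors' implementation: the detections, thresholds, and labels below are invented for the example, and a real app would obtain detections from the on-device YOLO model.

```python
# Illustrative sketch: mapping object detections to multimodal (voice/vibration)
# alerts. Detections are mocked as dicts; thresholds are invented for the example.

def alert_for_detection(det, frame_area):
    """Choose an alert channel from the obstacle's apparent size in the frame."""
    x1, y1, x2, y2 = det["bbox"]
    coverage = ((x2 - x1) * (y2 - y1)) / frame_area
    if coverage > 0.25:          # obstacle fills much of the frame: very close
        return ("voice+vibration", f"{det['label']} ahead, stop")
    if coverage > 0.05:          # moderate apparent size: approaching
        return ("vibration", f"{det['label']} approaching")
    return (None, None)          # too small/far to warrant an alert

detections = [
    {"label": "chair", "bbox": (100, 100, 500, 500), "score": 0.9},
    {"label": "door",  "bbox": (10, 10, 60, 60),     "score": 0.6},
]
for det in detections:
    channel, message = alert_for_detection(det, frame_area=640 * 480)
    if channel:
        print(channel, "->", message)
```

A production system would debounce repeated alerts and prioritize the largest or most central detection per frame.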

27 pages, 1259 KiB  
Article
The Influence of the Labeling Effect on the Perception of Command Execution Delay in Gaming
by Duy H. Nguyen and Peter A. Kara
Multimodal Technol. Interact. 2025, 9(5), 47; https://doi.org/10.3390/mti9050047 - 15 May 2025
Viewed by 189
Abstract
Gaming is one of the largest industries of digital entertainment. Modern gaming software may be susceptible to command execution delay, which may be caused by various factors, such as insufficient rendering capabilities or limited network resources. At the time of writing, advances in gaming are often accompanied by brief descriptions when communicated to users. While such descriptions may be compressed into a couple of words, even a single word may impact user experience. Due to the cognitive bias induced by the labeling effect, the impact of such a word may actually be more significant than what the user genuinely perceives. In this paper, we investigate the influence of the labeling effect on the perception of command execution delay in gaming. We carried out a series of subjective tests to measure how the word “optimized” affects gaming experience. The test variables of our experiment were the added delay between command and execution, the speed of the game, and the label assigned to gaming sequences. The test participants were tasked with directly comparing gaming sequences with different labels assigned: “optimized” and “not optimized”. In every comparison, both sequences had the same objective characteristics; only the label differed. The experiment was conducted on single-input and continuous-input computer games that we developed for this research. The obtained results indicate that for both of these input types, the labeling effect has a statistically significant impact on perceived delay. Overall, more than 70% of the subjective ratings were affected by the assigned labels. Moreover, there is a strong correlation between the amount of delay and the effect of cognitive bias. The speed of the game also affected the obtained results, yet statistically significant differences were only measured between the slowest and the fastest gameplay. Full article

15 pages, 3211 KiB  
Article
Optimizing HUD-EVS Readability: Effects of Hue, Saturation and Lightness on Information Recognition
by Xuyi Qiu
Multimodal Technol. Interact. 2025, 9(5), 46; https://doi.org/10.3390/mti9050046 - 14 May 2025
Viewed by 135
Abstract
The Enhanced Vision System (EVS) offers a display advantage that conventional devices lack, enabling interface information to be overlaid on real-world imagery. However, information overload, especially in complex environments, can reduce the recognizability of important information and impair decision-making. This study investigates a dual color-coding strategy to optimize the recognizability of Primary Information (PI) and Secondary Information (SI) in Head-Up Display–Enhanced Vision Systems (HUD-EVS) against complex backgrounds. The results show that adjusting the hue, saturation, and lightness of SI affects the recognizability of both PI and SI. Specifically, certain saturation (20% or 80%) and lightness (60%) combinations should be avoided to ensure PI prominence and maintain sufficient recognizability for SI. These findings provide insights for designing color-coding strategies for EVS, enhancing the recognizability of information on mobile devices. Full article

34 pages, 10532 KiB  
Article
Personalized and Timely Feedback in Online Education: Enhancing Learning with Deep Learning and Large Language Models
by Óscar Cuéllar, Manuel Contero and Mauricio Hincapié
Multimodal Technol. Interact. 2025, 9(5), 45; https://doi.org/10.3390/mti9050045 - 14 May 2025
Viewed by 187
Abstract
This study investigates an Adaptive Feedback System (AFS) that integrates deep learning (a recurrent neural network trained with historical student data) and GPT-4 to provide personalized feedback in a Digital Art course. In a quasi-experimental design, the intervention group (n = 42) received weekly feedback generated from model predictions, while the control group (n = 39) followed the same program without this intervention across four learning blocks or levels. The results revealed (1) a cumulative effect with a significant performance difference in the fourth learning block (+12.63 percentage points); (2) a reduction in performance disparities between students with varying levels of prior knowledge in the experimental group (−56.5%) versus an increase in the control group (+103.3%); (3) an “overcoming effect” where up to 42.9% of students surpassed negative performance predictions; and (4) a positive impact on active participation, especially in live class attendance (+30.21 points) and forum activity (+9.79 points). These findings demonstrate that integrating deep learning with LLMs can significantly improve learning outcomes in online educational environments, particularly for students with limited prior knowledge. Full article

20 pages, 1108 KiB  
Article
LLMs in Education: Evaluating GPT and BERT Models in Student Comment Classification
by Anabel Pilicita and Enrique Barra
Multimodal Technol. Interact. 2025, 9(5), 44; https://doi.org/10.3390/mti9050044 - 12 May 2025
Viewed by 288
Abstract
The incorporation of artificial intelligence in educational contexts has significantly transformed the support provided to students facing learning difficulties, facilitating both the management of their educational process and their emotions. Additionally, online comments play a vital role in understanding student feelings. Analyzing comments on social media platforms can help identify students in vulnerable situations so that timely interventions can be implemented. However, manually analyzing student-generated content on social media platforms is challenging due to the large amount of data and the frequency with which it is posted. In this context, the recent revolution in artificial intelligence, marked by the implementation of powerful large language models (LLMs), may contribute to the classification of student comments. This study compared the effectiveness of a supervised learning approach using five different LLMs: bert-base-uncased, roberta-base, gpt-4o-mini-2024-07-18, gpt-3.5-turbo-0125, and gpt-neo-125m. The evaluation was carried out after fine-tuning them specifically to classify student comments on social media platforms with anxiety/depression or neutral labels. The results obtained were as follows: gpt-4o-mini-2024-07-18 and gpt-3.5-turbo-0125 obtained 98.93%, roberta-base 98.14%, bert-base-uncased 97.13%, and gpt-neo-125m 96.43%. Overall, all five LLMs performed well in this classification task. Full article

22 pages, 5933 KiB  
Article
Education 4.0 for Industry 4.0: A Mixed Reality Framework for Workforce Readiness in Manufacturing
by Andrea Bondin and Joseph Paul Zammit
Multimodal Technol. Interact. 2025, 9(5), 43; https://doi.org/10.3390/mti9050043 - 9 May 2025
Viewed by 175
Abstract
The rapid emergence of Industry 4.0 technologies has transformed manufacturing, requiring a workforce skilled in automation, data-driven decision-making, and process optimisation. While traditional education includes structured formats such as lectures and tutorials, it may not always equip graduates with the hands-on expertise demanded by modern industrial challenges. This study presents a Mixed Reality (MR)-based educational framework that promotes interactive experiences to enhance students’ engagement with and understanding of Industry 4.0 concepts, aiming to bridge the skills gap through immersive Virtual Learning Factories (VLFs). The framework was developed using a mixed-methods approach, combining qualitative feedback with quantitative benchmarking. A proof-of-concept MR application was developed and tested at the (Anonymised), simulating Industry 4.0 scenarios in an engineering education context to validate the framework. The findings indicate that MR-based learning improved students’ engagement with the academic content, leading to better knowledge retention and deeper conceptual understanding. The students also demonstrated enhanced problem-solving, process optimisation, and adaptability compared to traditional methods. The immersive nature of MR provided an interactive, context-rich environment that fostered active learning. This research highlights MR’s potential as a transformative educational tool, aligning academic training with industry needs. Future research is recommended to evaluate the framework’s scalability and long-term effectiveness. Full article

25 pages, 1517 KiB  
Article
Towards Structured Gaze Data Classification: The Gaze Data Clustering Taxonomy (GCT)
by Yahdi Siradj, Kiki Maulana Adhinugraha and Eric Pardede
Multimodal Technol. Interact. 2025, 9(5), 42; https://doi.org/10.3390/mti9050042 - 3 May 2025
Viewed by 273
Abstract
Gaze data analysis plays a crucial role in understanding human visual attention and behaviour. However, raw gaze data is often noisy and lacks inherent structure, making interpretation challenging. Therefore, preprocessing techniques such as classification are essential to extract meaningful patterns and improve the reliability of gaze-based analysis. This study introduces the Gaze Data Clustering Taxonomy (GCT), a novel approach that categorises gaze data into structured clusters to improve its reliability and interpretability. GCT classifies gaze data based on cluster count, target presence, and spatial–temporal relationships, allowing for more precise gaze-to-target association. We utilise several machine learning techniques, such as k-NN, k-Means, and DBScan, to apply the taxonomy to a Random Saccade Task dataset, demonstrating its effectiveness in gaze classification. Our findings highlight how clustering provides a structured approach to gaze data preprocessing by distinguishing meaningful patterns from unreliable data. Full article
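The cluster-count and noise distinctions that such a taxonomy relies on can be illustrated with a minimal DBSCAN over synthetic 2-D gaze samples. This is a sketch only: the GCT implementation, its parameters, and its Random Saccade Task data are not reproduced here, and the points below are invented.

```python
# Illustrative sketch: clustering synthetic 2-D gaze samples with a minimal
# DBSCAN, separating fixation-like clusters from noise samples.
import math

def dbscan(points, eps, min_pts):
    """Return a cluster label per point (-1 = noise)."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1            # noise (may be claimed later as border)
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:                  # expand the cluster from core points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster   # border point: joins, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbours(j)
            if len(jn) >= min_pts:
                seeds.extend(jn)
    return labels

# Two fixation-like clouds plus one stray sample.
gaze = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (50, 50)]
print(dbscan(gaze, eps=2.0, min_pts=2))
```

With real gaze data, `eps` and `min_pts` would be tuned to the tracker's sampling rate and noise level, and each resulting cluster could then be associated with the nearest target.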

14 pages, 1689 KiB  
Article
Evaluating the Effectiveness of Tilt Gestures for Text Property Control in Mobile Interfaces
by Sang-Hwan Kim and Xuesen Liu
Multimodal Technol. Interact. 2025, 9(5), 41; https://doi.org/10.3390/mti9050041 - 29 Apr 2025
Viewed by 312
Abstract
The objective of this study is to verify the usability of gesture interactions such as tilting or shaking, rather than conventional touch gestures, on mobile devices. To this end, a prototype was developed that manipulates the text size in a mobile text messaging application through tilt gestures. In the text input interface, three types of tilt gesture interaction methods (‘Shaking’, ‘Leaning’, and ‘Acceleration’) were implemented to select the text size level among five levels (extra-small, small, normal, large, and extra-large). Along with the gesture-based interaction methods, the conventional button method was also evaluated. A total of 24 participants were asked to prepare text messages of specified font sizes using randomly assigned interaction methods to select the font size. Task completion time, accuracy (setting errors and input errors), workload, and subjective preferences were collected and analyzed. As a result, the ‘Shaking’ method performed similarly to the conventional button method and outperformed the ‘Leaning’ and ‘Acceleration’ methods. This may be because ‘Leaning’ and ‘Acceleration’ are continuous operations, while ‘Shaking’ is a discrete operation for each menu item (font size level). According to subjective comments, tilt gestures on mobile devices can be useful once users take the time to learn them, and can also provide ways to convey intentions along with simple text. Although tilt gestures were not found to significantly improve text editing performance compared to conventional screen touch methods, motion gestures beyond touch can be considered for interface manipulations such as app navigation, gaming, or multimedia controls across diverse applications. Full article

20 pages, 3083 KiB  
Article
Exploring an Emoji-Based Evaluation and Intervention Method for Psychological Safety in Ongoing Co-Creations
by Qiner Lyu, Gaku Kutsuzawa, Hiroyuki Umemura, Kenta Kimura, Masaaki Mochimaru and Akihiko Murai
Multimodal Technol. Interact. 2025, 9(5), 40; https://doi.org/10.3390/mti9050040 - 24 Apr 2025
Viewed by 938
Abstract
Psychological safety is pivotal for co-creation to build an open environment where innovative ideas can flourish. Traditionally, psychological safety has been evaluated from a stable and long-term perspective by implementing psychological scales. Consequently, existing interventions often focus on steadily enhancing psychological safety, which is less suitable for dynamic short-term co-creation settings. The purpose of this study is to introduce the use of emojis as a novel and intuitive interaction during co-creations and assess their effectiveness in evaluating and influencing psychological safety. We performed two experiments with 140 participants in total to test emojis as evaluations and interventions, respectively. The participants watched videos and annotated them with emojis based on their perceptions of emotions. This process allowed us to explore the relationship between perceived emotions and psychological safety. In the next phase, we embedded emojis directly into the videos to observe whether the participants’ emotional perceptions—and, consequently, their psychological safety—could be influenced by visual cues. Our findings demonstrate that positive emojis are positively correlated with psychological safety and negative emojis are negatively correlated with psychological safety. We also revealed that negative emojis significantly decreased psychological safety scores, whereas positive emojis did not lead to a corresponding increase. Full article

22 pages, 4770 KiB  
Article
Internet of Things and Artificial Intelligence for Secure and Sustainable Green Mobility: A Multimodal Data Fusion Approach to Enhance Efficiency and Security
by Manuel J. C. S. Reis
Multimodal Technol. Interact. 2025, 9(5), 39; https://doi.org/10.3390/mti9050039 - 24 Apr 2025
Viewed by 419
Abstract
The increasing complexity of urban mobility systems demands innovative solutions to address challenges such as traffic congestion, energy inefficiency, and environmental sustainability. This paper proposes an IoT and AI-driven framework for secure and sustainable green mobility, leveraging multimodal data fusion to enhance traffic management, energy efficiency, and emissions reduction. Using publicly available datasets, including METR-LA for traffic flow and OpenWeatherMap for environmental context, the framework integrates machine learning models for congestion prediction and reinforcement learning for dynamic route optimization. Simulation results demonstrate a 20% reduction in travel time, 15% energy savings per kilometer, and a 10% decrease in CO2 emissions compared to baseline methods. The modular architecture of the framework allows for scalability and adaptability across various smart city applications, including traffic management, energy grid optimization, and public transit coordination. These findings underscore the potential of IoT and AI technologies to revolutionize urban transportation, contributing to more efficient, secure, and sustainable mobility systems. Full article
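The dynamic route-optimization idea can be illustrated with a toy congestion-weighted shortest-path search. This is a deliberate simplification for illustration: the paper uses reinforcement learning over METR-LA traffic data, whereas the graph, travel times, and congestion factors below are invented, with a predicted congestion factor standing in for the learned model's output.

```python
# Illustrative sketch: congestion-aware route selection on a toy road graph.
# Edge travel times are scaled by a predicted congestion factor, standing in
# for a learned congestion model; routes are found with Dijkstra's algorithm.
import heapq

def best_route(graph, congestion, start, goal):
    """Dijkstra over travel_time * congestion_factor per edge."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, base_time in graph.get(node, []):
            if nxt not in seen:
                weight = base_time * congestion.get((node, nxt), 1.0)
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []

graph = {
    "A": [("B", 5), ("C", 7)],
    "B": [("D", 5)],
    "C": [("D", 4)],
}
congestion = {("A", "B"): 2.0}   # heavy predicted traffic on A->B
print(best_route(graph, congestion, "A", "D"))
```

With a congestion factor of 2.0 on A→B, the nominally longer route via C becomes the cheaper choice; re-running the search as predictions update gives the dynamic rerouting behavior the abstract describes.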

36 pages, 12483 KiB  
Article
Environments That Boost Creativity: AI-Generated Living Geometry
by Nikos A. Salingaros
Multimodal Technol. Interact. 2025, 9(5), 38; https://doi.org/10.3390/mti9050038 - 23 Apr 2025
Viewed by 337
Abstract
Generative AI leads to designs that prioritize cognition, emotional resonance, and health, thus offering a tested alternative to current trends. In a first AI experiment, the large language model ChatGPT-4o generated six visual environments that are expected to boost creative thinking for their occupants. The six test cases are evaluated using Christopher Alexander’s 15 fundamental properties of living geometry as criteria, as well as ChatGPT-4o, to reveal a strong positive correlation. Living geometry is a specific type of geometry that shows coherence across scales, fractal structure, and nested symmetries to harmonize with human neurophysiology. The human need for living geometry is supported by interdisciplinary evidence from biology, environmental psychology, and neuroscience. Then, in a second AI experiment, ChatGPT-4o was asked to generate visual environments that suppress creativity for comparison with the cases that boost creative thinking. Checking these negative examples using Alexander’s 15 fundamental properties, they are almost entirely deficient in living geometry, thus confirming the diagnostic model. Used together with generative AI, living geometry therefore offers a useful method for both creating and evaluating designs based on objective criteria. Adopting a hybrid epistemological framework of AI plus living geometry as a basis for design uncovers a flaw within contemporary architectural practice. Dominant design styles, rooted in untested aesthetic preferences, lack the empirical validation required to address fundamental questions of spatial quality responsible for human creativity. Full article

36 pages, 18792 KiB  
Article
VICTORIOUS: A Visual Analytics System for Scoping Review of Document Sets
by Amir Haghighati, Amir Reza Haghverdi and Kamran Sedig
Multimodal Technol. Interact. 2025, 9(5), 37; https://doi.org/10.3390/mti9050037 - 22 Apr 2025
Viewed by 303
Abstract
Scoping review is an iterative knowledge synthesis methodology concerned with broad questions about the nature of a research subject. The increasingly large number of published documents in scholarly domains poses challenges in conducting scoping reviews. Despite attempts to address these challenges, the specific step of sensemaking in the context of scoping reviews is seldom addressed. We address sensemaking of a curated document collection by developing a VIsual analytiCs sysTem for scOping RevIew of dOcUment Sets (VICTORIOUS). Using known methods within the machine learning community, we propose and develop six modules within VICTORIOUS: Map, Summary, Skim, SemJump, BiblioNetwork, and Compare. To demonstrate the utility of VICTORIOUS, we describe three usage scenarios. We conclude with a qualitative comparison of VICTORIOUS and other available systems. While existing systems leave users with isolated information items about a document set, making an aggregated assessment in a scoping review a challenge, VICTORIOUS shows promise for making sense of documents in the scoping review process. Full article
