Search Results (65)

Search Parameters:
Keywords = voice control interface

19 pages, 3031 KB  
Article
Voice, Text, or Embodied AI Avatar? Effects of Generative AI Interface Modalities in VR Museums
by Pakinee Ariya, Perasuk Worragin, Songpon Khanchai, Darin Poollapalin and Phichete Julrode
Informatics 2026, 13(3), 42; https://doi.org/10.3390/informatics13030042 - 11 Mar 2026
Viewed by 653
Abstract
Virtual museums delivered through immersive virtual reality (VR) function as information environments where users access interpretive content while navigating spatially. With the integration of generative artificial intelligence (AI), conversational assistants can dynamically mediate information interaction; however, evidence remains limited regarding how different AI interface representations affect user experience. This study compares three generative AI interface modalities in a VR virtual museum: voice only, voice with synchronized text, and voice with an embodied AI avatar. A controlled experiment with 75 participants examined their effects on user engagement, perceived information quality, and subjective cognitive workload while holding informational content constant. The results indicate that the voice-and-text modality produced the highest perceived information quality, whereas the embodied AI avatar modality yielded the highest user engagement. No significant differences were observed in cognitive workload across modalities. These findings suggest that AI interface modalities play complementary roles in VR-based information interaction and provide design guidance for selecting appropriate AI representations in immersive information systems. Full article

17 pages, 14849 KB  
Article
A Collaborative Robotic System for Autonomous Object Handling with Natural User Interaction
by Federico Neri, Gaetano Lettera, Giacomo Palmieri and Massimo Callegari
Robotics 2026, 15(3), 49; https://doi.org/10.3390/robotics15030049 - 27 Feb 2026
Viewed by 527
Abstract
In Industry 5.0, the transition from fixed traditional automation to flexible human–robot collaboration (HRC) requires interfaces that are both intuitive and efficient. This paper introduces a novel, multimodal control system for autonomous object handling, specifically designed to enhance natural user interaction in dynamic work environments. The system integrates a 6-Degrees of Freedom (DoF) collaborative robot (UR5e) with a hand-eye RGB-D vision system to achieve robust autonomy. The core technical contribution lies in a vision pipeline utilizing deep learning for object detection and point cloud processing for accurate 6D pose estimation, enabling advanced tasks such as human-aware object handover directly onto the operator’s hand. Crucially, an Automatic Speech Recognition (ASR) module is incorporated, providing a Natural Language Understanding (NLU) layer that allows operators to issue real-time commands for task modification, error correction and object selection. Experimental results demonstrate that this multimodal approach offers a streamlined workflow aiming to improve operational flexibility compared to traditional HMIs, while enhancing the perceived naturalness of the collaborative task. The system establishes a framework for highly responsive and intuitive human–robot workspaces, advancing the state of the art in natural interaction for collaborative object manipulation. Full article
(This article belongs to the Special Issue Human–Robot Collaboration in Industry 5.0)

27 pages, 15108 KB  
Article
Inclusive Digital Gaming Platform
by Rodrigo Mendonça, Salvador Lopes, Ângela Oliveira, Paulo Serra and Filipe Fidalgo
Multimedia 2026, 2(1), 4; https://doi.org/10.3390/multimedia2010004 - 27 Feb 2026
Viewed by 420
Abstract
The lack of accessibility in digital gaming platforms remains a significant barrier to equitable user participation. To address this issue, this article presents an inclusive solution developed as a multimedia project designed to promote access to digital games for any user through the ipcb.games platform. The platform offers features that enhance accessibility, including voice-based authentication, voice-assisted registration, facial recognition, visual and auditory feedback, and a simplified interface. It also enables users to submit their own games for subsequent approval and integration. The development process followed a multimedia project methodology, structured into phases of analysis, planning, design, production, testing, and validation. The proposal was informed by a systematic review of scientific literature on digital inclusion and accessibility, complemented by a comparative analysis of existing platforms. During usability testing, the platform was evaluated by approximately 50 teachers from different educational levels, who provided highly positive feedback. Future work includes implementing voice-controlled gameplay, enabling keyboard-based navigation, re-implementing a functional eye-tracking system, and creating pedagogical groups, further strengthening the platform’s role in educational contexts. Full article

20 pages, 636 KB  
Article
Using Denoising Diffusion Model for Predicting Global Style Tokens in an Expressive Text-to-Speech System
by Wiktor Prosowicz and Tomasz Hachaj
Electronics 2025, 14(23), 4759; https://doi.org/10.3390/electronics14234759 - 3 Dec 2025
Viewed by 1039
Abstract
Text-to-speech (TTS) systems based on neural networks have undergone a significant evolution, taking a step forward towards achieving human-like quality and expressiveness, which is crucial for applications such as social media content creation and voice interfaces for visually impaired individuals. An entire branch of research, known as Expressive Text-to-speech (ETTS), has emerged to address the so-called one-to-many mapping problem, which limits the naturalness of generated output. However, most ETTS systems applying explicit style modeling treat the prediction of prosodic features as a regressive, rather than generative, process and, consequently, do not capture prosodic diversity. We address this problem by proposing a novel technique for inference-time prediction of speaking-style features, which leverages a diffusion framework for sampling from a learned space of Global Style Tokens-based embeddings, which are then used to condition a neural TTS model. By incorporating the diffusion model, we can leverage its powerful modeling capabilities to learn the distribution of possible stylistic features and, during inference, sample them non-deterministically, which makes the generated speech more human-like by alleviating prosodic monotony across multiple sentences. Our system blends a regressive predictor with a diffusion-based generator to enable smooth control over the diversity of generated speech. Through quantitative and qualitative (human-centered) experiments, we demonstrated that our system generates expressive human speech with non-deterministic high-level prosodic features. Full article
(This article belongs to the Special Issue Advances in Algorithm Optimization and Computational Intelligence)
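
The core idea, sampling a style embedding from a learned distribution rather than regressing it, can be illustrated with a standard ancestral DDPM loop. The sketch below is only an illustration under assumed settings: the placeholder EpsPredictor network, the 256-dimensional embedding, and the linear beta schedule are not taken from the paper.

```python
# Illustrative sketch (not the authors' code): ancestral DDPM sampling of a
# Global Style Token (GST) style vector that could then condition a TTS model.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

class EpsPredictor(nn.Module):
    """Placeholder network predicting the noise added to a style embedding."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 512), nn.SiLU(),
                                 nn.Linear(512, 512), nn.SiLU(),
                                 nn.Linear(512, dim))
    def forward(self, x, t):
        # Condition on the normalized timestep by simple concatenation.
        t_feat = t.float().view(-1, 1) / T
        return self.net(torch.cat([x, t_feat], dim=-1))

@torch.no_grad()
def sample_style_embedding(model, dim=256):
    """Draw one non-deterministic style embedding by reversing the diffusion."""
    x = torch.randn(1, dim)                       # start from pure noise
    for t in reversed(range(T)):
        eps = model(x, torch.tensor([t]))
        coef = (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise   # sigma_t^2 = beta_t variant
    return x                                      # GST-space vector to condition the TTS model

# style = sample_style_embedding(EpsPredictor())  # untrained here, for shape only
```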

22 pages, 2265 KB  
Article
A Secure and Robust Multimodal Framework for In-Vehicle Voice Control: Integrating Bilingual Wake-Up, Speaker Verification, and Fuzzy Command Understanding
by Zhixiong Zhang, Yao Li, Wen Ren and Xiaoyan Wang
Eng 2025, 6(11), 319; https://doi.org/10.3390/eng6110319 - 10 Nov 2025
Viewed by 1317
Abstract
Intelligent in-vehicle voice systems face critical challenges in robustness, security, and semantic flexibility under complex acoustic conditions. To address these issues holistically, this paper proposes a novel multimodal and secure voice-control framework. The system integrates a hybrid dual-channel wake-up mechanism, combining a commercial English engine (Picovoice) with a custom lightweight ResNet-Lite model for Chinese, to achieve robust cross-lingual activation. For reliable identity authentication, an optimized ECAPA-TDNN model is introduced, enhanced with spectral augmentation, sliding window feature fusion, and an adaptive threshold mechanism. Furthermore, a two-tier fuzzy command matching algorithm operating at character and pinyin levels is designed to significantly improve tolerance to speech variations and ASR errors. Comprehensive experiments on a test set encompassing various Chinese dialects, English accents, and noise environments demonstrate that the proposed system achieves high performance across all components: the wake-up mechanism maintains commercial-grade reliability for English and provides a functional baseline for Chinese; the improved ECAPA-TDNN attains low equal error rates of 2.37% (quiet), 5.59% (background music), and 3.12% (high-speed noise), outperforming standard baselines and showing strong noise robustness against the state of the art; and the fuzzy matcher boosts command recognition accuracy to over 95.67% in quiet environments and above 92.7% under noise, substantially outperforming hard matching by approximately 30%. End-to-end tests confirm an overall interaction success rate of 93.7%. This work offers a practical, integrated solution for developing secure, robust, and flexible voice interfaces in intelligent vehicles. Full article
(This article belongs to the Section Electrical and Electronic Engineering)
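
The two-tier fuzzy matching idea (character level first, then pinyin level) can be sketched roughly as follows. This is not the paper's implementation: the command list, thresholds, and the use of difflib and pypinyin are assumptions for illustration only.

```python
# Minimal two-tier fuzzy command matcher: exact-ish character matching first,
# then a pinyin-level pass that absorbs homophone-style ASR confusions.
from difflib import SequenceMatcher
from pypinyin import lazy_pinyin   # converts Chinese characters to pinyin syllables

COMMANDS = ["打开空调", "关闭车窗", "播放音乐"]   # hypothetical in-vehicle commands

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def match_command(asr_text: str, char_thresh=0.8, pinyin_thresh=0.7):
    """Return the best-matching command, tolerating ASR substitution errors."""
    # Tier 1: direct character-level fuzzy match.
    best = max(COMMANDS, key=lambda c: similarity(asr_text, c))
    if similarity(asr_text, best) >= char_thresh:
        return best
    # Tier 2: pinyin-level match, recovering characters that sound alike.
    asr_py = " ".join(lazy_pinyin(asr_text))
    best = max(COMMANDS, key=lambda c: similarity(asr_py, " ".join(lazy_pinyin(c))))
    if similarity(asr_py, " ".join(lazy_pinyin(best))) >= pinyin_thresh:
        return best
    return None   # reject and ask the user to repeat

# A homophone confusion ("车床" for "车窗") fails the character check but is
# recovered at the pinyin level.
print(match_command("关闭车床"))   # -> 关闭车窗
```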

12 pages, 890 KB  
Article
Control Modality and Accuracy on the Trust and Acceptance of Construction Robots
by Daeguk Lee, Donghun Lee, Jae Hyun Jung and Taezoon Park
Appl. Sci. 2025, 15(21), 11827; https://doi.org/10.3390/app152111827 - 6 Nov 2025
Viewed by 841
Abstract
This study investigates how control modalities and recognition accuracy influence construction workers’ trust and acceptance of collaborative robots. Sixty participants evaluated voice and gesture control under varying levels of recognition accuracy while performing tiling together with collaborative robots. Experimental results indicated that recognition accuracy significantly affected perceived enjoyment (PE, p = 0.010), ease of use (PEOU, p = 0.030), and intention to use (ITU, p = 0.022), but not trust, usefulness (PU), or attitude (ATT). Furthermore, the interaction between control modality and accuracy shaped most acceptance factors (PE, p = 0.049; PEOU, p = 0.006; PU, p = 0.006; ATT, p = 0.003, and ITU, p < 0.001) except trust. In general, high recognition accuracy enhanced user experience and adoption intentions. Voice interfaces were favored when recognition accuracy was high, whereas gesture interfaces were more acceptable under low-accuracy conditions. These findings highlight the importance of designing high-accuracy, task-appropriate interfaces to support technology acceptance in construction. The preference for voice interfaces under accurate conditions aligns with the noisy, fast-paced nature of construction sites, where efficiency is paramount. By contrast, gesture interfaces offer resilience when recognition errors occur. The study provides practical guidance for robot developers, interface designers, and construction managers, emphasizing that carefully matching interaction modalities and accuracy levels to on-site demands can improve acceptance and long-term adoption in this traditionally conservative sector. Full article
(This article belongs to the Special Issue Robot Control in Human–Computer Interaction)

17 pages, 2127 KB  
Article
Leveraging Large Language Models for Real-Time UAV Control
by Kheireddine Choutri, Samiha Fadloun, Ayoub Khettabi, Mohand Lagha, Souham Meshoul and Raouf Fareh
Electronics 2025, 14(21), 4312; https://doi.org/10.3390/electronics14214312 - 2 Nov 2025
Cited by 2 | Viewed by 2899
Abstract
As drones become increasingly integrated into civilian and industrial domains, the demand for natural and accessible control interfaces continues to grow. Conventional manual controllers require technical expertise and impose cognitive overhead, limiting their usability in dynamic and time-critical scenarios. To address these limitations, this paper presents a multilingual voice-driven control framework for quadrotor drones, enabling real-time operation in both English and Arabic. The proposed architecture combines offline Speech-to-Text (STT) processing with large language models (LLMs) to interpret spoken commands and translate them into executable control code. Specifically, Vosk is employed for bilingual STT, while Google Gemini provides semantic disambiguation, contextual inference, and code generation. The system is designed for continuous, low-latency operation within an edge–cloud hybrid configuration, offering an intuitive and robust human–drone interface. While speech recognition and safety validation are processed entirely offline, high-level reasoning and code generation currently rely on cloud-based LLM inference. Experimental evaluation demonstrates an average speech recognition accuracy of 95% and end-to-end command execution latency between 300 and 500 ms, validating the feasibility of reliable, multilingual, voice-based UAV control. This research advances multimodal human–robot interaction by showcasing the integration of offline speech recognition and LLMs for adaptive, safe, and scalable aerial autonomy. Full article
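
A minimal sketch of the offline speech-to-text front end is shown below, assuming the Vosk Python bindings and PyAudio. The model path is a placeholder, and interpret_with_llm() is a hypothetical stub standing in for the cloud LLM step that turns a transcript into executable control code.

```python
# Offline STT loop: decode microphone audio with Vosk, hand full utterances to
# an LLM-backed interpreter (stubbed here) that would emit drone control code.
import json
import pyaudio
from vosk import Model, KaldiRecognizer

def interpret_with_llm(transcript: str) -> str:
    """Hypothetical stub for the cloud LLM call described in the paper."""
    return f"# generated control code for: {transcript}"

model = Model("model-small-en")                  # placeholder path to a downloaded Vosk model
recognizer = KaldiRecognizer(model, 16000)

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=8000)
stream.start_stream()

while True:
    data = stream.read(4000, exception_on_overflow=False)
    if recognizer.AcceptWaveform(data):          # a complete utterance was decoded
        text = json.loads(recognizer.Result()).get("text", "")
        if text:
            print("heard:", text)                # e.g. "take off and hover at two meters"
            print(interpret_with_llm(text))
```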

28 pages, 4508 KB  
Article
Mixed Reality-Based Multi-Scenario Visualization and Control in Automated Terminals: A Middleware and Digital Twin Driven Approach
by Yubo Wang, Enyu Zhang, Ang Yang, Keshuang Du and Jing Gao
Buildings 2025, 15(21), 3879; https://doi.org/10.3390/buildings15213879 - 27 Oct 2025
Viewed by 1339
Abstract
This study presents a Digital Twin–Mixed Reality (DT–MR) framework for the immersive and interactive supervision of automated container terminals (ACTs), addressing the fragmented data and limited situational awareness of conventional 2D monitoring systems. The framework employs a middleware-centric architecture that integrates heterogeneous subsystems—covering terminal operation, equipment control, and information management—through standardized industrial communication protocols. It ensures synchronized timestamps and delivers semantically aligned, low-latency data streams to a multi-scale Digital Twin developed in Unity. The twin applies level-of-detail modeling, spatial anchoring, and coordinate alignment (from Industry Foundation Classes (IFCs) to east–north–up (ENU) coordinates and Unity space) for accurate registration with physical assets, while a Microsoft HoloLens 2 device provides an intuitive Mixed Reality interface that combines gaze, gesture, and voice commands with built-in safety interlocks for secure human–machine interaction. Quantitative performance benchmarks—latency ≤100 ms, status refresh ≤1 s, and throughput ≥10,000 events/s—were met through targeted engineering and validated using representative scenarios of quay crane alignment and automated guided vehicle (AGV) rerouting, demonstrating improved anomaly detection, reduced decision latency, and enhanced operational resilience. The proposed DT–MR pipeline establishes a reproducible and extensible foundation for real-time, human-in-the-loop supervision across ports, airports, and other large-scale smart infrastructures. Full article
(This article belongs to the Special Issue Digital Technologies, AI and BIM in Construction)
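
The coordinate-alignment step can be illustrated with a small sketch that expresses a geodetic asset position in a local east-north-up (ENU) frame and remaps it to Unity's left-handed, Y-up convention. The use of pymap3d, the reference origin, and the East-to-X, Up-to-Y, North-to-Z mapping are assumptions for illustration, not the paper's code.

```python
# Geodetic position -> local ENU frame anchored at the terminal origin -> Unity axes.
import pymap3d as pm

ORIGIN = (31.2304, 121.4737, 0.0)        # hypothetical terminal reference point (lat, lon, h)

def geodetic_to_unity(lat: float, lon: float, h: float):
    """Return a Unity-style (x, y, z) tuple for a geodetic coordinate."""
    e, n, u = pm.geodetic2enu(lat, lon, h, *ORIGIN)
    return (e, u, n)                      # Unity convention assumed: X = East, Y = Up, Z = North

print(geodetic_to_unity(31.2310, 121.4745, 2.5))
```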

26 pages, 7995 KB  
Article
Smart Home Control Using Real-Time Hand Gesture Recognition and Artificial Intelligence on Raspberry Pi 5
by Thomas Hobbs and Anwar Ali
Electronics 2025, 14(20), 3976; https://doi.org/10.3390/electronics14203976 - 10 Oct 2025
Viewed by 4323
Abstract
This paper outlines the process of developing a low-cost system for home appliance control via real-time hand gesture classification using Computer Vision and a custom lightweight machine learning model. The system strives to enable those with speech or hearing disabilities to interface with smart home devices in real time using hand gestures, similar to what is currently possible with voice-activated ‘smart assistants’. The system runs on a Raspberry Pi 5 to enable future IoT integration and reduce costs, and uses the official camera module v2 and 7-inch touchscreen. Frame preprocessing uses MediaPipe to assign hand coordinates and NumPy tools to normalise them; a machine learning model then predicts the gesture. The model, a feed-forward network consisting of five fully connected layers, was built using Keras 3 and compiled with TensorFlow Lite. Training data utilised the HaGRIDv2 dataset, reduced from its original 23 one- and two-handed gestures to 15 one-handed gestures. Training on this data returned validation metrics of 0.90 accuracy and 0.31 loss. The system can control both analogue and digital hardware via GPIO pins and, when recognising a gesture, averages 20.4 frames per second with no observable delay. Full article
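
A rough sketch of this pipeline (MediaPipe landmarks, normalisation, a small dense classifier) is given below. The layer sizes, normalisation scheme, and gesture-to-action mapping are illustrative assumptions rather than the authors' exact configuration, and the model here is untrained.

```python
# Landmark extraction -> normalisation -> feed-forward gesture classification.
import cv2
import numpy as np
import mediapipe as mp
from tensorflow import keras

NUM_GESTURES = 15

def normalise(landmarks):
    """Make the 21 hand landmarks translation- and scale-invariant."""
    pts = np.array([[lm.x, lm.y] for lm in landmarks], dtype=np.float32)
    pts -= pts[0]                              # wrist-relative coordinates
    scale = float(np.max(np.abs(pts))) or 1.0
    return (pts / scale).flatten()             # 42-dimensional feature vector

model = keras.Sequential([                     # five fully connected layers, sizes assumed
    keras.layers.Input(shape=(42,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(NUM_GESTURES, activation="softmax"),
])
# In practice the trained weights would be loaded here before inference.

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.6)
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        features = normalise(result.multi_hand_landmarks[0].landmark)
        gesture_id = int(np.argmax(model.predict(features[None, :], verbose=0)))
        print("gesture:", gesture_id)          # would be mapped to a GPIO action
```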

15 pages, 6691 KB  
Proceeding Paper
Smart Customizable Spinning System
by Wei-Chuan Lin, Yu-Wen Hsu and Wan-Lin Yu
Eng. Proc. 2025, 108(1), 46; https://doi.org/10.3390/engproc2025108046 - 12 Sep 2025
Viewed by 597
Abstract
As global obesity rates rise, cardiovascular diseases increase and stress-related issues become more severe, heightening public awareness of health and exercise. However, existing spinning fitness equipment lacks personalized customization for individual needs. To address this, we developed a smart customizable spinning system that integrates health monitoring, central computation, flywheel, voice interaction, notification, and query subsystems. Users can set fitness goals based on their personal needs, monitor workout data via sensors, and use voice interaction and control to track their exercise status in real time. The system notifies users of workout progress through a buzzer and Message Queuing Telemetry Transport (MQTT), while the Web interface provides access to past workouts and health records. Additionally, the system supports bilingual functionality (Chinese and English), allowing users to operate it in their preferred language and enhancing global usability. Full article
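
The MQTT-based notification path can be sketched in a few lines, assuming the paho-mqtt client. The broker address, topic name, and payload fields are placeholders rather than the system's actual configuration.

```python
# Publish a workout-progress update that a notification subsystem could subscribe to.
import json
import paho.mqtt.publish as publish

progress = {"user": "rider01", "target_rpm": 90, "current_rpm": 84, "goal_pct": 72}
publish.single("spinning/progress", json.dumps(progress),
               hostname="broker.local", qos=1)   # hypothetical on-premise broker
```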

11 pages, 1005 KB  
Proceeding Paper
Multimodal Fusion for Enhanced Human–Computer Interaction
by Ajay Sharma, Isha Batra, Shamneesh Sharma and Anggy Pradiftha Junfithrana
Eng. Proc. 2025, 107(1), 81; https://doi.org/10.3390/engproc2025107081 - 10 Sep 2025
Cited by 2 | Viewed by 2102
Abstract
Our paper introduces a novel virtual mouse concept driven by gesture detection, eye-tracking, and voice recognition. The system uses computer vision and machine learning to let users command and control the mouse pointer using eye motions, voice commands, or hand gestures. Its main goal is to provide an easy and engaging interface for users who want a more natural, hands-free way of interacting with their computers, as well as for those whose impairments limit bodily motion, such as people with paralysis. By combining multiple input modalities, the system improves accessibility and usability and offers a flexible solution for a wide range of users. The speech recognition function permits hands-free operation via voice instructions, while the eye-tracking component detects and responds to the user’s gaze to provide precise cursor control. Gesture recognition complements these features by letting users execute mouse operations with simple hand movements. This technology not only improves the user experience for people with impairments but also marks a notable development in human–computer interaction, showing how computer vision and machine learning can be used to build more inclusive and flexible user interfaces that improve the accessibility and efficiency of computer use for everyone. Full article

20 pages, 2732 KB  
Article
Redesigning Multimodal Interaction: Adaptive Signal Processing and Cross-Modal Interaction for Hands-Free Computer Interaction
by Bui Hong Quan, Nguyen Dinh Tuan Anh, Hoang Van Phi and Bui Trung Thanh
Sensors 2025, 25(17), 5411; https://doi.org/10.3390/s25175411 - 2 Sep 2025
Cited by 1 | Viewed by 1649
Abstract
Hands-free computer interaction is a key topic in assistive technology, with camera-based and voice-based systems being the most common methods. Recent camera-based solutions leverage facial expressions or head movements to simulate mouse clicks or key presses, while voice-based systems enable control via speech commands, wake-word detection, and vocal gestures. However, existing systems often suffer from limitations in responsiveness and accuracy, especially under real-world conditions. In this paper, we present 3-Modal Human-Computer Interaction (3M-HCI), a novel interaction system that dynamically integrates facial, vocal, and eye-based inputs through a new signal processing pipeline and a cross-modal coordination mechanism. This approach not only enhances recognition accuracy but also reduces interaction latency. Experimental results demonstrate that 3M-HCI outperforms several recent hands-free interaction solutions in both speed and precision, highlighting its potential as a robust assistive interface. Full article
(This article belongs to the Section Sensing and Imaging)

25 pages, 19135 KB  
Article
Development of a Multi-Platform AI-Based Software Interface for the Accompaniment of Children
by Isaac León, Camila Reyes, Iesus Davila, Bryan Puruncajas, Dennys Paillacho, Nayeth Solorzano, Marcelo Fajardo-Pruna, Hyungpil Moon and Francisco Yumbla
Multimodal Technol. Interact. 2025, 9(9), 88; https://doi.org/10.3390/mti9090088 - 26 Aug 2025
Viewed by 2220
Abstract
The absence of parental presence has a direct impact on the emotional stability and social routines of children, especially during extended periods of separation from their family environment, as in the case of daycare centers, hospitals, or when they remain alone at home. At the same time, the technology currently available to provide emotional support in these contexts remains limited. In response to the growing need for emotional support and companionship in child care, this project proposes the development of a multi-platform software architecture based on artificial intelligence (AI), designed to be integrated into humanoid robots that assist children between the ages of 6 and 14. The system enables daily verbal and non-verbal interactions intended to foster a sense of presence and personalized connection through conversations, games, and empathetic gestures. Built on the Robot Operating System (ROS), the software incorporates modular components for voice command processing, real-time facial expression generation, and joint movement control. These modules allow the robot to hold natural conversations, display dynamic facial expressions on its LCD (Liquid Crystal Display) screen, and synchronize gestures with spoken responses. Additionally, a graphical interface enhances the coherence between dialogue and movement, thereby improving the quality of human–robot interaction. Initial evaluations conducted in controlled environments assessed the system’s fluency, responsiveness, and expressive behavior. Subsequently, it was implemented in a pediatric hospital in Guayaquil, Ecuador, where it accompanied children during their recovery. It was observed that this type of artificial intelligence-based software can significantly enhance the experience of children, opening promising opportunities for its application in clinical, educational, recreational, and other child-centered settings. Full article
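
A minimal rclpy sketch of this modular, topic-based design is shown below: one node listens for transcribed voice commands and publishes a matching facial-expression request. The topic names and the command-to-expression mapping are hypothetical, not taken from the paper.

```python
# Bridge node: subscribe to transcribed voice commands, publish an expression request.
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

EXPRESSIONS = {"hola": "smile", "juguemos": "excited", "adios": "sad"}  # assumed mapping

class CompanionBridge(Node):
    def __init__(self):
        super().__init__("companion_bridge")
        self.face_pub = self.create_publisher(String, "/face_expression", 10)
        self.create_subscription(String, "/voice_command", self.on_command, 10)

    def on_command(self, msg: String):
        expression = EXPRESSIONS.get(msg.data.strip().lower(), "neutral")
        out = String()
        out.data = expression
        self.face_pub.publish(out)              # a display module would render it on the LCD
        self.get_logger().info(f"command '{msg.data}' -> expression '{expression}'")

def main():
    rclpy.init()
    rclpy.spin(CompanionBridge())

if __name__ == "__main__":
    main()
```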

20 pages, 3244 KB  
Article
SOUTY: A Voice Identity-Preserving Mobile Application for Arabic-Speaking Amyotrophic Lateral Sclerosis Patients Using Eye-Tracking and Speech Synthesis
by Hessah A. Alsalamah, Leena Alhabrdi, May Alsebayel, Aljawhara Almisned, Deema Alhadlaq, Loody S. Albadrani, Seetah M. Alsalamah and Shada AlSalamah
Electronics 2025, 14(16), 3235; https://doi.org/10.3390/electronics14163235 - 14 Aug 2025
Viewed by 1379
Abstract
Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disorder that progressively impairs motor and communication abilities. Globally, the prevalence of ALS was estimated at approximately 222,800 cases in 2015 and is projected to increase by nearly 70% to 376,700 cases by 2040, primarily driven by demographic shifts in aging populations, and the lifetime risk of developing ALS is 1 in 350–420. Despite international advancements in assistive technologies, a recent national survey in Saudi Arabia revealed that 100% of ALS care providers lack access to eye-tracking communication tools, and 92% reported communication aids as inconsistently available. While assistive technologies such as speech-generating devices and gaze-based control systems have made strides in recent decades, they primarily support English speakers, leaving Arabic-speaking ALS patients underserved. This paper presents SOUTY, a cost-effective, mobile-based application that empowers ALS patients to communicate using gaze-controlled interfaces combined with a text-to-speech (TTS) feature in the Arabic language, one of the five most widely spoken languages in the world. SOUTY (i.e., “my voice”) utilizes a personalized, pre-recorded voice bank of the ALS patient and integrated eye-tracking technology to support the formation and vocalization of custom phrases in Arabic. This study describes the full development life cycle of SOUTY, from conceptualization and requirements gathering to system architecture, implementation, evaluation, and refinement. Validation included interviews with experts in Human–Computer Interaction (HCI) and speech pathology, as well as a public survey assessing awareness and technological readiness. The results support SOUTY as a culturally and linguistically relevant innovation that enhances autonomy and quality of life for Arabic-speaking ALS patients. This approach may serve as a replicable model for developing inclusive Augmentative and Alternative Communication (AAC) tools in other underrepresented languages. The system achieved 100% task completion during internal walkthroughs, with mean phrase selection times under 5 s and audio playback latency below 0.3 s. Full article

26 pages, 6831 KB  
Article
Human–Robot Interaction and Tracking System Based on Mixed Reality Disassembly Tasks
by Raúl Calderón-Sesmero, Adrián Lozano-Hernández, Fernando Frontela-Encinas, Guillermo Cabezas-López and Mireya De-Diego-Moro
Robotics 2025, 14(8), 106; https://doi.org/10.3390/robotics14080106 - 30 Jul 2025
Cited by 4 | Viewed by 3513
Abstract
Disassembly is a crucial process in industrial operations, especially in tasks requiring high precision and strict safety standards when handling components with collaborative robots. However, traditional methods often rely on rigid and sequential task planning, which makes it difficult to adapt to unforeseen changes or dynamic environments. This rigidity not only limits flexibility but also leads to prolonged execution times, as operators must follow predefined steps that do not allow for real-time adjustments. Although techniques like teleoperation have attempted to address these limitations, they often hinder direct human–robot collaboration within the same workspace, reducing effectiveness in dynamic environments. In response to these challenges, this research introduces an advanced human–robot interaction (HRI) system leveraging a mixed-reality (MR) interface embedded in a head-mounted device (HMD). The system enables operators to issue real-time control commands using multimodal inputs, including voice, gestures, and gaze tracking. These inputs are synchronized and processed via the Robot Operating System (ROS2), enabling dynamic and flexible task execution. Additionally, the integration of deep learning algorithms ensures precise detection and validation of disassembly components, enhancing accuracy. Experimental evaluations demonstrate significant improvements, including reduced task completion times, enhanced operator experience, and strict adherence to safety standards. This scalable solution offers broad applicability for general-purpose disassembly tasks, making it well-suited for complex industrial scenarios. Full article
(This article belongs to the Special Issue Robot Teleoperation Integrating with Augmented Reality)