Search Results (449)

Search Parameters:
Keywords = robot vision control

23 pages, 1945 KB  
Article
A Symmetry-Informed Multimodal LLM-Driven Approach to Robotic Object Manipulation: Lowering Entry Barriers in Mechatronics Education
by Jorge Gudiño-Lau, Miguel Durán-Fonseca, Luis E. Anido-Rifón and Pedro C. Santana-Mancilla
Symmetry 2025, 17(10), 1756; https://doi.org/10.3390/sym17101756 - 17 Oct 2025
Abstract
The integration of Large Language Models (LLMs), particularly Visual Language Models (VLMs), into robotics promises more intuitive human–robot interactions; however, challenges remain in efficiently translating high-level commands into precise physical actions. This paper presents a novel architecture for vision-based object manipulation that leverages a VLM’s reasoning capabilities while incorporating symmetry principles to enhance operational efficiency. Implemented on a Yahboom DOFBOT educational robot with a Jetson Nano platform, our system introduces a prompt-based framework that uniquely embeds symmetry-related cues to streamline feature extraction and object detection from visual data. This methodology, which utilizes few-shot learning, enables the VLM to generate more accurate and contextually relevant commands for manipulation tasks by efficiently interpreting the symmetric and asymmetric features of objects. The experimental results in controlled scenarios demonstrate that our symmetry-informed approach significantly improves the robot’s interaction efficiency and decision-making accuracy compared to generic prompting strategies. This work contributes a robust method for integrating fundamental vision principles into modern generative AI workflows for robotics. Furthermore, its implementation on an accessible educational platform shows its potential to simplify complex robotics concepts for engineering education and research.
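
The listing carries only the abstract, so as a hedged sketch of the prompt-based framework it describes, the Python below shows one way symmetry cues and few-shot examples could be combined into a single VLM prompt. The function name, command format, and example scenes are invented for illustration and are not the authors' code.

# Hypothetical sketch: embedding symmetry cues in a few-shot VLM prompt.
FEW_SHOT_EXAMPLES = [
    ("a mug seen from the side",
     "Bilateral symmetry about the handle plane; grasp the rim opposite the handle."),
    ("a hex nut on a flat table",
     "Six-fold rotational symmetry; any flat-to-flat grasp is equivalent."),
]

def build_symmetry_prompt(scene_description: str) -> str:
    """Compose a prompt asking the VLM to reason about object symmetry
    before proposing a manipulation command (all wording illustrative)."""
    lines = [
        "You control an educational robot arm.",
        "First state the symmetry (bilateral, rotational, none) of the target,",
        "then output one grasp command as: GRASP <x> <y> <z> <yaw>.",
        "",
    ]
    for scene, answer in FEW_SHOT_EXAMPLES:
        lines.append(f"Scene: {scene}")
        lines.append(f"Answer: {answer}")
        lines.append("")
    lines.append(f"Scene: {scene_description}")
    lines.append("Answer:")
    return "\n".join(lines)

print(build_symmetry_prompt("a red cube next to a pair of scissors"))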

32 pages, 12557 KB  
Article
Controlling an Industrial Robot Using Stereo 3D Vision Systems with AI Elements
by Jarosław Panasiuk
Sensors 2025, 25(20), 6402; https://doi.org/10.3390/s25206402 - 16 Oct 2025
Abstract
Robotization of production processes and the use of 3D vision systems are becoming increasingly popular. These technologies make robotic processes more flexible and expand the possibilities of process control in response to changes in an object's parameters and pose, and in the process itself. Unfortunately, standard solutions are limited to the relatively small space in which the robot's vision system operates. Combining the latest Artificial Intelligence (AI) solutions and external vision systems with the closed architectures of industrial robot control systems enhances the digital awareness that robotic systems have of their environment. This article presents an example of solving this problem of low digital awareness, caused by the limited field of view of the vision systems used in industrial robots, while maintaining high precision: a 3D vision system based on a stereovision camera and software with AI elements is combined with the control system of a FANUC industrial robot and its integrated Robot Vision (iRVision) system to maintain the positioning accuracy of the robot tool.
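
The abstract does not include the stereo geometry itself; the sketch below is the textbook disparity-to-depth relation Z = f * B / d on which stereovision cameras rely, with the focal length, baseline, and disparities as invented example values, not values from the paper.

import numpy as np

# Textbook rectified-stereo relation (not the paper's code): depth Z = f * B / d,
# with focal length f in pixels, baseline B in metres, disparity d in pixels.
def disparity_to_depth(disparity_px: np.ndarray, focal_px: float, baseline_m: float) -> np.ndarray:
    d = np.where(disparity_px > 0, disparity_px, np.nan)  # mask invalid matches
    return focal_px * baseline_m / d

# Example values only: f = 800 px, B = 12 cm.
print(disparity_to_depth(np.array([32.0, 16.0, 8.0]), focal_px=800.0, baseline_m=0.12))
# -> [ 3.  6. 12.] metres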

15 pages, 2133 KB  
Article
A LiDAR SLAM and Visual-Servoing Fusion Approach to Inter-Zone Localization and Navigation in Multi-Span Greenhouses
by Chunyang Ni, Jianfeng Cai and Pengbo Wang
Agronomy 2025, 15(10), 2380; https://doi.org/10.3390/agronomy15102380 - 12 Oct 2025
Abstract
Greenhouse automation has become increasingly important in facility agriculture, yet multi-span glass greenhouses pose both scientific and practical challenges for autonomous mobile robots. Scientifically, solid-state LiDAR is vulnerable to glass-induced reflections, sparse geometric features, and narrow vertical fields of view, all of which undermine Simultaneous Localization and Mapping (SLAM)-based localization and mapping. Practically, large-scale crop production demands accurate inter-row navigation and efficient rail switching to reduce labor intensity and ensure stable operations. To address these challenges, this study presents an integrated localization-navigation framework for mobile robots in multi-span glass greenhouses. In the intralogistics area, the LiDAR Inertial Odometry-Simultaneous Localization and Mapping (LIO-SAM) pipeline was enhanced with reflection filtering, adaptive feature-extraction thresholds, and improved loop-closure detection, generating high-fidelity three-dimensional maps that were converted into two-dimensional occupancy grids for A-Star global path planning and Dynamic Window Approach (DWA) local control. In the cultivation area, where rails intersect with internal corridors, YOLOv8n-based rail-center detection combined with a pure-pursuit controller established a vision-servo framework for lateral rail switching and inter-row navigation. Field experiments demonstrated that the optimized mapping reduced the mean relative error by 15%. At a navigation speed of 0.2 m/s, the robot achieved a mean lateral deviation of 4.12 cm and a heading offset of 1.79°, while the vision-servo rail-switching system improved efficiency by 25.2%. These findings confirm the proposed framework’s accuracy, robustness, and practical applicability, providing strong support for intelligent facility-agriculture operations.
(This article belongs to the Section Precision and Digital Agriculture)
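
For the pure-pursuit controller the abstract names, a minimal sketch of the standard steering law (curvature kappa = 2 * sin(alpha) / L_d) follows; the look-ahead point and speed are example values, and the YOLOv8n detection stage is assumed to supply the rail-centre coordinates.

import math

# Textbook pure-pursuit steering law, not the authors' implementation:
# kappa = 2 * sin(alpha) / L_d, where alpha is the bearing of the look-ahead
# point in the robot frame and L_d the look-ahead distance.
def pure_pursuit_curvature(goal_x: float, goal_y: float) -> float:
    """goal_x, goal_y: look-ahead point in the robot frame (x forward, y left)."""
    L_d = math.hypot(goal_x, goal_y)
    if L_d < 1e-6:
        return 0.0  # already at the look-ahead point
    alpha = math.atan2(goal_y, goal_x)
    return 2.0 * math.sin(alpha) / L_d

# Example: rail centre detected 0.5 m ahead and 0.04 m to the left.
kappa = pure_pursuit_curvature(0.5, 0.04)
omega = 0.2 * kappa  # angular rate at the paper's 0.2 m/s navigation speed
print(f"curvature = {kappa:.3f} 1/m, omega = {omega:.3f} rad/s")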

24 pages, 2134 KB  
Article
Smart Risk Assessment and Adaptive Control Strategy Selection for Human–Robot Collaboration in Industry 5.0: An Intelligent Multi-Criteria Decision-Making Approach
by Ertugrul Ayyildiz, Tolga Kudret Karaca, Melike Cari, Bahar Yalcin Kavus and Nezir Aydin
Processes 2025, 13(10), 3206; https://doi.org/10.3390/pr13103206 - 9 Oct 2025
Abstract
The emergence of Industry 5.0 brings a paradigm shift towards collaborative environments where humans and intelligent robots work side-by-side, enabling personalized, flexible, and resilient manufacturing. However, integrating humans and robots introduces new operational and safety risks that require proactive and adaptive control strategies. This study proposes an intelligent multi-criteria decision-making framework for smart risk assessment and the selection of optimal adaptive control strategies in human–robot collaborative manufacturing settings. The proposed framework integrates advanced risk analytics, real-time data processing, and expert knowledge to evaluate alternative control strategies, such as real-time wearable sensor integration, vision-based dynamic safety zones, AI-driven behavior prediction models, haptic feedback, and self-learning adaptive robot algorithms. A cross-disciplinary panel of ten experts structures six main and eighteen sub-criteria spanning safety, adaptability, ergonomics, reliability, performance, and cost, with response time and implementation/maintenance costs modeled as cost-type criteria. Safety receives the largest weight; the most influential sub-criteria are collision avoidance efficiency, return on investment (ROI), and emergency response capability. The framework preserves linguistic semantics from elicitation to aggregation and provides a transparent, uncertainty-aware tool for selecting and phasing adaptive control strategies in Industry 5.0 collaborative cells.
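
The paper's aggregation preserves linguistic semantics, which the listing does not detail; as a simplified stand-in, the sketch below scores strategies with a plain weighted sum after min-max normalization, inverting the cost-type criteria. All weights, strategies, and values are invented placeholders.

import numpy as np

# Simplified weighted-sum MCDM scoring; all numbers are invented.
strategies = ["wearable sensors", "vision safety zones", "behavior prediction"]
# Columns: safety, adaptability, response time (cost), implementation cost (cost).
X = np.array([[0.90, 0.60, 0.20, 0.50],
              [0.85, 0.80, 0.10, 0.70],
              [0.80, 0.90, 0.30, 0.90]])
weights = np.array([0.40, 0.25, 0.20, 0.15])
is_cost = np.array([False, False, True, True])

norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # min-max per criterion
norm[:, is_cost] = 1.0 - norm[:, is_cost]                      # invert cost-type criteria
scores = norm @ weights
print(dict(zip(strategies, scores.round(3))))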

19 pages, 9302 KB  
Article
Real-Time Face Gesture-Based Robot Control Using GhostNet in a Unity Simulation Environment
by Yaseen
Sensors 2025, 25(19), 6090; https://doi.org/10.3390/s25196090 - 2 Oct 2025
Abstract
Unlike traditional control systems that rely on physical input devices, facial gesture-based interaction offers a contactless and intuitive method for operating autonomous systems. Recent advances in computer vision and deep learning have enabled the use of facial expressions and movements for command recognition in human–robot interaction. In this work, we propose a lightweight, real-time facial gesture recognition method, GhostNet-BiLSTM-Attention (GBA), which integrates GhostNet and BiLSTM with an attention mechanism, is trained on the FaceGest dataset, and is integrated with a 3D robot simulation in Unity. The system is designed to recognize predefined facial gestures such as head tilts, eye blinks, and mouth movements with high accuracy and low inference latency. Recognized gestures are mapped to specific robot commands and transmitted to a Unity-based simulation environment via socket communication across machines. This framework enables smooth and immersive robot control without the need for conventional controllers or sensors. Real-time evaluation demonstrates the system’s robustness and responsiveness under varied user and lighting conditions, achieving a classification accuracy of 99.13% on the FaceGest dataset. The GBA holds strong potential for applications in assistive robotics, contactless teleoperation, and immersive human–robot interfaces.
(This article belongs to the Special Issue Smart Sensing and Control for Autonomous Intelligent Unmanned Systems)
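
A hedged sketch of the socket link the abstract mentions: a recognized gesture label is mapped to a command and pushed to the Unity host. The host, port, gesture vocabulary, and newline-delimited JSON wire format are all assumptions made for illustration.

import json
import socket

# Hypothetical gesture-to-command link; host, port, and vocabulary are invented.
GESTURE_TO_COMMAND = {"head_tilt_left": "TURN_LEFT",
                      "blink_double": "STOP",
                      "mouth_open": "FORWARD"}

def send_command(gesture: str, host: str = "192.168.0.10", port: int = 5005) -> None:
    command = GESTURE_TO_COMMAND.get(gesture)
    if command is None:
        return  # ignore unrecognized gestures
    payload = json.dumps({"cmd": command}).encode()
    with socket.create_connection((host, port), timeout=1.0) as sock:
        sock.sendall(payload + b"\n")  # newline-delimited JSON for the Unity reader

# send_command("mouth_open")  # requires a listening Unity host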

25 pages, 12510 KB  
Article
Computer Vision-Based Optical Odometry Sensors: A Comparative Study of Classical Tracking Methods for Non-Contact Surface Measurement
by Ignas Andrijauskas, Marius Šumanas, Andrius Dzedzickis, Wojciech Tanaś and Vytautas Bučinskas
Sensors 2025, 25(19), 6051; https://doi.org/10.3390/s25196051 - 1 Oct 2025
Abstract
This article presents a principled framework for selecting and tuning classical computer vision algorithms in the context of optical displacement sensing. By isolating key factors that affect algorithm behavior—such as feed window size and motion step size—the study seeks to move beyond intuition-based practices and provide rigorous, repeatable performance evaluations. Computer vision-based optical odometry sensors offer non-contact, high-precision measurement capabilities essential for modern metrology and robotics applications. This paper presents a systematic comparative analysis of three classical tracking algorithms—phase correlation, template matching, and optical flow—for 2D surface displacement measurement using synthetic image sequences with subpixel-accurate ground truth. A virtual camera system generates controlled test conditions using a multi-circle trajectory pattern, enabling systematic evaluation of tracking performance using 400 × 400 and 200 × 200 pixel feed windows. The systematic characterization enables informed algorithm selection based on specific application requirements rather than empirical trial-and-error approaches.
(This article belongs to the Section Optical Sensors)
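
Of the three classical trackers compared, phase correlation is the most compact to demonstrate. The sketch below estimates a subpixel shift with OpenCV's cv2.phaseCorrelate on a synthetic 200 × 200 window; the random texture and 3 px shift stand in for the paper's virtual-camera sequences.

import cv2
import numpy as np

# Minimal phase-correlation demo on a synthetic 200 x 200 feed window;
# cv2.phaseCorrelate returns a subpixel (dx, dy) estimate and a confidence.
prev = np.float32(np.random.rand(200, 200))
curr = np.roll(prev, shift=3, axis=1)  # simulate a 3 px horizontal move
(dx, dy), response = cv2.phaseCorrelate(prev, curr)
print(f"estimated shift: dx = {dx:.2f}, dy = {dy:.2f}, response = {response:.2f}")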

21 pages, 2975 KB  
Article
ARGUS: An Autonomous Robotic Guard System for Uncovering Security Threats in Cyber-Physical Environments
by Edi Marian Timofte, Mihai Dimian, Alin Dan Potorac, Doru Balan, Daniel-Florin Hrițcan, Marcel Pușcașu and Ovidiu Chiraș
J. Cybersecur. Priv. 2025, 5(4), 78; https://doi.org/10.3390/jcp5040078 - 1 Oct 2025
Abstract
Cyber-physical infrastructures such as hospitals and smart campuses face hybrid threats that target both digital and physical domains. Traditional security solutions separate surveillance from network monitoring, leaving blind spots when attackers combine these vectors. This paper introduces ARGUS, an autonomous robotic platform designed to close this gap by correlating cyber and physical anomalies in real time. ARGUS integrates computer vision for facial and weapon detection with intrusion detection systems (Snort, Suricata) for monitoring malicious network activity. Operating through an edge-first microservice architecture, it ensures low latency and resilience without reliance on cloud services. Our evaluation covered five scenarios—access control, unauthorized entry, weapon detection, port scanning, and denial-of-service attacks—with each repeated ten times under varied conditions such as low light, occlusion, and crowding. Results show face recognition accuracy of 92.7% (500 samples), weapon detection accuracy of 89.3% (450 samples), and intrusion detection latency below one second, with minimal false positives. Audio analysis of high-risk sounds further enhanced situational awareness. Beyond performance, ARGUS addresses GDPR and ISO 27001 compliance and anticipates adversarial robustness. By unifying cyber and physical detection, ARGUS advances beyond state-of-the-art patrol robots, delivering comprehensive situational awareness and a practical path toward resilient, ethical robotic security.
(This article belongs to the Special Issue Cybersecurity Risk Prediction, Assessment and Management)
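
The abstract describes correlating cyber and physical anomalies in real time; the sketch below shows one plausible fusion rule, pairing Snort/Suricata alerts with vision detections that fall inside a short time window. The Alert structure, five-second window, and escalation rule are illustrative assumptions, not ARGUS internals.

from dataclasses import dataclass

# Illustrative fusion rule, not ARGUS source code: escalate when a network
# alert and a vision detection occur close together in time.
@dataclass
class Alert:
    source: str       # "snort", "suricata", or "vision"
    label: str        # e.g., "port_scan", "unauthorized_entry"
    timestamp: float  # seconds since epoch

def correlate(alerts: list[Alert], window_s: float = 5.0) -> list[tuple[Alert, Alert]]:
    cyber = [a for a in alerts if a.source in ("snort", "suricata")]
    physical = [a for a in alerts if a.source == "vision"]
    return [(c, p) for c in cyber for p in physical
            if abs(c.timestamp - p.timestamp) <= window_s]

alerts = [Alert("suricata", "port_scan", 100.0),
          Alert("vision", "unauthorized_entry", 102.5)]
for cyber_alert, vision_alert in correlate(alerts):
    print(f"ESCALATE: {cyber_alert.label} coincides with {vision_alert.label}")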

24 pages, 4993 KB  
Article
Skeletal Image Features Based Collaborative Teleoperation Control of the Double Robotic Manipulators
by Hsiu-Ming Wu and Shih-Hsun Wei
Electronics 2025, 14(19), 3897; https://doi.org/10.3390/electronics14193897 - 30 Sep 2025
Abstract
In this study, a vision-based remote and synchronized control scheme is proposed for two six-DOF robotic manipulators. Using an Intel RealSense D435 depth camera and the MediaPipe skeletal image feature technique, the operator’s 3D hand pose is captured and mapped to the robot’s workspace via coordinate transformation. Inverse kinematics is then applied to compute the joint angles needed for synchronized motion control. Implemented on the two robotic manipulators with the MoveIt framework, the system successfully achieves a collaborative teleoperation task in which an object is transferred from one robotic manipulator to the other. Further, moving-average filtering is used to enhance trajectory smoothness and stability. The framework demonstrates the feasibility and effectiveness of non-contact, vision-guided multi-robot control for applications in teleoperation, smart manufacturing, and education.
(This article belongs to the Section Systems & Control Engineering)
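
For the moving-average filtering step the abstract mentions, a minimal sketch follows: streamed 3D hand positions (e.g., a MediaPipe wrist landmark) are smoothed over a sliding window. The window size and the jittery input are invented for illustration.

import numpy as np

# Sliding-window moving average over streamed 3D hand positions.
class MovingAverage3D:
    def __init__(self, window: int = 5):
        self.window = window
        self.buffer: list[np.ndarray] = []

    def filter(self, xyz: np.ndarray) -> np.ndarray:
        self.buffer.append(xyz)
        if len(self.buffer) > self.window:
            self.buffer.pop(0)
        return np.mean(self.buffer, axis=0)

smoother = MovingAverage3D(window=5)
rng = np.random.default_rng(0)
for _ in range(10):
    raw = np.array([0.30, 0.10, 0.25]) + rng.normal(0.0, 0.005, 3)  # noisy wrist pose (m)
    print(smoother.filter(raw).round(4))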

26 pages, 2589 KB  
Article
Vision-Based Adaptive Control of Robotic Arm Using MN-MD3+BC
by Xianxia Zhang, Junjie Wu and Chang Zhao
Appl. Sci. 2025, 15(19), 10569; https://doi.org/10.3390/app151910569 - 30 Sep 2025
Abstract
To address traditional calibrated visual servo systems’ reliance on precise model calibration, as well as the high training cost and low efficiency of online reinforcement learning, this paper proposes a Multi-Network Mean Delayed Deep Deterministic Policy Gradient Algorithm with Behavior Cloning (MN-MD3+BC) for uncalibrated visual adaptive control of robotic arms. The algorithm improves upon the Twin Delayed Deep Deterministic Policy Gradient (TD3) framework by adopting an architecture with one actor network and three critic networks, along with corresponding target networks. Through a multi-critic integration mechanism, the mean output of the critic networks is used as the final Q-value estimate, effectively reducing the estimation bias of a single critic network. Meanwhile, a behavior cloning regularization term is introduced to address the distribution-shift problem common in offline reinforcement learning. Furthermore, to obtain a high-quality dataset, a data-recombination-driven dataset creation method is proposed, which reduces training costs and avoids the risks of real-world exploration. The trained policy network is embedded into the actual system as an adaptive controller, driving the robotic arm toward the target position through closed-loop control. The algorithm is applied to uncalibrated multi-degree-of-freedom robotic arm visual servo tasks, providing an adaptive, low-dependency solution for dynamic and complex scenarios. MATLAB simulations and experiments on the WPR1 platform demonstrate that, compared with traditional Jacobian-matrix-based model-free methods, the proposed approach offers advantages in tracking accuracy, error convergence speed, and system stability.
(This article belongs to the Special Issue Intelligent Control of Robotic System)
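
A hedged sketch of the multi-critic mean idea follows: three critics score each (state, action) pair and their mean output is taken as the Q-value estimate. Layer sizes, state/action dimensions, and the PyTorch layout are assumptions; the behavior-cloning term and target networks are omitted for brevity.

import torch
import torch.nn as nn

# Sketch: three critics score (state, action) pairs; the mean is the Q estimate.
def make_critic(state_dim: int, action_dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                         nn.Linear(256, 1))

critics = nn.ModuleList([make_critic(12, 6) for _ in range(3)])

def mean_q(state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    sa = torch.cat([state, action], dim=-1)
    return torch.stack([critic(sa) for critic in critics]).mean(dim=0)

q = mean_q(torch.randn(32, 12), torch.randn(32, 6))  # batch of 32 transitions
print(q.shape)  # torch.Size([32, 1])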

86 pages, 4498 KB  
Review
Autonomous Driving in Agricultural Machinery: Advancing the Frontier of Precision Agriculture
by Qingchao Liu, Ruohan Yu, Haoda Suo, Yingfeng Cai, Long Chen and Haobin Jiang
Actuators 2025, 14(9), 464; https://doi.org/10.3390/act14090464 - 22 Sep 2025
Abstract
Increasing global food production to address challenges from population growth, labor shortages, and climate change necessitates a significant enhancement of agricultural sustainability. Autonomous agricultural machinery, a recognized application of precision agriculture, offers a promising solution to boost productivity, resource efficiency, and environmental sustainability. This study presents a systematic review of autonomous driving technologies for agricultural machinery based on 506 rigorously selected publications. The review emphasizes three core aspects: navigation reliability assurance, motion control mechanisms for both vehicles and implements, and actuator fault-tolerance strategies in complex agricultural environments. Applications in farmland, orchards, and livestock farming demonstrate substantial potential. This study also discusses current challenges and future development trends. It aims to provide a reference and technical guidance for the engineering implementation of intelligent agricultural machinery and to support sustainable agricultural transformation.

35 pages, 6625 KB  
Review
Industrial Robotic Setups: Tools and Technologies for Tracking and Analysis in Industrial Processes
by Mantas Makulavičius, Juratė Jolanta Petronienė, Ernestas Šutinys, Vytautas Bučinskas and Andrius Dzedzickis
Appl. Sci. 2025, 15(18), 10249; https://doi.org/10.3390/app151810249 - 20 Sep 2025
Cited by 1
Abstract
Since their development, industrial robots have been used to enhance efficiency and reduce the need for manual labor. They have become a universal tool across all economic sectors, with software integration being essential for the effective operation of machines and processes. Robotic action accuracy is developing rapidly across all robot-involving activities, and significant breakthroughs have recently been made in algorithms for controlling robot actions, as well as in monitoring and planning software and hardware compatibility to prevent errors in real time. The integration of the Internet of Things, machine learning, and other advanced techniques has enhanced the intelligent features of industrial robots. As industrial automation advances, there is increasing demand for precise control in a variety of robotic arm applications, and current solutions must be refined to address the challenges posed by high connectivity, complex computations, and diverse scenarios. This review examines the application of vision-based models, particularly YOLO (You Only Look Once) variants, to object detection in industrial robotic environments, as well as other machine learning models for tasks such as classification and localization. Finally, it summarizes the results of the selected publications, compares the methods they present, identifies challenges for prospective object-tracking technologies, and suggests future research directions.
(This article belongs to the Special Issue Multimodal Robot Intelligence for Grasping and Manipulation)
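
As a minimal example of the YOLO-based detection the review surveys, the sketch below runs a single-frame inference with the ultralytics package; the yolov8n.pt weights and camera index are placeholders, and the snippet is not tied to any surveyed setup.

import cv2
from ultralytics import YOLO  # assumes the ultralytics package is installed

# Single-frame YOLO inference; weights file and camera index are placeholders.
model = YOLO("yolov8n.pt")
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
if ret:
    results = model(frame)
    for box in results[0].boxes:
        name = model.names[int(box.cls)]
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"{name}: ({x1:.0f},{y1:.0f})-({x2:.0f},{y2:.0f}) conf={float(box.conf):.2f}")
cap.release()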

18 pages, 2575 KB  
Article
Gestures in Motion: Exploring Referent-Free Elicitation Method for Hexapod Robot Control in Urban Environments
by Natalia Walczak, Julia Trzebuchowska, Wiktoria Krzyżańska, Franciszek Sobiech, Aleksandra Wysokińska, Andrzej Romanowski and Krzysztof Grudzień
Electronics 2025, 14(18), 3667; https://doi.org/10.3390/electronics14183667 - 16 Sep 2025
Abstract
Gesture elicitation studies (GES) are a promising method for designing intuitive interaction with mobile robots in urban environments. Traditional gesture elicitation methods rely on predefined commands, which may restrict creativity and adaptability. This study explores referent-free gesture elicitation as a method for discovering natural, user-defined gestures to control a hexapod robot. Through a three-phase user study, we explore gesture diversity, user confidence, and agreement rates across tasks. Results show that referent-free methods foster creativity but present consistency challenges, while referent-based approaches offer better convergence for familiar commands. As a proof of concept, we implemented a subset of gestures on an embedded platform using a stereo vision system and tested live classification with two gestures. This technical extension demonstrates early-stage feasibility and informs future deployments. Our findings contribute a design framework for human-centered gesture interfaces in mobile robotics, especially for dynamic public environments.
(This article belongs to the Special Issue Human Robot Interaction: Techniques, Applications, and Future Trends)
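
Agreement rates in elicitation studies are commonly computed with the Vatavu–Wobbrock formula, AR = |P|/(|P|−1) · Σ(|Pi|/|P|)² − 1/(|P|−1); the sketch below implements it for one referent under the assumption that this is the measure used, with an invented set of proposals.

from collections import Counter

# Agreement rate for one referent, after Vatavu & Wobbrock (2015).
def agreement_rate(proposals: list[str]) -> float:
    n = len(proposals)
    if n < 2:
        return 1.0
    share = sum((count / n) ** 2 for count in Counter(proposals).values())
    return (n / (n - 1)) * share - 1.0 / (n - 1)

print(agreement_rate(["wave", "wave", "point", "wave", "circle"]))  # -> 0.3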

25 pages, 33918 KB  
Article
A Digital Twin Framework for Visual Perception in Electrical Substations Under Dynamic Environmental Conditions
by Tiago Trindade Ribeiro, Andre Gustavo Scolari Conceição, Leonardo de Mello Honório, Iago Zanuti Biundini and Celso Moreira Lima
Sensors 2025, 25(18), 5689; https://doi.org/10.3390/s25185689 - 12 Sep 2025
Abstract
Electrical power substations are visually complex and safety-critical environments with restricted access and highly variable lighting; a digital twin (DT) framework provides a controlled and repeatable context for developing and validating vision-based inspections. This paper presents a novel sensor-centric DT framework that combines accurate 3D substation geometry with physically based lighting dynamics (realistic diurnal variation, interactive sun-pose control) and representative optical imperfections. A Render-In-The-Loop (RITL) pipeline generates synthetic datasets with configurable sensor models, variable lighting, and time-dependent material responses, including dynamic object properties. A representative case study evaluates how well the framework reproduces the typical perceptual challenges of substation inspection, and the results indicate strong potential to support the development, testing, and benchmarking of robotic perception algorithms in large-scale, complex environments. This research is useful to utility operators and asset management teams, robotics/computer vision researchers, and inspection and sensor platform vendors by enabling the generation of reproducible datasets, benchmarking, and pre-deployment testing.

19 pages, 7346 KB  
Article
Human–Robot Variable-Impedance Skill Transfer Learning Based on Dynamic Movement Primitives and a Vision System
by Honghui Zhang, Fang Peng and Miaozhe Cai
Sensors 2025, 25(18), 5630; https://doi.org/10.3390/s25185630 - 10 Sep 2025
Abstract
To enhance robotic adaptability in dynamic environments, this study proposes a multimodal framework for skill transfer. The framework integrates vision-based kinesthetic teaching with surface electromyography (sEMG) signals to estimate human impedance. We establish a Cartesian-space model of upper-limb stiffness, linearly mapping sEMG signals to end-point stiffness. For flexible task execution, dynamic movement primitives (DMPs) generalize learned skills across varying scenarios. An adaptive admittance controller, incorporating sEMG-modulated stiffness, is developed and validated on a UR5 robot. Experiments involving elastic-band stretching demonstrate that the system successfully transfers human impedance characteristics to the robot, enhancing stability, environmental adaptability, and safety during physical interaction.
(This article belongs to the Section Sensors and Robotics)
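
One plausible reading of the linear sEMG-to-stiffness mapping the abstract describes is sketched below: normalized muscle activation interpolates end-point stiffness between rest and maximum values. The stiffness bounds and MVC normalization are invented assumptions, not the paper's calibration.

import numpy as np

# Linear interpolation of end-point stiffness from normalized sEMG activation;
# bounds are illustrative, not from the paper.
K_MIN, K_MAX = 100.0, 800.0  # end-point stiffness bounds (N/m), invented

def emg_to_stiffness(emg_rms: float, mvc: float) -> float:
    """emg_rms: rectified/filtered sEMG amplitude; mvc: amplitude at maximum
    voluntary contraction."""
    activation = float(np.clip(emg_rms / mvc, 0.0, 1.0))
    return K_MIN + (K_MAX - K_MIN) * activation

for level in (0.0, 0.5, 1.0):
    print(f"activation {level:.1f} -> {emg_to_stiffness(level, 1.0):.0f} N/m")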

20 pages, 3941 KB  
Article
Self-Supervised Voice Denoising Network for Multi-Scenario Human–Robot Interaction
by Mu Li, Wenjin Xu, Chao Zeng and Ning Wang
Biomimetics 2025, 10(9), 603; https://doi.org/10.3390/biomimetics10090603 - 9 Sep 2025
Abstract
Human–robot interaction (HRI) via voice command has significantly advanced in recent years, with large Vision–Language–Action (VLA) models demonstrating particular promise in human–robot voice interaction. However, these systems still struggle with environmental noise contamination during voice interaction and lack a specialized denoising network for multi-speaker command isolation in an overlapping speech scenario. To overcome these challenges, we introduce a method to enhance voice command-based HRI in noisy environments, leveraging synthetic data and a self-supervised denoising network to enhance its real-world applicability. Our approach focuses on improving self-supervised network performance in denoising mixed-noise audio through training data scaling. Extensive experiments show our method outperforms existing approaches in simulation and achieves 7.5% higher accuracy than the state-of-the-art method in noisy real-world environments, enhancing voice-guided robot control.
(This article belongs to the Special Issue Intelligent Human–Robot Interaction: 4th Edition)
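
The abstract's synthetic-data step can be illustrated by the standard recipe of mixing clean speech with noise at a target SNR; the sketch below follows the usual SNR definition and uses a sine tone as a stand-in signal. It is an assumption-laden illustration, not the paper's pipeline.

import numpy as np

# Mix clean audio with noise at a target SNR (standard definition).
def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220 * np.linspace(0, 1, 16000, endpoint=False))  # 1 s at 16 kHz
noisy = mix_at_snr(clean, rng.normal(size=16000), snr_db=5.0)
print(noisy.shape)  # (16000,)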
