Search Results (92)

Search Parameters:
Keywords = robotic imitation learning

19 pages, 13574 KB  
Article
Deep Reinforcement Learning Control of a Hexapod Robot
by Taesoo Kim, Minjun Choi, Seunguk Choi, Taeuan Yoon and Dongil Choi
Actuators 2026, 15(1), 33; https://doi.org/10.3390/act15010033 - 5 Jan 2026
Viewed by 70
Abstract
Recent advances in legged robotics have highlighted deep reinforcement learning (DRL)-based controllers for their robust adaptability to diverse, unstructured environments. While position-based DRL controllers achieve high tracking accuracy, they offer limited disturbance rejection, which degrades walking stability; torque-based DRL controllers can mitigate this issue but typically require extensive time and trial-and-error to converge. To address these challenges, we propose a Real-Time Motion Generator (RTMG). At each time step, RTMG kinematically synthesizes end-effector trajectories from target translational and angular velocities (yaw rate) and step length, then maps them to joint angles via inverse kinematics to produce imitation data. The RL agent uses this imitation data as a torque bias, which is gradually annealed during training to enable fully autonomous behavior. We further combine the RTMG-generated imitation data with a decaying action priors scheme to ensure both initial stability and motion diversity. The proposed training pipeline, implemented in NVIDIA Isaac Gym with Proximal Policy Optimization (PPO), reliably converges to the target gait pattern. The trained controller is TensorRT-optimized and runs at 50 Hz on a Jetson Nano; relative to a position-based baseline, torso oscillation is reduced by 24.88% in simulation and 21.24% on hardware, demonstrating the effectiveness of the approach.
(This article belongs to the Section Actuators for Robotics)
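The torque-bias annealing described above is compact enough to sketch. Below is a minimal illustration of the general pattern; the linear schedule, the function names, and the joint count are all assumptions for illustration, not details from the paper:

```python
import numpy as np

def anneal_coeff(step: int, anneal_steps: int) -> float:
    """Linearly decay the imitation bias from 1 to 0 (assumed schedule)."""
    return max(0.0, 1.0 - step / anneal_steps)

def apply_torque_bias(policy_torque: np.ndarray,
                      reference_torque: np.ndarray,
                      step: int,
                      anneal_steps: int = 100_000) -> np.ndarray:
    """Blend a reference torque derived from imitation data into the policy output.

    Early in training the reference dominates, giving a stable gait;
    as the coefficient anneals to zero, the policy acts autonomously.
    """
    alpha = anneal_coeff(step, anneal_steps)
    return policy_torque + alpha * reference_torque

# Toy usage with 18 joints (a hypothetical hexapod layout).
policy_out = np.zeros(18)
reference = np.random.uniform(-1.0, 1.0, size=18)
torque = apply_torque_bias(policy_out, reference, step=25_000)
```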

18 pages, 6246 KB  
Article
Cross-Modality Alignment Perception and Multi-Head Self-Attention Mechanism for Vision-Language-Action of Humanoid Robot
by Bin Ren and Diwei Shi
Sensors 2026, 26(1), 165; https://doi.org/10.3390/s26010165 - 26 Dec 2025
Viewed by 345
Abstract
For a humanoid robot, it is difficult to predict a motion trajectory through end-to-end imitation learning when performing complex operations and multi-step processes, which leads to jitter in the robot arm. To alleviate this problem and reduce the computational complexity of the self-attention module in Vision-Language-Action (VLA) operations, we propose a memory-gated filtering attention model that improves the multi-head self-attention mechanism. We then design a cross-modal alignment perception stage for training, combined with a few-shot data-collection strategy for key steps. Experimental results show that the proposed scheme significantly improves the task success rate and alleviates robot-arm jitter, while reducing video memory usage by 72% and cutting training time from 1.35 s to 0.129 s per batch. The approach maintains high action accuracy and robustness on the humanoid robot.
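A generic gated-attention sketch may help make the idea concrete. The gate design, dimensions, and memory handling below are assumptions in the spirit of the abstract, not the authors' architecture:

```python
import torch
import torch.nn as nn

class GatedSelfAttention(nn.Module):
    """Generic memory-gated self-attention sketch (not the paper's exact design).

    A learned sigmoid gate filters each token's attention output against
    a cached memory, one common way to suppress redundant features and
    stabilize multi-step action prediction.
    """
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x)            # standard multi-head self-attention
        g = torch.sigmoid(self.gate(x))        # per-feature gate in [0, 1]
        return g * out + (1.0 - g) * memory    # blend with the cached memory

x = torch.randn(2, 16, 64)                     # (batch, tokens, dim), toy sizes
layer = GatedSelfAttention(64)
y = layer(x, memory=torch.zeros_like(x))
```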

18 pages, 1678 KB  
Article
Body Knowledge and Emotion Recognition in Preschool Children: A Comparative Study of Human Versus Robot Tutors
by Alice Araguas, Arnaud Blanchard, Sébastien Derégnaucourt, Adrien Chopin and Bahia Guellai
Behav. Sci. 2026, 16(1), 29; https://doi.org/10.3390/bs16010029 - 23 Dec 2025
Viewed by 304
Abstract
Social robots are increasingly integrated into early childhood education, yet limited research exists examining preschoolers’ learning from robotic versus human demonstrators across embodied tasks. This study investigated whether children aged 3 to 6 demonstrate comparable performance when learning body-centered tasks from a humanoid robot compared to a human demonstrator. Sixty-two typically developing children were randomly assigned to a robot or a human condition. Participants completed three tasks: body part comprehension and production, body movement imitation, and emotion recognition from body postures. Performance was measured using standardized protocols. No significant main effects of demonstrator type emerged across most tasks. However, age significantly predicted performance across all measures, with systematic improvements between ages 3 and 6. A significant age × demonstrator interaction was observed for sequential motor imitation, with stronger age effects in the human demonstrator condition. Preschool children demonstrate comparable performance when interacting with a humanoid robot versus a human in body-centered tasks, though motor imitation shows differential developmental trajectories. These findings suggest that appropriately designed social robots may serve as supplementary pedagogical tools for embodied learning in early childhood education under specific conditions. The primacy of developmental effects highlights the importance of age-appropriate design in both traditional and technology-enhanced educational contexts.

30 pages, 4814 KB  
Article
Cross-Embodiment Kinematic Behavioral Cloning (X-EKBC): An Energy-Based Framework for Human–Robot Imitation Learning with the Embodiment Gap
by Yoshiki Tsunekawa, Masaki Tanaka and Kosuke Sekiyama
Machines 2025, 13(12), 1134; https://doi.org/10.3390/machines13121134 - 10 Dec 2025
Viewed by 533
Abstract
In imitation learning with the embodiment gap, directly transferring human motions to robots is challenging due to differences in body structures. Therefore, it is necessary to reconstruct human motions in accordance with each robot’s embodiment. Our previous work focused on the right arm of a humanoid robot, which limited the generality of the approach. To address this, we propose Cross-Embodiment Kinematic Behavioral Cloning (X-EKBC), an imitation learning framework that enables movement-level imitation on a one-to-one basis between humans and multiple robots with embodiment gaps. We introduce a joint matrix that represents the structural correspondence between the human and robot bodies; by solving kinematics based on this matrix, the system can efficiently reconstruct motions adapted to each robot’s embodiment. Furthermore, by employing Implicit Behavioral Cloning (IBC), the proposed method achieves both imitation learning of the reconstructed motions and quantitative evaluation of embodiment gaps through energy-based modeling. As a result, motion reconstruction through the joint matrix becomes feasible, enabling both imitation learning and quantitative embodiment evaluation based on reconstructed behaviors. Future work will extend this framework toward motion-level imitation that captures higher-level behavioral outcomes.
(This article belongs to the Special Issue Robots with Intelligence: Developments and Applications)
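The energy-based modeling behind IBC can be sketched generically. The network sizes, uniform negative sampling, and InfoNCE-style loss below follow the common implicit-BC recipe and are assumptions, not the X-EKBC implementation:

```python
import torch
import torch.nn as nn

class EnergyModel(nn.Module):
    """Generic implicit-BC energy network E(s, a): low energy = good action."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def infonce_loss(model, state, expert_action, num_negatives: int = 64):
    """InfoNCE-style IBC objective: the expert action should get the lowest energy."""
    neg = torch.rand(num_negatives, expert_action.shape[-1]) * 2 - 1  # uniform negatives
    actions = torch.cat([expert_action.unsqueeze(0), neg], dim=0)
    energies = model(state.expand(actions.shape[0], -1), actions)
    # Softmax over negated energies; label 0 marks the expert action.
    return nn.functional.cross_entropy(-energies.unsqueeze(0),
                                       torch.zeros(1, dtype=torch.long))

model = EnergyModel(state_dim=10, action_dim=7)   # toy dimensions
loss = infonce_loss(model, torch.randn(10), torch.rand(7) * 2 - 1)
```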

25 pages, 2873 KB  
Article
Dynamic Attention Analysis of Body Parts in Transformer-Based Human–Robot Imitation Learning with the Embodiment Gap
by Yoshiki Tsunekawa and Kosuke Sekiyama
Machines 2025, 13(12), 1133; https://doi.org/10.3390/machines13121133 - 10 Dec 2025
Viewed by 447
Abstract
In imitation learning between humans and robots, the embodiment gap is a key challenge. By focusing on a specific body part and compensating for the rest according to the robot’s size, the embodiment gap can be overcome. In this paper, we analyze dynamic attention to body parts in imitation learning between humans and robots based on a Transformer model. To adapt human imitation movements to a robot, we solve forward and inverse kinematics using the Levenberg–Marquardt method and extract features with the k-means method to make the data suitable for Transformer input. The imitation learning process is carried out by the Transformer, and UMAP is employed to visualize its attention layer. As a result, the system enabled imitation that focuses on multiple body parts across the human–robot embodiment gap, revealing which body parts receive attention over time and how they relate within the robot’s acquired imitation movements.
(This article belongs to the Special Issue Robots with Intelligence: Developments and Applications)
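The analysis pattern, extracting attention weights for visualization, can be sketched with standard PyTorch machinery. The token layout and the commented UMAP step are assumptions, not the authors' pipeline:

```python
import torch
import torch.nn as nn

# Run body-part feature tokens through self-attention and keep the
# attention map for inspection.
attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
tokens = torch.randn(1, 20, 32)      # 20 body-part tokens of dim 32 (toy layout)

with torch.no_grad():
    _, weights = attn(tokens, tokens, tokens)   # weights averaged over heads
attention_map = weights.squeeze(0).numpy()      # (20, 20): who attends to whom

# Hypothetical visualization step, assuming the umap-learn package:
# import umap
# embedding = umap.UMAP(n_components=2).fit_transform(attention_map)
```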

19 pages, 8700 KB  
Article
Human-Inspired Force-Motion Imitation Learning with Dynamic Response for Adaptive Robotic Manipulation
by Yuchuang Tong, Haotian Liu, Tianbo Yang and Zhengtao Zhang
Biomimetics 2025, 10(12), 825; https://doi.org/10.3390/biomimetics10120825 - 9 Dec 2025
Viewed by 436
Abstract
Recent advances in bioinspired robotics highlight the growing demand for dexterous, adaptive control strategies that allow robots to interact naturally, safely, and efficiently with dynamic, contact-rich environments. Yet, achieving robust adaptability and reflex-like responsiveness to unpredictable disturbances remains a fundamental challenge. This paper presents a bioinspired imitation learning framework that models human adaptive dynamics to jointly acquire and generalize motion and force skills, enabling compliant and resilient robot behavior. The proposed framework integrates hybrid force–motion learning with dynamic response mechanisms, achieving broad skill generalization without reliance on external sensing modalities. A momentum-based force observer is combined with dynamic movement primitives (DMPs) to enable accurate force estimation and smooth motion coordination, while a broad learning system (BLS) refines the DMP forcing function through style modulation, feature augmentation, and adaptive weight tuning. In addition, an adaptive radial basis function neural network (RBFNN) controller dynamically adjusts control parameters to ensure precise, low-latency skill reproduction and safe physical interaction. Simulations and real-world experiments confirm that the proposed framework achieves human-like adaptability, robustness, and scalability, attaining a competitive learning time of 5.56 s and a rapid generation time of 0.036 s. These results demonstrate its efficiency and practicality for real-time applications and offer a lightweight yet powerful solution for bioinspired intelligent control in complex and unstructured environments.
(This article belongs to the Section Locomotion and Bioinspired Robotics)
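The DMP component is standard enough to show in textbook form. The sketch below integrates a one-dimensional discrete DMP with a placeholder forcing term; the paper's BLS-modulated forcing function and force observer are not reproduced here:

```python
import numpy as np

def dmp_rollout(y0, g, forcing, T=1.0, dt=0.002,
                alpha=25.0, beta=25.0 / 4.0, alpha_x=8.0):
    """Integrate a 1-D discrete dynamic movement primitive (standard form).

    forcing(x) is the learned nonlinear term; here it is a zero placeholder.
    """
    y, dy, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(T / dt)):
        f = forcing(x) * x * (g - y0)          # phase- and goal-scaled forcing
        ddy = alpha * (beta * (g - y) - dy) + f
        dy += ddy * dt
        y += dy * dt
        x += -alpha_x * x * dt                 # canonical system decays 1 -> 0
        traj.append(y)
    return np.array(traj)

path = dmp_rollout(y0=0.0, g=1.0, forcing=lambda x: 0.0)
```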

18 pages, 1653 KB  
Article
Sim2Real Transfer of Imitation Learning of Motion Control for Car-like Mobile Robots Using Digital Twin Testbed
by Narges Mohaghegh, Hai Wang and Amirmehdi Yazdani
Robotics 2025, 14(12), 180; https://doi.org/10.3390/robotics14120180 - 30 Nov 2025
Viewed by 841
Abstract
Reliable transfer of control policies from simulation to real-world robotic systems remains a central challenge in robotics, particularly for car-like mobile robots. Digital Twin (DT) technology provides a robust framework for high-fidelity replication of physical platforms and bi-directional synchronization between virtual and real environments. In this study, a DT-based testbed is developed to train and evaluate an imitation learning (IL) control framework in which a neural network policy learns to replicate the behavior of a hybrid Model Predictive Control (MPC)–Backstepping expert controller. The DT framework ensures consistent benchmarking between simulated and physical execution, supporting a structured and safe process for policy validation and deployment. Experimental analysis demonstrates that the learned policy effectively reproduces expert behavior, achieving bounded trajectory-tracking errors and stable performance across simulation and real-world tests. The results confirm that DT-enabled IL provides a viable pathway for Sim2Real transfer, accelerating controller development and deployment in autonomous mobile robotics.
(This article belongs to the Section AI in Robotics)
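The imitation step, cloning an expert controller's actions, reduces to supervised regression. The sketch below uses a placeholder linear expert in place of the paper's MPC–Backstepping controller; dimensions and hyperparameters are illustrative assumptions:

```python
import torch
import torch.nn as nn

def expert_controller(state: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for the MPC-Backstepping expert."""
    return -0.5 * state[..., :2]

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(200):                            # toy training loop
    states = torch.randn(256, 4)                # states sampled in the digital twin
    with torch.no_grad():
        targets = expert_controller(states)     # expert action labels
    loss = nn.functional.mse_loss(policy(states), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()
```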

24 pages, 2109 KB  
Article
ToggleMimic: A Two-Stage Policy for Text-Driven Humanoid Whole-Body Control
by Weifeng Zheng, Shigang Wang and Bohua Qian
Sensors 2025, 25(23), 7259; https://doi.org/10.3390/s25237259 - 28 Nov 2025
Viewed by 991
Abstract
For humanoid robots to interact naturally with humans and seamlessly integrate into daily life, natural language serves as an essential communication medium. While recent advances in imitation learning have enabled robots to acquire complex motions through expert demonstration, traditional approaches often rely on rigid task specifications or single-modal inputs, limiting their ability to interpret high-level semantic instructions (e.g., natural language commands) or dynamically switch between actions. Directly translating natural language into executable control commands remains a significant challenge. To address this, we propose ToggleMimic, an end-to-end imitation learning framework that generates robotic motions from textual instructions, enabling language-driven multi-task control. In contrast to end-to-end methods that struggle with generalization or single-action models that lack flexibility, our ToggleMimic framework uniquely combines the following: (1) a two-stage policy distillation that efficiently bridges the sim-to-real gap, (2) a lightweight cross-attention mechanism for interpretable text-to-action mapping, and (3) a gating network that enhances robustness to linguistic variations. Extensive simulation and real-world experiments demonstrate the framework’s effectiveness, generalization capability, and robust text-guided control performance. This work establishes an efficient, interpretable, and scalable learning paradigm for cross-modal semantic-driven autonomous robot control.
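The lightweight cross-attention idea, mapping text tokens to an action output through a learned query, can be sketched as follows. The dimensions, head count, and single-query design are assumptions, not the ToggleMimic architecture:

```python
import torch
import torch.nn as nn

class TextToActionAttention(nn.Module):
    """Generic cross-attention from text tokens to a learned action query."""
    def __init__(self, dim: int = 64, num_actions: int = 12):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))   # learned action query
        self.cross = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.head = nn.Linear(dim, num_actions)

    def forward(self, text_tokens: torch.Tensor) -> torch.Tensor:
        q = self.query.expand(text_tokens.shape[0], -1, -1)
        ctx, _ = self.cross(q, text_tokens, text_tokens)    # attend over text
        return self.head(ctx.squeeze(1))                    # action logits

logits = TextToActionAttention()(torch.randn(2, 10, 64))    # (batch, tokens, dim)
```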

23 pages, 4428 KB  
Article
Learning to Navigate in Mixed Human–Robot Crowds via an Attention-Driven Deep Reinforcement Learning Framework
by Ibrahim K. Kabir, Muhammad F. Mysorewala, Yahya I. Osais and Ali Nasir
Mach. Learn. Knowl. Extr. 2025, 7(4), 145; https://doi.org/10.3390/make7040145 - 13 Nov 2025
Viewed by 876
Abstract
The rapid growth of technology has introduced robots into daily life, necessitating navigation frameworks that enable safe, human-friendly movement while accounting for social aspects. Such methods must also scale to situations with multiple humans and robots moving simultaneously. Recent advances in Deep Reinforcement Learning (DRL) have enabled policies that incorporate such social norms into navigation. This work presents a socially aware navigation framework for mobile robots operating in environments shared with humans and other robots. The approach, based on single-agent DRL, models all interaction types between the ego robot, humans, and other robots. Training uses a reward function that balances task completion, collision avoidance, and the maintenance of comfortable distances from humans. An attention mechanism enables the framework to extract knowledge about the relative importance of surrounding agents, guiding safer and more efficient navigation. Our approach is tested in both dynamic and static obstacle environments. To improve training efficiency and promote socially appropriate behaviors, Imitation Learning is employed. Comparative evaluations with state-of-the-art methods highlight the advantages of our approach, especially in enhancing safety by reducing collisions and preserving comfort distances. The results confirm the effectiveness of our learned policy and its ability to extract socially relevant knowledge in human–robot environments where social compliance is essential for deployment.
(This article belongs to the Section Learning)
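The three-term reward structure named in the abstract (task progress, collision avoidance, comfort distance) is easy to sketch. The weights, radii, and penalty values below are illustrative assumptions, not the paper's settings:

```python
def social_nav_reward(dist_to_goal: float, prev_dist_to_goal: float,
                      min_human_dist: float, collided: bool,
                      comfort_radius: float = 0.5,
                      collision_penalty: float = -10.0) -> float:
    """Sketch of a socially aware navigation reward with three terms:
    progress toward the goal, a collision penalty, and a discomfort
    penalty when the robot enters a human's comfort zone."""
    if collided:
        return collision_penalty
    reward = 2.0 * (prev_dist_to_goal - dist_to_goal)      # progress term
    if min_human_dist < comfort_radius:                    # discomfort term
        reward -= 0.25 * (comfort_radius - min_human_dist)
    return reward

r = social_nav_reward(3.8, 4.0, min_human_dist=0.4, collided=False)
```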

23 pages, 7306 KB  
Article
Two-Layered Reward Reinforcement Learning in Humanoid Robot Motion Tracking
by Jiahong Xu, Zhiwei Zheng and Fangyuan Ren
Mathematics 2025, 13(21), 3445; https://doi.org/10.3390/math13213445 - 29 Oct 2025
Viewed by 1842
Abstract
In reinforcement learning (RL), reward function design is critical to the learning efficiency and final performance of agents. However, in complex tasks such as humanoid motion tracking, traditional statically weighted reward functions struggle to adapt to shifting learning priorities across training stages, and designing a suitable shaping reward is problematic. To address these challenges, this paper proposes a two-layered reward reinforcement learning framework. The framework decomposes the reward into two layers: an upper-level goal reward that measures task completion, and a lower-level optimizing reward that includes auxiliary objectives such as stability, energy consumption, and motion smoothness. The key innovation lies in the online optimization of the lower-level reward weights via an online meta-heuristic optimization algorithm. This online adaptivity enables goal-conditioned reward shaping, allowing the reward structure to evolve autonomously without requiring expert demonstrations, thereby improving learning robustness and interpretability. The framework is tested on a gymnastic motion tracking problem for the Unitree G1 humanoid robot in the Isaac Gym simulation environment. The experimental results show that, compared to a static reward baseline, the proposed framework achieves 7.58% and 10.30% improvements in upper-body and lower-body link tracking accuracy, respectively. The resulting motions also exhibit better synchronization and reduced latency. The simulation results demonstrate the effectiveness of the framework in promoting efficient exploration, accelerating convergence, and enhancing motion imitation quality.
(This article belongs to the Special Issue Nonlinear Control Systems for Robotics and Automation)
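The two-layered reward decomposition can be sketched directly, with a toy random-search step standing in for the paper's online meta-heuristic optimizer. All weights and terms below are assumptions:

```python
import numpy as np

def two_layer_reward(task_completion: float,
                     aux_terms: np.ndarray,
                     weights: np.ndarray) -> float:
    """Upper-level goal reward plus weighted lower-level auxiliary terms."""
    return task_completion + float(np.dot(weights, aux_terms))

def perturb_weights(weights: np.ndarray, rng, scale: float = 0.05) -> np.ndarray:
    """One step of a toy random-search meta-heuristic over the lower-level
    weights, a stand-in for the paper's online optimizer."""
    candidate = weights + rng.normal(0.0, scale, size=weights.shape)
    return np.clip(candidate, 0.0, 1.0)

rng = np.random.default_rng(0)
w = np.array([0.3, 0.2, 0.1])                 # stability, energy, smoothness
aux = np.array([0.8, -0.1, 0.5])              # toy per-step measurements
r = two_layer_reward(task_completion=1.0, aux_terms=aux, weights=w)
w = perturb_weights(w, rng)                   # weights evolve during training
```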

24 pages, 38382 KB  
Article
Skeleton Information-Driven Reinforcement Learning Framework for Robust and Natural Motion of Quadruped Robots
by Huiyang Cao, Hongfa Lei, Yangjun Liu, Zheng Chen, Shuai Shi, Bingquan Li, Weichao Xu and Zhi-Xin Yang
Symmetry 2025, 17(11), 1787; https://doi.org/10.3390/sym17111787 - 22 Oct 2025
Viewed by 1805
Abstract
Legged robots have great potential in complex environments, but achieving robust and natural locomotion remains difficult due to challenges in generating smooth gaits and resisting disturbances. This article presents a novel reinforcement learning framework that integrates a skeleton-aware graph neural network (GNN), a single-stage teacher–student architecture, a system-response model, and a Wasserstein Adversarial Motion Priors (wAMP) module. The skeleton-aware GNN enriches observations by encoding key node information and link properties, providing structured body information and better spatial awareness on irregular terrains. Unlike conventional two-stage approaches, this method jointly trains teacher and student policies to accelerate learning and improve sim-to-real transfer using hybrid advantage estimation (HAE). The system-response model further enhances robustness by predicting future observations from historical states via contrastive learning, enabling the policy to anticipate terrain variations and external disturbances. Finally, wAMP provides a more stable adversarial imitation method for fitting expert datasets of both flat ground and stair locomotion. Experiments on quadruped robots demonstrate that the proposed approach achieves more natural gaits and stronger robustness than existing baselines.
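At its core, a Wasserstein adversarial motion prior reduces to a WGAN-style critic over short motion windows. The sketch below shows that core and omits the gradient penalty usually needed for stable training; the feature sizes are assumptions, not the paper's wAMP module:

```python
import torch
import torch.nn as nn

# Critic scoring short motion-feature windows; higher = more expert-like.
critic = nn.Sequential(nn.Linear(30, 128), nn.ReLU(), nn.Linear(128, 1))

def critic_loss(expert_windows: torch.Tensor,
                policy_windows: torch.Tensor) -> torch.Tensor:
    """WGAN-style objective: the critic raises expert scores and lowers
    policy scores, so minimizing this widens the score gap."""
    return critic(policy_windows).mean() - critic(expert_windows).mean()

def style_reward(window: torch.Tensor) -> torch.Tensor:
    """The critic score doubles as a style reward for the RL policy."""
    with torch.no_grad():
        return critic(window)

expert = torch.randn(64, 30)                   # toy motion-feature windows
policy = torch.randn(64, 30)
loss = critic_loss(expert, policy)             # minimized by the critic
```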

19 pages, 7767 KB  
Article
Fabric Flattening with Dual-Arm Manipulator via Hybrid Imitation and Reinforcement Learning
by Youchun Ma, Fuyuki Tokuda, Akira Seino, Akinari Kobayashi, Mitsuhiro Hayashibe and Kazuhiro Kosuge
Machines 2025, 13(10), 923; https://doi.org/10.3390/machines13100923 - 6 Oct 2025
Viewed by 794
Abstract
Fabric flattening is a critical pre-processing step for automated garment manufacturing. Most existing approaches employ single-arm robotic systems that act at a single contact point. Due to the nonlinear and deformable dynamics of fabric, such systems often require multiple actions to achieve a fully flattened state. This study introduces a dual-arm fabric-flattening method based on a cascaded Proposal–Action (PA) network with a hybrid training framework. The PA network is first trained through imitation learning from human demonstrations and is subsequently refined through reinforcement learning with real-world flattening feedback. Experimental results demonstrate that the hybrid training framework substantially improves the overall flattening success rate compared with a policy trained only on human demonstrations. The success rate for a single flattening operation increases from 74% to 94%, while the overall success rate improves from 82% to 100% after two rounds of training. Furthermore, the learned policy, trained exclusively on baseline fabric, generalizes effectively to fabrics with varying thicknesses and stiffnesses. The approach reduces the number of required flattening actions while maintaining a high success rate, thereby enhancing both efficiency and practicality in automated garment manufacturing.
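The two-phase recipe, imitation pretraining followed by reward-driven refinement, can be sketched schematically. The reward-weighted regression in phase 2 is one common way to fold real-world feedback into a cloned policy; it is an assumption here, not the paper's PA-network update:

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Phase 1: imitation learning from (state, action) demonstrations.
demo_s, demo_a = torch.randn(512, 8), torch.randn(512, 2)   # toy demo set
for _ in range(100):
    loss = nn.functional.mse_loss(policy(demo_s), demo_a)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Phase 2: reward-weighted regression on self-generated actions, with
# measured flattening improvement as the weight (random stand-in here).
for _ in range(50):
    s = torch.randn(64, 8)
    with torch.no_grad():
        a_try = policy(s) + 0.1 * torch.randn(64, 2)   # exploratory actions
        reward = torch.rand(64)                        # e.g., coverage gain
    per_sample = ((policy(s) - a_try) ** 2).sum(-1)
    loss = (reward * per_sample).mean()                # imitate rewarded actions
    opt.zero_grad()
    loss.backward()
    opt.step()
```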

25 pages, 4810 KB  
Review
Deep Reinforcement Learning and Imitation Learning for Autonomous Driving: A Review in the CARLA Simulation Environment
by Piotr Czechowski, Bartosz Kawa, Mustafa Sakhai and Maciej Wielgosz
Appl. Sci. 2025, 15(16), 8972; https://doi.org/10.3390/app15168972 - 14 Aug 2025
Cited by 1 | Viewed by 6668
Abstract
Autonomous driving is a complex and fast-evolving domain at the intersection of robotics, machine learning, and control systems. This paper provides a systematic review of recent developments in reinforcement learning (RL) and imitation learning (IL) approaches for autonomous vehicle control, with a dedicated focus on the CARLA simulator, an open-source, high-fidelity platform that has become a standard for learning-based autonomous vehicle (AV) research. We analyze RL-based and IL-based studies, extracting and comparing their formulations of state, action, and reward spaces. Special attention is given to the design of reward functions, control architectures, and integration pipelines. Comparative graphs and diagrams illustrate performance trade-offs. We further highlight gaps in generalization to real-world driving scenarios, robustness under dynamic environments, and scalability of agent architectures. Despite rapid progress, existing autonomous driving systems exhibit significant limitations. For instance, studies show that end-to-end RL models can suffer from performance degradation of up to 35% when exposed to unseen weather or town conditions, and IL agents trained solely on expert demonstrations exhibit up to 40% higher collision rates in novel environments. Furthermore, reward misspecification remains a critical issue: over 20% of reported failures in simulated environments stem from poorly calibrated reward signals. Generalization gaps, especially in RL, also manifest in task-specific overfitting, with agents failing up to 60% of the time when faced with dynamic obstacles not encountered during training. These persistent shortcomings underscore the need for more robust and sample-efficient learning strategies. Finally, we discuss hybrid paradigms that integrate IL and RL, such as Generative Adversarial Imitation Learning, and propose future research directions.
(This article belongs to the Special Issue Design and Applications of Real-Time Embedded Systems)

19 pages, 3818 KB  
Article
Robotic Arm Trajectory Planning in Dynamic Environments Based on Self-Optimizing Replay Mechanism
by Pengyao Xu, Chong Di, Jiandong Lv, Peng Zhao, Chao Chen and Ruotong Wang
Sensors 2025, 25(15), 4681; https://doi.org/10.3390/s25154681 - 29 Jul 2025
Cited by 1 | Viewed by 2002
Abstract
In complex dynamic environments, robotic arms face multiple challenges such as real-time environmental changes, high-dimensional state spaces, and strong uncertainties. Trajectory planning tasks based on deep reinforcement learning (DRL) suffer from difficulties in acquiring human expert strategies, low experience utilization (leading to slow convergence), and unreasonable reward function design. To address these issues, this paper designs a neural network-based expert-guided triple experience replay mechanism (NETM) and proposes an improved reward function adapted to dynamic environments. The replay mechanism integrates imitation learning’s fast data fitting with DRL’s self-optimization to expand limited expert demonstrations and algorithm-generated successes into optimized expert experiences. Experimental results show that the expanded expert experience accelerates convergence: in dynamic scenarios, NETM improves accuracy by over 30% and the safety rate by 2.28% compared to baseline algorithms.
(This article belongs to the Section Sensors and Robotics)
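A triple-buffer replay in the spirit of the mechanism above can be sketched with three separate stores and ratio-based sampling. The ratios, capacities, and fallback rule are assumptions, not the NETM design:

```python
import random
from collections import deque

# Three separate stores: expert demonstrations, self-generated successes,
# and ordinary transitions.
expert_buf = deque(maxlen=10_000)
success_buf = deque(maxlen=50_000)
regular_buf = deque(maxlen=200_000)

def store(transition, is_expert: bool = False, is_success: bool = False):
    if is_expert:
        expert_buf.append(transition)
    elif is_success:
        success_buf.append(transition)
    else:
        regular_buf.append(transition)

def sample(batch_size: int = 128, ratios=(0.25, 0.25, 0.5)):
    """Draw a mixed minibatch; fall back to the regular buffer when a
    specialized buffer is still too small."""
    batch = []
    for buf, frac in zip((expert_buf, success_buf, regular_buf), ratios):
        n = int(batch_size * frac)
        src = buf if len(buf) >= n else regular_buf
        batch.extend(random.sample(list(src), min(n, len(src))))
    return batch
```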

21 pages, 1118 KB  
Review
Integrating Large Language Models into Robotic Autonomy: A Review of Motion, Voice, and Training Pipelines
by Yutong Liu, Qingquan Sun and Dhruvi Rajeshkumar Kapadia
AI 2025, 6(7), 158; https://doi.org/10.3390/ai6070158 - 15 Jul 2025
Cited by 1 | Viewed by 9540
Abstract
This survey provides a comprehensive review of the integration of large language models (LLMs) into autonomous robotic systems, organized around four key pillars: locomotion, navigation, manipulation, and voice-based interaction. We examine how LLMs enhance robotic autonomy by translating high-level natural language commands into low-level control signals, supporting semantic planning and enabling adaptive execution. Systems like SayTap improve gait stability through LLM-generated contact patterns, while TrustNavGPT achieves a 5.7% word error rate (WER) under noisy voice-guided conditions by modeling user uncertainty. Frameworks such as MapGPT, LLM-Planner, and 3D-LOTUS++ integrate multi-modal data (vision, speech, and proprioception) for robust planning and real-time recovery. We also highlight the use of physics-informed neural networks (PINNs) to model object deformation and support precision in contact-rich manipulation tasks. To bridge the gap between simulation and real-world deployment, we synthesize best practices from benchmark datasets (e.g., RH20T, Open X-Embodiment) and training pipelines designed for one-shot imitation learning and cross-embodiment generalization. Additionally, we analyze deployment trade-offs across cloud, edge, and hybrid architectures, emphasizing latency, scalability, and privacy. The survey concludes with a multi-dimensional taxonomy and cross-domain synthesis, offering design insights and future directions for building intelligent, human-aligned robotic systems powered by LLMs.
