Search Results (377)

Search Parameters:
Keywords = imitation learning

23 pages, 3055 KB  
Article
Simulation Study on Real-Time Autonomous Driving Decision-Making Using BEV Perception and Large Language Models
by Gaosong Shi, Mingxiao Yu and Xiaofan Sun
Technologies 2026, 14(3), 172; https://doi.org/10.3390/technologies14030172 - 10 Mar 2026
Abstract
Large language models (LLMs) exhibit strong semantic reasoning capabilities for autonomous driving decision-making; however, their substantial inference latency poses a critical challenge for real-time closed-loop vehicle control. This study proposes an engineering-oriented framework to enable latency-constrained LLM-based decision-making by integrating bird’s-eye-view (BEV) structured perception with low-bit quantized inference. The BEV perception module compresses multi-view visual inputs into structured semantic representations, thereby reducing input redundancy and enhancing inference efficiency. In addition, 4-bit post-training quantization (PTQ), combined with an optimized inference engine, is employed to alleviate computational and memory bandwidth constraints during autoregressive decoding. Experiments conducted on the CARLA simulation platform under car-following, overtaking, and mixed driving scenarios—validated through 500 independent trials—demonstrate that the proposed framework substantially reduces end-to-end inference latency while maintaining stable decision-making performance. The results indicate that the system satisfies the 10 Hz real-time control requirement and significantly improves control quality, as evidenced by reduced collision rates and lower Average Jerk compared with both traditional imitation learning (Behavioral Cloning, BC) and the Transformer-based TransFuser baseline. Furthermore, sensitivity analyses confirm the robustness of the framework under environmental degradation and perception noise, underscoring the practical feasibility of deploying LLMs for safe and reliable closed-loop autonomous driving. Full article

33 pages, 66120 KB  
Article
Frequency-Domain Trajectory Planning for Autonomous Driving in Highly Dynamic Scenarios
by Jie Xia, Zhuo Kong, Xiaodong Wu, Boran Shi, Yuanbo Han and Min Xu
Appl. Sci. 2026, 16(5), 2447; https://doi.org/10.3390/app16052447 - 3 Mar 2026
Viewed by 182
Abstract
Trajectory planning is a central problem in autonomous driving, requiring long-horizon reasoning, strict safety guarantees, and robustness to rare but critical events. Recent learning-based planners increasingly formulate planning as an autoregressive sequence generation problem, analogous to large language models, where future motions are discretized into action tokens and predicted by Transformer-based neural sequence models. Despite promising empirical results, most existing approaches adopt time-domain action representations, in which consecutive actions are highly correlated. When combined with autoregressive decoding, this design induces degenerate generation behavior in learning-based planners, encouraging local action continuation and leading to rapid error accumulation during closed-loop execution, particularly in safety-critical corner cases such as sudden pedestrian emergence. To address this limitation of time-domain autoregressive planning, we propose a unified trajectory planning framework built upon three core ideas: (1) explicit action tokenization for long-horizon planning, (2) transformation of the action space from the time domain to the frequency domain, and (3) a hybrid learning paradigm that combines imitation learning with reinforcement learning. By representing future motion using compact frequency-domain action coefficients rather than per-timestep actions, the proposed planner is encouraged to reason about global motion intent before refining local details. This change in action representation fundamentally alters the inductive bias of learning-based autoregressive planning, mitigates exposure bias, and enables earlier and more decisive responses in complex and safety-critical environments. We present the model formulation, learning objectives, and training strategy, and outline a comprehensive experimental protocol. Full article
(This article belongs to the Section Robotics and Automation)

16 pages, 1002 KB  
Article
Does the Translation Continuation Task Exhibit Interaction and Alignment Effects? Evidence from a CSL Classroom in Cambodia
by Huan Zhang
Behav. Sci. 2026, 16(3), 351; https://doi.org/10.3390/bs16030351 - 2 Mar 2026
Viewed by 166
Abstract
The Continuation Argument, a newly emerging perspective on language acquisition, requires further exploration to deepen our understanding of how continuation-based tasks facilitate foreign language learning. This study examines the use of observable language forms within the integrated pedagogical procedure of the translation continuation task in Chinese as a second language (CSL) learning. Data were collected from 60 learners attending Khmer-Chinese translation classes in Grade 8 at a Chinese school in Phnom Penh, Cambodia. The findings reveal a consistent pattern of language reuse. (i) Learners demonstrate a significant increase in their reuse of target Chinese language structures (e.g., words, grammar, and discourse knowledge) from the pre-reading materials when completing the translation continuation tasks. (ii) The translation continuation task helps reduce errors and improve the quality of Chinese translations. (iii) Both teachers and students generally recognize the positive impact and pedagogical value of the translation continuation task. The observed “language reuse” is discussed in light of multiple potential mechanisms, such as priming and pedagogically induced imitation. Thus, the translation continuation task proves to be an effective method for guiding learners’ attention to and reuse of target language forms in practical CSL translation teaching. Full article
(This article belongs to the Section Cognition)

15 pages, 5293 KB  
Systematic Review
Embodied Artificial Intelligence in Healthcare: A Systematic Review of Robotic Perception, Decision-Making, and Clinical Impact
by Bilal Ahmad Mir, Dur E. Nishwa and Seung Won Lee
Healthcare 2026, 14(5), 572; https://doi.org/10.3390/healthcare14050572 - 25 Feb 2026
Viewed by 386
Abstract
Background: Embodied artificial intelligence (EAI), integrating advanced AI algorithms with robotic platforms capable of sensing, planning, and acting, has emerged as a transformative approach in healthcare delivery. This systematic review synthesizes evidence on robotic perception, decision-making, and clinical impact of EAI systems in healthcare settings. Methods: Following PRISMA 2020 guidelines, we searched PubMed/MEDLINE, Scopus, Web of Science, IEEE Xplore, and ACM Digital Library for studies published between January 2020 and August 2025. Seventeen studies met eligibility criteria, spanning four domains: surgical assistance, rehabilitation, hospital logistics, and telepresence. The protocol was prospectively registered in PROSPERO under ID: CRD420261285936. Results: Perception architectures predominantly employed multimodal sensor fusion, combining vision with force/torque, depth, and physiological signals. Decision-making approaches included imitation learning, reinforcement learning, and hybrid symbolic-neural control. Key findings indicate that surgical robots demonstrated consistency advantages in specific experimental tasks, rehabilitation robotics produced statistically significant improvements (SMD = 0.29) across 396 randomized controlled trials, and both logistics and telepresence systems achieved very high operational success levels. Nonetheless, important barriers remain, including limited external validation, small sample sizes, and insufficient cost-effectiveness data. Conclusions: Future research should prioritize standardized benchmarks, prospective multicenter trials, and patient-centered outcome measures to facilitate clinical translation of EAI technologies. Full article

26 pages, 10726 KB  
Article
PI-VLA: Adaptive Symmetry-Aware Decision-Making for Long-Horizon Vision–Language–Action Manipulation
by Yina Jian, Di Tian, Xuan-Jing Chen, Zhen-Yuan Wei, Chen-Wei Liang and Mu-Jiang-Shan Wang
Symmetry 2026, 18(3), 394; https://doi.org/10.3390/sym18030394 - 24 Feb 2026
Viewed by 333
Abstract
Vision–language–action (VLA) models often suffer from limited robustness in long-horizon manipulation tasks—where robots must execute extended sequences of actions over multiple time steps to achieve complex goals—due to their inability to explicitly exploit structural symmetries and to react adaptively when such symmetries are violated by environmental uncertainty. To address this limitation, this paper proposes PI-VLA, a symmetry-aware predictive and interactive VLA framework for robust robotic manipulation. PI-VLA is built upon three key symmetry-driven principles. First, a Cognitive–Motor Synergy (CMS) module jointly generates discrete and continuous action chunks together with predictive world-model features in a single forward pass, enforcing cross-modal action consistency as an implicit symmetry constraint across heterogeneous action representations. Second, a unified training objective integrates imitation learning, reinforcement learning, and state prediction, encouraging invariance to task-relevant transformations while enabling adaptive symmetry breaking when long-horizon deviations emerge. Third, an Active Uncertainty-Resolving Decider (AURD) explicitly monitors action consensus discrepancies and state prediction errors as symmetry-breaking signals, dynamically adjusting the execution horizon through closed-loop replanning. Extensive experiments on long-horizon benchmarks demonstrate that PI-VLA achieves state-of-the-art performance, attaining a 73.2% average success rate on the LIBERO benchmark (with particularly strong gains on the Long-Horizon suite) and an 88.3% success rate in real-world manipulation tasks under visual distractions and unseen conditions. Ablation studies confirm that symmetry-aware action consensus and uncertainty-triggered replanning are critical to robust execution. These results establish PI-VLA as a principled framework that leverages symmetry preservation and controlled symmetry breaking to enable reliable and interactive robotic manipulation. Full article
(This article belongs to the Section Computer)

21 pages, 600 KB  
Article
The Role of the Different Components of Attention on Observational Learning in Early Primary School Children: New Insights and Educational Implications
by Francesca Foti, Valentina Lucia La Rosa, Luca Pullano, Tiziana Iaquinta and Elena Commodari
Brain Sci. 2026, 16(2), 237; https://doi.org/10.3390/brainsci16020237 - 19 Feb 2026
Viewed by 324
Abstract
Background/Objectives: Observational learning enables children to acquire new skills by observing others’ actions. Attention is widely recognized as a key supporting process and consists of multiple components that develop substantially during the early school years. Empirical evidence on the association between specific components of attention and observational learning remains limited. Therefore, this study examined the relationship between the main components of attention and observational learning among early primary school children. Methods: Sixty-eight children, aged 6–8, completed a computerized battery assessing the main components of attention (reaction times, simple and related to a choice; focused attention; short-term span of attention; divided and alternating attention) and an observational learning task where children observed an actor detecting a hidden spatial sequence and then reproduced it across detection phase (DP), exercise phase (EP), and automatization phase (AP). Correlational and regression analyses were conducted, controlling for age and gender. Results: Visual and visual–spatial focused attention emerged as significant predictors of performance during DP and EP, with higher levels of focused attention associated with fewer errors and repetitions. Choice reaction time showed phase-specific associations with error rates during early learning phases, whereas age was primarily related to performance during the AP. Conclusions: Observational learning in early primary school relies on specific components of attention rather than on attention as a unitary construct. Visual and visual–spatial focused attention plays a central role during the acquisition and consolidation of observed sequences, with implications for understanding learning from models and for educational practices based on demonstration. Full article
(This article belongs to the Section Developmental Neuroscience)

31 pages, 461 KB  
Systematic Review
Techniques Applied to Autonomous Liquid Pouring: A Scoping Review
by Jeeangh Jennessi Reyes-Montiel, Ericka Janet Rechy-Ramirez and Antonio Marin-Hernandez
Math. Comput. Appl. 2026, 31(1), 30; https://doi.org/10.3390/mca31010030 - 14 Feb 2026
Viewed by 291
Abstract
In recent years, autonomous liquid pouring systems have gained more relevance, with applications from daily service tasks to complex industrial operations. While seemingly simple for humans, this task poses major challenges for automated systems, as it requires precise control and adaptation to varying container geometries, liquid properties, and environmental conditions. This review examines the state-of-the-art on liquid pouring through five research questions: (1) What are the characteristics of the liquids used in the experiments? (2) What are the characteristics of the containers used in the experiments and how do they affect the performance of the pouring tasks? (3) What techniques are used to control liquid pouring (i.e., to control the robotic arm or device)? (4) What metrics are used to assess the methods for pouring liquid? (5) What devices are used to measure poured volume? This scoping review follows the Arksey and O’Malley framework, and uses the PRISMA-ScR protocol to filter the articles. A total of 285 studies published between 2018 and 2025 were screened from IEEE Xplore, SpringerLink, ScienceDirect, Web of Science, and EBSCOhost, of which 23 met the inclusion criteria. Results showed that the most widely used methods for autonomous liquid pouring were classical control methods—PID, PD (30.4% of the studies). Conversely, the least widely used methods for autonomous liquid pouring were learning, imitation learning, and probabilistic models (15% of the studies). Full article
(This article belongs to the Special Issue New Trends in Computational Intelligence and Applications 2025)

21 pages, 3773 KB  
Article
Motion Strategy Generation Based on Multimodal Motion Primitives and Reinforcement Learning Imitation for Quadruped Robots
by Qin Zhang, Guanglei Li, Benhang Liu, Chenxi Li, Chuanle Zhu and Hui Chai
Biomimetics 2026, 11(2), 115; https://doi.org/10.3390/biomimetics11020115 - 4 Feb 2026
Viewed by 510
Abstract
With the advancement of task-oriented reinforcement learning (RL), the capability of quadruped robots for motion generation and complex task completion has significantly improved. However, current control strategies require extensive domain expertise and time-consuming design processes to acquire operational skills and achieve multi-task motion control, often failing to effectively manage complex behaviors composed of multiple coordinated actions. To address these limitations, this paper proposes a motion policy generation method for quadruped robots based on multimodal motion primitives and imitation learning. A multimodal motion library was constructed using 3D engine motion design, motion capture data retargeting, and trajectory planning. A temporal domain-based behavior planner was designed to combine these primitives and generate complex behaviors. We developed an RL-based imitation learning training framework to achieve precise trajectory tracking and rapid policy deployment, ensuring the effective application of actions/behaviors on the quadruped platform. Simulation and physical experiments conducted on the Lite3 quadruped robot validated the efficacy of the proposed approach, offering a new paradigm for the deployment and development of motion strategies for quadruped robots. Full article
(This article belongs to the Section Locomotion and Bioinspired Robotics)

21 pages, 3516 KB  
Article
Diffusion-Guided Model Predictive Control for Signal Temporal Logic Specifications
by Jonghyuck Choi and Kyunghoon Cho
Electronics 2026, 15(3), 551; https://doi.org/10.3390/electronics15030551 - 27 Jan 2026
Viewed by 335
Abstract
We study control synthesis under Signal Temporal Logic (STL) specifications for driving scenarios where strict rule satisfaction is not always feasible and human experts exhibit context-dependent flexibility. We represent such behavior using robustness slackness—learned rule-wise lower bounds on STL robustness—and introduce sub-goals that encode intermediate intent in the state/output space (e.g., lane-level waypoints). Prior learning-based MPC–STL methods typically infer slackness with VAE priors and plug it into MPC, but these priors can underrepresent multimodal and rare yet valid expert behaviors and do not explicitly model intermediate intent. We propose a diffusion-guided MPC–STL framework that jointly learns slackness and sub-goals from demonstrations and integrates both into STL-constrained MPC. A conditional diffusion model generates pairs of (rule-wise slackness, sub-goal) conditioned on features from the ego vehicle, surrounding traffic, and road context. At run time, a few denoising steps produce samples for the current situation; slackness values define soft STL margins, while sub-goals shape the MPC objective via a terminal (optionally stage) cost, enabling context-dependent trade-offs between rule relaxation and task completion. In closed-loop simulations on held-out highD track-driving scenarios, our method improves task success and yields more realistic lane-changing behavior compared to imitation-learning baselines and MPC–STL variants using CVAE slackness or strict rule enforcement, while remaining computationally tractable for receding-horizon MPC in our experimental setting. Full article
(This article belongs to the Special Issue Real-Time Path Planning Design for Autonomous Driving Vehicles)

30 pages, 3291 KB  
Article
Identifying the Impact of Cross-Border E-Commerce on Urban Entrepreneurship: New Insights from China’s Cross-Border E-Commerce Comprehensive Pilot Zone
by Xianpu Xu, Yuchen Yan and Jiarui Hu
J. Theor. Appl. Electron. Commer. Res. 2026, 21(2), 42; https://doi.org/10.3390/jtaer21020042 - 26 Jan 2026
Viewed by 585
Abstract
Cross-border e-commerce, as an emerging trade format, offers new chances for optimizing industrial chains’ layout, enhancing economic resilience, and attaining high-quality development at the city level. In this context, treating the execution of the cross-border e-commerce comprehensive pilot zone (CBEC) as a quasi-natural experiment, this study subtly attests to how the CBEC affects urban entrepreneurship by using a difference-in-differences (DID) technique. The results exhibit that the CBEC greatly promotes urban entrepreneurship, which is supported by some robustness tests, including instrumental variable testing and placebo testing. Heterogeneity analysis reveals that in cities with more developed economies, stronger digitalization, richer cultures, sounder law rules, and better business environments, the benefit for the CBEC on entrepreneurship is more significant. Mechanism testing argues that the CBEC promotes urban entrepreneurship through talent aggregation and industrial upgrading. Precisely, the more concentrated high-quality talents are and the more advanced the industrial structure is, the higher the urban entrepreneurship. More importantly, the CBEC exhibits a spatial spillover effect on entrepreneurship, promoting local entrepreneurship while stimulating the motivation to imitate and learn in neighboring areas, thereby driving their entrepreneurship. The findings offer a viable decision-making guide for building a unified factor market and achieving regional coordinated development. Full article
(This article belongs to the Section Entrepreneurship, Innovation, and Digital Business Models)

33 pages, 32306 KB  
Article
A Reward-and-Punishment-Aware Incentive Mechanism for Directed Acyclic Graph Blockchain-Based Federated Learning in Unmanned Aerial Vehicle Networks
by Xiaofeng Xue, Qiong Li and Haokun Mao
Drones 2026, 10(1), 70; https://doi.org/10.3390/drones10010070 - 21 Jan 2026
Viewed by 254
Abstract
The integration of unmanned aerial vehicles (UAVs) and Federated Learning (FL) enables distributed model training while preserving data privacy. To overcome the challenges caused by centralized and synchronous model updates, we integrate Directed Acyclic Graph (DAG) blockchain-based FL into UAV networks. In this decentralized and asynchronous framework, UAVs can independently and autonomously participate in the FL process according to their own requirements. To achieve high FL performance, it is essential for UAVs to actively contribute their computational and data resources to the FL process. However, it is challenging to ensure that UAVs consistently contribute their resources, as they may have a propensity to prioritize their own self-interest. Therefore, it is crucial to design effective incentive mechanisms that encourage UAVs to actively participate in the FL process and contribute their computational and data resources. Currently, research on effective incentive mechanisms for DAG blockchain-based FL frameworks in UAV networks remains limited. To address these challenges, this paper proposes a novel incentive mechanism that integrates both rewards and punishments to encourage UAVs to actively contribute to FL and to deter free riding under incomplete information. We formulate the interactions among UAVs as an evolutionary game, and the aspiration-driven rule is employed to imitate the UAV’s decision-making processes. We evaluate the proposed mechanism for UAVs within a DAG blockchain-based FL framework. Experimental results show that the proposed incentive mechanism substantially increases the average UAV contribution rate from 77.04±0.84% (without incentive mechanism) to 97.48±1.29%. Furthermore, the higher contribution rate results in an approximate 2.23% improvement in FL performance. Additionally, we evaluate the impact of different parameter configurations to analyze how they affect the performance and efficiency of the FL system. Full article
(This article belongs to the Section Drone Communications)

18 pages, 1756 KB  
Article
Delay-Aware UAV Swarm Formation Control via Imitation Learning from ARD-PF Expert Policies
by Rodolfo Vera-Amaro, Alberto Luviano-Juárez and Mario E. Rivero-Ángeles
Drones 2026, 10(1), 34; https://doi.org/10.3390/drones10010034 - 6 Jan 2026
Viewed by 725
Abstract
This paper studies delay-aware formation control for unmanned aerial vehicle (UAV) swarms operating under realistic air-to-air communication latency. An attractive–repulsive distance-based potential-field (ARD-PF) controller is used as an expert to generate demonstrations for imitation learning in multi-UAV cooperative systems. By augmenting the training data with communication delay, the learned policy implicitly compensates for outdated neighbor information and improves swarm coordination during autonomous flight. Extensive simulations across different swarm sizes, formation spacings, and delay levels show that delay-robust imitation learning significantly enlarges the probabilistic stability region compared with classical ARD-PF control and non-robust learning baselines. Formation control performance is evaluated using internal geometric error, global offset, and multi-run stability metrics. In addition, a predictive delay–stability model is introduced, linking the maximum admissible communication delay to swarm size and inter-agent spacing, with low fitting error against simulated stability boundaries. The results provide quantitative insights for designing communication-aware UAV swarm systems under latency constraints. Full article
(This article belongs to the Special Issue Advanced Flight Dynamics and Decision-Making for UAV Operations)

19 pages, 13574 KB  
Article
Deep Reinforcement Learning Control of a Hexapod Robot
by Taesoo Kim, Minjun Choi, Seunguk Choi, Taeuan Yoon and Dongil Choi
Actuators 2026, 15(1), 33; https://doi.org/10.3390/act15010033 - 5 Jan 2026
Viewed by 771
Abstract
Recent advances in legged robotics have highlighted deep reinforcement learning (DRL)-based controllers for their robust adaptability to diverse, unstructured environments. While position-based DRL controllers achieve high tracking accuracy, they offer limited disturbance rejection, which degrades walking stability; torque-based DRL controllers can mitigate this issue but typically require extensive time and trial-and-error to converge. To address these challenges, we propose a Real-Time Motion Generator (RTMG). At each time step, RTMG kinematically synthesizes end-effector trajectories from target translational and angular velocities (yaw rate) and step length, then maps them to joint angles via inverse kinematics to produce imitation data. The RL agent uses this imitation data as a torque bias, which is gradually annealed during training to enable fully autonomous behavior. We further combine the RTMG-generated imitation data with a decaying action priors scheme to ensure both initial stability and motion diversity. The proposed training pipeline, implemented in NVIDIA Isaac Gym with Proximal Policy Optimization (PPO), reliably converges to the target gait pattern. The trained controller is TensorRT-optimized and runs at 50 Hz on a Jetson Nano; relative to a position-based baseline, torso oscillation is reduced by 24.88% in simulation and 21.24% on hardware, demonstrating the effectiveness of the approach. Full article
(This article belongs to the Section Actuators for Robotics)

17 pages, 540 KB  
Article
Research on Imitation–Reinforcement Hybrid Machine Learning Algorithms: Application in Path Planning
by Linsong Zhang and Xiaohui Yan
Mathematics 2026, 14(1), 161; https://doi.org/10.3390/math14010161 - 31 Dec 2025
Viewed by 549
Abstract
Path planning in complex, dynamic environments presents a significant challenge. Deep Reinforcement Learning (DRL) offers an end-to-end solution but suffers from critical sample inefficiency and a “cold-start” problem. Imitation Learning (IL) accelerates training but is constrained by a performance ceiling and poor generalization. To address these limitations, we propose a novel Imitation–Reinforcement Hybrid Machine Learning Algorithm (Hybrid IL-RL). This framework balances exploration and performance via a two-stage process: First, an offline pre-training phase uses Behavioral Cloning (BC) with “non-expert” A* data from static environments for a “warm start”. Second, an online fine-tuning phase uses a DRL algorithm (SAC) to adapt this policy in complex, dynamic environments, allowing the agent to surpass the teacher’s limitations. Simulation experiments validate the approach. The framework demonstrates significantly faster convergence than DRL algorithms trained from scratch. Most critically, in the dynamic environment, our Hybrid IL-RL algorithm achieved the highest success rate (82.4%), while pure IL methods (BC, GAIL) failed due to poor generalization (e.g., 82.1% collision rate) and pure DRL methods struggled (approx. 51–56% success rate). Our results confirm the hybrid framework effectively solves the cold-start problem while using DRL to break the IL performance ceiling.

33 pages, 3147 KB  
Review
Perception–Production of Second-Language Mandarin Tones Based on Interpretable Computational Methods: A Review
by Yujiao Huang, Zhaohong Xu, Xianming Bei and Huakun Huang
Mathematics 2026, 14(1), 145; https://doi.org/10.3390/math14010145 - 30 Dec 2025
Viewed by 742
Abstract
We survey recent advances in second-language (L2) Mandarin lexical tone research and show how an interpretable computational approach can deliver parameter-aligned feedback across perception–production (P ↔ P). We synthesize four strands: (A) conventional evaluations and tasks (identification, same–different, imitation/read-aloud) that reveal robust tone-pair asymmetries and early P ↔ P decoupling; (B) physiological and behavioral instrumentation (e.g., EEG, eye-tracking) that clarifies cue weighting and time course; (C) audio-only speech analysis, from classic F0 tracking and MFCC–prosody fusion to CNN/RNN/CTC and self-supervised pipelines; and (D) interpretable learning, including attention and relational models (e.g., graph neural networks, GNNs) opened with explainable AI (XAI). Across strands, evidence converges on tones as time-evolving F0 trajectories, so movement, turning-point timing, and local F0 range are more diagnostic than height alone, and the contrast between Tone 2 (rising) and Tone 3 (dipping/low) remains the persistent difficulty; learners with tonal vs. non-tonal language backgrounds weight these cues differently. Guided by this synthesis, we outline a tool-oriented framework that pairs perception and production on the same items, jointly predicts tone labels and parameter targets, and uses XAI to generate local attributions and counterfactual edits, making feedback classroom-ready.
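The trajectory cues named in this abstract (movement, turning-point timing, local F0 range) can be made concrete with a small sketch. The feature definitions below are illustrative assumptions, not the review's own formulas: the input is assumed to be a voiced F0 contour already converted to semitones, and the function name `contour_features` is hypothetical.

```python
def contour_features(f0_semitones):
    """Summarize an F0 contour with three simple trajectory cues (assumed definitions).

    range_st:          local F0 range, highest minus lowest value
    turning_point_rel: position of the lowest point, from 0.0 (onset) to 1.0 (offset)
    net_movement_st:   offset value minus onset value
    """
    n = len(f0_semitones)
    if n < 2:
        raise ValueError("need at least two F0 samples")
    # Index of the first minimum, used as a stand-in for the turning point.
    lowest = min(range(n), key=f0_semitones.__getitem__)
    return {
        "range_st": max(f0_semitones) - min(f0_semitones),
        "turning_point_rel": lowest / (n - 1),
        "net_movement_st": f0_semitones[-1] - f0_semitones[0],
    }
```

Under these definitions a rising contour places its turning point at the onset, while a dipping contour places it later in the syllable, which is one way the Tone 2 versus Tone 3 contrast could be expressed as a timing parameter.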
(This article belongs to the Section E1: Mathematics and Computer Science)