This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

HAML: Humanoid Adversarial Multi-Skill Learning via a Single Policy

School of Control Science and Engineering, Shandong University, Jinan 250100, China
* Author to whom correspondence should be addressed.
Actuators 2026, 15(4), 212; https://doi.org/10.3390/act15040212
Submission received: 16 March 2026 / Revised: 5 April 2026 / Accepted: 9 April 2026 / Published: 11 April 2026
(This article belongs to the Section Actuators for Robotics)

Abstract

Translating large-scale motion datasets into robust, deployable humanoid controllers is a critical challenge in engineering informatics, primarily because of the scarcity of high-quality annotations, the risk of mode collapse in conditional generation, and the strict constraints of onboard computing hardware. This paper presents a two-stage learning system that maps clip-level motion datasets to a single-policy multi-skill controller and then to a deployable counterpart of that controller. We adopt coarse one-hot skill labels that can be assigned automatically at the clip level with negligible manual effort, enabling scalable dataset construction. To prevent the conditional discriminator from ignoring the skill condition, we inject mismatched (transition, label) pairs and introduce a condition-aware loss that explicitly penalizes incorrect transition–label associations, improving controllability and mitigating mode collapse. For real-world deployment, a privileged teacher policy is first trained in simulation and then distilled into a student policy that relies only on stacked historical proprioceptive observations, which provides robustness against sensing noise and latency without external state estimation. Extensive evaluations in simulation and on real hardware demonstrate improved skill coverage, transition coverage, realism, and training efficiency across heterogeneous embodiments. On the onboard computer of a Unitree G1 robot, the distilled policy runs at 100 Hz with 15–25 ms latency, confirming the system’s engineering feasibility.
Keywords: humanoid robot; adversarial imitation learning; policy distillation; sim-to-real transfer
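
The mismatched-pair idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact loss, so everything below is an assumption made for illustration: the least-squares targets (+1 for matched reference pairs, -1 for policy samples and for mismatched reference pairs), the `mismatch_weight` parameter, and the hypothetical function names `make_mismatched_pairs` and `condition_aware_loss`. It is not the authors' implementation.

```python
import random


def one_hot(index, num_skills):
    """Encode a skill index as a coarse one-hot label."""
    return [1.0 if i == index else 0.0 for i in range(num_skills)]


def make_mismatched_pairs(transitions, labels, num_skills, rng):
    """Pair each reference transition with a deliberately wrong skill label."""
    mismatched = []
    for transition, label in zip(transitions, labels):
        wrong = rng.choice([k for k in range(num_skills) if k != label])
        mismatched.append((transition, wrong))
    return mismatched


def condition_aware_loss(score_fn, matched, policy, mismatched,
                         num_skills, mismatch_weight=1.0):
    """Least-squares discriminator loss with an extra mismatched-pair term.

    Assumed targets (illustrative only): matched reference pairs -> +1,
    policy pairs -> -1, reference transitions with a wrong label -> -1.
    Each argument is a non-empty list of (transition, skill_index) pairs.
    """
    def mean_sq(pairs, target):
        total = sum(
            (score_fn(t, one_hot(k, num_skills)) - target) ** 2
            for t, k in pairs
        )
        return total / len(pairs)

    return (mean_sq(matched, 1.0)
            + mean_sq(policy, -1.0)
            + mismatch_weight * mean_sq(mismatched, -1.0))
```

Under these assumptions, a discriminator that judges realism alone and ignores the label still pays the mismatched-pair penalty, whereas one that checks the transition–label association does not. This is the pressure toward using the skill condition that the abstract describes.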

Share and Cite

MDPI and ACS Style

Fang, X.; Liao, H.; Chen, Y.; Tan, W.; Li, X. HAML: Humanoid Adversarial Multi-Skill Learning via a Single Policy. Actuators 2026, 15, 212. https://doi.org/10.3390/act15040212

AMA Style

Fang X, Liao H, Chen Y, Tan W, Li X. HAML: Humanoid Adversarial Multi-Skill Learning via a Single Policy. Actuators. 2026; 15(4):212. https://doi.org/10.3390/act15040212

Chicago/Turabian Style

Fang, Xing, Honghao Liao, Yanyun Chen, Wenhao Tan, and Xiaolei Li. 2026. "HAML: Humanoid Adversarial Multi-Skill Learning via a Single Policy" Actuators 15, no. 4: 212. https://doi.org/10.3390/act15040212

APA Style

Fang, X., Liao, H., Chen, Y., Tan, W., & Li, X. (2026). HAML: Humanoid Adversarial Multi-Skill Learning via a Single Policy. Actuators, 15(4), 212. https://doi.org/10.3390/act15040212

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
