HAML: Humanoid Adversarial Multi-Skill Learning via a Single Policy

Fang, Xing; Liao, Honghao; Chen, Yanyun; Tan, Wenhao; Li, Xiaolei

doi:10.3390/act15040212

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article


by Xing Fang, Honghao Liao, Yanyun Chen, Wenhao Tan and Xiaolei Li *

School of Control Science and Engineering, Shandong University, Jinan 250100, China

* Author to whom correspondence should be addressed.

Actuators 2026, 15(4), 212; https://doi.org/10.3390/act15040212

Submission received: 16 March 2026 / Revised: 5 April 2026 / Accepted: 9 April 2026 / Published: 11 April 2026

(This article belongs to the Section Actuators for Robotics)


Abstract

Translating large-scale motion datasets into robust, deployable humanoid controllers is a critical challenge in engineering informatics, primarily due to the scarcity of high-quality annotations, the risk of mode collapse in conditional generation, and the strict constraints of onboard computing hardware. This paper presents a deployable two-stage learning system that maps clip-level motion datasets to a single-policy multi-skill controller and its deployable counterpart. We adopt coarse one-hot skill labels that can be assigned automatically at the clip level with negligible manual effort, enabling scalable dataset construction. To prevent conditional discriminators from ignoring skill conditions, we inject mismatched (transition, label) pairs and introduce a condition-aware loss that explicitly penalizes incorrect transition–label associations, improving controllability and mitigating mode collapse. For real-world deployment, we further propose a two-stage training strategy: a privileged teacher policy is first trained in simulation and then distilled into a student policy that relies on stacked historical proprioceptive observations, ensuring robustness against sensing noise and latency without relying on external state estimation. Extensive evaluations in simulation and on real hardware demonstrate improved skill coverage, transition coverage, realism, and training efficiency across heterogeneous embodiments. With the onboard computer of a Unitree G1 robot, the distilled policy runs at 100 Hz with 15–25 ms latency, confirming the system’s engineering feasibility.

Keywords: humanoid robot; adversarial imitation learning; policy distillation; sim-to-real transfer
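The mismatched-pair idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss itself, so everything below is an assumption made for illustration: the least-squares discriminator objective, the linear scoring function, the +1/-1 targets, and the names `mismatch_labels`, `condition_aware_loss`, and `mismatch_weight` are all hypothetical. The sketch only shows how reference transitions paired with deliberately wrong one-hot skill labels can be penalized alongside policy-generated pairs, so that the discriminator cannot score a transition well while ignoring the label.

```python
import numpy as np


def one_hot(labels, num_skills):
    """Encode integer skill ids as one-hot rows."""
    encoded = np.zeros((labels.shape[0], num_skills))
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


def mismatch_labels(labels, num_skills, rng):
    """Return labels that always differ from the input (needs num_skills >= 2)."""
    # A non-zero offset modulo num_skills can never map a label onto itself.
    offsets = rng.integers(1, num_skills, size=labels.shape[0])
    return (labels + offsets) % num_skills


def scores(transitions, labels, num_skills, weights):
    """Hypothetical linear discriminator over [transition, one-hot label]."""
    features = np.concatenate([transitions, one_hot(labels, num_skills)], axis=1)
    return features @ weights


def condition_aware_loss(weights, ref_trans, ref_labels, pol_trans, pol_labels,
                         num_skills, rng, mismatch_weight=1.0):
    """Assumed least-squares loss with an extra mismatched-pair term."""
    wrong_labels = mismatch_labels(ref_labels, num_skills, rng)
    matched = scores(ref_trans, ref_labels, num_skills, weights)
    generated = scores(pol_trans, pol_labels, num_skills, weights)
    mismatched = scores(ref_trans, wrong_labels, num_skills, weights)
    loss_matched = np.mean((matched - 1.0) ** 2)       # reference, correct label -> +1
    loss_generated = np.mean((generated + 1.0) ** 2)   # policy transitions -> -1
    loss_mismatched = np.mean((mismatched + 1.0) ** 2)  # reference, wrong label -> -1
    return loss_matched + loss_generated + mismatch_weight * loss_mismatched


# Toy usage with random data; dimensions are arbitrary.
rng = np.random.default_rng(0)
num_skills, trans_dim, batch = 4, 6, 8
ref_trans = rng.normal(size=(batch, trans_dim))
pol_trans = rng.normal(size=(batch, trans_dim))
ref_labels = rng.integers(0, num_skills, size=batch)
pol_labels = rng.integers(0, num_skills, size=batch)
weights = rng.normal(size=trans_dim + num_skills)
loss = condition_aware_loss(weights, ref_trans, ref_labels,
                            pol_trans, pol_labels, num_skills, rng)
```

Because the matched and mismatched terms share the same transitions and differ only in the label, the only way to reduce both is for the discriminator to use the skill condition, which is the behaviour the abstract attributes to the condition-aware loss.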

