1. Introduction
Quadruped robots exhibit exceptional terrain adaptability, demonstrating broad application prospects in disaster rescue, environmental inspection, and related fields [1,2,3]. A representative example is Boston Dynamics’ Spot, which has shown outstanding performance across diverse environments, including construction sites and disaster zones [4]. However, existing research predominantly focuses on unimodal policy transfer, failing to effectively integrate the complementary advantages of different motion generation methods [5]. Consequently, establishing a cross-modal collaborative framework to achieve motion strategy fusion and adaptive selection is pivotal for advancing quadruped locomotion intelligence.
To enable effective motion strategy fusion, a systematic evaluation of existing motion generation approaches and their limitations is essential. Current research focuses on two primary methodologies: trajectory planning-based methods and motion data-driven methods.
Trajectory planning-based motion generation aims to produce optimal motion trajectories that satisfy smoothness, feasibility, and energy efficiency under multiple constraints. Medeiros et al. [6] employed nonlinear programming to co-optimize base/wheel positions, interaction forces, and terrain information for wheeled-legged robots. Liu et al. [7] proposed a hierarchical framework combining front-end safety search, B-spline convex hull optimization, and iterative refinement, achieving 100% navigation success in static cluttered environments while reducing energy consumption. Song et al. [8] focused on energy-optimal jumping trajectory planning, enabling robots to overcome complex obstacles.
Motion data-driven generation methods leverage biological motion characteristics to create robust, lifelike, and generalizable quadruped motions. Ju et al. [9] pioneered the cross-validation of screw-theory stability models with biological gait data, systematically revealing the dynamic advantages of common gait sequences. Yao et al. [10] developed a video-based biomimetic adaptation network, using deep learning to extract spatiotemporal key features from animal motions and transferring them via a motion adapter. Motion video tracking captures actions from videos and extracts corresponding trajectories for motion generation. Additionally, motion capture technology provides high-precision motion data. Li et al. [3] adopted multimodal motion primitive encoding to decouple cross-scale motion features from canine multi-terrain motion capture data. Fawcett et al. [11] developed a data-driven, template-based hierarchical control method for the real-time planning and control of dynamic quadruped robots.
In recent years, reinforcement learning (RL)-based motion control has emerged as a unified framework for robotic locomotion [12]. RL has become a promising paradigm for developing robust legged movement control strategies [13,14,15], enabling agents to learn motion generation policies directly through environmental interactions [16]. Remarkable achievements include agile behaviors such as balancing, running, jumping, and robust walking under environmental uncertainties [17]. Hwangbo et al. [18] established a sim-to-real transfer framework with data-driven actuator modeling. Bellegarda et al. [19] addressed unstructured terrain disturbances via a hybrid RL framework for dynamic jumping control. Azimi and Hoseinnezhad [20] proposed a hierarchical RL framework to enhance the stability and adaptability of quadruped robots in dynamic environments.
Recent studies explore integrating imitation learning into RL to reduce reward design effort and unnatural behaviors. Peng et al. [21] pioneered a primitive-fused deep RL paradigm, constructing a bio-inspired transfer framework for cross-domain animal-to-robot motion style conversion. To address skill generalization challenges, Yang et al. [22] proposed a biomimetic motion primitive learning framework with heterogeneous reward mechanisms, enabling robots to acquire diverse skills through imitation learning. Roh [23] designed a ground reaction force (GRF)-based reward function for animal motion imitation, achieving dynamic speed transitions during galloping and validating the efficacy of bio-inspired strategies for dynamic performance optimization. Chen et al. [24] introduced an end-to-end torque-control RL paradigm that directly outputs joint torques instead of traditional position commands, demonstrating superior anti-disturbance capability and reward maximization. Wang et al. [25] abandoned static control for load-carrying quadruped manipulators, proposing an RL-based arm–body dynamic coordination method inspired by quadruped limb synergies, significantly improving disturbance rejection. Miki et al. [26] fused vision and proprioception via gated attention mechanisms, reducing terrain misclassification during Alpine field tests while achieving 0.8 m/s locomotion speeds; their dynamic weighting mechanism offers a novel paradigm for cross-modal collaborative control. Similarly, Ding et al. [27] proposed a vision–language–action model, enabling quadruped robots to perform complex tasks in diverse environments with enhanced adaptability.
From the perspective of system modeling assumptions and prior information, existing motion generation and control methods exhibit different trade-offs among interpretability, flexibility, and engineering practicality. Trajectory planning approaches rely on explicit dynamic models and constraints, offering strong interpretability but limited flexibility in complex multi-task scenarios [28]. Reinforcement learning methods optimize policies through reward-driven learning and demonstrate strong adaptability; however, they typically require carefully designed reward functions and extensive interaction data, and their training stability and generalization performance remain challenging in real-world applications [29]. In contrast, imitation learning introduces expert demonstrations as prior knowledge, providing an effective inductive bias for policy search and constraining the optimization process within a reasonable motion manifold [30]. As a result, imitation learning improves training efficiency while maintaining stability and engineering feasibility for complex behavior generation.
Current research on optimizing the locomotion capabilities of quadruped robots often faces challenges, such as limited dimensionality in motion generation, abrupt transitions during behavior composition, and constrained control optimization objectives. These limitations hinder the reliable and efficient execution of smooth movements and multi-task operations in complex scenarios. To address these issues, this paper proposes a motion strategy generation method for quadruped robots based on multimodal motion primitives and imitation learning. The multimodal motion primitives do not refer to multiple motion primitives learned within a unified parameter space. Instead, they denote a collection of heterogeneous action representations derived from distinct motion generation paradigms, including 3D-engine-based keyframe specification, motion primitives obtained via motion capture data retargeting, and analytically generated trajectories based on central pattern generators (CPGs).
Existing approaches are typically trained for a single motion pattern, which makes it difficult to achieve the integrated execution of heterogeneous behaviors, such as stepping in place, locomotion, and posture adjustment, within a unified control framework [18]. In contrast, the proposed method enables a unified representation and seamless switching among multiple behaviors through multimodal motion modeling and a behavior planning mechanism. The main contributions of this paper are as follows:
- A fundamental motion primitive library for quadruped robots was designed, establishing an underlying behavioral foundation for executing complex tasks and enabling flexible motion control.
- A modular architecture was employed to achieve spatiotemporal encoding of motion primitives and skill-chain recombination for quadruped robots, enabling the dynamic synthesis of behavior sequences in complex scenarios through a behavioral planner.
- An improved expert trajectory-guided Actor–Critic multi-objective optimization framework was developed for the motion control of quadruped robots. It incorporates a composite reward function to achieve hierarchical control under multi-task objectives, while integrating motion primitive imitation learning to accelerate policy convergence during training.
3. Experimental Tests and Results
This study employed the Lite3 quadruped robot model developed by DeepRobotics (Hangzhou, China), conducting training on the Isaac Gym simulation platform with an NVIDIA GeForce RTX 4090 GPU. Subsequently, the trained policy was validated for dynamic adaptability on the robot model in the PyBullet physics engine. Additionally, the Blender 4.1 3D animation engine was utilized to create quadruped robot motion animations via keyframe insertion, facilitating motion sequence design and export. For RL, the PPO algorithm was adopted due to its computational efficiency. The hyperparameter configurations used during PPO-based simulation training are summarized in Table 5.
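As an illustration of how such a configuration is typically specified, a minimal Python sketch is shown below; all values are hypothetical placeholders and do not reproduce the settings reported in Table 5.

```python
# Illustrative PPO configuration (hypothetical values; the actual
# hyperparameters used in this work are listed in Table 5).
ppo_config = {
    "learning_rate": 3e-4,    # Adam step size
    "gamma": 0.99,            # reward discount factor
    "gae_lambda": 0.95,       # GAE smoothing parameter
    "clip_range": 0.2,        # PPO surrogate clipping threshold
    "num_envs": 4096,         # parallel Isaac Gym environments
    "rollout_steps": 24,      # steps per environment per update
    "num_epochs": 5,          # optimization epochs per rollout batch
    "entropy_coef": 0.01,     # exploration bonus weight
    "value_loss_coef": 1.0,   # critic loss weight
}
```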
In the experiments, the centroid height of the quadruped robot in standing posture was set to 0.32 m. The joint angle vector was initialized to
for hip, thigh, and calf joints, respectively. To ensure that joint rotation remained within safe operating limits, the upper and lower bounds of the joint motion range are detailed in Table 6.
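A minimal sketch of enforcing such limits at the command level is given below; the bounds shown are hypothetical placeholders rather than the values in Table 6.

```python
import numpy as np

# Hypothetical joint limits for one leg (hip, thigh, calf), in rad;
# the actual Lite3 bounds are given in Table 6.
JOINT_LOWER = np.array([-0.42, -1.50, -2.60])
JOINT_UPPER = np.array([ 0.42,  1.50, -0.60])

def clamp_joint_targets(q_target: np.ndarray) -> np.ndarray:
    """Clip commanded joint angles into the safe operating range."""
    return np.clip(q_target, JOINT_LOWER, JOINT_UPPER)
```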
3.1. Motion Design Validation Using a 3D Engine
The experiment utilized Blender to create motion sequences incorporating roll, pitch, and yaw rotations, with target angles of ±20° about each axis. Motion control policies were generated in parametric space through imitation learning, with trajectory tracking accuracy verified via PyBullet simulation and on the Lite3 physical platform.
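As an illustration of this workflow, the sketch below inserts attitude keyframes through Blender’s Python API (bpy); the object name and frame spacing are assumptions made for the example.

```python
import bpy
import math

# Sketch of keyframe-based attitude design in Blender (the object name
# "Lite3_Torso" and the 30-frame spacing are illustrative assumptions).
torso = bpy.data.objects["Lite3_Torso"]
torso.rotation_mode = "XYZ"  # Euler angles: roll, pitch, yaw

# Oscillate yaw between +20 deg and -20 deg, returning to neutral.
for i, yaw_deg in enumerate([0.0, 20.0, 0.0, -20.0, 0.0]):
    torso.rotation_euler = (0.0, 0.0, math.radians(yaw_deg))
    torso.keyframe_insert(data_path="rotation_euler", frame=1 + i * 30)
```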
Figure 7 presents comparative results of the Euler angle tracking performance. Quantitative analysis shows maximum tracking errors of 0.03 rad (roll), 0.01 rad (pitch), and 0.01 rad (yaw), meeting the precision requirements for motion control.
3.2. Experimental Validation of Motion Capture-Based Motion Retargeting
In quadruped motion retargeting research, we first acquired the reference trajectory data adapted for the Lite3 robot by performing inverse kinematic mapping on motion-captured trot gait patterns, validating the effectiveness of the forward trot motion (whose action sequence is shown in Figure 8). Subsequently, through inverse kinematics parameter inversion, a mirrored backward trot motion sequence and control strategy were generated (with motion snapshots presented in Figure 9).
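The paper obtains the backward sequence by inverting inverse kinematics parameters; as a simplified stand-in for that procedure, the sketch below derives a backward trot by reversing the time axis of the forward joint trajectory, which is only an assumed approximation.

```python
import numpy as np

def mirror_trot_backward(q_forward: np.ndarray) -> np.ndarray:
    """Derive a backward trot from a forward reference (approximation).

    q_forward: (T, 12) joint-angle trajectory over one forward trot cycle.
    Reversing the time axis replays each foot's swing in the opposite
    direction while preserving the trot phase relationships.
    """
    return q_forward[::-1].copy()
```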
We recorded the three-DOF joint rotation angles of the robot’s right forelimb. As illustrated in Figure 10, the complete gait cycle analysis demonstrates that the trained, simulated, and experimentally measured trajectories all effectively track the target joint angles while exhibiting excellent continuity and smoothness without observable step distortion. Notably, the hip joint’s initial rotation displays an inward flexion tendency due to training-phase configurations designed to ensure motion initiation continuity. Figure 11 further presents the joint angle tracking performance of the hip, thigh, and calf joints during 5 s periodic motions, with quantitative data confirming effective reference angle tracking across all three joints.
3.3. Sim-to-Real Deployment and Experimental Verification
To evaluate sim-to-real transfer performance, the policy trained in simulation was directly deployed on the Lite3 quadruped robot without any additional fine-tuning. During training, limited domain randomization and observation noise were introduced to improve robustness to real-world uncertainties, as described in Section 2.2.4.
After training converged, the PyTorch 1.13.1 actor network was serialized into a static TorchScript model (.pt) and transferred from the training workstation to the robot’s onboard computer (NVIDIA Jetson Orin NX) via a secure network interface (SSH). Onboard, the policy was loaded using C++/LibTorch for online inference, with observation normalization and state processing kept consistent with simulation.
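A minimal Python sketch of this export step is shown below; the actor architecture and dimensions are placeholders, while torch.jit.trace and save are the standard TorchScript calls.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 45, 12  # hypothetical observation/action sizes

# Placeholder actor MLP; the actual trained network is defined elsewhere.
actor = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ELU(),
    nn.Linear(256, 128), nn.ELU(),
    nn.Linear(128, ACT_DIM),
).eval()

# Serialize to a static TorchScript model for C++/LibTorch inference.
example_obs = torch.zeros(1, OBS_DIM)
scripted = torch.jit.trace(actor, example_obs)
scripted.save("policy.pt")  # transferred to the Jetson Orin NX via SSH
```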
The real-world control system follows a low-rate policy inference–high-rate execution architecture. The policy runs at 50 Hz and outputs joint position residuals, which are mapped to target joint positions and tracked by a 1 kHz low-level PD controller. A state-based safety failsafe was implemented to ensure hardware safety during experiments.
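The sketch below illustrates this two-rate structure; the PD gains, action scaling, and nominal pose are hypothetical values rather than the deployed controller’s parameters.

```python
import numpy as np

KP, KD = 30.0, 0.8        # hypothetical PD gains
ACTION_SCALE = 0.25       # residual-to-radian scaling (assumption)
Q_NOMINAL = np.zeros(12)  # nominal standing joint angles (placeholder)

def policy_step(obs: np.ndarray, policy) -> np.ndarray:
    """50 Hz: map the policy's residual output to joint position targets."""
    residual = policy(obs)            # 12-dim joint position residual
    return Q_NOMINAL + ACTION_SCALE * residual

def pd_step(q_target: np.ndarray, q: np.ndarray, dq: np.ndarray) -> np.ndarray:
    """1 kHz: low-level PD tracking of the most recent target."""
    return KP * (q_target - q) - KD * dq  # joint torques
```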
To quantitatively assess sim-to-real transfer fidelity, simulated (Sim) and real-world (Real) trajectories were compared under identical command inputs. The discrepancy between simulation and real execution was evaluated using three standard metrics: root mean square error (RMSE), mean absolute error (MAE), and maximum absolute error (Max Error).
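These three metrics can be computed directly from time-aligned trajectory arrays, as in the short sketch below.

```python
import numpy as np

def sim_to_real_metrics(sim: np.ndarray, real: np.ndarray) -> dict:
    """RMSE, MAE, and Max Error between aligned Sim and Real trajectories."""
    err = sim - real
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MaxError": float(np.max(np.abs(err))),
    }
```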
Specifically, errors of the robot CoM attitude, represented by Roll, Pitch, and Yaw, are summarized in Table 7. In addition, joint-level errors for the Hip, Thigh, and Knee are reported in Table 8 to characterize sim-to-real discrepancies at the actuator execution level.
The quantitative errors reported in Table 7 and Table 8 are computed from the corresponding tracking trajectories presented in the subsequent experimental results. In particular, the CoM attitude errors are derived from the yaw motion shown in Figure 7, whereas the joint-state errors are calculated from the right-foreleg joint trajectories shown in Figure 10 and Figure 11.
As shown in Table 7 and Table 8, the discrepancies between simulated and real robot executions remain within a small and bounded range across all reported metrics. Specifically, the RMSE and MAE of the robot CoM Euler angles remain below 0.1 rad, and the corresponding maximum absolute errors remain limited. Similarly, for the key joint states, both RMSE and MAE remain below 0.04 rad, with bounded maximum errors. Overall, these results indicate that the simulation closely reflects the execution behavior on real hardware.
3.4. Experimental Validation of CPG-Based Trajectory Planning Motion Design
Taking the CPG-generated in-place stepping motion as an example, a phase-coupled oscillator model was employed to establish parametric equations for foot-end trajectories, with a gait cycle of 0.5 s and a leg lift height of 0.1 m.
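A minimal sketch of one such parametric foot-height profile is given below, using the stated cycle time and lift height; the half-sine swing shaping and the phase offsets are common modeling choices and may differ from the exact equations used in this work.

```python
import numpy as np

T_CYCLE = 0.5  # gait cycle (s), as in the experiment
H_LIFT = 0.1   # leg lift height (m), as in the experiment

# Phase offsets coupling the four legs (diagonal pairs in phase, a common
# trot-like pattern; the leg ordering is an assumption).
PHASE_OFFSETS = np.array([0.0, 0.5, 0.5, 0.0])

def foot_height(t: float, leg: int) -> float:
    """Foot-end height for in-place stepping from a phase-coupled oscillator."""
    phase = (t / T_CYCLE + PHASE_OFFSETS[leg]) % 1.0
    if phase < 0.5:                           # swing: half-sine lift
        return H_LIFT * np.sin(2.0 * np.pi * phase)
    return 0.0                                # stance: foot on the ground
```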
Figure 12 demonstrates the physical control performance, where the robot accurately replicates the reference trajectories, validating the framework’s effectiveness for CPG-planned motions. This controller shares the same imitation learning framework as the previously described Blender-based and motion capture-based controllers.
3.5. Multi-Action Composite Behavior Experiment
Building upon the validated single-task control architecture, we developed an imitation learning-based multi-task control system. This system integrates Blender-generated torso twisting, motion-captured trot retargeting (forward/backward), and CPG-based in-place stepping into a unified behavioral dataset.
Figure 13 presents the physical robot executing this multi-action sequence, visually confirming the imitation learning framework’s capability for composite behavior generation.
3.6. Ablation Study on Temporal Behavior Planning
To isolate the contribution of the proposed temporal behavior planner, we conducted an ablation study in which all compared methods shared the same imitation-learning-based low-level controller. All methods were evaluated under identical dataset splits, experimental settings, and evaluation metrics, and on the same physical hardware platform, ensuring a fair and controlled comparison. We compared our behavior-planner-based motion primitive transition approach (Method C) with two motion switching strategies commonly used in practice (Methods A and B):
Method A: direct switching, where motion primitives are concatenated without any temporal smoothing or state alignment;
Method B: reset-to-neutral execution, where the robot returns to a stable standing pose and waits for stabilization before executing the next primitive [18,28];
Method C: continuous behavior planning (ours), which explicitly synthesizes smooth transition trajectories in the temporal domain; a minimal sketch of such a transition is given after this list.
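The sketch below assumes a simple cosine blend between the boundary poses of consecutive primitives; the planner’s actual trajectory synthesis may be more elaborate.

```python
import numpy as np

def blend_transition(q_end: np.ndarray, q_start: np.ndarray,
                     n_steps: int = 25) -> np.ndarray:
    """Synthesize a smooth joint-space transition between two primitives.

    q_end:   last pose (12,) of the preceding primitive.
    q_start: first pose (12,) of the following primitive.
    Returns an (n_steps, 12) trajectory using a cosine blending profile,
    which gives zero velocity at both endpoints.
    """
    alpha = 0.5 * (1.0 - np.cos(np.linspace(0.0, np.pi, n_steps)))
    return (1.0 - alpha)[:, None] * q_end + alpha[:, None] * q_start
```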
Figure 14 shows the variation of the robot’s CoM Euler angles under the different motion switching strategies. As shown in Figure 14a, Method A lacks effective continuity when transitioning from the previous motion to the next, resulting in poorly controlled attitude changes and preventing the robot from successfully completing subsequent actions. As shown in Figure 14b, Method B restores the robot to an initial standing posture before executing the next motion, enabling the completion of the entire motion sequence.
In contrast, as shown in Figure 14c, Method C (ours) maintains continuity between consecutive motions during switching, keeping the attitude changes smooth. The roll, pitch, and yaw angles vary smoothly over time, allowing the motion sequence to be completed continuously.
Overall, although both the proposed method and the reset-based baseline were able to complete the task without instability, our approach achieves substantially smoother execution and higher efficiency by eliminating unnecessary waiting phases. In contrast, direct switching consistently fails due to the lack of effective continuity between consecutive motions.
Since all methods employ the same low-level controller, the observed performance differences can be attributed solely to the motion primitive transition strategy. These results suggest that the temporal behavior planner is beneficial for achieving stable, continuous, and efficient multi-skill execution on real robotic systems.