Reinforcement Learning-Based Locomotion Control for a Lunar Quadruped Robot Considering Space Lubrication Conditions

Li, Jianfei; Zhao, Wenrui; Chen, Lei; Liu, Zhiyong; Sun, Shengxin

doi:10.3390/math14050848


by Jianfei Li ¹,², Wenrui Zhao ³, Lei Chen ², Zhiyong Liu ¹ and Shengxin Sun ⁴,*

¹ State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
² Beijing Key Laboratory of Intelligent Space Robotic System Technology and Applications, Beijing Institute of Spacecraft System Engineering, Beijing 100094, China
³ The Institute of Remote Sensing Satellites (IRSS), The China Academy of Space Technology (CAST), Beijing 100094, China
⁴ School of Astronautics, Harbin Institute of Technology, Harbin 150001, China
* Author to whom correspondence should be addressed.

Mathematics 2026, 14(5), 848; https://doi.org/10.3390/math14050848

Submission received: 20 January 2026 / Revised: 21 February 2026 / Accepted: 22 February 2026 / Published: 2 March 2026

(This article belongs to the Special Issue Advances in Switched Systems and Control Theory: Theory and Application)


Abstract

Quadruped robots possess strong adaptability to rugged terrain, soft ground, and multi-obstacle environments, offering broad application prospects in extraterrestrial planetary exploration. However, large diurnal temperature variations on extraterrestrial bodies exacerbate joint friction nonlinearity, degrading motion control accuracy and stability. To address this, a quadruped robot prototype with hybrid serial–parallel legs is designed for lunar exploration, and an 18-DOF dynamic model is derived using d’Alembert’s principle. Based on the PPO (Proximal Policy Optimization) reinforcement learning algorithm, joint friction parameters are identified using joint velocity and foot–ground contact force. By introducing friction compensation and contact force, an accurate dynamics-based feedback linearization control model is constructed, and a motion impedance control law is designed. Finally, joint friction parameters are identified and validated through both virtual and experimental prototypes, and the proposed control method is tested on flat and sloped terrain. Results show that the method can precisely regulate contact force and foot position, keeping RMSE (Root Mean Square Error) of position within 21.04 mm while preventing slipping and false contact.

Keywords:

quadruped robot; robot dynamics; joint friction identification; locomotion control; impedance control

MSC:

93C85

1. Introduction

With increasing demands for autonomous mobility in lunar exploration missions, quadruped robots, leveraging their discrete foot-end support, multi-degree-of-freedom configuration, and modular design, exhibit outstanding terrain adaptability and obstacle-crossing capabilities. They can access complex areas such as impact craters and lava tubes that are difficult for wheeled platforms to reach [1,2], demonstrating significant potential for future lunar surface exploration and resource utilization. To date, a variety of approaches have been proposed for quadruped robot motion control, including reinforcement learning-based robust control in wild environments [1], energy-efficient locomotion controllers for complex terrains [2], and whole-body hybrid control frameworks designed for deep-space exploration [3]. However, the extreme lunar surface conditions—such as high vacuum and low temperatures—severely degrade lubrication, leading to considerable uncertainty in joint friction. This adversely affects motion control accuracy and stability [4,5,6], representing a key challenge that must be urgently addressed.

The precision and stability of motion control in quadruped robots depend not only on the robot’s own dynamics but also on the frictional torque in joint transmissions and the foot–ground contact friction characteristics [7,8,9]. Joint friction introduces additional torque disturbances, while variations in ground friction coefficients can lead to foot slippage, compromising walking stability [10]. In extreme environments such as the lunar surface, vacuum and low temperatures degrade lubrication conditions, making joint friction behavior more unpredictable and further increasing the risk of control instability [11]. Therefore, extensive research has been conducted on frictional torque modeling and ground friction coefficient identification to enhance the motion robustness of robots in complex environments.

In terms of frictional torque prediction and modeling, a common approach is to model and compensate for joint friction based on whole-body dynamic parameters [12,13]. For example, Li et al. proposed a whole-body disturbance suppression control framework capable of compensating in real-time for unknown torque disturbances, including unmodeled friction, significantly improving the robot’s disturbance rejection capability [14]. Morlando et al. designed a momentum-based disturbance observer to estimate and counteract unmodeled disturbances such as joint frictional torque, enhancing the robustness of the control system [15]. Furthermore, Gao et al. combined model predictive control with adaptive terminal sliding mode control for trajectory tracking in multi-legged robots, effectively mitigating disturbances from nonlinear factors like friction [16].

On the other hand, online identification and adaptation of ground friction coefficients are also key research directions. Elobaid et al. proposed an adaptive nonlinear centroidal model predictive control method that explicitly incorporates friction cone constraints, enabling the robot to adapt to terrains with different static friction coefficients [17]. Related studies have addressed similar problems: Liu et al. achieved stable walking on low-friction maritime platform environments by integrating foothold optimization with discrete control barrier functions [18]; Feng et al. combined trajectory planning with reinforcement learning to enhance the robot’s multi-gait adaptation to varying terrains and friction conditions [19]; Xu et al. improved the driving stability of wheel-legged robots on rough terrain through contact redundancy and rational distribution of wheel–ground friction [20].

In summary, existing research has primarily focused on macroscopic whole-body parameter modeling and empirical friction compensation, while in-depth analysis of friction mechanisms and accurate online identification of friction coefficients remain insufficient. In special lubrication environments such as the lunar surface, research on intelligent control reinforced by friction characteristics holds significant importance and offers broad potential for further development.

Accurate calculation and identification of joint frictional torque in robots are crucial for achieving high-precision motion control. However, existing friction models and parameter identification methods still exhibit shortcomings under complex working conditions [21,22]. Current approaches mainly include analytical methods based on empirical models, data-driven methods, disturbance observation methods, and online identification techniques. For instance, Afrough et al. integrated a polynomial empirical friction model into the dynamic identification Jacobian matrix, enabling simultaneous identification of inertial parameters and friction coefficients, thereby improving model accuracy for planar serial robots [23]. Huang et al. adopted a hybrid identification strategy by introducing a Stribeck nonlinear friction model into the iterative least squares algorithm, combined with backpropagation neural networks to optimize torque estimation. This approach enhanced identification accuracy while ensuring physical feasibility [24]. Boschetti et al. proposed a two-step identification process that independently identifies joint friction parameters separate from inertial parameter estimation, utilizing a nonlinear model incorporating Stribeck effects to better characterize low-speed friction behavior [25]. Furthermore, some studies have introduced dynamic friction models such as the Generalized Maxwell Slip (GMS) model for parameter optimization [26], while others have integrated nonlinear friction and hysteresis models into a unified dynamic parameter identification framework to reduce errors from stepwise identification [27].
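As a point of reference for the Stribeck-type models cited above, the following minimal sketch evaluates one commonly used textbook form, combining Coulomb, Stribeck, and viscous terms. It is not taken from any of the cited works; the function name `stribeck_torque` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def stribeck_torque(omega, t_coulomb, t_static, omega_s, viscous, delta=2.0):
    """Stribeck-type friction torque (N*m) for joint velocity omega (rad/s).

    Textbook form (an assumption, not a model from the cited works):
        T_f = [T_c + (T_s - T_c) * exp(-(|w| / w_s)**delta)] * sign(w) + b * w
    """
    omega = np.asarray(omega, dtype=float)
    # Friction level decays from the static value T_s to the Coulomb value T_c
    # as the speed grows past the Stribeck velocity w_s.
    level = t_coulomb + (t_static - t_coulomb) * np.exp(
        -(np.abs(omega) / omega_s) ** delta
    )
    # sign(0) = 0, so this sketch returns zero torque at exactly zero velocity.
    return level * np.sign(omega) + viscous * omega

# Hypothetical parameters, for illustration only.
params = dict(t_coulomb=2.0, t_static=3.0, omega_s=0.05, viscous=0.8)
omega = np.array([-1.0, -0.01, 0.0, 0.01, 1.0])
tau_f = stribeck_torque(omega, **params)
```

At low speed the torque stays close to the static level, while at high speed it approaches the Coulomb level plus the viscous term, which is the low-speed behavior these models are meant to capture.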

With increasing demands for precise control, data-driven methods have gained more attention [28]. Dai et al. proposed a physically consistent Gaussian process regression model for joint friction modeling and compensation, embedding Coulomb and viscous friction priors to achieve accurate torque estimation and compensation even under high noise conditions [29]. Hu et al. employed Physics-Informed Neural Networks (PINN) to incorporate friction into the robot dynamic model by combining Lagrangian dynamics with Stribeck friction during training, significantly reducing joint torque prediction errors [30]. Shim et al. developed a unified domain neural network disturbance model combined with traditional white-box friction models to compensate for unmodeled friction across joints, enhancing the robustness of contact force estimation without force/torque sensors [28]. Additionally, Turlapati et al. integrated torque sensors directly into robot joints to measure and identify internal friction and torque ripple, achieving high-precision compensation through linear regression and basis function networks [31]. Observer-based methods such as Extended Kalman Filters have also been used to estimate and compensate for torque disturbances caused by harmonic drive flexibility and friction, improving joint torque control accuracy [31,32].

Although these methods have shown effectiveness under terrestrial room-temperature conditions, they generally fail to account for the critical impact of extreme temperature variations on lubrication in space environments [33,34]. Traditional models often assume friction parameters remain constant or vary only with motion states such as velocity, neglecting characteristic drifts induced by temperature changes [21]. Research indicates that joint temperature rise can alter lubricant viscosity, thereby affecting frictional torque, a relationship describable by a double-exponential law [21]. Tadese et al. experimentally established a temperature-dependent nonlinear friction model, enhancing the accuracy of dynamic parameters under practical operating conditions [33]. Similarly, Li et al. introduced joint temperature and load factors into a comprehensive friction model for collaborative robot identification, employing an improved iteratively reweighted least squares (I-IRLS) algorithm to simultaneously estimate frictional torque and dynamic parameters, reducing errors by over 14% compared to conventional methods [35]. In addition, Wu et al. proposed an adaptive end-effector buffeting sliding mode control method for heavy-duty robots with long arms, further enhancing robustness under dynamic loads [36].

However, the temperature ranges considered in these studies remain within conventional environments and cannot cover the extreme thermal fluctuations on the lunar surface, which range from +120 °C to −170 °C. In the vacuum of the Moon, high-low temperature cycles cause drastic changes in lubricant viscosity and rheological properties, leading to significant fluctuations in friction coefficients that exceed the calibration range of existing models. This drift in lubrication conditions undermines the effectiveness of friction compensation and parameter identification, subsequently affecting the precision and reliability of motion control for lunar surface robots. Currently, research on the variation patterns of friction parameters under the extreme temperature differentials of the space environment is still lacking, and existing modeling and identification methods cannot ensure applicability in the lunar environment. There is an urgent need to investigate the mechanisms underlying lubrication condition changes in space to improve the motion control performance of lunar robots under such harsh conditions.

This paper addresses the locomotion instability of lunar quadruped robots caused by joint friction that varies under extreme temperatures, through a novel PPO-based friction identification method and an integrated impedance control framework with friction compensation. In Section 2, an 18-DOF dynamic model of the proposed lunar quadruped robot is derived using d’Alembert’s principle. In Section 3, a PPO reinforcement learning framework for identifying joint friction parameters under space lubrication conditions is established. In Section 4, an integrated locomotion control strategy combining feedback linearization and impedance control with friction compensation is designed. In Section 5, the proposed friction identification and control methods are validated through simulation and experiments on both flat and sloped terrains.

2. Dynamics Model of a Lunar Quadruped Robot

2.1. Mechanical Design of a Lunar Quadruped Robot

To address the challenges of mobile exploration across the complex terrain of the lunar surface, this paper proposes and designs a lunar quadruped robot. Adopting a bio-inspired leg configuration, the platform exhibits high-dynamic locomotion, terrain adaptability, and efficient obstacle-negotiation capabilities, making it suitable for long-distance, multi-task scientific exploration under the low-gravity, soft-soil, and rugged topographical conditions of the Moon. The robot employs a symmetric quadruped layout, with each leg featuring three active degrees of freedom (DoFs) that respectively achieve hip abduction/adduction, hip pitch, and knee pitch motion. The overall structure is compact and highly integrated; while ensuring sufficient structural stiffness and load-bearing capacity, lightweight design and high-performance actuation enable efficient mobility. This section focuses on the design of the robot’s leg mechanism, including configuration selection, kinematic layout, materials and lightweighting strategies, and the design of the joint drive units.

As the core components responsible for both locomotion and support, the leg configurations directly influence the robot’s locomotion performance, terrain adaptability, and energy efficiency. The legs of the proposed quadruped robot adopt a hybrid parallel–serial mechanism, with the main structure based on a Parallel Five-Bar Mechanism (PFBM) for knee actuation, combined with a serial hip abduction/adduction DoF. This design ensures a large workspace and high load-bearing capacity while enabling a concentrated mass distribution of the drive units, significantly reducing the leg’s moment of inertia and benefiting joint dynamic response and walking efficiency.

As illustrated in Figure 1, the robot’s thigh and shank are connected via a set of parallel five-bar linkages, where the actual knee motion is jointly controlled by two active actuators located at the hip. Specifically, the thigh link is connected to the shank link through two parallel rocker arms, forming a closed kinematic chain. This layout relocates all knee-driving motors upward near the hip joint, substantially reducing the mass of the shank and foot-end, thereby lowering the inertial load of the swing leg and improving gait frequency and energy efficiency. In addition, all electrical components are positioned above the hip, avoiding direct exposure of motors, encoders, and wiring to harsh lunar conditions such as dust and extreme temperatures, thus enhancing environmental tolerance and system reliability.

Each leg possesses three active DoFs:

Hip abduction/adduction: Enables leg spreading and retraction in the lateral plane, used to adjust the support polygon, achieve turning, and maintain lateral balance.
Hip pitch: Cooperates with the knee to generate forward and backward leg swinging, serving as the primary source of propulsion.
Knee pitch: Implemented via the PFBM, works in conjunction with hip pitch to accomplish foot-end trajectory tracking, ground contact force control, and terrain adaptation.

The motion range of each joint is optimized to balance workspace requirements with structural interference avoidance: the hip abduction/adduction range is $\pm 90^{\circ}$, the hip pitch range is $-90^{\circ}$ to $30^{\circ}$, and the equivalent knee pitch range is $-180^{\circ}$ to $70^{\circ}$. This configuration allows the robot to perform various locomotion modes on the lunar surface, including large-stride walking, in-place turning, and leg lifting for obstacle negotiation.

Leg geometric parameters directly affect kinematic performance and force transmission efficiency. Considering the lunar low-gravity environment and soil mechanical properties, this paper selects equal lengths of 0.5 m for both the thigh and shank links. This equal-length design maximizes the workspace for a given total leg length and ensures that the hip and knee joints share a similar torque distribution during the stance phase, which is beneficial for actuator load balancing and thermal management. Moreover, equal-length links enhance kinematic symmetry, simplifying gait generation and control algorithms.

In terms of structural materials, the primary load-bearing components of the legs—such as the thigh, shank, and connecting rods—are fabricated from carbon fiber composite materials into hollow thin-walled tubular structures, achieving extreme lightweighting while maintaining adequate stiffness and strength. Specifically, the thigh link weighs 2.1 kg, the shank 1.2 kg, and the foot pad 0.6 kg. Carbon fiber materials offer high specific strength, a low coefficient of thermal expansion, and good vibration damping characteristics, making them well-suited for the large diurnal temperature variations and mechanical vibration environment of the Moon.

To improve energy efficiency during walking and enhance adaptability to ground impacts, a passive linear tension spring (with a stiffness of 6000 N/m and a free length of 0.21 m) is integrated at the knee joint. This spring stores part of the kinetic energy during the stance phase and releases it during the swing phase, helping to reduce joint motor power consumption and achieve tendon-like elastic energy cycling. During continuous walking, the passive spring can smooth joint torque fluctuations, reduce peak actuator loads, and improve overall system efficiency and endurance.

Each active joint is driven by a custom-developed Integrated Drive Unit (IDU), which is compact, lightweight, and capable of high torque output. The IDU comprises an outer-rotor servo motor, a harmonic drive reducer, a torque sensor, an encoder, and an integrated housing, with a total mass of only 3.0 kg and a maximum output torque of 140 N·m. The motor features a high torque density design, with a rated torque of 1.8 N·m and a peak torque of 2.4 N·m; when combined with a harmonic reducer with a transmission ratio of 100, it meets the high-torque and dynamic response requirements of lunar locomotion. The torque sensor provides real-time measurement of joint output torque for force control and ground contact detection, while the high-resolution encoder offers precise joint position feedback to support high-accuracy trajectory tracking.

The robot’s body adopts a semi-monocoque structure combining an aluminum alloy frame with carbon fiber skin, measuring 1.45 m (length) × 1.45 m (width) × 0.4 m (height). The interior integrates the control system, communication module, energy storage battery, and interfaces for scientific payloads. The total mass of the robot is adjustable within the range of 200–360 kg to accommodate different mission configurations. The four legs are symmetrically arranged at the four corners of the body, forming a stable support base. All hip joint actuators are embedded within the body, further lowering the center of mass and enhancing both static and dynamic stability. In summary, through the synergistic design of parallel actuation layout, equal-length link optimization, carbon fiber lightweighting, passive elasticity integration, and high-performance drive units, the robot’s leg mechanism achieves a favorable balance among structural stiffness, motion agility, environmental adaptability, and energy efficiency, laying a solid mechanical foundation for robust mobility across the complex terrain of the lunar surface.

The specific parameters of the lunar quadruped robot are shown in Table 1.

2.2. Dynamic Model of a Lunar Quadruped Robot

2.2.1. Coordinate System Definition

As shown in Figure 2, the body frame $Σ_{B}$ is defined with its origin at the robot’s body center. The x-axis of $Σ_{B}$ points forward (in the direction of travel) and the z-axis is perpendicular to the upper plane of the body (pointing roughly upward). Each leg i (for $i = 1, \dots, 4$) has an associated leg frame $Σ_{Li}$ with origin at the leg’s hip joint (the intersection point of the axes of the three leg joints). In frame $Σ_{Li}$, the x-axis points from the leg hip toward the body center, and the y-axis is chosen parallel to the z-axis of $Σ_{B}$ (so all frames are oriented consistently in the vertical direction). The world frame $Σ_{W}$ is a fixed reference frame (e.g., a lunar surface frame) that coincides with $Σ_{B}$ at the initial time. All coordinate systems are right-handed Cartesian frames.

For each leg, we define the vector of generalized coordinates (active joint angles) as $q_{i} = {[θ_{a_{i}}, θ_{t_{i}}, θ_{s_{i}}]}^{T}$, where $θ_{a_{i}}$ is the abduction angle, $θ_{t_{i}}$ is the thigh angle, and $θ_{s_{i}}$ is the rocker arm angle (related to the knee joint). In the following, since all legs have identical structure, we derive the leg dynamics for a generic leg and omit the leg index i for simplicity.

2.2.2. Body Dynamic Modeling

The lunar quadruped robot is modeled as a central rigid body supported by four legs, as shown in Figure 2. Let $m_{b}$ be the total mass of the body, and $I_{b}$ its inertia tensor about the body frame origin $O_{b}$. We denote by $r_{com}$ the vector from $O_{b}$ to the body center of mass (COM). For each leg, let $O_{L}$ be the origin of its leg frame (coincident with the hip joint location), and ${}^{B}P_{Li}$ the vector from $O_{b}$ to $O_{L}$. The vector from $O_{b}$ to the leg’s tip is denoted ${}^{B}P_{tip}$, and similarly we denote by ${}^{Li}P_{tip}$ the vector from the leg frame origin $O_{L}$ to the tip (thus ${}^{B}P_{tip} = {}^{B}P_{Li} + {}^{Li}P_{tip}$ in coordinates of $Σ_{B}$). When the robot contacts the ground, a ground reaction force $F_{g}$ acts on the tip of each contacting leg (pointing upward from the ground on the tip). Each leg contains active joint actuators that behave as tunable spring-damper units (through compliance control), characterized by spring stiffness coefficients $K_{a}, K_{t}, K_{s}$ and damping coefficients $B_{a}, B_{t}, B_{s}$ for the abduction, thigh, and rocker joints, respectively. In addition, each leg has a passive linear spring (in the parallel five-bar mechanism of the knee) with stiffness $K_{p}$.

Using Newton-Euler equations, the translational and rotational equations of motion for the body can be written as:

m_{b} a_{b} = m_{b} g + \sum_{i = 1}^{4} F_{gi}

(1)

\frac{d}{d t} (I_{b} ω_{b}) = \sum_{i = 1}^{4} ((r_{com} + r_{ti}) \times F_{gi})

(2)

where $a_{b}$ and $ω_{b}$ are the linear acceleration and angular velocity of the body in $Σ_{B}$, and $g$ is the gravitational acceleration vector (in $Σ_{B}$ coordinates). The first equation is simply $F = m a$ for the body center of mass, stating that the net force (gravity plus all ground contact forces $F_{gi}$ on the legs) equals the total mass times acceleration. The second equation is the angular momentum balance about the body’s center of mass (incorporating the gyroscopic term $ω_{b} \times I_{b} ω_{b}$) and states that the sum of moments due to all ground reaction forces about $O_{b}$ equals the time rate of change of angular momentum of the body. In these equations, $r_{com}$ and $r_{ti}$ (the tip position of leg i relative to the body) are expressed in the body frame.
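The right-hand sides of Equations (1) and (2) can be evaluated numerically as in the following minimal sketch. The function name `body_dynamics_rhs`, the body mass, the lever arms, and the contact forces are hypothetical illustration values, not parameters reported for the prototype; the lever arm of each leg is passed in directly as the sum $r_{com} + r_{ti}$ that appears in Equation (2).

```python
import numpy as np

LUNAR_G = 1.62  # m/s^2, approximate lunar surface gravity

def body_dynamics_rhs(m_b, g_vec, lever_arms, contact_forces):
    """Evaluate Eqs. (1)-(2): body acceleration and net contact moment.

    lever_arms     : (n, 3) array, one row per leg, equal to r_com + r_ti
    contact_forces : (n, 3) array of ground reaction forces F_gi
    """
    lever_arms = np.asarray(lever_arms, dtype=float)
    contact_forces = np.asarray(contact_forces, dtype=float)
    # Eq. (1): m_b * a_b = m_b * g + sum_i F_gi
    a_b = np.asarray(g_vec, dtype=float) + contact_forces.sum(axis=0) / m_b
    # Eq. (2): d/dt (I_b * w_b) = sum_i (r_com + r_ti) x F_gi
    moment = np.cross(lever_arms, contact_forces).sum(axis=0)
    return a_b, moment

# Hypothetical static stance: four symmetric feet, each carrying a quarter
# of the lunar weight of the body.
m_b = 200.0
g_vec = np.array([0.0, 0.0, -LUNAR_G])
lever_arms = np.array([
    [0.7, 0.7, -0.6],
    [0.7, -0.7, -0.6],
    [-0.7, 0.7, -0.6],
    [-0.7, -0.7, -0.6],
])
contact_forces = np.tile([0.0, 0.0, m_b * LUNAR_G / 4.0], (4, 1))
a_b, moment = body_dynamics_rhs(m_b, g_vec, lever_arms, contact_forces)
```

In this symmetric stance both the acceleration and the net moment vanish, which is a quick sanity check on the sign conventions before time-varying contact forces are substituted.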

When the robot is in flight (no ground contact), $F_{gi} = 0$. At touchdown, as soon as a leg’s foot contacts the ground, the corresponding $F_{gi}$ becomes nonzero and the leg begins to compress, causing $r_{ti}$ (the tip position relative to the body) to change with time. Thus, to solve the above body equations for the touchdown dynamics, we first need the leg tip trajectories $r_{ti}$, which are obtained by solving the leg dynamic model as derived next.

2.2.3. Model Illustration

Leg mechanism and coordinates: As shown in Figure 2, each leg is a 3-DOF hybrid mechanism consisting of a hip abduction joint (axis roughly horizontal), a hip thigh joint (pitch rotation), and a rocker arm that forms part of a parallelogram four-bar linkage for the knee. We label the key link segments as follows:

Link 1 (hip segment): from the hip joint to the thigh joint; its center of mass is $c_{1}$ , located a distance $r_{1}$ from $O_{L}$ along the link.
Link 2 (thigh segment): from the hip pitch joint to the knee mechanism; center of mass $c_{2}$ at distance $r_{2}$ from $O_{L}$ along the thigh.
Link 3 (rocker arm segment): the other side of the four-bar knee mechanism, with center of mass $c_{3}$ at distance $r_{3}$ from $O_{L}$ (when rotated into the plane of leg motion).
Link 4 (connecting link): the link connecting the rocker to the shank in the four-bar mechanism; center of mass $c_{4}$ at distance $r_{4}$ from the rocker–connecting link joint.
Link 5 (shank segment): the lower leg from the four-bar joint to the foot; center of mass $c_{5}$ at distance $r_{5}$ from the thigh–shank (four-bar) joint.

Due to the asymmetric geometry of the rocker arm and shank, the centers of mass $c_{3}$ and $c_{5}$ are offset from the geometric axes. We denote by $φ_{3}$ and $φ_{5}$ the small constant COM offset angles for link 3 and link 5, respectively, measured from their symmetric centerlines. (For the other links 1, 2, and 4, which are symmetric, the COM lies on the centerline, so no offset angle is needed.) The leg also contains a passive linear spring (part of the knee’s parallel five-bar mechanism) of natural length $L_{p0}$, connecting points on link 3 (rocker) and link 4 (connecting link). The distances from the rocker–connecting link joint to the two ends of this spring are fixed lengths denoted $L_{a}$ and $L_{b}$. As the knee flexes, the spring length $L_{p}$ changes accordingly. Finally, let $L_{t}$ be the length of the thigh link (distance from the hip pitch joint to the knee/four-bar joint) and $L_{s}$ the length of the shank link (distance from the four-bar joint to the foot). For convenience in the equations below, we introduce the shorthand notations $c(\cdot) = \cos(\cdot)$ and $s(\cdot) = \sin(\cdot)$.

2.2.4. Kinematic Model

Using the above definitions, we can write the position vectors of the link centers of mass $c_{1}$–$c_{5}$ and the foot tip, expressed in the leg’s coordinate frame $Σ_{Li}$. (Recall that the leg frame $Σ_{Li}$ is oriented such that rotating an initial vector about the x-axis by the abduction angle $θ_{a}$ accounts for the out-of-plane angle of the leg.) Taking $O_{L}$ as the origin, the COM positions are:

r_{c 1} = {[\begin{matrix} r_{1} & 0 & 0 \end{matrix}]}^{T}

(3)

r_{c 2} = R_{x} (θ_{a}) \cdot {[\begin{matrix} r_{2} c θ_{t} & r_{2} s θ_{t} & 0 \end{matrix}]}^{T} = [\begin{matrix} r_{2} c θ_{t} \\ r_{2} s θ_{t} c θ_{a} \\ r_{2} s θ_{t} s θ_{a} \end{matrix}]

(4)

r_{c 3} = R_{x} (θ_{a}) \cdot {[\begin{matrix} r_{3} c (θ_{s} - φ_{3}) & r_{3} s (θ_{s} - φ_{3}) & 0 \end{matrix}]}^{T} = [\begin{matrix} r_{3} c (θ_{s} - φ_{3}) \\ r_{3} s (θ_{s} - φ_{3}) c θ_{a} \\ r_{3} s (θ_{s} - φ_{3}) s θ_{a} \end{matrix}]

(5)

r_{c 4} = R_{x} (θ_{a}) \cdot [\begin{matrix} L_{a} c θ_{s} + r_{4} c θ_{t} \\ L_{a} s θ_{s} + r_{4} s θ_{t} \\ 0 \end{matrix}] = [\begin{matrix} L_{a} c θ_{s} + r_{4} c θ_{t} \\ (L_{a} s θ_{s} + r_{4} s θ_{t}) c θ_{a} \\ (L_{a} s θ_{s} + r_{4} s θ_{t}) s θ_{a} \end{matrix}]

(6)

r_{c 5} = R_{x} (θ_{a}) \cdot [\begin{matrix} L_{t} c θ_{t} + r_{5} c (θ_{s} + π - φ_{5}) \\ L_{t} s θ_{t} + r_{5} s (θ_{s} + π - φ_{5}) \\ 0 \end{matrix}] = [\begin{matrix} L_{t} c θ_{t} - r_{5} c (θ_{s} - φ_{5}) \\ (L_{t} s θ_{t} - r_{5} s (θ_{s} - φ_{5})) c θ_{a} \\ (L_{t} s θ_{t} - r_{5} s (θ_{s} - φ_{5})) s θ_{a} \end{matrix}]

(7)

where $R_{x} (θ_{a})$ is the rotation matrix about the x-axis, and $r_{tip}$ is the vector from $O_{L}$ to the foot tip, which can be written as

r_{tip} = R_{x} (θ_{a}) \cdot [\begin{matrix} L_{t} c θ_{t} + L_{s} c (θ_{s} + π) \\ L_{t} s θ_{t} + L_{s} s (θ_{s} + π) \\ 0 \end{matrix}] = [\begin{matrix} L_{t} c θ_{t} - L_{s} c θ_{s} \\ (L_{t} s θ_{t} - L_{s} s θ_{s}) c θ_{a} \\ (L_{t} s θ_{t} - L_{s} s θ_{s}) s θ_{a} \end{matrix}]

(8)
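Equation (8) can be checked numerically with the following minimal sketch, which evaluates the foot-tip position both as the product $R_{x}(θ_{a})$ times the planar vector and in the expanded closed form. The link lengths are the 0.5 m values stated in Section 2.1; the function names `rot_x`, `foot_tip`, and `foot_tip_closed_form` and the sample joint angles are hypothetical.

```python
import numpy as np

L_T = 0.5  # thigh length (m), Section 2.1
L_S = 0.5  # shank length (m), Section 2.1

def rot_x(theta_a):
    """Rotation matrix about the x-axis of the leg frame."""
    c, s = np.cos(theta_a), np.sin(theta_a)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

def foot_tip(theta_a, theta_t, theta_s, l_t=L_T, l_s=L_S):
    """Left-hand form of Eq. (8): rotate the planar tip vector by theta_a."""
    planar = np.array([
        l_t * np.cos(theta_t) + l_s * np.cos(theta_s + np.pi),
        l_t * np.sin(theta_t) + l_s * np.sin(theta_s + np.pi),
        0.0,
    ])
    return rot_x(theta_a) @ planar

def foot_tip_closed_form(theta_a, theta_t, theta_s, l_t=L_T, l_s=L_S):
    """Right-hand (expanded) form of Eq. (8)."""
    x = l_t * np.cos(theta_t) - l_s * np.cos(theta_s)
    y = l_t * np.sin(theta_t) - l_s * np.sin(theta_s)
    return np.array([x, y * np.cos(theta_a), y * np.sin(theta_a)])

# Hypothetical joint angles (rad), for illustration only.
angles = (0.2, -0.7, 1.0)
tip_a = foot_tip(*angles)
tip_b = foot_tip_closed_form(*angles)
```

Both forms agree to machine precision, and the distance of the tip from $O_{L}$ never exceeds $L_{t} + L_{s} = 1.0$ m, the fully extended leg length.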

2.2.5. Dynamic Model

We derive the dynamic equations of a single leg of the lunar quadruped robot using d’Alembert’s principle in the form of virtual work. Let the leg’s generalized coordinates be $q = {[θ_{a}, θ_{t}, θ_{s}]}^{T}$, where $θ_{a}$ is the hip abduction (or swing) angle, $θ_{t}$ is the thigh (upper leg) rotation angle, and $θ_{s}$ is the shank (lower leg) rotation angle. These angles uniquely define the leg configuration. We consider infinitesimal virtual displacements $δ θ_{a}$, $δ θ_{t}$, and $δ θ_{s}$ in each of these coordinates. Correspondingly, each link j (for $j = 1, \dots, 5$) of the leg experiences a virtual linear displacement $δ r_{cj}$ of its center of mass (COM) and a virtual angular displacement $δ θ_{cj}$ about its COM.

In applying d’Alembert’s principle, we introduce the inertial force and inertial torque for each link j:

F_{Ij} = - m_{j} a_{cj}, \quad T_{Ij} = - I_{j} α_{cj}

(9)

where $m_{j}$ is the mass of link j, $I_{j}$ its moment of inertia about the relevant axis, $a_{cj}$ the linear acceleration of its COM, and $α_{cj}$ its angular acceleration. The negative signs indicate that these inertial forces/torques act opposite to the actual accelerations. The leg is also subjected to applied forces/torques: (i) $T_{a}$, $T_{t}$, $T_{s}$ are the torques applied by the actuators at the joints corresponding to $θ_{a}$, $θ_{t}$, $θ_{s}$, respectively (positive in the direction of increasing each coordinate); (ii) a passive linear spring (with an inline damper) in the leg’s parallelogram mechanism exerts an internal spring force $F_{p}$ along its length $L_{p}$; and (iii) an external ground reaction force $F_{g} = {[F_{ex}, F_{ey}, F_{ez}]}^{T}$ acts on the foot (tip). Using the principle of virtual work, the total virtual work done by all these forces and torques (including inertial forces) must sum to zero for any set of virtual displacements $(δ θ_{a}, δ θ_{t}, δ θ_{s})$. To account for joint friction at the three actuated joints ($θ_{a}$, $θ_{t}$, $θ_{s}$), we introduce friction torques $T_{fa}$, $T_{ft}$, $T_{fs}$ at the hip abduction, thigh, and shank joints, respectively. These torques act in opposition to the motion at each joint. Incorporating them into the principle of virtual work adds negative work terms for each DOF. Therefore, we can write the equilibrium of virtual work as:

\begin{matrix} F_{p}^{T} δ L_{p} + T_{a}^{T} δ θ_{a} + T_{t}^{T} δ θ_{t} + T_{s}^{T} δ θ_{s} - T_{f a}^{T} δ θ_{a} - T_{f t}^{T} δ θ_{t} - T_{f s}^{T} δ θ_{s} + \dots \\ \dots + \sum_{j = 1}^{5} (F_{Ij}^{T} δ r_{cj} + T_{Ij}^{T} δ θ_{cj}) + F_{g}^{T} δ r_{tip} = 0 \end{matrix}

(10)

Each term in Equation (10) corresponds to the virtual work done by a force or torque: for example, $T_{a} δ θ_{a}$ is the work by the hip actuator torque through the virtual rotation $δ θ_{a}$, $F_{Ij}^{T} δ r_{cj}$ is the work by the inertial force of link j through its virtual COM displacement, etc. We now proceed to expand each term in detail.

The passive spring (of length

L_{p}

) is connected between the thigh and shank via the parallelogram linkage. Let

L_{a}

and

L_{b}

be the fixed link lengths from the spring’s mounting points to the hip and shank joints, respectively, and let p be a constant geometric offset angle in the spring’s placement. The length

L_{p}

is a function of the joint angles

θ_{t}

and

θ_{s}

(and p). In fact, from the law of cosines one can derive:

L_{p} = \sqrt{[L_{a}^{2} + L_{b}^{2} - 2 L_{a} L_{b} c (π + θ_{s} - θ_{t})]}

(11)

Differentiating this expression, the virtual change in spring length

δ L_{p}

is obtained as:

δ L_{p} = \frac{L_{a} L_{b}}{L_{p}} s (π + θ_{s} - θ_{t}) \cdot (- δ θ_{t} + δ θ_{s})

(12)

The spring’s virtual work contribution in Equation (10) is then

F_{p} δ L_{p} = F_{p} \frac{L_{a} L_{b}}{L_{p}} s (π + θ_{s} - θ_{t}) \cdot (- δ θ_{t} + δ θ_{s})

(13)
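As a quick numerical sanity check of Equation (12), the following sketch compares the analytical virtual change of the spring length with a finite-difference estimate. It assumes the included angle of the law of cosines is π + θ_s − θ_t, as it appears in Equation (12); the link lengths and joint angles are arbitrary illustrative values, not the prototype's dimensions.

```python
import math

def spring_length(theta_t, theta_s, L_a, L_b):
    """Law-of-cosines spring length; included angle pi + theta_s - theta_t (assumed)."""
    angle = math.pi + theta_s - theta_t
    return math.sqrt(L_a**2 + L_b**2 - 2.0 * L_a * L_b * math.cos(angle))

def spring_length_variation(theta_t, theta_s, d_theta_t, d_theta_s, L_a, L_b):
    """Analytical virtual change of the spring length, Equation (12)."""
    L_p = spring_length(theta_t, theta_s, L_a, L_b)
    angle = math.pi + theta_s - theta_t
    return L_a * L_b / L_p * math.sin(angle) * (-d_theta_t + d_theta_s)

# Illustrative values only (metres, radians); not taken from the paper.
L_a, L_b = 0.10, 0.25
theta_t, theta_s = 0.6, -0.4
d_t, d_s = 1e-6, -2e-6  # small virtual joint displacements

analytic = spring_length_variation(theta_t, theta_s, d_t, d_s, L_a, L_b)
numeric = (spring_length(theta_t + d_t, theta_s + d_s, L_a, L_b)
           - spring_length(theta_t, theta_s, L_a, L_b))
print(analytic, numeric)
```

The two values should agree to first order in the displacement, which is what the principle of virtual work requires of δL_p.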

Next we derive

δ r_{c j}

for each link j. For clarity, let us define shorthand notations

c_{x} : = cos x

and

s_{x} : = sin x

for any angle x. All coordinates and vectors are expressed in the leg’s local coordinate frame

Σ_{Li}

(attached at the leg’s base). The kinematic structure of the lunar quadruped robot’s leg is as follows: link 1 is the hip housing (rotating relative to the body by angle

θ_{a}

about a nearly horizontal axis), link 2 is the thigh (rotating by

θ_{t}

relative to the hip), link 3 is a rocker arm in the four-bar knee mechanism (rotating by

θ_{s}

relative to the hip, in concert with the shank), link 4 is the connecting link of the four-bar (pinned between the thigh and rocker), and link 5 is the shank (which holds the foot). Each link j has a known length or offset to its center of mass, denoted

r_{j}

(for

j = 1, \dots, 5

). Additionally, links 3 and 5 have small COM offset angles

φ_{3}

and

φ_{5}

, respectively (accounting for the fact that their centers of mass are not located exactly along the geometric link line). Links 1, 2, 4 have COMs on their symmetry axis so any offset angle for them is zero or neglected. Using the leg geometry, we can write the position vectors of each link’s COM in the

R_{L}

frame as functions of

θ_{a}

,

θ_{t}

,

θ_{s}

. Differentiating those will give the virtual displacements

δ r_{c j}

. We list the results here:

δ r_{c 1} = {[\begin{matrix} 0 & 0 & 0 \end{matrix}]}^{T}

(14)

δ r_{c 2} = [\begin{matrix} - r_{2} s θ_{t} δ θ_{t} \\ r_{2} c θ_{t} c θ_{a} δ θ_{t} - r_{2} s θ_{t} s θ_{a} δ θ_{a} \\ r_{2} c θ_{t} s θ_{a} δ θ_{t} + r_{2} s θ_{t} c θ_{a} δ θ_{a} \end{matrix}]

(15)

δ r_{c 3} = [\begin{matrix} - r_{3} s (θ_{s} - φ_{3}) δ θ_{s} \\ r_{3} c (θ_{s} - φ_{3}) δ θ_{s} c θ_{a} - r_{3} s (θ_{s} - φ_{3}) δ θ_{a} s θ_{a} \\ r_{3} c (θ_{s} - φ_{3}) δ θ_{s} s θ_{a} + r_{3} s (θ_{s} - φ_{3}) δ θ_{a} c θ_{a} \end{matrix}]

(16)

δ r_{c 4} = [\begin{matrix} - L_{a} s θ_{s} δ θ_{s} - r_{4} s θ_{t} δ θ_{t} \\ (L_{a} c θ_{s} δ θ_{s} + r_{4} c θ_{t} δ θ_{t}) c θ_{a} - (L_{a} s θ_{s} + r_{4} s θ_{t}) s θ_{a} δ θ_{a} \\ (L_{a} c θ_{s} δ θ_{s} + r_{4} c θ_{t} δ θ_{t}) s θ_{a} + (L_{a} s θ_{s} + r_{4} s θ_{t}) c θ_{a} δ θ_{a} \end{matrix}]

(17)

δ r_{c 5} = [\begin{matrix} - L_{t} s θ_{t} δ θ_{t} + r_{5} s (θ_{s} - φ_{5}) δ θ_{s} \\ [L_{t} c θ_{t} δ θ_{t} - r_{5} c (θ_{s} - φ_{5}) δ θ_{s}] c θ_{a} - [L_{t} s θ_{t} - r_{5} s (θ_{s} - φ_{5})] s θ_{a} δ θ_{a} \\ [L_{t} c θ_{t} δ θ_{t} - r_{5} c (θ_{s} - φ_{5}) δ θ_{s}] s θ_{a} + [L_{t} s θ_{t} - r_{5} s (θ_{s} - φ_{5})] c θ_{a} δ θ_{a} \end{matrix}]

(18)
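The virtual displacements listed above can be checked numerically. The sketch below does this for the thigh, Equation (15), using a COM position vector that is consistent with that equation up to a constant offset. The position vector is inferred by integrating Equation (15) and is therefore an assumption, as are the numerical values.

```python
import math

def r_c2(theta_a, theta_t, r2):
    """Thigh COM position consistent with Equation (15) (inferred, up to a constant)."""
    return (
        r2 * math.cos(theta_t),
        r2 * math.sin(theta_t) * math.cos(theta_a),
        r2 * math.sin(theta_t) * math.sin(theta_a),
    )

def delta_r_c2(theta_a, theta_t, d_a, d_t, r2):
    """Virtual displacement of the thigh COM, Equation (15)."""
    sa, ca = math.sin(theta_a), math.cos(theta_a)
    st, ct = math.sin(theta_t), math.cos(theta_t)
    return (
        -r2 * st * d_t,
        r2 * ct * ca * d_t - r2 * st * sa * d_a,
        r2 * ct * sa * d_t + r2 * st * ca * d_a,
    )

# Illustrative configuration and small virtual displacements.
theta_a, theta_t, r2 = 0.2, 0.7, 0.15
d_a, d_t = 1e-6, -1e-6

before = r_c2(theta_a, theta_t, r2)
after = r_c2(theta_a + d_a, theta_t + d_t, r2)
numeric = tuple(b - a for a, b in zip(before, after))
analytic = delta_r_c2(theta_a, theta_t, d_a, d_t, r2)
max_error = max(abs(n - a) for n, a in zip(numeric, analytic))
print(max_error)
```

The same pattern applies to the other links: each y- and z-component is the planar displacement rotated by θ_a plus a term proportional to δθ_a.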

We must also account for the virtual rotations

δ θ_{c j}

of each link’s COM. Each link’s small rotation can be expressed in terms of the virtual changes

δ θ_{a}

,

δ θ_{t}

,

δ θ_{s}

. Importantly,

θ_{a}

is a rotation about the leg’s x-axis, while

θ_{t}

and

θ_{s}

are rotations taking place in the plane that has been rotated by

θ_{a}

. A small rotation of link j by

δ θ_{a}

corresponds to an angular displacement

δ θ

about the x-axis (i.e.,

δ θ_{x} = δ θ_{a}

,

δ θ_{y} = 0

,

δ θ_{z} = 0

in

R_{L}

coordinates). A small rotation by

δ θ_{t}

(of those links that depend on

θ_{t}

, namely links 2 and 4) corresponds to an angular displacement about an axis perpendicular to the x-axis. When

θ_{a} = 0

, the

θ_{t}

rotation axis is along the z-axis of

R_{L}

; for a general

θ_{a}

, that axis is rotated by

θ_{a}

about x, yielding a unit direction vector

(0, - s θ_{a}, c θ_{a})

in the

R_{L}

frame. Similarly, a small rotation

δ θ_{s}

(for links 3 and 5 which depend on

θ_{s}

) is about an axis initially along the z-axis (when

θ_{a} = 0

) and thus along

(0, - s θ_{a}, c θ_{a})

after rotation by

θ_{a}

.

Using these observations, we can write the virtual rotation of each link’s COM as a 3-component vector. For example:

\{\begin{matrix} δ θ_{c 1} = {[\begin{matrix} δ θ_{a} & 0 & 0 \end{matrix}]}^{T} \\ δ θ_{c 2} = {[\begin{matrix} δ θ_{a} & - s θ_{a} δ θ_{t} & c θ_{a} δ θ_{t} \end{matrix}]}^{T} \\ δ θ_{c 3} = {[\begin{matrix} δ θ_{a} & - s θ_{a} δ θ_{s} & c θ_{a} δ θ_{s} \end{matrix}]}^{T} \\ δ θ_{c 4} = δ θ_{c 2} \\ δ θ_{c 5} = δ θ_{c 3} \end{matrix}

(19)

Finally, from Equation (8), the virtual displacement of the foot tip, where the ground reaction force

F_{g}

acts, is given by

δ r_{tip} = [\begin{matrix} - L_{t} s θ_{t} δ θ_{t} + L_{s} s θ_{s} δ θ_{s} \\ (L_{t} c θ_{t} δ θ_{t} - L_{s} c θ_{s} δ θ_{s}) c θ_{a} - (L_{t} s θ_{t} - L_{s} s θ_{s}) s θ_{a} δ θ_{a} \\ (L_{t} c θ_{t} δ θ_{t} - L_{s} c θ_{s} δ θ_{s}) s θ_{a} + (L_{t} s θ_{t} - L_{s} s θ_{s}) c θ_{a} δ θ_{a} \end{matrix}]

(20)

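Substituting Equations (12)–(20) into Equation (10) and setting the coefficient of each independent virtual displacement to zero yields three scalar equations. These can be recognized as the dynamic equilibrium conditions (force/moment balance) for the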

θ_{a}

,

θ_{t}

, and

θ_{s}

motions, including all inertial and applied effects. While these expanded equations are correct, it is helpful to express them in a more compact form. We can identify in each equation the contributions from inertial forces (which will be associated with acceleration terms

{\ddot{θ}}_{a}, {\ddot{θ}}_{t}, {\ddot{θ}}_{s}

), from velocity-dependent effects (Coriolis and centrifugal terms, associated with

\dot{θ}

products), from gravity and spring forces (potential forces), and from external forces. In fact, the above equations can be written in matrix-vector form as:

M (q) \ddot{q} + C (q, \dot{q}) + G (q) = τ

(21)

where,

M (q) \in R^{3 \times 3}

is the symmetric mass/inertia matrix of the leg,

C (q, \dot{q}) \in R^{3}

is the vector of Coriolis and centrifugal terms,

G (q) \in R^{3}

is the gravity and spring force generalized force vector (including weight of links as well as the passive spring force, which acts like an elastic potential), and

τ \in R^{3}

is the vector of generalized actuator forces (here, the three actuator torques

{[T_{a}, T_{t}, T_{s}]}^{T}

). Equation (21) is the standard form of the leg’s equations of motion. In the absence of external contact force (

F_{g} = 0

),

τ

would equal the actual motor torques required to produce the motion

q (t)

. When an external foot force

F_{g}

is present (such as during landing impact), its effect appears on the right-hand side of Equation (21) as well, typically through an additional term

J^{T} (q) F_{g}

(where J is the foot Jacobian). In other words, the actuator torques must also counteract the external force’s influence.
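To illustrate how the contact force maps to the joints, the sketch below reads the foot Jacobian off Equation (20), with columns ordered as (θ_a, θ_t, θ_s), and evaluates J^T F_g. The function names and all numerical values are illustrative assumptions rather than quantities from the prototype.

```python
import math

def foot_jacobian(theta_a, theta_t, theta_s, L_t, L_s):
    """3x3 foot Jacobian read off Equation (20); columns are (theta_a, theta_t, theta_s)."""
    sa, ca = math.sin(theta_a), math.cos(theta_a)
    st, ct = math.sin(theta_t), math.cos(theta_t)
    ss, cs = math.sin(theta_s), math.cos(theta_s)
    reach = L_t * st - L_s * ss  # in-plane foot offset that is swung by theta_a
    return [
        [0.0, -L_t * st, L_s * ss],
        [-reach * sa, L_t * ct * ca, -L_s * cs * ca],
        [reach * ca, L_t * ct * sa, -L_s * cs * sa],
    ]

def joint_torques_from_foot_force(J, F_g):
    """Generalized joint torques J^T F_g produced by a foot force F_g."""
    return [sum(J[i][k] * F_g[i] for i in range(3)) for k in range(3)]

# Illustrative configuration (radians, metres) and contact force (newtons).
J = foot_jacobian(0.1, 0.6, -0.5, 0.30, 0.32)
F_g = [5.0, -2.0, 40.0]
tau_contact = joint_torques_from_foot_force(J, F_g)
print(tau_contact)
```

By construction, F_g · (J δq) equals (J^T F_g) · δq for any virtual joint displacement δq, which is the virtual-work statement behind the additional term.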

3. Prediction of Joint Friction Parameters Based on PPO Reinforcement Learning

3.1. Reinforcement Learning Training Framework for Joint Friction Parameter Prediction

In the space environment, joint friction is dynamically influenced by temperature, wear, and lubrication conditions. Traditional offline identification methods struggle to adapt to this time-varying characteristic. This section formulates joint friction parameter prediction as a sequential modeling and optimization problem. A trained agent continuously adjusts its estimates and ultimately outputs accurate joint friction parameters from the multi-modal time-series observations generated during the system’s motion.

The joint friction parameter prediction process is constructed as a Markov Decision Process (MDP). The agent’s goal is to learn an optimal mapping strategy from complex measurement data to explicit physical parameters through interaction with the environment. The state space characterizes the current operating conditions of the joint. To capture the variation law of friction characteristics, time-series observation data are input to construct the state as follows:

s_{t} = \{\begin{matrix} T_{t - k : t} & {\dot{q}}_{t - k : t} & τ_{t - k : t} & P_{t} \end{matrix}\}

(22)

where,

T_{t - k : t}

is the foot-end contact force,

{\dot{q}}_{t - k : t}

is the joint angular velocity,

τ_{t - k : t}

is the joint driving torque,

P_{t}

denotes the temperature.
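One possible way to assemble the state of Equation (22) is a sliding window over the last k + 1 samples. The sketch below is a minimal illustration for a single joint with scalar signals; the class name, the flat-list layout, and the sample values are assumptions, not the authors' implementation.

```python
from collections import deque

class ObservationWindow:
    """Sliding window holding samples t-k .. t for one joint (illustrative layout)."""

    def __init__(self, k):
        self.contact_force = deque(maxlen=k + 1)
        self.joint_velocity = deque(maxlen=k + 1)
        self.joint_torque = deque(maxlen=k + 1)
        self.temperature = None  # only the latest temperature is kept

    def push(self, contact_force, joint_velocity, joint_torque, temperature):
        self.contact_force.append(contact_force)
        self.joint_velocity.append(joint_velocity)
        self.joint_torque.append(joint_torque)
        self.temperature = temperature

    def ready(self):
        return len(self.contact_force) == self.contact_force.maxlen

    def state(self):
        """Flattened state: force window, velocity window, torque window, temperature."""
        if not self.ready():
            raise ValueError("window not yet full")
        return (list(self.contact_force) + list(self.joint_velocity)
                + list(self.joint_torque) + [self.temperature])

# Usage with made-up samples: k = 2 keeps the three most recent readings.
window = ObservationWindow(k=2)
for step in range(4):
    window.push(float(step), 10.0 + step, 20.0 + step, 25.0)
state = window.state()
print(state)
```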

Actions directly correspond to the friction parameters to be predicted. Since friction parameters change continuously, actions are defined in a continuous space, and the agent’s output at each time step is a set of estimated parameters:

a_{t} = \{\begin{matrix} {\hat{p}}_{0} & {\hat{p}}_{1} & {\hat{p}}_{2} \end{matrix}\}

(23)

where,

{\hat{p}}_{0}

,

{\hat{p}}_{1}

and

{\hat{p}}_{2}

are the identified friction parameters.

To ensure that the reinforcement learning agent converges to physically meaningful and interpretable parameters, a physical consistency reward is introduced. The reward function

r_{t}

is explicitly constructed as a negative exponential mapping of the torque estimation error, combined with a boundary penalty term:

r_{t} = w_{1} \cdot exp (- k_{1} \frac{| τ_{r e s, t} - {\hat{τ}}_{f, t} |}{τ_{n o r m}}) + w_{2} \cdot Φ_{b o u n d}

(24)

where,

{\hat{τ}}_{f, t}

is the predicted friction torque, and

τ_{r e s, t}

is the observed residual torque derived from the dynamic model utilizing measured actuator currents and contact forces. The scaling factor

k_{1}

regulates the sensitivity of the reward to the estimation error, while

τ_{n o r m}

represents a normalization constant based on the rated torque of the joint. The boundary constraint

Φ_{b o u n d}

acts as a penalty term to ensure the identified parameters remain within physically plausible ranges (e.g., ensuring

p_{0} > 0

). The weighting coefficients

w_{1}

and

w_{2}

are utilized to balance accuracy and physical feasibility.
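The reward of Equation (24) can be sketched as follows. The text does not give the functional form of the boundary term, so the penalty used here (zero inside a box of admissible parameters, minus the total violation outside it) is a hypothetical choice, as are the default weights, bounds, and normalization constant.

```python
import math

def boundary_penalty(params, lower, upper):
    """Hypothetical Phi_bound: 0 inside the admissible box, minus total violation outside."""
    violation = 0.0
    for p, lo, hi in zip(params, lower, upper):
        violation += max(lo - p, 0.0) + max(p - hi, 0.0)
    return -violation

def reward(tau_res, tau_f_hat, params, w1=1.0, w2=1.0, k1=5.0, tau_norm=100.0,
           lower=(0.0, -1.0, 0.0), upper=(100.0, 1.0, 1.0)):
    """Equation (24): exponential accuracy term plus weighted boundary term."""
    accuracy = math.exp(-k1 * abs(tau_res - tau_f_hat) / tau_norm)
    return w1 * accuracy + w2 * boundary_penalty(params, lower, upper)

# A perfect torque match with admissible parameters gives the maximum reward w1.
r_good = reward(10.0, 10.0, (5.0, 0.1, 0.3))
# A negative Coulomb term violates the (assumed) lower bound and is penalized.
r_bad = reward(10.0, 10.0, (-2.0, 0.1, 0.3))
print(r_good, r_bad)
```

The exponential keeps the accuracy term in (0, 1], so the two weights directly set the trade-off between torque-matching accuracy and physical feasibility.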

3.2. Joint Friction Model Under Space Lubrication Conditions

Under space low-temperature grease lubrication conditions, harmonic lubrication operates in the mixed lubrication region as shown in Figure 3. The friction torque is related not only to temperature, rotation direction, and rotational speed but also to the output torque. Additionally, there are significant differences between the forward driving state (torque and rotational speed are in the same direction, and the motor does positive work) and the reverse driving state (torque and rotational speed are in opposite directions, and the motor does negative work).

Under the lubrication conditions used in lunar environments, as shown in Figure 4, the joint friction torque is approximately linearly related to rotational speed and load, and non-linearly related to temperature. Therefore, the friction torque model is established as follows:

τ_{f} = p_{0} + p_{1} \cdot ω + p_{2} \cdot τ_{l o a d}

(25)

where,

ω

denotes the joint angular velocity,

τ_{l o a d}

denotes the joint output torque,

τ_{f}

is the joint friction torque.

p_{0}

is the Coulomb friction coefficient,

p_{1}

is the viscous friction coefficient, and

p_{2}

is the load friction coefficient.
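Equation (25) and the forward/reverse distinction described above can be sketched as below. The Coulomb values are the −20 °C figures reported in Section 5.2; the viscous and load coefficients are placeholders chosen for illustration only. Evaluating the model on the magnitudes of speed and load is also an assumption, since the sign convention in reverse drive is not stated in the text.

```python
def driving_mode(omega, tau_load):
    """Forward drive when torque and speed share a direction, reverse otherwise."""
    return "forward" if omega * tau_load >= 0.0 else "reverse"

def friction_torque(omega, tau_load, params):
    """Equation (25): tau_f = p0 + p1 * omega + p2 * tau_load."""
    p0, p1, p2 = params
    return p0 + p1 * omega + p2 * tau_load

# p0 values at -20 degC are quoted from Section 5.2; p1 and p2 are placeholders.
PARAMS_AT_MINUS_20C = {
    "forward": (34.09, 0.5, 0.3),
    "reverse": (53.42, -0.2, 0.3),
}

omega, tau_load = 2.0, 10.0  # illustrative speed (rad/s) and load (N*m)
mode = driving_mode(omega, tau_load)
# Magnitudes are used here as an assumption about the sign convention.
tau_f = friction_torque(abs(omega), abs(tau_load), PARAMS_AT_MINUS_20C[mode])
print(mode, tau_f)
```

Keeping one parameter triple per driving mode (and, in practice, per temperature) mirrors the way the identification results are reported later in the paper.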

3.3. PPO Reinforcement Learning Strategy

To achieve stable and efficient policy updates in the continuous action space, the Proximal Policy Optimization (PPO) algorithm is adopted. Compared with traditional policy gradient methods, PPO can effectively prevent performance collapse caused by excessively large policy update steps. The core of the PPO algorithm is to limit the magnitude of policy updates by clipping the surrogate objective [37]. The objective function is defined as:

L^{CLIP} (θ) = E_{t} [min (\frac{π_{θ} (a_{t} | s_{t})}{π_{θ_{old}} (a_{t} | s_{t})} {\hat{A}}_{t}, clip (\frac{π_{θ} (a_{t} | s_{t})}{π_{θ_{old}} (a_{t} | s_{t})}, 1 - ϵ, 1 + ϵ) {\hat{A}}_{t})]

(26)

where,

π_{θ}

is the current policy to be optimized,

π_{θ_{old}}

is the old policy before update,

ϵ

is the clipping hyperparameter,

L_{t}^{CLIP}

is the clipped policy objective, and

{\hat{A}}_{t}

is the advantage function estimate.
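A minimal sketch of the clipped surrogate in Equation (26), evaluated on a small batch, is given below. It works with log-probabilities, which is a common implementation choice rather than something stated in the text, and ϵ = 0.2 is an illustrative default.

```python
import math

def clipped_surrogate(log_prob_new, log_prob_old, advantages, epsilon=0.2):
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A), Equation (26)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(log_prob_new, log_prob_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_theta / pi_theta_old
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)

# Ratio 1.5 with a positive advantage is capped at 1 + eps.
gain_capped = clipped_surrogate([math.log(1.5)], [0.0], [1.0])
# Ratio 0.5 with a negative advantage is capped at 1 - eps.
loss_capped = clipped_surrogate([math.log(0.5)], [0.0], [-1.0])
print(gain_capped, loss_capped)
```

In both cases the objective stops rewarding further movement of the ratio away from 1, which is how the update magnitude is limited.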

During PPO training, shown in Figure 5, the loss function consists of three components that balance policy optimization, value function fitting, and exploration:

L_{t}^{PPO} (θ) = - {\hat{E}}_{t} [L_{t}^{CLIP} (θ) - c_{1} L_{t}^{V F} (θ) + c_{2} S [π_{θ}] (s_{t})]

(27)

where,

L_{t}^{CLIP}

denotes the clipped surrogate objective, used to maximize the cumulative reward;

L_{t}^{V F}

is the value-function loss, used to update the critic network so that it evaluates the current state value more accurately;

S [π_{θ}] (s_{t})

is the entropy bonus, which encourages randomness in the policy distribution to prevent the agent from prematurely converging to a local optimum;

c_{1}

and

c_{2}

are hyperparameters.

To balance bias and variance, PPO uses Generalized Advantage Estimation (GAE) to calculate the advantage function:

{\hat{A}}_{t}^{GAE} = \sum_{l = 0}^{\infty} {(γ λ)}^{l} δ_{t + l}, δ_{t} = r_{t} + γ V (s_{t + 1}) - V (s_{t})

(28)

where,

γ

is the environmental discount factor;

λ

is an additional decay parameter introduced by GAE;

r_{t}

is the reward at time t;

V (s_{t + 1})

is the value estimate of the next state; and

V (s_{t})

is the value estimate of the current state. The detailed configuration of PPO hyperparameters, including the learning rate and clipping range, is summarized in Table 2.
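On a finite rollout, the infinite sum in Equation (28) is usually evaluated with a backward recursion. The sketch below assumes the rollout is truncated after the last reward and that a bootstrap value for the final state is appended to the value list; the default γ and λ are illustrative, not the values of Table 2.

```python
def generalized_advantage_estimates(rewards, values, gamma=0.99, lam=0.95):
    """Backward recursion A_t = delta_t + gamma * lam * A_{t+1}, Equation (28).

    `values` must contain len(rewards) + 1 entries; the last one is the
    bootstrap value of the state reached after the final reward.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

# Two-step example with zero value estimates: deltas are simply the rewards.
adv = generalized_advantage_estimates([1.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5, lam=0.5)
print(adv)
```

Setting λ = 0 reduces the estimate to the one-step TD residual (low variance, higher bias), while λ = 1 recovers the discounted return minus the value baseline.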

4. Locomotion Control of a Lunar Quadruped Robot

4.1. Control Architecture

During lunar surface locomotion, the proposed quadruped robot faces dual challenges of complex terrain and extreme temperature variations. The temperature fluctuations on the lunar surface can alter the lubrication and contact characteristics within its joints, leading to significant nonlinear changes in joint friction [38]. Meanwhile, this robot exhibits high-dimensional, strongly coupled nonlinear dynamic behaviors. Therefore, the friction disturbances induced by environmental temperature variations combined with its inherent nonlinear dynamics constitute critical challenges that must be overcome to achieve compliant motion control for this lunar quadruped system.

To address the above problems, this paper proposes a motion control framework for the lunar quadruped robot, as shown in Figure 6. The framework comprises four main modules: the pose transformation module, the friction coefficient identification module, the feedback linearization controller, and the impedance controller. The pose transformation module converts the joint state variables measured by sensors into the robot’s body pose information. The friction coefficient identification module utilizes various state information of the quadruped robot fed back by sensors and adopts an online PPO strategy to identify the joint friction coefficients. The feedback linearization controller utilizes the measured joint states, foot–ground contact forces, and the identified joint friction parameters to compensate for nonlinear responses caused by foot contact disturbances, joint friction variations, and inherent dynamic coupling, thereby transforming the nonlinear dynamic model into a linearized model in terms of the foot tracking error in the task space. This forms a synergistic closed loop: the controller executes motion using the identified friction parameters, while the sensor data (joint velocity, torque, contact force, and temperature) generated during operation are fed back to the PPO agent to continuously refine its parameter estimates. Subsequently, the impedance controller calculates the required joint control torques based on the foot position and velocity errors, ensuring compliant and stable locomotion of the lunar quadruped robot.

4.2. Feedback Linearization Control

Based on the full-body dynamic model of the quadruped robot, we decompose the generalized coordinates into the body state variable

q_{u} \in R^{6}

and the joint drive variable

q_{a} \in R^{12}

. The dynamic model can be described as:

[\begin{matrix} M_{uu} & M_{ua} \\ M_{au} & M_{aa} \end{matrix}] [\begin{matrix} {\ddot{q}}_{u} \\ {\ddot{q}}_{a} \end{matrix}] + [\begin{matrix} C_{uu} & C_{ua} \\ C_{au} & C_{aa} \end{matrix}] [\begin{matrix} {\dot{q}}_{u} \\ {\dot{q}}_{a} \end{matrix}] + [\begin{matrix} G_{u} \\ G_{a} \end{matrix}] + [\begin{matrix} 0 \\ {\hat{τ}}_{f} + Δ τ_{f} \end{matrix}] + [\begin{matrix} I & J_{u}^{T} \\ 0 & J_{a}^{T} \end{matrix}] [\begin{matrix} 0 \\ F_{e} \end{matrix}] = [\begin{matrix} 0 \\ τ \end{matrix}]

(29)

where,

M_{uu}

,

M_{ua}

,

M_{au}

,

M_{aa}

denote the inertia terms of the quadruped robot;

C_{uu}

,

C_{ua}

,

C_{au}

,

C_{aa}

represent the Coriolis and centrifugal terms;

G_{u}

,

G_{a}

denote the gravity terms. The nominal joint friction torque

{\hat{τ}}_{f}

can be derived from Equation (25), and

Δ τ_{f}

represents the residual disturbance of the joint friction torque. Furthermore, it is assumed that

∥ Δ τ_{f} ∥_{\infty} \leq ρ

, where

ρ

is a positive constant.

J_{u}^{T}

,

J_{a}^{T}

are the Jacobian matrices corresponding to the foot-end contact force, and

F_{e}

denotes the foot-end contact force. By representing the non-linear terms except for the inertia terms as

H_{u} (q, \dot{q}, F_{e})

and

H_{a} (q, \dot{q}, {\hat{τ}}_{f}, F_{e})

, the acceleration of the base can be expressed as:

{\ddot{q}}_{u} = - M_{uu}^{- 1} (M_{ua} {\ddot{q}}_{a} + H_{u})

(30)

Further, the joint drive torque can be written as:

\bar{M} {\ddot{q}}_{a} + \bar{H} + Δ τ_{f} = τ

(31)

where, the equivalent inertia

\bar{M}

and nonlinear term

\bar{H}

are defined as:

\bar{M} = M_{aa} - M_{au} M_{uu}^{- 1} M_{ua}

(32)

\bar{H} = H_{a} - M_{au} M_{uu}^{- 1} H_{u}

(33)
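The elimination of the base acceleration can be illustrated with scalar blocks, that is, one unactuated base coordinate and one actuated joint. In this toy setting, substituting Equation (30) into the joint row of Equation (29) gives an equivalent inertia M_aa − M_au M_uu⁻¹ M_ua and a nonlinear term H_a − M_au M_uu⁻¹ H_u. All numbers below are made up; for the real robot the blocks are matrices and the divisions become matrix inverses.

```python
def reduce_dynamics(M_uu, M_ua, M_au, M_aa, H_u, H_a):
    """Scalar Schur-complement reduction of the floating-base dynamics."""
    M_bar = M_aa - M_au * M_ua / M_uu
    H_bar = H_a - M_au * H_u / M_uu
    return M_bar, H_bar

# Toy block values (made up); residual friction disturbance set to zero.
M_uu, M_ua, M_au, M_aa = 4.0, 1.0, 1.0, 2.0
H_u, H_a = 0.5, 0.3
qdd_a = 1.5  # commanded joint acceleration

# Full model: base row gives qdd_u (Equation (30)), joint row gives the torque.
qdd_u = -(M_ua * qdd_a + H_u) / M_uu
tau_full = M_au * qdd_u + M_aa * qdd_a + H_a

# Reduced model: Equation (31) with zero residual disturbance.
M_bar, H_bar = reduce_dynamics(M_uu, M_ua, M_au, M_aa, H_u, H_a)
tau_reduced = M_bar * qdd_a + H_bar
print(tau_full, tau_reduced)
```

Both routes give the same joint torque, which confirms that the reduced model carries the base coupling implicitly.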

Based on Equation (31), we design the feedback linearization control law as:

τ = \bar{M} u + \bar{H} + τ_{r}

(34)

where

u

denotes the joint control input, and

τ_{r}

represents the residual disturbance compensation term for friction torque.

4.3. Multi-Leg Cooperative Impedance Motion Controller

To achieve high-precision trajectory tracking at the foot-end of the quadruped robot while ensuring the robot’s compliance, this paper designs a model-based impedance control method. The specific design process is as follows:

Define the foot-end state of the quadruped robot in the task space as x_B. Based on Equation (29), the foot-end acceleration can be derived as:

{\ddot{x}}_{B} = J_{u} {\ddot{q}}_{u} + J_{a} {\ddot{q}}_{a} + {\dot{J}}_{u} {\dot{q}}_{u} + {\dot{J}}_{a} {\dot{q}}_{a}

(35)

Substitute Equation (30) into the above equation, and let

{\ddot{q}}_{a} = u

. After rearrangement, we get:

{\ddot{x}}_{B} = (J_{a} - J_{u} M_{uu}^{- 1} M_{ua}) u + ({\dot{J}}_{u} {\dot{q}}_{u} + {\dot{J}}_{a} {\dot{q}}_{a} - J_{u} M_{uu}^{- 1} H_{u})

(36)

Let

J_{e q} = J_{a} - J_{u} M_{uu}^{- 1} M_{ua}

and

η_{e q} = {\dot{J}}_{u} {\dot{q}}_{u} + {\dot{J}}_{a} {\dot{q}}_{a} - J_{u} M_{uu}^{- 1} H_{u}

. Then the foot-end acceleration of the quadruped robot can be expressed as:

{\ddot{x}}_{B} = J_{e q} u + η_{e q}

(37)

Define the foot-end state error

e = x_{Bd} - x_{B}

, where

x_{Bd}

is the desired trajectory. The impedance control law is then given by:

{\ddot{x}}_{B} = {\ddot{x}}_{Bd} + M_{d}^{- 1} (D_{d} \dot{e} + K_{d} e)

(38)

where,

M_{d}

,

D_{d}

and

K_{d}

are the virtual inertia, virtual damping and virtual stiffness matrices, respectively, and all are designed as positive definite symmetric matrices. Assume the robot works in a non-singular workspace, so

J_{e q}

is full-rank and invertible. Further, the joint-space impedance control law of the quadruped robot can be derived as:

u = J_{e q}^{- 1} [{\ddot{x}}_{Bd} + M_{d}^{- 1} (D_{d} \dot{e} + K_{d} e) - η_{e q}]

(39)

In addition, the compensation for residual disturbances of friction torque can be designed as:

τ_{r} = \bar{M} J_{e q}^{- 1} K \cdot tanh (\frac{s}{Φ})

(40)

where,

K

represents the gain matrix,

Φ

denotes the smoothing parameter, and

s

represents the sliding mode surface of the quadruped robot in the task space. Its expression is given as:

s = \dot{e} + M_{d}^{- 1} D_{d} e + M_{d}^{- 1} K_{d} \int e d t

(41)

The joint driving torque of the quadruped robot is:

τ = \bar{M} J_{e q}^{- 1} [{\ddot{x}}_{Bd} + M_{d}^{- 1} (D_{d} \dot{e} + K_{d} e) - η_{e q}] + \bar{H} + τ_{r}

(42)

Substituting Equation (42) into Equation (31), the closed-loop system dynamics can be expressed as:

M_{d} \ddot{e} + D_{d} \dot{e} + K_{d} e = M_{d} J_{e q} {\bar{M}}^{- 1} (Δ τ_{f} - τ_{r})

(43)
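The effect of the compensation term can be seen in a one-dimensional version of Equation (43). The sketch below assumes scalar gains, J_eq = M̄ = 1, a constant task-space disturbance, and explicit Euler integration; none of these values come from the paper. With these assumptions the sliding variable obeys ṡ = d − K tanh(s/Φ), so s settles near zero whenever K exceeds the disturbance magnitude.

```python
import math

def simulate_error(k_robust, n_steps=5000, dt=1e-3, m_d=1.0, d_d=20.0,
                   k_d=100.0, phi=0.05, disturbance=0.5, e0=0.05):
    """Integrate the scalar closed loop of Equation (43) and return the final error."""
    e, e_dot, e_int = e0, 0.0, 0.0
    for _ in range(n_steps):
        # Sliding variable, Equation (41).
        s = e_dot + (d_d / m_d) * e + (k_d / m_d) * e_int
        # Task-space effect of tau_r from Equation (40) with J_eq = M_bar = 1.
        compensation = k_robust * math.tanh(s / phi)
        e_ddot = -(d_d * e_dot + k_d * e) / m_d + disturbance - compensation
        # Explicit Euler step.
        e_int += e * dt
        e += e_dot * dt
        e_dot += e_ddot * dt
    return e

e_without = simulate_error(k_robust=0.0)  # impedance law only
e_with = simulate_error(k_robust=2.0)     # with residual-friction compensation
print(e_without, e_with)
```

Without compensation the error settles at the offset disturbance · M_d / K_d (0.005 here), whereas the integral term in s lets the compensated loop remove the steady-state error.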

4.4. Stability Analysis

To verify the stability and robustness of the proposed multi-leg cooperative impedance control law, this paper provides a proof based on Lyapunov stability theory. Before proceeding with the stability proof, the following assumptions are made:

Assumption 1.

The actuators possess sufficient output power, and the system state variables always remain within the set

Ω = \{τ \in R^{12} ∣ |τ_{i}| \leq τ_{max}\}

. Within this region, the driving torques calculated by the controller satisfy the physical output constraints of the actuators, where

τ_{max}

denotes the maximum output torque of the joint motors.

The selected Lyapunov function is given as:

V = \frac{1}{2} {\dot{e}}^{T} M_{d} \dot{e} + \frac{1}{2} e^{T} K_{d} e + \frac{1}{2} s^{T} M_{d} s

(44)

where, the first term represents the generalized kinetic energy related to the foot-end velocity error, the second term represents the generalized potential energy, and the third term is a quadratic component of the sliding mode variable. Since both

M_{d}

and

K_{d}

are symmetric positive-definite matrices, it is known that V is a positive-definite function. Differentiating the above equation and substituting Equations (41) and (43) yields:

\begin{matrix} \dot{V} = {\dot{e}}^{T} M_{d} \ddot{e} + {\dot{e}}^{T} K_{d} e + s^{T} M_{d} \dot{s} \\ = {\dot{e}}^{T} [- D_{d} \dot{e} - K_{d} e + M_{d} J_{e q} {\bar{M}}^{- 1} (Δ τ_{f} - τ_{r})] + {\dot{e}}^{T} K_{d} e \\ + s^{T} (M_{d} \ddot{e} + D_{d} \dot{e} + K_{d} e) \\ = - {\dot{e}}^{T} D_{d} \dot{e} + {(\dot{e} + s)}^{T} M_{d} J_{e q} {\bar{M}}^{- 1} (Δ τ_{f} - τ_{r}) \end{matrix}

(45)

If we define

d = M_{d} J_{e q} {\bar{M}}^{- 1} Δ τ_{f}

, it can be inferred that there exists a positive scalar D such that

{∥ d ∥}_{\infty} \leq D

. Furthermore, by substituting Equation (40) into Equation (45) and applying the scaling principle, we obtain:

\dot{V} \leq - {\dot{e}}^{T} D_{d} \dot{e} - ∥ \dot{e} + s ∥ [λ_{min} (M_{d} K) - D]

(46)

where, since

D_{d}

is a symmetric positive-definite matrix, the quadratic term

- {\dot{e}}^{T} D_{d} \dot{e}

is always non-positive. If the control gain matrix

K

satisfies the constraint

λ_{min} (M_{d} K) > D

, then for any non-zero error state,

\dot{V} \leq 0

is always satisfied. According to LaSalle’s Invariance Principle,

\dot{V} = 0

if and only if

\dot{e} = 0

and

s = 0

. From Equation (41), it can be further derived that

e \equiv 0

. Therefore, the quadruped robot control system achieves global asymptotic stability in the lunar surface operating environment.

5. Simulation and Experiments

5.1. Simulation and Experimental Prototype

This study comprehensively validates the proposed dynamic model and multi-leg coordinated impedance control algorithm for a lunar quadruped robot through a combination of virtual prototype co-simulation and physical prototype experiments. In the virtual prototype co-simulation phase, a virtual model of the lunar quadruped robot is constructed based on multi-body dynamics (MBD) software (MATLAB/Simulink with Simscape Multibody R2025a), and the control algorithm is implemented in the same environment and integrated into the co-simulation framework to realize the multi-leg coordinated impedance control strategy. The MBD software is responsible for solving the dynamic responses of the multi-rigid and multi-flexible body system, while the control algorithm provides the required driving torques for each joint to maintain the robot’s dynamic stability. The two components complete closed-loop verification through real-time interaction: the MBD software transmits key information such as the robot’s center-of-mass position, motion state, and foot–ground contact forces to the control algorithm, which then feeds back the calculated joint torques to the virtual model to enable simulation-based evaluation of control effectiveness.

The physical experimental system consists of the lunar quadruped robot, a host computer, a motion capture system (3D accuracy:

\pm 0.1

mm, maximum frame rate: 1000 FPS), an inertial measurement unit (IMU, gyroscope bias:

1^{\circ} / h

, angular random walk

⩽ {0.125}^{\circ} / \sqrt{h}

), acceleration sensors (bias stability:

10 μ g

), a six-axis force/torque sensor, simulated lunar soil terrain, and a rail-suspended low-gravity unloading platform, as shown in Figure 7. The overall experimental area measures at least 10 m × 10 m, providing sufficient space for the robot to perform continuous walking trials over extended distances. The simulated lunar soil terrain is prepared within a dedicated test bed of dimensions 3 m × 3 m; this test bed is mounted on casters and can be repositioned according to the walking path, effectively extending the usable terrain area. The host computer runs the multi-leg coordinated impedance control algorithm, which generates joint torque commands to drive the motors and achieve precise motion control of the robot. The robot is equipped with an IMU to monitor attitude changes, and optical markers attached to its surface are tracked in real time by the motion capture system to obtain global pose information, forming a closed-loop feedback to maintain system stability. This study validates the locomotion performance of a lunar quadruped robot on simulated lunar soil through forward-motion experiments conducted under two terrain conditions: a horizontal surface and an

8^{\circ}

slope, as shown in Figure 8.

To simulate the low-gravity environment of the lunar surface, a rail-suspended low-gravity unloading platform is used in the experiments. This platform is composed of an aluminum profile frame, low-friction horizontal sliding rails, a steel wire suspension system, and counterweight blocks, which together provide gravity compensation for the robot.

Based on the lunar polar soil parameters listed in Table 3, the simulated lunar soil used in the experiments is precisely prepared and mechanically calibrated to ensure that its physical properties match those of real lunar soil. This terrain simulates the soft and uneven mechanical characteristics of the lunar surface, which significantly influence the robot’s sinkage behavior, traction performance, and overall locomotion stability, thereby providing highly realistic experimental conditions for validating the foot–ground interaction model and the control algorithm.

5.2. Friction Parameter Identification and Analysis

To accurately characterize the friction characteristics of the robot joint under different operating conditions, the key parameters of the joint friction model are identified based on the PPO algorithm. The state space of the algorithm consists of the foot-end contact force, joint angular velocity, and joint driving torque. The output action comprises three continuous-valued friction parameters: the Coulomb friction coefficient

p_{0}

, the viscous friction coefficient

p_{1}

, and the load correlation coefficient

p_{2}

. Identification experiments were conducted over a wide temperature range from −20 °C to 120 °C, with tests performed separately for both forward and reverse driving modes of the joint. The fitted plane of the joint friction torque is shown in Figure 9.
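As a simple cross-check of a fitted plane of this kind, an ordinary least-squares fit of Equation (25) can be computed at a fixed temperature and driving mode. This is a baseline swapped in for illustration, not the PPO-based identification used in the paper. The data below are synthetic and noise-free, generated from assumed parameters, so the fit should simply recover them.

```python
def fit_friction_plane(omegas, loads, torques):
    """Ordinary least squares for tau_f = p0 + p1 * omega + p2 * tau_load."""
    rows = [(1.0, w, l) for w, l in zip(omegas, loads)]
    # Normal equations (A^T A) p = A^T y, stored as an augmented 3x4 matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, torques))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    params = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        acc = aug[i][3] - sum(aug[i][j] * params[j] for j in range(i + 1, 3))
        params[i] = acc / aug[i][i]
    return params

# Synthetic samples on a speed/load grid from assumed parameters (not measured data).
true_params = (3.36, 0.4, 0.3)
omegas = [w for w in (0.5, 1.0, 1.5, 2.0) for _ in range(3)]
loads = [l for _ in range(4) for l in (5.0, 10.0, 20.0)]
torques = [true_params[0] + true_params[1] * w + true_params[2] * l
           for w, l in zip(omegas, loads)]
p0, p1, p2 = fit_friction_plane(omegas, loads, torques)
print(p0, p1, p2)
```

Such a batch fit needs the friction torque to be available offline for each operating point, which is the limitation the online PPO-based identification is meant to overcome.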

The results are summarized in Table 4 and Table 5. Analysis of the identification results reveals significant patterns in how the friction parameters vary with temperature and driving direction:

Temperature Dependence of Coulomb Friction Torque: The Coulomb friction exhibits strong temperature sensitivity. Its value decreases monotonically with increasing temperature in both forward and reverse driving modes. Under forward drive, it drops sharply from approximately 34.09 N·m at −20 °C to about 3.36 N·m at 120 °C; a similar decrease is seen in reverse drive, from 53.42 N·m to 7.00 N·m. This suggests that increased lubricant viscosity or material contraction at low temperatures enhances static friction effects, while high temperatures reduce the shear resistance at the friction interface.

Complex Behavior of Viscous Friction Coefficient: The viscous coefficient demonstrates more complex behavior. In most forward drive conditions and in high-temperature reverse drive conditions, it is positive, in line with the classic viscous friction model. However, under reverse drive at low temperatures, particularly from −20 °C to 10 °C, it is consistently negative. This atypical phenomenon may be related to the rheological properties of the lubricant in specific temperature ranges, directional resistance from seals, or coupling effects within the identification model at low velocities, revealing the nonlinear nature of friction dynamics. Under reverse drive, the viscous coefficient

p_{1}

increases as temperature rises, transitioning from negative to positive values and reaching a peak within the 50–90 °C range. As the temperature continues to rise to 120 °C, the value of

p_{1}

subsequently exhibits a slight decrease from its maximum level.

Relative Stability of Load Correlation Coefficient: Compared to

p_{0}

and

p_{1}

, the load coefficient

p_{2}

shows relatively smaller variation across different temperatures and driving directions, with values primarily distributed between 0.2 and 0.4. This indicates that the proportion of friction contribution from the load (transmitted through the mechanical structure, such as via foot-end contact force) is relatively stable and not significantly affected by temperature. However, a notable decrease in

p_{2}

(to approximately 0.106) is observed under reverse drive at 120 °C, suggesting that extreme high temperatures may alter the load distribution mechanism within the friction pair.

Asymmetry with Driving Direction: Comparing the data for forward and reverse drive reveals a clear asymmetry in friction characteristics. At the same temperature, the identified Coulomb friction

p_{0}

under reverse drive is generally higher than under forward drive, especially in the low-temperature range. The sign and magnitude of the viscous coefficient

p_{1}

also differ significantly between directions, further confirming directional differences inherent in the joint transmission system. This asymmetry must be considered when constructing a high-precision joint friction model.

The friction parameter identification based on the PPO algorithm successfully captured the dynamic evolution of joint friction with temperature and driving direction. Coulomb friction decreases with rising temperature, viscous friction exhibits nonlinear and even non-classical negative damping behavior, and the load-dependent component remains relatively stable. The significant parameter differences between forward and reverse drive highlight the necessity for a direction-dependent friction model.

5.3. Planar Autonomous Walking Test and Validation

In the simulation part, a typical lunar obstacle terrain containing bumps and trenches was constructed to comprehensively verify the obstacle-crossing capability and locomotion stability of the quadruped robot, as shown in Figure 10. The experiment simulated the continuous process of the robot sequentially crossing a bump and a trench. Based on multi-body dynamics simulation, key data such as body motion, foot-end trajectories, and joint driving torques were obtained to evaluate its adaptability to rough terrain and the effectiveness of the motion control algorithm.

The robot’s height variation curve (Figure 11A) shows that during the bump-crossing phase, the body height first increases, then stabilizes, and finally decreases, corresponding to the gait sequence of the front foot stepping onto the bump, the body lifting, and the hind foot following. The height changes continuously and smoothly without sudden jumps or oscillations, indicating that the robot can adjust its height effectively when dealing with elevation-type obstacles. During trench crossing, the height curve first decreases and then rises, corresponding to the phases of the front foot descending to touch the bottom, the body lowering, the hind foot providing propulsion, and the body lifting. Throughout the obstacle-crossing process, the fluctuation of the body height is kept within a reasonable range, and the robot quickly returns to the reference height once the terrain levels out, demonstrating good terrain adaptability and posture recovery stability.

From the Euler angle variation curves (Figure 11B), it can be seen that the roll, pitch, and yaw angles all generate corresponding dynamic responses during obstacle crossing, with the pitch angle showing the most significant change. When crossing the bump, the pitch angle increases positively as the front foot lifts, causing the body to lean forward; as the center of mass moves past the obstacle, the angle gradually swings back, changing smoothly with a controllable amplitude, indicating that the robot can effectively maintain balance during sudden longitudinal terrain changes. The roll angle varies within a small range (within $\pm 4^{\circ}$), mainly responding to slight lateral tilting caused by coordinated leg movements, which reflects the system’s good lateral stability. The yaw angle remains near zero throughout the process without obvious heading deviation, verifying the effectiveness of gait planning and foot-end control in maintaining the traveling direction. Overall, the Euler angle variations of the robot in continuous obstacle terrain are limited in amplitude and smooth in transition, reflecting the good performance of the multi-leg coordinated impedance control in maintaining the body posture.

The foot-end trajectories (Figure 12) indicate that each leg can adjust its gait parameters in real time according to terrain features. The front legs extend the leg-lifting phase when crossing the bump and increase the swing time to achieve precise foot placement when crossing the trench; the vertical foot-end trajectories are smooth, without sudden jumps, demonstrating that the impedance control effectively suppresses landing impact and slippage. The joint torque data (Figure 13) further support this conclusion: the hip and knee joint torques exhibit corresponding peaks during the key obstacle-crossing phases, responding promptly and varying smoothly, which shows that the control system can adjust its output in real time based on foot–ground interaction forces, balancing obstacle-crossing performance and locomotion smoothness.

In summary, the simulation results show that the designed lunar quadruped robot exhibits reliable obstacle-crossing stability, coordinated gait adaptability, and effective force interaction control capability in continuous obstacle terrain, providing a basis for its engineering application in real lunar surface environments.

To verify the effectiveness of the proposed impedance control strategy on the physical prototype, a planar locomotion test of the lunar quadruped robot was designed. The trot gait was adopted in the test, and the desired foot-end trajectory was generated by Bézier curves to ensure motion continuity and reduce impact. Figures 14, 15, and 16 show the experimental curves of foot-end position tracking, joint torque response, and foot-end contact force, respectively.
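As an illustration of how a Bézier curve can produce a continuous swing trajectory with low touchdown impact, the sketch below evaluates a degree-5 curve from the Bernstein basis. The control points, step length, and step height are hypothetical values chosen for the example and are not taken from the prototype; repeating the first and last control points is one common way to obtain zero foot velocity at lift-off and touchdown.

```python
from math import comb


def bezier_point(control_points, s):
    """Evaluate a Bezier curve at s in [0, 1] using the Bernstein basis."""
    n = len(control_points) - 1
    dims = len(control_points[0])
    point = [0.0] * dims
    for i, cp in enumerate(control_points):
        basis = comb(n, i) * (s ** i) * ((1.0 - s) ** (n - i))
        for d in range(dims):
            point[d] += basis * cp[d]
    return tuple(point)


# Hypothetical swing-phase parameters (metres); not values from the paper.
STEP_LENGTH = 0.20
STEP_HEIGHT = 0.08

# (x forward, z up) control points in the leg frame. The duplicated end
# points make the curve's derivative vanish at s = 0 and s = 1.
SWING_CONTROL_POINTS = [
    (-STEP_LENGTH / 2, 0.0),
    (-STEP_LENGTH / 2, 0.0),
    (-STEP_LENGTH / 2, STEP_HEIGHT),
    (STEP_LENGTH / 2, STEP_HEIGHT),
    (STEP_LENGTH / 2, 0.0),
    (STEP_LENGTH / 2, 0.0),
]


def swing_trajectory(num_samples):
    """Sample the swing curve at evenly spaced phase values."""
    return [
        bezier_point(SWING_CONTROL_POINTS, k / (num_samples - 1))
        for k in range(num_samples)
    ]
```

With these control points the foot apex is lower than `STEP_HEIGHT`, because a Bézier curve stays inside the convex hull of its control points; the control points would be raised if a specific ground clearance were required.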

The trajectory tracking results for each leg of the quadruped robot under the two control methods are shown in Figure 14. Here, ${ID}_{x}$, ${ID}_{y}$, and ${ID}_{z}$ denote the results obtained with the PPO-identified friction model, while $C_{x}$, $C_{y}$, and $C_{z}$ denote the results obtained with the Coulomb friction model. ${Cmd}_{x}$, ${Cmd}_{y}$, and ${Cmd}_{z}$ are the desired foot-end trajectories. The $X$ direction corresponds to the robot’s forward direction, the $Y$ direction to the lateral offset, and the $Z$ direction to the lifting and landing of the legs. Over 50 s of continuous walking, the actual foot-end positions produced by the proposed control method follow the desired trajectories closely, with no significant drift or accumulated error in the $X$, $Y$, or $Z$ directions. In contrast, the control method relying solely on the Coulomb friction model exhibits prominent lag and offset. This demonstrates the advantage of the proposed control method for foot-end trajectory tracking of the quadruped robot.

A quantitative analysis of the planar-walking foot-end tracking errors was conducted for both control methods; the results are presented in Table 6 (proposed method) and Table 7 (Coulomb friction model). In the forward ($X$) direction, the RMSE of the four legs lies between 28.25 mm and 32.60 mm when only the traditional Coulomb friction model is considered, which reflects the limitations of a single friction model in handling forward motion loads. With the proposed method, the forward RMSE falls to between 12.47 mm and 18.55 mm, indicating a clear improvement in forward trajectory tracking precision and gait repeatability. In the lateral ($Y$) direction, the RMSE of the traditional Coulomb model lies between 26.29 mm and 28.65 mm, leading to visible lateral deviation of the foot-end during walking, whereas the RMSE of the proposed method is between 4.31 mm and 6.94 mm, showing that the method maintains lateral stability and suppresses body sway during locomotion. In the vertical ($Z$) direction, the RMSE of every leg exceeds 38 mm with the Coulomb friction model, with a maximum of 41.04 mm; with the proposed method, it remains between 18.05 mm and 21.04 mm. These results confirm that the proposed scheme clearly outperforms control methods relying solely on the Coulomb friction model and satisfies the trajectory tracking requirements of the quadruped robot.
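The comparison above can be reproduced from the tabulated values. The sketch below gives the standard RMSE definition and computes the per-axis relative reduction from the Table 6 and Table 7 entries; the function and variable names are hypothetical, and averaging the per-leg reductions is a choice made here for illustration, not a statistic reported in the paper.

```python
from math import sqrt


def rmse(desired, actual):
    """Root mean square error between two equally long sample sequences."""
    if len(desired) != len(actual) or not desired:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((d - a) ** 2 for d, a in zip(desired, actual))
    return sqrt(squared / len(desired))


# Foot-end RMSE in mm as (X, Y, Z) per leg, copied from Table 6 and Table 7.
PPO_RMSE = {
    1: (18.55, 4.97, 18.46),
    2: (12.47, 4.31, 18.05),
    3: (15.10, 6.25, 19.45),
    4: (17.74, 6.94, 21.04),
}
COULOMB_RMSE = {
    1: (32.44, 28.65, 41.04),
    2: (29.04, 28.31, 40.13),
    3: (28.25, 26.36, 40.38),
    4: (32.60, 26.29, 38.23),
}


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


def mean_reduction_per_axis(baseline_table, proposed_table):
    """Average the per-leg percentage reduction for each of the X, Y, Z axes."""
    result = []
    for axis in range(3):
        per_leg = [
            relative_reduction(baseline_table[leg][axis], proposed_table[leg][axis])
            for leg in sorted(baseline_table)
        ]
        result.append(sum(per_leg) / len(per_leg))
    return tuple(result)
```

Calling `mean_reduction_per_axis(COULOMB_RMSE, PPO_RMSE)` gives a reduction of roughly one half in the $X$ and $Z$ directions and close to four fifths in the $Y$ direction.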

The torque tracking curves of each joint of the four legs of the quadruped robot are shown in Figure 15, including the torque responses of the hip roll joint, hip pitch joint, and knee pitch joint. Throughout the walking cycle, the actual torque curves of each joint are highly consistent with the desired torque curves and track the dynamic changes of the desired torque in real time without obvious phase lag or amplitude deviation.

The quantitative error analysis results in Table 8 show that the RMSE of the hip roll joint torque is 1.87–3.22 N·m, that of the hip pitch joint is 4.31–5.49 N·m, and that of the knee pitch joint is 2.82–3.82 N·m. The hip roll joint has the smallest torque error, suggesting that this joint is the most accurately and stably controlled, which is crucial for maintaining the lateral balance of the robot. The hip pitch joint needs to support the body weight and drive the leg swing, resulting in a relatively large torque fluctuation range, but its RMSE is still kept within 5.5 N·m, reflecting the robustness of the controller in handling dynamic changes of large loads. Overall, the high tracking accuracy and low error level of each joint torque support the stability and reliability of the control system at the torque output level, providing a solid basis for the stable walking of the robot.

The change curves of the foot-end contact force of the quadruped robot are shown in Figure 16. The contact force of each leg shows an obvious peak during the support phase and approaches zero during the swing phase, with smooth rising and falling processes and no obvious severe impact oscillation.

The statistical analysis results in Table 9 show that the mean peak contact force of the four legs ranges from 89.71 N to 165.72 N. Leg 4 has the largest mean peak contact force (165.72 N), indicating that it bears a higher proportion of the body weight during the support phase; this may be caused by a slight offset of the robot’s center of gravity or by local differences in the ground friction coefficient. The standard deviation of the peak contact force of each leg is 67.31 N to 105.23 N, reflecting the degree of fluctuation of the contact force across gait cycles, which is considered to be within a reasonable range and indicates that the walking state of the robot is consistent and repeatable. The results show that the foot-end obtains sufficient ground reaction force to support the body while achieving compliant interaction with the ground, verifying the effectiveness of the impedance control strategy in the flat-ground walking task.
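Statistics of the kind reported in Table 9 can be obtained by extracting one peak per support phase from the contact force signal. The following is a minimal sketch under stated assumptions: a support phase is taken to be a contiguous run of samples above a hypothetical force threshold, and the sample standard deviation is used; the paper does not state which segmentation rule or estimator was applied.

```python
from statistics import mean, stdev


def stance_peaks(force, threshold):
    """Return the peak force of each contiguous run of samples above `threshold`."""
    peaks = []
    current = None
    for f in force:
        if f > threshold:
            current = f if current is None else max(current, f)
        elif current is not None:
            peaks.append(current)
            current = None
    if current is not None:
        # Signal ended while the foot was still in contact.
        peaks.append(current)
    return peaks


def peak_statistics(force, threshold=5.0):
    """Mean and sample standard deviation of the per-stance peak force.

    The 5 N default threshold is a hypothetical value for illustration.
    """
    peaks = stance_peaks(force, threshold)
    if len(peaks) < 2:
        raise ValueError("at least two support phases are needed")
    return mean(peaks), stdev(peaks)
```

For example, the short synthetic signal `[0, 50, 100, 40, 0, 0, 60, 120, 0, 80, 0]` contains three support phases with peaks of 100, 120, and 80, giving a mean peak of 100 and a sample standard deviation of 20.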

5.4. Slope Autonomous Walking Test and Validation

To further verify the adaptability of the proposed control strategy to unstructured terrain, a walking experiment of the quadruped robot on an 8° slope was carried out. The trot gait was again adopted, and the desired foot-end trajectory was generated by Bézier curves to ensure motion continuity and reduce impact. Figures 17, 18, and 19 show the experimental curves of foot-end position tracking, joint torque response, and foot-end contact force, respectively.

The experimental results regarding the foot-end trajectory tracking of the quadruped robot under the more complex operating condition of slope walking are illustrated in Figure 17. Under the disturbance of gravity components inherent to the slope environment, the trajectory curves corresponding to the traditional Coulomb friction model demonstrate prominent magnitude deviations and phase lags. Conversely, the curves derived from the PPO-identified parameter model remain closely aligned with the desired commands across the entire locomotion cycle. This level of robustness, sustained throughout the dynamic evolution of the gait, serves as an intuitive reflection of the proposed method’s capability to suppress complex non-linear disturbances in non-horizontal terrain. Furthermore, such performance guarantees that the foot-ends of each leg are capable of precisely reproducing the preset gait patterns even under challenging environmental constraints.

Table 10 and Table 11 list the Root Mean Square Error (RMSE) values from the experimental records. In the forward direction (X-axis), when only the traditional Coulomb friction model is used for compensation, the RMSE of the four legs ranges from 28.55 mm to 32.90 mm. This indicates that the slope gradient further exacerbates the displacement deviation in the longitudinal dimension. In contrast, with the identification strategy proposed in this paper, the errors of leg 1 to leg 4 in this dimension fall to the range of 12.99 mm to 20.20 mm. This result supports the motion accuracy of the robot in the direction of travel and the reliable execution of step-length planning.

Regarding the lateral stability in the Y-direction, the RMSE of the traditional Coulomb friction model in this dimension stabilizes between 25.11 mm and 25.70 mm, reflecting that the traditional model struggles to maintain the lateral centering of the foot-end under the influence of slope force components. With the introduction of the PPO identification strategy, the RMSE of the four legs in the Y-direction is markedly suppressed to between 3.63 mm and 8.26 mm. This fully demonstrates the superior robustness of the proposed method in dealing with lateral disturbances on slopes, effectively eliminating the transverse swaying of the body during the walking process and maintaining excellent heading stability.

In the vertical motion dimension (Z-axis), affected by the instantaneous impact of ground contact on sloped terrain, the RMSE produced by the Coulomb model on all four legs exceeds 37 mm, with a peak value of 38.25 mm. The model based on PPO identification parameters exhibits strong convergence under the same working conditions, with its error in the Z-direction stabilizing within the range of 17.67 mm to 20.60 mm. In the more challenging task of slope walking, the proposed scheme meets the practical requirements for high-precision trajectory tracking of quadruped robots in complex terrain.

The torque tracking curves of each joint of the four legs of the quadruped robot are shown in Figure 18, including the torque responses of the hip roll joint, hip pitch joint, and knee pitch joint. Throughout the walking cycle, the actual torque curves of each joint are highly consistent with the desired torque curves and track the dynamic changes of the desired torque in real time without obvious phase lag or amplitude deviation.

According to the error analysis results in Table 12, the RMSE of each joint torque is controlled at a low level: the RMSE of the hip roll joint torque is 1.36–3.33 N·m, the hip pitch joint is 4.27–5.79 N·m, and the knee pitch joint is 2.94–3.97 N·m. Among them, the hip roll joint has the smallest torque error, indicating that this joint has the fastest control response speed and the strongest stability, which is crucial for maintaining the lateral balance of the robot on slopes. The hip pitch joint needs to bear a larger gravity component and drive the legs to adapt to the slope angle, resulting in a relatively large torque fluctuation range, but the RMSE is still controlled within 5.79 N·m, reflecting the robustness of the controller in handling the additional load caused by slopes.

Figure 19 shows the change of the foot-end contact force of each leg with time. Combined with the statistical analysis results of the contact force in Table 13, it can be found that the average peak contact force of each leg is distributed between 90.79 N and 124.98 N. Among them, leg 1 has the highest value of 124.98 N, and leg 2 has the lowest value of 90.79 N, reflecting the difference in load distribution among different legs during the support phase when climbing slopes. At the same time, the standard deviation of the peak contact force also shows a certain fluctuation range: leg 1 is 96.03 N, and leg 4 is 87.63 N, indicating that these legs bear relatively large dynamic impacts. However, overall, the contact force curve maintains good periodicity without obvious slipping or lifting phenomena, which means that the foot-end always maintains effective contact with the ground. It shows that the quadruped robot based on impedance control can achieve stable climbing motion in complex terrain.

6. Conclusions

This paper uses the PPO reinforcement learning method to identify the joint friction parameters of a quadruped robot and designs an impedance motion control algorithm that integrates joint friction compensation, foot–ground contact force compensation, and dynamics-based feedback linearization. The method addresses the motion instability caused by lunar-surface temperature variations, which intensify joint friction nonlinearities. The joint friction of the robot is significantly affected by the rotation direction and exhibits pronounced asymmetry; the direction dependence is most evident in the Coulomb friction coefficient, followed by the viscous friction coefficient. Because the shear resistance at the friction interface decreases as the temperature increases, the Coulomb friction coefficient decreases accordingly. The viscous friction coefficient shows complex dynamic variations and is influenced by multiple factors such as lubricant rheological properties, temperature, and joint rotation direction. The load-related coefficient is relatively stable and is not significantly affected by temperature. The flat and slope locomotion tests show that the mean peak contact force of the foot can reach 165.72 N, the foot position tracking RMSE remains within 21.04 mm, and the joint torque RMSE remains within 5.79 N·m. The experimental results demonstrate that the impedance motion control method with joint-friction compensation ensures stable tracking of foot position and joint torque when the legs experience large dynamic impacts.

Although the proposed lunar-surface quadruped locomotion method in this paper is based on our self-developed prototype design, the modeling and analysis methods are general and are also applicable to other lunar exploration mechanisms. In future work, obstacle-avoidance motion planning and intelligent control methods will be investigated based on the quadruped robot impedance motion control method designed in this paper.

Author Contributions

Conceptualization, J.L. and S.S.; methodology, J.L.; software, J.L.; validation, J.L., L.C. and Z.L.; formal analysis, J.L. and W.Z.; investigation, J.L. and W.Z.; resources, J.L. and W.Z.; data curation, J.L. and W.Z.; writing—original draft preparation, J.L.; writing—review and editing, J.L.; visualization, J.L.; supervision, J.L.; project administration, J.L.; funding acquisition, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Civil Aerospace Technology Advanced Research Program during the 14th Five-Year Plan Period (No. D010107), the National Natural Science Foundation of China (No. U22B2080), and the Heilongjiang Provincial Natural Science Foundation of China (No. JJ2024LH0935).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Miki, T.; Lee, J.; Hwangbo, J.; Wellhausen, L.; Koltun, V.; Hutter, M. Learning robust perceptive locomotion for quadrupedal robots in the wild. Sci. Robot. 2022, 7, eabk2822.
Lee, J.; Hwangbo, J.; Wellhausen, L.; Koltun, V.; Hutter, M. Learning quadrupedal locomotion over challenging terrain. Sci. Robot. 2020, 5, eabc5986.
Scholl, P.; Iskandar, M.; Wolf, S.; Lee, J.; Bacho, A.; Dietrich, A.; Albu-Schäffer, A.; Kutyniok, G. Learning-based adaption of robotic friction models. Robot. Comput.-Integr. Manuf. 2024, 89, 102780.
Pagani, R.; Legnani, G.; Incerti, G.; Gheza, M. Evaluation and modeling of the friction in robotic joints considering thermal effects. J. Mech. Robot. 2020, 12, 021108.
Hong, S.; Um, Y.; Park, J.; Park, H.W. Agile and versatile climbing on ferromagnetic surfaces with a quadrupedal robot. Sci. Robot. 2022, 7, eadd1017.
Ding, L.; Gao, H.; Deng, Z.; Song, J.; Liu, Y.; Liu, G.; Iagnemma, K. Foot–terrain interaction mechanics for legged robots: Modeling and experimental validation. Int. J. Robot. Res. 2013, 32, 1585–1606.
Li, Q.; Qian, L.; Wang, S.; Sun, P.; Luo, X. Towards generation and transition of diverse gaits for quadrupedal robots based on trajectory optimization and whole-body impedance control. IEEE Robot. Autom. Lett. 2023, 8, 2389–2396.
Wang, X.; Niu, B.; Zhao, X.; Zong, G.; Cheng, T.; Li, B. Command-Filtered Adaptive Fuzzy Finite-Time Tracking Control Algorithm for Flexible Robotic Manipulator: A Singularity-Free Approach. IEEE Trans. Fuzzy Syst. 2024, 32, 409–419.
Niu, B.; Zhao, X.; Gao, Y.; Li, S.; Sui, J.; Wang, H. Adaptive Fixed-Time Event-Triggered Consensus Tracking Control for Robotic Multiagent Systems. IEEE Trans. Syst. Man Cybern. Syst. 2025, 55, 7238–7246.
Ma, X.; Liu, J. Adaptive contact-force control and vibration suppression for multi flexible manipulators with unknown control directions and time-varying actuator faults. Mech. Syst. Signal Process. 2025, 228, 112441.
Lu, G.; Chen, T.; Rong, X.; Zhang, G.; Bi, J.; Cao, J.; Jiang, H.; Li, Y. Whole-body motion planning and control of a quadruped robot for challenging terrain. J. Field Robot. 2023, 40, 1657–1677.
Meduri, A.; Shah, P.; Viereck, J.; Khadiv, M.; Havoutis, I.; Righetti, L. Biconmp: A nonlinear model predictive control framework for whole body motion planning. IEEE Trans. Robot. 2023, 39, 905–922.
Nie, Y.; Li, X. Antidisturbance Distributed Lyapunov-Based Model Predictive Control for Quadruped Robot Formation Tracking. IEEE Trans. Ind. Electron. 2025, 72, 10359–10369.
Li, B.; Zhang, W.; Huang, X.; Zhu, L.; Ding, H. A Whole-Body Disturbance Rejection Control Framework for Dynamic Motions in Legged Robots. IEEE Robot. Autom. Lett. 2025, 10, 9774–9781.
Morlando, V.; Teimoorzadeh, A.; Ruggiero, F. Whole-body control with disturbance rejection through a momentum-based observer for quadruped robots. Mech. Mach. Theory 2021, 164, 104412.
Gao, Y.; Wei, W.; Wang, X.; Wang, D.; Li, Y.; Yu, Q. Trajectory tracking of multi-legged robot based on model predictive and sliding mode control. Inf. Sci. 2022, 606, 489–511.
Elobaid, M.; Turrisi, G.; Rapetti, L.; Romualdi, G.; Dafarra, S.; Kawakami, T.; Chaki, T.; Yoshiike, T.; Semini, C.; Pucci, D. Adaptive Non-Linear Centroidal MPC With Stability Guarantees for Robust Locomotion of Legged Robots. IEEE Robot. Autom. Lett. 2025, 10, 2806–2813.
Liu, K.; Gu, J.; He, X.; Jia, J. Safety-critical motion optimization for quadruped robots on offshore platforms: A hierarchical nonlinear model predictive control framework based on foothold optimization and control barrier function. Control Eng. Pract. 2025, 165, 106559.
Feng, W.; Wang, Z.; Xu, H.; Zhou, Y.; He, B.; Dong, C. Multiple gait locomotion generation for quadruped robots based on trajectory planning and reinforcement learning. Control Eng. Pract. 2025, 165, 106536.
Xu, K.; Lu, Y.; Shi, L.; Li, J.; Wang, S.; Lei, T. Whole-body stability control with high contact redundancy for wheel-legged hexapod robot driving over rough terrain. Mech. Mach. Theory 2023, 181, 105199.
Hao, L.; Pagani, R.; Beschi, M.; Legnani, G. Dynamic and friction parameters of an industrial robot: Identification, comparison and repetitiveness analysis. Robotics 2021, 10, 49.
Gong, D.; Song, Y.; Zhu, M.; Teng, Y.; Jiang, J.; Zhang, S. Adaptive Variable-Damping Impedance Control for Unknown Interaction Environment. Mathematics 2023, 11, 4961.
Afrough, M.; Hanieh, A.A. Identification of Dynamic Parameters and Friction Coefficients: Of a Robot with Planar Serial Kinemtic Linkage. J. Intell. Robot. Syst. 2019, 94, 3–13.
Huang, Y.; Ke, J.; Zhang, X.; Ota, J. Dynamic parameter identification of serial robots using a hybrid approach. IEEE Trans. Robot. 2022, 39, 1607–1621.
Boschetti, G.; Sinico, T. Dynamic Modeling and Parameter Identification of a SCARA Robot Including Nonlinear Friction and Ball Screw Spline Coupling. J. Intell. Robot. Syst. 2025, 111, 96.
Huang, D.; Yang, J.; Xu, G.; Zhou, H.; Chen, J. Effective parameter identification of the GMS friction model for feed systems in CNC machines. Control Eng. Pract. 2024, 152, 106061.
Zhang, J.; Zhao, H.; Liu, Z.; Cao, R.; Zhang, B.; Liu, H.; Ding, H. A unified framework for dynamic parameter identification of elastic joint robots with hysteresis and friction nonlinearities. Mech. Syst. Signal Process. 2025, 238, 113208.
Shim, J.; Lee, S.; Jeon, D.; Ha, J.I. Contact force estimation using uncertain torque model and friction models for robot manipulator. IEEE Trans. Ind. Electron. 2024, 71, 12634–12644.
Dai, R.; Rossini, L.; Laurenzi, A.; Patrizi, A.; Tsagarakis, N. Effective Data-Driven Joint Friction Modeling and Compensation With Physical Consistency. IEEE Robot. Autom. Lett. 2025, 10, 5321–5328.
Hu, H.; Shen, Z.; Zhuang, C. A PINN-Based Friction-Inclusive Dynamics Modeling Method for Industrial Robots. IEEE Trans. Ind. Electron. 2025, 72, 5136–5144.
Turlapati, S.H.; Nguyen, V.P.; Gurnani, J.; Bin Ariffin, M.Z.; Kana, S.; Yee Wong, A.H.; Han, B.S.; Campolo, D. Identification of intrinsic friction and torque ripple for a robotic joint with integrated torque sensors with application to wheel-bearing characterization. Sensors 2024, 24, 7465.
Heng, S.; Zang, X.; Song, C.; Chen, B.; Zhang, Y.; Zhu, Y.; Zhao, J. Balance and Walking Control for Biped Robot Based on Divergent Component of Motion and Contact Force Optimization. Mathematics 2024, 12, 2188.
Tadese, M.; Pico, N.; Seo, S.; Moon, H. A two-step method for dynamic parameter identification of indy7 collaborative robot manipulator. Sensors 2022, 22, 9708.
Zhang, C. A Parameter Identification Method for Stewart Manipulator Based on Wavelet Transform. Mathematics 2020, 8, 257.
Li, Z.; Wei, H.; Liu, C.; He, Y.; Liu, G.; Zhang, H.; Li, W. An improved iterative approach with a comprehensive friction model for identifying dynamic parameters of collaborative robots. Robotica 2024, 42, 1500–1522.
Wu, W.; Qin, G.; Xiao, Z.; Wu, W.; Chen, C.; Yu, M.; Ren, Z.; Zhang, T.; Long, G. Adaptive End-Effector Buffeting Sliding Mode Control for Heavy-Duty Robots with Long Arms. Mathematics 2023, 11, 2977.
Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347.
Moshayedi, A.J.; Liao, L.; Kolahdooz, A. Gentle Survey on MIR Industrial Service Robots: Review & Design. J. Mod. Process. Manuf. Prod. 2021, 10, 31–50.

Figure 1. Mechanical Design of a Lunar Quadruped Robot.

Figure 2. Dynamic model of the Lunar Quadruped Robot.

Figure 3. Joint lubrication of the Lunar Quadruped Robot.

Figure 4. Lubrication condition of the Lunar Quadruped Robot.

Figure 5. PPO training strategy.

Figure 6. Lunar Locomotion Control Framework for Quadruped Robots.

Figure 7. Virtual and experimental prototype.

Figure 8. Experimental results of forward locomotion of the lunar quadruped robot on simulated lunar soil under horizontal ground (A) and an $8^{\circ}$ slope (B).

Figure 9. Fitting Plane of Joint Friction Torque.

Figure 10. Simulation schematic of the lunar quadruped robot, (a–h) show the continuous locomotion process of the lunar quadruped robot.

Figure 11. Variations in body altitude and attitude of the lunar quadruped robot.

Figure 12. Relative motion trajectory of the foot end of the lunar quadruped robot (based on the leg coordinate system, taking Leg 1 as an example).

Figure 13. Torque variations of the hip roll, hip pitch, and knee pitch joints for the lunar quadruped robot (taking Leg 1 as an example).

Figure 14. Foot Trajectory Tracking Curves of Quadruped Robot during Planar Locomotion.

Figure 15. Leg Joint Torque Tracking Curves of Quadruped Robot during Planar Locomotion.

Figure 16. Foot Contact Force Curves of Quadruped Robot during Planar Locomotion.

Figure 17. Foot Trajectory Tracking Curves of Quadruped Robot during Sloped Terrain Locomotion.

Figure 18. Leg Joint Torque Tracking Curves of Quadruped Robot during Sloped Terrain Locomotion.

Figure 19. Foot Contact Force Curves of Quadruped Robot during Sloped Terrain Locomotion.

Table 1. Design of Key Parameters for a Lunar Quadruped Robot.

Name	Unit	Number
Mass	kg	200–360
Thigh link length	m	0.5
Shank link length	m	0.5
Body length	m	1.45
Body width	m	1.45
Body height	m	0.4
Maximum joint torque	N·m	140
Roll joint motion range	°	−90–90
Hip pitch joint motion range	°	−90–30
Knee pitch joint motion range	°	−180–70
Tension spring stiffness	N/m	6000
Tension spring free length	m	0.21
Joint mass	kg	3
Thigh link mass	kg	2.1
Shank link mass	kg	1.2
Foot pad mass	kg	0.6

Table 2. Key hyperparameters for the PPO-based friction identification agent.

Hyperparameter	Value	Description
Learning Rate	$3 \times 10^{- 4}$	Step size for Adam optimizer
Discount Factor ( $γ$ )	0.99	Importance of future rewards
GAE Parameter ( $λ$ )	0.95	Bias-variance trade-off for advantage
Clip Range ( $ϵ$ )	0.2	Limits the change in policy per update
Batch Size	256	Number of transitions per gradient update
Number of Epochs	10	Times to reuse each batch of data
Entropy Coefficient	0.01	Encourages policy exploration
Value Loss Coeff ( $c_{1}$ )	0.5	Weight of the value function error
Reward Scaling ( $k_{1}$ )	10.0	Sensitivity of reward to torque error
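
To make the roles of the Table 2 hyperparameters concrete, the sketch below collects them in a configuration dictionary and shows the two standard PPO building blocks they control: generalized advantage estimation (using $γ$ and $λ$) and the clipped surrogate objective (using $ϵ$). The dictionary keys and function names are hypothetical, the reward function is deliberately omitted because its exact form is not given in this table, and this is not the authors' training code.

```python
# Values copied from Table 2; the key names are hypothetical.
PPO_CONFIG = {
    "learning_rate": 3e-4,
    "gamma": 0.99,
    "gae_lambda": 0.95,
    "clip_range": 0.2,
    "batch_size": 256,
    "n_epochs": 10,
    "entropy_coef": 0.01,
    "value_loss_coef": 0.5,
    "reward_scaling": 10.0,
}


def gae_advantages(rewards, values, last_value, gamma, lam):
    """Generalized advantage estimation over one non-terminating rollout segment."""
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, clip_range):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a clip range of 0.2, a probability ratio of 1.5 on a sample with positive advantage contributes as if the ratio were 1.2, which is how the clip range limits the policy change per update.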

Table 3. Key parameters of the simulated lunar soil.

No.	Parameter	Test Value
1	Bulk density (g/cm³)	1.4–2.3
2	Deformation index	0.97–1.32
3	Cohesion modulus (kN/mⁿ⁺¹)	1.46–4.89
4	Friction modulus (kN/mⁿ⁺²)	281–652
5	Shear modulus (cm)	1.09–1.66
6	Equivalent stiffness modulus (kPa/mⁿ)	840–2800
7	Contact stiffness (N)	$3.2 \times 10^{5}$–$2.0 \times 10^{7}$ (0 wt%–16 wt%)
8	Cohesion (kPa)	34.4–41.9
9	Internal friction angle (°)	44.1–48.3
10	Thermal conductivity ( $W / (m \cdot K)$ )	0.0773–0.935 (0 wt%–15 wt%)
11	Albedo	0.25–0.65

Table 4. Statistical Table of Joint Friction Characteristics Under Positive Driving.

Temperature T (°C)	Coulomb Friction $p_{0}$ (N·m)	Load Coefficient $p_{2}$ (dimensionless)	Viscous Coefficient $p_{1}$ (N·m·s/rad)
−20 (Positive)	34.09	0.322	−1.046
−20 (Negative)	31.602	0.329	1.338
−10 (Positive)	27.752	0.351	2.086
−10 (Negative)	26.044	0.333	1.093
−10 (Positive)	27.609	0.343	2.637
−10 (Negative)	27.421	0.356	0.347
0 (Positive)	25.115	0.367	1.123
0 (Negative)	20.32	0.375	3.425
10 (Positive)	16.861	0.244	6.605
10 (Negative)	15.771	0.338	7.282
20 (Positive)	16.22	0.316	2.938
20 (Negative)	12.823	0.312	6.425
30 (Positive)	16.033	0.304	3.678
30 (Negative)	10.06	0.318	6.629
50 (Positive)	10.899	0.298	5.544
50 (Negative)	7.442	0.326	5.352
70 (Positive)	6.405	0.289	5.83
70 (Negative)	4.413	0.293	4.781
90 (Positive)	3.621	0.289	5.948
90 (Negative)	1.053	0.304	6.189
120 (Positive)	3.356	0.27	3.586
120 (Negative)	1.457	0.286	3.606

Table 5. Statistical Table of Joint Friction Characteristics Under Reverse Driving.

Temperature T (°C)	Coulomb Friction $p_{0}$ (N·m)	Load Coefficient $p_{2}$ (dimensionless)	Viscous Coefficient $p_{1}$ (N·m·s/rad)
−20 (Positive)	53.42	0.3655	−13.37
−20 (Negative)	45.281	0.411	−11.582
−10 (Positive)	41.42	0.2916	−8.417
−10 (Negative)	37.48	0.2694	−5.931
−10 (Positive)	53.23	0.2636	−10.63
−10 (Negative)	44	0.3549	−8.488
0 (Positive)	37.251	0.244	−5.143
0 (Negative)	31.477	0.296	−3.279
10 (Positive)	34.641	0.205	−3.523
10 (Negative)	33.503	0.226	−2.457
20 (Positive)	28.103	0.216	−1.436
20 (Negative)	25.99	0.247	−0.768
30 (Positive)	19.764	0.217	0.993
30 (Negative)	21.82	0.216	1.287
50 (Positive)	11.479	0.208	4.036
50 (Negative)	11.776	0.265	2.592
70 (Positive)	10.29	0.201	4.2
70 (Negative)	10.362	0.206	4.48
90 (Positive)	5.061	0.193	4.912
90 (Negative)	5.194	0.213	4.696
120 (Positive)	7.001	0.106	3.078
120 (Negative)	6.338	0.106	3.62

Table 6. Foot-end RMSE Analysis for Quadrupedal Planar Locomotion Based on PPO Friction Model Identification (Unit: mm).

Leg No.	X Direction	Y Direction	Z Direction
1	18.55	4.97	18.46
2	12.47	4.31	18.05
3	15.10	6.25	19.45
4	17.74	6.94	21.04

Table 7. Foot-end RMSE Analysis for Quadrupedal Planar Locomotion Based on Coulomb Friction Models (Unit: mm).

Leg No.	X Direction	Y Direction	Z Direction
1	32.44	28.65	41.04
2	29.04	28.31	40.13
3	28.25	26.36	40.38
4	32.60	26.29	38.23
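
To make the comparison between Tables 6 and 7 concrete, the following Python sketch averages the foot-end RMSE over the four legs for each axis and reports the relative reduction obtained with the PPO-identified friction model. The numbers are copied from the two tables; the helper names (`mean_by_axis`, `relative_reduction`) and the choice of a per-axis mean as the summary statistic are illustrative assumptions, not the authors' analysis procedure.

```python
# Foot-end position RMSE in mm for planar locomotion.
# Rows are legs 1-4; columns are the X, Y and Z directions.
PPO_RMSE = [  # Table 6: PPO-identified friction model
    (18.55, 4.97, 18.46),
    (12.47, 4.31, 18.05),
    (15.10, 6.25, 19.45),
    (17.74, 6.94, 21.04),
]
COULOMB_RMSE = [  # Table 7: Coulomb friction model
    (32.44, 28.65, 41.04),
    (29.04, 28.31, 40.13),
    (28.25, 26.36, 40.38),
    (32.60, 26.29, 38.23),
]
AXES = ("X", "Y", "Z")


def mean_by_axis(table):
    """Average the RMSE over all legs, separately for each axis."""
    return {
        axis: sum(row[i] for row in table) / len(table)
        for i, axis in enumerate(AXES)
    }


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`, per axis."""
    return {axis: 1.0 - improved[axis] / baseline[axis] for axis in AXES}


ppo_mean = mean_by_axis(PPO_RMSE)
coulomb_mean = mean_by_axis(COULOMB_RMSE)
reduction = relative_reduction(coulomb_mean, ppo_mean)
worst_ppo = max(max(row) for row in PPO_RMSE)  # largest single PPO entry

if __name__ == "__main__":
    for axis in AXES:
        print(
            f"{axis}: Coulomb {coulomb_mean[axis]:.2f} mm, "
            f"PPO {ppo_mean[axis]:.2f} mm, "
            f"reduction {100 * reduction[axis]:.1f}%"
        )
    print(f"Largest PPO foot-end RMSE: {worst_ppo:.2f} mm")
```

The largest single entry in Table 6 is the Z-direction value for leg 4, which is the 21.04 mm bound quoted in the abstract.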

Table 8. Joint Torque RMSE of the Quadruped Robot during Planar Locomotion (Unit: N·m).

Leg No.	Hip Roll Joint	Hip Pitch Joint	Knee Pitch Joint
1	1.87	4.62	2.82
2	2.03	5.12	2.94
3	1.93	4.31	2.98
4	3.22	5.49	3.82

Table 9. Foot-end Peak Contact Forces of the Quadruped Robot during Planar Locomotion.

Leg No.	Mean Peak Contact Force (N)	Standard Deviation of Peak Contact Force (N)
1	104.65	82.17
2	89.71	67.31
3	91.60	72.15
4	165.72	105.23

Table 10. Foot-end RMSE Analysis for Quadrupedal Slope Locomotion Based on PPO Friction Model Identification (Unit: mm).

Leg No.	X Direction	Y Direction	Z Direction
1	18.81	6.15	18.30
2	12.99	3.63	17.67
3	13.40	4.95	20.02
4	20.20	8.26	20.60

Table 11. Foot-end RMSE Analysis for Quadrupedal Slope Locomotion Based on Coulomb Friction Models (Unit: mm).

Leg No.	X Direction	Y Direction	Z Direction
1	32.16	25.11	37.78
2	28.61	25.24	37.37
3	28.55	25.63	37.99
4	32.90	25.70	38.25

Table 12. Joint Torque RMSE of the Quadruped Robot during Slope Locomotion (Unit: N·m).

Leg No.	Hip Roll Joint	Hip Pitch Joint	Knee Pitch Joint
1	1.36	4.65	3.20
2	1.61	4.65	3.11
3	1.61	4.27	2.94
4	3.33	5.79	3.97

Table 13. Foot-end Peak Contact Forces of the Quadruped Robot during Slope Locomotion.

Leg No.	Mean Peak Contact Force (N)	Standard Deviation of Peak Contact Force (N)
1	124.98	96.03
2	90.79	58.25
3	91.14	70.92
4	110.43	87.63
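
The standard deviations in Tables 9 and 13 are large relative to the mean peak forces. One way to put the legs on a common scale is the coefficient of variation (standard deviation divided by mean), sketched below in Python with the values copied from the two tables. The coefficient of variation is a summary added here for illustration and is not a metric reported in the paper; the names `PLANAR`, `SLOPE` and `coefficient_of_variation` are likewise hypothetical.

```python
# (mean peak contact force in N, standard deviation in N), keyed by leg number.
PLANAR = {  # Table 9
    1: (104.65, 82.17),
    2: (89.71, 67.31),
    3: (91.60, 72.15),
    4: (165.72, 105.23),
}
SLOPE = {  # Table 13
    1: (124.98, 96.03),
    2: (90.79, 58.25),
    3: (91.14, 70.92),
    4: (110.43, 87.63),
}


def coefficient_of_variation(table):
    """Return std / mean of the peak contact force for every leg."""
    return {leg: std / mean for leg, (mean, std) in table.items()}


def leg_with_largest_mean(table):
    """Return the leg number whose mean peak contact force is highest."""
    return max(table, key=lambda leg: table[leg][0])


cv_planar = coefficient_of_variation(PLANAR)
cv_slope = coefficient_of_variation(SLOPE)

if __name__ == "__main__":
    for name, table, cv in (("planar", PLANAR, cv_planar),
                            ("slope", SLOPE, cv_slope)):
        print(f"{name}: largest mean peak on leg {leg_with_largest_mean(table)}")
        for leg in sorted(cv):
            print(f"  leg {leg}: CV = {cv[leg]:.2f}")
```

On these numbers the leg with the highest mean peak force differs between the two terrains (leg 4 on flat ground, leg 1 on the slope), while the ratio of standard deviation to mean stays in a similar band for all legs.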
