Article

Control of Parallel Quadruped Robots Based on Adaptive Dynamic Programming Control

by Junwei Liang 1, Shenyu Tang 2 and Bingyi Jia 1,*
1 College of Mechanical and Electronic Engineering, Shandong University of Science and Technology, Qingdao 266590, China
2 College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao 266590, China
* Author to whom correspondence should be addressed.
Machines 2024, 12(12), 875; https://doi.org/10.3390/machines12120875
Submission received: 11 November 2024 / Revised: 23 November 2024 / Accepted: 27 November 2024 / Published: 2 December 2024
(This article belongs to the Section Automation and Control Systems)

Abstract: With the rapid development of robotics technology, quadruped robots have shown significant potential in navigating complex terrains due to their excellent stability and adaptability. This paper proposes an adaptive dynamic programming control method based on policy iteration, aimed at improving the motion performance and autonomous adaptability of parallel quadruped robots in unknown environments. First, the study establishes a kinematic model of the robot and performs inverse kinematics calculations to determine the angular functions for each joint of the robot’s legs. To improve the robot’s mobility on challenging terrains, we design an optimal tracking controller based on Generalized Policy Iteration (GPI). This approach relaxes the strict requirements on the model and is applied to the control of quadruped robots. Finally, kinematic simulations are conducted based on pre-planned robot gaits, and experiments are then carried out based on the simulation results. The simulation results indicate that, under the adaptive optimal control algorithm, the quadruped robot can achieve smooth walking on complex terrains, verifying the rationality and effectiveness of the parallel quadruped robot in handling such conditions. The experimental results further demonstrate that this strategy significantly improves the stability and robustness of the robot across various terrains.

1. Introduction

Quadruped robots are highly complex and versatile machines that replicate the movement of four-legged animals, such as dogs, cats, or horses. They are capable of autonomous walking, running, jumping, and even climbing in complex and changing environments [1]. The design of these robots draws on knowledge from several disciplines, including mechanical engineering, electronic engineering, computer science, control theory, and bionics, to achieve efficient and stable locomotion. With the rapid advancement in automation and intelligent technologies, quadruped robots have attracted considerable attention for their stability and adaptability in challenging terrains. They exhibit distinct advantages in various applications, including disaster relief, military reconnaissance, and everyday service tasks [2].
Since the 1990s, numerous experts and scholars have conducted research on bionic quadruped robots. To improve the mechanical structure performance of these robots, an innovative bionic actuator system was proposed in the literature [3]. This system combines additive manufacturing and topology optimization techniques to effectively reduce redundant materials in the limb units, thereby significantly enhancing their dynamic performance. In addition, reference [4] introduced a novel leg phase detection method for robots, which uses proprioceptive feedback to estimate the leg state, addressing the issue of inaccuracy when external equipment is unavailable. Furthermore, reference [5] reported the development of a quadruped robot with soft–rigid hybrid legs, where the legs were designed using a multistage topology optimization method, providing a new solution to the traditional challenge of a limited bearing capacity in soft legs. In the context of adapting to complex environments, the bionic quadruped obstacle-crossing robot described in [6] achieves high obstacle-crossing performance by storing energy in its hind limb structures. Similarly, ref. [7] introduces a novel quadruped robot with a reconfigurable sprawl posture, enabling it to switch between different postures to navigate narrow spaces, obstacles, and inclined surfaces. As research on quadruped robots advances, alongside the optimization of mechanical structures, many advanced algorithms are gradually being integrated. These algorithms play a critical role in improving the robot’s performance, intelligence, and autonomy. In terms of control strategies, ref. [8] introduces an end-to-end training framework that enables quadruped robots to learn various gaits through deep learning. Reference [9] presents a comprehensive decision-making and control framework specifically addressing the issue of falls in quadruped robots, incorporating a model-free deep reinforcement learning method to achieve post-fall recovery. Furthermore, reference [10] details a contextual meta-reinforcement learning method developed to formulate a fault-tolerant strategy that enhances the mobility of quadruped robots within unstructured environments. Reference [11] introduces a Lyapunov model predictive control algorithm that incorporates sequential scaling to improve the tracking accuracy of these robots. Subsequently, reference [12] implements a teacher–student policy architecture, which allows the robot to adjust its motion strategy based on environmental variables. Experiments performed on actual quadruped robots have confirmed the efficacy of these strategies in enabling high-speed motion and adaptation to complex terrains. The swift advancement in quadruped robot technology has not only achieved significant theoretical innovations but has also been substantiated by practical implementations, showcasing the substantial potential and value of these robots for future advancements in automation and intelligent systems.
However, the design of control systems for quadruped robots faces numerous challenges, particularly in achieving precise gait adjustments and maintaining balance in dynamic environments [13]. In [14], a deep reinforcement learning framework was developed to enable a quadruped robot to perform agile and dynamic locomotion tasks. Similarly, ref. [15] employed neural networks to control a quadruped robot traversing challenging terrains, showcasing the effectiveness of neural network-based control in real-world robotic applications. These works highlight the potential of neural networks in enhancing the control capabilities of quadruped robots. Adaptive optimal control [16] offers an effective solution by optimizing control strategies in real time to adapt to environmental changes, thereby enhancing the robot’s adaptability and stability.
Optimal control [17] is a method used to determine the best control strategy under given constraints to maximize (or minimize) a system’s performance index. It plays a critical role in controller design, ensuring optimal system performance, improving the response speed and stability, and enhancing robot path planning. Key advantages of optimal control include performance optimization, precision, flexibility, robustness, and efficiency, which make it particularly suited for addressing control challenges in complex systems. As a result, it has been widely applied across various fields. Among the most commonly used approaches are reinforcement learning-based solutions [18,19]. Reinforcement learning is a method where an agent learns by interacting with its environment, accumulating experience, and optimizing its strategy through trial and error [20]. One key advantage of reinforcement learning over traditional control techniques is its ability to autonomously learn control strategies through environmental interaction, even without precise system models. This feature offers greater flexibility and robustness, particularly when dealing with highly complex and dynamic environments, such as those encountered by quadruped robots, where it demonstrates significant advantages. With the continuous development of reinforcement learning, the idea of combining adaptive control and optimal control has recently emerged, commonly referred to as adaptive dynamic programming (ADP) [21]. A common framework for studying reinforcement learning or ADP is the Markov Decision Process (MDP), in which the control process is typically stochastic and modeled in discrete time. However, there is a growing need to extend this approach to controlling deterministic systems in continuous time domains. In reference [22], an online adaptive optimal control algorithm based on ADP is proposed to solve multiplayer nonzero-sum games (MP-NZSGs) for discrete time unknown nonlinear systems. In reference [23], a robust optimal control design for a class of uncertain nonlinear systems is explored from the perspective of RADP [24], aiming to address dynamic uncertainties or unmodeled dynamics. In reference [25], an event-driven adaptive robust control method is developed for continuous time uncertain nonlinear systems using a neural dynamic programming (NDP) strategy. This method provides a novel approach to integrating self-learning control, event-triggered adaptive control, and robust control, advancing research on nonlinear adaptive robust feedback design in uncertain environments. While most of these algorithms focus on regulation problems, real-world scenarios often encounter tracking challenges. In quadruped robot control, tracking a desired trajectory under uncertain conditions is critical. Reference [26] introduces a neural network-based control approach for quadruped robots to achieve precise trajectory tracking despite dynamic uncertainties. In reference [27], a hierarchical autonomous control scheme based on a twin double deep deterministic policy gradient algorithm and model predictive control is proposed, achieving the precise tracking of pressure control targets and promoting the application of AI technology in underground construction. For the path tracking control of autonomous vehicles with uncertain dynamics, ref. [28] presents an adaptive optimal control method with prescribed performance, introducing prescribed performance functions into ADP to constrain tracking errors within defined performance boundaries while optimizing control. Additionally, ref. [29] develops an optimal neural control framework with prescribed performance constraints to solve the trajectory tracking control problem for hypersonic vehicles. In view of the dynamic uncertainty and the disturbance factors acting on the system, reference [30] develops a novel online adaptive H∞ control scheme for ACC systems. Furthermore, it possesses the unique capability of learning control and disturbance policies concurrently.
The design of the aforementioned controllers addresses both the regulation and tracking problems in nonlinear systems while disregarding the influence of modeling inaccuracy, which is particularly significant for systems with a complex modeling process, such as quadruped robots. Although such models can be computed, the computation is time-consuming, must be repeated whenever the system changes, and offers no guarantee on either its complexity or its accuracy. To overcome this issue, we propose a novel optimal tracking control method based on Generalized Policy Iteration (GPI), applied for the first time in quadruped robot control. Generalized Policy Iteration (GPI) is a reinforcement learning method that involves two interacting processes: policy evaluation and policy improvement. The policy is always improved with respect to the current value function, and the value function always converges to the true value function of the current policy. This GPI-based approach eliminates the need for prior knowledge of the system’s drift function f(x), providing a partially model-free online solution that effectively resolves the foot-end displacement control problem in quadruped robots.
The main contributions of this paper are as follows:
  • A kinematic model of the parallel quadruped robot’s leg mechanism is established, and a novel algorithm based on Generalized Policy Iteration (GPI) is applied for the first time to the tracking control of this model.
  • Unlike the existing algorithm in [31], GPI avoids solving complex dynamic models and provides an online, partially model-free solution that can be applied to any quadruped robot.
  • A co-simulation model for the parallel quadruped robot is developed, significantly improving the accuracy of the simulation process and addressing the limitations of the mathematical model. The simulation results are validated through experimental verification.
The structure of the remainder of the paper is as follows: Section 2 presents the kinematic model of the parallel quadruped robot. Section 3 describes the controller design. Section 4 covers the simulation and experimental validation, and Section 5 provides the conclusion.

2. Kinematic Model of Quadruped Robot’s Control System

Establishing a kinematic model is crucial to determining the relationship between the foot-end and the joints of the legs in terms of the position and orientation for legged robots. Kinematics forms the foundation for enabling effective motion control in robotics. By integrating the robot’s movements with mathematical modeling, further advanced research can be conducted in areas such as gait planning, trajectory planning, and posture control.
The mechanical structure of the quadruped robot was inspired by the physiological structure of mammals. The legs are connected to the torso by four joints, with each joint having two degrees of freedom. Considering the balance between mechanical flexibility and complexity, the mechanical structure has been simplified. The simplified structure, shown in Figure 1, consists of five parts. The parameters and structure of the four legs are identical and symmetrically distributed around the center. The kinematic model in its initial state is shown in Figure 2.
If the pose of the robot’s end-effector is given, the values of the other joints can be determined by performing inverse kinematics. There are two methods for solving inverse kinematics: The first is the analytical method, which is fast and accurate, utilizing algebraic calculation and geometric analysis for the solution. The second is the numerical method, which has a slower computation speed and unstable numerical iteration. Therefore, in this research, the geometric analysis approach from the analytical method was adopted to solve the inverse kinematics of the robot’s single leg.
The position inverse solution analysis involves determining the motor angles that achieve the desired foot-end trajectory, given the known lengths of the actuators AE and BC, as well as the lengths of the driven links ED and CD.
The continuous phase transition process of the foot-end of a single leg, from ground contact to liftoff, is referred to as the stance phase. The swing phase describes the continuous phase transition process in which the foot-end moves along the planned trajectory from liftoff to ground contact. During the stance phase, the leg supports the robot’s load and moves the torso in the target direction through phase transitions, shifting from the anterior extreme to the posterior extreme. Conversely, the swing phase shifts from the posterior extreme to the anterior extreme, determining the robot’s stride length and leg lift height when crossing obstacles. Therefore, research on the stance and swing phases is essential for deriving the torso trajectory of quadruped robots.
To study the swing phase of the right foreleg of the quadruped robot, we first assume that the parallel mechanism exhibits no lateral motion. The origin of the coordinate system is set at the center of the foot-end when the quadruped robot is in a static stance, with the forward direction aligned with the positive X-axis and the vertical direction upward along the positive Y-axis. The kinematic model under general conditions is shown in Figure 3.
From the geometric relationships and the law of cosines, we derive
c^2 = (h_0 - n)^2 + m^2
where c represents the distance between the motor center and the foot-end center during movement, h 0 is the distance between the motor center and the foot-end center when stationary, n is the horizontal distance of the foot-end, and m is the vertical distance of the foot-end.
b^2 = a^2 + h_0^2 - 2 a h_0 \cos\theta_0
where b is the length of the thigh, a is the length of the calf, and θ 0 is the angle between the stationary shin and the vertical axis.
\cos\gamma = \frac{h_0 - n}{c}
where γ is the angle between the line connecting the motor center and the foot-end center and the vertical axis.
b^2 = a^2 + c^2 - 2 a c \cos(\theta_1 + \theta_0 - \gamma) = a^2 + c^2 - 2 a c \cos(\theta_0 + \gamma - \theta_2)
where θ 1 is the angle between the moving thigh and the stationary thigh at the front, which represents the internal motor rotation, and θ 2 is the angle between the moving thigh and the stationary thigh at the back, representing the external motor rotation.
From (3) and (4), the relationship between the two motor angles is derived as
\theta_1 + \theta_2 = 2\gamma
\gamma = \arccos\frac{h_0 - n}{c}
From (4) and (6), the expression for motor angle θ 1 as a function of the foot-end position is
\theta_1 = \arccos\frac{a^2 + c^2 - b^2}{2 a c} + \arccos\frac{h_0 - n}{c} - \theta_0
From (5) and (7), the expression for motor angle θ 2 as a function of the foot-end position is
\theta_2 = \arccos\frac{h_0 - n}{c} - \arccos\frac{a^2 + c^2 - b^2}{2 a c} + \theta_0
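For illustration, the following minimal Python sketch evaluates the swing-phase inverse kinematics (1)–(8); the link lengths are assumed placeholder values rather than the robot’s actual parameters, and the static distance h0 is derived from (2).

import numpy as np

a, b = 0.10, 0.20                      # calf and thigh lengths (assumed values)
theta0 = np.deg2rad(121.83)            # static shin angle from Section 2
h0 = a * np.cos(theta0) + np.sqrt(b**2 - (a * np.sin(theta0))**2)   # from Eq. (2)

def leg_ik(n, m):
    """Motor angles theta1, theta2 (rad) for a foot-end offset (n, m), Eqs. (1)-(8)."""
    c = np.hypot(h0 - n, m)                               # Eq. (1)
    gamma = np.arccos((h0 - n) / c)                       # Eq. (6)
    alpha = np.arccos((a**2 + c**2 - b**2) / (2 * a * c))
    return alpha + gamma - theta0, gamma - alpha + theta0   # Eqs. (7) and (8)

print(leg_ik(0.02, 0.01))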
To investigate the stance phase of the quadruped robot’s right foreleg, it is assumed that the parallel mechanism does not undergo lateral motion. The foot-end center in the initial standing position during the stance phase is designated as the coordinate origin, with the forward direction aligned along the positive X-axis and the vertical direction upward along the positive Y-axis. The kinematic model under general conditions is illustrated in Figure 4.
Based on geometric relationships and the law of cosines, it can be derived as
b^2 = a^2 + l_0^2 - 2 a l_0 \cos\beta_0 = a^2 + d^2 - 2 a d \cos\beta_1
where l 0 represents the initial distance between the motor and the foot-end, and β 0 is the angle between the thigh and the line connecting the motor and foot-end at the initial moment. d represents the distance between the motor and the foot-end during motion, and β 1 is the corresponding angle during movement.
l_0^2 = h_0^2 + S^2
where S denotes the step length.
d^2 = x^2 + y^2
where x and y represent the motor coordinates.
The relationship between the two motor angles is
\varphi_1 = \varphi_2 = \beta_0 - \beta_1
where φ 1 denotes the inner motor angle and φ 2 denotes the outer motor angle.
From Equations (9)–(12), the relationship between motor angles φ 1 and φ 2 and the motor position is given by
\varphi_1 = \arccos\frac{a^2 - b^2 + h_0^2 + S^2}{2 a \left(h_0^2 + S^2\right)^{1/2}} - \arccos\frac{a^2 - b^2 + x^2 + y^2}{2 a \left(x^2 + y^2\right)^{1/2}}
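Similarly, a brief Python sketch of the stance-phase relation (13) between the motor angle and the moving motor position (x, y) is given below; a, b, h0, and S are again assumed placeholder values.

import numpy as np

a, b = 0.10, 0.20        # calf and thigh lengths (assumed values)
h0, S = 0.128, 0.05      # static motor height and step length (assumed values)

def motor_angle(x, y):
    """phi1 = phi2 from Eq. (13), given the motor position (x, y)."""
    l0 = np.sqrt(h0**2 + S**2)                               # Eq. (10)
    d = np.sqrt(x**2 + y**2)                                 # Eq. (11)
    beta0 = np.arccos((a**2 - b**2 + l0**2) / (2 * a * l0))  # from Eq. (9)
    beta1 = np.arccos((a**2 - b**2 + d**2) / (2 * a * d))    # from Eq. (9)
    return beta0 - beta1                                     # Eq. (12)

print(motor_angle(0.02, 0.128))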
The commonly used methods for the dynamics study of parallel robots include the Lagrange method, the Kane method, and the Newton-Euler method. In this section, the Lagrange method is used to establish the dynamic equation of the parallel leg mechanism, and the expression of the driving torque is obtained by solving the equation. The schematic diagram of the parallel leg mechanism is shown in Figure 5.
It is assumed that the position of the center of mass of each bar is on the common perpendicular of the adjacent joint axes. From the closed vector method and the projection relation, we can obtain
a\cos\theta_6 + b\cos\theta_5 = a\cos\theta_4 + b\cos\theta_3, \qquad a\sin\theta_6 + b\sin\theta_5 = a\sin\theta_4 + b\sin\theta_3
where $\theta_3$ is the included angle between the left thigh and the positive direction of the X axis, $\theta_4$ is the included angle between the left crus and the positive direction of the X axis, $\theta_5$ is the included angle between the right thigh and the positive X axis direction, and $\theta_6$ is the included angle between the right crus and the positive X axis direction. The center of mass of each member is located on the common perpendicular of the adjacent joint axes. Choose $\theta_3$ and $\theta_6$ as generalized coordinates. It can be seen from Figure 3 and Figure 5 that
\theta_3 = 270^{\circ} - \theta_0 + \theta_2, \qquad \theta_6 = \theta_1 + \theta_0 - 90^{\circ}
It can be obtained from Formula (14) that
\theta_4 = 2\arctan x_1, \qquad \theta_5 = \arcsin\frac{a\sin\theta_3 - a\sin\theta_6 - b\sin\theta_4}{b}
where $x_1 = \frac{D \pm \sqrt{D^2 - (A + B + C)(A + B - C)}}{A + B + C}$, $A = 2a^2$, $B = 2a^2\cos(\theta_3 - \theta_6)$, $C = 2ab(\cos\theta_3 - \cos\theta_6)$, and $D = 2ab(\sin\theta_3 - \sin\theta_6)$.
Take the partial derivative of Equation (14) with respect to θ 3 and θ 6 , respectively, to obtain
\frac{\partial\theta_5}{\partial\theta_6} = \frac{a\sin(\theta_6 - \theta_4)}{b\sin(\theta_5 - \theta_4)}, \quad \frac{\partial\theta_5}{\partial\theta_3} = \frac{a\sin(\theta_4 - \theta_3)}{b\sin(\theta_5 - \theta_4)}, \quad \frac{\partial\theta_4}{\partial\theta_6} = \frac{a\sin(\theta_6 - \theta_5)}{b\sin(\theta_5 - \theta_4)}, \quad \frac{\partial\theta_4}{\partial\theta_3} = \frac{a\sin(\theta_5 - \theta_3)}{b\sin(\theta_5 - \theta_4)}
By taking the derivative of both sides of Equation (14), we can obtain
a\cos\theta_6\,\dot\theta_6 + b\cos\theta_5\,\dot\theta_5 = b\cos\theta_3\,\dot\theta_3 + a\cos\theta_4\,\dot\theta_4, \qquad a\sin\theta_6\,\dot\theta_6 + b\sin\theta_5\,\dot\theta_5 = b\sin\theta_3\,\dot\theta_3 + a\sin\theta_4\,\dot\theta_4
Rewrite Equation (18) into a matrix form:
\begin{bmatrix}\dot\theta_5 \\ \dot\theta_4\end{bmatrix} = A\begin{bmatrix}\dot\theta_6 \\ \dot\theta_3\end{bmatrix}
where $A = G^{-1}H$, $G = \begin{bmatrix} b\cos\theta_5 & -a\cos\theta_4 \\ b\sin\theta_5 & -a\sin\theta_4 \end{bmatrix}$, and $H = \begin{bmatrix} -a\cos\theta_6 & b\cos\theta_3 \\ -a\sin\theta_6 & b\sin\theta_3 \end{bmatrix}$.
The position of the center of mass of each rod can be obtained from Figure 5 and the linear velocity of the center of mass of each rod can be obtained through time derivation.
The linear velocity of the center of mass of the right thigh center is
\begin{bmatrix}\dot x_{s4} \\ \dot y_{s4}\end{bmatrix} = \begin{bmatrix} -q_1\sin\theta_6\,\dot\theta_6 \\ q_1\cos\theta_6\,\dot\theta_6 \end{bmatrix}
where x ˙ s 4 is the horizontal linear velocity of the centroid of the right thigh and y ˙ s 4 is the vertical linear velocity of the centroid of the right thigh. q 1 is the distance between the thigh mass center and the motor center and q 2 is the distance between the crus mass center and a joint connecting the thigh with the crus.
The linear velocity of the center of mass of the right lower leg is
\begin{bmatrix}\dot x_{s3} \\ \dot y_{s3}\end{bmatrix} = \begin{bmatrix} -b\sin\theta_6\,\dot\theta_6 - q_2\sin\theta_5\,\dot\theta_5 \\ b\cos\theta_6\,\dot\theta_6 + q_2\cos\theta_5\,\dot\theta_5 \end{bmatrix}
where x ˙ s 3 is the linear velocity in the horizontal direction of the mass center of the right leg and y ˙ s 3 is the linear velocity in the vertical direction of the mass center of the right leg.
The linear velocity of the center of mass of the left lower leg is
\begin{bmatrix}\dot x_{s2} \\ \dot y_{s2}\end{bmatrix} = \begin{bmatrix} -a\sin\theta_3\,\dot\theta_3 - q_2\sin\theta_4\,\dot\theta_4 \\ a\cos\theta_3\,\dot\theta_3 + q_2\cos\theta_4\,\dot\theta_4 \end{bmatrix}
where x ˙ s 2 is the horizontal linear velocity of the mass center of the left leg and y ˙ s 2 is the vertical linear velocity of the mass center of the left leg.
The linear velocity of the center of mass of the left thigh center is
\begin{bmatrix}\dot x_{s1} \\ \dot y_{s1}\end{bmatrix} = \begin{bmatrix} -q_1\sin\theta_3\,\dot\theta_3 \\ q_1\cos\theta_3\,\dot\theta_3 \end{bmatrix}
where x ˙ s 1 is the horizontal linear velocity of the mass center of the left thigh and y ˙ s 1 is the vertical linear velocity of the mass center of the left thigh.
According to (16) and (19)–(23), the total kinetic energy of the system is expressed as
E = \sum_{i=1}^{4}\frac{1}{2}\left[m_i\left(\dot x_{si}^2 + \dot y_{si}^2\right) + J_i\dot\theta_{i+2}^2\right] = \frac{1}{2}\left(m_1\dot x_{s1}^2 + m_1\dot y_{s1}^2 + m_2\dot x_{s2}^2 + m_2\dot y_{s2}^2 + m_3\dot x_{s3}^2 + m_3\dot y_{s3}^2 + m_4\dot x_{s4}^2 + m_4\dot y_{s4}^2 + J_1\dot\theta_3^2 + J_2\dot\theta_4^2 + J_3\dot\theta_5^2 + J_4\dot\theta_6^2\right) = \frac{1}{2}J_{33}\dot\theta_3^2 + J_{36}\dot\theta_3\dot\theta_6 + \frac{1}{2}J_{66}\dot\theta_6^2
where the masses of the left thigh, left calf, right calf, and right thigh are m 1 , m 2 , m 3 , and m 4 , respectively, and the moments of inertia of the left thigh, left calf, right calf, and right thigh around the center of mass are J 1 , J 2 , J 3 , and J 4 , respectively, and m 1 = m 4 , m 2 = m 3 , J 1 = J 4 , and J 2 = J 3 . The coefficients J 33 , J 66 , and J 36 are, respectively, expressed as
J_{33} = J_1 + m_1 q_1^2 + m_2\left[a^2 + 2 a q_2\frac{\partial\theta_4}{\partial\theta_3}\cos(\theta_4 - \theta_3)\right] + \left(m_3 q_2^2 + J_3\right)\left(\frac{\partial\theta_5}{\partial\theta_3}\right)^2 + \left(m_2 q_2^2 + J_2\right)\left(\frac{\partial\theta_4}{\partial\theta_3}\right)^2
J_{66} = J_4 + m_4 q_1^2 + m_3\left[a^2 + 2 a q_2\frac{\partial\theta_5}{\partial\theta_6}\cos(\theta_5 - \theta_6)\right] + \left(m_3 q_2^2 + J_3\right)\left(\frac{\partial\theta_5}{\partial\theta_6}\right)^2 + \left(m_2 q_2^2 + J_2\right)\left(\frac{\partial\theta_4}{\partial\theta_6}\right)^2
J_{36} = \left(m_3 q_2^2 + J_3\right)\frac{\partial\theta_5}{\partial\theta_3}\cdot\frac{\partial\theta_5}{\partial\theta_6} + \left(m_2 q_2^2 + J_2\right)\frac{\partial\theta_4}{\partial\theta_3}\cdot\frac{\partial\theta_4}{\partial\theta_6} + m_3 a q_2\frac{\partial\theta_5}{\partial\theta_3}\cos(\theta_5 - \theta_6) + m_2 a q_2\frac{\partial\theta_4}{\partial\theta_6}\cos(\theta_4 - \theta_3)
The total potential energy of the legs can be expressed as
U = \sum_{i=1}^{4} m_i g h_i
where the height of the center of mass of the left thigh, the left leg, the right leg, and the right thigh from the ground is h 1 , h 2 , h 3 , and h 4 , respectively.
The driving moments $M_1$ and $M_2$ can be obtained from the Lagrangian equation $M_i = \frac{d}{dt}\frac{\partial E}{\partial\dot\theta_{i+2}} - \frac{\partial E}{\partial\theta_{i+2}} + \frac{\partial U}{\partial\theta_{i+2}}$ as follows:
M_1 = J_{66}\ddot\theta_6 + J_{36}\ddot\theta_3 + \frac{1}{2}\frac{\partial J_{66}}{\partial\theta_6}\dot\theta_6^2 + \left(\frac{\partial J_{36}}{\partial\theta_3} - \frac{1}{2}\frac{\partial J_{33}}{\partial\theta_6}\right)\dot\theta_3^2 + \frac{\partial J_{66}}{\partial\theta_3}\dot\theta_3\dot\theta_6
M_2 = J_{33}\ddot\theta_3 + J_{36}\ddot\theta_6 + \frac{1}{2}\frac{\partial J_{33}}{\partial\theta_3}\dot\theta_3^2 + \left(\frac{\partial J_{36}}{\partial\theta_6} - \frac{1}{2}\frac{\partial J_{66}}{\partial\theta_3}\right)\dot\theta_6^2 + \frac{\partial J_{33}}{\partial\theta_6}\dot\theta_3\dot\theta_6
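To illustrate how (29) follows from the quadratic kinetic energy (24), the potential energy (28), and the Lagrangian formula above, the following SymPy sketch keeps J33, J36, J66, and U as unspecified functions of the generalized coordinates; the symbol names simply mirror the text and carry no additional assumptions about the mechanism.

import sympy as sp

t = sp.symbols('t')
th3 = sp.Function('theta3')(t)
th6 = sp.Function('theta6')(t)
J33 = sp.Function('J33')(th3, th6)
J36 = sp.Function('J36')(th3, th6)
J66 = sp.Function('J66')(th3, th6)
U = sp.Function('U')(th3, th6)

# Total kinetic energy in the quadratic form of Eq. (24)
E = (J33 * th3.diff(t)**2 / 2
     + J36 * th3.diff(t) * th6.diff(t)
     + J66 * th6.diff(t)**2 / 2)

def torque(q):
    """M = d/dt(dE/d(q_dot)) - dE/dq + dU/dq, the Lagrangian formula above."""
    qd = q.diff(t)
    return (E.diff(qd).diff(t) - E.diff(q) + U.diff(q)).doit()

M2 = torque(th3)   # torque associated with the left-side coordinate theta3
M1 = torque(th6)   # torque associated with the right-side coordinate theta6
sp.pprint(M1)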
In the gait cycle of a quadruped walking robot, the torso movement follows a regular pattern. Therefore, it is necessary to deduce the robot’s torso trajectory from the foot-end trajectory, specifically calculating the actual trajectory of the foot during single-leg movement.
To enhance the adaptability of quadruped robots to unstructured environments and their ability to complete assigned tasks efficiently, the trot gait is generally used as the ideal state. In the trot gait, the duty cycle is 0.5, meaning that the robot’s diagonal legs move in sync, both either in the swing phase or in the stance phase.
The foot-end trajectory of the quadruped robot adopts a composite cycloid trajectory. The stance phase and swing phase each occupy half of the total cycle. When the foot-end is in the swing phase, the trajectory equation of the quadruped robot’s foot-end relative to the torso is given by the following equation:
m = S\left(\frac{t}{0.5T} - \frac{1}{2\pi}\sin\frac{2\pi t}{0.5T}\right), \qquad n = H\left(0.5 - 0.5\cos\frac{2\pi t}{0.5T}\right)
where H represents the lift height, T is the cycle, and t is the time.
The foot-end trajectory during the swing phase of the quadruped robot is corrected. Since the quadruped robot uses the trot gait, the foot-end trajectory should be the vector sum of the torso trajectory, which is derived from the inverse of the composite cycloid trajectory planned during the stance phase and the theoretical sine trajectory planned during the swing phase. The absolute trajectory equation of the foot-end during the swing phase is thus given by
m = 2S\left(\frac{t}{0.5T} - \frac{1}{2\pi}\sin\frac{2\pi t}{0.5T}\right), \qquad n = 2H\left(0.5 - 0.5\cos\frac{2\pi t}{0.5T}\right)
Since the stance phase moves the robot’s torso by changing the leg phase to achieve movement in the target direction, the motor rotation pattern can be derived through the analysis of the stance phase. By taking the first derivative of (13) with respect to time, the expression for the motor’s angular velocity can be obtained:
\dot\varphi = \frac{\left(\dot x x + \dot y y\right)\left(a^2 - b^2 - x^2 - y^2\right)}{\left(x^2 + y^2\right)\sqrt{4a^2\left(x^2 + y^2\right) - \left(a^2 - b^2 + x^2 + y^2\right)^2}}
By taking the second derivative of (13) with respect to time, the angular acceleration expression for motor rotation is obtained:
\ddot\varphi = \frac{\left(\sigma - \mu\right)\mu\left[\delta\left(\ddot x x + \ddot y y + \dot x^2 + \dot y^2\right) - 2\lambda^2\left(a^2 + b^2 - \mu\right)\right] - 2\sigma\lambda^2\delta}{\mu^2\delta^{3/2}}
where $\lambda = \dot x x + \dot y y$, $\mu = x^2 + y^2$, $\sigma = a^2 - b^2$, and $\delta = 4a^2\mu - \left(\sigma + \mu\right)^2$.
Since the quadruped robot uses the trot gait with a duty cycle of 0.5, and the foot trajectory is planned using a composite cycloid curve, kinematic analysis reveals that the trajectory equations of the robot’s torso match those of the foot relative to the driving mechanism:
x = S\left(\frac{t}{T} - \frac{1}{2\pi}\sin\frac{2\pi t}{T}\right), \qquad y = H\left(0.5 - 0.5\cos\frac{2\pi t}{T}\right)
\dot x = \frac{S}{T} - \frac{S}{T}\cos\frac{2\pi t}{T}
\ddot x = \frac{2\pi S}{T^2}\sin\frac{2\pi t}{T}
Using the state variables $x$ and $\dot x$ from (34)–(36), we obtain
\begin{bmatrix}\ddot x \\ \dot x\end{bmatrix} = \begin{bmatrix} 0 & -\frac{4\pi^2}{T^2} \\ 1 & 0 \end{bmatrix}\begin{bmatrix}\dot x \\ x\end{bmatrix} + \begin{bmatrix}\frac{4\pi^2 S t}{T^3} \\ 0\end{bmatrix}
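The following short Python sketch samples the stance-phase torso trajectory (34)–(36) for assumed values of S, H, and T and numerically checks it against the state-space form (37).

import numpy as np

S, H, T = 0.10, 0.04, 1.0                  # step length, lift height, cycle (assumed values)
t = np.linspace(0.0, T, 201)

x = S * (t / T - np.sin(2 * np.pi * t / T) / (2 * np.pi))        # Eq. (34)
xdot = S / T - (S / T) * np.cos(2 * np.pi * t / T)               # Eq. (35)
xddot = (2 * np.pi * S / T**2) * np.sin(2 * np.pi * t / T)       # Eq. (36)

# State-space form (37): xddot = -(4*pi^2/T^2) * x + 4*pi^2*S*t / T^3
rhs = -(4 * np.pi**2 / T**2) * x + 4 * np.pi**2 * S * t / T**3
print(np.max(np.abs(xddot - rhs)))         # ~0, confirming the identity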
Through a forward kinematics analysis of the swing phase, setting b = 2 a and θ 0 = 121.83 ° , the following geometric relationship is derived:
m = \left[a\cos\left(\theta_0 + \frac{\theta_1 - \theta_2}{2}\right) + \sqrt{b^2 - a^2\sin^2\left(\theta_0 + \frac{\theta_1 - \theta_2}{2}\right)}\right]\sin\frac{\theta_1 + \theta_2}{2} = a\cos\theta_0\sin\frac{\theta_1 + \theta_2}{2} + \sin\frac{\theta_1 + \theta_2}{2}\sqrt{b^2 - a^2\sin^2\theta_0} = -0.5a\sin\frac{\theta_1 + \theta_2}{2} + \sin\frac{\theta_1 + \theta_2}{2}\sqrt{b^2 - 0.75a^2} = 1.3a\sin\frac{\theta_1 + \theta_2}{2}
Next, we analyzed the stance phase. From [32], the motor state equations can be described as
\dot z_1 = z_2, \qquad \dot z_2 = \frac{1}{J_0}\left(K_1 u - K_2 z_2 + T\right)
The state variables $z = \left[z_1, z_2\right]^{T}$ represent the motor’s position and angular velocity, $J_0$ is the inertia, and $T$ denotes the torque vector.
For the quadruped robot’s joint motors, let $q = \left[q_1, q_2\right]^{T}$, $q_1 = \theta_1$, and $q_2 = \dot\theta_1$, leading to
\dot q_1 = q_2, \qquad \dot q_2 = \frac{1}{J_0}\left(K_1 u - K_2 q_2 + T\right)
Therefore, (40) can be expressed as
\dot q = \eta(q) + \nu(q)u
where $\nu(q) = \left[0, \frac{K_1}{J_0}\right]^{T}$.
Let $\eta(q) = \left[q_2, \frac{1}{J_0}\left(-K_2 q_2 + T\right)\right]^{T}$, and let the state variables of the quadruped robot be $X = \left[X_1, X_2\right]^{T}$, where $X_1$ represents displacement and $X_2$ represents velocity. Let $\theta_2 = 2\gamma - \theta_1$ be denoted as $h(\theta_1)$. Based on (37) and (39), the overall system equations of the quadruped robot can be expressed as
\dot X_1 = X_2 = \frac{13a\left(\dot q_1 + \dot q_1\dot h_{\theta_1}\right)}{20}c_{11} - \frac{\dot q_1 - \dot q_1\dot h_{\theta_1}}{40}\left(\sqrt{17}a + \frac{a}{5}\right)s_{11}
\dot X_2 = \frac{13a}{20}\left(\ddot q_1 + \ddot q_1\dot h_{\theta_1} + \dot q_1^2\ddot h_{\theta_1}\right)c_{11} - \frac{13\left(\dot q_1 + \dot q_1\dot h_{\theta_1}\right)^2}{40}s_{11} - \left(\ddot q_1 - \ddot q_1\dot h_{\theta_1} - \dot q_1^2\ddot h_{\theta_1}\right)\frac{a}{40}\left(\sqrt{17} + 0.2\right)s_{11} - \frac{a\dot q_1^2\left(1 - \dot h_{\theta_1}^2\right)}{400}c_{11}
where $s_{11} = \sin\frac{\theta_1 + h(\theta_1)}{2}$ and $c_{11} = \cos\frac{\theta_1 + h(\theta_1)}{2}$.
By defining
\dot X = \dot h(q) = \frac{\partial h(q)}{\partial q}\left[f_0(q) + g_0(q)u\right] \triangleq f(X) + g(X)u
Let the tracking parameter $X_d = \left[X_{1d}, X_{2d}\right]^{T}$. Similarly to [33], we constructed an augmented system:
\dot X = \begin{bmatrix} f(X) - \dot X_d \\ \dot X_d \end{bmatrix} + \begin{bmatrix} g(X) \\ 0 \end{bmatrix}u \triangleq F(X) + G(X)u
Remark 1.
Due to the nonlinearity of the system, especially in the dynamics of robotic joint torque, direct linearization may lose some dynamic characteristics. However, it is difficult to obtain specific expressions of $f(X)$ and $g(X)$ without further simplification. In this paper, we opted to use AMESim and Simulink for co-simulation instead of manually building complex state equation models. AMESim accurately simulates the dynamic behavior, motion variations, and interaction characteristics of the multi-robot system, eliminating the need to manually construct detailed kinematic and mechanical coupling equations. This approach significantly simplifies the modeling process and enhances the accuracy of the simulation results. Simulink was employed to implement the control algorithm, and the proposed algorithm is model-free. This process will be explained in detail in Section 4.

3. Optimal Controller Design

This section presents the general formula for the infinite horizon nonlinear optimal control problem in continuous time systems. The flow chart of the algorithm in this paper can be seen in Figure 6.
In Section 2, we defined the control system for the quadruped robot as follows:
\dot X(t) = F(X) + G(X)u, \qquad X(0) = X_0
where $X(t)\in\mathbb{R}^{n}$ represents the state vector, $u(t)\in\mathbb{R}^{m}$ is the control input or strategy, and $F(X)\in\mathbb{R}^{n}$ and $G(X)\in\mathbb{R}^{n\times m}$ are the system and input gain functions, respectively. The value function $V(X)$ is defined as the infinite horizon cost:
V(X) = \int_{t}^{\infty} r\left(X(\tau), u(\tau)\right)d\tau
where $r = S\left(X(t)\right) + u^{T}(t)Ru(t)$, with $R$ a positive definite matrix and $S\left(X(t)\right)$ a positive definite function of the state.
Assumption 1.
Assume that $F(X) + G(X)u$ satisfies Lipschitz continuity on a compact set $\Theta\subseteq\mathbb{R}^{n}$ that contains the origin, and that system (46) is stabilizable, i.e., there exists a continuous control law $u$ under which the state $X$ converges.
The optimal control problem is solved by selecting the optimal stabilizing control $u(t)$ that minimizes the value function in (47). The optimal value function $V^{*}(X)$ is defined as
V^{*}\left(X(t)\right) = \min_{u}\int_{t}^{\infty} r\left(X(\tau), u(\tau)\right)d\tau
The partial differential equation of the optimal value function $V^{*}(X)$ can be expressed as a general solution to the nonlinear optimal control problem. We define the Hamiltonian for this problem as
\Psi\left(X, u, \nabla V\right) = r\left(X, u\right) + \nabla V^{T}\left(F(X) + G(X)u\right)
where the gradient vector is $\nabla V = \partial V/\partial X \in \mathbb{R}^{n}$. The optimal value function $V^{*}(X)$ from (48) satisfies the Hamilton–Jacobi–Bellman (HJB) equation:
0 = \min_{u}\Psi\left(X, u, \nabla V^{*}\right)
For the unconstrained control $u$, the optimal control $u^{*}$ can be found by setting $\partial\Psi\left(X, u, \nabla V\right)/\partial u = 0$; hence
u^{*} = -\frac{1}{2}R^{-1}G(X)^{T}\nabla V^{*}
Substituting the optimal control expression (51) into (50) yields the HJB equation based on $V^{*}(X)$:
0 = S(X) + \nabla V^{*T}F(X) - \frac{1}{4}\nabla V^{*T}G(X)R^{-1}G(X)^{T}\nabla V^{*}
Given the nonlinearity of the HJB Equation (50) and the requirement for an accurate understanding of the system dynamics $F(X)$ and the input gain $G(X)$, solving the equation is typically challenging.
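For intuition, in the linear-quadratic special case F(X) = AX, G(X) = B, S(X) = X^T Q X, and V*(X) = X^T P X, the HJB Equation (52) reduces to the continuous-time algebraic Riccati equation; the short sketch below verifies this on an arbitrary second-order example that is unrelated to the robot model.

import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # illustrative linear drift (assumed)
B = np.array([[0.0], [1.0]])               # illustrative input gain (assumed)
Q, R = np.eye(2), np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
hjb_residual = Q + A.T @ P + P @ A - P @ B @ np.linalg.inv(R) @ B.T @ P
print(np.max(np.abs(hjb_residual)))        # ~0: Eq. (52) holds for V* = X^T P X

K = np.linalg.inv(R) @ B.T @ P             # u* = -K X, from Eq. (51) with grad V* = 2 P X
print(K)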
Remark 2.
Policy iteration is a reinforcement learning method used to determine optimal values and control strategies. By alternating between policy evaluation and policy improvement, it incrementally converges towards the optimal policy. This approach has led to the development of various algorithms, as described in [34,35], which are designed to solve the HJB equation either in real time or in advance. In the discussion within this section, policy evaluation and improvement occur synchronously, as both the critic and actor are updated simultaneously. This process is considered an extreme form of Generalized Policy Iteration (GPI).
In continuous time systems, policy evaluation can be achieved using adaptive critics, with the core involving the solution of a nonlinear Lyapunov equation (e.g., [35,36]). The derivation of such equations can be via Leibniz’s formula for differentiating the value function [35]. Additionally, another approach for solving the Bellman equation is through Integral Reinforcement Learning (IRL) [37].
V\left(X(t - T)\right) = \int_{t-T}^{t} r\left(X(\tau), u(\tau)\right)d\tau + V\left(X(t)\right)
Here, the integration period $T > 0$. This is an integral Bellman equation, analogous to the discrete time Bellman equation. Note that the system dynamics $F(X)$ and the input gain $G(X)$, which appear in the Lyapunov equation, do not appear in the Bellman Equation (53). For policy improvement, following [38], the control is iteratively obtained from the value function $V$ in (53) as follows:
u = -\frac{1}{2}R^{-1}G(X)^{T}\nabla V
The system will then converge to the optimal control $u^{*}$ in (51).
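A conceptual Python sketch of one GPI cycle built on the integral Bellman Equation (53) and the improvement step (54) is given below; the data-collection interface (sampled state pairs with their integral costs), the basis functions, and the least-squares evaluation step are assumptions made for illustration and do not reproduce the adaptive-critic implementation developed in Section 3.1.

import numpy as np

def gpi_step(segments, phi, grad_phi, G, R):
    """One policy-evaluation + policy-improvement cycle.

    segments: list of (X_start, X_end, integral_cost) tuples collected by running
              the current policy over intervals of length T on the real system.
    """
    # Policy evaluation, Eq. (53): w^T (phi(X_end) - phi(X_start)) = -integral_cost
    A = np.array([phi(Xe) - phi(Xs) for Xs, Xe, _ in segments])
    b = np.array([-cost for _, _, cost in segments])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)

    # Policy improvement, Eq. (54): u = -1/2 R^{-1} G(X)^T grad_phi(X)^T w
    def policy(X):
        return -0.5 * np.linalg.inv(R) @ G(X).T @ grad_phi(X).T @ w

    return w, policy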

3.1. Value Function Approximation with Adaptive Critic

This section introduces a novel design for policy evaluation using an adaptive critic. We approximate the value function through a critic neural network:
V(X) = w^{T}\phi(X) + \xi(X)
where $\phi(X): \mathbb{R}^{n}\to\mathbb{R}^{N}$ is the activation vector with $N$ hidden neurons, and $w\in\mathbb{R}^{N}$ is the weight vector. $\xi(X)$ represents the approximation error of the neural network. By selecting appropriate activation functions, a complete set of independent bases can be constructed to ensure the accurate approximation of $V(X)$. According to the Weierstrass higher-order approximation theorem [38], within a compact set $\Theta$, as the number of neurons $N\to\infty$, the approximation error $\xi(X)$ and its gradient $\nabla\xi(X)$ approach zero, i.e., $\xi(X)\to 0$ and $\nabla\xi(X)\to 0$; for a fixed $N$, they remain bounded.
We applied the Bellman method to update the critic. Substituting the approximated value function (55) into the Bellman equation (53) gives
\underbrace{\int_{t-T}^{t} r\left(X(\tau), u(\tau)\right)d\tau}_{R\left(X, u\right)} + w^{T}\phi\left(X(t)\right) - w^{T}\phi\left(X(t - T)\right) = R\left(X, u\right) + w^{T}\Delta\phi(t) = \xi_B
By using the integral reinforcement $R(X, u)$, the difference $\Delta\phi(t) = \phi\left(X(t)\right) - \phi\left(X(t - T)\right)$, and the Bellman equation, the residual error $\xi_B = \xi\left(X(t)\right) - \xi\left(X(t - T)\right)$ is bounded within the compact set $\Theta$, ensuring that $\xi(X)$ remains within bounds. To design an adaptive law for the weight vector that guarantees the convergence of the value function approximation, we introduce the auxiliary variables $M_1\in\mathbb{R}^{N\times N}$ and $N_1\in\mathbb{R}^{N}$ through the low-pass-filtered variables in (56):
\dot M_1 = -l M_1 + \Delta\phi(t)\Delta\phi(t)^{T}, \quad M_1(0) = 0, \qquad \dot N_1 = -l N_1 + \Delta\phi(t)R\left(X, u\right), \quad N_1(0) = 0
Here, the filtering parameter $l > 0$ acts as a forgetting factor, introduced to ensure exponential decay and effectively prevent unbounded growth of $M_1(t)$ and $N_1(t)$ while maintaining stability. $M_1$ and $N_1$ can equivalently be written as follows:
M_1(t) = \int_{0}^{t} e^{-l(t-\tau)}\Delta\phi(\tau)\Delta\phi^{T}(\tau)d\tau, \qquad N_1(t) = \int_{0}^{t} e^{-l(t-\tau)}\Delta\phi(\tau)R(\tau)d\tau
Definition 1.
(Persistency of excitation (PE) [39]). If there exists a strictly positive constant $\sigma_1 > 0$ such that the following condition holds over the interval $\left[t - T, t\right]$, the signal $\Delta\phi(t)$ is said to be persistently excited, i.e.,
\int_{t-T}^{t}\Delta\phi(\tau)\Delta\phi(\tau)^{T}d\tau \geq \sigma_1 I, \qquad \forall t > 0
The PE condition [39] is widely required in adaptive control to ensure parameter convergence.
Lemma 1.
([40]). If the signal $\Delta\phi(t)$ is persistently excited for all $t > 0$, then the auxiliary variable $M_1$ defined in (57) is positive definite, i.e., $M_1 > 0$, and the minimum eigenvalue satisfies $\lambda_{\min}\left(M_1\right) > \sigma_1 > 0$, $\forall t > 0$.
Proof. 
Detailed proof can be found in [40].
The adaptive critic neural network can be expressed as
\hat V(X) = \hat w^{T}\phi(X)
where $\hat w$ and $\hat V(X)$ represent the current estimates of $w$ and $V(X)$, respectively. We now design an adaptive law to update $\hat w$ using a sliding mode technique, defined as follows:
\dot{\hat w} = -\Gamma_1 M_1\frac{P_1}{\left\|P_1\right\|}
$P_1\in\mathbb{R}^{N}$ is defined as $P_1 = M_1\hat w + N_1$, and $\Gamma_1 > 0$ is a positive adaptation gain to be designed, as in [36]. This gain facilitates the adaptive learning process. □
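A discrete-time sketch of the critic update defined by (56)–(61) is given below, with the filtered variables propagated by simple Euler integration; the step size, the gains, and the small regularization constant added to the norm are assumptions introduced for illustration only.

import numpy as np

def critic_update(w_hat, M1, N1, delta_phi, R_int, l, Gamma1, dt, eps=1e-8):
    """delta_phi: phi(X(t)) - phi(X(t-T));  R_int: integral reinforcement R(X, u)."""
    # Filtered auxiliary variables, Eq. (57)
    M1 = M1 + dt * (-l * M1 + np.outer(delta_phi, delta_phi))
    N1 = N1 + dt * (-l * N1 + delta_phi * R_int)

    # Sliding-mode adaptive law, Eq. (61), with P1 = M1 w_hat + N1
    P1 = M1 @ w_hat + N1
    w_hat = w_hat + dt * (-Gamma1 * (M1 @ (P1 / (np.linalg.norm(P1) + eps))))
    return w_hat, M1, N1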
Lemma 2.
Considering the adaptive law (61), if the system state $X(t)$ remains bounded under a given stable control, and if $u(t)$, $\phi(t)$, and the system state $X(t)$ are persistently excited, then the estimation error $\tilde w = w - \hat w$ satisfies the following conditions:
If the approximation error of the neural network $\xi(X)$ is zero, the estimation error $\tilde w$ will converge to zero within a finite time $t_1 > 0$.
If $\xi(X)\neq 0$, the estimation error $\tilde w$ will converge to a compact set within a finite time $t_1 > 0$.
Proof. 
We begin by examining the boundedness of $P_1$. In Equation (58), when the states $X(t)$ and $X(t - T)$ are bounded, the matrix $M_1$ is positive definite and has an upper bound $\delta_{M_1} > 0$ such that $\lambda_{\max}\left(M_1\right)\leq\delta_{M_1}$. Due to the Bellman equation, the residual error $\xi_B$ is bounded. By introducing some constant $\delta_1 > 0$ and substituting $R$ into Equations (56)–(58), we find that $Q_1 = M_1 w + \Lambda_1$ is also bounded, where $\Lambda_1$ is given by $\Lambda_1 = \int_{0}^{t} e^{-l(t-\tau)}\Delta\phi(\tau)\xi_B(\tau)d\tau$. Thus, $P_1$ can be expressed as
P_1 = -M_1\tilde w + \Lambda_1
Since $\Delta\phi(t)$ is persistently excited, from Lemma 1, we know that $M_1$ is symmetric and positive definite, and thus invertible. Therefore, we have $M_1^{-1}P_1 = -\tilde w + M_1^{-1}\Lambda_1$. Here, $M_1^{-1}P_1$ can be used to design a suitable Lyapunov function, as it includes both the estimation error $\tilde w$ and $\Lambda_1$. We differentiate $M_1^{-1}P_1$ as follows:
\frac{\partial}{\partial t}\left(M_1^{-1}P_1\right) = -\dot{\tilde w} + \frac{\partial M_1^{-1}}{\partial t}\Lambda_1 + M_1^{-1}\dot\Lambda_1 = \dot{\hat w} + \bar\Lambda_1
where $\bar\Lambda_1 = -M_1^{-1}\dot M_1 M_1^{-1}\Lambda_1 + M_1^{-1}\dot\Lambda_1$ and $\Lambda_1$ is bounded, meaning that $\left\|\bar\Lambda_1\right\|\leq\bar\delta_1$ for some constant $\bar\delta_1 > 0$. Note that $M_1^{-1}$ is bounded since $\lambda_{\min}\left(M_1\right) > \sigma_1$ and $\lambda_{\max}\left(M_1\right) < \delta_{M_1}$. Therefore, the upper and lower bounds of $M_1^{-1}$ can easily be derived as $\lambda_{\min}\left(M_1^{-1}\right) > 1/\delta_{M_1}$ and $\lambda_{\max}\left(M_1^{-1}\right) < 1/\sigma_1$. Consequently, we can easily find the lower and upper bounds for the two $\Psi$-type functions in $P_1$ [41], which serve as the lower and upper bounds of the following modified Lyapunov function:
\Psi_1 = \frac{L_1}{2}\left(M_1^{-1}P_1\right)^{T}\Gamma_1^{-1}\left(M_1^{-1}P_1\right)
where L 1 > 0 is a constant. The time derivative can be calculated as
\dot\Psi_1 = L_1 P_1^{T}M_1^{-1}\Gamma_1^{-1}\left(\dot{\hat w} + \bar\Lambda_1\right) = L_1 P_1^{T}M_1^{-1}\Gamma_1^{-1}\left(-\Gamma_1 M_1\frac{P_1}{\left\|P_1\right\|} + \bar\Lambda_1\right) \leq -\alpha_1\sqrt{\Psi_1}
where $\alpha_1 = \left(\sigma_1 - L_1\bar\delta_1\lambda_{\max}\left(\Gamma_1^{-1}\right)\right)\sqrt{2/\lambda_{\max}\left(\Gamma_1^{-1}\right)}$ is a positive constant for a properly chosen $L_1$ with $0 < L_1 < \sigma_1/\left(\lambda_{\max}\left(\Gamma_1^{-1}\right)\bar\delta_1\right)$. Based on [42], we can conclude that $\Psi_1 = 0$ and $P_1 = 0$ within a finite time $t_1 = 2\sqrt{\Psi_1(0)}/\alpha_1 > 0$. Thus, we have the following results:
  • In the case where $\xi(X) = 0$, we find that $\xi_B = 0$, $P_1 = 0$, and $\Lambda_1 = \bar\Lambda_1 = 0$, meaning that $\tilde w = 0$ with $\alpha_1 = \sigma_1\sqrt{2/\lambda_{\max}\left(\Gamma_1^{-1}\right)}$, and $\tilde w$ will converge to zero within the finite time $t_1$.
  • In the case where $\xi(X)\neq 0$, that is, when $\xi_B\neq 0$ and $P_1 = 0$, this implies that $\tilde w = M_1^{-1}\Lambda_1$ and $\left\|\tilde w\right\|\leq\delta_1/\sigma_1$, meaning that $\tilde w$ will be bounded after the finite time $t_1$.
Remark 3.
Based on Lemma 1, the minimum eigenvalue of M 1  can be monitored to verify whether the persistent excitation (PE) condition is satisfied in real time. To ensure that the PE condition is always satisfied, the methods proposed in the literature, such as those in [35,43], can be used. These methods introduce an appropriate level of noise into the system states or the controller design.
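A minimal sketch of such a real-time PE check is shown below; the eigenvalue threshold and the probing-noise level are assumed placeholder values.

import numpy as np

def ensure_excitation(u, M1, sigma_min=1e-3, noise_std=0.05):
    """Add exploration noise to the control input if lambda_min(M1) is too small."""
    if np.linalg.eigvalsh(M1).min() < sigma_min:    # Lemma 1: PE requires lambda_min(M1) > sigma_1
        u = u + noise_std * np.random.standard_normal(np.shape(u))
    return u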
Remark 4.
The sliding mode term $P_1/\left\|P_1\right\|$ in the adaptive law (61) ensures finite time convergence and also prevents the potential chattering phenomenon caused by integral action, as discussed in [40].

3.2. Adaptive Optimal Control Through GPI

In this section, we design an actor focused on policy improvement. By the derivation in (54), if the estimated weight $\hat w$ converges to the unknown actual weight $w$, we successfully solve the Bellman Equation (53). This allows us to directly use the adaptive critic (60) to determine the optimal control strategy. Therefore, the control output is defined as
u = -\frac{1}{2}R^{-1}G(X)^{T}\nabla\phi(X)^{T}\hat w
Now, we summarize the first result of this paper as follows:
Theorem 1.
Consider the continuous time nonlinear affine system given by Equation (46) and the infinite time value function defined in Equation (47). Under the adaptive critic neural network control structure specified in Equation (60) with an adaptive law (61) and actor (66), the adaptive optimal control achieves the following properties:
  • In the absence of an NN approximation error, the adaptive critic weight estimation error $\tilde w$ converges to zero in finite time $t_1 > 0$, and the control input $u(t)$ converges to the optimal solution $u^{*}(t)$.
  • In the presence of an NN approximation error, the adaptive critic weight estimation error $\tilde w$ converges within a finite time $t_1 > 0$ to a very small neighborhood around zero, and the control input $u(t)$ converges to a small bounded neighborhood around the optimal solution $u^{*}(t)$.
Proof. 
We designed the Lyapunov function similarly to those in [36,37], as follows:
\Psi_2 = \Psi_1 + L_2 V + \frac{L_3}{2}\Lambda_1^{T}\Lambda_1
$L_2$ and $L_3$ are positive constants, and $\tilde\Theta\subset\mathbb{R}^{N}\times\mathbb{R}^{n}\times\mathbb{R}^{m}\times\mathbb{R}^{N}$. We analyze the Lyapunov function $\Psi_2$ over a compact set. Assume that the elements $\left(P_1, X, u, \Lambda_1\right)$ are located within $\tilde\Theta\supset\Theta$. Further, suppose that the initial conditions of $\left(P_1, X, u, \Lambda_1\right)$ lie uniformly inside $\tilde\Theta$. Therefore, the initial trajectory has no effect on the state $X$ and the control $u$ for a duration $t\in\left[0, T_1\right]$. In Equation (67), the term $L_2 V(X)$ involves $\dot V = \nabla V^{T}\dot X$, and the HJB Equation (50) can be written as
0 = r\left(X, u\right) + \nabla V^{T}\left(F(X) + G(X)u\right)
By applying Young’s Inequality $ab\leq\frac{\eta_1}{2}a^2 + \frac{1}{2\eta_1}b^2$ with the constant $\eta_1 > 0$, along with results from [38,42,44], the derivative of $\Psi_2$ can be derived as
\dot\Psi_2 = \dot\Psi_1 + L_2\nabla V^{T}\left(F + Gu\right) + L_3\Lambda_1^{T}\dot\Lambda_1 = L_1 P_1^{T}M_1^{-1}\Gamma_1^{-1}\left(\dot{\hat w} + \bar\Lambda_1\right) - L_2 r\left(X, u\right) + L_3\Lambda_1^{T}\left(-l\Lambda_1 + \Delta\phi\,\xi_B\right) \leq -\alpha_1\left\|P_1\right\| - \alpha_2 S(X) - \alpha_3\left\|u\right\|^{2} - \alpha_4\left\|\Lambda_1\right\|^{2} + \beta_1
where $\alpha_1 = 1 - L_1\bar\delta_1\lambda_{\max}\left(\Gamma_1^{-1}\right)/\sigma_1$, $\alpha_2 = L_2$, $\alpha_3 = L_2\lambda_{\min}(R)$, and $\alpha_4 = L_3 l - L_3\eta_1/2$ are positive constants for properly chosen $L_1$, $L_2$, $L_3$, and $\eta_1$ with $0 < L_1 < \sigma_1/\left(\lambda_{\max}\left(\Gamma_1^{-1}\right)\bar\delta_1\right)$, $L_2 > 0$, $L_3 > 0$, and $0 < \eta_1 < 2l$, respectively. $\beta_1 = L_3\left\|\Delta\phi\,\xi_B\right\|^{2}/\left(2\eta_1\right)$ addresses the effect of the neural network approximation error.
Thus, the first four terms in the inequality (69) form a negative definite function within the set $\tilde\Theta$, ensuring the existence of an ultimately bounded set $\Theta_u$. The bound depends on the magnitude of $\beta_1$; as $\beta_1$ decreases, the set $\Theta_u$ also becomes smaller. Assuming $N$ is sufficiently large, $\beta_1$ is small enough that $\Theta_u\subset\tilde\Theta$. Consequently, no trajectory can escape from $\Theta_u$, indicating that it is an invariant set. This means that the state $X(t)$ remains bounded and the value function $V\left(X(t)\right)$ stays within bounds. Similarly, the approximation errors $\xi(X)$ and $\phi(X)$ are bounded functions defined over the compact set. Based on the Lyapunov theorem and Lemma 2, both $\Psi_2$ and $\tilde w$ will converge to a set of ultimate boundedness. Furthermore, according to Equations (54)–(66), the difference between the actor and the optimal control is bounded by
\left\|u - u^{*}\right\| \leq \frac{1}{2}\left\|R^{-1}G(X)^{T}\nabla\phi^{T}\tilde w\right\| + \frac{1}{2}\left\|R^{-1}G(X)^{T}\nabla\xi\right\|
This occurs within a finite time t 1 , after which the bound holds. This establishes part b while part a can be easily deduced. □
Remark 5.
In the absence of prior knowledge of system drift F X , the proposed GPI method (Theorem 1) is a partially model-free algorithm that can solve nonlinear optimal control problems in continuous time through online approximation. This approach further eliminates the need for the identifier of the dual approximation structure in [36]. Additionally, because the critic weights converge within finite time, the actor neural network used in [35] is not required. The adaptive critic and actor continuously update each other, effectively avoiding the mixed structure in [34] and eliminating the need for a stable initial control policy as required in [34,43].

4. Simulation and Experiment

To further evaluate the performance of the nonlinear controller, this section applies the control method presented in Section 3 to the parallel quadruped robot described in Section 2. Both computer simulations and physical experiments were conducted to validate the approach.

4.1. Simulation

In this section, a co-simulation model of the parallel quadruped robot was built using AMESim 2021.2 and MATLAB 2017b, as shown in Figure 7. The kinematics model of the parallel quadruped robot was established in AMESim and was composed of a joint torque motor, a five-bar mechanism, an electric drive, a hinge drive, rotary joints, and the foot-end. The algorithm module was built in Simulink. The co-simulation not only improved the accuracy of the simulation process but also better captured the robot’s dynamic characteristics and movement capabilities. The key system parameters are listed in Table 1.
To further verify the generalizability of the developed tracking control method, we also validated it for reference trajectories with different target steady states. The reference trajectories are defined as $y = 0.4t + 0.2t^2$, $t\in\left[0, 2\right]$; $y = 1.6 + 1.2\left(t - 2\right)$, $t\in\left[2, 10\right]$; $y = 11.2 + 0.1\left(t - 10\right)^2$, $t\in\left[10, 15\right]$; and $y = 13.7 + 0.5\left(t - 15\right)$, $t\in\left[15, 25\right]$.
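These segments can be assembled into a single reference function, as in the brief sketch below (a direct transcription of the four expressions above).

import numpy as np

def reference(t):
    t = np.asarray(t, dtype=float)
    return np.piecewise(
        t,
        [t < 2, (t >= 2) & (t < 10), (t >= 10) & (t < 15), t >= 15],
        [lambda s: 0.4 * s + 0.2 * s**2,
         lambda s: 1.6 + 1.2 * (s - 2),
         lambda s: 11.2 + 0.1 * (s - 10)**2,
         lambda s: 13.7 + 0.5 * (s - 15)])

print(reference([1.0, 5.0, 12.0, 20.0]))   # [0.6, 5.2, 11.6, 16.2]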
For the optimal trajectory tracking control of the augmented system, it was necessary to minimize the value of the cost function. This required the construction of a critic neural network system. Therefore, the activation function of the neural network was defined as
\phi(X) = \left[X_1^2,\; X_1X_2,\; X_1X_3,\; X_1X_4,\; X_2^2,\; X_2X_3,\; X_2X_4,\; X_3^2,\; X_3X_4,\; X_4^2\right]^{T}
We defined the weights of the neural network as
w = \left[w_1, w_2, w_3, w_4, w_5, w_6, w_7, w_8, w_9, w_{10}\right]^{T}
During the learning process of the algorithm, the initial weight value of the critic was set to w 0 = 0 .
The learning gain was set to $\Gamma_1 = 4.8$, and the simulation duration was set to 20 s. The convergence of the neural network weights $w_1$–$w_{10}$ is illustrated in Figure 8. As time increases, all the weight curves tend to a horizontal straight line or converge to a limited range, which verifies their convergence and confirms the effectiveness and feasibility of the proposed algorithm. At the same time, the PID control algorithm, the sliding mode control (SMC) algorithm, and the algorithm designed in [45] (applied ANN, Chen (2022)) were compared with the GPI simulation results, which more intuitively highlights the strong speed and displacement tracking performance of the proposed controller. The demonstrated convergence behavior shows that the solution stays close to the desired state, affirming the robustness of the design. To better illustrate the advantages of the algorithm introduced in this study, two performance metrics were analyzed in detail: the integral of absolute error, $IAE = \int_{0}^{t}\left|e\right|d\tau$, and the integral of absolute input, $IAI = \int_{0}^{t}\left|u\right|d\tau$. The successful online convergence of the critic neural network gives the nonlinear system strong tracking performance, which is particularly evident when the system converges quickly to the target trajectory, as shown in Figure 9 and Figure 10. The effectiveness of the proposed control method is further demonstrated in Figure 11 and Figure 12, which show that the tracking error decreases rapidly and converges to zero; in particular, when large tracking errors occur, the GPI control method tracks the setpoint curve faster than the other control methods and drives the error back to zero. This result indicates not only the high tracking accuracy of GPI but also a stable system response, emphasizing the stability and reliability of the control strategy. In addition, the control input signals shown in Figure 13 further support the ability of the system to achieve the desired control target; the control input maintains good performance throughout the process, ensuring stability and efficient tracking. Figure 14 and Figure 15 illustrate the IAE and IAI of the four methods. As shown in the figures, the proposed method outperforms the other three methods in both metrics, demonstrating its significant advantages in error control and input consumption. Notably, under complex working conditions, the method is more effective in balancing system performance and energy consumption.
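For reference, the two metrics can be approximated from logged error and input signals by trapezoidal integration, as in the short sketch below; the array names are illustrative only.

import numpy as np

def integral_abs(t, s):
    """Trapezoidal approximation of the integral of |s| over the time vector t."""
    s = np.abs(np.asarray(s, dtype=float))
    return float(np.sum(0.5 * (s[1:] + s[:-1]) * np.diff(t)))

# IAE uses the tracking error e(t); IAI uses the control input u(t), e.g.:
# iae = integral_abs(t_log, e_log); iai = integral_abs(t_log, u_log)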

4.2. Experiment

To further validate the feasibility of the algorithm, an experimental setup was constructed, as shown in Figure 16. The setup consisted of a parallel quadruped robot, a laser radar, a depth camera, and other related components, as shown in Figure 17.
In the experiment, a series of trajectory tracking tasks were assigned to the robot, and the precise data of the speed and displacement over time were obtained in real time by a laser radar and depth camera. The experimental results show that the quadruped robot can track the reference trajectory curve with high precision, and when there is a large error, it can also be adjusted quickly in a very short time, with strong tracking performance and robustness. As shown in Figure 18 and Figure 19, when the displacement and velocity are changed suddenly, the quadruped robot can track the set curve quickly in a very short time, so that the error converges to zero or a minimum value quickly, which further verifies the effectiveness of the proposed control algorithm in practical applications.

5. Conclusions

In this paper, we developed an optimal control method for quadruped robot speed control based on ADP. This method successfully designs a speed controller that addresses the problem of ineffective control over the movement speed of parallel quadruped robots. The approach studies the system’s tracking problem from an optimal control perspective. By using tracking errors and reference trajectories, an adaptive law was designed to approximate the value function weights with guaranteed convergence, transforming the tracking problem into an optimal control problem of error convergence. By selecting appropriate activation functions, the controller achieves uniform approximation of the value function $V(X)$. Through the use of sliding mode techniques, the weight estimation error is made to converge to zero or within a compact set, thereby obtaining the optimal control solution and achieving the adaptive modulation of the quadruped robot’s movement speed. This process was supported by constructing a cost function with a discount factor to ensure boundedness. Additionally, an actor focused on policy improvement was introduced to solve the optimal control problem by addressing the Bellman equation. The introduction of an ADP scheme based on policy improvement, which serves as a replacement for the original critic–actor structure, facilitates the direct computation of the optimal value function using GPI. Simulations confirm the effectiveness of the proposed algorithm. This method enhances the stability of speed control for quadruped robots, ensuring better adaptability to varying road conditions and dynamic environmental changes, thus improving the robot’s reliability and safety in real-world applications.

Author Contributions

Methodology, B.J.; Data curation, S.T. and B.J.; Writing—original draft, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Biswal, P.; Mohanty, P.K. Development of quadruped walking robots: A review. Ain Shams Eng. J. 2021, 12, 2017–2031. [Google Scholar] [CrossRef]
  2. Chen, Z.; Fan, T.; Zhao, X.; Liang, J.; Shen, C.; Chen, H.; Manocha, D.; Pan, J.; Zhang, W. Autonomous social distancing in urban environments using a quadruped robot. IEEE Access 2021, 9, 8392–8403. [Google Scholar] [CrossRef]
  3. Zong, H.; Zhang, J.; Jiang, L.; Zhang, K.; Shen, J.; Lu, Z.; Wang, K.; Wang, Y. Bionic lightweight design of limb leg units for hydraulic quadruped robots by additive manufacturing and topology optimization. Bio-Des. Manuf. 2024, 7, 1–13. [Google Scholar] [CrossRef]
  4. Sun, J.; Zhou, L.; Geng, B.; Zhang, Y.; Li, Y. Leg State Estimation for Quadruped Robot by Using Probabilistic Model With Proprioceptive Feedback. IEEE/ASME Trans. Mechatron. 2024. [Google Scholar] [CrossRef]
  5. Sun, Y.; Pancheri, F.; Rehekampff, C.; Lueth, T.C. TurBot: A Turtle-Inspired Quadruped Robot Using Topology Optimized Soft-Rigid Hybrid Legs. IEEE/ASME Trans. Mechatron. 2024, 29, 3193–3202. [Google Scholar] [CrossRef]
  6. Zhang, C.; Zhao, J.; Zhang, T.; Niu, Q.; Gu, Y.; Yan, S.; Wang, W. Design and Simulation of Bionic Quadruped Obstacle-Overcoming Robot. Adv. Mater. Technol. 2024, 2400992. [Google Scholar] [CrossRef]
  7. Yuan, J.; Wang, S.; Wang, B.; Shi, R.; Wu, X.; Li, L.; Li, W.; Wang, Z.; Dai, Z. Design of a Quadruped Robot with Morphological Adaptation through Reconfigurable Sprawling Structure and Method. Adv. Intell. Syst. 2024, 6, 2300645. [Google Scholar] [CrossRef]
  8. Liu, X.; Wu, J.; Xue, Y.; Qi, C.; Xin, G.; Gao, F. Skill Latent Space Based Multigait Learning for a Legged Robot. IEEE Trans. Ind. Electron. 2024. [Google Scholar] [CrossRef]
  9. Sun, H.; Yang, J.; Jia, Y.; Zhang, C.; Yu, X.; Wang, C. Fall prediction, control, and recovery of quadruped robots. ISA Trans. 2024. [Google Scholar] [CrossRef]
  10. Chen, C.; Li, C.; Lu, H.; Wang, Y.; Xiong, R. Meta Reinforcement Learning of Locomotion Policy for Quadruped Robots with Motor Stuck. IEEE Trans. Autom. Sci. Eng. 2024. [Google Scholar] [CrossRef]
  11. Nie, Y.; Yuan, Q.; Li, X. A Tracking Control Approach with Sequence-Scaling Lyapunov-Based MPC for Quadruped Robots. IEEE Trans. Ind. Inform. 2024. [Google Scholar] [CrossRef]
  12. Jiang, H.; Chen, T.; Cao, J.; Bi, J.; Lu, G.; Zhang, G.; Rong, X.; Li, Y. Stable skill improvement of quadruped robot based on privileged information and curriculum guidance. Robot. Auton. Syst. 2023, 170, 104550. [Google Scholar] [CrossRef]
  13. He, J.Y.; Shao, J.P.; Sun, G.T.; Shao, X. Survey of quadruped robots coping strategies in complex situations. Electronics 2019, 8, 1414. [Google Scholar] [CrossRef]
  14. Lee, J.; Hwangbo, J.; Wellhausen, L.; Koltun, V.; Hutter, M. Learning quadrupedal locomotion over challenging terrain. Sci. Robot. 2020, 5, eabc5986. [Google Scholar] [CrossRef]
  15. Miki, T.; Lee, J.; Hwangbo, J.; Wellhausen, L.; Koltun, V.; Hutter, M. Learning robust perceptive locomotion for quadrupedal robots in the wild. Sci. Robot. 2022, 7, eabk2822. [Google Scholar] [CrossRef]
  16. Ioannou, P.; Fidan, B. Adaptive Control Tutorial; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2006. [Google Scholar]
  17. Lewis, F.L.; Vrabie, D.; Syrmos, V.L. Optimal Control; John Wiley & Sons: Hoboken, NJ, USA, 2012. [Google Scholar]
  18. Modares, H.; Lewis, F.L.; Jiang, Z.P. H tracking control of completely unknown continuous-time systems via off-policy reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2550–2562. [Google Scholar] [CrossRef]
  19. Kiumarsi, B.; Lewis, F.L. Actor–critic-based optimal tracking for partially unknown nonlinear discrete-time systems. IEEE Trans. Neural Netw. Learn. Syst. 2014, 26, 140–151. [Google Scholar] [CrossRef]
  20. AlMahamid, F.; Grolinger, K. Reinforcement learning algorithms: An overview and classification. In Proceedings of the 2021 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Virtual, 12–17 September 2021; IEEE: New York, NY, USA, 2021; pp. 1–7. [Google Scholar]
  21. Werbos, P. Approximate dynamic programming for real-time control and neural modeling. In Handbook of Intelligent Control; Van Nostrand Reinhold: New York, NY, USA, 1992. [Google Scholar]
  22. Wei, Q.; Zhu, L.; Song, R.; Zhang, P.; Liu, D.; Xiao, J. Model-free adaptive optimal control for unknown nonlinear multiplayer nonzero-sum game. IEEE Trans. Neural Netw. Learn. Syst. 2020, 33, 879–892. [Google Scholar] [CrossRef]
  23. Jiang, Y.; Jiang, Z.P. Robust adaptive dynamic programming and feedback stabilization of nonlinear systems. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 882–893. [Google Scholar] [CrossRef]
  24. Jiang, Z.P.; Jiang, Y. Robust adaptive dynamic programming for linear and nonlinear systems: An overview. Eur. J. Control 2013, 19, 417–425. [Google Scholar] [CrossRef]
  25. Wang, D.; Mu, C.; He, H.; Liu, D. Event-driven adaptive robust control of nonlinear systems with uncertainties through NDP strategy. IEEE Trans. Syst. Man Cybern. Syst. 2016, 47, 1358–1370. [Google Scholar] [CrossRef]
  26. Lee, C.; An, D. Reinforcement learning and neural network-based artificial intelligence control algorithm for self-balancing quadruped robot. J. Mech. Sci. Technol. 2021, 35, 307–322. [Google Scholar] [CrossRef]
  27. Liu, X.; Zhang, W.; Shao, C.; Wang, Y.; Cong, Q.; Ma, L. Autonomous collaborative optimization control of earth pressure balance shield machine based on hierarchical control architecture. Eng. Appl. Artif. Intell. 2024, 137, 109200. [Google Scholar] [CrossRef]
  28. Hu, C.; Wang, Z.; Bu, X.; Zhao, J.; Na, J.; Gao, H. Optimal Tracking Control for Autonomous Vehicle with Prescribed Performance via Adaptive Dynamic Programming. IEEE Trans. Intell. Transp. Syst. 2024, 25, 12437–12449. [Google Scholar] [CrossRef]
  29. An, K.; Wang, Z.; Huang, W.; Liu, S. Performance-prescribed optimal neural control for hypersonic vehicles considering disturbances: An adaptive dynamic programming approach. Aerosp. Sci. Technol. 2024, 152, 109370. [Google Scholar] [CrossRef]
  30. Zhao, J.; Jia, B.; Zhao, Z. Model-Free H∞ Prescribed Performance Control of Adaptive Cruise Control Systems via Policy Learning. IEEE Trans. Intell. Transp. Syst. 2024. [Google Scholar] [CrossRef]
  31. Chen, A.S.; Herrmann, G. Adaptive optimal control via continuous-time q-learning for unknown nonlinear affine systems. In Proceedings of the 2019 IEEE 58th Conference on Decision and Control (CDC), Nice, France, 11–13 December 2019; IEEE: New York, NY, USA, 2019; pp. 1007–1012. [Google Scholar]
  32. Zhao, J.; Lv, Y.; Zhao, Z.; Wang, Z. Adaptive Optimal Tracking Control of Servo Mechanisms via Generalized Policy Learning. IEEE Trans. Instrum. Meas. 2024. [Google Scholar] [CrossRef]
  33. Zeng, Q.; Jia, B.; Zhao, J.; Wan, L. Optimal tracking control of hydraulic support initial support force based on adaptive dynamic programming. In Proceedings of the 2024 43rd Chinese Control Conference (CCC), Kunming, China, 28–31 July 2024; IEEE: New York, NY, USA, 2024; pp. 1736–1741. [Google Scholar]
  34. Vrabie, D.; Lewis, F. Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Netw. 2009, 22, 237–246. [Google Scholar] [CrossRef]
  35. Vamvoudakis, K.G.; Lewis, F.L. Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 2010, 46, 878–888. [Google Scholar] [CrossRef]
  36. Na, J.; Herrmann, G. Online adaptive approximate optimal tracking control with simplified dual approximation structure for continuous-time unknown nonlinear systems. IEEE/CAA J. Autom. Sin. 2014, 1, 412–422. [Google Scholar] [CrossRef]
  37. Vrabie, D.; Vamvoudakis, K.G.; Lewis, F.L. Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles; IET: Stevenage, UK, 2012. [Google Scholar]
  38. Abu-Khalaf, M.; Lewis, F.L. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 2005, 41, 779–791. [Google Scholar] [CrossRef]
  39. Slotine, J.J.; Li, W. Applied Nonlinear Control; Prentice-Hall: Englewood Cliffs, NJ, USA, 1991. Available online: https://www.semanticscholar.org/paper/Applied-Nonlinear-Control-Slotine-Li/1ae0d9625f9f580a3b8d8e92a0edbc2087a1cc0e (accessed on 10 November 2024).
  40. Na, J.; Mahyuddin, M.N.; Herrmann, G.; Ren, X.; Barber, P. Robust adaptive finite-time parameter estimation and control for robotic systems. Int. J. Robust Nonlinear Control 2015, 25, 3045–3071. [Google Scholar] [CrossRef]
  41. Khalil, H.K.; Grizzle, J. Nonlinear Systems; Prentice Hall: Upper Saddle River, NJ, USA, 2002; Volume 3. [Google Scholar]
  42. Utkin, V.I. Sliding Modes in Control and Optimization; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
  43. Lee, J.Y.; Park, J.B.; Choi, Y.H. Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations. IEEE Trans. Neural Netw. Learn. Syst. 2014, 26, 916–932. [Google Scholar] [PubMed]
  44. Nevistic, V.; Primbs, J.A. Constrained Nonlinear Optimal Control: A Converse HJB Approach; Technical Memorandum No. CIT-CDS 96-021; California Institute of Technology: Pasadena, CA, USA, 1996. [Google Scholar]
  45. Chen, L.; Dai, S.L.; Dong, C. Adaptive optimal tracking control of an underactuated surface vessel using actor–critic reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 7520–7533. [Google Scholar] [CrossRef]
Figure 1. Simplified 3D model of the quadruped robot.
Figure 2. Kinematic model in the initial state.
Figure 3. Kinematic model of the swing phase.
Figure 4. Kinematic model of the stance phase.
Figure 5. Schematic diagram of the parallel leg mechanism.
Figure 6. Algorithm flow chart.
Figure 7. Picture of the co-simulation model.
Figure 8. Neural network weight curve diagram.
Figure 9. Robot simulation displacement tracking curve diagram.
Figure 10. Robot simulation velocity tracking curve diagram.
Figure 11. Tracking error of e1 curve diagram [45].
Figure 12. Tracking error of e2 curve diagram [45].
Figure 13. System input curve diagram.
Figure 14. IAE performance comparison diagram [45].
Figure 15. IAI performance comparison diagram [45].
Figure 16. Physical model of the quadruped robot.
Figure 17. Physical model of the quadruped robot.
Figure 18. Robot experiment displacement tracking curve diagram.
Figure 19. Robot experiment velocity tracking curve diagram.
Table 1. Key parameter values.

Variable   Value     Unit
a          0.14      m
b          0.28      m
h0         0.18      m
S          0.18      m
H          0.06      m
θ0         121.83    °
m1         42.51     g
m2         78.29     g
q1         0.07      m
q2         0.14      m
J1         1.1       kg·cm²
J2         4.68      kg·cm²
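For readers who wish to reproduce the kinematic simulation, the parameters in Table 1 can be collected into a single structure in a setup script. The following is a minimal sketch only, assuming SI units (grams converted to kilograms, kg·cm² converted to kg·m²); the class name RobotParams, the use of Python, and the field layout are illustrative choices, not part of the paper's implementation.

```python
# Minimal sketch: gathering the Table 1 values into one parameter structure.
# Field names mirror the symbols in Table 1; unit conversions are noted inline.
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotParams:
    a: float = 0.14            # a [m]
    b: float = 0.28            # b [m]
    h0: float = 0.18           # h_0 [m]
    S: float = 0.18            # S [m]
    H: float = 0.06            # H [m]
    theta0_deg: float = 121.83  # theta_0 [deg]
    m1: float = 42.51e-3       # m_1 [kg]  (42.51 g)
    m2: float = 78.29e-3       # m_2 [kg]  (78.29 g)
    q1: float = 0.07           # q_1 [m]
    q2: float = 0.14           # q_2 [m]
    J1: float = 1.1e-4         # J_1 [kg·m^2]  (1.1 kg·cm^2)
    J2: float = 4.68e-4        # J_2 [kg·m^2]  (4.68 kg·cm^2)


if __name__ == "__main__":
    params = RobotParams()
    print(params)
```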