Article

Research on Motion Control of Hydraulic Manipulator Based on Prescribed Performance and Reinforcement Learning

1 School of Mechanical Engineering, Yanshan University, Qinhuangdao 066004, China
2 Tianjin Research Institute of Construction Machinery Co., Ltd., Tianjin 300409, China
* Author to whom correspondence should be addressed.
Actuators 2026, 15(1), 39; https://doi.org/10.3390/act15010039
Submission received: 22 November 2025 / Revised: 20 December 2025 / Accepted: 4 January 2026 / Published: 6 January 2026
(This article belongs to the Section Control Systems)

Abstract

Achieving high-precision motion control of hydraulic manipulators is a challenging task. To address the low motion control accuracy caused by the strong electromechanical-hydraulic coupling of hydraulic manipulator systems, this paper introduces an RBF neural network and an Actor–Critic reinforcement learning architecture into a prescribed performance control framework designed with the backstepping method. The approach provides dual compensation for both internal uncertainties and external disturbances of the manipulator, thereby enhancing control performance. First, within the control architecture, the performance function guarantees the transient performance of the system, while an RBF neural network estimates and compensates for internal unmodeled errors caused by mechanical coupling and hydraulic parameter uncertainties; the network weight update law is derived from the stability proof. Second, a disturbance compensator is designed based on reinforcement learning. Trained offline and adapted online within the controller, it compensates for external disturbances and further improves control accuracy. Finally, comparative and ablation experiments on a hydraulic manipulator testbed demonstrate the effectiveness of the disturbance compensator. Compared with PID control, the proposed approach improves control accuracy by 60–65%.

1. Introduction

Manipulators are indispensable high-tech equipment in energy and public-welfare sectors such as mining, emergency rescue, urban and rural infrastructure development, and industrial applications, and they represent high-value-added, multifunctional flagship products [1,2]. Amid the trend toward intelligentization, they increasingly have to meet market demands for high-precision motion control. They not only significantly enhance operational efficiency and work quality, but also reduce human workload, improve construction safety and automation levels, and lay the foundation for intelligent development [3].
The primary methods currently applied to manipulator motion control include Proportional-Integral-Derivative (PID) control [4], adaptive robust control [5,6], fuzzy logic control [7], improved Linear Quadratic Regulator control [8], and sliding mode control [9]. Although PID, as a typical model-free controller, offers convenient parameter tuning and a simple structure, it cannot effectively handle the strong electromechanical-hydraulic coupling of manipulators, which results in low control accuracy and makes it difficult to meet current market demands [10]. Furthermore, unlike electrically driven manipulators, hydraulically driven manipulators face unpredictable and complex operating conditions and external disturbances. They exhibit strong inherent joint coupling and uncertainty in hydraulic system parameters: manipulators commonly involve nonlinear differential equation structures, friction nonlinearities, and uncertain parameters such as the load, the hydraulic oil bulk modulus, and viscous friction coefficients. In addition, external disturbances, oil leakage, and dynamic friction forces introduce uncertain nonlinearities that are difficult to model [11,12]. These factors compromise the accuracy of model-based control algorithms, so a control algorithm that can overcome the challenges inherent in hydraulic manipulators and improve their motion control precision is urgently needed.
To enhance control accuracy, prior work introduced extended state observers (ESOs) and nonlinear disturbance observers (NDOs) to estimate system uncertainties and disturbances [13]. A classic approach integrates these observers into adaptive robust control strategies, enabling both precise estimation of unknown disturbances and effective handling of parameter uncertainties [14]. Building on this foundation, Zhang et al. proposed an adaptive robust control method based on a dual extended state observer to estimate the uncertainties in a hydraulic manipulator system, achieving good tracking accuracy [15]. Shi et al. effectively resolved the trade-off between tracking accuracy and disturbance rejection by employing an NDO-based disturbance compensation mechanism to address external disturbances and unmodeled dynamics [16]. Although these methods effectively compensate for parametric uncertainties and unmodeled nonlinearities, they remain sensitive to sensor noise, and their high computational demands in real-time operation introduce significant feedback delays that hinder practical implementation.
In manipulator motion control, neural networks are widely employed as feedforward compensators to address system nonlinearity and model uncertainty, owing to their strong approximation and learning capabilities [17,18]. Rouvinen et al. approximated system uncertainties with neural networks and obtained good control performance [19]. Tran et al. designed an adaptive control method for multi-degree-of-freedom hydraulic manipulators, employing RBF neural networks to approximate and compensate for uncertainties and achieving high control performance [20]. Guo et al. proposed a manipulator control method combining adaptive neural networks with backstepping control to compensate for unknown system dynamics, enhancing control accuracy [21].
The aforementioned control algorithms primarily focus on nonlinear compensation and improving steady-state tracking performance, without considering the transient performance of the control system. In current research, prescribed performance control is achieved by setting different performance functions [22,23], keeping the tracking error within an adjustable range and delivering good transient and steady-state performance. Wang et al. proposed an adaptive funnel control scheme for servo mechanisms with unknown dead zones, applying an error transformation in the controller design [24]. Yang et al. introduced a neural-network-based adaptive dynamic surface asymptotic tracking controller that approximates unknown dynamics while keeping the tracking error within predetermined boundaries via performance functions, achieving high transient tracking performance [25]. Liang et al. proposed an adaptive neural network admittance control method based on integral barrier Lyapunov functions, achieving excellent position and force tracking while ensuring that the system outputs remain within preset limits [26].
In recent years, researchers have explored applying reinforcement learning to hydraulic manipulators. As an intelligent algorithm applicable to both model-based and model-free scenarios, its Actor–Critic architecture has seen the widest industrial adoption [27]. Yao et al. combined the Actor–Critic reinforcement learning architecture with PID control, achieving high motion accuracy through online PID parameter tuning [28]. Additionally, when integrated with RISE controllers, reinforcement learning estimated and compensated for unmodeled errors in the system, similarly yielding high control precision [29]. These studies demonstrate the feasibility of applying reinforcement learning to hydraulic manipulators.
Building on the above research, this paper addresses the current challenge in hydraulic manipulator control algorithms, which struggle to simultaneously accommodate both system transient response and internal/external uncertainties. By employing a performance-based controller as the foundational architecture, the system’s position output error can be constrained within predetermined boundaries, thereby ensuring transient performance. Furthermore, an RBF neural network is introduced to estimate and compensate for unmodeled internal errors. Simultaneously, a disturbance compensator based on the Actor–Critic reinforcement learning architecture is designed to handle external disturbance uncertainties. This dual compensation mechanism significantly enhances the system’s motion control accuracy.
The rest of this paper is organized as follows. Section 2 establishes the hydraulic manipulator model; Section 3 presents the design of the proposed control algorithm; Section 4 analyzes the algorithm’s stability and convergence; Section 5 experimentally validates the algorithm; Section 6 concludes the paper.

2. Hydraulic Manipulator Model

The hydraulic manipulator studied in this paper is modified from an excavator, with its structure shown in Figure 1. This paper only considers the compound motion of the boom and arm joints within a single plane. The dynamics of the hydraulic manipulator in the joint space can be described as follows:
The force balance equation of inertial load can be expressed as:
$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + B\dot{q} + G(q) + \Delta M(q)\ddot{q} + \Delta C(q,\dot{q})\dot{q} + \Delta B\dot{q} + \Delta G(q) = \tau$
where $M(q) \in \mathbb{R}^{3\times3}$ represents the inertia matrix; $C(q,\dot{q}) \in \mathbb{R}^{3\times3}$ denotes the Coriolis and centrifugal force matrix; $G(q) \in \mathbb{R}^{3}$ is the gravity vector; $B \in \mathbb{R}^{3\times3}$ is the viscous friction coefficient matrix; $q, \dot{q}, \ddot{q} \in \mathbb{R}^{3}$ denote the joint angular displacement, angular velocity, and angular acceleration vectors, respectively; $\Delta M(q)$, $\Delta C(q,\dot{q})$, $\Delta B$, $\Delta G(q)$ represent the unmodeled portions of the joint coupling dynamics; and $\tau$ denotes the torque vector acting on the joints.
The motion of this hydraulic manipulator is driven by a single-rod hydraulic cylinder, as shown in Figure 2. Let x h a = [ x h a 1 , x h a 2 ] T denote the displacement vector of the boom and arm hydraulic cylinder, i.e., the displacement vector in the actuator space. Then, the velocity vector x ˙ h a can be expressed in terms of the joint vectors q and q ˙ as follows:
$\dot{x}_{ha} = \dfrac{\partial x_{ha}}{\partial q}\dot{q} = \mathrm{diag}\left(\dfrac{\partial x_{ha1}}{\partial q_1}, \dfrac{\partial x_{ha2}}{\partial q_2}\right)\dot{q}$
The torque vector acting on the manipulator joint can be expressed as:
$\tau = J_{ha}\left(A_1P_1 - A_2P_2\right) = \dfrac{\partial x_{ha}}{\partial q}\left(A_1P_1 - A_2P_2\right)$
where $A_1$ is the piston area of the rodless chamber of the hydraulic cylinder; $A_2$ is the piston area of the rod chamber; $P_1$ is the pressure in the rodless chamber; and $P_2$ is the pressure in the rod chamber.
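For concreteness, the short Python sketch below evaluates Equation (3) numerically. The piston areas are the values later listed in Table 1, while the Jacobian entries and the chamber pressures are placeholder numbers chosen only for illustration.

```python
import numpy as np

# Joint torques from chamber pressures per Eq. (3).
A1 = np.array([3.12e-3, 2.38e-3])   # rodless-chamber piston areas (m^2), boom/arm (Table 1)
A2 = np.array([2.40e-3, 1.70e-3])   # rod-chamber piston areas (m^2), boom/arm (Table 1)
J_ha = np.diag([0.35, 0.28])        # Jacobian dx_ha/dq (m/rad), assumed values
P1 = np.array([9.0e6, 7.5e6])       # rodless-chamber pressures (Pa), assumed
P2 = np.array([2.0e6, 1.5e6])       # rod-chamber pressures (Pa), assumed

tau = J_ha @ (A1 * P1 - A2 * P2)    # joint torque vector of Eq. (3), in N*m
print(tau)
```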
Neglecting internal leakage, the dynamic pressure equations for the rod chamber and rodless chamber of the hydraulic cylinder can be expressed as [30]:
$\dot{P}_1 = \beta_e\left(-A_1\dot{x}_{ha} + Q_1\right)/V_1, \qquad \dot{P}_2 = \beta_e\left(A_2\dot{x}_{ha} - Q_2\right)/V_2$
where β e is the effective volumetric modulus of the oil; V 1 = V 01 + A 1 x h a is the rodless chamber volume of the hydraulic cylinder; V 01 is the initial volume of the rodless chamber; V 2 = V 02 A 2 x h a is the rod chamber volume; V 02 is the initial volume of the rod chamber; Q 1 is the supply flow rate to the hydraulic cylinder; Q 2 is the return flow rate from the hydraulic cylinder.
The hydraulic cylinder flow equation is:
$Q_1 = k_vu\left[s(u)\sqrt{P_s - P_1} + s(-u)\sqrt{P_1 - P_r}\right], \qquad Q_2 = k_vu\left[s(u)\sqrt{P_2 - P_r} + s(-u)\sqrt{P_s - P_2}\right]$
where $k_v$ represents the flow gain; $u$ denotes the proportional valve control signal; $P_s$ indicates the supply pressure; $P_r$ signifies the return pressure; and $s(\cdot)$ is the selector function, taken as $s(u) = 1$ for $u \ge 0$ and $s(u) = 0$ otherwise.
Define state variable x = [ x 1 , x 2 , x 3 ] T , where x 1 = q , x 2 = q ˙ , x 3 = A 1 P 1 A 2 P 2 . The state equation is:
$\dot{x}_1 = x_2, \qquad \dot{x}_2 = M^{-1}\left(J_{ha}x_3 - N_1(x_2) + N_2(x_2)\right), \qquad \dot{x}_3 = g_1(x_3)u - g_2(x_2)$
where
$N_1(x_2) = Cx_2 + Bx_2 + G, \qquad N_2(x_2) = -\Delta M\dot{x}_2 - \Delta Cx_2 - \Delta Bx_2 - \Delta G$
$g_1(x_3) = \beta_ek_v\left(A_1R_1/V_1 + A_2R_2/V_2\right), \qquad g_2(x_2) = \beta_eJ_{ha}\left(A_1^2/V_1 + A_2^2/V_2\right)x_2$
$R_1 = s(u)\sqrt{P_s - P_1} + s(-u)\sqrt{P_1 - P_r}, \qquad R_2 = s(u)\sqrt{P_2 - P_r} + s(-u)\sqrt{P_s - P_2}$
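The following minimal Python sketch evaluates the right-hand side of the state Equation (6) for a single joint. The lumped scalars M, N1, and N2 stand in for the matrices of Equation (1), and all default parameter values are illustrative assumptions rather than identified testbed parameters.

```python
import numpy as np

def s(x):
    """Selector function of the flow model (5): 1 if x >= 0, else 0."""
    return 1.0 if x >= 0.0 else 0.0

def plant_rhs(q_dot, x3, P1, P2, x_ha, u,
              M=1200.0, J_ha=0.35, N1=300.0, N2=0.0,
              beta_e=1.0e9, k_v=3.3e-8,
              A1=3.12e-3, A2=2.4e-3, V01=2.0e-3, V02=1.5e-3,
              Ps=16.0e6, Pr=0.0):
    """Right-hand side of a single-joint version of the state equation (6).

    q_dot : joint angular velocity, x3 = A1*P1 - A2*P2 : load force,
    P1/P2 : measured chamber pressures, x_ha : cylinder displacement,
    u : proportional-valve control signal. Default values are illustrative.
    """
    V1 = V01 + A1 * x_ha                        # rodless-chamber volume
    V2 = V02 - A2 * x_ha                        # rod-chamber volume
    R1 = s(u) * np.sqrt(max(Ps - P1, 0.0)) + s(-u) * np.sqrt(max(P1 - Pr, 0.0))
    R2 = s(u) * np.sqrt(max(P2 - Pr, 0.0)) + s(-u) * np.sqrt(max(Ps - P2, 0.0))
    g1 = beta_e * k_v * (A1 * R1 / V1 + A2 * R2 / V2)
    g2 = beta_e * J_ha * (A1**2 / V1 + A2**2 / V2) * q_dot
    x1_dot = q_dot
    x2_dot = (J_ha * x3 - N1 + N2) / M          # modeled (N1) and unmodeled (N2) terms lumped
    x3_dot = g1 * u - g2
    return x1_dot, x2_dot, x3_dot

print(plant_rhs(q_dot=0.1, x3=2.0e4, P1=9.0e6, P2=2.0e6, x_ha=0.3, u=0.5))
```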

3. Controller Design

This section first designs a performance-based controller that employs an RBF neural network to estimate and compensate for unmodeled mechanical and hydraulic errors in the manipulator. Subsequently, building upon this performance controller, a disturbance compensator is designed using the Actor–Critic reinforcement learning architecture to counteract external disturbances, thereby further enhancing control accuracy.

3.1. RBF Neural Network

The RBF neural network consists of an input layer, a hidden layer, and an output layer. Due to its simple structure, fast learning convergence, and ability to approximate continuous nonlinear functions on compact sets with arbitrary precision, it is commonly used in controllers for real-time fitting of unmodeled errors. Adaptive laws designed based on Lyapunov stability theory can adjust network weights online, enabling the controller to compensate for unmodeled errors while ensuring the stability and tracking performance of the closed-loop system [31]. The transformation from the input layer to the hidden layer is nonlinear, while the transformation from the hidden layer to the output layer is linear. The activation function of the neurons in the hidden layer is the radial basis function. Any continuous unknown function can be approximated as:
$F(X) = W^{*T}\phi(X) + \varepsilon$
where $W^{*T} \in \mathbb{R}^{c\times v}$ is the ideal weight matrix, with $c$ and $v$ the numbers of outputs and hidden-layer neurons, respectively; $X \in \mathbb{R}^{z\times 1}$ is the input to the neural network, with $z$ the number of inputs; $\varepsilon$ is the neural network approximation error satisfying $\|\varepsilon\| \le \bar{\varepsilon}$, with $\bar{\varepsilon}$ a positive constant; and $\phi(X) = [\phi_1(X), \phi_2(X), \ldots, \phi_l(X)]$ is the vector of activation functions, expressed as:
$\phi_i(X) = \exp\left(-\left(X - \mu_i\right)^T\left(X - \mu_i\right)/\sigma_i^2\right), \quad i = 1, 2, \ldots, l$
where μ i and σ i represent the center and width of the Gaussian function for the i th neuron, respectively.
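A minimal single-output implementation of the RBF approximator of Equations (8) and (9) is sketched below. The uniform placement of the Gaussian centers over [−1.5, 1.5] mirrors the choice reported in Section 5; the width value and the center layout are otherwise assumptions.

```python
import numpy as np

class RBFNetwork:
    """Single-output RBF approximator in the spirit of Eqs. (8)-(9)."""

    def __init__(self, n_inputs, n_hidden, center_range=(-1.5, 1.5), sigma=1.0):
        # Gaussian centers mu_i spread uniformly over the assumed input range,
        # one row per input dimension (an illustrative placement choice).
        centers = np.linspace(center_range[0], center_range[1], n_hidden)
        self.mu = np.tile(centers, (n_inputs, 1))     # centers, shape (z, l)
        self.sigma = sigma * np.ones(n_hidden)        # widths sigma_i
        self.W = np.zeros(n_hidden)                   # adjustable weights W_hat

    def phi(self, X):
        """Gaussian activations phi_i(X) = exp(-(X - mu_i)^T (X - mu_i) / sigma_i^2)."""
        d2 = np.sum((np.asarray(X)[:, None] - self.mu) ** 2, axis=0)
        return np.exp(-d2 / self.sigma**2)

    def __call__(self, X):
        """Approximation W_hat^T phi(X) of the unknown continuous function F(X)."""
        return self.W @ self.phi(X)

net = RBFNetwork(n_inputs=5, n_hidden=32)   # 32 hidden nodes, as used in Section 5
print(net([0.1, -0.2, 0.0, 0.3, 0.05]))
```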

3.2. Performance Function

The time-varying logarithmic Lyapunov function is designed as follows:
$V = \dfrac{1}{2}\ln\dfrac{k_a^2(t)}{k_a^2(t) - z^Tz}$
where k a t represents the error constraint boundary, k a t > 0 ; z denotes tracking errors.
As shown in Equation (10), as the system tracking error approaches the time-varying boundary, the value of the time-varying logarithmic Lyapunov function tends toward infinity. Based on this property, rigorously proving the boundedness of this Lyapunov function under the controller ensures that the closed-loop tracking error strictly remains within the preset constraint band $\left(-k_a(t), k_a(t)\right)$, where $k_a(t)$ is expressed as:
$k_a(t) = \left(k_{a0} - k_{a\infty}\right)\exp(-nt) + k_{a\infty}$
where $k_{a0}$, $k_{a\infty}$, and $n$ are positive constants satisfying $0 < k_{a\infty} < k_{a0}$ and $\|z(0)\| < k_{a0}$, with $z(0)$ the initial value of $z$. The shape of $k_a(t)$ is shown in Figure 3.
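The snippet below evaluates the boundary of Equation (11) and the logarithmic barrier of Equation (10). Mapping the boom boundary used for controller C2 in Section 5, 0.8e^(−0.7t) + 0.8, onto Equation (11) gives k_a0 = 1.6, k_a∞ = 0.8, and n = 0.7; the time grid is chosen only for illustration.

```python
import numpy as np

def k_a(t, k_a0, k_a_inf, n):
    """Prescribed performance boundary of Eq. (11): decays from k_a0 to k_a_inf."""
    return (k_a0 - k_a_inf) * np.exp(-n * t) + k_a_inf

def log_barrier(z, k):
    """Logarithmic Lyapunov term of Eq. (10); grows without bound as |z| approaches k."""
    return 0.5 * np.log(k**2 / (k**2 - z**2))

t = np.linspace(0.0, 10.0, 6)
bound = k_a(t, k_a0=1.6, k_a_inf=0.8, n=0.7)   # boom boundary of controller C2
print(bound)
print(log_barrier(z=0.5, k=bound[0]))          # well-defined only while |z| < k_a(t)
```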

3.3. Prescribed Performance Controller Design

Define the error variable as:
$z_1 = x_1 - x_{1d}, \quad z_2 = x_2 - \alpha_1, \quad z_3 = x_3 - \alpha_2$
where α 1 , α 2 are virtual control variables.
Define the first Lyapunov function as:
$V_1 = \dfrac{1}{2}\ln\dfrac{k_{b1}^2(t)}{k_{b1}^2(t) - z_1^Tz_1}$
Differentiating $V_1$ yields:
$\dot{V}_1 = \dfrac{z_1^T}{k_{b1}^2 - z_1^Tz_1}\left(x_2 - \dot{x}_{1d} - \dfrac{\dot{k}_{b1}}{k_{b1}}z_1\right) = \dfrac{z_1^T}{k_{b1}^2 - z_1^Tz_1}\left(z_2 + \alpha_1 - \dot{x}_{1d} - \dfrac{\dot{k}_{b1}}{k_{b1}}z_1\right)$
For convenience of notation, in the following, $\ln(\cdot)_i$ is used to denote $\ln\left(k_{bi}^2(t)/\left(k_{bi}^2(t) - z_i^Tz_i\right)\right)$, $i = 1, 2, 3$. The virtual control law $\alpha_1$ is designed based on Equation (14) and fixed-time theory as:
$\alpha_1 = \dot{x}_{1d} + \dfrac{\dot{k}_{b1}}{k_{b1}}z_1 - k_{11}\ln^{\gamma_1}(\cdot)_1\dfrac{k_{b1}^2 - z_1^Tz_1}{z_1^T} - k_{12}\ln^{\gamma_2}(\cdot)_1\dfrac{k_{b1}^2 - z_1^Tz_1}{z_1^T}$
where k 11 and k 12 are constants greater than zero; 0 < γ 1 < 1 , 1 < γ 2 < .
Substituting Equation (15) into Equation (14) yields:
$\dot{V}_1 = \dfrac{z_1^Tz_2}{k_{b1}^2 - z_1^Tz_1} - k_{11}\ln^{\gamma_1}(\cdot)_1 - k_{12}\ln^{\gamma_2}(\cdot)_1$
Further, the second Lyapunov function is designed:
$V_2 = V_1 + \dfrac{1}{2}\ln\dfrac{k_{b2}^2(t)}{k_{b2}^2(t) - z_2^Tz_2} + \dfrac{1}{2\eta_{w1}}\tilde{W}_1^T\tilde{W}_1 + \dfrac{1}{2\eta_{\vartheta1}}\tilde{\vartheta}_1^T\tilde{\vartheta}_1$
where $\tilde{W}_1 = W_1 - \hat{W}_1$, $\tilde{\vartheta}_1 = \vartheta_1 - \hat{\vartheta}_1$, and $\eta_{w1}$ and $\eta_{\vartheta1}$ are constants greater than zero.
Differentiating Equation (17) yields:
V ˙ 2 = V ˙ 1 + z 2 T k b 2 2 z 2 T z 2 J h a M x 3 N 1 ( x 2 ) M + N 2 ( x 2 ) M α ˙ 1 k ˙ b 2 k b 2 z 2 1 η w 1 W ˜ 1 T W ^ ˙ 1 1 η ϑ 1 ϑ ˜ 1 T ϑ ^ ˙ 1 = k 11 ln γ 1 ( · ) 1 k 12 ln γ 2 ( · ) 1 + z 2 T k b 2 2 z 2 T z 2 J h a M z 3 + J h a M α 2 N 2 ( x 2 ) M + f 1 k ˙ b 2 k b 2 z 2 1 η w 1 W ˜ 1 T W ^ ˙ 1 1 η ϑ 1 ϑ ˜ 1 T ϑ ^ ˙ 1
where f 1 = N 2 ( x 2 ) M + d 1 M α ˙ 1 + ( k b 2 2 z 2 T z 2 ) z 1 k b 1 2 z 1 T z 1 .
The virtual control volume α 2 is designed according to Equation (18) as:
$\alpha_2 = \dfrac{M}{J_{ha}}\left(-\hat{f}_1 + \dfrac{N_1(x_2)}{M} + \dfrac{\dot{k}_{b2}}{k_{b2}}z_2\right) - \dfrac{M}{J_{ha}}\dfrac{k_{b2}^2 - z_2^Tz_2}{z_2^T}\left(k_{21}\ln^{\gamma_1}(\cdot)_2 + k_{22}\ln^{\gamma_2}(\cdot)_2\right)$
where k 21 and k 22 are constants greater than zero.
Let f ^ 1 = W ^ 1 T ϕ ( X 1 ) + ϑ ^ 1 and substituting together with Equation (19) into Equation (18) yields:
V ˙ 2 = i = 1 2 k i 1 ln γ 1 ( · ) i k i 2 ln γ 2 ( · ) i + z 2 T k b 2 2 z 2 T z 2 W ˜ 1 T ϕ ( X 1 ) + ϑ ˜ 1 1 η w 1 W ˜ 1 T W ^ ˙ 1 + z 2 T k b 2 2 z 2 T z 2 J h a M z 3 1 η ϑ 1 ϑ ˜ 1 T ϑ ^ ˙ 1
According to Equation (20), the neural network weight update law can be obtained as:
$\dot{\hat{W}}_1 = \eta_{w1}\dfrac{z_2^T}{k_{b2}^2 - z_2^Tz_2}\phi(X_1) - \sigma_{w1}\hat{W}_1, \qquad \dot{\hat{\vartheta}}_1 = \eta_{\vartheta1}\dfrac{z_2^T}{k_{b2}^2 - z_2^Tz_2} - \sigma_{\vartheta1}\hat{\vartheta}_1$
where both $\sigma_{w1}$ and $\sigma_{\vartheta1}$ are constants greater than zero and $X_1 = [q, \dot{q}, q_d, \dot{q}_d, \alpha_1]$. Substituting Equation (21) yields:
V ˙ 2 = i = 1 2 k i 1 ln γ 1 ( · ) i k i 2 ln γ 2 ( · ) i + σ w 1 η w 1 W ˜ 1 T W ^ 1 + σ ϑ 1 η ϑ 1 ϑ ˜ 1 T ϑ ^ 1 + z 2 T k b 2 2 z 2 T z 2 J h a M z 3
According to Young’s inequality,
$\tilde{W}_1^T\hat{W}_1 \le -\dfrac{1}{2}\tilde{W}_1^T\tilde{W}_1 + \dfrac{1}{2}W_1^TW_1, \qquad \tilde{\vartheta}_1^T\hat{\vartheta}_1 \le -\dfrac{1}{2}\tilde{\vartheta}_1^T\tilde{\vartheta}_1 + \dfrac{1}{2}\vartheta_1^T\vartheta_1$
Further, the deflation of Equation (22) yields:
V ˙ 2 i = 1 2 k i 1 ln γ 1 ( · ) i k i 2 ln γ 2 ( · ) i σ w 1 2 η w 1 W ˜ 1 T W ˜ 1 σ ϑ 1 2 η ϑ 1 ϑ ˜ 1 T ϑ ˜ 1 + σ w 1 2 η w 1 W 1 T W 1 + σ ϑ 1 2 η ϑ 1 ϑ 1 T ϑ 1 + z 2 T k b 2 2 z 2 T z 2 J h a M z 3
It can be shown that if $z_3$ converges, then $V_2$ is eventually uniformly bounded. Define the third Lyapunov function as:
$V_3 = V_2 + \dfrac{1}{2}\ln\dfrac{k_{b3}^2(t)}{k_{b3}^2(t) - z_3^Tz_3} + \dfrac{1}{2\eta_{w2}}\tilde{W}_2^T\tilde{W}_2 + \dfrac{1}{2\eta_{\vartheta2}}\tilde{\vartheta}_2^T\tilde{\vartheta}_2$
Differentiating Equation (25) yields:
V ˙ 3 = V ˙ 2 + z 3 T k b 3 2 z 3 T z 3 g 1 ( x 3 ) u g 2 ( x 2 ) α ˙ 2 k ˙ b 3 k b 3 z 3 1 η w 2 W ˜ 2 T W ^ ˙ 2 1 η ϑ 2 ϑ ˜ 2 T ϑ ^ ˙ 2 i = 1 2 k i 1 ln γ 1 ( · ) i k i 2 ln γ 2 ( · ) i σ w 1 2 η w 1 W ˜ 1 T W ˜ 1 σ ϑ 1 2 η ϑ 1 ϑ ˜ 1 T ϑ ˜ 1 + σ w 1 2 η w 1 W 1 T W 1 + σ ϑ 1 2 η ϑ 1 ϑ 1 T ϑ 1 + 1 2 ε 2 + z 3 T k b 3 2 z 3 T z 3 g 1 ( x 3 ) u g 2 ( x 2 ) + f 2 k ˙ b 3 k b 3 z 3 1 η w 2 W ˜ 2 T W ^ ˙ 2 1 η ϑ 2 ϑ ˜ 2 T ϑ ^ ˙ 2
where f 2 = α ˙ 2 + ( k b 3 2 z 3 T z 3 ) z 2 J h a M 1 k b 2 2 z 2 T z 2 .
The control input u is designed according to Equation (26) as:
$u = \dfrac{1}{g_1(x_3)}\left[g_2(x_2) - \hat{f}_2 + \dfrac{\dot{k}_{b3}}{k_{b3}}z_3 - \dfrac{k_{b3}^2 - z_3^Tz_3}{z_3^T}\left(k_{31}\ln^{\gamma_1}(\cdot)_3 + k_{32}\ln^{\gamma_2}(\cdot)_3\right)\right]$
where k 31 and k 32 are constants greater than zero;
Let f ^ 2 = W ^ 2 T ϕ ( X 2 ) + ϑ ^ 2 and substituting together with Equation (27) into Equation (26) has:
V ˙ 3 i = 1 3 k i 1 ln γ 1 ( · ) i k i 2 ln γ 2 ( · ) i σ w 1 2 η w 1 W ˜ 1 T W ˜ 1 σ ϑ 1 2 η ϑ 1 ϑ ˜ 1 T ϑ ˜ 1 + σ w 1 2 η w 1 W 1 T W 1 + σ ϑ 1 2 η ϑ 1 ϑ 1 T ϑ 1 1 η w 2 W ˜ 2 T W ^ ˙ 2 1 η ϑ 2 ϑ ˜ 2 T ϑ ^ ˙ 2 + z 3 T k b 3 2 z 3 T z 3 ( W ˜ 2 T ϕ ( X 2 ) + ϑ ˜ 2 )
According to Equation (28), the neural network weight update law can be obtained as:
$\dot{\hat{W}}_2 = \eta_{w2}\dfrac{z_3^T}{k_{b3}^2 - z_3^Tz_3}\phi(X_2) - \sigma_{w2}\hat{W}_2, \qquad \dot{\hat{\vartheta}}_2 = \eta_{\vartheta2}\dfrac{z_3^T}{k_{b3}^2 - z_3^Tz_3} - \sigma_{\vartheta2}\hat{\vartheta}_2$
where both σ w 2 and σ ϑ 2 are constants greater than zero; X 2 = [ q , q ˙ , q d , q ˙ d , α 2 ] .
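As a sketch of how the adaptive laws (21) and (29) can be implemented in discrete time, the function below performs one Euler step for a scalar tracking error. The gains η and σ_ϑ follow the Section 5 values where they are given; the σ_w value and the step size are assumptions.

```python
import numpy as np

def adapt_step(W_hat, theta_hat, z, k_b, phi, dt,
               eta_w=1.0, eta_th=210.0, sig_w=0.1, sig_th=0.16):
    """One Euler step of the sigma-modified adaptive laws (21)/(29), scalar-error sketch.

    z   : tracking error (z2 or z3),  k_b : current constraint boundary k_b(t),
    phi : RBF activation vector phi(X),  dt : integration step (assumed).
    """
    gain = z / (k_b**2 - z**2)                 # z^T / (k_b^2 - z^T z) in the scalar case
    W_hat = W_hat + dt * (eta_w * gain * phi - sig_w * W_hat)
    theta_hat = theta_hat + dt * (eta_th * gain - sig_th * theta_hat)
    return W_hat, theta_hat

W_hat, theta_hat = np.zeros(32), 0.0
W_hat, theta_hat = adapt_step(W_hat, theta_hat, z=0.2, k_b=1.6,
                              phi=np.ones(32) * 0.1, dt=0.01)
print(theta_hat)
```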

3.4. Disturbance Compensator

This subsection designs a disturbance compensator based on the reinforcement-learning Actor–Critic architecture to compensate for external disturbances; it is appended to the control law of Equation (27), i.e., [32]:
$u = \dfrac{1}{g_1(x_3)}\left[g_2(x_2) - \hat{f}_2 + \dfrac{\dot{k}_{b3}}{k_{b3}}z_3 - \dfrac{k_{b3}^2 - z_3^Tz_3}{z_3^T}\left(k_{31}\ln^{\gamma_1}(\cdot)_3 + k_{32}\ln^{\gamma_2}(\cdot)_3\right)\right] + F_D$
where $F_D = F_D(x(k))$ is the disturbance compensation signal designed in this paper.
Define the stage cost function as:
$\delta(k) = Z_1^T(k)MZ_1(k) + u^T(k)Pu(k)$
where $M \in \mathbb{R}^{3\times3}$ and $P \in \mathbb{R}^{3\times3}$ are positive semidefinite matrices. The matrix $M$ weights the impact of the tracking error on control accuracy, while $P$ penalizes the output signal to prevent it from becoming excessively large.
According to reinforcement learning theory, the long-term reward function is defined as:
$J(x(k), F_D(k)) = \sum_{j=k}^{\infty}\lambda^{j-k}\delta(j)$
where $0 < \lambda < 1$ is the reward discount factor. According to Equation (32), the following Bellman equation holds:
$J^*(x(k), F_D(k)) = \delta(k) + \lambda J^*(x(k+1), F_D(k+1))$
where $J^*(x(k), F_D(k))$ and $F_D^*(k)$ are the optimal value function and the optimal policy, respectively, and $F_D^*(k)$ is expressed as:
$F_D^*(k) = \arg\min_{F_D(k)}J^*(x(k), F_D(k))$
This paper employs the Q-learning algorithm to solve these equations; the actor and critic neural networks each use a fully connected hidden layer of 64 neurons with a hyperbolic tangent activation function.
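The sketch below defines critic and actor networks with the structure described above (one fully connected hidden layer of 64 tanh units each), in the spirit of Equations (41) and (46). The state dimension is an assumption, and for simplicity the compensation signal itself is used as the critic's action input, whereas the critic input x_c defined later in this section contains the full control signal u.

```python
import numpy as np

rng = np.random.default_rng(0)

n_state, n_act, n_hidden = 10, 2, 64     # state/action sizes are assumptions
w_c1 = 0.1 * rng.standard_normal((n_hidden, n_state + n_act))   # critic input layer
w_c2 = 0.1 * rng.standard_normal(n_hidden)                       # critic output layer
w_a1 = 0.1 * rng.standard_normal((n_hidden, n_state))            # actor input layer
w_a2 = 0.1 * rng.standard_normal((n_act, n_hidden))               # actor output layer

def critic(x_a, F_D):
    """Approximate state-action value Q_hat(x(k), F_D(k)), cf. Eq. (41)."""
    x_c = np.concatenate([x_a, F_D])
    return w_c2 @ np.tanh(w_c1 @ x_c)

def actor(x_a):
    """Disturbance-compensation policy F_D(x(k)), cf. Eq. (46)."""
    return w_a2 @ np.tanh(w_a1 @ x_a)

x_a = rng.standard_normal(n_state)
F_D = actor(x_a)
print(F_D, critic(x_a, F_D))
```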
Based on Equation (32), the state–action value function $Q(k)$ is defined as:
$Q(x(k), F_D(k)) = \delta(k) + \sum_{j=1}^{\infty}\lambda^j\delta(k+j)$
This paper starts with $Q_0(x(k), F_D(k)) = 0$ and solves the initial control policy $F_{D0}(k)$ using the following equation:
$F_{D0}(k) = \arg\min_{F_D(k)}Q_0(x(k), F_D(k))$
Then, Q 1 can be computed in the next iteration using the determined strategy F D 0 ( k ) :
Q 1 ( x ( k ) , F D ( k ) ) = δ ( k ) + λ Q 0 ( x ( k + 1 ) , F D 0 ( k + 1 ) )
By means of mathematical induction, there is the following:
F D i ( k ) = arg min Q i ( x ( k ) , F D ( k ) )
Q i ( x ( k ) , F D ( k ) ) = δ ( k ) + λ Q i ( x ( k + 1 ) , F D i ( k + 1 ) )
where i is the iteration index.
By means of Equations (38) and (39), there is the following Bellman equation:
Q i + 1 ( x ( k ) , F D ( k ) ) = r ( k ) + λ min Q i ( x ( k + 1 ) , F D ( k + 1 ) )
In this paper, a critic neural network is used to approximate $Q(k)$:
$\hat{Q}_i(k) = \hat{w}_{c2,i}(k)\cdot\sigma\left(\hat{w}_{c1,i}(k)x_c(k)\right) = \hat{w}_{c2,i}(k)\cdot\sigma_{c,i}(k)$
where $\hat{w}_{c1,i}$ and $\hat{w}_{c2,i}$ represent the neural network weight matrices, $\sigma(\cdot)$ is the hyperbolic tangent activation function, and $\cdot$ denotes the inner product.
The input of the critic neural network is:
x c ( k ) = [ x a ( k ) , u ( k ) ] T
where x a ( k ) = [ x 1 ( k ) , x 2 ( k ) , x 3 ( k ) , e 1 ( k ) , x d ( k ) ] T .
From the Bellman Equation (39), the prediction error of the critic neural network is:
$e_{c,i+1}(k) = \delta(k) + \lambda\hat{Q}_i(k+1) - \hat{Q}_{i+1}(k)$
To design the weight update law, the following approximation cost function is minimized:
$E_{c,i+1}(k) = \dfrac{1}{2}e_{c,i+1}^T(k)e_{c,i+1}(k)$
In this paper, the gradient descent method is used to update the weights as:
Δ w ^ c 1 , i + 1 ( k ) = K c [ E c , i + 1 ( k ) w ^ c 1 , i + 1 ( k ) ] E c , i + 1 ( k ) w ^ c 1 , i + 1 ( k ) = E c , i + 1 ( k ) Q ^ i ( k ) Q ^ i ( k ) σ c , i ( k ) σ c , i ( k ) s c , i ( k ) s c , i ( k ) w ^ c 1 , i ( k ) = λ e c , i + 1 ( k ) w ^ c 2 , i ( k ) [ 1 2 ( 1 ( σ c , i ( k ) ) 2 ) ] x c ( k ) Δ w ^ c 2 , i + 1 ( k ) = K c [ E c , i + 1 ( k ) w ^ c 2 , i + 1 ( k ) ]                   E c , i + 1 ( k ) w ^ c 2 , i + 1 ( k ) = E c , i + 1 ( k ) Q ^ i ( k ) Q ^ i ( k ) w ^ c 2 , i ( k ) = λ e c , i + 1 ( k ) σ c , i ( k )
where K c > 0 is the weighted learning step.
In this paper, an actor neural network is used to output the policy $F_D(k)$:
$F_{Di}(k) = \hat{w}_{a2,i}(k)\cdot\sigma\left(\hat{w}_{a1,i}x_a(k)\right) = \hat{w}_{a2,i}(k)\cdot\sigma_{a,i}(k)$
where $\hat{w}_{a1,i}$ and $\hat{w}_{a2,i}$ represent the neural network weight matrices and $\cdot$ denotes the inner product.
The purpose of the Actor neural network is to minimize the value function Q ^ i ( k ) . Therefore, the following minimization approximation equation is designed:
$E_{a,i+1}(k) = \dfrac{1}{2}\hat{Q}_{i+1}^T(k)\hat{Q}_{i+1}(k)$
The weights are updated using the gradient descent method as follows:
Δ w ^ a 1 , i + 1 ( k ) = K a [ E a , i + 1 ( k ) w ^ a 1 , i + 1 ( k ) ] E a , i + 1 ( k ) w ^ a 1 , i + 1 ( k ) = E a , i + 1 ( k ) Q ^ i ( k ) [ Q ^ i ( k ) π i ( k ) ] T π i ( k ) σ a , i ( k ) σ a , i ( k ) s a , i ( k ) s a , i ( k ) w ^ a 1 , i ( k ) = Q ^ i ( k ) l = 1 n r = 1 N [ w ^ c 2 , i ( k ) 1 2 ( 1   ( σ a , i ( k ) ) 2 ) w ^ c 1 , i ( k ) ] w ^ a 2 , i ( k ) 1 2 ( 1 ( σ a , i ( k ) ) 2 ) x a ( k ) Δ w ^ a 2 , i + 1 ( k ) = K a [ E a , i + 1 ( k ) w ^ a 2 , i + 1 ( k ) ] E a , i + 1 ( k ) w ^ a 2 , i + 1 ( k ) = E a , i + 1 ( k ) Q ^ i ( k ) [ Q ^ i ( k ) π i ( k ) ] T π i ( k ) w ^ a 2 , i ( k ) = e a ( k ) l = 1 n r = 1 N [ w ^ c 2 , i ( k ) 1 2 ( 1 ( σ a , i ( k ) ) 2 ) w ^ c 1 , i ( k ) ] σ a , i ( k )
where K a > 0 is the weighted learning step.
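A hedged, simplified training step combining the critic update (driven by the temporal-difference error of Equation (43)) and the actor update (minimizing the approximated value function) is sketched below for a single scalar compensation signal. It uses a plain chain-rule gradient rather than the exact expressions printed above; the learning steps K_c and K_a are illustrative, and λ = 0.9 follows the Section 5 setting.

```python
import numpy as np

def train_step(w_c1, w_c2, w_a1, w_a2, x_a, x_a_next, delta_k,
               lam=0.9, K_c=0.01, K_a=0.01):
    """One gradient-descent update of critic and actor weights (scalar-action sketch)."""
    # ---- forward passes ---------------------------------------------------
    F_D = w_a2 @ np.tanh(w_a1 @ x_a)                    # actor output, scalar
    x_c = np.concatenate([x_a, [F_D]])
    sig_c = np.tanh(w_c1 @ x_c)
    Q_k = w_c2 @ sig_c                                  # Q_hat(k)
    F_D_next = w_a2 @ np.tanh(w_a1 @ x_a_next)
    x_c_next = np.concatenate([x_a_next, [F_D_next]])
    Q_next = w_c2 @ np.tanh(w_c1 @ x_c_next)            # Q_hat(k+1), held fixed as target
    # ---- critic: minimise 0.5*e_c^2 with e_c from Eq. (43) -----------------
    e_c = delta_k + lam * Q_next - Q_k
    w_c2_new = w_c2 + K_c * e_c * sig_c
    w_c1_new = w_c1 + K_c * e_c * np.outer(w_c2 * (1.0 - sig_c**2), x_c)
    # ---- actor: minimise 0.5*Q_hat^2 through the critic's action input -----
    dQ_dFD = (w_c2 * (1.0 - sig_c**2)) @ w_c1[:, -1]    # last critic input is F_D
    sig_a = np.tanh(w_a1 @ x_a)
    w_a2_new = w_a2 - K_a * Q_k * dQ_dFD * sig_a
    w_a1_new = w_a1 - K_a * Q_k * dQ_dFD * np.outer(w_a2 * (1.0 - sig_a**2), x_a)
    return w_c1_new, w_c2_new, w_a1_new, w_a2_new

rng = np.random.default_rng(1)
n_state, n_hidden = 10, 64
w_c1 = 0.1 * rng.standard_normal((n_hidden, n_state + 1))
w_c2 = 0.1 * rng.standard_normal(n_hidden)
w_a1 = 0.1 * rng.standard_normal((n_hidden, n_state))
w_a2 = 0.1 * rng.standard_normal(n_hidden)
out = train_step(w_c1, w_c2, w_a1, w_a2,
                 rng.standard_normal(n_state), rng.standard_normal(n_state),
                 delta_k=0.4)
print(out[1][:3])
```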

4. Stability and Optimality Analysis

This subsection first analyzes the global stability of the prescribed-performance controller based on Lyapunov theory, and second, since the disturbance compensator is an externally attached compensation signal, this subsection analyzes the optimality and suboptimality of its compensation signal.

4.1. Global Stability Analysis of Prescribed Performance Controller

Theorem 1. 
Let a continuous function $V(t)$ be positive definite and satisfy the following differential inequality:
$\dot{V}(t) \le -\vartheta_aV + \vartheta_b$
where $\vartheta_a > 0$ and $0 < \vartheta_b < 1$; then $V(t)$ is uniformly ultimately bounded.
Theorem 2. 
[33] Suppose that there exists a Lyapunov function $V(x)$ and constants $\alpha$, $\beta$, $r_1$, $r_2$, $\eta_0$ satisfying $\alpha > 0$, $\beta > 0$, $0 < \eta_0 < \infty$, $0 < r_1 < 1$, and $r_2 > 1$. If $\dot{V}(x) \le -\alpha V^{r_1}(x) - \beta V^{r_2}(x) + \eta_0$ holds, then $V(x)$ converges within a fixed time to a neighborhood of the equilibrium point, which can be denoted as:
$\lim_{t\to T_x}V(x) \le \min\left\{\left(\dfrac{\eta_0}{\alpha(1-\theta)}\right)^{1/r_1}, \left(\dfrac{\eta_0}{\beta(1-\theta)}\right)^{1/r_2}\right\}$
where $\theta$ is a constant satisfying $0 < \theta < 1$. The time required for the system variables to converge to this neighborhood satisfies the following inequality:
$T \le \dfrac{1}{\alpha\theta(1 - r_1)} + \dfrac{1}{\beta\theta(r_2 - 1)}$
Proof. 
By substituting Equation (29) into Equation (28) and based on the comparison principle in Theorem 1, the following is obtained:
V ˙ 3 i = 1 3 k i 1 ln γ 1 ( · ) i k i 2 ln γ 2 ( · ) i j = 1 2 σ w j 2 η w j W ˜ j T W ˜ j + σ ϑ j 2 η ϑ j ϑ ˜ j T ϑ ˜ j   + j = 1 2 σ w j 2 η w j W j T W j + σ ϑ j 2 η ϑ j ϑ j T ϑ j + 1 2 ε 2 ρ V 3 + ζ
where
ρ = min k 11 , k 12 , k 21 , k 22 , k 31 , k 32 , σ w 1 , σ w 2 , σ ϑ 1 , σ ϑ 2
ζ = j = 1 2 σ w j 2 η w j W j T W j + σ ϑ j 2 η ϑ j ϑ j T ϑ j + 1 2 ε 2
Integrating both sides of Equation (52) yields:
$0 \le V_3 \le \zeta/\rho + \left(V_3(0) - \zeta/\rho\right)e^{-\rho t}$
As $t$ tends to infinity, $0 \le V_3 \le \zeta/\rho$. Since $V_3$ is bounded, the tracking error of the system always remains within the time-varying boundary $\left(-k_{bi}(t), k_{bi}(t)\right)$.
Since V 3 is bounded, the error signals z 3 , W ˜ j and ϑ ˜ j are all bounded, i.e., there exist positive constants W ¯ j and ϑ ¯ j satisfying W ˜ j W ¯ j and ϑ ˜ j ϑ ¯ j . This leads to the following inequality holding:
σ w j 2 η w j W ˜ j T W ˜ j γ 1 σ ϑ j 2 η ϑ j ϑ ˜ j T ϑ ˜ j γ 1 + σ w j 2 η w j W ¯ j T W ¯ j γ 1 + σ ϑ j 2 η ϑ j ϑ ¯ j T ϑ ¯ j γ 1 0 σ w j 2 η w j W ˜ j T W ˜ j γ 2 σ ϑ j 2 η ϑ j ϑ ˜ j T ϑ ˜ j γ 2 + σ w j 2 η w j W ¯ j T W ¯ j γ 2 + σ ϑ j 2 η ϑ j ϑ ¯ j T ϑ ¯ j γ 2 0
Substituting Equation (56) into Equation (52) yields:
V ˙ 3 i = 1 3 k i 1 ln γ 1 ( · ) i k i 2 ln γ 2 ( · ) i + j = 1 2 σ w j 2 η w j W j T W j + σ ϑ j 2 η ϑ j ϑ j T ϑ j j = 1 2 σ w j 2 η w j W ˜ j T W ˜ j γ 1 + σ ϑ j 2 η ϑ j ϑ ˜ j T ϑ ˜ j γ 1 j = 1 2 σ w j 2 η w j W ˜ j T W ˜ j γ 2 + σ ϑ j 2 η ϑ j ϑ ˜ j T ϑ ˜ j γ 2 + j = 1 2 σ w j 2 η w j W ¯ j T W ¯ j γ 1 + σ ϑ j 2 η ϑ j ϑ ¯ j T ϑ ¯ j γ 1 + j = 1 2 σ w j 2 η w j W ¯ j T W ¯ j γ 2 + σ ϑ j 2 η ϑ j ϑ ¯ j T ϑ ¯ j γ 2 + 1 2 ε 2
Using Theorem 2, the following can be obtained:
V ˙ 3 α V 3 γ 1 β V 3 γ 2 + η 0
where α , β and η 0 can be expressed, respectively, as:
α = min k i 1 , σ w j γ 1 , σ ϑ j γ 1
β = min ( 2 n ) 1 γ 2 k i 2 , ( 2 n ) 1 γ 2 σ w j γ 2 , ( 2 n ) 1 γ 2 σ ϑ j γ 2
η 0 = j = 1 2 σ w j 2 η w j W ¯ j T W ¯ j γ 1 + σ ϑ j 2 η ϑ j ϑ ¯ j T ϑ ¯ j γ 1 + σ w j 2 η w j W ¯ j T W ¯ j γ 2 + j = 1 2 σ ϑ j 2 η ϑ j ϑ ¯ j T ϑ ¯ j γ 2 + σ w j 2 η w j W j T W j + σ ϑ j 2 η ϑ j ϑ j T ϑ j + 1 2 ε 2
Using Theorem 2, it can be concluded that the proposed controller converges within a fixed time to a neighborhood of the origin, where the neighborhood is $\lim_{t\to T_x}V(x) \le \min\left\{\left(\eta_0/\left(k_{i1}(1-\theta)\right)\right)^{1/\gamma_1}, \left(\eta_0/\left(k_{i2}(1-\theta)\right)\right)^{1/\gamma_2}\right\}$ and the convergence time satisfies $T \le 1/\left(k_{i1}\theta(1-\gamma_1)\right) + 1/\left(k_{i2}\theta(\gamma_2-1)\right)$. □

4.2. Suboptimality Analysis of Disturbance Compensator

Next, the optimality and suboptimality of the compensation signal are analyzed:
Define the value function under any strategy as:
V i + 1 ( x ( k ) , F D ( k ) ) = r ( k ) + λ V i ( x ( k + 1 ) , F D ( k + 1 ) )
According to Equation (40), Q i + 1 ( x ( k ) , F D ( k ) ) is the result of minimizing the value function, while V i + 1 ( x ( k ) , F D ( k ) ) is the result of the value function of an arbitrary action, so there is:
Q i + 1 ( x ( k ) , F D ( k ) ) V i + 1 ( x ( k ) , F D ( k ) )
When $Q_0 = V_0 = 0$, this holds for all $i = 0, 1, 2, \ldots$
According to Equation (62) there is:
V i + 1 ( x ( k ) , F D ( k ) ) V i ( x ( k ) , F D ( k ) ) = λ V i ( x ( k + 1 ) , F D ( k + 1 ) ) λ V i 1 ( x ( k + 1 ) , F D ( k + 1 ) ) = λ 2 V i 1 ( x ( k + 2 ) , F D ( k + 2 ) ) V i 2 ( x ( k + 2 ) , F D ( k + 2 ) ) = λ i V 1 ( x ( k + i ) , F D ( k + i ) ) V 0 ( x ( k + i ) , F D ( k + i ) ) = λ i V 1 ( x ( k + i ) , F D ( k + i ) )
Therefore, the value function is expressed iteratively as:
V i + 1 ( x ( k ) , F D ( k ) ) = λ i V 1 ( x ( k + i ) , F D ( k + i ) ) + V i ( x ( k ) , F D ( k ) ) = λ i V 1 ( x ( k + i ) , F D ( k + i ) ) + λ i 1 V 1 ( x ( k + i 1 ) , F D ( k + i 1 ) )   + V i 1 ( x ( k ) , F D ( k ) ) = λ i V 1 ( x ( k + i ) , F D ( k + i ) ) + λ i 1 V 1 ( x ( k + i 1 ) , F D ( k + i 1 ) )   + + λ V 1 ( x ( k + 1 ) , F D ( k + 1 ) ) + V 1 ( x ( k ) , F D ( k ) ) = j = 0 i λ j V 1 ( x ( k + i ) , F D ( k + i ) )
According to Equation (62) there is:
V i + 1 ( x ( k ) , F D ( k ) ) j = 0 λ j r ( x ( k + i ) , F D ( k + i ) )
Since the stage cost function $\delta(x(k+i), F_D(k+i))$ is positive definite, it is assumed to be bounded, i.e., $0 \le \delta(x(k+i), F_D(k+i)) \le C$ for some upper bound $C$. Then:
j = 0 λ j δ ( x ( k + i ) , F D ( k + i ) ) j = 0 λ j C j = 0 λ j
There is through Equation (63):
Q i + 1 ( x ( k ) , F D ( k ) ) V i + 1 ( x ( k ) , F D ( k ) )
Next, the optimality of the value function and control policy will be proven.
Since Q 0 = V 0 = 0 , it follows that when i = 0 , there is Q 0 ( x ( k ) , F D ( k ) ) Q 1 ( x ( k ) , F D ( k ) ) = δ ( k ) , there is:
Q 1 ( x ( k ) , F D ( k ) ) V 0 = δ ( k ) 0
Thus, by mathematical induction, there is:
V i 1 ( x ( k ) , F D ( k ) ) Q i ( x ( k ) , F D ( k ) )
Further, there is:
      Q i + 1 ( x ( k ) , F D ( k ) ) V i ( x ( k ) , F D ( k ) ) = λ [ Q i ( x ( k ) , F D ( k ) ) V i 1 ( x ( k ) , F D ( k ) ) ] 0
According to inequality (63), there is:
Q i ( x ( k ) , F D ( k ) ) V i ( x ( k ) , F D ( k ) ) Q i + 1 ( x ( k ) , F D ( k ) )
For any control strategy F D ( k ) , there is the following inequality:
Q i ( x ( k ) , F D ( k ) ) δ ( k ) + λ Q i 1 ( x ( k + 1 ) , F D ( k + 1 ) )
According to Equation (72),
Q i ( x ( k ) , F D ( k ) ) δ ( k ) + λ Q ( x ( k + 1 ) , F D ( k + 1 ) )
It applies to i :
Q ( x ( k ) , F D ( k ) ) δ ( k ) + λ Q ( x ( k + 1 ) , F D ( k + 1 ) )
When there is an arbitrary control strategy π ( k + 1 ) , there is:
Q ( x ( k ) , F D ( k ) ) δ ( k ) + λ min π ( k + 1 ) Q ( x ( k + 1 ) , F D ( k + 1 ) )
Also, according to Equations (40) and (72),
Q ( x ( k ) , F D ( k ) ) δ ( k ) + λ Q ( x ( k + 1 ) , F D ( k + 1 ) )
Obviously, there is:
Q ( x ( k ) , F D ( k ) ) = δ ( k ) + λ min π ( k + 1 ) Q ( x ( k + 1 ) , F D ( k + 1 ) ) = δ ( k ) + λ Q ( x ( k + 1 ) , F D ( k + 1 ) )
The optimal control strategy can be solved by the following equation:
F D ( k ) = arg min Q ( x ( k ) , F D ( k ) )
Therefore, according to Equations (33) and (34), there is:
Q ( x ( k ) , F D ( k ) ) = J ( x ( k ) , F D ( k ) )         F D ( k ) = F D ( k )
However, due to the inherent error in neural networks, the optimal convergence properties of the value function and control policy only hold under ideal conditions. Therefore, considering the approximation errors in Equations (41) and (46),
F ^ D i ( k ) = arg min Q ^ i ( x ( k ) , F D ( k ) ) + ε i ( x ( k ) )
Q ^ i + 1 ( x ( k ) , F ^ D ( k ) ) = δ ( k ) + λ Q ^ i ( x ( k + 1 ) , F ^ D ( k + 1 ) ) + δ i ( x ( k ) , F ^ D ( k ) )
where ε i ( x ( k ) ) and δ i ( x ( k ) , F ^ D ( k ) ) are the estimation errors and ε i ( 0 ) = δ i ( 0 , 0 ) = 0 .
Next, a new convergence criterion will be established at each iteration, taking into account the error, i.e., the value iterates converge to a neighborhood of the optimal value function, and the iterative value function is:
Q t , i ( x ( k ) , F D ( k ) ) = δ ( k ) + λ min Q ^ i 1 ( x ( k + 1 ) , F D ( k + 1 ) )
where Q ^ 0 ( x ( k ) , F D ( k ) ) = Q ^ t , 0 ( x ( k ) , F D ( k ) ) = 0 and there exists a constant 1 ϖ < such that the following inequality holds:
Q ^ i ( x ( k ) , F D ( k ) ) ϖ Q t , i ( x ( k ) , F D ( k ) )
Defining the constant 0 < ν < , the following inequality holds:
J ( x ( k + 1 ) , F D ( k + 1 ) ) ϖ δ ( k )
According to Equation (84), when i = 0 ,
Q ^ 0 ( x ( k ) , F D ( k ) ) ϖ J ( x ( k ) , F D ( k ) )
When i = 1 ,
Q t , 1 ( x ( k ) , F D ( k ) ) = δ ( k ) + λ min π ( k + 1 ) Q ^ 0 ( x ( k + 1 ) , F D ( k + 1 ) ) δ ( k ) + λ ϖ J ( x ( k + 1 ) , F D ( k + 1 ) ) ( 1 + ν ( ϖ 1 ) / ( ϖ + 1 ) ) δ ( k ) +   λ ( ϖ ( ϖ 1 ) / ( ϖ + 1 ) ) J ( x ( k + 1 ) , F D ( k + 1 ) ) = ( 1 + ν ( ϖ 1 ) / ( ϖ + 1 ) ) J ( x ( k + 1 ) , F D ( k + 1 ) )
According to Equations (33) and (85),
λ J ( x ( k + 1 ) , F D ( k + 1 ) ) J ( x ( k + 1 ) , F D ( k + 1 ) ) ν δ ( k )
Therefore, the following expression holds:
Q ^ 1 ( x ( k ) , F D ( k ) ) ϖ ( 1 + ν ( ϖ 1 ) / ( ϖ + 1 ) ) J ( x ( k ) , F D ( k ) )
Define the following equation:
β = j = 1 i 1 ( ν j 1 ϖ j 1 ( ϖ 1 ) ) / ( ν + 1 ) j             ω = ( ν j 1 ϖ j 1 ( ϖ 1 ) ) / ( ν + 1 ) i
There is:
Q t , i ( x ( k ) , π ( k ) ) = δ ( k ) + λ min π ( k + 1 ) Q ^ i 1 ( x ( k + 1 ) , F D ( k + 1 ) ) δ ( k ) + λ ϖ ( 1 + ν β ) J ( x ( k + 1 ) , F D ( k + 1 ) ) ( 1 + ϖ β + ϖ ω ) δ ( k ) + λ ( ϖ ( 1 + ν β ) β ω ) J ( x ( k + 1 ) , F D ( k + 1 ) ) = ( 1 + j = 1 i ν j ϖ j 1 ( ϖ 1 ) / ( ν + 1 ) j J ( x ( k + 1 ) , F D ( k + 1 ) )
According to Equation (84), by mathematical induction,
Q ^ i ( x ( k ) , F D ( k ) ) ( 1 + j = 0 i ν j ϖ j 1 ( ϖ 1 ) / ( ν + 1 ) j ) J ( x ( k + 1 ) , F D ( k + 1 ) )
According to inequality (91), it is known that ν j ϖ j 1 ( ϖ 1 ) / ( ν + 1 ) j is a geometric progression, so inequality (91) can be expressed as:
Q t , i ( x ( k ) , F D ( k ) ) = ( 1 + ( ν ( ϖ 1 ) / ( ν + 1 ) ) ( 1 ( ν ϖ / ( ν + 1 ) ) i ) / ( 1   ν ϖ / ( ν + 1 ) ) ) J ( x ( k + 1 ) , F D ( k + 1 ) )
When i , inequality (93) becomes:
lim i Q t , i ( x ( k ) , F D ( k ) ) ( 1 + ν ( ϖ 1 ) / ( 1 ν ( ϖ 1 ) ) ) J ( x ( k ) , F D ( k ) )
Combined with inequality (84), there is:
Q ^ ( x ( k ) , F D ( k ) ) ϖ lim i Q t , i ( x ( k ) , F D ( k ) )
If inequality 1 ϖ ( ν + 1 ) / ν holds, according to Equations (94) and (95), there is:
lim i Q ^ i ( x ( k ) , F D ( k ) ) ϖ ( 1 + ν ( ϖ 1 ) / ( 1 ν ( ϖ 1 ) ) ) J ( x ( k ) , F D ( k ) )
When network errors exist, Q ^ i ( x ( k ) , F D ( k ) ) converges within the scaled range of the optimal value function J ( x ( k ) , F D ( k ) ) , i.e., converging to a suboptimal solution. Therefore, the suboptimality of the disturbance compensator is proven.
The overall block diagram of the proposed controller is shown in Figure 4.

5. Experimental Verification

The hydraulic manipulator experimental platform is shown in Figure 5. The boom and arm are driven by two hydraulic cylinders, and each cylinder is controlled by two Rexroth 4WREE-10 proportional servo valves (Bosch Rexroth, Lohr am Main, Germany). Cylinder displacement is measured by Shanghai Loxin NS-WY06B draw-wire displacement sensors (Loxin, Shanghai, China) with a measurement noise of about ±0.5 mm, and pressure is measured by Schneider-Ford RPT8304-C-02 pressure sensors (Schneider, Shanghai, China) with a measurement noise of about ±0.07 MPa. The pump used in the experiment is a Rexroth SYDFEE-20/071R electro-proportional pump (Bosch Rexroth, Lohr am Main, Germany) with a displacement of 71 mL/rad. The measurement and control system adopts the closed-loop architecture of upper computer-controller-CAN communication-lower computer-sensor, and real-time data acquisition and control are implemented with dSPACE (DS4302 and DS2004, Germany) and MATLAB (2015b) with a sampling period of 10 ms.
Since the controller parameters require the real-time angle of each joint, the control algorithm incorporates the conversion between hydraulic cylinder length and joint angle. The boom cylinder length as a function of the boom angle is:
$l_1 = \sqrt{L_{AE}^2 + L_{AF}^2 - 2L_{AE}L_{AF}\cos\left(\angle EAN + \angle BAF + \theta_1\right)}$
The arm cylinder length as a function of the arm angle is:
$\angle GBH = \arccos\left(\dfrac{L_{BG}^2 + L_{BH}^2 - l_2^2}{2L_{BG}L_{BH}}\right), \quad \theta_2 = \pi - \angle ABG - \angle GBH - \angle HBC, \quad l_2 = \sqrt{L_{BG}^2 + L_{BH}^2 - 2L_{BG}L_{BH}\cos\left(\pi - \angle ABG - \angle HBC - \theta_2\right)}$
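As a simple illustration of the boom conversion in Equation (97), the snippet below applies the law of cosines; the distances and fixed structural angles are placeholder values, not the testbed geometry.

```python
import numpy as np

L_AE, L_AF = 0.90, 0.70                  # hinge-point distances (m), assumed values
ANG_EAN, ANG_BAF = 0.35, 0.25            # fixed structural angles (rad), assumed values

def boom_cylinder_length(theta1):
    """Boom cylinder length l1 for boom joint angle theta1 (rad), per Eq. (97)."""
    ang = ANG_EAN + ANG_BAF + theta1
    return np.sqrt(L_AE**2 + L_AF**2 - 2.0 * L_AE * L_AF * np.cos(ang))

print(boom_cylinder_length(np.deg2rad(25.0)))
```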
Since the neural network does not converge at the initial stage, the disturbance compensator must be trained offline and then deployed online in the controller, which allows fine-tuning of the network weights to estimate and compensate for external disturbances. The relevant structural parameters of the manipulator are shown in Table 1. The manipulator model is constructed using Equation (6), and irregular external disturbances, as depicted in Figure 6, are introduced so that the network weights learn disturbances of realistic magnitude. Given the excitation trajectory shown in Figure 7, the disturbance compensator is trained while the prescribed performance controller designed in this paper performs the tracking. When the tracking error satisfies $\|Z_1\| \le E_A$, training is stopped and the trained actor and critic neural network weights are deployed in dSPACE. Taking the boom controller training as an example, the training process is shown in Figure 8, consisting of 30 iterations; Figure 8a shows the first iteration and Figure 8b shows the last iteration.
In this subsection, the superiority of the proposed control algorithm is verified through comparative and ablation experiments. First, a PI controller is compared with the prescribed performance controller based on neural network compensation to verify the effectiveness of the prescribed performance function and the neural network in the controller. Second, an ablation experiment compares the prescribed performance controller with and without the trained disturbance compensator to verify the effectiveness of the disturbance compensator. The relevant parameters are given below:
C1: This is a typical Proportional Integral (PI) controller with control gain k P = diag 55 ,   85 , k I = diag 35 ,   75 .
C2: This is the prescribed performance controller based on neural network compensation. The constraint boundaries are the same for the boom and the arm and are set as $k_{a1} = 0.8e^{-0.7t} + 0.8$, $k_{a2} = 1.3e^{-0.8t} + 1.6$, $k_{a3} = 1.8e^{-0.4t} + 2.1$. The controller parameters are selected through a combination of simulation and experimentation: $k_{11} = k_{21} = k_{31} = \mathrm{diag}(16, 16)$, $k_{12} = k_{22} = k_{32} = \mathrm{diag}(24, 24)$, $\gamma_1 = 1.2$, $\gamma_2 = 3$. The number of nodes in the hidden layer of the RBF neural network is set to 32, the initial neural network weights are $W_{1i0} = W_{2i0} = 0.2$, $i = 1, 2, \ldots, 32$, the initial approximation errors are $\vartheta_{10} = \vartheta_{20} = 0$, $\eta_{w1} = \eta_{w2} = 1$, $\eta_{\vartheta1} = \eta_{\vartheta2} = 210$, $\sigma_{\vartheta1} = \sigma_{\vartheta2} = 0.16$, and the Gaussian centers are distributed over the interval [−1.5, 1.5].
C3: This is the prescribed performance controller with the addition of the disturbance compensator, and its training parameters are: h c = h a = 0.01 , λ = diag [ 0.9 , 0.9 , 0.9 ] . Its base control parameters are the same as those in C2 to ensure fairness in comparison.
In the experiment, time-varying sinusoidal signals are given to the boom and arm to simulate the normal operation of the hydraulic manipulator, and a 20 kg sandbag is attached to the tip of the bucket teeth to simulate the load during normal operation. The signals are as follows:
$q_d = \left[10\sin(0.2t + 8) + 25, \; 25\sin(0.2t + 10.5) - 59.8\right]^T$
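For reference, the snippet below samples the desired trajectories of Equation (99) at the 10 ms controller period; the 40 s time span is an assumption made only for illustration, and the angles are taken to be in degrees.

```python
import numpy as np

t = np.arange(0.0, 40.0, 0.01)                    # 10 ms sampling over an assumed 40 s run
q_d_boom = 10.0 * np.sin(0.2 * t + 8.0) + 25.0    # boom reference of Eq. (99)
q_d_arm = 25.0 * np.sin(0.2 * t + 10.5) - 59.8    # arm reference of Eq. (99)
q_d = np.stack([q_d_boom, q_d_arm])               # desired joint angles, shape (2, N)
print(q_d[:, :3])
```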
Figure 9 shows the desired and tracking trajectories of the boom and arm under the three controllers. Figure 10 depicts the angular tracking errors of the boom and arm. Figure 11 illustrates the compensation signals for the boom and arm generated by the disturbance compensator. Figure 12 presents the neural network estimates of the unmodeled errors for the boom and arm. The experimental results show that, for continuously varying sinusoidal signals, the tracking performance of C3 and C2 significantly outperforms that of C1. This improvement stems from the fact that C2 incorporates a prescribed performance function to constrain the system error while ensuring stability, and the RBF neural network estimates and compensates for internal system uncertainties, thereby mitigating, to a certain extent, the manipulator’s inherent strong joint coupling and hydraulic parameter uncertainties, which the PI controller cannot resolve. Consequently, C2 achieves significantly higher control accuracy than C1. Furthermore, as shown in the error results of Figure 10, C3 demonstrates markedly better control accuracy than C2, because C3 adds a disturbance compensator on top of C2 and can therefore compensate for disturbances encountered during the hydraulic manipulator’s operation; in this experiment, a 20 kg sandbag was introduced to simulate the external disturbance. However, the prescribed performance controller is inherently robust and already suppresses part of the disturbance during the motion of this heavy-duty manipulator, whose large mass and time-varying inertia, Coriolis, and gravity matrices dominate the dynamics. As a result, the improvement in control accuracy achieved by C3 over C2 is less pronounced than the improvement C2 achieves over C1.
To further analyze the performance of each controller, this paper employs widely used evaluation metrics: the maximum absolute value, mean, and standard deviation of tracking error as assessment criteria [34], expressed as:
$M_e = \max_{i=1,\ldots,N}\left|z_1(i)\right|, \qquad \mu = \dfrac{1}{N}\sum_{i=1}^{N}\left|z_1(i)\right|, \qquad \sigma = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\left(\left|z_1(i)\right| - \mu\right)^2}$
where M e is the maximum absolute value error; μ is the mean error; and σ is the standard deviation.
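The metrics of Equation (100) can be computed directly from a recorded error trace, as sketched below; the example input is made-up data, and the Table 2 values come from the experimental records, not from this snippet.

```python
import numpy as np

def error_metrics(z1):
    """Maximum absolute error, mean error, and standard deviation of Eq. (100)."""
    e = np.abs(np.asarray(z1, dtype=float))
    M_e = e.max()
    mu = e.mean()
    sigma = np.sqrt(np.mean((e - mu) ** 2))
    return M_e, mu, sigma

print(error_metrics([0.3, -0.5, 0.2, 0.4, -0.1]))   # illustrative error trace
```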
The results are shown in Table 2. The quantified data show that, because the prescribed performance controller imposes boundary constraints, the neural network compensates for unmodeled internal errors, and the control architecture’s inherent robustness suppresses part of the disturbances, C2 reduces the average tracking error of the boom and arm by 61.3% and 61.8%, respectively, compared with C1, and reduces the standard deviation by 59.7% and 60.8%, respectively. Furthermore, since C3 adds a disturbance compensator on top of C2, its accuracy improvement is even larger: C3 reduces the mean tracking error of the boom and arm by 65% and 64.7%, respectively, compared with C1, and reduces the standard deviation by 63% and 62.8%, respectively. To demonstrate the effectiveness of the disturbance compensator more clearly, comparing C3 with C2 shows that C3 reduces the average tracking error by 9.6% and 7.5% for the boom and arm, respectively, and the standard deviation by 8.2% and 5.2%, respectively, fully validating the effectiveness of the disturbance compensator.
Through the aforementioned experiments, the high-precision motion control algorithm for hydraulic manipulators proposed in this paper demonstrates that by defining a performance function, errors can be stabilized within a preset range. Concurrently, the RBF neural network compensates for internal system uncertainties, achieving a significant improvement in precision compared to PID control algorithms. Furthermore, a disturbance compensator based on the Actor–Critic reinforcement learning architecture was introduced. Ablation experiments confirmed the effectiveness of this disturbance compensator.

6. Conclusions

This paper addresses the challenge of achieving high-precision motion control for hydraulic manipulators with strong electromechanical-hydraulic coupling. A high-precision motion control algorithm with a dual compensation mechanism and prescribed performance is designed, significantly enhancing the system’s control accuracy. The following key conclusions are drawn:
(1)
Within a prescribed performance control architecture based on the backstepping method, a neural network is introduced to estimate and compensate for the internal system uncertainties, namely the unmodeled mechanical and hydraulic errors. The network weight update law is obtained through a global stability proof, significantly enhancing the system control accuracy.
(2)
An Actor–Critic reinforcement learning architecture is employed to design a disturbance compensator for the external system uncertainties, namely external disturbances. The network’s online update law was obtained using local stability, further enhancing the system’s motion control precision.
(3)
Through a hydraulic manipulator experimental platform, the proposed control algorithm demonstrated a 60–65% improvement in control accuracy compared to the PID algorithm. Ablation experiments confirmed that the disturbance compensator designed using the reinforcement learning Actor–Critic architecture further enhances the hydraulic manipulator’s control accuracy by 7–10%, validating its effectiveness.
Furthermore, due to the network’s local updates and proven optimality, the disturbance compensator is not only applicable to the proposed algorithm but can also be extended to adaptive robust control algorithms to further enhance system motion control accuracy.

Author Contributions

Methodology, X.Q.; Software, Y.L.; Validation, Y.L.; Investigation, Y.L.; Resources, Y.L.; Writing—original draft, Y.L.; Writing—review & editing, X.Q.; Supervision, X.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Yuhe Li, employed by the Tianjin Research Institute of Construction Machinery Co., Ltd., Tianjin 300409, declares no financial or non-financial conflicts of interest related to the submitted work.

References

  1. Phan, V.D.; Ahn, K.K. Fault-Tolerant Control for an Electro-Hydraulic Servo System with Sensor Fault Compensation and Disturbance Rejection. Nonlinear Dyn. 2023, 111, 10131–10146. [Google Scholar] [CrossRef]
  2. Islam, R.U.; Iqbal, J.; Manzoor, S.; Khalid, A.; Khan, S. An Autonomous Image-Guided Robotic System Simulating Industrial Applications. In Proceedings of the 2012 7th International Conference on System of Systems Engineering (SoSE), Genova, Italy, 16–19 July 2012; pp. 344–349. [Google Scholar]
  3. Xiang, Y.; Li, R.; Brach, C.; Liu, X.; Geimer, M. A Novel Algorithm for Hydrostatic-Mechanical Mobile Machines with a Dual-Clutch Transmission. Energies 2022, 15, 2095. [Google Scholar] [CrossRef]
  4. Feng, H.; Yin, C.; Ma, W.; Yu, H.; Cao, D. Parameters Identification and Trajectory Control for a Hydraulic System. ISA Trans. 2019, 92, 228–240. [Google Scholar] [CrossRef] [PubMed]
  5. Lv, L.; Chen, Z.; Yao, B. High Precision and High Efficiency Control of Pump and Valves Combined Hydraulic System. In Proceedings of the 2018 IEEE 15th International Workshop on Advanced Motion Control (AMC), Tokyo, Japan, 9–11 March 2018; pp. 391–396. [Google Scholar]
  6. Li, C.; Ding, R.; Cheng, M.; Chen, Z.; Yao, B. Accurate Motion Control of an Independent Metering Actuator with Adaptive Robust Compensation of Uncertainties in Pressure Dynamics. IEEE/ASME Trans. Mechatron. 2024, 29, 3877–3889. [Google Scholar] [CrossRef]
  7. Sha, Y.; Wang, Q.; He, X.; Zhu, X.; Yang, F.; Du, M. Trajectory Tracking Control of Hydraulic Manipulator Based on Fuzzy Compensation. Inf. Control 2021, 50, 184–194. [Google Scholar]
  8. Khan, O.; Pervaiz, M.; Ahmad, E.; Iqbal, J. On the Derivation of Novel Model and Sophisticated Control of Flexible Joint Manipulator. Rev. Roum. Sci. Techn.-Électrotechn. Énerg. 2017, 62, 103–108. [Google Scholar]
  9. Zhu, Y.; Qiao, J.; Guo, L. Adaptive Sliding Mode Disturbance Observer-Based Composite Control with Prescribed Performance of Space Manipulators for Target Capturing. IEEE Trans. Ind. Electron. 2019, 66, 1973–1983. [Google Scholar] [CrossRef]
  10. Sun, Y.; Wan, Y.; Ma, H.; Liang, X. Compensation Control of Hydraulic Manipulator under Pressure Shock Disturbance. Nonlinear Dyn. 2023, 111, 11153–11169. [Google Scholar] [CrossRef]
  11. Wang, Q.; Shen, Y.; Wang, J.; Su, F.; Feng, C.; Li, X. Free-Shape Contour Control for Excavators Based on Cross-Coupling and Double Error Pre-Compensation. Autom. Constr. 2024, 160, 105336. [Google Scholar] [CrossRef]
  12. Li, C.; Lyu, L.; Helian, B.; Chen, Z.; Yao, B. Precision Motion Control of an Independent Metering Hydraulic System With Nonlinear Flow Modeling and Compensation. IEEE Trans. Ind. Electron. 2022, 69, 7088–7098. [Google Scholar] [CrossRef]
  13. Guo, X.; He, X.; Wang, H.; Liu, H.; Sun, X. Model Feedforward Compensation Active Disturbance Rejection Control for Heavy Hydraulic Manipulator. J. S. China Univ. Technol. 2024, 52, 59–67. [Google Scholar]
  14. Yao, J.; Deng, W. Active Disturbance Rejection Adaptive Control of Hydraulic Servo Systems. IEEE Trans. Ind. Electron. 2017, 64, 8023–8032. [Google Scholar] [CrossRef]
  15. Zhang, X.; Shi, G. Dual Extended State Observer-Based Adaptive Dynamic Surface Control for a Hydraulic Manipulator with Actuator Dynamics. Mech. Mach. Theory 2022, 169, 104647. [Google Scholar] [CrossRef]
  16. Shi, D.; Zhang, J.; Sun, Z.; Shen, G.; Xia, Y. Composite Trajectory Tracking Control for Robot Manipulator with Active Disturbance Rejection. Control Eng. Pract. 2021, 106, 104670. [Google Scholar] [CrossRef]
  17. Liang, X.; Yao, Z.; Deng, W.; Yao, J. Adaptive Neural Network Finite-Time Tracking Control for Uncertain Hydraulic Manipulators. IEEE/ASME Trans. Mechatron. 2025, 30, 645–656. [Google Scholar] [CrossRef]
  18. Zhao, Z.; Feng, K.; Wang, X.; Yang, C.; Li, X.; Hong, K.-S. Adaptive NN Control for a Flexible Manipulator with Input Backlash and Output Constraint. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 7472–7481. [Google Scholar] [CrossRef]
  19. Rouvinen, A.; Handroos, H. Deflection Compensation of a Flexible Hydraulic Manipulator Utilizing Neural Networks. Mechatronics 1997, 7, 355–368. [Google Scholar] [CrossRef]
  20. Tran, D.; Truong, H.; Ahn, K.K. Adaptive Nonsingular Fast Terminal Sliding Mode Control of Robotic Manipulator Based Neural Network Approach. Int. J. Precis. Eng. Manuf. 2021, 22, 417–429. [Google Scholar] [CrossRef]
  21. Guo, Q.; Zhang, Y.; Celler, B.G.; Su, S.W. Neural Adaptive Backstepping Control of a Robotic Manipulator with Prescribed Performance Constraint. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3572–3583. [Google Scholar] [CrossRef]
  22. Fan, S.; Wang, S.; Wang, X.; Wang, Q.; Liu, D. Preset Adaptive Finite-Time Control for Electro-Hydrostatic Actuator in Wide Temperature Range. J. Beijing Univ. Aeronaut. Astronaut. 2025, 1–18. [Google Scholar] [CrossRef]
  23. Deng, W.; Zhou, H.; Zhou, J.; Yao, J. Neural Network-Based Adaptive Asymptotic Prescribed Performance Tracking Control of Hydraulic Manipulators. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 285–295. [Google Scholar] [CrossRef]
  24. Wang, S.; Yu, H.; Yu, J.; Na, J.; Ren, X. Neural-Network-Based Adaptive Funnel Control for Servo Mechanisms with Unknown Dead-Zone. IEEE Trans. Cybern. 2020, 50, 1383–1394. [Google Scholar] [CrossRef] [PubMed]
  25. Yang, X.; Deng, W.; Yao, J. Neural Adaptive Dynamic Surface Asymptotic Tracking Control of Hydraulic Manipulators with Guaranteed Transient Performance. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 7339–7449. [Google Scholar] [CrossRef] [PubMed]
  26. Liang, X.; Yao, J. Adaptive Neural Network Force Tracking Control for Hydraulic Manipulator with Output Constraints. Control Theory Appl. 2025, 42, 138–148. [Google Scholar]
  27. Wang, C.; Wang, J.; Ye, J.; Guo, Q.; Li, T. A Reinforcement Learning-Based Optimized Backstepping Control Approach for Uncertain Electro-Hydraulic Systems. Mech. Syst. Signal Process. 2025, 237, 112928. [Google Scholar] [CrossRef]
  28. Yao, Z.; Xu, F.; Jiang, G.; Yao, J. Data-Driven Control of Hydraulic Manipulators by Reinforcement Learning. IEEE/ASME Trans. Mechatron. 2023, 28, 2673–2684. [Google Scholar] [CrossRef]
  29. Yao, Z.; Liang, X.; Jiang, G.; Yao, J. Model-Based Reinforcement Learning Control of Electrohydraulic Position Servo Systems. IEEE/ASME Trans. Mechatron. 2023, 28, 1446–1455. [Google Scholar]
  30. Bu, F.; Yao, B. Nonlinear Adaptive Robust Control of Hydraulic Actuators Regulated by Proportional Directional Control Valves with Deadband and Nonlinear Flow Gains. In Proceedings of the 2000 American Control Conference, Chicago, IL, USA, 28–30 June 2000; pp. 4129–4133. [Google Scholar]
  31. Wang, X.; Shao, H. Theory of RBF Neural Network and Its Application in Control. Inf. Control 1997, 4, 32–44. [Google Scholar]
  32. Xu, K.; Ai, C.; Chen, G.; Chen, J.; Kong, X. Modelling and Motion Control of Hydraulic Manipulator Based on Deep Learning and Reinforcement Learning. Neurocomputing 2026, 669, 132371. [Google Scholar] [CrossRef]
  33. Gao, J.; Fu, Z.; Zhang, S. Adaptive Fixed-Time Attitude Tracking Control for Rigid Spacecraft with Actuator Faults. IEEE Trans. Ind. Electron. 2019, 66, 7141–7149. [Google Scholar] [CrossRef]
  34. Chen, J.; Guo, Y.; Kong, X.; Xu, K.; Ai, C. Trajectory Planning and High-Precision Motion Control of Excavators Based on Independent Metering Hydraulic Configuration. IEEE Trans. Syst. Man Cybern. Syst. 2025, 55, 8689–8700. [Google Scholar] [CrossRef]
Figure 1. Hydraulic manipulator structure diagram.
Figure 2. Manipulator hydraulic schematic diagram.
Figure 3. Performance function boundary and tracking error.
Figure 4. Control block.
Figure 5. Hydraulic manipulator experimental platform.
Figure 6. External disturbance.
Figure 7. Excitation trajectory.
Figure 8. The training process of arm disturbance compensator.
Figure 9. Angle tracking of boom/arm under different controllers.
Figure 10. Angle tracking error of boom/arm under different controllers.
Figure 11. Disturbance compensator compensation signal.
Figure 12. Neural network estimation.
Table 1. Some parameters of the hydraulic manipulator.
A11 = 3.12 × 10⁻³ m²; A12 = 2.4 × 10⁻³ m²; A21 = 2.38 × 10⁻³ m²; A22 = 1.7 × 10⁻³ m²; g = 9.8 m/s²
Pr = 0 Pa; I1 = 28 kg·m²; I2 = 10.2 kg·m²; I3 = 1.95 kg·m²; m1 = 86.6 kg
m2 = 64 kg; m3 = 43 kg; L1 = 1.806 m; L2 = 1.151 m; L3 = 0.535 m
βe = 1 × 10⁹; kv1 = 3.3 × 10⁻⁸; kv2 = 3.3 × 10⁻⁸
Table 2. Control performance index of each controller.
Joint | Controller | Me | μ | σ
Boom | C1 | 1.6104 | 0.6107 | 0.5701
Boom | C2 | 0.7913 | 0.2364 | 0.2296
Boom | C3 | 0.7026 | 0.2136 | 0.2108
Arm | C1 | 1.6073 | 0.6046 | 0.5658
Arm | C2 | 0.7924 | 0.2309 | 0.2217
Arm | C3 | 0.7043 | 0.2136 | 0.2102
