
Critic-Only Learning Based Tracking Control for Uncertain Nonlinear Systems with Prescribed Performance

School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(11), 2545; https://doi.org/10.3390/electronics12112545
Submission received: 25 April 2023 / Revised: 29 May 2023 / Accepted: 2 June 2023 / Published: 5 June 2023

Abstract

A critic-only learning-based tracking control scheme with prescribed performance was proposed for a class of uncertain nonlinear systems. Based on an estimator and an optimal controller, a novel controller was designed to render the tracking errors uniformly ultimately bounded and confined to a prescribed region. First, an unknown system dynamics estimator was employed online to approximate the uncertainty via an invariant manifold. Subsequently, by minimizing a novel cost function, an optimal controller was derived through online learning with a critic-only neural network, which ensured that the tracking errors evolve within the prescribed region while the cost function is minimized. Specifically, the weight update was driven by the weight estimation error, avoiding an actor-critic architecture with a complicated update law. Finally, the stability of the closed-loop system was analyzed by the Lyapunov theorem, and the tracking errors were shown to evolve within the prescribed performance under the optimal controller. The effectiveness of the proposed control was demonstrated with two examples.

1. Introduction

Nowadays, many plants in the real world, such as autonomous underwater vehicles [1], mobile robots [2], and quadrotors [3], can be formulated as uncertain nonlinear systems. Thus, tracking problems for uncertain nonlinear systems have become an important issue, and various control methods have been developed and employed in many physical systems to achieve tracking performance.
To deal with the impact of uncertainties during controller design, a large number of tools have emerged to approximate uncertainties in combination with other controllers, e.g., neural networks (NNs) [4,5], fuzzy logic systems [6,7,8], and disturbance observers [9,10,11]. For example, by integrating with a backstepping control, an adaptive tracking controller is designed in [5] for uncertain nonlinear systems, where an NN is utilized to estimate the uncertain term of the model. To handle the uncertain dynamical model of surface vessels, proportional derivative feedback combined with a fuzzy logic system is proposed in [7] with satisfying tracking performance and theoretical results. Taking the disturbance rejection ability of an observer-based controller into consideration, the tracking performance can be guaranteed. For manipulators, a sliding-mode controller is investigated for tracking problems, where a nonlinear disturbance observer is implemented to predict and remove the effect of disturbance [10]. However, manually tuned parameters limit the application of the above controllers in practice. Recently, the unknown system dynamics estimator (USDE) has emerged as a novel method to deal with the uncertainty and disturbance of nonlinear systems [12], where a filtering operation is applied to the state such that an invariant manifold is constructed for accurate estimation. Different from NNs and fuzzy logic systems with repeated tuning, the estimator only uses the system state and the control input and can achieve rapid convergence of the disturbance estimate by adjusting a single parameter. For the motion control of robot systems, lumped disturbances are estimated through an improved unknown disturbance estimator [13]. In [14], by implementing a USDE to compensate for the disturbance, a sliding-mode control is designed to obtain fast convergence and strong robustness.
Since the USDEs in the above literature are aimed at integral-series systems, they have difficulty handling the strong coupling and multivariable structure of general nonlinear systems. Moreover, it is worth mentioning that the above control methods, with their fixed structure, do not address output constraints such as the convergence rate and the maximum steady-state error, which are important in engineering.
Combined with reinforcement learning (RL), a branch of machine learning, a large number of adaptive controllers have been developed for the stability of closed-loop systems. By integrating RL with dynamic programming, optimal controllers emerge with a learning ability and a balance between tracking performance and control cost; these are generalized as adaptive dynamic programming (ADP), which alleviates the curse of dimensionality. One prevalent structure is actor-critic ADP, which pursues the optimal control and the optimal value function [15,16,17]. However, the tracking error may be required to satisfy a preassigned convergence behavior in practical engineering applications. To address this issue, the introduction of RL algorithms into prescribed performance control (PPC) has attracted attention [18], as it significantly reduces the tracking error and the control input and improves performance [19,20,21]. In [19], a data-driven RL algorithm for performance specification was proposed to simultaneously pursue control methods satisfying optimality and tracking errors meeting output constraints. Combined with fault-tolerant control (FTC), nonlinear systems with output constraints have been considered by RL algorithms [22,23,24]; it is noted that fault-tolerant control is difficult to achieve by RL alone. In [23], for nonlinear systems with actuator faults, a model-free adaptive optimal control method with specified performance was designed, where an adaptive observer is employed to estimate faults and the incremental system parameters are estimated by the recursive least squares identification method. By combining the PPC with ADP, the optimal control strategy is obtained, such that the tracking error satisfies the specified performance.
By introducing an intermediate controller and a fault-tolerant controller based on RL algorithms, a fault-tolerant dynamic surface control algorithm based on actor-critic ADP was proposed in [24] for nonlinear systems with unknown parameters and actuator faults, which avoids the difficulty of applying RL alone to fault-tolerant control. For the manipulator, a robust motion control method with specified performance based on reinforcement learning was proposed in [25]. The measurement noise is eliminated by carefully adding an integral term and adopting a robust generalized proportional integral observer, and an optimal control strategy based on error transformation was designed with the actor-critic ADP method to ensure the stability of the system. For high-order nonlinear multi-agent systems containing uncertainty, the optimal consensus control problem with specified performance is considered in [26], where the stability of the closed-loop system and the convergence of consensus errors within a certain range are proved. Based on an actor-critic network, optimal control is investigated for robots [27] and pure-feedback systems [28] separately. It is noted that the above controllers rely on multiple neural networks to ensure the stability and optimality of the system; the weight update becomes complicated as the number of nodes increases, and many parameters are difficult to adjust. To satisfy the optimal predetermined performance, the main idea is to transform the constrained tracking error into an unconstrained variable by constructing a transformation function, and an approximate optimal control is designed within an actor-critic NN by minimizing the value function related to the unconstrained variable. Although the Hamilton-Jacobi-Isaacs (HJI) equation can be solved to derive the optimal control in [29,30], this approach is deemed conservative since it targets the worst case of disturbance with massive control inputs.
Moreover, existing designs and theoretical analyses of optimal control with preset performance are mostly combined with fault-tolerant control, robust control, and adaptive control. There is no relevant research on USDE-based optimal control that assures a preassigned convergence rate.
Motivated by the above statements, we propose an optimal control with asymmetric performance constraints under the framework of critic-only ADP by constructing a new value function. The highlights of this article can be expressed as follows:
(1) Compared with the existing observer-based controllers [9,10,11], where the disturbance is estimated with observers by manually adjusting multiple parameters, the USDE was employed to approximate the lumped disturbance of nonlinear uncertain systems with the invariant manifold principle. Moreover, differing from previous function approximator-based control schemes [12,13,14], by incorporating the RL technique into the optimal control design, precise tracking performance and low control cost can be achieved.
(2) Different from the optimal control derived by the actor-critic ADP framework in [24,26,27,28], a critic-only NN was designed to learn the optimal control online without constructing an actor NN. In addition, to achieve the convergence rate within a preassigned region, a novel value function was minimized, such that tracking errors can evolve within the prescribed region with low control consumption. In contrast to the traditional gradient descent for the weight update in actor-critic ADP, the weight law was designed ingeniously to update the weight of the critic NN, reduce the online training computation, and accelerate the weight convergence.
The outline of the article is organized as follows. We first give the definition of the optimal tracking control problem with prescribed performance in Section 2. In Section 3, the main result of the control design is discussed, in which a feedback controller and an optimal controller are proposed. The stability analysis is illustrated in Section 4. Section 5 provides the effectiveness of the proposed controller on two examples.
Throughout this paper, vectors and matrices are represented by bold fonts to distinguish them from scalars. $0$ represents the zero matrix and $I$ denotes the identity matrix. $\mathrm{diag}(x)$ is the diagonal matrix constructed from the vector $x$. $\lambda_{\min}(\cdot)$ is the minimum eigenvalue of the corresponding matrix.

2. Preliminaries

The following multiple-input–multiple-output system with disturbance is considered:
$\dot{x} = f(x) + g(x)u + d(t) \quad (1)$
where $x = [x_1, x_2, \ldots, x_n]^T$ denotes the measurable state vector and $u = [u_1, u_2, \ldots, u_m]^T$ represents the control input vector; $f(x)$ and $d(t)$ are uncertain due to modelling errors and disturbance caused by the environment; the matrix $g(x)$, which represents the input dynamics, is precisely known. Given a bounded reference command $x_d$, the dynamic system of tracking errors can be derived from System (1) as:
$\dot{e} = \dot{x} - \dot{x}_d = f(x) + g(x)u + d(t) - \dot{x}_d \quad (2)$
Due to the Lipschitz continuity of $f(x)$, $g(x)$, and $d(t)$, System (2) is stabilizable [31].
The goal of the paper is to construct the optimal tracking control, such that the tracking errors are limited within a prescribed region while minimizing the novel value function. Specifically, the prescribed region is denoted by the following inequality:
$-\vartheta_{li}(t) < e_i(t) < \vartheta_{ui}(t), \quad i = 1, \ldots, n \quad (3)$
where $\vartheta_{li}(t) > 0$ and $\vartheta_{ui}(t) > 0$ $(i = 1, 2, \ldots, n)$ are predefined envelope functions with the specific expressions:
$\vartheta_{li}(t) = \underline{l}\,\vartheta_i(t) = \underline{l}\left[(\vartheta_{i0} - \vartheta_{i\infty})e^{-a_i t} + \vartheta_{i\infty}\right] \quad (4)$
$\vartheta_{ui}(t) = \bar{l}\,\vartheta_i(t) = \bar{l}\left[(\vartheta_{i0} - \vartheta_{i\infty})e^{-a_i t} + \vartheta_{i\infty}\right] \quad (5)$
with $a_i > 0$ representing the lower bound on the rate of convergence and $\vartheta_{i0} > \vartheta_{i\infty} > 0$, $\underline{l} > 0$, $\bar{l} > 0$. It should be noted that the uncertain $f(x)$ and $d(t)$ can be considered as the lumped disturbance $\delta = f(x) + d(t)$. On account of the initial state $x(0) = x_0$, the following assumptions are necessary to achieve the controller design.
Assumption 1.
There exists a positive constant $\Delta$ satisfying $\|\dot{\delta}\| \le \Delta$.
Assumption 2.
$\vartheta_{li}(0)$ and $\vartheta_{ui}(0)$ are chosen such that (3) holds at $t = 0$.
Remark 1.
There exist many plants, such as autonomous underwater vehicles [1] and quadrotors [3], which can be modeled as System (1) with disturbance. Unlike the controllers investigated in [1,3], which disregard control consumption, the proposed optimal tracking controller was designed to achieve the stability of the closed-loop system with a moderate control input.
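To make the envelope construction concrete, the following minimal Python sketch (function names are illustrative, not from the paper) evaluates the bounds of Equations (4) and (5) and checks the initial-feasibility requirement of Assumption 2:

```python
import numpy as np

def envelope(t, theta0, theta_inf, a, l_lower, l_upper):
    """Prescribed-performance bounds of Eqs. (4)-(5):
    theta(t) = (theta0 - theta_inf) * exp(-a*t) + theta_inf decays from
    theta0 to theta_inf at rate a; the tracking error must satisfy
    -l_lower * theta(t) < e(t) < l_upper * theta(t)."""
    theta = (theta0 - theta_inf) * np.exp(-a * t) + theta_inf
    return -l_lower * theta, l_upper * theta

def initially_feasible(e0, theta0, theta_inf, a, l_lower, l_upper):
    """Assumption 2: the initial tracking error lies inside the envelope."""
    lo, hi = envelope(0.0, theta0, theta_inf, a, l_lower, l_upper)
    return lo < e0 < hi
```

With the first-channel parameters of Example 1 ($\underline{l}_1 = 2$, $\bar{l}_1 = 3$, $\vartheta_{10} = 3$, $\vartheta_{1\infty} = 0.2$, $a_1 = 3$), the bounds start at $(-6, 9)$ and shrink toward $(-0.4, 0.6)$.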

3. Results

In this section, the whole controller consisting of a feedback controller with an estimator and an optimal controller is given. Firstly, the feedback controller is derived by a USDE, which is employed for estimating lumped disturbances. Based on it, the optimal tracking controller is designed within critic-only ADP while minimizing the novel value function considering prescribed functions (4) and (5). Therefore, the whole controller is constructed as:
$u = u_d + u^* \quad (6)$
where $u_d$ is the feedback controller and $u^*$ is the optimal controller. The specific diagram of the whole controller is detailed in Figure 1.

3.1. Feedback Controller Design

Inspired by [12], a USDE is employed on System (2) to approximate $\delta$. By applying a filter operation to the state and the control input, the following equations can be obtained:
$k\dot{x}_f + x_f = x, \quad x_f(0) = 0$
$k\dot{u}_f + u_f = u, \quad u_f(0) = 0 \quad (7)$
where $k > 0$ can be adjusted to generate the filtered variables $x_f$ and $u_f$.
Lemma 1.
The vector $\chi = \frac{1}{k}(x - x_f) - (gu_f + \delta)$ is bounded and satisfies:
$\lim_{k \to 0}\left\{\lim_{t \to \infty}\left[\frac{1}{k}(x - x_f) - (gu_f + \delta)\right]\right\} = 0 \quad (8)$
Noting that $\chi$ represents the mapping from the filtered variables to the lumped disturbance, there exists an invariant manifold, which further yields:
$\hat{\delta} = \frac{1}{k}(x - x_f) - gu_f \quad (9)$
where $\hat{\delta}$ is the estimate of $\delta$. The estimation error can converge to a neighborhood of zero.
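As a rough numerical illustration of Equations (7) and (9), the following minimal sketch (an assumed forward-Euler discretization; here the input channel is filtered as the combined term $g(x)u$, which matches (9) for constant $g$) shows how the estimate is formed:

```python
import numpy as np

class USDE:
    """Sketch of the unknown system dynamics estimator, Eqs. (7) and (9):
    first-order filters k*xf' + xf = x and k*(gu)f' + (gu)f = g(x)u,
    then delta_hat = (x - xf)/k - (gu)f."""
    def __init__(self, n, k, dt):
        self.k, self.dt = k, dt
        self.xf = np.zeros(n)    # filtered state, xf(0) = 0
        self.guf = np.zeros(n)   # filtered input channel, (gu)f(0) = 0

    def update(self, x, gu):
        # forward-Euler step of the two first-order filters
        self.xf += self.dt * (x - self.xf) / self.k
        self.guf += self.dt * (gu - self.guf) / self.k
        # invariant-manifold estimate of the lumped disturbance
        return (x - self.xf) / self.k - self.guf
```

For a scalar integrator $\dot{x} = u + \delta$ with constant $\delta$ and a sufficiently small integration step, the estimate settles near $\delta$ with time constant $k$.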
Theorem 1.
For nonlinear system (1) with uncertainty, by virtue of the estimate from Equation (9), the estimation error $\tilde{\delta} = \delta - \hat{\delta}$ converges to the following region:
$\|\tilde{\delta}\| \le \sqrt{\|\tilde{\delta}(0)\|^2 e^{-t/k} + k^2\Delta^2} \quad (10)$
Proof of Theorem 1.
By applying the first-order filter operation to System (1) with the lumped disturbance $\delta$, we can derive the following form with the Laplace operator:
$\frac{1}{ks+1}\dot{x} = \frac{1}{ks+1}g(x)u + \frac{1}{ks+1}\delta \quad (11)$
Based on nonlinear system (1) and filter operation (7), it can be deduced that:
$\dot{x}_f = \frac{1}{k}(x - x_f) = g(x)u_f + \delta_f \quad (12)$
where $\delta_f = \delta/(ks+1)$.
From Equation (9), one has $\delta_f = \hat{\delta}$. Therefore, the estimation error can be expressed as:
$\tilde{\delta} = \delta - \delta_f = \frac{ks}{ks+1}\delta \quad (13)$
Combined with Equations (7) and (13), we deduce the time derivative of the estimation error as:
$\dot{\tilde{\delta}} = \dot{\delta} - \dot{\delta}_f = \dot{\delta} - \frac{1}{k}(\delta - \delta_f) = \dot{\delta} - \frac{1}{k}\tilde{\delta} \quad (14)$
Then, the estimation error can be analyzed by the following candidate function:
$V_1 = \frac{1}{2}\tilde{\delta}^T\tilde{\delta} \quad (15)$
Taking the derivative with respect to time and following Equation (14), one has:
$\dot{V}_1 = \tilde{\delta}^T\dot{\tilde{\delta}} = \tilde{\delta}^T\left(\dot{\delta} - \frac{1}{k}\tilde{\delta}\right) = \tilde{\delta}^T\dot{\delta} - \frac{1}{k}\tilde{\delta}^T\tilde{\delta} \quad (16)$
Applying Young's inequality to the cross term yields:
$\dot{V}_1 \le \frac{1}{2k}\|\tilde{\delta}\|^2 + \frac{k\Delta^2}{2} - \frac{1}{k}\tilde{\delta}^T\tilde{\delta} = -\frac{1}{k}V_1 + \frac{k\Delta^2}{2} \quad (17)$
which further gives:
$V_1(t) \le V_1(0)e^{-t/k} + \frac{k^2\Delta^2}{2} \quad (18)$
Thus, the estimation error converges to the following region:
$\|\tilde{\delta}\| = \sqrt{2V_1(t)} \le \sqrt{\|\tilde{\delta}(0)\|^2 e^{-t/k} + k^2\Delta^2} \quad (19)$
According to the above discussion, as time goes to infinity the estimation error converges to a residual set of radius $k\Delta$, which can be made arbitrarily small by decreasing the positive constant $k$. □
To eliminate the effect of the lumped disturbance, the feedback controller is constructed as:
$u_d = G\left[\dot{x}_d - \hat{\delta} - Ke\right] \quad (20)$
where $G = (g^Tg)^{-1}g^T$ is the left pseudo-inverse of $g$ and $K$ is a positive definite control gain matrix. Substituting (20) into (2) induces the tracking-error dynamics:
$\dot{e} = -Ke + g(x)u^* + \tilde{\delta} \quad (21)$
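A minimal sketch of the feedback term (20) follows (names are illustrative; NumPy's `pinv` plays the role of $G$ when $g$ has full column rank):

```python
import numpy as np

def feedback_control(g, xd_dot, delta_hat, K, e):
    """u_d = G [xd_dot - delta_hat - K e], Eq. (20), with
    G = (g^T g)^{-1} g^T, the left pseudo-inverse of g."""
    G = np.linalg.pinv(g)  # equals (g^T g)^{-1} g^T for full column rank g
    return G @ (xd_dot - delta_hat - K @ e)
```

This term cancels the reference dynamics and the estimated disturbance, leaving only the error dynamics of (21) for the optimal controller to shape.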
In the next section, we will consider designing the controller by defining a novel value function that balances control consumption and tracking performance.

3.2. Optimal Controller Design

To stabilize System (21), a value function is predefined to balance the control cost and the tracking performance. In addition, the tracking error is required to satisfy the predefined envelope functions, namely prescribed performance control. The PPC can be regarded as a constraint on the convergence rate and the overshoot of the tracking error. Generally, by introducing an error transformation function, the constrained dynamics of the tracking error are converted to an equivalent unconstrained model. Different from introducing a transformation function, we first consider the tracking error directly:
$r_s = e^TQe \quad (22)$
where $Q \in \mathbb{R}^{n \times n}$ is a positive definite matrix. Since $r_s$ incorporates the actual tracking error, it depicts the distance from the desired trajectory.
In addition, the corresponding term related to control u * is added to the value function to achieve a tradeoff between tracking performance and control consumption in a quadratic form:
$r_u = u^{*T}Ru^* \quad (23)$
where the positive definite matrix $R \in \mathbb{R}^{m \times m}$ denotes the weight of the control expense. Terms such as (22) and (23) are commonly constructed in the utility function for controller design, as can be found in [15,16,17].
Last but not least, we take the PPC into consideration in the value function design. With prescribed performance, the evolvement of tracking error e must be within the range of predefined behavior. Furthermore, the maximum overshoot should be kept away from exceeding the predetermined performance, which may cause the damage to facilities. Thus, the tracking error can be constrained within the predetermined region and satisfy:
$r_c = e^TQ_cCe \quad (24)$
where $Q_c \in \mathbb{R}^{n \times n}$ is a positive definite matrix and the matrix $C$ is denoted as:
$C = \mathrm{diag}\left(\ln\frac{\vartheta_{ui}(t) - e_i(t)}{\vartheta_{li}(t) + e_i(t)}\right), \quad i = 1, 2, \ldots, n \quad (25)$
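The three cost terms (22)-(24) can be sketched together as follows (a hedged illustration; `theta_l` and `theta_u` stand for the current envelope values $\vartheta_{li}(t)$ and $\vartheta_{ui}(t)$):

```python
import numpy as np

def running_cost(e, u, Q, R, Qc, theta_l, theta_u):
    """r = e^T Q e + u^T R u + e^T Qc C e, where
    C = diag(ln((theta_u - e)/(theta_l + e))); the log argument
    degenerates as e approaches either envelope bound."""
    C = np.diag(np.log((theta_u - e) / (theta_l + e)))
    return e @ Q @ e + u @ R @ u + e @ Qc @ C @ e
```

At $e = 0$ the barrier contribution $e^TQ_cCe$ vanishes, so the cost reduces to the usual quadratic tracking/control trade-off.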
In view of the above analysis, the value function integrand is predefined by Equation (26), consisting of the corresponding characterizations of the tracking error, the control, and the PPC:
$r = r_s + r_u + r_c \quad (26)$
The optimal control is designed by minimizing the following value function:
$V(e) = \int_t^{\infty} r(e(\tau), u^*(\tau))\,d\tau \quad (27)$
where $r$ is defined in Equation (26). On the basis of Equation (27), the Hamiltonian can be derived as:
$H(e, u^*, V) = V_e^T\left[-Ke + gu^* + \tilde{\delta}\right] + r_s + r_u + r_c \quad (28)$
where $V_e$ is the partial derivative of $V$ with respect to $e$. By taking the derivatives of Equations (27) and (21), the HJB equation can be derived as:
$H(e, u^*, V^*) = V_e^{*T}\left[-Ke + gu^* + \tilde{\delta}\right] + r_s + u^{*T}Ru^* + r_c = 0 \quad (29)$
In order to learn an optimal control strategy, one can derive the following equation from $\partial H/\partial u^* = 0$:
$u^* = -\frac{1}{2}R^{-1}g^TV_e^* \quad (30)$
By substituting Equation (30) into Equation (29), the Hamilton–Jacobi–Bellman (HJB) equation can be rewritten as:
$V_e^{*T}\left[-Ke + \tilde{\delta}\right] + r_c + r_s - \frac{1}{4}V_e^{*T}gR^{-1}g^TV_e^* = 0 \quad (31)$
Since Equation (31) is a nonlinear equation of the optimal value function, the optimal control cannot be obtained directly from Equation (30). Referring to the core of ADP, an NN can be introduced to approximate the optimal value function and its derivative:
$V^*(e) = W^T\sigma(e) + \varepsilon \quad (32)$
$V_e^*(e) = \nabla\sigma(e)^TW + \nabla\varepsilon \quad (33)$
where $W$ and $\sigma$ are the ideal weight and the activation function; $\varepsilon$ is the approximation error; $\nabla\sigma(e)$ and $\nabla\varepsilon$ are the gradients of the activation function and the approximation error.
Noting that the ideal weight is unknown, the approximate optimal value function can be estimated by:
$\hat{V} = \hat{W}^T\sigma(e) \quad (34)$
In addition, the approximate optimal control can be derived as:
$\hat{u} = -\frac{1}{2}R^{-1}g^T\nabla\sigma(e)^T\hat{W} \quad (35)$
Denoting $\psi = r_s + r_u + r_c$ and $\Phi = \nabla\sigma\left(-Ke + g\hat{u}\right)$, one can obtain:
$\psi = -\Phi^TW - \varepsilon_{HJB} \quad (36)$
By denoting the filtered matrices:
$\dot{N} = -\ell N + \Phi\Phi^T, \quad N(0) = 0$
$\dot{S} = -\ell S + \Phi\psi, \quad S(0) = 0 \quad (37)$
with $\ell > 0$ a forgetting factor, the auxiliary matrix is introduced for designing the weight law:
$P = N\hat{W} + S \quad (38)$
From Equation (37), there holds:
$N(t) = \int_0^t e^{-\ell(t-r)}\Phi\Phi^T\,dr, \quad S(t) = \int_0^t e^{-\ell(t-r)}\Phi\psi\,dr \quad (39)$
Then, we can derive the following equation with Equation (36):
$S = -NW + \varsigma = -NW - \int_0^t e^{-\ell(t-r)}\Phi\varepsilon_{HJB}\,dr \quad (40)$
Thus, one has:
$P = N\hat{W} + S = -N\tilde{W} + \varsigma \quad (41)$
where $\tilde{W} = W - \hat{W}$ is the weight estimation error, inducing the weight law:
$\dot{\hat{W}} = -\Upsilon P \quad (42)$
where $\Upsilon$ represents a positive gain. The convergence of the weight can be derived by the following Lyapunov function:
$V_W = \frac{1}{2}\tilde{W}^T\Upsilon^{-1}\tilde{W} \quad (43)$
whose derivative with respect to time satisfies:
$\dot{V}_W = \tilde{W}^T\Upsilon^{-1}\dot{\tilde{W}} = \tilde{W}^T(-N\tilde{W} + \varsigma) \le -\gamma\|\tilde{W}\|^2 + \|\varsigma\|\|\tilde{W}\| = -\|\tilde{W}\|\left(\gamma\|\tilde{W}\| - \varepsilon_\varsigma\right) \quad (44)$
where $\gamma = \lambda_{\min}(N)$ and $\|\varsigma\| \le \varepsilon_\varsigma$.
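The filtered-regressor weight law of Equations (37), (38), and (42) can be illustrated with a minimal Euler-discretized sketch (the class name, discretization, and scalar gain are assumptions, not from the paper):

```python
import numpy as np

class CriticUpdate:
    """Critic-only weight law: dN = -ell*N + Phi Phi^T, dS = -ell*S + Phi*psi,
    P = N W_hat + S, dW_hat = -upsilon * P.  Since psi ~ -Phi^T W - eps_HJB,
    P = -N(W - W_hat) + varsigma, so P drives W_hat toward the ideal W."""
    def __init__(self, n_w, ell, upsilon, dt):
        self.N = np.zeros((n_w, n_w))
        self.S = np.zeros(n_w)
        self.W_hat = np.zeros(n_w)
        self.ell, self.upsilon, self.dt = ell, upsilon, dt

    def step(self, Phi, psi):
        # Euler steps of the filtered matrices (37)
        self.N += self.dt * (-self.ell * self.N + np.outer(Phi, Phi))
        self.S += self.dt * (-self.ell * self.S + Phi * psi)
        # auxiliary matrix (38) and weight law (42)
        P = self.N @ self.W_hat + self.S
        self.W_hat += self.dt * (-self.upsilon * P)
        return self.W_hat
```

With a persistently exciting $\Phi$, $\lambda_{\min}(N)$ stays positive on average and $\hat{W}$ converges without an actor network or gradient-descent tuning.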

4. Stability Analysis

Based on the above theoretical results on the disturbance estimation error and the weight estimation error, the stability of the closed-loop system can be proven with prescribed performance.
Theorem 2.
For nonlinear uncertain system (1) with constraint (3) and the optimal control designed with weight update (42), the tracking error is uniformly ultimately bounded and evolves within the predefined region.
Proof of Theorem 2.
According to Equations (21) and (35), the dynamics of the tracking error can be obtained:
$\dot{e} = -Ke + g\hat{u} + \tilde{\delta} = -Ke - \frac{1}{2}gR^{-1}g^T\nabla\sigma^T\hat{W} + \frac{1}{2}gR^{-1}g^T\left(\nabla\sigma^TW + \nabla\varepsilon\right) + gu^* + \tilde{\delta} = -Ke + \frac{1}{2}gR^{-1}g^T\nabla\sigma^T\tilde{W} + \frac{1}{2}gR^{-1}g^T\nabla\varepsilon + gu^* + \tilde{\delta} \quad (45)$
Consider the Lyapunov candidate:
$L = \frac{1}{2}\tilde{W}^T\Upsilon^{-1}\tilde{W} + \Gamma_1e^Te + \Gamma_2V^* + \Gamma_3\varsigma^T\varsigma = L_1 + L_2 + L_3 \quad (46)$
where $V^*$ is the optimal value function and $\Gamma_1, \Gamma_2, \Gamma_3 > 0$.
Firstly, due to the weight convergence of the critic NN and Young's inequality, one has:
$\dot{L}_1 = \tilde{W}^T\Upsilon^{-1}\dot{\tilde{W}} = \tilde{W}^T(-N\tilde{W} + \varsigma) \le -\gamma\|\tilde{W}\|^2 + \|\varsigma\|\|\tilde{W}\| \le -\left(\gamma - \frac{1}{2\tau\Gamma_3}\right)\|\tilde{W}\|^2 + \frac{\tau\Gamma_3}{2}\|\varsigma\|^2 \quad (47)$
Then,
$\dot{L}_2 = 2\Gamma_1e^T\dot{e} + \Gamma_2\dot{V}^* = 2\Gamma_1e^T\dot{e} + \Gamma_2\left(-e^TQe - u^{*T}Ru^* - r_c\right)$
$= 2\Gamma_1e^T\left[-Ke + \frac{1}{2}gR^{-1}g^T\nabla\sigma^T\tilde{W} + \frac{1}{2}gR^{-1}g^T\nabla\varepsilon + gu^* + \tilde{\delta}\right] + \Gamma_2\left(-e^TQe - u^{*T}Ru^* - r_c\right)$
$\le -\left[2\lambda_{\min}(K)\Gamma_1 + \Gamma_2\lambda_{\min}(Q) - \left(\|gR^{-1}g^T\nabla\sigma^T\| + \|gR^{-1}g^T\| + 2\right)\Gamma_1\right]\|e\|^2 + \frac{\Gamma_1}{4}\|gR^{-1}g^T\nabla\varepsilon\|^2 + \frac{\Gamma_1}{4}\|gR^{-1}g^T\nabla\sigma^T\|\|\tilde{W}\|^2 + \Gamma_1\|\tilde{\delta}\|^2 - \Gamma_2r_c - \left(\Gamma_2\lambda_{\min}(R) - \Gamma_1\|g\|^2\right)\|u^*\|^2 \quad (48)$
In addition,
$\dot{L}_3 = 2\Gamma_3\varsigma^T\dot{\varsigma} = 2\Gamma_3\varsigma^T\left(-\ell\varsigma + \Phi\varepsilon_{HJB}\right)$
$= 2\Gamma_3\varsigma^T\left\{-\ell\varsigma + \Phi\left[\left(\nabla\sigma^TW + \nabla\varepsilon\right)^T\tilde{\delta} - \frac{1}{2}\nabla\varepsilon^TgR^{-1}g^T\nabla\sigma^T\hat{W} - \nabla\varepsilon^TKe\right]\right\}$
$\le -\Gamma_3\left(2\ell - 3\tau\right)\|\varsigma\|^2 + \frac{\Gamma_3}{\tau}\left\|\Phi\left(\nabla\sigma^TW + \nabla\varepsilon\right)^T\right\|^2\|\tilde{\delta}\|^2 + \frac{\Gamma_3}{\tau}\left\|\Phi\nabla\varepsilon^TK\right\|^2\|e\|^2 + \frac{\Gamma_3}{4\tau}\left\|\Phi\nabla\varepsilon^TgR^{-1}g^T\nabla\sigma^T\hat{W}\right\|^2 \quad (49)$
Lastly, the derivative of the whole Lyapunov function can be derived as:
$\dot{L} = \dot{L}_1 + \dot{L}_2 + \dot{L}_3 \le -\left[\gamma - \frac{1}{2\tau\Gamma_3} - \frac{\Gamma_1}{4}\|gR^{-1}g^T\nabla\sigma^T\|\right]\|\tilde{W}\|^2 - \left(\Gamma_2\lambda_{\min}(R) - \Gamma_1\|g\|^2\right)\|u^*\|^2 - \left[2\lambda_{\min}(K)\Gamma_1 + \Gamma_2\lambda_{\min}(Q) - \left(\|gR^{-1}g^T\nabla\sigma^T\| + \|gR^{-1}g^T\| + 2\right)\Gamma_1 - \frac{\Gamma_3}{\tau}\|\Phi\nabla\varepsilon^TK\|^2\right]\|e\|^2 - \Gamma_3\left(2\ell - \frac{7\tau}{2}\right)\|\varsigma\|^2 + \frac{\Gamma_1}{4}\|gR^{-1}g^T\nabla\varepsilon\|^2 + \frac{\Gamma_3}{4\tau}\|\Phi\nabla\varepsilon^TgR^{-1}g^T\nabla\sigma^T\hat{W}\|^2 + \left(\Gamma_1 + \frac{\Gamma_3}{\tau}\|\Phi(\nabla\sigma^TW + \nabla\varepsilon)^T\|^2\right)\|\tilde{\delta}\|^2 - \Gamma_2r_c \quad (50)$
If $\Gamma_2\lambda_{\min}(R) - \Gamma_1\|g\|^2 > 0$, Equation (50) can be written in the following form:
$\dot{L} \le -c_1\|\tilde{W}\|^2 - c_2\|e\|^2 - c_3\|\varsigma\|^2 + \rho \quad (51)$
where
$c_1 = \gamma - \frac{1}{2\tau\Gamma_3} - \frac{\Gamma_1}{4}\|gR^{-1}g^T\nabla\sigma^T\|$
$c_2 = 2\lambda_{\min}(K)\Gamma_1 + \Gamma_2\lambda_{\min}(Q) - \left(\|gR^{-1}g^T\nabla\sigma^T\| + \|gR^{-1}g^T\| + 2\right)\Gamma_1 - \frac{\Gamma_3}{\tau}\|\Phi\nabla\varepsilon^TK\|^2$
$c_3 = \Gamma_3\left(2\ell - \frac{7\tau}{2}\right)$
$\rho = \frac{\Gamma_1}{4}\|gR^{-1}g^T\nabla\varepsilon\|^2 + \frac{\Gamma_3}{4\tau}\|\Phi\nabla\varepsilon^TgR^{-1}g^T\nabla\sigma^T\hat{W}\|^2 + \left(\Gamma_1 + \frac{\Gamma_3}{\tau}\|\Phi(\nabla\sigma^TW + \nabla\varepsilon)^T\|^2\right)\|\tilde{\delta}\|^2 - \Gamma_2r_c$
Noting that $r_c$ is bounded, the boundedness of $\rho$ follows from the boundedness of the disturbance estimation error, the weight estimation error, and the HJB error. Thus, the parameters of the controller design should satisfy:
$\Gamma_1 < \frac{4\gamma}{\|gR^{-1}g^T\nabla\sigma^T\|}$
$\tau > \max\left(\frac{1}{\Gamma_3\left(2\gamma - \frac{1}{2}\Gamma_1\|gR^{-1}g^T\nabla\sigma^T\|\right)}, \frac{\Gamma_3\|\Phi\nabla\varepsilon^TK\|^2}{2\lambda_{\min}(K)\Gamma_1}\right)$
$\Gamma_2 > \max\left(\frac{\left(\|gR^{-1}g^T\nabla\sigma^T\| + \|gR^{-1}g^T\| + 2\right)\Gamma_1 + \frac{\Gamma_3}{\tau}\|\Phi\nabla\varepsilon^TK\|^2 - 2\lambda_{\min}(K)\Gamma_1}{\lambda_{\min}(Q)}, \frac{\Gamma_1\|g\|^2}{\lambda_{\min}(R)}\right)$
$\lambda_{\min}(K) > \frac{\|gR^{-1}g^T\| + \|gR^{-1}g^T\nabla\sigma^T\| + 2}{2}$
$\ell > \frac{7\tau}{4}, \quad \Gamma_3 > 0$
such that c 1 , c 2 , c 3 > 0 , resulting in the uniform ultimate boundedness of the tracking error.
For arbitrary $|r_c| \le \varepsilon_r$ $(\varepsilon_r > 0)$, there exists:
$\left|\ln\frac{\vartheta_{ui}(t) - e_i(t)}{\vartheta_{li}(t) + e_i(t)}\right| \le \varepsilon_r$
which indicates that the PPC is not violated. □

5. Simulations

In this section, we implemented the following examples to demonstrate the effectiveness and superiority of the investigated control scheme. Without loss of generality, the sampling period was 5 ms, and the ode4 solver was selected during the simulations.
Example 1.
In order to verify the design of the optimal tracking controller with preset performance, a second-order nonlinear system is considered:
$\dot{x}_1 = -x_1 + x_2$
$\dot{x}_2 = -0.5x_1 - 0.5x_2\left(1 - (\cos(2x_1) + 2)^2\right) + (\cos(2x_1) + 2)u \quad (52)$
The given tracking commands are $x_{1d} = \sin(t)$ and $x_{2d} = \cos(t) + \sin(t)$, and the tracking error should meet the following constraints:
$-\vartheta_{li}(t) < e_i(t) < \vartheta_{ui}(t), \quad i = 1, 2$
where $\underline{l}_1 = 2$, $\bar{l}_1 = 3$, $\vartheta_{10} = 3$, $\vartheta_{1\infty} = 0.2$, $a_1 = 3$ and $\underline{l}_2 = 1.8$, $\bar{l}_2 = 1.5$, $\vartheta_{20} = 3$, $\vartheta_{2\infty} = 0.25$, $a_2 = 2$. The initial state is $x(0) = [3, 1]^T$. For verifying the devised controller, we assume that:
$f = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} = \begin{bmatrix} -x_1 + x_2 \\ -0.5x_1 - 0.5x_2\left(1 - (\cos(2x_1) + 2)^2\right) \end{bmatrix}$
is uncertain, and the disturbance $d(t) = [0, \sin(2t)]^T$ is added.
In order to achieve the control objective, the proposed optimal tracking controller is realized based on a feedback control and an optimal control. The feedback control is based on a USDE to compensate for the influence of the disturbance on the system. The optimal regulation law minimizes value function (26) under the ADP framework, where $Q$, $R$, and $Q_c$ are identity matrices of the respective dimensions. In order to approximate the optimal value function, the activation function was selected as $\sigma = [e_1^2, e_1e_2, e_2^2]^T$, and the other simulation parameters were selected as $K = \mathrm{diag}[0.5, 0.5]$, $\Upsilon = 2I$, $\ell = 5$, and $k = 0.01$. The initial weight was $W_0 = [0, 0, 0]^T$.
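For this quadratic choice of activation, the approximate optimal control (35) reduces to a closed form; a small sketch (function names are illustrative, not from the paper) is:

```python
import numpy as np

def sigma_grad(e):
    """Jacobian of sigma(e) = [e1^2, e1*e2, e2^2]^T; rows index the
    activation components, columns index the entries of e."""
    e1, e2 = e
    return np.array([[2.0 * e1, 0.0],
                     [e2,       e1],
                     [0.0, 2.0 * e2]])

def optimal_control(e, W_hat, g, R_inv):
    """u_hat = -1/2 R^{-1} g^T (grad sigma)^T W_hat, Eq. (35)."""
    return -0.5 * R_inv @ g.T @ sigma_grad(e).T @ W_hat
```

Because $\sigma$ is quadratic in $e$, $\nabla\sigma$ is linear in $e$, so the learned control is a state-dependent linear feedback weighted by the critic weights.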
The simulation results for a period of 20 s are shown in Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8. Figure 2 gives the simulation results of the system state and the reference command, which indicates that the state can accurately track the reference command within 4 s. The tracking errors enter the steady state quickly, which can be attributed to the estimation from the USDE, as shown in Figure 3. The lumped disturbances can be approximated precisely within 0.1 s.
The control inputs of the proposed controller are described in Figure 4, which include the feedback control and the optimal control. Figure 5 describes the simulation results of the tracking error and asymmetric output performance constraints. It can be found that the tracking errors converged to a specified asymmetric envelope, which indicated that the designed controller with an optimal preset performance can make tracking errors converge to zero within a predetermined convergence rate. The stability of the closed-loop system hinged on the weight of the critic NN, as illustrated in Figure 6, which showed the approximation of weight can converge to the ideal weight within 2 s.
In order to illustrate the superiority of the proposed controller in achieving preset performance, it was compared with the optimal controller based on the traditional value function in [32], which is named WPPC. Noting that the controller in [32] is based on a fixed-time disturbance observer, we set the feedback controller based on a USDE for the sake of fair comparison. Figure 7 shows the simulation results of the tracking error and the PPC envelope. Although the system state tracked the reference command successfully, the tracking errors could not evolve within the envelope, and the transient performance could not be guaranteed to meet the output constraints. This indicates that the convergence rate of the tracking error was slower than the specified convergence rate and the predetermined performance could not be guaranteed. Figure 8 compares the value functions of the two controllers for 20 s. In contrast, the value function of the new controller decreased by 6% compared to the optimal controller without taking preset performance into account. Therefore, the new controller can ensure that the tracking error meets the output performance constraints while its value function is smaller.
Example 2.
Noting that the trajectory tracking problem of the quadrotor can be affected by uncertain dynamic drift and disturbance induced by wind, the effectiveness of the proposed control is verified on a quadrotor [3,31]. While position and attitude loops are considered in [3,31], here the position dynamics of the quadrotor are considered:
$\dot{p} = v, \quad \dot{v} = -\left(gh_3 + hv\right)/m + u + d_v \quad (53)$
where the parameters of the model are listed in Table 1. The given reference trajectory is $p_d = [10(1 - \cos(0.1\pi t)), 5\sin(0.2\pi t), 9(1 - e^{-0.3t})]^T$.
To carry out the proposed controller on Equation (53), we reformulate it as:
$\dot{x} = f(x) + g(x)u + d(t) \quad (54)$
where
$x = \begin{bmatrix} p \\ v \end{bmatrix}, \quad f(x) = \begin{bmatrix} v \\ -(gh_3 + hv)/m \end{bmatrix}, \quad g(x) = \begin{bmatrix} 0 \\ I \end{bmatrix}$
For approximating the optimal control, the critic NN is structured with 15 neurons over the region $[-5, 5]$. For the initial position and weight, we choose $p(0) = [5, 3, 2]^T$ and $W_0 = 0$. The other controller parameters are listed in Table 2.
The performance of the proposed controller is discussed as follows. Figure 9 and Figure 10 show the tracking performances of the position and the velocity. It can be found that the tracking errors evolved within the predefined constraints. By running the novel value function, the tracking errors converged to zero within the prescribed convergence rate. During the controller design, the weight of the critic NN is crucial for solving the HJB equation. Figure 11 gives the convergence of the weight, which is essential for the stability of the closed-loop system. To verify the advantage of the novel value function, the optimal tracking controller in [32] was executed for System (54). In the position loop, the tracking errors of the position and the velocity are shown in Figure 12 and Figure 13, which indicate that the tracking errors could not evolve within the predefined envelope when minimizing the value function without the PPC constraints. Therefore, the effectiveness of the proposed optimal control in dealing with prescribed performance was verified. Noting that the proposed controller and the optimal tracking controller in [32] were only compared in simulation, we will further consider experimental validation on physical systems to verify the feasibility of the proposed method.

6. Conclusions

In this paper, an optimal tracking control with output constraints was proposed to ensure tracking errors with prescribed performance. The disturbance estimation error was introduced into critic-only ADP, and the novel value function was minimized within the critic-only ADP framework to derive the optimal control. It was proved that the tracking error can be kept within the specified envelope of the PPC. The numerical simulations verified that the designed controller can achieve tracking errors with prescribed performance on the quadrotor trajectory tracking problem and a class of second-order nonlinear systems. However, actuator saturation may limit the control input, which makes it difficult to guarantee prescribed performance. We will focus on the optimal tracking controller for uncertain nonlinear systems with prescribed performance under actuator saturation in future work.

Author Contributions

Conceptualization, Y.G. and Z.L.; software, Y.G.; validation, Y.G.; writing—original draft preparation, Y.G.; supervision, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the first author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Vu, Q.V.; Dinh, T.A.; Nguyen, T.V.; Tran, H.V.; Le, H.X.; Pham, H.V.; Kim, T.D.; Nguyen, L. An adaptive hierarchical sliding mode controller for autonomous underwater vehicles. Electronics 2021, 10, 2316. [Google Scholar] [CrossRef]
  2. Liang, L.; Liu, H.; Li, X.; Zhu, X.; Lan, B.; Liu, Y.; Wang, X. Model-based coordinated trajectory tracking control of skid-steer mobile robot with timing-belt servo system. Electronics 2023, 12, 699. [Google Scholar] [CrossRef]
  3. Shao, X.; Yue, X.; Li, J. Event-triggered robust control for quadrotors with preassigned time performance constraints. Appl. Math. Comput. 2021, 392, 125667. [Google Scholar] [CrossRef]
  4. Li, S.; Ahn, C.K.; Guo, J.; Xiang, Z. Neural-network approximation-based adaptive periodic event-triggered output-feedback control of switched nonlinear systems. IEEE Trans. Cybern. 2020, 51, 4011–4020. [Google Scholar] [CrossRef] [PubMed]
  5. Li, Y.; Li, K.; Tong, S. Adaptive neural network finite-time control for multi-input and multi-output nonlinear systems with positive powers of odd rational numbers. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 2532–2543. [Google Scholar] [CrossRef]
  6. Tatlicioglu, E.; Yilmaz, B.M.; Savran, A.; Alci, M. Adaptive fuzzy logic with self-adjusting membership functions based tracking control of surface vessels. Ocean. Eng. 2022, 253, 111129. [Google Scholar] [CrossRef]
  7. Tang, J.; Dang, Z.; Deng, Z.; Li, C. Adaptive fuzzy nonlinear integral sliding mode control for unmanned underwater vehicles based on ESO. Ocean. Eng. 2022, 266, 113154. [Google Scholar] [CrossRef]
  8. Li, Y.; Li, K.; Tong, S. An observer-based fuzzy adaptive consensus control method for nonlinear multiagent systems. IEEE Trans. Fuzzy Syst. 2022, 30, 4667–4678. [Google Scholar] [CrossRef]
  9. Chen, M.; Xiong, S.; Wu, Q. Tracking flight control of quadrotor based on disturbance observer. IEEE Trans. Syst. Man Cybern. 2019, 51, 1414–1423. [Google Scholar] [CrossRef]
  10. Guo, K.; Shi, P.; Wang, P.; He, C.; Zhang, H. Non-singular terminal sliding mode controller with nonlinear disturbance observer for robotic manipulator. Electronics 2023, 12, 849. [Google Scholar] [CrossRef]
  11. Kukurowski, N.; Pazera, M.; Witczak, M. Fault-tolerant tracking control for a descriptor system under an unknown input disturbances. Electronics 2021, 10, 2247. [Google Scholar] [CrossRef]
  12. Huang, Y.; Wu, J.; Na, J.; Han, S.; Gao, G. Unknown system dynamics estimator for active vehicle suspension control systems with time-varying delay. IEEE Trans. Cybern. 2021, 52, 8504–8514. [Google Scholar] [CrossRef] [PubMed]
  13. Na, J.; Jing, B.; Huang, Y.; Gao, G.; Zhang, C. Unknown system dynamics estimator for motion control of nonlinear robotic systems. IEEE Trans. Ind. Electron. 2019, 67, 3850–3859. [Google Scholar] [CrossRef]
  14. Wang, S.; Tao, L.; Chen, Q.; Na, J.; Ren, X. USDE-based sliding mode control for servo mechanisms with unknown system dynamics. IEEE/ASME Trans. Mechatron. 2020, 25, 1056–1066. [Google Scholar] [CrossRef]
  15. Khodamipour, G.; Khorashadizadeh, S.; Farshad, M. Adaptive formation control of leader-follower mobile robots using reinforcement learning and the Fourier series expansion. ISA Trans. 2023, in press. [Google Scholar] [CrossRef]
  16. Bao, C.; Wang, P.; He, R.; Tang, G. Observer-based optimal control method combination with event-triggered strategy for hypersonic morphing vehicle. Aerosp. Sci. Technol. 2023, 136, 108219. [Google Scholar] [CrossRef]
  17. Hua, H.; Fang, Y. A novel reinforcement learning-based robust control strategy for a quadrotor. IEEE Trans. Ind. Electron. 2022, 70, 2812–2821. [Google Scholar] [CrossRef]
  18. Bechlioulis, C.P.; Rovithakis, G.A. Robust adaptive control of feedback linearizable MIMO nonlinear systems with prescribed performance. IEEE Trans. Autom. Control 2008, 53, 2090–2099. [Google Scholar] [CrossRef]
  19. Wang, N.; Gao, Y.; Zhang, X. Data-driven performance-prescribed reinforcement learning control of an unmanned surface vehicle. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 5456–5467. [Google Scholar] [CrossRef]
  20. Liu, H.; Cheng, Q.; Xiao, J.; Hao, L. Data-driven optimal tracking control for SMA actuated systems with prescribed performance via reinforcement learning. Mech. Syst. Signal Process. 2022, 177, 109191. [Google Scholar] [CrossRef]
  21. Chen, H.; Yan, H.; Wang, Y.; Xie, S.; Zhang, D. Reinforcement learning-based close formation control for underactuated surface vehicle with prescribed performance and time-varying state constraints. Ocean. Eng. 2022, 256, 111361. [Google Scholar] [CrossRef]
  22. Wang, X.; Wang, Q.; Sun, C. Prescribed performance fault-tolerant control for uncertain nonlinear MIMO system using actor–critic learning structure. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 4479–4490. [Google Scholar] [CrossRef] [PubMed]
  23. Zhang, S.; Huang, C.; Ji, K.; Zhang, H. Prescribed performance incremental adaptive optimal fault-tolerant control for nonlinear systems with actuator faults. ISA Trans. 2022, 120, 99–109. [Google Scholar] [CrossRef] [PubMed]
  24. Li, D.; Dong, J. Performance-constrained fault-tolerant DSC based on reinforcement learning for nonlinear systems with uncertain parameters. Appl. Math. Comput. 2023, 443, 127759. [Google Scholar] [CrossRef]
  25. Liu, G.; Sun, N.; Yang, T.; Fang, Y. Reinforcement learning-based prescribed performance motion control of pneumatic muscle actuated robotic arms with measurement noises. IEEE Trans. Syst. Man Cybern. 2022, in press. [Google Scholar] [CrossRef]
  26. Yan, L.; Liu, Z.; Chen, C.L.P.; Zhang, Y.; Wu, Z. Optimized adaptive consensus control for multi-agent systems with prescribed performance. Inf. Sci. 2022, 613, 649–666. [Google Scholar] [CrossRef]
  27. Ouyang, Y.; Sun, C.; Dong, L. Actor-critic learning based coordinated control for a dual-arm robot with prescribed performance and unknown backlash-like hysteresis. ISA Trans. 2022, 126, 1–13. [Google Scholar] [CrossRef]
  28. Luo, A.; Xiao, W.; Li, X.M.; Yao, D.; Zhou, Q. Performance-guaranteed containment control for pure-feedback multi-agent systems via reinforcement learning algorithm. Int. J. Robust Nonlinear Control. 2022, 32, 10180–10200. [Google Scholar] [CrossRef]
  29. Peng, Z.; Ji, H.; Zou, C.; Kuang, Y.; Cheng, H.; Shi, K.; Ghosh, B. Optimal H∞ tracking control of nonlinear systems with zero-equilibrium-free via novel adaptive critic designs. Neural Netw. 2023, 164, 105–114. [Google Scholar] [CrossRef]
  30. Huo, Y.; Wang, D.; Qiao, J.; Li, M. Adaptive critic design for nonlinear multi-player zero-sum games with unknown dynamics and control constraints. Nonlinear Dyn. 2023, 111, 11671–11683. [Google Scholar] [CrossRef]
  31. Vamvoudakis, K.; Lewis, F. Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 2010, 46, 878–888. [Google Scholar] [CrossRef]
  32. Liu, H.; Li, B.; Xiao, B.; Ran, D.; Zhang, C. Reinforcement-learning-based tracking control for a quadrotor unmanned aerial vehicle under external disturbances. Int. J. Robust Nonlinear Control. 2022, in press. [Google Scholar] [CrossRef]
Figure 1. The diagram of the whole controller.
Figure 2. The tracking performance.
Figure 3. The estimation errors of lumped disturbances.
Figure 4. The control inputs.
Figure 5. The tracking errors with the PPC.
Figure 6. The weight of the critic NN (Example 1).
Figure 7. The tracking errors with the WPPC.
Figure 8. Comparison of cost functions.
Figure 9. The tracking errors of the position.
Figure 10. The tracking errors of the velocity.
Figure 11. The weight of the critic NN (Example 2).
Figure 12. The tracking errors of the position [32].
Figure 13. The tracking errors of the velocity [32].
Table 1. Parameters of the quadrotor attitude dynamics.
Section | Values
Mass | 2
Inertia moment matrix | diag([0.01, 0.01, 0.01])
Disturbances | [sin(4t) + cos(2t)sin(t); cos(4t) + sin(2t)cos(t); sin(3t)cos(2t)cos(t)]^T
Table 2. Parameters of the controllers.
Section | Values
USDE | k = 0.01
PPC | l̲_px = 2.5, l̄_px = 1.5, ϑ_0px = 5, ϑ_px = 0.2, a_px = 2
    | l̲_py = 2.5, l̄_py = 3, ϑ_0py = 5, ϑ_py = 0.2, a_py = 3
    | l̲_pz = 2.5, l̄_pz = 2, ϑ_0pz = 5, ϑ_pz = 0.2, a_pz = 3
    | l̲_vx = 2, l̄_vx = 1.5, ϑ_0vx = 8, ϑ_vx = 0.2, a_vx = 3
    | l̲_vy = 1.3, l̄_vy = 1.5, ϑ_0vy = 10, ϑ_vy = 0.2, a_vy = 3
    | l̲_vz = 0.6, l̄_vz = 0.8, ϑ_0vz = 8, ϑ_vz = 0.2, a_vz = 3
Control gain | K = diag([2, 2, 2, 2, 2, 2])
