Robust Approximate Optimal Trajectory Tracking Control for Quadrotors

Rong Li; Zhengliang Yang; Gaowei Yan; Long Jian; Guoqiang Li; Zhiqiang Li

doi:10.3390/aerospace11020149

¹

College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan 030024, China

²

Nuclear Emergency and Nuclear Safety Department, China Institute for Radiation Protection, Taiyuan 030006, China

³

College of Mechanical and Vehicle Engineering, Taiyuan University of Technology, Taiyuan 030024, China

^*

Author to whom correspondence should be addressed.

Aerospace 2024, 11(2), 149; https://doi.org/10.3390/aerospace11020149

This article belongs to the Special Issue Flight Dynamics, Control & Simulation (2nd Edition)

Abstract

This paper uses the adaptive dynamic programming (ADP) method to achieve optimal trajectory tracking control for quadrotors. Relying on an established mathematical model of a quadrotor, the approximate optimal trajectory tracking control, which consists of the steady-state control input and the approximate optimal feedback control input, is designed for a nominal system. Considering the compound disturbances in position and attitude dynamic models, disturbance observers are introduced. The estimated values are used to design robust compensation inputs to suppress the effect of the compound disturbances for good trajectory tracking performance. Theoretically, the Lyapunov theorem demonstrates the stability of a closed-loop system. The robustness and effectiveness of the proposed controller are confirmed by the simulation results.

Keywords:

ADP; quadrotor; trajectory tracking control; disturbance observer

1. Introduction

The miniaturization and reduction in cost of the relevant control components in aircraft, as well as the development and progress of computer and sensing and measurement technologies, have improved the stability of flight control systems and greatly facilitated the development of quadrotors [1]. The high operability, strong mobility and flexibility of quadrotors allow them to meet the specific needs of many projects, and they are widely used in military, industrial and other fields [2,3]. A quadrotor system is multivariable, nonlinear and strongly coupled, and quadrotors are also disturbed by the surrounding environment during flight [4]. These factors can affect the accuracy of quadrotor control systems. The requirements for high-accuracy and robust flight control in the design of controllers for quadrotors are stringent, and the design of a core control algorithm is a prerequisite for quadrotors to achieve stable and high-precision flight performance. Therefore, the research and development of controllers for quadrotor systems is of great significance.

At present, it is no longer a problem to ensure the stability of quadrotors through control algorithms. Many controllers for quadrotors have been designed and are already in application [5]. Since the dynamics of quadrotors can be linearized around the equilibrium point, traditional linear control methods can be used for controller design [6]. On this basis, linear techniques such as linear quadratic regulator (LQR) control are employed in the flight control of quadrotors [7]. However, quadrotors need to be controlled away from the equilibrium point to accomplish complex control tasks and withstand external disturbances. As a result, a robust feedback linearization method has been devised that uses extended state observers to estimate online the nonlinear state feedback term, which contains aerodynamic forces, moments and unknown disturbances, and obtains the desired closed-loop dynamics via pole assignment [8]. Moreover, several robust controllers relying on nonlinear techniques have been proposed, such as sliding mode control [9], adaptive control [10], backstepping-based control [11] and robust control [12]. These control methods ensure the stability and robustness of nonlinear systems and have generally been used for the tracking control of these systems, but their optimal properties have not been considered. Therefore, the concept of optimization has been introduced into control design.

To derive the optimal control policy for the infinite horizon optimal control problem, solving the Hamilton–Jacobi–Bellman (HJB) equation or the Hamilton–Jacobi–Isaacs (HJI) equation for the

H_{\infty}

optimal control problem considering uncertainties is essential. Nevertheless, it is difficult to mathematically derive the corresponding analytical solutions in most cases. Neural networks are one option for overcoming this problem [13,14,15]. The approximation property of neural networks makes it possible to find approximate solutions to partial differential equations. The convergence of neural networks can be ensured by penalizing them to ensure they satisfy the given partial differential equations. The ADP method combines reinforcement learning, dynamic programming and neural network adaptive methods to derive approximate solutions of the HJB/HJI equations using function approximation structures, and thereby addresses nonlinear optimal control problems [16,17]. The ADP method is used for control design with suitable performance index functions to derive the desired dynamic performance and stabilize a nominal system with uncertainties. However, most nonlinear optimal control methods using the ADP method are aimed at nominal systems or uncertain systems satisfying specific conditions [18,19,20], while their immunity to external time-varying disturbances independent of the state is still weak, and the control effect under stronger disturbances is not ideal. The ADP method has been used in the design of controllers for quadrotors and efforts have been made to improve the robustness, but the designed controllers are more geared towards linear systems and design uncertainty is a unique problem [21,22]. Quadrotors often experience various external effects in flight, requiring strong adaptive and anti-disturbance capabilities in flight control. The disturbance observer technique achieves disturbance suppression utilizing feedback regulation [23], which can attenuate compound disturbances containing external disturbances and model uncertainties, thus improving the system robustness.
A disturbance observer can accurately estimate compound disturbances in a system, which greatly reduces the conservatism of the control. In addition, since a disturbance observer can usually be designed independently of the controller, the method can be easily combined with other advanced control methods and is more flexible in its application. There are experiments suggesting that the introduction of a disturbance observer significantly improves performance, which is a good reference for methods for quadrotors to overcome disturbances [24].

Considering the above analysis, a robust approximate optimal trajectory tracking control method is proposed for quadrotors to solve the optimal control problem under the conditions of compound disturbances. The main contributions are summarized as follows:

(1): The combination of modeling uncertainties and external time-varying disturbances is considered as compound disturbances. Disturbance observers are introduced to estimate the compound disturbances in the position and attitude subsystems, and the estimated values are used to design robust compensation inputs to suppress the effects of the compound disturbances and ensure the stability of a quadrotor system under the ADP method.
(2): To obtain optimal trajectory tracking control for a quadrotor without compound disturbances, the ADP method is used to design approximate optimal control inputs for the nominal system of a quadrotor.

The rest of the paper is organized as follows. In Section 2, the quadrotor mathematical model is developed, and the quadrotor system is divided into two subsystems. Section 3 describes the design of the robust approximate optimal trajectory tracking control and the stability analysis of the closed-loop system. Section 4 describes the robust approximate optimal trajectory tracking control for the quadrotor. The results of the corresponding simulation and the results of the comparative simulation without disturbance observer are presented in Section 5. Section 6 gives the conclusion of the paper.

2. Mathematical Modeling of a Quadrotor

The quadrotor has four evenly spaced, cross-symmetrical brushless motors in the plane. The rotors of motor 1 and motor 3 rotate clockwise, while the rotors of motor 2 and motor 4 rotate counterclockwise. By changing the rotational speed of the four rotors, the quadrotor generates different magnitudes of lift forces and torques, which can control the takeoff, landing and attitude motions of the quadrotor. As a result, the location of the quadrotor can be altered in the three-dimensional space. Figure 1 depicts the basic structure of the quadrotor.

Figure 1. Basic structure of the quadrotor.

To clarify the mathematical model of the quadrotor system and satisfy the implementation of the control method, the earth-fixed inertial frame

O_{I} X_{I} Y_{I} Z_{I}

and the body-fixed body frame

O_{B} X_{B} Y_{B} Z_{B}

are established. Without loss of generality, it is assumed that the deformation and elastic vibration of the rotors and body are negligible, so that the quadrotor can be considered an ideal rigid body; the quadrotor’s structure is symmetrical, its mass is uniformly distributed, and its center of mass is located at the geometric center. The translational and rotational motions of the quadrotor satisfy [25]

\dot{P} = v,

(1)

\dot{Θ} = W_{B}^{I} ω,

(2)

where

P = {[x, y, z]}^{T} \in R^{3}

represents the position of the quadrotor in the inertial frame and

v = {[v_{x}, v_{y}, v_{z}]}^{T} \in R^{3}

represents the corresponding velocity.

Θ = {[ϕ, θ, ψ]}^{T} \in R^{3}

denotes the vector of Euler angles.

ω = {[p, q, r]}^{T} \in R^{3}

denotes the angular velocity of the quadrotor in the body frame.

W_{B}^{I} \in R^{3 \times 3}

is the rotation matrix for the angular velocity in the form of

W_{B}^{I} = [\begin{matrix} 1 & s_{ϕ} t_{θ} & c_{ϕ} t_{θ} \\ 0 & c_{ϕ} & - s_{ϕ} \\ 0 & s_{ϕ} / c_{θ} & c_{ϕ} / c_{θ} \end{matrix}],

(3)

in which

s_{*} = sin (*)

,

c_{*} = cos (*)

and

t_{*} = tan (*)

.
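As an illustrative sketch of (2) and (3), the following Python fragment builds the matrix W_{B}^{I} and maps body rates to Euler-angle rates. The function names and the numerical inputs are assumptions introduced here for illustration and do not come from the paper.

```python
import math

def euler_rate_matrix(phi, theta):
    """Return W_B^I from (3) as a 3x3 nested list (angles in radians)."""
    s_phi, c_phi = math.sin(phi), math.cos(phi)
    c_theta, t_theta = math.cos(theta), math.tan(theta)
    return [
        [1.0, s_phi * t_theta, c_phi * t_theta],
        [0.0, c_phi, -s_phi],
        [0.0, s_phi / c_theta, c_phi / c_theta],
    ]

def euler_rates(phi, theta, omega):
    """Map body rates omega = [p, q, r] to Euler-angle rates, as in (2)."""
    W = euler_rate_matrix(phi, theta)
    return [sum(W[i][j] * omega[j] for j in range(3)) for i in range(3)]

# At level attitude the map reduces to the identity.
level = euler_rates(0.0, 0.0, [0.1, 0.2, 0.3])

# Close to theta = pi/2 the entries divided by cos(theta) become very large.
near_singular = euler_rate_matrix(0.0, math.pi / 2 - 1e-6)[2][2]
```

The growth of the last row as θ approaches π/2 illustrates why the pitch angle has to stay away from ±π/2 for this parameterization to remain usable.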

Relying on the Newton–Euler method, the dynamical equation of quadrotor with compound disturbances is represented by [26]

m \dot{v} = F - k v + d_{p},

(4)

I \dot{ω} + ω \times I ω = τ + d_{a},

(5)

where

m \in R

represents the mass of the quadrotor.

I \in R^{3 \times 3}

represents the inertia matrix of the quadrotor. Under the structural assumptions above, the inertia matrix can be defined as the diagonal matrix

I ≜ diag \{I_{x x}, I_{y y}, I_{z z}\}

.

k = diag \{k_{x}, k_{y}, k_{z}\} \in R^{3 \times 3}

is the drag coefficient matrix.

F = {[F_{x}, F_{y}, F_{z}]}^{T} \in R^{3}

represents the resultant force consisting of the gravity and the total lift in the inertial frame.

τ = {[τ_{x}, τ_{y}, τ_{z}]}^{T} \in R^{3}

represents the torque in the body frame.

d_{p} \in R^{3}

and

d_{a} \in R^{3}

are the compound disturbances in position and attitude dynamic models, which contain modeling uncertainties and external time-varying disturbances.

According to the mechanical analysis, the quadrotor is affected by gravity and lift forces. Owing to the special structure of the quadrotor, the lift forces act along the z-axis of the body frame. Then, the resultant force expressed in the inertial frame is [27]

F = R_{B}^{I} z_{u} T - z_{u} m g_{I},

(6)

where

T \in R

represents the total lift force and

g_{I} \in R

represents the gravity acceleration.

z_{u} = {[0, 0, 1]}^{T}

.

R_{B}^{I} \in R^{3 \times 3}

is the rotation matrix of the body frame transformed into the inertial frame in the form of

R_{B}^{I} = [\begin{matrix} c_{θ} c_{ψ} & s_{ϕ} s_{θ} c_{ψ} - c_{ϕ} s_{ψ} & c_{ϕ} s_{θ} c_{ψ} + s_{ϕ} s_{ψ} \\ c_{θ} s_{ψ} & s_{ϕ} s_{θ} s_{ψ} + c_{ϕ} c_{ψ} & c_{ϕ} s_{θ} s_{ψ} - s_{ϕ} c_{ψ} \\ - s_{θ} & s_{ϕ} c_{θ} & c_{ϕ} c_{θ} \end{matrix}] .

(7)
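A minimal sketch of (6) and (7) is given below. It only evaluates the resultant force for a given attitude and total lift; the mass of 1.2 kg, the gravity value of 9.81 m/s² and all function names are hypothetical choices made for this example.

```python
import math

def rotation_body_to_inertial(phi, theta, psi):
    """Return R_B^I from (7) as a 3x3 nested list (angles in radians)."""
    s, c = math.sin, math.cos
    return [
        [c(theta) * c(psi),
         s(phi) * s(theta) * c(psi) - c(phi) * s(psi),
         c(phi) * s(theta) * c(psi) + s(phi) * s(psi)],
        [c(theta) * s(psi),
         s(phi) * s(theta) * s(psi) + c(phi) * c(psi),
         c(phi) * s(theta) * s(psi) - s(phi) * c(psi)],
        [-s(theta), s(phi) * c(theta), c(phi) * c(theta)],
    ]

def resultant_force(phi, theta, psi, thrust, mass, gravity=9.81):
    """F = R z_u T - z_u m g from (6); z_u = [0, 0, 1]^T picks the third column of R."""
    R = rotation_body_to_inertial(phi, theta, psi)
    force = [R[i][2] * thrust for i in range(3)]
    force[2] -= mass * gravity
    return force

MASS = 1.2  # hypothetical mass in kg
hover = resultant_force(0.0, 0.0, 0.0, thrust=MASS * 9.81, mass=MASS)
tilted = resultant_force(0.0, 0.2, 0.0, thrust=MASS * 9.81, mass=MASS)
```

With the lift equal to the weight, the level case gives a zero resultant force, while a positive pitch angle with the same lift produces a positive x-component and a net downward z-component.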

Assumption 1

([28]). The roll and pitch angles satisfy the conditions

| ϕ | < π / 2

and

| θ | < π / 2

to avoid the singularities of the matrices

W_{B}^{I}

and

R_{B}^{I}

.

Assumption 2

([29]). In the control process, the total compound disturbance

d_{t} = {[d_{p}^{T}, d_{a}^{T}]}^{T} \in R^{6}

has finite energy. In addition,

d_{t}

is a continuous function and its norm is bounded such that

∥ d_{t} ∥ \leq d_{t M}

, where

d_{t M}

is an unknown positive constant. Moreover, the compound disturbances are usually regarded as a superposition of low-frequency periodic signals. Hence, it is assumed that the total compound disturbance changes slowly compared to the dynamics of the disturbance observer, so that it can be considered that

{\dot{d}}_{t} ≃ 0

.

Assumption 3

([30]). The desired trajectory of position

P_{d} = {[x_{d}, y_{d}, z_{d}]}^{T} \in R^{3}

and the desired trajectory of yaw angle

ψ_{d} \in R

and their higher order derivatives are known, continuous and bounded.

Remark 1.

Assumption 2 is common in control studies using disturbance observers [31,32,33], while there are different considerations for compound disturbances in [34]. In the case of this paper, the considerations in Assumption 2 are used. Assumption 3 ensures that the ADP method can be utilized for the control design and the stability analysis.

The total lift and torque of the quadrotor are related to the force and torque of the four rotors as follows [35]:

\{\begin{matrix} T = T_{1} + T_{2} + T_{3} + T_{4} \\ τ_{x} = l (T_{2} - T_{4}) \\ τ_{y} = l (T_{1} - T_{3}) \\ τ_{z} = τ_{1} - τ_{2} + τ_{3} - τ_{4} \end{matrix},

(8)

where

T_{i}

and

τ_{i}

(i = 1, 2, 3, 4)

are the lift and torque generated by the four rotors of the quadrotor, respectively. l represents the length from each rotor to the center of the body.

The rotor speeds are related to pulse-width modulated (PWM) signals through the motors. The lift forces and torques generated by the four motors are related to the pulse width of the input signals as follows [36]:

\{\begin{matrix} T_{i} = K_{t} \frac{B_{w}}{s + B_{w}} u_{i} \\ τ_{i} = K_{o} \frac{B_{w}}{s + B_{w}} u_{i} \end{matrix},

(9)

where

K_{t}

and

K_{o}

are the positive gains of the lift coefficient and the inverse torque coefficient, respectively.

B_{w}

is the motor bandwidth and

u_{i}

represents the PWM signals of each corresponding motor, which should be limited between 0 and 1.
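The first-order lag in (9) can be illustrated with a short discrete-time sketch. It assumes that the PWM signal is held constant over each sampling interval, which is a choice made here and not stated in the paper; the gain, bandwidth and step size are hypothetical values.

```python
import math

def motor_lift_step(lift, pwm, k_t, b_w, dt):
    """Advance T_i by dt for the lag K_t * B_w / (s + B_w) in (9).

    Uses the exact discretization for an input held constant over dt.
    """
    pwm = min(max(pwm, 0.0), 1.0)       # PWM signal limited to [0, 1]
    alpha = 1.0 - math.exp(-b_w * dt)   # fraction of the gap closed in one step
    return lift + alpha * (k_t * pwm - lift)

K_T, B_W, DT = 8.0, 15.0, 0.002  # hypothetical gain, bandwidth (rad/s), step (s)
lift = 0.0
for _ in range(1000):            # 2 s of simulated time, i.e. 30 time constants
    lift = motor_lift_step(lift, 0.5, K_T, B_W, DT)
```

After many time constants the lift settles at K_{t} u_{i}, which is the static relation obtained when the lag is neglected.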

Assuming that the motors have a sufficiently fast response speed, then the motor model can be simplified as [37]

\{\begin{matrix} T_{i} = K_{t} u_{i} \\ τ_{i} = K_{o} u_{i} \end{matrix} .

(10)

Hence, (8) can be rewritten as [38]

[\begin{matrix} T \\ τ_{x} \\ τ_{y} \\ τ_{z} \end{matrix}] = [\begin{matrix} K_{t} & K_{t} & K_{t} & K_{t} \\ 0 & K_{t} l & 0 & - K_{t} l \\ K_{t} l & 0 & - K_{t} l & 0 \\ K_{o} & - K_{o} & K_{o} & - K_{o} \end{matrix}] [\begin{matrix} u_{1} \\ u_{2} \\ u_{3} \\ u_{4} \end{matrix}] .

(11)
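For illustration, the mapping (11) can be inverted in closed form to recover the PWM signals from a commanded total lift and torque. The sketch below is one possible allocation under the simplified motor model (10); the coefficient values and the names `mix` and `allocate` are assumptions made for this example.

```python
def mix(u, k_t, k_o, arm):
    """Forward map (11): PWM signals [u1, u2, u3, u4] -> [T, tau_x, tau_y, tau_z]."""
    u1, u2, u3, u4 = u
    return [
        k_t * (u1 + u2 + u3 + u4),
        k_t * arm * (u2 - u4),
        k_t * arm * (u1 - u3),
        k_o * (u1 - u2 + u3 - u4),
    ]

def allocate(wrench, k_t, k_o, arm):
    """Closed-form inverse of (11), followed by clipping to the PWM range [0, 1]."""
    thrust, tau_x, tau_y, tau_z = wrench
    base = thrust / (4.0 * k_t)
    x_part = tau_x / (2.0 * k_t * arm)
    y_part = tau_y / (2.0 * k_t * arm)
    z_part = tau_z / (4.0 * k_o)
    raw = [
        base + y_part + z_part,
        base + x_part - z_part,
        base - y_part + z_part,
        base - x_part - z_part,
    ]
    return [min(max(ui, 0.0), 1.0) for ui in raw]

K_T, K_O, ARM = 8.0, 0.2, 0.25  # hypothetical coefficients and arm length
command = [12.0, 0.1, -0.05, 0.02]
u = allocate(command, K_T, K_O, ARM)
recovered = mix(u, K_T, K_O, ARM)
```

When no signal is clipped, applying `mix` to the allocated signals returns the commanded lift and torques; once clipping is active, the command is no longer reproduced exactly.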

Considering the trajectory tracking control for the quadrotor, the control objective is to design a controller that allows the position and attitude to track the desired trajectory asymptotically within a small error.

Combining (1), (2), (4) and (5), the overall model of the quadrotor can be decomposed into a position subsystem and an attitude subsystem. The position subsystem can be represented as

{\dot{x}}_{1} = f_{1} (x_{1}) + g_{1} (x_{1}) (F + d_{p}),

(12)

with

\begin{matrix} x_{1} = & {[P^{T}, v^{T}]}^{T} = {[x, y, z, v_{x}, v_{y}, v_{z}]}^{T} \in R^{6}, \\ f_{1} (x_{1}) = & {[v_{x}, v_{y}, v_{z}, - k_{x} v_{x} / m, - k_{y} v_{y} / m, - k_{z} v_{z} / m]}^{T} \in R^{6}, \\ g_{1} (x_{1}) = & {[\begin{matrix} 0 & 0 & 0 & 1 / m & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 / m & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 / m \end{matrix}]}^{T} \in R^{6 \times 3} . \end{matrix}

While the attitude subsystem is expressed in the form of

{\dot{x}}_{2} = f_{2} (x_{2}) + g_{2} (x_{2}) (τ + d_{a}),

(13)

with

\begin{matrix} x_{2} = & {[Θ^{T}, ω^{T}]}^{T} = {[ϕ, θ, ψ, p, q, r]}^{T} \in R^{6}, \\ f_{2} (x_{2}) = & [p + q s_{ϕ} t_{θ} + r c_{ϕ} t_{θ}, q c_{ϕ} - r s_{ϕ}, q s_{ϕ} / c_{θ} + r c_{ϕ} / c_{θ}, \\ q r (I_{y y} - I_{z z}) / I_{x x}, p r (I_{z z} - I_{x x}) / I_{y y}, p q (I_{x x} - I_{y y}) / I_{z z}]^{T} \in R^{6}, \\ g_{2} (x_{2}) = & {[\begin{matrix} 0 & 0 & 0 & 1 / I_{x x} & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 / I_{y y} & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 / I_{z z} \end{matrix}]}^{T} \in R^{6 \times 3} . \end{matrix}

In the next section, (12) and (13) will be the focus of our research.
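As a consistency check on the attitude subsystem (13), the sketch below compares the angular acceleration obtained directly from (5) with d_{a} = 0 against the last three rows of f_{2} (x_{2}) + g_{2} (x_{2}) τ. The inertia values and test inputs are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def angular_accel_newton_euler(omega, tau, inertia):
    """Solve I*omega_dot + omega x (I*omega) = tau, i.e. (5) with d_a = 0, diagonal I."""
    i_omega = [inertia[j] * omega[j] for j in range(3)]
    gyro = cross(omega, i_omega)
    return [(tau[j] - gyro[j]) / inertia[j] for j in range(3)]

def angular_accel_state_space(omega, tau, inertia):
    """Last three rows of f_2(x_2) + g_2(x_2)*tau in (13)."""
    p, q, r = omega
    ixx, iyy, izz = inertia
    return [
        q * r * (iyy - izz) / ixx + tau[0] / ixx,
        p * r * (izz - ixx) / iyy + tau[1] / iyy,
        p * q * (ixx - iyy) / izz + tau[2] / izz,
    ]

INERTIA = [0.03, 0.03, 0.04]  # hypothetical I_xx, I_yy, I_zz in kg m^2
omega, tau = [0.4, -0.3, 0.2], [0.01, 0.02, -0.01]
a_ne = angular_accel_newton_euler(omega, tau, INERTIA)
a_ss = angular_accel_state_space(omega, tau, INERTIA)
```

Both functions return the same values up to rounding, which confirms that the gyroscopic terms in f_{2} carry the signs implied by (5).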

3. Robust Approximate Optimal Trajectory Tracking Control Design

For convenience in describing the control design process, (12) and (13) are represented in the uniform form

\dot{X} = f (X) + g (X) (U + D),

(14)

in which

f (X) \in R^{6}

and

g (X) \in R^{6 \times 3}

represent the drift dynamics and the input dynamics of the system, respectively.

X \in R^{6}

denotes the observable state vector,

U \in R^{3}

denotes the control input, and

D \in R^{3}

denotes the compound disturbance.

Definition 1

([39]). A state vector

X

is said to be uniformly ultimately bounded (UUB) if there exists a compact set

Ø_{X}

, a positive number

b_{X}

and a time

t_{b} (X (t_{0}), b_{X})

such that

∥ X ∥ \leq b_{X}

for all initial states

X (t_{0}) \in Ø_{X}

and all

t \geq t_{0} + t_{b}

.

Lemma 1

([40]).

X

is UUB if the time derivative of a positive definite function

L_{X} (X)

is negative when

∥ X ∥ > b_{X}

for a positive constant

b_{X}

.

To realize the trajectory tracking control with robustness for the system, the designed controller consists of two parts, the form of which is as follows:

U = U^{N} + U^{R},

(15)

where

U^{R}

is the robust compensation input designed through the disturbance observer for suppressing the effect of compound disturbances in the system.

U^{N}

is the control input designed based on the ADP method for the nominal system, which takes the form of

U^{N} = U_{d} + U_{E},

(16)

where

U_{d}

represents the steady-state control input and

U_{E}

represents the feedback control input.

3.1. Disturbance Observer Design

The disturbance observer is applied to derive the estimate of the compound disturbance. The estimated value is then used for the design of the robust compensation input to improve robustness. The disturbance observer is designed as

\{\begin{matrix} \dot{Z} = - l_{D} (X) (f (X) + g (X) (p_{D} (X) + U + Z)) \\ \hat{D} = Z + p_{D} (X) \end{matrix},

(17)

in which

\hat{D} \in R^{3}

represents the estimate of the unknown compound disturbance,

p_{D} (X) \in R^{3}

represents the designed vector-valued function,

l_{D} (X) = \partial p_{D} (X) / \partial X \in R^{3 \times 6}

is the observer gain and

Z \in R^{3}

represents the auxiliary variable vector of the disturbance observer.

Remark 2.

A direct estimate of the compound disturbance would require the derivative of the state, which is unknown because the compound disturbance is unknown. The auxiliary variable vector Z in the disturbance observer (17) is introduced to avoid calculating the derivative of the state.

Define the estimation error of compound disturbance as

\tilde{D} = D - \hat{D}

. With regard to Assumption 2 and the disturbance observer (17), the time derivative of

\tilde{D}

is developed as

\begin{matrix} \dot{\tilde{D}} & = \dot{D} - \dot{\hat{D}} = - \dot{Z} - l_{D} (X) \dot{X} \\ = l_{D} (X) (f (X) + g (X) (p_{D} (X) + U + Z)) - l_{D} (X) \dot{X} \\ = l_{D} (X) g (X) (Z + p_{D} (X)) - l_{D} (X) (\dot{X} - f (X) - g (X) U) . \end{matrix}

(18)

Combined with (14), we have

\begin{matrix} \dot{\tilde{D}} & = l_{D} (X) g (X) (Z + p_{D} (X)) - l_{D} (X) g (X) D \\ = - l_{D} (X) g (X) (D - \hat{D}) \\ = - l_{D} (X) g (X) \tilde{D} . \end{matrix}

(19)

Then,

\tilde{D}

is convergent by appropriately designing the vector-valued function

p_{D} (X)

.

Theorem 1.

Considering System (14), the disturbance observer is designed as (17). If

l_{D} (X) g (X)

is ensured to be positive definite for the design of the vector-valued function

p_{D} (X)

, then the estimated compound disturbance

\hat{D}

would follow the compound disturbance D, which means the estimation error

\tilde{D}

could converge to zero.

Proof.

Select the candidate Lyapunov function as follows:

L_{D} = \frac{1}{2} {\tilde{D}}^{T} \tilde{D} .

(20)

Combined with (19), the time derivative of

L_{D}

is

{\dot{L}}_{D} = {\tilde{D}}^{T} \dot{\tilde{D}} = - {\tilde{D}}^{T} l_{D} (X) g (X) \tilde{D} .

(21)

In the case where

l_{D} (X) g (X)

is positive definite, then we derive

{\dot{L}}_{D} \leq - κ {∥ \tilde{D} ∥}^{2},

(22)

where

κ = λ_{min} (l_{D} (X) g (X))

and

λ_{min} (*)

denotes the minimum eigenvalue. Obviously,

{\dot{L}}_{D} < 0

when

\tilde{D} \neq 0

. Hence, the disturbance observer (17) can estimate D and

\tilde{D}

will converge to zero. This completes the proof. □
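The behavior stated in Theorem 1 can be illustrated numerically on a single translational axis. The sketch below is not the simulation of the paper: it assumes the design p_{D} (X) = c m v, which gives l_{D} (X) g (X) = c > 0, a constant disturbance, a zero control input and forward-Euler integration, and all numerical values are hypothetical.

```python
def simulate_observer(d_true=0.8, c=5.0, mass=1.2, drag=0.1, dt=1e-3, steps=3000):
    """Forward-Euler simulation of observer (17) on one translational axis.

    State X = [x, v], f = [v, -drag*v/mass], g = [0, 1/mass].
    Assumed design: p_D(X) = c*mass*v, so l_D = [0, c*mass] and l_D*g = c.
    Returns the final disturbance estimate D_hat = Z + p_D(X).
    """
    v = 0.0  # velocity; the position does not enter this observer
    z = 0.0  # auxiliary variable Z
    u = 0.0  # control input, kept at zero for this check
    for _ in range(steps):
        p_d = c * mass * v
        # l_D (f + g (p_D + U + Z)) reduces to c * (-drag*v + p_D + u + z)
        z_dot = -c * (-drag * v + p_d + u + z)
        v_dot = (-drag * v + u + d_true) / mass
        z += dt * z_dot
        v += dt * v_dot
    return z + c * mass * v

d_hat = simulate_observer()
```

In this setting the estimation error obeys the scalar version of (19), so it decays at the rate c and the estimate approaches the true disturbance within a few multiples of 1/c.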

Then, the robust compensation input

U^{R}

is designed as

U^{R} = - \hat{D} .

(23)

3.2. Optimal Trajectory Tracking Control Design and Analysis

The compound disturbance is estimated by the disturbance observer. The robust compensation input is designed by the estimated value to suppress the effect of the compound disturbances. As a result, converting the trajectory tracking control problem of the nonlinear system with the compound disturbance into the trajectory tracking control problem of the nominal system is possible. In order to derive the optimal control for the nominal system, deriving the solution of the associated HJB equation is essential. Unfortunately, deriving the analytical solution is difficult for the nonlinear system by the direct solution method. Then, the ADP method is utilized for achieving the approximate optimal control by constructing the critic network. The weight update law designed for the critic network ensures the convergence of the weight and the stability of the closed-loop system.

For System (14), the nominal system is represented by

\dot{X} = f (X) + g (X) U .

(24)

Given the desired trajectory

X_{d} \in R^{6}

, the steady-state control input

U_{d}

is obtained from (24) as

U_{d} = g^{+} (X_{d}) ({\dot{X}}_{d} - f (X_{d})),

(25)

in which

g^{+} (X_{d})

denotes the pseudo-inverse of

g (X_{d})

.
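For the position subsystem (12), the pseudo-inverse in (25) can be written out explicitly, which the following sketch uses. It assumes that the desired state is X_{d} = [P_{d}^{T}, \dot{P}_{d}^{T}]^{T}; the function name and the numerical values are hypothetical.

```python
def steady_state_force(vel_d, acc_d, mass, drag):
    """U_d = g^+(X_d) (X_d_dot - f(X_d)) from (25) for the position subsystem (12).

    With g_1 = [0; I/m] the pseudo-inverse is g_1^+ = [0, m*I]. The first three
    rows of X_d_dot - f(X_d) are v_d - v_d = 0, so U_d = m*a_d + k*v_d per axis.
    """
    return [mass * acc_d[i] + drag[i] * vel_d[i] for i in range(3)]

# Hypothetical desired motion along the x-axis only.
u_d = steady_state_force(vel_d=[2.0, 0.0, 0.0], acc_d=[1.0, 0.0, 0.0],
                         mass=1.2, drag=[0.1, 0.1, 0.1])
```

In this case the steady-state input is the force needed to produce the desired acceleration and to cancel the drag at the desired velocity.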

Define the tracking error as

E = X - X_{d} \in R^{6}

. Combined with (14) and (15), the error system is developed as

\begin{matrix} \dot{E} = & f (X) + g (X) (U + D) - {\dot{X}}_{d} \\ = & f (E + X_{d}) + g (E + X_{d}) U_{d} - {\dot{X}}_{d} + g (E + X_{d}) U_{E} + g (E + X_{d}) \tilde{D} . \end{matrix}

(26)

Let

f_{E} = f (E + X_{d}) + g (E + X_{d}) U_{d} - {\dot{X}}_{d}

and

g_{E} = g (E + X_{d})

, then we have

\begin{matrix} \dot{E} = & f_{E} + g_{E} U_{E} + g_{E} \tilde{D} . \end{matrix}

(27)

Noting that

g_{E} = g (X)

, the norm of

g_{E}

is bounded such that

g_{m} \leq ∥ g_{E} ∥ \leq g_{M}

for the positive constants

g_{m}

and

g_{M}

.

As a result of Theorem 1, the disturbance observer (17) can successfully estimate the compound disturbance D and the estimation error of compound disturbance

\tilde{D}

can converge to zero. Therefore, it is possible to neglect

\tilde{D}

in the error system (27) for the optimal control design [41,42]. However,

\tilde{D}

would still be considered in the stability analysis. Then, the nominal error system is represented by

\dot{E} = f_{E} + g_{E} U_{E} .

(28)

Define the cost function as

V (E) = \int_{t_{0}}^{\infty} (E^{T} Q E + U_{E}^{T} R U_{E}) d t,

(29)

where

Q \in R^{6 \times 6}

and

R \in R^{3 \times 3}

are the designed symmetric positive definite matrices.

The nonlinear Lyapunov equation for (29) is obtained as

\nabla V^{T} (f_{E} + g_{E} U_{E}) + E^{T} Q E + U_{E}^{T} R U_{E} = 0,

(30)

where

\nabla V = \partial V (E) / \partial E

and

V (0) = 0

.

Definition 2

([43]). A control policy

μ (E)

is said to be admissible on the compact set Ø for (29) if

μ (E)

is continuous on Ø,

μ (0) = 0

,

μ (E)

stabilizes (28) on Ø and

V (E)

is finite

\forall E \in Ø

. This is represented by

μ (E) \in Ψ (Ø)

, where

Ψ (Ø)

denotes the set of admissible control policies.

The Hamiltonian function takes the following form

H (E, U_{E}, \nabla V) = \nabla V^{T} (f_{E} + g_{E} U_{E}) + E^{T} Q E + U_{E}^{T} R U_{E} .

(31)

The optimal cost function is represented by

V^{*} (E) = min_{U_{E} \in Ψ (Ø)} \int_{t_{0}}^{\infty} (E^{T} Q E + U_{E}^{T} R U_{E}) d t,

(32)

and the following relation is satisfied

min_{U_{E} \in Ψ (Ø)} H (E, U_{E}, \nabla V^{*}) = 0,

(33)

where

\nabla V^{*} = \partial V^{*} (E) / \partial E

.

Under the existence condition of the optimal solution

\partial H (E, U_{E}^{*}, \nabla V^{*}) / \partial U_{E}^{*} = 2 R U_{E}^{*} + g_{E}^{T} \nabla V^{*} = 0

, the optimal feedback control input is derived by

U_{E}^{*} = - \frac{1}{2} R^{- 1} g_{E}^{T} \nabla V^{*} .

(34)

Substituting (34) and (31) into (33), the HJB equation is developed as

\nabla V^{* T} f_{E} + E^{T} Q E - \frac{1}{4} \nabla V^{* T} g_{E} R^{- 1} g_{E}^{T} \nabla V^{*} = 0 .

(35)
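To make (34) and (35) concrete, consider a scalar linear error system, which is an assumption made only for this example: f_{E} = a E, g_{E} = b, Q = q and R = r. With V^{*} (E) = p E^{2}, the HJB Equation (35) reduces to the scalar Riccati equation 2 a p + q - b^{2} p^{2} / r = 0, which the sketch below solves and checks. All names and numbers are hypothetical.

```python
import math

def scalar_hjb_solution(a, b, q, r):
    """Positive root p of 2*a*p + q - (b*p)**2 / r = 0, so that V*(e) = p*e**2."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)

def hjb_residual(e, p, a, b, q, r):
    """Left-hand side of (35) with f_E = a*e, g_E = b and grad V* = 2*p*e."""
    grad_v = 2.0 * p * e
    return grad_v * a * e + q * e * e - 0.25 * grad_v * (b / r) * b * grad_v

def optimal_feedback(e, p, b, r):
    """U_E* = -(1/2) R^{-1} g_E^T grad V* from (34)."""
    return -0.5 * (1.0 / r) * b * (2.0 * p * e)

A, B, Q, R = 1.0, 1.0, 3.0, 1.0          # hypothetical scalar system and weights
p = scalar_hjb_solution(A, B, Q, R)      # equals 3 for these values
residual = hjb_residual(0.7, p, A, B, Q, R)
u_star = optimal_feedback(0.7, p, B, R)
```

For these values the closed-loop coefficient is a - b^{2} p / r = -2, so the optimal feedback stabilizes the open-loop unstable system. For the nonlinear quadrotor error system no such closed form is available.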

3.3. Approximate Optimal Control Design

Clearly, it is necessary to derive

\nabla V^{*}

by solving the HJB Equation (35) for deriving the optimal feedback control input (34). However, (35) is a typical nonlinear partial differential equation and its solution is difficult to derive in the analytic form [44,45]. To overcome the difficulty, the ADP method relying on the policy iteration technique is utilized to derive the approximate solution.

Assumption 4

([46]). The continuously differentiable Lyapunov function candidate

J (E)

for the nominal error system (28) satisfies

\nabla J^{T} (f_{E} + g_{E} U_{E}^{*}) < 0

, where

\nabla J = \partial J (E) / \partial E

. Meanwhile, there exists a symmetric positive definite matrix

Λ (E)

such that

\nabla J^{T} (f_{E} + g_{E} U_{E}^{*}) = - \nabla J^{T} Λ (E) \nabla J

. Moreover, the relation

Λ_{m} \leq ∥ Λ (E) ∥ \leq Λ_{M}

holds for positive constants

Λ_{m}

,

Λ_{M}

.

Remark 3.

Assumption 4 is a common assumption that has been used for the ADP method. Generally, it is assumed that the closed-loop dynamics with the optimal feedback control is bounded by a function of the system state on the compact set. In such a situation, there exists a positive constant

η

such that

∥f_{E} + g_{E} U_{E}^{*}∥ \leq η ∥ \nabla J ∥

. Hence, we can further derive

∥\nabla J^{T} (f_{E} + g_{E} U_{E}^{*})∥ \leq η {∥ \nabla J ∥}^{2}

. Furthermore, the function

J (E)

can be correctly selected as a quadratic polynomial [47], such as

J (E) = \frac{1}{2} E^{T} E

.

Considering the universal approximation property of neural networks, the optimal cost function is approximated by

V^{*} (E) = W_{c}^{T} φ_{c} (E) + ε_{c} (E),

(36)

where

W_{c} \in R^{N}

represents the unknown ideal constant weight,

φ_{c} (E) \in R^{N}

represents the activation function,

ε_{c} (E)

represents the approximate error, and N represents the number of neurons. This neural network is called the critic network in the ADP method.

Lemma 2

([48]). The estimation error

ε_{c} (E)

is expected to be bounded when the approximated function

V^{*} (E)

is bounded.

Then, by the definition of

\nabla V^{*}

, it is developed as follows

\nabla V^{*} = \nabla φ_{c}^{T} W_{c} + \nabla ε_{c},

(37)

where

\nabla φ_{c} = \partial φ_{c} (E) / \partial E

and

\nabla ε_{c} = \partial ε_{c} (E) / \partial E

.

Invoking (37), the optimal feedback control input (34) is developed as

U_{E}^{*} = - \frac{1}{2} R^{- 1} g_{E}^{T} (\nabla φ_{c}^{T} W_{c} + \nabla ε_{c}) .

(38)

Substituting (37) into (35), the HJB equation is developed as

W_{c}^{T} \nabla φ_{c} f_{E} + E^{T} Q E - \frac{1}{4} W_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} W_{c} + ε_{H} = 0,

(39)

where

Ξ = g_{E} R^{- 1} g_{E}^{T}

.

ε_{H}

represents the residual error, which takes the form of

\begin{matrix} ε_{H} & = \nabla ε_{c}^{T} f_{E} - \frac{1}{2} \nabla ε_{c}^{T} Ξ \nabla φ_{c}^{T} W_{c} - \frac{1}{4} \nabla ε_{c}^{T} Ξ \nabla ε_{c} \\ = \nabla ε_{c}^{T} (f_{E} + g_{E} U_{E}^{*}) + \frac{1}{4} \nabla ε_{c}^{T} Ξ \nabla ε_{c} . \end{matrix}

(40)

Since

∥ g_{E} ∥

is bounded, there exist positive constants

Ξ_{m}

and

Ξ_{M}

such that

Ξ_{m} \leq ∥Ξ∥ \leq Ξ_{M}

.

Define the estimate of

W_{c}

as

{\hat{W}}_{c}

, then the estimate of

V^{*} (E)

is derived as follows:

\hat{V} (E) = {\hat{W}}_{c}^{T} φ_{c} (E) .

(41)

Moreover, the approximate optimal feedback control input is derived as

U_{E} = - \frac{1}{2} R^{- 1} g_{E}^{T} \nabla φ_{c}^{T} {\hat{W}}_{c} .

(42)
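The structure of (42) can be sketched for a small example. The two-dimensional error, the quadratic activation φ_{c} (E) = [e_{1}^{2}, e_{1} e_{2}, e_{2}^{2}]^{T}, the single input and the weight values below are assumptions made for illustration; the paper does not specify its activation functions in this section.

```python
def grad_phi(e):
    """Jacobian (3x2) of the assumed activation phi_c(E) = [e1^2, e1*e2, e2^2]."""
    e1, e2 = e
    return [[2.0 * e1, 0.0], [e2, e1], [0.0, 2.0 * e2]]

def approx_feedback(e, w_hat, g_e, r_inv):
    """U_E = -(1/2) R^{-1} g_E^T grad_phi^T W_hat from (42), single input (scalar R)."""
    jac = grad_phi(e)
    # grad_phi^T W_hat is the gradient of V_hat = W_hat^T phi_c with respect to E.
    grad_v = [sum(jac[k][i] * w_hat[k] for k in range(3)) for i in range(2)]
    return -0.5 * r_inv * sum(g_e[i] * grad_v[i] for i in range(2))

# Hypothetical weights and error; the input acts on the second state only.
u_e = approx_feedback(e=[1.0, 1.0], w_hat=[1.0, 2.0, 3.0], g_e=[0.0, 1.0], r_inv=1.0)
```

The control is obtained from the critic weights alone, without a separate actor network.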

Remark 4.

The classical ADP method utilizes the critic network and the actor network to approximate the optimal cost function and the optimal feedback control, respectively [43,49,50]. Considering the association between the optimal cost function and the optimal feedback control for the continuous affine nonlinear system, it is possible to omit the actor network and only use the critic network [51,52]. Compared to the actor–critic network framework, this framework requires less computational effort and converges faster, which gives it a better practical value.

Combining (31), (41) and (42), the approximate Hamiltonian function is developed as

H (E, {\hat{W}}_{c}) = {\hat{W}}_{c}^{T} \nabla φ_{c} f_{E} + E^{T} Q E - \frac{1}{4} {\hat{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\hat{W}}_{c} ≜ e_{c} .

(43)

Define the objective function as

E_{c} = \frac{1}{2} e_{c}^{2} .

(44)

Moreover, the weight update law is designed as

{\dot{\hat{W}}}_{c} = - \frac{α_{1} σ}{σ_{c}^{2}} ({\hat{W}}_{c}^{T} \nabla φ_{c} f_{E} + E^{T} Q E - \frac{1}{4} {\hat{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\hat{W}}_{c}) + \frac{α_{2}}{2} Π (E, U_{E}) \nabla φ_{c} Ξ \nabla J,

(45)

where

α_{1} > 0

,

α_{2} > 0

are the learning rates to be designed.

σ = \nabla φ_{c} (f_{E} + g_{E} U_{E})

and

σ_{c} = σ^{T} σ + 1

.

\nabla J

is given in Assumption 4.

Π (E, U_{E})

in the last term is defined as

Π (E, U_{E}) = \{\begin{matrix} 0, & \text{if } \nabla J^{T} (f_{E} + g_{E} U_{E}) + α_{3} \nabla J^{T} g_{E} g_{E}^{T} \nabla J < 0 \\ 1, & \text{else} \end{matrix},

(46)

where

α_{3}

is a designed positive constant.
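The two terms of the weight update law (45) and the switch (46) can be examined in a scalar setting. The sketch below assumes f_{E} = a E, g_{E} = b, a single activation φ_{c} (E) = E^{2} and J (E) = E^{2} / 2; these choices and all numerical values are hypothetical and serve only to show how the terms interact.

```python
def critic_weight_rate(w_hat, e, a, b, q, r, alpha1, alpha2, alpha3):
    """Right-hand side of (45) for the scalar case described above."""
    grad_phi = 2.0 * e       # derivative of phi_c(E) = e^2
    grad_j = e               # derivative of J(E) = e^2 / 2
    xi = b * b / r           # Xi = g_E R^{-1} g_E^T
    f_e = a * e
    u_e = -0.5 * (1.0 / r) * b * grad_phi * w_hat          # (42)
    e_dot = f_e + b * u_e
    sigma = grad_phi * e_dot
    sigma_c = sigma * sigma + 1.0
    # Approximate Hamiltonian e_c from (43)
    e_c = (w_hat * grad_phi * f_e + q * e * e
           - 0.25 * w_hat * grad_phi * xi * grad_phi * w_hat)
    # Switch (46): 0 while J decreases with the margin set by alpha3, 1 otherwise
    decreasing = grad_j * e_dot + alpha3 * grad_j * b * b * grad_j < 0.0
    pi_switch = 0.0 if decreasing else 1.0
    return (-alpha1 * sigma / sigma_c**2 * e_c
            + 0.5 * alpha2 * pi_switch * grad_phi * xi * grad_j)

# a = 0, b = q = r = 1, e = 1; learning rates 1 and alpha3 = 0.1 (all hypothetical).
stabilizing = critic_weight_rate(0.5, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.1)
destabilizing = critic_weight_rate(-0.5, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.1)
```

With a stabilizing weight estimate the switch is off and only the first term acts. With a destabilizing estimate the switch turns on, and in this example the second term outweighs the first and drives the weight back towards stabilizing values.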

Remark 5.

The first term in (45) is employed for minimizing the objective function (44). To ensure that

{\hat{W}}_{c}

will converge to

W_{c}

, the persistence of excitation (PE) condition must hold during the learning process [49]. Probing noise is typically added to the control input to satisfy this condition, which may cause the closed-loop system to become unstable during the learning process [53,54]. The second term in (45) is employed to ensure the stability of the closed-loop system.

Define the weight estimation error as

{\tilde{W}}_{c} = W_{c} - {\hat{W}}_{c}

. Observing that

{\dot{\tilde{W}}}_{c} = - {\dot{\hat{W}}}_{c}

,

σ = \nabla φ_{c} (f_{E} + g_{E} U_{E}) = \nabla φ_{c} {\dot{E}}^{*} + \frac{1}{2} \nabla φ_{c} Ξ \nabla ε_{c} + \frac{1}{2} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c}

where

{\dot{E}}^{*} = f_{E} + g_{E} U_{E}^{*}

, and using (39) and (45), we have

\begin{matrix} {\dot{\tilde{W}}}_{c} = & - \frac{α_{1}}{σ_{c}^{2}} (\nabla φ_{c} {\dot{E}}^{*} + \frac{1}{2} \nabla φ_{c} Ξ \nabla ε_{c} + \frac{1}{2} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c}) ({\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*} \\ + \frac{1}{2} {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c} + \frac{1}{4} {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c} + ε_{H}) - \frac{α_{2}}{2} Π (E, U_{E}) \nabla φ_{c} Ξ \nabla J . \end{matrix}

(47)

3.4. Stability Analysis

Assumption 5

([50]). The ideal weight

W_{c}

is bounded over the compact set Ø such that

∥W_{c}∥ \leq W_{c M}

for a positive constant

W_{c M}

. Meanwhile, the activation function

φ_{c}

and the approximate error

ε_{c}

are bounded such that

∥φ_{c}∥ \leq φ_{c M}

,

∥ε_{c}∥ \leq ε_{c M}

for positive constants

φ_{c M}

and

ε_{c M}

, and their derivatives are also bounded such that

∥\nabla φ_{c}∥ \leq {\bar{φ}}_{c M}

and

∥\nabla ε_{c}∥ \leq {\bar{ε}}_{c M}

for positive constants

{\bar{φ}}_{c M}

and

{\bar{ε}}_{c M}

. Moreover, the residual error

ε_{H}

will converge to zero when the number of neurons N is sufficiently large, as suggested by Remark 3 and the bound of

∥ Ξ ∥

. That is, the relation

∥ ε_{H} ∥ \leq ε_{H M}

exists for the positive constant

ε_{H M}

.

Theorem 2.

Consider System (14). If the robust approximate optimal controller for trajectory tracking control is designed as (15), which consists of the robust compensation input (23) and the nominal system control input (16), and the weight update law for the critic network is designed as (45), then the tracking error E of the closed-loop system and the weight estimation error

{\tilde{W}}_{c}

are UUB.

Proof.

Select the candidate Lyapunov function as follows

L = L_{D} + L_{J} + L_{W},

(48)

where

L_{D}

is designed as (20),

L_{J} = α_{2} J (E)

and

L_{W} = \frac{1}{2} {\tilde{W}}_{c}^{T} {\tilde{W}}_{c}

.

Considering the second term in (48) and using (27), the time derivative is developed as

{\dot{L}}_{J} = α_{2} \nabla J^{T} (f_{E} + g_{E} U_{E}) + α_{2} \nabla J^{T} g_{E} \tilde{D} .

(49)

Considering the third term in (48) and according to (47), the time derivative is developed as

\begin{matrix} {\dot{L}}_{W} = & {\tilde{W}}_{c}^{T} {\dot{\tilde{W}}}_{c} \\ = & - \frac{α_{1}}{σ_{c}^{2}} ({\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*} + \frac{1}{2} {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c} + \frac{1}{2} {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c}) ({\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*} \\ + \frac{1}{2} {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c} + \frac{1}{4} {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c} + ε_{H}) - \frac{α_{2}}{2} Π (E, U_{E}) {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla J \\ = & - \frac{α_{1}}{σ_{c}^{2}} {({\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*})}^{2} - \frac{α_{1}}{4 σ_{c}^{2}} {({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c})}^{2} - \frac{α_{1}}{8 σ_{c}^{2}} {({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c})}^{2} \\ - \frac{α_{1}}{σ_{c}^{2}} ({\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*}) ({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c}) - \frac{3 α_{1}}{4 σ_{c}^{2}} ({\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*}) ({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c}) \\ - \frac{3 α_{1}}{8 σ_{c}^{2}} ({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c}) ({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c}) - \frac{α_{1}}{σ_{c}^{2}} {\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*} ε_{H} \\ - \frac{α_{1}}{2 σ_{c}^{2}} {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c} ε_{H} - \frac{α_{1}}{2 σ_{c}^{2}} {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c} ε_{H} - \frac{α_{2}}{2} Π (E, U_{E}) {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla J . \end{matrix}

(50)

Since the first two terms in the final form of (50) are negative semi-definite, we then derive

\begin{matrix} {\dot{L}}_{W} \leq & - \frac{α_{1}}{8 σ_{c}^{2}} {({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c})}^{2} - \frac{α_{1}}{σ_{c}^{2}} ({\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*}) ({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c}) \\ - \frac{3 α_{1}}{4 σ_{c}^{2}} ({\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*}) ({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c}) \\ - \frac{3 α_{1}}{8 σ_{c}^{2}} ({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c}) ({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c}) - \frac{α_{1}}{σ_{c}^{2}} {\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*} ε_{H} \\ - \frac{α_{1}}{2 σ_{c}^{2}} {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c} ε_{H} - \frac{α_{1}}{2 σ_{c}^{2}} {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c} ε_{H} - \frac{α_{2}}{2} Π (E, U_{E}) {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla J . \end{matrix}

(51)

According to Remark 3 and Assumption 5, and considering the bound of

∥ Ξ ∥

, we assume that

λ_{1 m} \leq ∥\nabla φ_{c} Ξ \nabla φ_{c}^{T}∥ \leq λ_{1 M}

,

∥ Ξ ∥ \leq λ_{2}

,

∥ \nabla φ_{c} {\dot{E}}^{*} ∥ \leq λ_{3}

,

∥ \nabla ε_{c} ∥ \leq λ_{4}

,

∥ \nabla φ_{c} Ξ \nabla ε_{c} ∥ \leq λ_{5}

and

∥ ε_{H} ∥ \leq λ_{6}

. Noticing that the PE condition guarantees

σ_{c}

to be bounded, there exists a positive constant

λ_{7}

such that

λ_{7} \leq 1 / σ_{c}^{2} \leq 1

. In addition, based on Young’s inequality, there exists the relation

- a b \leq \frac{1}{2} (c^{2} a^{2} + \frac{b^{2}}{c^{2}})

, where c is a nonzero constant. Then, we have

- \frac{α_{1}}{8 σ_{c}^{2}} {({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c})}^{2} \leq - \frac{α_{1}}{8} λ_{7} λ_{1 m}^{2} {∥ {\tilde{W}}_{c} ∥}^{4},

(52)

\begin{matrix} - \frac{α_{1}}{σ_{c}^{2}} ({\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*}) ({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c}) \leq & \frac{α_{1}}{2 σ_{c}^{2}} (c_{1}^{2} {({\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*})}^{2} + \frac{{({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c})}^{2}}{c_{1}^{2}}) \\ \leq & \frac{α_{1} c_{1}^{2}}{2} λ_{3}^{2} ∥ {\tilde{W}}_{c} ∥^{2} + \frac{α_{1}}{2 c_{1}^{2}} λ_{5}^{2} {∥ {\tilde{W}}_{c} ∥}^{2}, \end{matrix}

(53)

\begin{matrix} - \frac{3 α_{1}}{4 σ_{c}^{2}} ({\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*}) ({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c}) \leq & \frac{3 α_{1}}{8 σ_{c}^{2}} (c_{2}^{2} {({\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*})}^{2} + \frac{{({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c})}^{2}}{c_{2}^{2}}) \\ \leq & \frac{3 α_{1} c_{2}^{2}}{8} λ_{3}^{2} ∥ {\tilde{W}}_{c} ∥^{2} + \frac{3 α_{1}}{8 c_{2}^{2}} λ_{1 M}^{2} {∥ {\tilde{W}}_{c} ∥}^{4}, \end{matrix}

(54)

\begin{matrix} - \frac{3 α_{1}}{8 σ_{c}^{2}} ({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c}) ({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c}) \leq & \frac{3 α_{1}}{16 σ_{c}^{2}} (c_{3}^{2} {({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c})}^{2} + \frac{{({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c})}^{2}}{c_{3}^{2}}) \\ \leq & \frac{3 α_{1} c_{3}^{2}}{16} λ_{5}^{2} ∥ {\tilde{W}}_{c} ∥^{2} + \frac{3 α_{1}}{16 c_{3}^{2}} λ_{1 M}^{2} {∥ {\tilde{W}}_{c} ∥}^{4}, \end{matrix}

(55)

\begin{matrix} - \frac{α_{1}}{σ_{c}^{2}} {\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*} ε_{H} \leq & \frac{α_{1}}{2 σ_{c}^{2}} (c_{4}^{2} {({\tilde{W}}_{c}^{T} \nabla φ_{c} {\dot{E}}^{*})}^{2} + \frac{ε_{H}^{2}}{c_{4}^{2}}) \\ \leq & \frac{α_{1} c_{4}^{2}}{2} λ_{3}^{2} {∥ {\tilde{W}}_{c} ∥}^{2} + \frac{α_{1}}{2 c_{4}^{2}} λ_{6}^{2}, \end{matrix}

(56)

\begin{matrix} - \frac{α_{1}}{2 σ_{c}^{2}} {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c} ε_{H} \leq & \frac{α_{1}}{4 σ_{c}^{2}} (c_{5}^{2} {({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla ε_{c})}^{2} + \frac{ε_{H}^{2}}{c_{5}^{2}}) \\ \leq & \frac{α_{1} c_{5}^{2}}{4} λ_{5}^{2} {∥ {\tilde{W}}_{c} ∥}^{2} + \frac{α_{1}}{4 c_{5}^{2}} λ_{6}^{2}, \end{matrix}

(57)

\begin{matrix} - \frac{α_{1}}{2 σ_{c}^{2}} {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c} ε_{H} \leq & \frac{α_{1}}{4 σ_{c}^{2}} (c_{6}^{2} {({\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla φ_{c}^{T} {\tilde{W}}_{c})}^{2} + \frac{ε_{H}^{2}}{c_{6}^{2}}) \\ \leq & \frac{α_{1} c_{6}^{2}}{4} λ_{1 M}^{2} {∥ {\tilde{W}}_{c} ∥}^{4} + \frac{α_{1}}{4 c_{6}^{2}} λ_{6}^{2} . \end{matrix}

(58)

Then, (51) is developed as

{\dot{L}}_{W} \leq - α_{1} λ_{8} ∥ {\tilde{W}}_{c} ∥^{4} + α_{1} λ_{9} {∥ {\tilde{W}}_{c} ∥}^{2} + α_{1} λ_{10} - \frac{α_{2}}{2} Π (E, U_{E}) {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla J,

(59)

where

\begin{matrix} λ_{8} = & \frac{1}{8} λ_{7} λ_{1 m}^{2} - \frac{3}{8 c_{2}^{2}} λ_{1 M}^{2} - \frac{3}{16 c_{3}^{2}} λ_{1 M}^{2} - \frac{c_{6}^{2}}{4} λ_{1 M}^{2}, \\ λ_{9} = & \frac{c_{1}^{2}}{2} λ_{3}^{2} + \frac{1}{2 c_{1}^{2}} λ_{5}^{2} + \frac{3 c_{2}^{2}}{8} λ_{3}^{2} + \frac{3 c_{3}^{2}}{16} λ_{5}^{2} + \frac{c_{4}^{2}}{2} λ_{3}^{2} + \frac{c_{5}^{2}}{4} λ_{5}^{2}, \\ λ_{10} = & \frac{1}{2 c_{4}^{2}} λ_{6}^{2} + \frac{1}{4 c_{5}^{2}} λ_{6}^{2} + \frac{1}{4 c_{6}^{2}} λ_{6}^{2}, \end{matrix}

(60)

and

c_{j}

(j = 1, 2, \ldots, 6)

are all non-zero constants whose selection guarantees

λ_{8} > 0

. Combining the results of (22), (49) and (59), we have

\begin{matrix} \dot{L} = & {\dot{L}}_{D} + {\dot{L}}_{J} + {\dot{L}}_{W} \\ \leq & - κ ∥ \tilde{D} ∥^{2} + α_{2} \nabla J^{T} (f_{E} + g_{E} U_{E}) + α_{2} \nabla J^{T} g_{E} \tilde{D} - α_{1} λ_{8} {∥ {\tilde{W}}_{c} ∥}^{4} \\ + α_{1} λ_{9} {∥ {\tilde{W}}_{c} ∥}^{2} + α_{1} λ_{10} - \frac{α_{2}}{2} Π (E, U_{E}) {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla J . \end{matrix}

(61)

By using Young’s inequality, the relation

α_{2} \nabla J^{T} g_{E} \tilde{D} \leq \frac{α_{2} α_{3}}{2} \nabla J^{T} g_{E} g_{E}^{T} \nabla J + \frac{α_{2}}{2 α_{3}} {∥ \tilde{D} ∥}^{2}

exists. Then, (61) is developed as

\begin{matrix} \dot{L} \leq & - (κ - \frac{α_{2}}{2 α_{3}}) {∥ \tilde{D} ∥}^{2} + α_{2} \nabla J^{T} (f_{E} + g_{E} U_{E}) + \frac{α_{2} α_{3}}{2} \nabla J^{T} g_{E} g_{E}^{T} \nabla J \\ - α_{1} λ_{8} ∥ {\tilde{W}}_{c} ∥^{4} + α_{1} λ_{9} {∥ {\tilde{W}}_{c} ∥}^{2} + α_{1} λ_{10} - \frac{α_{2}}{2} Π (E, U_{E}) {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla J . \end{matrix}

(62)

The following discussion is divided into two cases.

Case 1.

In this case,

Π (E, U_{E}) = 0

. Since

\nabla J^{T} (f_{E} + g_{E} U_{E}) + α_{3} \nabla J^{T} g_{E} g_{E}^{T} \nabla J < 0

, we can derive that

\nabla J^{T} (f_{E} + g_{E} U_{E}) < 0

. According to the dense property of

R

, there exists a positive constant

λ_{11}

such that

0 < λ_{11} ∥ \nabla J ∥ \leq - \nabla J^{T} (f_{E} + g_{E} U_{E})

for all

E \in Ø

. Then, (62) becomes

\dot{L} \leq - (κ - \frac{α_{2}}{2 α_{3}}) ∥ \tilde{D} ∥^{2} - \frac{α_{2}}{2} λ_{11} ∥ \nabla J ∥ - α_{1} λ_{8} ∥ {\tilde{W}}_{c} ∥^{4} + α_{1} λ_{9} {∥ {\tilde{W}}_{c} ∥}^{2} + α_{1} λ_{10} .

(63)

By selecting

α_{2}

and

α_{3}

, such that

κ - \frac{α_{2}}{2 α_{3}} > 0

, then

\dot{L} < 0

is satisfied provided that one of the following conditions holds:

∥\nabla J∥ > \frac{α_{1} (4 λ_{8} λ_{10} + λ_{9}^{2})}{2 α_{2} λ_{8} λ_{11}} ≜ ℓ_{1},

(64)

or

∥{\tilde{W}}_{c}∥ > \sqrt{\frac{λ_{9} + \sqrt{4 λ_{8} λ_{10} + λ_{9}^{2}}}{2 λ_{8}}} ≜ ℏ_{1} .

(65)
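The thresholds (64) and (65) follow from treating the weight-error terms in (63) as a quadratic in x = ∥W̃_c∥². The following worked step is added for clarity and is a reconstruction under that reading, with λ_8 > 0:

```latex
% Completing the square in x = \|\tilde{W}_c\|^2 \ge 0, with \lambda_8 > 0.
\begin{aligned}
-\alpha_1 \lambda_8 x^2 + \alpha_1 \lambda_9 x + \alpha_1 \lambda_{10}
  &= -\alpha_1 \lambda_8 \Bigl(x - \frac{\lambda_9}{2 \lambda_8}\Bigr)^2
     + \frac{\alpha_1 (4 \lambda_8 \lambda_{10} + \lambda_9^2)}{4 \lambda_8} \\
  &\le \frac{\alpha_1 (4 \lambda_8 \lambda_{10} + \lambda_9^2)}{4 \lambda_8}.
\end{aligned}
```

Requiring (α_2 / 2) λ_11 ∥∇J∥ to exceed this upper bound gives (64). Alternatively, the quadratic itself is negative once x exceeds its positive root (λ_9 + \sqrt{4 λ_8 λ_10 + λ_9^2}) / (2 λ_8), which gives (65) after taking the square root.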

Case 2.

Considering the case

Π (E, U_{E}) = 1

, (62) is developed as

\begin{matrix} \dot{L} \leq & α_{2} (\nabla J^{T} (f_{E} + g_{E} U_{E}) + \frac{α_{3}}{2} \nabla J^{T} g_{E} g_{E}^{T} \nabla J) - (κ - \frac{α_{2}}{2 α_{3}}) {∥ \tilde{D} ∥}^{2} \\ - α_{1} λ_{8} ∥ {\tilde{W}}_{c} ∥^{4} + α_{1} λ_{9} {∥ {\tilde{W}}_{c} ∥}^{2} + α_{1} λ_{10} - \frac{α_{2}}{2} {\tilde{W}}_{c}^{T} \nabla φ_{c} Ξ \nabla J \\ = & α_{2} (\nabla J^{T} (f_{E} + g_{E} U_{E}^{*}) + \frac{α_{3}}{2} \nabla J^{T} g_{E} g_{E}^{T} \nabla J) + \frac{α_{2}}{2} \nabla J^{T} Ξ \nabla ε_{c} \\ - (κ - \frac{α_{2}}{2 α_{3}}) ∥ \tilde{D} ∥^{2} - α_{1} λ_{8} ∥ {\tilde{W}}_{c} ∥^{4} + α_{1} λ_{9} {∥ {\tilde{W}}_{c} ∥}^{2} + α_{1} λ_{10} . \end{matrix}

(66)

Based on Assumption 4, and considering

∥g_{E}∥ \leq g_{M}

, we have

\begin{matrix} \dot{L} \leq & - α_{2} (Λ_{m} - \frac{α_{3}}{2} g_{M}^{2}) {∥\nabla J∥}^{2} + \frac{α_{2}}{2} λ_{2} λ_{4} ∥\nabla J∥ - (κ - \frac{α_{2}}{2 α_{3}}) {∥ \tilde{D} ∥}^{2} \\ - α_{1} λ_{8} ∥ {\tilde{W}}_{c} ∥^{4} + α_{1} λ_{9} {∥ {\tilde{W}}_{c} ∥}^{2} + α_{1} λ_{10} . \end{matrix}

(67)

Similarly, by selecting

α_{2}

and

α_{3}

such that

λ_{12} = Λ_{m} - \frac{α_{3}}{2} g_{M}^{2} > 0

and

κ - \frac{α_{2}}{2 α_{3}} > 0

, it follows that

\dot{L} < 0

holds as long as

∥\nabla J∥ > \frac{λ_{2} λ_{4}}{4 λ_{12}} + \sqrt{\frac{α_{1} (4 λ_{8} λ_{10} + λ_{9}^{2})}{4 α_{2} λ_{8} λ_{12}} + \frac{λ_{2}^{2} λ_{4}^{2}}{16 λ_{12}^{2}}} ≜ ℓ_{2},

(68)

or

∥{\tilde{W}}_{c}∥ > \sqrt{\frac{λ_{9}}{2 λ_{8}} + \sqrt{\frac{λ_{10}}{λ_{8}} + \frac{λ_{9}^{2}}{4 λ_{8}^{2}} + \frac{α_{2} λ_{2}^{2} λ_{4}^{2}}{16 α_{1} λ_{8} λ_{12}}}} ≜ ℏ_{2} .

(69)

In conclusion,

\dot{L} < 0

when

∥\nabla J∥ > max \{ℓ_{1}, ℓ_{2}\}

or

∥{\tilde{W}}_{c}∥ > max \{ℏ_{1}, ℏ_{2}\}

. Relying on Lemma 1 and the standard Lyapunov extension theorem [55], it is further concluded that the tracking error E of the closed-loop system and the weight estimation error

{\tilde{W}}_{c}

are UUB. This completes the proof. □
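For a concrete feel of the ultimate bounds, the thresholds (64), (65), (68) and (69) can be evaluated numerically once the constants are fixed. The sketch below simply transcribes those four expressions; the function name and the sample constants in the usage note are hypothetical and are not values from the paper.

```python
import math

def uub_thresholds(alpha1, alpha2, lam2, lam4, lam8, lam9, lam10, lam11, lam12):
    """Return (max{l1, l2}, max{h1, h2}) from (64), (65), (68) and (69)."""
    disc = 4.0 * lam8 * lam10 + lam9**2
    ell1 = alpha1 * disc / (2.0 * alpha2 * lam8 * lam11)               # (64)
    hbar1 = math.sqrt((lam9 + math.sqrt(disc)) / (2.0 * lam8))         # (65)
    shift = lam2 * lam4 / (4.0 * lam12)
    ell2 = shift + math.sqrt(alpha1 * disc / (4.0 * alpha2 * lam8 * lam12)
                             + shift**2)                               # (68)
    hbar2 = math.sqrt(lam9 / (2.0 * lam8)
                      + math.sqrt(lam10 / lam8 + lam9**2 / (4.0 * lam8**2)
                                  + alpha2 * lam2**2 * lam4**2
                                  / (16.0 * alpha1 * lam8 * lam12)))   # (69)
    return max(ell1, ell2), max(hbar1, hbar2)
```

For example, with the hypothetical choice α_1 = α_2 = 1, λ_2 = λ_4 = 0, λ_8 = 1, λ_9 = 0 and λ_10 = λ_11 = λ_12 = 1, the call `uub_thresholds(1, 1, 0, 0, 1, 0, 1, 1, 1)` evaluates the gradient threshold to 2 and the weight-error threshold to 1.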

Remark 6.

As a result of Theorem 2, the approximate optimal cost function

\hat{V} (E)

in (41) and the approximate optimal feedback control input

U_{E}

in (42) can, respectively, converge to the neighborhoods of the optimal cost function

V^{*} (E)

and the optimal feedback control input

U_{E}^{*}

within finite bounds when the PE condition holds [41].

4. Robust Approximate Optimal Trajectory Tracking Control for a Quadrotor

The position and yaw angle are the system outputs of the quadrotor, which tracks the desired position trajectory and the desired yaw-angle trajectory. The desired trajectories of the roll and pitch angles required by the attitude subsystem are generated from the position subsystem control inputs. The tracking errors in the lateral and longitudinal positions are eliminated by the attitude subsystem tracking the desired roll and pitch trajectories. Following the control design described in the previous section, the control design for the quadrotor is shown in Figure 2, which can guarantee that the tracking error of the quadrotor remains within a small range.

Figure 2. Control design of the quadrotor.

4.1. Position Control Design

The estimated value of unknown compound disturbance

{\hat{d}}_{p}

in the position subsystem is derived by the following disturbance observer

\{\begin{matrix} {\dot{z}}_{1} = - l_{1} (x_{1}) (f_{1} (x_{1}) + g_{1} (x_{1}) (p_{1} (x_{1}) + F + z_{1})) \\ {\hat{d}}_{p} = z_{1} + p_{1} (x_{1}) \end{matrix},

(70)

where

l_{1} (x_{1}) = \partial p_{1} (x_{1}) / \partial x_{1}

denotes the observer gain of the disturbance observer in the position subsystem and F is derived by (6). Then, the position subsystem robust compensation input is designed as

F^{R} = - {\hat{d}}_{p} .

(71)
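The observer (70) and the compensation input (71) can be exercised on a toy model. The sketch below is an illustration under stated assumptions, not the authors' simulation: the nominal model (f1 returning only the velocity, g1 = [0; I/m]), the mass value and the matched form ẋ_1 = f_1 + g_1 (F + d_p) are assumed, while the linear choice p_1(x_1) = l_1 x_1 and the gain 60 are the ones reported in Section 5.

```python
import numpy as np

MASS = 1.0  # hypothetical mass, not the value from Table 1
L1 = np.hstack([np.zeros((3, 3)), 60.0 * np.eye(3)])  # constant observer gain l1

def f1(x):
    """Assumed nominal drift: position rate equals velocity, no other terms."""
    return np.concatenate([x[3:], np.zeros(3)])

def g1(x):
    """Assumed input matrix: forces act on the velocity states only."""
    return np.vstack([np.zeros((3, 3)), np.eye(3) / MASS])

def disturbance_estimate(z, x):
    """Second line of (70) with the linear choice p1(x) = l1 x."""
    return z + L1 @ x

def observer_rate(z, x, F):
    """First line of (70): time derivative of the internal state z1."""
    return -L1 @ (f1(x) + g1(x) @ (L1 @ x + F + z))

def simulate(d_true, steps=2000, dt=1e-3):
    """Forward-Euler run with a constant disturbance and F = F^R from (71)."""
    x, z = np.zeros(6), np.zeros(3)
    for _ in range(steps):
        F = -disturbance_estimate(z, x)          # compensation input only
        x_dot = f1(x) + g1(x) @ (F + d_true)     # assumed matched disturbance
        z_dot = observer_rate(z, x, F)
        x, z = x + dt * x_dot, z + dt * z_dot
    return disturbance_estimate(z, x)
```

Under these assumptions the estimation error obeys an error dynamics driven by l_1 g_1, so for a constant disturbance the estimate returned by `simulate` approaches the true value; a time-varying disturbance would be tracked with a lag that depends on the gain.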

The steady-state control input for the position nominal system is designed as

F_{d} = g_{1}^{+} (x_{1 d}) ({\dot{x}}_{1 d} - f_{1} (x_{1 d})),

(72)

where

x_{1 d} = {[P_{d}^{T}, v_{d}^{T}]}^{T} \in R^{6}

and

v_{d} = {[v_{x d}, v_{y d}, v_{z d}]}^{T} = {\dot{P}}_{d} \in R^{3}

.

g_{1}^{+} (x_{1 d})

denotes the pseudo-inverse of

g_{1} (x_{1 d})

. Then, define the position subsystem tracking error as

e_{1} = x_{1} - x_{1 d} ≜ {[e_{x}, e_{y}, e_{z}, e_{v_{x}}, e_{v_{y}}, e_{v_{z}}]}^{T} \in R^{6} .

(73)
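A small sketch of (72) and (73) is given below. The nominal model used in the example (drift f1, input matrix g1 and the mass) is a hypothetical stand-in, since the model equations are defined outside this passage; only the pseudo-inverse structure of (72) and the error definition (73) are taken from the text.

```python
import numpy as np

def steady_state_input(x_d, x_d_dot, f, g):
    """F_d = g^+(x_d) (x_d_dot - f(x_d)), cf. (72)."""
    return np.linalg.pinv(g(x_d)) @ (x_d_dot - f(x_d))

def tracking_error(x, x_d):
    """e_1 = x_1 - x_1d, cf. (73)."""
    return x - x_d

# Hypothetical nominal position model (assumption, not from the paper).
MASS = 1.2

def f1(x):
    return np.concatenate([x[3:], np.zeros(3)])

def g1(x):
    return np.vstack([np.zeros((3, 3)), np.eye(3) / MASS])

# Example reference point: position, velocity v_d and acceleration a_d.
v_d = np.array([0.0, 0.25, 0.05])
a_d = np.array([-0.125, 0.0, 0.0])
x_d = np.concatenate([np.array([0.5, 0.0, 0.5]), v_d])
F_d = steady_state_input(x_d, np.concatenate([v_d, a_d]), f1, g1)
```

With this assumed model the pseudo-inverse reduces the expression to mass times desired acceleration, which is the input needed to stay on the reference when the tracking error is zero.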

The cost function of the position subsystem is represented as

V_{1} (e_{1}) = \int_{t_{0}}^{\infty} (e_{1}^{T} Q_{1} e_{1} + F_{e}^{T} R_{1} F_{e}) d t,

(74)

where

Q_{1} \in R^{6 \times 6}

and

R_{1} \in R^{3 \times 3}

are the designed symmetric positive definite matrices. The approximate optimal feedback control input in the position subsystem is

F_{e} = - \frac{1}{2} R_{1}^{- 1} g_{e 1}^{T} \nabla φ_{c 1}^{T} {\hat{W}}_{c 1},

(75)

where

g_{e 1} = g_{1} (x_{1})

and

\nabla φ_{c 1} = \partial φ_{c 1} (e_{1}) / \partial e_{1}

.

φ_{c 1} (e_{1})

is the activation function and

{\hat{W}}_{c 1}

represents the estimate of the ideal weight for the critic network of the position subsystem. The corresponding weight update law is designed as

\begin{matrix} {\dot{\hat{W}}}_{c 1} = & - \frac{α_{11} σ_{1}}{σ_{c 1}^{2}} (e_{1}^{T} Q_{1} e_{1} + {\hat{W}}_{c 1}^{T} \nabla φ_{c 1} f_{e 1} - \frac{1}{4} {\hat{W}}_{c 1}^{T} \nabla φ_{c 1} Ξ_{1} \nabla φ_{c 1}^{T} {\hat{W}}_{c 1}) \\ + \frac{α_{12}}{2} Π (e_{1}, F_{e}) \nabla φ_{c 1} Ξ_{1} \nabla J_{1}, \end{matrix}

(76)

where

α_{11} > 0

,

α_{12} > 0

are the designed learning rates.

σ_{1} = \nabla φ_{c 1} (f_{e 1} + g_{e 1} F_{e})

,

σ_{c 1} = σ_{1}^{T} σ_{1} + 1

and

Ξ_{1} = g_{e 1} R_{1}^{- 1} g_{e 1}^{T}

.

\nabla J_{1} = \partial J_{1} (e_{1}) / \partial e_{1}

, where

J_{1} (e_{1})

is the Lyapunov function candidate that satisfies Assumption 4.
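To make (75) concrete, the sketch below evaluates F_e for the nine-term activation function φ_c1 reported in Section 5. The Jacobian is written out by hand for that basis; the input matrix g_e1 = [0; I/m] and the mass used in the usage note are assumptions for illustration only.

```python
import numpy as np

def grad_phi_c1(e):
    """Jacobian (9 x 6) of phi_c1 w.r.t. e_1 = [ex, ey, ez, evx, evy, evz]."""
    ex, ey, ez, evx, evy, evz = e
    G = np.zeros((9, 6))
    G[0, 0] = 2.0 * ex                  # ex^2
    G[1, 0], G[1, 3] = evx, ex          # ex * evx
    G[2, 1] = 2.0 * ey                  # ey^2
    G[3, 1], G[3, 4] = evy, ey          # ey * evy
    G[4, 2] = 2.0 * ez                  # ez^2
    G[5, 2], G[5, 5] = evz, ez          # ez * evz
    G[6, 3] = 2.0 * evx                 # evx^2
    G[7, 4] = 2.0 * evy                 # evy^2
    G[8, 5] = 2.0 * evz                 # evz^2
    return G

def feedback_input(e, W_hat, g_e, R):
    """F_e = -1/2 R^{-1} g_e^T grad_phi^T W_hat, cf. (75)."""
    return -0.5 * np.linalg.solve(R, g_e.T @ grad_phi_c1(e).T @ W_hat)
```

As a usage example with the converged weights listed in Section 5, an assumed unit mass and R_1 = I_3, a pure position error e_x = 1 yields a restoring force along the negative x direction, while a zero error yields a zero input.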

Then, the robust approximate optimal trajectory tracking control in the position subsystem is designed as

F = F^{N} + F^{R} = F_{d} + F_{e} + F^{R} .

(77)

4.2. Attitude Resolution

Since the system of the quadrotor is underactuated and strongly coupled, the information of the position subsystem is used to calculate the total lift force. The desired trajectories of roll and pitch angles are determined by the position subsystem through the relation between the kinematic equation and the Euler equation and passed to the attitude subsystem. For the position subsystem, the generated tracking error and the received compound disturbance can be eliminated by the attitude subsystem. By a matrix operation on (6), the following equations are derived:

\begin{matrix} F_{x} = T (c_{ϕ} s_{θ} c_{ψ} + s_{ϕ} s_{ψ}), \\ F_{y} = T (c_{ϕ} s_{θ} s_{ψ} - s_{ϕ} c_{ψ}), \\ F_{z} = T c_{ϕ} c_{θ} - m g_{I} . \end{matrix}

(78)

The actual total lift force for the quadrotor system is designed as

T = (F_{z} + m g_{I}) / (c_{ϕ} c_{θ}) .

(79)

Substituting (79) into (78), the form is transformed as

[\begin{matrix} F_{x} \\ F_{y} \end{matrix}] = (F_{z} + m g_{I}) [\begin{matrix} c_{ψ} & s_{ψ} \\ s_{ψ} & - c_{ψ} \end{matrix}] [\begin{matrix} t_{θ} \\ t_{ϕ} / c_{θ} \end{matrix}] .

(80)

The desired trajectories of the pitch and roll angles are derived by the following equations:

\begin{matrix} F_{x} c_{ψ} + F_{y} s_{ψ} = (F_{z} + m g_{I}) t_{θ_{d}}, \\ F_{x} s_{ψ} - F_{y} c_{ψ} = (F_{z} + m g_{I}) \frac{t_{ϕ_{d}}}{c_{θ}} . \end{matrix}

(81)

Then, we have

\begin{matrix} θ_{d} = arctan (\frac{F_{x} c_{ψ} + F_{y} s_{ψ}}{F_{z} + m g_{I}}), \\ ϕ_{d} = arctan (c_{θ} \frac{F_{x} s_{ψ} - F_{y} c_{ψ}}{F_{z} + m g_{I}}) . \end{matrix}

(82)
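The attitude resolution in (78)-(82) can be sketched as a pair of functions: one that inverts the force map, and the forward map (78) used to check it. Evaluating c_θ and the lift at the desired angles is an assumption of this sketch, and the mass and gravity values in the usage note are hypothetical.

```python
import numpy as np

def attitude_resolution(F, psi, mass, gravity=9.81):
    """Desired pitch/roll from (82) and total lift from (79).

    F = [F_x, F_y, F_z] is the position-loop input and psi the yaw angle.
    The lift is evaluated at the desired angles (assumption of this sketch).
    """
    Fx, Fy, Fz = F
    vertical = Fz + mass * gravity      # F_z + m g_I, assumed non-zero
    theta_d = np.arctan((Fx * np.cos(psi) + Fy * np.sin(psi)) / vertical)
    phi_d = np.arctan(np.cos(theta_d)
                      * (Fx * np.sin(psi) - Fy * np.cos(psi)) / vertical)
    T = vertical / (np.cos(phi_d) * np.cos(theta_d))
    return T, phi_d, theta_d

def force_from_attitude(T, phi, theta, psi, mass, gravity=9.81):
    """Forward map (78), used here only to check the inversion."""
    c, s = np.cos, np.sin
    return np.array([T * (c(phi) * s(theta) * c(psi) + s(phi) * s(psi)),
                     T * (c(phi) * s(theta) * s(psi) - s(phi) * c(psi)),
                     T * c(phi) * c(theta) - mass * gravity])
```

Feeding the output of `force_from_attitude` back into `attitude_resolution` recovers the original lift and angles as long as roll and pitch stay within (-π/2, π/2) and F_z + m g_I is positive, which is the regime in which (79) and (82) are well defined.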

4.3. Attitude Control Design

Similarly, the estimated value of unknown compound disturbance

{\hat{d}}_{a}

in the attitude subsystem is derived by the following disturbance observer

\{\begin{matrix} {\dot{z}}_{2} = - l_{2} (x_{2}) (f_{2} (x_{2}) + g_{2} (x_{2}) (p_{2} (x_{2}) + τ + z_{2})) \\ {\hat{d}}_{a} = z_{2} + p_{2} (x_{2}) \end{matrix},

(83)

where

l_{2} (x_{2}) = \partial p_{2} (x_{2}) / \partial x_{2}

denotes the observer gain of the disturbance observer in the attitude subsystem. Then, the attitude subsystem robust compensation input is

τ^{R} = - {\hat{d}}_{a} .

(84)

The desired trajectory for the angular velocity is given by [56]

ω_{d} = [\begin{matrix} 1 & 0 & - s_{θ_{d}} \\ 0 & c_{ϕ_{d}} & s_{ϕ_{d}} c_{θ_{d}} \\ 0 & - s_{ϕ_{d}} & c_{ϕ_{d}} c_{θ_{d}} \end{matrix}] {\dot{Θ}}_{d},

(85)

in which

Θ_{d} = {[ϕ_{d}, θ_{d}, ψ_{d}]}^{T} \in R^{3}

is the desired trajectory of Euler angles and

ω_{d} = {[p_{d}, q_{d}, r_{d}]}^{T} \in R^{3}

is the desired trajectory of the angular velocity. The steady-state control input for the attitude nominal system is designed as

τ_{d} = g_{2}^{+} (x_{2 d}) ({\dot{x}}_{2 d} - f_{2} (x_{2 d})),

(86)

where

x_{2 d} = {[Θ_{d}^{T}, ω_{d}^{T}]}^{T} \in R^{6}

and

g_{2}^{+} (x_{2 d})

denotes the pseudo-inverse of

g_{2} (x_{2 d})

. Then, define the attitude subsystem tracking error as

e_{2} = x_{2} - x_{2 d} ≜ {[e_{ϕ}, e_{θ}, e_{ψ}, e_{p}, e_{q}, e_{r}]}^{T} \in R^{6} .

(87)
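The mapping (85) from desired Euler-angle rates to the desired angular velocity, together with the error (87), can be sketched as below. The state ordering x_2 = [Θ; ω] is inferred from the definition of x_{2d}, and the desired Euler-angle rates are assumed to be supplied by the caller (for instance from numerical differentiation of Θ_d), which the text does not specify.

```python
import numpy as np

def desired_angular_velocity(Theta_d, Theta_d_dot):
    """omega_d from (85); Theta_d = [phi_d, theta_d, psi_d]."""
    phi, theta, _ = Theta_d
    s, c = np.sin, np.cos
    M = np.array([[1.0, 0.0, -s(theta)],
                  [0.0, c(phi), s(phi) * c(theta)],
                  [0.0, -s(phi), c(phi) * c(theta)]])
    return M @ Theta_d_dot

def attitude_tracking_error(Theta, omega, Theta_d, Theta_d_dot):
    """e_2 = x_2 - x_2d from (87), with x_2 = [Theta; omega] (assumed order)."""
    omega_d = desired_angular_velocity(Theta_d, Theta_d_dot)
    return np.concatenate([Theta - Theta_d, omega - omega_d])
```

At level attitude (ϕ_d = θ_d = 0) the matrix in (85) is the identity, so ω_d coincides with the Euler-angle rates; away from level flight the pitch and roll angles couple the three rates.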

The cost function of the attitude subsystem is represented as

V_{2} (e_{2}) = \int_{t_{0}}^{\infty} (e_{2}^{T} Q_{2} e_{2} + τ_{e}^{T} R_{2} τ_{e}) d t,

(88)

where

Q_{2} \in R^{6 \times 6}

and

R_{2} \in R^{3 \times 3}

are the designed symmetric positive definite matrices. The approximate optimal feedback control input in the attitude subsystem is

τ_{e} = - \frac{1}{2} R_{2}^{- 1} g_{e 2}^{T} \nabla φ_{c 2}^{T} {\hat{W}}_{c 2},

(89)

where

g_{e 2} = g_{2} (x_{2})

and

\nabla φ_{c 2} = \partial φ_{c 2} (e_{2}) / \partial e_{2}

.

φ_{c 2} (e_{2})

is the activation function and

{\hat{W}}_{c 2}

represents the estimate of the ideal weight for the critic network of the attitude subsystem. The corresponding weight update law is designed as

\begin{matrix} {\dot{\hat{W}}}_{c 2} = & - \frac{α_{21} σ_{2}}{σ_{c 2}^{2}} (e_{2}^{T} Q_{2} e_{2} + {\hat{W}}_{c 2}^{T} \nabla φ_{c 2} f_{e 2} - \frac{1}{4} {\hat{W}}_{c 2}^{T} \nabla φ_{c 2} Ξ_{2} \nabla φ_{c 2}^{T} {\hat{W}}_{c 2}) \\ + \frac{α_{22}}{2} Π (e_{2}, τ_{e}) \nabla φ_{c 2} Ξ_{2} \nabla J_{2}, \end{matrix}

(90)

where

α_{21} > 0

,

α_{22} > 0

are the learning rates,

σ_{2} = \nabla φ_{c 2} (f_{e 2} + g_{e 2} τ_{e})

,

σ_{c 2} = σ_{2}^{T} σ_{2} + 1

and

Ξ_{2} = g_{e 2} R_{2}^{- 1} g_{e 2}^{T}

.

\nabla J_{2} = \partial J_{2} (e_{2}) / \partial e_{2}

, where

J_{2} (e_{2})

is the Lyapunov function candidate that satisfies Assumption 4.

Then, the robust approximate optimal trajectory tracking control in the attitude subsystem is designed as

τ = τ^{N} + τ^{R} = τ_{d} + τ_{e} + τ^{R} .

(91)

5. Simulation Results

In this section, the robustness and effectiveness of the designed controller are evaluated through numerical simulations. The quadrotor is considered to be in a flight environment with slow-changing disturbances. The parameters of the quadrotor model are presented in Table 1 [24].

Table 1. Parameters of quadrotor model.

A representative desired trajectory is selected to evaluate the trajectory tracking performance of the quadrotor. The desired trajectory is designed as

P_{d} = {[0.5 cos (0.5 t), 0.5 sin (0.5 t), 0.05 t + 0.5]}^{T}

and

ψ_{d} = π / 12

. In addition, referring to [57,58], the unknown compound disturbances considered are described as

d_{p} = {[0.3 + 0.5 (sin (t) + sin (0.5 t) - cos (0.8 t)); 0.3 + 0.5 (cos (t) + sin (0.5 t) - cos (0.8 t)); 0.2 + 0.5 sin (1.5 t)]}^{T}

and

d_{a} = {[0.1 + 0.2 (sin (t) + sin (0.5 t)); 0.1 + 0.2 (cos (0.5 t) - cos (0.8 t)); 0.05 + 0.2 sin (t) sin (0.5 t)]}^{T}

. In this way, the performance of the disturbance observers is reflected by comparing the actual disturbances with their estimates. The initial states of the quadrotor are all set to zero.
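For readers who want to reproduce the scenario, the desired trajectory and the compound disturbances stated above can be written directly as functions of time. The function and constant names are illustrative; the numerical expressions are transcribed from the text.

```python
import numpy as np

PSI_D = np.pi / 12  # desired yaw angle

def desired_position(t):
    """P_d(t): circle of radius 0.5 in the horizontal plane with a slow climb."""
    return np.array([0.5 * np.cos(0.5 * t), 0.5 * np.sin(0.5 * t), 0.05 * t + 0.5])

def d_p(t):
    """Compound disturbance acting on the position subsystem."""
    common = np.sin(0.5 * t) - np.cos(0.8 * t)
    return np.array([0.3 + 0.5 * (np.sin(t) + common),
                     0.3 + 0.5 * (np.cos(t) + common),
                     0.2 + 0.5 * np.sin(1.5 * t)])

def d_a(t):
    """Compound disturbance acting on the attitude subsystem."""
    return np.array([0.1 + 0.2 * (np.sin(t) + np.sin(0.5 * t)),
                     0.1 + 0.2 * (np.cos(0.5 * t) - np.cos(0.8 * t)),
                     0.05 + 0.2 * np.sin(t) * np.sin(0.5 * t)])
```

Because the quadrotor starts from zero initial states while P_d(0) = [0.5, 0, 0.5]^T, the simulation begins with a non-zero position tracking error.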

The vector-valued functions of the disturbance observers are designed as

p_{1} (x_{1}) = l_{1} (x_{1}) x_{1}

,

p_{2} (x_{2}) = l_{2} (x_{2}) x_{2}

, while the observer gains are selected as

l_{1} (x_{1}) = [\begin{matrix} 0 & 0 & 0 & 60 & 0 & 0 \\ 0 & 0 & 0 & 0 & 60 & 0 \\ 0 & 0 & 0 & 0 & 0 & 60 \end{matrix}], l_{2} (x_{2}) = [\begin{matrix} 0 & 0 & 0 & 5 & 0 & 0 \\ 0 & 0 & 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 0 & 0 & 5 \end{matrix}] .

Clearly,

l_{1} (x_{1}) g_{1} (x_{1})

and

l_{2} (x_{2}) g_{2} (x_{2})

are positive definite and satisfy the design requirements of Theorem 1. To derive the appropriate dynamic performance, the parameters of the performance index functions are designed as

Q_{1} = diag {7, 7, 10, 9, 9, 6}

,

Q_{2} = diag {1.5, 1.5, 1.2, 0.3, 0.3, 0.4}

,

R_{1} = R_{2} = I_{3}

. The activation functions are designed as

φ_{c 1} (e_{1}) = {[e_{x}^{2}, e_{x} e_{v_{x}}, e_{y}^{2}, e_{y} e_{v_{y}}, e_{z}^{2}, e_{z} e_{v_{z}}, e_{v_{x}}^{2}, e_{v_{y}}^{2}, e_{v_{z}}^{2}]}^{T}

,

φ_{c 2} (e_{2}) = {[e_{ϕ}^{2}, e_{ϕ} e_{p}, e_{θ}^{2}, e_{θ} e_{q}, e_{ψ}^{2}, e_{ψ} e_{r}, e_{p}^{2}, e_{p} e_{q}, e_{p} e_{r}, e_{q}^{2}, e_{q} e_{r}, e_{r}^{2}, e_{ϕ}^{2} e_{q} e_{r}, e_{ϕ} e_{p} e_{q} e_{r}, e_{θ}^{2} e_{p} e_{r}, e_{θ} e_{p} e_{q} e_{r}, e_{ψ}^{2} e_{p} e_{q}, e_{ψ} e_{p} e_{q} e_{r}, e_{p}^{4}, e_{p}^{3} e_{q}, e_{p}^{3} e_{r}, e_{p}^{2} e_{q}^{2}, e_{p}^{2} e_{q} e_{r}, e_{p}^{2} e_{r}^{2}, e_{p} e_{q}^{3}, e_{p} e_{q}^{2} e_{r}, e_{p} e_{q} e_{r}^{2}, e_{p} e_{r}^{3}, e_{q}^{4}, e_{q}^{3} e_{r}, e_{q}^{2} e_{r}^{2}, e_{q} e_{r}^{3}, e_{r}^{4}]}^{T}

. The relevant constants of the weight update laws are selected as

α_{11} = 10

,

α_{12} = 0.01

,

α_{13} = 0.1

,

α_{21} = 20

,

α_{22} = 0.001

,

α_{23} = 0.1

. The Lyapunov function candidates are selected as

J_{1} (e_{1}) = \frac{1}{2} e_{1}^{T} e_{1}

and

J_{2} (e_{2}) = \frac{1}{2} e_{2}^{T} e_{2}

. The initial weights are assigned values within the interval

[0, 1]

.

The PE condition is ensured by the method mentioned in Remark 5, i.e., probing noise is added to the control input to excite the system states. The weights vary progressively more slowly and then stabilize during the learning process. After sufficient learning, the converged weights are very close to the ideal weights. The convergence of the critic network weights

{\hat{W}}_{c 1}

,

{\hat{W}}_{c 2}

during the learning process is depicted in Figure 3. The final converged values of

{\hat{W}}_{c 1}

,

{\hat{W}}_{c 2}

are as follows

\begin{matrix} {\hat{W}}_{c 1} = & {[11.3714, 9.4718, 11.3713, 9.4718, 13.1589, 11.3203, 7.6718, 7.6718, 7.4279]}^{T}, \\ {\hat{W}}_{c 2} = & [0.7500, 0.0646, 0.7182, 0.0630, 0.8202, 0.0888, 0.0214, - 0.0006, - 0.0004, \\ 0.0221, - 0.0024, 0.0287, 0.0092, 0.0045, 0.0346, - 0.0059, - 0.0096, 0.0453, \\ 0.0340, 0.0211, 0.0106, - 0.0025, 0.0039, - 0.0045, 0.0227, 0.0126, 0.0155, \\ {0.0076, 0.0256, 0.0020, 0.0016, 0.0090, - 0.0185]}^{T} . \end{matrix}

Figure 3. Convergence of critic network weights.

The converged weights are used to design the approximate optimal feedback control inputs. Figure 4 and Figure 5 present the variation of the states in trajectory tracking control, and Figure 6 and Figure 7 show the corresponding tracking errors. In addition, Figure 8 visualizes the path in three-dimensional space, whereas Figure 9 illustrates the PWM signals for the motors. The figures show that the quadrotor system tracks the desired trajectory and achieves a small convergence bound for the tracking error. These results highlight the rapidity and accuracy of the designed controller in the control process.

Figure 4. Variation of states in the position subsystem.

Figure 5. Variation of states in the attitude subsystem.

Figure 6. Tracking errors in the position subsystem.

Figure 7. Tracking errors in the attitude subsystem.

Figure 8. Results of three-dimensional path.

Figure 9. Pulse-width of input signals.

The estimates for the compound disturbances are depicted in Figure 10. It shows that the estimated values from the disturbance observers can quickly follow the actual compound disturbances. Moreover, the trajectory tracking control performs well in the presence of compound disturbances, which implies the robustness of the designed controller.

Figure 10. Estimates of compound disturbances.

In order to verify that the designed controller rejects the compound disturbances, a comparative simulation is performed without the disturbance observers in the position subsystem and the attitude subsystem, so that only the control inputs designed for the nominal system are applied. Under such control, the variation of states is presented in Figure 11 and Figure 12, while Figure 13 and Figure 14 show the corresponding tracking errors.

Figure 11. Variation of states in the position subsystem without disturbance observers.

Figure 12. Variation of states in the attitude subsystem without disturbance observers.

Figure 13. Tracking errors in the position subsystem without disturbance observers.

Figure 14. Tracking errors in the attitude subsystem without disturbance observers.

By comparing the simulation results, it is clear that the trajectory tracking control of the quadrotor cannot be realized without the robust compensation inputs, which further demonstrates the robustness of the designed controller. Moreover, the corresponding path in three-dimensional space and the PWM signals of the motors are shown in Figure 15 and Figure 16, respectively.

Figure 15. Results of three-dimensional path without disturbance observers.

Figure 16. Pulse-width of input signals without disturbance observers.

In summary, the controller designed for quadrotor trajectory tracking control has good dynamic performance, high tracking accuracy and strong robustness when the quadrotor is subjected to compound disturbances.

6. Conclusions

This paper proposes a robust approximate optimal controller for the trajectory tracking control of a quadrotor with unknown compound disturbances. By incorporating the disturbance estimates produced by the disturbance observers into the control design, the effect of the compound disturbances is suppressed, which ensures tracking accuracy and improves robustness. Moreover, the ADP method is applied to the nominal system to optimize the performance index of the control. The stability of the closed-loop system is analyzed by the Lyapunov theorem, which demonstrates that the tracking errors are UUB. Simulation results further confirm the robustness and effectiveness of the designed controller. In future work, experiments will be considered to validate the performance of the proposed controller.

Author Contributions

Conceptualization, R.L. and Z.Y.; methodology, R.L.; software, Z.Y.; validation, R.L., Z.Y. and G.Y.; formal analysis, R.L.; investigation, Z.Y.; resources, R.L.; data curation, L.J.; writing—original draft preparation, Z.Y.; writing—review and editing, R.L., Z.Y. and G.Y.; visualization, G.L.; supervision, Z.L.; project administration, R.L.; funding acquisition, R.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62003233), the Fundamental Research Program of Shanxi Province (Grant Nos. 201901D211083 and 20210302124552), and the Science and Technology Innovation Project of Higher Education Institutions in Shanxi Province (Grant No. 2019L0236).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Hassanalian, M.; Abdelkefi, A. Classifications, applications, and design challenges of drones: A review. Prog. Aerosp. Sci. 2017, 91, 99–131.
Salem, K.A.; Palaia, G.; Chiarelli, M.R.; Bianchi, M. A simulation framework for aircraft take-off considering ground effect aerodynamics in conceptual design. Aerospace 2023, 10, 459.
Salem, K.A.; Palaia, G.; Quarta, A.A. Review of hybrid-electric aircraft technologies and designs: Critical analysis and novel solutions. Prog. Aerosp. Sci. 2023, 141, 100924.
Shao, S.; Chen, M.; Hou, J.; Zhao, Q. Event-triggered-based discrete-time neural control for a quadrotor UAV using disturbance observer. IEEE/ASME Trans. Mechatronics 2021, 26, 689–699.
Idrissi, M.; Salami, M.; Annaz, F. A review of quadrotor unmanned aerial vehicles: Applications, architectural design and control algorithms. J. Intell. Robot. Syst. 2022, 104, 22.
Rinaldi, F.; Chiesa, S.; Quagliotti, F. Linear quadratic control for quadrotors UAVs dynamics and formation flight. J. Intell. Robot. Syst. 2013, 70, 203–220.
Dharmawan, A.; Priyambodo, T.K. Model of linear quadratic regulator (lqr) control method in hovering state of quadrotor. J. Telecommun. Electron. Comput. Eng. (JTEC) 2017, 9, 135–143.
Alonge, F.; D’Ippolito, F.; Fagiolini, A.; Garraffa, G.; Sferlazza, A. Trajectory robust control of autonomous quadcopters based on model decoupling and disturbance estimation. Int. J. Adv. Robot. Syst. 2021, 18, 1729881421996974.
Yang, Y.; Yan, Y. Attitude regulation for unmanned quadrotors using adaptive fuzzy gain-scheduling sliding mode control. Aerosp. Sci. Technol. 2016, 54, 208–217.
Avram, R.C.; Zhang, X.; Muse, J. Nonlinear adaptive fault-tolerant quadrotor altitude and attitude tracking with multiple actuator faults. IEEE Trans. Control. Syst. Technol. 2017, 26, 701–707.
Chen, F.; Lei, W.; Zhang, K.; Tao, G.; Jiang, B. A novel nonlinear resilient control for a quadrotor UVA via backstepping control and nonlinear disturbance observer. Nonlinear Dyn. 2016, 85, 1281–1295. [Google Scholar] [CrossRef]
Liu, H.; Xi, J.; Zhong, Y. Robust attitude stabilization for nonlinear quadrotor systems with uncertainties and delays. IEEE Trans. Ind. Electron. 2017, 64, 5585–5594.
Liu, E.; Yan, Y.; Yang, Y. Neural network approximation-based backstepping sliding mode control for spacecraft with input saturation and dynamics uncertainty. Acta Astronaut. 2022, 191, 1–10.
Li, R.; Chen, M.; Wu, Q. Robust control for an unmanned helicopter with constrained flapping dynamics. Chin. J. Aeronaut. 2018, 31, 2136–2148.
Li, R.; Chen, M.; Wu, Q. Adaptive neural tracking control for uncertain nonlinear systems with input and output constraints using disturbance observer. Neurocomputing 2017, 235, 27–37.
Yang, Y.; Modares, H.; Vamvoudakis, K.G.; He, W.; Xu, C.Z.; Wunsch, D.C. Hamiltonian-driven adaptive dynamic programming with approximation errors. IEEE Trans. Cybern. 2021, 52, 13762–13773.
Xue, S.; Luo, B.; Liu, D. Event-triggered adaptive dynamic programming for zero-sum game of partially unknown continuous-time nonlinear systems. IEEE Trans. Syst. Man Cybern. Syst. 2018, 50, 3189–3199.
Du, Y.; Jiang, B.; Ma, Y.; Cheng, Y. Robust ADP-based sliding-mode fault-tolerant control for nonlinear systems with application to spacecraft. Appl. Sci. 2022, 12, 1673.
Huang, Y.; Wang, D.; Liu, D. Bounded robust control design for uncertain nonlinear systems using single-network adaptive dynamic programming. Neurocomputing 2017, 266, 128–140.
Wang, D.; Liu, D.; Li, H. Policy iteration algorithm for online design of robust control for a class of continuous-time nonlinear systems. IEEE Trans. Autom. Sci. Eng. 2014, 11, 627–632.
Dou, L.; Su, X.; Zhao, X.; Zong, Q.; He, L. Robust tracking control of quadrotor via on-policy adaptive dynamic programming. Int. J. Robust Nonlinear Control 2021, 31, 2509–2525.
Mu, C.; Zhang, Y. Learning-based robust tracking control of quadrotor with time-varying and coupling uncertainties. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 259–273.
Chen, W.H.; Yang, J.; Guo, L.; Li, S. Disturbance-observer-based control and related methods—An overview. IEEE Trans. Ind. Electron. 2015, 63, 1083–1095.
Chen, M.; Xiong, S.; Wu, Q. Tracking flight control of quadrotor based on disturbance observer. IEEE Trans. Syst. Man Cybern. Syst. 2019, 51, 1414–1423.
Chen, F.; Jiang, R.; Zhang, K.; Jiang, B.; Tao, G. Robust backstepping sliding-mode control and observer-based fault estimation for a quadrotor UAV. IEEE Trans. Ind. Electron. 2016, 63, 5044–5056.
Shao, X.; Liu, J.; Cao, H.; Shen, C.; Wang, H. Robust dynamic surface trajectory tracking control for a quadrotor UAV via extended state observer. Int. J. Robust Nonlinear Control 2018, 28, 2700–2719.
Mofid, O.; Mobayen, S. Adaptive sliding mode control for finite-time stability of quad-rotor UAVs with parametric uncertainties. ISA Trans. 2018, 72, 1–14.
Lei, W.; Li, C.; Chen, M.Z. Robust adaptive tracking control for quadrotors by combining PI and self-tuning regulator. IEEE Trans. Control Syst. Technol. 2018, 27, 2663–2671.
Maqsood, H.; Qu, Y. Nonlinear disturbance observer based sliding mode control of quadrotor helicopter. J. Electr. Eng. Technol. 2020, 15, 1453–1461.
Hua, H.; Fang, Y.; Zhang, X.; Lu, B. A novel robust observer-based nonlinear trajectory tracking control strategy for quadrotors. IEEE Trans. Control Syst. Technol. 2020, 29, 1952–1963.
Song, R.; Lewis, F.L. Robust optimal control for a class of nonlinear systems with unknown disturbances based on disturbance observer and policy iteration. Neurocomputing 2020, 390, 185–195.
Lee, D. Nonlinear disturbance observer-based robust control for spacecraft formation flying. Aerosp. Sci. Technol. 2018, 76, 82–90.
Yuan, W.; Gao, G. Sliding mode control of the automobile electro-coating conveying mechanism with a nonlinear disturbance observer. Adv. Mech. Eng. 2018, 10, 1687814018795748.
Orozco Soto, S.M.; Cacace, J.; Ruggiero, F.; Lippiello, V. Active Disturbance Rejection Control for the Robust Flight of a Passively Tilted Hexarotor. Drones 2022, 6, 258.
Wang, Y.; Sun, J.; He, H.; Sun, C. Deterministic policy gradient with integral compensator for robust quadrotor control. IEEE Trans. Syst. Man Cybern. Syst. 2019, 50, 3713–3725.
Li, C.; Wang, Y.; Yang, X. Adaptive fuzzy control of a quadrotor using disturbance observer. Aerosp. Sci. Technol. 2022, 128, 107784.
Fan, Y.; Guo, H.; Han, X.; Chen, X. Research and verification of trajectory tracking control of a quadrotor carrying a load. Appl. Sci. 2022, 12, 1036.
Wang, B.; Yu, X.; Mu, L.; Zhang, Y. Disturbance observer-based adaptive fault-tolerant control for a quadrotor helicopter subject to parametric uncertainties and external disturbances. Mech. Syst. Signal Process. 2019, 120, 727–743.
Fei, Y.; Shi, P.; Lim, C.C. Robust and collision-free formation control of multiagent systems with limited information. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 4286–4295.
Fei, Y.; Shi, P.; Lim, C.C. Robust formation control for multi-agent systems: A reference correction based approach. IEEE Trans. Circuits Syst. Regul. Pap. 2021, 68, 2616–2625.
Xia, R.; Wu, Q.; Shao, S. Disturbance observer-based optimal flight control of near space vehicle with external disturbance. Trans. Inst. Meas. Control 2020, 42, 272–284.
Sun, J.; Liu, C. Disturbance observer-based robust missile autopilot design with full-state constraints via adaptive dynamic programming. J. Frankl. Inst. 2018, 355, 2344–2368.
Zhang, H.; Cui, L.; Zhang, X.; Luo, Y. Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method. IEEE Trans. Neural Netw. 2011, 22, 2226–2236.
Xu, N.; Niu, B.; Wang, H.; Huo, X.; Zhao, X. Single-network ADP for solving optimal event-triggered tracking control problem of completely unknown nonlinear systems. Int. J. Intell. Syst. 2021, 36, 4795–4815.
Xia, R.; Wu, Q.; Chen, M. Disturbance observer-based optimal longitudinal trajectory control of near space vehicle. Sci. China Inf. Sci. 2019, 62, 1–3.
Sun, J.; Liu, C. Backstepping-based adaptive dynamic programming for missile-target guidance systems with state and input constraints. J. Frankl. Inst. 2018, 355, 8412–8440.
Wang, D.; Liu, D.; Li, H.; Ma, H. Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming. Inf. Sci. 2014, 282, 167–179.
Zheng, S.; Shi, P.; Wang, S.; Shi, Y. Adaptive neural control for a class of nonlinear multiagent systems. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 763–776.
Fan, Q.Y.; Yang, G.H. Adaptive actor–critic design-based integral sliding-mode control for partially unknown nonlinear systems with input disturbances. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 165–177.
Vamvoudakis, K.G.; Lewis, F.L. Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 2010, 46, 878–888.
Liu, D.; Xue, S.; Zhao, B.; Luo, B.; Wei, Q. Adaptive dynamic programming for control: A survey and recent advances. IEEE Trans. Syst. Man Cybern. Syst. 2020, 51, 142–160.
Zhao, B.; Liu, D.; Luo, C. Reinforcement learning-based optimal stabilization for unknown nonlinear systems subject to inputs with uncertain constraints. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 4330–4340.
Wang, D.; Liu, D.; Zhang, Y.; Li, H. Neural network robust tracking control with adaptive critic framework for uncertain nonlinear systems. Neural Netw. 2018, 97, 11–18.
Liu, D.; Wei, Q.; Wang, D.; Yang, X.; Li, H. Adaptive Dynamic Programming with Applications in Optimal Control; Springer International Publishing: Berlin/Heidelberg, Germany, 2017.
Lewis, F.L.; Jagannathan, S.; Yesildirek, A. Neural Network Control of Robot Manipulators and Nonlinear Systems; Taylor & Francis: London, UK, 1999.
Castillo, A.; Sanz, R.; Garcia, P.; Qiu, W.; Wang, H.; Xu, C. Disturbance observer-based quadrotor attitude tracking control for aggressive maneuvers. Control Eng. Pract. 2019, 82, 14–23.
Mobayen, S.; El-Sousy, F.F.; Alattas, K.A.; Mofid, O.; Fekih, A.; Rojsiraphisal, T. Adaptive fast-reaching nonsingular terminal sliding mode tracking control for quadrotor UAVs subject to model uncertainties and external disturbances. Ain Shams Eng. J. 2023, 14, 102059.
Shao, X.; Yue, X.; Li, J. Event-triggered robust control for quadrotors with preassigned time performance constraints. Appl. Math. Comput. 2021, 14, 102059.

Figure 1. Basic structure of the quadrotor.

Figure 2. Control design of the quadrotor.

Figure 3. Convergence of critic network weights.

Figure 4. Variation of states in the position subsystem.

Figure 5. Variation of states in the attitude subsystem.

Figure 6. Tracking errors in the position subsystem.

Figure 7. Tracking errors in the attitude subsystem.

Figure 8. Results of three-dimensional path.

Figure 9. Pulse-width of input signals.

Figure 10. Estimates of compound disturbances.

Figure 11. Variation of states in the position subsystem without disturbance observers.

Figure 12. Variation of states in the attitude subsystem without disturbance observers.

Figure 13. Tracking errors in the position subsystem without disturbance observers.

Figure 14. Tracking errors in the attitude subsystem without disturbance observers.

Figure 15. Results of three-dimensional path without disturbance observers.

Figure 16. Pulse-width of input signals without disturbance observers.

Table 1. Parameters of quadrotor model.

Symbol	Value	Units
m	1.79	kg
$g_{I}$	9.81	m/s²
l	0.20	m
$K_{t}$	12.0	N
$K_{o}$	0.40	N·m
$I_{x x} = I_{y y}$	0.03	kg·m²
$I_{z z}$	0.04	kg·m²
$k_{x} = k_{y} = k_{z}$	0.012	N·s/m
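
For readers who want to reuse the values in Table 1, the following is a minimal sketch of how they might be collected in a simulation script. The class name `QuadrotorParams`, its field names, and the `weight` helper are hypothetical and do not come from the paper; only the numerical values and units are taken from the table. The weight $m g_{I}$ is computed as an example because it equals the total thrust needed to balance gravity in level hover, which is a standard physical fact rather than a result reported in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuadrotorParams:
    """Values copied from Table 1; field names are illustrative only."""
    m: float = 1.79        # kg
    g_I: float = 9.81      # m/s^2
    l: float = 0.20        # m
    K_t: float = 12.0      # N
    K_o: float = 0.40      # N*m
    I_xx: float = 0.03     # kg*m^2 (Table 1 gives I_xx = I_yy)
    I_yy: float = 0.03     # kg*m^2
    I_zz: float = 0.04     # kg*m^2
    k_drag: float = 0.012  # N*s/m (Table 1 gives k_x = k_y = k_z)

    def weight(self) -> float:
        """Weight m * g_I in newtons: the thrust that balances gravity in hover."""
        return self.m * self.g_I


params = QuadrotorParams()
hover_thrust = params.weight()
print(f"Thrust needed to hover: {hover_thrust:.2f} N")
```

Freezing the dataclass keeps the nominal parameters from being changed by accident during a run; a perturbed model for robustness tests can be built with `dataclasses.replace(params, m=...)`.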

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
