Adaptive-Dynamic-Programming-Based Robust Control for a Quadrotor UAV with External Disturbances and Parameter Uncertainties

Yang, Shaoyu; Yu, Fang; Liu, Hui; Ma, Hongyue; Zhang, Haichao

doi:10.3390/app132312672

by Shaoyu Yang ¹, Fang Yu ¹,*, Hui Liu ¹, Hongyue Ma ¹ and Haichao Zhang ²

¹ Institute of Logistics Science and Engineering, Shanghai Maritime University, Shanghai 201306, China

² School of Automation, Northwestern Polytechnical University, Xi’an 710072, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(23), 12672; https://doi.org/10.3390/app132312672

Submission received: 5 November 2023 / Revised: 19 November 2023 / Accepted: 23 November 2023 / Published: 26 November 2023

(This article belongs to the Special Issue Adaptive Dynamic Programming and Control Application in Intelligent Systems)

Abstract

This work addresses the trajectory-tracking-control problem for a quadrotor unmanned aerial vehicle with external disturbances and parameter uncertainties. A novel adaptive-dynamic-programming-based robust control method is proposed to eliminate the effects of lumped uncertainties (including external disturbances and parameter uncertainties) and to ensure approximately optimal control performance. Its novelty lies in the fact that two radial basis function neural network observers with fixed-time convergence properties are first established to reconstruct the lumped uncertainties. Notably, they tune only scalar parameters online and have low computational complexity. Subsequently, two actor–critic neural networks are designed to approximate the optimal cost functions and control policies for the nominal system. In this design, two new actor–critic neural network weight update laws are proposed to eliminate the persistent excitation condition. Then, two adaptive-dynamic-programming-based robust control laws are obtained by integrating the observer reconstruction information and the nominal control policies. The uniformly ultimately bounded stability of the closed-loop tracking control systems is ensured using the Lyapunov methodology. Finally, numerical results are presented to verify the effectiveness and superiority of the proposed control scheme.

Keywords:

adaptive dynamic programming; fixed-time neural-network-based observer; actor–critic neural networks; quadrotor unmanned aerial vehicle; trajectory tracking control

1. Introduction

In recent years, unmanned aerial vehicles (UAVs) have attracted significant research interest on account of their wide range of applications in both the civilian and military fields, such as agricultural production, logistics and distribution, urban management, border blockade prevention, reconnaissance, surveillance, etc. [1,2,3,4,5]. Compared with other flight vehicles, a quadrotor UAV has many significant advantages, including a simple structure, vertical take-off and landing, a low manufacturing cost, and so on [6]. However, the quadrotor UAV flight control system is an underactuated system, and its dynamic model possesses the characteristics of nonlinearity and a strong coupling of the translational and rotational dynamics. Moreover, it is very sensitive to disturbances, including external gust disturbances, aerodynamic effects, model uncertainties, and so on. Considering the trajectory tracking control of a quadrotor UAV as the basis for the previously mentioned applications, a series of extensive studies has been undertaken on this subject. For practical applications, linear control schemes have been adopted to linearize the complex dynamic model to ensure the basic operation of a quadrotor UAV, such as proportional–integral–derivative (PID) control [7] and linear quadratic regulator (LQR) control [8]. However, the linear model can cause performance degradations when performing complex tasks and when in unknown environments. In order to improve the tracking performance, various nonlinear control strategies have been further studied, including sliding mode control [9,10], backstepping control [11], adaptive control [12], and so on. However, the control schemes mentioned above only focus on the stability of the system and do not consider the control cost. Since mobile vehicles can carry limited energy equipment, it is necessary to consider the control cost and energy optimization [13,14]. 
Therefore, it is of great significance to design a tracking control scheme that can not only stabilize the quadrotor UAV flight control system, but also minimize its control cost.

Adaptive dynamic programming (ADP) is an optimization-theory-based control method and achieves a balance between the control cost and control performance by adaptively adjusting the control policy. As an effective method for solving the optimal control problems of nonlinear systems, it has received much attention in its related fields. Werbos et al. [15] first proposed a heuristic dynamic programming algorithm based on reinforcement learning and designed a preliminary online learning control framework. Vamvoudakis et al. [16] proposed an online algorithm based on actor–critic neural networks (AC NNs) to learn the optimal control solutions for nonlinear systems of known dynamics with an infinite horizon cost. The AC NNs were used to approximate the control policy and cost function, respectively. Kamalapurkar et al. [17] investigated an ADP-based optimal tracking problem for nonlinear systems. Wang et al. [18] solved the robust stabilization problem via adaptive-critic-based techniques. Wen et al. [19] introduced a new optimized backstepping control method based on ADP and backstepping technologies, which solved the tracking control problem of strict feedback systems. Furthermore, ADP technology has also been extensively researched in fault-tolerant control [20], zero-sum games [21], multi-agent systems [22], etc. However, most of the existing weight update laws are required to satisfy persistent excitation (PE) conditions. It is worth mentioning that some scholars have performed successive studies to avoid the use of PE conditions and make ADP technology easier to implement in practical engineering. In [23,24], the experience replay technique was utilized to overcome the PE condition for weight convergence. Dong et al. [25] proposed a simplified online learning strategy, which removed the PE conditions via the concurrent learning method. Liu et al. [26] designed a novel weight update law by utilizing adaptive technology, which removed the PE condition. 
Unfortunately, most of the existing methods still rely on real data. Thus, how to effectively avoid the use of PE conditions is still an open problem.

Meanwhile, uncertainties can seriously degrade the performance of a quadrotor UAV flight control system. Uncertainties usually have time-varying characteristics, and the ADP technique lacks the ability to tackle time-varying signals [26,27]. Thus, it alone cannot be utilized to achieve high-quality control of quadrotor UAVs. Therefore, integrating the uncertainty attenuation technique with the ADP technique is of great significance in solving the control problems of quadrotor UAVs. In recent years, due to their strong fitting ability, neural networks (NNs) have provided some promising approaches to the uncertainty attenuation problem in practical systems such as quadrotor UAVs [28,29,30,31,32]. Zhao et al. [30] proposed an NN-based adaptive control scheme for the unknown and continuous dynamics of quadrotor UAVs, which cannot achieve fast convergence characteristics. To improve the convergence speed of the control system, some scholars have integrated finite-time control techniques with NN-based techniques. Wang et al. [31] proposed a control scheme based on a finite-time multivariate NN interference observer for helicopter systems. Liu et al. [32] proposed an adaptive NN-learning-based control scheme with finite-time convergence characteristics for the quadrotor UAV fault-tolerant control problem. However, the performance of the finite-time control techniques depends heavily on the initial state errors or observation errors, which limits the scope of their application in quadrotor UAVs. Therefore, combining NN-based techniques with fixed-time techniques for application in a quadrotor UAV control system requires further research. In addition, NN units commonly directly approximate the uncertainties in the above-mentioned studies, which leads to a large computational load and parameter explosion.

Motivated by the aforementioned significant research, this paper investigated the control scheme of a quadrotor UAV under external disturbances and parameter uncertainties. The investigation focused on tracking control by integrating the NN-based observer technique with the ADP-based control technique, ensuring that the resulting control laws demonstrate both robust and approximately optimal characteristics. The main contributions are outlined below:

(1): Two fixed-time NN-based observers (FTNNOBs) were developed to compensate the control inputs of the quadrotor UAV; they can estimate external disturbances and parameter uncertainties in a fixed time. Different from the traditional NN approximators designed in [33,34,35], the FTNNOBs only need to adjust scalar parameters rather than weight vectors or matrices. They therefore have a simple structure and a low computational cost.
(2): A novel ADP-based robust control scheme is proposed by combining the ADP technique with the estimated information from FTNNOBs, which improves the tracking control accuracy and optimizes the control cost consumption. Different from the existing weight update laws of AC NNs [16,25], two novel weight update laws were introduced in this work. They not only have a simple structure, but are also independent of the PE condition. Moreover, two auxiliary terms related to the system states were introduced to improve the data utilization efficiency of the AC NNs and make the training results more effective.

The rest of the paper is organized as follows. Section 2 introduces the dynamics model of a quadrotor UAV and derives the position and attitude error systems. Section 3 describes the design process of the ADP-based robust control scheme and the stability analysis of the overall closed-loop system. Section 4 introduces the simulation results. Section 5 presents the conclusions of this paper.

Notations 1. $∥\cdot∥$ denotes the Euclidean norm of vectors and matrices. $λ_{\min} (\cdot)$ and $λ_{\max} (\cdot)$ represent the minimum and maximum eigenvalues of a symmetric matrix, respectively. For any $x = {[x_{1}, x_{2}, \dots, x_{n}]}^{T}$, we define ${sig}^{η} (x) = {[{|x_{1}|}^{η} sgn (x_{1}), \dots, {|x_{n}|}^{η} sgn (x_{n})]}^{T}$, where $η \in R^{+}$ and $sgn (\cdot)$ denotes the sign function. $I_{n}$ represents the identity matrix of size n.
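As a small illustration of the sig operator defined in the notation, the following Python sketch evaluates it element-wise on a list of floats. The function name `sig` and the sample values are illustrative assumptions, not part of the original paper.

```python
import math


def sig(x, eta):
    """Element-wise sig^eta(x) = |x_k|^eta * sgn(x_k) for a list of floats.

    eta must be positive, matching eta in R^+ in the notation.
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    # copysign attaches the sign of x_k to |x_k|^eta; zero maps to zero.
    return [math.copysign(abs(xk) ** eta, xk) if xk != 0.0 else 0.0 for xk in x]


# Hypothetical sample: a cube root that keeps the sign of each entry.
v = sig([-8.0, 0.0, 27.0], 1.0 / 3.0)
```

Because the sign is reattached after taking the power of the absolute value, fractional exponents remain well defined for negative entries.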

2. Model Description and Transformation

2.1. Dynamics Model

The mechanical structure of a quadrotor UAV and the coordinate reference system are shown in Figure 1. $ς_{E} \{o_{E} x_{E} y_{E} z_{E}\}$ is the Earth-fixed inertial frame, and $ς_{B} \{o_{B} x_{B} y_{B} z_{B}\}$ denotes the body-fixed frame. In the frame $ς_{E}$, $ζ = {[x, y, z]}^{T} \in R^{3}$ represents the position state vector, and $Θ = {[ϕ, θ, ψ]}^{T} \in R^{3}$ is the Euler angle vector, in which $ϕ$, $θ$, and $ψ$ denote the roll, pitch, and yaw angles, respectively. $η = {[u, v, w]}^{T} \in R^{3}$ and $Ω = {[p, q, r]}^{T} \in R^{3}$ are the linear and angular velocity vectors in the frame $ς_{B}$, respectively. Furthermore, the kinematic transformation relationships of the quadrotor UAV between the frames $ς_{E}$ and $ς_{B}$ are introduced as [32]:

\begin{matrix} \dot{ζ} = R_{B}^{E} η, \dot{Θ} = Φ_{B}^{E} Ω, \end{matrix}

(1)

where the transformation matrices $R_{B}^{E}$ and $Φ_{B}^{E}$ from the frame $ς_{B}$ to $ς_{E}$ can be expressed as

\begin{matrix} R_{B}^{E} = [\begin{matrix} C θ C ψ & S ϕ S θ C ψ - C ϕ S ψ & C ϕ S θ C ψ + S ϕ S ψ \\ C θ S ψ & S ϕ S θ S ψ + C ϕ C ψ & C ϕ S θ S ψ - S ϕ C ψ \\ - S θ & S ϕ C θ & C ϕ C θ \end{matrix}], \end{matrix}

\begin{matrix} Φ_{B}^{E} = [\begin{matrix} 1 & S ϕ T θ & C ϕ T θ \\ 0 & C ϕ & - S ϕ \\ 0 & S ϕ / C θ & C ϕ / C θ \end{matrix}], \end{matrix}

with $S \cdot$, $C \cdot$, and $T \cdot$ denoting $\sin (\cdot)$, $\cos (\cdot)$, and $\tan (\cdot)$, respectively. The dynamical model [36,37] with respect to the frame $ς_{B}$ can be established as

\begin{matrix} \begin{matrix} m \dot{η} = - Ω_{\times} m η - F_{f} + F_{g} + U_{f} + D_{f}, \\ J \dot{Ω} = - Ω_{\times} J Ω - M_{t} + G_{a} + U_{t} + D_{t}, \end{matrix} \end{matrix}

(2)

where m is the total mass of the quadrotor UAV. $J = diag \{J_{x}, J_{y}, J_{z}\} \in R^{3 \times 3}$ represents the inertia matrix. $F_{f} = {[k_{1} u, k_{2} v, k_{3} w]}^{T} \in R^{3}$ and $M_{t} = {[k_{4} p, k_{5} q, k_{6} r]}^{T} \in R^{3}$ represent the aerodynamic friction vectors, with $k_{1}, k_{2}, \dots, k_{6}$ being the aerodynamic friction coefficients. $U_{f} = {[0, 0, f]}^{T} \in R^{3}$ is the total thrust vector, and $U_{t} = {[U_{t 1}, U_{t 2}, U_{t 3}]}^{T} \in R^{3}$ denotes the control torque vector. $D_{f} = {[D_{f 1}, D_{f 2}, D_{f 3}]}^{T} \in R^{3}$ and $D_{t} = {[D_{t 1}, D_{t 2}, D_{t 3}]}^{T} \in R^{3}$ denote the external disturbance vectors acting on the quadrotor UAV. $F_{g} = {(R_{B}^{E})}^{- 1} {[0, 0, - m g]}^{T} \in R^{3}$ denotes the gravity vector, with g being the gravitational acceleration. $G_{a} = {[J_{r} q \bar{ω}, - J_{r} p \bar{ω}, 0]}^{T} \in R^{3}$ represents the gyroscopic moment vector, where $J_{r}$ is the inertia of each motor and $\bar{ω} = ω_{1} - ω_{2} + ω_{3} - ω_{4}$ is the total residual rotor speed. Note that, because the rotor speed is not accessible, $G_{a}$ is treated as an additional disturbance in the control design [38]. In addition, ${(\cdot)}_{\times}$ denotes the skew-symmetric matrix satisfying

\begin{matrix} x_{\times} = [\begin{matrix} 0 & - x_{3} & x_{2} \\ x_{3} & 0 & - x_{1} \\ - x_{2} & x_{1} & 0 \end{matrix}], \forall x = {[x_{1}, x_{2}, x_{3}]}^{T} \in R^{3} . \end{matrix}
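The kinematic quantities above can be sketched numerically as follows. This is a minimal illustration that assumes the standard Z-Y-X (yaw-pitch-roll) Euler convention for $R_{B}^{E}$; the function names and the test angles are hypothetical and not taken from the paper.

```python
import math


def rotation_body_to_earth(phi, theta, psi):
    """R_B^E for roll phi, pitch theta, yaw psi (standard Z-Y-X convention, assumed)."""
    sph, cph = math.sin(phi), math.cos(phi)
    sth, cth = math.sin(theta), math.cos(theta)
    sps, cps = math.sin(psi), math.cos(psi)
    return [
        [cth * cps, sph * sth * cps - cph * sps, cph * sth * cps + sph * sps],
        [cth * sps, sph * sth * sps + cph * cps, cph * sth * sps - sph * cps],
        [-sth, sph * cth, cph * cth],
    ]


def euler_rate_matrix(phi, theta):
    """Phi_B^E mapping body rates to Euler angle rates (singular at theta = +/- pi/2)."""
    sph, cph = math.sin(phi), math.cos(phi)
    cth, tth = math.cos(theta), math.tan(theta)
    return [
        [1.0, sph * tth, cph * tth],
        [0.0, cph, -sph],
        [0.0, sph / cth, cph / cth],
    ]


def skew(x):
    """Skew-symmetric matrix x_times such that skew(x) @ y equals the cross product x × y."""
    return [
        [0.0, -x[2], x[1]],
        [x[2], 0.0, -x[0]],
        [-x[1], x[0], 0.0],
    ]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def transpose(a):
    return [[a[c][r] for c in range(3)] for r in range(3)]


# Hypothetical attitude; a valid rotation matrix must satisfy R^T R = I.
R = rotation_body_to_earth(0.1, -0.2, 0.3)
RtR = matmul(transpose(R), R)
```

Checking that `RtR` is numerically the identity is a quick way to catch sign or sine/cosine slips when transcribing the rotation matrix.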

Considering additive parameter uncertainties in the dynamic system (2), the model parameters are assumed to be $m = m_{0} + Δ_{m}$, $J_{x} = J_{x 0} + Δ_{J_{x}}$, $J_{y} = J_{y 0} + Δ_{J_{y}}$, $J_{z} = J_{z 0} + Δ_{J_{z}}$, $k_{1} = k_{10} + Δ_{k_{1}}$, $k_{2} = k_{20} + Δ_{k_{2}}$, $k_{3} = k_{30} + Δ_{k_{3}}$, $k_{4} = k_{40} + Δ_{k_{4}}$, $k_{5} = k_{50} + Δ_{k_{5}}$, and $k_{6} = k_{60} + Δ_{k_{6}}$. Here, $m_{0}$ denotes the nominal total mass, $J_{0} = diag \{J_{x 0}, J_{y 0}, J_{z 0}\}$ denotes the nominal inertia matrix, $k_{10}, k_{20}, \dots, k_{60}$ denote the nominal values of the aerodynamic friction coefficients, and $Δ_{*}$ denotes the corresponding parameter uncertainties. Combining (1) and (2), one can further obtain the position and attitude subsystems in the form of

\begin{matrix} \{\begin{matrix} {\dot{x}}_{11} = x_{12}, \\ {\dot{x}}_{12} = - g q - P_{1} x_{12} + \frac{1}{m_{0}} (u + d_{1}), \end{matrix} \end{matrix}

(3)

\begin{matrix} \{\begin{matrix} {\dot{x}}_{21} = x_{22}, \\ {\dot{x}}_{22} = - P_{2} x_{22} + J_{0}^{- 1} (τ + d_{2}) \end{matrix} \end{matrix}

(4)

with $x_{11} = ζ$, $x_{12} = \dot{ζ}$, $x_{21} = Θ$, $x_{22} = \dot{Θ}$, and $q = {[0, 0, 1]}^{T}$. $u = R_{B}^{E} U_{f}$ and $τ = Φ_{B}^{E} U_{t}$ represent the control inputs in the frame $ς_{E}$. $P_{1}$ and $P_{2}$ are defined as

\begin{matrix} \begin{matrix} P_{1} = \frac{1}{m_{0}} diag \{k_{10}, k_{20}, k_{30}\}, \\ P_{2} = - {\dot{Φ}}_{B}^{E} {(Φ_{B}^{E})}^{- 1} + J_{0}^{- 1} F_{1} + Φ_{B}^{E} J_{0}^{- 1} {({(Φ_{B}^{E})}^{- 1} \dot{Θ})}_{\times} J_{0} {(Φ_{B}^{E})}^{- 1} \end{matrix} \end{matrix}

with $F_{1} = diag \{k_{40}, k_{50}, k_{60}\}$. $d_{1} = u_{d} - f_{p}$ and $d_{2} = τ_{d} - f_{a}$ represent the lumped uncertainties, where $u_{d} = R_{B}^{E} D_{f}$ and $τ_{d} = Φ_{B}^{E} D_{t}$; the parameter uncertainties $f_{p}$ and $f_{a}$ are defined as

\begin{matrix} \begin{matrix} f_{p} = Δ_{m} g q + Δ_{m} \ddot{ζ} + Δ_{P_{1}} \dot{ζ}, \\ f_{a} = Δ_{J} \ddot{Θ} + Δ_{P_{3}} \dot{Θ} + Δ_{F_{1}} \dot{Θ} \end{matrix} \end{matrix}

with

\begin{matrix} \begin{matrix} Δ_{P_{1}} = diag \{Δ_{k_{1}}, Δ_{k_{2}}, Δ_{k_{3}}\}, \\ Δ_{F_{1}} = diag \{Δ_{k_{4}}, Δ_{k_{5}}, Δ_{k_{6}}\}, \\ Δ_{P_{3}} = - Φ_{B}^{E} Δ_{J} {({\dot{Φ}}_{B}^{E})}^{- 1} + Φ_{B}^{E} ({(Δ_{J} {(Φ_{B}^{E})}^{- 1} \dot{Θ})}_{\times} + F_{2}) {(Φ_{B}^{E})}^{- 1}, \\ F_{2} = [0, J_{r} \bar{ω}, 0; - J_{r} \bar{ω}, 0, 0; 0, 0, 0] . \end{matrix} \end{matrix}

Since $Δ_{P_{3}}$ is a Coriolis-like matrix, we have the following result according to the properties of the Coriolis matrix [39].

Property 1. $∥Δ_{P_{3}}∥ \leq α_{1} ∥\dot{Θ}∥$, with $α_{1}$ being a positive constant.

2.2. Model Transformation

Define the reference trajectories of the position and attitude subsystems as $ζ_{d} = {[x_{d}, y_{d}, z_{d}]}^{T}$ and $Θ_{d} = {[ϕ_{d}, θ_{d}, ψ_{d}]}^{T}$. Due to the underactuated characteristics of a quadrotor UAV, the desired roll and pitch angles can be generated by

\begin{matrix} \begin{matrix} ϕ_{d} = \arcsin (\frac{u_{q 1} \sin ψ_{d} - u_{q 2} \cos ψ_{d}}{∥u_{q}∥}), θ_{d} = \arctan (\frac{u_{q 1} \cos ψ_{d} + u_{q 2} \sin ψ_{d}}{u_{q 3}}), \end{matrix} \end{matrix}

where $u_{q} = \frac{u}{m_{0}}$ with $u_{q} = {[u_{q 1}, u_{q 2}, u_{q 3}]}^{T}$.
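The attitude-extraction formulas for the desired roll and pitch can be evaluated directly, as in the following sketch. The function name and the sample inputs are illustrative assumptions; the code simply transcribes the arcsin and arctan expressions and rejects the degenerate cases in which they are undefined.

```python
import math


def desired_roll_pitch(u_q, psi_d):
    """Return (phi_d, theta_d) from the mass-normalized input u_q = u / m0 and the desired yaw."""
    u1, u2, u3 = u_q
    norm = math.sqrt(u1 * u1 + u2 * u2 + u3 * u3)
    if norm == 0.0 or u3 == 0.0:
        # The formulas divide by ||u_q|| and u_q3, so both must be nonzero.
        raise ValueError("u_q must be nonzero with a nonzero third component")
    phi_d = math.asin((u1 * math.sin(psi_d) - u2 * math.cos(psi_d)) / norm)
    theta_d = math.atan((u1 * math.cos(psi_d) + u2 * math.sin(psi_d)) / u3)
    return phi_d, theta_d


# Hypothetical cases: pure vertical input, then a small forward component at zero yaw.
phi_hover, theta_hover = desired_roll_pitch([0.0, 0.0, 9.81], 0.0)
phi_fwd, theta_fwd = desired_roll_pitch([1.0, 0.0, 9.81], 0.0)
```

With a purely vertical input, both desired angles are zero; adding a forward component at zero yaw produces a positive desired pitch and leaves the desired roll at zero.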

Then, the tracking errors can be represented as $ζ_{e} = {[x_{e}, y_{e}, z_{e}]}^{T} = ζ - ζ_{d}$ and $Θ_{e} = {[ϕ_{e}, θ_{e}, ψ_{e}]}^{T} = Θ - Θ_{d}$. Combining these with (3) and (4), one can further obtain the error subsystems:

\begin{matrix} \{\begin{matrix} {\dot{e}}_{11} = e_{12}, \\ {\dot{e}}_{12} = - g q - P_{1} e_{12} + \frac{1}{m_{0}} (u + d_{1}) - {\ddot{ζ}}_{d} - P_{1} {\dot{ζ}}_{d}, \end{matrix} \end{matrix}

(5)

\begin{matrix} \{\begin{matrix} {\dot{e}}_{21} = e_{22}, \\ {\dot{e}}_{22} = - P_{2} e_{22} + J_{0}^{- 1} (τ + d_{2}) - {\ddot{Θ}}_{d} - P_{2} {\dot{Θ}}_{d}, \end{matrix} \end{matrix}

(6)

where $e_{11} = ζ_{e}$, $e_{12} = {\dot{ζ}}_{e}$, $e_{21} = Θ_{e}$, and $e_{22} = {\dot{Θ}}_{e}$. For the subsequent optimal controller design, the control inputs $u$ and $τ$ are each divided into two parts (i.e., $u = u_{0} + u_{l}$, $τ = τ_{0} + τ_{l}$). By setting the two feedforward controllers $u_{l} = m_{0} (g q + {\ddot{ζ}}_{d} + P_{1} {\dot{ζ}}_{d})$ and $τ_{l} = J_{0} ({\ddot{Θ}}_{d} + P_{2} {\dot{Θ}}_{d})$, (5) and (6) can be formulated as

\begin{matrix} {\dot{e}}_{i} = f (e_{i}) + g_{i} (χ_{i} + d_{i}) . \end{matrix}

(7)

In (7), $i = 1, 2$ index the error subsystems of position and attitude, respectively. In addition, $e_{i} = {[e_{i 1}, e_{i 2}]}^{T}$, $χ_{1} = u_{0}$, $χ_{2} = τ_{0}$, and

\begin{matrix} f (e_{i}) = [\begin{matrix} e_{i 2} \\ - P_{i} e_{i 2} \end{matrix}], g_{1} = [\begin{matrix} 0_{3 \times 3} \\ \frac{1}{m_{0}} I_{3} \end{matrix}], g_{2} = [\begin{matrix} 0_{3 \times 3} \\ J_{0}^{- 1} \end{matrix}] . \end{matrix}

On the basis of the above transformation of the dynamical model, the UAV trajectory tracking control problem is transformed into a stabilization problem. To this end, we first design FTNNOBs to estimate the uncertainties in a fixed time and to provide online compensation for the approximate optimal control policies. Then, the approximate optimal control policies for the nominal systems ${\dot{e}}_{i} = f (e_{i}) + g_{i} χ_{i}$ are obtained using the ADP technique. The block diagram of the control systems is shown in Figure 2.
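To make the role of the feedforward term concrete, the sketch below evaluates the position feedforward $u_{l} = m_{0} (g q + {\ddot{ζ}}_{d} + P_{1} {\dot{ζ}}_{d})$ with $q = {[0, 0, 1]}^{T}$ and $P_{1} = \frac{1}{m_{0}} diag \{k_{10}, k_{20}, k_{30}\}$. The function name and all numerical values (mass, friction coefficients, reference derivatives) are hypothetical and chosen only for illustration.

```python
def position_feedforward(m0, g, k_nominal, zeta_d_dot, zeta_d_ddot):
    """u_l = m0 * (g*q + zeta_d_ddot + P1 * zeta_d_dot), with q = [0, 0, 1]^T.

    P1 is diagonal with entries k_nominal[k] / m0, so it is applied element-wise.
    """
    q = [0.0, 0.0, 1.0]
    return [
        m0 * (g * q[k] + zeta_d_ddot[k] + (k_nominal[k] / m0) * zeta_d_dot[k])
        for k in range(3)
    ]


# Hypothetical hover reference: zero reference velocity and acceleration.
u_l_hover = position_feedforward(
    m0=2.0,
    g=9.81,
    k_nominal=[0.01, 0.01, 0.01],
    zeta_d_dot=[0.0, 0.0, 0.0],
    zeta_d_ddot=[0.0, 0.0, 0.0],
)
```

For a hover reference, the feedforward reduces to the nominal weight $m_{0} g$ along the vertical axis, which is exactly the term that cancels $- g q$ in (5) and leaves the error dynamics in the form (7).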

Lemma 1 ([40]). Consider a nonlinear system of the form

\begin{matrix} \dot{x} = f (x), f (0) = 0, \end{matrix}

(8)

where $f : U_{0} \to R^{n}$ is continuous in an open neighborhood $U_{0}$ of the origin. If a Lyapunov function $V (x)$ satisfies $\dot{V} (x) \leq - α V^{p} (x) - β V^{q} (x) + ϑ$, where α, β, and ϑ are positive constants, $0 < p < 1$, and $q > 1$, then the trajectory of system (8) is practically fixed-time stable with a settling time $t_{f}$ that satisfies $t_{f} \leq \frac{1}{α ω (1 - p)} + \frac{1}{β ω (q - 1)}$ with $0 < ω < 1$. In addition, the residual set is given by $\bar{ϑ} = \{x | V (x) \leq \min \{{(\frac{ϱ}{α})}^{\frac{1}{p}}, {(\frac{ϱ}{β})}^{\frac{1}{q}}\}\}$ with $ϱ = \frac{ϑ}{1 - ω}$.
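The settling-time bound and the residual level in Lemma 1 depend only on the constants of the Lyapunov inequality, not on the initial state. The following sketch evaluates both expressions; the function names and the sample constants are illustrative assumptions.

```python
def settling_time_bound(alpha, beta, p, q, omega):
    """Upper bound t_f <= 1/(alpha*omega*(1-p)) + 1/(beta*omega*(q-1)) from Lemma 1."""
    if not (alpha > 0 and beta > 0 and 0 < p < 1 and q > 1 and 0 < omega < 1):
        raise ValueError("constants violate the conditions of Lemma 1")
    return 1.0 / (alpha * omega * (1.0 - p)) + 1.0 / (beta * omega * (q - 1.0))


def residual_level(alpha, beta, p, q, omega, vartheta):
    """Level of V defining the residual set: min{(rho/alpha)^(1/p), (rho/beta)^(1/q)}."""
    rho = vartheta / (1.0 - omega)
    return min((rho / alpha) ** (1.0 / p), (rho / beta) ** (1.0 / q))


# Hypothetical constants: alpha = beta = 1, p = 0.5, q = 1.5, omega = 0.5, vartheta = 0.125.
t_f = settling_time_bound(1.0, 1.0, 0.5, 1.5, 0.5)         # 4 + 4 = 8
level = residual_level(1.0, 1.0, 0.5, 1.5, 0.5, 0.125)     # min{0.25^2, 0.25^(2/3)} = 0.0625
```

The parameter ω trades the two quantities against each other: a larger ω tightens the time bound but enlarges $ϱ$ and hence the residual set.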

Lemma 2 ([41]). For any $x, y \in R$, ${|x + y|}^{v_{1}} \leq 2^{v_{1} - 1} |x^{v_{1}} + y^{v_{1}}|$ if $v_{1} \in R^{+}$ and $v_{1} > 1$.

Lemma 3 ([37]). For any $x, y \in R$, ${(|x| + |y|)}^{v_{2}} \leq {|x|}^{v_{2}} + {|y|}^{v_{2}}$ with $0 < v_{2} \leq 1$.
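Both power inequalities are easy to spot-check numerically. The sketch below does so on a small hypothetical grid; as an assumption made here, Lemma 2 is checked only for nonnegative x and y, so that the non-integer power $x^{v_{1}}$ stays real-valued.

```python
def lemma2_holds(x, y, v1, tol=1e-12):
    """Check |x + y|^v1 <= 2^(v1 - 1) * (x^v1 + y^v1) for x, y >= 0 and v1 > 1."""
    return abs(x + y) ** v1 <= 2.0 ** (v1 - 1.0) * (x ** v1 + y ** v1) + tol


def lemma3_holds(x, y, v2, tol=1e-12):
    """Check (|x| + |y|)^v2 <= |x|^v2 + |y|^v2 for 0 < v2 <= 1."""
    return (abs(x) + abs(y)) ** v2 <= abs(x) ** v2 + abs(y) ** v2 + tol


# Hypothetical sample points and exponents.
grid = [0.0, 0.1, 0.5, 1.0, 2.0, 7.5]
signed_grid = [-g for g in grid] + grid
ok2 = all(lemma2_holds(x, y, 1.8) for x in grid for y in grid)
ok3 = all(lemma3_holds(x, y, 0.8) for x in signed_grid for y in signed_grid)
```

A grid check of this kind is not a proof, but it is a cheap guard against using the inequalities in the wrong direction when bounding the Lyapunov derivative.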

Assumption 1. Without loss of generality, we can make the following assumption from [42,43]. The parameter uncertainties $f_{p}$ and $f_{a}$ are bounded by the following functions:

\begin{matrix} \begin{matrix} ∥f_{p}∥ \leq α_{2} + α_{3} ∥ζ∥ + α_{4} {∥\dot{ζ}∥}^{2}, ∥f_{a}∥ \leq α_{5} + α_{6} ∥Θ∥ + α_{7} {∥\dot{Θ}∥}^{2}, \end{matrix} \end{matrix}

where $α_{2}, α_{3}, α_{4}, α_{5}, α_{6}$, and $α_{7}$ are unknown but bounded positive constants.

Assumption 2. The continuous external disturbances $u_{d}$ and $τ_{d}$ in the frame $ς_{E}$ and the system matrices $g_{i}$ are norm-bounded (i.e., $∥u_{d}∥ \leq u_{\bar{d}}$, $∥τ_{d}∥ \leq τ_{\bar{d}}$, and $∥g_{i}∥ \leq {\bar{g}}_{i}$), where $u_{\bar{d}}$, $τ_{\bar{d}}$, and ${\bar{g}}_{i}$ are positive constants.

3. Adaptive-Dynamic-Programming-Based Robust Control Design

3.1. Online-Uncertainty-Compensation-Based Fixed-Time NN-Based Observers

For the systems (7) with uncertainties, the auxiliary systems are introduced as

\begin{matrix} {\dot{\hat{e}}}_{i} = f (e_{i}) + g_{i} (χ_{i} + {\hat{d}}_{i}), \end{matrix}

(9)

where ${\hat{d}}_{i}$ are the estimated values of the lumped uncertainties. Define the auxiliary variables as $Ξ_{i} = g_{i}^{†} (e_{i} - {\hat{e}}_{i})$, with $g_{i}^{†}$ being the pseudoinverse matrices of $g_{i}$. Under Assumptions 1 and 2, this yields

\begin{matrix} ∥d_{i}∥ \leq h_{i} H_{i} (S_{i}) = ξ_{i} (S_{i}) \end{matrix}

with

\begin{matrix} \begin{matrix} h_{1} = \max \{u_{\bar{d}}, α_{2}, α_{3}, α_{4}\}, h_{2} = \max \{τ_{\bar{d}}, α_{5}, α_{6}, α_{7}\}, \\ H_{1} (S_{1}) = 1 + ∥ζ∥ + {∥\dot{ζ}∥}^{2}, H_{2} (S_{2}) = 1 + ∥Θ∥ + {∥\dot{Θ}∥}^{2}, \end{matrix} \end{matrix}

where $H_{i} (S_{i})$ are defined as the core functions [44]. Furthermore, they depend on $S_{1} = {[ζ, \dot{ζ}, {\dot{ζ}}^{2}]}^{T}$ and $S_{2} = {[Θ, \dot{Θ}, {\dot{Θ}}^{2}]}^{T}$. As pointed out in [45,46], the continuous nonlinear scalar functions $ξ_{i} (S_{i})$ can be approximated using NNs with Gaussian basis functions:

\begin{matrix} ξ_{i} (S_{i}) = W_{f i}^{T} Z (S_{i}) + ε_{f i}, \end{matrix}

where $Z (S_{i}) = {[Z_{1} (S_{i}), Z_{2} (S_{i}), \dots, Z_{n} (S_{i})]}^{T}$ are the activation function vectors, whose entries are expressed in detail as

\begin{matrix} Z_{n} (S_{i}) = \exp (- \frac{{∥S_{i} - ι_{i n}∥}^{2}}{h_{i n}^{2}}), \end{matrix}

where $ι_{i n}$ and $h_{i n}$ are the center vectors and widths, respectively, and n is the total number of Gaussian basis functions. It is straightforward to show that $Z (S_{i})$ have norm upper bounds $ϵ_{i}$, that is, $∥Z (S_{i})∥ < ϵ_{i} < \infty$, with $ϵ_{i}$ being positive constants. $W_{f i}$ denote the optimal weight vectors, and $ε_{f i}$ are the approximation errors; both are bounded, i.e., $∥W_{f i}∥ \leq {\bar{W}}_{f i}$ and $∥ε_{f i}∥ \leq {\bar{ε}}_{f i}$.
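The Gaussian activation vector can be computed as in the following sketch. The centers, widths, and input are hypothetical values chosen for illustration. Since every entry lies in (0, 1], the norm of $Z (S_{i})$ never exceeds $\sqrt{n}$, which is one concrete way to see that a finite bound $ϵ_{i}$ exists.

```python
import math


def gaussian_basis(S, centers, widths):
    """Return Z(S) with entries exp(-||S - center||^2 / width^2), one per basis function."""
    Z = []
    for center, width in zip(centers, widths):
        sq_dist = sum((s - c) ** 2 for s, c in zip(S, center))
        Z.append(math.exp(-sq_dist / width ** 2))
    return Z


# Hypothetical two-dimensional input with three basis functions of unit width.
centers = [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
widths = [1.0, 1.0, 1.0]
Z = gaussian_basis([0.0, 0.0], centers, widths)
Z_norm = math.sqrt(sum(z * z for z in Z))
```

An input that coincides with a center gives an activation of exactly 1 for that basis function, and the activation decays with the squared distance to the other centers.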

In order to reduce the number of parameters for online calculation, the following procedure is designed. Firstly, we have

\begin{matrix} ∥d_{i}∥ \leq ξ_{i} (S_{i}) \leq ∥W_{f i}^{T}∥ ∥Z (S_{i})∥ + ∥ε_{f i}∥ \leq b_{i} Z_{f i} \end{matrix}

with $b_{i} = \max \{∥W_{f i}∥, ∥ε_{f i}∥\}$ and $Z_{f i} = 1 + ∥Z (S_{i})∥$.

Then, the FTNNOBs are established as

{\hat{d}}_{i} = r_{i 3} {sig}^{\frac{m_{i}}{n_{i}}} (Ξ_{i}) + r_{i 4} {sig}^{\frac{p_{i}}{q_{i}}} (Ξ_{i}) + c_{i} {\hat{l}}_{i} Ξ_{i} Z_{f i}^{2},

(10)

where $m_{i}, n_{i}, p_{i}$, and $q_{i}$ are positive odd integers satisfying $m_{i} < n_{i}$ and $p_{i} > q_{i}$; $r_{i 3}, r_{i 4}$, and $c_{i}$ are positive constants; and ${\hat{l}}_{i}$ are the estimated values of $l_{i} = b_{i}^{2}$, which are tuned by the following adaptive laws:

{\dot{\hat{l}}}_{i} = - ϱ_{i 1} {\hat{l}}_{i} + c_{i} {∥Ξ_{i}∥}^{2} Z_{f i}^{2}, ({\hat{l}}_{i} (0) \geq 0)

(11)

with $ϱ_{i 1}$ being positive constants. The estimation errors of $l_{i}$ are defined as ${\tilde{l}}_{i} = l_{i} - {\hat{l}}_{i}$. We have the following result for the proposed procedure.

Lemma 4. Based on the proposed adaptive laws (11), there exist positive constants ${\bar{l}}_{i}$ such that ${\hat{l}}_{i} \leq {\bar{l}}_{i}$ and $l_{i} \leq {\bar{l}}_{i}$.

Proof. Select the following positive definite Lyapunov functions:

\begin{matrix} L_{i 1} = \frac{1}{2} Ξ_{i}^{T} Ξ_{i} + \frac{1}{2} {\tilde{l}}_{i}^{2} . \end{matrix}

Substituting (10) and (11) into the time derivative of $L_{i 1}$ yields

\begin{matrix} \begin{matrix} {\dot{L}}_{i 1} \leq b_{i} Z_{f i} ∥Ξ_{i}∥ - r_{i 3} {({∥Ξ_{i}∥}^{2})}^{\frac{m_{i} + n_{i}}{2 n_{i}}} - c_{i} {\hat{l}}_{i} {∥Ξ_{i}∥}^{2} Z_{f i}^{2} - c_{i} {\tilde{l}}_{i} {∥Ξ_{i}∥}^{2} Z_{f i}^{2} + ϱ_{i 1} {\tilde{l}}_{i} {\hat{l}}_{i} . \end{matrix} \end{matrix}

(12)

According to Young’s inequality [47], one has

\begin{matrix} x y \leq \frac{{\overset{ˇ}{α}}^{a}}{a} {|x|}^{a} + \frac{1}{p {\overset{ˇ}{α}}^{p}} {|y|}^{p}, \end{matrix}

where $x, y \in R$, $\overset{ˇ}{α} > 0$, $a > 1$, $p > 1$, and $(a - 1) (p - 1) = 1$. Then, one has

\begin{matrix} \begin{matrix} b_{i} Z_{f i} ∥Ξ_{i}∥ \leq c_{i} b_{i}^{2} {∥Ξ_{i}∥}^{2} Z_{f i}^{2} + \frac{1}{4 c_{i}}, ϱ_{i 1} {\tilde{l}}_{i} {\hat{l}}_{i} \leq - \frac{ϱ_{i 1}}{2} {\tilde{l}}_{i}^{2} + \frac{ϱ_{i 1}}{2} l_{i}^{2} . \end{matrix} \end{matrix}

(13)

According to Lemma 3 and substituting (13) into (12), one can obtain that

\begin{matrix} \begin{matrix} {\dot{L}}_{i 1} \leq & - r_{i 3} 2^{\frac{m_{i} + n_{i}}{2 n_{i}}} {(\frac{1}{2} {∥Ξ_{i}∥}^{2})}^{\frac{m_{i} + n_{i}}{2 n_{i}}} - ϱ_{i 1} 2^{\frac{m_{i} + n_{i}}{2 n_{i}}} {(\frac{1}{2} {\tilde{l}}_{i}^{2})}^{\frac{m_{i} + n_{i}}{2 n_{i}}} + ϱ_{i 1} {({\tilde{l}}_{i}^{2})}^{\frac{m_{i} + n_{i}}{2 n_{i}}} - ϱ_{i 1} {\tilde{l}}_{i}^{2} \\ + \frac{ϱ_{i 1}}{2} l_{i}^{2} + \frac{1}{4 c_{i}} \\ \leq & - π_{i 1} L_{i 1}^{a_{i 2}} - ϱ_{i 1} {\tilde{l}}_{i}^{2} + ϱ_{i 1} {({\tilde{l}}_{i}^{2})}^{a_{i 2}} + \frac{ϱ_{i 1}}{2} l_{i}^{2} + \frac{1}{4 c_{i}} . \end{matrix} \end{matrix}

If ${\tilde{l}}_{i}^{2} \geq 1$, one can obtain that

\begin{matrix} ϱ_{i 1} {({\tilde{l}}_{i}^{2})}^{a_{i 2}} - ϱ_{i 1} {\tilde{l}}_{i}^{2} \leq 0 . \end{matrix}

If ${\tilde{l}}_{i}^{2} < 1$, one can obtain that

\begin{matrix} 0 \leq ϱ_{i 1} {({\tilde{l}}_{i}^{2})}^{a_{i 2}} - ϱ_{i 1} {\tilde{l}}_{i}^{2} \leq a_{i 1}, \end{matrix}

where

\begin{matrix} \begin{matrix} π_{i 1} = \min \{r_{i 3} 2^{a_{i 2}}, ϱ_{i 1} 2^{a_{i 2}}\}, a_{i 1} = ϱ_{i 1} ({a_{i 2}}^{\frac{a_{i 2}}{1 - a_{i 2}}} - {a_{i 2}}^{\frac{1}{1 - a_{i 2}}}) \end{matrix} \end{matrix}

with $a_{i 2} = \frac{m_{i} + n_{i}}{2 n_{i}}$. Then, it can be further deduced that ${\dot{L}}_{i 1} \leq - π_{i 1} L_{i 1}^{a_{i 2}} + ϑ_{i 0} = - L_{i 1}^{a_{i 2}} (π_{i 1} - ϑ_{i 0} L_{i 1}^{- a_{i 2}})$ with $ϑ_{i 0} = a_{i 1} + \frac{ϱ_{i 1}}{2} l_{i}^{2} + \frac{1}{4 c_{i}}$. Furthermore, if $π_{i 1} - ϑ_{i 0} L_{i 1}^{- a_{i 2}} > 0$, then ${\dot{L}}_{i 1} < 0$, so $L_{i 1}$ will reach the invariant set $\{L_{i 1} | L_{i 1} \leq {(\frac{ϑ_{i 0}}{π_{i 1}})}^{\frac{1}{a_{i 2}}}\}$. Thus, one can conclude that $L_{i 1}$ and ${\tilde{l}}_{i} = l_{i} - {\hat{l}}_{i}$ are bounded, and there exist positive constants ${\bar{l}}_{i}$ such that ${\hat{l}}_{i}, l_{i} \leq {\bar{l}}_{i}$. □
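The constant $a_{i 1}$ used in the proof is the maximum of $ϱ_{i 1} (s^{a_{i 2}} - s)$ over $s \in [0, 1]$, attained at $s = {a_{i 2}}^{\frac{1}{1 - a_{i 2}}}$. The sketch below compares the closed form with a brute-force grid maximum; the exponents $m_{i} = 3$, $n_{i} = 5$ and the unit gain are hypothetical choices.

```python
def a_i1(rho, a2):
    """Closed-form bound rho * (a2^(a2/(1-a2)) - a2^(1/(1-a2))) for 0 < a2 < 1."""
    if not 0.0 < a2 < 1.0:
        raise ValueError("a2 must lie strictly between 0 and 1")
    return rho * (a2 ** (a2 / (1.0 - a2)) - a2 ** (1.0 / (1.0 - a2)))


# Hypothetical odd integers with m_i < n_i, giving a_i2 = (m_i + n_i) / (2 n_i) = 0.8.
m_i, n_i = 3, 5
a2 = (m_i + n_i) / (2.0 * n_i)
closed_form = a_i1(1.0, a2)  # 0.8^4 - 0.8^5 = 0.08192

# Brute-force maximum of s^a2 - s on a uniform grid over [0, 1] (rho = 1).
grid_max = max((k / 10000.0) ** a2 - k / 10000.0 for k in range(10001))
```

The grid maximum approaches the closed form from below, which confirms that $a_{i 1}$ is a valid upper bound for the case ${\tilde{l}}_{i}^{2} < 1$.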

Furthermore, the following theorem shows that the proposed FTNNOBs are stable.

Theorem 1. Consider the systems with uncertainties (7). If the FTNNOBs are designed in the form of (10) with ${\hat{l}}_{i}$ being updated by (11), the uncertainty observation errors ${\tilde{d}}_{i} = d_{i} - {\hat{d}}_{i}$ will converge to a bounded region around the origin in a fixed time.

Proof. Select candidate Lyapunov functions in the form of

\begin{matrix} L_{i 2} = \frac{1}{2} Ξ_{i}^{T} Ξ_{i} + \frac{1}{2} {\overset{ˇ}{l}}_{i}^{2}, \end{matrix}

where ${\overset{ˇ}{l}}_{i} = {\bar{l}}_{i} - {\hat{l}}_{i}$. Substituting (10) and (11) into the time derivative of $L_{i 2}$, one can obtain

\begin{matrix} \begin{matrix} {\dot{L}}_{i 2} \leq & b_{i} Z_{f i} ∥Ξ_{i}∥ - r_{i 3} {({∥Ξ_{i}∥}^{2})}^{a_{i 2}} - r_{i 4} {({∥Ξ_{i}∥}^{2})}^{a_{i 4}} - c_{i} {\hat{l}}_{i} {∥Ξ_{i}∥}^{2} Z_{f i}^{2} - c_{i} {\overset{ˇ}{l}}_{i} {∥Ξ_{i}∥}^{2} Z_{f i}^{2} \\ + ϱ_{i 1} {\overset{ˇ}{l}}_{i} {\hat{l}}_{i} . \end{matrix} \end{matrix}

(14)

where $a_{i 4} = \frac{p_{i} + q_{i}}{2 q_{i}}$. According to Young’s inequality, one has

\begin{matrix} \begin{matrix} b_{i} Z_{f i} ∥Ξ_{i}∥ \leq c_{i} {\bar{l}}_{i} {∥Ξ_{i}∥}^{2} Z_{f i}^{2} + \frac{1}{4 c_{i}}, ϱ_{i 1} {\overset{ˇ}{l}}_{i} {\hat{l}}_{i} \leq - \frac{ϱ_{i 1}}{2} {\overset{ˇ}{l}}_{i}^{2} + \frac{ϱ_{i 1}}{2} {\bar{l}}_{i}^{2} . \end{matrix} \end{matrix}

(15)

Substituting (15) into (14), one can obtain that

\begin{matrix} \begin{matrix} {\dot{L}}_{i 2} \leq & - r_{i 3} 2^{a_{i 2}} {(\frac{1}{2} {∥Ξ_{i}∥}^{2})}^{a_{i 2}} - ϱ_{i 1} 2^{a_{i 2}} {(\frac{1}{2} {\overset{ˇ}{l}}_{i}^{2})}^{a_{i 2}} - r_{i 4} 2^{a_{i 4}} {(\frac{1}{2} {∥Ξ_{i}∥}^{2})}^{a_{i 4}} \\ - ϱ_{i 1} 2^{a_{i 4}} {(\frac{1}{2} {\overset{ˇ}{l}}_{i}^{2})}^{a_{i 4}} - ϱ_{i 1} {\overset{ˇ}{l}}_{i}^{2} + ϱ_{i 1} {({\overset{ˇ}{l}}_{i}^{2})}^{a_{i 2}} + ϱ_{i 1} {({\overset{ˇ}{l}}_{i}^{2})}^{a_{i 4}} + \frac{ϱ_{i 1}}{2} {\bar{l}}_{i}^{2} + \frac{1}{4 c_{i}} . \end{matrix} \end{matrix}

If ${\overset{ˇ}{l}}_{i}^{2} \geq 1$, one can obtain that

\begin{matrix} ϱ_{i 1} {({\overset{ˇ}{l}}_{i}^{2})}^{a_{i 2}} - ϱ_{i 1} {\overset{ˇ}{l}}_{i}^{2} \leq 0 . \end{matrix}

If ${\overset{ˇ}{l}}_{i}^{2} < 1$, one can obtain that

\begin{matrix} 0 \leq ϱ_{i 1} {({\overset{ˇ}{l}}_{i}^{2})}^{a_{i 2}} - ϱ_{i 1} {\overset{ˇ}{l}}_{i}^{2} \leq a_{i 1} . \end{matrix}

On account of the inequalities $|{\overset{ˇ}{l}}_{i}| = |{\bar{l}}_{i} - {\hat{l}}_{i}| \leq |{\bar{l}}_{i}|$ and $0 < {\hat{l}}_{i} < {\bar{l}}_{i}$, and according to Lemmas 2 and 3, one can further obtain that

\begin{matrix} {\dot{L}}_{i 2} \leq - π_{i 1} L_{i 2}^{a_{i 2}} - π_{i 2} 2^{a_{i 3}} L_{i 2}^{a_{i 4}} + ϑ_{i 1}, \end{matrix}

where $π_{i 1} = \min \{r_{i 3} 2^{a_{i 2}}, ϱ_{i 1} 2^{a_{i 2}}\}$, $π_{i 2} = \min \{r_{i 4} 2^{a_{i 4}}, ϱ_{i 1} 2^{a_{i 4}}\}$, $a_{i 3} = \frac{q_{i} - p_{i}}{2 q_{i}}$, and $ϑ_{i 1} = \frac{1}{4 c_{i}} + a_{i 1} + ϱ_{i 1} (\frac{{\bar{l}}_{i}^{2}}{2} + {\bar{l}}_{i}^{2 a_{i 4}})$. Based on Lemma 1, $L_{i 2}$ will converge to a residual set in a fixed time. The settling time satisfies $t_{i f} \leq \frac{1}{π_{i 1} ω_{i} (1 - a_{i 2})} + \frac{1}{π_{i 2} 2^{a_{i 3}} ω_{i} (a_{i 4} - 1)}$ with $0 < ω_{i} < 1$. Defining an augmented vector ${\overset{ˇ}{X}}_{i} = {[Ξ_{i}^{T}, {\overset{ˇ}{l}}_{i}]}^{T}$, $L_{i 2}$ can also be represented as $L_{i 2} = \frac{1}{2} {\overset{ˇ}{X}}_{i}^{T} {\overset{ˇ}{X}}_{i}$, and it is bounded by

\begin{matrix} {\overset{ˇ}{ϑ}}_{i 1} = \{{\overset{ˇ}{X}}_{i} ∣ L_{i 2} ({\overset{ˇ}{X}}_{i}) \leq \min \{ϑ_{i 2}, ϑ_{i 3}\}\}, \end{matrix}

where $ϑ_{i 2} = {(\frac{ϱ_{i 0}}{π_{i 1}})}^{\frac{1}{a_{i 2}}}$ and $ϑ_{i 3} = {(\frac{ϱ_{i 0}}{π_{i 2} 2^{a_{i 3}}})}^{\frac{1}{a_{i 4}}}$ with $ϱ_{i 0} = \frac{ϑ_{i 1}}{1 - ω_{i}}$. Then, one can conclude that

\begin{matrix} ∥Ξ_{i}∥ \leq ∥{\overset{ˇ}{X}}_{i}∥ \leq \sqrt{2 {\bar{ϑ}}_{i 1}} . \end{matrix}

(16)

where

{\bar{ϑ}}_{i 1} = \min \{ϑ_{i 2}, ϑ_{i 3}\}

. Consider the definition of

Ξ_{i}

; one has

\begin{matrix} \begin{matrix} {\dot{Ξ}}_{i} = & d_{i} - {\hat{d}}_{i} = d_{i} - r_{i 3} {sig}^{\frac{m_{i}}{n_{i}}} (Ξ_{i}) - r_{i 4} {sig}^{\frac{p_{i}}{q_{i}}} (Ξ_{i}) - c_{i} {\hat{l}}_{i} Ξ_{i} Z_{f i}^{2} = {\tilde{d}}_{i} . \end{matrix} \end{matrix}

Then, according to the result (16), the upper norm bounds of

{\tilde{d}}_{i}

are given by

\begin{matrix} \begin{matrix} ∥{\tilde{d}}_{i}∥ \leq & ∥d_{i}∥ + r_{i 3} ∥{sig}^{\frac{m_{i}}{n_{i}}} (Ξ_{i})∥ + r_{i 4} ∥{sig}^{\frac{p_{i}}{q_{i}}} (Ξ_{i})∥ + c_{i} {\bar{l}}_{i} Z_{f i}^{2} ∥Ξ_{i}∥ \\ \leq & b_{i} (1 + ϵ_{i}) + 3 r_{i 3} {(\sqrt{2 {\bar{ϑ}}_{i 1}})}^{\frac{m_{i}}{n_{i}}} + 3 r_{i 4} {(\sqrt{2 {\bar{ϑ}}_{i 1}})}^{\frac{p_{i}}{q_{i}}} + c_{i} {\bar{l}}_{i} \sqrt{2 {\bar{ϑ}}_{i 1}} {(1 + ϵ_{i})}^{2} . \end{matrix} \end{matrix}

Based on the above analysis and the selection of appropriate parameter values, the uncertainties’ observation errors

{\tilde{d}}_{i}

will converge to a bounded neighborhood of the origin in fixed time. □
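
To make the bounds in the proof above concrete, the following minimal sketch evaluates the settling-time bound and the residual bound in (16) for given constants. The numerical values are hypothetical and chosen only for illustration; the ranges 0 < a_{i2} < 1 and a_{i4} > 1 are assumed here because the settling-time formula is only positive under them.

```python
import math


def settling_time_bound(pi1, pi2, a2, a3, a4, omega):
    """Upper bound on the settling time t_if stated in the proof of Theorem 1."""
    # Assumed ranges: they keep both terms of the bound positive.
    if not (0.0 < a2 < 1.0 and a4 > 1.0 and 0.0 < omega < 1.0):
        raise ValueError("need 0 < a2 < 1, a4 > 1 and 0 < omega < 1")
    first_term = 1.0 / (pi1 * omega * (1.0 - a2))
    second_term = 1.0 / (pi2 * 2.0 ** a3 * omega * (a4 - 1.0))
    return first_term + second_term


def observer_error_bound(pi1, pi2, a2, a3, a4, omega, theta1):
    """Bound sqrt(2 * min(theta2, theta3)) on ||Xi_i|| from inequality (16)."""
    rho0 = theta1 / (1.0 - omega)
    theta2 = (rho0 / pi1) ** (1.0 / a2)
    theta3 = (rho0 / (pi2 * 2.0 ** a3)) ** (1.0 / a4)
    return math.sqrt(2.0 * min(theta2, theta3))


# Hypothetical constants (not taken from Table 1).
params = dict(pi1=2.0, pi2=2.0, a2=0.5, a3=-0.5, a4=1.5, omega=0.5)
t_bound = settling_time_bound(**params)
xi_bound = observer_error_bound(theta1=0.1, **params)
print(f"settling-time bound: {t_bound:.4f} s, error bound: {xi_bound:.4f}")
```

The settling-time bound depends only on the design constants and not on the initial observation error, which is the defining property of fixed-time convergence.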

Remark 1.

Compared with previous NN-based observer methods, the proposed FTNNOBs need to estimate only a scalar parameter online instead of a weight vector or matrix, which reduces the computational complexity. Moreover, the fixed-time convergence of the estimation error is guaranteed.

3.2. ADP-Based Nominal Optimal Control Design

Consider the nominal systems of (7):

\begin{matrix} {\dot{e}}_{i} = f (e_{i}) + g_{i} μ_{i} . \end{matrix}

where

μ_{i}

are the approximate optimal control policies to be designed. According to [17,26,33], the infinite horizon cost functions can be defined as

\begin{matrix} V_{i} (e_{i} (0)) = \int_{0}^{\infty} (e_{i}^{T} Q_{i} e_{i} + μ_{i}^{T} R_{i} μ_{i}) d t, \end{matrix}

(17)

where

Q_{i}

and

R_{i}

are assumed to be positive definite diagonal constant matrices. Assuming that the optimal policies

μ_{i}^{*}

exist, the corresponding optimal cost functions of (17) are formulated as

\begin{matrix} \begin{matrix} V_{i}^{*} (e_{i} (0)) = \min_{μ_{i} \in Ψ_{i} (ϖ_{i})} (\int_{0}^{\infty} (e_{i}^{T} Q_{i} e_{i} + μ_{i}^{T} R_{i} μ_{i}) d t), \end{matrix} \end{matrix}

(18)

where

Ψ_{i} (ϖ_{i})

are the admissible sets on the compact sets

ϖ_{i}

. Taking the time derivative of (18), one can obtain the Hamilton–Jacobi–Bellman (HJB) equations:

\begin{matrix} \begin{matrix} H_{i} (e_{i}, μ_{i}^{*}, \nabla V_{i}^{*} (e_{i})) = \nabla^{T} V_{i}^{*} (f (e_{i}) + g_{i} μ_{i}^{*}) + {μ_{i}^{*}}^{T} R_{i} μ_{i}^{*} + e_{i}^{T} Q_{i} e_{i} = 0, \end{matrix} \end{matrix}

(19)

where

\nabla = \partial / \partial e_{i}

denotes the gradient operator. Then, the optimal control policies

μ_{i}^{*}

can be derived by setting the partial derivative of (19) with respect to

μ_{i}^{*}

to zero:

\begin{matrix} μ_{i}^{*} = - \frac{1}{2} R_{i}^{- 1} g_{i}^{T} \nabla V_{i}^{*} (e_{i}) . \end{matrix}

(20)

Substituting (20) into (19), the HJB equations can be further written as

\begin{matrix} \begin{matrix} 0 = \nabla V_{i}^{* T} (e_{i}) f (e_{i}) + e_{i}^{T} Q_{i} e_{i} - \frac{1}{4} \nabla V_{i}^{* T} (e_{i}) g_{i} R_{i}^{- 1} g_{i}^{T} \nabla V_{i}^{*} (e_{i}) . \end{matrix} \end{matrix}

(21)

However, analytical solutions of the HJB equations (21) are difficult to obtain owing to their nonlinearity. Therefore, the AC NN technique is introduced to approximate them.
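
As a sanity check of (20) and (21), the sketch below considers a scalar linear special case, which is an illustrative assumption and not the quadrotor error dynamics. For de/dt = a e + g μ with a quadratic value function V*(e) = p e², Equation (21) reduces to a scalar Riccati equation that can be solved in closed form. The values of a, g, q, and r are hypothetical.

```python
import math


def scalar_hjb_solution(a, g, q, r):
    """Return p > 0 such that V*(e) = p * e**2 solves (21) for de/dt = a*e + g*mu."""
    # Substituting grad V* = 2*p*e into (21) gives 2*a*p + q - (g**2 / r) * p**2 = 0.
    return r * (a + math.sqrt(a * a + q * g * g / r)) / (g * g)


def hjb_residual(e, a, g, q, r, p):
    """Right-hand side of (21) evaluated for the scalar quadratic value function."""
    grad_v = 2.0 * p * e
    return grad_v * a * e + q * e * e - 0.25 * grad_v * g * (1.0 / r) * g * grad_v


def optimal_policy(e, g, r, p):
    """Policy (20): mu* = -0.5 * R^{-1} * g^T * grad V*."""
    return -0.5 * (1.0 / r) * g * (2.0 * p * e)


a, g, q, r = 1.0, 1.0, 3.0, 1.0  # hypothetical plant and weighting constants
p = scalar_hjb_solution(a, g, q, r)
print(p, optimal_policy(0.5, g, r, p), hjb_residual(0.5, a, g, q, r, p))
```

In this special case the residual of (21) vanishes for every e, whereas for the nonlinear error dynamics no such closed form is generally available, which motivates the NN approximation.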

For all

e_{i} \in ϖ_{i}

, the optimal cost functions

{V_{i}}^{*} (e_{i})

can be approximated by

\begin{matrix} {V_{i}}^{*} (e_{i}) = W_{i}^{T} σ_{i} (e_{i}) + ε_{i}, \end{matrix}

where

W_{i} \in R^{n}

denote the unknown ideal weight vectors, and

σ_{i} (e_{i}) \in R^{n}

are the basis function vectors with n being the number of hidden layer neurons;

ε_{i}

is the approximation error. In addition, the optimal control policies

μ_{i}^{*}

can be reconstructed in the form of

\begin{matrix} μ_{i}^{*} = - \frac{1}{2} R_{i}^{- 1} g_{i}^{T} (\nabla^{T} σ_{i} (e_{i}) W_{i} + \nabla ε_{i}) . \end{matrix}

Two NNs (i.e., AC NNs) are usually utilized to approximate the control policies and optimal cost functions, respectively. The approximation results can be given by

V_{i} (e_{i}) = {\hat{W}}_{i c}^{T} σ_{i} (e_{i}),

(22)

μ_{i} = - \frac{1}{2} R_{i}^{- 1} g_{i}^{T} \nabla^{T} σ_{i} (e_{i}) {\hat{W}}_{i a},

(23)

where

{\hat{W}}_{i a}, {\hat{W}}_{i c} \in R^{n}

denote the weight vectors of the AC NNs.
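
The structure of (22) and (23) can be illustrated with a small sketch. It assumes a hypothetical two-state error, a three-term quadratic basis, and a scalar control input; these choices, the weights, and the convention that the Jacobian has one row per basis function are illustrative assumptions and differ from the six-term bases used in Section 4.

```python
def basis(e):
    """Hypothetical quadratic basis sigma(e) for a two-state error e = (e1, e2)."""
    e1, e2 = e
    return [0.5 * e1 * e1, 0.5 * e2 * e2, e1 * e2]


def basis_jacobian(e):
    """Jacobian d sigma / d e, one row per basis function."""
    e1, e2 = e
    return [[e1, 0.0], [0.0, e2], [e2, e1]]


def critic_value(w_c, e):
    """Critic output (22): V = W_c^T sigma(e)."""
    return sum(w * s for w, s in zip(w_c, basis(e)))


def actor_policy(w_a, e, g, r):
    """Actor output (23) for a scalar input with input vector g and weight r."""
    # (d sigma / d e) g yields one scalar per basis function.
    jac_g = [row[0] * g[0] + row[1] * g[1] for row in basis_jacobian(e)]
    return -0.5 / r * sum(w * jg for w, jg in zip(w_a, jac_g))


e = (1.0, 2.0)            # hypothetical error state
g = (0.0, 1.0)            # hypothetical input vector
weights = [1.0, 2.0, 3.0]  # hypothetical actor and critic weights
value = critic_value(weights, e)
mu = actor_policy(weights, e, g, r=1.0)
print(value, mu)
```

Both networks share the same basis; only the weight vectors differ, which is what allows the actor and critic weights to be compared directly later in this section.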

Substituting (22) and (23) into (21), one can obtain the approximate results of the HJB equations:

\begin{matrix} \begin{matrix} {\hat{H}}_{i} = {\hat{W}}_{i c}^{T} \nabla σ_{i} (e_{i}) (f (e_{i}) - \frac{1}{2} g_{i} R_{i}^{- 1} g_{i}^{T} \nabla^{T} σ_{i} (e_{i}) {\hat{W}}_{i a}) + e_{i}^{T} Q_{i} e_{i} + \frac{1}{4} {\hat{W}}_{i a}^{T} B_{i} {\hat{W}}_{i a} = E_{i}, \end{matrix} \end{matrix}

where

{\hat{H}}_{i} = H_{i} (e_{i}, μ_{i}, \nabla σ_{i} (e_{i}))

and

B_{i} = \nabla σ_{i} (e_{i}) g_{i} R_{i}^{- 1} {g^{T}}_{i} \nabla^{T} σ_{i} (e_{i})

.

E_{i}

represent the Bellman residual errors. Two nonnegative functions are constructed as

\begin{matrix} κ_{i} (t) = {({\hat{W}}_{i a} - {\hat{W}}_{i c})}^{T} ({\hat{W}}_{i a} - {\hat{W}}_{i c}) . \end{matrix}

Then, calculating the time derivative of

κ_{i} (t)

along

{\hat{W}}_{i a}

and

{\hat{W}}_{i c}

, one has

\begin{matrix} \frac{d κ_{i} (t)}{d t} = \frac{\partial κ_{i} (t)}{\partial {\hat{W}}_{i a}} {\dot{\hat{W}}}_{i a} + \frac{\partial κ_{i} (t)}{\partial {\hat{W}}_{i c}} {\dot{\hat{W}}}_{i c} . \end{matrix}

(24)

The weight update laws of

{\hat{W}}_{i a}, {\hat{W}}_{i c}

are proposed in the form of

{\dot{\hat{W}}}_{i c} = - k_{i 1} Λ_{i} {\hat{W}}_{i c} + k_{i 3} β_{i 1},

(25)

{\dot{\hat{W}}}_{i a} = - k_{i 1} Λ_{i} {\hat{W}}_{i c} - k_{i 2} ({\hat{W}}_{i a} - {\hat{W}}_{i c}) + k_{i 3} β_{i 1},

(26)

where

k_{i 1}, k_{i 2}

, and

k_{i 3}

are the learning rates.

k_{i 1}

and

k_{i 2}

are positive constants, and

k_{i 3} = ∥Q_{i}∥

.

β_{i 1} = \frac{β_{i 2}}{1 + β_{i 2}^{T} β_{i 2}}

with

β_{i 2} = \nabla^{T} σ_{i} (e_{i}) \cdot e_{i}

.

Λ_{i} = r_{i 1} ∥β_{i 1}∥ + r_{i 2}

with

r_{i 1}

and

r_{i 2}

being positive constants. In addition, the AC NNs’ weight errors are defined as

{\tilde{W}}_{i a} = W_{i} - {\hat{W}}_{i a}

and

{\tilde{W}}_{i c} = W_{i} - {\hat{W}}_{i c}

. Substituting (25) and (26) into (24) yields

\begin{matrix} \frac{d κ_{i} (t)}{d t} = - 2 k_{i 2} {({\hat{W}}_{i a} - {\hat{W}}_{i c})}^{T} ({\hat{W}}_{i a} - {\hat{W}}_{i c}) \leq 0 . \end{matrix}

(27)

According to (27), it can be further concluded that

κ_{i} (t) \to 0

is achieved asymptotically under the weight update laws (25) and (26). According to [48], the approximate optimal solutions

μ_{i}

are expected to satisfy the Bellman residual errors

E_{i} \to 0

. Furthermore, if

E_{i} = 0

has a unique solution, it is equivalent to the following equations:

\begin{matrix} \begin{matrix} \frac{\partial E_{i}}{\partial {\hat{W}}_{i a}} & = \frac{1}{2} B_{i} ({\hat{W}}_{i a} - {\hat{W}}_{i c}) = 0_{n \times 1} . \end{matrix} \end{matrix}

(28)

Clearly, once

κ_{i} (t) = 0

holds, i.e., the actor and critic weights coincide, the condition (28) is satisfied.
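
The decay of κ_i(t) implied by (25)–(27) can be checked numerically. The sketch below integrates the two update laws with a forward-Euler step while holding β_{i1} constant, which is a simplifying assumption (in the closed loop it varies with the tracking error). The gains, the value of β_{i1}, and the three-element weight vectors are hypothetical; the initial weights reuse the first three entries of the initial vectors listed in Section 4.

```python
import math


def simulate_weight_laws(w_a, w_c, beta1, k1, k2, k3, r1, r2, dt, steps):
    """Euler-integrate (25) and (26) with beta_{i1} frozen; return the kappa history."""
    w_a, w_c = list(w_a), list(w_c)
    lam = r1 * math.sqrt(sum(b * b for b in beta1)) + r2  # Lambda_i
    kappa = []
    for _ in range(steps + 1):
        kappa.append(sum((a - c) ** 2 for a, c in zip(w_a, w_c)))
        dw_c = [-k1 * lam * c + k3 * b for c, b in zip(w_c, beta1)]
        dw_a = [-k1 * lam * c - k2 * (a - c) + k3 * b
                for a, c, b in zip(w_a, w_c, beta1)]
        w_c = [c + dt * d for c, d in zip(w_c, dw_c)]
        w_a = [a + dt * d for a, d in zip(w_a, dw_a)]
    return kappa


kappa = simulate_weight_laws(
    w_a=[27.0, 26.0, 21.0], w_c=[28.0, 28.0, 26.0],
    beta1=[0.1, -0.2, 0.05],                 # hypothetical, held constant
    k1=1.0, k2=2.0, k3=1.0, r1=1.0, r2=1.0,  # hypothetical gains
    dt=1e-3, steps=1000,
)
# Compare with the analytical decay kappa(0) * exp(-2 * k2 * t) at t = 1 s.
print(kappa[0], kappa[-1], kappa[0] * math.exp(-2.0 * 2.0 * 1.0))
```

Because the first and last terms of (25) and (26) are identical, they cancel in the difference of the two weight vectors, so the decay rate of κ_i(t) is set by k_{i2} alone.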

3.3. Adaptive-Dynamic-Programming-Based Robust Control Law Design

Based on the reconstructed information of the above-proposed FTNNOBs

{\hat{d}}_{i}

in (10) and the designed approximate optimal control policies

μ_{i}

in (23), a feedforward term is used to compensate the approximate optimal control policies and counteract the lumped uncertainties. The ADP-based robust control laws are then designed as

\begin{matrix} χ_{i} = μ_{i} - {\hat{d}}_{i} . \end{matrix}

(29)

Assumption 3.

For all

e_{i} \in ϖ_{i}

, the following conditions

∥\nabla σ_{i} (e_{i})∥ \leq \nabla {\bar{σ}}_{i}, ∥W_{i}∥ \leq {\bar{W}}_{i}

, and

∥\nabla ε_{i}∥ \leq \nabla {\bar{ε}}_{i}

are satisfied with

\nabla {\bar{σ}}_{i}, {\bar{W}}_{i}

and

\nabla {\bar{ε}}_{i}

being positive constants.

With the designed control law, we can show that the closed-loop system is stable in the following theorem.

Theorem 2.

Consider the systems in (7), Assumptions 1 and 3, and the cost function (18); if the proposed ADP-based robust control laws take the form of (29) with the FTNNOBs (10) and the approximate optimal laws (23), then the tracking errors

e_{i}

and AC NNs’ weight errors

{\tilde{W}}_{i a}, {\tilde{W}}_{i c}

can be guaranteed to be uniformly ultimately bounded (UUB).

Proof.

Select the candidate Lyapunov functions as

\begin{matrix} L_{i 3} = V_{i}^{*} (e_{i}) + \frac{ρ_{i 1}}{2} {\tilde{W}}_{i a}^{T} {\tilde{W}}_{i a} + \frac{ρ_{i 2}}{2} {\tilde{W}}_{i c}^{T} {\tilde{W}}_{i c}, \end{matrix}

where

ρ_{i 1}

and

ρ_{i 2}

are positive constants with

ρ_{i 1} > ρ_{i 2}

. Calculating the time derivative of

L_{i 3}

yields

\begin{matrix} {\dot{L}}_{i 3} = & \nabla^{T} V^{*} (f (e_{i}) + g_{i} μ_{i}^{*}) + \nabla^{T} V^{*} (g_{i} μ_{i} - g_{i} μ_{i}^{*}) + \nabla^{T} V^{*} g_{i} (d_{i} - {\hat{d}}_{i}) \\ - ρ_{i 2} \tilde{W}_{i c}^{T} {\dot{\hat{W}}}_{i c} - ρ_{i 1} \tilde{W}_{i a}^{T} {\dot{\hat{W}}}_{i a} \\ = & - e_{i}^{T} Q e_{i} - \frac{1}{4} W_{i}^{T} B_{i} W_{i} + \frac{k_{i 1} ρ_{i 1}}{2} W_{i}^{T} Λ_{i} W_{i} - \frac{1}{2} W_{i}^{T} \nabla σ_{i} (e_{i}) g_{i} R_{i}^{- 1} g_{i}^{T} \nabla ε_{i} \\ - \frac{1}{2} \nabla^{T} ε_{i} g_{i} R_{i}^{- 1} g_{i}^{T} \nabla^{T} σ_{i} (e_{i}) {\tilde{W}}_{i a} + \frac{1}{2} W_{i}^{T} \nabla σ_{i} (e_{i}) g_{i} R_{i}^{- 1} g_{i}^{T} \nabla ε_{i} \\ - \frac{k_{i 1} ρ_{i 1}}{2} \tilde{W}_{i c}^{T} Λ_{i} {\tilde{W}}_{i c} - \frac{k_{i 1} ρ_{i 1}}{2} \hat{W}_{i c}^{T} Λ_{i} {\hat{W}}_{i c} + \frac{1}{4} \nabla^{T} ε_{i} g_{i} R_{i}^{- 1} g_{i}^{T} \nabla ε_{i} + k_{i 1} ρ_{i 2} \tilde{W}_{i a}^{T} Λ_{i} {\hat{W}}_{i c} \\ - k_{i 3} ρ_{i 2} \tilde{W}_{i c}^{T} β_{i 1} - k_{i 3} ρ_{i 1} \tilde{W}_{i a}^{T} β_{i 1} - k_{i 2} ρ_{i 2} \tilde{W}_{i a}^{T} {\tilde{W}}_{i a} + k_{i 2} ρ_{i 2} \tilde{W}_{i a}^{T} {\tilde{W}}_{i c} + ϑ_{i 4} \end{matrix}

where

ϑ_{i 4} = (\nabla^{T} σ_{i} (e_{i}) W_{i} + \nabla^{T} ε_{i}) g_{i} {\tilde{d}}_{i}

. Based on Assumptions 2 and 3, it is reasonable to assume that

\frac{1}{2} \nabla σ_{i} (e_{i}) g_{i} R_{i}^{- 1} {g^{T}}_{i} \nabla^{T} σ_{i} (e_{i})

,

\frac{1}{2} \nabla^{T} ε_{i} g_{i} R_{i}^{- 1} g_{i}^{T} \nabla ε_{i}

, and

(\nabla^{T} σ_{i} (e_{i}) W_{i} + \nabla^{T} ε_{i}) g_{i}

are norm-bounded. Since Theorem 1 guarantees that the observation errors converge to a bounded region in fixed time,

{\tilde{d}}_{i}

are also norm-bounded, that is

∥{\tilde{d}}_{i}∥ \leq d_{i M}

with

d_{i M}

being positive constants. Then, one has

\begin{matrix} \begin{matrix} ∥\frac{1}{2} \nabla σ_{i} (e_{i}) g_{i} R_{i}^{- 1} {g^{T}}_{i} \nabla^{T} σ_{i} (e_{i})∥ \leq b_{D_{i}}, ∥\frac{1}{2} \nabla^{T} ε_{i} g_{i} R_{i}^{- 1} g_{i}^{T} \nabla ε_{i} + ϑ_{i 4}∥ \leq α_{i 8} \end{matrix} \end{matrix}

where

b_{D_{i}}, α_{i 8}

are positive constants. Considering the inequality

∥β_{i 1}∥ \leq \nabla {\bar{σ}}_{i} ∥e_{i}∥

, the following inequalities can be obtained by using Young’s inequality:

- k_{i 3} ρ_{i 2} \tilde{W}_{i c}^{T} β_{i 1} \leq \frac{\nabla {\bar{σ}}_{i}^{2}}{2} {∥{\tilde{W}}_{i c}∥}^{2} + \frac{k_{i 3}^{2} ρ_{i 2}^{2}}{2} {∥e_{i}∥}^{2},

(30)

- k_{i 3} ρ_{i 1} \tilde{W}_{i a}^{T} β_{i 1} \leq \frac{\nabla {\bar{σ}}_{i}^{2}}{2} {∥{\tilde{W}}_{i a}∥}^{2} + \frac{k_{i 3}^{2} ρ_{i 1}^{2}}{2} {∥e_{i}∥}^{2} .

(31)
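
The bounds (30) and (31) follow from the Cauchy–Schwarz inequality, the bound on β_{i1}, and Young's inequality ab ≤ a²/2 + b²/2 with a = ∇σ̄_i ∥W̃∥ and b = k_{i3} ρ ∥e_i∥. The sketch below checks this numerically on randomly drawn nonnegative values; the sampling range and the seed are arbitrary assumptions.

```python
import random


def cross_term_upper_bound(k3, rho, sigma_bar, w_norm, e_norm):
    """Right-hand side of (30)/(31)."""
    return 0.5 * sigma_bar ** 2 * w_norm ** 2 + 0.5 * k3 ** 2 * rho ** 2 * e_norm ** 2


def cross_term_worst_case(k3, rho, sigma_bar, w_norm, e_norm):
    """Largest possible |k3 * rho * W~^T beta_1| given ||beta_1|| <= sigma_bar * ||e||."""
    return k3 * rho * w_norm * sigma_bar * e_norm


rng = random.Random(0)  # fixed seed so the check is repeatable
violations = 0
for _ in range(1000):
    args = [rng.uniform(0.0, 10.0) for _ in range(5)]
    # Small tolerance guards against floating-point rounding only.
    if cross_term_worst_case(*args) > cross_term_upper_bound(*args) + 1e-9:
        violations += 1
print("violations:", violations)
```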

In addition, there exist constants

b_{i Q}

satisfying

b_{i Q} < λ_{\min} (Q_{i})

. By utilizing the above inequalities (30) and (31), one can further obtain that

\begin{matrix} \begin{matrix} {\dot{L}}_{i 3} \leq & - (b_{i Q} - \frac{k_{i 3}^{2} ρ_{i 1}^{2}}{2} - \frac{k_{i 3}^{2} ρ_{i 2}^{2}}{2}) {∥e_{i}∥}^{2} - \frac{k_{i 2} ρ_{i 2} - k_{i 1} ρ_{i 2} γ_{i} - b_{D_{i}} - {\nabla {\bar{σ}}_{i}}^{2}}{2} {∥{\tilde{W}}_{i a}∥}^{2} + α_{i 8} \\ - \frac{k_{i 1} ρ_{i 1} r_{i 2} - k_{i 2} ρ_{i 2} - {\nabla {\bar{σ}}_{i}}^{2}}{2} {∥{\tilde{W}}_{i c}∥}^{2} + α_{i 9}, \end{matrix} \end{matrix}

(32)

where

γ_{i} = Λ_{\max} = r_{i 1} {∥β_{i 1}∥}_{\max} + r_{i 2}

and

α_{i 9} = \frac{k_{i 1} ρ_{i 1} γ_{i}}{2} {\bar{W}}_{i}^{2}

. Then, we can select the parameters

k_{i 1}, k_{i 2}

,

k_{i 3}

, such that

2 b_{i Q} - k_{i 3}^{2} ρ_{i 1}^{2} - k_{i 3}^{2} ρ_{i 2}^{2} > 0

,

k_{i 2} ρ_{i 2} - k_{i 1} ρ_{i 2} γ_{i} - b_{D_{i}} - {\nabla {\bar{σ}}_{i}}^{2} > 0

, and

k_{i 1} ρ_{i 1} r_{i 2} - k_{i 2} ρ_{i 2} - {\nabla {\bar{σ}}_{i}}^{2} > 0

hold. Furthermore, one can conclude that

{\dot{L}}_{i 3} \leq 0

when the condition:

\begin{matrix} \{\begin{matrix} ∥e_{i}∥ > \sqrt{\frac{2 α_{i 8}}{2 b_{i Q} - k_{i 3}^{2} ρ_{i 1}^{2} - k_{i 3}^{2} ρ_{i 2}^{2}}} \\ ∥{\tilde{W}}_{i a}∥ > \sqrt{\frac{2 α_{i 9}}{k_{i 2} ρ_{i 2} - k_{i 1} ρ_{i 2} γ_{i} - b_{D_{i}} - {\nabla {\bar{σ}}_{i}}^{2}}} \text{ or } ∥{\tilde{W}}_{i c}∥ > \sqrt{\frac{2 α_{i 9}}{k_{i 1} ρ_{i 1} r_{i 2} - k_{i 2} ρ_{i 2} - {\nabla {\bar{σ}}_{i}}^{2}}} \end{matrix} \end{matrix}

is satisfied. Therefore, (32) indicates that the error states of the systems (7) and the AC NNs’ weight errors are UUB stable. □
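
When tuning the learning rates, the three gain conditions stated in the proof can be checked before running a simulation. The helper below is a minimal sketch that evaluates the left-hand sides of those conditions; every numerical value in the example is hypothetical and is not taken from Table 1.

```python
def gain_margins(b_q, k1, k2, k3, rho1, rho2, r2, gamma, b_d, sigma_bar):
    """Return the three quantities that the proof of Theorem 2 requires to be positive."""
    m_e = 2.0 * b_q - k3 ** 2 * rho1 ** 2 - k3 ** 2 * rho2 ** 2   # tracking-error term
    m_a = k2 * rho2 - k1 * rho2 * gamma - b_d - sigma_bar ** 2    # actor-weight term
    m_c = k1 * rho1 * r2 - k2 * rho2 - sigma_bar ** 2             # critic-weight term
    return m_e, m_a, m_c


# Hypothetical candidate gains and bounds.
candidate = dict(b_q=0.9, k1=100.0, k2=120.0, k3=1.0, rho1=0.5, rho2=0.4,
                 r2=1.0, gamma=1.1, b_d=1.0, sigma_bar=1.0)
margins = gain_margins(**candidate)
print(margins, all(m > 0 for m in margins))
```

The second and third conditions pull k_{i2} ρ_{i2} in opposite directions, so with these hypothetical bounds only a narrow range of k_{i2} satisfies both; this is one reason ρ_{i1} > ρ_{i2} is required.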

Remark 2.

Different from [16,25], we designed weight update laws of the AC NNs that avoid the use of the PE condition by using an adaptive method, where

Λ_{i}

are designed to guarantee stable convergence performances of

{\tilde{W}}_{i c}

and

{\tilde{W}}_{i a}

. Moreover, the positive constants

r_{i 2}

are introduced in

Λ_{i}

to guarantee

{\tilde{W}}_{i c}

and

{\tilde{W}}_{i a}

can converge to a neighborhood of the origin when

∥β_{i 1}∥ = 0

.

Remark 3.

In [48,49], the critic NNs’ weight update law contains only a term similar to the first term of (25), which makes the convergence of the weights depend heavily on their initial values. For instance, the training of the AC NNs’ weights can stall when the initial weights are set to zero. To emphasize the role of the system states, the second term in the critic NNs’ weight update laws (25) is added to improve the efficiency of data utilization and the adaptation capability of the AC NNs.

4. Simulation

In this section, numerical simulations are presented to verify the effectiveness and performance of the proposed ADP-based robust trajectory-tracking-control scheme (denoted as ADPFTNs). First, the robustness of the proposed ADPFTNs in (29) is verified in the presence of external disturbances and parameter uncertainties. Then, the ability of the proposed ADPFTNs to balance control performance and cost is demonstrated. The initial position vector was selected as

ζ (0) = {[0.1, - 0.2, 0.131]}^{T} m

. The desired position trajectory was

ζ_{d} = {[0.5 \sin (0.6 t), 0.5 \cos (0.6 t), 0.5 + \frac{3 t}{50}]}^{T} m

. The desired yaw angle was

ψ_{d} = 0 rad

. The basis function vectors are

\begin{matrix} \begin{matrix} σ_{1} (e_{1}) = {[0.5 x_{e}^{2}, 0.5 y_{e}^{2}, 0.5 z_{e}^{2}, x_{e} {\dot{x}}_{e}, y_{e} {\dot{y}}_{e}, z_{e} {\dot{z}}_{e}]}^{T}, σ_{2} (e_{2}) = {[0.5 ϕ_{e}^{2}, 0.5 θ_{e}^{2}, 0.5 ψ_{e}^{2}, ϕ_{e} {\dot{ϕ}}_{e}, θ_{e} {\dot{θ}}_{e}, ψ_{e} {\dot{ψ}}_{e}]}^{T} . \end{matrix} \end{matrix}

The initial weight vectors are

\begin{matrix} \begin{matrix} W_{1 a} (0) & = {[27, 26, 21, 22, 24, 22]}^{T}, W_{2 a} (0) = {[15, 16, 18, 16, 18, 12]}^{T}, \\ W_{1 c} (0) & = {[28, 28, 26, 23, 22, 24]}^{T}, W_{2 c} (0) = {[14, 15, 15, 16, 18, 24]}^{T} . \end{matrix} \end{matrix}

In addition, the main parameters of the quadrotor UAV are given as

m_{0} = 1.2 kg

,

g = 9.81 m / s^{2}

,

J_{0} = diag \{0.125, 0.125, 0.25\} kg \cdot m^{2}

,

J_{r} = 0.002 kg \cdot m^{2}

, and

k_{10}, k_{20}, \dots, k_{60} = 0.01

.
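
The reference trajectory and the position basis vector above translate directly into code. In the sketch below, the error convention e = ζ − ζ_d and the zero initial error rate are assumptions made for illustration.

```python
import math


def desired_position(t):
    """Desired position trajectory zeta_d(t) in metres, as given in Section 4."""
    return [0.5 * math.sin(0.6 * t), 0.5 * math.cos(0.6 * t), 0.5 + 3.0 * t / 50.0]


def position_basis(e, e_dot):
    """Basis vector sigma_1 for position errors (x_e, y_e, z_e) and their rates."""
    quadratic = [0.5 * v * v for v in e]
    cross = [v * dv for v, dv in zip(e, e_dot)]
    return quadratic + cross


zeta0 = [0.1, -0.2, 0.131]  # initial position from Section 4
e0 = [z - zd for z, zd in zip(zeta0, desired_position(0.0))]  # assumed sign convention
sigma0 = position_basis(e0, [0.0, 0.0, 0.0])  # assumed zero initial error rate
print(e0, sigma0)
```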

4.1. Robust Tracking Control Performances with Uncertainty

To highlight the superiority of the proposed ADPFTNs, the two NN-based observers (NNOBs) in [34] combined with the proposed approximate optimal control policies in (29) (denoted as ADPNNs) were used for comparison with the ADPFTNs. The uncertain parameter values were set as

Δ_{m} = 0.2 + 0.05 \cdot rand kg, Δ_{J_{x}} = Δ_{J_{y}} = Δ_{J_{z}} = 0.03 + 0.01 \cdot rand kg \cdot m^{2}

, and

Δ_{k_{1}} = Δ_{k_{2}} = Δ_{k_{3}} = Δ_{k_{4}} = Δ_{k_{5}} = Δ_{k_{6}} = 0.005 + 0.001 \cdot rand

. The

rand

function generates a random number in the interval

[0, 1]

. The external disturbances were set as

u_{d} = {[0.1 \sin (0.8 t), 0.1 \sin (0.7 t), 0.1 \sin (0.5 t)]}^{T} N

and

τ_{d} = {[0.1 \sin (0.8 t) + J_{r} \dot{θ}, 0.1 \sin (0.7 t) - J_{r} \dot{ϕ}, 0.1 \sin (0.5 t)]}^{T} N \cdot m

with

J_{r} \dot{θ}

and

J_{r} \dot{ϕ}

being gyroscopic moments of the quadrotor UAV. The parameters of the ADPFTNs are given in Table 1.
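
The disturbance and uncertainty settings above can be reproduced with a few lines of code. The sketch below is illustrative only: the seed is hypothetical, and Python's `random.random()` draws from [0, 1) rather than the closed interval [0, 1] mentioned for `rand`.

```python
import math
import random


def force_disturbance(t):
    """Force disturbance u_d(t) in newtons, as specified in Section 4.1."""
    return [0.1 * math.sin(0.8 * t), 0.1 * math.sin(0.7 * t), 0.1 * math.sin(0.5 * t)]


def torque_disturbance(t, phi_dot, theta_dot, j_r=0.002):
    """Torque disturbance tau_d(t) in N*m, including the gyroscopic terms."""
    return [0.1 * math.sin(0.8 * t) + j_r * theta_dot,
            0.1 * math.sin(0.7 * t) - j_r * phi_dot,
            0.1 * math.sin(0.5 * t)]


def sample_mass_uncertainty(rng):
    """Delta_m = 0.2 + 0.05 * rand in kg."""
    return 0.2 + 0.05 * rng.random()


rng = random.Random(1)  # hypothetical seed, only for repeatability
delta_m = sample_mass_uncertainty(rng)
print(force_disturbance(1.0), torque_disturbance(1.0, 0.1, -0.1), delta_m)
```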

Figure 3 shows the time response of the observation errors for the FTNNOBs and NNOBs, respectively. It can be observed that the observation error of the FTNNOBs stabilizes to the order of

10^{- 4}

within 0.5 s, while that of the NNOBs stabilizes to the order of

10^{- 3}

around 5 s. It can be concluded that the FTNNOBs converge faster and reach a higher steady-state accuracy than the NNOBs when estimating external disturbances and parameter uncertainties. Figure 4 shows the time response of the tracking errors for the ADPFTNs and the ADPNNs, respectively. For a fair comparison, the convergence times of the two control schemes were adjusted to be approximately equal in the simulation (the same setup is used in Section 4.2). As shown in Figure 4, the tracking-error curves of the proposed ADPFTNs are smoother than those of the ADPNNs and have higher convergence accuracy. This means that, when there are external disturbances and parameter uncertainties in the quadrotor UAV system, the FTNNOBs can quickly and accurately estimate the unknown uncertainties and compensate the control policies. To illustrate the tracking performance of the two control schemes more directly, the 3D trajectory tracking results are shown in Figure 5. It can be observed that both the ADPFTNs and the ADPNNs can successfully track the desired trajectory. Further, the red solid line (the proposed ADPFTNs in this work) stays closer to the black solid line (desired trajectory) than the red dotted line (ADPNNs in [34]). Figure 6 shows the time responses of the AC NNs’ weights. One can see that the weight curves of the position error subsystem and attitude error subsystem stabilize after 5 s. Figure 7 shows the control inputs under the frame

ς_{E}

.

4.2. Tracking Control Performances and Cost

In this subsection, the control performance of the ADPFTNs in (29) is further analyzed quantitatively from the perspective of control cost and compared with the backstepping control scheme in [11] (denoted as Backstepping) and the learning-based robust control scheme in [50] (denoted as LBRC). For a fair comparison, the same cost functions are defined as

V_{p} = \int_{0}^{\infty} (e_{1}^{T} Q_{1} e_{1} + u^{T} R_{1} u) d t

and

V_{a} = \int_{0}^{\infty} (e_{2}^{T} Q_{2} e_{2} + τ^{T} R_{2} τ) d t

, in which

V_{p}

and

V_{a}

represent the control costs of the position and attitude subsystems, respectively. To simplify the comparison of the position subsystem control cost, the thrust needed to balance gravity, which is required by all three schemes, was excluded from the calculation of the control cost

V_{p}

.
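
In a simulation, cost functions of this form are evaluated from sampled signals over a finite horizon. The sketch below is one possible way to do so, using the trapezoidal rule and assuming diagonal weighting matrices; the integration rule, the finite horizon, and the synthetic data are assumptions and not details taken from the reported simulations.

```python
def quadratic_cost(times, errors, inputs, q_diag, r_diag):
    """Trapezoidal estimate of the integral of e^T Q e + u^T R u for diagonal Q, R."""
    def integrand(e, u):
        return (sum(q * v * v for q, v in zip(q_diag, e))
                + sum(r * v * v for r, v in zip(r_diag, u)))

    values = [integrand(e, u) for e, u in zip(errors, inputs)]
    total = 0.0
    for k in range(1, len(times)):
        total += 0.5 * (values[k] + values[k - 1]) * (times[k] - times[k - 1])
    return total


# Synthetic check: e(t) = 1 and u(t) = 1 on [0, 1] with Q = 2 and R = 3.
times = [k / 10.0 for k in range(11)]
cost = quadratic_cost(times, [[1.0]] * 11, [[1.0]] * 11, [2.0], [3.0])
print(cost)
```

For the position subsystem, the input samples passed to such a routine would be the control forces with the gravity-balancing thrust already removed, in line with the convention stated above.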

Figure 8 shows the time response of the tracking errors for the ADPFTNs, Backstepping, and LBRC, respectively. As shown in Figure 8, the stabilization accuracy of the ADPFTNs was of the order of

10^{- 5}

, which is better than the order of

10^{- 4}

achieved by Backstepping and LBRC in the same convergence time. During the convergence phase, Backstepping and LBRC had steeper convergence trends and faster transient responses, while the ADPFTNs had a smoother convergence tendency. It is worth noting that steep convergence trends and fast transient responses lead to a higher control cost. Figure 9 illustrates the control cost of the three control schemes. It can be observed that the ADPFTNs were capable of reducing the control cost while guaranteeing better control performance. During 0–10 s, the tracking errors were still converging, and the ADPFTNs had a lower control cost than Backstepping and LBRC, which demonstrates the superior performance of the ADPFTNs in reducing the control cost. After 10 s, the tracking errors had converged to a neighborhood of the origin, and there was no major difference in the control cost of the three control schemes. Compared to Backstepping, the total control cost of the position subsystem was reduced by

5.5 %

and the total control cost of the attitude subsystem was reduced by

38.9 %

. In addition, the control inputs of the position subsystem were larger than those of the attitude subsystem, which accordingly led to a larger control cost of the position subsystem. The 3D trajectory tracking results are illustrated in Figure 5. It can be observed that all three control schemes can successfully track the desired trajectory. Further, it can be seen that the red solid line had a more stable tracking result in comparison with the blue dotted line (Backstepping in [11]) and the black dashed line (LBRC in [50]). Figure 10 shows the time response of control inputs in the frame

ς_{E}

. As shown in Figure 10, the initial control inputs of the ADPFTNs were the smallest among the three schemes. Smaller control inputs incur a smaller control cost, which is consistent with the results in Figure 9. This further illustrates the superiority of the ADPFTNs in balancing control performance and cost.

4.3. Matlab/Simscape Simulation Results

In this subsection, the effectiveness of the ADPFTNs is further validated in the semi-physical simulation environment of Matlab/Simscape, where the mathematical model was replaced with a 3D physical model built in SolidWorks. The parameters of the quadrotor UAV 3D physical model, the external disturbances, and the parameter uncertainties were set the same as in the numerical simulations (Sections 4.1 and 4.2). Figure 11 shows the time response of the tracking errors of the ADPFTNs in the Matlab/Simscape semi-physical simulation environment. It can be observed that all errors had good convergence performance. The control inputs of the semi-physical simulation are displayed in Figure 12. The results of the Matlab/Simscape semi-physical simulation in Figures 11 and 12 were almost the same as the numerical simulation results, which indicates that the ADPFTNs have the potential to be transferred to engineering applications.

5. Conclusions

This paper investigated the ADP-based robust control problem of a quadrotor UAV trajectory tracking control system subject to external disturbances and parameter uncertainties. Firstly, the error subsystems were obtained by preprocessing the quadrotor UAV system using the feedforward technique. Subsequently, two FTNNOBs were designed to estimate external disturbances and parameter uncertainties quickly and accurately (observation errors converged within 0.5 s with an accuracy of the order of

10^{- 4}

). Then, the AC NNs approximated the optimal value functions and the optimal control policies by designing two new weight update laws. Meanwhile, the designed weight update laws not only avoided utilizing the PE condition, but also improved the adaptive ability of the AC neural network. Finally, the ADPFTNs’ control scheme with high accuracy tracking performance (tracking accuracy of the order of

10^{- 5}

) and low control cost (total control cost savings of

5.5 %

for the position subsystem and of

38.9 %

for the attitude subsystem) was proposed by combining the FTNNOBs and the approximate results of the AC NNs. Through Lyapunov stability analysis, the proposed control scheme was shown to guarantee that the closed-loop tracking control systems are UUB. In the future, we will conduct experiments to demonstrate the practicability of the developed control scheme and carry out research on multi-UAV systems. Meanwhile, we will further improve the adopted model to enhance its accuracy and applicability so that it better reflects the actual situation.

Author Contributions

Conceptualization, S.Y.; methodology, S.Y.; software, S.Y.; validation, S.Y.; formal analysis, H.L. and H.Z.; investigation, H.L.; resources, H.L.; data curation, S.Y. and H.M.; writing—original draft preparation, S.Y. and F.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by National Natural Science Foundation of China (Grant Number 62073212).

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

Liu, N.; Shao, X. Desired compensation RISE-based IBVS control of quadrotors for tracking a moving target. Nonlinear Dyn. 2019, 95, 2605–2624.
Das, D.N.; Sewani, R.; Wang, J.; Tiwari, M.K. Synchronized truck and drone routing in package delivery logistics. IEEE Trans. Intell. Transp. Syst. 2021, 22, 5772–5782.
Wu, Y.; Wu, S.B.; Hu, X.T. Cooperative path planning of UAVs & UGVs for a persistent surveillance task in urban environments. IEEE Internet Things J. 2021, 8, 4906–4919.
Gajbhiye, S.; Cabecinhas, D.; Silvestre, C.; Cunha, R. Geometric finite-time inner-outer loop trajectory tracking control strategy for quadrotor slung-load transportation. Nonlinear Dyn. 2022, 107, 2291–2308.
Li, B.; Gong, W.Q.; Yang, Y.S.; Xiao, B. Distributed fixed-time leader-following formation control for multi-quadrotors with prescribed performance and collision avoidance. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 7281–7294.
Labbadi, M.; Cherkaoui, M. Robust adaptive backstepping fast terminal sliding mode controller for uncertain quadrotor UAV. Aerosp. Sci. Technol. 2019, 93, 105306.
Michael, N.; Mellinger, D.; Lindsey, Q.; Kumar, V. The GRASP Multiple Micro-UAV Testbed. IEEE Robot Autom. Mag. 2010, 17, 56–65.
Sun, Y.B.; Xian, N.; Duan, H.B. Linear-quadratic regulator controller design for quadrotor based on pigeon-inspired optimization. Aircr. Eng. Aerosp. Tec. 2016, 88, 761–770.
Li, B.; Gong, W.Q.; Yang, Y.S.; Xiao, B.; Ran, D.C. Appointed Fixed Time Observer-Based Sliding Mode Control for a Quadrotor UAV Under External Disturbances. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 290–303.
Zhao, Z.; Cao, D.; Yang, J.; Wang, H. High-order sliding mode observer-based trajectory tracking control for a quadrotor UAV with uncertain dynamics. Nonlinear Dyn. 2020, 102, 2583–2596.
Xiao, B.; Yin, S. A New Disturbance Attenuation Control Scheme for Quadrotor Unmanned Aerial Vehicles. IEEE Trans. Industr. Inform. 2017, 13, 2922–2932.
Tran, V.P.; Santoso, F.; Garratt, M.A. Adaptive Trajectory Tracking for Quadrotor Systems in Unknown Wind Environments Using Particle Swarm Optimization-Based Strictly Negative Imaginary Controllers. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 1742–1752.
Alifbek, K.K.; Stepan, D.; Murodbek, S.; Dmitry, A.P.; Anvari, G.; Javod, A. Expert system application for reactive power compensation in isolated electric power systems. Int. J. Electr. Comput. Eng. 2021, 11, 3682–3691.
Martyushev, N.V.; Malozyomov, B.V.; Khalikov, I.H.; Kukartsev, V.A.; Kukartsev, V.V.; Tynchenko, V.S.; Tynchenko, Y.A.; Qi, M. Review of Methods for Improving the Energy Efficiency of Electrified Ground Transport by Optimizing Battery Consumption. Energies 2023, 16, 729.
Werbos, P.J. Consistency of HDP applied to a simple reinforcement learning problem. Neural Netw. 1990, 3, 179–189.
Vamvoudakis, K.G.; Lewis, F.L. Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 2010, 46, 878–888.
Kamalapurkar, R.; Dinh, H.; Bhasin, S.; Dixon, W.E. Approximate optimal trajectory tracking for continuous-time nonlinear systems. Automatica 2015, 51, 40–48.
Wang, D.; Liu, D.R.; Li, H.L. Policy Iteration Algorithm for Online Design of Robust Control for a Class of Continuous-Time Nonlinear Systems. IEEE Trans. Autom. Sci. Eng. 2014, 11, 627–632.
Wen, G.X.; Ge, S.S.; Tu, F.W. Optimized Backstepping for Tracking Control of Strict-Feedback Systems. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 13.
Zhao, B.; Liu, D.R.; Li, Y.C. Observer based adaptive dynamic programming for fault tolerant control of a class of nonlinear systems. Inf. Sci. 2017, 384, 21–33.
Wei, Q.L.; Liu, D.R.; Lin, Q.; Song, R.Z. Adaptive Dynamic Programming for Discrete-Time Zero-Sum Games. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 957–969.
Zhao, W.; Liu, H.; Lewis, F.L. Data-Driven Fault-Tolerant Control for Attitude Synchronization of Nonlinear Quadrotors. IEEE Trans. Automat. Control 2021, 66, 5584–5591.
Chowdhary, G.; Yucelen, T.; Mühlegg, M.; Johnson, E.N. Concurrent learning adaptive control of linear systems with exponentially convergent bounds. Int. J. Adapt. Control Signal Process. 2013, 27, 280–301.
Yang, X.; He, H. Self-learning robust optimal control for continuous-time nonlinear systems with mismatched disturbances. Neural Netw. 2018, 99, 19–30.
Dong, H.Y.; Zhao, X.W.; Yang, H.Y. Reinforcement Learning-Based Approximate Optimal Control for Attitude Reorientation Under State Constraints. IEEE Trans. Control Syst. Technol. 2021, 29, 1664–1673.
Liu, H.; Li, B.; Xiao, B.; Ran, D.C.; Zhang, C.X. Reinforcement learning-based tracking control for a quadrotor unmanned aerial vehicle under external disturbances. Int. J. Robust Nonlinear Control 2022, 33, 10360–10377.
Sun, J.L.; Liu, C.S. Disturbance observer-based robust missile autopilot design with full-state constraints via adaptive dynamic programming. J. Frankl. Inst. 2018, 355, 2344–2368.
Zhao, B.; Xu, S.Y.; Guo, J.Q.; Jiang, R.M.; Zhou, J. Integrated strapdown missile guidance and control based on neural network disturbance observer. Aerosp. Sci. Technol. 2019, 84, 170–181.
Zhang, R.; Xu, B.; Shi, P. Finite time observer-based output feedback control of MEMS gyroscopes with input saturation. Int. J. Robust Nonlinear Control 2022, 32, 4300–4317.
Zhao, Z.Y.; Jin, X.Z. Adaptive neural network-based sliding mode tracking control for agricultural quadrotor with variable payload. Comput. Electr. Eng. 2022, 103, 108336.
Wang, D.D.; Zong, Q.; Tian, B.L.; Shao, S.K.; Zhang, X.Y.; Zhao, X.Y. Neural network disturbance observer-based distributed finite-time formation tracking control for multiple unmanned helicopters. ISA Trans. 2018, 73, 208–226.
Liu, K.; Wang, R.J.; Wang, X.D.; Wang, X.X. Anti-saturation adaptive finite-time neural network based fault-tolerant tracking control for a quadrotor UAV with external disturbances. Aerosp. Sci. Technol. 2021, 115, 106790.
Fan, Q.Y.; Yang, G.H. Adaptive actor–critic design-based integral sliding-mode control for partially unknown nonlinear systems with input disturbances. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 165–177.
Liu, X.; Zhao, B.; Liu, D.R. Fault tolerant tracking control for nonlinear systems with actuator failures through particle swarm optimization-based adaptive dynamic programming. Appl. Soft Comput. 2020, 97, 106766. [Google Scholar] [CrossRef]
Adhyaru, D.M. State observer design for nonlinear systems using neural network. Appl. Soft Comput. 2012, 12, 2530–2537. [Google Scholar] [CrossRef]
Farid, K.; Zhen Yu, Y.; Kenzo, N. Guidance and nonlinear control system for autonomous flight of minirotorcraft unmanned aerial vehicles. J. Field Robot. 2010, 27, 311–334. [Google Scholar]
Abeywardena, D.; Kodagoda, S.; Dissanayake, G.; Munasinghe, R. Improved State Estimation in Quadrotor MAVs: A Novel Drift-Free Velocity Estimator. IEEE Robot. Autom. Mag. 2013, 20, 32–39.
Tang, P.; Zhang, F.B.; Ye, J.C.; Lin, D.F. An integral TSMC-based adaptive fault-tolerant control for quadrotor with external disturbances and parametric uncertainties. Aerosp. Sci. Technol. 2021, 109, 106415.
Xiao, B.; Yin, S. Exponential Tracking Control of Robotic Manipulators With Uncertain Dynamics and Kinematics. IEEE Trans. Industr. Inform. 2019, 15, 689–698.
Li, B.; Zhang, H.C.; Xiao, B.; Wang, C.H.; Yang, Y.S. Fixed-time integral sliding mode control of a high-order nonlinear system. Nonlinear Dyn. 2022, 107, 909–920.
Zhao, L.; Yu, J.P.; Lin, C.; Yu, H.S. Distributed adaptive fixed-time consensus tracking for second-order multi-agent systems using modified terminal sliding mode. Appl. Math. Comput. 2017, 312, 23–35.
Yu, S.H.; Yu, X.H.; Shirinzadeh, B.; Man, Z.H. Continuous finite-time control for robotic manipulators with terminal sliding mode. Automatica 2005, 41, 1957–1964.
Shao, K.; Zheng, J.C.; Huang, K.; Wang, H.; Man, Z.H.; Fu, M.Y. Finite-time control of a linear motor positioner using adaptive recursive terminal sliding mode. IEEE Trans. Ind. Electron. 2020, 67, 6659–6668.
Song, Y.D.; Huang, X.C.; Wen, C.Y. Tracking Control for a Class of Unknown Nonsquare MIMO Nonaffine Systems: A Deep-Rooted Information Based Robust Adaptive Approach. IEEE Trans. Automat. Control 2016, 61, 3227–3233.
Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366.
Girosi, F.; Poggio, T. Networks and the best approximation property. Biol. Cybern. 1990, 63, 169–176.
Zhang, D.H.; Kong, L.H.; Zhang, S.; Li, Q.; Fu, Q. Neural networks-based fixed-time control for a robot with uncertainties and input deadzone. Neurocomputing 2020, 390, 139–147.
Wen, G.X.; Chen, C.L.P.; Ge, S.S. Simplified Optimized Backstepping Control for a Class of Nonlinear Strict-Feedback Systems With Unknown Dynamic Functions. IEEE Trans. Cybern. 2021, 51, 4567–4580.
Li, K.W.; Li, Y.M. Adaptive NN optimal consensus fault-tolerant control for stochastic nonlinear multiagent systems. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 947–957.
Mu, C.X.; Zhang, Y. Learning-Based Robust Tracking Control of Quadrotor With Time-Varying and Coupling Uncertainties. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 259–273.

Figure 1. Mechanical structure of a quadrotor UAV.

Figure 2. Framework for ADP-based robust tracking control of the quadrotor UAV system.

Figure 3. Time response of the observation errors by different observers.

Figure 4. Time response of tracking errors under ADPFTNs and ADPNNs.

Figure 5. Three-dimensional trajectory tracking results under different control schemes.

Figure 6. Time responses of AC NNs’ weights of ADPFTNs.

Figure 7. Time response of control inputs under ADPFTNs and ADPNNs.

Figure 8. Time response of tracking errors under different schemes.

Figure 9. The position control cost and attitude control cost under different schemes.

Figure 10. Time responses of control inputs under different schemes.

Figure 11. Time response of tracking errors of ADPFTNs in Simscape.

Figure 12. Time response of control inputs of ADPFTNs in Simscape.

Table 1. Parameter selection.

Variable	Position Error System (i = 1)	Attitude Error System (i = 2)
$r_{i 1}$	$1 \times 10^{- 6}$	$1 \times 10^{- 6}$
$r_{i 2}$	$1 \times 10^{- 5}$	$1 \times 10^{- 5}$
$k_{i 1}$	$0.01$	$0.01$
$k_{i 2}$	$1.2$	$1.2$
$Q_{i}$	$4 I_{6}$	$10 I_{6}$
$R_{i}$	$2.5 I_{3}$	$5 I_{3}$
$r_{i 3}$	$1.6$	$1.6$
$r_{i 4}$	$0.2$	$0.2$
$c_{i}$	200	700
$ϱ_{i 1}$	$0.06$	$0.06$
$m_{i}$	97	99
$n_{i}$	99	101
$p_{i}$	105	105
$q_{i}$	99	101


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

