Article

Robust Approximate Optimal Trajectory Tracking Control for Quadrotors

¹ College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan 030024, China
² Nuclear Emergency and Nuclear Safety Department, China Institute for Radiation Protection, Taiyuan 030006, China
³ College of Mechanical and Vehicle Engineering, Taiyuan University of Technology, Taiyuan 030024, China
* Author to whom correspondence should be addressed.
Aerospace 2024, 11(2), 149; https://doi.org/10.3390/aerospace11020149
Submission received: 21 December 2023 / Revised: 5 February 2024 / Accepted: 7 February 2024 / Published: 13 February 2024
(This article belongs to the Special Issue Flight Dynamics, Control & Simulation (2nd Edition))

Abstract

This paper uses the adaptive dynamic programming (ADP) method to achieve optimal trajectory tracking control for quadrotors. Based on an established mathematical model of the quadrotor, an approximate optimal trajectory tracking controller, consisting of a steady-state control input and an approximate optimal feedback control input, is designed for the nominal system. To handle the compound disturbances in the position and attitude dynamic models, disturbance observers are introduced. The estimated values are used to design robust compensation inputs that suppress the effect of the compound disturbances and maintain good trajectory tracking performance. The stability of the closed-loop system is proven using Lyapunov theory. Simulation results confirm the robustness and effectiveness of the proposed controller.

1. Introduction

The miniaturization and cost reduction of aircraft control components, together with advances in computing, sensing and measurement technologies, have improved the stability of flight control systems and greatly facilitated the development of quadrotors [1]. Owing to their high operability, strong mobility and flexibility, quadrotors can meet the specific needs of many projects and are widely used in military, industrial and other fields [2,3]. A quadrotor system is multivariable, nonlinear and strongly coupled, and it is also disturbed by the surrounding environment during flight [4]. These factors can degrade the accuracy of quadrotor control systems. Controller design for quadrotors must therefore meet stringent requirements for high-accuracy and robust flight control, and a well-designed core control algorithm is a prerequisite for stable, high-precision flight. Consequently, the research and development of controllers for quadrotor systems is of great significance.
At present, basic stabilization of quadrotors through control algorithms is no longer a problem, and many quadrotor controllers have been designed and applied in practice [5]. Since the dynamics of quadrotors can be linearized around the equilibrium point, traditional linear control methods can be used for controller design [6]; for example, linear quadratic regulator (LQR) control has been employed in quadrotor flight control [7]. However, quadrotors must operate away from the equilibrium point to accomplish complex control tasks and to withstand external disturbances. To this end, a robust feedback linearization method has been devised that uses extended state observers to estimate online the nonlinear state feedback term, which contains aerodynamic forces, moments and unknown disturbances, and obtains the desired closed-loop dynamics via pole assignment [8]. Moreover, several robust controllers based on nonlinear techniques have been proposed, such as sliding mode control [9], adaptive control [10], backstepping-based control [11] and robust control [12]. These methods ensure the stability and robustness of nonlinear systems and have been widely used for tracking control, but they do not consider optimality. Therefore, the concept of optimization has been introduced into control design.
To derive the optimal control policy for an infinite-horizon optimal control problem, it is essential to solve the Hamilton–Jacobi–Bellman (HJB) equation or, for the H∞ optimal control problem with uncertainties, the Hamilton–Jacobi–Isaacs (HJI) equation. Nevertheless, analytical solutions are difficult to derive in most cases. Neural networks offer one way to overcome this problem [13,14,15]: their approximation property makes it possible to find approximate solutions to partial differential equations, and their convergence can be ensured by penalizing violations of the given partial differential equations. The ADP method combines reinforcement learning, dynamic programming and neural-network-based adaptive methods, using function approximation structures to derive approximate solutions of the HJB/HJI equations and thereby address nonlinear optimal control problems [16,17]. With suitable performance index functions, the ADP method can be used in control design to obtain the desired dynamic performance and to stabilize systems with uncertainties. However, most ADP-based nonlinear optimal control methods target nominal systems or uncertain systems that satisfy specific conditions [18,19,20]; their immunity to state-independent external time-varying disturbances remains weak, and their performance under stronger disturbances is not ideal. The ADP method has been used in quadrotor controller design, and efforts have been made to improve robustness, but the resulting controllers are mostly geared towards linear systems, and handling design uncertainty remains a distinct problem [21,22]. Since quadrotors often experience various external effects in flight, their flight control requires strong adaptive and anti-disturbance capabilities.
The disturbance observer technique suppresses disturbances through feedback regulation [23]. It can attenuate compound disturbances consisting of external disturbances and model uncertainties, thus improving system robustness. Because a disturbance observer can accurately estimate the compound disturbances in a system, it greatly reduces the conservatism of the control. In addition, since a disturbance observer can usually be designed independently of the controller, it is easily combined with other advanced control methods, which makes it flexible in application. Experiments suggest that introducing a disturbance observer significantly improves performance, which provides a useful reference for overcoming disturbances in quadrotors [24].
Based on the above analysis, a robust approximate optimal trajectory tracking control method is proposed for quadrotors to solve the optimal control problem in the presence of compound disturbances. The main contributions are summarized as follows:
(1) The combination of modeling uncertainties and external time-varying disturbances is treated as compound disturbances. Disturbance observers are introduced to estimate the compound disturbances in the position and attitude subsystems, and the estimated values are used to design robust compensation inputs that suppress the effects of the compound disturbances and ensure the stability of the quadrotor system under the ADP method.
(2) To obtain optimal trajectory tracking control for the quadrotor without compound disturbances, the ADP method is used to design approximate optimal control inputs for the nominal system of the quadrotor.
The rest of the paper is organized as follows. In Section 2, the mathematical model of the quadrotor is developed and divided into two subsystems. Section 3 describes the design of the robust approximate optimal trajectory tracking control and the stability analysis of the closed-loop system. Section 4 describes the robust approximate optimal trajectory tracking control for the quadrotor. Section 5 presents the simulation results, together with comparative simulations without the disturbance observer. Section 6 concludes the paper.

2. Mathematical Modeling of a Quadrotor

The quadrotor has four evenly spaced, cross-symmetrical brushless motors in the plane. The rotors of motor 1 and motor 3 rotate clockwise, while the rotors of motor 2 and motor 4 rotate counterclockwise. By changing the rotational speeds of the four rotors, the quadrotor generates lift forces and torques of different magnitudes, which control its takeoff, landing and attitude motions, so that its position in three-dimensional space can be changed. Figure 1 depicts the basic structure of the quadrotor.
To clarify the mathematical model of the quadrotor system and support the implementation of the control method, the earth-fixed inertial frame $O_I X_I Y_I Z_I$ and the body-fixed frame $O_B X_B Y_B Z_B$ are established. To keep the model tractable without losing generality, the deformation and elastic vibration of the rotors and body are neglected and the quadrotor is regarded as an ideal rigid body; its structure is symmetric, its mass is uniformly distributed, and its center of mass coincides with its geometric center. The translational and rotational kinematics of the quadrotor are given by [25]
$$\dot P = v, \tag{1}$$
$$\dot \Theta = W_B^I \omega, \tag{2}$$
where $P = [x, y, z]^T \in \mathbb{R}^3$ represents the position of the quadrotor in the inertial frame and $v = [v_x, v_y, v_z]^T \in \mathbb{R}^3$ represents the corresponding velocity. $\Theta = [\phi, \theta, \psi]^T \in \mathbb{R}^3$ denotes the vector of Euler angles, and $\omega = [p, q, r]^T \in \mathbb{R}^3$ denotes the angular velocity of the quadrotor in the body frame. $W_B^I \in \mathbb{R}^{3\times3}$ is the matrix that maps the body angular velocity to the Euler-angle rates, in the form of
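$$W_B^I = \begin{bmatrix} 1 & s_\phi t_\theta & c_\phi t_\theta \\ 0 & c_\phi & -s_\phi \\ 0 & s_\phi / c_\theta & c_\phi / c_\theta \end{bmatrix}, \tag{3}$$
in which $s_{(\cdot)} = \sin(\cdot)$, $c_{(\cdot)} = \cos(\cdot)$ and $t_{(\cdot)} = \tan(\cdot)$.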
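As a quick numerical illustration of (2) and (3), the following Python sketch (assuming NumPy; the function names are hypothetical and not from the paper) builds $W_B^I$ and maps body rates to Euler-angle rates. It rejects $\cos\theta \approx 0$, the singularity that Assumption 1 excludes.

```python
import numpy as np

def euler_rate_matrix(phi: float, theta: float) -> np.ndarray:
    """W_B^I from (3): maps body rates [p, q, r] to Euler-angle rates."""
    if abs(np.cos(theta)) < 1e-9:
        raise ValueError("theta = +/-pi/2 is singular (excluded by Assumption 1)")
    sf, cf = np.sin(phi), np.cos(phi)
    ct, tt = np.cos(theta), np.tan(theta)
    return np.array([
        [1.0, sf * tt, cf * tt],
        [0.0, cf, -sf],
        [0.0, sf / ct, cf / ct],
    ])

def euler_rates(Theta: np.ndarray, omega: np.ndarray) -> np.ndarray:
    """Kinematics (2): Theta_dot = W_B^I(phi, theta) @ omega (independent of psi)."""
    phi, theta, _psi = Theta
    return euler_rate_matrix(phi, theta) @ omega
```

At level attitude ($\phi = \theta = 0$) the matrix reduces to the identity, so the Euler-angle rates equal the body rates.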
Based on the Newton–Euler method, the dynamic equations of the quadrotor with compound disturbances are [26]
$$m \dot v = F - k v + d_p, \tag{4}$$
$$I \dot \omega + \omega \times I \omega = \tau + d_a, \tag{5}$$
where $m \in \mathbb{R}$ is the mass of the quadrotor and $I \in \mathbb{R}^{3\times3}$ is its inertia matrix. Under the structural assumptions above, the inertia matrix is diagonal, $I = \mathrm{diag}(I_{xx}, I_{yy}, I_{zz})$. $k = \mathrm{diag}(k_x, k_y, k_z) \in \mathbb{R}^{3\times3}$ is the drag coefficient matrix. $F = [F_x, F_y, F_z]^T \in \mathbb{R}^3$ is the resultant force, consisting of gravity and the total lift, expressed in the inertial frame. $\tau = [\tau_x, \tau_y, \tau_z]^T \in \mathbb{R}^3$ is the torque in the body frame. $d_p \in \mathbb{R}^3$ and $d_a \in \mathbb{R}^3$ are the compound disturbances in the position and attitude dynamic models, which contain modeling uncertainties and external time-varying disturbances.
According to the force analysis, the quadrotor is subject to gravity and lift forces. Because of the quadrotor's structure, the lift forces act along the z-axis of the body frame. The resultant force expressed in the inertial frame is then [27]
$$F = R_B^I z_u T - z_u m g_I, \tag{6}$$
where $T \in \mathbb{R}$ is the total lift force, $g_I \in \mathbb{R}$ is the gravitational acceleration, $z_u = [0, 0, 1]^T$, and $R_B^I \in \mathbb{R}^{3\times3}$ is the rotation matrix from the body frame to the inertial frame, in the form of
$$R_B^I = \begin{bmatrix} c_\theta c_\psi & s_\phi s_\theta c_\psi - c_\phi s_\psi & c_\phi s_\theta c_\psi + s_\phi s_\psi \\ c_\theta s_\psi & s_\phi s_\theta s_\psi + c_\phi c_\psi & c_\phi s_\theta s_\psi - s_\phi c_\psi \\ -s_\theta & s_\phi c_\theta & c_\phi c_\theta \end{bmatrix}. \tag{7}$$
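The rotation (7) and the resultant force (6) can be checked numerically. The sketch below assumes NumPy, hypothetical function names, and a hypothetical default $g_I = 9.81$; it confirms that $R_B^I$ is a proper rotation and that a hover thrust $T = m g_I$ at level attitude gives zero resultant force.

```python
import numpy as np

def rotation_body_to_inertial(phi, theta, psi):
    """R_B^I from (7) (Z-Y-X Euler angle convention)."""
    cf, sf = np.cos(phi), np.sin(phi)
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(psi), np.sin(psi)
    return np.array([
        [ct * cp, sf * st * cp - cf * sp, cf * st * cp + sf * sp],
        [ct * sp, sf * st * sp + cf * cp, cf * st * sp - sf * cp],
        [-st, sf * ct, cf * ct],
    ])

def resultant_force(Theta, T, m, g_I=9.81):
    """Resultant force (6) in the inertial frame: F = R_B^I z_u T - z_u m g_I."""
    z_u = np.array([0.0, 0.0, 1.0])
    return rotation_body_to_inertial(*Theta) @ z_u * T - z_u * m * g_I
```

Tilting the body ($\phi \neq 0$ or $\theta \neq 0$) redirects part of the lift horizontally, which is how the position subsystem is actuated.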
Assumption 1
([28]). The roll and pitch angles satisfy $|\phi| < \pi/2$ and $|\theta| < \pi/2$, which avoids the singularities of the matrices $W_B^I$ and $R_B^I$.
Assumption 2
([29]). During the control process, the total compound disturbance $d_t = [d_p^T, d_a^T]^T \in \mathbb{R}^6$ has finite energy. In addition, $d_t$ is a continuous function whose norm is bounded, $\|d_t\| \le d_{tM}$, where $d_{tM}$ is an unknown positive constant. Moreover, compound disturbances are usually regarded as superpositions of low-frequency periodic signals. Hence, it is assumed that the total compound disturbance changes slowly compared with the dynamics of the disturbance observer, so that $\dot d_t \approx 0$.
Assumption 3
([30]). The desired position trajectory $P_d = [x_d, y_d, z_d]^T \in \mathbb{R}^3$, the desired yaw angle trajectory $\psi_d \in \mathbb{R}$, and their higher-order derivatives are known, continuous and bounded.
Remark 1.
Assumption 2 is common in control studies using disturbance observers [31,32,33], although compound disturbances are treated differently in [34]. This paper adopts the considerations in Assumption 2. Assumption 3 ensures that the ADP method can be used for the control design and the stability analysis.
The total lift and torque of the quadrotor are related to the force and torque of the four rotors as follows [35]:
$$\begin{cases} T = T_1 + T_2 + T_3 + T_4, \\ \tau_x = l(T_2 - T_4), \\ \tau_y = l(T_1 - T_3), \\ \tau_z = \tau_1 - \tau_2 + \tau_3 - \tau_4, \end{cases} \tag{8}$$
where $T_i$ and $\tau_i$ ($i = 1, 2, 3, 4$) are the lift and torque generated by the four rotors of the quadrotor, respectively, and $l$ is the distance from each rotor to the center of the body.
The rotor speeds are related to pulse-width modulated (PWM) signals through the motors. The lift forces and torques generated by the four motors are related to the pulse width of the input signals as follows [36]:
$$T_i = K_t \frac{B_w}{s + B_w} u_i, \qquad \tau_i = K_o \frac{B_w}{s + B_w} u_i, \tag{9}$$
where $K_t$ and $K_o$ are the positive gains of the lift coefficient and the inverse torque coefficient, respectively, $s$ is the Laplace variable, $B_w$ is the motor bandwidth, and $u_i$ is the PWM signal of the corresponding motor, which is limited to between 0 and 1.
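The first-order lag in (9) can be illustrated with a forward-Euler simulation of $\dot T_i = B_w(K_t u_i - T_i)$, the time-domain form of $T_i = K_t B_w/(s + B_w)\,u_i$ from zero initial lift. The gain, bandwidth and step size below are hypothetical illustration values, the function name is hypothetical, and forward Euler is a choice made here only for simplicity.

```python
import numpy as np

def simulate_motor_lift(u, K_t, B_w, dt=1e-4, t_end=0.5):
    """Step response of T_i = K_t * B_w / (s + B_w) * u_i, i.e. dT/dt = B_w (K_t u - T)."""
    u = float(np.clip(u, 0.0, 1.0))   # the PWM signal is limited to [0, 1]
    n = int(round(t_end / dt))
    T = np.zeros(n + 1)               # lift history, starting from zero
    for k in range(n):
        T[k + 1] = T[k] + dt * B_w * (K_t * u - T[k])
    return T
```

After one time constant $1/B_w$ the lift reaches roughly 63% of $K_t u_i$, and after several time constants it settles at $K_t u_i$; this is why the static model (10) is reasonable when $B_w$ is large compared with the flight dynamics.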
Assuming that the motors respond sufficiently fast, the motor model can be simplified as [37]
$$T_i = K_t u_i, \qquad \tau_i = K_o u_i. \tag{10}$$
Hence, (8) can be rewritten as [38]
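$$\begin{bmatrix} T \\ \tau_x \\ \tau_y \\ \tau_z \end{bmatrix} = \begin{bmatrix} K_t & K_t & K_t & K_t \\ 0 & K_t l & 0 & -K_t l \\ K_t l & 0 & -K_t l & 0 \\ K_o & -K_o & K_o & -K_o \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}. \tag{11}$$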
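The rows of the matrix in (11) are mutually orthogonal, so the matrix is invertible whenever $K_t$, $K_o$ and $l$ are nonzero, and a desired $(T, \tau)$ can be allocated to the four PWM signals by a linear solve. The sketch below (NumPy; hypothetical function names and parameter values) does this and then clips the result to the admissible range [0, 1]; the clipping step is an assumption added here, since the paper only states the range.

```python
import numpy as np

def mixing_matrix(K_t, K_o, l):
    """Matrix of (11), mapping PWM signals [u1, u2, u3, u4] to [T, tau_x, tau_y, tau_z]."""
    return np.array([
        [K_t, K_t, K_t, K_t],
        [0.0, K_t * l, 0.0, -K_t * l],
        [K_t * l, 0.0, -K_t * l, 0.0],
        [K_o, -K_o, K_o, -K_o],
    ])

def allocate_pwm(T, tau, K_t, K_o, l):
    """Invert (11) for the PWM commands, then saturate them to [0, 1]."""
    M = mixing_matrix(K_t, K_o, l)
    u = np.linalg.solve(M, np.concatenate(([T], tau)))
    return np.clip(u, 0.0, 1.0)
```

For a pure hover command ($\tau = 0$) the solution is the same duty cycle $T/(4K_t)$ on all four motors.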
For trajectory tracking control of the quadrotor, the control objective is to design a controller that makes the position and attitude track the desired trajectories with a small, bounded tracking error.
Combining (1), (2), (4) and (5), the overall model of the quadrotor can be decomposed into a position subsystem and an attitude subsystem. The position subsystem can be represented as
$$\dot x_1 = f_1(x_1) + g_1(x_1)(F + d_p), \tag{12}$$
with
$$x_1 = [P^T, v^T]^T = [x, y, z, v_x, v_y, v_z]^T \in \mathbb{R}^6,$$
$$f_1(x_1) = [v_x, v_y, v_z, -k_x v_x/m, -k_y v_y/m, -k_z v_z/m]^T \in \mathbb{R}^6,$$
$$g_1(x_1) = \begin{bmatrix} 0 & 0 & 0 & 1/m & 0 & 0 \\ 0 & 0 & 0 & 0 & 1/m & 0 \\ 0 & 0 & 0 & 0 & 0 & 1/m \end{bmatrix}^T \in \mathbb{R}^{6\times3}.$$
The attitude subsystem is expressed in the form of
$$\dot x_2 = f_2(x_2) + g_2(x_2)(\tau + d_a), \tag{13}$$
with
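$$x_2 = [\Theta^T, \omega^T]^T = [\phi, \theta, \psi, p, q, r]^T \in \mathbb{R}^6,$$
$$f_2(x_2) = \begin{bmatrix} p + q s_\phi t_\theta + r c_\phi t_\theta \\ q c_\phi - r s_\phi \\ q s_\phi / c_\theta + r c_\phi / c_\theta \\ q r (I_{yy} - I_{zz}) / I_{xx} \\ p r (I_{zz} - I_{xx}) / I_{yy} \\ p q (I_{xx} - I_{yy}) / I_{zz} \end{bmatrix} \in \mathbb{R}^6,$$
$$g_2(x_2) = \begin{bmatrix} 0 & 0 & 0 & 1/I_{xx} & 0 & 0 \\ 0 & 0 & 0 & 0 & 1/I_{yy} & 0 \\ 0 & 0 & 0 & 0 & 0 & 1/I_{zz} \end{bmatrix}^T \in \mathbb{R}^{6\times3}.$$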
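The two subsystems (12) and (13) translate directly into state-derivative functions. The following sketch (NumPy; function names and test values are hypothetical) implements $f_1, g_1, f_2, g_2$ as written above, with the drag and inertia matrices passed as their diagonals.

```python
import numpy as np

def position_dynamics(x1, F, d_p, m, k):
    """(12): x1_dot = f1(x1) + g1(x1)(F + d_p), with x1 = [x, y, z, vx, vy, vz]."""
    v = x1[3:]
    f1 = np.concatenate((v, -k * v / m))               # k holds the diagonal [k_x, k_y, k_z]
    g1 = np.vstack((np.zeros((3, 3)), np.eye(3) / m))  # 6 x 3 input matrix
    return f1 + g1 @ (F + d_p)

def attitude_dynamics(x2, tau, d_a, I_diag):
    """(13): x2_dot = f2(x2) + g2(x2)(tau + d_a), with x2 = [phi, theta, psi, p, q, r]."""
    phi, theta, _psi, p, q, r = x2
    Ixx, Iyy, Izz = I_diag
    sf, cf = np.sin(phi), np.cos(phi)
    ct, tt = np.cos(theta), np.tan(theta)
    f2 = np.array([
        p + q * sf * tt + r * cf * tt,   # Euler-angle rates, rows of (3) times omega
        q * cf - r * sf,
        q * sf / ct + r * cf / ct,
        q * r * (Iyy - Izz) / Ixx,       # gyroscopic coupling terms
        p * r * (Izz - Ixx) / Iyy,
        p * q * (Ixx - Iyy) / Izz,
    ])
    g2 = np.vstack((np.zeros((3, 3)), np.diag(1.0 / np.asarray(I_diag, dtype=float))))
    return f2 + g2 @ (tau + d_a)
```

Both functions share the affine structure $\dot x = f(x) + g(x)(\text{input} + \text{disturbance})$ that the unified model (14) exploits.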
In the next section, (12) and (13) will be the focus of our research.

3. Robust Approximate Optimal Trajectory Tracking Control Design

For convenience in describing the control design process, (12) and (13) are written in the unified form
$$\dot X = f(X) + g(X)(U + D), \tag{14}$$
in which $f(X) \in \mathbb{R}^6$ and $g(X) \in \mathbb{R}^{6\times3}$ represent the drift dynamics and the input dynamics of the system, respectively. $X \in \mathbb{R}^6$ denotes the observable state vector, $U \in \mathbb{R}^3$ the control input, and $D \in \mathbb{R}^3$ the compound disturbance.
Definition 1
([39]). A state vector $X$ is said to be uniformly ultimately bounded (UUB) if there exist a compact set $\Omega_X$, a positive number $b_X$ and a time $t_b(X(t_0), b_X)$ such that $\|X\| \le b_X$ for all initial values $X(t_0) \in \Omega_X$ and all $t \ge t_0 + t_b$.
Lemma 1
([40]). $X$ is UUB if the time derivative of a positive definite function $L_X(X)$ is negative whenever $\|X\| > b_X$ for a positive constant $b_X$.
To achieve robust trajectory tracking control of the system, the designed controller consists of two parts:
$$U = U_N + U_R, \tag{15}$$
where $U_R$ is the robust compensation input, designed via the disturbance observer to suppress the effect of the compound disturbances in the system, and $U_N$ is the control input designed with the ADP method for the nominal system, which takes the form
$$U_N = U_d + U_E, \tag{16}$$
where $U_d$ represents the steady-state control input and $U_E$ represents the feedback control input.

3.1. Disturbance Observer Design

The disturbance observer is used to estimate the compound disturbance, and the estimate is then used to design the robust compensation input that improves robustness. The disturbance observer is designed as
$$\begin{cases} \dot Z = -l_D(X)\big[f(X) + g(X)\big(p_D(X) + U + Z\big)\big], \\ \hat D = Z + p_D(X), \end{cases} \tag{17}$$
in which $\hat D \in \mathbb{R}^3$ is the estimate of the unknown compound disturbance, $p_D(X) \in \mathbb{R}^3$ is a vector-valued function to be designed, $l_D(X) = \partial p_D(X)/\partial X \in \mathbb{R}^{3\times6}$ is the observer gain, and $Z \in \mathbb{R}^3$ is the auxiliary variable vector of the disturbance observer.
Remark 2.
Estimating the disturbance directly from (14) would require the state derivative $\dot X$, which cannot be used here because it depends on the unknown compound disturbance. The auxiliary variable vector $Z$ in (17) is therefore introduced to avoid computing the state derivative.
Define the estimation error of the compound disturbance as $\tilde D = D - \hat D$. Using Assumption 2 ($\dot D \approx 0$) and the disturbance observer (17), the time derivative of $\tilde D$ is
$$\begin{aligned} \dot{\tilde D} &= \dot D - \dot{\hat D} = -\dot Z - l_D(X)\dot X \\ &= l_D(X)\big[f(X) + g(X)\big(p_D(X) + U + Z\big)\big] - l_D(X)\dot X \\ &= l_D(X)g(X)\big(Z + p_D(X)\big) - l_D(X)\big(\dot X - f(X) - g(X)U\big). \end{aligned} \tag{18}$$
Combined with (14), we have
$$\dot{\tilde D} = l_D(X)g(X)\big(Z + p_D(X)\big) - l_D(X)g(X)D = -l_D(X)g(X)(D - \hat D) = -l_D(X)g(X)\tilde D. \tag{19}$$
Hence, $\tilde D$ converges if the vector-valued function $p_D(X)$ is designed appropriately.
Theorem 1.
Consider System (14) with the disturbance observer (17). If $p_D(X)$ is designed such that $l_D(X)g(X)$ is positive definite, then the estimate $\hat D$ tracks the compound disturbance $D$; that is, the estimation error $\tilde D$ converges to zero.
Proof. 
Select the candidate Lyapunov function as follows:
$$L_D = \frac{1}{2}\tilde D^T \tilde D. \tag{20}$$
Combined with (19), the time derivative of $L_D$ is
$$\dot L_D = \tilde D^T \dot{\tilde D} = -\tilde D^T l_D(X)g(X)\tilde D. \tag{21}$$
If $l_D(X)g(X)$ is positive definite, we obtain
$$\dot L_D \le -\kappa \|\tilde D\|^2, \tag{22}$$
where $\kappa = \lambda_{\min}\big(l_D(X)g(X)\big)$ and $\lambda_{\min}(\cdot)$ denotes the minimum eigenvalue. Clearly, $\dot L_D < 0$ whenever $\tilde D \ne 0$. Hence, the disturbance observer (17) can estimate $D$, and $\tilde D$ converges to zero. This completes the proof. □
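The convergence argument of Theorem 1 can be reproduced in simulation. As a hypothetical design (the paper does not specify $p_D$ here), take the position subsystem and choose $p_D(X) = c\,m\,v$ with $c > 0$; then $l_D(X) = [0_{3\times3},\ c\,m\,I_3]$ and $l_D(X)g(X) = c\,I_3$, which is positive definite. The sketch below integrates the plant (14) and the observer (17) with forward Euler for a constant disturbance; the mass, drag, gain, input signal and function name are all hypothetical. Because $p_D$ is linear in $X$, this discretization gives exactly $\tilde D_{k+1} = (1 - c\,\Delta t)\tilde D_k$.

```python
import numpy as np

def run_observer(D_true, c=20.0, m=1.2, k=0.1, dt=1e-3, steps=2000):
    """Simulate observer (17) on the position subsystem; returns the final D_hat."""
    g = np.vstack((np.zeros((3, 3)), np.eye(3) / m))        # g(X) from (12)
    l_D = np.hstack((np.zeros((3, 3)), c * m * np.eye(3)))   # l_D = dp_D/dX, so l_D g = c I

    def f(X):
        return np.concatenate((X[3:], -k * X[3:] / m))       # f(X) from (12), scalar drag

    def p_D(X):
        return c * m * X[3:]                                 # hypothetical p_D(X) = c m v

    X, Z = np.zeros(6), np.zeros(3)
    for i in range(steps):
        U = np.array([0.5 * np.sin(i * dt), 0.0, 0.2])       # arbitrary known input
        Z_next = Z - dt * l_D @ (f(X) + g @ (p_D(X) + U + Z))
        X = X + dt * (f(X) + g @ (U + D_true))               # plant driven by the true D
        Z = Z_next
    return Z + p_D(X)                                        # D_hat = Z + p_D(X)
```

Feeding $U_R = -\hat D$ back into the plant would then cancel the estimated disturbance, leaving only $\tilde D$ in the error dynamics.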
Then, the robust compensation input  U R  is designed as
$$U_R = -\hat D. \tag{23}$$

3.2. Optimal Trajectory Tracking Control Design and Analysis

The compound disturbance is estimated by the disturbance observer, and the robust compensation input built from this estimate suppresses the effect of the compound disturbance. As a result, the trajectory tracking control problem of the nonlinear system with the compound disturbance can be converted into the trajectory tracking control problem of the nominal system. Deriving the optimal control for the nominal system requires solving the associated HJB equation. Unfortunately, an analytical solution is difficult to obtain for a nonlinear system. The ADP method is therefore used to achieve approximate optimal control by constructing a critic network, whose weight update law ensures the convergence of the weights and the stability of the closed-loop system.
For System (14), the nominal system is represented by
$$\dot X = f(X) + g(X)U. \tag{24}$$
Given the desired trajectory $X_d \in \mathbb{R}^6$, the steady-state control input $U_d$ is obtained from (24) as
$$U_d = g^+(X_d)\big(\dot X_d - f(X_d)\big), \tag{25}$$
in which $g^+(X_d)$ denotes the pseudo-inverse of $g(X_d)$.
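For the position subsystem, $g^+ = [0_{3\times3},\ m I_3]$, so (25) reduces to $U_d = m\ddot P_d + k\dot P_d$. The sketch below evaluates (25) with `numpy.linalg.pinv` for a hypothetical circular reference and hypothetical mass and drag values, and checks that $X_d$ satisfies the nominal dynamics (24) under $U_d$; all names are illustrative.

```python
import numpy as np

m, k = 1.2, np.array([0.1, 0.1, 0.1])   # hypothetical mass and drag diagonal

def f1(X):
    return np.concatenate((X[3:], -k * X[3:] / m))

g1 = np.vstack((np.zeros((3, 3)), np.eye(3) / m))

def desired_state(t, radius=1.0, w=0.5, z0=2.0):
    """Hypothetical circular reference: returns X_d = [P_d, v_d] and X_d_dot."""
    P = np.array([radius * np.cos(w * t), radius * np.sin(w * t), z0])
    V = np.array([-radius * w * np.sin(w * t), radius * w * np.cos(w * t), 0.0])
    A = np.array([-radius * w**2 * np.cos(w * t), -radius * w**2 * np.sin(w * t), 0.0])
    return np.concatenate((P, V)), np.concatenate((V, A))

def steady_state_input(X_d, X_d_dot):
    """U_d = g^+(X_d) (X_d_dot - f(X_d)), eq. (25)."""
    return np.linalg.pinv(g1) @ (X_d_dot - f1(X_d))
```

The pseudo-inverse gives an exact solution here only because the reference is dynamically consistent (its position rows satisfy $\dot P_d = v_d$); otherwise (25) is a least-squares fit.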
Define the tracking error as $E = X - X_d \in \mathbb{R}^6$. Combining (14) and (15), the error system is
$$\begin{aligned} \dot E &= f(X) + g(X)(U + D) - \dot X_d \\ &= f(E + X_d) + g(E + X_d)U_d - \dot X_d + g(E + X_d)U_E + g(E + X_d)\tilde D. \end{aligned} \tag{26}$$
Let $f_E = f(E + X_d) + g(E + X_d)U_d - \dot X_d$ and $g_E = g(E + X_d)$; then
$$\dot E = f_E + g_E U_E + g_E \tilde D. \tag{27}$$
Since $g_E = g(X)$, its norm is bounded as $g_m \le \|g_E\| \le g_M$ for positive constants $g_m$ and $g_M$.
By Theorem 1, the disturbance observer (17) successfully estimates the compound disturbance $D$, and the estimation error $\tilde D$ converges to zero. Therefore, $\tilde D$ can be neglected in the error system (27) for the optimal control design [41,42], although it is still considered in the stability analysis. The nominal error system is then
$$\dot E = f_E + g_E U_E. \tag{28}$$
Define the cost function as
$$V(E) = \int_{t_0}^{\infty} \big(E^T Q E + U_E^T R U_E\big)\, dt, \tag{29}$$
where $Q \in \mathbb{R}^{6\times6}$ and $R \in \mathbb{R}^{3\times3}$ are designed symmetric positive definite matrices.
The nonlinear Lyapunov equation associated with (29) is
$$\nabla V^T (f_E + g_E U_E) + E^T Q E + U_E^T R U_E = 0, \tag{30}$$
where $\nabla V = \partial V(E)/\partial E$ and $V(0) = 0$.
Definition 2
([43]). A control policy $\mu(E)$ is said to be admissible with respect to (29) on the compact set $\Omega$ if $\mu(E)$ is continuous on $\Omega$, $\mu(0) = 0$, $\mu(E)$ stabilizes (28) on $\Omega$, and $V(E)$ is finite for all $E \in \Omega$. This is denoted by $\mu(E) \in \Psi(\Omega)$, where $\Psi(\Omega)$ denotes the set of admissible control policies.
The Hamiltonian function takes the following form
$$H(E, U_E, \nabla V) = \nabla V^T (f_E + g_E U_E) + E^T Q E + U_E^T R U_E. \tag{31}$$
The optimal cost function is represented by
$$V^*(E) = \min_{U_E \in \Psi(\Omega)} \int_{t_0}^{\infty} \big(E^T Q E + U_E^T R U_E\big)\, dt, \tag{32}$$
and the following relation is satisfied
$$\min_{U_E \in \Psi(\Omega)} H(E, U_E, \nabla V^*) = 0, \tag{33}$$
where $\nabla V^* = \partial V^*(E)/\partial E$.
From the first-order optimality condition $\partial H(E, U_E, \nabla V^*)/\partial U_E = 2 R U_E + g_E^T \nabla V^* = 0$, the optimal feedback control input is
$$U_E^* = -\frac{1}{2} R^{-1} g_E^T \nabla V^*. \tag{34}$$
Substituting (34) and (31) into (33), the HJB equation is developed as
$$\nabla V^{*T} f_E + E^T Q E - \frac{1}{4} \nabla V^{*T} g_E R^{-1} g_E^T \nabla V^* = 0. \tag{35}$$
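A scalar linear special case makes (34) and (35) concrete. Assuming $f_E = aE$, $g_E = b$, $Q = q$ and $R = r$ (illustrative scalars, not the quadrotor model), the quadratic value function $V^*(E) = pE^2$ solves (35) when $2ap + q - p^2b^2/r = 0$. The sketch below (hypothetical function names) takes the positive root, evaluates the HJB residual, and forms the optimal feedback (34).

```python
import numpy as np

def scalar_lq_value(a, b, q, r):
    """Positive root p of 2 a p + q - p^2 b^2 / r = 0, i.e. (35) with V* = p E^2."""
    return r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2

def hjb_residual(e, a, b, q, r):
    """Left-hand side of (35) for f_E = a e, g_E = b, with grad V* = 2 p e."""
    p = scalar_lq_value(a, b, q, r)
    grad_V = 2.0 * p * e
    return grad_V * a * e + q * e**2 - 0.25 * grad_V * b * (1.0 / r) * b * grad_V

def optimal_feedback(e, a, b, q, r):
    """U_E* = -1/2 R^{-1} g_E^T grad V*, eq. (34)."""
    p = scalar_lq_value(a, b, q, r)
    return -0.5 / r * b * 2.0 * p * e
```

Under this feedback the closed-loop pole is $a - b^2p/r = -\sqrt{a^2 + b^2q/r} < 0$, so the optimal policy stabilizes even an open-loop unstable $a > 0$. For the nonlinear $f_E$ of (28) no such closed form exists, which motivates the approximation in Section 3.3.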

3.3. Approximate Optimal Control Design

Clearly, $\nabla V^*$ must be obtained by solving the HJB equation (35) to derive the optimal feedback control input (34). However, (35) is a typical nonlinear partial differential equation whose analytical solution is difficult to derive [44,45]. To overcome this difficulty, the ADP method based on the policy iteration technique is used to derive an approximate solution.
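For intuition about policy iteration, consider the linear-quadratic special case $f_E = AE$, $g_E = B$ (an assumption made only for this illustration). There, policy evaluation amounts to solving the Lyapunov equation (30) for $V(E) = E^TPE$ under $U_E = -KE$, and policy improvement follows (34), giving $K = R^{-1}B^TP$; this is Kleinman's algorithm, not the paper's critic-network scheme. The NumPy sketch below solves each Lyapunov equation through the Kronecker-product identity and assumes an initial stabilizing gain $K_0$, matching the admissibility requirement of Definition 2. Function names are hypothetical.

```python
import numpy as np

def solve_lyapunov(Ac, Qc):
    """Solve Ac^T P + P Ac + Qc = 0 using vec(Ac^T P + P Ac) = (I kron Ac^T + Ac^T kron I) vec(P)."""
    n = Ac.shape[0]
    I = np.eye(n)
    M = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    vecP = np.linalg.solve(M, -Qc.reshape(-1, order="F"))
    P = vecP.reshape(n, n, order="F")
    return 0.5 * (P + P.T)                      # symmetrize against round-off

def policy_iteration(A, B, Q, R, K0, iters=20):
    """Kleinman's policy iteration for the linear special case of (28)-(35)."""
    K = K0
    for _ in range(iters):
        Ac = A - B @ K
        P = solve_lyapunov(Ac, Q + K.T @ R @ K)  # policy evaluation: Lyapunov equation (30)
        K = np.linalg.solve(R, B.T @ P)          # policy improvement: cf. (34)
    return P, K
```

For the double integrator $A = \begin{bmatrix}0 & 1\\0 & 0\end{bmatrix}$, $B = [0, 1]^T$, $Q = I$, $R = 1$ and $K_0 = [1, 1]$, the iteration converges to $P = \begin{bmatrix}\sqrt3 & 1\\1 & \sqrt3\end{bmatrix}$, the solution of the algebraic Riccati equation.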
Assumption 4
([46]). For the nominal error system (28), let $J(E)$ be a continuously differentiable Lyapunov function candidate satisfying $\dot J = \nabla J^T (f_E + g_E U_E^*) < 0$, where $\nabla J = \partial J(E)/\partial E$. Meanwhile, there exists a symmetric positive definite matrix $\Lambda(E)$ such that $\nabla J^T (f_E + g_E U_E^*) = -\nabla J^T \Lambda(E) \nabla J$. Moreover, $\Lambda_m \le \|\Lambda(E)\| \le \Lambda_M$ holds for positive constants $\Lambda_m$ and $\Lambda_M$.
Remark 3.
Assumption 4 is commonly used in the ADP literature. Generally, the closed-loop dynamics under the optimal feedback control are assumed to be bounded by a function of the system state on the compact set. In that case, there exists a positive constant $\eta$ such that $\|f_E + g_E U_E^*\| \le \eta \|\nabla J\|$, and hence $\nabla J^T (f_E + g_E U_E^*) \ge -\eta \|\nabla J\|^2$. Furthermore, $J(E)$ can be chosen as a quadratic polynomial [47], such as $J(E) = \frac{1}{2} E^T E$.
Owing to the universal approximation property of neural networks, the optimal cost function can be approximated by
$$V^*(E) = W_c^T \varphi_c(E) + \varepsilon_c(E), \tag{36}$$
where $W_c \in \mathbb{R}^N$ is the unknown ideal constant weight vector, $\varphi_c(E) \in \mathbb{R}^N$ is the activation function vector, $\varepsilon_c(E)$ is the approximation error, and $N$ is the number of neurons. This neural network is called the critic network in the ADP method.
Lemma 2
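([48]). The approximation error $\varepsilon_c(E)$ is expected to be bounded when the approximated function $V^*(E)$ is bounded.
Then, by the definition of $\nabla V^*$, we have
$$\nabla V^* = \nabla \varphi_c^T W_c + \nabla \varepsilon_c, \tag{37}$$
where $\nabla \varphi_c = \partial \varphi_c(E)/\partial E$ and $\nabla \varepsilon_c = \partial \varepsilon_c(E)/\partial E$.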
Invoking (37), the optimal feedback control input (34) is developed as
$$U_E^* = -\frac{1}{2} R^{-1} g_E^T \big(\nabla \varphi_c^T W_c + \nabla \varepsilon_c\big). \tag{38}$$
Substituting (37) into (35), the HJB equation is developed as
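$$W_c^T \nabla \varphi_c f_E + E^T Q E - \frac{1}{4} W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T W_c + \varepsilon_H = 0, \tag{39}$$
where $\Xi = g_E R^{-1} g_E^T$ and $\varepsilon_H$ is the residual error, given by
$$\begin{aligned} \varepsilon_H &= \nabla \varepsilon_c^T f_E - \frac{1}{2} \nabla \varepsilon_c^T \Xi \nabla \varphi_c^T W_c - \frac{1}{4} \nabla \varepsilon_c^T \Xi \nabla \varepsilon_c \\ &= \nabla \varepsilon_c^T (f_E + g_E U_E^*) + \frac{1}{4} \nabla \varepsilon_c^T \Xi \nabla \varepsilon_c. \end{aligned} \tag{40}$$
Since $g_E$ is bounded, there exist positive constants $\Xi_m$ and $\Xi_M$ such that $\Xi_m \le \|\Xi\| \le \Xi_M$.
Define the estimate of $W_c$ as $\hat W_c$; then the estimate of $V^*(E)$ is
$$\hat V(E) = \hat W_c^T \varphi_c(E). \tag{41}$$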
Moreover, the approximate optimal feedback control input is derived as
$$\hat U_E = -\frac{1}{2} R^{-1} g_E^T \nabla \varphi_c^T \hat W_c. \tag{42}$$
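The critic structure (41) and (42) can be sketched with a concrete basis. The paper does not fix $\varphi_c$ at this point, so the quadratic monomials $E_iE_j$ ($i \le j$) used below are a hypothetical choice (in the spirit of the quadratic $J$ in Remark 3), and `weights_from_P` is a hypothetical helper used only to check that, for weights representing $E^TPE$, (42) reduces to the linear feedback $-R^{-1}g_E^TPE$.

```python
import numpy as np
from itertools import combinations_with_replacement

def quad_basis(E):
    """Hypothetical activation phi_c(E) = [E_i E_j for i <= j] and its Jacobian (N x n)."""
    pairs = list(combinations_with_replacement(range(E.size), 2))
    phi = np.array([E[i] * E[j] for i, j in pairs])
    grad = np.zeros((len(pairs), E.size))
    for row, (i, j) in enumerate(pairs):
        grad[row, i] += E[j]          # d(E_i E_j)/dE_i
        grad[row, j] += E[i]          # d(E_i E_j)/dE_j (doubles up when i == j)
    return phi, grad

def approx_feedback(E, W_hat, g_E, R):
    """U_E_hat = -1/2 R^{-1} g_E^T grad(phi_c)^T W_hat, eq. (42)."""
    _, grad_phi = quad_basis(E)
    return -0.5 * np.linalg.solve(R, g_E.T @ grad_phi.T @ W_hat)

def weights_from_P(P):
    """Hypothetical helper: weights with W^T phi_c(E) = E^T P E (for checking only)."""
    n = P.shape[0]
    return np.array([P[i, i] if i == j else 2.0 * P[i, j]
                     for i, j in combinations_with_replacement(range(n), 2)])
```

For a six-dimensional $E$ this basis has $N = 21$ neurons; whether it can represent $V^*$ well for the nonlinear error system is exactly what the approximation error $\varepsilon_c$ accounts for.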
Remark 4.
The classical ADP method uses a critic network and an actor network to approximate the optimal cost function and the optimal feedback control, respectively [43,49,50]. Because the optimal feedback control of a continuous-time affine nonlinear system is determined by the gradient of the optimal cost function, as (34) shows, the actor network can be omitted and only the critic network used [51,52]. Compared with the actor–critic framework, this critic-only framework requires less computation and converges faster, which gives it better practical value.
Combining (31), (41) and (42), the approximate Hamiltonian function is developed as
$$H(E, \hat W_c) = \hat W_c^T \nabla \varphi_c f_E + E^T Q E - \frac{1}{4} \hat W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T \hat W_c \triangleq e_c. \tag{43}$$
Define the objective function as
$$E_c = \frac{1}{2} e_c^2. \tag{44}$$
Moreover, the weight update law is designed as
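$$\dot{\hat W}_c = -\frac{\alpha_1 \sigma}{\sigma_c^2} \Big(\hat W_c^T \nabla \varphi_c f_E + E^T Q E - \frac{1}{4} \hat W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T \hat W_c\Big) + \frac{\alpha_2}{2} \Pi(E, \hat U_E) \nabla \varphi_c \Xi \nabla J, \tag{45}$$
where $\alpha_1 > 0$ and $\alpha_2 > 0$ are learning rates to be designed, $\sigma = \nabla \varphi_c (f_E + g_E \hat U_E)$, $\sigma_c = \sigma^T \sigma + 1$, and $J$ is given in Assumption 4. The switching function $\Pi(E, \hat U_E)$ in the last term is defined as
$$\Pi(E, \hat U_E) = \begin{cases} 0, & \text{if } \nabla J^T (f_E + g_E \hat U_E) + \alpha_3 \nabla J^T g_E g_E^T \nabla J < 0, \\ 1, & \text{else}, \end{cases} \tag{46}$$
where $\alpha_3$ is a positive design constant.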
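One evaluation of the update law (45) with the switching function (46) can be sketched as follows, again assuming the hypothetical quadratic basis and $J(E) = \frac{1}{2}E^TE$ from Remark 3, so that $\nabla J = E$. The function (hypothetical name `critic_update`) returns $\dot{\hat W}_c$; integrating it over time, for example with forward Euler, would be a separate step.

```python
import numpy as np
from itertools import combinations_with_replacement

def quad_basis_grad(E):
    """Jacobian of the hypothetical quadratic basis phi_c(E) = [E_i E_j, i <= j] (N x n)."""
    pairs = list(combinations_with_replacement(range(E.size), 2))
    grad = np.zeros((len(pairs), E.size))
    for row, (i, j) in enumerate(pairs):
        grad[row, i] += E[j]
        grad[row, j] += E[i]
    return grad

def critic_update(W_hat, E, f_E, g_E, Q, R, a1, a2, a3):
    """Right-hand side of (45), using J(E) = E^T E / 2 so that grad J = E."""
    grad_phi = quad_basis_grad(E)
    R_inv = np.linalg.inv(R)
    Xi = g_E @ R_inv @ g_E.T
    U_hat = -0.5 * R_inv @ g_E.T @ grad_phi.T @ W_hat                 # eq. (42)
    e_c = (W_hat @ grad_phi @ f_E + E @ Q @ E
           - 0.25 * W_hat @ grad_phi @ Xi @ grad_phi.T @ W_hat)       # eq. (43)
    sigma = grad_phi @ (f_E + g_E @ U_hat)
    sigma_c = sigma @ sigma + 1.0
    grad_J = E
    stable_enough = grad_J @ (f_E + g_E @ U_hat) + a3 * grad_J @ g_E @ g_E.T @ grad_J < 0
    Pi = 0.0 if stable_enough else 1.0                                # eq. (46)
    return -a1 * sigma / sigma_c**2 * e_c + 0.5 * a2 * Pi * grad_phi @ Xi @ grad_J
```

For a linear test system whose exact quadratic value is known, $e_c = 0$ at the ideal weights, so the first term vanishes; when the check in (46) also passes, the returned derivative is zero and the weights stay put.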
Remark 5.
The first term in (45) is used to minimize the objective function (44). To ensure that $\hat W_c$ converges to $W_c$, the persistence of excitation (PE) condition must be satisfied during the learning process [49]. A probing noise is typically added to the control input to satisfy this condition, which may destabilize the closed-loop system during learning [53,54]. The second term in (45) is used to guarantee the stability of the closed-loop system.
Define the weight estimation error as $\tilde W_c = W_c - \hat W_c$. Observe that $\dot{\tilde W}_c = -\dot{\hat W}_c$ and $\sigma = \nabla \varphi_c (f_E + g_E \hat U_E) = \nabla \varphi_c \dot E^* + \frac{1}{2} \nabla \varphi_c \Xi \nabla \varepsilon_c + \frac{1}{2} \nabla \varphi_c \Xi \nabla \varphi_c^T \tilde W_c$, where $\dot E^* = f_E + g_E U_E^*$. Using (39) and (45), we have
$$\begin{aligned} \dot{\tilde W}_c = &-\frac{\alpha_1}{\sigma_c^2} \Big(\nabla \varphi_c \dot E^* + \frac{1}{2} \nabla \varphi_c \Xi \nabla \varepsilon_c + \frac{1}{2} \nabla \varphi_c \Xi \nabla \varphi_c^T \tilde W_c\Big) \\ &\times \Big(\tilde W_c^T \nabla \varphi_c \dot E^* + \frac{1}{2} \tilde W_c^T \nabla \varphi_c \Xi \nabla \varepsilon_c + \frac{1}{4} \tilde W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T \tilde W_c + \varepsilon_H\Big) - \frac{\alpha_2}{2} \Pi(E, \hat U_E) \nabla \varphi_c \Xi \nabla J. \end{aligned} \tag{47}$$

3.4. Stability Analysis

Assumption 5
([50]). The ideal weight $W_c$ is bounded on the compact set $\Omega$ such that $\|W_c\| \le W_{cM}$ for a positive constant $W_{cM}$. Meanwhile, the activation function $\varphi_c$ and the approximation error $\varepsilon_c$ are bounded such that $\|\varphi_c\| \le \varphi_{cM}$ and $\|\varepsilon_c\| \le \varepsilon_{cM}$ for positive constants $\varphi_{cM}$ and $\varepsilon_{cM}$, and their gradients are also bounded such that $\|\nabla \varphi_c\| \le \bar\varphi_{cM}$ and $\|\nabla \varepsilon_c\| \le \bar\varepsilon_{cM}$ for positive constants $\bar\varphi_{cM}$ and $\bar\varepsilon_{cM}$. Moreover, the residual error $\varepsilon_H$ converges to zero when the number of neurons $N$ is sufficiently large and, given Remark 3 and the bound on $\Xi$, it is bounded as $\|\varepsilon_H\| \le \varepsilon_{HM}$ for a positive constant $\varepsilon_{HM}$.
Theorem 2.
Consider System (14). If the robust approximate optimal controller for trajectory tracking is designed as (15), consisting of the robust compensation input (23) and the nominal system control input (16), and the critic network weights are updated by (45), then the tracking error $E$ of the closed-loop system and the weight estimation error $\tilde W_c$ are UUB.
Proof. 
Select the candidate Lyapunov function as follows
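$$L = L_D + L_J + L_W, \tag{48}$$
where $L_D$ is given by (20), $L_J = \alpha_2 J(E)$ and $L_W = \frac{1}{2} \tilde W_c^T \tilde W_c$.
Considering the second term in (48) and using (27), its time derivative is
$$\dot L_J = \alpha_2 \nabla J^T (f_E + g_E \hat U_E) + \alpha_2 \nabla J^T g_E \tilde D. \tag{49}$$
Considering the third term in (48) and using (47), its time derivative is
$$\begin{aligned} \dot L_W &= \tilde W_c^T \dot{\tilde W}_c \\ &= -\frac{\alpha_1}{\sigma_c^2} \Big(\tilde W_c^T \nabla \varphi_c \dot E^* + \frac{1}{2} \tilde W_c^T \nabla \varphi_c \Xi \nabla \varepsilon_c + \frac{1}{2} \tilde W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T \tilde W_c\Big) \\ &\qquad \times \Big(\tilde W_c^T \nabla \varphi_c \dot E^* + \frac{1}{2} \tilde W_c^T \nabla \varphi_c \Xi \nabla \varepsilon_c + \frac{1}{4} \tilde W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T \tilde W_c + \varepsilon_H\Big) - \frac{\alpha_2}{2} \Pi(E, \hat U_E) \tilde W_c^T \nabla \varphi_c \Xi \nabla J \\ &= -\frac{\alpha_1}{\sigma_c^2} (\tilde W_c^T \nabla \varphi_c \dot E^*)^2 - \frac{\alpha_1}{4\sigma_c^2} (\tilde W_c^T \nabla \varphi_c \Xi \nabla \varepsilon_c)^2 - \frac{\alpha_1}{8\sigma_c^2} (\tilde W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T \tilde W_c)^2 \\ &\quad - \frac{\alpha_1}{\sigma_c^2} (\tilde W_c^T \nabla \varphi_c \dot E^*)(\tilde W_c^T \nabla \varphi_c \Xi \nabla \varepsilon_c) - \frac{3\alpha_1}{4\sigma_c^2} (\tilde W_c^T \nabla \varphi_c \dot E^*)(\tilde W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T \tilde W_c) \\ &\quad - \frac{3\alpha_1}{8\sigma_c^2} (\tilde W_c^T \nabla \varphi_c \Xi \nabla \varepsilon_c)(\tilde W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T \tilde W_c) - \frac{\alpha_1}{\sigma_c^2} \tilde W_c^T \nabla \varphi_c \dot E^* \varepsilon_H \\ &\quad - \frac{\alpha_1}{2\sigma_c^2} \tilde W_c^T \nabla \varphi_c \Xi \nabla \varepsilon_c \varepsilon_H - \frac{\alpha_1}{2\sigma_c^2} \tilde W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T \tilde W_c \varepsilon_H - \frac{\alpha_2}{2} \Pi(E, \hat U_E) \tilde W_c^T \nabla \varphi_c \Xi \nabla J. \end{aligned} \tag{50}$$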
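Since the first two terms in the final expression of (50) are nonpositive, we obtain
$$\begin{aligned} \dot L_W \le &-\frac{\alpha_1}{8\sigma_c^2} (\tilde W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T \tilde W_c)^2 - \frac{\alpha_1}{\sigma_c^2} (\tilde W_c^T \nabla \varphi_c \dot E^*)(\tilde W_c^T \nabla \varphi_c \Xi \nabla \varepsilon_c) \\ &- \frac{3\alpha_1}{4\sigma_c^2} (\tilde W_c^T \nabla \varphi_c \dot E^*)(\tilde W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T \tilde W_c) - \frac{3\alpha_1}{8\sigma_c^2} (\tilde W_c^T \nabla \varphi_c \Xi \nabla \varepsilon_c)(\tilde W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T \tilde W_c) \\ &- \frac{\alpha_1}{\sigma_c^2} \tilde W_c^T \nabla \varphi_c \dot E^* \varepsilon_H - \frac{\alpha_1}{2\sigma_c^2} \tilde W_c^T \nabla \varphi_c \Xi \nabla \varepsilon_c \varepsilon_H - \frac{\alpha_1}{2\sigma_c^2} \tilde W_c^T \nabla \varphi_c \Xi \nabla \varphi_c^T \tilde W_c \varepsilon_H \\ &- \frac{\alpha_2}{2} \Pi(E, \hat U_E) \tilde W_c^T \nabla \varphi_c \Xi \nabla J. \end{aligned} \tag{51}$$
According to Remark 3 and Assumption 5, and considering the bound on $\Xi$, we assume that $\lambda_{1m} \le \|\nabla \varphi_c \Xi \nabla \varphi_c^T\| \le \lambda_{1M}$, $\|\Xi\| \le \lambda_2$, $\|\nabla \varphi_c \dot E^*\| \le \lambda_3$, $\|\nabla \varepsilon_c\| \le \lambda_4$, $\|\nabla \varphi_c \Xi \nabla \varepsilon_c\| \le \lambda_5$ and $\|\varepsilon_H\| \le \lambda_6$. Since the PE condition guarantees that $\sigma_c$ is bounded, there exists a positive constant $\lambda_7$ such that $\lambda_7 \le 1/\sigma_c^2 \le 1$. In addition, Young's inequality gives $ab \le \frac{1}{2}(c^2 a^2 + b^2/c^2)$ for any nonzero constant $c$. Then, we have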
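$$-\frac{\alpha_1}{8\sigma_c^2}\big(\tilde W_c^T\nabla\varphi_c\Xi\nabla\varphi_c^T\tilde W_c\big)^2 \le -\frac{\alpha_1}{8}\lambda_7\lambda_{1m}^2\|\tilde W_c\|^4,$$
$$-\frac{\alpha_1}{\sigma_c^2}\big(\tilde W_c^T\nabla\varphi_c\dot E\big)\big(\tilde W_c^T\nabla\varphi_c\Xi\nabla\varepsilon_c\big) \le \frac{\alpha_1}{2\sigma_c^2}\Big(c_1^2\big(\tilde W_c^T\nabla\varphi_c\dot E\big)^2 + \frac{(\tilde W_c^T\nabla\varphi_c\Xi\nabla\varepsilon_c)^2}{c_1^2}\Big) \le \frac{\alpha_1c_1^2}{2}\lambda_3^2\|\tilde W_c\|^2 + \frac{\alpha_1}{2c_1^2}\lambda_5^2\|\tilde W_c\|^2,$$
$$-\frac{3\alpha_1}{4\sigma_c^2}\big(\tilde W_c^T\nabla\varphi_c\dot E\big)\big(\tilde W_c^T\nabla\varphi_c\Xi\nabla\varphi_c^T\tilde W_c\big) \le \frac{3\alpha_1}{8\sigma_c^2}\Big(c_2^2\big(\tilde W_c^T\nabla\varphi_c\dot E\big)^2 + \frac{(\tilde W_c^T\nabla\varphi_c\Xi\nabla\varphi_c^T\tilde W_c)^2}{c_2^2}\Big) \le \frac{3\alpha_1c_2^2}{8}\lambda_3^2\|\tilde W_c\|^2 + \frac{3\alpha_1}{8c_2^2}\lambda_{1M}^2\|\tilde W_c\|^4,$$
$$-\frac{3\alpha_1}{8\sigma_c^2}\big(\tilde W_c^T\nabla\varphi_c\Xi\nabla\varepsilon_c\big)\big(\tilde W_c^T\nabla\varphi_c\Xi\nabla\varphi_c^T\tilde W_c\big) \le \frac{3\alpha_1}{16\sigma_c^2}\Big(c_3^2\big(\tilde W_c^T\nabla\varphi_c\Xi\nabla\varepsilon_c\big)^2 + \frac{(\tilde W_c^T\nabla\varphi_c\Xi\nabla\varphi_c^T\tilde W_c)^2}{c_3^2}\Big) \le \frac{3\alpha_1c_3^2}{16}\lambda_5^2\|\tilde W_c\|^2 + \frac{3\alpha_1}{16c_3^2}\lambda_{1M}^2\|\tilde W_c\|^4,$$
$$-\frac{\alpha_1}{\sigma_c^2}\tilde W_c^T\nabla\varphi_c\dot E\,\varepsilon_H \le \frac{\alpha_1}{2\sigma_c^2}\Big(c_4^2\big(\tilde W_c^T\nabla\varphi_c\dot E\big)^2 + \frac{\varepsilon_H^2}{c_4^2}\Big) \le \frac{\alpha_1c_4^2}{2}\lambda_3^2\|\tilde W_c\|^2 + \frac{\alpha_1}{2c_4^2}\lambda_6^2,$$
$$-\frac{\alpha_1}{2\sigma_c^2}\tilde W_c^T\nabla\varphi_c\Xi\nabla\varepsilon_c\,\varepsilon_H \le \frac{\alpha_1}{4\sigma_c^2}\Big(c_5^2\big(\tilde W_c^T\nabla\varphi_c\Xi\nabla\varepsilon_c\big)^2 + \frac{\varepsilon_H^2}{c_5^2}\Big) \le \frac{\alpha_1c_5^2}{4}\lambda_5^2\|\tilde W_c\|^2 + \frac{\alpha_1}{4c_5^2}\lambda_6^2,$$
$$-\frac{\alpha_1}{2\sigma_c^2}\tilde W_c^T\nabla\varphi_c\Xi\nabla\varphi_c^T\tilde W_c\,\varepsilon_H \le \frac{\alpha_1}{4\sigma_c^2}\Big(c_6^2\big(\tilde W_c^T\nabla\varphi_c\Xi\nabla\varphi_c^T\tilde W_c\big)^2 + \frac{\varepsilon_H^2}{c_6^2}\Big) \le \frac{\alpha_1c_6^2}{4}\lambda_{1M}^2\|\tilde W_c\|^4 + \frac{\alpha_1}{4c_6^2}\lambda_6^2.$$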
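Each of the bounds above applies the scalar Young's inequality $ab \le \frac12(c^2a^2 + b^2/c^2)$. A quick numerical sanity check follows (the helper name `young_bound` is hypothetical); the gap between the two sides equals $\frac12(ca - b/c)^2 \ge 0$, so equality holds exactly when $c^2a = b$.

```python
import random


def young_bound(a: float, b: float, c: float) -> float:
    """Right-hand side of ab <= (c^2 a^2 + b^2 / c^2) / 2 for a nonzero constant c."""
    if c == 0:
        raise ValueError("c must be nonzero")
    return 0.5 * (c**2 * a**2 + b**2 / c**2)


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(10_000):
        a, b = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
        c = rng.choice([-1.0, 1.0]) * rng.uniform(0.1, 10.0)
        # the gap (c*a - b/c)^2 / 2 is never negative
        assert a * b <= young_bound(a, b, c) + 1e-12
    print(young_bound(1.0, 4.0, 2.0))  # equality case c^2 * a = b, prints 4.0 = a * b
```

A larger $|c|$ shifts weight onto the $a^2$ term; this is the freedom that the constants $c_1, \dots, c_6$ provide when the terms are regrouped.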
Then, (51) is developed as
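$$\dot L_W \le -\alpha_1\lambda_8\|\tilde W_c\|^4 + \alpha_1\lambda_9\|\tilde W_c\|^2 + \alpha_1\lambda_{10} - \frac{\alpha_2}{2}\Pi(E,U_E)\tilde W_c^T\nabla\varphi_c\Xi\nabla J, \tag{59}$$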
where
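$$
\begin{aligned}
\lambda_8 &= \frac18\lambda_7\lambda_{1m}^2 - \frac{3}{8c_2^2}\lambda_{1M}^2 - \frac{3}{16c_3^2}\lambda_{1M}^2 - \frac{c_6^2}{4}\lambda_{1M}^2,\\
\lambda_9 &= \frac{c_1^2}{2}\lambda_3^2 + \frac{1}{2c_1^2}\lambda_5^2 + \frac{3c_2^2}{8}\lambda_3^2 + \frac{3c_3^2}{16}\lambda_5^2 + \frac{c_4^2}{2}\lambda_3^2 + \frac{c_5^2}{4}\lambda_5^2,\\
\lambda_{10} &= \frac{1}{2c_4^2}\lambda_6^2 + \frac{1}{4c_5^2}\lambda_6^2 + \frac{1}{4c_6^2}\lambda_6^2,
\end{aligned}
$$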
and $c_j$ $(j = 1, 2, \ldots, 6)$ are nonzero constants chosen such that $\lambda_8 > 0$. Combining the results of (22), (49) and (59), we have
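$$
\begin{aligned}
\dot L = \dot L_D + \dot L_J + \dot L_W \le{}& -\kappa\|\tilde D\|^2 + \alpha_2\nabla J^T(f_E + g_EU_E) + \alpha_2\nabla J^Tg_E\tilde D - \alpha_1\lambda_8\|\tilde W_c\|^4 + \alpha_1\lambda_9\|\tilde W_c\|^2\\
&+ \alpha_1\lambda_{10} - \frac{\alpha_2}{2}\Pi(E,U_E)\tilde W_c^T\nabla\varphi_c\Xi\nabla J.
\end{aligned}
\tag{61}
$$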
By Young's inequality, $\alpha_2\nabla J^Tg_E\tilde D \le \frac{\alpha_2\alpha_3}{2}\nabla J^Tg_Eg_E^T\nabla J + \frac{\alpha_2}{2\alpha_3}\|\tilde D\|^2$. Then, (61) is developed as
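$$
\begin{aligned}
\dot L \le{}& -\Big(\kappa - \frac{\alpha_2}{2\alpha_3}\Big)\|\tilde D\|^2 + \alpha_2\nabla J^T(f_E + g_EU_E) + \frac{\alpha_2\alpha_3}{2}\nabla J^Tg_Eg_E^T\nabla J - \alpha_1\lambda_8\|\tilde W_c\|^4 + \alpha_1\lambda_9\|\tilde W_c\|^2\\
&+ \alpha_1\lambda_{10} - \frac{\alpha_2}{2}\Pi(E,U_E)\tilde W_c^T\nabla\varphi_c\Xi\nabla J.
\end{aligned}
\tag{62}
$$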
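The constants $\lambda_8$, $\lambda_9$ and $\lambda_{10}$ that appear in (62) depend only on the bounds $\lambda_{1m}, \lambda_{1M}, \lambda_3, \lambda_5, \lambda_6, \lambda_7$ and on the Young's constants $c_1, \dots, c_6$. The sketch below transcribes their definitions; the function name and every number in the example are hypothetical. It illustrates the trade-off behind the choice of the $c_j$: large $c_2, c_3$ and small $c_6$ keep $\lambda_8 > 0$, at the cost of larger $\lambda_9$ and $\lambda_{10}$.

```python
def lyapunov_constants(l1m, l1M, l3, l5, l6, l7, c):
    """Return (lambda_8, lambda_9, lambda_10) for Young's constants c = (c1, ..., c6)."""
    c1, c2, c3, c4, c5, c6 = c
    if any(cj == 0 for cj in c):
        raise ValueError("all c_j must be nonzero")
    lam8 = (l7 * l1m**2 / 8 - 3 * l1M**2 / (8 * c2**2)
            - 3 * l1M**2 / (16 * c3**2) - c6**2 * l1M**2 / 4)
    lam9 = (c1**2 * l3**2 / 2 + l5**2 / (2 * c1**2) + 3 * c2**2 * l3**2 / 8
            + 3 * c3**2 * l5**2 / 16 + c4**2 * l3**2 / 2 + c5**2 * l5**2 / 4)
    lam10 = l6**2 / (2 * c4**2) + l6**2 / (4 * c5**2) + l6**2 / (4 * c6**2)
    return lam8, lam9, lam10


if __name__ == "__main__":
    # Hypothetical bounds and constants, chosen only for illustration
    lam8, lam9, lam10 = lyapunov_constants(
        l1m=1.0, l1M=1.2, l3=0.5, l5=0.3, l6=0.01, l7=0.5,
        c=(1.0, 10.0, 10.0, 1.0, 1.0, 0.1))
    print(f"lambda_8 = {lam8:.4f} (must be > 0), lambda_9 = {lam9:.4f}, lambda_10 = {lam10:.6f}")
```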
The following discussion is divided into two cases.
Case 1.
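In this case, $\Pi(E,U_E) = 0$. Since $\nabla J^T(f_E + g_EU_E) + \alpha_3\nabla J^Tg_Eg_E^T\nabla J < 0$, it follows that $\nabla J^T(f_E + g_EU_E) < 0$. According to the density property of $\mathbb{R}$, there exists a positive constant $\lambda_{11}$ such that $0 < \lambda_{11}\|\nabla J\| \le -\nabla J^T(f_E + g_EU_E)$ for all $E \in Ø$. Splitting $\alpha_2\nabla J^T(f_E + g_EU_E)$ into two equal halves and applying the first inequality to one of them gives $\alpha_2\nabla J^T(f_E + g_EU_E) + \frac{\alpha_2\alpha_3}{2}\nabla J^Tg_Eg_E^T\nabla J < \frac{\alpha_2}{2}\nabla J^T(f_E + g_EU_E) \le -\frac{\alpha_2}{2}\lambda_{11}\|\nabla J\|$. Then, (62) becomes
$$\dot L \le -\Big(\kappa - \frac{\alpha_2}{2\alpha_3}\Big)\|\tilde D\|^2 - \frac{\alpha_2}{2}\lambda_{11}\|\nabla J\| - \alpha_1\lambda_8\|\tilde W_c\|^4 + \alpha_1\lambda_9\|\tilde W_c\|^2 + \alpha_1\lambda_{10}.$$
By selecting $\alpha_2$ and $\alpha_3$ such that $\kappa - \frac{\alpha_2}{2\alpha_3} > 0$, $\dot L < 0$ is satisfied provided that one of the following conditions holds:
$$\|\nabla J\| > \frac{\alpha_1(4\lambda_8\lambda_{10} + \lambda_9^2)}{2\alpha_2\lambda_8\lambda_{11}} \triangleq b_{J1},$$
or
$$\|\tilde W_c\| > \sqrt{\frac{\lambda_9 + \sqrt{4\lambda_8\lambda_{10} + \lambda_9^2}}{2\lambda_8}} \triangleq b_{W1}.$$
Case 2.
Considering the case $\Pi(E,U_E) = 1$, (62) is developed as
$$
\begin{aligned}
\dot L \le{}& \alpha_2\Big(\nabla J^T(f_E + g_EU_E) + \frac{\alpha_3}{2}\nabla J^Tg_Eg_E^T\nabla J\Big) - \Big(\kappa - \frac{\alpha_2}{2\alpha_3}\Big)\|\tilde D\|^2 - \alpha_1\lambda_8\|\tilde W_c\|^4 + \alpha_1\lambda_9\|\tilde W_c\|^2 + \alpha_1\lambda_{10} - \frac{\alpha_2}{2}\tilde W_c^T\nabla\varphi_c\Xi\nabla J\\
={}& \alpha_2\Big(\nabla J^T(f_E + g_EU_E^*) + \frac{\alpha_3}{2}\nabla J^Tg_Eg_E^T\nabla J\Big) + \frac{\alpha_2}{2}\nabla J^T\Xi\nabla\varepsilon_c - \Big(\kappa - \frac{\alpha_2}{2\alpha_3}\Big)\|\tilde D\|^2 - \alpha_1\lambda_8\|\tilde W_c\|^4 + \alpha_1\lambda_9\|\tilde W_c\|^2 + \alpha_1\lambda_{10},
\end{aligned}
$$
where $U_E^*$ is the optimal feedback control input; the equality uses $g_EU_E = g_EU_E^* + \frac12\Xi\big(\nabla\varphi_c^T\tilde W_c + \nabla\varepsilon_c\big)$, which follows from the expressions of $U_E$ and $U_E^*$. Based on Assumption 4, and considering $\|g_E\| \le g_M$, we have
$$\dot L \le -\alpha_2\Big(\Lambda_m - \frac{\alpha_3}{2}g_M^2\Big)\|\nabla J\|^2 + \frac{\alpha_2}{2}\lambda_2\lambda_4\|\nabla J\| - \Big(\kappa - \frac{\alpha_2}{2\alpha_3}\Big)\|\tilde D\|^2 - \alpha_1\lambda_8\|\tilde W_c\|^4 + \alpha_1\lambda_9\|\tilde W_c\|^2 + \alpha_1\lambda_{10}.$$
Similarly, by selecting $\alpha_2$ and $\alpha_3$ such that $\lambda_{12} = \Lambda_m - \frac{\alpha_3}{2}g_M^2 > 0$ and $\kappa - \frac{\alpha_2}{2\alpha_3} > 0$, $\dot L < 0$ holds as long as
$$\|\nabla J\| > \frac{\lambda_2\lambda_4}{4\lambda_{12}} + \sqrt{\frac{\alpha_1(4\lambda_8\lambda_{10} + \lambda_9^2)}{4\alpha_2\lambda_8\lambda_{12}} + \frac{\lambda_2^2\lambda_4^2}{16\lambda_{12}^2}} \triangleq b_{J2},$$
or
$$\|\tilde W_c\| > \sqrt{\frac{\lambda_9}{2\lambda_8} + \sqrt{\frac{\lambda_{10}}{\lambda_8} + \frac{\lambda_9^2}{4\lambda_8^2} + \frac{\alpha_2\lambda_2^2\lambda_4^2}{16\alpha_1\lambda_8\lambda_{12}}}} \triangleq b_{W2}.$$
In conclusion, $\dot L < 0$ when $\|\nabla J\| > \max\{b_{J1}, b_{J2}\}$ or $\|\tilde W_c\| > \max\{b_{W1}, b_{W2}\}$. Relying on Lemma 1 and the standard Lyapunov extension theorem [55], it is further concluded that the tracking error $E$ of the closed-loop system and the weight estimation error $\tilde W_c$ are UUB. This completes the proof. □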
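The thresholds on $\|\nabla J\|$ and $\|\tilde W_c\|$ obtained in Case 1 and Case 2 are explicit, so they can be evaluated once the constants are fixed. The sketch below transcribes the four right-hand sides (called `bJ1`, `bW1`, `bJ2`, `bW2` here; these names and all numbers are hypothetical) and checks that each is the positive root at which the corresponding bound on $\dot L$ reaches zero, with the remaining terms replaced by their maximum over the other variable.

```python
import math


def uub_bounds(alpha1, alpha2, lam8, lam9, lam10, lam11, lam2, lam4, lam12):
    """Return (bJ1, bW1, bJ2, bW2): thresholds on ||grad J|| and ||W_tilde_c|| for both cases."""
    q = 4 * lam8 * lam10 + lam9**2
    r = lam2 * lam4
    # Case 1 (Pi = 0)
    bJ1 = alpha1 * q / (2 * alpha2 * lam8 * lam11)
    bW1 = math.sqrt((lam9 + math.sqrt(q)) / (2 * lam8))
    # Case 2 (Pi = 1)
    bJ2 = r / (4 * lam12) + math.sqrt(alpha1 * q / (4 * alpha2 * lam8 * lam12)
                                      + r**2 / (16 * lam12**2))
    bW2 = math.sqrt(lam9 / (2 * lam8)
                    + math.sqrt(lam10 / lam8 + lam9**2 / (4 * lam8**2)
                                + alpha2 * r**2 / (16 * alpha1 * lam8 * lam12)))
    return bJ1, bW1, bJ2, bW2


if __name__ == "__main__":
    p = dict(alpha1=1.0, alpha2=1.0, lam8=0.05, lam9=0.2, lam10=0.003,
             lam11=0.5, lam2=2.0, lam4=0.1, lam12=1.0)  # hypothetical values
    bJ1, bW1, bJ2, bW2 = uub_bounds(**p)
    a1, a2 = p["alpha1"], p["alpha2"]
    # maximum over ||W_tilde_c|| of -a1*lam8*w^4 + a1*lam9*w^2 + a1*lam10
    peak_w = a1 * (4 * p["lam8"] * p["lam10"] + p["lam9"]**2) / (4 * p["lam8"])
    assert abs(-a2 / 2 * p["lam11"] * bJ1 + peak_w) < 1e-9          # Case 1, grad J threshold
    x = bW1**2
    assert abs(-a1 * p["lam8"] * x**2 + a1 * p["lam9"] * x + a1 * p["lam10"]) < 1e-9
    print("||grad J|| bound:", max(bJ1, bJ2), " ||W_tilde_c|| bound:", max(bW1, bW2))
```

The overall ultimate bounds are the larger of the two cases, matching the conclusion of the proof.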
Remark 6.
As a result of Theorem 2, the approximate optimal cost function $\hat V(E)$ in (41) and the approximate optimal feedback control input $U_E$ in (42) converge, respectively, to neighborhoods of the optimal cost function $V^*(E)$ and the optimal feedback control input $U_E^*$ with finite bounds when the PE condition holds [41].

4. Robust Approximate Optimal Trajectory Tracking Control for a Quadrotor

The position and the yaw angle are the system outputs: the quadrotor is required to track a desired position trajectory and a desired yaw-angle trajectory. The desired roll and pitch trajectories required by the attitude subsystem are generated from the control inputs of the position subsystem, and the lateral and longitudinal position tracking errors are eliminated by having the attitude subsystem track these roll and pitch trajectories. Following the design procedure of the previous section, the overall control structure of the quadrotor is shown in Figure 2; it keeps the tracking error of the quadrotor within a small bound.

4.1. Position Control Design

The estimate $\hat d_p$ of the unknown compound disturbance in the position subsystem is obtained from the following disturbance observer:
$$\dot z_1 = -l_1(x_1)\big[f_1(x_1) + g_1(x_1)\big(p_1(x_1) + F + z_1\big)\big],\qquad \hat d_p = z_1 + p_1(x_1),$$
where $l_1(x_1) = \partial p_1(x_1)/\partial x_1$ is the observer gain of the disturbance observer in the position subsystem and $F$ is given by (6). Then, the robust compensation input of the position subsystem is designed as
$$F_R = -\hat d_p.$$
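The observer can be exercised on a hypothetical translational model. Since $f_1$ and $g_1$ are defined earlier in the paper and not restated here, the sketch below assumes (purely for illustration) $f_1(x_1) = [v^T, -(k/m)v^T]^T$ with linear drag and $g_1 = [0_{3\times3};\ I_3/m]$, with $m$ and $k = k_x = k_y = k_z$ from Table 1, a constant hypothetical disturbance entering through $g_1$, and $l_1$ as selected in Section 5. For a constant disturbance the estimation error obeys $\dot{\tilde d} = -l_1g_1\tilde d$, so it decays at rate $60/m \approx 33.5\ \mathrm{s}^{-1}$.

```python
import numpy as np

M, K_DRAG = 1.79, 0.012  # mass [kg] and drag coefficient [N s/m] from Table 1
L1 = np.hstack([np.zeros((3, 3)), 60.0 * np.eye(3)])  # observer gain l1 from Section 5


def f1(x):
    """Assumed drift: d(position)/dt = velocity, velocity damped by linear drag."""
    v = x[3:]
    return np.concatenate([v, -K_DRAG / M * v])


def g1(x):
    """Assumed input matrix: force (plus disturbance) acts on velocity through 1/m."""
    return np.vstack([np.zeros((3, 3)), np.eye(3) / M])


def p1(x):
    return L1 @ x  # p1(x1) = l1 x1, hence dp1/dx1 = l1


def run_observer(d_true, F, t_end=2.0, dt=1e-3):
    """Forward-Euler simulation of plant and observer; returns the final estimate d_hat_p."""
    x = np.zeros(6)
    z = -p1(x)  # initial estimate d_hat_p = z + p1(x) = 0
    for _ in range(int(round(t_end / dt))):
        x_dot = f1(x) + g1(x) @ (F + d_true)              # disturbed plant
        z_dot = -L1 @ (f1(x) + g1(x) @ (p1(x) + F + z))  # observer dynamics
        x, z = x + dt * x_dot, z + dt * z_dot
    return z + p1(x)


if __name__ == "__main__":
    d_hat = run_observer(np.array([0.3, -0.2, 0.1]), F=np.zeros(3))
    print(np.round(d_hat, 6))
```

With forward Euler the estimate obeys $\hat d_{k+1} - d = (1 - 60\Delta t/m)(\hat d_k - d)$, so it converges for any step below $2m/60$. For time-varying disturbances such as those used in Section 5, the estimation error stays bounded in proportion to $\|\dot d\|$ instead of vanishing.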
The steady-state control input for the position nominal system is designed as
$$F_d = g_1^{+}(x_{1d})\big(\dot x_{1d} - f_1(x_{1d})\big),$$
where $x_{1d} = [P_d^T, v_d^T]^T \in \mathbb{R}^6$, $v_d = [v_{xd}, v_{yd}, v_{zd}]^T = \dot P_d \in \mathbb{R}^3$, and $g_1^{+}(x_{1d})$ denotes the pseudo-inverse of $g_1(x_{1d})$. Then, define the position subsystem tracking error as
$$e_1 = x_1 - x_{1d} \triangleq [e_x, e_y, e_z, e_{vx}, e_{vy}, e_{vz}]^T \in \mathbb{R}^6.$$
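The steady-state input can be computed with NumPy's Moore-Penrose pseudo-inverse for the helix used in Section 5. The model is again a hypothetical assumption ($f_1$ with linear drag, $g_1 = [0_{3\times3};\ I_3/m]$, values from Table 1), and the function names are illustrative. Because $\dot P_d = v_d$, the position rows of $\dot x_{1d} - f_1(x_{1d})$ vanish, so under this model $g_1F_d$ reproduces $\dot x_{1d} - f_1(x_{1d})$ exactly and $F_d = m\ddot P_d + k v_d$.

```python
import numpy as np

M, K_DRAG = 1.79, 0.012  # Table 1


def f1(x):
    # Assumed drift model (hypothetical): velocity kinematics plus linear drag
    return np.concatenate([x[3:], -K_DRAG / M * x[3:]])


def g1(x):
    # Assumed input matrix (hypothetical)
    return np.vstack([np.zeros((3, 3)), np.eye(3) / M])


def desired_state(t):
    """x1d = [P_d; v_d] and its time derivative for the helix of Section 5."""
    P = np.array([0.5 * np.cos(0.5 * t), 0.5 * np.sin(0.5 * t), 0.05 * t + 0.5])
    v = np.array([-0.25 * np.sin(0.5 * t), 0.25 * np.cos(0.5 * t), 0.05])
    a = np.array([-0.125 * np.cos(0.5 * t), -0.125 * np.sin(0.5 * t), 0.0])
    return np.concatenate([P, v]), np.concatenate([v, a])


def steady_state_input(t):
    """F_d = g1^+(x1d) (x1d_dot - f1(x1d)) with the Moore-Penrose pseudo-inverse."""
    x1d, x1d_dot = desired_state(t)
    return np.linalg.pinv(g1(x1d)) @ (x1d_dot - f1(x1d))


if __name__ == "__main__":
    t = 1.0
    F_d = steady_state_input(t)
    x1d, x1d_dot = desired_state(t)
    residual = g1(x1d) @ F_d - (x1d_dot - f1(x1d))
    print("F_d =", F_d, " max residual =", np.abs(residual).max())
```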
The cost function of the position subsystem is represented as
$$V_1(e_1) = \int_{t_0}^{\infty}\big(e_1^TQ_1e_1 + F_e^TR_1F_e\big)\,dt,$$
where $Q_1 \in \mathbb{R}^{6\times6}$ and $R_1 \in \mathbb{R}^{3\times3}$ are designed symmetric positive definite matrices. The approximate optimal feedback control input in the position subsystem is
$$F_e = -\frac12R_1^{-1}g_{e1}^T\nabla\varphi_{c1}^T\hat W_{c1},$$
where $g_{e1} = g_1(x_1)$ and $\nabla\varphi_{c1} = \partial\varphi_{c1}(e_1)/\partial e_1$; $\varphi_{c1}(e_1)$ is the activation function, and $\hat W_{c1}$ is the estimate of the ideal weight of the critic network of the position subsystem. The corresponding weight update law is designed as
$$\dot{\hat W}_{c1} = -\alpha_{11}\frac{\sigma_1}{\sigma_{c1}^2}\Big(e_1^TQ_1e_1 + \hat W_{c1}^T\nabla\varphi_{c1}f_{e1} - \frac14\hat W_{c1}^T\nabla\varphi_{c1}\Xi_1\nabla\varphi_{c1}^T\hat W_{c1}\Big) + \frac{\alpha_{12}}{2}\Pi(e_1,F_e)\nabla\varphi_{c1}\Xi_1\nabla J_1,$$
where $\alpha_{11} > 0$ and $\alpha_{12} > 0$ are the designed learning rates, $\sigma_1 = \nabla\varphi_{c1}(f_{e1} + g_{e1}F_e)$, $\sigma_{c1} = \sigma_1^T\sigma_1 + 1$, $\Xi_1 = g_{e1}R_1^{-1}g_{e1}^T$ and $\nabla J_1 = \partial J_1(e_1)/\partial e_1$, where $J_1(e_1)$ is a Lyapunov function candidate satisfying Assumption 4.
Then, the robust approximate optimal trajectory tracking control in the position subsystem is designed as
$$F = F_N + F_R = F_d + F_e + F_R.$$
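The feedback input $F_e$ and the critic weight update law of the position subsystem can be sketched with NumPy using the activation function $\varphi_{c1}$ chosen in Section 5. This is an illustrative transcription, not the authors' code: the error drift $f_{e1}$, the input matrix $g_{e1}$, $\nabla J_1$ and the switch value $\Pi(e_1, F_e) \in \{0, 1\}$ are passed in as arguments because their definitions lie outside this section, the function names are hypothetical, and the default learning rates $\alpha_{11} = 10$, $\alpha_{12} = 0.01$ are those of Section 5. Integrating the returned rate with any ODE solver would give $\hat W_{c1}(t)$.

```python
import numpy as np


def phi_c1(e):
    """Activation vector phi_c1(e1); e = [ex, ey, ez, evx, evy, evz]."""
    ex, ey, ez, vx, vy, vz = e
    return np.array([ex**2, ex * vx, ey**2, ey * vy, ez**2, ez * vz, vx**2, vy**2, vz**2])


def grad_phi_c1(e):
    """Jacobian d phi_c1 / d e1, shape (9, 6)."""
    ex, ey, ez, vx, vy, vz = e
    G = np.zeros((9, 6))
    G[0, 0] = 2 * ex
    G[1, 0], G[1, 3] = vx, ex
    G[2, 1] = 2 * ey
    G[3, 1], G[3, 4] = vy, ey
    G[4, 2] = 2 * ez
    G[5, 2], G[5, 5] = vz, ez
    G[6, 3], G[7, 4], G[8, 5] = 2 * vx, 2 * vy, 2 * vz
    return G


def feedback_input(W_hat, e, g_e, R_inv):
    """F_e = -1/2 R1^{-1} g_e1^T grad_phi^T W_hat."""
    return -0.5 * R_inv @ g_e.T @ grad_phi_c1(e).T @ W_hat


def critic_rate(W_hat, e, f_e, g_e, Q, R_inv, grad_J, pi_flag, a1=10.0, a2=0.01):
    """Right-hand side of the critic weight update law."""
    dphi = grad_phi_c1(e)
    Xi = g_e @ R_inv @ g_e.T
    F_e = feedback_input(W_hat, e, g_e, R_inv)
    sigma = dphi @ (f_e + g_e @ F_e)
    sigma_c = sigma @ sigma + 1.0
    # Approximate Hamiltonian: e^T Q e + W^T grad_phi f - 1/4 W^T grad_phi Xi grad_phi^T W
    ham = e @ Q @ e + W_hat @ dphi @ f_e - 0.25 * W_hat @ dphi @ Xi @ dphi.T @ W_hat
    return -a1 * sigma / sigma_c**2 * ham + 0.5 * a2 * pi_flag * dphi @ Xi @ grad_J


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.normal(size=6)
    W = rng.uniform(0.0, 1.0, size=9)                    # initial weights in [0, 1]
    g = np.vstack([np.zeros((3, 3)), np.eye(3) / 1.79])  # hypothetical g_e1
    f = np.concatenate([e[3:], np.zeros(3)])             # hypothetical f_e1
    Q = np.diag([7, 7, 10, 9, 9, 6.0])
    # J1 = e^T e / 2, so grad J1 = e
    print(critic_rate(W, e, f, g, Q, np.eye(3), grad_J=e, pi_flag=1))
```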

4.2. Attitude Resolution

Because the quadrotor is underactuated and strongly coupled, information from the position subsystem is used to calculate the total lift force. The desired roll and pitch trajectories are determined from the position subsystem through the relation between the kinematic equation and the Euler equation, and are then passed to the attitude subsystem. In this way, the tracking error generated in the position subsystem and the compound disturbance acting on it can be compensated through the attitude subsystem. Applying a matrix operation to (6) yields the following equations:
$$F_x = T(c_\phi s_\theta c_\psi + s_\phi s_\psi),\qquad F_y = T(c_\phi s_\theta s_\psi - s_\phi c_\psi),\qquad F_z = Tc_\phi c_\theta - mg_I. \tag{78}$$
The actual total lift force for the quadrotor system is designed as
$$T = \frac{F_z + mg_I}{c_\phi c_\theta}. \tag{79}$$
Substituting (79) into (78) gives
$$\begin{bmatrix}F_x\\ F_y\end{bmatrix} = (F_z + mg_I)\begin{bmatrix}c_\psi & s_\psi\\ s_\psi & -c_\psi\end{bmatrix}\begin{bmatrix}t_\theta\\ t_\phi/c_\theta\end{bmatrix}.$$
The desired trajectories of the pitch and roll angles are derived from the following equations:
$$F_xc_\psi + F_ys_\psi = (F_z + mg_I)\,t_{\theta_d},\qquad F_xs_\psi - F_yc_\psi = (F_z + mg_I)\frac{t_{\phi_d}}{c_\theta}.$$
Then, we have
$$\theta_d = \arctan\Big(\frac{F_xc_\psi + F_ys_\psi}{F_z + mg_I}\Big),\qquad \phi_d = \arctan\Big(c_\theta\,\frac{F_xs_\psi - F_yc_\psi}{F_z + mg_I}\Big).$$
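A round-trip check of (78), (79) and the arctangent expressions: forces generated from a known lift and attitude are mapped back to the same pitch, roll and lift. This is a sketch with $m$ and $g_I$ from Table 1 and hypothetical function names; as in the expression for $\phi_d$, $c_\theta$ uses the current pitch angle. It assumes $F_z + mg_I > 0$, i.e. the commanded lift has an upward component, so the arctangent of the ratio is well defined.

```python
import numpy as np

M, G_I = 1.79, 9.81  # mass [kg] and gravity [m/s^2] from Table 1


def forces_from_attitude(T, phi, theta, psi):
    """Position-loop force components generated by lift T at attitude (phi, theta, psi), as in (78)."""
    c, s = np.cos, np.sin
    return np.array([
        T * (c(phi) * s(theta) * c(psi) + s(phi) * s(psi)),
        T * (c(phi) * s(theta) * s(psi) - s(phi) * c(psi)),
        T * c(phi) * c(theta) - M * G_I,
    ])


def total_lift(F, phi, theta):
    """T = (F_z + m g_I) / (cos(phi) cos(theta)), as in (79)."""
    return (F[2] + M * G_I) / (np.cos(phi) * np.cos(theta))


def attitude_resolution(F, psi, theta):
    """Desired pitch and roll from F = [F_x, F_y, F_z]; assumes F_z + m g_I > 0."""
    Fx, Fy, Fz = F
    denom = Fz + M * G_I
    theta_d = np.arctan((Fx * np.cos(psi) + Fy * np.sin(psi)) / denom)
    # cos(theta) here is the current pitch angle, as in the expression for phi_d
    phi_d = np.arctan(np.cos(theta) * (Fx * np.sin(psi) - Fy * np.cos(psi)) / denom)
    return theta_d, phi_d


if __name__ == "__main__":
    T, phi, theta, psi = 20.0, 0.1, -0.2, np.pi / 12
    F = forces_from_attitude(T, phi, theta, psi)
    theta_d, phi_d = attitude_resolution(F, psi, theta)
    print(theta_d, phi_d, total_lift(F, phi, theta))  # recovers theta, phi and T up to rounding
```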

4.3. Attitude Control Design

Similarly, the estimate $\hat d_a$ of the unknown compound disturbance in the attitude subsystem is obtained from the following disturbance observer:
$$\dot z_2 = -l_2(x_2)\big[f_2(x_2) + g_2(x_2)\big(p_2(x_2) + \tau + z_2\big)\big],\qquad \hat d_a = z_2 + p_2(x_2),$$
where $l_2(x_2) = \partial p_2(x_2)/\partial x_2$ is the observer gain of the disturbance observer in the attitude subsystem. Then, the robust compensation input of the attitude subsystem is
$$\tau_R = -\hat d_a.$$
The desired trajectory for the angular velocity is given by [56]
$$\omega_d = \begin{bmatrix}1 & 0 & -s_{\theta_d}\\ 0 & c_{\phi_d} & s_{\phi_d}c_{\theta_d}\\ 0 & -s_{\phi_d} & c_{\phi_d}c_{\theta_d}\end{bmatrix}\dot\Theta_d,$$
in which $\Theta_d = [\phi_d, \theta_d, \psi_d]^T \in \mathbb{R}^3$ is the desired trajectory of the Euler angles and $\omega_d = [p_d, q_d, r_d]^T \in \mathbb{R}^3$ is the desired trajectory of the angular velocity. The steady-state control input for the attitude nominal system is designed as
$$\tau_d = g_2^{+}(x_{2d})\big(\dot x_{2d} - f_2(x_{2d})\big),$$
where $x_{2d} = [\Theta_d^T, \omega_d^T]^T \in \mathbb{R}^6$ and $g_2^{+}(x_{2d})$ denotes the pseudo-inverse of $g_2(x_{2d})$. Then, define the attitude subsystem tracking error as
$$e_2 = x_2 - x_{2d} \triangleq [e_\phi, e_\theta, e_\psi, e_p, e_q, e_r]^T \in \mathbb{R}^6.$$
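The matrix used above for $\omega_d$ is the standard mapping from roll-pitch-yaw Euler-angle rates to body angular rates; its determinant is $c_{\theta_d}$, so it is invertible away from $\theta_d = \pm\pi/2$. The sketch below evaluates it (helper names are hypothetical). Because $\Theta_d$ comes from the attitude resolution step, $\dot\Theta_d$ is approximated here by a backward difference, which is an assumption for illustration and not necessarily how the authors obtain it. A pure yaw rate maps to a body rate of the same magnitude, which gives a simple check.

```python
import numpy as np


def euler_rate_matrix(phi, theta):
    """W(Theta) such that omega = W(Theta) Theta_dot (roll-pitch-yaw convention)."""
    sp, cp = np.sin(phi), np.cos(phi)
    st, ct = np.sin(theta), np.cos(theta)
    return np.array([[1.0, 0.0, -st],
                     [0.0, cp, sp * ct],
                     [0.0, -sp, cp * ct]])


def desired_body_rate(Theta_d, Theta_d_prev, dt):
    """omega_d from the current and previous desired Euler angles (backward difference)."""
    Theta_d = np.asarray(Theta_d, dtype=float)
    Theta_dot = (Theta_d - np.asarray(Theta_d_prev, dtype=float)) / dt
    return euler_rate_matrix(Theta_d[0], Theta_d[1]) @ Theta_dot


if __name__ == "__main__":
    omega = euler_rate_matrix(0.3, 0.2) @ np.array([0.0, 0.0, 1.0])  # pure yaw rate
    print(omega, np.linalg.norm(omega))  # the norm is 1 for any phi, theta
```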
The cost function of the attitude subsystem is represented as
$$V_2(e_2) = \int_{t_0}^{\infty}\big(e_2^TQ_2e_2 + \tau_e^TR_2\tau_e\big)\,dt,$$
where $Q_2 \in \mathbb{R}^{6\times6}$ and $R_2 \in \mathbb{R}^{3\times3}$ are designed symmetric positive definite matrices. The approximate optimal feedback control input in the attitude subsystem is
$$\tau_e = -\frac12R_2^{-1}g_{e2}^T\nabla\varphi_{c2}^T\hat W_{c2},$$
where $g_{e2} = g_2(x_2)$ and $\nabla\varphi_{c2} = \partial\varphi_{c2}(e_2)/\partial e_2$; $\varphi_{c2}(e_2)$ is the activation function, and $\hat W_{c2}$ is the estimate of the ideal weight of the critic network of the attitude subsystem. The corresponding weight update law is designed as
$$\dot{\hat W}_{c2} = -\alpha_{21}\frac{\sigma_2}{\sigma_{c2}^2}\Big(e_2^TQ_2e_2 + \hat W_{c2}^T\nabla\varphi_{c2}f_{e2} - \frac14\hat W_{c2}^T\nabla\varphi_{c2}\Xi_2\nabla\varphi_{c2}^T\hat W_{c2}\Big) + \frac{\alpha_{22}}{2}\Pi(e_2,\tau_e)\nabla\varphi_{c2}\Xi_2\nabla J_2,$$
where $\alpha_{21} > 0$ and $\alpha_{22} > 0$ are the learning rates, $\sigma_2 = \nabla\varphi_{c2}(f_{e2} + g_{e2}\tau_e)$, $\sigma_{c2} = \sigma_2^T\sigma_2 + 1$, $\Xi_2 = g_{e2}R_2^{-1}g_{e2}^T$ and $\nabla J_2 = \partial J_2(e_2)/\partial e_2$, where $J_2(e_2)$ is a Lyapunov function candidate satisfying Assumption 4.
Then, the robust approximate optimal trajectory tracking control in the attitude subsystem is designed as
$$\tau = \tau_N + \tau_R = \tau_d + \tau_e + \tau_R.$$

5. Simulation Results

In this section, the robustness and effectiveness of the designed controller are evaluated through numerical simulations. The quadrotor is assumed to fly in an environment with slowly varying disturbances. The parameters of the quadrotor model are given in Table 1 [24].
A representative desired trajectory is selected to evaluate the trajectory tracking performance of the quadrotor: $P_d = [0.5\cos(0.5t), 0.5\sin(0.5t), 0.05t + 0.5]^T$ and $\psi_d = \pi/12$. In addition, following [57,58], the unknown compound disturbances are described as $d_p = [0.3 + 0.5(\sin(t) + \sin(0.5t)\cos(0.8t));\ 0.3 + 0.5(\cos(t) + \sin(0.5t)\cos(0.8t));\ 0.2 + 0.5\sin(1.5t)]^T$ and $d_a = [0.1 + 0.2(\sin(t) + \sin(0.5t));\ 0.1 + 0.2(\cos(0.5t)\cos(0.8t));\ 0.05 + 0.2\sin(t)\sin(0.5t)]^T$. The performance of the disturbance observers is then assessed by comparing these disturbances with their estimates. The initial states of the quadrotor are all set to zero.
The vector-valued functions of the disturbance observers are designed as $p_1(x_1) = l_1(x_1)x_1$ and $p_2(x_2) = l_2(x_2)x_2$, and the observer gains are selected as
$$l_1(x_1) = \begin{bmatrix}0&0&0&60&0&0\\0&0&0&0&60&0\\0&0&0&0&0&60\end{bmatrix},\qquad l_2(x_2) = \begin{bmatrix}0&0&0&5&0&0\\0&0&0&0&5&0\\0&0&0&0&0&5\end{bmatrix}.$$
Clearly, $l_1(x_1)g_1(x_1)$ and $l_2(x_2)g_2(x_2)$ are positive definite and satisfy the design requirements of Theorem 1. To obtain suitable dynamic performance, the parameters of the performance index functions are chosen as $Q_1 = \mathrm{diag}\{7, 7, 10, 9, 9, 6\}$, $Q_2 = \mathrm{diag}\{1.5, 1.5, 1.2, 0.3, 0.3, 0.4\}$ and $R_1 = R_2 = I_3$. The activation functions are designed as
$$\varphi_{c1}(e_1) = [e_x^2, e_xe_{vx}, e_y^2, e_ye_{vy}, e_z^2, e_ze_{vz}, e_{vx}^2, e_{vy}^2, e_{vz}^2]^T,$$
$$\varphi_{c2}(e_2) = [e_\phi^2, e_\phi e_p, e_\theta^2, e_\theta e_q, e_\psi^2, e_\psi e_r, e_p^2, e_pe_q, e_pe_r, e_q^2, e_qe_r, e_r^2, e_\phi^2e_qe_r, e_\phi e_pe_qe_r, e_\theta^2e_pe_r, e_\theta e_pe_qe_r, e_\psi^2e_pe_q, e_\psi e_pe_qe_r, e_p^4, e_p^3e_q, e_p^3e_r, e_p^2e_q^2, e_p^2e_qe_r, e_p^2e_r^2, e_pe_q^3, e_pe_q^2e_r, e_pe_qe_r^2, e_pe_r^3, e_q^4, e_q^3e_r, e_q^2e_r^2, e_qe_r^3, e_r^4]^T.$$
The relevant constants of the weight update laws are selected as $\alpha_{11} = 10$, $\alpha_{12} = 0.01$, $\alpha_{13} = 0.1$, $\alpha_{21} = 20$, $\alpha_{22} = 0.001$ and $\alpha_{23} = 0.1$. The Lyapunov function candidates are selected as $J_1(e_1) = \frac12e_1^Te_1$ and $J_2(e_2) = \frac12e_2^Te_2$. The initial weights are assigned values within the interval $[0, 1]$.
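The positive-definiteness claim can be checked numerically once $g_1$ and $g_2$ are fixed. Their exact form is not restated in this section, so the sketch assumes the common structure $g_1 = [0_{3\times3};\ I_3/m]$ and $g_2 = [0_{3\times3};\ \mathrm{diag}(I_{xx}, I_{yy}, I_{zz})^{-1}]$ with the values of Table 1; under that assumption $l_1g_1 = (60/m)I_3$ and $l_2g_2 = 5\,\mathrm{diag}(I_{xx}, I_{yy}, I_{zz})^{-1}$. The helper name `is_positive_definite` is hypothetical.

```python
import numpy as np

M = 1.79                              # mass [kg], Table 1
J_BODY = np.diag([0.03, 0.03, 0.04])  # I_xx, I_yy, I_zz [kg m^2], Table 1

# Observer gains from Section 5: zero on the position/angle block, gain on the rate block
l1 = np.hstack([np.zeros((3, 3)), 60.0 * np.eye(3)])
l2 = np.hstack([np.zeros((3, 3)), 5.0 * np.eye(3)])

# Hypothetical input matrices (assumption, not taken from this section)
g1 = np.vstack([np.zeros((3, 3)), np.eye(3) / M])
g2 = np.vstack([np.zeros((3, 3)), np.linalg.inv(J_BODY)])


def is_positive_definite(A):
    """True if x^T A x > 0 for all x != 0, i.e. the symmetric part of A has positive eigenvalues."""
    sym = 0.5 * (A + A.T)
    return bool(np.all(np.linalg.eigvalsh(sym) > 0))


if __name__ == "__main__":
    for name, prod in (("l1 g1", l1 @ g1), ("l2 g2", l2 @ g2)):
        print(name, "diag =", np.round(np.diag(prod), 3), "positive definite:", is_positive_definite(prod))
```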
The PE condition is ensured by exciting the system states with the method described in Remark 5. During learning, the weight variations gradually slow down until the weights settle, and after sufficient learning the converged weights are close to the ideal weights. The convergence of the critic network weights $\hat W_{c1}$ and $\hat W_{c2}$ during learning is depicted in Figure 3. The final converged values of $\hat W_{c1}$ and $\hat W_{c2}$ are as follows:
$$\hat W_{c1} = [11.3714, 9.4718, 11.3713, 9.4718, 13.1589, 11.3203, 7.6718, 7.6718, 7.4279]^T,$$
$$\hat W_{c2} = [0.7500, 0.0646, 0.7182, 0.0630, 0.8202, 0.0888, 0.0214, 0.0006, 0.0004, 0.0221, 0.0024, 0.0287, 0.0092, 0.0045, 0.0346, 0.0059, 0.0096, 0.0453, 0.0340, 0.0211, 0.0106, 0.0025, 0.0039, 0.0045, 0.0227, 0.0126, 0.0155, 0.0076, 0.0256, 0.0020, 0.0016, 0.0090, 0.0185]^T.$$
The converged weights are used to construct the approximate optimal feedback control inputs. Figures 4 and 5 show the state trajectories during trajectory tracking, and Figures 6 and 7 show the corresponding tracking errors. In addition, Figure 8 shows the path in three-dimensional space, whereas Figure 9 shows the PWM signals for the motors. The figures show that the quadrotor effectively tracks the desired trajectory and that the tracking error converges to a small bound. These results highlight the rapidity and accuracy of the designed controller.
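Because $\varphi_{c1}$ is quadratic, the converged position critic yields a linear feedback law. Taking $\hat W_{c1}$ as listed above, $R_1 = I_3$, and again assuming (hypothetically) $g_1 = [0_{3\times3};\ I_3/m]$, only the derivatives of $\hat W_{c1}^T\varphi_{c1}$ with respect to the velocity errors survive, for example $F_{e,x} = -\frac{1}{2m}\big(\hat W_{c1,2}e_x + 2\hat W_{c1,7}e_{vx}\big)$, which is a PD-type law on each axis. The sketch evaluates the equivalent gains; the names are hypothetical.

```python
import numpy as np

M = 1.79  # mass from Table 1
# Converged position-critic weights as listed above (order matches phi_c1)
W_C1 = np.array([11.3714, 9.4718, 11.3713, 9.4718, 13.1589, 11.3203,
                 7.6718, 7.6718, 7.4279])


def position_feedback(e, W=W_C1, m=M):
    """F_e for e1 = [ex, ey, ez, evx, evy, evz], assuming R1 = I3 and g1 = [0; I3/m]."""
    ex, ey, ez, vx, vy, vz = e
    # Partial derivatives of V = W^T phi_c1(e1) with respect to the velocity errors
    dV_dv = np.array([W[1] * ex + 2 * W[6] * vx,
                      W[3] * ey + 2 * W[7] * vy,
                      W[5] * ez + 2 * W[8] * vz])
    return -0.5 / m * dV_dv


def pd_gains(W=W_C1, m=M):
    """Equivalent per-axis proportional and derivative gains (kp, kd)."""
    kp = np.array([W[1], W[3], W[5]]) / (2 * m)
    kd = np.array([W[6], W[7], W[8]]) / m
    return kp, kd


if __name__ == "__main__":
    kp, kd = pd_gains()
    print("kp =", np.round(kp, 3), " kd =", np.round(kd, 3))
```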
The estimates of the compound disturbances are depicted in Figure 10. The estimates from the disturbance observers quickly follow the actual compound disturbances. Moreover, trajectory tracking remains accurate in the presence of the compound disturbances, which indicates the robustness of the designed controller.
To verify that the designed controller rejects the compound disturbances, a comparative simulation is performed without the disturbance observers in the position and attitude subsystems; that is, only the control inputs designed for the nominal system are applied. Under such control, the state trajectories are presented in Figures 11 and 12, and Figures 13 and 14 show the corresponding tracking errors.
Comparing the simulation results shows that the quadrotor cannot achieve trajectory tracking without the robust compensation inputs, which further demonstrates the robustness of the designed controller. The corresponding path in three-dimensional space and the PWM signals of the motors are shown in Figures 15 and 16, respectively.
In summary, the controller designed for quadrotor trajectory tracking control has good dynamic performance, high tracking accuracy and strong robustness when the quadrotor is subjected to compound disturbances.

6. Conclusions

This paper proposes a robust approximate optimal controller for the trajectory tracking control of a quadrotor with unknown compound disturbances. By incorporating the disturbance estimates provided by the disturbance observers into the control design, the effect of the compound disturbances is suppressed, which ensures tracking accuracy and improves robustness. The ADP method is then applied to the nominal system to optimize the performance index of the control. The stability of the closed-loop system is analyzed with the Lyapunov theorem, which shows that the tracking errors are UUB. Simulation results further confirm the robustness and effectiveness of the designed controller. In future work, experiments will be conducted to validate the performance of the proposed controller.

Author Contributions

Conceptualization, R.L. and Z.Y.; methodology, R.L.; software, Z.Y.; validation, R.L., Z.Y. and G.Y.; formal analysis, R.L.; investigation, Z.Y.; resources, R.L.; data curation, L.J.; writing—original draft preparation, Z.Y.; writing—review and editing, R.L., Z.Y. and G.Y.; visualization, G.L.; supervision, Z.L.; project administration, R.L.; funding acquisition, R.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62003233), the Fundamental Research Program of Shanxi Province (Grant Nos. 201901D211083 and 20210302124552), and the Science and Technology Innovation Project of Higher Education Institutions in Shanxi Province (Grant No. 2019L0236).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Hassanalian, M.; Abdelkefi, A. Classifications, applications, and design challenges of drones: A review. Prog. Aerosp. Sci. 2017, 91, 99–131.
2. Salem, K.A.; Palaia, G.; Chiarelli, M.R.; Bianchi, M. A simulation framework for aircraft take-off considering ground effect aerodynamics in conceptual design. Aerospace 2023, 10, 459.
3. Salem, K.A.; Palaia, G.; Quarta, A.A. Review of hybrid-electric aircraft technologies and designs: Critical analysis and novel solutions. Prog. Aerosp. Sci. 2023, 141, 100924.
4. Shao, S.; Chen, M.; Hou, J.; Zhao, Q. Event-triggered-based discrete-time neural control for a quadrotor UAV using disturbance observer. IEEE/ASME Trans. Mechatronics 2021, 26, 689–699.
5. Idrissi, M.; Salami, M.; Annaz, F. A review of quadrotor unmanned aerial vehicles: Applications, architectural design and control algorithms. J. Intell. Robot. Syst. 2022, 104, 22.
6. Rinaldi, F.; Chiesa, S.; Quagliotti, F. Linear quadratic control for quadrotors UVAs dynamics and formation flight. J. Intell. Robot. Syst. 2013, 70, 203–220.
7. Dharmawan, A.; Priyambodo, T.K. Model of linear quadratic regulator (lqr) control method in hovering state of quadrotor. J. Telecommun. Electron. Comput. Eng. (JTEC) 2017, 9, 135–143.
8. Alonge, F.; D’Ippolito, F.; Fagiolini, A.; Garraffa, G.; Sferlazza, A. Trajectory robust control of autonomous quadcopters based on model decoupling and disturbance estimation. Int. J. Adv. Robot. Syst. 2021, 18, 1729881421996974.
9. Yang, Y.; Yan, Y. Attitude regulation for unmanned quadrotors using adaptive fuzzy gain-scheduling sliding mode control. Aerosp. Sci. Technol. 2016, 54, 208–217.
10. Avram, R.C.; Zhang, X.; Muse, J. Nonlinear adaptive fault-tolerant quadrotor altitude and attitude tracking with multiple actuator faults. IEEE Trans. Control. Syst. Technol. 2017, 26, 701–707.
11. Chen, F.; Lei, W.; Zhang, K.; Tao, G.; Jiang, B. A novel nonlinear resilient control for a quadrotor UVA via backstepping control and nonlinear disturbance observer. Nonlinear Dyn. 2016, 85, 1281–1295.
12. Liu, H.; Xi, J.; Zhong, Y. Robust attitude stabilization for nonlinear quadrotor systems with uncertainties and delays. IEEE Trans. Ind. Electron. 2017, 64, 5585–5594.
13. Liu, E.; Yan, Y.; Yang, Y. Neural network approximation-based backstepping sliding mode control for spacecraft with input saturation and dynamics uncertainty. Acta Astronaut. 2022, 191, 1–10.
14. Li, R.; Chen, M.; Wu, Q. Robust control for an unmanned helicopter with constrained flapping dynamics. Chin. J. Aeronaut. 2018, 31, 2136–2148.
15. Li, R.; Chen, M.; Wu, Q. Adaptive neural tracking control for uncertain nonlinear systems with input and output constraints using disturbance observer. Neurocomputing 2017, 235, 27–37.
16. Yang, Y.; Modares, H.; Vamvoudakis, K.G.; He, W.; Xu, C.Z.; Wunsch, D.C. Hamiltonian-driven adaptive dynamic programming with approximation errors. IEEE Trans. Cybern. 2021, 52, 13762–13773.
17. Xue, S.; Luo, B.; Liu, D. Event-triggered adaptive dynamic programming for zero-sum game of partially unknown continuous-time nonlinear systems. IEEE Trans. Syst. Man Cybern. Syst. 2018, 50, 3189–3199.
18. Du, Y.; Jiang, B.; Ma, Y.; Cheng, Y. Robust ADP-based sliding-mode fault-tolerant control for nonlinear systems with application to spacecraft. Appl. Sci. 2022, 12, 1673.
19. Huang, Y.; Wang, D.; Liu, D. Bounded robust control design for uncertain nonlinear systems using single-network adaptive dynamic programming. Neurocomputing 2017, 266, 128–140.
20. Wang, D.; Liu, D.; Li, H. Policy iteration algorithm for online design of robust control for a class of continuous-time nonlinear systems. IEEE Trans. Autom. Sci. Eng. 2014, 11, 627–632.
21. Dou, L.; Su, X.; Zhao, X.; Zong, Q.; He, L. Robust tracking control of quadrotor via on-policy adaptive dynamic programming. Int. J. Robust Nonlinear Control 2021, 31, 2509–2525.
22. Mu, C.; Zhang, Y. Learning-based robust tracking control of quadrotor with time-varying and coupling uncertainties. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 259–273.
23. Chen, W.H.; Yang, J.; Guo, L.; Li, S. Disturbance-observer-based control and related methods—An overview. IEEE Trans. Ind. Electron. 2015, 63, 1083–1095.
24. Chen, M.; Xiong, S.; Wu, Q. Tracking flight control of quadrotor based on disturbance observer. IEEE Trans. Syst. Man Cybern. Syst. 2019, 51, 1414–1423.
25. Chen, F.; Jiang, R.; Zhang, K.; Jiang, B.; Tao, G. Robust backstepping sliding-mode control and observer-based fault estimation for a quadrotor UVA. IEEE Trans. Ind. Electron. 2016, 63, 5044–5056.
26. Shao, X.; Liu, J.; Cao, H.; Shen, C.; Wang, H. Robust dynamic surface trajectory tracking control for a quadrotor UVA via extended state observer. Int. J. Robust Nonlinear Control 2018, 28, 2700–2719.
27. Mofid, O.; Mobayen, S. Adaptive sliding mode control for finite-time stability of quad-rotor UAVs with parametric uncertainties. ISA Trans. 2018, 72, 1–14.
28. Lei, W.; Li, C.; Chen, M.Z. Robust adaptive tracking control for quadrotors by combining PI and self-tuning regulator. IEEE Trans. Control Syst. Technol. 2018, 27, 2663–2671.
29. Maqsood, H.; Qu, Y. Nonlinear disturbance observer based sliding mode control of quadrotor helicopter. J. Electr. Eng. Technol. 2020, 15, 1453–1461.
30. Hua, H.; Fang, Y.; Zhang, X.; Lu, B. A novel robust observer-based nonlinear trajectory tracking control strategy for quadrotors. IEEE Trans. Control Syst. Technol. 2020, 29, 1952–1963.
31. Song, R.; Lewis, F.L. Robust optimal control for a class of nonlinear systems with unknown disturbances based on disturbance observer and policy iteration. Neurocomputing 2020, 390, 185–195.
32. Lee, D. Nonlinear disturbance observer-based robust control for spacecraft formation flying. Aerosp. Sci. Technol. 2018, 76, 82–90.
33. Yuan, W.; Gao, G. Sliding mode control of the automobile electro-coating conveying mechanism with a nonlinear disturbance observer. Adv. Mech. Eng. 2018, 10, 1687814018795748.
34. Orozco Soto, S.M.; Cacace, J.; Ruggiero, F.; Lippiello, V. Active Disturbance Rejection Control for the Robust Flight of a Passively Tilted Hexarotor. Drones 2022, 6, 258.
35. Wang, Y.; Sun, J.; He, H.; Sun, C. Deterministic policy gradient with integral compensator for robust quadrotor control. IEEE Trans. Syst. Man Cybern. Syst. 2019, 50, 3713–3725.
36. Li, C.; Wang, Y.; Yang, X. Adaptive fuzzy control of a quadrotor using disturbance observer. Aerosp. Sci. Technol. 2022, 128, 107784.
37. Fan, Y.; Guo, H.; Han, X.; Chen, X. Research and verification of trajectory tracking control of a quadrotor carrying a load. Appl. Sci. 2022, 12, 1036.
38. Wang, B.; Yu, X.; Mu, L.; Zhang, Y. Disturbance observer-based adaptive fault-tolerant control for a quadrotor helicopter subject to parametric uncertainties and external disturbances. Mech. Syst. Signal Process. 2019, 120, 727–743.
39. Fei, Y.; Shi, P.; Lim, C.C. Robust and collision-free formation control of multiagent systems with limited information. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 4286–4295.
40. Fei, Y.; Shi, P.; Lim, C.C. Robust formation control for multi-agent systems: A reference correction based approach. IEEE Trans. Circuits Syst. Regul. Pap. 2021, 68, 2616–2625.
41. Xia, R.; Wu, Q.; Shao, S. Disturbance observer-based optimal flight control of near space vehicle with external disturbance. Trans. Inst. Meas. Control 2020, 42, 272–284.
42. Sun, J.; Liu, C. Disturbance observer-based robust missile autopilot design with full-state constraints via adaptive dynamic programming. J. Frankl. Inst. 2018, 355, 2344–2368.
43. Zhang, H.; Cui, L.; Zhang, X.; Luo, Y. Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method. IEEE Trans. Neural Netw. 2011, 22, 2226–2236.
44. Xu, N.; Niu, B.; Wang, H.; Huo, X.; Zhao, X. Single-network ADP for solving optimal event-triggered tracking control problem of completely unknown nonlinear systems. Int. J. Intell. Syst. 2021, 36, 4795–4815.
45. Xia, R.; Wu, Q.; Chen, M. Disturbance observer-based optimal longitudinal trajectory control of near space vehicle. Sci. China Inf. Sci. 2019, 62, 1–3.
46. Sun, J.; Liu, C. Backstepping-based adaptive dynamic programming for missile-target guidance systems with state and input constraints. J. Frankl. Inst. 2018, 355, 8412–8440.
47. Wang, D.; Liu, D.; Li, H.; Ma, H. Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming. Inf. Sci. 2014, 282, 167–179.
48. Zheng, S.; Shi, P.; Wang, S.; Shi, Y. Adaptive neural control for a class of nonlinear multiagent systems. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 763–776.
49. Fan, Q.Y.; Yang, G.H. Adaptive actor–critic design-based integral sliding-mode control for partially unknown nonlinear systems with input disturbances. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 165–177.
50. Vamvoudakis, K.G.; Lewis, F.L. Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 2010, 46, 878–888.
51. Liu, D.; Xue, S.; Zhao, B.; Luo, B.; Wei, Q. Adaptive dynamic programming for control: A survey and recent advances. IEEE Trans. Syst. Man Cybern. Syst. 2020, 51, 142–160.
52. Zhao, B.; Liu, D.; Luo, C. Reinforcement learning-based optimal stabilization for unknown nonlinear systems subject to inputs with uncertain constraints. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 4330–4340.
53. Wang, D.; Liu, D.; Zhang, Y.; Li, H. Neural network robust tracking control with adaptive critic framework for uncertain nonlinear systems. Neural Netw. 2018, 97, 11–18.
54. Liu, D.; Wei, Q.; Wang, D.; Yang, X.; Li, H. Adaptive Dynamic Programming with Applications in Optimal Control; Springer International Publishing: Berlin/Heidelberg, Germany, 2017.
55. Lewis, F.L.; Jagannathan, S.; Yesildirek, A. Neural Network Control of Robot Manipulators and Nonlinear Systems; Taylor & Francis: London, UK, 1999.
56. Castillo, A.; Sanz, R.; Garcia, P.; Qiu, W.; Wang, H.; Xu, C. Disturbance observer-based quadrotor attitude tracking control for aggressive maneuvers. Control Eng. Pract. 2019, 82, 14–23.
57. Mobayen, S.; El-Sousy, F.F.; Alattas, K.A.; Mofid, O.; Fekih, A.; Rojsiraphisal, T. Adaptive fast-reaching nonsingular terminal sliding mode tracking control for quadrotor UAVs subject to model uncertainties and external disturbances. Ain Shams Eng. J. 2023, 14, 102059.
58. Shao, X.; Yue, X.; Li, J. Event-triggered robust control for quadrotors with preassigned time performance constraints. Appl. Math. Comput. 2021, 14, 102059.
Figure 1. Basic structure of the quadrotor.
Figure 2. Control design of the quadrotor.
Figure 3. Convergence of critic network weights.
Figure 4. Variation of states in the position subsystem.
Figure 5. Variation of states in the attitude subsystem.
Figure 6. Tracking errors in the position subsystem.
Figure 7. Tracking errors in the attitude subsystem.
Figure 8. Results of three-dimensional path.
Figure 9. Pulse-width of input signals.
Figure 10. Estimates of compound disturbances.
Figure 11. Variation of states in the position subsystem without disturbance observers.
Figure 12. Variation of states in the attitude subsystem without disturbance observers.
Figure 13. Tracking errors in the position subsystem without disturbance observers.
Figure 14. Tracking errors in the attitude subsystem without disturbance observers.
Figure 15. Results of three-dimensional path without disturbance observers.
Figure 16. Pulse-width of input signals without disturbance observers.
Table 1. Parameters of quadrotor model.

| Symbol | Value | Units |
|---|---|---|
| $m$ | 1.79 | kg |
| $g_I$ | 9.81 | m/s² |
| $l$ | 0.20 | m |
| $K_t$ | 12.0 | N |
| $K_o$ | 0.40 | N·m |
| $I_{xx} = I_{yy}$ | 0.03 | kg·m² |
| $I_{zz}$ | 0.04 | kg·m² |
| $k_x = k_y = k_z$ | 0.012 | N·s/m |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, R.; Yang, Z.; Yan, G.; Jian, L.; Li, G.; Li, Z. Robust Approximate Optimal Trajectory Tracking Control for Quadrotors. Aerospace 2024, 11, 149. https://doi.org/10.3390/aerospace11020149

AMA Style

Li R, Yang Z, Yan G, Jian L, Li G, Li Z. Robust Approximate Optimal Trajectory Tracking Control for Quadrotors. Aerospace. 2024; 11(2):149. https://doi.org/10.3390/aerospace11020149

Chicago/Turabian Style

Li, Rong, Zhengliang Yang, Gaowei Yan, Long Jian, Guoqiang Li, and Zhiqiang Li. 2024. "Robust Approximate Optimal Trajectory Tracking Control for Quadrotors" Aerospace 11, no. 2: 149. https://doi.org/10.3390/aerospace11020149

APA Style

Li, R., Yang, Z., Yan, G., Jian, L., Li, G., & Li, Z. (2024). Robust Approximate Optimal Trajectory Tracking Control for Quadrotors. Aerospace, 11(2), 149. https://doi.org/10.3390/aerospace11020149
