Robust Approximate Optimal Trajectory Tracking Control for Quadrotors

: This paper uses the adaptive dynamic programming (ADP) method to achieve optimal trajectory tracking control for quadrotors. Relying on an established mathematical model of a quadrotor, the approximate optimal trajectory tracking control, which consists of the steady-state control input and the approximate optimal feedback control input, is designed for a nominal system. Considering the compound disturbances in position and attitude dynamic models, disturbance observers are introduced. The estimated values are used to design robust compensation inputs to suppress the effect of the compound disturbances for good trajectory tracking performance. Theoretically, the Lyapunov theorem demonstrates the stability of a closed-loop system. The robustness and effectiveness of the proposed controller are confirmed by the simulation results.


Introduction
The miniaturization and reduction in cost of the relevant control components in aircraft, as well as the development and progress of computer and sensing and measurement technologies, have improved the stability of flight control systems and greatly facilitated the development of quadrotors [1].The high operability, strong mobility and flexibility of quadrotors allow them to meet the specific needs of many projects, generally used in military, industrial and other fields [2,3].A quadrotor system is multivariable, nonlinear and strongly coupled, and quadrotors will also be disturbed by the surrounding environment during flight [4].These factors can affect the accuracy of quadrotor control systems.The requirements for high-accuracy and robust flight control in the design of controllers for quadrotors are stringent, and the design of a core control algorithm is a prerequisite for quadrotors to achieve a stable and high-precision flight performance.Therefore, the research and development of controllers for quadrotor systems is of great significance.
At present, it is no longer a problem to ensure the uniformity of quadrotors through control algorithms.Many controllers for quadrotors have been designed and are already in application [5].Since the dynamics of quadrotors can be linearized around the equilibrium point, traditional linear control methods are used for a designed controller [6].On this basis, linear techniques are employed in the flight control of quadrotors, such as linear quadratic regulator (LQR) control [7].However, quadrotors need to be controlled away from the equilibrium point to accomplish complex control tasks and withstand external disturbances.As a result, a technique has been devised that is regarded as a robust feedback linearization method that uses extended state observers to estimate the nonlinear state feedback term online, containing aerodynamic forces, moments and unknown disturbances, and obtains the desired closed-loop dynamics via pole assignment [8].Moreover, several robust controllers relying on nonlinear techniques have been proposed, such as sliding mode control [9], adaptive control [10], backstepping-based control [11] and robust control [12].These control methods ensure the stability and robustness of nonlinear systems and have generally been used for the tracking control of these systems, but their optimal properties have not been considered.Therefore, the concept of optimization has been introduced into control design.
To derive the optimal control policy for the infinite horizon optimal control problem, solving the Hamilton-Jacobi-Bellman (HJB) equation or the Hamilton-Jacobi-Isaacs (HJI) equation for the H ∞ optimal control problem considering uncertainties is essential.Nevertheless, it is difficult to mathematically derive the corresponding analytical solutions in most cases.Neural networks are an optional method to overcome this problem [13][14][15].The approximation property of neural networks makes it possible to find approximate solutions to partial differential equations.The convergence of neural networks can be ensured by penalizing them to ensure they satisfy the given partial differential equations.The ADP method is the combination of reinforcement learning, dynamic programming and neural network adaptive methods to derive approximate solutions of the HJB/HJI equations using function approximate structures to address nonlinear optimal control problems [16,17].The ADP method is used for control design with suitable performance index functions to derive the desired dynamic performance and stabilize a nominal system with uncertainties.However, most nonlinear optimal control methods using the ADP method are aimed at nominal systems or uncertain systems satisfying specific conditions [18][19][20], while the immunity to disturbances is still weak for such systems with external time-varying disturbances independent of the state, and the control effect under stronger disturbances is not ideal.The ADP method has been used in the design of controllers for quadrotors and efforts have been made to improve the robustness, but the designed controllers are more geared towards linear systems and design uncertainty is a unique problem [21,22].Quadrotors will often experience various external effects in flight, requiring strong adaptive and anti-disturbance capabilities in flight control.The disturbance observer technique achieves disturbance suppression of the target utilizing feedback regulation [23], which can attenuate compound disturbances containing external disturbances and model uncertainties, thus improving the system robustness.A disturbance observer can accurately estimate compound disturbances in a system, which greatly reduces the conservatism of the control.In addition, since a disturbance observer can usually be designed independently of the controller, this ensures that the method can be easily combined with other advanced control methods and more flexible in its application.There are experiments suggesting that the introduction of a disturbance observer significantly improves performance, which is a good reference for methods for quadrotors to overcome disturbances [24].
Considering the above analysis, a robust approximate optimal trajectory tracking control method is proposed for quadrotors to solve the optimal control problem under the conditions of compound disturbances.The main contributions are summarized as follows: (1) The combination of modeling uncertainties and external time-varying disturbances is considered as compound disturbances.Disturbance observers are introduced to estimate the compound disturbances in the position and attitude subsystems, and the estimated values are used to design robust compensation inputs to suppress the effects of the compound disturbances and ensure the stability of a quadrotor system under the ADP method.(2) To obtain optimal trajectory tracking control for a quadrotor without composite disturbances, the ADP method is used to design approximate optimal control inputs for the nominal system of a quadrotor.
The rest of the paper is organized as follows.In Section 2, the quadrotor mathematical model is developed, and the quadrotor system is divided into two subsystems.Section 3 describes the design of the robust approximate optimal trajectory tracking control and the stability analysis of the closed-loop system.Section 4 describes the robust approximate optimal trajectory tracking control for the quadrotor.The results of the corresponding simulation and the results of the comparative simulation without disturbance observer are presented in Section 5. Section 6 gives the conclusion of the paper.

Mathematical Modeling of a Quadrotor
The quadrotor has four evenly spaced, cross-symmetrical brushless motors in the plane, The rotors of motor 1 and motor 3 rotate clockwise, while the rotors of motor 2 and motor 4 rotate counterclockwise.By changing the rotational speed of the four rotors, the quadrotor generates different magnitudes of lift forces and torques, which can control the takeoff, landing and attitude motions of the quadrotor.As a result, the location of the quadrotor can be altered in the three-dimensional space.Figure 1  To clarify the mathematical model of the quadrotor system and satisfy the implementation of the control method, the earth-fixed inertial frame O I X I Y I Z I and the body-fixed body frame O B X B Y B Z B are established.To ensure that the constructed mathematical model does not lose the generality, it is assumed that the deformation and elastic vibration properties of the rotors and body are neglected, and the quadrotor is considered as an ideal rigid body; the quadrotor's structure is symmetrical, its mass is uniformly distributed, and its center of mass is located at the geometric center.The translational and rotational motions of the quadrotor are satisfied by [25] Ṗ = v, where P = [x, y, z] T ∈ R 3 represents the position of the quadrotor in the inertial frame and v = [v x , v y , v z ] T ∈ R 3 represents the corresponding velocity.Θ = [ϕ, θ, ψ] T ∈ R 3 denotes the vector of Euler angles.ω = [p, q, r] T ∈ R 3 denotes the angular velocity of the quadrotor in the body frame.W I B ∈ R 3×3 is the rotation matrix for the angular velocity in the form of in which s * = sin( * ), c * = cos( * ) and t * = tan( * ).
Relying on the Newton-Euler method, the dynamical equation of quadrotor with compound disturbances is represented by [26] where m ∈ R represents the mass of the quadrotor.I ∈ R 3×3 represents the inertia matrix of the quadrotor.As the assumptions of the quadrotor structure, its inertia matrix can be defined as the diagonal array I ≜ diag I xx , I yy , represents the resultant force consisting of the gravity and the total lift in the inertial frame.τ = [τ x , τ y , τ z ] T ∈ R 3 represents the torque in the body frame.d p ∈ R 3 and d a ∈ R 3 are the compound disturbances in position and attitude dynamic models, which contain modeling uncertainties and external time-varying disturbances.
According to the mechanical analysis, the quadrotor is affected by gravity and lift forces.Since the special structure of the quadrotor, the lift forces are along the z-axis direction of the body frame.Then, the resultant force expressed in the inertial frame is [27] where T ∈ R represents the total lift force and g I ∈ R represents the gravity acceleration.
is the rotation matrix of the body frame transformed into the inertial frame in the form of ∈ R 6 has finite energy.In addition, d t is a continuous function and its norm is bounded such that ∥d t ∥ ≤ d tM , where d tM is an unknown positive constant.Simultaneously, the compound disturbances are usually considered to be superimposed by the low-frequency period signals.Hence, it is assumed that the total compound disturbance has a low change rate and its rate of change is slow compared to the dynamic properties of the disturbance observer, which can be considered that ḋt ≃ 0. Assumption 3 ([30]).The desired trajectory of position P d = [x d , y d , z d ] T ∈ R 3 and the desired trajectory of yaw angle ψ d ∈ R and their higher order derivatives are known, continuous and bounded.
Remark 1. Assumption 2 is common in control studies using disturbance observers [31][32][33], while there are different considerations for compound disturbances in [34].In the case of this paper, the considerations in Assumption 2 are used.Assumption 3 ensures that the ADP method can be utilized for the control design and the stability analysis.
The total lift and torque of the quadrotor are related to the force and torque of the four rotors as follows [35]: where T i and τ i (i = 1, 2, 3, 4) are the lift and torque generated by the four rotors of the quadrotor, respectively.l represents the length from each rotor to the center of the body.
The rotor speeds are related to pulse-width modulated (PWM) signals through the motors.The lift forces and torques generated by the four motors are related to the pulse width of the input signals as follows [36]: where K t and K o are the positive gains of the lift coefficient and the inverse torque coefficient, respectively.B w is the motor bandwidth and u i represents the PWM signals of each corresponding motor, which should be limited between 0 and 1.
Assuming that the motors have a sufficiently fast response speed, then the motor model can be simplified as [37] Hence, ( 8) can be rewritten as [38]  Considering the trajectory tracking control for the quadrotor, the control objective is to design a controller that allows the position and attitude to track the desired trajectory asymptotically within a small error.
Combining ( 1), ( 2), ( 4) and ( 5), the overall model of the quadrotor can be decomposed into a position subsystem and an attitude subsystem.The position subsystem can be represented as with While the attitude subsystem is expressed in the form of with In the next section, ( 12) and ( 13) will be the focus of our research.

Robust Approximate Optimal Trajectory Tracking Control Design
Considering the convenience of describing the control design process, ( 12) and ( 13) is represented in the uniform form in which f (X) ∈ R 6 and g(X) ∈ R 6×3 represent the drift dynamics and the input dynamics of the system, respectively.X ∈ R 6 denotes the observable state vector, U ∈ R 3 denotes the control input, and D ∈ R 3 denotes the compound disturbance.

Definition 1 ([39]
).A state vector X is said to be uniformly ultimately bounded (UUB) if there exists a compact set ∅ X , a positive number b X and a time t b (X (t 0 ), b X ) such that ∥X ∥ ≤ b X for all state variable initial value X (t 0 ) ∈ ∅ X and all t ≥ t 0 + t b .

Lemma 1 ([40]
). X is UUB if the time derivative of a positive definite function L X (X ) is negative when ∥X ∥ > b X for a positive constant b X .
To realize the trajectory tracking control with robustness for the system, the designed controller consists of two parts, the form of which is as follows: where U R is the robust compensation input designed through the disturbance observer for suppressing the effect of compound disturbances in the system.U N is the control input designed based on the ADP method for the nominal system, which takes the form of where U d represents the steady-state control input and U E represents the feedback control input.

Disturbance Observer Design
The disturbance observer is applied to derive the estimate of the compound disturbance.The estimated value is then used for the design of the robust compensation input to improve robustness.The disturbance observer is designed as in which D ∈ R 3 represents the estimate of the unknown compound disturbance, p D (X) ∈ R 3 represents the designed vector-valued function, l D (X) = ∂p D (X)/∂X ∈ R 3×6 is the observer gain and Z ∈ R 3 represents the auxiliary variable vector of the disturbance observer.
Remark 2. In the disturbance observer (17), the derivative of the state is required, which is unknown because the compound disturbance is unknown.Then, the auxiliary variable vector is given to avoid calculating the derivative of the state.
Define the estimation error of compound disturbance as D = D − D. With regard to Assumption 2 and the disturbance observer (17), the time derivative of D is developed as Combined with ( 14), we have Then, D is convergent by appropriately designing the vector-valued function p D (X).
Theorem 1. Considering System (14), the disturbance observer is designed as (17).If l D (X)g(X) is ensured to be positive definite for the design of the vector-valued function p D (X), then the estimated compound disturbance D would follow the compound disturbance D, which means the estimation error D could converge to zero.
Proof.Select the candidate Lyapunov function as follows: Combined with (18), the time derivative of In the case where l D (X)g(X) is positive definite, then we derive where κ = λ min l D (X)g(X) and λ min ( * ) denotes the minimum eigenvalue.Obviously, LD < 0 when D ̸ = 0. Hence, the disturbance observer ( 17) can estimate D and D will converge to zero.This completes the proof.
Then, the robust compensation input U R is designed as

Optimal Trajectory Tracking Control Design and Analysis
The compound disturbance is estimated by the disturbance observer.The robust compensation input is designed by the estimated value to suppress the effect of the compound disturbances.As a result, converting the trajectory tracking control problem of the nonlinear system with the compound disturbance into the trajectory tracking control problem of the nominal system is possible.In order to derive the optimal control for the nominal system, deriving the solution of the associated HJB equation is essential.Unfortunately, deriving the analytical solution is difficult for the nonlinear system by the direct solution method.Then, the ADP method is utilized for achieving the approximate optimal control by constructing the critic network.The weight update law designed for the critic network ensures the convergence of the weight and the stability of the closed-loop system.
For System (14), the nominal system is represented by Given the desired trajectory X d ∈ R 6 , the steady-state control input U d is obtained from (24) as in which g + (X d ) denotes the pseudo-inverse of g(X d ).
Define the tracking error as E = X − X d ∈ R 6 .Combined with ( 14) and ( 15), the error system is developed as Let Noting that g E = g(X), the norm of g E is bounded such that g m ≤ ∥g E ∥ ≤ g M for the positive constants g m and g M .
As a result of Theorem 1, the disturbance observer (20) can successfully estimate the compound disturbance D and the estimation error of compound disturbance D can converge to zero.Therefore, it is possible to neglect D in the error system (27) for the optimal control design [41,42].However, D would still be considered in the stability analysis.Then, the nominal error system is represented by Define the cost function as where Q ∈ R 6×6 and R ∈ R 3×3 are the designed symmetric positive definite matrices.

Definition 2 ([43]
).A control policy µ(E) is said to be admissible on the compact set ∅ for (29 , where Ψ(∅) denotes the set of admissible control policies.
The Hamiltonian function takes the following form The optimal cost function is represented by and the following relation is satisfied min where ∇V * = ∂V * (E)/∂E.Under the existence condition of the optimal solution ∂H(E, U * E , ∇V * )/∂U * E = 2RU * E + g T E ∇V * = 0, the optimal feedback control input is derived by Substituting ( 34) and ( 31) into (33), the HJB equation is developed as

Approximate Optimal Control Design
Clearly, it is necessary to derive ∇V * by solving the HJB Equation (35) for deriving the optimal feedback control input (34).However, (35) is a typical nonlinear partial differential equation and its solution is difficult to derive in the analytic form [44,45].To overcome the difficulty, the ADP method relying on the policy iteration technique is utilized to derive the approximate solution.
Assumption 4 ([46]).The continuously differentiable Lyapunov function candidate J(E) for the nominal error system (28) satisfies ∇J T ( f E + g E U * E ) < 0, where ∇J = ∂J(E)/∂E.Meanwhile, there exists a symmetric positive definite matrix Remark 3. Assumption 4 is a common assumption that has been used for the ADP method.Generally, it is assumed that the closed-loop dynamics with the optimal feedback control is bounded by a function of the system state on the compact set.In such a situation, there exists a positive constant η such that ∥ f E + g E U * E ∥ ≤ η∥∇J∥.Hence, we can further derive ∇J T ( f E + g E U * E ) ≤ η∥∇J∥ 2 .Furthermore, the function J(E) can be correctly selected as a quadratic polynomial [47], such as Considering the uniform estimation property of neural networks, the optimal cost function is approximated by where W c ∈ R N represents the unknown ideal constant weight, φ c (E) ∈ R N represents the activation function, ε c (E) represents the approximate error, and N represents the number of neurons.This neural network is called the critic network in the ADP method.

Lemma 2 ([48]
).The estimation error ε c (E) is expected to be bounded when the approximated function V * (E) is bounded.
Then, by the definition of ∇V * , it is developed as follows where ∇φ c = ∂φ c (E)/∂E and ∇ε c = ∂ε c (E)/∂E.Invoking (37), the optimal feedback control input (34) is developed as Substituting (37) into (35), the HJB equation is developed as where Ξ = g E R −1 g T E .ε H represents the residual error, which takes the form of Since ∥g E ∥ is bounded, there exists the positive constants Ξ m and Ξ M such that Define the estimate of W c as Ŵc , then the estimate of V * (E) is derived as follows: Moreover, the approximate optimal feedback control input is derived as Remark 4. The classical ADP method utilizes the critic network and the actor network to approximate the optimal cost function and the optimal feedback control, respectively [43,49,50].
Considering the association between the optimal cost function and the optimal feedback control for the continuous affine nonlinear system, it is possible to omit the actor network and only use the critic network [51,52].This framework provides smaller computational effort, faster convergence and compared to the actor-critic network framework, which has a better practical value.
Combining ( 31), ( 41) and ( 42), the approximate Hamiltonian function is developed as Define the objective function as Moreover, the weight update law is designed as where α 1 > 0, α 2 > 0 are the learning rates to be designed.σ = ∇φ c ( f E + g E U E ) and σ c = σ T σ + 1. ∇J is given in Assumption 4. Π(E, U E ) in the last term is defined as where α 3 is a designed positive constant.
Remark 5.The first term in ( 45) is employed for minimizing the objective function (44).To ensure that Ŵc will converge to W c , the existence of the persistence of excitation (PE) condition is essential during the learning process is necessary [49].In addition, the probing noise is typically introduced to the control input for satisfying this condition, which may enable the closed-loop system to become unstable during the learning process [53,54].The second term in (45) is employed for the stability of the closed-loop system.

Define the weight estimation error as
Wc where Ė * = f E + g E U * E , and using ( 39) and ( 45), we have

Stability Analysis
Assumption 5 ([50]).The ideal weight W c have bound over the compact set ∅ such that ∥W c ∥ ≤ W cM for a positive constant W cM .Meanwhile, the activation function φ c and the approximate error ε c are bounded such that ∥φ c ∥ ≤ φ cM , ∥ε c ∥ ≤ ε cM for positive constants φ cM and ε cM , and their derivatives are also bounded such that ∥∇φ c ∥ ≤ φcM and ∥∇ε c ∥ ≤ εcM for positive constants φcM and εcM .Moreover, the residual error ε H will converge to zero when the number of neurons N is sufficiently large, as suggested by Remark 3 and the bound of ∥Ξ∥.That is, the relation ∥ε H ∥ ≤ ε HM exists for the positive constant ε HM .
Theorem 2. Considering System ( 14), the robust approximate optimal controller for the trajectory tracking control is designed as (15), which consists of the robust compensation input ( 23) and the nominal system control input (16), and the weight update law is designed as (45) for the critic network, then it is ensured that the tracking error E of the closed-loop system and the weight estimation error Wc are UUB.

Proof. Select the candidate Lyapunov function as follows
where L D is designed as (20), L J = α 2 J(E) and L W = 1 2 WT c Wc .Considering the second term in (48) and using (27), the time derivative is developed as Considering the third term in (48) and according to (47), the time derivative is developed as Since the first two terms in the final form of ( 50) are negative semi-definite, we then derive According to Remark 3 and Assumption 5, and considering the bound of ∥Ξ∥, we assume that Noticing that the PE condition guarantees σ c to be bounded, there exists a positive constant λ 7 such that λ 7 ≤ 1/σ 2 c ≤ 1.In addition, based on Young's inequality, there exists the relation , where c is a nonzero constant.Then, we have Then, ( 51) is developed as where and c j (j = 1, 2, ..., 6) are all non-zero constants whose selection guarantees λ 8 > 0. Combining the results of ( 22), ( 49) and ( 51), we have By using Young's inequality, the relation The following discussion is divided into two cases.
Case 1.In this case, Π(E, According to the dense property of R, there exists a positive constant λ 11 such that 0 < λ 11 ∥∇J∥ ≤ −∇J T ( f E + g E U E ) for all E ∈ ∅.Then, (62) becomes By selecting α 2 and α 3 , such that κ − α 2 2α 3 > 0, then L < 0 is satisfied provided that one of the following conditions holds: or Case 2. Considering the case Π(E, U E ) = 1, (62) is developed as Based on Assumption 4, and considering ∥g E ∥ ≤ g M , we have Similarly, by selecting α 2 and α 3 such that λ 12 = Λ m − α 3 2 g 2 M > 0 and κ − α 2 2α 3 > 0, then it means that L < 0 holds as long as or In conclusion, L < 0 when ∥∇J∥ > max{ℓ 1 , ℓ 2 } or Wc > max{h 1 , h2 }.Relying on Lemma 1 and the standard Lyapunov extension theorem [55], it is further concluded that the tracking error E of the closed-loop system and the weight estimation error Wc are UUB.This completes the proof.Remark 6.As a result of Theorem 2, the approximate optimal cost function V(E) in ( 41) and the approximate optimal feedback control input U E in (42) can, respectively, converge to the neighborhoods of the optimal cost function V * (E) and the optimal feedback control input U * E within finite bounds when the PE condition holds [41].

Robust Approximate Optimal Trajectory Tracking Control for a Quadrotor
Position and yaw angle are the system outputs for the quadrotor that tracks the desired trajectory of position and the desired trajectory of yaw angle.The desired trajectories of roll and pitch angles required by the attitude subsystem are generated according to the position subsystem control inputs.The tracking errors in lateral and longitudinal positions are eliminated by the attitude subsystem tracking the desired trajectories of roll and pitch angles.According to the description of the control design in the previous section, the control design for the quadrotor is shown in Figure 2, which can guarantee that the tracking error of the quadrotor remains within a small range.

Position Control Design
The estimated value of unknown compound disturbance dp in the position subsystem is derived by the following disturbance observer where l 1 (x 1 ) = ∂p 1 (x 1 )/∂x 1 denotes the observer gain of the disturbance observer in the position subsystem and F is derived by (6).Then, the position subsystem robust compensation input is designed as The steady-state control input for the position nominal system is designed as where ) denotes the pseudo-inverse of g 1 (x 1d ).Then, define the position subsystem tracking error as e 1 = x 1 − x 1d ≜ [e x , e y , e z , e v x , e v y , e v z ] T ∈ R 6 . (73) The cost function of the position subsystem is represented as where Q 1 ∈ R 6×6 and R 1 ∈ R 3×3 are the designed symmetric definite matrices.The approximate optimal feedback control input in the position subsystem is where g e1 = g 1 (x 1 ) and ∇φ c1 = ∂φ c1 (e 1 )/∂e 1 .φ c1 (e 1 ) is the activation function and Ŵc1 represents the estimate of the ideal weight for the critic network of the position subsystem.The corresponding weight update law is designed as where α 11 > 0, α 12 > 0 are the designed learning rates.
Then, the robust approximate optimal trajectory tracking control in the position subsystem is designed as

Attitude Resolution
Since the system of the quadrotor is underactuated and strongly coupled, the information of the position subsystem is used to calculate the total lift force.The desired trajectories of roll and pitch angles are determined by the position subsystem through the relation between the kinematic equation and the Euler equation and passed to the attitude subsystem.For the position subsystem, the generated tracking error and the received compound disturbance can be eliminated by the attitude subsystem.By a matrix operation on (6), the following equations are derived: The actual total lift force for the quadrotor system is designed as Substituting (79) into (78), the form is transformed as The desired trajectories of the pitch and roll angles are derived by the following equations: Then, we have (82)

Attitude Control Design
Similarly, the estimated value of unknown compound disturbance da in the attitude subsystem is derived by the following disturbance observer where l 2 (x 2 ) = ∂p 2 (x 2 )/∂x 2 denotes the observer gain of the disturbance observer in the attitude subsystem.Then, the attitude subsystem robust compensation input is The desired trajectory for the angular velocity is given by [56] in which is the desired trajectory of Euler angles and ω d = [p d , q d , r d ] T ∈ R 3 is the desired trajectory of the angular velocity.The steady-state control input for the attitude nominal system is designed as where ) denotes the pseudo-inverse of g 2 (x 2d ).Then, define the attitude subsystem tracking error as e 2 = x 2 − x 2d ≜ [e ϕ , e θ , e ψ , e p , e q , e r ] T ∈ R 6 . (87) While the cost function of the attitude subsystem is represented as where Q 2 ∈ R 6×6 and R 2 ∈ R 3×3 are the designed symmetric definite matrices.The approximate optimal feedback control input in the attitude subsystem is where g e2 = g 2 (x 2 ) and ∇φ c2 = ∂φ c2 (e 2 )/∂e 2 .φ c2 (e 2 ) is the activation function and Ŵc2 represents the estimate of the ideal weight for the critic network of the attitude subsystem.The corresponding weight update law is designed as where α 21 > 0, α 22 > 0 are the learning rates, Then, the robust approximate optimal trajectory tracking control in the attitude subsystem is designed as

Simulation Results
In this section, the robustness and effectiveness of the designed controller are evaluated through numerical simulations.The quadrotor is considered to be in a flight environment with slow-changing disturbances.The parameters of the quadrotor model are presented in Table 1 [24].
The vector-valued functions of the disturbance observers are designed as p 1 (x 1 ) = l 1 (x 1 )x 1 , p 2 (x 2 ) = l 2 (x 2 )x 2 , while the observer gains are selected as Clearly, l 1 (x 1 )g 1 (x 1 ) and l 2 (x 2 )g 2 (x 2 ) are positive definite and satisfy the design requirements of Theorem 1.To derive the appropriate dynamic performance, the parameters of the performance index functions are designed as Q 1 = diag{7, 7, 10, 9, 9, 6}, Q 2 = diag{1.5,1.5, 1.2, 0.3, 0.3, 0.4}, R 1 = R 2 = I 3 .The activation functions are designed as φ c1 (e 1 ) = [e 2 x , e x e v x , e 2 y , e y e v y , e 2 z , e z e v z , e 2 v x , e 2 v y , e 2 v z ] T , φ c2 (e 2 ) = [e 2 ϕ , e ϕ e p , e 2 θ , e θ e q , e 2 ψ , e ψ e r , e 2 p , e p e q , e p e r , e 2 q , e q e r , e 2 r , e 2 ϕ e q e r , e ϕ e p e q e r , e 2 θ e p e r , e θ e p e q e r , e 2 ψ e p e q , e ψ e p e q e r , e 4 p , e 3 p e q , e 3 p e r , e 2 p e 2 q , e 2 p e q e r , e 2 p e 2 r , e p e 3 q , e p e 2 q e r , e p e q e 2 r , e p e 3 r , e 4 q , e 3 q e r , e 2 q e 2 r , e q e 3 r , e The PE condition is ensured by the method mentioned in Remark 5 to excite the system states.The weights gradually vary to become slower and stabilize during the learning process.The converged weights are already very close to the ideal weights after sufficient learning.The convergence of the whole critic network weights Ŵc1 , Ŵc2 in the learning processes are depicted in Figure 3.The final converged values of Ŵc1 , Ŵc2 are as follows Ŵc1 = [11.3714, 9.4718, 11.3713, 9.4718, 13.1589, 11.3203, 7.6718, 7.6718, 7.4279] T , Ŵc2 =[0.7500, 0.0646, 0.7182, 0.0630, 0.8202, 0.0888, 0.0214, −0.0006, −0.0004, 0.0221, −0.0024, 0.0287, 0.0092, 0.0045, 0.0346, −0.0059, −0.0096, 0.0453, 0.0340, 0.0211, 0.0106, −0.0025, 0.0039, −0.0045, 0.0227, 0.0126, 0.0155, 0.0076, 0.0256, 0.0020, 0.0016, 0.0090, −0.0185] T .The converged weights are used to design the approximate feedback optimal control inputs.Figure 4 and 5 present the variation of states in trajectory tracking control, revealing the corresponding tracking errors in Figure 6 and Figure 7.In addition, Figure 8 visualizes the path in three-dimensional space, whereas Figure 9 illustrates the PWM signals for the motors.The figures clearly demonstrate that the quadrotor system effectively tracks the desired trajectory and achieves a small convergence bound for the tracking error.These results highlight the rapidity and accuracy of the designed controller in the control process.The estimates for the compound disturbances are depicted in Figure 10.It shows that the estimated values from the disturbance observers can quickly follow the actual compound disturbances.Moreover, the trajectory tracking control performs well in the presence of compound disturbances, which implies the robustness of the designed controller.In order to verify that the designed controller rejects the compound disturbances, a comparative simulation is performed without the disturbance observers in the position subsystem and the attitude subsystem.The control inputs use only the control inputs designed for the nominal system.Under such control, the variation of states is presented in Figures 11 and 12, while Figures 13 and 14 show the corresponding tracking errors.By comparing the simulation results, it is clear that the trajectory tracking control of the quadrotor cannot be realized without the robust compensation inputs.Thus, further demonstrating the robustness of the designed controller.Moreover, the corresponding path in three-dimensional space and the PWM signals of the motors are shown in Figure 15 and Figure 16, respectively.In summary, the controller designed for quadrotor trajectory tracking control has good dynamic performance, high tracking accuracy and strong robustness when the quadrotor is subjected to compound disturbances.

Conclusions
This paper proposes a robust approximate optimal controller for the trajectory tracking control of the quadrotor with unknown compound disturbances.By incorporating the estimated values of compound disturbances that are estimated by the disturbance observers into the control design, the effect of compound disturbances can be suppressed, resulting in ensured tracking accuracy and improved robustness.Moreover, the ADP method can then be utilized in the nominal system for ensuring the performance index of the control.The stability of the closed-loop system is analyzed by the Lyapunov theorem, which demonstrates that the tracking errors are UUB.Simulation results further confirm the robustness and effectiveness of the designed controller.In future work, experiments will be considered to validate the performance of the proposed controller.

Figure 2 .
Figure 2. Control design of the quadrotor.

Figure 4 .
Figure 4. Variation of states in the position subsystem.

Figure 5 .
Figure 5. Variation of states in the attitude subsystem.

Figure 6 .
Figure 6.Tracking errors in the position subsystem.

Figure 7 .
Figure 7. Tracking errors in the attitude subsystem.

Figure 11 .
Figure 11.Variation of states in the position subsystem without disturbance observers.

Figure 12 .
Figure 12.Variation of in the attitude subsystem without disturbance observers.

Figure 13 .
Figure 13.Tracking errors in the position subsystem without disturbance observers.

Figure 14 .
Figure 14.Tracking errors in the attitude subsystem without disturbance observers.

Figure 15 .
Figure 15.Results of three-dimensional path without disturbance observers.

Figure 16 .
Figure 16.Pulse-width of input signals without disturbance observers.
∂e 2 , where J 2 (e 2 ) is the Lyapunov function candidate that satisfies Assumption 4.