Article

Attitude-Tracking Control for Over-Actuated Tailless UAVs at Cruise Using Adaptive Dynamic Programming

Equipment Management and Unmanned Aerial Vehicle Engineering College, Air Force Engineering University, Xi’an 710072, China
*
Authors to whom correspondence should be addressed.
Drones 2023, 7(5), 294; https://doi.org/10.3390/drones7050294
Submission received: 4 March 2023 / Revised: 21 April 2023 / Accepted: 24 April 2023 / Published: 27 April 2023
(This article belongs to the Special Issue Flight Control System Simulation)

Abstract

Using adaptive dynamic programming (ADP), this paper presents a novel attitude-tracking scheme for over-actuated tailless unmanned aerial vehicles (UAVs) that integrates control and control allocation while accounting for nonlinearity and nonaffine control inputs. The proposed method uses the idea of nonlinear dynamic inversion to create an augmented system and converts the optimal tracking problem into an optimal regulation problem using a discounted performance function. Drawing inspiration from incremental control, this method achieves optimal tracking control for the nonaffine system by simply using a critic-only structure. Moreover, the unique design of the performance function ensures robustness against model uncertainties and external disturbances. The ADP method was found to outperform traditional control architectures that separate control and control allocation, achieving the same level of attitude-tracking performance through a more optimized approach. Furthermore, unlike many recent optimal controllers for nonaffine systems, our method does not require any model identifiers and demonstrates robustness. The superiority of the ADP-based approach is verified through two simulated scenarios, and its internal mechanism is further discussed. The theoretical analysis of robustness and stability is also provided.

1. Introduction

The tailless unmanned aerial vehicle (UAV) has garnered immense attention due to its promising potential in both civil and military aviation. Its superior aerodynamic efficiency compared with traditional designs offers benefits such as improved range, carrying capacity, and stealth performance. This has led to the emergence of several tailless UAVs, such as Boeing’s X-45A and X-45B/C, Lockheed Martin’s RQ-170 Sentinel, BAE’s Taranis, and Dassault’s NEURON.
The Innovative Control Effector (ICE) aircraft [1,2], developed through research by Lockheed Martin, stands out among the latest tailless aerial vehicles. ICE is equipped with as many as 11 effectors arranged compactly on its main wing, which makes the system over-actuated. This layout was designed to investigate and measure the aerodynamics and performance of various low-observable tailless configurations using innovative control effectors. After decades of research, ICE has been found to have excellent maneuverability and stealth performance, making it a good choice for future UAV design.
The unique configuration of ICE allows its effectors to achieve goals beyond providing aerodynamic moments, such as minimizing drag or maximizing lift [3]. However, this configuration also poses challenges for control. Like other tailless vehicles, ICE suffers from problems such as poor static stability and coupling between longitudinal and lateral dynamics. Additionally, the redundant effectors of ICE require dealing with the control allocation problem, which involves selecting appropriate effectors and providing deflection commands to generate the required moments. However, the compact layout of ICE’s effectors results in strong coupling effects between them, causing the control inputs to appear nonlinear in the system, which means that the system is nonaffine. As a result, the control allocation of ICE is an extremely challenging task.
Since the proposal of the concept of ICE, researchers in the flight control field have been paying constant attention to it [4,5,6]. In 2017, Niestroy et al. [7] published detailed aerodynamic data for ICE, which enabled the construction of a highly precise control-oriented nonlinear model and the development of advanced control algorithms. Many researchers have developed different control algorithms for the nonlinear ICE model. In a recent study, He et al. [8] proposed an altitude tracker for ICE using the well-known decoupling conditions for nonaffine systems [9,10,11,12,13,14], while other researchers prefer incremental control methods.
The principle behind incremental control methods is timescale separation, which makes use of Taylor expansion. Incremental control methods can transform nonaffine systems into incremental affine forms. Therefore, the complexity of nonlinear optimization in control allocation (CA) can be avoided. This approach is highly effective for dealing with nonaffine systems [15]. Recently, there have been several advancements in incremental control for ICE. Stolk et al. [3] proposed a minimum-drag CA method based on incremental nonlinear dynamic inversion. Matamoros [16] implemented an incremental nonlinear CA in ICE, resulting in improved tracking and CA performance. Sun et al. [17] improved the CA of ICE using hierarchical multi-objective optimization and adaptive incremental backstepping. Additionally, He et al. [14] extended incremental control to the outer-loop control of ICE trajectory tracking using the pseudo-control hedging technique, relaxing the need for the timescale separation principle.
Incremental control is effective for nonaffine systems because it makes good use of the partial derivative of the dynamics with respect to the control inputs, $\partial f(x,u)/\partial u$. This derivative is obtained through numerical differentiation of aerodynamic data. It is worth noting that obtaining it may be challenging in some systems. However, with the advancements in wind-tunnel tests, more accurate and economical aerodynamic data are becoming available for flight control. When combined with model identification techniques, incremental control has great potential for the future.
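As an illustration of how such a derivative might be obtained numerically, the following minimal sketch applies central differences to a nonaffine function. The function `toy_dynamics`, the step size `eps`, and the operating point are hypothetical placeholders chosen for this example; they are not the ICE aerodynamic data.

```python
import numpy as np

def control_jacobian(f, x, u, eps=1e-4):
    """Central-difference estimate of df/du at the point (x, u)."""
    u = np.asarray(u, dtype=float)
    f0 = np.asarray(f(x, u), dtype=float)
    jac = np.zeros((f0.size, u.size))
    for i in range(u.size):
        step = np.zeros_like(u)
        step[i] = eps
        f_plus = np.asarray(f(x, u + step), dtype=float)
        f_minus = np.asarray(f(x, u - step), dtype=float)
        jac[:, i] = (f_plus - f_minus) / (2.0 * eps)
    return jac

def toy_dynamics(x, u):
    """Hypothetical nonaffine model: the inputs enter nonlinearly and coupled."""
    return np.array([x[0] * np.sin(u[0]) + u[0] * u[1],
                     np.cos(u[1]) + x[1] * u[0] ** 2])

x0 = np.array([1.0, 2.0])
u0 = np.array([0.3, -0.5])
J_num = control_jacobian(toy_dynamics, x0, u0)
```

With tabulated aerodynamic data, `f` would typically wrap an interpolation of the tables, and the step size would need to be matched to the table resolution.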
Most of the aforementioned control methods separate the command tracking and control allocation (CA) tasks. The command tracker provides the aerodynamic coefficients command $\tau_c$ to ensure that the reference signal $x_c$ is accurately tracked, while the CA determines the specific effector deflection $u$ based on the aerodynamic coefficients command and its objective function. This framework is highly convenient for incorporating established flight control algorithms, and the CA can be viewed as an optimization problem that can leverage well-established optimization theory. As a result, this framework is preferred by most researchers.
However, there are some aspects of the above framework that could be improved. The obvious drawback is that the CA is designed to minimize an objective function consisting of the moment tracking error $\tau_e = \tau - \tau_c$ and the effector deflection $u$. From an input and output perspective, the moment tracking error is only an intermediate value, and what truly matters is the reference signal tracking error $x_e = x - x_c$. Therefore, the ideal objective function should take both $x_e$ and the effector deflection into consideration, instead of just $\tau_e$. Additionally, most existing flight control algorithms are based on Lyapunov theory and can only take into account the convergence of $x_e$. To make these algorithms compatible with the over-actuated UAV, the above framework must be adopted, and the second goal is left to the CA.
Meanwhile, the above framework takes two steps to give effector deflection, increasing the computational time. Hou et al. [18] introduced the recurrent neural network in CA and claimed that the recurrent neural network model could be solved in parallel to meet the real-time requirement. Still, this approach has only been validated in a linearized model, where the computational load is inherently small.
Hence, it is imperative to develop more reasonable and effective frameworks that abandon the meaningless intermediate values, thereby ensuring convergence of $x_e$ and achieving the second goal in one step. Optimal control is a promising option in this regard. Unlike the objective function of CA, which only considers $\tau_e$ and the second goal, the performance function of optimal control can incorporate the tracking error and any other desired second goals. This means that command tracking and CA can be described using a single equation. However, solving the nonlinear Hamilton–Jacobi–Bellman (HJB) equation remains a formidable challenge.
Adaptive dynamic programming (ADP) provides new ideas for solving the nonlinear HJB equation. ADP is a heuristic algorithm for solving optimal control problems. Compared with other heuristic algorithms, such as reinforcement learning, ADP is supported by optimal control theory, so it shows better convergence and stability and is more suitable for flight control. The first application of ADP in optimal control can be found in a study by Werbos [19]. The basic idea of ADP is to use sampled data to drive a neural network to approximate the optimal value function. In this way, ADP turns the backward-in-time dynamic programming process into a forward-in-time one and greatly expands the application of optimal control. Regarding theoretical studies of ADP, Wei and Liu [20] gave the stability analysis of policy-iteration ADP, and the stability proof of value-iteration ADP was given by Al-Tamimi and Lewis [21]. Moreover, researchers have proposed different ADP frameworks, such as heuristic dynamic programming [22], dual heuristic programming [23], and globalized dual heuristic programming [24]. These studies lay the foundation of ADP, and a more detailed review of recent studies on ADP can be found in the paper by Liu et al. [25].
Model identification is a commonly adopted technique in recent applications of ADP in practical systems [26,27]. Model identification is an effective method for enhancing the robustness of ADP, but it requires introducing an identifier network. Compared to basic ADP, which uses only a critic network to approximate the value function, the incorporation of additional networks significantly increases the computational complexity. Therefore, approaches to alleviate the computational burden, such as the event-trigger technique [28], are necessary for these methods.
However, it is often overlooked that the optimal control itself could be robust with an appropriate design of performance function. This idea is illustrated in a book by Lin [29], and the author systematically discusses how to handle disturbance and model uncertainty in an optimal control way. This way, the ADP could enjoy robustness while avoiding heavy computational burdens. However, the author also points out that it is still an open question how to apply similar approaches to a nonaffine system. Most ADPs are developed to address the optimal regulation problem, but for flight control, the optimal tracking control has a more practical use. With the development of the aviation industry, modern flight control is no longer satisfied with just ensuring flight stability. Many researchers [30,31,32,33,34] began to consider how to track the command signal optimally.
In the control field, optimal tracking control has attracted increased attention. One of the most common optimal tracking methods is the combination of feedforward control and feedback control [35,36,37,38,39,40]. The feedforward control is a traditional steady-state tracking controller that ensures the command reference signal is tracked. ADP is used in the feedback control to stabilize the transient error optimally. With the help of the traditional steady-state tracking controller, this optimal tracker shows good stability. Nevertheless, this optimal tracker is not suitable for ICE. Designing a traditional steady-state tracking controller for ICE is already an arduous task, and the control allocation still needs to be considered in this process.
Some ADP-based optimal trackers do not rely on feedforward control [41,42,43]. These studies applied a discounted performance function to ensure the boundedness of the optimal value function over an infinite horizon and constructed an augmented system using the error dynamics and the reference signal dynamics to transform the tracking problem into a regulation problem. However, these methods require the dynamics of the reference signal, which are often unavailable, and this limits their use.
From another perspective, nonlinear dynamic inversion [44], a tried-and-tested method in the flight control area, constructs the dynamics of the desired signal using the state value and the command signal, which provides a new way to overcome this drawback. Moreover, these ADP methods cannot address nonaffine systems, so they cannot be directly used for ICE.
To apply ADP to ICE, the nonaffine control input must be considered. Recent optimal trackers for nonaffine systems can be grouped into two types. One type is mainly for single-input systems and decouples the nonaffine system into an affine system with model uncertainties [45,46,47], but it is not easy to extend such a method to multi-input systems. Using decoupling conditions, these methods show robustness, but neglecting model details also makes their optimization performance poor. The other type uses an additional neural network, known as the actor network, to handle the nonlinearity in the control input and updates the policy through a gradient-based algorithm [48,49,50]. This method performs well but needs more data and training, which increases the computational burden. Some techniques commonly used in reinforcement learning [34] could also help improve the convergence rate, but they come at the cost of lacking stability proofs.
Motivated by the aforementioned studies, this article proposes a critic-only ADP technique for the attitude tracking of ICE, which features nonaffine control inputs and redundant effectors. Through ADP, our approach integrates control and control allocation so that the same performance can be achieved at a lower cost than with conventional methods. Following the idea of nonlinear dynamic inversion, an augmented system is constructed. The optimal tracking problem is transformed into an optimal regulation problem with a discounted performance function, and the need for the command dynamics is avoided. Inspired by the successful use of $\partial f(x,u)/\partial u$ in incremental control, we introduce $\partial f(x,u)/\partial u$ into ADP, letting our method handle the nonaffine system in a simple way. Moreover, this article proves that, for the control of the nonaffine system, the robust tracking problem is equivalent to the optimal tracking problem with an augmented cost. This provides another way to improve the robustness of ADP, and complex model identification methods can be avoided.
The rest of the paper is arranged as follows: Section 2 introduces the aerodynamic model of the UAV. Section 3 gives the problem formulation and shows at the theoretical level that the optimal control with a specially designed performance function is equivalent to robust control. Section 4 presents the control scheme and stability analysis. Section 5 presents two simulations that validate the superiority of our method over the conventional approach and demonstrate its robustness, respectively. Finally, Section 6 gives the conclusion and the outlook for the next steps of research.

2. Model Description

This section introduces the ICE model. The basic parameters of ICE can be found in Table 1 [7], while more detailed information on the modeling of effectors can be found in Chapter 3 of [3]. Due to space constraints, this information will not be repeated here.
Figure 1 displays the layout of ICE, which features a high-sweep, tailless flying wing with a leading-edge sweep of 65 deg and 25 deg chevron shaping on the trailing edge. ICE is equipped with 13 independent effectors, including two pairs of leading-edge flaps (LEF), a pair of spoiler slot deflectors (SSD), a pair of all-moving tips (AMT), a pair of elevons (ELE), a pair of ganged pitch flaps (PF), and multi-axis thrust vectoring (MTV). Since this paper focuses on the cruising stage, MTV is not taken into account.
The deflection ranges of the effectors are, inboard LEF: 0–40 deg, outboard LEF: ±40 deg, ELE: ±30 deg, PF: ±30 deg, AMT: ±60 deg, SSD: 0–60 deg. The rate limits on the leading-edge devices are 40 deg/s and on all the other surfaces 150 deg/s.
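The position and rate limits listed above can be collected into a small data structure and applied to a deflection command. The following is a minimal sketch; the ordering of the 11 effectors, the names in `EFFECTORS`, and the helper `apply_limits` are assumptions introduced for illustration only.

```python
import numpy as np

# Effector order and names are assumptions of this sketch; angles are in degrees.
EFFECTORS = ["LEF_in_L", "LEF_in_R", "LEF_out_L", "LEF_out_R",
             "AMT_L", "AMT_R", "ELE_L", "ELE_R", "SSD_L", "SSD_R", "PF"]
POS_MIN = np.array([0, 0, -40, -40, -60, -60, -30, -30, 0, 0, -30], dtype=float)
POS_MAX = np.array([40, 40, 40, 40, 60, 60, 30, 30, 60, 60, 30], dtype=float)
# 40 deg/s for the leading-edge devices, 150 deg/s for all other surfaces.
RATE_MAX = np.array([40, 40, 40, 40, 150, 150, 150, 150, 150, 150, 150], dtype=float)

def apply_limits(delta_cmd, delta_prev, dt):
    """Apply the rate limit first, then the position limit, over one step dt (s)."""
    max_step = RATE_MAX * dt
    step = np.clip(delta_cmd - delta_prev, -max_step, max_step)
    return np.clip(delta_prev + step, POS_MIN, POS_MAX)

delta_up = apply_limits(np.full(11, 100.0), np.zeros(11), dt=0.1)
delta_down = apply_limits(np.full(11, -100.0), np.zeros(11), dt=0.1)
```

Starting from zero deflection, a large positive command moves the leading-edge flaps by 4 deg and the other surfaces by 15 deg in 0.1 s, while a large negative command leaves the one-sided surfaces (inboard LEF, SSD) at their lower bound of 0 deg.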
The modeling of the UAV is based on the following two assumptions: first, the UAV flies in the atmosphere, and the atmosphere is incompressible; second, the UAV’s body is rigid. Note that only the body of the UAV is considered rigid; the effectors are deformable.
Remark 1.
In previous studies [8,44,51], the MTV was used solely during the UAV’s vigorous maneuvers or when other effectors were saturated. However, this paper proposes a cruise-oriented approach where ADP enables a superior trade-off between effector deflection and tracking error. This effectively eliminates the need for MTV, ensuring that effector saturation is avoided.
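The equations of motion of the 6-DOF UAV model are given below [14,18,52]. The nomenclature of the variables in the following equations can be found in Table 2.
$$\begin{bmatrix} \dot{V} \\ \dot{\chi} \\ \dot{\gamma} \end{bmatrix} = \begin{bmatrix} \frac{1}{M} & 0 & 0 \\ 0 & \frac{1}{M V c_\gamma} & 0 \\ 0 & 0 & -\frac{1}{M V} \end{bmatrix} \cdot \left( T_{ve} G_f + F \right) \tag{1}$$
where $V$, $\gamma$, and $\chi$ are the airspeed, flight path angle, and ground tracking angle, respectively; $s$ and $c$ represent $\sin$ and $\cos$. $F$ denotes the sum of the aerodynamic force and thrust, which can be approximated through accelerometers, and $G_f = [0 \; 0 \; Mg]^T$ represents the gravitational force. Define $\mu$, $\alpha$, and $\beta$ as the bank angle, angle of attack, and sideslip angle; then the dynamics of $[\mu \; \alpha \; \beta]^T$ are:
$$\begin{bmatrix} \dot{\mu} \\ \dot{\alpha} \\ \dot{\beta} \end{bmatrix} = \begin{bmatrix} c_\alpha c_\beta & 0 & s_\alpha \\ s_\beta & 1 & 0 \\ s_\alpha c_\beta & 0 & -c_\alpha \end{bmatrix}^{-1} \left( -T_{bv}^{T} \begin{bmatrix} -\dot{\chi} s_\gamma \\ \dot{\gamma} \\ \dot{\chi} c_\gamma \end{bmatrix} + \begin{bmatrix} p \\ q \\ r \end{bmatrix} \right) \tag{2}$$
where $p$, $q$, and $r$ are the body-axis roll, pitch, and yaw rates. $T_{vb}$ is the transformation matrix from the body frame to the velocity frame, and $T_{ve}$ is the transformation matrix from the earth frame to the velocity frame. These matrices are given in [53]:
$$T_{vb} = \begin{bmatrix} c_\alpha c_\beta & s_\beta & s_\alpha c_\beta \\ -c_\alpha s_\beta c_\mu + s_\alpha s_\mu & c_\beta c_\mu & -s_\alpha s_\beta c_\mu - c_\alpha s_\mu \\ -c_\alpha s_\beta s_\mu - s_\alpha c_\mu & c_\beta s_\mu & -s_\alpha s_\beta s_\mu + c_\alpha c_\mu \end{bmatrix} \tag{3}$$
$$T_{ve} = \begin{bmatrix} c_\chi c_\gamma & s_\chi c_\gamma & -s_\gamma \\ -s_\chi & c_\chi & 0 \\ c_\chi s_\gamma & s_\chi s_\gamma & c_\gamma \end{bmatrix} \tag{4}$$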
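As a quick sanity check on the two transformation matrices, the sketch below builds them numerically and allows their orthogonality to be verified. The sign pattern follows the standard aerospace convention for these rotations, which is an assumption of this example; the function names `T_vb` and `T_ve` are simply chosen to mirror the symbols in the text.

```python
import numpy as np

def T_vb(mu, alpha, beta):
    """Body frame -> velocity frame (angles in radians)."""
    sa, ca = np.sin(alpha), np.cos(alpha)
    sb, cb = np.sin(beta), np.cos(beta)
    sm, cm = np.sin(mu), np.cos(mu)
    return np.array([
        [ca * cb, sb, sa * cb],
        [-ca * sb * cm + sa * sm, cb * cm, -sa * sb * cm - ca * sm],
        [-ca * sb * sm - sa * cm, cb * sm, -sa * sb * sm + ca * cm],
    ])

def T_ve(chi, gamma):
    """Earth frame -> velocity frame (angles in radians)."""
    sc, cc = np.sin(chi), np.cos(chi)
    sg, cg = np.sin(gamma), np.cos(gamma)
    return np.array([
        [cc * cg, sc * cg, -sg],
        [-sc, cc, 0.0],
        [cc * sg, sc * sg, cg],
    ])

Tvb = T_vb(0.2, 0.1, -0.05)
Tve = T_ve(0.4, 0.1)
# Gravity per unit mass resolved in the velocity frame: g * [-sin(gamma), 0, cos(gamma)].
g_vel = Tve @ np.array([0.0, 0.0, 9.81])
```

Both matrices are proper rotations, so their transposes act as the inverse transformations.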
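The dynamics of $[p \; q \; r]^T$ are:
$$\begin{bmatrix} \dot{p} \\ \dot{q} \\ \dot{r} \end{bmatrix} = J^{-1} \left( M_a - \begin{bmatrix} p \\ q \\ r \end{bmatrix} \times J \begin{bmatrix} p \\ q \\ r \end{bmatrix} \right) \tag{5}$$
where $J$ is the inertia matrix, defined as:
$$J = \begin{bmatrix} I_{xx} & 0 & -I_{xz} \\ 0 & I_{yy} & 0 \\ -I_{zx} & 0 & I_{zz} \end{bmatrix} \tag{6}$$
and $M_a = [l \; m \; n]^T$ is the aerodynamic moment, defined by:
$$M_a = \begin{bmatrix} l \\ m \\ n \end{bmatrix} = \bar{q} S \begin{bmatrix} b \cdot C_l \\ \bar{c} \cdot C_m \\ b \cdot C_n \end{bmatrix} = \bar{q} S \begin{bmatrix} b \cdot \left( C_{l,base}(\alpha, \beta, V) + \sum_{i=1}^{j} C_{l,i}(\alpha, \beta, \delta) \right) \\ \bar{c} \cdot \left( C_{m,base}(\alpha, \beta, V) + \sum_{i=1}^{j} C_{m,i}(\alpha, \beta, \delta) \right) \\ b \cdot \left( C_{n,base}(\alpha, \beta, V) + \sum_{i=1}^{j} C_{n,i}(\alpha, \beta, \delta) \right) \end{bmatrix} \tag{7}$$
where $\bar{q}$ is the dynamic pressure, $b$ is the span, $\bar{c}$ is the mean aerodynamic chord, $\delta \in \delta_{\max}$ represents the deflection of the effectors, $\delta_{\max}$ is the deflection range of the effectors [3], and $C_{\cdot,base}(\alpha, \beta, V)$ and $\sum_{i=1}^{j} C_{\cdot,i}(\alpha, \beta, \delta)$ are the aerodynamic coefficients generated by the body and the control surfaces, respectively.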
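The rotational dynamics above can be evaluated in a few lines. The following is a minimal sketch; the inertia values in `J_EXAMPLE` are hypothetical placeholders, not the ICE parameters from Table 1, and the sign convention of the off-diagonal terms is assumed to be the standard one.

```python
import numpy as np

# Hypothetical inertia values (kg m^2), for illustration only.
I_XX, I_YY, I_ZZ, I_XZ = 35000.0, 78000.0, 110000.0, 500.0
J_EXAMPLE = np.array([[I_XX, 0.0, -I_XZ],
                      [0.0, I_YY, 0.0],
                      [-I_XZ, 0.0, I_ZZ]])

def angular_accel(omega, moment, J):
    """Return [p_dot, q_dot, r_dot] = J^{-1} (M_a - omega x (J omega))."""
    omega = np.asarray(omega, dtype=float)
    moment = np.asarray(moment, dtype=float)
    # Solving the linear system avoids forming J^{-1} explicitly.
    return np.linalg.solve(J, moment - np.cross(omega, J @ omega))

# A pure rolling moment applied at rest.
omega_dot = angular_accel([0.0, 0.0, 0.0], [1000.0, 0.0, 0.0], J_EXAMPLE)
```

Because of the cross product of inertia, a pure rolling moment also produces a small yaw acceleration in this example.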
The coordinate system involved in the above kinetic equations is shown in Figure 2. Equation (1) is defined in the tangent-plane coordinate system, which is aligned as a geographic system but has its origin fixed at a point of interest on the spheroid; Equation (2) is defined in the wind-axes system, and Equation (5) is defined in the body-fixed coordinate system. The relationship between the wind-axes system and the body-fixed coordinate system is shown in Figure 1. The origin of both is at the UAV’s center of gravity, but the X-axis of the body-fixed coordinate system points in the direction of the nose, and the X-axis of the wind-axes system points in the direction of the relative wind [53].
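To facilitate the design of the attitude tracker, we first construct a control-oriented model that considers disturbances and model uncertainty caused by inaccurate aerodynamic parameters. Define $x_1 = [\mu \; \alpha \; \beta]^T$ and $x_2 = [p \; q \; r]^T$; then
$$\begin{aligned} \dot{x}_1 &= f_1(x_1) + g_1(x_1) x_2 \\ \dot{x}_2 &= f_2(x_2) + g_2(x_1, \delta) + d_t \end{aligned} \tag{8}$$
where
$$f_1(x_1) = -\begin{bmatrix} c_\alpha c_\beta & 0 & s_\alpha \\ s_\beta & 1 & 0 \\ s_\alpha c_\beta & 0 & -c_\alpha \end{bmatrix}^{-1} \cdot T_{bv}^{T} \begin{bmatrix} -\dot{\chi} s_\gamma \\ \dot{\gamma} \\ \dot{\chi} c_\gamma \end{bmatrix} \tag{9}$$
$$g_1(x_1) = \begin{bmatrix} c_\alpha c_\beta & 0 & s_\alpha \\ s_\beta & 1 & 0 \\ s_\alpha c_\beta & 0 & -c_\alpha \end{bmatrix}^{-1} \tag{10}$$
$$f_2(x_2) = -J^{-1} \left( x_2 \times J x_2 \right) \tag{11}$$
$$g_2(x_1, \delta) = J^{-1} M_a - g_{2,e}(x_1, \delta) \tag{12}$$
$$d_t = g_{2,e}(x_1, \delta) + d \tag{13}$$
where $d_t$ stands for the total uncertainty, $g_{2,e}$ represents the control effectiveness that the aerodynamic data fail to capture, and $d$ represents the external disturbances.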

3. Problem Formulation

The control structure is shown in Figure 3, where the subscript $c$ denotes a command signal. The control system consists of two parts: attitude control using nonlinear dynamic inversion (NDI) and angular rate control using ADP. The attitude control provides a proper angular rate command so that the UAV can track the attitude command. Assuming that the derivative of the attitude command is known, the angular rate command is given as [44]:
$$x_{2c} = g_1^{-1}(x_1) \cdot \left( \dot{x}_{1d} - f_1(x_1) \right) \tag{14}$$
where $x_{2c} = [p_c, q_c, r_c]^T$.
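The NDI rate command can be computed without any matrix inversion, since $g_1(x_1)$ is itself defined as the inverse of a kinematic matrix. The sketch below illustrates this; the function names and the numerical values are hypothetical, the sign of the last matrix entry follows the standard convention (an assumption here), and `f1` is treated as a given vector rather than computed from the flight path dynamics.

```python
import numpy as np

def kinematic_matrix(alpha, beta):
    """Matrix whose inverse is g1(x1); equivalently, this is g1^{-1}(x1)."""
    sa, ca = np.sin(alpha), np.cos(alpha)
    sb, cb = np.sin(beta), np.cos(beta)
    return np.array([[ca * cb, 0.0, sa],
                     [sb, 1.0, 0.0],
                     [sa * cb, 0.0, -ca]])

def rate_command(x1_dot_des, f1, alpha, beta):
    """x2c = g1^{-1}(x1) (x1d_dot - f1(x1)), with angles in radians."""
    return kinematic_matrix(alpha, beta) @ (np.asarray(x1_dot_des) - np.asarray(f1))

alpha, beta = 0.1, 0.05
f1 = np.array([0.01, -0.02, 0.005])      # hypothetical value of f1(x1)
x1d_dot = np.array([0.1, 0.05, 0.0])     # desired attitude rates
x2c = rate_command(x1d_dot, f1, alpha, beta)
```

Substituting `x2c` back into the attitude dynamics returns the desired attitude rate, which is the defining property of the inversion. The kinematic matrix has determinant $-c_\beta$, so it stays invertible as long as the sideslip angle is away from ±90 deg.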
The goal of the angular rate control is to provide effector deflections so that the angular rate command is tracked optimally according to the performance function. First, we construct an augmented system:
$$\dot{x}_2 = f_2(x_2) + g_2(x_1, \delta) \tag{15}$$
$$\dot{x}_{2d} = \lambda \left( x_{2c} - x_2 \right) \tag{16}$$
where $\lambda$ is positive definite, and $x_{2d}$ represents the desired angular rate.
Define $X(t) = [x_2^T, x_{2d}^T]^T$, and rewrite the augmented system, including the total uncertainty, in the compact form:
$$\dot{X} = F(X) + G(X, \delta) + C + D \tag{17}$$
where $F(X) = \begin{bmatrix} f_2(x_2) \\ -\lambda x_2 \end{bmatrix}$, $G(X, \delta) = \begin{bmatrix} g_2(x_1, \delta) \\ 0_{3 \times 1} \end{bmatrix}$, $C = \begin{bmatrix} 0_{3 \times 1} \\ \lambda x_{2c} \end{bmatrix}$, and $D = \begin{bmatrix} d_t \\ 0_{3 \times 1} \end{bmatrix}$.
Remark 2.
From Equation (14), it can be seen that it is difficult to obtain the dynamics of $x_{2c}$ because of its dependence on the second-order derivative of $x_{1d}$. In many control methods, such as incremental control methods [14], $\dot{x}_{1c}$ is readily available, but $\dot{x}_{2c}$ is usually obtained through numerical differentiation, which is not only sensitive to noise but also increases the computational complexity. Therefore, in ADP, the dynamics of $x_{2d}$ are constructed using NDI, and the use of $\ddot{x}_{1d}$ is avoided in this way.
To facilitate the following analysis, the following assumptions are given:
Assumption 1.
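The total model uncertainty is bounded, i.e., there exists $D_{\max} > 0$ such that $\|D\| < D_{\max}$.
Assumption 2.
$\dfrac{\partial g_2(x_1, \delta)}{\partial \delta} \in \mathbb{R}^{3 \times 11}$ is row nonsingular, i.e., for $\|D\| < D_{\max}$, there exists $|\Delta\delta| < \bar{\delta}$, $\bar{\delta} > 0$, such that $\dfrac{\partial g_2(x_1, \delta)}{\partial \delta} \Delta\delta = D$.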
Remark 3.
From Equation (8), it is obvious that the total model uncertainties can be equated to time-varying aerodynamic moment disturbances. Put plainly, Assumption 2 means that such disturbances are contained in the attainable moment set of ICE, and considering the redundant effectors that ICE is equipped with, its attainable moment set [54] has already been greatly expanded. From Equation (13), it can also be seen that $g_2(x_1, \delta)$ reflects the aerodynamic characteristics of ICE only to a certain degree. Apart from the accuracy loss in the wind-tunnel test, the raw aerodynamic data must also be tailored to make them more suitable for flight control design, and the aerodynamic effects that $g_2(x_1, \delta)$ fails to reflect are treated as model uncertainties. Therefore, it is reasonable to say that $g_2(x_1, \delta)$ has the properties stated in Assumption 2.
Specifically, the angular rate control ensures that $x_2$ tracks $x_{2c}$ by minimizing a performance function. When the command signal does not converge to 0, the control input also tends not to converge to 0. To ensure the boundedness of the performance function, the following discounted performance function is introduced:
$$J(X, \delta) = \int_t^{\infty} e^{-\upsilon(\tau - t)} \left( \bar{\delta}^T R \bar{\delta} + e_2^T Q e_2 + \delta^T R \delta \right) d\tau \tag{18}$$
where $e_2 = x_2 - x_{2d}$, $Q$ and $R$ are positive definite matrices, and $\upsilon > 0$ is the discount factor. Compared to the conventional performance function, the upper bound $\bar{\delta}$ associated with the model uncertainties is introduced into the performance function.
Remark 4.
As a tried-and-tested control algorithm, NDI has been used in flight control for decades. In NDI, by feeding back the error between $x_2$ and $x_{2d}$, the tracking of $x_{2c}$ can be achieved. Therefore, in the above performance function design, we also use the feedback of $x_{2d}$ instead of $x_{2c}$.
Since the exact value of model uncertainties is unavailable, only the optimal tracker for the following nominal system that excludes the model uncertainties can be obtained:
$$\dot{X} = F(X) + G(X, \delta) + C \tag{19}$$
However, the optimal tracker designed for the nominal system using the performance function in Equation (18) is capable of handling model uncertainties. This point will be elaborated on later, and the optimal tracker for the nominal system is derived as follows:
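Define $Q_T = [I_{3 \times 3}, -I_{3 \times 3}]^T Q \, [I_{3 \times 3}, -I_{3 \times 3}]$, so that $X^T Q_T X = e_2^T Q e_2$; then the discounted performance function can be rewritten as:
$$J(X, \delta) = \int_t^{\infty} e^{-\upsilon(\tau - t)} \left( \bar{\delta}^T R \bar{\delta} + X^T Q_T X + \delta^T R \delta \right) d\tau \tag{20}$$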
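The rewriting of the tracking-error penalty in terms of the augmented state can be checked numerically. In the minimal sketch below, the weighting matrix `Q` and the state values are hypothetical, and the selection matrix `S` is a name introduced for this example to represent $[I_{3 \times 3}, -I_{3 \times 3}]$, so that $e_2 = S X$.

```python
import numpy as np

Q = np.diag([1.0, 2.0, 3.0])                 # hypothetical weighting matrix
S = np.hstack([np.eye(3), -np.eye(3)])       # e2 = S @ X with X = [x2; x2d]
Q_T = S.T @ Q @ S                            # 6 x 6 weight on the augmented state

x2 = np.array([0.10, -0.05, 0.02])           # hypothetical angular rates
x2d = np.array([0.08, 0.00, 0.01])           # hypothetical desired rates
X = np.concatenate([x2, x2d])
e2 = x2 - x2d

lhs = X @ Q_T @ X                            # X^T Q_T X
rhs = e2 @ Q @ e2                            # e2^T Q e2
```

Note that `Q_T` is only positive semidefinite: any augmented state with `x2 == x2d` incurs zero tracking cost.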
Then define the optimal value function:
$$V(X) = \min J(X, \delta) \tag{21}$$
Remark 5.
It is worth noting that the constraints imposed by the effector are not taken into account when solving the optimal control problem. However, it is possible to effectively prevent effector saturation by adjusting the weights in the performance function. This approach is widely adopted in the solution of optimization problems.
Differentiating Equation (20) and noting Equation (21) gives the following Hamilton–Jacobi–Bellman (HJB) equation:
$$H(V, \delta) \equiv \bar{\delta}^T R \bar{\delta} + X^T Q_T X + \delta^T R \delta - \upsilon V + V_X^T \left( F(X) + G(X, \delta) + C \right) = 0 \tag{22}$$
where $V_X = \partial V / \partial X$. Applying the stationarity condition $\partial H(V, \delta) / \partial \delta = 0$, we obtain the optimal tracker:
$$\delta^* = -\frac{1}{2} R^{-1} G_\delta^T V_X \tag{23}$$
where $G_\delta = \partial G(X, \delta) / \partial \delta$. If the HJB equation is solved, the optimal control can be obtained. Therefore, the main issue of this paper is to solve the HJB equation using ADP.
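The stationarity step can be spelled out as a short worked derivation. This is a sketch that assumes $R$ is symmetric and that $F(X)$, $C$, and $\bar{\delta}$ do not depend on $\delta$, so only the quadratic control penalty and the term $V_X^T G(X, \delta)$ contribute to the derivative.

```latex
\begin{aligned}
\frac{\partial H(V,\delta)}{\partial \delta}
  &= \frac{\partial}{\partial \delta}\left(\delta^{T} R\,\delta\right)
   + \left(\frac{\partial G(X,\delta)}{\partial \delta}\right)^{T} V_X
   = 2R\,\delta + G_\delta^{T} V_X = 0 \\
\Rightarrow\quad \delta^{*} &= -\tfrac{1}{2}\, R^{-1} G_\delta^{T} V_X
\end{aligned}
```

Because $G_\delta$ itself depends on $\delta$ in a nonaffine system, this expression is implicit in general. Evaluating $G_\delta$ at the most recent deflection, in the spirit of incremental control, is one possible way to use it; that choice is an assumption of this note rather than something stated in the derivation above.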
Remark 6.
From Equation (23), it can be seen that the only difference between the proposed optimal control for multi-input nonaffine systems and traditional optimal control lies in the use of $G_\delta^T$, which is obtained through numerical differentiation of aerodynamic data. In theoretical studies of the optimal control of nonaffine systems [49,50,55,56,57], the nonaffine part of the system is usually treated as completely unknown. Therefore, complex model identification methods are needed in these studies. However, for realistic control systems, even if an accurate analytical model is not available, a data-based model can be built by observing the input and output of the system. With the improvement of wind-tunnel tests for UAV modeling, the aerodynamic data obtained are more accurate than ever. The aforementioned theoretical studies, however, have not made sufficient use of these data, which is a considerable waste. In this paper, the wind-tunnel data are used to assist ADP. By introducing $D_{\max}$ into the performance function, the ADP shows robustness against uncertainties in the wind-tunnel data. Compared with online model identification, considerable computational resources can be saved in this way.
The following lemma shows that with the performance function Equation (20), the optimal tracker in Equation (23) shows robustness against model uncertainties.
Lemma 1.
Assume that the optimal control $\delta^*$ of the nominal system (19) with performance function (20) exists. Then $\delta^*$ also makes the system with model uncertainties (17) asymptotically stable.
Proof. 
To facilitate the proof, an auxiliary system is proposed:
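$$\begin{aligned} \dot{X} &= F(X) + G(X, \delta) + C + D \\ \dot{a} &= -\frac{1}{2} \upsilon a \end{aligned}$$
It is obvious that the subsystem $\dot{a} = -\frac{1}{2} \upsilon a$ is asymptotically stable. Therefore, as long as the auxiliary system is proven to be asymptotically stable, Lemma 1 holds. Consider the optimal value function in Equation (21): for all $X \neq 0$, $V > 0$, and $V = 0$ only when $X = 0$. Therefore, choose the Lyapunov function as $V_R = a^2 V$; the time derivative of $V_R$ is:
$$\dot{V}_R = a^2 \dot{V} - a^2 \upsilon V = a^2 \left[ V_X^T \left( F(X) + G(X, \delta) + C \right) + V_X^T D - \upsilon V \right]$$
According to the HJB equation and the stationarity condition, we have:
$$\dot{V}_R = a^2 \left[ -\bar{\delta}^T R \bar{\delta} - X^T Q_T X - \delta^T R \delta + \upsilon V - 2 \delta^T R^{\frac{1}{2}} \cdot R^{\frac{1}{2}} G_\delta^{+} D - \upsilon V \right]$$
where $G_\delta^{+}$ is the Moore-Penrose inverse of $G_\delta$. According to Assumption 2, $G_\delta^{+} D = \Delta\delta$, such that:
$$\begin{aligned} \dot{V}_R &= a^2 \left[ -\bar{\delta}^T R \bar{\delta} + \Delta\delta^T R \Delta\delta - X^T Q_T X - \delta^T R \delta - 2 \delta^T R^{\frac{1}{2}} \cdot R^{\frac{1}{2}} \Delta\delta - \Delta\delta^T R \Delta\delta \right] \\ &= a^2 \left[ -\bar{\delta}^T R \bar{\delta} + \Delta\delta^T R \Delta\delta - X^T Q_T X - \left( R^{\frac{1}{2}} \delta + R^{\frac{1}{2}} \Delta\delta \right)^T \left( R^{\frac{1}{2}} \delta + R^{\frac{1}{2}} \Delta\delta \right) \right] \end{aligned}$$
According to Assumption 2, $|\Delta\delta| < \bar{\delta}$ and $R$ is positive definite, so $-\bar{\delta}^T R \bar{\delta} + \Delta\delta^T R \Delta\delta < 0$. Therefore, $\dot{V}_R$ is negative definite, the auxiliary system is asymptotically stable, and Lemma 1 is proven.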

4. Main Result

As mentioned above, the main issue of this paper is solving the HJB equation and obtaining the optimal value function. The following single-layer neural network (NN) is applied to approximate the optimal value function:
$$V(X) = W^T \Psi(X) + \varsigma \tag{26}$$
where $\Psi(X) \in \mathbb{R}^l$ is the activation function vector, $l$ is the number of neurons, and $\varsigma$ is the approximation error. The derivative of $V(X)$ with respect to the state is:
$$V_X = \nabla\Psi \, W + \nabla\varsigma \tag{27}$$
where $W \in \mathbb{R}^l$ is the vector of unknown ideal weights, $\nabla\Psi = \partial \Psi^T / \partial X$, and $\nabla\varsigma = \partial \varsigma / \partial X$.
Suppose that the optimal value function is continuous and defined on a bounded closed interval. According to the Weierstrass approximation theorem [58], as $l$ increases, the optimal value function can be uniformly approximated by the NN with arbitrarily high precision, which means that $\varsigma$ and $\nabla\varsigma$ can be made arbitrarily small. In practice, we use a critic NN to approximate the optimal value function:
$$\hat{V}(X) = \hat{W}^T \Psi(X) \tag{28}$$
where $\hat{W}$ is the estimate of the unknown weights $W$. Then the near-optimal control $\hat{\delta}$ can be obtained:
$$\hat{\delta} = -\frac{1}{2} R^{-1} G_\delta^T \nabla\Psi \, \hat{W} \tag{29}$$
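A critic of this form and the resulting near-optimal control can be sketched as follows. The quadratic monomial basis, the function names, and the random control-effectiveness matrix `G_delta` are assumptions made for illustration; the source does not specify the activation functions at this point.

```python
import numpy as np
from itertools import combinations_with_replacement

def quad_basis(X):
    """Psi(X): all monomials X[i] * X[j] with i <= j (assumed basis)."""
    idx = combinations_with_replacement(range(len(X)), 2)
    return np.array([X[i] * X[j] for i, j in idx])

def quad_basis_grad(X):
    """Gradient matrix of shape (n, l); column k holds d Psi_k / dX."""
    n = len(X)
    idx = list(combinations_with_replacement(range(n), 2))
    grad = np.zeros((n, len(idx)))
    for k, (i, j) in enumerate(idx):
        grad[i, k] += X[j]   # derivative with respect to X[i]
        grad[j, k] += X[i]   # derivative with respect to X[j]; sums to 2*X[i] if i == j
    return grad

def near_optimal_control(X, W_hat, G_delta, R):
    """delta_hat = -1/2 R^{-1} G_delta^T (grad Psi) W_hat."""
    V_X = quad_basis_grad(X) @ W_hat
    return -0.5 * np.linalg.solve(R, G_delta.T @ V_X)

rng = np.random.default_rng(0)
X = rng.normal(size=6)                    # augmented state [x2; x2d]
W_hat = rng.normal(size=21)               # 21 quadratic monomials for n = 6
G_delta = rng.normal(size=(6, 11))        # hypothetical dG/d(delta), 11 effectors
R = np.eye(11)
delta_hat = near_optimal_control(X, W_hat, G_delta, R)
```

In an actual controller, `G_delta` would come from differentiating the aerodynamic data at the current operating point, with its lower three rows equal to zero because the desired-rate dynamics do not depend on the effectors.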
In this paper, ADP updates the critic NN online using sampled data. As ADP obtains more sampled data, $\hat{W}$ approaches $W$ gradually. The rest of this section contains two parts. The first part introduces the update law of the critic NN. The second part is the stability analysis, in which we discuss why $\hat{W}$ approaches $W$ and why the system stays stable.

4.1. Update Law for Critic NN

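The goal of the update scheme is to minimize $\tilde{W} = W - \hat{W}$, the estimation error of the unknown NN weights. To derive the update law, substituting Equations (26) and (27) into the HJB Equation (22), we have:
$$\begin{aligned} 0 = H(V, \delta) &= \bar{\delta}^T R \bar{\delta} + X^T Q_T X + \delta^T R \delta - \upsilon \left( W^T \Psi(X) + \varsigma \right) + \left( W^T \nabla\Psi^T + \nabla\varsigma^T \right) \left( F(X) + G(X, \delta) + C \right) \\ &= \Lambda + \hat{W}^T \Upsilon + \tilde{W}^T \Upsilon + \varsigma_H \end{aligned} \tag{30}$$
where
$$\Lambda = \bar{\delta}^T R \bar{\delta} + X^T Q_T X + \delta^T R \delta \tag{31}$$
$$\Upsilon = -\upsilon \Psi(X) + \nabla\Psi^T \left( F(X) + G(X, \delta) + C \right) \tag{32}$$
$$\varsigma_H = -\upsilon \varsigma + \nabla\varsigma^T \left( F(X) + G(X, \delta) + C \right) \tag{33}$$
In Equation (30), both the estimation error $\tilde{W}$ and the weights of the critic NN $\hat{W}$ appear linearly. As discussed above, $\varsigma_H$ is bounded and tends to zero as $l$ increases. Moreover, the remaining variables $\Upsilon$ and $\Lambda$ are accessible. This allows us to design an update law that minimizes the estimation error. Therefore, we define the following filter:
$$\begin{aligned} \dot{P} &= -k P + \Upsilon \Upsilon^T, \quad P(0) = 0 \\ \dot{Q} &= -k Q + \Upsilon \Lambda, \quad Q(0) = 0 \end{aligned} \tag{34}$$
where $k \in \mathbb{R}^+$. The solution of the filter is:
$$\begin{aligned} P &= \int_0^t e^{-k(t - \tau)} \Upsilon \Upsilon^T d\tau \\ Q &= \int_0^t e^{-k(t - \tau)} \Upsilon \Lambda \, d\tau \end{aligned} \tag{35}$$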
According to Equations (30) and (35), we have:
0 = Q + P W ^ + P W ˜ + μ
where μ = 0 t e k ( t τ ) Υ ς H d τ . According to the above analysis, with sufficient neurons, ς 0 and ς 0 . Therefore, it is reasonable to find that μ is bounded, i.e., | μ | < μ ¯ , where μ ¯ R + .
Define an auxiliary vector $M \in \mathbb{R}^l$:
$$M = Q + P\hat{W}$$
According to Equation (36), $M$ can be seen as a measure of the estimation error of the NN weights:
$$M = -P\tilde{W} - \mu$$
Hence, we obtain the online update law of $\hat{W}$ as:
$$\dot{\hat{W}} = -KM$$
where $K$ is a constant positive-definite matrix.
The update law in this paper is fundamentally different from some current methods [41,42] that employ gradient-based algorithms to minimize the Bellman error and can only ensure uniform ultimate boundedness (UUB). The update law used here represents the unknown estimation error of the NN weights through measurable state values and the critic NN weights. In this way, the estimation error of the NN weights can be ensured to converge, and this good convergence makes the update law more suitable for online control systems. In what follows, we will show that $\hat{W}$ converges to a neighborhood of $W$.
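To make the mechanics of the filtered update law concrete, the following minimal Python sketch integrates the filter and the law $\dot{\hat{W}} = -KM$ with a forward-Euler scheme. It is a toy illustration rather than the UAV controller: the two-element regressor, the gain $K = 20 \cdot I$, the "true" weights, and the zero residual ($\varsigma_H = 0$, so that $\Lambda = -W^T\Upsilon$) are all assumptions chosen for the example.

```python
import math


def simulate_critic_update(w_true, k=1.0, gain=20.0, dt=0.001, t_end=20.0):
    """Euler-integrate the filter (P, Q) and the law dW_hat/dt = -K M.

    Toy assumptions: exactly two weights, a hand-picked regressor Upsilon(t),
    K = gain * I, and zero residual so that 0 = Lambda + W^T Upsilon holds.
    """
    l = len(w_true)
    w_hat = [0.0] * l
    P = [[0.0] * l for _ in range(l)]
    Q = [0.0] * l
    for step in range(int(round(t_end / dt))):
        t = step * dt
        ups = [math.sin(t), math.cos(2.0 * t)]  # hypothetical PE regressor
        lam = -sum(w * u for w, u in zip(w_true, ups))  # Lambda = -W^T Upsilon
        # Auxiliary vector M = Q + P W_hat, built from measurable quantities only.
        M = [Q[i] + sum(P[i][j] * w_hat[j] for j in range(l)) for i in range(l)]
        # Update law: dW_hat/dt = -K M.
        w_hat = [w_hat[i] - dt * gain * M[i] for i in range(l)]
        # Filter: dP/dt = -k P + Ups Ups^T, dQ/dt = -k Q + Ups Lambda.
        for i in range(l):
            for j in range(l):
                P[i][j] += dt * (-k * P[i][j] + ups[i] * ups[j])
            Q[i] += dt * (-k * Q[i] + ups[i] * lam)
    return w_hat


w_true = [2.0, -1.0]
w_hat = simulate_critic_update(w_true)
weight_error = max(abs(a - b) for a, b in zip(w_true, w_hat))
print(w_hat, weight_error)
```

In this idealized setting $Q = -PW$ holds exactly, so $M = -P\tilde{W}$ and the estimate is pulled toward the true weights whenever $P$ is positive definite. With a nonzero residual, the same loop would only be expected to reach a neighborhood of $W$.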
Remark 7.
The idea of this paper is to use the optimal tracker of the nominal system, with a modified performance function, to handle the system with disturbances and model uncertainties. This idea applies well to linear systems [29], since the optimal control for a linear nominal system is easily obtained by solving the Riccati equation. For a nonlinear system, however, the HJB equation must be solved to obtain the optimal control. Algorithms such as ADP, which solve the HJB equation online, give an approximate solution based on sampled data. For a system that suffers from disturbances, however, the state of the nominal system cannot be measured. According to Equations (31), (32) and (34), the information used by the update law includes the quadratic function of the effector deflection $\delta^T R \delta$, the nominal system dynamics $F(X) + G(X,\delta) + C$, and the quadratic function of the state $X^T Q_T X$. The terms $\delta^T R \delta$ and $F(X) + G(X,\delta) + C$ are directly accessible. Only $X^T Q_T X$ is affected by disturbances and may influence the update law.
To address this problem, the filter system in Equation (34) is used. Instead of using these values directly, the update law uses values that have been processed by the filter. In this way, the influence of disturbances on $X^T Q_T X$ can be mitigated to a certain extent, which makes the update law more applicable. To illustrate this point, the following example is introduced:
For the system with disturbance:
$$\dot{x}_1 = -x_1 + x_2 + d, \qquad \dot{x}_2 = x_3, \qquad \dot{x}_3 = -x_2$$
where d is the disturbance.
Figure 4 shows the unfiltered value of $x_1^2$ and the value of $x_1^2$ filtered by $\frac{1}{0.8s+1}$. The real system is affected by the disturbance, while the nominal system is not. Figure 4a shows a distinct difference between the unfiltered $x_1^2$ of the real system and that of the nominal system. After both signals are filtered, the difference becomes significantly smaller, as shown in Figure 4b. With well-chosen filter parameters, the filtered $x_1^2$ of the real system approximates that of the nominal system closely.
It should be noted that, since the filter used in this paper is only the simplest first-order low-pass filter, this method achieves good results when the frequency of the disturbances is higher than the frequency of the state. Considering that high-frequency disturbance is a common kind of disturbance in control systems, the applicability of this method is acceptable. To cope with more complex situations, more targeted filters can be used according to knowledge of the disturbance.
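The filtering effect described in this remark can be reproduced qualitatively with a short Python sketch. Several ingredients are assumptions made for illustration only: the stable form $\dot{x}_1 = -x_1 + x_2 + d$, $\dot{x}_2 = x_3$, $\dot{x}_3 = -x_2$ of the example system, the initial state, the disturbance $d = 2\sin(20t)$, and the forward-Euler integration. Only the filter $\frac{1}{0.8s+1}$ is taken from the text.

```python
import math


def simulate(disturbed, dt=0.001, t_end=20.0, tau=0.8):
    """Return samples of x1^2 and of x1^2 passed through 1 / (tau s + 1)."""
    x1, x2, x3 = 0.0, 1.0, 0.0  # assumed initial state
    y = x1 * x1                 # filter state
    raw, filtered = [], []
    for step in range(int(round(t_end / dt))):
        t = step * dt
        d = 2.0 * math.sin(20.0 * t) if disturbed else 0.0  # hypothetical disturbance
        raw.append(x1 * x1)
        filtered.append(y)
        dx1, dx2, dx3 = -x1 + x2 + d, x3, -x2
        dy = (x1 * x1 - y) / tau  # first-order low-pass filter
        x1, x2, x3, y = x1 + dt * dx1, x2 + dt * dx2, x3 + dt * dx3, y + dt * dy
    return raw, filtered


raw_real, filt_real = simulate(disturbed=True)
raw_nom, filt_nom = simulate(disturbed=False)
gap_raw = max(abs(a - b) for a, b in zip(raw_real, raw_nom))
gap_filt = max(abs(a - b) for a, b in zip(filt_real, filt_nom))
print(gap_raw, gap_filt)
```

Because the assumed disturbance is much faster than the state, the largest gap between the real and nominal signals shrinks once both are filtered. A disturbance near the state frequency would not be attenuated in the same way.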

4.2. Stability Analysis

To prove that our approach can solve the optimal control problem online while making the tracking error converge, the following stability analysis is performed. Accordingly, rewrite the nominal system (24) in the following closed-loop form:
$$\dot{X} = F(X) + G(X,\delta^*) + C + G(X,\hat{\delta}) - G(X,\delta^*), \qquad \dot{a} = -\frac{1}{2}\upsilon a$$
Assumption 3.
There exist constants $\bar{f}, \bar{g} \in \mathbb{R}^+$ such that $\|F(X)\| \le \bar{f}\|X\|$ and $\|G(X,\delta)\| \le \bar{g}\|X\|$.
Assumption 4.
$G(X,\delta)$ is Lipschitz continuous with respect to $\delta$, i.e., $\|G(X,\delta_1) - G(X,\delta_2)\| \le L\|\delta_1 - \delta_2\|$, where $L \in \mathbb{R}^+$.
Assumption 5.
The command signal is bounded, i.e., there exists a constant $\bar{c} \in \mathbb{R}^+$ such that $\|C\| \le \bar{c}$.
Assumption 6.
The activation function vector $\Psi(X)$ and its gradient $\nabla\Psi$ are bounded.
Lemma 2.
If the PE condition is satisfied for the regressor $\Upsilon$ of the NN, the optimal control and the update law stabilize the tracking error of the nominal system (19), and the near-optimal control converges to a bounded neighborhood of the optimal control, i.e., $X^T Q_T X \le \xi_X$ and $\|\hat{\delta} - \delta^*\| \le \xi_\delta$, where $\xi_X, \xi_\delta \in \mathbb{R}^+$.
Proof. 
The Lyapunov function is constructed as:
$$J = J_1 + J_2$$
where $J_1 = \frac{1}{2}\tilde{W}^T K^{-1} \tilde{W}$ and $J_2 = \zeta_1 X^T Q_T X + \zeta_2 a^2 V$.
According to Lemma 1 in [37], if the PE condition is satisfied for the regressor $\Upsilon$ of the NN, then $P$ is positive definite. Using the inequality $2ab \le \eta a^2 + \frac{b^2}{\eta}$, the derivative of $J_1$ satisfies:
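$$\dot{J}_1 = -\tilde{W}^T P \tilde{W} - \tilde{W}^T \mu \le -\left(\lambda_{min}(P) - \eta_1\right)\|\tilde{W}\|^2 + \frac{1}{\eta_1}\|\mu\|^2 \le -\left(\lambda_{min}(P) - \eta_1\right)\|\tilde{W}\|^2 + \frac{1}{\eta_1}\bar{\mu}^2$$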
where $\lambda_{min}(P)$ represents the minimum eigenvalue of $P$, and $\eta_1 \in \mathbb{R}^+$. According to Equations (22) and (24), the derivative of $J_2$ is:
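$$\begin{aligned} \dot{J}_2 &= 2\zeta_1 X^T Q_T \dot{X} - \zeta_2 \upsilon a^2 V + \zeta_2 a^2 \dot{V} \\ &= 2\zeta_1 X^T Q_T \left[F(X) + G(X,\delta^*) + C + G(X,\hat{\delta}) - G(X,\delta^*)\right] - \zeta_2 \upsilon a^2 V + \zeta_2 a^2 \left(-\bar{\delta}^T R \bar{\delta} - X^T Q_T X - \delta^T R \delta + \upsilon V\right) \\ &= 2\zeta_1 X^T Q_T F(X) + 2\zeta_1 X^T Q_T G(X,\delta^*) + 2\zeta_1 X^T Q_T C + 2\zeta_1 X^T Q_T \left(G(X,\hat{\delta}) - G(X,\delta^*)\right) - \zeta_2 a^2 \bar{\delta}^T R \bar{\delta} - \zeta_2 a^2 X^T Q_T X - \zeta_2 a^2 \delta^T R \delta \\ &\le \zeta_1\left(\eta_2 \|X^T Q_T\|^2 + \frac{1}{\eta_2}\|F(X)\|^2\right) + \zeta_1\left(\eta_3 \|X^T Q_T\|^2 + \frac{1}{\eta_3}\|G(X,\delta^*)\|^2\right) + \zeta_1\left(\eta_4 \|X^T Q_T\|^2 + \frac{1}{\eta_4}\|C\|^2\right) \\ &\quad + \zeta_1\left(\eta_5 \|X^T Q_T\|^2 + \frac{1}{\eta_5}\|G(X,\delta^*) - G(X,\hat{\delta})\|^2\right) - \zeta_2 a^2 \bar{\delta}^T R \bar{\delta} - \zeta_2 a^2 \lambda_{min}(Q_T)\|X\|^2 - \zeta_2 a^2 \lambda_{min}(R)\|\delta^*\|^2 \end{aligned}$$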
According to Assumptions 3–5, we have:
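$$\begin{aligned} \dot{J}_2 \le{} & \left[\left(\zeta_1\eta_2 + \zeta_1\eta_3 + \zeta_1\eta_4 + \zeta_1\eta_5\right)\lambda_{max}^2(Q_T) + \frac{\zeta_1}{\eta_2}\bar{f}^2 + \frac{\zeta_1}{\eta_3}\bar{g}^2 - \zeta_2 a^2 \lambda_{min}(Q_T)\right]\|X\|^2 \\ & + \frac{\zeta_1}{\eta_5} L^2 b_w \|\tilde{W}\|^2 + \frac{\zeta_1}{\eta_5} L^2 b_\varsigma \|\nabla\varsigma\|^2 - \zeta_2 a^2 \bar{\delta}^T R \bar{\delta} - \zeta_2 a^2 \lambda_{min}(R)\|\delta^*\|^2 \end{aligned}$$
where $b_w = \left\|\frac{1}{2} R^{-1} G_\delta^T \nabla\Psi\right\|$ and $b_\varsigma = \left\|\frac{1}{2} R^{-1} G_\delta^T\right\|$. According to Assumptions 4 and 6, $b_w$ and $b_\varsigma$ are bounded.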
Therefore, the derivative of J is:
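$$\dot{J} = \dot{J}_1 + \dot{J}_2 \le -B_1\|X\|^2 - B_2\|\tilde{W}\|^2 - B_3\|\delta^*\|^2 + B_4$$
where
$$\begin{aligned} B_1 &= -\left(\zeta_1\eta_2 + \zeta_1\eta_3 + \zeta_1\eta_4 + \zeta_1\eta_5\right)\lambda_{max}^2(Q_T) - \frac{\zeta_1}{\eta_2}\bar{f}^2 - \frac{\zeta_1}{\eta_3}\bar{g}^2 + \zeta_2 a^2 \lambda_{min}(Q_T) \\ B_2 &= \lambda_{min}(P) - \eta_1 - \frac{\zeta_1}{\eta_5} L^2 b_w \\ B_3 &= \zeta_2 a^2 \lambda_{min}(R) \\ B_4 &= \frac{1}{\eta_1}\bar{\mu}^2 + \frac{\zeta_1}{\eta_5} L^2 b_\varsigma \|\nabla\varsigma\|^2 - \zeta_2 a^2 \bar{\delta}^T R \bar{\delta} \end{aligned}$$
By designing the parameters appropriately, it can be ensured that $B_1, B_2, B_3 > 0$. $B_4$ is mainly influenced by the critic NN's approximation error, which converges to zero as the number of neurons increases.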
According to Equation (47), if the following inequalities hold, $\dot{J}$ is negative:
$$-B_1\|X\|^2 + B_4 < 0, \qquad -B_2\|\tilde{W}\|^2 + B_4 < 0, \qquad -B_3\|\delta^*\|^2 + B_4 < 0$$
Therefore, according to Lyapunov theory, the closed-loop system is stable and the NN weight error $\tilde{W}$ is bounded. Considering that the approximation error of the critic NN is also bounded, the near-optimal control converges to a bounded neighborhood of the optimal control. □

5. Simulation Verification

This section presents two representative simulations to illustrate the effectiveness of the ADP-based integrated control and control allocation scheme. The simulations are conducted using the fixed-step ode4 (Runge–Kutta) solver with a step size of 0.01 s. The block diagram of the 6-DOF UAV simulation model can be found in Figure 10 of the report by Niestroy [7]. Simulation 1 compares our control scheme with conventional incremental dynamic inversion and pseudo-inverse control allocation (INDIPI) to verify the optimality of our method. Simulation 2 tests the robustness of the proposed control scheme. The leading-edge actuators are assumed to be represented by the transfer function $\frac{(18)(100)}{(s+18)(s+100)}$, while all other actuators, including thrust vectoring, are represented by $\frac{(40)(100)}{(s+40)(s+100)}$.
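As a rough illustration of this numerical setup, the sketch below integrates the two actuator transfer functions with a hand-written classical fourth-order Runge–Kutta step at the same 0.01 s step size. It is an assumption-laden stand-in for the solver, not the simulation model itself: the state-space realization, the zero initial state, the unit step command, and the zero-order hold on the command within each step are choices made for this example.

```python
def actuator_rhs(state, u, p1, p2):
    """State derivative of p1*p2 / ((s + p1)(s + p2)), state = (position, rate)."""
    y, v = state
    return (v, p1 * p2 * (u - y) - (p1 + p2) * v)


def rk4_step(state, u, h, p1, p2):
    """One classical Runge-Kutta step with the command u held constant."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = actuator_rhs(state, u, p1, p2)
    k2 = actuator_rhs(shifted(k1, 0.5 * h), u, p1, p2)
    k3 = actuator_rhs(shifted(k2, 0.5 * h), u, p1, p2)
    k4 = actuator_rhs(shifted(k3, h), u, p1, p2)
    return tuple(s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def step_response(p1, p2, h=0.01, t_end=1.0, command=1.0):
    """Actuator position after t_end seconds of a constant command."""
    state = (0.0, 0.0)
    for _ in range(int(round(t_end / h))):
        state = rk4_step(state, command, h, p1, p2)
    return state[0]


leading_edge = step_response(18.0, 100.0)  # (18)(100) / ((s + 18)(s + 100))
other = step_response(40.0, 100.0)         # (40)(100) / ((s + 40)(s + 100))
print(leading_edge, other)
```

Both models have unit DC gain, so the position settles at the commanded value. The fastest pole at 100 rad/s gives a step-size product of 1, which lies inside the stability interval of the classical fourth-order method.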
The initial condition of the UAV is $V(0) = 1240$ ft/s, $\chi(0) = 0$, $\gamma(0) = 0$, $\beta(0) = 0.0196$ deg, $\alpha(0) = 3.759$ deg, $\mu(0) = 0.304$ deg, $p(0) = 0$, $q(0) = 0$, $r(0) = 0$, and the initial height is $H(0) = 10{,}000$ ft. The angle-of-attack command $\alpha_c$ is generated by passing band-limited white noise through the second-order filter $\frac{0.5}{5s^2 + 2s + 0.5}$; the bank angle command is $\mu_c = 6\sin(0.5t)$, and the sideslip angle command is $\beta_c = 0$.
The control parameters are set as $\lambda = 3 \cdot I_{3\times3}$, $R = I_{11\times11}$, $Q = 5 \cdot I_{3\times3}$, $K = 500 \cdot I_{7\times7}$, and $\upsilon = 1$. Define $e_1 = p - p_c$, $e_2 = q - q_c$, $e_3 = r - r_c$; the activation function vector is then designed as $\Psi(X) = [e_1^2, e_2^2, e_3^2, e_1^2 e_2, e_1 e_2 e_3, e_3^2 e_2, e_2^2 e_3]^T$, with initial weights $\hat{W}(0) = [1000, 1000, 1500, 0, 0, 0, 0]^T$.
Remark 8.
We designed the activation function vector as above because it makes it easier to find an initial admissible control policy [59]. Clearly, $\frac{\partial e_1^2}{\partial p} = 2e_1$. In this sense, the initial control is equivalent to proportional control. Compared with dealing with complex nonlinear feedback, finding an admissible proportional control law is much easier.
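The observation in this remark can be checked numerically. The sketch below evaluates the activation vector and its Jacobian with respect to the rate errors, then forms the gradient of $\hat{V} = \hat{W}^T\Psi$ for the initial weights. The function names and the sample error values are hypothetical. Differentiating with respect to $(e_1, e_2, e_3)$ instead of $(p, q, r)$ is equivalent here because $e_1 = p - p_c$, and likewise for the other two.

```python
def activation(e):
    """Activation vector Psi for the rate errors e = (e1, e2, e3)."""
    e1, e2, e3 = e
    return [e1**2, e2**2, e3**2, e1**2 * e2, e1 * e2 * e3, e3**2 * e2, e2**2 * e3]


def activation_jacobian(e):
    """7 x 3 matrix whose row i holds d(Psi_i) / d(e1, e2, e3)."""
    e1, e2, e3 = e
    return [
        [2 * e1, 0.0, 0.0],
        [0.0, 2 * e2, 0.0],
        [0.0, 0.0, 2 * e3],
        [2 * e1 * e2, e1**2, 0.0],
        [e2 * e3, e1 * e3, e1 * e2],
        [0.0, e3**2, 2 * e3 * e2],
        [0.0, 2 * e2 * e3, e2**2],
    ]


def value_gradient(e, w):
    """Gradient of V_hat = w^T Psi(e) with respect to (e1, e2, e3)."""
    jac = activation_jacobian(e)
    return [sum(w[i] * jac[i][j] for i in range(7)) for j in range(3)]


w0 = [1000.0, 1000.0, 1500.0, 0.0, 0.0, 0.0, 0.0]  # initial weights from the text
e = (0.1, -0.2, 0.05)                               # hypothetical rate errors
grad0 = value_gradient(e, w0)
print(grad0)
```

With the cubic terms weighted by zero, the gradient reduces to $(2000e_1, 2000e_2, 3000e_3)$, which is linear in the errors. Since the control law depends on the state only through this gradient and $G_\delta$, the initial policy acts as a proportional feedback on the rate errors.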
Model uncertainty exists in both simulations. As mentioned above, model uncertainty mainly comes from inaccurate aerodynamic data, which are used to obtain $G_\delta$. Since the aerodynamic data provided by Niestroy are a series of discrete points, interpolation is needed to make these data usable. In both simulations, different interpolation methods are used for controller design and model construction to represent the fact that the controller cannot access accurate aerodynamic data. Specifically, the cubic spline is applied for model construction, and linear interpolation is used in controller design.
Remark 9.
The cubic spline is applied for model construction because the actual aerodynamic data should be continuous and smooth. Meanwhile, using linear interpolation in the controller reduces the online computational load. As mentioned in [14], different interpolation methods can cause errors of up to 30%. Take the aerodynamic data of a set of all-moving wingtips as an example, as shown in Figure 5, from which it can be seen that the slopes of the tangents of linear interpolation and cubic spline, i.e., $\tan\tau_2$ and $\tan\tau_1$, are different.
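The slope mismatch between the two interpolation schemes can be illustrated on a small table. The sketch below compares the slope of the linear interpolant with that of a natural cubic spline at the same point. The data points are hypothetical (samples of $y = x^2$), not aerodynamic data from Niestroy's model, and the natural boundary condition is an assumption, since the text does not say which spline variant was used.

```python
def natural_spline_second_derivatives(xs, ys):
    """Second derivatives m_i of the natural cubic spline (m_0 = m_n = 0)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    lower = [0.0] * (n + 1)
    diag = [1.0] * (n + 1)
    upper = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    for i in range(1, n):
        lower[i], diag[i], upper[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        factor = lower[i] / diag[i - 1]
        diag[i] -= factor * upper[i - 1]
        rhs[i] -= factor * rhs[i - 1]
    m = [0.0] * (n + 1)
    m[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        m[i] = (rhs[i] - upper[i] * m[i + 1]) / diag[i]
    return m


def segment_index(xs, x):
    """Index i of the segment [xs[i], xs[i + 1]] used to evaluate at x."""
    i = 0
    while i < len(xs) - 2 and x >= xs[i + 1]:
        i += 1
    return i


def linear_slope(xs, ys, x):
    """Slope of the piecewise-linear interpolant on the segment containing x."""
    i = segment_index(xs, x)
    return (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])


def spline_slope(xs, ys, x):
    """First derivative of the natural cubic spline at x."""
    m = natural_spline_second_derivatives(xs, ys)
    i = segment_index(xs, x)
    h = xs[i + 1] - xs[i]
    return (-m[i] * (xs[i + 1] - x) ** 2 / (2.0 * h)
            + m[i + 1] * (x - xs[i]) ** 2 / (2.0 * h)
            + (ys[i + 1] - ys[i]) / h
            - h * (m[i + 1] - m[i]) / 6.0)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical deflection grid
ys = [x * x for x in xs]         # hypothetical coefficient values
slope_lin = linear_slope(xs, ys, 1.0)
slope_spl = spline_slope(xs, ys, 1.0)
print(slope_lin, slope_spl)
```

On this toy table the two slopes at $x = 1$ are 3 and 13/7, so a controller that differentiates the linear interpolant would see a noticeably different control effectiveness from the one produced by the spline-based model.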

5.1. Simulation 1

In this simulation, the ADP-based control scheme is compared with INDIPI [60]. Specifically, both approaches track the same attitude command, and each is judged on tracking performance, flight quality, and control input. Because inaccurate control effectiveness could cause INDIPI to lose stability, and this simulation is not intended to compare the robustness of the two methods, INDIPI is given accurate model information here, and model uncertainties affect only our method. Set $\bar{\delta}$ as the 11-dimensional vector $[5.6, 5.6, \ldots, 5.6]^T$. Please note that the same NDI scheme is applied in the attitude control of both the ADP-based method and INDIPI.
The results are shown in Figures 6–17. First, Figure 17 shows that the ADP-based method converges well. Moreover, the proposed method outperforms INDIPI in three ways: superior flight quality, intelligence, and a better effector deflection pattern.
From Figures 9–12, the flight quality under ADP control is better than that under INDIPI. From Figure 9, the $p$ signal under ADP control is steadier: after the adjustment period before 5 s, it stays steady, whereas the $p$ signal under INDIPI control fluctuates, for example around 38 s and 44 s. From Figure 10, $q$ chatters throughout under INDIPI control. Such chattering can also be found in the effector deflection shown in Figure 16; it can cause effector fatigue, which is very dangerous in practice, whereas the effector deflection under ADP control is smoother. From Figure 11, there are sudden changes in the $r$ signal under INDIPI control at 20 s, 29 s, 36 s, and 44 s. The $r$ signal under ADP control also fluctuates, but unlike the sudden changes under INDIPI control, which persist throughout, the fluctuation under ADP control diminishes over time.
According to the above description, our method achieves better control performance than INDIPI. Its advantage goes further, however: with the help of ADP, our method shows intelligence, i.e., it can improve its policy according to its experience.
This intelligence manifests in two ways. First, the $p$ signal under ADP control fluctuates only once, around 5 s, and is smooth afterwards. Second, in contrast to the sudden changes in $r$ that persist under INDIPI control, the fluctuation of the $r$ signal under ADP control diminishes as the control system runs.
A more extreme example further illustrates the intelligence of ADP. Figure 14 shows the $p$ signal under the proportional control that adopts the initial weights of the critic NN. Comparing Figure 9 and Figure 14, fluctuation occurs around 5 s under both ADP control and proportional control. However, the fluctuation occurs only once under ADP control, whereas under proportional control it occurs repeatedly and eventually leads to loss of control. Figure 17 shows that the critic NN weights undergo a large adjustment at 5 s, after which no fluctuation similar to that at 5 s occurs. ADP learns from such fluctuation; therefore, the subsequent policy is more suitable for flight control with a broader flight envelope.
From Figure 15 and Figure 16, the effector deflections generated by INDIPI and ADP show different patterns. In Figure 16, only three effectors participate in the process under INDIPI control, even though some effectors appear to be saturated. Under ADP control, by contrast, the performance function modulates the allocation so that more effectors participate in the control process, and the effector deflection amplitude is significantly smaller than that of INDIPI. Figure 13 also shows that the weighted quadratic sum of effector deflection given by ADP is much smaller than that of INDIPI.
Overall, the integrated-design ADP performs better than conventional INDIPI. Compared with INDIPI, ADP allows for a trade-off between tracking performance and effector deflection. The performance function governs this trade-off, so ADP does not spend excessive control effort pursuing tiny improvements in tracking performance. Coupled with its learning mechanism, ADP can achieve the same tracking performance as INDIPI in an optimal manner.

5.2. Simulation 2

Simulation 2 discusses the robustness of the proposed method and the effect of $\bar{\delta}$. The UAV suffers from the aforementioned model uncertainties and the external disturbance $d = [0.06\sin(20t), 0.04\cos(20t), 0.03\sin(20t)]^T$. The UAV follows the same attitude command as in Simulation 1, and $\bar{\delta}$ is set to $[17.8, 17.8, \ldots, 17.8]^T$. The results are shown in Figures 18–25.
With the help of ADP, the tracking performance of our method in the presence of external disturbances is unaffected, as shown in Figures 18–20. However, some small chattering can be observed in the angular rate signals, as depicted in Figures 21–23, which is typical for a UAV subject to external disturbance. Nevertheless, this does not compromise the stability of the closed-loop system. Figure 24 demonstrates that more effectors are involved in counteracting external disturbances. Most importantly, the convergence of the critic NN weights remains satisfactory, as demonstrated by Figure 25.
From the stability analysis, our method's robustness comes from $\bar{\delta}$. In the following, the performance for different values of $\bar{\delta}$ is tested.
We begin by testing $\bar{\delta} = [5.6, 5.6, \ldots, 5.6]^T$. Due to space limitations, we only present the convergence of the critic NN weights in Figure 26. As shown in Figure 26, the convergence of the critic NN in this case is initially similar to Simulation 1. However, the critic NN weights do not ultimately converge due to external disturbances.
From a theoretical perspective, Lemma 1 can explain the non-convergence of the algorithm, as the closed-loop system's stability can only be guaranteed when $\bar{\delta}$ is of sufficient magnitude.
From another point of view, if the UAV experiences intense external disturbances, the initial sampled data may not provide enough information for ADP to update the critic NN weights. This is particularly true when the UAV is required to track random commands, as the old critic NN weights may not be suited to new, unforeseen scenarios. As a result, the NN weights may take longer to converge, making it difficult to maintain control of the UAV. In this sense, $\bar{\delta}^T R \bar{\delta}$ not only compensates for external disturbances that may initially affect performance but can also be seen as an estimate of the potential impact of such disturbances on the performance function. This helps ADP account for the current situation, allowing the weights of the critic NN to converge more quickly to a stable value.
Unlike affine systems, where the upper bound on the effect of external disturbances on the performance function is easily ascertained [29], for the nonaffine system the design of $\bar{\delta}$ relies more on experience. Still, considering that $\bar{\delta}$ has a clear physical meaning, it is not too hard to find a proper $\bar{\delta}$.
The above result shows that $\bar{\delta}$ must exceed the upper limit of the external disturbance effects for ADP to show robustness.
However, $\bar{\delta}$ should not exceed a reasonable value either. To illustrate this point, we conducted a convergence test of the critic NN with $\bar{\delta} = [56.4, 56.4, \ldots, 56.4]^T$, which is very large. The result is depicted in Figure 27. The system experiences a significant shift in the critic NN weights, leading to a collapse within 3 s. From the analysis of Equation (34), a $\bar{\delta}$ that is too large can result in $P$ not being positive definite. Put more bluntly, a $\bar{\delta}$ that is too large can dominate the dynamics of the critic NN weights, so that the sampled data that aid policy improvement are effectively ignored.
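A quick arithmetic check gives a feel for the three settings. With $R = I_{11\times11}$ as specified in the control parameters, the constant term $\bar{\delta}^T R \bar{\delta}$ that enters $\Lambda$ equals 11 times the square of the common entry. The helper name below is hypothetical, and the snippet only evaluates this constant; it does not reproduce the simulations.

```python
def constant_penalty(level, n_effectors=11):
    """delta_bar^T R delta_bar for delta_bar = [level, ..., level]^T and R = I."""
    return n_effectors * level ** 2


penalties = {level: constant_penalty(level) for level in (5.6, 17.8, 56.4)}
for level, value in penalties.items():
    print(level, round(value, 2))
```

The constant grows with the square of the entry: roughly 345, 3485, and 34,991 for the three cases. This is one way to see how a very large $\bar{\delta}$ can swamp the state-dependent part of $\Lambda$ that carries the information from the sampled data.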
In conclusion, Simulation 2 demonstrates that our method can effectively withstand model uncertainties and external disturbances, given an appropriate selection of $\bar{\delta}$.

6. Conclusions and Outlook

The proposed method uses ADP to integrate control and control allocation, resulting in superior performance compared to conventional methods. Without using any model identification techniques, the ADP-based method exhibits strong convergence and robustness in the face of external disturbance and model uncertainty. Additionally, it presents a novel approach to flight control for over-actuated UAVs with nonaffine control inputs. From a control-theory perspective, the paper presents a straightforward yet efficient optimal tracking method for nonaffine systems, with theoretical evidence verifying its robustness. Specifically, this study has two key advantages in comparison to existing research. First, our method achieves better performance than traditional control architectures that separate control and control allocation by using a more optimized approach. Second, unlike many current optimal controllers for nonaffine systems, our method remains robust and does not depend on any model identifiers.
The proposed method has certain limitations that require attention. First, this method is only aimed at the cruise stage, as the nonlinear characteristics of the aircraft during this phase are not as prominent, and the optimal value function is relatively simple and can be well fitted by a polynomial network. If large maneuvering flight is required, however, a more complex network structure needs to be introduced. This inevitably requires an improvement in the weight update rate to ensure system stability. Second, selecting the initial value for the critic network can be challenging when a complex network is used, since the convergence of this method relies on a proper choice of the network's initial value. Third, the design of $\bar{\delta}$ is still heavily reliant on empirical knowledge. As demonstrated in the simulation section, a $\bar{\delta}$ that is too small may weaken robustness, while a $\bar{\delta}$ that is too large may harm closed-loop stability. Lastly, real-world validation of this method is lacking. The external perturbations applied in the simulation offer only a limited demonstration of robustness and stability, since the external interferences experienced by a UAV in reality are much more complex.
Future studies should focus on the following aspects. First, more complex neural networks can be introduced to better approximate the value function and handle more complex situations. Second, it would be worthwhile to introduce intelligent algorithms to help design the performance function. Third, only linear filters, as shown in Equation (34), are used in this paper, so our method performs best against high-frequency disturbances. In future studies, more advanced filters could be introduced to improve the performance of the ADP-based method when facing various disturbances. More importantly, the performance of our method should be validated in real flight experiments.

Author Contributions

Conceptualization, Z.H. and Y.W.; Methodology, J.H.; Software, Z.H.; Validation, Y.B. and J.C.; Resources, J.H.; Data curation, Y.W.; Writing—original draft preparation, Z.H.; Writing—review and editing, Y.B. and L.H.; Visualization, Z.H.; Supervision, J.H.; Project administration, Z.H.; Funding acquisition, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China [grant number 62103439], China Postdoctoral Science Foundation [grant number 2020M683716], and the Natural Science Basic Research Program of Shaanxi Province [grant number 2021JQ-364].

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dorsett, K.M.; Fears, S.P.; Houlden, H.P. Innovative Control Effectors (ICE) Phase III; Final Report for March 1996-July 1997, WL-TR-3059, August 1997; Wright Laboratory, Air Force Material Command: Dayton, OH, USA, 1997. [Google Scholar]
  2. Dorsett, K.M.; Mehl, D. Innovative Control Effectors (ICE); Technical Report ADB212813; Lock-heed Martin, Wright Laboratory: Dayton, OH, USA, 1996. [Google Scholar]
  3. Stolk, A. Minimum Drag Control Allocation for the Innovative Control Effector Aircraft. Ph.D. Thesis, Delft University of Technology, Delft, The Netherlands, 2017; p. 142. [Google Scholar]
  4. Yao, L.; You, S.; Xiaodong, L.; Bo, G. Control allocation for a class of morphing aircraft with integer constraints based on Levy flight. J. Syst. Eng. Electron. 2020, 31, 826–840. [Google Scholar] [CrossRef]
  5. Dong, C.; Lu, Y.; Wang, Q. Tracking Control Based on Control Allocation with an Innovative Control Effector Aircraft Application. Math. Probl. Eng. 2016, 2016, 5037678. [Google Scholar] [CrossRef]
  6. Weilai, J.; Chaoyang, D.; Tong, W.; Qing, W. Fault tolerant control based on control allocation for morphing aircraft model. J. Beijing Univ. Aeronaut. Astronaut. 2014, 40, 355. [Google Scholar]
  7. Niestroy, M.A.; Dorsett, K.M.; Markstein, K. A tailless fighter aircraft model for control-related research and development. In Proceedings of the AIAA Modeling and Simulation Technologies Conference, Grapevine, TX, USA, 9–13 January 2017. [Google Scholar] [CrossRef]
  8. He, Z.; Hu, J.; Wang, Y.; Cong, J.; Han, L.; Su, M. Sample entropy based prescribed performance control for tailless aircraft. ISA Trans. 2022, 131, 349–366. [Google Scholar] [CrossRef] [PubMed]
  9. Wu, L.; Park, J.H.; Xie, X.; Ren, Y.; Yang, Z. Distributed adaptive neural network consensus for a class of uncertain nonaffine nonlinear multi-agent systems. Nonlinear Dyn. 2020, 100, 1243–1255. [Google Scholar] [CrossRef]
  10. Wang, Y.; Hu, J.; Zheng, Y. Improved decentralized prescribed performance control for non-affine large-scale systems with uncertain actuator nonlinearity. J. Frankl. Inst. 2019, 356, 7091–7111. [Google Scholar] [CrossRef]
  11. Bechlioulis, C.P.; Rovithakis, G.A. A low-complexity global approximation-free control scheme with prescribed performance for unknown pure feedback systems. Automatica 2014, 50, 1217–1226. [Google Scholar] [CrossRef]
  12. Wang, Y.; Hu, J.; Wang, J.; Xing, X. Adaptive neural novel prescribed performance control for non-affine pure-feedback systems with input saturation. Nonlinear Dyn. 2018, 93, 1241–1259. [Google Scholar] [CrossRef]
  13. Wang, Y.; Hu, J.; Li, J.; Liu, B. Improved prescribed performance control for nonaffine pure-feedback systems with input saturation. Int. J. Robust Nonlinear Control 2019, 29, 1769–1788. [Google Scholar] [CrossRef]
  14. He, Z.; Hu, J.; Wang, Y.; Cong, J.; Han, L.; Su, M. Incremental Backstepping Sliding-Mode Trajectory Control for Tailless Aircraft with Stability Enhancer. Aerospace 2022, 9, 352. [Google Scholar] [CrossRef]
  15. Reiner, J.; Balas, G.J.; Garrard, W.L. Flight control design using robust dynamic inversion and time-scale separation. Automatica 1996, 32, 1493–1504. [Google Scholar] [CrossRef]
  16. Matamoros, I. Nonlinear Control Allocation for a High-Performance Tailless Aircraft with Innovative Control Effectors—An Incremental Robust Approach. Master’s Thesis, Delft University of Technology, Delft, The Netherlands, 2017; p. 107. [Google Scholar]
  17. Sun, L.; Zhou, Q.; Jia, B.; Tan, W.; Li, H. Effective control allocation using hierarchical multi-objective optimization for multi-phase flight. Chin. J. Aeronaut. 2020, 33, 2002–2013. [Google Scholar] [CrossRef]
  18. Hou, Y.; Lv, M.; Liang, X.; Yang, A. Fuzzy adaptive fixed-time fault-tolerant attitude tracking control for tailless flying wing aircrafts. Aerosp. Sci. Technol. 2022, 130, 107950. [Google Scholar] [CrossRef]
  19. Werbos, P.J. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research. IEEE Trans. Syst. Man. Cybern. 1987, 17, 7–20. [Google Scholar] [CrossRef]
  20. Liu, D.; Wei, Q. Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 621–634. [Google Scholar] [CrossRef]
  21. Al-Tamimi, A.; Lewis, F.L.; Abu-Khalaf, M. Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof. IEEE Trans. Syst. Man. Cybern. Part B (Cybern.) 2008, 38, 943–949. [Google Scholar] [CrossRef]
  22. Si, J.; Wang, Y.T. Online learning control by association and reinforcement. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. Neural Computing: New Challenges and Perspectives for the New Millennium, Como, Italy, 27–27 July 2000; Volume 3, pp. 221–226. [Google Scholar]
  23. Sun, B.; van Kampen, E.J. Intelligent adaptive optimal control using incremental model-based global dual heuristic programming subject to partial observability. Appl. Soft Comput. 2021, 103, 107153. [Google Scholar] [CrossRef]
  24. Ye, J.; Bian, Y.; Xu, B.; Qin, Z.; Hu, M. Online Optimal Control of Discrete-Time Systems Based on Globalized Dual Heuristic Programming with Eligibility Traces. In Proceedings of the 2021 3rd International Conference on Industrial Artificial Intelligence (IAI), Shenyang, China, 8–11 November 2021; pp. 1–6. [Google Scholar]
  25. Liu, D.; Xue, S.; Zhao, B.; Luo, B.; Wei, Q. Adaptive Dynamic Programming for Control: A Survey and Recent Advances. IEEE Trans. Syst. Man. Cybern. Syst. 2021, 51, 142–160. [Google Scholar] [CrossRef]
  26. El-Sousy, F.F.M.; Amin, M.M.; Al-Durra, A. Adaptive Optimal Tracking Control Via Actor-Critic-Identifier Based Adaptive Dynamic Programming for Permanent-Magnet Synchronous Motor Drive System. IEEE Trans. Ind. Appl. 2021, 57, 6577–6591. [Google Scholar] [CrossRef]
  27. Dong, B.; Zhou, F.; Liu, K.; chun Li, Y. Decentralized robust optimal control for modular robot manipulators via critic-identifier structure-based adaptive dynamic programming. Neural Comput. Appl. 2018, 32, 3441–3458. [Google Scholar] [CrossRef]
  28. Wang, Z.; Lee, J.; Sun, X.; Chai, Y.; Liu, Y. Self-Learning Optimal Control with Performance Analysis using Event-Triggered Adaptive Dynamic Programming. In Proceedings of the 5th International Conference on Crowd Science and Engineering, Jinan, China, 16–18 October 2021. [Google Scholar]
  29. Lin, F.Y. Robust Control Design: An Optimal Control Approach; John Wiley and Sons Ltd.: Chichester, UK, 2007. [Google Scholar]
  30. Wan, S.; Chang, X.; Li, Q.; Yan, J. Finite-Horizon Optimal Tracking Guidance for Aircraft Based on Approximate Dynamic Programming. Math. Probl. Eng. 2019, 2019, 8649781. [Google Scholar] [CrossRef]
  31. Nobleheart, W.; Lakshmikanth, G.S.; Chakravarthy, A.; Steck, J.E. Single Network Adaptive Critic (SNAC) Architecture for Optimal Tracking Control of a Morphing Aircraft during a Pull-up Maneuver. In Proceedings of the AIAA Guidance, Navigation, and Control (GNC) Conference, Boston, MA, USA, 19–22 August 2013. [Google Scholar]
  32. Sun, T.; Sun, X.; Sun, A.Y. Optimal Output Tracking of Aircraft Engine Systems: A Data-Driven Adaptive Performance Seeking Control. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 1467–1471. [Google Scholar] [CrossRef]
  33. Du, C.; Li, F.; Yang, C.; Shi, Y.; Liao, L.; Gui, W. Multiphase-Based Optimal Slip Ratio Tracking Control of Aircraft Antiskid Braking System via Second-Order Sliding-Mode Approach. IEEE/ASME Trans. Mechatronics. 2022, 27, 823–833. [Google Scholar] [CrossRef]
  34. Zhou, Y.; van Kampen, E.J.; Chu, Q. Incremental model based online heuristic dynamic programming for nonlinear adaptive tracking control with partial observability. Aerosp. Sci. Technol. 2020, 105, 106013. [Google Scholar] [CrossRef]
  35. Mannava, A.; Balakrishnan, S.N.; Tang, L.; Landers, R.G. Optimal Tracking Control of Motion Systems. IEEE Trans. Control Syst. Technol. 2012, 20, 1548–1558. [Google Scholar] [CrossRef]
  36. Nodland, D.; Zargarzadeh, H.; Jagannathan, S. Neural Network-Based Optimal Adaptive Output Feedback Control of a Helicopter UAV. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 1061–1073. [Google Scholar] [CrossRef]
  37. Na, J.; Herrmann, G. Control with Simplified Dual Approximation Structure for Continuous-time Unknown Nonlinear. IEEE/CAA J. Autom. Sin. 2014, 1, 412–422. [Google Scholar]
  38. Wang, Q.; Gong, L.; Dong, C.; Zhong, K. Morphing aircraft control based on switched nonlinear systems and adaptive dynamic programming. Aerosp. Sci. Technol. 2019, 93, 105325.
  39. Li, H.; Sun, L.; Tan, W.; Liu, X.; Dang, W. Incremental Dual Heuristic Dynamic Programming Based Hybrid Approach for Multi-Channel Control of Unstable Tailless Aircraft. IEEE Access 2022, 10, 31677–31691.
  40. Xu, N.; Niu, B.; Wang, H.; Huo, X.; Zhao, X. Single-network ADP for solving optimal event-triggered tracking control problem of completely unknown nonlinear systems. Int. J. Intell. Syst. 2021, 36, 4795–4815.
  41. Xue, S.; Luo, B.; Liu, D.; Gao, Y. Event-Triggered ADP for Tracking Control of Partially Unknown Constrained Uncertain Systems. IEEE Trans. Cybern. 2021, 52, 9001–9012.
  42. Xue, S.; Luo, B.; Liu, D.; Gao, Y. Adaptive dynamic programming-based event-triggered optimal tracking control. Int. J. Robust Nonlinear Control 2021, 31, 7480–7497.
  43. Zhao, J.; Na, J.; Gao, G. Robust tracking control of uncertain nonlinear systems with adaptive dynamic programming. Neurocomputing 2022, 471, 21–30.
  44. Snell, S.A.; Enns, D.F.; Garrard, W.L. Nonlinear inversion flight control for a supermaneuverable aircraft. J. Guid. Control Dyn. 1992, 15, 976–984.
  45. Ding, L.; Li, S.; Gao, H.; Liu, Y.; Huang, L.; Deng, Z. Adaptive Neural Network-Based Finite-Time Online Optimal Tracking Control of the Nonlinear System With Dead Zone. IEEE Trans. Cybern. 2021, 51, 382–392.
  46. Wang, N.; Gao, Y.; Zhao, H.; Ahn, C.K. Reinforcement Learning-Based Optimal Tracking Control of an Unknown Unmanned Surface Vehicle. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 3034–3045.
  47. Liu, Y.; Li, S.; Tong, S.; Chen, C.L.P. Adaptive Reinforcement Learning Control Based on Neural Approximation for Nonlinear Discrete-Time Systems with Unknown Nonaffine Dead-Zone Input. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 295–305.
  48. Farzanegan, B.; Suratgar, A.A.; Menhaj, M.B.; Zamani, M. Distributed optimal control for continuous-time nonaffine nonlinear interconnected systems. Int. J. Control 2021, 95, 3462–3476.
  49. Duan, J.; Liu, Z.; Li, S.E.; Sun, Q.; Jia, Z.; Cheng, B. Adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints. Neurocomputing 2022, 484, 128–141.
  50. Ha, M.; Wang, D.; Liu, D. Data-based nonaffine optimal tracking control using iterative DHP approach. IFAC-PapersOnLine 2020, 53, 4246–4251.
  51. Bodson, M. Evaluation of optimization methods for control allocation. J. Guid. Control Dyn. 2002, 25, 703–711.
  52. Singh, S.N.; Steinberg, M.L.; Page, A.B. Nonlinear adaptive and sliding mode flight path control of F/A-18 model. IEEE Trans. Aerosp. Electron. Syst. 2003, 39, 1250–1262.
  53. Stevens, B.L.; Lewis, F.L.; Johnson, E.N. Aircraft Control and Simulation: Dynamics, Controls Design, and Autonomous Systems, 3rd ed.; John Wiley and Sons, Inc.: Hoboken, NJ, USA, 2016.
  54. Durham, W.C.; Bordignon, K.A.; Beck, R. Aircraft Control Allocation; John Wiley and Sons Ltd.: Chichester, UK, 2017.
  55. Farzanegan, B.; Zamani, M.; Suratgar, A.A.; Menhaj, M.B. A neuro-observer-based optimal control for nonaffine nonlinear systems with control input saturations. Control Theory Technol. 2021, 19, 283–294.
  56. Song, R.; Wei, Q.; Li, Q. Adaptive Dynamic Programming: Single and Multiple Controllers; Science Press: Beijing, China, 2019; p. 278.
  57. Wang, D.; Zhao, M.; Ha, M.; Ren, J. Neural optimal tracking control of constrained nonaffine systems with a wastewater treatment application. Neural Netw. 2021, 143, 121–132.
  58. Stone, M.H. The Generalized Weierstrass Approximation Theorem. Math. Mag. 1948, 21, 167.
  59. Abu-Khalaf, M.; Lewis, F.L. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 2005, 41, 779–791.
  60. Cong, J.; Hu, J.; Wang, Y.; He, Z.; Han, L.; Su, M. Fault-Tolerant Attitude Control Incorporating Reconfiguration Control Allocation for Supersonic Tailless Aircraft. Aerospace 2023, 10, 241.
Figure 1. The layout of ICE.
Figure 2. Explanation of wind-axes system and body-fixed coordinate system.
Figure 3. Control structure.
Figure 4. Effectiveness of the filter.
Figure 5. Moment coefficients vs. left all-moving tips deflection.
Figure 6. Angle of attack.
Figure 7. Sideslip angle.
Figure 8. Bank angle.
Figure 9. Body-axis roll rate.
Figure 10. Body-axis pitch rate.
Figure 11. Body-axis yaw rate.
Figure 12. Tracking error of $x_{2c}$.
Figure 13. Weighted quadratic sum of effector deflection.
Figure 14. p under proportional control.
Figure 15. Effector deflection of ADP.
Figure 16. Effector deflection of INDIPI.
Figure 17. Convergence of the critic NN weights.
Figure 18. Angle of attack.
Figure 19. Sideslip angle.
Figure 20. Bank angle.
Figure 21. Body-axis roll rate.
Figure 22. Body-axis pitch rate.
Figure 23. Body-axis yaw rate.
Figure 24. Effector deflection.
Figure 25. Convergence of the critic NN weights when $\bar{\delta} = [17.8, 17.8, \ldots, 17.8]^{T}$.
Figure 26. Convergence of the critic NN weights when $\bar{\delta} = [5.6, 5.6, \ldots, 5.6]^{T}$.
Figure 27. Convergence of the critic NN weights when $\bar{\delta} = [56.4, 56.4, \ldots, 56.4]^{T}$.
Table 1. The basic parameters of ICE.

| Parameter | Nomenclature | Value | Unit |
|---|---|---|---|
| $b$ | Lateral–directional reference length, span | 37.50 | ft |
| $\bar{c}$ | Mean aerodynamic chord | 28.75 | ft |
| $m$ | Weight | 32,750 | lbf |
| $I_{yy}$ | Pitch moment of inertia | 78,451 | slug·ft² |
| $I_{xz}$ | Cross product of inertia | −525 | slug·ft² |
| $I_{xx}$ | Roll moment of inertia | 35,479 | slug·ft² |
| $I_{zz}$ | Yaw moment of inertia | 110,627 | slug·ft² |
| $S$ | Reference area | 808.60 | ft² |
| $X_{th}$ | Moment arm for thrust vectoring | 18.75 | ft |
| $X_{cg}$ | Gravity center | 38.84 | % $\bar{c}$ |
| $X_{ac}$ | Aerodynamic center | 38.00 | % $\bar{c}$ |
Table 2. Nomenclature of variables.

| Parameter | Nomenclature | Unit |
|---|---|---|
| $F$ | Sum of aerodynamic force and thrust | lbf |
| $l, m, n$ | Aerodynamic rolling, pitching, and yawing moments | lbf·ft |
| $p, q, r$ | Body-axis roll, pitch, and yaw rates | rad/s |
| $V$ | Airspeed | ft/s |
| $\gamma$ | Flight path angle | rad |
| $\chi$ | Flight path sideslip angle | rad |
| $\alpha$ | Angle of attack | rad |
| $\beta$ | Sideslip angle | rad |
| $\mu$ | Bank angle about the velocity vector | rad |

Share and Cite

MDPI and ACS Style

He, Z.; Hu, J.; Wang, Y.; Cong, J.; Bian, Y.; Han, L. Attitude-Tracking Control for Over-Actuated Tailless UAVs at Cruise Using Adaptive Dynamic Programming. Drones 2023, 7, 294. https://doi.org/10.3390/drones7050294

