Adaptive Fuzzy Fault-Tolerant Attitude Control for a Hypersonic Gliding Vehicle: A Policy-Iteration Approach

In this paper, adaptive fuzzy fault-tolerant control (AFFTC) is proposed for the attitude control system of a hypersonic gliding vehicle (HGV) experiencing actuator faults. The actuator faults are modeled with respect to the HGV's actual structure and actuator characteristics. The HGV's attitude system is first represented by a T-S fuzzy model, and a normal (fault-free) T-S fuzzy controller is then designed. A reinforcement learning (RL)-based policy-iteration algorithm is proposed for solving the T-S fuzzy controller. Based on the normal T-S controller, a fuzzy fault-tolerant control (FTC) law is then proposed in which the control matrices adjust themselves according to the specific fault. An integral reinforcement learning (IRL)-based solution algorithm is proposed to reduce the dependence of the design method on the HGV model. Simulations of three different kinds of actuator faults show that the designed IRL-based FTC can ensure reliable flight of the HGV.


Introduction
A hypersonic gliding vehicle (HGV) has a high flight speed and a large flight envelope, so its flight dynamics are highly complex. As a basic element of aircraft control, the modeling and realization of HGV attitude control have received extensive attention in recent years. Various control methods have been used to solve this problem, such as trajectory-linearization-based active disturbance rejection control [1], fuzzy-logic-based adaptive control [2], and reinforcement-learning-based nonlinear control [3]. Linear-parameter-varying-based attitude control of an HGV is proposed in [4], where the sources of uncertainty in HGV control are analyzed, namely uncertainty in the aerodynamic parameters and external disturbances. Robust nonlinear controllers for the attitude system of an HGV subject to model disturbances have also received widespread attention.
In practice, faults may exist in the actuators of an HGV, and such failures necessarily reduce its control accuracy, so FTC is an important research area in HGV attitude control and has been widely studied in recent years. In ref. [5], actuator faults of an HGV's attitude control system are considered, and adaptive FTC is proposed. In ref. [6], actuator faults of an HGV's attitude control system are described by a more general model, and FTC that ensures the robustness of the system under actuator faults is given. In ref. [7], fixed-time quantized fault-tolerant attitude control is presented for an HGV. In the above literature, the model of the HGV's attitude system is assumed to be known, while in practice the nonlinear dynamics of the HGV may be unknown. In ref. [8], adaptive FTC is proposed for an HGV's attitude system with an unknown inertia matrix and state constraints. In ref. [9], time-varying fault control of an HGV together with adaptive FTC is proposed. Actuator faults and model uncertainties are considered simultaneously in ref. [10], where sliding-mode-control-based FTC is proposed.
In the above literature, the specific form and occurrence time of the faults are all assumed to be known in advance, and FTC is designed according to these known faults. This is unrealistic in practice, as the specific form and occurrence time of faults are difficult to obtain. In ref. [11], FTC based on an online computing method is proposed for the attitude control of an HGV, but this method can deal only with fixed-form faults. In ref. [12], iterative learning fault-tolerant control is proposed for time-varying industrial processes with actuator faults, and in ref. [13], stochastic actuator failures are considered for Markovian jump time-delayed systems. The FTC of a constrained system is solved by model predictive control in ref. [14]. These controller design strategies can deal with some given faults but cannot adjust themselves online according to a specific fault, so this is still an open problem. It should be noted that, to facilitate practical use, the form of an adaptive FTC should be as simple as possible.
The Takagi-Sugeno (T-S) fuzzy-based control design strategy can approximate a nonlinear system with arbitrary accuracy, and the resulting T-S model is a combination of linear systems, so classical linear control methods can still be applied through a parallel-distributed compensation (PDC) scheme [15]. T-S fuzzy technology is viewed as an efficient way of analyzing and designing the control of nonlinear systems and has been widely applied, for example in type-2 T-S fuzzy-based tracking control of a saturated system [16] and T-S fuzzy control for semi-Markov jump systems [17]. T-S fuzzy FTC of HGVs has also been studied in ref. [18], where the upper and lower bounds of the faults are assumed to be known. How to design FTC for HGVs with faults of unknown specific form and unknown occurrence time is still an open problem.
Furthermore, to improve the self-learning ability of the FTC, an intelligent algorithm is needed. Commonly used intelligent algorithms include deep learning [19] and reinforcement learning (RL) [20]. The learning process of RL is similar to that of human beings: by interacting with the environment and updating the reward value in time, RL can obtain an optimal control law. RL does not need a specific model of the nonlinear system and can handle high-dimensional task scenarios, so it has been widely used across many fields [21].
As a branch of RL, integral reinforcement learning (IRL) obtains its reinforcement signal by integrating the value function, and can therefore be applied to unknown systems. Using only data from a completely unknown system, IRL can complete the assigned learning task, so it has been successfully applied to the optimal control of discrete-time multiagent systems [22], motion planning of autonomous vehicles [23], nuclear systems [24], and linear systems with input delay [25]. IRL-based fault-tolerant adaptive tracking control of Euler-Lagrange systems has also been presented to improve the tracking performance of fault-tolerant control [26]. In short, IRL has proved to be an effective means of solving the FTC design problem for complex nonlinear systems [27].
Based on the above understanding, in this paper an RL-based policy-iteration (PI) algorithm is utilized for the FTC design of an HGV's attitude system. The nonlinear model of the HGV's attitude system is first represented by a T-S fuzzy model, and the actuator fault model is then built. A PI-based method for solving the normal T-S fuzzy controller is proposed without considering the actuator fault model. Based on the normal fuzzy controller, IRL-based fuzzy adaptive FTC is proposed. The control gains of the adaptive fuzzy FTC controller can adjust themselves according to the specific actuator fault. Finally, three simulations are given to demonstrate the effectiveness of the presented controller under different faults.
To sum up, the contributions of this paper are as follows:
(1) A policy-iteration (PI) algorithm is utilized for the optimal controller design of a T-S fuzzy system.
(2) IRL-based adaptive fuzzy FTC is built for a T-S fuzzy system with actuator faults. With this design, the FTC controller can be applied online.
(3) The method is successfully applied to the attitude-tracking control of an HGV.

Problem Description
In this section, the nonlinear attitude model of an HGV is discussed, and the nonlinear model is then represented by a T-S fuzzy system. Based on the attitude-tracking control objective and the actuator fault model of the HGV, the control objective for the proposed T-S fuzzy model is discussed.

Nonlinear Model of an HGV's Attitude System
The attitude model of the HGV used in this paper is given by (1), where $V$ is the velocity of the HGV, $\theta$ is the flight path angle, and $\psi_v$ is the heading angle; $L$, $D$, and $Y$ are the lift force, drag force, and side force, respectively; $\alpha$ is the attack angle, $\beta$ is the sideslip angle, and $\gamma_v$ is the velocity inclination angle; and $\omega_x$, $\omega_y$, and $\omega_z$ are the angular velocities about the three axes. $M_x$, $M_y$, and $M_z$ are the corresponding aerodynamic torques, which are functions of $\alpha$, $\beta$, $\gamma_v$, $\delta_e$, $\delta_r$, and $\delta_a$, where $\delta_e$, $\delta_r$, and $\delta_a$ are the elevator angle, yaw angle, and aileron angle, respectively. More details of the model can be found in ref. [5].
For convenience of description, the nonlinear model (1) can be rewritten as $\dot{x}(t) = F(x(t), u(t))$, where $x(t) = [\alpha, \beta, \gamma_v, \omega_x, \omega_y, \omega_z]^T$, $u(t) = [\delta_e, \delta_r, \delta_a]^T$, and the expression of $F(x(t), u(t))$ can be found in (1).

T-S Fuzzy Modeling of an HGV
Since the parameters of an HGV vary over a wide range, a single linear model obtained by linearization about an equilibrium point is unsuitable for controller design. For ease of application of the control system, T-S fuzzy modeling is utilized here.
For the T-S fuzzy modeling of the HGV's attitude system, the premise variable is chosen as $\alpha$. Three levels of $\alpha$ are chosen, the specific values of which are listed in Table 1. The T-S model of the HGV attitude system is then built as follows:

Table 1. Levels of the premise variable (columns: rule number; premise variable $\alpha$).

Rule $i$: IF $\alpha$ is at level $i$, THEN $\dot{x}(t) = A_i x(t) + B_i u(t)$, $i = S, M, B$,

where $A_i$ and $B_i$ ($i = S, M, B$) are the system matrices. Based on these rules, the overall T-S fuzzy model of the HGV's attitude system can be represented by

$\dot{x}(t) = \sum_{i \in \{S, M, B\}} h_i(\alpha) \left[ A_i x(t) + B_i u(t) \right]$, (2)

where $h_i(\alpha) \ge 0$ are the normalized membership functions of the three rules, with $\sum_i h_i(\alpha) = 1$.
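The blending step of the T-S model can be illustrated with a short sketch. It assumes triangular membership functions and made-up level centers and 2x2 local matrices (the names `tri`, `weights`, `blend`, `CENTERS`, `A_S`, `A_M`, and `A_B` are hypothetical and are not taken from the paper, whose Table 1 values are not reproduced here); it only shows how normalized firing strengths of the three levels S, M, and B combine local linear models.

```python
def tri(x, left, center, right):
    """Triangular membership grade of x for a fuzzy set peaking at `center`."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


def weights(alpha, centers):
    """Normalized firing strengths h_i(alpha) for the levels S, M, B."""
    s, m, b = centers
    if alpha <= s:
        return [1.0, 0.0, 0.0]  # saturate below the smallest level
    if alpha >= b:
        return [0.0, 0.0, 1.0]  # saturate above the largest level
    raw = [
        tri(alpha, s - (m - s), s, m),
        tri(alpha, s, m, b),
        tri(alpha, m, b, b + (b - m)),
    ]
    total = sum(raw)
    return [r / total for r in raw]


def blend(mats, h):
    """Weighted sum of local matrices: sum_i h_i * M_i."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [
        [sum(hi * M[r][c] for hi, M in zip(h, mats)) for c in range(cols)]
        for r in range(rows)
    ]


# Assumed (illustrative) level centers for alpha in degrees and toy local models.
CENTERS = (0.0, 10.0, 20.0)
A_S = [[0.0, 1.0], [-1.0, -1.0]]
A_M = [[0.0, 1.0], [-2.0, -1.0]]
A_B = [[0.0, 1.0], [-4.0, -1.0]]

h = weights(5.0, CENTERS)
A_blend = blend([A_S, A_M, A_B], h)
print(h, A_blend)
```

The same `blend` call applies to the input matrices $B_i$ and, under a PDC scheme, to the rule-wise feedback gains.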

Actuator Fault Model
To account for actuator faults of the HGV, the following actuator fault model is utilized:

$u_F(t) = \Gamma_F u(t) + \Lambda_F$, (3)

where $\Gamma_F = \mathrm{diag}(\rho_{\delta_e}, \rho_{\delta_r}, \rho_{\delta_a})$ is the unknown failure matrix, which represents the failure coefficients; $\Lambda_F = [\chi_{\delta_e}, \chi_{\delta_r}, \chi_{\delta_a}]^T$ is the unknown loss matrix, which denotes a bias fault; and $u(t)$ is the normal control signal. Different values of $\Gamma_F$ and $\Lambda_F$ correspond to different fault cases.
Here $\rho_j$ ($j = \delta_e, \delta_r, \delta_a$) represents the proportion of the normal actuator output that is actually available, and $\chi_j$ ($j = \delta_e, \delta_r, \delta_a$) represents the actuator drift caused by friction, piezoelectric effects, and so on. Since $\chi_j$ is slowly varying or invariant, it can be regarded as a constant in the controller design. Expression (3) covers almost all fault models. More specifically: (1) if $\Gamma_F = I$ and $\Lambda_F = 0$, the actuator is fault-free; (2) if $0 < \Gamma_F < I$ and $\Lambda_F = 0$, the actuator fault is a loss of effectiveness; (3) if $\Gamma_F = I$ and $\Lambda_F \neq 0$, the actuator fault is a drift fault; (4) if $0 < \Gamma_F < I$ and $\Lambda_F \neq 0$, the actuator fault is a combined loss-of-effectiveness and drift fault.
Remark 1. The faults described by (3) are a general form of HGV actuator faults. In the previous literature, one or more faults were considered, but there was no unified description of HGV actuator faults. This paper gives a general description as shown in (3) and then presents a controller design method according to this fault model.
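The four fault cases can be made concrete with a minimal sketch. It assumes that the fault enters as $u_F(t) = \Gamma_F u(t) + \Lambda_F$ with a diagonal $\Gamma_F$, which is consistent with the cases listed above; the function names and all numerical values are illustrative assumptions, and the classifier flags a loss of effectiveness as soon as any channel has $\rho_j < 1$, which is slightly looser than the condition $0 < \Gamma_F < I$.

```python
def apply_fault(u, rho, chi):
    """Faulty actuator output u_F = Gamma_F u + Lambda_F.

    rho holds the diagonal of Gamma_F (available proportion per actuator),
    chi holds the entries of Lambda_F (constant drift per actuator).
    """
    return [r * ui + c for r, ui, c in zip(rho, u, chi)]


def classify(rho, chi):
    """Name the fault case; assumes every rho_j satisfies 0 < rho_j <= 1."""
    loss = any(r < 1.0 for r in rho)
    drift = any(c != 0.0 for c in chi)
    if loss and drift:
        return "combined loss of effectiveness and drift"
    if loss:
        return "loss of effectiveness"
    if drift:
        return "drift"
    return "fault-free"


# Commanded deflections (elevator, rudder, aileron); values are made up.
u = [2.0, -1.0, 0.5]

for rho, chi in [
    ([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]),
    ([0.5, 1.0, 1.0], [0.0, 0.0, 0.0]),
    ([1.0, 1.0, 1.0], [0.0, 0.0, 0.25]),
    ([0.5, 1.0, 1.0], [0.0, 0.0, 0.25]),
]:
    print(classify(rho, chi), apply_fault(u, rho, chi))
```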

Control Objective
The problem considered in this paper is the design of an attitude-tracking controller for an HGV. The objective of the optimal tracking problem is therefore to find a control policy $u(t)$ that makes system (2) track a desired trajectory $r_d(t)$, where $r_d(t) = [\alpha_d, \beta_d, \gamma_{vd}, 0, 0, 0]^T$. With the tracking error $e(t)$ defined as in (4), it follows from (2) and (4) that the error dynamics can be written as (5), where $\dot{r}_d(t)$ is the time derivative of the reference command $r_d(t)$. The whole system is then given by (6), where $\dot{r}_d(t)$ is assumed to be bounded. Considering the actuator faults described in (3), the controller designed for (6) should track the command $r_d(t)$ under unknown actuator faults $u_F(t)$.

Main Results
In this section, a PI-based fuzzy adaptive FTC strategy is proposed that requires only limited information about the actuator faults.

PI-Based Normal Controller
For the T-S fuzzy model (2) without the actuator faults (3), the comprehensive control performance index for the $i$th rule is given by (7), where $R \in \mathbb{R}^{3 \times 3}$ and $Q \in \mathbb{R}^{6 \times 6}$ are given positive definite symmetric matrices, and $\gamma$ is a prescribed constant that reflects the disturbance-attenuation performance of the designed controller. With the value function for the $i$th rule defined as in (8), the problem can be viewed as a zero-sum game between the control policy $u_i(t)$ and the derivative of the reference command $\dot{r}_d(t)$ [28]. In this zero-sum game, the control policy $u_i(t)$ seeks to minimize the performance index (7), while $\dot{r}_d(t)$ seeks to maximize it. According to the definition of a zero-sum game, the Nash equilibrium is defined as in Definition 1.
Remark 2. In the above zero-sum game, the derivative of the reference command $\dot{r}_d(t)$ is treated as a player that can be changed freely. In practice, the reference command must satisfy certain flight requirements, but this assumption makes it easier to obtain the optimal strategy of the game. It also allows the design to cope with various changes in the actual commands, so this assumption is necessary.
For the above game, according to the Bellman equation, a Hamiltonian function is chosen for the $i$th rule. For convenience of description, the control policy and the policy for the derivative of the reference command are both described in fuzzy state-feedback form. The game value function for the $i$th rule can then be expressed as $\int_t^{\infty} e(\tau)^T P_i e(\tau)\, d\tau$, where $P_i$ is a solution of the algebraic Riccati equation (ARE) (10). According to the Pontryagin minimum principle, the optimal policies for (9) are obtained, where $u_i^*(t)$ is the optimal control for the $i$th rule of the T-S model (2) and $\dot{r}_d^*(t)$ is the optimal derivative of the reference command. With the optimal $u_i^*(t)$ and $\dot{r}_d^*(t)$, the optimal game value function for the $i$th rule is $\int_t^{\infty} e(\tau)^T P_i^* e(\tau)\, d\tau$, where $P_i^*$ is the optimal value of $P_i$. For the T-S model of the HGV, the total controller is then constructed from the rule-wise controllers.
Theorem 1. For system (2) without actuator faults, if there exist positive definite matrices $P_i > 0$ for the $i$th rule of the T-S fuzzy system such that the stated condition is satisfied under controllers (11) and (12), then the T-S fuzzy system (2) is stable.
Proof. The overall performance of (6) is given by (14), where $P$ is a positive definite symmetric matrix for the overall system. Its derivative along the trajectories of (6) contains terms in $P_j e(t)$ together with the term $\gamma^2 \dot{r}_d(t)^T \dot{r}_d(t)$. Since $P$ and $P_j$ are all positive definite symmetric matrices, and considering the matrices $B_i$ used for controller design, there exist positive constants $\varrho_M \ge \varrho_m > 0$ such that $\varrho_m e^T(t) P_j B_j R^{-1} B_j^T P_j e(t) \le e^T(t) P B_i R^{-1} B_j^T P_j e(t) \le \varrho_M e^T(t) P_j B_j R^{-1} B_j^T P_j e(t)$, as well as a positive definite symmetric matrix $M_j$ and a positive constant $\eta > 0$ for which the remaining bounds hold. The stability of the T-S fuzzy system (2) then follows, which completes the proof.
Based on Theorem 1, the online policy-iteration algorithm is given as Algorithm 1.
Algorithm 1: Model-based PI algorithm
1. Initialization: Set the iteration index to 0 and choose any reasonable initial policy $u_i(t) = K_i^{(0)} e(t)$;
2. Policy evaluation: Solve the policy-evaluation equation for $P$ under the current policy;
3. Policy improvement: Update the policy using the newly obtained $P$;
4. If the convergence condition is satisfied, stop; else, go to step 2.
In Algorithm 1, the superscript $(i)$ represents the $i$th iteration.
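The evaluate-then-improve loop of a model-based PI scheme can be sketched as follows. This is a simplified illustration and not the paper's exact Algorithm 1: it assumes a single rule, drops the $\dot{r}_d(t)$ player (so the game reduces to a standard linear-quadratic problem solved by Kleinman-type iteration), uses the sign convention $u = -K e$, and runs on a toy double integrator rather than the HGV matrices. The helper names `lyap` and `policy_iteration` are hypothetical.

```python
import numpy as np


def lyap(Ac, W):
    """Solve Ac^T P + P Ac + W = 0 for P via row-major vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    M = np.kron(Ac.T, I) + np.kron(I, Ac.T)
    P = np.linalg.solve(M, -W.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # remove numerical asymmetry


def policy_iteration(A, B, Q, R, K0, tol=1e-10, max_iter=50):
    """Kleinman-type PI; K0 must stabilize A - B K0."""
    K, P_prev = K0, None
    for _ in range(max_iter):
        Ac = A - B @ K
        P = lyap(Ac, Q + K.T @ R @ K)    # policy evaluation
        K = np.linalg.solve(R, B.T @ P)  # policy improvement: K = R^{-1} B^T P
        if P_prev is not None and np.max(np.abs(P - P_prev)) < tol:
            break
        P_prev = P
    return P, K


# Toy double integrator (an assumption for illustration only).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])  # stabilizing initial gain

P, K = policy_iteration(A, B, Q, R, K0)
residual = A.T @ P + P @ A + Q - P @ B @ np.linalg.solve(R, B.T @ P)
print(P, K, np.max(np.abs(residual)))
```

Each policy-evaluation step here is a linear (Lyapunov) equation, which is the reason PI is attractive compared with solving the quadratic ARE directly; extending the sketch to the zero-sum setting would add a second gain for the disturbance player.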
Remark 3. In the proof of Theorem 1, the overall performance $J_T$ in (14) is not equal to the simple sum of the $J_i$ defined in (7). The proof of Theorem 1 takes into account the PDC structure of the fuzzy system.

PI-Based Fuzzy FTC Control
As described in (3), $\Gamma_F$ and $\Lambda_F$ are unknown, but they are bounded. An adaptive fuzzy FTC is therefore constructed: for the T-S fuzzy model (6) with actuator faults (3), a fuzzy state-feedback FTC controller is designed in which $\hat{\Lambda}_{Fi}$ is the estimate of $\Lambda_F$ at Rule $i$ and $\hat{\Gamma}_{Fi}$ is the estimate of $\Gamma_F$ at Rule $i$.
Theorem 2. For system (2) with actuator fault (3), if there exist positive definite matrices $P_i > 0$ for the $i$th rule of the T-S fuzzy system and an FTC controller gain $K_i$ satisfying the stated conditions, and the updating laws of $\hat{\Gamma}_{Fi}$ and $\hat{\Lambda}_{Fi}$ are given by (19) and (20), where $q_1$ and $q_2$ are given constants, then system (2) is semi-globally uniformly ultimately bounded.
Proof. Choose a Lyapunov function for the tracking system (6). Taking the time derivative of this Lyapunov function, substituting the updating laws of $\hat{\Gamma}_{Fi}$ and $\hat{\Lambda}_{Fi}$, and following the analysis of Theorem 1, it follows that the whole system is semi-globally uniformly ultimately bounded.

IRL-Based Fuzzy FTC Control
Solving Theorem 2 requires full knowledge of $A_i$ and $B_i$, and the ARE is difficult to solve. To reduce the dependence on the system model in the solution process and to provide a more easily solved method, a new data-based PI fuzzy FTC controller is developed from Theorem 2 and Algorithm 1, in which the system matrices $A_i$ may be unknown and only $B_i$ is used by the proposed FTC [25].
The derivative of the $i$th value function is given by (21). Integrating both sides of (21) from $t$ to $t + T$ yields an integral relation between $e(t)$ and $e(t + T)$. Since the values of $e(t + T)$ and $e(t)$ and the integral on the right-hand side can all be computed, the positive definite matrices $P_i$ can be solved from this relation. The following algorithm is then proposed (Algorithm 2).
Algorithm 2: IRL-based fuzzy FTC control algorithm
1. Initialization: Set the iteration index to 0 and choose any reasonable policy $u_i(t) = K_i e(t)$ with $P^{(i)} > 0$;
2. Policy evaluation: Solve the integral relation for $P$ using the measured data;
3. Policy improvement: Update the control gain using the newly obtained $P$, and update the fault estimates by $\dot{\hat{\Lambda}}_{Fi} = 2 q_1 B_i^T P_i e(t)$ and $\dot{\hat{\Gamma}}_{Fi} = \mathrm{diag}(2 q_2 e(t)^T P_i B_i u_i(t))$;
4. If the convergence condition is satisfied, stop; else, go to step 2.
A flowchart of the proposed AFFTC is given in Figure 1. As Figure 1 shows, the desired trajectory $r_d(t)$ is first applied to system (1), and the tracking error $e(t)$ is generated. The adaptive parameter estimator then updates the parameters according to (19) and (20), and the controller is updated according to (18). This completes the overall update of the fuzzy controller.
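The data-based policy-evaluation idea behind IRL can be sketched on a toy problem. The sketch below is an assumption-laden illustration, not the paper's Algorithm 2: it uses a single rule, no disturbance player and no fault estimates, the convention $u = -K e$, and a two-state system. The matrix `A` appears only inside the simulator that generates data; the evaluation step `irl_evaluate` sees only samples of $e(t)$, $e(t+T)$, and the integrated cost, and fits $P$ by least squares from $e(t)^T P e(t) - e(t+T)^T P e(t+T) = \int_t^{t+T} e^T (Q + K^T R K) e \, d\tau$. All function names are hypothetical.

```python
import numpy as np


def rk4_step(f, z, h):
    """One classical Runge-Kutta step for z' = f(z)."""
    k1 = f(z)
    k2 = f(z + 0.5 * h * k1)
    k3 = f(z + 0.5 * h * k2)
    k4 = f(z + h * k3)
    return z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)


def collect(A, B, K, Q, R, e0, T=0.2, n_intervals=10, h=0.001):
    """Simulate e' = (A - B K) e and return (e(t), e(t+T), integrated cost) samples."""
    n = len(e0)
    W = Q + K.T @ R @ K

    def f(z):
        e = z[:n]
        return np.append((A - B @ K) @ e, e @ W @ e)  # last state accumulates cost

    steps = int(round(T / h))
    data, e = [], np.array(e0, dtype=float)
    for _ in range(n_intervals):
        z = np.append(e, 0.0)
        for _ in range(steps):
            z = rk4_step(f, z, h)
        data.append((e, z[:n], z[n]))
        e = z[:n]
    return data


def irl_evaluate(data, n):
    """Least-squares fit of symmetric P from the integral relation (no A needed)."""
    iu = np.triu_indices(n)
    scale = np.where(iu[0] == iu[1], 1.0, 2.0)  # off-diagonal entries count twice
    Phi = np.array([scale * (np.outer(a, a) - np.outer(b, b))[iu] for a, b, _ in data])
    y = np.array([c for _, _, c in data])
    p, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    P = np.zeros((n, n))
    P[iu] = p
    return P + P.T - np.diag(np.diag(P))


# Toy system and a fixed stabilizing gain (assumptions for illustration only).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
K = np.array([[1.0, 1.0]])

data = collect(A, B, K, Q, R, [1.0, 0.0]) + collect(A, B, K, Q, R, [0.0, 1.0])
P_est = irl_evaluate(data, 2)
K_next = np.linalg.solve(R, B.T @ P_est)  # improvement step uses B only
print(P_est, K_next)
```

In this simplified setting the estimate should agree with the solution of the model-based Lyapunov equation for the same gain, which is what makes the data-based step a drop-in replacement for policy evaluation when $A_i$ is unknown.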

Simulation Results
According to the T-S approximation method proposed in Section 2, the fuzzy model of the HGV's attitude system can be constructed. Then, based on the adaptive fuzzy FTC strategy given in Section 3.2, the FTC controller can be derived for different faults. The parameters and initial states of the simulation are similar to those in ref. [5]. To test the effectiveness of the proposed controller, three simulations under three different fault modes are given as follows.
Case I: Loss of effectiveness. In this simulation, a loss-of-effectiveness actuator fault is considered. According to the proposed method, the fault is applied to the nonlinear attitude model of the HGV and is set to occur 40 s after the simulation starts. For comparison, the T-S controller proposed in ref. [29], denoted $u_{T-S}$, is applied to the attitude control of the HGV, and the proposed adaptive fuzzy control is denoted $u_{FTC}$. The tracking simulation results are presented in Figures 2 and 3. Specifically, the tracking results for the given command are presented in Figure 2, and the input is presented in Figure 3. The figures show that, when a fault occurs, the tracking error of $u_{FTC}$ is smaller than that of $u_{T-S}$, and the input of $u_{FTC}$ is also smoother than that of $u_{T-S}$. The proposed adaptive fuzzy FTC can therefore adjust itself and obtain better performance.
Case II: Drift fault. In this simulation, a drift fault is considered. The fault is set to occur 40 s after the simulation starts. The tracking simulation results are presented in Figures 4 and 5. Specifically, the tracking results for the given command are presented in Figure 4, and the input is presented in Figure 5. The figures show that, under the drift fault, the proposed adaptive fuzzy FTC can also guarantee the tracking performance of the HGV, and the input under $u_{FTC}$ is smoother than that under $u_{T-S}$.
Case III: Combined fault. In this simulation, a combined loss-of-effectiveness and drift fault is considered. The faults are set to occur 40 s after the simulation starts. The tracking simulation results are presented in Figures 6 and 7. Specifically, the tracking results for the given command are presented in Figure 6, and the input to the HGV is presented in Figure 7. Under the combined fault, the inputs of both $u_{FTC}$ and $u_{T-S}$ are non-smooth, but the oscillation amplitude of $u_{FTC}$ is much smaller than that of $u_{T-S}$.

Conclusions
FTC of an HGV's attitude control system is discussed in this paper. Actuator faults are considered, and a general actuator fault model is adopted. Based on the T-S fuzzy approach, the nonlinear attitude model is first represented by a T-S fuzzy model, and a normal T-S controller that does not consider the actuator fault is then designed using RL. Based on the normal fuzzy controller, an improved adaptive FTC is designed that can be adjusted online according to the failure mode. Finally, simulation results for different kinds of actuator faults are given to show the good performance of the proposed FTC.
The simulation results given in this paper are based on a mathematical model. Future work will consist of engineering the proposed algorithm and verifying the proposed method by combining the algorithm with an actual HGV.

Figure 1. Flowchart of the proposed method.

Remark 4. The proposed FTC method is an IRL-based adaptive fuzzy FTC controller. Although adaptive control methods can adjust the parameters of a controller, the efficiency of self-adaptation is too low for time-varying or sudden faults. Reinforcement learning can quickly obtain new control parameters according to changes in the HGV states. By combining reinforcement learning with self-adaptation, the proposed method can quickly adjust the parameters of the controller when the HGV experiences sudden or time-varying faults.

Definition 1. Nash equilibrium: The zero-sum game (8) has a unique Nash equilibrium if the following conditions are satisfied: 1. For each fixed control $u_i(t)$, there is always a unique $\dot{r}_d(t)$ that maximizes $V_i(e(t), u_i(t), \dot{r}_d(t))$; that is, there exists a $\dot{r}_d^*(t)$ such that $V_i(e(t), u_i(t), \dot{r}_d^*(t)) \ge V_i(e(t), u_i(t), \dot{r}_d(t))$, where $\dot{r}_d^*(t)$ is the optimal disturbance policy.