Finite-Time Adaptive Reinforcement Learning Control for a Class of Morphing Unmanned Aircraft with Mismatched Disturbances and Coupled Uncertainties

Ren, Wei; Wei, Yingjie; Wang, Cong; Wang, Zheng

doi:10.3390/drones9080562

Open AccessArticle

Finite-Time Adaptive Reinforcement Learning Control for a Class of Morphing Unmanned Aircraft with Mismatched Disturbances and Coupled Uncertainties

¹

School of Astronautics, Harbin Institute of Technology, Harbin 150001, China

²

Jiangnan Electromechanical Design Institute, Guiyang 550009, China

³

Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an 710072, China

^*

Author to whom correspondence should be addressed.

Drones 2025, 9(8), 562; https://doi.org/10.3390/drones9080562

Submission received: 3 July 2025 / Revised: 31 July 2025 / Accepted: 6 August 2025 / Published: 11 August 2025

(This article belongs to the Section Drone Design and Development)

Download

Browse Figures

Versions Notes

Abstract

Highlights

What are the main findings?

A novel RL-based adaptive finite-time control scheme for morphing unmanned aircraft is synthesized. It can address mismatched disturbances, coupled uncertainties, and non-affine characteristics, enabling the aircraft’s attitude to converge to the desired value within a finite time.
The attitude dynamics of the morphing unmanned aircraft are described as a class of mismatched non-affine systems, which are more suitable for practical scenarios and simplify the analysis process compared to previous models.

What is the implication of the main finding?

Applying reinforcement learning to morphing unmanned aircraft enhances its ability to handle uncertainties, and finite-time reinforcement learning helps limit the control convergence time, thus improving the trajectory-tracking control performance.
The proposed scheme for RL-based robust adaptive flight control offers a method that can be extended to various aircraft control research fields.

Abstract

This paper proposes a finite-time adaptive reinforcement learning (RL) control law for a class of morphing unmanned aircraft with mismatched disturbances and coupled uncertainties. To handle the mismatched disturbances, an adaptive upper-bound estimator as well as the parameter adaptive laws have been proposed. Aiming at the coupled uncertainties, an RL-based online uncertainty estimator and a corresponding finite-time compensation control law have been developed. To deal with the non-affine characteristics, an auxiliary integral system has been introduced. By systematically integrating the aforementioned adaptive upper-bound estimators, finite-time control law, and the auxiliary signals, a novel RL-based adaptive finite-time control framework is constructed for morphing unmanned aircraft. Simulation results reveal the finite-time convergence and the advantages of the proposed method.

Keywords:

reinforcement learning; finite-time adaptive control; morphing unmanned aircraft; mismatched disturbance; coupled uncertainties

1. Introduction

Morphing unmanned aircraft can optimize their aerodynamics and performance by changing their shape to adapt to diverse flight scenarios [1]. It can be classified by morphing scale, location, gas flow, and implementation method [2]. Yu et al. [3] and Noordin et al. [4] developed PID-based UAV adaptive control methods, achieving satisfactory tracking performance. As the application scenarios increase, there is an urgent need to improve its modeling and control law design [5]. Guidance and control technology is crucial as it ensures stable flight in complex environments, enables real-time adjustment of flight paths and morphing strategies, and enhances overall performance and adaptability [6,7]. He et al. [8] proposed an integrated guidance and control method using backstepping and fixed-time sliding mode control. This method includes a morphing term and solves the guidance and control problems of high-speed morphing aircraft. It stabilizes the system and makes errors converge quickly, thus improving control performance. Zhang et al. [9] presented an event-triggered fixed-time sliding mode control, which enhances the aircraft’s robustness and adaptability under complex conditions. Abouheaf et al. [10] developed a machine-learning-based autonomous morphing control for flexible-wing morphing aircraft. This method shows better stability than conventional model-free adaptive control approaches. The literature [11,12] designs a feedback-linearization-based nonlinear command and stability system for variable aspect ratio morphing aircraft. This system improves flexibility and fuel efficiency. The literature [13] proposes an online actor–critic control for variable aspect-ratio and swept-morphing-wing aircraft, which solves the problem of control input frequency. Numerical simulations prove its effectiveness. The literature [14] proposes a preset-performance sliding-mode controller for high-speed morphing unmanned aircraft attitude control under strong uncertainty. Paired with a finite-time neural network disturbance observer, it enhances the system’s performance and adaptability to disturbances and parameter changes. These methods combine advanced control theories to solve the high-speed guidance and control problems of morphing unmanned aircraft.

In recent years, advanced adaptive control approaches have been extensively researched and applied in the realm of complex and uncertain systems [15,16,17]. Among them, model reference adaptive control (MRAC) is a prominent technique that enables real-time parameter adjustment. It achieves this by comparing the performance of the system with a pre-established reference model. The literature [18,19,20] puts forward an enhanced nonlinear dynamic inversion control method based on MRAC. This innovation strengthens the aircraft’s resilience against faults and external disturbances. It refines the accuracy of adaptive estimation and incorporates a low-pass filter to optimize the balance between dynamic performance and robustness. To design the controller better, it is crucial to elucidate the distinctions and relationships among parameter uncertainties, disturbance rejection, and fault-tolerant control. Parameter uncertainties can be caused by sensor biases, which introduce inaccuracies in the measurement of system states. Actuator failures, on the other hand, represent a severe type of disturbance. These failures can disrupt the normal operation of the aircraft and require effective fault-tolerant control strategies to ensure safe flight. To ensure robust performance, researchers focus on parameter uncertainties and disturbance rejection. For robustness enhancement, fuzzy adaptive control effectively strengthens system robustness under uncertain and fuzzy operating conditions. As detailed in the literature [21], a novel fixed-time adaptive generalized type-2 fuzzy logic control scheme was devised for hypersonic aircraft grappling with uncertainties. The efficacy of this control scheme is validated through simulations. Based on the observer, a robust flight controller for a disturbance-unmanned aerial vehicle system is developed [22]. In addition, in the literature [23], the control problem of nonuniform nonlinear systems with time-varying delays in both state and input has also been discussed. Adaptive sliding mode control has also made significant strides in enhancing system robustness and responsiveness. In reference [24], a dedicated control strategy for hypersonic aircraft has been proposed. It constructs a dynamic uncertainty model and estimates uncertainty bounds online to coordinate the robustness and responsiveness of the controller. During flight, morphing unmanned aircraft experience complex shape alterations and fluctuations in aerodynamic characteristics. Traditional control methods like MRAC or sliding mode control often can’t accurately capture these nonlinear dynamics. To address this, some scholars have studied the finite-time asynchronous control of singular fuzzy Markov jump systems [25]. Their research resulted in an adaptive event-triggered scheme, which reduces communication frequency and improves system efficiency. Additionally, Shi et al. [26] introduced a distributed adaptive event-triggered control strategy to solve the cooperative output regulation problem in heterogeneous linear multi-agent systems.

Nevertheless, abrupt changes in morphing unmanned aircraft, particularly shape-switching events, can surpass the predefined triggering conditions. This may result in overly frequent or missed trigger events, thereby introducing instability into the system. Adaptive neural network control emerges as a viable solution. It can manage nonlinear and high-dimensional systems, ensuring stability under unknown dynamics and external disturbances. Research [27,28] introduced a fault-tolerant control approach that combines an adaptive neural network with a nonlinear observer. This synergy improves the robustness and real-time decision-making capabilities of nonlinear systems. To optimize control performance, reinforcement learning has been integrated into the adaptive control framework, expanding its applicability. Reference [29] presented an adaptive model-free fault-tolerant control solution based on integral reinforcement learning for highly flexible aircraft with actuator failures. Reinforcement-learning-based methods, with their distinct advantages in handling nonlinearity, model-free scenarios, real-time learning, and multi-objective optimization, can surmount the control challenges of morphing unmanned aircraft in complex environments.

However, most existing design methods for proving the Lyapunov stability theory can only achieve asymptotic stability. That is, the system takes an infinite time to converge to the equilibrium point [30,31,32,33]. In many engineering applications, it is desired that the control objectives be achieved as soon as possible. Thus, finite-time control has emerged. Finite-time control is applicable to different types of systems. For example, references [34,35,36] proposed finite-time control methods for different types of nonlinear systems, ensuring the finite-time stability of the closed-loop systems. In the aerospace field, finite-time control is mainly used for spacecraft attitude tracking. For instance, reference [37] studied the vibration suppression and attitude tracking of flexible spacecraft under model uncertainties and external disturbances. It designs a multivariable finite-time control scheme for spacecraft attitude tracking based on a novel dynamic sliding dynamics and an adaptive disturbance observer (ADO). Reference [38] investigated the finite-time tracking control problem of a class of nonlinear systems and proposed a new finite-time command-filtered backstepping method. This method has the advantages of conventional command-filtered backstepping control and ensures finite-time convergence.

Research [39] designed a concurrent-learning adaptive finite-time control based on inertial parameter identification under external disturbances and applied it to aerospace attitude control. Reference [40] proposed a novel discrete-time fuzzy preselected performance control (PPC) method. By employing an indirect stabilization mechanism and a low-computation fuzzy approximation strategy, this approach achieves finite-time convergent control of unknown system dynamics without the need for complex model reconstruction. Furthermore, reference [41] developed a fixed-time pre-configured controller for electromechanical systems based on enhanced fuzzy neural approximation and backstepping design. Extending this research to hypersonic vehicles with elevator-stuck faults, the authors propose a fuzzy fault-tolerant control scheme that ensures appointed-time convergence [42]. Reference [43] studied the spacecraft formation flying system affected by external disturbances and parameter uncertainties; it designed a new improved, fast integral terminal sliding-mode control law, and proposed an adaptive tracking control for the spacecraft formation flying system.

Only a few studies focus on applying finite-time control to morphing unmanned aircraft. In the early stage, Wang et al. [44] proposed a smooth-switching state-feedback controller design method for the altitude-keeping and attitude-stability problems of morphing unmanned aircraft during continuous morphing. They establish a chained smooth-switching system model and derive sufficient conditions for finite-time boundedness and robustness. Subsequently, Cheng et al. [45] studied the asynchronous finite-time H∞ control problem of morphing unmanned aircraft with controller uncertainties. Considering the inherent packet dropouts of the system and controller uncertainties, they propose a non-fragile finite-time H∞ controller design method, and its effectiveness is verified by numerical examples. In recent years, some scholars have combined finite-time control with disturbance observers to improve the robustness of morphing unmanned aircraft [46,47].

Based on the above analysis, the application of reinforcement learning to the trajectory-tracking control of morphing unmanned aircraft has not been reported yet. In particular, the research on finite-time reinforcement learning in this field is lacking. Considering the strong coupling and uncertainties of morphing unmanned aircraft, the application of reinforcement learning can enhance its ability to deal with uncertainties. Meanwhile, finite-time reinforcement learning can help limit and shorten the system’s stabilization and control convergence time. Therefore, it is of great significance to improve the trajectory-tracking control performance of morphing unmanned aircraft, and this paper will conduct in-depth research on this.

Based on the above analysis, the main contributions of this paper are as follows:

The attitude dynamics of the morphing unmanned aircraft are described as a class of mismatched non-affine systems, including matched and mismatched disturbances, non-affine input, and internal uncertainties. Compared to previous models [14,48,49], the proposed model is more applicable to practical scenarios and simplifies the analysis process.
Compared with the literature [13], our work focuses on finite-time control, mismatched disturbances, and coupled uncertainties, while the literature [13] addresses non-affine control systems and control input frequency constraints. Different from the literature [27] on highly flexible aircraft, our method targets morphing unmanned aircraft, devises adaptive finite-time controllers, and ensures finite-time attitude convergence with better performance.
This paper proposes a design framework for RL-based adaptive anti-disturbance flight control, offering a paradigm that can be extended to various aircraft.

The remainder of this paper is structured as follows. In Section 2, the problem is formulated, and preliminary knowledge is introduced. Section 3 presents the main findings. Section 4 provides simulation studies. Finally, Section 5 concludes the paper.

2. Problem Formulation and Preliminaries

2.1. Dynamic Model Description

The attitude dynamics model of a morphing unmanned aircraft can be formulated as follows [8,14]:

\begin{array}{l} [\begin{matrix} \dot{α} \\ \dot{β} \\ {\dot{γ}}_{v} \end{matrix}] = [\begin{matrix} - \cos α \tan β & \sin α \tan β & 1 \\ \sin α & \cos α & 0 \\ \cos α \sec β & - \sin α \sec β & 0 \end{matrix}] [\begin{matrix} ω_{x} \\ ω_{y} \\ ω_{z} \end{matrix}] \\ - [\begin{matrix} \sec β \cos γ_{v} \\ \sin γ_{v} \\ - \tan β \cos γ_{v} \end{matrix}] \dot{θ} - [\begin{matrix} - \sec β \cos θ \sin γ_{v} \\ \cos θ \cos γ_{v} \\ - \sin θ + \tan β \cos θ \sin γ_{v} \end{matrix}] \dot{σ}, \end{array}

(1)

where

α

is the angle of attack,

β

is the sideslip angle,

γ_{v}

is the tilt angle,

θ

is the velocity inclination angle, and

σ

is the track yaw angle, and

ω_{x}, ω_{y}, ω_{z}

are the angular velocities of the roll, yaw, and pitch channels, respectively.

The longitudinal dynamics model of the aircraft is as follows:

\dot{ω} = {(I_{b} + \sum_{j = L, r} I_{b j})}^{- 1} (M_{c} - M_{f} - \sum_{j = L, r} m_{j} (r_{O m \to O j} + r_{O j \to c_{j}}) \times (v + ω \times v)),

(2)

where

ω = {[ω_{x}, ω_{y}, ω_{z}]}^{T}

,

I_{b}

denotes the moment of inertia of the aircraft body,

I_{b j}

denotes the moment of inertia of the wing,

M_{c}

represents the aerodynamic torque associated with control ability,

M_{f}

is the additional moment caused by deformation,

r_{O m \to O j}

is the rotation vector of the wing relative to the fuselage in this system,

r_{O j \to c_{j}}

is the rotation vector of the wing relative to the fuselage under the wing mounting system,

v

represents the matrix of velocity magnitudes, and

m_{j}

is the mass of the aircraft wing, respectively. The detailed descriptions of some notations are as follows.

The

I_{b j}

is expressed as:

\begin{array}{l} I_{b j} = - m_{j} r_{O m \to O j}^{\times} \times r_{O m \to O j}^{\times} - m_{j} R_{b j} r_{O j \to c_{j}}^{\times} \times R_{b j}^{T} r_{O m \to O j}^{\times} \\ - m_{j} r_{O m \to O j}^{\times} R_{b j} r_{O j \to c_{j}}^{\times} \times R_{b j}^{T} r_{O m \to O j}^{\times} + R_{b j} I_{j} R_{b j}^{T}, \end{array}

(3)

where

r_{O j \to c_{j}}^{\times}

and

r_{O m \to O j}^{\times}

represent the cross-product vector matrices of

r_{O j \to c_{j}}

and

r_{O m \to O j}

, respectively,

R_{b j}

is a rotation matrix from the wing installation frame to the body coordinate system, and is expressed as follows:

\{\begin{array}{l} R_{b l} = {[\begin{matrix} R_{y} (90^{°} + χ_{l}) \end{matrix}]}^{T} \\ R_{b r} = {[\begin{matrix} R_{y} (- 90^{°} - χ_{r}) \end{matrix}]}^{T}, \end{array}

(4)

where

R_{y}

is the basic rotation matrix of the

y

axis, and

χ

is the caster angle.

The

M_{c}

can be given as follows:

M_{c} = [\begin{matrix} M_{c x} \\ M_{c y} \\ M_{c z} \end{matrix}] = \frac{1}{2} ρ v^{2} S_{0} L_{0} [\begin{matrix} m_{c x} \\ m_{c y} \\ m_{c z} \end{matrix}],

(5)

where

ρ

represents the atmospheric density,

v

is the velocity magnitude,

S_{0}

represents the reference area,

L_{0}

is the reference length,

m_{c x}, m_{c y}, m_{c z}

represents the rotation moment coefficient, yaw moment coefficient, and pitch moment coefficient, respectively.

The determination of

m_{c i}

can be presented by

m_{c i} = m_{c i} (α, β, χ, δ_{x}, δ_{y}, δ_{z}), i = x, y, z,

(6)

where

δ_{x}

,

δ_{y}

and

δ_{z}

represent the roll angle, yaw angle, and pitch angle, respectively.

The

M_{f}

can be given as follows:

\begin{array}{l} M_{f} = ω_{b} \times (I_{b} ω_{b}) + \sum_{j = L, r} I_{b j} {\dot{ω}}_{j} + \sum_{j = L, r} m_{j} r_{O m \to O j} \times [ω_{b} \times (ω_{b} \times r_{O m \to O j})] \\ + \sum_{j = L, r} m_{j} r_{O j \to c_{j}} \times [ω_{b} \times (ω_{b} \times r_{O_{m} \to O_{j}})] \\ + \sum_{j = L, r} m_{j} r_{O_{m} \to O_{j}} \times \{(ω_{b} + ω_{j}) \times [(ω_{b} + ω_{j}) \times r_{O j \to c_{j}}]\}, \end{array}

(7)

where

I_{b j} = R_{b j} I_{b} R_{b j}^{T},

ω_{b}

represents the angular velocity around the center of mass

O_{m}

of the aircraft in the body coordinate system,

ω_{j}

and

{\dot{ω}}_{j}

are the angular velocity and angular acceleration, respectively.

By defining the state variables

x_{1} = [\begin{matrix} x_{11} & x_{12} & x_{13} \end{matrix}] = {[\begin{matrix} α & β & γ_{v} \end{matrix}]}^{T}

and

x_{2} = {[\begin{matrix} x_{21} & x_{22} & x_{23} \end{matrix}]}^{T} =

{[\begin{matrix} ω_{x} & ω_{y} & ω_{z} \end{matrix}]}^{T}

, control input

u = {[\begin{matrix} δ_{x} & δ_{y} & δ_{z} \end{matrix}]}^{T}

, then (1) and (2) can be simplified in the following vector form with uncertain mismatched non-affine functions:

\begin{array}{l} {\dot{x}}_{1} (t) = f_{1} (x_{1} (t)) + Δ f_{1} (x_{1} (t)) + g_{1} (x_{1} (t), x_{2} (t)) + d_{1} (t) \\ {\dot{x}}_{2} (t) = f_{2} (x_{1} (t), x_{2} (t)) + Δ f_{2} (x_{1} (t), x_{2} (t)) + g_{2} (x_{1} (t), x_{2} (t), u (t)) + d_{2} (t), \end{array}

(8)

where

Δ f_{1} (x_{1} (t))

and

Δ f_{2} (x_{1} (t), x_{2} (t))

are the unknown structural uncertainties caused by the deformation of the aircraft,

d_{1} (t)

and

d_{2} (t)

represent the various unknown disturbances encountered during flight,

g_{1} (x_{1} (t), x_{2} (t))

and

g_{2} (x_{1} (t), x_{2} (t), u (t))

denotes the non-affine functions, some notations are detailed as:

\begin{array}{l} f_{1} (x_{1} (t)) = - {[\begin{matrix} \sec β \cos γ_{v} & \sin γ_{v} & - \tan β \cos γ_{v} \end{matrix}]}^{T} \dot{θ} \\ - {[\begin{array}{l} - \sec β \cos θ \sin γ_{v} & \cos θ \cos γ_{v} & - \sin θ + \tan β \cos θ \sin γ_{v} \end{array}]}^{T} \dot{σ}, \end{array}

(9)

\begin{array}{l} f_{2} (x_{1} (t), x_{2} (t)) = - {(I_{b} + \sum_{j = L, r} I_{b j})}^{- 1} M_{f} \\ - {(I_{b} + \sum_{j = L, r} I_{b j})}^{- 1} (\sum_{j = L, r} m_{j} (r_{O m \to O j} + r_{O j \to c_{j}}) \times (v + ω \times v)) . \end{array}

(10)

\begin{array}{l} g_{1} (x_{1} (t), x_{2} (t)) = [\begin{matrix} - \cos α \tan β & \sin α \tan β & 1 \\ \sin α & \cos α & 0 \\ \cos α \sec β & - \sin α \sec β & 0 \end{matrix}] [\begin{matrix} ω_{x} \\ ω_{y} \\ ω_{z} \end{matrix}], \\ g_{2} (x_{1} (t), x_{2} (t), u (t)) = {(I_{b} + \sum_{j = L, r} I_{b j})}^{- 1} [\begin{array}{l} m_{c x} (α, β, χ, δ_{x}, δ_{y}, δ_{z}) \\ m_{c y} (α, β, χ, δ_{x}, δ_{y}, δ_{z}) \\ m_{c z} (α, β, χ, δ_{x}, δ_{y}, δ_{z}) \end{array}] . \end{array}

(11)

Remark 1.

In the research of morphing unmanned aircraft control system modeling, to conform to real-world engineering practices and simplify the analysis, specific symbols are defined.

Δ f_{1} (x_{1} (t))

and

Δ f_{2} (x_{1} (t), x_{2} (t))

stand for unknown structural uncertainties due to aircraft deformation,

d_{1} (t)

and

d_{2} (t)

denote various unknown flight disturbances. These flight disturbances encompass sensor biases that can lead to parameter uncertainties, as well as actuator failures, which are a particularly severe type of disturbance. Additionally,

g_{1} (x_{1} (t), x_{2} (t))

and

g_{2} (x_{1} (t), x_{2} (t), u (t))

represents non-affine functions. Furthermore, analysis of (9)–(11) shows that system (8) has significant non-affine features, posing great challenges to morphing unmanned aircraft control design.

Thereafter, when there is no risk of confusion, vectors may not be bolded, and function arguments may be omitted.

Based on the above analysis and simplification, the control-oriented morphing unmanned aircraft attitude model is obtained as an uncertain mismatched non-affine system as follows:

\begin{array}{l} {\dot{x}}_{1} (t) = f_{1} (x_{1} (t)) + Δ f_{1} (x_{1} (t)) + g_{1} (x_{1} (t), x_{2} (t)) + d_{1} (t) \\ {\dot{x}}_{2} (t) = f_{2} (x_{1} (t), x_{2} (t)) + Δ f_{2} (x_{1} (t), x_{2} (t)) + g_{2} (x_{1} (t), x_{2} (t), u (t)) + d_{2} (t) \\ y (t) = x_{1} (t), \end{array}

(12)

where

y (t)

represents the system output vector.

To design an RL-based adaptive fault-tolerant controller for a morphing unmanned aircraft facing external disturbances, unknown dynamics, and non-affine inputs, this paper sets two control goals: (a) All the signals in the closed-loop system are semi-global practical finite-time stable. (b) The racking errors converge to a small neighborhood of origin in a finite time.

The following assumptions and lemmas are necessary to design the controller.

Assumption 1.

To ensure the validity of the input, it is assumed that

\partial g_{1} / \partial x_{1},

\partial g_{1} / \partial x_{2}, \partial g_{2} / \partial x_{1}, \partial g_{2} / \partial x_{2}, \partial g_{2} / \partial u

are all invertible matrices, and it has [50,51]:

λ [\frac{\partial g_{i}}{\partial x_{1}}] \leq \underline{π}, λ [\frac{\partial g_{i}}{\partial x_{2}}] \leq \underline{π}, λ [\frac{\partial g_{2}}{\partial u}] \leq \underline{π}, i = 1, 2,

(13)

where

λ [\partial g_{i} / \partial x_{1}], λ [\partial g_{i} / \partial x_{2}]

and

λ [\partial g_{2} / \partial u]

represents the eigenvalue of the matrix

\partial g_{i} / \partial x_{1}, \partial g_{i} / \partial x_{2}

and

\partial g_{2} / \partial u

, respectively.

Assumption 2.

The desired trajectory

y_{d}

and its first derivative are continuous and bounded [52].

Definition 1.

If for all

ϵ (t_{0}) = ϵ_{0}

, there exist

l > 0

and a settling time

T (l, ϵ_{0}) < \infty

such that

‖ ϵ (t) ‖ < l

holds for all

t \geq t_{0} + T

, then the equilibrium

ϵ = 0

of the nonlinear system

\dot{ϵ} = f (ϵ)

is semi-global practical finite-time stable [53].

Lemma 1.

For any

ƛ > 0

and

x \in R

, there exists [49,53]:

0 \leq |x| - x \tanh (\frac{x}{ƛ}) \leq κ ƛ,

(14)

where

κ = 0.2785

.

Lemma 2.

Given any constants

ℑ > 0

,

0 < l < 1

, and

℘ > 0

, consider system

\dot{ϵ} = f (ϵ)

[54]. If there exists a smooth positive-definite function

V (η)

such that:

\dot{V} (ϵ) \leq - ℑ V^{l} (ϵ) + ℘, t \geq 0,

(15)

holds, then system

\dot{ϵ} = f (ϵ)

is semi-global practical finite-time stable.

Lemma 3.

For any real variables

p, q

and any positive constants

a_{1}, a_{2}, ℏ

, the following inequality holds [55,56]:

{|p|}^{a_{1}} {|q|}^{a_{2}} \leq \frac{a_{1}}{a_{1} + a_{2}} ℏ {|p|}^{a_{1} + a_{2}} + \frac{a_{2}}{a_{1} + a_{2}} ℏ^{- \frac{a_{1}}{a_{2}}} {|q|}^{a_{1} + a_{2}} .

(16)

Lemma 4.

For

Υ_{i} \in R

and

0 < γ \leq 1

, the following inequality holds [57]:

{(\sum_{i = 1}^{n} |Υ_{i}|)}^{γ} \leq {\sum_{i = 1}^{n} |Υ_{i}|}^{γ} \leq n^{1 - γ} {(\sum_{i = 1}^{n} |Υ_{i}|)}^{γ} .

(17)

2.2. Designs of Actor–Critic Neural Networks

Research shows that radial basis function neural networks (RBFNNs) can approximate any smooth continuous function

f (x)

with any precision [58]. As a current research hotspot, the actor–critic framework enables the critic network to receive information from the environment and evaluate control performance through a cost function. Based on this evaluation, the actor network generates a control strategy for the actuator. Owing to the incorporation of reinforcement signals, the resulting control law exhibits faster convergence and reduced steady-state error. Based on this, an RL framework is developed in this study, using RBFNNs to approximate unknown nonlinear functions and penalty functions, namely:

f (x) = W_{a}^{* T} Φ_{a} (x) + ε_{a} (x), |ε_{a} (x)| \leq {\bar{ε}}_{a},

(18)

where

x = {[x_{1}, x_{2}, \dots x_{n}]}^{T}

is the input vector of the RBFNN,

ε (x)

is the approximation error,

W_{a}^{*}

is the optimized vector of the weight vector

W_{a} = {[{\underline{υ}}_{1}, {\underline{υ}}_{2}, \dots, {\underline{υ}}_{p}]}^{T}

, and

Φ_{a} (x) = {[Φ_{1} (x), \dots, Φ_{p} (x)]}^{T}

is the basis function vector,

p

is the number of hidden nodes,

Φ_{j} (x) = e^{- {(x - μ_{j})}^{2} / (2 σ_{j}^{2})}, j = 1, \dots, p

is the basis function,

μ_{j}

is the center of the basis function,

σ_{j}

is the width of the Gaussian basis function,

ε_{a} (x)

is the approximation error, and

{\bar{ε}}_{a}

is a positive integer.

When designing the actor–critic neural networks, several important design principles of the weight adaptive law in reinforcement learning are considered to ensure the effectiveness and stability of the system.

To approximate the unknown nonlinearity

Δ f_{i} ({\bar{x}}_{i})

in (12), an actor neural network

W_{a i}^{T} Φ_{a i} ({\bar{x}}_{i})

is introduced, which is expressed as follows:

Δ f_{i} ({\bar{x}}_{i}) = W_{a i}^{T} Φ_{a i} ({\bar{x}}_{i}) + ε_{a i},

(19)

where

{\bar{x}}_{i} = {[x_{1}, \dots x_{i}]}^{T} \in R^{i}, i = 1, 2

,

W_{a i}

is the ideal actor neural network weight.

Define the weight error of the actor neural network as

{\tilde{W}}_{a i} = {\hat{W}}_{a i} - W_{a i}

, then its approximation error is expressed as:

H_{a i} = {\tilde{W}}_{a i}^{T} Φ_{a i} ({\bar{x}}_{i}) .

(20)

To improve the tracking performance, a new error function of the actor neural network is constructed based on the approximation error and the penalty function:

\begin{array}{l} e_{a i} = H_{a i} + Γ_{i} {\hat{J}}_{i}, \\ E_{a i} = \frac{1}{2} e_{a i}^{T} e_{a i}, \end{array}

(21)

where

Γ_{i}

is the gain to be chosen, and the goal of updating the network weights to minimize

E_{a i}

.

Based on the information gathered from the environment, an integral penalty function is constructed to generate RL signals:

J_{i} (t) = \int_{τ}^{\infty} q_{i} (t) d t,

(22)

where

q_{i} (t) = z_{i}^{T} Q_{i} z_{i}

.

The specific meaning of the variables will be given later. The penalty function can be approximated by a critic neural network:

\begin{array}{l} J_{i} = W_{c i}^{T} Φ_{c i} ({\bar{x}}_{i}) + ε_{c i}, \\ {\hat{J}}_{i} = {\hat{W}}_{c i}^{T} Φ_{c i} ({\bar{x}}_{i}), \end{array}

(23)

where

W_{c i}

is the ideal critic neural network weight.

The update rule for the weights of the actor–critic network is designed as follows:

{\dot{\hat{W}}}_{a i} = - τ_{i} η_{i} {\hat{W}}_{a i} - η_{i} Φ_{a i} ({\bar{x}}_{i}) [Φ_{a i}^{T} {\hat{W}}_{a i} + {\hat{J}}_{i} Γ_{i}^{T}],

(24)

where

η_{i} > 0, τ_{i} > 0

are the parameters to be chosen.

Construct the residual mean square error function of the critic network:

\begin{array}{l} e_{c i} = q_{i} (t) + {\dot{\hat{J}}}_{i} = q_{i} (t) + {\hat{W}}_{c i}^{T} {\dot{Φ}}_{c i} ({\bar{x}}_{i}), \\ E_{c i} = \frac{1}{2} e_{c i}^{T} e_{c i} . \end{array}

(25)

The update rule for the weights of the critic network is derived using the gradient descent method as follows:

{\dot{\hat{W}}}_{c i} = - ϖ_{i} ({\hat{W}}_{c i}^{T} {\dot{Φ}}_{c i} + q_{i} (t)) {\dot{Φ}}_{c i} - ω_{i} ϖ_{i} {\hat{W}}_{c i},

(26)

where

ω_{i}

and

ϖ_{i}

are the positive design constants.

By applying these design principles, we can design an effective weight adaptive law for the actor–critic neural networks, which helps to improve the performance of the control system.

Remark 2.

The

\hat{J}

term approximates the unknown value function, enabling the actor to consider future rewards. This has helped the actor learn an optimal long-term policy, as focusing only on immediate rewards would lead to sub-optimal behavior. The second correction term enhances the stability and convergence of the actor–critic algorithm. It can adjust the actor’s update step, preventing aggressive updates that could cause divergence or oscillation. Moreover, it accounts for environmental uncertainties, improving the system’s robust ability to handle unexpected situations.

Property 1.

To satisfy the persistence excitation condition, the basis functions

Φ_{c i}

and

Φ_{a i}

, and the derivatives

{\dot{Φ}}_{c i}

and

{\dot{Φ}}_{a i}

are assumed to satisfy

‖{\dot{Φ}}_{a i}‖ \leq Φ_{a i m}, ‖{\dot{Φ}}_{c i}‖ \leq Φ_{c i m}, ‖Φ_{a i}‖

\leq Φ_{a i M}, ‖Φ_{c i}‖ \leq Φ_{c i M}, i = 1, 2

. Meanwhile, the estimation errors of actor–critic learning and their derivative are bounded, that is

|ε_{c i}| \leq ε_{c i M}

and

|{\dot{ε}}_{c i}| \leq ε_{c i m}

[51].

Remark 3.

RBFNNs are selected over other function approximations for several well-founded reasons. Firstly, owing to their universal approximation property, RBFNNs are capable of accurately approximating uncertainties [52]. Secondly, for high-dimensional data, RBFNNs offer computational efficiency as their training process is relatively straightforward.

Remark 4.

When determining the centers and widths, a trial-and-error technique is used to process the input data. The centers are set in high-density regions, and prior knowledge of the system’s critical operating points is incorporated. The widths are set according to the distances between the centers and fine-tuned on the validation set.

Remark 5.

The penalty function (22) and update laws (24) and (26) are designed with the aim of ensuring the stability and convergence of the RL framework. The penalty function serves to penalize undesired system behaviors, and the update laws are derived from Lyapunov techniques. The detailed derivations of these aspects will be presented in the revised manuscript.

Remark 6.

The actor–critic neural network and RBFNN architectures, learning mechanisms, and generalization abilities are designed to suit different application scenarios. The actor–critic is for dynamic scenarios, while RBFNN excels at local function approximation.

3. Main Results

This section details the design process of the RL-based adaptive fault-tolerant control method. First, by introducing an auxiliary integral term, the system is reformulated as an augmented affine system with the overall non-affine function treated as the control input. Next, a virtual control law is designed, which integrates the actor–critic network and the disturbance boundary estimator. The final control signal is obtained through recursive design, and the actual control input is generated through integration. The structure of the controller is shown in Figure 1.

3.1. The Design of the Augmented System

Through the auxiliary integration method, the augmented system is established as follows:

\begin{array}{l} {\dot{x}}_{1} (t) = f_{1} (x_{1} (t)) + Δ f_{1} (x_{1} (t)) + g_{1} (x_{1} (t), x_{2} (t)) + d_{1} (t) \\ {\dot{x}}_{2} (t) = f_{2} (x_{1} (t), x_{2} (t)) + Δ f_{2} (x_{1} (t), x_{2} (t)) + g_{2} (x_{1} (t), x_{2} (t), u (t)) + d_{2} (t) \\ \dot{u} = u_{f} \\ y (t) = x_{1} (t), \end{array}

(27)

where

d_{1} (t)

denotes mismatched disturbance,

d_{2} (t)

denotes matched disturbance, which are compensated by the Tanh function boundary estimator. Additionally, unknown nonlinear functions

Δ f_{1} (x_{1} (t)), Δ f_{2} (x_{1} (t), x_{2} (t))

are approximated using an actor–critic framework.

Different from the affine system, non-affine systems treat the non-affine function holistically as the subsystem input, with the tracking error formally defined as follows:

\begin{array}{l} z_{1} (t) = x_{1} (t) - y_{d} \\ z_{2} (t) = g_{1} (x_{1} (t), x_{2} (t)) - x_{1 d} (t) \\ z_{3} (t) = g_{2} (x_{1} (t), x_{2} (t), u (t)) - x_{2 d} (t), \end{array}

(28)

where

y_{d}

is the desired output value of the system,

x_{1 d}

and

x_{2 d}

, the constructed virtual control variables, will be presented below.

Let

x_{i c}

denote the virtual control law to be designed for the i-th subsystem. To mitigate differential explosion, a first-order low-pass filter [58] is introduced as:

δ_{i} {\dot{x}}_{i d} + x_{i d} = x_{i c}, x_{i d} (0) = x_{i c} (0), i = 1, 2,

(29)

where

0 < δ_{i} < 1

denotes the filter time constant to be designed, through which both

x_{i d}

and its derivative can be obtained. The boundary layer error is defined as:

y_{i} = x_{i d} - x_{i c},

(30)

where

x_{i c}

denotes the filtered value of the virtual controller

x_{i d}

.

Then we can obtain:

{\dot{y}}_{i} = {\dot{x}}_{i d} - {\dot{x}}_{i c} = - \frac{y_{i}}{δ_{i}} - {\dot{x}}_{i c} .

(31)

3.2. Controller Design

Step 1. From the first equality in Equation (28), it yields:

\begin{array}{l} {\dot{z}}_{1} (t) = {\dot{x}}_{1} (t) - {\dot{y}}_{d} \\ = f_{1} (x_{1} (t)) + Δ f_{1} (x_{1} (t)) + g_{1} (x_{1} (t), x_{2} (t)) + d_{1} (t) - {\dot{y}}_{d} \\ = f_{1} + z_{2} + y_{1} + x_{1 c} + d_{1} + Δ f_{1} - {\dot{y}}_{d} . \end{array}

(32)

Let

Δ {\hat{f}}_{1}

denote the estimated value of the inner-loop nonlinearity

Δ f_{1}

. According to Section 2.2, it can be approximated using an actor network, expressed as:

Δ f_{1} = W_{a 1}^{T} Φ_{a 1} (x_{1}) + ε_{a 1},

(33)

where

ε_{a 1}

is the error of the actor network.

Define

D_{1} = \sup_{t \geq 0} ‖d_{1} (t) + ε_{a_{1}}‖

, thus, the virtual controller can be designed as:

x_{1 c} = - k_{1} {(z_{1}^{T} z_{1})}^{l - 1} z_{1} - {\hat{W}}_{a 1}^{T} Φ_{1} (x_{1}) - {\hat{D}}_{1} \tanh (\frac{z_{1}}{ε_{D_{1}}}) - f_{1} + {\dot{y}}_{d} .

(34)

Let

{\hat{D}}_{1}

denote the estimated value of the mismatched disturbance. From Equations (32) and (34), we obtain:

z_{1}^{T} {\dot{z}}_{1} = - k_{1} {(z_{1}^{T} z_{1})}^{l} + z_{1}^{T} (z_{2} + y_{1}) + z_{1}^{T} (d_{1} + ε_{a 1} - {\hat{D}}_{1} \tanh (\frac{z_{1}}{ε_{D_{1}}})) - z_{1}^{T} {\tilde{W}}_{a 1}^{T} Φ_{a 1} (x_{1}),

(35)

where

{\tilde{W}}_{a 1} = {\hat{W}}_{a 1} - W_{a 1}

.

The integral penalty function of the control system is defined as:

J_{1} (t) = \int_{τ}^{\infty} q_{1} (t) d t,

where

q_{1} (t) = z_{1}^{T} Q_{1} z_{1}

. Then, the actor–critic network weight updating law is designed as:

{\dot{\hat{W}}}_{a 1} = - τ_{1} η_{1} {\hat{W}}_{a 1} - η_{1} Φ_{a 1} (x_{1}) [Φ_{a 1}^{T} {\hat{W}}_{a 1} + {\hat{J}}_{1} Γ_{1}^{T}] .

(36)

The critic network weight updating law is derived via the gradient descent method as:

{\dot{\hat{W}}}_{c 1} = - ϖ_{1} ({\dot{\hat{W}}}_{c 1}^{T} {\dot{Φ}}_{c 1} + q_{1} (t)) {\dot{Φ}}_{c 1} - ω_{1} ϖ_{1} {\hat{W}}_{c 1} .

(37)

The updating law of

{\hat{D}}_{1}

is given as:

{\dot{\hat{D}}}_{1} = τ_{D_{1}} z_{1} \tanh (\frac{z_{1}}{ε_{D_{1}}}) - τ_{D_{1}} γ_{D_{1}} {\hat{D}}_{1} .

(38)

Step 2. Define

D_{2} = \sup_{t \geq 0} ‖d_{2} (t) + ε_{a 2}‖

, where

{\hat{D}}_{2}

is the estimated value. According to Equation (28), we have:

\begin{array}{l} {\dot{z}}_{2} = {\dot{g}}_{1} - {\dot{x}}_{1 d} \\ = \frac{\partial g_{1}}{\partial x_{1}} {\dot{x}}_{1} + \frac{\partial g_{1}}{\partial x_{2}} {\dot{x}}_{2} - {\dot{x}}_{1 d} \\ = \frac{\partial g_{1}}{\partial x_{1}} (f_{1} + g_{1} + d_{1} + Δ f_{1}) + \frac{\partial g_{1}}{\partial x_{2}} (f_{2} + Δ f_{2} + z_{3} + y_{2} + x_{2 c} + d_{2}) - {\dot{x}}_{1 d} . \end{array}

(39)

Introduce the actor network to compensate

Δ f_{2} (t, x_{1}, x_{2})

:

\begin{array}{l} Δ f_{2} (t, x_{1}, x_{2}) = W_{a 2}^{T} Φ_{a 2} (x_{1}, x_{2}) + ε_{a 2}, \\ Δ {\hat{f}}_{2} (t, x_{1}, x_{2}) = {\hat{W}}_{a 2}^{T} Φ_{a 2} (x_{1}, x_{2}) . \end{array}

(40)

The designed virtual control law is:

x_{2 c} = {(\frac{\partial g_{1}}{\partial x_{2}})}^{- 1} (\begin{array}{l} - k_{2} {(z_{2}^{T} z_{2})}^{l - 1} z_{2} - (\frac{1}{2} + \frac{5}{2} {\underline{π}}^{2}) z_{2} - \frac{\partial g_{1}}{\partial x_{1}} (f_{1} + g_{1} + {\hat{D}}_{1} \tanh (\frac{z_{1}}{ε_{D_{1}}}) + {\hat{W}}_{a 1}^{T} Φ_{a 1} (x_{1})) \\ - \frac{\partial g_{1}}{\partial x_{2}} ({\hat{W}}_{a 2}^{T} Φ_{a 2} (x_{1}, x_{2}) + f_{2} + {\hat{D}}_{2} \tanh (\frac{\partial g_{1}}{\partial x_{2}} \cdot \frac{z_{2}}{ε_{D_{2}}})) + {\dot{x}}_{1 d} \end{array}) .

(41)

Substituting into Equation (39), it yields:

\begin{array}{l} z_{2}^{T} {\dot{z}}_{2} = - k_{2} {(z_{2}^{T} z_{2})}^{l} + \frac{\partial g_{1}}{\partial x_{1}} z_{2} (d_{1} + ε_{a_{1}} - {\hat{D}}_{1} \tanh (\frac{z_{1}}{ε_{D_{1}}})) - \frac{\partial g_{1}}{\partial x_{1}} z_{2} {\tilde{W}}_{a 1}^{T} Φ_{a 1} (x_{1}) + \frac{\partial g_{1}}{\partial x_{2}} z_{2} (z_{3} + y_{2}) \\ - \frac{\partial g_{1}}{\partial x_{2}} z_{2} {\tilde{W}}_{a 2}^{T} Φ_{a 2} (x_{1}, x_{2}) + \frac{\partial g_{1}}{\partial x_{2}} z_{2} (d_{2} + ε_{a_{2}} - {\hat{D}}_{2} \tanh (\frac{\partial g_{1}}{\partial x_{2}} \cdot \frac{z_{2}}{ε_{D_{2}}})) \end{array}

(42)

Similarly, the actor–critic framework is designed with the penalty function defined as:

J_{2} (t) = \int_{τ}^{\infty} q_{2} (t) d t

and

q_{2} (t) = z_{2}^{T} Q_{2} z_{2}

.

Construct the critic network module effect error function and design the weight update law as:

\begin{array}{l} {\dot{\hat{W}}}_{a 2} = - \frac{\partial g_{1}}{\partial x_{2}} η_{2} Φ_{a 2} (x_{2}) [\frac{\partial g_{1}}{\partial x_{2}} Φ_{a 2}^{T} {\hat{W}}_{a 2} + {\hat{J}}_{2} Γ_{2}^{T}] - η_{2} τ_{2} {\hat{W}}_{a 2}, \\ {\dot{\hat{W}}}_{c 2} = - ϖ_{2} ({\dot{\hat{W}}}_{_{c 2}}^{T} {\dot{Φ}}_{c 2} + q_{2} (t)) {\dot{Φ}}_{c 2} - ω_{2} ϖ_{2} {\hat{W}}_{c 2} . \end{array}

(43)

The updating law of

{\hat{D}}_{2}

is given as:

{\dot{\hat{D}}}_{2} = \frac{\partial g_{1}}{\partial x_{2}} τ_{D_{2}} z_{2} \tanh (\frac{\partial g_{1}}{\partial x_{2}} \cdot \frac{z_{2}}{ε_{D 2}}) - τ_{D_{2}} γ_{D_{2}} {\hat{D}}_{2} .

(44)

Step 3. From the third equation of Equation (28), let

u_{f} (t) = \dot{u} (t)

, it can be obtained:

\begin{array}{l} {\dot{z}}_{3} = {\dot{g}}_{2} - {\dot{x}}_{2 d} \\ = \frac{\partial g_{2}}{\partial x_{1}} {\dot{x}}_{1} + \frac{\partial g_{2}}{\partial x_{2}} {\dot{x}}_{2} + \frac{\partial g_{2}}{\partial u} \dot{u} - {\dot{x}}_{2 d} \\ = \frac{\partial g_{2}}{\partial x_{1}} (f_{1} + g_{1} + d_{1} + Δ f_{1}) + \frac{\partial g_{2}}{\partial x_{2}} (f_{2} + g_{2} + Δ f_{2} + d_{2}) + \frac{\partial g_{2}}{\partial u} u_{f} - {\dot{x}}_{2 d} . \end{array}

(45)

The virtual control signal is designed as:

u_{f} = {(\frac{\partial g_{2}}{\partial u})}^{- 1} (\begin{array}{l} - k_{3} {(z_{3}^{T} z_{3})}^{l - 1} z_{3} - (\frac{1}{2} + 2 {\underline{π}}^{2}) z_{3} - \frac{\partial g_{2}}{\partial x_{1}} (f_{1} + g_{1} + {\hat{D}}_{1} \tanh (\frac{z_{1}}{ε_{D_{1}}}) + {\hat{W}}_{a 1}^{T} Φ_{a 1} (x_{1})) \\ - \frac{\partial g_{2}}{\partial x_{2}} (f_{2} + g_{2} + {\hat{D}}_{2} \tanh (\frac{\partial g_{1}}{\partial x_{2}} \cdot \frac{z_{2}}{ε_{D_{2}}}) + {\hat{W}}_{a 2}^{T} Φ_{a 2} (x_{1}, x_{2})) + {\dot{x}}_{2 d} \end{array}) .

(46)

We can obtain:

\begin{matrix} z_{3}^{T} {\dot{z}}_{3} = - k_{3} {(z_{3}^{T} z_{3})}^{l} + \frac{\partial g_{2}}{\partial x_{1}} z_{3} (d_{1} + ε_{a_{1}} - {\hat{D}}_{1} \tanh (\frac{z_{1}}{ε_{D_{1}}})) - \frac{\partial g_{2}}{\partial x_{1}} z_{3} {\tilde{W}}_{a 1}^{T} Φ_{a 1} (x_{1}) \\ + \frac{\partial g_{2}}{\partial x_{2}} z_{3} (d_{2} + ε_{a_{2}} - {\hat{D}}_{2} \tanh (\frac{\partial g_{1}}{\partial x_{2}} \cdot \frac{z_{2}}{ε_{D_{2}}})) - \frac{\partial g_{2}}{\partial x_{2}} z_{3} {\tilde{W}}_{a 2}^{T} Φ_{a 2} (x_{1}, x_{2}) . \end{matrix}

(47)

3.3. Proof of Stability

Theorem 1.

For a second-order non-affine nonlinear system that satisfies Assumptions 1–2 and Lemmas 1–4, if the virtual control signals, adaptive laws, and reinforcement learning (RL) update laws are designed as described in Section 3.1, the closed-loop system can achieve the control the objectives (a) and (b).

Proof.

① Choose the overall candidate Lyapunov function as:

\begin{array}{l} V = \frac{1}{2} z_{1}^{T} z_{1} + \frac{1}{2} z_{2}^{T} z_{2} + \frac{1}{2} z_{3}^{T} z_{3} + \frac{1}{2} y_{1}^{T} y_{1} + \frac{1}{2} y_{2}^{T} y_{2} + \frac{1}{2} T r ({\tilde{W}}_{a 1}^{T} η_{1}^{- 1} {\tilde{W}}_{a 1}) + \frac{1}{2} T r ({\tilde{W}}_{c 1}^{T} ϖ_{1}^{- 1} {\tilde{W}}_{c 1}) \\ + \frac{1}{2} T r ({\tilde{W}}_{a 2}^{T} η_{2}^{- 1} {\tilde{W}}_{a 2}) + \frac{1}{2} T r ({\tilde{W}}_{c 2}^{T} ϖ_{2}^{- 1} {\tilde{W}}_{c 2}) + \frac{1}{2 τ_{D 1}} {\tilde{D}}_{1}^{T} {\tilde{D}}_{1} + \frac{1}{2 τ_{D 2}} {\tilde{D}}_{2}^{T} {\tilde{D}}_{2} \end{array}

(48)

By taking its derivative, we can obtain:

\begin{array}{l} \dot{V} = z_{1}^{T} {\dot{z}}_{1} + z_{2}^{T} {\dot{z}}_{2} + z_{3}^{T} {\dot{z}}_{3} + y_{1}^{T} {\dot{y}}_{1} + y_{2}^{T} {\dot{y}}_{2} + \frac{1}{τ_{D_{1}}} {\tilde{D}}_{1}^{T} {\dot{\hat{D}}}_{1} + \frac{1}{τ_{D_{2}}} {\tilde{D}}_{2}^{T} {\dot{\hat{D}}}_{2} + T r ({\tilde{W}}_{a 1}^{T} η_{1}^{- 1} {\dot{\hat{W}}}_{a 1}) \\ + T r ({\tilde{W}}_{c 1}^{T} ϖ_{1}^{- 1} {\dot{\hat{W}}}_{c 1}) + T r ({\tilde{W}}_{a 2}^{T} η_{2}^{- 1} {\dot{\hat{W}}}_{a 1}) + T r ({\tilde{W}}_{c 2}^{T} ϖ_{2}^{- 1} {\dot{\hat{W}}}_{c 2}) \\ = - k_{1} {(z_{1}^{T} z_{1})}^{l} - k_{2} {(z_{2}^{T} z_{2})}^{l} - k_{3} {(z_{3}^{T} z_{3})}^{l} - \frac{3}{2} z_{1}^{T} z_{1} - (\frac{1}{2} + \frac{5}{2} {\underline{π}}^{2}) z_{2}^{T} z_{2} - (\frac{1}{2} + 2 {\underline{π}}^{2}) z_{3}^{T} z_{3} - \frac{y_{1}^{T} y_{1}}{δ_{1}} \\ - {\dot{x}}_{1 c}^{T} y_{1} - \frac{y_{2}^{T} y_{2}}{δ_{2}} - {\dot{x}}_{2 c}^{T} y_{2} + z_{1} (z_{2} + y_{1}) + \frac{\partial g_{1}}{\partial x_{2}} z_{2} (z_{3} + y_{2}) + (d_{1} + ε_{a_{1}} - {\hat{D}}_{1} \tanh (\frac{z_{1}}{ε_{D_{1}}})) z_{1} \\ + (\frac{\partial g_{1}}{\partial x_{1}} z_{2} + \frac{\partial g_{2}}{\partial x_{1}} z_{3}) (d_{1} + ε_{a_{1}} - {\hat{D}}_{1} \tanh (\frac{z_{1}}{ε_{D_{1}}})) + (d_{2} + ε_{a_{2}} - {\hat{D}}_{2} \tanh (\frac{\partial g_{1}}{\partial x_{2}} \cdot \frac{z_{2}}{ε_{D_{2}}})) \frac{\partial g_{1}}{\partial x_{2}} z_{2} \\ + \frac{\partial g_{2}}{\partial x_{2}} z_{3} (d_{2} + ε_{a_{2}} - {\hat{D}}_{2} \tanh (\frac{\partial g_{1}}{\partial x_{2}} \cdot \frac{z_{2}}{ε_{D_{2}}})) - {\tilde{W}}_{a 1}^{T} Φ_{a 1} (x_{1}) (z_{1} + \frac{\partial g_{1}}{\partial x_{1}} z_{2} + \frac{\partial g_{2}}{\partial x_{1}} z_{3}) \\ - {\tilde{W}}_{a 2}^{T} Φ_{a 2} (x_{1}, x_{2}) (\frac{\partial g_{1}}{\partial x_{2}} z_{2} + \frac{\partial g_{2}}{\partial x_{2}} z_{3}) + \frac{1}{τ_{D_{1}}} {\tilde{D}}_{1}^{T} {\dot{\hat{D}}}_{1} + \frac{1}{τ_{D_{2}}} {\tilde{D}}_{2}^{T} {\dot{\hat{D}}}_{2} + T r ({\tilde{W}}_{a 1}^{T} η_{1}^{- 1} {\dot{\hat{W}}}_{a 1}) \\ + T r ({\tilde{W}}_{c 1}^{T} ϖ_{1}^{- 1} {\dot{\hat{W}}}_{c 1}) + T r ({\tilde{W}}_{a 2}^{T} η_{2}^{- 1} {\dot{\hat{W}}}_{a 2}) + T r ({\tilde{W}}_{c 2}^{T} ϖ_{2}^{- 1} {\dot{\hat{W}}}_{c 2}) . \end{array}

(49)

According to Lemma 3, we have:

\begin{array}{l} - {\dot{x}}_{1 c}^{T} y_{1} \leq \frac{y_{1}^{T} y_{1}}{2 δ_{1}} + \frac{δ_{1} {|{\dot{x}}_{1 c}|}^{2}}{2}, - {\dot{x}}_{2 c}^{T} y_{2} \leq \frac{y_{2}^{T} y_{2}}{2 δ_{2}} + \frac{δ_{2} {|{\dot{x}}_{2 c}|}^{2}}{2}, \\ z_{1}^{T} (z_{2} + y_{1}) + \frac{\partial g_{1}}{\partial x_{2}} z_{2} (z_{3} + y_{2}) \leq z_{1}^{T} z_{1} + (\frac{1}{2} + {\underline{π}}^{2}) z_{2}^{T} z_{2} + \frac{1}{2} z_{3}^{T} z_{3} + \frac{1}{2} y_{1}^{T} y_{1} + \frac{1}{2} y_{2}^{T} y_{2}, \\ (\frac{\partial g_{1}}{\partial x_{1}} z_{2} + \frac{\partial g_{2}}{\partial x_{1}} z_{3}) (d_{1} + ε_{a_{1}} - {\hat{D}}_{1} \tanh (\frac{z_{1}}{ε_{D_{1}}})) \leq \frac{1}{2} {\underline{π}}^{2} z_{2}^{T} z_{2} + \frac{1}{2} {\underline{π}}^{2} z_{3}^{T} z_{3} + \frac{3}{2} {\bar{ε}}_{D_{1}}^{2}, \\ \frac{\partial g_{2}}{\partial x_{2}} z_{3} (d_{2} + ε_{a_{2}} - {\hat{D}}_{2} \tanh (\frac{\partial g_{1}}{\partial x_{2}} \cdot \frac{z_{2}}{ε_{D_{2}}})) \leq \frac{1}{2} {\underline{π}}^{2} z_{3}^{T} z_{3} + \frac{1}{2} {\bar{ε}}_{D_{2}}^{2}, \\ - {\tilde{W}}_{a 1}^{T} Φ_{a 1} (x_{1}) (z_{1} + \frac{\partial g_{1}}{\partial x_{1}} z_{2} + \frac{\partial g_{2}}{\partial x_{1}} z_{3}) \leq \frac{1}{2} z_{1}^{T} z_{1} + \frac{1}{2} {\underline{π}}^{2} (z_{2}^{T} z_{2} + z_{3}^{T} z_{3}) + \frac{3}{2} Φ_{a 1 M}^{2} T r ({\tilde{W}}_{a 1}^{T} {\tilde{W}}_{a 1}), \\ - {\tilde{W}}_{a 2}^{T} Φ_{a 2} (x_{1}, x_{2}) (\frac{\partial g_{1}}{\partial x_{2}} z_{2} + \frac{\partial g_{2}}{\partial x_{2}} z_{3}) \leq \frac{1}{2} {\underline{π}}^{2} (z_{2}^{T} z_{2} + z_{3}^{T} z_{3}) + Φ_{a 2 M}^{2} T r ({\tilde{W}}_{a 2}^{T} {\tilde{W}}_{a 2}) . \end{array}

(50)

It follows from Lemma 1 that:

\begin{array}{l} z_{1} (d_{1} + ε_{a_{1}} - {\hat{D}}_{1} \tanh (\frac{z_{1}}{ε_{D_{1}}})) \leq - {\tilde{D}}_{1} z_{1} \tanh (\frac{z_{1}}{ε_{D_{1}}}) + κ ε_{D_{1}}, \\ \frac{\partial g_{1}}{\partial x_{2}} z_{2} (d_{2} + ε_{a_{2}} - {\hat{D}}_{2} \tanh (\frac{\partial g_{1}}{\partial x_{2}} \cdot \frac{z_{2}}{ε_{D_{2}}})) \leq - {\tilde{D}}_{2} \frac{\partial g_{1}}{\partial x_{2}} z_{2} \tanh (\frac{\partial g_{1}}{\partial x_{2}} \cdot \frac{z_{2}}{ε_{D_{2}}}) + κ ε_{D_{2}} . \end{array}

(51)

Combing Equations (38) and (44), we can obtain:

\begin{array}{l} \frac{1}{τ_{D_{1}}} {\tilde{D}}_{1}^{T} {\dot{\hat{D}}}_{1} = {\tilde{D}}_{1} z_{1} \tanh (\frac{z_{1}}{ε_{D_{1}}}) - γ_{D_{1}} {\hat{D}}_{1} {\tilde{D}}_{1}, \\ \frac{1}{τ_{D_{2}}} {\tilde{D}}_{2}^{T} {\dot{\hat{D}}}_{2} = {\tilde{D}}_{2} \frac{\partial g_{1}}{\partial x_{2}} z_{2} \tanh (\frac{\partial g_{1}}{\partial x_{2}} \cdot \frac{z_{2}}{ε_{D_{2}}}) - γ_{D_{2}} {\hat{D}}_{2} {\tilde{D}}_{2} . \end{array}

(52)

Through transforming the inequalities, we derive the following relationship:

\begin{array}{l} - γ_{D_{1}} {\hat{D}}_{1}^{T} {\tilde{D}}_{1} \leq - \frac{1}{2} γ_{D_{1}} {\tilde{D}}_{1}^{T} {\tilde{D}}_{1} + \frac{1}{2} γ_{D_{1}} D_{1}^{T} D_{1}, \\ - γ_{D_{2}} {\hat{D}}_{2}^{T} {\tilde{D}}_{2} \leq - \frac{1}{2} γ_{D_{2}} {\tilde{D}}_{2}^{T} {\tilde{D}}_{2} + \frac{1}{2} γ_{D_{2}} D_{2}^{T} D_{2}, \\ T r ({\tilde{W}}_{a 1}^{T} η_{1}^{- 1} {\dot{\tilde{W}}}_{a 1}^{}) + T r ({\tilde{W}}_{c 1}^{T} ϖ_{1}^{- 1} {\dot{\tilde{W}}}_{c 1}^{}) \\ \leq - \frac{1}{2} (τ_{1} - Φ_{a 1 M}^{2}) T r ({\tilde{W}}_{a 1}^{T} {\tilde{W}}_{a 1}) - \frac{1}{2} (ω_{1} - Φ_{c 1 m}^{2} - Φ_{c 1 M}^{2} Γ_{1 M}^{2}) T r ({\tilde{W}}_{c 1}^{T} {\tilde{W}}_{c 1}) \\ + \frac{1}{2} (τ_{1} + Φ_{a 1 M}^{2}) T r (W_{a 1}^{T} W_{a 1}) + \frac{1}{2} (ω_{1} + 2 Φ_{c 1 m}^{2} + Φ_{c 1 M}^{2} Γ_{1 M}^{2}) T r (W_{c 1}^{T} W_{c 1}) + \frac{1}{2} ε_{c 1 m}^{2}, \\ T r ({\tilde{W}}_{_{a 2}}^{T} η_{2}^{- 1} {\dot{\tilde{W}}}_{_{a 2}}^{}) + T r ({\tilde{W}}_{_{c 2}}^{T} ϖ_{2}^{- 1} {\dot{\tilde{W}}}_{_{c 2}}^{}) \\ \leq - \frac{1}{2} (τ_{2} - {\underline{π}}^{2} Φ_{a 2 M}^{2} - 2 π Φ_{a 2 M}^{2}) T r ({\tilde{W}}_{a 2}^{T} {\tilde{W}}_{a 2}) - \frac{1}{2} (ω_{2} - Φ_{c 2 m}^{2} - \underline{π} Φ_{c 2 M}^{2} Γ_{2 M}^{2}) T r ({\tilde{W}}_{_{c 2}}^{T} {\tilde{W}}_{c 2}) \\ + \frac{1}{2} ({\underline{π}}^{2} Φ_{a 2 M}^{2} + τ_{2}) T r (W_{a 2}^{T} W_{a 2}) + \frac{1}{2} (\underline{π} Φ_{c 2 M}^{2} Γ_{2 M}^{2} + ω_{2} + 2 Φ_{c 2 m}^{2}) T r (W_{_{c 2}}^{T} W_{c 2}) + \frac{1}{2} ε_{c 2 m}^{2} . \end{array}

(53)

It can be sorted out:

\begin{array}{l} \dot{V} \leq - k_{1} {(z_{1}^{T} z_{1})}^{l} - k_{2} {(z_{2}^{T} z_{2})}^{l} - k_{3} {(z_{3}^{T} z_{3})}^{l} - (\frac{1}{2 δ_{1}} - \frac{1}{2}) y_{1}^{T} y_{1} - (\frac{1}{2 δ_{2}} - \frac{1}{2}) y_{2}^{T} y_{2} \\ - \frac{1}{2} γ_{D_{1}} {\tilde{D}}_{1}^{T} {\tilde{D}}_{1} - \frac{1}{2} γ_{D_{2}} {\tilde{D}}_{2}^{T} {\tilde{D}}_{2} - \frac{1}{2} (τ_{1} - 4 Φ_{a 1 M}^{2}) T r ({\tilde{W}}_{a 1}^{T} {\tilde{W}}_{a 1}) - \frac{1}{2} (ω_{1} - Φ_{c 1 m}^{2} - Φ_{c 1 M}^{2} Γ_{1 M}^{2}) T r ({\tilde{W}}_{c 1}^{T} {\tilde{W}}_{c 1}) \\ - \frac{1}{2} (τ_{2} - {\underline{π}}^{2} Φ_{a 2 M}^{2} - 2 \underline{π} Φ_{a 2 M}^{2} - 2 Φ_{a 2 M}^{2}) T r ({\tilde{W}}_{a 2}^{T} {\tilde{W}}_{a 2}) - \frac{1}{2} (ω_{2} - Φ_{c 2 m}^{2} - \underline{π} Φ_{c 2 M}^{2} Γ_{2 M}^{2}) T r ({\tilde{W}}_{c 2}^{T} {\tilde{W}}_{c 2}) \\ + κ ε_{D_{1}} + \frac{3}{2} {\bar{ε}}_{D_{1}}^{2} + κ ε_{D_{2}} + \frac{1}{2} {\bar{ε}}_{D_{2}}^{2} + \frac{1}{2} γ_{D_{1}} D_{1}^{T} D_{1} + \frac{1}{2} γ_{D_{2}} D_{2}^{T} D_{2} + \frac{δ_{1} {|{\dot{x}}_{1 c}|}^{2}}{2} + \frac{δ_{2} {|{\dot{x}}_{2 c}|}^{2}}{2} \\ + \frac{1}{2} (τ_{1} + Φ_{a 1 M}^{2}) T r (W_{a 1}^{T} W_{a 1}) + \frac{1}{2} (ω_{1} + 2 Φ_{c 1 m}^{2} + Φ_{c 1 M}^{2} Γ_{1 M}^{2}) T r (W_{c 1}^{T} W_{c 1}) + \frac{1}{2} ε_{c 1 m}^{2} \\ + \frac{1}{2} ({\underline{π}}^{2} Φ_{a 2 M}^{2} + τ_{2}) T r (W_{a 2}^{T} W_{a 2}) + \frac{1}{2} (\underline{π} Φ_{c 2 M}^{2} Γ_{2 M}^{2} + ω_{2} + 2 Φ_{c 2 m}^{2}) T r (W_{_{c 2}}^{T} W_{c 2}) + \frac{1}{2} ε_{c 2 m}^{2} . \end{array}

(54)

By defining:

\bar{C} = \min \{\begin{array}{l} 2 k_{1}, 2 k_{2}, 2 k_{3}, (\frac{1}{δ_{1}} - 1), (\frac{1}{δ_{2}} - 1), τ_{D_{1}} γ_{D_{1}}, τ_{D_{2}} γ_{D_{2}}, (τ_{1} - 4 Φ_{a 1 M}^{2}), (ω_{1} - Φ_{c 1 m}^{2} - Φ_{c 1 M}^{2} Γ_{1 M}^{2}), \\ (τ_{2} - {\underline{π}}^{2} Φ_{a 2 M}^{2} - 2 \underline{π} Φ_{a 2 M}^{2} - 2 Φ_{a 2 M}^{2}), (ω_{2} - Φ_{c 2 m}^{2} - \underline{π} Φ_{c 2 M}^{2} Γ_{2 M}^{2}) \end{array}\},

(55)

\begin{array}{l} ε = κ ε_{D_{1}} + \frac{3}{2} {\bar{ε}}_{D_{1}}^{2} + κ ε_{D_{2}} + \frac{1}{2} {\bar{ε}}_{D_{2}}^{2} + \frac{1}{2} γ_{D_{1}} D_{1}^{T} D_{1} + \frac{1}{2} γ_{D_{2}} D_{2}^{T} D_{2} + \frac{δ_{1} {|{\dot{x}}_{1 c}|}^{2}}{2} + \frac{δ_{2} {|{\dot{x}}_{2 c}|}^{2}}{2} \\ + \frac{1}{2} (τ_{1} + Φ_{a 1 M}^{2}) T r (W_{a 1}^{T} W_{a 1}) + \frac{1}{2} (ω_{1} + 2 Φ_{c 1 m}^{2} + Φ_{c 1 M}^{2} Γ_{1 M}^{2}) T r (W_{c 1}^{T} W_{c 1}) + \frac{1}{2} ε_{c 1 m}^{2} \\ + \frac{1}{2} ({\underline{π}}^{2} Φ_{a 2 M}^{2} + τ_{2}) T r (W_{a 2}^{T} W_{a 2}) + \frac{1}{2} (\underline{π} Φ_{c 2 M}^{2} Γ_{2 M}^{2} + ω_{2} + 2 Φ_{c 2 m}^{2}) T r (W_{c 2}^{T} W_{c 2}) + \frac{1}{2} ε_{c 2 m}^{2} . \end{array}

(56)

Then, Equation (54) can be rewritten as:

\begin{array}{l} \dot{V} \leq - \frac{1}{2} \bar{C} {(z_{1}^{T} z_{1})}^{l} - \frac{1}{2} \bar{C} {(z_{2}^{T} z_{2})}^{l} - \frac{1}{2} \bar{C} {(z_{3}^{T} z_{3})}^{l} - \frac{1}{2} \bar{C} y_{1}^{T} y_{1} - \frac{1}{2} \bar{C} y_{2}^{T} y_{2} \\ - \frac{\bar{C}}{2 τ_{D 1}} {\tilde{D}}_{1}^{T} {\tilde{D}}_{1} - \frac{\bar{C}}{2 τ_{D 2}} {\tilde{D}}_{2}^{T} {\tilde{D}}_{2} - \frac{\bar{C}}{2 η_{1}} T r ({\tilde{W}}_{a 1}^{T} {\tilde{W}}_{a 1}) - \frac{\bar{C}}{2 ϖ_{1}} T r ({\tilde{W}}_{c 1}^{T} {\tilde{W}}_{c 1}) \\ - \frac{\bar{C}}{2 η_{1}} T r ({\tilde{W}}_{a 2}^{T} {\tilde{W}}_{a 2}) - \frac{\bar{C}}{2 ϖ_{2}} T r ({\tilde{W}}_{c 2}^{T} {\tilde{W}}_{c 2}) + ε . \end{array}

(57)

According to Lemma, we obtain:

\begin{array}{l} {(\frac{1}{2} y_{i}^{T} y_{i})}^{l} \leq (1 - l) l^{\frac{l}{1 - l}} + \frac{1}{2} y_{i}^{T} y_{i}, \\ {(\frac{1}{2 τ_{D i}} {\tilde{D}}_{i}^{T} {\tilde{D}}_{i})}^{l} \leq (1 - l) l^{\frac{l}{1 - l}} + \frac{1}{2 τ_{D i}} {\tilde{D}}_{i}^{T} {\tilde{D}}_{i}, \\ {(\frac{1}{2 η_{i}} T r ({\tilde{W}}_{a i}^{T} {\tilde{W}}_{a i}))}^{l} \leq (1 - l) l^{\frac{l}{1 - l}} + \frac{1}{2 η_{i}} T r ({\tilde{W}}_{a i}^{T} {\tilde{W}}_{a i}), \\ {(\frac{1}{2 ϖ_{i}} T r ({\tilde{W}}_{c i}^{T} {\tilde{W}}_{c i}))}^{l} \leq (1 - l) l^{\frac{l}{1 - l}} + \frac{1}{2 ϖ_{i}} T r ({\tilde{W}}_{c i}^{T} {\tilde{W}}_{c i}) . \end{array}

(58)

It follows from Lemma 4 that:

- \bar{C} (\frac{1}{2} {(z_{i}^{T} z_{i})}^{l}) \leq - 2^{l - 1} \bar{C} {(\frac{1}{2} z_{i}^{T} z_{i})}^{l} .

(59)

Finally, we can obtain:

\begin{array}{l} \dot{V} \leq - 2^{l - 1} \bar{C} {(\frac{1}{2} z_{1}^{T} z_{1})}^{l} - 2^{l - 1} \bar{C} {(\frac{1}{2} z_{2}^{T} z_{2})}^{l} - 2^{l - 1} \bar{C} {(\frac{1}{2} z_{3}^{T} z_{3})}^{l} - \bar{C} {(\frac{1}{2} y_{1}^{T} y_{1})}^{l} + 9 \bar{C} (1 - l) l^{\frac{l}{1 - l}} \\ - \bar{C} {(\frac{1}{2} y_{2}^{T} y_{2})}^{l} - \bar{C} {(\frac{1}{2} y_{3}^{T} y_{3})}^{l} - \bar{C} {(\frac{1}{2 τ_{D 1}} {\tilde{D}}_{1}^{T} {\tilde{D}}_{1})}^{l} - \bar{C} {(\frac{1}{2 τ_{D 2}} {\tilde{D}}_{2}^{T} {\tilde{D}}_{2})}^{l} \\ - \bar{C} {(\frac{1}{2 η_{1}} T r ({\tilde{W}}_{a 1}^{T} {\tilde{W}}_{a 1}))}^{l} - \bar{C} (\frac{1}{2 η_{2}} T r ({\tilde{W}}_{a 2}^{T} {\tilde{W}}_{a 2})) \\ - {(\frac{1}{2 ϖ_{2}} T r ({\tilde{W}}_{c 2}^{T} {\tilde{W}}_{c 2}))}^{l} + ε . \end{array}

(60)

Let:

\begin{array}{l} C = \min \{2^{l - 1} \bar{C}, \bar{C}\} \\ D = ε + 9 \bar{C} (1 - l) l^{\frac{l}{1 - l}} . \end{array}

(61)

Then, Equation (60) can be rewritten as:

\dot{V} \leq - C V^{l} + D .

(62)

Based on Lemma 2, selecting the parameters

k_{i} > 0, i = 1, 2, 3, 0 < δ_{i} < 1, τ_{D_{i}} > 0, γ_{D_{i}} > 0, τ_{1}

> 4 Φ_{a 1 M}^{2}, ω_{1} > Φ_{c 1 m}^{2} + Φ_{c 1 M}^{2} Γ_{1 M}^{2}, τ_{2} > {\underline{π}}^{2} Φ_{a 2 M}^{2} + 2 \underline{π} Φ_{a 2 M}^{2} + 2 Φ_{a 2 M}^{2}, ω_{2} > Φ_{c 2 m}^{2} + \underline{π} Φ_{c 2 M}^{2} Γ_{2 M}^{2} + \underline{π} Φ_{c 2 M}^{2} Γ_{2 M}^{2},

i = 1, 2

to derive the explicit conditions for

\bar{C} > 0

, we conclude that all the signals of the closed-loop system are semi-global practical finite-time stable.

② Given

T_{r} = \frac{1}{(1 - l) ℵ C} [V^{1 - l} (0) - {(\frac{D}{(1 - ℵ) C})}^{Ξ}]

, where

Ξ = (1 - l) / l, ℵ \in (0, 1)

. With the analysis in [52], it yields:

V \leq {(\frac{D}{(1 - ℵ) C})}^{1 / l}, \forall t \geq T_{r} .

(63)

Then, combining Equation (48), it can be obtained:

|z_{1} (t)| = |x_{1} (t) - y_{d}| < 2 {(\frac{D}{(1 - ℵ) C})}^{\frac{1}{2 l}}, \forall t \geq T_{r} .

(64)

Thereby, it can be concluded that the tracking error converges to and remains in a small neighborhood of the desired signals for

\forall t \geq T_{r}

. □

Remark 7.

The system tracking error

z_{1} (t)

enters and remains within a steady-state neighborhood determined by the parameters

D, C, ℵ, l

, after a finite time

T_{r}

. The convergence process can be accelerated by appropriately increasing the value of

l

.

4. Simulation Study

In this section, numerical simulations are performed to confirm the effectiveness and advantages of the developed RL-based finite-time adaptive fault tolerant control method for a morphing unmanned aircraft.

The initial conditions are chosen as:

{[\begin{matrix} α & β & γ_{v} \end{matrix}]}^{T} = {[\begin{matrix} 20^{\circ} & 5^{\circ} & - 5^{\circ} \end{matrix}]}^{T}

,

{[\begin{matrix} ω_{x} & ω_{y} & ω_{z} \end{matrix}]}^{T} =

{[\begin{matrix} 0.1 r a d / s & 0.1 r a d / s & 0.1 r a d / s \end{matrix}]}^{T}

; The adaptive parameters and RL parameters are set as:

k_{1} = 3

,

k_{2} = 5

,

k_{3} = 8

,

τ_{1} = τ_{2} = 8

,

η_{1} = 10

,

η_{2} = 1

,

ϖ_{1} = 5

,

ϖ_{2} = 0.5

,

ω_{1} = 0.5

,

ω_{2} = 0.1

; The disturbance compensation adaptative parameters are defined as:

ε_{D_{1}} = ε_{D_{2}} = 0.01

,

τ_{D_{1}} = τ_{D_{2}} = 10

,

γ_{D_{1}} = γ_{D_{2}}

= 10

; Structural parameters of the morphing unmanned aircraft:

I_{b x} = 20

,

I_{b y} = I_{b}_{z} = 150

; The aircraft’s flight velocity is

V = 600

m/s, and its flight altitude is

H = 15

km. To ensure the proper implementation of the simulations, they are conducted under three different cases. The different parameters of the uncertainties

Δ f_{1}, Δ f_{2}

and disturbances

d_{1} (t)

and

d_{2} (t)

expressed in model (27) are defined under three cases. The detailed specifications of these scenarios are presented in Table 1.

The simulation results are given in Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6. Figure 2 demonstrates the trajectory tracking performance of the morphing unmanned aircraft’s attitude angles under coupled uncertainties. Obviously, the proposed RL-based adaptive controller achieves convergence within a 0.001-error bound around the desired trajectories. Despite an initial transient overshoot, the intelligent controller can quickly compensate for the uncertainties, exhibiting strong adaptability and robustness. The disturbance estimation and neural network weight estimation signals are shown in Figure 3, Figure 4 and Figure 5, and it can be seen that all signals are bounded. Figure 6 shows the control torque variation curve over time under Case 1.

To verify the advantages of the proposed reinforcement learning adaptive control (RLAC) method and its adaptability, such as speed and stability, the neural network adaptive control (NNAC) [59] method and the adaptive fault-tolerant control (AFTC) [60] method are selected for comparison. These existing adaptive control methods serve as benchmarks to highlight the unique features of the RLAC method. All controllers track identical reference trajectories under the same experimental conditions. The experimental results comprehensively compare the proposed RLAC method with two existing control approaches, AFTC and NNAC methods, are shown in Figure 7 and Figure 8. Table 2 presents the steady-state error and convergence time of the three methods, where convergence time is defined as the time required to complete 90% of the convergence process. In Figure 8, the tracking error is defined as the sum of the two norms of the errors

z_{1}

and

z_{2}

, that is,

e = {‖z_{1}‖}_{2} + {‖z_{2}‖}_{2}

, which clearly highlights the superiority of the proposed RLAC method.

The results demonstrate that under the same experimental conditions, the AFTC method has limitations. It has a relatively large steady-state error of about 0.25 and a convergence time exceeding 10 s, which means it struggles to precisely track the reference trajectory and reaches a stable state slowly, leading to inefficient system operation.

The NNAC method shows improvements. It can achieve a steady-state error of less than 0.0015, more accurate than the AFTC method. However, it suffers from significant overshoot, which may cause system instability. In contrast, the RLAC method demonstrates remarkable superiority. It has a steady-state error below 0.001, the fastest convergence speed, and can rapidly adapt to changing environments. In terms of tracking performance, it outperforms both AFTC and NNAC methods in speed and stability. Consequently, the RLAC method offers significant advantages over AFTC and NNAC methods, making it a more promising choice for practical control applications.

5. Conclusions

The RL-based adaptive fault-tolerant control method proposed in this paper effectively solves the control problem for a class of morphing unmanned aircraft under mismatched disturbances and coupled uncertainties. The aircraft’s uncertainties are modeled as a non-affine second-order nonlinear system, and the non-affine structure is handled by introducing an auxiliary integral system. The unknown functions are evaluated by the introduced RL algorithms, and the control actions are adjusted by the developed RL-based adaptive laws. The filtering errors and disturbances are compensated by adopting a disturbance boundary estimator. The RL-based adaptive fault-tolerant control method has been successfully designed by integrating RL, disturbance estimation, and finite-time theory. It is proven by the Lyapunov function that the system has finite-time stability and all signals are bounded. Numerical simulations verify the effectiveness and superiority of this method. In the future, this method is expected to be extended to aircraft models, considering various factors in the actual flight environment.

Author Contributions

W.R.: writing–review and editing, writing–original draft, visualization, methodology, software, conceptualization, and funding acquisition. Y.W.: writing—review and editing, supervision. C.W.: writing—review and editing, visualization. Z.W.: writing—review and editing, visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the Guizhou Provincial Science and Technology Projects ([2025] 049).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Jha, A.K.; Kudva, J.N. Morphing Unmanned aircraft Concepts, Classifications, and Challenges. In Smart Structures and Materials 2004: Industrial and Commercial Applications of Smart Structures Technologies; Society of Photo Optical: Bellingham, WA, USA, 2004; Volume 5388, pp. 213–224. [Google Scholar]
Chu, L.; Li, Q.; Gu, F.; Du, X.; He, Y.; Deng, Y. Design, Modeling, and Control of Morphing Aircraft: A Review. Chin. J. Aeronaut. 2022, 35, 220–246. [Google Scholar] [CrossRef]
Yu, Z.; Zang, Y.; Jiang, B. PID-type fault-tolerant prescribed performance control of fixed-wing UAV. J. Syst. Eng. Electron. 2021, 32, 1053–1061. [Google Scholar]
Noordin, A.; Mohd Basri, M.A.; Mohamed, Z.; Mat Lazim, I. Adaptive PID controller using sliding mode control approaches for quadrotor UAV attitude and position stabilization. Arab. J. Sci. Eng. 2021, 46, 963–981. [Google Scholar] [CrossRef]
Wang, P.; Chen, H.; Bao, C.; Tang, G. Review on Modeling and Control Methods of Morphing Vehicle. J. Astronaut. 2022, 43, 853–865. [Google Scholar]
Ameduri, S.; Concilio, A. Morphing Wings Review: Aims, Challenges, and Current Open Issues of a Technology. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 2023, 237, 4112–4130. [Google Scholar] [CrossRef]
Shardul, G. Study of Various Trends for Morphing Wing Technology. J. Comput. Methods Sci. Eng. 2021, 21, 613–621. [Google Scholar] [CrossRef]
He, H.; Wang, P. Integrated Guidance and Control Method for High-Speed Morphing Wing Aircraft. Acta Aeronaut. Astronaut. Sin. 2024, 45, 299–312. [Google Scholar]
Zhang, H.; Wang, P.; Tang, G.; Bao, W. Fixed-Time Sliding Mode Control for Hypersonic Morphing Vehicles via Event-Triggering Mechanism. Aerosp. Sci. Technol. 2023, 140, 108458. [Google Scholar] [CrossRef]
Abouheaf, M.; Mailhot, N.Q.; Gueaieb, W.; Spinello, D. Guidance Mechanism for Flexible-Wing Aircraft Using Measurement-Interfaced Machine-Learning Platform. IEEE Trans. Instrum. Meas. 2020, 69, 4637–4648. [Google Scholar] [CrossRef]
Lee, J.; Kim, Y. Neural Network-Based Nonlinear Dynamic Inversion Control of Variable-Span Morphing Aircraft. Proc. Inst. Mech. Eng. Part G J. Aerosp. Eng. 2020, 234, 1624–1637. [Google Scholar] [CrossRef]
Irfan, S.; Zhao, L.; Ullah, S.; Javaid, U.; Iqbal, S. Differentiator- and Observer-Based Feedback Linearized Advanced Nonlinear Control Strategies for an Unmanned Aerial Vehicle System. Drones 2024, 8, 527. [Google Scholar] [CrossRef]
Lee, H.; Kim, S.; Kim, Y. Actor-Critic-Based Optimal Adaptive Control Design for Morphing Aircraft. IFAC Pap. 2020, 53, 14863–14868. [Google Scholar] [CrossRef]
Zhou, Y.; Wang, P.; Tang, G.; Chen, H. Disturbance Observer-Based Prescribed Performance Control for Morphing Aircraft. Tactical Missile Technol. 2024, 4, 72–82. [Google Scholar]
Hu, H.; Li, Y.; Yi, W.; Wang, Y.; Qu, F.; Wang, X. Event-Triggered Neural Network-Based Adaptive Control for a Class of Uncertain Nonlinear Systems. J. Circuits Syst. Comput. 2021, 30, 15. [Google Scholar] [CrossRef]
Yuan, F.; Liu, Y.-J.; Liu, L.; Lan, J. Adaptive Neural Network Control of Non-Affine Multi-Agent Systems with Actuator Fault and Input Saturation. Int. J. Robust. Nonlinear Control 2024, 34, 3761–3780. [Google Scholar] [CrossRef]
Anderson, R.B.; Marshall, J.A.; L’Afflitto, A.; Dotterweich, J.M. Model Reference Adaptive Control of Switched Dynamical Systems with Applications to Aerial Robotics. J. Intell. Robot. Syst. 2020, 100, 1265–1281. [Google Scholar] [CrossRef]
Qi, W.; Teng, J.; Cao, J.; Yan, H.; Cheng, J. Improved Model Reference-Based Adaptive Nonlinear Dynamic Inversion for Fault-Tolerant Flight Control. Int. J. Robust. Nonlinear Control 2023, 33, 10328–10359. [Google Scholar] [CrossRef]
Li, Y.; Liu, X.; Ming, R.; Li, K.; Zhang, W. Dynamic Protocol-Based Control for Hidden Stochastic Jump Multiarea Power Systems in Finite-Time Interval. IEEE Trans. Cybern. 2025, 55, 1486–1496. [Google Scholar]
Li, G.; Peng, C.; Cao, Z. Finite-time bounded asynchronous sliding-mode control for T-S fuzzy time-delay systems via event-triggered scheme. Fuzzy Sets Syst. 2025, 514, 109400. [Google Scholar] [CrossRef]
Yu, C.; Jiang, J.; Wang, S.; Han, B. Fixed-Time Adaptive General Type-2 Fuzzy Logic Control for Air-Breathing Hypersonic Vehicle. Trans. Inst. Meas. Control 2021, 43, 2143–2158. [Google Scholar] [CrossRef]
Hernández-González, O.; Targui, B.; Valencia-Palomo, G.; Guerrero-Sánchez, M.E. Robust cascade observer for a disturbance unmanned aerial vehicle carrying a load under multiple time-varying delays and uncertainties. Int. J. Syst. Sci. 2024, 55, 1056–1072. [Google Scholar] [CrossRef]
Hernández-González, O.; Ramírez-Rasgado, F.; Farza, M.; Guerrero-Sánchez, M.-E.; Astorga-Zaragoza, C.-M.; M’Saad, M.; Valencia-Palomo, G. Observer for Nonlinear Systems with Time-Varying Delays: Application to a Two-Degrees-of-Freedom Helicopter. Aerospace 2024, 11, 206. [Google Scholar] [CrossRef]
Qu, C.; Cheng, L.; Gong, S.; Huang, X. Dynamic-Matching Adaptive Sliding Mode Control for Hypersonic Vehicles. Aerosp. Sci. Technol. 2024, 149, 109159. [Google Scholar] [CrossRef]
Zhao, Y.; Ma, Y. Adaptive Event-Triggered Finite-Time Sliding Mode Control for Singular T–S Fuzzy Markov Jump Systems with Asynchronous Modes. Commun. Nonlinear Sci. Numer. Simul. 2023, 126, 107465. [Google Scholar] [CrossRef]
Shi, X.; Li, Y.; Liu, Q.; Lin, K.; Chen, S. A Fully Distributed Adaptive Event-Triggered Control for Output Regulation of Multi-Agent Systems with Directed Network. Inf. Sci. 2023, 626, 60–74. [Google Scholar] [CrossRef]
Abbas, M.; Sadati, S.H.; Khazaee, M. Fault-Tolerant Control Design Based on Observer-Switching and Adaptive Neural Networks for Maneuvering Aircraft. J. Braz. Soc. Mech. Sci. Eng. 2024, 46, 14689–14698. [Google Scholar] [CrossRef]
Zhao, L.; Wang, L.; Cao, Y.; Yang, Y.; Wen, S. Learning-Based Fault-Tolerant Control With High-Order Control Barrier Functions. IEEE Trans. Autom. Sci. Eng. 2025, 22, 694. [Google Scholar] [CrossRef]
Ma, J.; Peng, C. Adaptive Model-Free Fault-Tolerant Control Based on Integral Reinforcement Learning for a Highly Flexible Aircraft with Actuator Faults. Aerosp. Sci. Technol. 2021, 119, 107204. [Google Scholar] [CrossRef]
Liu, Y.; Jing, Y.; Liu, X.; Li, X. Survey on Finite-Time Control for Nonlinear Systems. Control Theory Appl. 2020, 37, 1–12. [Google Scholar]
Guo, F.; Zhang, W.; Lv, M.; Zhang, R. Fault-Tolerant Tracking Control of Hypersonic Vehicle Based on a Universal Prescribe Time Architecture. Drones 2024, 8, 295. [Google Scholar] [CrossRef]
Bingö, Z.; Güzey, H. Finite-Time Neuro-Sliding-Mode Controller Design for Quadrotor UAVs Carrying Suspended Payload. Drones 2022, 6, 311. [Google Scholar] [CrossRef]
Rang, E.R. Isochrone Families for Second-Order Systems. IEEE Trans. Automat. Contr. 1963, 8, 64–65. [Google Scholar] [CrossRef]
Chen, X.; Zhang, X. Output-Feedback Control Strategies of Lower-Triangular Nonlinear Nonholonomic Systems in Any Prescribed Finite Time. Int. J. Robust Nonlinear Control 2019, 29, 904–918. [Google Scholar] [CrossRef]
Zhou, T.; Liu, S.; Liu, C. Finite-Time Prescribed Performance Adaptive Fuzzy Control for Nonlinear Systems with Unknown Virtual Control Coefficients. In Proceedings of the 2021 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), Chengdu, China, 18–20 June 2021; pp. 7–12. [Google Scholar]
Kang, B.; Li, Y. Adaptive Finite-Time Stabilization of High-Order Stochastic Nonlinear Systems with Unknown Control Direction. In Proceedings of the Conference Digest—2021 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), Chengdu, China, 18–20 June 2021; pp. 150–155. [Google Scholar]
Zhu, W.; Zong, Q.; Zhang, X.; Liu, W. Disturbance Observer-Based Multivariable Finite-Time Attitude Tracking for Flexible Spacecraft. In Proceedings of the Chinese Control Conference (CCC), Chongqing, China, 28–30 July 2020; Volume 2020, pp. 1772–1777. [Google Scholar]
Wang, H.; Kang, S.; Feng, Z. Finite-Time Adaptive Fuzzy Command Filtered Backstepping Control for a Class of Nonlinear Systems. Int. J. Fuzzy Syst. 2019, 21, 2575–2587. [Google Scholar] [CrossRef]
Zhao, Q.; Duan, G. Concurrent Learning Adaptive Finite-Time Control for Spacecraft with Inertia Parameter Identification under External Disturbance. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 3691–3704. [Google Scholar] [CrossRef]
Bu, X.; Luo, R.; Lei, H. Chattering-avoidance discrete-time fuzzy control with finite-time preselected qualities. IEEE Trans. Fuzzy Syst. 2024, 33, 997–1008. [Google Scholar] [CrossRef]
Bu, X.; Luo, R.; Lei, H. Fuzzy-neural intelligent control with fixed-time pre-configured qualities for electromechanical dynamics of electric vehicles motors. IEEE Trans. Intell. Transp. Syst. 2024, 26, 1193–1202. [Google Scholar] [CrossRef]
Luo, R.; He, G.; Li, Y.; Bu, X.; Li, Q. Appointed-time Fuzzy Fault-tolerant Control of Hypersonic Flight Vehicles with Flexible Predefined Behaviors. IEEE Trans. Autom. Sci. Eng. 2025, 22, 13995–14007. [Google Scholar] [CrossRef]
Lin, X.; Shi, X.; Li, S. Adaptive Tracking Control for Spacecraft Formation Flying System via Modified Fast Integral Terminal Sliding Mode Surface. IEEE Access 2020, 8, 198357–198367. [Google Scholar] [CrossRef]
Wang, Q.; Wang, T.; Dong, C.; Jiang, W. Chained Smooth Switching Control for Morphing Aircraft. Control Theory Appl. 2015, 32, 949–954. [Google Scholar]
Cheng, H.; Fu, W.; Dong, C.; Wang, Q.; Hou, Y. Asynchronously Finite-Time H∞ Control for Morphing Aircraft. Trans. Inst. Meas. Control 2018, 40, 4330–4344. [Google Scholar] [CrossRef]
Chen, H.; Wang, P.; Tang, G. Attitude Control Scheme for Morphing Vehicles with Output Error Constraints and Input Saturation. Acta Aeronaut. Astronaut. Sin. 2023, 44, 408–419. [Google Scholar]
Sheng, H. Research on Strong Robust Control for Flexible Hypersonic Morphing Vehicle. Master’s Thesis, National University of Defense Technology, Changsha, China, 2021. [Google Scholar]
Liang, X.; Wang, Q.; Xu, B.; Dong, C. Back-stepping fault-tolerant control for Morphing Unmanned aircraft based on fixed-time observer. Int. J. Control Autom. Syst. 2021, 19, 3924–3936. [Google Scholar] [CrossRef]
Wu, Z.; Lu, J.; Zhou, Q.; Shi, J. Modified adaptive neural dynamic surface control for Morphing Unmanned aircraft with input and output constraints. Nonlinear Dyn. 2017, 87, 2367–2383. [Google Scholar] [CrossRef]
Emmanuel, S.A.; Sun, R. Approximation-free prescribed performance control for Nonlinear morphing missile System. Int. J. Eng. Appl. Sci. 2016, 3, 257584. [Google Scholar]
Wang, Z.; Chang, Y.; Qiu, Y.; Xing, X. RL-based adaptive control for a class of non-affine uncertain stochastic systems with mismatched disturbances. Commun. Nonlinear Sci. Numer. Simul. 2024, 138, 108191. [Google Scholar] [CrossRef]
Zhang, J.; Wang, S.; Zhou, P.; Zhao, L.; Li, S. Novel Prescribed Performance-Tangent Barrier Lyapunov Function for Neural Adaptive Control of the Chaotic PMSM System by Backstepping. Int. J. Electr. Power Energy Syst. 2020, 121, 105991. [Google Scholar] [CrossRef]
Wang, F.; Chen, B.; Liu, X.; Lin, C. Finite-time adaptive fuzzy tracking control design for nonlinear systems. IEEE Trans. Fuzzy Syst. 2017, 26, 1207–1216. [Google Scholar] [CrossRef]
Qian, C.; Lin, W. Non-Lipschitz Continuous Stabilizers for Nonlinear Systems with Uncontrollable Unstable Linearization. Syst. Control Lett. 2001, 42, 185–200. [Google Scholar] [CrossRef]
Li, M.; Zhang, J.; Li, S.; Wu, F. Adaptive Finite-Time Fault-Tolerant Control for the Full-State-Constrained Robotic Manipulator with Novel Given Performance. Eng. Appl. Artif. Intell. 2023, 125, 106650. [Google Scholar] [CrossRef]
Wei, Y.; Zhou, P.; Liang, Y.; Wang, Y.; Duan, D. Adaptive finite-time neural backstepping control for multi-input and multi-output state-constrained nonlinear systems using tangent-type nonlinear mapping. Int. J. Robust Nonlinear Control 2020, 30, 5559–5578. [Google Scholar] [CrossRef]
Wang, F.; Chen, B.; Lin, C.; Zhang, J.; Meng, X. Adaptive Neural Network Finite-Time Output Feedback Control of Quantized Nonlinear Systems. IEEE Trans. Cybern. 2018, 48, 1839–1848. [Google Scholar] [CrossRef]
Zhang, J.; Wang, S.; Li, S.; Zhou, P. Adaptive Neural Dynamic Surface Control for the Chaotic PMSM System with External Disturbances and Constrained Output. Recent Adv. Electr. Electron. Eng. 2020, 13, 894–905. [Google Scholar]
Wu, J.; Sun, Y.; Zhao, Q.; Wu, Z.-G. Adaptive neural asymptotic tracking control for a class of stochastic non-strict-feedback switched systems. J. Frankl. Inst. 2018, 359, 1274–1297. [Google Scholar] [CrossRef]
Chen, K.; Zhu, S.; Wei, C.; Xu, T.; Zhang, X. Output constrained adaptive neural control for generic hypersonic vehi-cles suffering from non-affine aerodynamic characteristics and stochastic disturbances. Aerosp. Sci. Technol. 2021, 111, 106469. [Google Scholar] [CrossRef]

Figure 1. The controller structure.

Figure 2. Attitude angle trajectories under three cases.

Figure 3. Disturbance norm estimation under Case 1.

Figure 4. Actor network weight norm estimation under Case 1.

Figure 5. Penalty functions estimation under Case 1.

Figure 6. Control torques under Case 1.

Figure 7. Comparison of tracking performance of three methods under Case 1.

Figure 8. Comparison of tracking errors of three methods under Case 1.

Table 1. The different parameter values of the morphing unmanned aircraft under three cases.

Cases	$Δ f_{1}$	$d_{1} (t)$	$Δ f_{2}$	$d_{2} (t)$
Case 1	$\dot{θ} = {0.007}^{\circ} / s$	$\dot{σ} = {0.01}^{\circ} / s, θ = 0^{\circ}$	$0.01 I_{b}$	$0.01 M_{f} + 0.01 I_{b}$
Case 2	$\dot{θ} = {0.01}^{\circ} / s$	$\dot{σ} = {0.006}^{\circ} / s, θ = {0.03}^{\circ}$	$0.008 I_{b}$	$0.006 M_{f} + 0.01 I_{b}$
Case 3	$\begin{array}{l} \dot{θ} = {0.005}^{°} \times \\ (1 + 0.1 \sin (t)) / s \end{array}$	$\begin{array}{l} \dot{σ} = {0.008}^{°} \times (1 + 0.1 \cos (t)) / s, \\ θ = {0.02}^{°} \end{array}$	$\begin{array}{l} 0.004 \times (1 + \\ 0.1 \sin (t)) I_{b} \end{array}$	$\begin{array}{l} 0.009 M_{f} + 0.01 \times (1 \\ + 0.1 \cos (t)) I_{f} \end{array}$

Table 2. Tracking performance of three methods.

Method	Steady-State Error (Rad)	Convergence Time (s)
RLAC	0.0038, 0.0024, 0.00087	0.79, 0.66, 0.82
NNAC	0.007, 0.0023, 0.001	1.43, 4.19, 1.19
AFTC	>0.026, >0.025, >0.0026	5.81, >10, 1.9

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ren, W.; Wei, Y.; Wang, C.; Wang, Z. Finite-Time Adaptive Reinforcement Learning Control for a Class of Morphing Unmanned Aircraft with Mismatched Disturbances and Coupled Uncertainties. Drones 2025, 9, 562. https://doi.org/10.3390/drones9080562

AMA Style

Ren W, Wei Y, Wang C, Wang Z. Finite-Time Adaptive Reinforcement Learning Control for a Class of Morphing Unmanned Aircraft with Mismatched Disturbances and Coupled Uncertainties. Drones. 2025; 9(8):562. https://doi.org/10.3390/drones9080562

Chicago/Turabian Style

Ren, Wei, Yingjie Wei, Cong Wang, and Zheng Wang. 2025. "Finite-Time Adaptive Reinforcement Learning Control for a Class of Morphing Unmanned Aircraft with Mismatched Disturbances and Coupled Uncertainties" Drones 9, no. 8: 562. https://doi.org/10.3390/drones9080562

APA Style

Ren, W., Wei, Y., Wang, C., & Wang, Z. (2025). Finite-Time Adaptive Reinforcement Learning Control for a Class of Morphing Unmanned Aircraft with Mismatched Disturbances and Coupled Uncertainties. Drones, 9(8), 562. https://doi.org/10.3390/drones9080562

Article Menu

Finite-Time Adaptive Reinforcement Learning Control for a Class of Morphing Unmanned Aircraft with Mismatched Disturbances and Coupled Uncertainties

Abstract

Highlights

Abstract

1. Introduction

2. Problem Formulation and Preliminaries

2.1. Dynamic Model Description

2.2. Designs of Actor–Critic Neural Networks

3. Main Results

3.1. The Design of the Augmented System

3.2. Controller Design

3.3. Proof of Stability

4. Simulation Study

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI