ADP-Based Fault-Tolerant Control with Stability Guarantee for Nonlinear Systems

Luojia Liu; Junhong Lv; Haowei Lin; Ruidian Zhan; Liming Wu

doi:10.3390/e27101028

,

and

School of Advanced Manufacturing, Guangdong University of Technology, Jieyang 515200, China

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Entropy2025, 27(10), 1028;https://doi.org/10.3390/e27101028

This article belongs to the Section Complexity

Version Notes

Order Reprints

Abstract

This paper develops the stability-guaranteed adaptive dynamic programming (ADP)-based fault tolerant control (FTC) for nonlinear systems with an actuator fault. Firstly, a fault observer is designed to identify the unknown actuator fault. Then, a critic neural network (NN) is built to approximate the optimal control of the nominal system. Meanwhile, a stability-aware weight update mechanism is proposed based on the Lyapunov stability theorem to relax the restriction of the initial admissible control on the system stability. By integrating the nominal optimal control and the fault estimation, the stability-guaranteed ADP-based FTC is developed to eliminate the influence of the actuator fault. Furthermore, the observer errors, critic NN weight estimation errors, and the closed-loop system are all shown to exhibit uniform ultimate boundedness by using Lyapunov’s direct method. Finally, simulation examples are given to demonstrate the validity of the proposed method.

Keywords:

adaptive dynamic programming; actuator fault; fault observer; Lyapunov stability

1. Introduction

The increasing scale and complexity of modern industrial systems pose significant challenges to optimal controller design. Adaptive dynamic programming (ADP) has emerged as an effective tool of optimal control, particularly for high-dimensional nonlinear systems where exact mathematical models are unavailable. ADP derives the optimal control by integrating the technique of reinforcement learning (RL), dynamic programming, and neural networks (NNs). With years of developments, ADP has found widespread application in complex nonlinear systems, including robust control [1], fault tolerant control (FTC) [2], zero-sum games [3], actuator saturation [4], and so on.

Value iteration (VI) and policy iteration (PI) are two basic algorithms of ADP. Based on VI, the value function converges to the optima monotonically without requiring an initial admissible policy (IAP). The convergence analysis of continuous VI was presented in [5]. It was proved that the value function initialized with an arbitrary positive semidefinite function can converge to a neighborhood of the optimum. Ha et al. [6] improved the VI performance by developing a novel adaptive critic design with a discount factor, which accelerates the learning formulation and relaxes conditions on the system dynamics. However, VI cannot guarantee system stability during the learning process. In contrast, PI starts with the IAP and alternates between policy evaluation and improvement while ensuring stability at each iteration. Thus, PI is generally considered more suitable for practical applications. Nonetheless, one notable limitation of PI is its reliance on an IAP, which is difficult to obtain. To address this problem, Duan et al. [7] proposed a PI-based actor–critic approach with stability guarantees for nonlinear systems, where a warm-up phase was introduced to guide the initial policy toward admissibility. Combining the advantages of VI and PI, Luo et al. [8] developed a novel ADP method by adding a parameter to balance VI and PI, which not only accelerates VI but also eliminates the need of an IAP. In [9], Guo et al. studied the event-triggered tracking control utilizing ADP with a single critic NN, where the dependence on the IAP is removed by refining the weight update mechanism.

Due to the increasing complexity of industrial systems, faults in control components are a frequent occurrence, with actuator faults being one of the most common types. Such faults can lead to a significant degradation of system reliability, safety, and stability. In response to the severe challenges posed by actuator faults, FTC has emerged as a critical technology for ensuring system safety and enhancing reliability. A variety of classical FTC strategies have been established, such as observer-based methods [10,11,12], sliding mode control (ISMC) [13,14,15], and fuzzy logic control (FLC) [16,17,18]. Recently, increasing attention has been given to ADP-based FTC methods. A number of studies have achieved success by integrating ADP with observer-based techniques. Liu [19] employs an NN-based fault observer for online fault estimation and compensation. A neuro-dynamic programming-based FTC scheme is proposed in [20], which combines a state observer with a fault observer to simultaneously estimate the state and multiple faults. This observer-based paradigm has also been extended to handle more demanding dynamics, such as state time delays [21,22] and complex multi-agent interactions [23,24]. However, their reliance on model knowledge for observer design limits their application to systems with unknown dynamics. To address this limitation, model-free approaches have become a major research focus. Lin et al. [25] proposed a data-based FTC framework which uses the particle swarm optimization algorithm to optimize the NN identifier and the critic NN for unknown nonlinear systems. More recently, fully RL methods have been developed to learn the control policy directly from system data. For instance, Zhang et al. [26] utilizes a model-free, off-policy RL algorithm to learn and deploy a differential-game-based

H_{\infty}

fault-tolerant controller. This controller enables effective compensation for actuator faults while mitigating external disturbances. Ref. [27] integrates the reference trajectory and system state. The authors’ method learns the optimal control policy using both offline pre-training and an online actor–critic framework, which enables optimal fault-tolerant tracking control for nonlinear systems. A notable challenge in these online learning schemes is the stringent persistent excitation (PE) condition. Wu et al. [28] developed a distributed tracking control scheme based on event-triggered ADP. This scheme introduces an event-triggered mechanism to save computation and communication resources while relaxing the PE condition. These model-free strategies have also been adapted to manage complex scenarios involving multi-fault handling and input constraints [29].

It is worth mentioning that many existing ADP-based FTC methods are developed based on the PI algorithm, which completely relies on an IAP policy. However, as is mentioned above, it is extremely difficult to obtain an IAP theoretically, which limits the application of the ADP-based FTC algorithms. To address this challenge, this paper proposes a stability-guaranteed ADP-based FTC strategy that eliminates the need for the admissible policy. The key contributions of this work are described as follows:

(1): A stability-aware weight update mechanism with an auxiliary stabilizing term is proposed to evaluate real-time system stability and correct the learning process of the critic NN, which ensures the system stability in the absence of the initial stabilizing control.
(2): By designing a fault observer, the unknown actuator fault is estimated accurately and compensated based on the nominal optimal control, thereby eliminating the influence of the actuator fault.

2. Problem Statement and Preliminaries

2.1. Problem Formulation

Consider the following continuous-time nonlinear system with the actuator fault:

\dot{x} (t) = f (x (t)) + g (x (t)) (u (x (t)) - f_{a} (t)),

(1)

where

x \in R^{n}

is the system state vector,

u \in R^{m}

is the control input vector, and

f_{a} (t) \in R^{m}

denotes the unknown actuator fault. The function

f (x) \in R^{n}

is a nonlinear function satisfying

f (0) = 0

.

g (x) \in R^{n \times m}

is the input gain function. The initial state is given by

x (0) = x_{0}

.

Assumption 1.

The actuator fault

f_{a} (t)

is unknown but bounded by a positive constant

δ_{1}

, i.e.,

∥ f_{a} (t) ∥ \leq δ_{1}

[30].

The objective of this paper is to design the FTC policy

u (x (t))

for the system (1) subject to the unknown but bounded actuator fault

f_{a} (t)

satisfying Assumption 1. While actuator faults share the input channel

g (x)

with matched disturbances, they are fundamentally distinct from typical external disturbances. First, actuator faults represent an internal malfunction that directly compromises the intended execution and effectiveness of the control signal, rather than being merely an additive exogenous input. This distinction necessitates the explicit fault identification and compensatory control strategies to maintain stability and recover performance. Second, actuator faults originate from random internal component failures within the system. This makes their occurrence difficult to predict and poses a far greater threat to system stability and safety than conventional disturbances.

2.2. Nominal Optimal Control

We first consider the nominal optimal control problem for the system in the absence of the fault, i.e., where

f_{a} (t) = 0

. The nominal system dynamics become

\dot{x} (t) = f (x (t)) + g (x (t)) u (x (t)) .

(2)

The performance index for the nominal system (2) is defined as

J (x_{0}) = \int_{0}^{\infty} U (x (τ), u (τ)) d τ,

(3)

where the utility function is defined by

U (x, u) = x^{⊤} Q x + u^{⊤} R u

, with

Q \in R^{n \times n}

and

R \in R^{m \times m}

being constant symmetric positive-definite matrices.

Definition 1.

For the nominal system (2) (i.e., system (1) with

f_{a} (t) = 0

), a control policy

u (x)

is continuous on a compact set

Ω \subset R^{n}

, satisfies

u (0) = 0

, stabilizes the system (2), and results in a finite performance index

J (x_{0})

as defined in (3) for all

x_{0} \in Ω

, then the control policy

u (x)

is said to be admissible.

Let

Ψ (Ω)

denote the set of admissible policies on

Ω

. If

J (x)

is continuously differentiable, the nonlinear Lyapunov equation is

0 = U (x, u (x)) + \nabla V {(x)}^{⊤} (f (x) + g (x) u (x)),

(4)

where

V (0) = 0

, and

\nabla V (x) = \frac{\partial V (x)}{\partial x}

denotes the gradient of

V (x)

with respect to x.

The Hamiltonian function is given by

H (x, u, \nabla V (x)) = U (x, u) + \nabla V {(x)}^{⊤} (f (x) + g (x) u) .

(5)

Using this definition, the Lyapunov Equation (4) can be simply expressed as

H (x, u (x), \nabla V (x)) = 0 .

(6)

The optimal control problem aims to find the control policy

u^{*} (x) \in Ψ (Ω)

that minimizes the performance index (3). The optimal performance index is defined as

J^{*} (x) = min_{u \in Ψ (Ω)} \int_{0}^{\infty} U (x (τ), u (x (τ))) d τ,

(7)

and the corresponding HJB equation is

0 = min_{u \in Ψ (Ω)} \{U (x, u (x)) + \nabla J^{*} {(x)}^{⊤} (f (x) + g (x) u (x))\},

(8)

where

\nabla J^{*} (x) = \partial J^{*} (x) / \partial x

. The HJB equation can also be written using the Hamiltonian as

{min}_{u \in Ψ (Ω)} H (x, u (x), \nabla J^{*} (x)) = 0

, where

\nabla J^{*} (x)

now takes the role of

\nabla V (x)

in the Hamiltonian definition (5). By solving the HJB equation, an optimal control policy is obtained:

u^{*} (x) = - \frac{1}{2} R^{- 1} g^{⊤} (x) \nabla J^{*} (x) .

(9)

The optimal control

u^{*} (x)

given by (9) and its corresponding optimal value function

J^{*} (x)

are typically found using iterative methods. PI is a widely used ADP approach that effectively solves the HJB equation. PI conventionally alternates between two main steps: policy evaluation, in which the value function for the current control policy is determined by solving the Bellman Equation (4), and policy improvement, where the control policy is updated using the gradient of this value function to better approximate

u^{*} (x)

. However, a significant challenge in conventional PI is its requirement for an initial admissible control policy to begin the iteration. This limitation motivates the development of techniques that can operate without a predefined initial admissible control.

From the optimal control law expression (9), we can derive a useful relationship:

{(\nabla J^{*} (x))}^{⊤} g (x) = - 2 {(u^{*} (x))}^{⊤} R .

(10)

3. Stability-Guaranteed ADP-Based FTC Without IAP

3.1. Fault Observer Design

For the faulty system described in (1), we design the fault observer as

\dot{\hat{x}} = \hat{f} (x) + \hat{g} (x) (u - {\hat{f}}_{a}) + L_{1} (x - \hat{x}),

(11)

where

\hat{x} \in R^{n}

denotes the estimated system state,

L_{1} \in R^{n \times n}

is a positive-definite observer gain matrix, and

{\hat{f}}_{a} \in R^{m}

is the estimated actuator fault. The fault estimate

{\hat{f}}_{a}

is updated using the following adaptive law:

{\dot{\hat{f}}}_{a} = - L_{2} {\hat{g}}^{⊤} (x) e_{o},

(12)

where

L_{2} \in R^{m \times m}

is a positive-definite gain matrix, and

e_{o} = x - \hat{x}

denotes the observation error. According to the system dynamics (2) and the observer (11), the error dynamics are derived as

\begin{matrix} {\dot{e}}_{o} & = f (x) + g (x) (u - f_{a}) - \hat{f} (x) - \hat{g} (x) (u - {\hat{f}}_{a}) - L_{1} e_{o} \\ = \tilde{f} + \tilde{g} u - g (x) f_{a} + \hat{g} (x) {\hat{f}}_{a} - L_{1} e_{o}, \end{matrix}

(13)

where

\tilde{f} = f (x) - \hat{f} (x)

and

\tilde{g} = g (x) - \hat{g} (x)

represent the observation errors of the nonlinear components

f (x)

and

g (x)

, respectively.

Define

ς = \tilde{f} + \tilde{g} (u - f_{a})

; the error dynamics in (13) is expressed as

{\dot{e}}_{o} = ς - \hat{g} (x) (f_{a} - {\hat{f}}_{a}) - L_{1} e_{o} .

Assumption 2.

ς is norm-bounded, i.e.,

∥ ς ∥ \leq ς_{M}

, where

ς_{M}

is a positive constant [30].

Theorem 1.

For the faulty system (1) with Assumptions 1 and 2, the fault observation error is uniformly ultimately bounded (UUB) using the fault observer (11) and the adaptive update law (12).

Proof.

Select the Lyapunov function candidate as

Σ_{1} = \frac{1}{2} e_{o}^{⊤} e_{o} + \frac{1}{2} {\tilde{f}}_{a}^{⊤} L_{2}^{- 1} {\tilde{f}}_{a},

(14)

where

{\tilde{f}}_{a} = f_{a} - {\hat{f}}_{a}

is the estimation error of the actuator fault. Taking the time derivative of (14) and substituting (13) yields

\begin{matrix} {\dot{Σ}}_{1} & = e_{o}^{⊤} {\dot{e}}_{o} - {\dot{\hat{f}}}_{a}^{⊤} L_{2}^{- 1} {\tilde{f}}_{a} \\ = e_{o}^{⊤} (\tilde{f} + \tilde{g} (u - f_{a}) - \hat{g} (x) (f_{a} - {\hat{f}}_{a}) - L_{1} e_{o}) - {\dot{\hat{f}}}_{a}^{⊤} L_{2}^{- 1} {\tilde{f}}_{a} \\ \leq ς_{M} ∥ e_{o} ∥ - e_{o}^{⊤} \hat{g} (x) {\tilde{f}}_{a} - λ_{min} (L_{1}) {∥ e_{o} ∥}^{2} - {\dot{\hat{f}}}_{a}^{⊤} L_{2}^{- 1} {\tilde{f}}_{a} \\ = - (λ_{min} (L_{1}) ∥ e_{o} ∥ - ς_{M}) ∥ e_{o} ∥ - (e_{o}^{⊤} \hat{g} (x) + {\dot{\hat{f}}}_{a}^{⊤} L_{2}^{- 1}) {\tilde{f}}_{a} \end{matrix}

Substituting the adaptive law (12) into the above yields

{\dot{Σ}}_{1} = - (λ_{min} (L_{1}) ∥ e_{o} ∥ - ς_{M}) ∥ e_{o} ∥ .

(15)

This implies that we have

{\dot{Σ}}_{1} < 0

as long as

e_{o}

lies outside the compact set

∥ e_{o} ∥ \leq \frac{ς_{M}}{λ_{min} (L_{1})}

. According to the Lyapunov stability theorem, the fault observation error is UUB. The proof is thus complete. □

Remark 1.

The observer gains

L_{1}

and

L_{2}

, as well as the disturbance bound

ς_{M}

, directly affect the convergence rate and ultimate bound of the estimation error. A larger

L_{1}

improves convergence speed, while

L_{2}

regulates the responsiveness of fault estimation. Appropriate tuning is essential to balance estimation accuracy and robustness.

3.2. Nominal Optimal Control via Critic Neural Network

NNs are known to be universal approximators for nonlinear functions. In this work, the critic NN is utilized to approximate the optimal value function

J^{*} (x)

(7), which is the solution to the HJB Equation (8) and is typically unknown, nonlinear, and non-analytic. The optimal value function

J^{*} (x)

can be represented by an NN as

J^{*} (x) = W_{c}^{⊤} σ (x) + ε_{c} (x),

(16)

where

W_{c} \in R^{l}

is the ideal weight vector,

σ (x) \in R^{l}

is the vector of activation functions, l is the number of neurons in the hidden layer, and

ε_{c} (x)

is the NN approximation error. The gradient of the optimal value function is then

\nabla J^{*} (x) = {(\nabla σ (x))}^{⊤} W_{c} + \nabla ε_{c} (x),

(17)

where

\nabla σ (x) = \partial σ (x) / \partial x \in R^{l \times n}

. When substituting the NN approximation (17) into (8), the resulting expression can be separated into terms involving

{(\nabla σ (x))}^{⊤} W_{c}

and terms involving the approximation error gradient

\nabla ε_{c} (x)

. We define

e_{c H}

as the sum of all terms that include

ε_{c} (x)

or its gradient

\nabla ε_{c} (x)

, such that the HJB equation is satisfied by setting the sum of the principal part (formed with

{(\nabla σ (x))}^{⊤} W_{c}

) and

e_{c H}

to zero. Specifically,

e_{c H}

is derived as

\begin{matrix} e_{c H} & = {(\nabla ε_{c} (x))}^{⊤} f (x) - \frac{1}{2} {(\nabla ε_{c} (x))}^{⊤} G (x) {(\nabla σ (x))}^{⊤} W_{c} \\ - \frac{1}{4} {(\nabla ε_{c} (x))}^{⊤} G (x) \nabla ε_{c} (x), \end{matrix}

(18)

where

G (x) = g (x) R^{- 1} g^{⊤} (x)

.

Since the ideal weight vector

W_{c}

is unknown, (16) is approximated as

{\hat{J}}^{*} (x) = {\hat{W}}_{c}^{⊤} σ (x),

(19)

with gradient

\nabla {\hat{J}}^{*} (x) = {(\nabla σ (x))}^{⊤} {\hat{W}}_{c},

(20)

where

{\hat{W}}_{c}

is the estimate of

W_{c}

.

By substituting

\nabla {\hat{J}}^{*} (x)

, the approximate Hamiltonian error is defined as

\begin{matrix} e_{c} & = x^{⊤} Q x + {({(\nabla σ (x))}^{⊤} {\hat{W}}_{c})}^{⊤} f (x) \\ - \frac{1}{4} {({(\nabla σ (x))}^{⊤} {\hat{W}}_{c})}^{⊤} G (x) ({(\nabla σ (x))}^{⊤} {\hat{W}}_{c}) . \end{matrix}

(21)

Let

{\tilde{W}}_{c} = W_{c} - {\hat{W}}_{c}

be the weight estimation error. The Hamiltonian error

e_{c}

can be expressed in terms of

{\tilde{W}}_{c}

,

W_{c}

, and

e_{c H}

as

\begin{matrix} e_{c} & = - {\tilde{W}}_{c}^{⊤} (\nabla σ (x) f (x) - \frac{1}{2} \nabla σ (x) G (x) {(\nabla σ (x))}^{⊤} W_{c}) \\ - \frac{1}{4} {\tilde{W}}_{c}^{⊤} \nabla σ (x) G (x) {(\nabla σ (x))}^{⊤} {\tilde{W}}_{c} - e_{c H} . \end{matrix}

(22)

To update the critic weight vector

{\hat{W}}_{c}

, the gradient descent algorithm is applied to minimize the objective function:

E_{c} = \frac{1}{2} e_{c}^{⊤} e_{c} .

(23)

The gradient of

E_{c}

with respect to

{\hat{W}}_{c}

is given by

\frac{\partial E_{c}}{\partial {\hat{W}}_{c}} = e_{c} (\nabla σ (x) f (x) - \frac{1}{2} \nabla σ (x) G (x) ({(\nabla σ (x))}^{⊤} {\hat{W}}_{c})) .

(24)

Based on (9) and (17), the optimal control policy is obtained by

u^{*} (x) = - \frac{1}{2} R^{- 1} g^{⊤} (x) ({(\nabla σ (x))}^{⊤} W_{c} + \nabla ε_{c} (x)) .

(25)

Using the estimated critic weights

{\hat{W}}_{c}

, the approximated optimal control is given by

\hat{u} (x) = - \frac{1}{2} R^{- 1} g^{⊤} (x) {(\nabla σ (x))}^{⊤} {\hat{W}}_{c} .

(26)

Applying the approximate control policy

\hat{u} (x)

to the nominal system (2) yields the closed-loop dynamics

\dot{x} = f (x) - \frac{1}{2} g (x) R^{- 1} g^{⊤} (x) {(\nabla σ (x))}^{⊤} {\hat{W}}_{c} .

(27)

3.3. Stability-Aware Weight Update Mechanism

The conventional critic weight update rule is typically derived under the assumption of an admissible initial control policy. Without a stabilizing initial policy, the learning process may fail to ensure system stability. To address this challenge, we introduce an auxiliary term related to system stability into the update law of the critic network. The enhanced critic weight update law is given by

{\dot{\hat{W}}}_{c} = - α_{c} \frac{\partial E_{c}}{\partial {\hat{W}}_{c}} + \frac{1}{2} α_{s} Π (x, \hat{u} (x)) \nabla σ (x) G (x) \nabla J_{s} (x) .

(28)

where

α_{c}

and

α_{s}

are learning rates. The binary function

Π (x, \hat{u} (x)) \in {0, 1}

is defined as

Π (x, \hat{u} (x)) = \{\begin{matrix} 0, & if \nabla J_{s} {(x)}^{⊤} (f (x) + g (x) \hat{u} (x)) < 0 \\ 1, & otherwise . \end{matrix}

(29)

Assumption 3.

A continuously differentiable Lyapunov function

J_{s} (x)

is selected such that

{\dot{J}}_{s} (x) = {(\nabla J_{s} (x))}^{⊤} \dot{x} = {(\nabla J_{s} (x))}^{⊤} (f (x) + g (x) u^{*}) < 0 .

(30)

Moreover, there exists a positive definite matrix

Λ (x)

such that

{(\nabla J_{s} (x))}^{⊤} (f (x) + g (x) u^{*}) = - {(\nabla J_{s} (x))}^{⊤} Λ (x) \nabla J_{s} (x) .

(31)

Furthermore, there exists a positive constant

λ_{s} > 0

satisfying

0 < λ_{s} ∥ \nabla J_{s} (x) ∥ \leq - {(\nabla J_{s} (x))}^{⊤} \dot{x} .

(32)

Remark 2.

Assumption 3 [31] is commonly used for establishing the stability of the closed-loop system under the optimal control

u^{*}

. It relies on the premise that the closed-loop dynamics,

f (x) + g (x) u^{*}

, are suitably bounded. Specifically, for some positive constant

η > 0

, it is often assumed that

∥ f (x) + g (x) u^{*} ∥ \leq η ∥ \nabla J_{s} (x) ∥ .

(33)

From this, using the Cauchy–Schwarz inequality, we have

∥ {(\nabla J_{s} (x))}^{⊤} (f (x) + g (x) u^{*}) ∥ \leq ∥ \nabla J_{s} (x) ∥ ∥ f (x) + g (x) u^{*} ∥ \leq η ∥ \nabla J_{s} {(x) ∥}^{2} .

The specific structure (31) provides a concrete way to ensure

{\dot{J}}_{s} (x) < 0

(for

\nabla J_{s} (x) \neq 0

) and is consistent with this derived upper bound. Such an assumption is generally considered reasonable, and

J_{s} (x)

is often chosen as a quadratic function in practice.

The auxiliary stabilizing term provides real-time feedback on the stabilizing capability of the current policy based on the Lyapunov stability theorem and adjusts the gradient direction accordingly. It is worth noting that the binary function

Π (x, \hat{u} (x))

is defined based on Lyapunov stability conditions. When the nonlinear system is stable—i.e., when the time derivative of the Lyapunov function satisfies

\nabla J_{s} {(x)}^{⊤} (f (x) + g (x) \hat{u}) < 0

—the auxiliary term is inactive, and the critic update is driven purely by minimizing the approximation error. In contrast, when the system tends to diverge, the auxiliary term is activated to redirect the learning process toward a stabilizing control policy. This modification provides a structural enhancement to conventional critic learning, enabling the critic to converge eventually without requiring any admissible initial controller. It significantly reduces sensitivity to initial conditions and improves the algorithm’s practicality in real-time control scenarios. The theoretical validity of this mechanism will be formally established in the next section using Lyapunov-based analysis.

Remark 3.

In ADP, PI typically relies on an IAP to maintain closed-loop stability, whereas VI is more flexible but often lacks a verifiable stability guarantee during purely online learning. To address this issue, we introduce in (28) a Lyapunov-criterion-based stability switch

Π (x, \hat{u} (x))

. When the update direction may cause

{\dot{J}}_{s} (x) \geq 0

, the switch automatically activates an auxiliary update, which integrates HJB optimization and the stability requirement during learning within a single, unified framework.

Within the assumptions adopted in this paper and over the working domain

Ω

, the design enables online optimization without an IAP and provides a proven safety guarantee; Section 3.4 establishes UUB. Compared with multi-stage or hybrid schemes [7,8], the proposed method embeds learning-phase stability directly into the update law, thereby delivering a verifiably safeguarded online learning process. This capability remains relatively uncommon among existing IAP-free ADP frameworks and is well aligned with the real-time control demands of safety-critical systems.

3.4. Fault Compensation Design

Based on the analysis in Section 3.1, we further propose a fault compensation control structure to mitigate the influence of the actuator fault. To this end, we propose the following fault compensation control law:

u_{a} (x) = \hat{u} (x) + {\hat{f}}_{a} .

(34)

Remark 4.

The proposed control scheme belongs to the category of active FTC. By incorporating a dynamically estimated fault compensation term, the controller can adapt to real-time fault variations and maintain fault-tolerant performance.

The structural diagram of the observer-based ADP scheme with the auxiliary stabilizing term for FTC is shown in Figure 1.

Figure 1. The structural diagram of the observer-based ADP scheme with auxiliary stabilizing term for FTC.

3.5. Stability Analysis

Assumption 4.

There exist positive constants

λ_{ϕ}, λ_{A}, λ_{1}, λ_{e}, λ_{G}, λ_{4},

and

λ_{6}

such that for all

x \in Ω

,

∥ ϕ (x) ∥ \leq λ_{ϕ}

,

∥ A (x) ∥ \leq λ_{A}

,

∥ A W_{c} ∥ \leq λ_{1}

,

∥ e_{c H} ∥ \leq λ_{e}

,

∥ G (x) ∥ \leq λ_{G}

,

∥ \nabla σ (x) ∥ \leq λ_{4}

, and

∥ \nabla ε_{c} (x) ∥ \leq λ_{6}

[31].

Theorem 2.

Consider the system described by Equation (2). If the feedback control law is implemented as (26), and the critic network weights are updated according to the learning rule (28), then the closed-loop system state x and the weight estimation error

{\tilde{W}}_{c}

are UUB.

Proof.

To establish the stability analysis, we define a Lyapunov candidate function as

L = \frac{1}{2 α_{c}} {\tilde{W}}_{c}^{⊤} {\tilde{W}}_{c} + \frac{α_{s}}{2 α_{c}} J_{s} (x) .

(35)

Taking the time derivative of L, we obtain

\dot{L} = \frac{1}{α_{c}} {\tilde{W}}_{c}^{⊤} {\dot{\tilde{W}}}_{c} + \frac{α_{s}}{2 α_{c}} {(\nabla J_{s} (x))}^{⊤} \dot{x} .

(36)

Substituting the critic weight update law, the dynamics of the weight estimation error are given by

{\dot{\tilde{W}}}_{c} = α_{c} \frac{\partial E_{c}}{\partial {\hat{W}}_{c}} - \frac{1}{2} α_{s} Π (x, \hat{u} (x)) \nabla σ (x) g (x) R^{- 1} g^{⊤} (x) \nabla J_{s} (x) .

(37)

Substituting (37) and (27) into (36) yields

\begin{matrix} \dot{L} = & - [{\tilde{W}}_{c}^{⊤} ϕ (x) + \frac{1}{4} {\tilde{W}}_{c}^{⊤} A (x) {\tilde{W}}_{c} - \frac{1}{2} {\tilde{W}}_{c}^{⊤} A (x) W_{c} + e_{c H}] \times \\ [{\tilde{W}}_{c}^{⊤} ϕ (x) + \frac{1}{2} {\tilde{W}}_{c}^{⊤} A (x) {\tilde{W}}_{c} - \frac{1}{2} {\tilde{W}}_{c}^{⊤} A (x) W_{c}] \\ - \frac{α_{s}}{2 α_{c}} Π (x, \hat{u} (x)) {\tilde{W}}_{c}^{⊤} \nabla σ (x) g (x) R^{- 1} g^{⊤} (x) \nabla J_{s} (x) \\ + \frac{α_{s}}{2 α_{c}} {(\nabla J_{s} (x))}^{⊤} \dot{x}, \end{matrix}

(38)

where

\begin{matrix} A (x) & = \nabla σ (x) g (x) R^{- 1} g^{⊤} (x) {(\nabla σ (x))}^{⊤}, \\ ϕ (x) & = \nabla σ (x) f (x) . \end{matrix}

Based on Assumption 4, we have

\begin{matrix} \dot{L} \leq & - (\frac{1}{8} - \frac{3}{8} ϕ_{1}^{2} - \frac{3}{16} ϕ_{2}^{2}) λ_{A}^{2} {\tilde{W}}_{c}^{4} \\ + \{\frac{1}{2} λ_{A} λ_{e} + (1 + \frac{3}{8 ϕ_{1}^{2}}) λ_{ϕ}^{2} + (\frac{3}{4} + \frac{3}{16 ϕ_{2}^{2}}) λ_{1}^{2}\} {∥\begin{matrix} {\tilde{W}}_{c} \end{matrix}∥}^{2} \\ + \frac{3}{4} λ_{e}^{2} - \frac{α_{s}}{2 α_{c}} Π (x, \hat{u}) {\tilde{W}}_{c}^{⊤} \nabla σ (x) g (x) R^{- 1} g^{⊤} (x) \nabla J_{s} (x) \\ + \frac{α_{s}}{2 α_{c}} {(\nabla J_{s} (x))}^{⊤} \dot{x} . \end{matrix}

(39)

□

Case 1: When

Π (x, \hat{u}) = 0

,

{(\nabla J_{s} (x))}^{⊤} \dot{x} < 0

. According to Assumption 3, there exists a positive constant

λ_{s}

such that the inequality

0 < λ_{s} ∥ \nabla J_{s} (x) ∥ \leq - {(\nabla J_{s} (x))}^{⊤} \dot{x}

holds. Thus, (39) can then be expressed as

\begin{matrix} \dot{L} \leq & - λ_{2} {∥{\tilde{W}}_{c}∥}^{4} + λ_{3} {∥{\tilde{W}}_{c}∥}^{2} \\ + \frac{3}{4} λ_{e}^{2} - \frac{α_{s}}{{2 α}_{c}} λ_{s} ‖ \nabla J_{s} (x) ‖, \end{matrix}

(40)

where

\begin{matrix} λ_{2} & = (\frac{1}{8} - \frac{3}{8} ϕ_{1}^{2} - \frac{3}{16} ϕ_{2}^{2}) λ_{A}^{2}, \\ λ_{3} & = \frac{1}{2} λ_{A} λ_{e} + (1 + \frac{3}{8 ϕ_{1}^{2}}) λ_{ϕ}^{2} + (\frac{3}{4} + \frac{3}{16 ϕ_{2}^{2}}) λ_{1}^{2} . \end{matrix}

Therefore, it follows that

\dot{L} < 0

when either of the following conditions is satisfied:

∥{\tilde{W}}_{c}∥ \geq \sqrt{\frac{λ_{3} + \sqrt{3 λ_{e}^{2} λ_{2} + λ_{3}^{2}}}{2 λ_{2}}} ≜ A_{1}

(41)

or

∥ \nabla J_{s} (x) ∥ \geq \frac{1}{λ_{s}} (\frac{2 α_{c}}{α_{s}}) (\frac{λ_{3}^{2}}{4 λ_{2}} + \frac{3}{4} λ_{e}^{2}) ≜ B_{1} .

(42)

Case 2: When

Π (x, \hat{u} (x)) = 1

, the auxiliary term is active. According to (26) and (25), the difference between the optimal control and its estimate is given by

u^{*} (x) - \hat{u} (x) = - \frac{1}{2} R^{- 1} g^{⊤} (x) ({(\nabla σ (x))}^{⊤} {\tilde{W}}_{c} + \nabla ε_{c} (x)) .

(43)

Following the previous derivation, we have

\dot{x} = f (x) + g (x) \hat{u} (x) = f (x) + g (x) u^{*} (x) + g (x) (\hat{u} (x) - u^{*} (x))

. Substituting the control error term (43) into the expression of

\dot{x}

yields

\dot{x} = f (x) + g (x) (u^{*} + \frac{1}{2} R^{- 1} g^{⊤} (x) ({(\nabla σ (x))}^{⊤} {\tilde{W}}_{c} + \nabla ε_{c} (x))) .

(44)

Therefore, the Lyapunov function derivative now becomes

\begin{matrix} \begin{matrix} \dot{L} ⩽ & - (\frac{1}{8} - \frac{3}{8} ϕ_{1}^{2} - \frac{3}{16} ϕ_{2}^{2}) λ_{A}^{2} {∥{\tilde{W}}_{c}∥}^{4} \\ + (\frac{1}{2} λ_{A} λ_{e} + (1 + \frac{3}{8 ϕ_{1}^{2}}) λ_{ϕ}^{2} + (\frac{3}{4} + \frac{3}{16 ϕ_{2}^{2}}) λ_{1}^{2}) {∥{\tilde{W}}_{c}∥}^{2} \\ + \frac{3}{4} λ_{e}^{2} + \frac{α_{s}}{2 α_{c}} {(\nabla J_{s} (x))}^{⊤} (f (x) + g (x) u^{*}) \\ - \frac{α_{s}}{4 α_{c}} {\tilde{W}}_{c}^{⊤} \nabla σ (x) g (x) R^{- 1} g^{⊤} (x) \nabla J_{s} (x) \\ + \frac{α_{s}}{4 α_{c}} {(\nabla J_{s} (x))}^{⊤} g (x) R^{- 1} g^{⊤} (x) \nabla ε_{c} (x) . \end{matrix} \end{matrix}

(45)

Substituting (31) from Assumption 3 into (45), and applying the Cauchy–Schwarz and Young’s inequalities to the penultimate term of the above equation yields

\begin{matrix} - \frac{α_{s}}{4 α_{c}} {\tilde{W}}_{c}^{⊤} \nabla σ (x) g (x) R^{- 1} g^{⊤} (x) \nabla J_{s} (x) & \leq \frac{α_{s}}{8 α_{c}} ∥ {\tilde{W}}_{c} ∥^{2} + \frac{α_{s} λ_{4}^{2} λ_{G}^{2}}{8 α_{c}} {∥ \nabla J_{s} (x) ∥}^{2}, \end{matrix}

Thus, we have

\begin{matrix} \dot{L} \leq & - λ_{2} ∥ {\tilde{W}}_{c} ∥^{4} + (λ_{3} + \frac{α_{s}}{8 α_{c}}) {∥ {\tilde{W}}_{c} ∥}^{2} \\ + \frac{3}{4} λ_{e}^{2} - λ_{5} {∥ \nabla J_{s} (x) ∥}^{2} \\ + \frac{α_{s}}{4 α_{c}} λ_{G} λ_{6} ∥ \nabla J_{s} (x) ∥, \end{matrix}

(46)

where

λ_{5} = \frac{α_{s}}{2 α_{c}} (λ_{min} (Λ) - \frac{1}{4} λ_{G}^{2} λ_{4}^{2})

.

Therefore, it follows that

\dot{L} < 0

if either

∥{\tilde{W}}_{c}∥ ⩾ \sqrt{\frac{8 α_{c} λ_{3} + α_{s}}{16 α_{c} λ_{2}} + \sqrt{\frac{{(8 α_{c} λ_{3} + α_{s})}^{2}}{256 α_{c}^{2} λ_{2}^{2}} + \frac{3 λ_{e}^{2}}{4 λ_{2}} + \frac{α_{s}^{2} λ_{G}^{2} λ_{6}^{2}}{64 α_{c}^{2} λ_{2} λ_{5}}}} ≜ A_{2}

(47)

or

∥\begin{matrix} \nabla J_{s} (x) \end{matrix}∥ \geq \frac{α_{s} λ_{G} λ_{6}}{8 α_{c} λ_{5}} + \sqrt{\frac{{(8 α_{c} λ_{3} + α_{s})}^{2}}{256 α_{c}^{2} λ_{2} λ_{5}} + \frac{3 λ_{e}^{2}}{4 λ_{5}} + \frac{α_{s}^{2} λ_{G}^{2} λ_{6}^{2}}{64 α_{c}^{2} λ_{5}^{2}}} ≜ B_{2} .

(48)

By combining the two cases above, we have

\dot{L} < 0

if the following condition holds:

∥{\tilde{W}}_{c}∥ > max (A_{1}, A_{2}) or ∥\nabla J_{s} (x)∥ > max (B_{1}, B_{2}) .

(49)

According to the standard Lyapunov stability theorem, it follows that both the system state x and the critic weight estimation error

{\tilde{W}}_{c}

are UUB. This concludes the proof.

4. Simulation

This section demonstrates the effectiveness of the proposed stability-guaranteed ADP-based FTC through two simulation examples.

4.1. Example 1

Consider the following nonlinear affine system:

\dot{x} = [\begin{matrix} x_{2} - x_{1} \\ \begin{matrix} - 0.5 x_{1} - 0.5 x_{2} + \\ 0.5 x_{2} {(cos (2 x_{1}) + 2)}^{2} \end{matrix} \end{matrix}] + [\begin{matrix} 0 \\ cos (2 x_{1}) + 2 \end{matrix}] (u (t) - f_{a} (t)),

(50)

with the initial state

x (0) = {[0.5, - 0.5]}^{T}

.

f_{a} (t)

represents the unknown actuator fault, which is defined as

f_{a} (t) = \{\begin{matrix} 0.2 cos (\frac{t}{2 π}), & 30 s \leq t \leq 60 s \\ 0, & otherwise . \end{matrix}

(51)

The parameters of the performance index function are selected as

Q = 2 I_{2}

and

R = 1

, respectively, where

I_{2}

is the

2 \times 2

identity matrix. The initial state of the observer is set to

\hat{x} (0) = {[0.5, - 0.5]}^{T}

, and the initial value of the fault estimation is set to

{\hat{f}}_{a} (0) = 0

. The observer gain is chosen as

L_{1} = 43 I_{2}

and the learning rate of the fault observer is set to

L_{2} = 100

. A critic NN is constructed to approximate the value function. The weight vector is

{\hat{W}}_{c} = {[{\hat{W}}_{c 1}, {\hat{W}}_{c 2}, {\hat{W}}_{c 3}]}^{T}

with initial weights

{\hat{W}}_{c} (0) = {[0.7, 0.9, 0.6]}^{T}

, and the activation function of the critic NN is

σ (x) = [x_{1}^{2}, x_{1} x_{2}, x_{2}^{2}]

. The learning rate for the critic network is set to

α_{c} = 1

. The parameter of the auxiliary stabilizing term is set as

α_{s} = 1

. The total simulation duration was 100 s, and the actuator fault

f_{a} (t)

affects the system during the interval

t \in [30 s, 60 s]

.

Simulation results are shown in Figure 2, Figure 3, Figure 4 and Figure 5. The state trajectory performance is shown in Figure 2a. It shows that the states converge within 5 s. Figure 2b shows the time response curves of the observer estimation errors

{\tilde{x}}_{1}

and

{\tilde{x}}_{2}

. As illustrated in Figure 2b, the state observation errors converge to a small neighborhood around the origin. Figure 3a presents the curves of the actual actuator fault

f_{a} (t)

and the fault estimation

{\hat{f}}_{a} (t)

. The results indicate that the designed fault observer can accurately identify the unknown fault signal rapidly after fault occurrence. Figure 3b shows the control signals of the nominal optimal control (26) and the ADP-based FTC policy (34). It can be observed that after the fault occurs, the ADP-based FTC policy adjusts accordingly. Figure 4a shows the evolution process of the critic NN weights

{\hat{W}}_{c 1} (t)

,

{\hat{W}}_{c 2} (t)

, and

{\hat{W}}_{c 3} (t)

, which converge to [1.14, 0.36, 0.88]. Figure 4b further shows the working mechanism of the auxiliary stabilizing term, including the stability condition

{\dot{J}}_{s} (t)

and the stability indicator

Π (t)

. It can be seen that the stability indicator

Π (t)

is frequently activated in the early learning phase and eventually converges to 0.

Figure 2. State estimation performance for Section 4.1. (a) State performance for Section 4.1. (b) Observer estimation error for Section 4.1.

Figure 3. Fault estimation and control performance for Section 4.1. (a) Actuator fault estimation for Section 4.1. (b) Control input signals for Section 4.1.

Figure 4. Learning dynamics and stability mechanism for Section 4.1. (a) Critic NN weights for Section 4.1. (b) Stability mechanism performance for Section 4.1.

Figure 5. Performance without stability mechanism for Section 4.1. (a) Critic NN without stability mechanism for Section 4.1. (b) System state trajectories without stability mechanism for Section 4.1.

To further verify the effectiveness of the proposed auxiliary stabilizing term, a comparative analysis is performed against the conventional ADP method presented in [30]. The update law in [30] relies on a standard gradient descent approach, which typically requires an initial admissible policy to ensure stability. By removing our auxiliary stabilizing term, we effectively simulate this conventional approach under conditions where the initial policy may not be stabilizing. Figure 5a,b show the divergent critic NN weights and system states without the auxiliary stabilizing term, which indicates that the initial control fails to stabilize the system. Fortunately, with the auxiliary stabilizing term, the training process of the weight vector is reinforced until the system exhibits stable behavior (see Figure 4a,b). Thus, the requirement of an initial stabilizing control is relaxed.

4.2. Example 2

In this case, we consider a nonlinear mass-spring-damper system, with dynamics described as follows:

\dot{x} = [\begin{matrix} x_{2} \\ - 0.02 x_{1} - 0.67 x_{1}^{3} - 0.1 x_{2}^{3} \end{matrix}] + [\begin{matrix} 0 \\ 1 \end{matrix}] (u (t) - f_{a} (t)),

(52)

with the initial state

x (0) = {[0.5, - 0.5]}^{T}

. The control input

u (t)

is defined as in Example 1. In this case, the actuator fault

f_{a} (t)

is chosen as a different model, specifically as follows:

f_{a} (t) = \{\begin{matrix} 0.1 sin (\frac{1}{π} t), & 40 s \leq t \leq 70 s \\ 0, & otherwise . \end{matrix}

(53)

The observer gain is chosen as

L_{1} = 5 I_{2}

and the learning rate of the fault observer is set to

L_{2} = 100

. The weight vector is

{\hat{W}}_{c} = {[{\hat{W}}_{c 1}, {\hat{W}}_{c 2}, {\hat{W}}_{c 3}]}^{T}

with initial weights selected as

{\hat{W}}_{c} (0) = {[0.03, 0.2, 0.04]}^{T}

. The learning rate is

α_{c} = 3

. The total simulation duration was still 100 s, with the actuator fault

f_{a} (t)

affecting the system during the interval

t \in [40 s, 70 s]

. Other parameters and initial conditions are chosen the same as Example 1.

Simulation results for Example 2 are shown in Figure 6, Figure 7, Figure 8 and Figure 9. Figure 6a shows the trajectories of the stable system states. Figure 6b shows the convergent observer errors. Figure 7a presents the curves of the actual actuator fault and its estimation. It can be observed that the fault observer can estimate the actuator fault, verifying the effectiveness of the fault observer. Figure 7b demonstrates the nominal control and ADP-based FTC signals. Figure 8a shows the trajectories of the critic weights converging to [0.44, 0.54, 1.04]. Figure 8b illustrates the working condition of the auxiliary stabilizing term. The stability indicator is mainly triggered in the early stages and eventually converges to 0. Figure 9a,b show that without the auxiliary stabilizing term, the critic NN weights and system trajectories diverge, indicating that the control policy fail to stabilize the system. Figure 6a and Figure 8a together with Figure 9a,b verify that the stability-aware Weight update mechanism can effectively overcome the dependency on an initial admissible control under different models.

Figure 6. State estimation performance for Section 4.2. (a) State performance for Section 4.2. (b) Observer estimation errors for Section 4.2.

Figure 7. Fault estimation and control performance for Section 4.2. (a) Actuator fault estimation for Section 4.2. (b) Control input signals for Section 4.2.

Figure 8. Learning dynamics and stability mechanism for Section 4.2. (a) Critic NN weights for Section 4.2. (b) Stability mechanism performance for Section 4.2.

Figure 9. Performance without stability mechanism for Section 4.2. (a) Critic NN weights without stability mechanism for Section 4.2. (b) System state trajectories without stability mechanism for Section 4.2.

5. Conclusions

This paper presents a stability-guaranteed ADP-based FTC scheme for nonlinear systems with the actuator fault, which eliminates the dependency on an initial admissible policy. This is primarily achieved by embedding a Lyapunov-based stability condition directly into the critic network’s learning process. In parallel, the actuator fault is estimated by an observer, and compensated based on the nominal optimal control, which enhances the system’s resilience. We have rigorously proven the UUB stability of all signals, including the closed-loop system states, the estimation error of critic NN weights and the observer error, and demonstrated the scheme’s effectiveness through simulations. Future research will focus on extending this framework to more complex scenarios. Key directions include designing observers for intermittent faults, developing data-driven strategies for systems with unknown dynamics, and applying the method to distributed networked architectures to address challenges such as asynchronous group consensus [32], containment control with deferred constraints [33], and resilience against cyber-attacks via event-triggered control [34].

Author Contributions

Author Contributions: L.L. and J.L. provided methodology, validation, and writing—original draft preparation; H.L. provided conceptualization, supervision and writing—review; R.Z. and L.W. provided results discussion and writing—review. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Zhou, Z.; Liu, G.; Zhou, M. A robust mean-field actor-critic reinforcement learning against adversarial perturbations on agent states. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 14370–14381. [Google Scholar] [CrossRef]
Zhang, L.; Zhang, H.; Sun, J.; Yue, X. ADP-based fault-tolerant control for multiagent systems with semi-Markovian jump parameters. IEEE Trans. Cybern. 2024, 54, 5952–5962. [Google Scholar] [CrossRef] [PubMed]
Mu, C.; Wang, K. Aperiodic adaptive control for neural-network-based nonzero-sum differential games: A novel event-triggering strategy. ISA Trans. 2019, 92, 1–13. [Google Scholar] [CrossRef] [PubMed]
Wang, X.; Ding, D.; Dong, H.; Zhang, X.-M. Neural-network-based control for discrete-time nonlinear systems with input saturation under stochastic communication protocol. IEEE/CAA J. Autom. Sin. 2021, 8, 766–778. [Google Scholar] [CrossRef]
Xiao, G.; Zhang, H. Convergence analysis of value iteration adaptive dynamic programming for continuous-time nonlinear systems. IEEE Trans. Cybern. 2024, 54, 1639–1650. [Google Scholar] [CrossRef]
Ha, M.; Wang, D.; Liu, D. Novel discounted adaptive critic control designs with accelerated learning formulation. IEEE Trans. Cybern. 2024, 54, 3003–3016. [Google Scholar] [CrossRef]
Duan, J.; Li, J.; Ge, Q.; Li, S.E.; Bujarbaruah, M.; Ma, F. Relaxed actor-critic with convergence guarantees for continuous-time optimal control of nonlinear systems. IEEE Trans. Intell. Veh. 2023, 8, 3299–3313. [Google Scholar] [CrossRef]
Luo, B.; Yang, H.; Wu, H.-N.; Huang, T. Balancing value iteration and policy iteration for discrete-time control. IEEE Trans. Syst. Man Cybern. Syst. 2020, 50, 3948–3958. [Google Scholar] [CrossRef]
Guo, Z.; Yao, D.; Bai, W.; Li, H.; Lu, R. Event-triggered guaranteed cost fault-tolerant optimal tracking control for uncertain nonlinear system via adaptive dynamic programming. Int. J. Robust Nonlinear Control 2021, 31, 2572–2592. [Google Scholar] [CrossRef]
Teng, J.; Li, C.; Feng, Y.; Yang, T.; Zhou, R. Adaptive observer based fault tolerant control for sensor and actuator faults in wind turbines. Sensors 2021, 21, 8170. [Google Scholar] [CrossRef]
Sheng, N.; Liu, Y.; Chi, R. Adaptive observer-based finite-time fault tolerant control for non-strict feedback systems. J. Syst. Sci. Complex. 2024, 37, 1526–1544. [Google Scholar] [CrossRef]
Vazquez Trejo, J.A.; Ponsart, J.-C.; Adam-Medina, M.; Valencia-Palomo, G.; Vazquez Trejo, J.A.; Theilliol, D. Distributed observer-based fault-tolerant leader-following control of multi-agent systems. IFAC-PapersOnLine 2022, 55, 209–221. [Google Scholar] [CrossRef]
Yuan, Y.; Zhang, T.; Lin, Z.; Zhao, Z.; Zhang, X. Actuator fault tolerant control of variable cycle engine using sliding mode control scheme. Actuators 2021, 10, 24. [Google Scholar] [CrossRef]
Du, Z.; Li, J.; Ye, H.; Li, J. Integral sliding mode control for nonlinear networked cascade control systems with multiple delays. J. Frankl. Inst. 2025, 362, 107680. [Google Scholar] [CrossRef]
Liu, C.; Jiang, B.; Patton, R.J.; Zhang, K. Decentralized output sliding-mode fault-tolerant control for heterogeneous multiagent systems. IEEE Trans. Cybern. 2020, 50, 4934–4945. [Google Scholar] [CrossRef]
Yan, W.; Tu, H.; Qin, P.; Zhao, T. Interval type-II fuzzy fault-tolerant control for constrained uncertain 2-DOF robotic multi-agent systems with active fault detection. Sensors 2023, 23, 4836. [Google Scholar] [CrossRef]
Sun, K.; Ma, Z.; Dong, G.; Gong, P. Adaptive fuzzy fault-tolerant control of uncertain fractional-order nonlinear systems with sensor and actuator faults. Fractal Fract. 2023, 7, 862. [Google Scholar] [CrossRef]
Zhai, D.; Xi, C.; Dong, J.; Zhang, Q. Adaptive fuzzy fault-tolerant tracking control of uncertain nonlinear time-varying delay systems. IEEE Trans. Syst. Man Cybern. Syst. 2020, 50, 1840–1849. [Google Scholar] [CrossRef]
Liu, X.; Zhao, B.; Liu, D. Fault tolerant tracking control for nonlinear systems with actuator failures through particle swarm optimization-based adaptive dynamic programming. Appl. Soft Comput. 2020, 97, 106766. [Google Scholar] [CrossRef]
Zeng, C.; Zhao, B.; Liu, D. Fault tolerant control for a class of nonlinear systems with multiple faults using neuro-dynamic programming. Neurocomputing 2023, 553, 126502. [Google Scholar] [CrossRef]
Rahimi, F. Adaptive dynamic programming-based fault tolerant control for nonlinear time-delay systems. Chaos Solit. Fractals 2024, 188, 115544. [Google Scholar] [CrossRef]
Rahimi, F. Fault-tolerant control for nonlinear time-delay systems using neural network observers. Int. J. Dyn. Control 2025, 13, 33. [Google Scholar] [CrossRef]
Guo, Z.; Zhou, Q.; Ren, H.; Ma, H.; Li, H. ADP-based fault-tolerant consensus control for multiagent systems with irregular state constraints. Neural Netw. 2024, 180, 106737. [Google Scholar] [CrossRef] [PubMed]
Xu, Y.; Li, T.; Yang, Y.; Tong, S.; Chen, C.L.P. Simplified ADP for event-triggered control of multiagent systems against FDI attacks. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 4672–4685. [Google Scholar] [CrossRef]
Lin, H.; Zhao, B.; Liu, D.; Alippi, C. Data-based fault tolerant control for affine nonlinear systems through particle swarm optimized neural networks. IEEE/CAA J. Autom. Sin. 2020, 7, 954–964. [Google Scholar] [CrossRef]
Zhang, Y.; Wang, J.; Li, S.; Luo, X.; Guan, X. Off-policy reinforcement-learning-based fault-tolerant H_∞ control for topside separation systems with time-varying uncertainties. ISA Trans. 2025, 156, 11–19. [Google Scholar] [CrossRef]
Wang, D.; Hu, L.; Li, X.; Qiao, J. Online fault-tolerant tracking control with adaptive critic for nonaffine nonlinear systems. IEEE/CAA J. Autom. Sin. 2025, 12, 215–227. [Google Scholar] [CrossRef]
Wu, Q.; Zhao, B.; Liu, D.; Polycarpou, M. Event-triggered adaptive dynamic programming for decentralized tracking control of input constrained unknown nonlinear interconnected systems. Neural Netw. 2023, 157, 336–349. [Google Scholar] [CrossRef]
Zhang, Z.; Ma, T.; Zhao, Y.; Yu, S.; Zhou, F. Adaptive dynamic programming-based multi-fault tolerant control of reconfigurable manipulator with input constraint. Complex Intell. Syst. 2024, 10, 8341–8353. [Google Scholar] [CrossRef]
Zhao, B.; Liu, D.; Li, Y. Observer based adaptive dynamic programming for fault tolerant control of a class of nonlinear systems. Inf. Sci. 2017, 384, 21–33. [Google Scholar] [CrossRef]
Wang, D.; Liu, D.; Mu, C.; Zhang, Y. Neural Network Learning and Robust Stabilization of Nonlinear Systems With Dynamic Uncertainties. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 1342–1351. [Google Scholar] [CrossRef]
Ren, H.; Zhang, C.; Ma, H.; Li, H. Cloud-based distributed group asynchronous consensus for switched nonlinear cyber-physical systems. IEEE Trans. Ind. Inform. 2025, 21, 693–701. [Google Scholar] [CrossRef]
Ma, H.; Zhou, Q.; Ren, H.; Wang, Z. Distributed estimator-based fuzzy containment control for nonlinear multiagent systems with deferred constraints. IEEE Trans. Fuzzy Syst. 2025, 33, 2074–2083. [Google Scholar] [CrossRef]
Du, Z.; Di, M.; Li, C.; Ye, H.; Li, J. Improved event-triggered controller design for nonlinear networked cascade control system under cyber attacks. Nonlinear Dyn. 2025. [Google Scholar] [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

ADP-Based Fault-Tolerant Control with Stability Guarantee for Nonlinear Systems

Abstract

1. Introduction

2. Problem Statement and Preliminaries

2.1. Problem Formulation

2.2. Nominal Optimal Control

3. Stability-Guaranteed ADP-Based FTC Without IAP

3.1. Fault Observer Design

3.2. Nominal Optimal Control via Critic Neural Network

3.3. Stability-Aware Weight Update Mechanism

3.4. Fault Compensation Design

3.5. Stability Analysis

4. Simulation

4.1. Example 1

4.2. Example 2

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics