Article

Neural Adaptive H∞ Sliding-Mode Control for Uncertain Nonlinear Systems with Disturbances Using Adaptive Dynamic Programming

College of Electronic and Information Engineering, Hebei University, Baoding 071002, China
* Author to whom correspondence should be addressed.
Entropy 2023, 25(12), 1570; https://doi.org/10.3390/e25121570
Submission received: 17 October 2023 / Revised: 16 November 2023 / Accepted: 17 November 2023 / Published: 22 November 2023
(This article belongs to the Special Issue Intelligent Modeling and Control)

Abstract
This paper develops a neural adaptive $H_\infty$ sliding-mode control scheme for a class of uncertain nonlinear systems subject to external disturbances with the aid of adaptive dynamic programming (ADP). First, by combining the neural network (NN) approximation method with a nonlinear disturbance observer, an enhanced observer framework is developed for estimating the system uncertainties and observing the external disturbances simultaneously. Then, based on the reliable estimations provided by the enhanced observer, an adaptive sliding-mode controller is designed that effectively counteracts the system uncertainties and the separated matched disturbances, even in the absence of prior knowledge of their upper bounds, while the remaining unmatched disturbances are attenuated through an $H_\infty$ performance criterion on the sliding surface. Moreover, a single critic network-based ADP algorithm is employed to learn the cost function related to the Hamilton–Jacobi–Isaacs equation, from which the $H_\infty$ optimal control is obtained. An update law for the critic NN is proposed that not only attains the Nash equilibrium, but also stabilizes the sliding-mode dynamics without the need for an initial stabilizing control. In addition, we analyze the uniform ultimate boundedness stability of the resultant closed-loop system via Lyapunov's method. Finally, the effectiveness of the proposed scheme is verified through simulations of a single-link robot arm and a power system.

1. Introduction

Within the last few decades, multifarious robust control design theories and methods have been proposed for uncertain nonlinear systems [1]. As one of the most efficient and widely used control methods, sliding-mode control (SMC) has garnered significant attention owing to its simplicity, order-reduction property and inherent robustness against matched uncertainties [2]. The classical SMC approach exerts a discontinuous control to drive the system states onto a prescribed sliding manifold or surface [3]. Once the sliding surface is reached, the system becomes immune to the matched uncertainties and input disturbances. To remove the reaching phase, an integral SMC was developed by using an integral sliding manifold, which includes an integral term enabling the system states to reach and remain on the sliding manifold from the beginning [4,5,6]. Although for a wide variety of practical systems the relevant uncertainties and disturbances can be assumed to be matched in the control design, many physical systems, such as permanent magnet synchronous motors [7], underactuated aerial vehicles and robotic systems [8], are directly affected by unmatched disturbances. Lately, several new approaches involving the integral SMC have been proposed to stabilize various systems with unmatched disturbances [9,10,11,12,13]. Among these methods, it is worth noticing that in [12,13], the impact of the separated unmatched disturbances is not amplified once a suitable projection matrix is chosen in the sliding manifold, and these disturbances are attenuated by combining the integral SMC with $H_\infty$ control theory. This provides a feasible and effective way to handle unmatched disturbances and helps explore the relationship between integral SMC and $H_\infty$ control in nonlinear system design.
In many instances, we expect the control policy not just to stabilize the closed-loop system, but also to possess certain optimality by minimizing a user-defined cost. For nonlinear systems, the associated optimal control problems require solving the Hamilton–Jacobi–Bellman (HJB) equation. When $H_\infty$ optimal control is considered, based on dissipativity theory, the problem can be formulated as an $L_2$-gain control problem, which involves solving the Hamilton–Jacobi–Isaacs (HJI) equation [14]. However, analytical solutions of both the HJB and HJI equations are very hard or even impossible to obtain directly because of their inherent nonlinearities [15]. In recent years, a class of neural network (NN) and reinforcement learning (RL)-based intelligent optimization and control methods, referred to as adaptive dynamic programming (ADP), has become increasingly prominent; it shows great application potential in solving various optimization problems and effectively conquers the "curse of dimensionality" [15,16]. By now, many researchers have employed ADP to tackle a variety of optimal control problems for both discrete-time (DT) [17,18,19,20,21,22] and continuous-time (CT) systems [23,24,25,26,27,28]. Moreover, how to combine ADP with other robust methods to achieve better performance and stronger robustness for uncertain nonlinear systems is becoming a new research focus [29,30].
Recently, Modares et al. [31] proposed an online integral RL algorithm that incorporates a non-quadratic discounted cost function to address the constrained-input optimal tracking problem. Luo et al. [32] described an NN-based off-policy learning algorithm within the actor–critic framework to deal with the associated HJI equation, and this algorithm was later extended to find the near-optimal $H_\infty$ tracking control solution in [33]. Nevertheless, the influences of potential system or modeling uncertainties were not taken into account in these designs. Wang et al. [34] introduced a robust neuro-optimal control approach for input-affine nonlinear systems with both matched and state-dependent uncertainties by redesigning the cost function and selecting a suitable feedback gain, although the upper bound function of the uncertainties is needed to redesign the cost function that suppresses them. Mitra et al. [35] presented an optimal SMC scheme for single-input cascade nonlinear systems with matched bounded disturbances. Fan et al. [36] investigated an adaptive actor–critic-based integral SMC strategy for CT nonlinear systems with unknown terms and input disturbances, where the initial stabilizing control required in the learning is quite stringent and limiting in practical applications. Qu et al. [37] developed an adaptive $H_\infty$ optimal SMC method in the presence of actuator faults and unmatched disturbances using the ADP algorithm, and further explored the optimal guaranteed cost SMC for constrained-input uncertain systems by formulating an auxiliary system and redefining the utility function [38]. Building on [37] and combining it with event-triggered mechanisms, Yang et al. [39] provided an event-triggered integral SMC design for nonlinear control-affine systems by leveraging the ADP technique. Note that the methods mentioned above rely on the availability of upper bounds for the matched or unmatched disturbances, which may cause over-design and thus lead to an over-conservative control scheme. Additionally, in real-world scenarios, determining precise upper bounds of external disturbances is often a challenging task.
Inspired by the works mentioned earlier, we propose an adaptive neural $H_\infty$ SMC scheme for uncertain nonlinear systems subject to external disturbances using the ADP algorithm. Based on the enhanced observer system composed of an NN identifier and a nonlinear disturbance observer (DO), an integral SMC is developed to counteract the impacts of the system uncertainties and the separated matched disturbances, as well as unknown approximation errors, without requiring prior knowledge of their upper bounds. On the sliding manifold, the remaining unmatched disturbances are attenuated by the $H_\infty$ optimal control solved via the single-network ADP algorithm. Moreover, the uniform ultimate boundedness stability of the resultant closed-loop system is guaranteed via the Lyapunov approach.
The principal contributions of this study can be enumerated as follows. First, unlike other existing schemes [34,35,36,37,38,39], the proposed approach, built on the enhanced observer system, makes the designed sliding-mode controller independent of the relevant upper bounds of uncertainties and disturbances, which renders the implementation much easier and more practical and removes the assumption that these upper bounds must be known in advance. Second, compared with the algorithms presented in [36,37], our approach can deal with both unknown nonlinear terms and unmatched external disturbances, where the single-network ADP is utilized to approximate an $H_\infty$ optimal control. Unlike typical actor–critic–disturbance network architectures, the single critic network structure brings a simpler implementation and a lower computational burden, and avoids the numerical approximation errors arising from the actor and disturbance networks. Third, we introduce an update law for the critic NN, which not only achieves the Nash equilibrium, but also ensures the stability of the sliding-mode dynamics without the need for an initial stabilizing control in the learning.
The remainder of this paper is arranged as follows. Section 2 outlines the problem formulation and provides some necessary preliminaries. Section 3 describes the design of an integral SMC based on the enhanced observer system. Section 4 presents the application of the single-network ADP to obtain the $H_\infty$ optimal control for the sliding-mode dynamics, along with the stability analysis. Simulations of the robotic arm and a power system are given in Section 5, followed by a summary of this study in Section 6.

2. Problem Formulation

Consider the following uncertain perturbed nonlinear system as
$$\dot{x} = f(x) + \Delta f(x) + \big(g(x) + \Delta g(x)\big)u + d, \tag{1}$$
where the state vector $x \in \mathbb{R}^n$ is measurable, $u \in \mathbb{R}^m$ is the control input, $f(x) \in \mathbb{R}^n$ and $g(x) \in \mathbb{R}^{n \times m}$ are the known system drift and input dynamics, respectively; $\Delta f(x)$ and $\Delta g(x)$ denote uncertain nonlinear terms arising from either the inherent characteristics of the system or modeling uncertainties, while $d \in \mathbb{R}^n$ represents the unknown external disturbances. Moreover, it is assumed that the system uncertainties $\Delta f(x)$ and $\Delta g(x)$ satisfy the matched condition, i.e., $\Delta f(x) + \Delta g(x)u = g(x)w(x,u)$; then the system (1) is rewritten in the form of
$$\dot{x} = f(x) + g(x)u + g(x)w(x,u) + d \tag{2}$$
with $w(x,u)$ being the bounded lumped uncertain term. Let $\Omega \subset \mathbb{R}^n$ be a compact set, and suppose that $f(x) + g(x)u$ is Lipschitz continuous over $\Omega$ with $f(0) = 0$. Besides, $d \in L_2[0,\infty)$ and its derivative $\dot{d}$ is bounded such that $\|\dot{d}\| \le d_M$ with $d_M > 0$. To avoid any confusion, $\|\cdot\|$ denotes the 2-norm of a vector or the Frobenius norm of a matrix hereafter, unless otherwise specified.
Assumption 1.
The input matrix $g(x)$ has full column rank and is norm bounded, that is, $\|g(x)\| \le g_M$ for any $x$. Moreover, the resulting left pseudoinverse $g^+(x) \in \mathbb{R}^{m \times n}$, given by $g^+(x) = (g^T(x)g(x))^{-1}g^T(x)$, is bounded as $\|g^+(x)\| \le b_M$, where $b_M$ and $g_M$ are known positive constants.
Based on Assumption 1, $d$ is then decomposed into matched and unmatched components through the projection of $d$ onto the input matrix $g(x)$ as
$$d = g(x)g^+(x)d + \big(I - g(x)g^+(x)\big)d, \tag{3}$$
where $I$ denotes an identity matrix of appropriate dimensions, and $g^+(x)$ is the left pseudoinverse of $g(x)$. It should be noted that Assumption 1 is somewhat restrictive, which may narrow the applicability of the proposed approach to some extent. However, many real-world physical systems, such as satellite dynamics, hypersonic flight vehicles and overhead crane systems, possess this property and render the assumption valid [15,20].
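For intuition, the decomposition (3) amounts to projecting $d$ onto the column space of $g(x)$. The following minimal NumPy sketch (our illustration, not part of the paper; the example $g$ and $d$ are hypothetical) computes both components:

```python
import numpy as np

def decompose_disturbance(g, d):
    """Split d into matched and unmatched parts via the projection
    onto the column space of g, as in Eq. (3)."""
    g_plus = np.linalg.inv(g.T @ g) @ g.T        # left pseudoinverse g^+(x)
    proj = g @ g_plus                            # projector onto range(g)
    d_matched = proj @ d                         # g(x) g^+(x) d
    d_unmatched = (np.eye(g.shape[0]) - proj) @ d
    return d_matched, d_unmatched

# Hypothetical single-input, two-state example
g = np.array([[0.0], [1.0]])
d = np.array([0.3, -0.5])
dm, du = decompose_disturbance(g, d)
print(dm, du)   # dm lies in range(g); dm + du recovers d
```

Only the matched component can be cancelled through the input channel; the unmatched remainder is what the $H_\infty$ design of Section 4 must attenuate.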
To deal with the uncertain nonlinear system (1) with external disturbances, an enhanced observer system is first constructed for estimating the uncertain terms and observing the unknown disturbances simultaneously. Then, based on the reliable estimations, an integral SMC is developed to counteract the impacts of the system uncertainties and the separated matched disturbances, as well as unknown approximation errors, without requiring prior knowledge of their upper bounds. Meanwhile, the remaining unmatched disturbances are attenuated by the $H_\infty$ optimal control on the sliding surface. Moreover, the single-network ADP algorithm is employed to learn the cost function related to the HJI equation, from which the $H_\infty$ optimal control is obtained. What is more, a weight update law is formulated to ensure both the achievement of the Nash equilibrium and the stabilization of the sliding-mode dynamics during the learning process.

3. Integral SMC Design Based on the Enhanced Observer System

Recalling the NN universal approximation property, the uncertain term $w(x,u)$ can be represented by a three-layered NN as
$$w(x,u) = W_o^T\sigma(V_o^T\bar{x}) + \varepsilon_o(x), \tag{4}$$
where $W_o \in \mathbb{R}^{l_o \times m}$ and $V_o \in \mathbb{R}^{(n+m) \times l_o}$ denote the unknown ideal weight matrices between the output and hidden, and hidden and input layers, respectively; $\bar{x} = [x^T, u^T]^T \in \mathbb{R}^{n+m}$ is the NN input, $\sigma(\cdot) \in \mathbb{R}^{l_o}$ represents the activation function with $l_o$ hidden layer neurons, and $\varepsilon_o(x) \in \mathbb{R}^m$ stands for the NN reconstruction error. To simplify the learning process, only the weights $W_o$ are adapted online, while $V_o$ is initialized with random values and then remains unchanged during the weight updating process [16].
The NN identifier is designed by
$$\dot{\hat{x}} = A\hat{x} + f(x) - Ax + g(x)u + g(x)\hat{W}_o^T\sigma(z) + d, \tag{5}$$
where $A$ is a Hurwitz matrix, $\hat{x}$ is the identifier state, $\hat{W}_o$ is the estimate of $W_o$, and the activation function is $\sigma(z) = \sigma(V_o^T\bar{x})$ with $z = V_o^T\bar{x}$. Since the unknown disturbance term $d$ is needed in (5), inspired by [11], a nonlinear DO is introduced to obtain $\hat{d}$, namely, the estimated value of $d$.
Then, combining the NN identifier with a nonlinear DO, an enhanced observer system is constructed as
$$\begin{aligned} \dot{\hat{x}} &= A\hat{x} + f(x) - Ax + g(x)u + g(x)\hat{W}_o^T\sigma(z) + \hat{d}, \\ \dot{d}_0 &= -l(x)\big(f(x) + g(x)u + g(x)\hat{W}_o^T\sigma(z) + d_0 + p(x)\big) \end{aligned} \tag{6}$$
with $\hat{d} = d_0 + p(x)$, where $d_0$ is an auxiliary variable, and $p(x)$ is a designed state-dependent function whose gradient yields the gain function $l(x) = (\partial p(x)/\partial x)^T$. Following (6), we have
$$\dot{\hat{d}} = -l(x)\hat{d} + l(x)d + l(x)g(x)\tilde{W}_o^T\sigma(z) + l(x)g(x)\varepsilon_o(x), \tag{7}$$
where $\tilde{W}_o = W_o - \hat{W}_o$ represents the NN weight estimation error. Let $\tilde{x} = x - \hat{x}$ and $\tilde{d} = d - \hat{d}$ be the state and disturbance estimation errors, respectively. Subtracting (5) from (2) and combining with (7), we obtain the coupled error dynamics of (6) as follows:
$$\begin{aligned} \dot{\tilde{x}} &= A\tilde{x} + g(x)\tilde{W}_o^T\sigma(z) + \tilde{d} + g(x)\varepsilon_o(x), \\ \dot{\tilde{d}} &= -l(x)\tilde{d} - l(x)g(x)\tilde{W}_o^T\sigma(z) + \dot{d} - l(x)g(x)\varepsilon_o(x). \end{aligned} \tag{8}$$
Before proceeding, we introduce a common assumption for stability analysis [15,16].
Assumption 2.
For the identifier NN, there are known positive constants $\sigma_M$, $\varepsilon_M$, $W_M$ and $V_M$ such that $\|\sigma(z)\| \le \sigma_M$, $\|\varepsilon_o(x)\| \le \varepsilon_M$, $\|W_o\| \le W_M$ and $\|V_o\| \le V_M$, respectively.
Lemma 1.
Considering the system (2) and the coupled error dynamics (8), let the identifier NN weight $\hat{W}_o$ be updated by
$$\dot{\hat{W}}_o = \eta_1\sigma(z)\tilde{x}^TA^{-1}g(x) - \eta_2\big(\|\tilde{x}\| + 1\big)\hat{W}_o, \tag{9}$$
where $\eta_1, \eta_2$ are positive updating ratios. Moreover, we select the parameter matrices $A$, $P$ and the gain function $l(x)$ to satisfy
$$P^TP - l(x) - l^T(x) + l(x)g(x)g^T(x)l^T(x) \le -\rho I \tag{10}$$
with $\rho > 0$. Then all the estimation errors $\tilde{x}$, $\tilde{d}$ and $\tilde{W}_o$ are uniformly ultimately bounded (UUB).
Proof. 
Consider the Lyapunov function candidate given by
$$L_1 = \frac{1}{2}\tilde{x}^TP\tilde{x} + \frac{1}{2}\tilde{d}^T\tilde{d} + \frac{1}{2}\mathrm{tr}\{\tilde{W}_o^T\tilde{W}_o\}, \tag{11}$$
where $L_{11} = \tilde{x}^TP\tilde{x}/2 + \tilde{d}^T\tilde{d}/2$, $L_{12} = \mathrm{tr}\{\tilde{W}_o^T\tilde{W}_o\}/2$, and $P = P^T$ is positive definite, which together with some matrix $\Lambda > 0$ satisfies $A^TP + PA = -\Lambda$ for the Hurwitz matrix $A$. By taking the time derivative of $L_{11}$ and substituting the coupled error dynamics (8), we can obtain
$$\dot{L}_{11} = \frac{1}{2}\tilde{x}^T(A^TP + PA)\tilde{x} + \tilde{x}^TP\tilde{d} + \tilde{x}^TPg(x)\varepsilon_o(x) + \tilde{x}^TPg(x)\tilde{W}_o^T\sigma(z) - \tilde{d}^Tl(x)g(x)\tilde{W}_o^T\sigma(z) - \frac{1}{2}\tilde{d}^T\big(l(x) + l^T(x)\big)\tilde{d} - \tilde{d}^Tl(x)g(x)\varepsilon_o(x) + \tilde{d}^T\dot{d}. \tag{12}$$
Based on Assumption 2, together with Young's inequality, it follows that
$$\dot{L}_{11} \le -\frac{1}{2}\tilde{x}^T\Lambda\tilde{x} + \frac{1}{2}\tilde{x}^T\tilde{x} + \frac{1}{2}\tilde{d}^T\big(P^TP - l(x) - l^T(x) + l(x)g(x)g^T(x)l^T(x)\big)\tilde{d} + \tilde{d}^T\dot{d} + \tilde{x}^TPg(x)\tilde{W}_o^T\sigma(z) + \tilde{x}^TPg(x)\varepsilon_o(x) + \sigma_M^2\|\tilde{W}_o\|^2 + \varepsilon_M^2. \tag{13}$$
Considering (10), (13) is rewritten as
$$\dot{L}_{11} \le -\frac{1}{2}\tau\tilde{x}^T\tilde{x} - \frac{1}{2}\rho\tilde{d}^T\tilde{d} + \tilde{x}^TPg(x)\tilde{W}_o^T\sigma(z) + \tilde{x}^TPg(x)\varepsilon_o(x) + \tilde{d}^T\dot{d} + \sigma_M^2\|\tilde{W}_o\|^2 + \varepsilon_M^2, \tag{14}$$
where $\tau = \lambda_{\min}(\Lambda) - 1 > 0$ is ensured by properly selecting the positive definite matrix $\Lambda$ with minimum eigenvalue $\lambda_{\min}(\Lambda)$.
Combining with (9), $\dot{L}_{12}$ is derived as
$$\dot{L}_{12} = \mathrm{tr}\big\{-\eta_1\tilde{W}_o^T\sigma(z)\tilde{x}^TA^{-1}g(x) + \eta_2\|\tilde{x}\|\tilde{W}_o^T\hat{W}_o + \eta_2\tilde{W}_o^T\hat{W}_o\big\}. \tag{15}$$
With the inequality $\mathrm{tr}\{\tilde{W}_o^T\hat{W}_o\} \le \|W_o\|^2/2 - \|\tilde{W}_o\|^2/2$, (15) becomes
$$\dot{L}_{12} \le -\mathrm{tr}\big\{\eta_1\tilde{W}_o^T\sigma(z)\tilde{x}^TA^{-1}g(x)\big\} + \mathrm{tr}\big\{\eta_2\|\tilde{x}\|\tilde{W}_o^T\hat{W}_o\big\} + \frac{\eta_2}{2}\|W_o\|^2 - \frac{\eta_2}{2}\|\tilde{W}_o\|^2.$$
Noting the relationship $\mathrm{tr}\{A^TB\} = B^TA$ for all $A, B \in \mathbb{R}^n$ and the inequality $\mathrm{tr}\{\tilde{W}_o^T(W_o - \tilde{W}_o)\} \le W_M\|\tilde{W}_o\| - \|\tilde{W}_o\|^2$, we have
$$\dot{L}_{12} \le \eta_1\sigma_Mg_M\|A^{-1}\|\|\tilde{x}\|\|\tilde{W}_o\| + \eta_2W_M\|\tilde{x}\|\|\tilde{W}_o\| - \eta_2\|\tilde{x}\|\|\tilde{W}_o\|^2 + \frac{\eta_2}{2}\|W_o\|^2 - \frac{\eta_2}{2}\|\tilde{W}_o\|^2. \tag{16}$$
By combining (14) and (16) and taking norms, one can derive an upper bound for $\dot{L}_1(t)$ as
$$\dot{L}_1 \le -\frac{1}{2}\tau\|\tilde{x}\|^2 + \Big(g_M\varepsilon_M\|P\| + \big(g_M\sigma_M\|P\| + \eta_1g_M\sigma_M\|A^{-1}\| + \eta_2W_M\big)\|\tilde{W}_o\| - \eta_2\|\tilde{W}_o\|^2\Big)\|\tilde{x}\| - \frac{1}{2}\rho\|\tilde{d}\|^2 + d_M\|\tilde{d}\| + \frac{\eta_2}{2}\|W_o\|^2 - \frac{\eta_2 - 2\sigma_M^2}{2}\|\tilde{W}_o\|^2 + \varepsilon_M^2. \tag{17}$$
Selecting $\eta_2 \ge 2\sigma_M^2$ and completing the square with respect to $\|\tilde{W}_o\|$, (17) becomes
$$\dot{L}_1 \le -\frac{1}{2}\tau\|\tilde{x}\|^2 - \frac{1}{2}\rho\|\tilde{d}\|^2 + \Big(g_M\varepsilon_M\|P\| - \eta_2\big(\|\tilde{W}_o\| - \Theta_1\big)^2 + \eta_2\Theta_1^2\Big)\|\tilde{x}\| + d_M\|\tilde{d}\| + \Theta_2, \tag{18}$$
where
$$\Theta_1 = \frac{g_M\sigma_M\|P\| + \eta_1g_M\sigma_M\|A^{-1}\| + \eta_2W_M}{2\eta_2}, \qquad \Theta_2 = \frac{\eta_2\|W_o\|^2 + 2\varepsilon_M^2}{2}.$$
Define
$$e_{xd} = \begin{bmatrix}\tilde{x} \\ \tilde{d}\end{bmatrix}, \qquad E_o = \frac{1}{2}\begin{bmatrix}\tau I & 0 \\ 0 & \rho I\end{bmatrix}$$
and $B_o = \big[\,g_M\varepsilon_M\|P\| + \eta_2\Theta_1^2,\; d_M\,\big]^T$; we can further derive
$$\dot{L}_1 \le -\lambda_{\min}(E_o)\|e_{xd}\|^2 + \|B_o\|\|e_{xd}\| + \Theta_2. \tag{19}$$
Therefore, we can conclude that $\dot{L}_1 < 0$ whenever $e_{xd}$ satisfies
$$\|e_{xd}\| > \frac{\|B_o\|}{2\lambda_{\min}(E_o)} + \sqrt{\frac{\|B_o\|^2}{4\lambda_{\min}^2(E_o)} + \frac{\Theta_2}{\lambda_{\min}(E_o)}}.$$
Furthermore, according to the Lyapunov extension theorem [16], when the inequality (10) holds by selecting proper matrices, we can infer that all the estimation errors $\tilde{x}$, $\tilde{d}$ and $\tilde{W}_o$ are UUB. □
Remark 1.
The gain function matrix $l(x)$ is an important design parameter that can be chosen as a linear or nonlinear function. When the form of the system function $g(x)$ is simple, it is easy to find an $l(x)$ that satisfies the inequality (10) by substituting appropriate candidate functions into (10). However, if the form of $g(x)$ is complex, a trial-and-error procedure is employed to select an appropriate $l(x)$ that meets (10). Although there is no universal procedure for designing $l(x)$, experience has shown that it is not difficult to find a suitable $l(x)$ for specific applications [36,37].
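To make the observer construction concrete, the following Euler-integration sketch implements the enhanced observer (6) with the weight update (9). It is our illustration rather than the authors' code; the dynamics handles f and g, the Hurwitz matrix A, and the DO functions p(x) and l(x) are supplied by the user, and the default gains mirror the values used later in Section 5.

```python
import numpy as np

class EnhancedObserver:
    """Sketch of the NN identifier + nonlinear disturbance observer, Eq. (6)."""

    def __init__(self, n, m, l_o, A, l_gain, p, dt=1e-3):
        self.A, self.l, self.p, self.dt = A, l_gain, p, dt
        self.x_hat = np.zeros(n)                       # identifier state
        self.d0 = np.zeros(n)                          # DO auxiliary variable
        self.W_o = np.zeros((l_o, m))                  # adapted output-layer weights
        self.V_o = np.random.uniform(-0.1, 0.1, (n + m, l_o))  # fixed inner weights

    def step(self, x, u, f, g, eta1=30.0, eta2=2.5):
        sigma = np.tanh(self.V_o.T @ np.concatenate([x, u]))   # sigma(V_o^T x_bar)
        d_hat = self.d0 + self.p(x)                            # disturbance estimate
        w_hat = self.W_o.T @ sigma                             # lumped-uncertainty estimate
        # identifier and DO dynamics, Eq. (6)
        x_hat_dot = self.A @ (self.x_hat - x) + f(x) + g(x) @ u + g(x) @ w_hat + d_hat
        d0_dot = -self.l(x) @ (f(x) + g(x) @ u + g(x) @ w_hat + self.d0 + self.p(x))
        # weight update law, Eq. (9)
        x_til = x - self.x_hat
        W_dot = eta1 * np.outer(sigma, x_til @ np.linalg.inv(self.A) @ g(x)) \
            - eta2 * (np.linalg.norm(x_til) + 1.0) * self.W_o
        # forward-Euler integration
        self.x_hat += self.dt * x_hat_dot
        self.d0 += self.dt * d0_dot
        self.W_o += self.dt * W_dot
        return d_hat, w_hat
```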
To effectively handle both system uncertainties and external disturbances, we propose a compound $H_\infty$ optimal SMC scheme that combines the integral SMC with $H_\infty$ control theory. The compound controller is formulated as
$$u = u_d + u_c, \tag{20}$$
where $u_d$ represents the discontinuous control designed to steer the system trajectories towards the sliding surface and maintain them on it, thereby eliminating the effects of the matched uncertainties and disturbances, and $u_c$ denotes the continuous control derived to guarantee system stability and achieve near-optimal performance under the remaining unmatched disturbances on the sliding surface.
Accordingly, we define the integral sliding surface as follows:
$$s(x) = S_0(x) - S_0(x_0) - \int_0^t G(x)\big(f(x) + g(x)u_c\big)\,dv, \tag{21}$$
where $x_0$ denotes the initial state, $S_0(x) \in \mathbb{R}^m$ and $G(x) = \partial S_0(x)/\partial x \in \mathbb{R}^{m \times n}$. Moreover, it follows from Assumption 1 that a suitable matrix $G(x)$ can be found such that the product $G(x)g(x)$ is invertible.
Taking the time derivative of $s(x)$ yields
$$\dot{s}(x) = G(x)\big(g(x)u_d + g(x)w(x,u) + d\big). \tag{22}$$
By incorporating the valid estimates $\hat{d}$ and $\hat{W}_o$, $u_d$ is devised as
$$u_d = -\big(G(x)g(x)\big)^{-1}\left(G(x)\hat{d} + G(x)g(x)\hat{W}_o^T\sigma(z) + \mu\,\mathrm{sgn}(s) + \frac{G(x)G^T(x)s}{\|s^TG(x)\|}\zeta\right), \tag{23}$$
where $\mu > 0$, $\mathrm{sgn}(s) \in \mathbb{R}^m$ is the sign function, and $\zeta \in \mathbb{R}$ is generated by
$$\dot{\zeta} = \kappa\|s^TG(x)\| \tag{24}$$
with $\kappa > 0$. In particular, $\zeta$ is designed to tackle the unknown bounds of the approximation errors arising from the estimated terms $\hat{d}$ and $\hat{W}_o$.
Considering the implementation of $\hat{d}$ and $\hat{W}_o$ in (23), we define $\zeta_e = \tilde{d} + g(x)\tilde{W}_o^T\sigma(z) + g(x)\varepsilon_o(x)$ to represent the approximation errors. Based on the previous analysis and the boundedness of $g(x)$, $\zeta_e$ is bounded as $\|\zeta_e\| \le \zeta_M$ for an unknown positive constant $\zeta_M$. To estimate $\zeta_M$, we design $\zeta$ as in (24), and the estimation error is defined as $\tilde{\zeta} = \zeta_M - \zeta$.
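As an illustration only (not the authors' implementation), the discontinuous law (23) with the adaptive gain (24) can be coded directly once the observer outputs are available. The sketch below also applies the arctangent smoothing of $\mathrm{sgn}(s)$ that the paper adopts in Section 5; all gains here are hypothetical.

```python
import numpy as np

def sliding_mode_control(s, x, d_hat, w_hat, G, g, zeta, mu=0.5, eps=0.005):
    """Discontinuous control u_d of Eq. (23), with atan(s/eps) smoothing sgn(s)."""
    Gx, gx = G(x), g(x)
    Gg_inv = np.linalg.inv(Gx @ gx)
    sG_norm = np.linalg.norm(s @ Gx) + 1e-12           # guard against s = 0
    return -Gg_inv @ (Gx @ d_hat + Gx @ gx @ w_hat
                      + mu * np.arctan(s / eps)        # smoothed sign function
                      + (Gx @ Gx.T @ s) / sG_norm * zeta)

def zeta_dot(s, x, G, kappa=1.0):
    """Adaptive gain dynamics of Eq. (24)."""
    return kappa * np.linalg.norm(s @ G(x))
```

Because the adaptive gain $\zeta$ grows only while $\|s^TG(x)\|$ is nonzero, no upper bound on the estimation errors has to be known in advance, which is the practical advantage emphasized above.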
Theorem 1.
Considering system (2) with the sliding surface (21), if the discontinuous control $u_d$ is devised by (23) with the adaptive law (24), then the convergence of the sliding variable $s$ to zero is guaranteed from the beginning.
Proof. 
Choose the positive definite Lyapunov function candidate as
$$L_s = \frac{1}{2}s^Ts + \frac{1}{2}\kappa^{-1}\tilde{\zeta}^2.$$
Along the trajectories of system (2), $\dot{L}_s(t)$ is derived as
$$\dot{L}_s = s^T\dot{s} - \kappa^{-1}\tilde{\zeta}\dot{\zeta} = s^TG(x)\big(g(x)u_d + g(x)w(x,u) + d\big) - \kappa^{-1}\tilde{\zeta}\dot{\zeta}. \tag{25}$$
Substituting (23) and (24) into (25), we have
$$\begin{aligned}\dot{L}_s &= s^TG(x)\big(\tilde{d} + g(x)\tilde{W}_o^T\sigma(z) + g(x)\varepsilon_o(x)\big) - \mu s^T\mathrm{sgn}(s) - \|s^TG(x)\|\zeta - \tilde{\zeta}\|s^TG(x)\| \\ &= s^TG(x)\zeta_e - \mu s^T\mathrm{sgn}(s) - \|s^TG(x)\|\zeta - \tilde{\zeta}\|s^TG(x)\|.\end{aligned}$$
Using $\|\zeta_e\| \le \zeta_M$ and the estimation error $\tilde{\zeta} = \zeta_M - \zeta$ yields
$$\dot{L}_s \le \|s^TG(x)\|\zeta_M - \|s^TG(x)\|\zeta - \tilde{\zeta}\|s^TG(x)\| - \mu s^T\mathrm{sgn}(s) = -\mu s^T\mathrm{sgn}(s). \tag{26}$$
Thus, it follows from (26) that $\dot{L}_s \le -\mu\|s\|_1 < 0$ for any $s \ne 0$, where $\|\cdot\|_1$ denotes the vector 1-norm. This means the asymptotic stability and convergence of the sliding-mode motion $s(x) = 0$ can be guaranteed. Moreover, according to (21), the sliding surface satisfies $s(x_0) = 0$ at $t = 0$, which implies that the system states start on the sliding surface, thus avoiding a separate reaching phase. □
From Theorem 1, it is clear that the stable sliding motion $s(x) = 0$ exists from the initial time; that is, $s(x) = 0$ and $\dot{s}(x) = 0$ for all $t \ge 0$. Moreover, the equivalent control method is utilized to obtain the sliding-mode dynamics. Combining $\dot{s}(x) = 0$ with (3) and (22), the equivalent control can be derived as
$$u_{\mathrm{deq}} = -\big(G(x)g(x)\big)^{-1}G(x)\big(I - g(x)g^+(x)\big)d - g^+(x)d - w(x,u). \tag{27}$$
Then, substituting $u_{\mathrm{deq}}$ into (2), the sliding-mode dynamics without the matched uncertain term and disturbance component is
$$\dot{x} = f(x) + g(x)u_c + \Gamma(x)d_u, \tag{28}$$
where $\Gamma(x) = I - g(x)(G(x)g(x))^{-1}G(x)$ and $d_u = (I - g(x)g^+(x))d$ is the unmatched component of the external disturbance in (3). In order to reduce the influence of the multiplier matrix $\Gamma(x)$ and minimize the unmatched disturbance $\Gamma(x)d_u$, an optimal projection matrix $G^*(x)$ within $\Gamma(x)$ is provided in the following lemma.
Lemma 2.
Considering the nonlinear system (2) with Assumption 1, the optimal projection matrix is selected as $G^*(x) = g^+(x)$, which not only minimizes the norm $\|\Gamma(x)d_u\|$, but also makes the relation $\Gamma(x)d_u = d_u$ hold.
Proof. 
The proof follows from Theorem 1 in [12]. □
As a result, with the relation $\Gamma(x)d_u = d_u$, we can express (28) as
$$\dot{x} = f(x) + g(x)u_c + d_u, \tag{29}$$
which means that the discontinuous control $u_d$ in (23) can fully counteract the impacts of the matched uncertainties and disturbances.
Notice that in (20), $u_c$ aims not only to suppress the remaining unmatched disturbances on the sliding surface, but also to achieve near-optimal performance for the sliding-mode dynamics (29). This formulation can be seen as a nonlinear $H_\infty$ optimal control problem, which is known to be challenging to solve directly. In the following, we demonstrate how to find an approximate $H_\infty$ optimal control solution by using the single-network ADP algorithm.

4. H∞ Control Design for Sliding-Mode Dynamics

Considering (3) and (29), the sliding-mode dynamics is represented as
$$\dot{x} = f(x) + g(x)u_c + k(x)d \tag{30}$$
with $k(x) = I - g(x)g^+(x)$. Since $g(x)$ and $g^+(x)$ are bounded, the function $k(x)$ is also bounded, i.e., $\|k(x)\| \le k_M$ with $k_M > 0$.
To attenuate the remaining unmatched disturbances $k(x)d$, the corresponding $H_\infty$ control problem of the sliding-mode dynamics is established, which seeks a feedback control $u_c$ that stabilizes the system and achieves an $L_2$-gain no larger than $\gamma$, that is,
$$\int_0^\infty \big(x^TQx + u_c^TRu_c\big)\,dv \le \gamma^2\int_0^\infty d^Td\,dv, \tag{31}$$
where $Q$ and $R$ are positive definite matrices with appropriate dimensions, and $\gamma > 0$ refers to the level of disturbance attenuation. Based on [32,33], by treating the disturbance $d$ as another system input, we can reframe the $H_\infty$ optimal control problem for system (30) as a two-player zero-sum game with the following infinite-horizon cost function:
$$V(x) = \int_t^\infty \big(x^TQx + u_c^TRu_c - \gamma^2d^Td\big)\,dv. \tag{32}$$
Assuming that $V(x) \in C^1$, the Hamiltonian with the associated admissible control pair $(u_c, d)$ is defined as
$$H(x, \nabla V, u_c, d) = x^TQx + u_c^TRu_c - \gamma^2d^Td + (\nabla V)^T\big(f(x) + g(x)u_c + k(x)d\big) \tag{33}$$
with $\nabla V = \partial V(x)/\partial x$. From Bellman's optimality principle, it follows that the optimal cost function $V^*(x)$ satisfies the HJI equation
$$0 = \min_{u_c}\max_{d}H(x, \nabla V^*, u_c, d) \tag{34}$$
with $\nabla V^* = \partial V^*(x)/\partial x$. Moreover, according to zero-sum game theory [16], we have the following Nash condition:
$$\min_{u_c}\max_{d}H(x, \nabla V^*, u_c, d) = \max_{d}\min_{u_c}H(x, \nabla V^*, u_c, d), \tag{35}$$
which ensures the existence of a saddle point $(u_c^*, d^*)$ of the HJI Equation (34). Then, applying the stationarity conditions, one can derive the optimal control $u_c^*$ and the worst-case disturbance $d^*$ as
$$u_c^* = -\frac{1}{2}R^{-1}g^T(x)\nabla V^*, \tag{36}$$
$$d^* = \frac{1}{2\gamma^2}k^T(x)\nabla V^*. \tag{37}$$
By substituting (36) and (37) into (33), the HJI equation associated with $V^*$ becomes
$$0 = x^TQx + (\nabla V^*)^Tf(x) - \frac{1}{4}(\nabla V^*)^Tg(x)R^{-1}g^T(x)\nabla V^* + \frac{1}{4\gamma^2}(\nabla V^*)^Tk(x)k^T(x)\nabla V^*. \tag{38}$$
Due to the highly nonlinear nature of the HJI equation, obtaining its analytical solution is extremely difficult, if not impossible. To overcome this challenge, we propose an online optimal algorithm that learns the solution of the HJI equation and achieves the $H_\infty$ optimal control. This is accomplished through single-network ADP, where only one critic network, implemented by an NN, is adopted to approximate the cost function $V^*$ related to (38). Using a critic NN with $l_c$ neurons, $V^*$ is represented over the set $\Omega$ as follows:
$$V^*(x) = W_c^T\sigma_c(x) + \varepsilon_c(x) \tag{39}$$
with the unknown ideal weight vector $W_c \in \mathbb{R}^{l_c}$, the vector of activation functions $\sigma_c(x) \in \mathbb{R}^{l_c}$ and the reconstruction error $\varepsilon_c(x)$. Meanwhile, we have the gradient vector
$$\nabla V^* = (\nabla\sigma_c)^TW_c + \nabla\varepsilon_c \tag{40}$$
with $\nabla\sigma_c = \partial\sigma_c(x)/\partial x$ and $\nabla\varepsilon_c = \partial\varepsilon_c(x)/\partial x$.
By combining (36), (37) and (40), it is easy to get
$$u_c^* = -\frac{1}{2}R^{-1}g^T(x)\big((\nabla\sigma_c)^TW_c + \nabla\varepsilon_c\big), \tag{41}$$
$$d^* = \frac{1}{2\gamma^2}k^T(x)\big((\nabla\sigma_c)^TW_c + \nabla\varepsilon_c\big). \tag{42}$$
Substituting (41) and (42) into (33), the HJI equation becomes
$$0 = H(x, \nabla V^*, u_c^*, d^*) = x^TQx + W_c^T\nabla\sigma_cf(x) - \frac{1}{4}W_c^T\nabla\sigma_cD(\nabla\sigma_c)^TW_c - \varepsilon_{\mathrm{HJI}}, \tag{43}$$
where $D = g(x)R^{-1}g^T(x) - k(x)k^T(x)/\gamma^2$, and the approximation error $\varepsilon_{\mathrm{HJI}} = -(\nabla\varepsilon_c)^Tf(x) + W_c^T\nabla\sigma_cD\nabla\varepsilon_c/2 + (\nabla\varepsilon_c)^TD\nabla\varepsilon_c/4$ is due to the NN reconstruction error. Furthermore, taking into account $\|k(x)\| \le k_M$ and $\|g(x)\| \le g_M$, we can infer that there exists a positive constant $D_M$ such that $\|D\| \le D_M$.
Because $W_c$ in (39) is unknown, the critic NN with estimated weights approximates the cost function in the form of
$$\hat{V}(x) = \hat{W}_c^T\sigma_c(x), \tag{44}$$
where $\hat{W}_c$ denotes the estimate of $W_c$. In addition, we can obtain
$$\nabla\hat{V} = (\nabla\sigma_c)^T\hat{W}_c. \tag{45}$$
By using (36), (37) and (45), the approximate forms of (41) and (42) are derived as
$$\hat{u}_c = -\frac{1}{2}R^{-1}g^T(x)(\nabla\sigma_c)^T\hat{W}_c, \tag{46}$$
$$\hat{d}_w = \frac{1}{2\gamma^2}k^T(x)(\nabla\sigma_c)^T\hat{W}_c. \tag{47}$$
Then, incorporating (46) and (47) into (43), we have the approximate Hamiltonian as follows:
$$H(x, \hat{W}_c, \hat{u}_c, \hat{d}_w) = x^TQx + \hat{W}_c^T\nabla\sigma_cf(x) - \frac{1}{4}\hat{W}_c^T\nabla\sigma_cD(\nabla\sigma_c)^T\hat{W}_c. \tag{48}$$
Subtracting (43) from (48), the corresponding Hamiltonian error is defined as
$$e_c = H(x, \hat{W}_c, \hat{u}_c, \hat{d}_w) - H(x, \nabla V^*, u_c^*, d^*) = H(x, \hat{W}_c, \hat{u}_c, \hat{d}_w).$$
To effectively approximate the cost function, one needs to adjust the critic NN weights $\hat{W}_c$ so as to minimize the Hamiltonian error $e_c$. To this end, it is common practice to train the critic NN by minimizing the squared residual error $E_c = e_c^Te_c/2$. However, traditional gradient-descent weight updating laws for the critic NN only minimize the squared error; they provide no guarantee for the stability of the resulting system during the learning phase.
In practice, however, stability is a fundamental requirement of the system and a prerequisite for achieving any higher performance. Thus, not just to minimize the residual error, but also to guarantee system stability and eliminate the need for an initial stabilizing control, a weight update law is developed for the critic NN as follows:
$$\dot{\hat{W}}_c = -\alpha\frac{\phi}{(\phi^T\phi + 1)^2}e_c + \frac{\alpha}{4}\nabla\sigma_cD(\nabla\sigma_c)^T\hat{W}_c\frac{\phi_1^T\hat{W}_c}{\phi_s} - \alpha\big(F_2 - F_1\phi_1^T\big)\hat{W}_c + \frac{\beta}{2}\Sigma(x, \hat{u}_c, \hat{d}_w)\nabla\sigma_cD\nabla J_a, \tag{49}$$
where $\alpha$ and $\beta$ are positive updating ratios, $\phi = \nabla\sigma_c\big(f(x) - \frac{1}{2}D(\nabla\sigma_c)^T\hat{W}_c\big)$, $\phi_1 = \phi/(\phi^T\phi + 1)$, $\phi_s = \phi^T\phi + 1$, $F_1$ and $F_2$ represent design parameter matrices with suitable dimensions, $J_a(x)$ is a Lyapunov function candidate provided in Assumption 4, and the index operator $\Sigma(x, \hat{u}_c, \hat{d}_w)$ is given by
$$\Sigma(x, \hat{u}_c, \hat{d}_w) = \begin{cases}0, & \text{if } \dot{J}_a(x) = (\nabla J_a)^T\big(f(x) + g(x)\hat{u}_c + k(x)\hat{d}_w\big) < 0, \\ 1, & \text{otherwise}\end{cases} \tag{50}$$
with $\nabla J_a = \partial J_a(x)/\partial x$.
Remark 2.
Note that in (49), the first term is designed by the normalized gradient descent method for minimizing the residual error. The second term has a well-designed form for ensuring the system's stability, which is derived from the Lyapunov stability analysis. The last term is an additional adjustment term whose activation depends on the index operator $\Sigma(x, \hat{u}_c, \hat{d}_w)$, which is selected based on the derivative of $J_a(x)$ along the sliding-mode dynamics (30), namely, $\dot{J}_a(x) = (\nabla J_a)^T(f(x) + g(x)\hat{u}_c + k(x)\hat{d}_w)$. Once the system dynamics tend to become unstable, i.e., $\dot{J}_a(x) \ge 0$, then $\Sigma(x, \hat{u}_c, \hat{d}_w) = 1$ and the last term in (49) is activated. Moreover, based on the negative gradient direction of $\dot{J}_a(x)$ with respect to the weights, i.e., $-\partial\big((\nabla J_a)^T(f(x) - D(\nabla\sigma_c)^T\hat{W}_c/2)\big)/\partial\hat{W}_c$, the last term is designed to reinforce the training process of the critic NN until the system dynamics become stable. This also eliminates the need for an initial stabilizing control, in contrast with [35,36,37,39], where a stabilizing control is required for initialization; in practical applications, finding an initial stabilizing control is quite challenging.
Remark 3.
Based on [14,15,16], it is necessary to satisfy the persistence of excitation (PE) requirement when updating the weights of the critic NN, which enhances its ability to explore the state space and is indispensable for the weights to converge to their desired values. To fulfill the PE requirement, a probing noise is injected into the control input [15], which may cause instability during the online learning. As a result, it is important to design the last term in (49) to stabilize the resulting system, especially when the probing signal is injected.
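For concreteness, the sketch below shows how the critic update (49)–(50) could be stepped numerically. It is our simplified rendering, not the authors' code: the dynamics handles f, g, k and the feature gradient grad_sigma are assumptions supplied by the user, the shape conventions for F1 (a vector) and F2 (a matrix) are our reading of (49), and J_a(x) = x^T x / 2 is the quadratic candidate used in Section 5.

```python
import numpy as np

def critic_step(W_hat, x, f, g, k, grad_sigma, Q, R, gamma,
                F1, F2, alpha=1.0, beta=0.5, dt=1e-3):
    """One forward-Euler step of the critic weight update law, Eq. (49)-(50)."""
    gs = grad_sigma(x)                                   # (l_c, n) gradient of sigma_c
    Rinv = np.linalg.inv(R)
    D = g(x) @ Rinv @ g(x).T - k(x) @ k(x).T / gamma**2  # D as in Eq. (43)
    u_c = -0.5 * Rinv @ g(x).T @ (gs.T @ W_hat)          # Eq. (46)
    d_w = (0.5 / gamma**2) * k(x).T @ (gs.T @ W_hat)     # Eq. (47)
    phi = gs @ (f(x) - 0.5 * D @ (gs.T @ W_hat))
    phi_s = phi @ phi + 1.0
    phi_1 = phi / phi_s
    # residual of the approximate Hamiltonian, Eq. (48)
    e_c = x @ Q @ x + W_hat @ (gs @ f(x)) \
        - 0.25 * W_hat @ (gs @ D @ (gs.T @ W_hat))
    # index operator, Eq. (50), with J_a(x) = x^T x / 2 so grad J_a = x
    x_dot = f(x) + g(x) @ u_c + k(x) @ d_w
    Sigma = 0.0 if x @ x_dot < 0.0 else 1.0
    # weight update law, Eq. (49)
    W_dot = (-alpha * phi / phi_s**2 * e_c
             + 0.25 * alpha * (gs @ D @ (gs.T @ W_hat)) * (phi_1 @ W_hat) / phi_s
             - alpha * (F2 @ W_hat - F1 * (phi_1 @ W_hat))
             + 0.5 * beta * Sigma * (gs @ D @ x))
    return W_hat + dt * W_dot, u_c

# Usage: iterate critic_step along simulated trajectories (with probing noise
# added to the applied control) until W_hat converges.
```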
The schematic structure of the proposed $H_\infty$ SMC scheme is illustrated in Figure 1. As shown in Figure 1, this structure consists of two main modules: the $H_\infty$ optimal learning module and the enhanced observer module. It should be noted that, based on the deduced sliding-mode dynamics, the learning module can operate independently. However, the original system and the observer module rely on the compound control input $u$, which includes the approximate $H_\infty$ optimal control $\hat{u}_c$ obtained from the learning module. Consequently, it is necessary to first run the learning module to obtain the approximate optimal control $\hat{u}_c$ during the implementation process.
Considering (43), together with $\tilde{W}_c = W_c - \hat{W}_c$, (48) can be represented as
$$e_c = -\tilde{W}_c^T\nabla\sigma_c\Big(f(x) - \frac{1}{2}D(\nabla\sigma_c)^T\hat{W}_c\Big) + \frac{1}{4}\tilde{W}_c^T\nabla\sigma_cD(\nabla\sigma_c)^T\tilde{W}_c + \varepsilon_{\mathrm{HJI}}. \tag{51}$$
By means of the relation $\dot{\tilde{W}}_c = -\dot{\hat{W}}_c$ and incorporating (51) into (49), we obtain
$$\dot{\tilde{W}}_c = -\alpha\frac{\phi_1}{\phi_s}\Big(\tilde{W}_c^T\phi - \frac{1}{4}\tilde{W}_c^T\nabla\sigma_cD(\nabla\sigma_c)^T\tilde{W}_c - \varepsilon_{\mathrm{HJI}}\Big) - \frac{\alpha}{4}\nabla\sigma_cD(\nabla\sigma_c)^T\hat{W}_c\frac{\phi_1^T\hat{W}_c}{\phi_s} + \alpha\big(F_2 - F_1\phi_1^T\big)\hat{W}_c - \frac{\beta}{2}\Sigma(x, \hat{u}_c, \hat{d}_w)\nabla\sigma_cD\nabla J_a. \tag{52}$$
Next, the main stability theorem is presented. Before that, a basic assumption for the critic NN is introduced [16], and another assumption for the sliding-mode dynamics, which has been used in [34,38], is also needed.
Assumption 3.
For the critic NN, there exist known positive constants $\sigma_{cM}$, $\sigma_{dM}$, $\varepsilon_{cM}$, $\varepsilon_{dM}$ and $W_{cM}$ such that $\|\sigma_c(x)\| \le \sigma_{cM}$, $\|\nabla\sigma_c\| \le \sigma_{dM}$, $\|\varepsilon_c(x)\| \le \varepsilon_{cM}$, $\|\nabla\varepsilon_c\| \le \varepsilon_{dM}$ and $\|W_c\| \le W_{cM}$, respectively. Moreover, the approximation error $\varepsilon_{\mathrm{HJI}}$ is bounded above by $\varepsilon_H > 0$, namely, $\|\varepsilon_{\mathrm{HJI}}\| \le \varepsilon_H$.
Assumption 4.
Considering the sliding-mode dynamics (30) with the optimal control pair $(u_c^*, d^*)$ in (36) and (37), let $J_a(x)$ be a smooth, radially unbounded and positive definite Lyapunov candidate satisfying $\dot{J}_a(x) = (\nabla J_a)^T(f(x) + g(x)u_c^* + k(x)d^*) < 0$. Moreover, it is assumed that a positive definite matrix $\Psi(x)$ makes $(\nabla J_a)^T\Psi(x)\nabla J_a = x^TQx + u_c^{*T}Ru_c^* - \gamma^2d^{*T}d^*$ hold. Then, one can derive
$$(\nabla J_a)^T\big(f(x) + g(x)u_c^* + k(x)d^*\big) = -(\nabla J_a)^T\Psi(x)\nabla J_a. \tag{53}$$
Remark 4.
Note that the plausibility of Assumption 4 rests on the boundedness of the optimal sliding-mode dynamics, which is usually assumed to be bounded by a function of the system state $x$; for more details, refer to [34,38]. Furthermore, it is generally impossible to solve (53) directly for the form of $J_a(x)$. Based on [34], one can obtain $J_a(x)$ by selecting an appropriate form, such as a quadratic polynomial.
Theorem 2.
Considering the sliding-mode dynamics (30) and its associated cost function (32), let the control input and disturbance policy be designed by (46) and (47), respectively, along with the critic weight update law (49). Then, both the sliding-mode state $x$ and the weight estimation error $\tilde{W}_c$ are ensured to be UUB. Furthermore, the obtained control input $\hat{u}_c$ converges to a neighborhood of the optimal control $u_c^*$ with a small adjustable bound.
Proof. 
Consider the following Lyapunov function candidate
$$L = \frac{1}{2}\tilde{W}_c^T\alpha^{-1}\tilde{W}_c + \beta_1J_a(x),$$
where $\beta_1 = \beta/\alpha > 0$. By calculating the time derivative of $L$ along the sliding-mode dynamics (30), we have
$$\dot{L} = \tilde{W}_c^T\alpha^{-1}\dot{\tilde{W}}_c + \beta_1(\nabla J_a)^T\big(f(x) + g(x)\hat{u}_c + k(x)\hat{d}_w\big). \tag{54}$$
Substituting (52) into (54) and rearranging, one can get
$$\begin{aligned}\dot{L} ={}& -\tilde{W}_c^T\phi_1\phi_1^T\tilde{W}_c + \beta_1(\nabla J_a)^T\big(f(x) + g(x)\hat{u}_c + k(x)\hat{d}_w\big) + \frac{1}{4}\tilde{W}_c^T\nabla\sigma_cD(\nabla\sigma_c)^TW_c\frac{\phi_1^T\tilde{W}_c}{\phi_s} \\ &- \frac{1}{4}\tilde{W}_c^T\nabla\sigma_cD(\nabla\sigma_c)^TW_c\frac{\phi_1^TW_c}{\phi_s} + \frac{1}{4}\tilde{W}_c^T\nabla\sigma_cD(\nabla\sigma_c)^T\tilde{W}_c\frac{\phi_1^TW_c}{\phi_s} + \tilde{W}_c^T\frac{\phi_1}{\phi_s}\varepsilon_{\mathrm{HJI}} \\ &- \frac{\beta_1}{2}\Sigma(x, \hat{u}_c, \hat{d}_w)\tilde{W}_c^T\nabla\sigma_cD\nabla J_a + \tilde{W}_c^TF_2\hat{W}_c - \tilde{W}_c^TF_1\phi_1^T\hat{W}_c.\end{aligned} \tag{55}$$
Using $\hat{W}_c = W_c - \tilde{W}_c$, the last two terms in (55) become
$$\tilde{W}_c^TF_2\hat{W}_c - \tilde{W}_c^TF_1\phi_1^T\hat{W}_c = \tilde{W}_c^TF_2W_c - \tilde{W}_c^TF_2\tilde{W}_c - \tilde{W}_c^TF_1\phi_1^TW_c + \tilde{W}_c^TF_1\phi_1^T\tilde{W}_c. \tag{56}$$
Defining $\Upsilon = [\tilde{W}_c^T\phi_1, \tilde{W}_c^T]^T$ and substituting (56) into (55), $\dot{L}$ can be rewritten as
$$\dot{L} = -\Upsilon^TM\Upsilon + \Upsilon^T\delta + \beta_1(\nabla J_a)^T\big(f(x) + g(x)\hat{u}_c + k(x)\hat{d}_w\big) - \frac{\beta_1}{2}\Sigma(x, \hat{u}_c, \hat{d}_w)\tilde{W}_c^T\nabla\sigma_cD\nabla J_a, \tag{57}$$
where
$$M = \begin{bmatrix} I & \Big(\dfrac{\nabla\sigma_cD(\nabla\sigma_c)^TW_c}{4\phi_s} - \dfrac{F_1}{2}\Big)^T \\[2mm] \dfrac{\nabla\sigma_cD(\nabla\sigma_c)^TW_c}{4\phi_s} - \dfrac{F_1}{2} & F_2 \end{bmatrix}, \qquad \delta = \begin{bmatrix} \dfrac{1}{\phi_s}\varepsilon_{\mathrm{HJI}} \\[2mm] -\dfrac{\nabla\sigma_cD(\nabla\sigma_c)^TW_c}{4\phi_s} + F_2W_c - F_1\phi_1^TW_c \end{bmatrix}.$$
With Assumption 3 in mind, and recalling the boundedness of $\phi_1$ and $D$, in particular $\|\phi_1\| < 1$ and $\|D\| \le D_M$, we can infer that there exists a positive constant $\delta_M$ such that $\|\delta\| \le \delta_M$. To guarantee $M > 0$, appropriate parameters $F_1$ and $F_2$ need to be selected in the design. Then, one can upper bound $\dot{L}$ as follows:
$$\dot{L} \le -\lambda_{\min}(M)\|\Upsilon\|^2 + \delta_M\|\Upsilon\| + \beta_1(\nabla J_a)^T\big(f(x) + g(x)\hat{u}_c + k(x)\hat{d}_w\big) - \frac{\beta_1}{2}\Sigma(x, \hat{u}_c, \hat{d}_w)\tilde{W}_c^T\nabla\sigma_cD\nabla J_a \tag{58}$$
with $\lambda_{\min}(M)$ being the minimum eigenvalue of $M$.
According to (50), two cases need to be considered for (58) in the following analysis: $\Sigma(x, \hat{u}_c, \hat{d}_w) = 0$ and $\Sigma(x, \hat{u}_c, \hat{d}_w) = 1$.
Case 1: For $\Sigma(x, \hat{u}_c, \hat{d}_w) = 0$, it follows from (50) that $\dot{J}_a(x) < 0$, i.e., $(\nabla J_a)^T\dot{x} < 0$, which, together with the PE condition, ensures the existence of a positive constant $\varrho$ such that $(\nabla J_a)^T\dot{x} < -\varrho\|\nabla J_a\| < 0$. Then, (58) becomes
$$\dot{L} \le \beta_1(\nabla J_a)^T\dot{x} - \lambda_{\min}(M)\|\Upsilon\|^2 + \delta_M\|\Upsilon\| < -\beta_1\varrho\|\nabla J_a\| - \lambda_{\min}(M)\Big(\|\Upsilon\| - \frac{\delta_M}{2\lambda_{\min}(M)}\Big)^2 + \frac{\delta_M^2}{4\lambda_{\min}(M)}. \tag{59}$$
Focusing on (59), $\dot{L} < 0$ holds provided that either
$$\|\nabla J_a\| > \frac{\delta_M^2}{4\beta_1\varrho\lambda_{\min}(M)} \triangleq A_1$$
or
$$\|\Upsilon\| > \frac{\delta_M}{2\lambda_{\min}(M)}.$$
Moreover, based on the relation $\|\Upsilon\| \le \sqrt{\|\phi_1\|^2 + 1}\,\|\tilde{W}_c\| < \sqrt{2}\|\tilde{W}_c\|$ with $\|\phi_1\| < 1$, we can derive
$$\|\tilde{W}_c\| > \frac{\delta_M}{2\sqrt{2}\lambda_{\min}(M)} \triangleq B_1.$$
Case 2: For $\Sigma(x, \hat{u}_c, \hat{d}_w) = 1$, in light of (41) and (42), by adding and subtracting $\frac{\beta_1}{2}(\nabla J_a)^TD\nabla\varepsilon_c$ in (58), we can derive
$$\dot{L} \le -\lambda_{\min}(M)\Big(\|\Upsilon\| - \frac{\delta_M}{2\lambda_{\min}(M)}\Big)^2 + \frac{\delta_M^2}{4\lambda_{\min}(M)} + \beta_1(\nabla J_a)^T\big(f(x) + g(x)u_c^* + k(x)d^*\big) + \frac{\beta_1}{2}(\nabla J_a)^TD\nabla\varepsilon_c. \tag{60}$$
Then, using (53) in Assumption 4, and recalling the boundedness of $D$ and $\nabla\varepsilon_c$, (60) is upper bounded as
$$\dot{L} \le -\lambda_{\min}(M)\Big(\|\Upsilon\| - \frac{\delta_M}{2\lambda_{\min}(M)}\Big)^2 - \frac{\beta_1}{2}\lambda_{\min}(\Psi)\|\nabla J_a\|^2 + \Phi, \tag{61}$$
where $\Phi = \delta_M^2/(4\lambda_{\min}(M)) + \beta_1D_M^2\varepsilon_{dM}^2/(8\lambda_{\min}(\Psi))$ and $\lambda_{\min}(\Psi)$ denotes the minimum eigenvalue of $\Psi(x)$. Hence, provided that either
$$\|\nabla J_a\| > \sqrt{\frac{2\Phi}{\beta_1\lambda_{\min}(\Psi)}} \triangleq A_2$$
or
$$\|\Upsilon\| > \sqrt{\frac{\Phi}{\lambda_{\min}(M)}} + \frac{\delta_M}{2\lambda_{\min}(M)}$$
holds, one has $\dot{L} < 0$. Further, by the relation $\|\Upsilon\| < \sqrt{2}\|\tilde{W}_c\|$, we have
$$\|\tilde{W}_c\| > \sqrt{\frac{\Phi}{2\lambda_{\min}(M)}} + \frac{\delta_M}{2\sqrt{2}\lambda_{\min}(M)} \triangleq B_2.$$
To sum up, for both Case 1 and Case 2, with proper parameters $F_1$ and $F_2$ satisfying $M > 0$, whenever the inequality $\|\nabla J_a\| \ge \max\{A_1, A_2\} \triangleq \bar{A}$ or $\|\tilde{W}_c\| \ge \max\{B_1, B_2\} \triangleq \bar{B}$ holds, we have $\dot{L} < 0$. From the Lyapunov extension theorem [16], it follows that $\|\nabla J_a\|$ and $\|\tilde{W}_c\|$ are ultimately bounded by $\bar{A}$ and $\bar{B}$, respectively. Based on Assumption 4, the Lyapunov candidate $J_a(x)$ is radially unbounded, which implies that the boundedness of $\|\nabla J_a\|$ leads to the boundedness of the system state $x$. In particular, $\|x\|$ is bounded by $\bar{A}_x = \max\{A_{1x}, A_{2x}\}$, where $A_{1x}$ and $A_{2x}$ are determined by $A_1$ and $A_2$, respectively. So far, we can conclude that both $x$ and $\tilde{W}_c$ are guaranteed to be UUB.
Next, we prove that $\hat{u}_c$ converges to a small neighborhood of $u_c^*$ with an adjustable bound, i.e., $\|\hat{u}_c - u_c^*\| \le \epsilon_u$. Considering (41) and (46), we have
$$\hat{u}_c - u_c^* = \frac{1}{2}R^{-1}g^T(x)\big((\nabla\sigma_c)^T\tilde{W}_c + \nabla\varepsilon_c\big).$$
Noticing that $\tilde{W}_c$ is UUB with the associated bound $\bar{B} = \max\{B_1, B_2\}$, and invoking $\|g(x)\| \le g_M$, $\|\nabla\sigma_c\| \le \sigma_{dM}$, $\|\nabla\varepsilon_c\| \le \varepsilon_{dM}$ and the boundedness of $R$, it follows that
$$\|\hat{u}_c - u_c^*\| \le \frac{1}{2}\lambda_{\max}(R^{-1})g_M\big(\sigma_{dM}\bar{B} + \varepsilon_{dM}\big) \triangleq \epsilon_u. \tag{62}$$
 □
Remark 5.
From the expressions of $B_1$ and $B_2$, it is seen that $\bar{B}$ can be kept small when $\lambda_{\min}(M)$ is large enough. In view of (57), we can enlarge $\lambda_{\min}(M)$ by adjusting the corresponding design parameters $F_1$ and $F_2$. Moreover, we can make the approximation error $\nabla\varepsilon_c$ and its upper bound $\varepsilon_{dM}$ sufficiently small when the neuron number $l_c$ is large enough. Therefore, the convergence error $\epsilon_u$ in (62) can be made as small as desired in the design.

5. Simulation Results

To validate the effectiveness of the proposed $H_\infty$ optimal SMC scheme, two simulation examples are provided. The first example focuses on a single-link robot arm, while the second deals with a power system.

5.1. Single-Link Robot Arm

Consider a nonlinear single-link robot arm [23] with dynamics given by
$$J\ddot{\theta} = -MgL\sin(\theta) - D\dot{\theta} + u + w, \tag{63}$$
where $\theta$ is the joint rotation angle of the robot arm in radians, $u$ refers to the control torque applied to the joint in N·m, and $w$ denotes the lumped uncertain term. The system parameters are selected as follows: arm length $L = 0.5$ m, payload mass $M = 1$ kg, local gravitational acceleration $g = 9.81$ m/s², rotational inertia $J = 1$ kg·m² and viscous friction coefficient $D = 2$ N·m·s/rad. With the system states defined as $x_1 = \theta$ and $x_2 = \dot{\theta}$, and considering the presence of exogenous disturbances, the dynamics (63) in state-space form can be represented as
$$\begin{bmatrix}\dot{x}_1 \\ \dot{x}_2\end{bmatrix} = \begin{bmatrix}x_2 \\ -4.905\sin(x_1) - 2x_2\end{bmatrix} + \begin{bmatrix}0 \\ 1\end{bmatrix}(u + w) + d, \tag{64}$$
where $d$ represents the unknown disturbances. Moreover, the initial state is set as $x_0 = [1, 0.5]^T$, the lumped uncertainty term is $w(x, u) = x_2\sin(x_1) + 0.1\sin(x_1)u$, and the disturbance term is chosen as $d = [0.5e^{-t}\sin(t), 0.5\sin(t)]^T$ in the simulation.
The enhanced observer system, consisting of an NN identifier and a nonlinear DO, is designed as in (6), where the identifier NN is a three-layered feedforward NN with one hidden layer containing six neurons and the hyperbolic tangent activation function $\tanh(\cdot)$. The updating ratios are set as $\eta_1 = 30$ and $\eta_2 = 2.5$, while the weights $\hat{W}_o$ and $\hat{V}_o$ are initialized with random values chosen from the interval $[-0.1, 0.1]$. The initial observer state is set as $\hat{x}_0 = [0.5, 0]^T$. Moreover, based on Lemma 1, the Hurwitz matrix $A = \mathrm{diag}(-15, -15)$, $p(x) = [10x_1; 10x_2]$ and $l(x) = \mathrm{diag}(10, 10)$ are selected to ensure that the inequality (10) holds. The integral sliding surface is determined by (21), together with $G(x) = g^+(x) = [0, 1]$ and $S_0(x) = x_2$. Accordingly, the discontinuous SMC law $u_d$ is given by (23) and (24). For the purpose of eliminating the chattering phenomenon, an arctangent function $\mathrm{atan}(s/\epsilon)$ with a small positive scalar $\epsilon = 0.005$ is employed to replace the sign function $\mathrm{sgn}(s)$ in (23).
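To indicate how the pieces fit together in simulation, the compressed closed-loop sketch below runs the robot arm (64) under the compound control (20). It is our illustration: the observer outputs are replaced by idealized estimates for brevity, and the linear surrogate standing in for the learned control $\hat{u}_c$ is hypothetical (the paper computes it from the converged critic weights via (46)).

```python
import numpy as np

# Robot arm dynamics in state-space form, Eq. (64)
f = lambda x: np.array([x[1], -4.905 * np.sin(x[0]) - 2.0 * x[1]])
b = np.array([0.0, 1.0])                      # input vector g(x)
w = lambda x, u: x[1] * np.sin(x[0]) + 0.1 * np.sin(x[0]) * u
d = lambda t: np.array([0.5 * np.exp(-t) * np.sin(t), 0.5 * np.sin(t)])
G = np.array([0.0, 1.0])                      # G(x) = g^+(x), S_0(x) = x_2

# Hypothetical linear stand-in for the learned control u_c_hat of Eq. (46)
u_c_hat = lambda x: -(0.5 * x[0] + 1.0 * x[1])

mu, kappa, eps, dt, T = 0.5, 1.0, 0.005, 1e-3, 15.0
x = np.array([1.0, 0.5]); x0 = x.copy()
integral, zeta = 0.0, 0.0
for k in range(int(T / dt)):
    t = k * dt
    u_c = u_c_hat(x)
    s = x[1] - x0[1] - integral               # integral sliding surface, Eq. (21)
    d_hat, w_hat = d(t), w(x, u_c)            # idealized observer outputs
    # discontinuous control, Eq. (23), with atan smoothing of sgn(s);
    # here G g = 1, so the last term of (23) reduces to sign(s) * zeta
    u_d = -(G @ d_hat + w_hat + mu * np.arctan(s / eps) + np.sign(s) * zeta)
    u = u_d + u_c                             # compound control, Eq. (20)
    # forward-Euler integration of the plant and the auxiliary states
    x = x + dt * (f(x) + b * (u + w(x, u)) + d(t))
    integral += dt * (G @ (f(x) + b * u_c))
    zeta += dt * kappa * abs(s)               # adaptive gain, Eq. (24)
print(x)   # the state should settle near the origin
```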
By considering the SMC law $u_d$, the sliding-mode dynamics can be obtained as
$$\dot{x} = f(x) + g(x)u_c + k(x)d, \tag{65}$$
where $k(x) = I - g(x)g^+(x) = \mathrm{diag}(1, 0)$. We choose the associated cost function in the form of (32), together with $Q = \mathrm{diag}(1, 1)$, $R = 1$ and $\gamma = 1.5$. For the critic NN, the activation function is chosen as $\sigma_c(x) = [x_1^2, x_1x_2, x_2^2, x_1^3x_2, x_1^2x_2^2, x_1x_2^3]^T$, which results in $\hat{W}_c = [\hat{W}_{c1}, \hat{W}_{c2}, \ldots, \hat{W}_{c6}]^T$. We select the updating ratios $\alpha = 1$, $\beta = 0.5$, the design parameters $F_1 = F_2 = 10I$, $l_c = 6$, and $J_a(x)$ as a quadratic polynomial. Furthermore, the weight vector $\hat{W}_c$ is initialized to zero, which leads to an initial control input of zero. Noticing that this zero initial control cannot stabilize the system (65), it is clear that no initial stabilizing control is necessary when implementing the proposed algorithm.
During the learning process, a damped decreasing probing noise is injected into the control input to satisfy the PE condition. This noise comprises sinusoids of diverse frequencies and is applied for the first 450 s. Figure 2 shows the trajectories of the critic weights, which eventually converge to $\hat{W}_c = [1.0420, 0.0856, 0.0603, 0.2174, 0.2948, 0.0358]^T$. Figure 3 depicts the trajectories of the system states during learning. From Figure 3, one can see that, without an initial stabilizing control, the system states stay at or near zero after the probing noise is removed, which indicates that $\hat{u}_c$ generated by the learning module effectively stabilizes the system. With the converged weights, the approximate $H_\infty$ optimal control $\hat{u}_c$ can be calculated by (46).
Next, we substitute $\hat{u}_c$ into (21) to obtain an available sliding surface. Subsequently, integrating with the enhanced observer system, the SMC law $u_d$ is implemented by using (23) and (24) with the reliable estimates of uncertainties and disturbances. Figure 4 depicts the estimates of the disturbances $d_1 = 0.5e^{-t}\sin(t)$ and $d_2 = 0.5\sin(t)$, along with the small estimation errors. Figure 5 presents the identification of the system states by the identifier NN. It can be observed that the identified states rapidly track the real states, illustrating the effectiveness and efficiency of the identifier NN. Note that the valid estimates $\hat{d}$ and $\hat{W}_o$ are used to design the SMC law $u_d$, which helps to reduce the sliding-mode gain and alleviate the chattering phenomenon. Figure 6 displays the state trajectories of the robot arm under the compound $H_\infty$ sliding-mode control $u = u_d + \hat{u}_c$. Figure 7 depicts the compound control $u$, while the $H_\infty$ control $\hat{u}_c$ and the SMC law $u_d$ are given in Figure 8. The results presented in Figure 6, Figure 7 and Figure 8 confirm that the compound control $u$ successfully renders the robot arm system stable and exhibits satisfactory performance against both system uncertainties and external disturbances.

5.2. Power Plant System

To further validate the effectiveness of the proposed scheme, we consider an electric power system comprising a gas turbine generator, a system load, and an automatic generation control [34]. To model this system, the incremental frequency deviation $\Delta f_G$, the generator output power variation $\Delta P_m$, and the valve position change of the governor $\Delta v$ are taken into consideration. The control input is represented by the speed-changer position deviation $\Delta P_c$. By defining the state vector $x = [\Delta v, \Delta P_m, \Delta f_G]^T \in \mathbb{R}^3$, we can express the reduced power system model in state-space form as
$$\dot{x} = \begin{bmatrix} -\dfrac{1}{T_g} & 0 & -\dfrac{1}{R_gT_g} \\ \dfrac{K_t}{T_t} & -\dfrac{1}{T_t} & 0 \\ 0 & \dfrac{K_p}{T_p} & -\dfrac{1}{T_p} \end{bmatrix}x + \begin{bmatrix}\dfrac{1}{T_g} \\ 0 \\ 0\end{bmatrix}(u + \vartheta) + d, \tag{66}$$
where $g(x) = [1/T_g, 0, 0]^T$, $\vartheta$ represents the modeling uncertainty, and $d$ stands for the exterior disturbances. Assume that the uncertain term is $\vartheta = x_2\sin(x_1)$ and the disturbance term is $d(t) = [\sin(2\pi t)e^{-t}, 0, 0.2\sin^2(t)e^{-t}]^T$ in the simulation. Let the regulation constant $R_g = 2.5$ Hz/MW, the turbine gain constant $K_t = 1$ and the generator gain constant $K_p = 120$ Hz/MW. Moreover, the corresponding time constants are set as $T_g = 0.08$ s, $T_t = 0.1$ s and $T_p = 20$ s, respectively.
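As a quick numerical check (our addition, not from the paper), the plant matrices of (66) can be formed from these constants, and the left pseudoinverse reproduces the projection quantities quoted below:

```python
import numpy as np

Tg, Tt, Tp, Rg, Kt, Kp = 0.08, 0.1, 20.0, 2.5, 1.0, 120.0
A_sys = np.array([[-1/Tg,   0.0,  -1/(Rg*Tg)],
                  [ Kt/Tt, -1/Tt,  0.0      ],
                  [ 0.0,    Kp/Tp, -1/Tp    ]])
print(np.linalg.eigvals(A_sys))      # open-loop modes of the reduced model
g_vec = np.array([[1/Tg], [0.0], [0.0]])
g_plus = np.linalg.inv(g_vec.T @ g_vec) @ g_vec.T
print(g_plus)                        # [[0.08, 0., 0.]]  -> G(x) = g^+(x)
print(np.eye(3) - g_vec @ g_plus)    # diag(0, 1, 1)     -> k(x)
```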
For estimating the unknown uncertainty and disturbance terms, the enhanced observer system is constructed as in (6) with a three-layered feedforward NN containing eight hidden neurons and the Hurwitz matrix $A = \mathrm{diag}(-12, -12, -12)$. The activation function, the initial weights, and the updating ratios are the same as in Section 5.1. Let $p(x) = [10x_1, 0, 10x_3]^T$, $l(x) = \mathrm{diag}(10, 0, 10)$, $G(x) = g^+(x) = [0.08, 0, 0]$ and $S_0(x) = 0.08x_1$. Similarly, the arctangent function $\mathrm{atan}(s/\epsilon)$ is used in place of the sign function $\mathrm{sgn}(s)$ when implementing the SMC law $u_d$.
Without the matched uncertainties and disturbances, we can derive the sliding-mode dynamics from (66), wherein $k(x) = \mathrm{diag}(0, 1, 1)$ and the initial state is $x_0 = [0.2, 0.2, 0.1]^T$. Let the associated cost function be of the form (32) with $Q = \mathrm{diag}(1, 1, 1)$, $R = 1$ and $\gamma = 3$. The critic NN is designed as in (44), and its parameters are $\alpha = 15$, $\beta = 0.5$, $\sigma_c(x) = [x_1^2, x_1x_2, x_1x_3, x_2^2, x_2x_3, x_3^2, x_1^2x_2x_3, x_1x_2^2x_3, x_1x_2x_3^2]^T$ and $\hat{W}_c = [\hat{W}_{c1}, \hat{W}_{c2}, \ldots, \hat{W}_{c9}]^T$. Similar to Section 5.1, $J_a(x) = x^Tx/2$, the initial weight vector is set to zero, and a similar probing noise is injected into the control input for the first 550 s. The evolving trajectories of the critic weights are shown in Figure 9, while the trajectories of the system states during learning are depicted in Figure 10. After 550 s, the critic weights converge to $\hat{W}_c = [0.0830, 0.1245, 0.2284, 0.1616, 0.4883, 0.5488, 0.1154, 0.0563, 0.0564]^T$; we can then derive $\hat{u}_c$ using (46) with the converged weights.
Then, we substitute $\hat{u}_c$ into the integral sliding surface (21) and design the SMC law $u_d$ by (23) and (24). Consequently, the compound control is constructed as $u = u_d + \hat{u}_c$. Figure 11 shows the trajectories of the power system states under this compound control over 15 s, and Figure 12 presents the compound control $u$. From Figure 11 and Figure 12, we can conclude that the compound control effectively stabilizes the system states to the equilibrium point, even in the presence of modeling uncertainties and exterior disturbances. These results clearly demonstrate the viability and efficiency of the proposed approach.

6. Conclusions

In this paper, we have developed a neural adaptive $H_\infty$ sliding-mode control scheme for uncertain nonlinear systems subject to external disturbances. Based on the enhanced observer system composed of the NN identifier and the nonlinear DO, an integral SMC is designed to suppress the influences of the uncertain term and the matched disturbance component, as well as unknown approximation errors, with no prior knowledge of their upper bounds. Meanwhile, on the sliding surface, the remaining unmatched disturbances are attenuated using the $H_\infty$ optimal control solved by the single critic network-based ADP algorithm. Furthermore, the uniform ultimate boundedness stability of the resultant closed-loop system is proven by Lyapunov's method. In addition to the theoretical analysis, two simulation examples are provided to further validate the proposed approach. Recently, the growing interest in saving communication resources and reducing the computational load of networked control systems has drawn increasing attention to the event-triggering mechanism, which is undergoing rapid development. Hence, how to combine the optimal SMC strategy with the event-triggering mechanism for more complex physical systems, not just control-affine ones, will be our future research topic.

Author Contributions

Y.H. contributed to the conception and design of this study, performed the experiment and the data analysis, and wrote the manuscript; Z.Z. contributed to the results discussion and the writing of the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Advanced Talents Incubation Program of Hebei University under Grant 521100221049, in part by Hebei Province Higher Education Science and Technology Research Project of China under Grant CXY2023009, in part by Hebei University Research and Innovation Team Project under Grant IT202306, and in part by Baoding Science and Technology Plan Project under Grant 2372P010.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors can confirm that all relevant data are included in the article.

Conflicts of Interest

The authors declare that they have no conflict of interest. All authors have approved the manuscript and agreed with submission to this journal.

References

  1. Ioannou, P.; Sun, J. Robust Adaptive Control; Prentice Hall: Upper Saddle River, NJ, USA, 1996. [Google Scholar]
  2. Utkin, V.; Guldner, J.; Shi, J. Sliding Mode Control in Electro-Mechanical Systems; CRC Press: Boca Raton, FL, USA, 2017. [Google Scholar]
  3. Yu, X.; Kaynak, O. Sliding-mode control with soft computing: A survey. IEEE Trans. Ind. Electron. 2009, 56, 3275–3285. [Google Scholar]
  4. Xu, J.; Guo, Z.; Tong, H. Design and implementation of integral sliding-mode control on an underactuated two-wheeled mobile robot. IEEE Trans. Ind. Electron. 2014, 61, 3671–3681. [Google Scholar] [CrossRef]
  5. Chen, L.; Edwards, C.; Alwi, H. Integral sliding mode fault-tolerant control allocation for a class of affine nonlinear system. Int. J. Robust Nonlinear 2019, 29, 565–582. [Google Scholar] [CrossRef]
  6. Pan, Y.; Yang, C.; Pan, L.; Yu, H. Integral sliding mode control: Performance, modification, and improvement. IEEE Trans. Ind. Inform. 2017, 14, 3087–3096. [Google Scholar] [CrossRef]
  7. Errouissi, R.; Ouhrouche, M.; Chen, W.; Trzynadlowski, A. Robust nonlinear predictive controller for permanent-magnet synchronous motors with an optimized cost function. IEEE Trans. Ind. Electron. 2012, 59, 2849–2858. [Google Scholar] [CrossRef]
  8. Huang, J.; Ri, S.; Fukuda, T.; Wang, Y. A disturbance observer based sliding mode control for a class of underactuated robotic system with mismatched uncertainties. IEEE Trans. Autom. Control 2019, 64, 2480–2487. [Google Scholar] [CrossRef]
  9. Cui, R.; Chen, L.; Yang, C.; Chen, M. Extended state observer-based integral sliding mode control for an underwater robot with unknown disturbances and uncertain nonlinearities. IEEE Trans. Ind. Electron. 2017, 64, 6785–6795. [Google Scholar] [CrossRef]
  10. Wang, Y.; Xie, X.; Chadli, M.; Xie, S.; Peng, Y. Sliding-mode control of fuzzy singularly perturbed descriptor systems. IEEE Trans. Fuzzy Syst. 2020, 29, 2349–2360. [Google Scholar] [CrossRef]
  11. Chen, M.; Chen, W. Sliding mode control for a class of uncertain nonlinear system based on disturbance observer. Int. J. Adapt. Control Signal Process 2010, 24, 51–64. [Google Scholar] [CrossRef]
  12. Rubagotti, M.; Estrada, A.; Castanos, F.; Ferrara, A. Integral sliding mode control for nonlinear systems with matched and unmatched perturbations. IEEE Trans. Autom. Control 2011, 56, 2699–2704. [Google Scholar] [CrossRef]
  13. Castanos, F.; Fridman, L. Analysis and design of integral sliding manifolds for systems with unmatched perturbations. IEEE Trans. Autom. Control 2006, 51, 853–858. [Google Scholar] [CrossRef]
  14. Kiumarsi, B.; Vamvoudakis, K.G.; Modares, H.; Lewis, F.L. Optimal and autonomous control using reinforcement learning: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2042–2062. [Google Scholar] [CrossRef] [PubMed]
  15. Liu, D.; Wei, Q.; Wang, D.; Yang, X.; Li, H. Adaptive Dynamic Programming with Applications in Optimal Control; Springer: Cham, Switzerland, 2017. [Google Scholar]
  16. Lewis, F.L.; Liu, D. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control; Wiley: Hoboken, NJ, USA, 2013. [Google Scholar]
  17. Ha, M.; Wang, D.; Liu, D. Discounted iterative adaptive critic designs with novel stability analysis for tracking control. IEEE/CAA J. Autom. Sin. 2022, 9, 1262–1272. [Google Scholar] [CrossRef]
  18. Wei, Q.; Lewis, F.L.; Liu, D.; Song, R.; Lin, H. Discrete-time local value iteration adaptive dynamic programming: Convergence analysis. IEEE Trans. Syst. Man Cybern. Syst. 2018, 48, 875–891. [Google Scholar] [CrossRef]
  19. Wei, Q.; Wang, L.; Lu, J.; Wang, F.Y. Discrete-Time Self-Learning Parallel Control. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 192–204. [Google Scholar] [CrossRef]
  20. Heydari, A.; Balakrishnan, S. Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 145–157. [Google Scholar] [CrossRef] [PubMed]
  21. Lu, J.; Wei, Q.; Wang, Z.; Zhou, T.; Wang, F. Event-triggered optimal control for discrete-time multi-player non-zero-sum games using parallel control. Inf. Sci. 2022, 584, 519–535. [Google Scholar] [CrossRef]
  22. Wang, D.; Ren, J.; Ha, M. Discounted linear Q-learning control with novel tracking cost and its stability. Inf. Sci. 2023, 626, 339–353. [Google Scholar] [CrossRef]
  23. Zhang, X.; Ni, Z.; He, H. A theoretical foundation of goal representation heuristic dynamic programming. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 2513–2525. [Google Scholar] [CrossRef]
  24. Huang, Y.; Wang, D.; Liu, D. Bounded robust control design for uncertain nonlinear systems using single-network adaptive dynamic programming. Neurocomputing 2017, 266, 128–140. [Google Scholar] [CrossRef]
  25. Yang, X.; Wei, Q. Adaptive critic designs for optimal event-driven control of a CSTR system. IEEE Trans. Ind. Inform. 2021, 17, 484–493. [Google Scholar] [CrossRef]
  26. Yang, X.; He, H.; Zhong, X. Approximate dynamic programming for nonlinear-constrained optimizations. IEEE Trans. Cybern. 2021, 51, 2419–2432. [Google Scholar] [CrossRef] [PubMed]
  27. Wen, G.; Niu, B. Optimized tracking control based on reinforcement learning for a class of high-order unknown nonlinear dynamic systems. Inf. Sci. 2022, 606, 368–379. [Google Scholar] [CrossRef]
  28. Wang, D.; Qiao, J.; Cheng, L. An approximate neuro-optimal solution of discounted guaranteed cost control design. IEEE Trans. Cybern. 2022, 52, 77–86. [Google Scholar] [CrossRef] [PubMed]
  29. Liu, D.; Xue, S.; Zhao, B.; Luo, B.; Wei, Q. Adaptive dynamic programming for control: A survey and recent advances. IEEE Trans. Syst. Man Cybern. Syst. 2020, 51, 142–160. [Google Scholar] [CrossRef]
  30. Wang, D.; Ha, M.; Zhao, M. The intelligent critic framework for advanced optimal control. Artif. Intell. Rev. 2022, 55, 1–22. [Google Scholar] [CrossRef]
  31. Modares, H.; Lewis, F.L. Optimal tracking control of nonlinear partially-unknown constrained input systems using integral reinforcement learning. Automatica 2014, 50, 1780–1792. [Google Scholar] [CrossRef]
  32. Luo, B.; Wu, H.; Huang, T. Off-policy reinforcement learning for H∞ control design. IEEE Trans. Cybern. 2014, 45, 65–76. [Google Scholar] [CrossRef]
  33. Modares, H.; Lewis, F.L.; Jiang, Z. H∞ tracking control of completely unknown continuous-time systems via off-policy reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2550–2562. [Google Scholar] [CrossRef]
  34. Wang, D.; He, H.; Liu, D. Adaptive critic nonlinear robust control: A survey. IEEE Trans. Cybern. 2017, 47, 3429–3451. [Google Scholar] [CrossRef]
  35. Mitra, A.; Behera, L. Continuous-time single network adaptive critic based optimal sliding mode control for nonlinear control affine systems. In Proceedings of the 34th Chinese Control Conference, HangZhou, China, 28–30 July 2015; pp. 3300–3306. [Google Scholar]
  36. Fan, Q.; Yang, G. Adaptive actor-critic design-based integral sliding-mode control for partially unknown nonlinear systems with input disturbances. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 165–177. [Google Scholar] [CrossRef] [PubMed]
  37. Qu, Q.; Zhang, H.; Yu, R.; Liu, Y. Neural network-based H∞ sliding mode control for nonlinear systems with actuator faults and unmatched disturbances. Neurocomputing 2018, 275, 2009–2018. [Google Scholar] [CrossRef]
  38. Zhang, H.; Qu, Q.; Xiao, G.; Cui, Y. Optimal guaranteed cost sliding mode control for constrained-input nonlinear systems with matched and unmatched disturbances. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2112–2126. [Google Scholar] [CrossRef] [PubMed]
  39. Yang, D.; Li, T.; Xie, X.; Zhang, H. Event-triggered integral sliding-mode control for nonlinear constrained-input systems with disturbances via adaptive dynamic programming. IEEE Trans. Syst. Man Cybern. Syst. 2019, 50, 4086–4096. [Google Scholar] [CrossRef]
Figure 1. The schematic of the adaptive H∞ SMC scheme.
Figure 2. Trajectories of the critic NN weights.
Figure 3. Trajectories of system states in the learning.
Figure 4. (a) Real disturbance $d_1$ and its estimation $\hat{d}_1$; (b) real disturbance $d_2$ and its estimation $\hat{d}_2$.
Figure 5. (a) Real state $x_1$ and identified state $\hat{x}_1$; (b) real state $x_2$ and identified state $\hat{x}_2$.
Figure 6. State trajectories of the robotic arm.
Figure 7. The compound control u.
Figure 8. (a) The H∞ optimal control $\hat{u}_c$; (b) the SMC law $u_d$.
Figure 9. Trajectories of the critic NN weights.
Figure 10. Trajectories of system states in the learning.
Figure 11. Trajectories of the electric power system.
Figure 12. The compound optimal control.