Neural Adaptive H∞ Sliding-Mode Control for Uncertain Nonlinear Systems with Disturbances Using Adaptive Dynamic Programming

This paper presents a neural adaptive H∞ sliding-mode control scheme for a class of uncertain nonlinear systems subject to external disturbances, with the aid of adaptive dynamic programming (ADP). First, by combining the neural network (NN) approximation method with a nonlinear disturbance observer, an enhanced observer framework is developed to estimate the system uncertainties and observe the external disturbances simultaneously. Then, based on the reliable estimates provided by the enhanced observer, an adaptive sliding-mode controller is designed that effectively counteracts the effects of the system uncertainties and the separated matched disturbances, even without prior knowledge of their upper bounds, while the remaining unmatched disturbances are attenuated through an H∞ performance criterion on the sliding surface. Moreover, a single critic network-based ADP algorithm is employed to learn the cost function related to the Hamilton–Jacobi–Isaacs equation, from which the H∞ optimal control is obtained. An update law for the critic NN is proposed not only to reach the Nash equilibrium, but also to stabilize the sliding-mode dynamics without requiring an initial stabilizing control. In addition, we analyze the uniform ultimate boundedness of the resultant closed-loop system via Lyapunov's method. Finally, the effectiveness of the proposed scheme is verified through simulations of a single-link robot arm and a power system.


Introduction
Within the last few decades, multifarious robust control design theories and methods have been proposed for uncertain nonlinear systems [1]. As one of the most efficient and widely used control methods, sliding-mode control (SMC) has garnered significant attention owing to its simplicity, order-reduction property and inherent robustness against matched uncertainties [2]. The classical SMC approach exerts a discontinuous control to drive the system states onto a prescribed sliding manifold or surface [3]. Once the sliding surface is reached, the system becomes immune to matched uncertainties and input disturbances. To remove the reaching phase, integral SMC was developed by using an integral sliding manifold containing an integral term, which enables the system states to reach and remain on the sliding manifold from the very beginning [4][5][6]. Although for a wide variety of practical systems the relevant uncertainties and disturbances can be assumed matched in the control design, many physical systems, such as permanent magnet synchronous motors [7], underactuated aerial vehicles and robotic systems [8], are directly affected by unmatched disturbances. Lately, several new approaches based on integral SMC have been proposed to stabilize various systems with unmatched disturbances [9][10][11][12][13]. Among these methods, it is worth noticing that in [12,13], by choosing a suitable projection matrix in the sliding manifold, the impact of the separated unmatched disturbances is not amplified, and these disturbances are attenuated by combining integral SMC with H∞ control theory. This provides a feasible and effective way to handle unmatched disturbances and helps explore the relationship between integral SMC and H∞ control in nonlinear control design.
In many instances, we expect the control policy not just to stabilize the closed-loop system, but also to possess certain optimality by minimizing a user-defined cost. For nonlinear systems, solving the associated optimal control problem requires solving the Hamilton–Jacobi–Bellman (HJB) equation. For H∞ optimal control, based on dissipativity theory, the problem can be formulated as an L2-gain control problem, which involves solving the Hamilton–Jacobi–Isaacs (HJI) equation [14]. However, analytical solutions of both the HJB and HJI equations are very hard or even impossible to obtain directly because of their inherent nonlinearities [15]. In recent years, a class of neural network (NN) and reinforcement learning (RL)-based intelligent optimization and control methods, referred to as adaptive dynamic programming (ADP), has attracted growing attention; it shows great application potential in solving various optimization problems and effectively conquers the "curse of dimensionality" [15,16]. By now, many researchers have employed ADP to tackle a variety of optimal control problems for both discrete-time (DT) [17][18][19][20][21][22] and continuous-time (CT) systems [23][24][25][26][27][28]. Moreover, how to combine ADP with other robust methods to achieve better performance and stronger robustness for uncertain nonlinear systems is becoming a new research focus [29,30].
Recently, Modares et al. [31] proposed an online integral RL algorithm that incorporates a non-quadratic discounted cost function to address the constrained-input optimal tracking problem. Luo et al. [32] described an NN-based off-policy learning algorithm within the actor-critic framework to deal with the associated HJI equation, and this algorithm was later extended to find the near-optimal H∞ tracking control solution in [33].
Nevertheless, the influences of potential system or modeling uncertainties were not taken into account in these designs. Wang et al. [34] introduced a robust neuro-optimal control approach for input-affine nonlinear systems with both matched and state-dependent uncertainties, achieved by redesigning the cost function and selecting a suitable feedback gain; however, the upper bound function of the uncertainties is needed to redesign the cost function that suppresses them. Mitra et al. [35] presented an optimal SMC scheme for single-input cascade nonlinear systems with matched bounded disturbances. Fan et al. [36] investigated an adaptive actor-critic-based integral SMC strategy for CT nonlinear systems with unknown terms and input disturbances, where the initial stabilizing control required for learning is quite stringent and limiting in practical applications. Qu et al. [37] developed an adaptive H∞ optimal SMC method in the presence of actuator faults and unmatched disturbances using the ADP algorithm, and further explored the optimal guaranteed cost SMC for constrained-input uncertain systems by formulating an auxiliary system and redefining the utility function [38]. Building on [37] and combining it with event-triggered mechanisms, Yang et al. [39] provided an event-triggered integral SMC design for nonlinear control-affine systems by leveraging the ADP technique. Note that the methods mentioned above rely on the availability of upper bounds for matched or unmatched disturbances, which may cause over-design and thus lead to an over-conservative control scheme. Additionally, in real-world scenarios, determining precise upper bounds of external disturbances is often challenging.
Inspired by the works mentioned above, we propose an adaptive neural H∞ SMC scheme for uncertain nonlinear systems subject to external disturbances using the ADP algorithm. Based on the enhanced observer system composed of the NN identifier and the nonlinear disturbance observer (DO), an integral SMC is developed to counteract the impacts of the system uncertainties and the separated matched disturbances, as well as unknown approximation errors, without requiring prior knowledge of their upper bounds. On the sliding manifold, the remaining unmatched disturbances are attenuated by the H∞ optimal control solved with the single-network ADP algorithm. Moreover, the uniform ultimate boundedness of the resultant closed-loop system is guaranteed via the Lyapunov approach.
The principal contributions of this study can be summarized as follows. First, unlike existing schemes [34][35][36][37][38][39], the proposed approach builds on the enhanced observer system to make the designed sliding-mode controller independent of the upper bounds of uncertainties and disturbances, which renders the implementation easier and more practical and removes the assumption that these bounds must be known in advance. Second, compared with the algorithms presented in [36,37], our approach can deal with both unknown nonlinear terms and unmatched external disturbances, where the single-network ADP is utilized to approximate the H∞ optimal control. Unlike typical actor-critic-disturbance network architectures, the single critic network structure brings a simpler implementation and a lower computational burden, and avoids the approximation errors arising from the actor and disturbance networks. Third, we introduce an update law for the critic NN, which not only achieves the Nash equilibrium, but also ensures the stability of the sliding-mode dynamics without the need for an initial stabilizing control during learning.
The remainder of this paper is arranged as follows. Section 2 outlines the problem formulation and provides some necessary preliminaries. Section 3 describes the design of an integral SMC based on the enhanced observer system. Section 4 presents the application of the single-network ADP to obtain the H∞ optimal control for the sliding-mode dynamics, along with the stability analysis. Simulations of the robot arm and a power system are given in Section 5, followed by a summary of this study in Section 6.

Problem Formulation
Consider the following uncertain perturbed nonlinear system, where the state vector x ∈ R^n is measurable, u ∈ R^m is the control input, f(x) ∈ R^n and g(x) ∈ R^{n×m} are the known system drift and input dynamics, respectively; ∆f(x) and ∆g(x) denote uncertain nonlinear terms that stem from either the inherent characteristics of the system or modeling uncertainties, while d ∈ R^n represents the unknown external disturbances. Moreover, it is assumed that the system uncertainties ∆f(x) and ∆g(x) satisfy the matched condition, i.e., ∆f(x) + ∆g(x)u = g(x)w(x, u); then system (1) can be rewritten in the form ẋ = f(x) + g(x)u + g(x)w(x, u) + d, with w(x, u) being the bounded lumped uncertain term. Let Ω ⊆ R^n be a compact set, and suppose that f(x) + g(x)u is Lipschitz continuous over Ω with f(0) = 0. To avoid any confusion, ‖·‖ denotes the 2-norm of a vector or the Frobenius norm of a matrix hereafter, unless otherwise specified. Assumption 1. The input matrix g(x) has full column rank and is norm bounded, that is, ‖g(x)‖ ≤ g_M for any x. Moreover, the resulting left pseudoinverse g⁺(x) ∈ R^{m×n}, given by g⁺(x) = (gᵀ(x)g(x))⁻¹gᵀ(x), is bounded by ‖g⁺(x)‖ ≤ b_M, where b_M and g_M are known positive constants.
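The matched-condition rewrite can be checked numerically. The sketch below, with illustrative stand-ins for the paper's functions, recovers the lumped term w(x, u) from uncertainties that lie in the range of g(x):

```python
import numpy as np

def lumped_uncertainty(g, delta_f, delta_g, u):
    """Recover the lumped term w(x, u) under the matched condition.

    If delta_f + delta_g @ u lies in range(g), then w = g^+ (delta_f + delta_g u)
    satisfies g @ w = delta_f + delta_g @ u, as in the rewrite of system (1).
    """
    residual = delta_f + delta_g @ u
    w = np.linalg.pinv(g) @ residual   # g^+ = (g^T g)^{-1} g^T for full column rank
    assert np.allclose(g @ w, residual), "matched condition violated"
    return w

# toy second-order example: uncertainty enters through the input channel only
g = np.array([[0.0], [1.0]])
w = lumped_uncertainty(g, np.array([0.0, 0.2]), np.array([[0.0], [0.1]]), np.array([2.0]))
```

If the uncertainty had a component outside range(g), the internal check would fail, signalling that the matched condition does not hold.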
Based on Assumption 1, d is decomposed into matched and unmatched components through the projection of d onto the input matrix g(x), where I denotes an identity matrix of appropriate dimensions and g⁺(x) is the left pseudoinverse of g(x). It should be noted that Assumption 1 is somewhat restrictive, which may narrow the applicability of the proposed approach to some extent. However, many real-world physical systems, such as satellite dynamics, hypersonic flight vehicles and overhead crane systems, possess this property, making the assumption valid [15,20].
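The projection-based decomposition of (3) can be sketched as follows; the matrices are toy numeric examples, not the paper's systems:

```python
import numpy as np

def decompose_disturbance(g, d):
    """Split d into matched/unmatched parts by projecting onto range(g), as in (3).

    Returns (d_m, d_u) with d = d_m + d_u, where d_m = g g^+ d and
    d_u = (I - g g^+) d.
    """
    g_plus = np.linalg.inv(g.T @ g) @ g.T   # left pseudoinverse of Assumption 1
    proj = g @ g_plus                       # projector onto the column space of g
    d_m = proj @ d                          # matched part, cancellable through u
    d_u = (np.eye(len(d)) - proj) @ d       # unmatched part, left for H-infinity design
    return d_m, d_u

g = np.array([[0.0], [1.0]])
d_m, d_u = decompose_disturbance(g, np.array([0.3, -0.7]))
```

Since g g⁺ is idempotent, applying the decomposition twice changes nothing, which is why the matched part can be cancelled exactly through the input channel.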
To deal with the uncertain nonlinear system (1) with external disturbances, an enhanced observer system is first constructed to estimate the uncertain terms and observe the unknown disturbances simultaneously. Then, based on these reliable estimates, an integral SMC is developed to counteract the impacts of the system uncertainties and the separated matched disturbances, as well as unknown approximation errors, without requiring prior knowledge of their upper bounds. Meanwhile, the remaining unmatched disturbances are attenuated by the H∞ optimal control on the sliding surface. Moreover, the single-network ADP algorithm is employed to learn the cost function related to the Hamilton–Jacobi–Isaacs equation, from which the H∞ optimal control is obtained. Furthermore, a weight updating law is formulated to ensure both the achievement of the Nash equilibrium and the stabilization of the sliding-mode dynamics during the learning process.

Integral SMC Design Based on the Enhanced Observer System
Recalling the NN universal approximation property, the uncertain term w(x, u) can be represented by a three-layered NN, where W_o ∈ R^{l_o×m} and V_o ∈ R^{(n+m)×l_o} denote the unknown ideal weight matrices between the output and hidden layers, and the hidden and input layers, respectively; σ(·) represents the activation function with l_o hidden-layer neurons, and ε_o(x) ∈ R^m stands for the NN reconstruction error. To simplify the learning process, only the weights W_o are adapted online, while V_o is initialized with random values and then remains unchanged during the weight updating process [16].
The NN identifier is designed accordingly, where A is a Hurwitz matrix, x̂ is the identifier state, Ŵ_o is the estimate of W_o, and the activation function is σ(z) = σ(V_oᵀx) with z = V_oᵀx. Since the unknown disturbance term d is needed in (5), inspired by [11], a nonlinear DO is introduced to obtain d̂, the estimated value of d.
Then, combining the NN identifier with the nonlinear DO, an enhanced observer system is constructed with d̂ = d₀ + p(x), where d₀ is an auxiliary variable and p(x) is a designed state-dependent function that induces the gain function l(x) = (∂p(x)/∂x)ᵀ. Following (6), we obtain the relation below, where W̃_o = W_o − Ŵ_o represents the NN weight estimation error. Let x̃ = x − x̂ and d̃ = d − d̂ be the state and disturbance estimation errors, respectively. Subtracting (5) from (2) and combining with (7), we obtain the coupled error dynamics of (6) as follows. Before proceeding, we introduce a common assumption for stability analysis [15,16].
Assumption 2. For the identifier NN, there are known positive constants σ_M, ε_M and W_M such that ‖σ(z)‖ ≤ σ_M, ‖ε_o(x)‖ ≤ ε_M and ‖W_o‖ ≤ W_M, respectively. Lemma 1. Considering the system (2) and the coupled error dynamics (8), let the identifier NN weight Ŵ_o be updated by (9), where η₁ and η₂ are positive updating ratios. Moreover, select the parameter matrices A, P and the gain function l(x) to satisfy (10) with ρ > 0. Then all the estimation errors x̃, d̃ and W̃_o are uniformly ultimately bounded (UUB).
Proof. Consider the Lyapunov function candidate given below, where P = Pᵀ is positive definite and, together with some matrix Λ > 0, satisfies AᵀP + PA = −Λ for the Hurwitz matrix A. By taking the time derivative of L₁₁ and substituting the coupled error dynamics (8), we can obtain the expression below. Based on Assumption 2, together with Young's inequality, it follows that (13) holds. Considering (10), (13) is rewritten as (14), where τ = λ_min(Λ) − 1 > 0 is ensured by properly selecting the positive definite matrix Λ and its minimum eigenvalue λ_min(Λ).
Combining with (9), the derivative of L₁₂ is obtained. Applying standard trace inequalities, combining (14) and (16), and taking norms, one can derive an upper bound for the derivative of L₁(t). Select η₂ ≥ 2σ²_M and complete the square with respect to W̃_o; then (17) becomes the bound below, with the threshold defined accordingly. Therefore, we can conclude that the derivative of L₁ is negative whenever the error term satisfies the stated condition. Furthermore, according to the Lyapunov extension theorem [16], when inequality (10) holds for properly selected matrices, we can infer that all the estimation errors x̃, d̃ and W̃_o are UUB.
Remark 1. The gain function matrix l(x) is an important design parameter that can be chosen as a linear or nonlinear function. When the form of the system function g(x) is simple, it is easy to find an l(x) that satisfies inequality (10) by substituting candidate functions into (10). However, if the form of g(x) is complex, a trial-and-error method is employed to select an appropriate l(x) that meets inequality (10). Although there is no universal procedure for designing l(x), experience has shown that it is not difficult to find a suitable l(x) for specific applications [36,37].
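A minimal numeric sketch of the enhanced observer is given below. The identifier/DO structure mirrors (5)–(6) in spirit, with d̂ = d₀ + p(x) and l(x) = (∂p/∂x)ᵀ; the weight adaptation shown is a plain gradient-style rule standing in for the paper's law (9), whose exact form is not reproduced here:

```python
import numpy as np

def observer_step(xh, d0, Wo, x, u, dt, f, g, Vo, p, l, A, eta):
    """One Euler step of an enhanced-observer sketch (NN identifier + nonlinear DO)."""
    sig = np.tanh(Vo.T @ x)                  # hidden-layer activations sigma(z)
    w_hat = Wo.T @ sig                       # NN estimate of the lumped uncertainty
    d_hat = d0 + p(x)                        # DO output
    drift = f(x) + g(x) @ (u + w_hat)        # known dynamics plus estimated uncertainty
    xh_dot = A @ (xh - x) + drift + d_hat    # Hurwitz A drives xh toward x
    d0_dot = -l(x) @ (d_hat + drift)         # auxiliary DO state
    Wo_dot = eta * np.outer(sig, g(x).T @ (x - xh))   # assumed adaptation rule
    return xh + dt * xh_dot, d0 + dt * d0_dot, Wo + dt * Wo_dot
```

For a scalar test plant ẋ = −x + u + d with a constant disturbance d = 0.5 and no uncertainty, the disturbance estimate d̂ = d₀ + p(x) converges to 0.5 at a rate set by the DO gain, while the identifier state tracks x.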
To effectively handle both system uncertainties and external disturbances, we propose a compound H∞ optimal SMC scheme that combines integral SMC with H∞ control theory. This compound controller is formulated as in (20), where u_d represents the discontinuous control designed to steer the system trajectories towards and maintain them on the sliding surface, thereby eliminating the effects of matched uncertainties and disturbances, and u_c denotes the continuous control derived to guarantee system stability and achieve near-optimal performance under the remaining unmatched disturbances on the sliding surface. Accordingly, we define the integral sliding surface as in (21), where x₀ denotes the initial state, S₀(x) ∈ R^m and G(x) = ∂S₀(x)/∂x ∈ R^{m×n}. Moreover, it follows from Assumption 1 that a suitable matrix G(x) can be found such that the product G(x)g(x) is invertible.
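An integral sliding variable of this kind can be evaluated online as sketched below. The specific form used, s = S₀(x) − S₀(x₀) − ∫ G(x)(f(x) + g(x)u_c) dτ, is an assumed standard construction (the paper's exact (21) is not reproduced); its key property, s(x₀) = 0 at t = 0, removes the reaching phase:

```python
import numpy as np

class IntegralSlidingSurface:
    """Running evaluation of a typical integral sliding variable:
    s = S0(x) - S0(x0) - integral of G(x)(f(x) + g(x) u_c)."""
    def __init__(self, S0, G, x0):
        self.S0, self.G = S0, G
        self.offset = S0(x0)      # subtract the initial value so s starts at zero
        self.integral = 0.0
    def step(self, x, f, g, u_c, dt):
        # accumulate the nominal-dynamics integral, then evaluate s
        self.integral += dt * (self.G(x) @ (f(x) + g(x) @ u_c))
        return self.S0(x) - self.offset - self.integral
```

On a nominal trajectory ẋ = f + g u_c the variable stays near zero (up to discretization error); disturbances and uncertainties show up as a nonzero s for the discontinuous control to act on.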
Taking the time derivative of s(x) gives (22). By incorporating the valid estimates d̂ and Ŵ_o, u_d is devised as in (23), where µ > 0, sgn(s) ∈ R^m is the sign function, and the adaptive term ζ with κ > 0 is designed to tackle the unknown bounds of the approximation errors arising from the estimated terms d̂ and Ŵ_o.
Considering the specific implementation of d̂ and Ŵ_o in (23), we define the adaptive law (24). Theorem 1. Considering system (2) with the sliding surface (21), let the discontinuous control u_d be devised by (23) with the adaptive law (24); then the convergence of the sliding variable s to zero is guaranteed from the initial time.

Proof. Choose the positive definite Lyapunov function candidate as
Along the trajectories of system (2), the derivative of L_s(t) is obtained, and substituting (23) and (24) into (25), we have (26). Thus, it is shown from (26) that L̇_s ≤ −µ‖s‖₁ < 0 for any s ≠ 0, where ‖·‖₁ denotes the vector 1-norm. This means that the asymptotic stability and convergence of the sliding-mode motion s(x) = 0 can be guaranteed. Moreover, according to (21), the sliding surface satisfies s(x₀) = 0 at t = 0, which implies that the system states start on the sliding surface, thus avoiding the need for a separate reaching phase.
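The per-component reaching behaviour implied by L̇_s ≤ −µ‖s‖₁ can be illustrated with a one-line integrator (illustrative only; the discretized sign function chatters in a band of width µ·dt around zero):

```python
def reaching_demo(s0=1.0, mu=2.0, dt=1e-3, T=1.0):
    """Integrate s_dot = -mu*sgn(s): s reaches a band of width mu*dt
    around zero within |s0|/mu seconds, then chatters inside it."""
    s, t, reach_t = s0, 0.0, None
    for _ in range(round(T / dt)):
        s -= dt * mu * ((s > 0) - (s < 0))   # discrete sign-function step
        t += dt
        if reach_t is None and abs(s) <= mu * dt:
            reach_t = t                       # first entry into the band
    return s, reach_t
```

With s0 = 1 and µ = 2 the band is reached at t ≈ 0.5 s = |s0|/µ, matching the finite-time bound the 1-norm Lyapunov argument predicts.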
From Theorem 1, it is clear that the stable sliding motion s(x) = 0 exists from the initial time; that is, s(x) = 0 and ṡ(x) = 0 for all t ≥ 0. Moreover, the equivalent control method is utilized to obtain the sliding-mode dynamics. Combining ṡ(x) = 0 with (3) and (22), the equivalent control u_deq can be derived. Then, substituting u_deq into (2), the sliding-mode dynamics without the matched uncertain term and disturbance component is obtained as (28), where d_u = (I − g(x)g⁺(x))d is the unmatched component of the external disturbance in (3). In order to reduce the influence of the multiplier matrix Γ(x) and minimize the unmatched disturbance Γ(x)d_u, an optimal projection matrix G*(x) within Γ(x) is provided in the following lemma.
Lemma 2. Considering nonlinear system (2) with Assumption 1, the optimal projection matrix G * (x) is selected as G * (x) = g + (x), which not only minimizes the norm Γ(x)d u , but also makes the relation Γ(x)d u = d u hold.
Proof. The proof follows from Theorem 1 in [12].
As a result, with the relation Γ(x)d_u = d_u, we can express (28) as (29), which means that the discontinuous control u_d in (23) fully counteracts the impacts of the matched uncertainties and disturbances. Notice that in (20), u_c aims not only to suppress the remaining unmatched disturbances on the sliding surface, but also to achieve near-optimal performance for the sliding-mode dynamics (29). This formulation can be seen as a nonlinear H∞ optimal control problem, which is known to be challenging to solve directly. In the following, we demonstrate how to find an approximate H∞ optimal control solution by using the single-network ADP algorithm.

H ∞ Control Design for Sliding-Mode Dynamics
Considering (3) and (29), the sliding-mode dynamics is represented as (30) with k(x) = I − g(x)g⁺(x). Since g(x) and g⁺(x) are bounded, the function k(x) is also bounded, i.e., ‖k(x)‖ ≤ k_M with k_M > 0.
For attenuating the remaining unmatched disturbances k(x)d, the corresponding H∞ control problem of the sliding-mode dynamics is established, which aims to seek a feedback control u_c that stabilizes the system and achieves an L₂-gain no larger than γ, as in (31), where Q and R are positive definite matrices of appropriate dimensions and γ > 0 is the level of disturbance attenuation. Based on [32,33], by treating the disturbance d as another system input, we can reframe the H∞ optimal control problem for system (30) as a two-player zero-sum game with the infinite-horizon cost function (32). Assuming that V(x) ∈ C¹, the Hamiltonian associated with an admissible control pair (u_c, d) is defined as (33) with ∇V = ∂V(x)/∂x. From Bellman's optimality principle, it follows that the optimal cost function V*(x) satisfies the HJI equation with ∇V* = ∂V*(x)/∂x. Moreover, according to zero-sum game theory [16], the Nash condition ensures the existence of a saddle point (u*_c, d*) of the HJI Equation (34). Then, applying the stationarity conditions, one can derive the optimal control u*_c and the worst-case disturbance d* as (36) and (37). Substituting (36) and (37) into (33), the HJI equation associated with ∇V* becomes (38). Due to the highly nonlinear nature of the HJI equation, obtaining its analytical solution is extremely difficult, if not impossible. To overcome this challenge, we propose an online optimal algorithm that learns the solution of the HJI equation and achieves H∞ optimal control. This is accomplished through single-network ADP, where only one critic network, implemented by an NN, is adopted to approximate the cost function V* related to (38). Therefore, using the critic NN with l_c neurons, V* is represented over the set Ω as (39), with the unknown ideal weight vector W_c ∈ R^{l_c}, the vector of activation functions σ_c(x) ∈ R^{l_c} and the reconstruction error ε_c(x). Meanwhile, we have the gradient
vector, with ∇σ_c = ∂σ_c(x)/∂x and ∇ε_c = ∂ε_c(x)/∂x.
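Given a value-function gradient, the stationarity solutions of the zero-sum Hamiltonian take the standard forms u* = −(1/2)R⁻¹gᵀ∇V and d* = (1/(2γ²))kᵀ∇V, matching (36)–(37). A small numeric sketch (toy matrices, not the paper's systems):

```python
import numpy as np

def saddle_policies(grad_V, g, k, R, gamma):
    """Stationary-point policies of the zero-sum Hamiltonian (33):
    dH/du_c = 0 gives u* = -(1/2) R^{-1} g^T grad_V,
    dH/dd   = 0 gives d* = (1/(2 gamma^2)) k^T grad_V."""
    u_star = -0.5 * np.linalg.solve(R, g.T @ grad_V)
    d_star = (k.T @ grad_V) / (2.0 * gamma ** 2)
    return u_star, d_star

grad_V = np.array([1.0, 2.0])
g = np.array([[1.0], [0.0]]); kx = np.array([[0.0], [1.0]])
R = np.array([[2.0]]); gamma = 1.5
us, ds = saddle_policies(grad_V, g, kx, R, gamma)
```

One can confirm numerically that the pair is a saddle point: perturbing u away from u* increases the Hamiltonian, while perturbing d away from d* decreases it.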
By combining (36), (37) and (40), it is easy to obtain (41) and (42). Substituting (41) and (42) into (33), the HJI equation becomes (43). Because W_c in (39) is unknown, the critic NN with estimated weights approximates the cost function in the form of (45), where Ŵ_c denotes the estimate of W_c; the corresponding gradient follows directly. Using (36), (37) and (45), the approximate forms of (41) and (42) are derived as (46) and (47). Then, incorporating (46) and (47) into (43), we have the approximate Hamiltonian (48). Subtracting (43) from (48), the corresponding Hamiltonian error e_c is obtained. To effectively approximate the cost function, one needs to adjust the critic NN weights Ŵ_c so as to minimize the Hamiltonian error e_c. To this end, it is common practice to train the critic NN by minimizing the squared residual error E_c = eᵀ_c e_c/2. However, traditional critic weight updating laws based on the gradient descent method can only minimize the squared error; they cannot provide any guarantee for the stability of the resulting system during the learning phase.
However, in practice, stability is a fundamental requirement of the system and a prerequisite for achieving higher performance. Thus, not only to minimize the residual error, but also to guarantee system stability and eliminate the need for an initial stabilizing control, a weight updating law is developed for the critic NN as in (49), where α and β are positive updating ratios, φ = ∇σ_c(f(x) − D(∇σ_c)ᵀŴ_c/2), φ₁ = φ/(φᵀφ + 1), φ_s = φᵀφ + 1, F₁ and F₂ represent design parameter matrices of suitable dimensions, J_a(x) is the Lyapunov function candidate provided in Assumption 4, and the index operator Σ(x, û_c, d̂_w) is given by (50) with ∇J_a = ∂J_a(x)/∂x.
Remark 2. Note that in (49), the first term is designed via the normalized gradient descent method to minimize the residual error. The second term has a well-designed form for ensuring system stability, derived from the Lyapunov stability analysis. The last term is an additional adjustment whose activation depends on the index operator Σ(x, û_c, d̂_w), which is selected based on the derivative of J_a(x) along the sliding-mode dynamics (30), namely, J̇_a(x) = (∇J_a)ᵀ(f(x) + g(x)û_c + k(x)d̂_w). Once the system dynamics tend to become unstable, J̇_a(x) ≥ 0 results, so Σ(x, û_c, d̂_w) = 1 and the last term in (49) is activated. Moreover, based on the negative gradient direction of J̇_a(x), i.e., −∂[(∇J_a)ᵀ(f(x) − D∇σᵀ_c Ŵ_c/2)]/∂Ŵ_c, the last term is designed to reinforce the training of the critic NN until the system dynamics become stable. This also eliminates the need for an initial stabilizing control, in contrast with [35][36][37][39], where a stabilizing control is required for initialization; in practical applications, however, finding an initial stabilizing control is quite challenging.
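The residual-minimizing (first) term of (49) can be sketched as below. Here D = gR⁻¹gᵀ − kkᵀ/γ² is an assumed shorthand combining the control and disturbance channels, and the Σ-gated stabilizing terms of (49) are deliberately omitted, so this covers only the normalized gradient-descent part:

```python
import numpy as np

def critic_step(Wc, x, dt, grad_sig, f, g, k, Q, R, gamma, alpha):
    """One normalized-gradient critic update on the HJI residual."""
    gs = grad_sig(x)                                   # (l_c x n) Jacobian of sigma_c
    D = g(x) @ np.linalg.solve(R, g(x).T) - k(x) @ k(x).T / gamma ** 2
    phi = gs @ (f(x) - D @ (gs.T @ Wc) / 2.0)          # phi as defined for (49)
    # approximate HJI residual e_c for the current weights
    e_c = Wc @ phi + x @ Q @ x + (Wc @ gs) @ D @ (gs.T @ Wc) / 4.0
    Wc_dot = -alpha * phi * e_c / (phi @ phi + 1.0) ** 2   # normalized descent
    return Wc + dt * Wc_dot
```

For the scalar example ẋ = −x + u with no disturbance channel, Q = R = 1 and the single basis function σ_c = x², the residual reduces to x²(1 − 2W − W²), so the weight should settle at √2 − 1 ≈ 0.414, which the iteration reproduces.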

Remark 3.
Based on [14][15][16], it is necessary to satisfy the persistence of excitation (PE) requirement when updating the weights of the critic NN, which enhances its ability to explore the state space and is indispensable for the weights to converge to their desired values. To fulfill the PE requirement, a probing noise is injected into the control input [15], which may cause instability during online learning. As a result, it is important to design the last term in (49) to stabilize the resulting system, especially when the probing signal is injected.
The schematic structure of the proposed H∞ SMC scheme is illustrated in Figure 1. As shown in Figure 1, this structure consists of two main modules: the H∞ optimal learning module and the enhanced observer module. It should be noted that, based on the deduced sliding-mode dynamics, the learning module can operate independently. However, the original system and the observer module rely on the compound control input u, which includes the approximate H∞ optimal control û_c obtained from the learning module. Consequently, it is necessary to first run the learning module to obtain the approximate optimal control û_c during the implementation process.
Considering (43), together with W̃_c = W_c − Ŵ_c, (48) is represented as (51). By means of the relation that the derivative of W̃_c equals the negative derivative of Ŵ_c, and incorporating (51) into (49), we obtain (52). Next, the main stability theorem is presented; before that, one basic assumption for the critic NN is introduced [16], and another assumption for the sliding-mode dynamics, previously used in [34,38], is also needed. Assumption 3. For the critic NN, there exist known positive constants σ_cM, σ_dM, ε_cM, ε_dM and W_cM such that ‖σ_c(x)‖ ≤ σ_cM, ‖∇σ_c‖ ≤ σ_dM, ‖ε_c(x)‖ ≤ ε_cM, ‖∇ε_c‖ ≤ ε_dM and ‖W_c‖ ≤ W_cM, respectively. Moreover, the approximation error ε_HJI is bounded above by ε_H > 0, namely, ‖ε_HJI‖ ≤ ε_H. Assumption 4. Considering the sliding-mode dynamics (30) with the optimal control pair (u*_c, d*) in (36) and (37), let J_a(x) be a smooth, radially unbounded and positive definite Lyapunov candidate that satisfies (53). Remark 4. Note that the plausibility of Assumption 4 depends on the boundedness of the optimal sliding-mode dynamics, which is usually assumed to be bounded by a function of the system state x.
For more details, refer to [34,38]. Furthermore, it is impossible to solve (53) directly to obtain the form of J_a(x). Based on [34], one can obtain J_a(x) by selecting an appropriate form, such as a quadratic polynomial.
Theorem 2. Consider the sliding-mode dynamics (30) and its associated cost function (32); let the control input and disturbance policy be designed by (46) and (47), respectively, along with the critic weight updating law (49). Then both the sliding-mode state x and the weight estimation error W̃_c are ensured to be UUB. Furthermore, the obtained control input û_c can be proven to converge to a neighborhood of the optimal control u*_c with a small adjustable bound.

Proof. Consider the following Lyapunov function candidate
where β₁ = β/α > 0. Calculating the time derivative of L along the sliding-mode dynamics (30) yields (54). Substituting (52) into (54) and rearranging, one can get (55). Using Ŵ_c = W_c − W̃_c, the last two terms in (55) become (56). Defining Υ = [W̃ᵀ_c φ₁, W̃ᵀ_c]ᵀ and substituting (56) into (55), it can be rewritten as (57). With Assumption 3 in mind, and recalling the boundedness of φ₁ and D, in particular ‖φ₁‖ < 1 and ‖D‖ ≤ D_M, we can infer that there exists a positive constant δ_M such that ‖δ‖ ≤ δ_M. To guarantee M > 0, appropriate parameters F₁ and F₂ need to be selected in the design. Then, the time derivative of L can be upper bounded as in (58), with λ_min(M) being the minimum eigenvalue of M.
Case 1: For Σ(x, û_c, d̂_w) = 0, it follows from (50) that J̇_a(x) < 0, i.e., (∇J_a)ᵀẋ < 0, which, together with the PE condition, ensures that there exists a positive constant ϵ such that ‖ẋ‖ ≥ ϵ. This implies that (∇J_a)ᵀẋ < −ϵ‖∇J_a‖ < 0. Then, (58) becomes (59). Focusing on (59), the derivative of L is negative only if the stated inequalities hold; moreover, the relation bounding ‖Υ‖ applies. Case 2: For Σ(x, û_c, d̂_w) = 1, in light of (41) and (42), by adding and subtracting β₁(∇J_a)ᵀD∇ε_c/2 in (58), we can derive (60). Then, using (53) in Assumption 4 and recalling the boundedness of D and ∇ε_c, (60) is upper bounded as (61), where Φ = δ²_M/(4λ_min(M)) + β₁D²_Mε²_dM/(8λ_min(Ψ)) and λ_min(Ψ) denotes the minimum eigenvalue of Ψ(x). Hence, the derivative of L is negative provided the stated inequalities hold. To sum up, for both Case 1 and Case 2, with proper parameters F₁ and F₂ satisfying M > 0, if ‖∇J_a‖ ≥ max{A₁, A₂} = Ā or ‖W̃_c‖ ≥ max{B₁, B₂} = B, then the derivative of L is negative. From the Lyapunov extension theorem [16], ∇J_a and W̃_c are bounded by Ā and B, respectively. Based on Assumption 4, the Lyapunov candidate J_a(x) is radially unbounded, which implies that the boundedness of ‖∇J_a‖ leads to the boundedness of the system state ‖x‖. In particular, ‖x‖ is bounded by Ā_x = max{A_{1x}, A_{2x}}, where A_{1x} and A_{2x} are determined by A₁ and A₂, respectively. So far, we can conclude that both x and W̃_c are guaranteed to be UUB.
Next, we prove that û_c converges to a small neighborhood of u*_c with an adjustable bound, i.e., ‖û_c − u*_c‖ ≤ ε_u. Considering (41) and (46), and noticing that W̃_c is UUB with the associated bound B = max{B₁, B₂}, while invoking ‖g(x)‖ ≤ g_M, ‖∇σ_c‖ ≤ σ_dM, ‖∇ε_c‖ ≤ ε_dM and the boundedness of R, the desired bound follows. Remark 5. From the expressions of B₁ and B₂, it is seen that B can be kept small when λ_min(M) is large enough. In view of (57), we can enlarge λ_min(M) by adjusting the design parameters F₁ and F₂. Moreover, we can make the approximation error ε_c and the gradient bound ε_dM sufficiently small when the number of neurons l_c is large enough. Therefore, the convergence error ε_u in (62) can be made as small as desired.

Simulation Results
To validate the effectiveness of the proposed H ∞ optimal SMC scheme, two simulation examples are provided.The first example focuses on a single-link robot arm, while the second example deals with a power system.

Single-Link Robot Arm
Consider a nonlinear single-link robot arm [23], whose dynamics are given by (63), where θ is the joint rotation angle of the robot arm in radians, u is the control torque applied to the joint in Nm, and w denotes the lumped uncertain term. The system parameters are selected as follows: the arm length L = 0.5 m, the payload mass M = 1 kg, the local gravitational acceleration g = 9.81 m/s², the rotational inertia J = 1 kg·m² and the viscous friction D = 2 Nm·s/rad. With the system states defined as x₁ = θ and x₂ = θ̇, and considering the presence of exogenous disturbances, the dynamics (63) can be represented in the state-space form (64), where d represents the unknown disturbances. Moreover, the initial state is set as x₀ = [1, −0.5]ᵀ, the lumped uncertainty term is w(x, u) = x₂sin(x₁) + 0.1sin(x₁)u, and the disturbance term is chosen as d = [0.5e^{−t}sin(t), 0.5sin(t)]ᵀ in the simulation. The enhanced observer system, consisting of an NN identifier and a nonlinear DO, is designed as in (6), where the identifier NN is a three-layered feedforward NN with one hidden layer containing six neurons and the hyperbolic tangent activation function tanh(·). The updating ratios are set as η₁ = 30 and η₂ = 2.5, while the weights Ŵ_o and V_o are initialized with random values drawn from the interval [−0.1, 0.1]. The initial observer state is set as x̂₀ = [0.5, 0]ᵀ. Moreover, based on Lemma 1, the Hurwitz matrix A = [−15, 0; 0, −15], p(x) = [10x₁; 10x₂] and l(x) = [10, 0; 0, 10] are selected to ensure that inequality (10) holds. The integral sliding surface is determined by (21), together with G(x) = g⁺(x) = [0, 1] and S₀(x) = x₂. Accordingly, the discontinuous SMC u_d is given by (23) and (24). For the purpose of eliminating the chattering phenomenon, an arctangent function atan(s/ϵ) with a small positive scalar ϵ = 0.005 is employed to replace the sign function sgn(s) in (23).
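A minimal simulator for the arm, assuming the common single-link model Jθ̈ + Dθ̇ + MgL sin θ = u consistent with (63) and using the stated parameters (MgL = 4.905):

```python
import math

def arm_step(x1, x2, u, dt, M=1.0, L=0.5, g=9.81, J=1.0, D=2.0):
    """Euler step of the arm in state-space form:
    x1_dot = x2,  x2_dot = (u - D*x2 - M*g*L*sin(x1)) / J."""
    x1_dot = x2
    x2_dot = (u - D * x2 - M * g * L * math.sin(x1)) / J
    return x1 + dt * x1_dot, x2 + dt * x2_dot
```

With u = 0 and the initial state [1, −0.5], the viscous friction damps the motion and the arm settles at the downward equilibrium, which is the open-loop baseline against which the compound controller is judged.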
By applying the SMC law u_d, the sliding-mode dynamics (65) can be obtained, where k(x) = I − g(x)g^+(x) = [1, 0; 0, 0]. The associated cost function is chosen in the form of (32), together with Q = diag(1, 1), R = 1, and γ = 1.5. For the critic NN, the activation function is chosen as a six-dimensional polynomial vector σc(x), which results in Ŵc = [Ŵc1, Ŵc2, ..., Ŵc6]^T. The updating ratios are selected as α = 1 and β = 0.5, the design parameters as F1 = F2 = 10I and lc = 6, and Ja(x) as a quadratic polynomial. Furthermore, the weight vector Ŵc is initialized to zero, which yields a zero initial control input. Noting that the zero initial control cannot stabilize the system (65), it is thus clear that no initial stabilizing control strategy is necessary when implementing the proposed algorithm.
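With converged critic weights, the approximate H∞ control can be read off from the gradient of the approximated cost. The sketch below uses the standard HJI-based form û = −(1/2)R^(−1)g^T(∇σc)^T Ŵc; the six-term polynomial basis is an illustrative assumption of our own, since the paper's exact σc(x) was not recoverable here.

```python
import numpy as np

# Converged critic weights reported in the example.
W_c = np.array([1.0420, 0.0856, -0.0603, -0.2174, 0.2948, -0.0358])

def grad_sigma_c(x):
    """Jacobian of an assumed six-term polynomial basis
    sigma_c = [x1^2, x1*x2, x2^2, x1^4, x1^2*x2^2, x2^4]
    (illustrative choice only; the paper's basis was not recoverable)."""
    x1, x2 = x
    return np.array([[2 * x1,           0.0],
                     [x2,               x1],
                     [0.0,              2 * x2],
                     [4 * x1 ** 3,      0.0],
                     [2 * x1 * x2 ** 2, 2 * x1 ** 2 * x2],
                     [0.0,              4 * x2 ** 3]])

R_inv = 1.0                    # R = 1 in the example
g_vec = np.array([0.0, 1.0])   # input channel of the sliding-mode dynamics

def u_hat(x):
    """Standard form u = -(1/2) R^{-1} g^T (grad sigma_c)^T W_c."""
    grad_V = grad_sigma_c(x).T @ W_c   # gradient of the approximated cost
    return -0.5 * R_inv * (g_vec @ grad_V)
```

Because the basis vanishes at the origin with zero gradient, initializing Ŵc to zero indeed gives a zero initial control, matching the remark that no initial stabilizing control is needed.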
During the learning process, a damped decreasing probing noise is injected into the control input to satisfy the persistence of excitation (PE) condition. This noise comprises sinusoids of diverse frequencies and is applied for the first 450 s. Figure 2 shows the trajectories of the critic weights, which eventually converge to Ŵc = [1.0420, 0.0856, −0.0603, −0.2174, 0.2948, −0.0358]^T. Figure 3 depicts the trajectories of the system states during learning. From Figure 3, one can see that, even without an initial stabilizing control, the system states stay at or near zero after the probing noise is removed, which indicates that the control ûc generated by the learning module can effectively stabilize the system. With the converged weights, the approximate H∞ optimal control ûc is calculated by (46). Next, ûc is substituted into (21) to obtain an available sliding surface. Subsequently, integrated with the enhanced observer system, the SMC law u_d is implemented using (23) and (24) with the reliable estimations of uncertainties and disturbances. Figure 4 depicts the estimates of the disturbances d1 = 0.5e^(−t) sin(t) and d2 = 0.5 sin(t), showing small estimation errors. Figure 5 presents the identification of the system states by the identifier NN. It can be observed that the identified states rapidly track the real states, illustrating the effectiveness and efficiency of the identifier NN. Note that the valid estimations d̂ and Ŵo are used to design the SMC law u_d, which helps reduce the sliding-mode gain and alleviate the chattering phenomenon. Figure 6 displays the state trajectories of the robot arm under the compound H∞ sliding-mode control u = u_d + ûc. Figure 7 depicts the compound control u, while the H∞ control ûc and the SMC law u_d are given in Figure 8. The results presented in Figures 6-8 confirm that the compound control u renders the robot arm system stable and exhibits satisfactory performance against both system uncertainties and external disturbances.
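A damped probing signal of the kind described, a sum of sinusoids under a decaying envelope that is switched off after 450 s, might look like the following sketch; the amplitudes, frequencies, and decay rate are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def probing_noise(t, t_off=450.0):
    """Illustrative damped probing noise for the PE condition.

    Sinusoids of diverse frequencies under a slowly decaying envelope,
    removed after t_off seconds (all numeric choices here are assumed).
    """
    if t >= t_off:
        return 0.0                         # noise removed after the learning phase
    envelope = np.exp(-0.005 * t)          # damped, decreasing amplitude
    freqs = (1.0, 3.0, 7.0, 11.0)          # rad/s, mixed to enrich excitation
    return envelope * sum(np.sin(f * t) for f in freqs)
```

Mixing incommensurate frequencies is a common way to keep the regressor persistently exciting while the decaying envelope limits the disturbance injected late in the learning phase.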

Power Plant System
To further validate the effectiveness of the proposed scheme, we consider an electric power system composed of a gas turbine generator, a system load, and an automatic generation control [34]. To model this system, the incremental frequency deviation ∆fG, the generator output power variation ∆Pm, and the valve position change of the governor ∆v are taken into account. The control input is the speed-changer position deviation ∆Pc. By defining the state vector x = [∆v, ∆Pm, ∆fG]^T ∈ R^3, the reduced power system model can be expressed in state-space form, where g(x) = [1/Tg, 0, 0]^T, ϑ represents the modeling uncertainty, and d stands for the exterior disturbances. In the simulation, the uncertain term is assumed to be ϑ = x2 sin(x1), and the disturbance term is defined as d(t) = [sin(2πt)e^(−t), 0, 0.2 sin^2(t)e^(−t)]^T. The regulation constant is Rg = 2.5 Hz/MW, the turbine gain constant Kt = 1 s, and the generator gain constant Kp = 120 Hz/MW. Moreover, the corresponding time constants are set as Tg = 0.08 s, Tt = 0.1 s, and Tp = 20 s, respectively. To estimate the unknown uncertainty and disturbance terms, the enhanced observer system is constructed as in (6) with a three-layer feedforward NN containing eight hidden neurons and the Hurwitz matrix A = [−12, 0, 0; 0, −12, 0; 0, 0, −12]. The activation function, the initial weights, and the updating ratios are the same as in Section 5.1. Let p(x) = [10x1, 0, 10x3]^T, l(x) = [10, 0, 0; 0, 0, 0; 0, 0, 10], G(x) = g^+(x) = [0.08, 0, 0], and S0(x) = 0.08x1. As before, the arctangent function atan(s/ε) is used in the SMC law u_d in place of the sign function sgn(s).
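For reference, a conventional governor-turbine-generator load-frequency model consistent with the stated constants and with g(x) = [1/Tg, 0, 0]^T can be sketched as below. The matrix structure is the textbook LFC form, assumed here since the paper's state equation was not reproduced; whether ϑ enters through the input channel is likewise our assumption.

```python
import numpy as np

# Constants from the example.
R_g, K_t, K_p = 2.5, 1.0, 120.0   # regulation, turbine gain, generator gain
T_g, T_t, T_p = 0.08, 0.1, 20.0   # governor, turbine, generator time constants

# Conventional load-frequency-control structure (assumed, not from the paper):
#   d(dv)/dt   = ( dP_c - dv - df_G / R_g ) / T_g
#   d(dP_m)/dt = ( K_t * dv - dP_m ) / T_t
#   d(df_G)/dt = ( K_p * dP_m - df_G ) / T_p
A = np.array([[-1 / T_g,   0.0,       -1 / (R_g * T_g)],
              [K_t / T_t, -1 / T_t,    0.0],
              [0.0,        K_p / T_p, -1 / T_p]])
B = np.array([1 / T_g, 0.0, 0.0])  # matches g(x) = [1/T_g, 0, 0]^T

def power_dynamics(t, x, u):
    """Open-loop model with assumed matched uncertainty and the stated d(t)."""
    theta = x[1] * np.sin(x[0])            # uncertainty x2*sin(x1)
    d = np.array([np.sin(2 * np.pi * t) * np.exp(-t),
                  0.0,
                  0.2 * np.sin(t) ** 2 * np.exp(-t)])
    return A @ x + B * (u + theta) + d
```

Under these constants the nominal matrix A is Hurwitz, so the compound control mainly has to reject the uncertainty and disturbances rather than stabilize an unstable plant.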
Then, ûc is substituted into the integral sliding surface (21), and the SMC law u_d is designed by (23) and (24). Consequently, the compound control is constructed as u = u_d + ûc. Figure 11 shows the trajectories of the power system states under this compound control over 15 s, and Figure 12 presents the compound control u. From Figures 11 and 12, we conclude that the compound control effectively stabilizes the system states to the equilibrium point, even in the presence of modeling uncertainties and exterior disturbances. These results clearly demonstrate the viability and efficiency of the proposed approach.

Conclusions
In this paper, we have developed a neural adaptive H∞ sliding-mode control scheme for uncertain nonlinear systems subject to external disturbances. Based on the enhanced observer system composed of the NN identifier and the nonlinear DO, an integral SMC is designed to suppress the influences of the uncertain term, the matched disturbance component, and the unknown approximation errors, with no prior knowledge of their upper bounds. Meanwhile, on the sliding surface, the remaining unmatched disturbances are attenuated using the H∞ optimal control solved by the single critic network-based ADP algorithm. Furthermore, the uniform ultimate boundedness stability of the resultant closed-loop system is proven by Lyapunov's method. In addition to the theoretical analysis, two simulation examples are provided to further validate the proposed approach. Recently, the growing interest in saving communication resources and reducing the computational burden of networked control systems has brought increasing attention to, and rapid development of, the event-triggering mechanism. Hence, how to combine the optimal SMC strategy with the event-triggering mechanism for more complex physical systems, beyond control-affine systems, will be our future research topic.
to represent the approximation errors. Based on the previous analysis and the boundedness of g(x), ζe is bounded as ||ζe|| ≤ ζM for an unknown positive constant ζM. To estimate ζM, we design ζ̂ as defined in (24), and the estimation error is calculated as ζ̃ = ζM − ζ̂.
and the approximation error εHJI is defined as εHJI = −(∇εc)^T f(x) + Wc^T ∇σc D ∇εc /2 + (∇εc)^T D ∇εc /4 due to the NN reconstruction error. Furthermore, taking into account ||k(x)|| ≤ kM and ||g(x)|| ≤ gM, we can infer that there exists a positive constant DM such that ||D|| ≤ DM.

Figure 1. The schematic of the adaptive H∞ SMC scheme.

Figure 2. Trajectories of the critic NN weights.

Figure 3. Trajectories of system states in the learning.

Figure 5. (a) Real state x1 and identified state x̂1; (b) real state x2 and identified state x̂2.

Figure 6. State trajectories of the robotic arm.

Figure 11. Trajectories of the electric power system.