Article

Neural Adaptive H∞ Sliding-Mode Control for Uncertain Nonlinear Systems with Disturbances Using Adaptive Dynamic Programming

College of Electronic and Information Engineering, Hebei University, Baoding 071002, China
* Author to whom correspondence should be addressed.
Entropy 2023, 25(12), 1570; https://doi.org/10.3390/e25121570
Submission received: 17 October 2023 / Revised: 16 November 2023 / Accepted: 17 November 2023 / Published: 22 November 2023
(This article belongs to the Special Issue Intelligent Modeling and Control)

Abstract
This paper develops a neural adaptive $H_\infty$ sliding-mode control scheme for a class of uncertain nonlinear systems subject to external disturbances with the aid of adaptive dynamic programming (ADP). First, by combining the neural network (NN) approximation method with a nonlinear disturbance observer, an enhanced observer framework is developed for estimating the system uncertainties and observing the external disturbances simultaneously. Then, based on the reliable estimations provided by the enhanced observer, an adaptive sliding-mode controller is designed that effectively counteracts the system uncertainties and the separated matched disturbances, even in the absence of prior knowledge of their upper bounds, while the remaining unmatched disturbances are attenuated through an $H_\infty$ performance criterion on the sliding surface. Moreover, a single critic network-based ADP algorithm is employed to learn the cost function related to the Hamilton–Jacobi–Isaacs equation, from which the $H_\infty$ optimal control is obtained. An update law for the critic NN is proposed that not only attains the Nash equilibrium, but also stabilizes the sliding-mode dynamics without the need for an initial stabilizing control. In addition, we analyze the uniform ultimate boundedness stability of the resultant closed-loop system via Lyapunov's method. Finally, the effectiveness of the proposed scheme is verified through simulations of a single-link robot arm and a power system.

1. Introduction

Within the last few decades, multifarious robust control design theories and methods have been proposed for uncertain nonlinear systems [1]. As one of the most efficient and widely used control methods, sliding-mode control (SMC) has garnered significant attention owing to its simplicity, order-reduction property and inherent robustness against matched uncertainties [2]. The classical SMC approach exerts a discontinuous control to drive the system states onto a prescribed sliding manifold or surface [3]. Once the sliding surface is reached, the system becomes immune to the matched uncertainties and input disturbances. To remove the reaching phase, an integral SMC was developed by using an integral sliding manifold, which includes an integral term enabling the system states to reach and remain on the sliding manifold from the beginning [4,5,6]. Although for a wide variety of practical systems the relevant uncertainties and disturbances can be assumed to be matched in the control design, many physical systems, such as permanent magnet synchronous motors [7], underactuated aerial vehicles and robotic systems [8], are directly affected by unmatched disturbances. Lately, several new approaches involving the integral SMC have been proposed to stabilize various systems with unmatched disturbances [9,10,11,12,13]. Among these methods, it is worth noticing that in [12,13], the impact of the separated unmatched disturbances is not amplified once a suitable projection matrix is chosen in the sliding manifold, and these disturbances are attenuated by combining the integral SMC with $H_\infty$ control theory. This provides a feasible and effective way to handle unmatched disturbances and helps explore the relationship between integral SMC and $H_\infty$ control in nonlinear system design.
In many instances, we expect the control policy not just to stabilize the closed-loop system, but also to possess certain optimality by minimizing a user-defined cost. For nonlinear systems, the associated optimal control problems require solving the Hamilton–Jacobi–Bellman (HJB) equation. When $H_\infty$ optimal control is considered, based on dissipativity theory, the problem can be formulated as an $L_2$-gain control problem, which involves solving the Hamilton–Jacobi–Isaacs (HJI) equation [14]. However, analytical solutions of both the HJB and HJI equations are very hard or even impossible to obtain directly because of their inherent nonlinearities [15]. In recent years, a class of neural network (NN) and reinforcement learning (RL)-based intelligent optimization and control methods, referred to as adaptive dynamic programming (ADP), has become increasingly prominent; it shows great application potential in solving various optimization problems and effectively conquers the "curse of dimensionality" [15,16]. By now, many researchers have employed ADP to tackle a variety of optimal control problems for both discrete-time (DT) [17,18,19,20,21,22] and continuous-time (CT) systems [23,24,25,26,27,28]. Moreover, how to combine ADP with other robust methods to achieve better performance and stronger robustness for uncertain nonlinear systems is becoming a new research focus [29,30].
Recently, Modares et al. [31] proposed an online integral RL algorithm that incorporates a non-quadratic discounted cost function to address the constrained-input optimal tracking problem. Luo et al. [32] described an NN-based off-policy learning algorithm within the actor–critic framework to deal with the associated HJI equation, and this algorithm was later extended to find the near-optimal $H_\infty$ tracking control solution in [33]. Nevertheless, the influences of potential system or modeling uncertainties were not taken into account in these designs. Wang et al. [34] introduced a robust neuro-optimal control approach for input-affine nonlinear systems with both matched and state-dependent uncertainties by redesigning the cost function and selecting a suitable feedback gain, although the upper bound function of the uncertainties is needed to redesign the cost function that suppresses them. Mitra et al. [35] presented an optimal SMC scheme for single-input cascade nonlinear systems with matched bounded disturbances. Fan et al. [36] investigated an adaptive actor–critic-based integral SMC strategy for CT nonlinear systems with unknown terms and input disturbances, where the initial stabilizing control required in the learning is quite stringent and limiting in practical applications. Qu et al. [37] developed an adaptive $H_\infty$ optimal SMC method in the presence of actuator faults and unmatched disturbances using the ADP algorithm, and further explored the optimal guaranteed cost SMC for constrained-input uncertain systems by formulating an auxiliary system and redefining the utility function [38]. Building on [37] and combining it with event-triggered mechanisms, Yang et al. [39] provided an event-triggered integral SMC design for nonlinear control-affine systems by leveraging the ADP technique. Note that the methods mentioned above rely on the availability of upper bounds for the matched or unmatched disturbances, which may cause over-design and thus lead to an over-conservative control scheme. Additionally, in real-world scenarios, determining precise upper bounds of external disturbances is often a challenging task.
Inspired by the works mentioned earlier, we propose an adaptive neural $H_\infty$ SMC scheme for uncertain nonlinear systems subject to external disturbances using the ADP algorithm. Based on the enhanced observer system composed of an NN identifier and a nonlinear disturbance observer (DO), an integral SMC is developed to counteract the impacts of the system uncertainties and the separated matched disturbances, as well as unknown approximation errors, without requiring prior knowledge of their upper bounds. On the sliding manifold, the remaining unmatched disturbances are attenuated by the $H_\infty$ optimal control solved via the single-network ADP algorithm. Moreover, the uniform ultimate boundedness stability of the resultant closed-loop system is guaranteed via the Lyapunov approach.
The principal contributions of this study can be enumerated as follows. First, unlike other existing schemes [34,35,36,37,38,39], the proposed approach, built on the enhanced observer system, makes the designed sliding-mode controller independent of the relevant upper bounds of uncertainties and disturbances, which renders the implementation much easier and more practical and removes the assumption that these upper bounds must be known in advance. Second, compared with the algorithms presented in [36,37], our approach can deal with both unknown nonlinear terms and unmatched external disturbances, where the single-network ADP is utilized to approximate an $H_\infty$ optimal control. Unlike typical actor–critic–disturbance network architectures, the single critic network structure brings a simpler implementation and a lower computational burden, and avoids the numerical approximation errors arising from the actor and disturbance networks. Third, we introduce an update law for the critic NN, which not only achieves the Nash equilibrium, but also ensures the stability of the sliding-mode dynamics without the need for an initial stabilizing control in the learning.
The remainder of this paper is arranged as follows. Section 2 outlines the problem formulation and provides some necessary preliminaries. Section 3 describes the design of an integral SMC based on the enhanced observer system. Section 4 presents the application of the single-network ADP to obtain the $H_\infty$ optimal control for the sliding-mode dynamics, along with the stability analysis. Simulations of the robotic arm and a power system are given in Section 5, followed by a summary of this study in Section 6.

2. Problem Formulation

Consider the following uncertain perturbed nonlinear system as
$$\dot{x} = f(x) + \Delta f(x) + \big(g(x) + \Delta g(x)\big)u + d, \tag{1}$$
where the state vector $x \in \mathbb{R}^n$ is measurable, $u \in \mathbb{R}^m$ is the control input, $f(x) \in \mathbb{R}^n$ and $g(x) \in \mathbb{R}^{n \times m}$ are the known system drift and input dynamics, respectively; $\Delta f(x)$ and $\Delta g(x)$ denote uncertain nonlinear terms arising from either the inherent characteristics of the system or modeling uncertainties, while $d \in \mathbb{R}^n$ represents the unknown external disturbances. Moreover, it is assumed that the system uncertainties $\Delta f(x)$ and $\Delta g(x)$ satisfy the matched condition, i.e., $\Delta f(x) + \Delta g(x)u = g(x)w(x,u)$; then the system (1) is rewritten in the form of
$$\dot{x} = f(x) + g(x)u + g(x)w(x,u) + d \tag{2}$$
with $w(x,u)$ being the bounded lumped uncertain term. Let $\Omega \subset \mathbb{R}^n$ be a compact set, and suppose that $f(x) + g(x)u$ is Lipschitz continuous over $\Omega$ with $f(0) = 0$. Besides, $d \in L_2[0,\infty)$ and its derivative $\dot{d}$ is bounded such that $\|\dot{d}\| \le d_M$ with $d_M > 0$. To avoid any confusion, $\|\cdot\|$ denotes the 2-norm of a vector or the Frobenius norm of a matrix hereafter, unless otherwise specified.
Assumption 1.
The input matrix $g(x)$ has full column rank and is norm bounded, that is, $\|g(x)\| \le g_M$ for any $x$. Moreover, the resulting left pseudoinverse $g^+(x) \in \mathbb{R}^{m \times n}$, given by $g^+(x) = (g^T(x)g(x))^{-1}g^T(x)$, is bounded as $\|g^+(x)\| \le b_M$, where $b_M$ and $g_M$ are known positive constants.
Based on Assumption 1, $d$ is then decomposed into matched and unmatched components through the projection of $d$ onto the input matrix $g(x)$ as
$$d = g(x)g^+(x)d + \big(I - g(x)g^+(x)\big)d, \tag{3}$$
where $I$ denotes an identity matrix of appropriate dimensions, and $g^+(x)$ is the left pseudoinverse of $g(x)$. It should be noted that Assumption 1 is somewhat restrictive, which may narrow the applicability of the proposed approach to some extent. However, many real-world physical systems, such as satellite dynamics, hypersonic flight vehicles and overhead crane systems, possess this property and render the assumption valid [15,20].
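For intuition, the decomposition (3) amounts to projecting $d$ onto the column space of $g(x)$. The following minimal NumPy sketch (our illustration, not part of the paper; the example $g$ and $d$ are hypothetical) computes both components:

```python
import numpy as np

def decompose_disturbance(g, d):
    """Split d into matched and unmatched parts via the projection
    onto the column space of g, as in Eq. (3)."""
    g_plus = np.linalg.inv(g.T @ g) @ g.T        # left pseudoinverse g^+(x)
    proj = g @ g_plus                            # projector onto range(g)
    d_matched = proj @ d                         # g(x) g^+(x) d
    d_unmatched = (np.eye(g.shape[0]) - proj) @ d
    return d_matched, d_unmatched

# Hypothetical single-input, two-state example
g = np.array([[0.0], [1.0]])
d = np.array([0.3, -0.5])
dm, du = decompose_disturbance(g, d)
print(dm, du)   # dm lies in range(g); dm + du recovers d
```

Only the matched component can be cancelled through the input channel; the unmatched remainder is what the $H_\infty$ design of Section 4 must attenuate.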
To deal with the uncertain nonlinear system (1) with external disturbances, an enhanced observer system is first constructed for estimating the uncertain terms and observing the unknown disturbances simultaneously. Then, based on the reliable estimations, an integral SMC is developed to counteract the impacts of the system uncertainties and the separated matched disturbances, as well as unknown approximation errors, without requiring prior knowledge of their upper bounds. Meanwhile, the remaining unmatched disturbances are attenuated by the $H_\infty$ optimal control on the sliding surface. Moreover, the single-network ADP algorithm is employed to learn the cost function related to the HJI equation, from which the $H_\infty$ optimal control is obtained. What is more, a weight update law is formulated to ensure both the achievement of the Nash equilibrium and the stabilization of the sliding-mode dynamics during the learning process.

3. Integral SMC Design Based on the Enhanced Observer System

Recalling the NN universal approximation property, the uncertain term $w(x,u)$ can be represented by a three-layered NN as
$$w(x,u) = W_o^T\sigma(V_o^T\bar{x}) + \varepsilon_o(x), \tag{4}$$
where $W_o \in \mathbb{R}^{l_o \times m}$ and $V_o \in \mathbb{R}^{(n+m) \times l_o}$ denote the unknown ideal weight matrices between the output and hidden, and hidden and input layers, respectively; $\bar{x} = [x^T, u^T]^T \in \mathbb{R}^{n+m}$ is the NN input, $\sigma(\cdot) \in \mathbb{R}^{l_o}$ represents the activation function with $l_o$ hidden layer neurons, and $\varepsilon_o(x) \in \mathbb{R}^m$ stands for the NN reconstruction error. To simplify the learning process, only the weights $W_o$ are adapted online, while $V_o$ is initialized with random values and then remains unchanged during the weight updating process [16].
The NN identifier is designed by
$$\dot{\hat{x}} = A\hat{x} + f(x) - Ax + g(x)u + g(x)\hat{W}_o^T\sigma(z) + d, \tag{5}$$
where $A$ is a Hurwitz matrix, $\hat{x}$ is the identifier state, $\hat{W}_o$ is the estimate of $W_o$, and the activation function is $\sigma(z) = \sigma(V_o^T\bar{x})$ with $z = V_o^T\bar{x}$. Since the unknown disturbance term $d$ is needed in (5), inspired by [11], a nonlinear DO is introduced to obtain $\hat{d}$, namely, the estimated value of $d$.
Then, combining the NN identifier with a nonlinear DO, an enhanced observer system is constructed as
$$\begin{aligned} \dot{\hat{x}} &= A\hat{x} + f(x) - Ax + g(x)u + g(x)\hat{W}_o^T\sigma(z) + \hat{d}, \\ \dot{d}_0 &= -l(x)\big(f(x) + g(x)u + g(x)\hat{W}_o^T\sigma(z) + d_0 + p(x)\big) \end{aligned} \tag{6}$$
with $\hat{d} = d_0 + p(x)$, where $d_0$ is an auxiliary variable, and $p(x)$ is a designed state-dependent function whose gradient yields the gain function $l(x) = (\partial p(x)/\partial x)^T$. Following (6), we have
$$\dot{\hat{d}} = -l(x)\hat{d} + l(x)d + l(x)g(x)\tilde{W}_o^T\sigma(z) + l(x)g(x)\varepsilon_o(x), \tag{7}$$
where $\tilde{W}_o = W_o - \hat{W}_o$ represents the NN weight estimation error. Let $\tilde{x} = x - \hat{x}$ and $\tilde{d} = d - \hat{d}$ be the state and disturbance estimation errors, respectively. Subtracting (5) from (2) and combining with (7), we obtain the coupled error dynamics of (6) as follows:
$$\begin{aligned} \dot{\tilde{x}} &= A\tilde{x} + g(x)\tilde{W}_o^T\sigma(z) + \tilde{d} + g(x)\varepsilon_o(x), \\ \dot{\tilde{d}} &= -l(x)\tilde{d} - l(x)g(x)\tilde{W}_o^T\sigma(z) + \dot{d} - l(x)g(x)\varepsilon_o(x). \end{aligned} \tag{8}$$
Before proceeding, we introduce a common assumption for stability analysis [15,16].
Assumption 2.
For the identifier NN, there are known positive constants $\sigma_M$, $\varepsilon_M$, $W_M$ and $V_M$ such that $\|\sigma(z)\| \le \sigma_M$, $\|\varepsilon_o(x)\| \le \varepsilon_M$, $\|W_o\| \le W_M$ and $\|V_o\| \le V_M$, respectively.
Lemma 1.
Considering the system (2) and the coupled error dynamics (8), let the identifier NN weight $\hat{W}_o$ be updated by
$$\dot{\hat{W}}_o = \eta_1\sigma(z)\tilde{x}^TA^{-1}g(x) - \eta_2\big(\|\tilde{x}\| + 1\big)\hat{W}_o, \tag{9}$$
where $\eta_1, \eta_2$ are positive updating ratios. Moreover, we select the parameter matrices $A$, $P$ and the gain function $l(x)$ to satisfy
$$P^TP - l(x) - l^T(x) + l(x)g(x)g^T(x)l^T(x) \le -\rho I \tag{10}$$
with $\rho > 0$. Then all the estimation errors $\tilde{x}$, $\tilde{d}$ and $\tilde{W}_o$ are uniformly ultimately bounded (UUB).
Proof. 
Consider the Lyapunov function candidate given by
$$L_1 = \frac{1}{2}\tilde{x}^TP\tilde{x} + \frac{1}{2}\tilde{d}^T\tilde{d} + \frac{1}{2}\mathrm{tr}\{\tilde{W}_o^T\tilde{W}_o\}, \tag{11}$$
where $L_{11} = \tilde{x}^TP\tilde{x}/2 + \tilde{d}^T\tilde{d}/2$, $L_{12} = \mathrm{tr}\{\tilde{W}_o^T\tilde{W}_o\}/2$, and $P = P^T$ is positive definite, which together with some matrix $\Lambda > 0$ satisfies $A^TP + PA = -\Lambda$ for the Hurwitz matrix $A$. By taking the time derivative of $L_{11}$ and substituting the coupled error dynamics (8), we can obtain
$$\dot{L}_{11} = \frac{1}{2}\tilde{x}^T(A^TP + PA)\tilde{x} + \tilde{x}^TP\tilde{d} + \tilde{x}^TPg(x)\varepsilon_o(x) + \tilde{x}^TPg(x)\tilde{W}_o^T\sigma(z) - \tilde{d}^Tl(x)g(x)\tilde{W}_o^T\sigma(z) - \frac{1}{2}\tilde{d}^T\big(l(x) + l^T(x)\big)\tilde{d} - \tilde{d}^Tl(x)g(x)\varepsilon_o(x) + \tilde{d}^T\dot{d}. \tag{12}$$
Based on Assumption 2, together with Young's inequality, it follows that
$$\dot{L}_{11} \le -\frac{1}{2}\tilde{x}^T\Lambda\tilde{x} + \frac{1}{2}\tilde{x}^T\tilde{x} + \frac{1}{2}\tilde{d}^T\big(P^TP - l(x) - l^T(x) + l(x)g(x)g^T(x)l^T(x)\big)\tilde{d} + \tilde{d}^T\dot{d} + \tilde{x}^TPg(x)\tilde{W}_o^T\sigma(z) + \tilde{x}^TPg(x)\varepsilon_o(x) + \sigma_M^2\|\tilde{W}_o\|^2 + \varepsilon_M^2. \tag{13}$$
Considering (10), (13) is rewritten as
$$\dot{L}_{11} \le -\frac{1}{2}\tau\tilde{x}^T\tilde{x} - \frac{1}{2}\rho\tilde{d}^T\tilde{d} + \tilde{x}^TPg(x)\tilde{W}_o^T\sigma(z) + \tilde{x}^TPg(x)\varepsilon_o(x) + \tilde{d}^T\dot{d} + \sigma_M^2\|\tilde{W}_o\|^2 + \varepsilon_M^2, \tag{14}$$
where $\tau = \lambda_{\min}(\Lambda) - 1 > 0$ is ensured by properly selecting the positive definite matrix $\Lambda$ with minimum eigenvalue $\lambda_{\min}(\Lambda)$.
Combining with (9), $\dot{L}_{12}$ is derived as
$$\dot{L}_{12} = \mathrm{tr}\big\{-\eta_1\tilde{W}_o^T\sigma(z)\tilde{x}^TA^{-1}g(x) + \eta_2\|\tilde{x}\|\tilde{W}_o^T\hat{W}_o + \eta_2\tilde{W}_o^T\hat{W}_o\big\}. \tag{15}$$
With the inequality $\mathrm{tr}\{\tilde{W}_o^T\hat{W}_o\} \le \|W_o\|^2/2 - \|\tilde{W}_o\|^2/2$, (15) becomes
$$\dot{L}_{12} \le -\mathrm{tr}\big\{\eta_1\tilde{W}_o^T\sigma(z)\tilde{x}^TA^{-1}g(x)\big\} + \mathrm{tr}\big\{\eta_2\|\tilde{x}\|\tilde{W}_o^T\hat{W}_o\big\} + \frac{\eta_2}{2}\|W_o\|^2 - \frac{\eta_2}{2}\|\tilde{W}_o\|^2.$$
Noting the relationship $\mathrm{tr}\{A^TB\} = B^TA$ for all $A, B \in \mathbb{R}^n$ and the inequality $\mathrm{tr}\{\tilde{W}_o^T(W_o - \tilde{W}_o)\} \le W_M\|\tilde{W}_o\| - \|\tilde{W}_o\|^2$, we have
$$\dot{L}_{12} \le \eta_1\sigma_Mg_M\|A^{-1}\|\|\tilde{x}\|\|\tilde{W}_o\| + \eta_2W_M\|\tilde{x}\|\|\tilde{W}_o\| - \eta_2\|\tilde{x}\|\|\tilde{W}_o\|^2 + \frac{\eta_2}{2}\|W_o\|^2 - \frac{\eta_2}{2}\|\tilde{W}_o\|^2. \tag{16}$$
By combining (14) and (16) and taking norms, one can derive an upper bound for $\dot{L}_1(t)$ as
$$\dot{L}_1 \le -\frac{1}{2}\tau\|\tilde{x}\|^2 + \Big(g_M\varepsilon_M\|P\| + \big(g_M\sigma_M\|P\| + \eta_1g_M\sigma_M\|A^{-1}\| + \eta_2W_M\big)\|\tilde{W}_o\| - \eta_2\|\tilde{W}_o\|^2\Big)\|\tilde{x}\| - \frac{1}{2}\rho\|\tilde{d}\|^2 + d_M\|\tilde{d}\| + \frac{\eta_2}{2}\|W_o\|^2 - \frac{\eta_2 - 2\sigma_M^2}{2}\|\tilde{W}_o\|^2 + \varepsilon_M^2. \tag{17}$$
Selecting $\eta_2 \ge 2\sigma_M^2$ and completing the square with respect to $\|\tilde{W}_o\|$, (17) becomes
$$\dot{L}_1 \le -\frac{1}{2}\tau\|\tilde{x}\|^2 - \frac{1}{2}\rho\|\tilde{d}\|^2 + \Big(g_M\varepsilon_M\|P\| - \eta_2\big(\|\tilde{W}_o\| - \Theta_1\big)^2 + \eta_2\Theta_1^2\Big)\|\tilde{x}\| + d_M\|\tilde{d}\| + \Theta_2, \tag{18}$$
where
$$\Theta_1 = \frac{g_M\sigma_M\|P\| + \eta_1g_M\sigma_M\|A^{-1}\| + \eta_2W_M}{2\eta_2}, \qquad \Theta_2 = \frac{\eta_2\|W_o\|^2 + 2\varepsilon_M^2}{2}.$$
Define
$$e_{xd} = \begin{bmatrix}\tilde{x} \\ \tilde{d}\end{bmatrix}, \qquad E_o = \frac{1}{2}\begin{bmatrix}\tau I & 0 \\ 0 & \rho I\end{bmatrix}$$
and $B_o = \big[\,g_M\varepsilon_M\|P\| + \eta_2\Theta_1^2,\; d_M\,\big]^T$; we can further derive
$$\dot{L}_1 \le -\lambda_{\min}(E_o)\|e_{xd}\|^2 + \|B_o\|\|e_{xd}\| + \Theta_2. \tag{19}$$
Therefore, we can conclude that $\dot{L}_1 < 0$ whenever $e_{xd}$ satisfies
$$\|e_{xd}\| > \frac{\|B_o\|}{2\lambda_{\min}(E_o)} + \sqrt{\frac{\|B_o\|^2}{4\lambda_{\min}^2(E_o)} + \frac{\Theta_2}{\lambda_{\min}(E_o)}}.$$
Furthermore, according to the Lyapunov extension theorem [16], when the inequality (10) holds by selecting proper matrices, we can infer that all the estimation errors $\tilde{x}$, $\tilde{d}$ and $\tilde{W}_o$ are UUB. □
Remark 1.
The gain function matrix $l(x)$ is an important design parameter that can be chosen as a linear or nonlinear function. When the form of the system function $g(x)$ is simple, it is easy to find an $l(x)$ that satisfies the inequality (10) by substituting appropriate candidate functions into (10). However, if the form of $g(x)$ is complex, a trial-and-error procedure is employed to select an appropriate $l(x)$ that meets (10). Although there is no universal procedure for designing $l(x)$, experience has shown that it is not difficult to find a suitable $l(x)$ for specific applications [36,37].
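To make the observer construction concrete, the following Euler-integration sketch implements the enhanced observer (6) with the weight update (9). It is our illustration rather than the authors' code; the dynamics handles f and g, the Hurwitz matrix A, and the DO functions p(x) and l(x) are supplied by the user, and the default gains mirror the values used later in Section 5.

```python
import numpy as np

class EnhancedObserver:
    """Sketch of the NN identifier + nonlinear disturbance observer, Eq. (6)."""

    def __init__(self, n, m, l_o, A, l_gain, p, dt=1e-3):
        self.A, self.l, self.p, self.dt = A, l_gain, p, dt
        self.x_hat = np.zeros(n)                       # identifier state
        self.d0 = np.zeros(n)                          # DO auxiliary variable
        self.W_o = np.zeros((l_o, m))                  # adapted output-layer weights
        self.V_o = np.random.uniform(-0.1, 0.1, (n + m, l_o))  # fixed inner weights

    def step(self, x, u, f, g, eta1=30.0, eta2=2.5):
        sigma = np.tanh(self.V_o.T @ np.concatenate([x, u]))   # sigma(V_o^T x_bar)
        d_hat = self.d0 + self.p(x)                            # disturbance estimate
        w_hat = self.W_o.T @ sigma                             # lumped-uncertainty estimate
        # identifier and DO dynamics, Eq. (6)
        x_hat_dot = self.A @ (self.x_hat - x) + f(x) + g(x) @ u + g(x) @ w_hat + d_hat
        d0_dot = -self.l(x) @ (f(x) + g(x) @ u + g(x) @ w_hat + self.d0 + self.p(x))
        # weight update law, Eq. (9)
        x_til = x - self.x_hat
        W_dot = eta1 * np.outer(sigma, x_til @ np.linalg.inv(self.A) @ g(x)) \
            - eta2 * (np.linalg.norm(x_til) + 1.0) * self.W_o
        # forward-Euler integration
        self.x_hat += self.dt * x_hat_dot
        self.d0 += self.dt * d0_dot
        self.W_o += self.dt * W_dot
        return d_hat, w_hat
```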
To effectively handle both system uncertainties and external disturbances, we propose a compound $H_\infty$ optimal SMC scheme that combines the integral SMC with $H_\infty$ control theory. The compound controller is formulated as
$$u = u_d + u_c, \tag{20}$$
where $u_d$ represents the discontinuous control designed to steer the system trajectories towards the sliding surface and maintain them on it, thereby eliminating the effects of the matched uncertainties and disturbances, and $u_c$ denotes the continuous control derived to guarantee system stability and achieve near-optimal performance under the remaining unmatched disturbances on the sliding surface.
Accordingly, we define the integral sliding surface as follows:
$$s(x) = S_0(x) - S_0(x_0) - \int_0^t G(x)\big(f(x) + g(x)u_c\big)\,dv, \tag{21}$$
where $x_0$ denotes the initial state, $S_0(x) \in \mathbb{R}^m$ and $G(x) = \partial S_0(x)/\partial x \in \mathbb{R}^{m \times n}$. Moreover, it follows from Assumption 1 that a suitable matrix $G(x)$ can be found such that the product $G(x)g(x)$ is invertible.
Taking the time derivative of $s(x)$ yields
$$\dot{s}(x) = G(x)\big(g(x)u_d + g(x)w(x,u) + d\big). \tag{22}$$
By incorporating the valid estimates $\hat{d}$ and $\hat{W}_o$, $u_d$ is devised as
$$u_d = -\big(G(x)g(x)\big)^{-1}\left(G(x)\hat{d} + G(x)g(x)\hat{W}_o^T\sigma(z) + \mu\,\mathrm{sgn}(s) + \frac{G(x)G^T(x)s}{\|s^TG(x)\|}\zeta\right), \tag{23}$$
where $\mu > 0$, $\mathrm{sgn}(s) \in \mathbb{R}^m$ is the sign function, and $\zeta \in \mathbb{R}$ is generated by
$$\dot{\zeta} = \kappa\|s^TG(x)\| \tag{24}$$
with $\kappa > 0$. In particular, $\zeta$ is designed to tackle the unknown bounds of the approximation errors arising from the estimated terms $\hat{d}$ and $\hat{W}_o$.
Considering the implementation of $\hat{d}$ and $\hat{W}_o$ in (23), we define $\zeta_e = \tilde{d} + g(x)\tilde{W}_o^T\sigma(z) + g(x)\varepsilon_o(x)$ to represent the approximation errors. Based on the previous analysis and the boundedness of $g(x)$, $\zeta_e$ is bounded as $\|\zeta_e\| \le \zeta_M$ for an unknown positive constant $\zeta_M$. To estimate $\zeta_M$, we design $\zeta$ as in (24), and the estimation error is defined as $\tilde{\zeta} = \zeta_M - \zeta$.
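As an illustration only (not the authors' implementation), the discontinuous law (23) with the adaptive gain (24) can be coded directly once the observer outputs are available. The sketch below also applies the arctangent smoothing of $\mathrm{sgn}(s)$ that the paper adopts in Section 5; all gains here are hypothetical.

```python
import numpy as np

def sliding_mode_control(s, x, d_hat, w_hat, G, g, zeta, mu=0.5, eps=0.005):
    """Discontinuous control u_d of Eq. (23), with atan(s/eps) smoothing sgn(s)."""
    Gx, gx = G(x), g(x)
    Gg_inv = np.linalg.inv(Gx @ gx)
    sG_norm = np.linalg.norm(s @ Gx) + 1e-12           # guard against s = 0
    return -Gg_inv @ (Gx @ d_hat + Gx @ gx @ w_hat
                      + mu * np.arctan(s / eps)        # smoothed sign function
                      + (Gx @ Gx.T @ s) / sG_norm * zeta)

def zeta_dot(s, x, G, kappa=1.0):
    """Adaptive gain dynamics of Eq. (24)."""
    return kappa * np.linalg.norm(s @ G(x))
```

Because the adaptive gain $\zeta$ grows only while $\|s^TG(x)\|$ is nonzero, no upper bound on the estimation errors has to be known in advance, which is the practical advantage emphasized above.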
Theorem 1.
Considering system (2) with the sliding surface (21), if the discontinuous control $u_d$ is devised by (23) with the adaptive law (24), then the convergence of the sliding variable $s$ to zero is guaranteed from the beginning.
Proof. 
Choose the positive definite Lyapunov function candidate as
$$L_s = \frac{1}{2}s^Ts + \frac{1}{2}\kappa^{-1}\tilde{\zeta}^2.$$
Along the trajectories of system (2), $\dot{L}_s(t)$ is derived as
$$\dot{L}_s = s^T\dot{s} - \kappa^{-1}\tilde{\zeta}\dot{\zeta} = s^TG(x)\big(g(x)u_d + g(x)w(x,u) + d\big) - \kappa^{-1}\tilde{\zeta}\dot{\zeta}. \tag{25}$$
Substituting (23) and (24) into (25), we have
$$\begin{aligned}\dot{L}_s &= s^TG(x)\big(\tilde{d} + g(x)\tilde{W}_o^T\sigma(z) + g(x)\varepsilon_o(x)\big) - \mu s^T\mathrm{sgn}(s) - \|s^TG(x)\|\zeta - \tilde{\zeta}\|s^TG(x)\| \\ &= s^TG(x)\zeta_e - \mu s^T\mathrm{sgn}(s) - \|s^TG(x)\|\zeta - \tilde{\zeta}\|s^TG(x)\|.\end{aligned}$$
Using $\|\zeta_e\| \le \zeta_M$ and the estimation error $\tilde{\zeta} = \zeta_M - \zeta$ yields
$$\dot{L}_s \le \|s^TG(x)\|\zeta_M - \|s^TG(x)\|\zeta - \tilde{\zeta}\|s^TG(x)\| - \mu s^T\mathrm{sgn}(s) = -\mu s^T\mathrm{sgn}(s). \tag{26}$$
Thus, it follows from (26) that $\dot{L}_s \le -\mu\|s\|_1 < 0$ for any $s \ne 0$, where $\|\cdot\|_1$ denotes the vector 1-norm. This means the asymptotic stability and convergence of the sliding-mode motion $s(x) = 0$ can be guaranteed. Moreover, according to (21), the sliding surface satisfies $s(x_0) = 0$ at $t = 0$, which implies that the system states start on the sliding surface, thus avoiding a separate reaching phase. □
From Theorem 1, it is clear that the stable sliding motion $s(x) = 0$ exists from the initial time; that is, $s(x) = 0$ and $\dot{s}(x) = 0$ for all $t \ge 0$. Moreover, the equivalent control method is utilized to obtain the sliding-mode dynamics. Combining $\dot{s}(x) = 0$ with (3) and (22), the equivalent control can be derived as
$$u_{\mathrm{deq}} = -\big(G(x)g(x)\big)^{-1}G(x)\big(I - g(x)g^+(x)\big)d - g^+(x)d - w(x,u). \tag{27}$$
Then, substituting $u_{\mathrm{deq}}$ into (2), the sliding-mode dynamics without the matched uncertain term and disturbance component is
$$\dot{x} = f(x) + g(x)u_c + \Gamma(x)d_u, \tag{28}$$
where $\Gamma(x) = I - g(x)(G(x)g(x))^{-1}G(x)$ and $d_u = (I - g(x)g^+(x))d$ is the unmatched component of the external disturbance in (3). In order to reduce the influence of the multiplier matrix $\Gamma(x)$ and minimize the unmatched disturbance $\Gamma(x)d_u$, an optimal projection matrix $G^*(x)$ within $\Gamma(x)$ is provided in the following lemma.
Lemma 2.
Considering the nonlinear system (2) with Assumption 1, the optimal projection matrix is selected as $G^*(x) = g^+(x)$, which not only minimizes the norm $\|\Gamma(x)d_u\|$, but also makes the relation $\Gamma(x)d_u = d_u$ hold.
Proof. 
The proof follows from Theorem 1 in [12]. □
As a result, with the relation $\Gamma(x)d_u = d_u$, we can express (28) as
$$\dot{x} = f(x) + g(x)u_c + d_u, \tag{29}$$
which means that the discontinuous control $u_d$ in (23) can fully counteract the impacts of the matched uncertainties and disturbances.
Notice that in (20), $u_c$ aims not only to suppress the remaining unmatched disturbances on the sliding surface, but also to achieve near-optimal performance for the sliding-mode dynamics (29). This formulation can be seen as a nonlinear $H_\infty$ optimal control problem, which is known to be challenging to solve directly. In the following, we demonstrate how to find an approximate $H_\infty$ optimal control solution by using the single-network ADP algorithm.

4. H∞ Control Design for Sliding-Mode Dynamics

Considering (3) and (29), the sliding-mode dynamics is represented as
$$\dot{x} = f(x) + g(x)u_c + k(x)d \tag{30}$$
with $k(x) = I - g(x)g^+(x)$. Since $g(x)$ and $g^+(x)$ are bounded, the function $k(x)$ is also bounded, i.e., $\|k(x)\| \le k_M$ with $k_M > 0$.
To attenuate the remaining unmatched disturbances $k(x)d$, the corresponding $H_\infty$ control problem of the sliding-mode dynamics is established, which seeks a feedback control $u_c$ that stabilizes the system and achieves an $L_2$-gain no larger than $\gamma$, that is,
$$\int_0^\infty \big(x^TQx + u_c^TRu_c\big)\,dv \le \gamma^2\int_0^\infty d^Td\,dv, \tag{31}$$
where $Q$ and $R$ are positive definite matrices with appropriate dimensions, and $\gamma > 0$ refers to the level of disturbance attenuation. Based on [32,33], by treating the disturbance $d$ as another system input, we can reframe the $H_\infty$ optimal control problem for system (30) as a two-player zero-sum game with the following infinite-horizon cost function:
$$V(x) = \int_t^\infty \big(x^TQx + u_c^TRu_c - \gamma^2d^Td\big)\,dv. \tag{32}$$
Assuming that $V(x) \in C^1$, the Hamiltonian with the associated admissible control pair $(u_c, d)$ is defined as
$$H(x, \nabla V, u_c, d) = x^TQx + u_c^TRu_c - \gamma^2d^Td + (\nabla V)^T\big(f(x) + g(x)u_c + k(x)d\big) \tag{33}$$
with $\nabla V = \partial V(x)/\partial x$. From Bellman's optimality principle, it follows that the optimal cost function $V^*(x)$ satisfies the HJI equation
$$0 = \min_{u_c}\max_{d}H(x, \nabla V^*, u_c, d) \tag{34}$$
with $\nabla V^* = \partial V^*(x)/\partial x$. Moreover, according to zero-sum game theory [16], we have the following Nash condition:
$$\min_{u_c}\max_{d}H(x, \nabla V^*, u_c, d) = \max_{d}\min_{u_c}H(x, \nabla V^*, u_c, d), \tag{35}$$
which ensures the existence of a saddle point $(u_c^*, d^*)$ of the HJI Equation (34). Then, applying the stationarity conditions, one can derive the optimal control $u_c^*$ and the worst-case disturbance $d^*$ as
$$u_c^* = -\frac{1}{2}R^{-1}g^T(x)\nabla V^*, \tag{36}$$
$$d^* = \frac{1}{2\gamma^2}k^T(x)\nabla V^*. \tag{37}$$
By substituting (36) and (37) into (33), the HJI equation associated with $V^*$ becomes
$$0 = x^TQx + (\nabla V^*)^Tf(x) - \frac{1}{4}(\nabla V^*)^Tg(x)R^{-1}g^T(x)\nabla V^* + \frac{1}{4\gamma^2}(\nabla V^*)^Tk(x)k^T(x)\nabla V^*. \tag{38}$$
Due to the highly nonlinear nature of the HJI equation, obtaining its analytical solution is extremely difficult, if not impossible. To overcome this challenge, we propose an online optimal algorithm that learns the solution of the HJI equation and achieves the $H_\infty$ optimal control. This is accomplished through single-network ADP, where only one critic network, implemented by an NN, is adopted to approximate the cost function $V^*$ related to (38). Using a critic NN with $l_c$ neurons, $V^*$ is represented over the set $\Omega$ as follows:
$$V^*(x) = W_c^T\sigma_c(x) + \varepsilon_c(x) \tag{39}$$
with the unknown ideal weight vector $W_c \in \mathbb{R}^{l_c}$, the vector of activation functions $\sigma_c(x) \in \mathbb{R}^{l_c}$ and the reconstruction error $\varepsilon_c(x)$. Meanwhile, we have the gradient vector
$$\nabla V^* = (\nabla\sigma_c)^TW_c + \nabla\varepsilon_c \tag{40}$$
with $\nabla\sigma_c = \partial\sigma_c(x)/\partial x$ and $\nabla\varepsilon_c = \partial\varepsilon_c(x)/\partial x$.
By combining (36), (37) and (40), it is easy to get
$$u_c^* = -\frac{1}{2}R^{-1}g^T(x)\big((\nabla\sigma_c)^TW_c + \nabla\varepsilon_c\big), \tag{41}$$
$$d^* = \frac{1}{2\gamma^2}k^T(x)\big((\nabla\sigma_c)^TW_c + \nabla\varepsilon_c\big). \tag{42}$$
Substituting (41) and (42) into (33), the HJI equation becomes
$$0 = H(x, \nabla V^*, u_c^*, d^*) = x^TQx + W_c^T\nabla\sigma_cf(x) - \frac{1}{4}W_c^T\nabla\sigma_cD(\nabla\sigma_c)^TW_c - \varepsilon_{\mathrm{HJI}}, \tag{43}$$
where $D = g(x)R^{-1}g^T(x) - k(x)k^T(x)/\gamma^2$, and the approximation error $\varepsilon_{\mathrm{HJI}} = -(\nabla\varepsilon_c)^Tf(x) + W_c^T\nabla\sigma_cD\nabla\varepsilon_c/2 + (\nabla\varepsilon_c)^TD\nabla\varepsilon_c/4$ is due to the NN reconstruction error. Furthermore, taking into account $\|k(x)\| \le k_M$ and $\|g(x)\| \le g_M$, we can infer that there exists a positive constant $D_M$ such that $\|D\| \le D_M$.
Because $W_c$ in (39) is unknown, the critic NN with estimated weights approximates the cost function in the form of
$$\hat{V}(x) = \hat{W}_c^T\sigma_c(x), \tag{44}$$
where $\hat{W}_c$ denotes the estimate of $W_c$. In addition, we can obtain
$$\nabla\hat{V} = (\nabla\sigma_c)^T\hat{W}_c. \tag{45}$$
By using (36), (37) and (45), the approximate forms of (41) and (42) are derived as
$$\hat{u}_c = -\frac{1}{2}R^{-1}g^T(x)(\nabla\sigma_c)^T\hat{W}_c, \tag{46}$$
$$\hat{d}_w = \frac{1}{2\gamma^2}k^T(x)(\nabla\sigma_c)^T\hat{W}_c. \tag{47}$$
Then, incorporating (46) and (47) into (43), we have the approximate Hamiltonian as follows:
$$H(x, \hat{W}_c, \hat{u}_c, \hat{d}_w) = x^TQx + \hat{W}_c^T\nabla\sigma_cf(x) - \frac{1}{4}\hat{W}_c^T\nabla\sigma_cD(\nabla\sigma_c)^T\hat{W}_c. \tag{48}$$
Subtracting (43) from (48), the corresponding Hamiltonian error is defined as
$$e_c = H(x, \hat{W}_c, \hat{u}_c, \hat{d}_w) - H(x, \nabla V^*, u_c^*, d^*) = H(x, \hat{W}_c, \hat{u}_c, \hat{d}_w).$$
To effectively approximate the cost function, one needs to adjust the critic NN weights $\hat{W}_c$ so as to minimize the Hamiltonian error $e_c$. To this end, it is common practice to train the critic NN by minimizing the squared residual error $E_c = e_c^Te_c/2$. However, traditional gradient-descent weight updating laws for the critic NN only minimize the squared error; they provide no guarantee for the stability of the resulting system during the learning phase.
In practice, however, stability is a fundamental requirement of the system and a prerequisite for achieving any higher performance. Thus, not just to minimize the residual error, but also to guarantee system stability and eliminate the need for an initial stabilizing control, a weight update law is developed for the critic NN as follows:
$$\dot{\hat{W}}_c = -\alpha\frac{\phi}{(\phi^T\phi + 1)^2}e_c + \frac{\alpha}{4}\nabla\sigma_cD(\nabla\sigma_c)^T\hat{W}_c\frac{\phi_1^T\hat{W}_c}{\phi_s} - \alpha\big(F_2 - F_1\phi_1^T\big)\hat{W}_c + \frac{\beta}{2}\Sigma(x, \hat{u}_c, \hat{d}_w)\nabla\sigma_cD\nabla J_a, \tag{49}$$
where $\alpha$ and $\beta$ are positive updating ratios, $\phi = \nabla\sigma_c\big(f(x) - \frac{1}{2}D(\nabla\sigma_c)^T\hat{W}_c\big)$, $\phi_1 = \phi/(\phi^T\phi + 1)$, $\phi_s = \phi^T\phi + 1$, $F_1$ and $F_2$ represent design parameter matrices with suitable dimensions, $J_a(x)$ is a Lyapunov function candidate provided in Assumption 4, and the index operator $\Sigma(x, \hat{u}_c, \hat{d}_w)$ is given by
$$\Sigma(x, \hat{u}_c, \hat{d}_w) = \begin{cases}0, & \text{if } \dot{J}_a(x) = (\nabla J_a)^T\big(f(x) + g(x)\hat{u}_c + k(x)\hat{d}_w\big) < 0, \\ 1, & \text{otherwise}\end{cases} \tag{50}$$
with $\nabla J_a = \partial J_a(x)/\partial x$.
Remark 2.
Note that in (49), the first term is designed by the normalized gradient descent method for minimizing the residual error. The second term has a well-designed form for ensuring the system's stability, which is derived from the Lyapunov stability analysis. The last term is an additional adjustment term whose activation depends on the index operator $\Sigma(x, \hat{u}_c, \hat{d}_w)$, which is selected based on the derivative of $J_a(x)$ along the sliding-mode dynamics (30), namely, $\dot{J}_a(x) = (\nabla J_a)^T(f(x) + g(x)\hat{u}_c + k(x)\hat{d}_w)$. Once the system dynamics tend to become unstable, i.e., $\dot{J}_a(x) \ge 0$, then $\Sigma(x, \hat{u}_c, \hat{d}_w) = 1$ and the last term in (49) is activated. Moreover, based on the negative gradient direction of $\dot{J}_a(x)$ with respect to the weights, i.e., $-\partial\big((\nabla J_a)^T(f(x) - D(\nabla\sigma_c)^T\hat{W}_c/2)\big)/\partial\hat{W}_c$, the last term is designed to reinforce the training process of the critic NN until the system dynamics become stable. This also eliminates the need for an initial stabilizing control, in contrast with [35,36,37,39], where a stabilizing control is required for initialization; in practical applications, finding an initial stabilizing control is quite challenging.
Remark 3.
Based on [14,15,16], it is necessary to satisfy the persistence of excitation (PE) requirement when updating the weights of the critic NN, which enhances its ability to explore the state space and is indispensable for the weights to converge to their desired values. To fulfill the PE requirement, a probing noise is injected into the control input [15], which may cause instability during the online learning. As a result, it is important to design the last term in (49) to stabilize the resulting system, especially when the probing signal is injected.
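For concreteness, the sketch below shows how the critic update (49)–(50) could be stepped numerically. It is our simplified rendering, not the authors' code: the dynamics handles f, g, k and the feature gradient grad_sigma are assumptions supplied by the user, the shape conventions for F1 (a vector) and F2 (a matrix) are our reading of (49), and J_a(x) = x^T x / 2 is the quadratic candidate used in Section 5.

```python
import numpy as np

def critic_step(W_hat, x, f, g, k, grad_sigma, Q, R, gamma,
                F1, F2, alpha=1.0, beta=0.5, dt=1e-3):
    """One forward-Euler step of the critic weight update law, Eq. (49)-(50)."""
    gs = grad_sigma(x)                                   # (l_c, n) gradient of sigma_c
    Rinv = np.linalg.inv(R)
    D = g(x) @ Rinv @ g(x).T - k(x) @ k(x).T / gamma**2  # D as in Eq. (43)
    u_c = -0.5 * Rinv @ g(x).T @ (gs.T @ W_hat)          # Eq. (46)
    d_w = (0.5 / gamma**2) * k(x).T @ (gs.T @ W_hat)     # Eq. (47)
    phi = gs @ (f(x) - 0.5 * D @ (gs.T @ W_hat))
    phi_s = phi @ phi + 1.0
    phi_1 = phi / phi_s
    # residual of the approximate Hamiltonian, Eq. (48)
    e_c = x @ Q @ x + W_hat @ (gs @ f(x)) \
        - 0.25 * W_hat @ (gs @ D @ (gs.T @ W_hat))
    # index operator, Eq. (50), with J_a(x) = x^T x / 2 so grad J_a = x
    x_dot = f(x) + g(x) @ u_c + k(x) @ d_w
    Sigma = 0.0 if x @ x_dot < 0.0 else 1.0
    # weight update law, Eq. (49)
    W_dot = (-alpha * phi / phi_s**2 * e_c
             + 0.25 * alpha * (gs @ D @ (gs.T @ W_hat)) * (phi_1 @ W_hat) / phi_s
             - alpha * (F2 @ W_hat - F1 * (phi_1 @ W_hat))
             + 0.5 * beta * Sigma * (gs @ D @ x))
    return W_hat + dt * W_dot, u_c

# Usage: iterate critic_step along simulated trajectories (with probing noise
# added to the applied control) until W_hat converges.
```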
The schematic structure of the proposed $H_\infty$ SMC scheme is illustrated in Figure 1. As shown in Figure 1, this structure consists of two main modules: the $H_\infty$ optimal learning module and the enhanced observer module. It should be noted that, based on the deduced sliding-mode dynamics, the learning module can operate independently. However, the original system and the observer module rely on the compound control input $u$, which includes the approximate $H_\infty$ optimal control $\hat{u}_c$ obtained from the learning module. Consequently, it is necessary to first run the learning module to obtain the approximate optimal control $\hat{u}_c$ during the implementation process.
Considering (43), together with $\tilde{W}_c = W_c - \hat{W}_c$, (48) can be represented as
$$e_c = -\tilde{W}_c^T\nabla\sigma_c\Big(f(x) - \frac{1}{2}D(\nabla\sigma_c)^T\hat{W}_c\Big) + \frac{1}{4}\tilde{W}_c^T\nabla\sigma_cD(\nabla\sigma_c)^T\tilde{W}_c + \varepsilon_{\mathrm{HJI}}. \tag{51}$$
By means of the relation $\dot{\tilde{W}}_c = -\dot{\hat{W}}_c$ and incorporating (51) into (49), we obtain
$$\dot{\tilde{W}}_c = -\alpha\frac{\phi_1}{\phi_s}\Big(\tilde{W}_c^T\phi - \frac{1}{4}\tilde{W}_c^T\nabla\sigma_cD(\nabla\sigma_c)^T\tilde{W}_c - \varepsilon_{\mathrm{HJI}}\Big) - \frac{\alpha}{4}\nabla\sigma_cD(\nabla\sigma_c)^T\hat{W}_c\frac{\phi_1^T\hat{W}_c}{\phi_s} + \alpha\big(F_2 - F_1\phi_1^T\big)\hat{W}_c - \frac{\beta}{2}\Sigma(x, \hat{u}_c, \hat{d}_w)\nabla\sigma_cD\nabla J_a. \tag{52}$$
Next, the main stability theorem is presented. Before that, a basic assumption for the critic NN is introduced [16], and another assumption for the sliding-mode dynamics, which has been used in [34,38], is also needed.
Assumption 3.
For the critic NN, there exist known positive constants $\sigma_{cM}$, $\sigma_{dM}$, $\varepsilon_{cM}$, $\varepsilon_{dM}$ and $W_{cM}$ such that $\|\sigma_c(x)\| \le \sigma_{cM}$, $\|\nabla\sigma_c\| \le \sigma_{dM}$, $\|\varepsilon_c(x)\| \le \varepsilon_{cM}$, $\|\nabla\varepsilon_c\| \le \varepsilon_{dM}$ and $\|W_c\| \le W_{cM}$, respectively. Moreover, the approximation error $\varepsilon_{\mathrm{HJI}}$ is bounded above by $\varepsilon_H > 0$, namely, $\|\varepsilon_{\mathrm{HJI}}\| \le \varepsilon_H$.
Assumption 4.
Considering the sliding-mode dynamics (30) with the optimal control pair $(u_c^*, d^*)$ in (36) and (37), let $J_a(x)$ be a smooth, radially unbounded and positive definite Lyapunov candidate satisfying $\dot{J}_a(x) = (\nabla J_a)^T(f(x) + g(x)u_c^* + k(x)d^*) < 0$. Moreover, it is assumed that a positive definite matrix $\Psi(x)$ makes $(\nabla J_a)^T\Psi(x)\nabla J_a = x^TQx + u_c^{*T}Ru_c^* - \gamma^2d^{*T}d^*$ hold. Then, one can derive
$$(\nabla J_a)^T\big(f(x) + g(x)u_c^* + k(x)d^*\big) = -(\nabla J_a)^T\Psi(x)\nabla J_a. \tag{53}$$
Remark 4.
Note that the plausibility of Assumption 4 rests on the boundedness of the optimal sliding-mode dynamics, which is usually assumed to be bounded by a function of the system state $x$; for more details, refer to [34,38]. Furthermore, it is generally impossible to solve (53) directly for the form of $J_a(x)$. Based on [34], one can obtain $J_a(x)$ by selecting an appropriate form, such as a quadratic polynomial.
Theorem 2.
Considering the sliding-mode dynamics (30) and its associated cost function (32), let the control input and disturbance policy be designed by (46) and (47), respectively, along with the critic weight update law (49). Then, both the sliding-mode state $x$ and the weight estimation error $\tilde{W}_c$ are ensured to be UUB. Furthermore, the obtained control input $\hat{u}_c$ converges to a neighborhood of the optimal control $u_c^*$ with a small adjustable bound.
Proof. 
Consider the following Lyapunov function candidate
$$L = \frac{1}{2}\tilde{W}_c^T\alpha^{-1}\tilde{W}_c + \beta_1J_a(x),$$
where $\beta_1 = \beta/\alpha > 0$. By calculating the time derivative of $L$ along the sliding-mode dynamics (30), we have
$$\dot{L} = \tilde{W}_c^T\alpha^{-1}\dot{\tilde{W}}_c + \beta_1(\nabla J_a)^T\big(f(x) + g(x)\hat{u}_c + k(x)\hat{d}_w\big). \tag{54}$$
Substituting (52) into (54) and rearranging, one can get
$$\begin{aligned}\dot{L} ={}& -\tilde{W}_c^T\phi_1\phi_1^T\tilde{W}_c + \beta_1(\nabla J_a)^T\big(f(x) + g(x)\hat{u}_c + k(x)\hat{d}_w\big) + \frac{1}{4}\tilde{W}_c^T\nabla\sigma_cD(\nabla\sigma_c)^TW_c\frac{\phi_1^T\tilde{W}_c}{\phi_s} \\ &- \frac{1}{4}\tilde{W}_c^T\nabla\sigma_cD(\nabla\sigma_c)^TW_c\frac{\phi_1^TW_c}{\phi_s} + \frac{1}{4}\tilde{W}_c^T\nabla\sigma_cD(\nabla\sigma_c)^T\tilde{W}_c\frac{\phi_1^TW_c}{\phi_s} + \tilde{W}_c^T\frac{\phi_1}{\phi_s}\varepsilon_{\mathrm{HJI}} \\ &- \frac{\beta_1}{2}\Sigma(x, \hat{u}_c, \hat{d}_w)\tilde{W}_c^T\nabla\sigma_cD\nabla J_a + \tilde{W}_c^TF_2\hat{W}_c - \tilde{W}_c^TF_1\phi_1^T\hat{W}_c.\end{aligned} \tag{55}$$
Using $\hat{W}_c = W_c - \tilde{W}_c$, the last two terms in (55) become
$$\tilde{W}_c^TF_2\hat{W}_c - \tilde{W}_c^TF_1\phi_1^T\hat{W}_c = \tilde{W}_c^TF_2W_c - \tilde{W}_c^TF_2\tilde{W}_c - \tilde{W}_c^TF_1\phi_1^TW_c + \tilde{W}_c^TF_1\phi_1^T\tilde{W}_c. \tag{56}$$
Defining $\Upsilon = [\tilde{W}_c^T\phi_1, \tilde{W}_c^T]^T$ and substituting (56) into (55), $\dot{L}$ can be rewritten as
$$\dot{L} = -\Upsilon^TM\Upsilon + \Upsilon^T\delta + \beta_1(\nabla J_a)^T\big(f(x) + g(x)\hat{u}_c + k(x)\hat{d}_w\big) - \frac{\beta_1}{2}\Sigma(x, \hat{u}_c, \hat{d}_w)\tilde{W}_c^T\nabla\sigma_cD\nabla J_a, \tag{57}$$
where
$$M = \begin{bmatrix} I & \Big(\dfrac{\nabla\sigma_cD(\nabla\sigma_c)^TW_c}{4\phi_s} - \dfrac{F_1}{2}\Big)^T \\[2mm] \dfrac{\nabla\sigma_cD(\nabla\sigma_c)^TW_c}{4\phi_s} - \dfrac{F_1}{2} & F_2 \end{bmatrix}, \qquad \delta = \begin{bmatrix} \dfrac{1}{\phi_s}\varepsilon_{\mathrm{HJI}} \\[2mm] -\dfrac{\nabla\sigma_cD(\nabla\sigma_c)^TW_c}{4\phi_s} + F_2W_c - F_1\phi_1^TW_c \end{bmatrix}.$$
With Assumption 3 in mind, and recalling the boundedness of $\phi_1$ and $D$, in particular $\|\phi_1\| < 1$ and $\|D\| \le D_M$, we can infer that there exists a positive constant $\delta_M$ such that $\|\delta\| \le \delta_M$. To guarantee $M > 0$, appropriate parameters $F_1$ and $F_2$ need to be selected in the design. Then, one can upper bound $\dot{L}$ as follows:
$$\dot{L} \le -\lambda_{\min}(M)\|\Upsilon\|^2 + \delta_M\|\Upsilon\| + \beta_1(\nabla J_a)^T\big(f(x) + g(x)\hat{u}_c + k(x)\hat{d}_w\big) - \frac{\beta_1}{2}\Sigma(x, \hat{u}_c, \hat{d}_w)\tilde{W}_c^T\nabla\sigma_cD\nabla J_a \tag{58}$$
with $\lambda_{\min}(M)$ being the minimum eigenvalue of $M$.
According to (50), two cases need to be considered for (58) in the following analysis: $\Sigma(x, \hat{u}_c, \hat{d}_w) = 0$ and $\Sigma(x, \hat{u}_c, \hat{d}_w) = 1$.
Case 1: For $\Sigma(x, \hat{u}_c, \hat{d}_w) = 0$, it follows from (50) that $\dot{J}_a(x) < 0$, i.e., $(\nabla J_a)^T\dot{x} < 0$, which, together with the PE condition, ensures the existence of a positive constant $\varrho$ such that $(\nabla J_a)^T\dot{x} < -\varrho\|\nabla J_a\| < 0$. Then, (58) becomes
$$\dot{L} \le \beta_1(\nabla J_a)^T\dot{x} - \lambda_{\min}(M)\|\Upsilon\|^2 + \delta_M\|\Upsilon\| < -\beta_1\varrho\|\nabla J_a\| - \lambda_{\min}(M)\Big(\|\Upsilon\| - \frac{\delta_M}{2\lambda_{\min}(M)}\Big)^2 + \frac{\delta_M^2}{4\lambda_{\min}(M)}. \tag{59}$$
Focusing on (59), $\dot{L} < 0$ holds provided that either
$$\|\nabla J_a\| > \frac{\delta_M^2}{4\beta_1\varrho\lambda_{\min}(M)} \triangleq A_1$$
or
$$\|\Upsilon\| > \frac{\delta_M}{2\lambda_{\min}(M)}.$$
Moreover, based on the relation $\|\Upsilon\| \le \sqrt{\|\phi_1\|^2 + 1}\,\|\tilde{W}_c\| < \sqrt{2}\|\tilde{W}_c\|$ with $\|\phi_1\| < 1$, we can derive
$$\|\tilde{W}_c\| > \frac{\delta_M}{2\sqrt{2}\lambda_{\min}(M)} \triangleq B_1.$$
Case 2: For $\Sigma(x, \hat{u}_c, \hat{d}_w) = 1$, in light of (41) and (42), by adding and subtracting $\frac{\beta_1}{2}(\nabla J_a)^TD\nabla\varepsilon_c$ in (58), we can derive
$$\dot{L} \le -\lambda_{\min}(M)\Big(\|\Upsilon\| - \frac{\delta_M}{2\lambda_{\min}(M)}\Big)^2 + \frac{\delta_M^2}{4\lambda_{\min}(M)} + \beta_1(\nabla J_a)^T\big(f(x) + g(x)u_c^* + k(x)d^*\big) + \frac{\beta_1}{2}(\nabla J_a)^TD\nabla\varepsilon_c. \tag{60}$$
Then, using (53) in Assumption 4, and recalling the boundedness of $D$ and $\nabla\varepsilon_c$, (60) is upper bounded as
$$\dot{L} \le -\lambda_{\min}(M)\Big(\|\Upsilon\| - \frac{\delta_M}{2\lambda_{\min}(M)}\Big)^2 - \frac{\beta_1}{2}\lambda_{\min}(\Psi)\|\nabla J_a\|^2 + \Phi, \tag{61}$$
where $\Phi = \delta_M^2/(4\lambda_{\min}(M)) + \beta_1D_M^2\varepsilon_{dM}^2/(8\lambda_{\min}(\Psi))$ and $\lambda_{\min}(\Psi)$ denotes the minimum eigenvalue of $\Psi(x)$. Hence, provided that either
$$\|\nabla J_a\| > \sqrt{\frac{2\Phi}{\beta_1\lambda_{\min}(\Psi)}} \triangleq A_2$$
or
$$\|\Upsilon\| > \sqrt{\frac{\Phi}{\lambda_{\min}(M)}} + \frac{\delta_M}{2\lambda_{\min}(M)}$$
holds, one has $\dot{L} < 0$. Further, by the relation $\|\Upsilon\| < \sqrt{2}\|\tilde{W}_c\|$, we have
$$\|\tilde{W}_c\| > \sqrt{\frac{\Phi}{2\lambda_{\min}(M)}} + \frac{\delta_M}{2\sqrt{2}\lambda_{\min}(M)} \triangleq B_2.$$
To sum up, for both Case 1 and Case 2, with proper parameters $F_1$ and $F_2$ satisfying $M > 0$, whenever the inequality $\|\nabla J_a\| \ge \max\{A_1, A_2\} \triangleq \bar{A}$ or $\|\tilde{W}_c\| \ge \max\{B_1, B_2\} \triangleq \bar{B}$ holds, we have $\dot{L} < 0$. From the Lyapunov extension theorem [16], it follows that $\|\nabla J_a\|$ and $\|\tilde{W}_c\|$ are ultimately bounded by $\bar{A}$ and $\bar{B}$, respectively. Based on Assumption 4, the Lyapunov candidate $J_a(x)$ is radially unbounded, which implies that the boundedness of $\|\nabla J_a\|$ leads to the boundedness of the system state $x$. In particular, $\|x\|$ is bounded by $\bar{A}_x = \max\{A_{1x}, A_{2x}\}$, where $A_{1x}$ and $A_{2x}$ are determined by $A_1$ and $A_2$, respectively. So far, we can conclude that both $x$ and $\tilde{W}_c$ are guaranteed to be UUB.
Next, we prove that $\hat{u}_c$ converges to a small neighborhood of $u_c^*$ with an adjustable bound, i.e., $\|\hat{u}_c - u_c^*\| \le \epsilon_u$. Considering (41) and (46), we have
$$\hat{u}_c - u_c^* = \frac{1}{2}R^{-1}g^T(x)\big((\nabla\sigma_c)^T\tilde{W}_c + \nabla\varepsilon_c\big).$$
Noticing that $\tilde{W}_c$ is UUB with the associated bound $\bar{B} = \max\{B_1, B_2\}$, and invoking $\|g(x)\| \le g_M$, $\|\nabla\sigma_c\| \le \sigma_{dM}$, $\|\nabla\varepsilon_c\| \le \varepsilon_{dM}$ and the boundedness of $R$, it follows that
$$\|\hat{u}_c - u_c^*\| \le \frac{1}{2}\lambda_{\max}(R^{-1})g_M\big(\sigma_{dM}\bar{B} + \varepsilon_{dM}\big) \triangleq \epsilon_u. \tag{62}$$
 □
Remark 5.
From the expressions of $B_1$ and $B_2$, it is seen that $\bar{B}$ can be kept small when $\lambda_{\min}(M)$ is large enough. In view of (57), we can enlarge $\lambda_{\min}(M)$ by adjusting the corresponding design parameters $F_1$ and $F_2$. Moreover, we can make the approximation error $\nabla\varepsilon_c$ and its upper bound $\varepsilon_{dM}$ sufficiently small when the neuron number $l_c$ is large enough. Therefore, the convergence error $\epsilon_u$ in (62) can be made as small as desired in the design.

5. Simulation Results

To validate the effectiveness of the proposed $H_\infty$ optimal SMC scheme, two simulation examples are provided. The first example focuses on a single-link robot arm, while the second deals with a power system.

5.1. Single-Link Robot Arm

Consider a nonlinear single-link robot arm [23] with dynamics given by
$$J\ddot{\theta} = -MgL\sin(\theta) - D\dot{\theta} + u + w, \tag{63}$$
where $\theta$ is the joint rotation angle of the robot arm in radians, $u$ refers to the control torque applied to the joint in N·m, and $w$ denotes the lumped uncertain term. The system parameters are selected as follows: arm length $L = 0.5$ m, payload mass $M = 1$ kg, local gravitational acceleration $g = 9.81$ m/s², rotational inertia $J = 1$ kg·m² and viscous friction coefficient $D = 2$ N·m·s/rad. With the system states defined as $x_1 = \theta$ and $x_2 = \dot{\theta}$, and considering the presence of exogenous disturbances, the dynamics (63) in state-space form can be represented as
$$\begin{bmatrix}\dot{x}_1 \\ \dot{x}_2\end{bmatrix} = \begin{bmatrix}x_2 \\ -4.905\sin(x_1) - 2x_2\end{bmatrix} + \begin{bmatrix}0 \\ 1\end{bmatrix}(u + w) + d, \tag{64}$$
where $d$ represents the unknown disturbances. Moreover, the initial state is set as $x_0 = [1, 0.5]^T$, the lumped uncertainty term is $w(x, u) = x_2\sin(x_1) + 0.1\sin(x_1)u$, and the disturbance term is chosen as $d = [0.5e^{-t}\sin(t), 0.5\sin(t)]^T$ in the simulation.
The enhanced observer system, consisting of an NN identifier and a nonlinear DO, is designed as in (6), where the identifier NN is a three-layered feedforward NN with one hidden layer containing six neurons and the hyperbolic tangent activation function $\tanh(\cdot)$. The updating ratios are set as $\eta_1 = 30$ and $\eta_2 = 2.5$, while the weights $\hat{W}_o$ and $\hat{V}_o$ are initialized with random values chosen from the interval $[-0.1, 0.1]$. The initial observer state is set as $\hat{x}_0 = [0.5, 0]^T$. Moreover, based on Lemma 1, the Hurwitz matrix $A = \mathrm{diag}(-15, -15)$, $p(x) = [10x_1; 10x_2]$ and $l(x) = \mathrm{diag}(10, 10)$ are selected to ensure that the inequality (10) holds. The integral sliding surface is determined by (21), together with $G(x) = g^+(x) = [0, 1]$ and $S_0(x) = x_2$. Accordingly, the discontinuous SMC law $u_d$ is given by (23) and (24). For the purpose of eliminating the chattering phenomenon, an arctangent function $\mathrm{atan}(s/\epsilon)$ with a small positive scalar $\epsilon = 0.005$ is employed to replace the sign function $\mathrm{sgn}(s)$ in (23).
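To indicate how the pieces fit together in simulation, the compressed closed-loop sketch below runs the robot arm (64) under the compound control (20). It is our illustration: the observer outputs are replaced by idealized estimates for brevity, and the linear surrogate standing in for the learned control $\hat{u}_c$ is hypothetical (the paper computes it from the converged critic weights via (46)).

```python
import numpy as np

# Robot arm dynamics in state-space form, Eq. (64)
f = lambda x: np.array([x[1], -4.905 * np.sin(x[0]) - 2.0 * x[1]])
b = np.array([0.0, 1.0])                      # input vector g(x)
w = lambda x, u: x[1] * np.sin(x[0]) + 0.1 * np.sin(x[0]) * u
d = lambda t: np.array([0.5 * np.exp(-t) * np.sin(t), 0.5 * np.sin(t)])
G = np.array([0.0, 1.0])                      # G(x) = g^+(x), S_0(x) = x_2

# Hypothetical linear stand-in for the learned control u_c_hat of Eq. (46)
u_c_hat = lambda x: -(0.5 * x[0] + 1.0 * x[1])

mu, kappa, eps, dt, T = 0.5, 1.0, 0.005, 1e-3, 15.0
x = np.array([1.0, 0.5]); x0 = x.copy()
integral, zeta = 0.0, 0.0
for k in range(int(T / dt)):
    t = k * dt
    u_c = u_c_hat(x)
    s = x[1] - x0[1] - integral               # integral sliding surface, Eq. (21)
    d_hat, w_hat = d(t), w(x, u_c)            # idealized observer outputs
    # discontinuous control, Eq. (23), with atan smoothing of sgn(s);
    # here G g = 1, so the last term of (23) reduces to sign(s) * zeta
    u_d = -(G @ d_hat + w_hat + mu * np.arctan(s / eps) + np.sign(s) * zeta)
    u = u_d + u_c                             # compound control, Eq. (20)
    # forward-Euler integration of the plant and the auxiliary states
    x = x + dt * (f(x) + b * (u + w(x, u)) + d(t))
    integral += dt * (G @ (f(x) + b * u_c))
    zeta += dt * kappa * abs(s)               # adaptive gain, Eq. (24)
print(x)   # the state should settle near the origin
```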
By considering the SMC law $u_d$, the sliding-mode dynamics can be obtained as
$$\dot{x} = f(x) + g(x)u_c + k(x)d, \tag{65}$$
where $k(x) = I - g(x)g^+(x) = \mathrm{diag}(1, 0)$. We choose the associated cost function in the form of (32), together with $Q = \mathrm{diag}(1, 1)$, $R = 1$ and $\gamma = 1.5$. For the critic NN, the activation function is chosen as $\sigma_c(x) = [x_1^2, x_1x_2, x_2^2, x_1^3x_2, x_1^2x_2^2, x_1x_2^3]^T$, which results in $\hat{W}_c = [\hat{W}_{c1}, \hat{W}_{c2}, \ldots, \hat{W}_{c6}]^T$. We select the updating ratios $\alpha = 1$, $\beta = 0.5$, the design parameters $F_1 = F_2 = 10I$, $l_c = 6$, and $J_a(x)$ as a quadratic polynomial. Furthermore, the weight vector $\hat{W}_c$ is initialized to zero, which leads to an initial control input of zero. Noticing that this zero initial control cannot stabilize the system (65), it is clear that no initial stabilizing control is necessary when implementing the proposed algorithm.
During the learning process, a damped decreasing probing noise is injected into the control input to satisfy the PE condition. This noise comprises sinusoids of diverse frequencies and is applied for the first 450 s. Figure 2 shows the trajectories of the critic weights, which eventually converge to $\hat{W}_c = [1.0420, 0.0856, 0.0603, 0.2174, 0.2948, 0.0358]^T$. Figure 3 depicts the trajectories of the system states during learning. From Figure 3, one can see that, without an initial stabilizing control, the system states stay at or near zero after the probing noise is removed, which indicates that $\hat{u}_c$ generated by the learning module effectively stabilizes the system. With the converged weights, the approximate $H_\infty$ optimal control $\hat{u}_c$ can be calculated by (46).
Next, we substitute $\hat{u}_c$ into (21) to obtain an available sliding surface. Subsequently, integrating with the enhanced observer system, the SMC law $u_d$ is implemented by using (23) and (24) with the reliable estimates of uncertainties and disturbances. Figure 4 depicts the estimates of the disturbances $d_1 = 0.5e^{-t}\sin(t)$ and $d_2 = 0.5\sin(t)$, along with the small estimation errors. Figure 5 presents the identification of the system states by the identifier NN. It can be observed that the identified states rapidly track the real states, illustrating the effectiveness and efficiency of the identifier NN. Note that the valid estimates $\hat{d}$ and $\hat{W}_o$ are used to design the SMC law $u_d$, which helps to reduce the sliding-mode gain and alleviate the chattering phenomenon. Figure 6 displays the state trajectories of the robot arm under the compound $H_\infty$ sliding-mode control $u = u_d + \hat{u}_c$. Figure 7 depicts the compound control $u$, while the $H_\infty$ control $\hat{u}_c$ and the SMC law $u_d$ are given in Figure 8. The results presented in Figure 6, Figure 7 and Figure 8 confirm that the compound control $u$ successfully renders the robot arm system stable and exhibits satisfactory performance against both system uncertainties and external disturbances.

5.2. Power Plant System

To further validate the effectiveness of the proposed scheme, we consider an electric power system comprising a gas turbine generator, a system load, and an automatic generation control [34]. To model this system, the incremental frequency deviation $\Delta f_G$, the generator output power variation $\Delta P_m$, and the valve position change of the governor $\Delta v$ are taken into consideration. The control input is represented by the speed-changer position deviation $\Delta P_c$. By defining the state vector $x = [\Delta v, \Delta P_m, \Delta f_G]^T \in \mathbb{R}^3$, we can express the reduced power system model in state-space form as
$$\dot{x} = \begin{bmatrix} -\dfrac{1}{T_g} & 0 & -\dfrac{1}{R_gT_g} \\ \dfrac{K_t}{T_t} & -\dfrac{1}{T_t} & 0 \\ 0 & \dfrac{K_p}{T_p} & -\dfrac{1}{T_p} \end{bmatrix}x + \begin{bmatrix}\dfrac{1}{T_g} \\ 0 \\ 0\end{bmatrix}(u + \vartheta) + d, \tag{66}$$
where $g(x) = [1/T_g, 0, 0]^T$, $\vartheta$ represents the modeling uncertainty, and $d$ stands for the exterior disturbances. Assume that the uncertain term is $\vartheta = x_2\sin(x_1)$ and the disturbance term is $d(t) = [\sin(2\pi t)e^{-t}, 0, 0.2\sin^2(t)e^{-t}]^T$ in the simulation. Let the regulation constant $R_g = 2.5$ Hz/MW, the turbine gain constant $K_t = 1$ and the generator gain constant $K_p = 120$ Hz/MW. Moreover, the corresponding time constants are set as $T_g = 0.08$ s, $T_t = 0.1$ s and $T_p = 20$ s, respectively.
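As a quick numerical check (our addition, not from the paper), the plant matrices of (66) can be formed from these constants, and the left pseudoinverse reproduces the projection quantities quoted below:

```python
import numpy as np

Tg, Tt, Tp, Rg, Kt, Kp = 0.08, 0.1, 20.0, 2.5, 1.0, 120.0
A_sys = np.array([[-1/Tg,   0.0,  -1/(Rg*Tg)],
                  [ Kt/Tt, -1/Tt,  0.0      ],
                  [ 0.0,    Kp/Tp, -1/Tp    ]])
print(np.linalg.eigvals(A_sys))      # open-loop modes of the reduced model
g_vec = np.array([[1/Tg], [0.0], [0.0]])
g_plus = np.linalg.inv(g_vec.T @ g_vec) @ g_vec.T
print(g_plus)                        # [[0.08, 0., 0.]]  -> G(x) = g^+(x)
print(np.eye(3) - g_vec @ g_plus)    # diag(0, 1, 1)     -> k(x)
```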
For estimating the unknown uncertainty and disturbance terms, the enhanced observer system is constructed as in (6) with a three-layered feedforward NN containing eight hidden neurons and the Hurwitz matrix $A = \mathrm{diag}(-12, -12, -12)$. The activation function, the initial weights, and the updating ratios are the same as in Section 5.1. Let $p(x) = [10x_1, 0, 10x_3]^T$, $l(x) = \mathrm{diag}(10, 0, 10)$, $G(x) = g^+(x) = [0.08, 0, 0]$ and $S_0(x) = 0.08x_1$. Similarly, the arctangent function $\mathrm{atan}(s/\epsilon)$ is used in place of the sign function $\mathrm{sgn}(s)$ when implementing the SMC law $u_d$.
Without the matched uncertainties and disturbances, we can derive the sliding-mode dynamics from (66), wherein $k(x) = \mathrm{diag}(0, 1, 1)$ and the initial state is $x_0 = [0.2, 0.2, 0.1]^T$. Let the associated cost function be of the form (32) with $Q = \mathrm{diag}(1, 1, 1)$, $R = 1$ and $\gamma = 3$. The critic NN is designed as in (44), and its parameters are $\alpha = 15$, $\beta = 0.5$, $\sigma_c(x) = [x_1^2, x_1x_2, x_1x_3, x_2^2, x_2x_3, x_3^2, x_1^2x_2x_3, x_1x_2^2x_3, x_1x_2x_3^2]^T$ and $\hat{W}_c = [\hat{W}_{c1}, \hat{W}_{c2}, \ldots, \hat{W}_{c9}]^T$. Similar to Section 5.1, $J_a(x) = x^Tx/2$, the initial weight vector is set to zero, and a similar probing noise is injected into the control input for the first 550 s. The evolving trajectories of the critic weights are shown in Figure 9, while the trajectories of the system states during learning are depicted in Figure 10. After 550 s, the critic weights converge to $\hat{W}_c = [0.0830, 0.1245, 0.2284, 0.1616, 0.4883, 0.5488, 0.1154, 0.0563, 0.0564]^T$; we can then derive $\hat{u}_c$ using (46) with the converged weights.
Then, we substitute $\hat{u}_c$ into the integral sliding surface (21) and design the SMC law $u_d$ by (23) and (24). Consequently, the compound control is constructed as $u = u_d + \hat{u}_c$. Figure 11 shows the trajectories of the power system states under this compound control over 15 s, and Figure 12 presents the compound control $u$. From Figure 11 and Figure 12, we can conclude that the compound control effectively stabilizes the system states to the equilibrium point, even in the presence of modeling uncertainties and exterior disturbances. These results clearly demonstrate the viability and efficiency of the proposed approach.

6. Conclusions

In this paper, we have developed a neural adaptive $H_\infty$ sliding-mode control scheme for uncertain nonlinear systems subject to external disturbances. Based on the enhanced observer system composed of the NN identifier and the nonlinear DO, an integral SMC is designed to suppress the influences of the uncertain term and the matched disturbance component, as well as unknown approximation errors, with no prior knowledge of their upper bounds. Meanwhile, on the sliding surface, the remaining unmatched disturbances are attenuated using the $H_\infty$ optimal control solved by the single critic network-based ADP algorithm. Furthermore, the uniform ultimate boundedness stability of the resultant closed-loop system is proven by Lyapunov's method. In addition to the theoretical analysis, two simulation examples are provided to further validate the proposed approach. Recently, the growing interest in saving communication resources and reducing the computational load of networked control systems has drawn increasing attention to the event-triggering mechanism, which is undergoing rapid development. Hence, how to combine the optimal SMC strategy with the event-triggering mechanism for more complex physical systems, not just control-affine ones, will be our future research topic.

Author Contributions

Y.H. contributed to the conception and design of this study, performed the experiment and the data analysis, and wrote the manuscript; Z.Z. contributed to the results discussion and the writing of the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Advanced Talents Incubation Program of Hebei University under Grant 521100221049, in part by Hebei Province Higher Education Science and Technology Research Project of China under Grant CXY2023009, in part by Hebei University Research and Innovation Team Project under Grant IT202306, and in part by Baoding Science and Technology Plan Project under Grant 2372P010.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors can confirm that all relevant data are included in the article.

Conflicts of Interest

The authors declare that they have no conflict of interest. All authors have approved the manuscript and agreed with submission to this journal.

References

  1. Ioannou, P.; Sun, J. Robust Adaptive Control; Prentice Hall: Upper Saddle River, NJ, USA, 1996. [Google Scholar]
  2. Utkin, V.; Guldner, J.; Shi, J. Sliding Mode Control in Electro-Mechanical Systems; CRC Press: Boca Raton, FL, USA, 2017. [Google Scholar]
  3. Yu, X.; Kaynak, O. Sliding-mode control with soft computing: A survey. IEEE Trans. Ind. Electron. 2009, 56, 3275–3285. [Google Scholar]
  4. Xu, J.; Guo, Z.; Tong, H. Design and implementation of integral sliding-mode control on an underactuated two-wheeled mobile robot. IEEE Trans. Ind. Electron. 2014, 61, 3671–3681. [Google Scholar] [CrossRef]
  5. Chen, L.; Edwards, C.; Alwi, H. Integral sliding mode fault-tolerant control allocation for a class of affine nonlinear system. Int. J. Robust Nonlinear 2019, 29, 565–582. [Google Scholar] [CrossRef]
  6. Pan, Y.; Yang, C.; Pan, L.; Yu, H. Integral sliding mode control: Performance, modification, and improvement. IEEE Trans. Ind. Inform. 2017, 14, 3087–3096. [Google Scholar] [CrossRef]
  7. Errouissi, R.; Ouhrouche, M.; Chen, W.; Trzynadlowski, A. Robust nonlinear predictive controller for permanent-magnet synchronous motors with an optimized cost function. IEEE Trans. Ind. Electron. 2012, 59, 2849–2858. [Google Scholar] [CrossRef]
  8. Huang, J.; Ri, S.; Fukuda, T.; Wang, Y. A disturbance observer based sliding mode control for a class of underactuated robotic system with mismatched uncertainties. IEEE Trans. Autom. Control 2019, 64, 2480–2487. [Google Scholar] [CrossRef]
  9. Cui, R.; Chen, L.; Yang, C.; Chen, M. Extended state observer-based integral sliding mode control for an underwater robot with unknown disturbances and uncertain nonlinearities. IEEE Trans. Ind. Electron. 2017, 64, 6785–6795. [Google Scholar] [CrossRef]
  10. Wang, Y.; Xie, X.; Chadli, M.; Xie, S.; Peng, Y. Sliding-mode control of fuzzy singularly perturbed descriptor systems. IEEE Trans. Fuzzy Syst. 2020, 29, 2349–2360. [Google Scholar] [CrossRef]
  11. Chen, M.; Chen, W. Sliding mode control for a class of uncertain nonlinear system based on disturbance observer. Int. J. Adapt. Control Signal Process 2010, 24, 51–64. [Google Scholar] [CrossRef]
  12. Rubagotti, M.; Estrada, A.; Castanos, F.; Ferrara, A. Integral sliding mode control for nonlinear systems with matched and unmatched perturbations. IEEE Trans. Autom. Control 2011, 56, 2699–2704. [Google Scholar] [CrossRef]
  13. Castanos, F.; Fridman, L. Analysis and design of integral sliding manifolds for systems with unmatched perturbations. IEEE Trans. Autom. Control 2006, 51, 853–858. [Google Scholar] [CrossRef]
  14. Kiumarsi, B.; Vamvoudakis, K.G.; Modares, H.; Lewis, F.L. Optimal and autonomous control using reinforcement learning: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2042–2062. [Google Scholar] [CrossRef] [PubMed]
  15. Liu, D.; Wei, Q.; Wang, D.; Yang, X.; Li, H. Adaptive Dynamic Programming with Applications in Optimal Control; Springer: Cham, Switzerland, 2017. [Google Scholar]
  16. Lewis, F.L.; Liu, D. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control; Wiley: Hoboken, NJ, USA, 2013. [Google Scholar]
  17. Ha, M.; Wang, D.; Liu, D. Discounted iterative adaptive critic designs with novel stability analysis for tracking control. IEEE/CAA J. Autom. Sin. 2022, 9, 1262–1272. [Google Scholar] [CrossRef]
  18. Wei, Q.; Lewis, F.L.; Liu, D.; Song, R.; Lin, H. Discrete-time local value iteration adaptive dynamic programming: Convergence analysis. IEEE Trans. Syst. Man Cybern. Syst. 2018, 48, 875–891. [Google Scholar] [CrossRef]
  19. Wei, Q.; Wang, L.; Lu, J.; Wang, F.Y. Discrete-Time Self-Learning Parallel Control. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 192–204. [Google Scholar] [CrossRef]
  20. Heydari, A.; Balakrishnan, S. Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 145–157. [Google Scholar] [CrossRef] [PubMed]
  21. Lu, J.; Wei, Q.; Wang, Z.; Zhou, T.; Wang, F. Event-triggered optimal control for discrete-time multi-player non-zero-sum games using parallel control. Inf. Sci. 2022, 584, 519–535. [Google Scholar] [CrossRef]
  22. Wang, D.; Ren, J.; Ha, M. Discounted linear Q-learning control with novel tracking cost and its stability. Inf. Sci. 2023, 626, 339–353. [Google Scholar] [CrossRef]
  23. Zhang, X.; Ni, Z.; He, H. A theoretical foundation of goal representation heuristic dynamic programming. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 2513–2525. [Google Scholar] [CrossRef]
  24. Huang, Y.; Wang, D.; Liu, D. Bounded robust control design for uncertain nonlinear systems using single-network adaptive dynamic programming. Neurocomputing 2017, 266, 128–140. [Google Scholar] [CrossRef]
  25. Yang, X.; Wei, Q. Adaptive critic designs for optimal event-driven control of a CSTR system. IEEE Trans. Ind. Inform. 2021, 17, 484–493. [Google Scholar] [CrossRef]
  26. Yang, X.; He, H.; Zhong, X. Approximate dynamic programming for nonlinear-constrained optimizations. IEEE Trans. Cybern. 2021, 51, 2419–2432. [Google Scholar] [CrossRef] [PubMed]
  27. Wen, G.; Niu, B. Optimized tracking control based on reinforcement learning for a class of high-order unknown nonlinear dynamic systems. Inf. Sci. 2022, 606, 368–379. [Google Scholar] [CrossRef]
  28. Wang, D.; Qiao, J.; Cheng, L. An approximate neuro-optimal solution of discounted guaranteed cost control design. IEEE Trans. Cybern. 2022, 52, 77–86. [Google Scholar] [CrossRef] [PubMed]
  29. Liu, D.; Xue, S.; Zhao, B.; Luo, B.; Wei, Q. Adaptive dynamic programming for control: A survey and recent advances. IEEE Trans. Syst. Man Cybern. Syst. 2020, 51, 142–160. [Google Scholar] [CrossRef]
  30. Wang, D.; Ha, M.; Zhao, M. The intelligent critic framework for advanced optimal control. Artif. Intell. Rev. 2022, 55, 1–22. [Google Scholar] [CrossRef]
  31. Modares, H.; Lewis, F.L. Optimal tracking control of nonlinear partially-unknown constrained input systems using integral reinforcement learning. Automatica 2014, 50, 1780–1792. [Google Scholar] [CrossRef]
  32. Luo, B.; Wu, H.; Huang, T. Off-policy reinforcement learning for H∞ control design. IEEE Trans. Cybern. 2014, 45, 65–76. [Google Scholar] [CrossRef]
  33. Modares, H.; Lewis, F.L.; Jiang, Z. H∞ tracking control of completely unknown continuous-time systems via off-policy reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2550–2562. [Google Scholar] [CrossRef]
  34. Wang, D.; He, H.; Liu, D. Adaptive critic nonlinear robust control: A survey. IEEE Trans. Cybern. 2017, 47, 3429–3451. [Google Scholar] [CrossRef]
  35. Mitra, A.; Behera, L. Continuous-time single network adaptive critic based optimal sliding mode control for nonlinear control affine systems. In Proceedings of the 34th Chinese Control Conference, HangZhou, China, 28–30 July 2015; pp. 3300–3306. [Google Scholar]
  36. Fan, Q.; Yang, G. Adaptive actor-critic design-based integral sliding-mode control for partially unknown nonlinear systems with input disturbances. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 165–177. [Google Scholar] [CrossRef] [PubMed]
  37. Qu, Q.; Zhang, H.; Yu, R.; Liu, Y. Neural network-based H∞ sliding mode control for nonlinear systems with actuator faults and unmatched disturbances. Neurocomputing 2018, 275, 2009–2018. [Google Scholar] [CrossRef]
  38. Zhang, H.; Qu, Q.; Xiao, G.; Cui, Y. Optimal guaranteed cost sliding mode control for constrained-input nonlinear systems with matched and unmatched disturbances. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2112–2126. [Google Scholar] [CrossRef] [PubMed]
  39. Yang, D.; Li, T.; Xie, X.; Zhang, H. Event-triggered integral sliding-mode control for nonlinear constrained-input systems with disturbances via adaptive dynamic programming. IEEE Trans. Syst. Man Cybern. Syst. 2019, 50, 4086–4096. [Google Scholar] [CrossRef]
Figure 1. The schematic of the adaptive H∞ SMC scheme.
Figure 2. Trajectories of the critic NN weights.
Figure 3. Trajectories of system states in the learning.
Figure 4. (a) Real disturbance $d_1$ and its estimation $\hat{d}_1$; (b) real disturbance $d_2$ and its estimation $\hat{d}_2$.
Figure 5. (a) Real state $x_1$ and identified state $\hat{x}_1$; (b) real state $x_2$ and identified state $\hat{x}_2$.
Figure 6. State trajectories of the robotic arm.
Figure 7. The compound control u.
Figure 8. (a) The H∞ optimal control $\hat{u}_c$; (b) the SMC law $u_d$.
Figure 9. Trajectories of the critic NN weights.
Figure 10. Trajectories of system states in the learning.
Figure 11. Trajectories of the electric power system.
Figure 12. The compound optimal control.