Optimal Robust Control of Nonlinear Systems with Unknown Dynamics via NN Learning with Relaxed Excitation

This paper presents an adaptive learning structure based on neural networks (NNs) to solve the optimal robust control problem for nonlinear continuous-time systems with unknown dynamics and disturbances. First, a system identifier is introduced to approximate the unknown system matrices and disturbances with the help of NNs and parameter estimation techniques. To obtain the solution of the optimal robust control problem, a critic learning control structure is proposed to compute the approximate controller. Unlike existing identifier-critic NN learning control methods, novel adaptive tuning laws based on Kreisselmeier's regressor extension and mixing technique are designed to estimate the unknown parameters of the two NNs under relaxed persistence of excitation conditions. Furthermore, a theoretical analysis is given to prove the significant relaxation of the proposed convergence conditions. Finally, the effectiveness of the proposed learning approach is demonstrated via a simulation study.


Introduction
In the past several decades, much attention has been given to H∞ control problems, wherein the aim is to eliminate the influence of disturbances on the system. H∞ control mainly focuses on designing a robust controller to regulate and stabilize the system. In practice, we should not only focus on the control performance but also consider the optimization of the system [1,2]. Therefore, optimal H∞ control remains an active research topic.
Adaptive dynamic programming (ADP), as one of the optimal control methods, has emerged as a powerful tool for dealing with the optimal control problems of all kinds of dynamic systems [3]. The ADP framework combines dynamic programming and neural network approximation, and it has strong learning and adaptation abilities. In this sense, ADP has developed rapidly in the control community in recent years. Generally speaking, the core of controller design concentrates on solving a Hamilton-Jacobi-Bellman (HJB) equation for nonlinear systems or an algebraic Riccati equation for linear systems [4]. Unfortunately, the HJB equation is a nonlinear partial differential equation, which is difficult to solve directly [5]. Therefore, many efforts have been made to find approximate solutions to the HJB equation using iterative or learning methods. Regarding iterative methods, ADP can be classified into two categories: value iteration (VI) [6,7] and policy iteration (PI) [8,9]. Regarding learning-based methods, neural network (NN) approximation is generally utilized to learn the optimal or suboptimal solutions to the HJB equation; the standard learning frameworks include actor-critic NNs and critic-only NNs. However, the abovementioned works require partial or full model information in the controller design loop. To avoid relying on system models, many data-driven or model-free methods have been developed to improve the existing ADP frameworks, such as data-driven reinforcement learning (RL) [7], integral RL (IRL) [10,11], and system identification-based ADP methods [12-14].
More recently, excellent progress has been made in the use of ADP for the robust controller design of optimal H∞ control problems [15-17]. The main way to solve optimal H∞ control problems is to model them as a two-player zero-sum game (a min-max optimization problem), where the controller and the disturbance are viewed as players: the controller seeks to minimize the performance index function under the worst-case disturbance [18,19]. However, a disadvantage of zero-sum games lies in establishing the existence of the saddle point, which is generally difficult to verify. To overcome this issue, an indirect method motivated by [20] was developed by formulating an optimal regulation problem for a nominal system with a newly designed cost/value function [21]. For instance, Yang et al. proposed an event-triggered robust control strategy for nonlinear systems [22] using the indirect method. Xue et al. studied a tracking control problem for partially unknown continuous-time systems with uncertainties and constraints [23] by transforming the robust control problem into an optimal regulation problem for nominal systems.
However, the existing results on H∞ optimal control designs have two main limitations: (1) many controller designs rest on the assumption that complete or partial knowledge of the system dynamics is available in advance; (2) although system identification methods, such as identifier-critic- or identifier-actor-critic-based designs of H∞ optimal control, have been proposed to address this issue, they generally require that the persistence of excitation (PE) condition be satisfied to ensure the learning performance of the neural network weight updates, and this condition is difficult to check online in practice [18,19,23]. Therefore, how to weaken the PE condition is also a research motivation of this paper.
Motivated by the above observations and considerations, in this paper we propose a novel online parameter estimation method based on an identifier-critic learning control framework for the H∞ optimal control of nonlinear systems with unknown dynamics under relaxed PE conditions. The contributions of our work can be summarized as follows:
• A new online identifier-critic learning control framework with a relaxed PE condition is proposed to address robust control for unknown continuous-time systems subject to unknown disturbances. To reconstruct the information of the system dynamics, neural networks combined with the linear regressor method are established to approximate the unknown system dynamics and disturbances.

•
The approach in this paper differs from the existing weight adaptation laws [18,19,23], where the PE condition is needed to ensure the learning performance of the NN weight parameters. Such a condition is difficult to check online, and the usual way to satisfy it is to add external noise to the controller, which may destabilize the system. To overcome this issue, a Kreisselmeier regressor extension and mixing (KREM)-based weight adaptation law is designed for the identifier-critic NNs with new convergence conditions.

•
The weak PE properties of the new convergence conditions are rigorously analyzed and compared to the traditional PE conditions. Moreover, the theoretical results show that the closed-loop system's stability and the convergence of the identifier-critic learning are guaranteed.
The remainder of this article is organized as follows. In Section 2, some preliminaries are introduced and the optimal robust control problem of nonlinear continuous-time systems is given. Then, a system identifier design with a relaxed PE condition is constructed in Section 3. Section 4 gives the critic NN design for robust control under a relaxed PE condition. Theoretical analyses of the weak PE properties under the new convergence conditions and of the stability of the closed-loop system are given in Section 5. The simulation results are provided in Section 6. Some conclusions are summarized in Section 7.

Preliminaries and Problem Formulation
In this section, some notation and definitions are first introduced. Then, the optimal robust control problem for nonlinear continuous-time systems is described.

Preliminaries
To facilitate readability, some notations are listed.

λ(·)  Eigenvalue of a matrix
{·}*  Adjoint (adjugate) matrix

The following definition will be used in the sequel.

Definition 1 (PE [24]). A bounded signal ψ(t) is said to be PE if there exist positive constants T and δ₁ such that ∫_t^{t+T} ψ(r) ψᵀ(r) dr ≥ δ₁ I.
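Definition 1 can be checked numerically on a sampled signal by scanning the windowed Gram integral over all window starts. The sketch below is illustrative only (the two test signals and the Python/NumPy implementation are not from the paper): a rotating regressor satisfies PE, while a decaying one does not.

```python
import numpy as np

def pe_level(psi, t_grid, T):
    """Smallest eigenvalue of int_t^{t+T} psi(r) psi(r)^T dr over all
    window starts t in t_grid (Riemann-sum approximation)."""
    dt = t_grid[1] - t_grid[0]
    n_win = int(T / dt)
    vals = np.array([psi(s) for s in t_grid])            # shape (N, d)
    worst = np.inf
    for k in range(len(t_grid) - n_win):
        gram = vals[k:k + n_win].T @ vals[k:k + n_win] * dt
        worst = min(worst, np.linalg.eigvalsh(gram)[0])  # eigvalsh: ascending order
    return worst

t = np.arange(0.0, 20.0, 0.01)
pe_sig = lambda s: np.array([np.sin(s), np.cos(s)])        # direction keeps rotating
fading = lambda s: np.array([np.exp(-s), 2 * np.exp(-s)])  # fixed direction, decaying energy

print(pe_level(pe_sig, t, T=2 * np.pi))   # bounded below (approx. pi): PE holds
print(pe_level(fading, t, T=2 * np.pi))   # essentially zero: PE fails
```

For the sinusoidal pair, any δ₁ up to about π works over a full-period window; the fading signal admits no positive δ₁, since its Gram integral is rank deficient and vanishing.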

Problem Formulation
Consider the nonlinear continuous-time (NCT) system with disturbances described by the following dynamics:

ẋ(t) = f(x) + g(x)u(t) + G(x)d(t),  (1)

where x(t) ∈ Rⁿ and u(t) ∈ Rᵐ denote the system state and control input, respectively, and d(t) ∈ R^q represents the external disturbance. The terms f(x) ∈ Rⁿ, g(x) ∈ R^{n×m}, and G(x) ∈ R^{n×q} are the drift dynamics, input dynamics, and disturbance injection dynamics, respectively. In this study, f(x), g(x), and G(x) are assumed to be unknown. Furthermore, it is assumed that f(x), g(x), and G(x) are Lipschitz continuous with f(0) = 0, and that the system (1) is stabilizable and controllable. The goal of this study is to solve an H∞ control problem for the system (1). This problem can be equivalently transformed into a two-player zero-sum game, where the control input u(t) acts as the minimizing player and the disturbance d(t) acts as the maximizing player. The solution to the H∞ control problem corresponds to a saddle point of the game, which stabilizes the equilibrium of the closed-loop system.
Define the infinite-horizon performance index function as

V(x(t)) = ∫_t^∞ ( xᵀ(τ) Q x(τ) + uᵀ(τ) R u(τ) − κ² dᵀ(τ) d(τ) ) dτ,  (2)

where κ > 0, V(0) = 0, and Q and R are symmetric positive-definite matrices with appropriate dimensions. Let u⋆ be the optimal control input and d⋆ be the worst-case disturbance.
Our objective is to find the saddle point (u⋆, d⋆) that optimizes the performance index (2), which can be more precisely clarified by the following inequality:

V(u⋆, d) ≤ V(u⋆, d⋆) ≤ V(u, d⋆).  (3)

We then define the optimal performance index function V⋆ as

V⋆(x) = min_u max_d V(x).  (4)

The Hamiltonian of system (1) can be written as

H(x, u, d, V_x) = V_xᵀ( f(x) + g(x)u + G(x)d ) + xᵀQx + uᵀRu − κ² dᵀd,  (5)

where V_x = ∂V/∂x ∈ Rⁿ. The Hamilton-Jacobi-Isaacs (HJI) equation related to this game has the form

min_u max_d H(x, u, d, V⋆_x) = 0.  (6)

Based on the stationarity conditions ∂H/∂u = 0 and ∂H/∂d = 0, the H∞ control pair (u⋆, d⋆) for (1) has the following form:

u⋆ = −(1/2) R⁻¹ gᵀ(x) V⋆_x,  (7)
d⋆ = (1/(2κ²)) Gᵀ(x) V⋆_x.  (8)

Thus, substituting (7) and (8) into (6), the HJI Equation (6) can be rewritten as

0 = xᵀQx + V⋆_xᵀ f(x) − (1/4) V⋆_xᵀ g(x) R⁻¹ gᵀ(x) V⋆_x + (1/(4κ²)) V⋆_xᵀ G(x) Gᵀ(x) V⋆_x.  (9)

Indeed, the HJI Equation (9) is a highly nonlinear partial differential equation (PDE) and requires complete system information for its resolution. To address these challenges, a new identifier-critic (IC) framework with relaxed PE conditions will be proposed in the following sections. Furthermore, new adaptive update laws for the identifier and critic NNs are provided with the help of the KREM technique. The block diagram of the proposed control system is shown in Figure 1, and a detailed theoretical analysis will be presented in subsequent sections.

System Identifier Design with Relaxed PE Condition
In this section, an NN-based identifier is utilized to reconstruct the unknown system dynamics in (1). The KREM technique is introduced to adjust the identifier weights under relaxed PE conditions. We assume that the unknown system dynamics f(x), g(x), and G(x) in (1) are continuous functions defined on compact sets. The NN-based identifier is designed as

f(x) = W_fᵀ θ_f(x) + ϵ_f,  g(x) = W_gᵀ θ_g(x) + ϵ_g,  G(x) = W_Gᵀ θ_G(x) + ϵ_G,  (10)-(12)

where W_f, W_g, and W_G are the unknown ideal weight matrices; θ_f(x), θ_g(x), and θ_G(x) are the basis functions; and ϵ_f ∈ Rⁿ, ϵ_g ∈ R^{n×m}, and ϵ_G ∈ R^{n×q} are the reconstruction errors. Then, according to the Weierstrass theorem and the statements in [10], the approximation errors ϵ_f, ϵ_g, and ϵ_G can be shown to approach zero as the numbers of NN neurons d_f, d_g, and d_G increase to infinity. Before proceeding, it is essential to establish the following underlying assumption.

Assumption 1.
(1) The basis functions θ_f(x), θ_g(x), and θ_G(x) are bounded on the compact set. (2) The reconstruction errors ϵ_f, ϵ_g, and ϵ_G are bounded.

Using (10)-(12), the system (1) can be rewritten as

ẋ = W_Iᵀ θ_I(x, u) + ϵ_T,  (13)

where W_I ∈ R^{d×n} stacks the unknown ideal weights, θ_I(x, u) ∈ R^d stacks the corresponding basis functions, and ϵ_T collects the reconstruction errors. Note that ẋ and W_I are unknown. Therefore, we define the filtered variables x_f and θ_If as

ρ ẋ_f + x_f = x,  ρ θ̇_If + θ_If = θ_I,  x_f(0) = 0,  θ_If(0) = 0,  (14)

where ρ ∈ R_{>0} is the filter coefficient. From Equations (13) and (14), we can deduce that

ẋ_f = (x − x_f)/ρ = W_Iᵀ θ_If + ϵ_Tf,  (15)

where ϵ_Tf denotes the filtered version of ϵ_T, that is, ρ ϵ̇_Tf + ϵ_Tf = ϵ_T. Clearly, (15) is a linear regression equation (LRE), where ẋ_f and θ_If can be calculated from (14). In the following, we describe how the KREM technique is applied to estimate W_I using the measured information ẋ_f and θ_If. To approximate the unknown weights W_I in (15) such that the estimated weights Ŵ_I converge to their true values under a relaxed PE condition, we construct an extended LRE (E-LRE) based on (15). We define the matrices P_I ∈ R^{d×d} and Q_I ∈ R^{d×n} as

Ṗ_I = −l_I P_I + θ_If θ_Ifᵀ,  Q̇_I = −l_I Q_I + θ_If ẋ_fᵀ,  P_I(0) = 0,  Q_I(0) = 0,  (16)

where l_I > 0 is a forgetting factor. From (16), we can derive the solution

P_I(t) = ∫₀ᵗ e^{−l_I(t−τ)} θ_If(τ) θ_Ifᵀ(τ) dτ,  Q_I(t) = ∫₀ᵗ e^{−l_I(t−τ)} θ_If(τ) ẋ_fᵀ(τ) dτ.  (17)

Note that it can be verified that P_I and Q_I are bounded for any given bounded θ_I and x due to the appropriate choice of l_I. Thus, an E-LRE is obtained:

Q_I = P_I W_I + v_I,  (18)

where v_I = ∫₀ᵗ e^{−l_I(t−τ)} θ_If(τ) ϵ_Tfᵀ(τ) dτ. To construct identifier weight error dynamics with better convergence properties, we define the variables Q̄_I(t) ∈ R^{d×n}, P̄_I ∈ R^{d×d}, and V_I ∈ R^{d×n} as

Q̄_I = P_I* Q_I,  P̄_I = P_I* P_I,  V_I = P_I* v_I,  (19)

where P_I* denotes the adjoint matrix of P_I. Then, Equation (18) becomes

Q̄_I = P̄_I W_I + V_I.  (20)

Note that for any square matrix M ∈ R^{q×q}, we have M*M = |M| I_q, even if M is not full rank. Thus, P̄_I = |P_I| I_d ∈ R^{d×d}. Moreover, since P̄_I is a scalar diagonal matrix, (20) can be decoupled into a series of scalar LREs:

Q̄_I(i,j) = |P_I| W_I(i,j) + V_I(i,j),  (21)

where Q̄_I(i,j) and W_I(i,j) indicate the entry in the ith row and jth column of Q̄_I and W_I, respectively. Then, the estimation algorithm for the unknown identifier NN weights can be designed based on (21) as

Ŵ̇_I(i,j) = −γ₁ |P_I| ( |P_I| Ŵ_I(i,j) − Q̄_I(i,j) ),  (22)

where γ₁ ∈ R_{>0} is the adaptive learning gain. The convergence of the identifier (22) is given as follows.
Theorem 1. Consider the system (13) with the online update law (22). If |P_I| ∈ PE, then (i) for ϵ_T = 0, the estimation error W̃_I(i,j) converges to zero exponentially; (ii) for ϵ_T ≠ 0, the estimation error W̃_I(i,j) converges to a compact set around zero. Proof.
Define the estimation error W̃_I(i,j) = Ŵ_I(i,j) − W_I(i,j), i = 1, ..., d, j = 1, ..., n. From (21) and (22), the identifier weight error dynamics can be obtained as

W̃̇_I(i,j) = −γ₁ |P_I|² W̃_I(i,j) + γ₁ |P_I| V_I(i,j).  (23)

Consider the Lyapunov function L_I = (1/(2γ₁)) W̃²_I(i,j), whose time derivative along (23) is

L̇_I = −|P_I|² W̃²_I(i,j) + |P_I| V_I(i,j) W̃_I(i,j).  (24)

When ϵ_T = 0, we have V_I(i,j) = 0, and since |P_I| ∈ PE, (24) can be rewritten as L̇_I ≤ −µ_I L_I, where µ_I = 2γ₁ δ_I > 0 and δ_I is the lower bound guaranteed by the PE condition on |P_I|. According to the Lyapunov theorem, the weight estimation error W̃_I(i,j) converges to zero exponentially. When ϵ_T ≠ 0, according to Assumption 1, |P_I| V_I(i,j) is bounded, with bound denoted b_{P_I V_I}. According to the extended Lyapunov theorem, the estimation error W̃_I(i,j) uniformly ultimately converges to the compact set { W̃_I(i,j) : |W̃_I(i,j)| ≤ b_{P_I V_I}/p_I² }, where p_I is the lower bound of |P_I| under the PE condition.
Remark 1. In [12], the update law for the unknown weight W_I was designed based on (18). In contrast to the approach proposed here, the standard PE condition required there cannot be assessed directly online [18,19,23].
Based on the above analysis, the unknown information f(x), g(x), and G(x) can be estimated using (13) and (22), which allows the completely unknown system dynamics to be reconstructed. To obtain the optimal H∞ control pair, a critic NN is introduced in the subsequent section to learn the solution of the HJI equation.
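Before moving to the critic design, the identifier pipeline of this section, namely forgetting-factor Gram integrals (16)-(17), adjugate "mixing" (19)-(20), and decoupled scalar gradient updates (22), can be sketched numerically. The sketch below is a minimal illustration on a noise-free toy LRE; the regressor, true weights, and gain values are assumptions for demonstration, not taken from the paper.

```python
import numpy as np

d, n = 3, 1
W_true = np.array([[1.0], [-2.0], [0.5]])                    # illustrative "unknown" weights
theta = lambda s: np.array([np.sin(s), np.cos(2 * s), 1.0])  # illustrative bounded regressor

dt, T_end = 1e-3, 30.0
l_I, gamma_1 = 1.0, 500.0                 # forgetting factor and learning gain (assumed values)
P = np.zeros((d, d))                      # filtered Gram matrix, cf. Eq. (16)
Q = np.zeros((d, n))
W_hat = np.zeros((d, n))

for k in range(int(T_end / dt)):
    th = theta(k * dt).reshape(d, 1)
    y = W_true.T @ th                     # noise-free measurement of the LRE output
    # Kreisselmeier extension: exponentially forgetting integrals, cf. (16)-(17)
    P += dt * (-l_I * P + th @ th.T)
    Q += dt * (-l_I * Q + th @ y.T)
    # Mixing: multiply by the adjugate so that adj(P) P = |P| I, cf. (19)-(20)
    delta = np.linalg.det(P)
    P_adj = delta * np.linalg.inv(P) if abs(delta) > 1e-12 else np.zeros((d, d))
    Q_bar = P_adj @ Q
    # Decoupled scalar gradient updates in the spirit of (22)
    W_hat += dt * (-gamma_1 * delta * (delta * W_hat - Q_bar))

print(np.round(W_hat.ravel(), 3))   # -> [ 1.  -2.   0.5]
```

Note that each weight component is driven by the same scalar |P|, so no matrix inversion of a possibly ill-conditioned regressor is needed in the update itself; the adjugate is used only to form the mixed measurement.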

Critic NN Design for H ∞ Control under Relaxed PE Condition
In this section, the performance index is approximated via a critic NN to obtain the optimal H∞ control pair. The KREM algorithm is again utilized to design the update law of the critic NN under the relaxed PE condition. First, based on the above identifier, the system (1) can be represented in terms of the estimated weights, where Ŵ_f, Ŵ_g, and Ŵ_G are the estimated values of W_f, W_g, and W_G, respectively, and ϵ_I = W̃_Iᵀ θ_I denotes the identifier error (28). The Hamiltonian (5) can then be written in terms of the identified dynamics (29), and the HJI Equation (6) becomes its identified counterpart (30). Therefore, based on (30), the H∞ control pair (u⋆, d⋆) for the estimated system (28) can be expressed accordingly. Since the HJI Equation (30) is a nonlinear PDE, similar to (6), we utilize a critic NN to estimate V⋆(x) and its gradient V⋆_x(x) as follows:

V⋆(x) = W_cᵀ θ_c(x) + ϵ_v,  (32)
V⋆_x(x) = ∇θ_cᵀ(x) W_c + ∇ϵ_v,  (33)

where W_c ∈ R^l is the unknown constant weight vector, θ_c(x) ∈ R^l represents the independent basis functions with ∇θ_c(x) = ∂θ_c/∂x, and l is the number of neurons. The approximation error is denoted by ϵ_v with ∇ϵ_v = ∂ϵ_v/∂x. Note that as the number of independent basis functions increases, both the approximation error and its gradient can approach zero. Before proceeding, the following assumption is needed.
Assumption 2. (1) The basis functions θ_c(x) and their gradients ∇θ_c(x) are bounded. (2) The approximator reconstruction error ϵ_v and its gradients ∇ϵ_v are bounded.

Since the ideal critic NN weights W_c are unknown, take Ŵ_c as the estimated value of W_c and V̂ as the estimated value of V, where the practical critic NN is given by

V̂(x) = Ŵ_cᵀ θ_c(x).  (34)

The estimated H∞ control pair û and d̂ can be obtained as

û = −(1/2) R⁻¹ ĝᵀ(x) ∇θ_cᵀ(x) Ŵ_c,  (36)
d̂ = (1/(2κ²)) Ĝᵀ(x) ∇θ_cᵀ(x) Ŵ_c.  (37)

To estimate the unknown weights of the critic NN online using the KREM technique, a linear equation is constructed according to (30) and (34),

Ξ = Θᵀ W_c + ϵ_HJI,  (39)

where Θ ∈ R^l is a measurable regressor built from ∇θ_c and the identified dynamics, Ξ collects the known terms of the approximated HJI equation, and ϵ_HJI is the residual error. Similar to the previous section, we define the filtered regressor matrix P_c ∈ R^{l×l} and the vector Q_c ∈ R^l as

Ṗ_c = −l_c P_c + Θ Θᵀ,  Q̇_c = −l_c Q_c + Θ Ξ,  P_c(0) = 0,  Q_c(0) = 0,  (40)

where l_c > 0 is the forgetting factor. Then, the solution of (40) can be deduced as

P_c(t) = ∫₀ᵗ e^{−l_c(t−τ)} Θ(τ) Θᵀ(τ) dτ,  Q_c(t) = ∫₀ᵗ e^{−l_c(t−τ)} Θ(τ) Ξ(τ) dτ.  (41)

From (39) and (41), an E-LRE related to P_c and Q_c is obtained:

Q_c = P_c W_c + v_c,  (42)

where v_c = ∫₀ᵗ e^{−l_c(t−τ)} Θ(τ) ϵ_HJI(τ) dτ is bounded. To estimate the unknown parameter W_c in (42) under a relaxed PE condition, define the variables Q̄_c(t) ∈ R^l, P̄_c ∈ R^{l×l}, and V_c = P_c* v_c, where P_c* is the adjoint matrix of P_c, Q̄_c = P_c* Q_c, and P̄_c = P_c* P_c. Then, Equation (42) becomes

Q̄_c = P̄_c W_c + V_c.  (44)

Note that P̄_c = |P_c| I_l. Since P̄_c is a scalar matrix, a series of scalar LREs is obtained as

Q̄_c(i) = |P_c| W_c(i) + V_c(i),  (45)

where Q̄_c(i), W_c(i), and V_c(i) indicate the ith rows of Q̄_c, W_c, and V_c, respectively. Driven by the parameter error based on (45), the update law for the unknown critic weight W_c(i) is designed as

Ŵ̇_c(i) = −γ₂ |P_c| ( |P_c| Ŵ_c(i) − Q̄_c(i) ),  (46)

where γ₂ ∈ R_{>0} is the adaptive learning gain. The convergence condition for the proposed critic NN adaptive law is provided in Theorem 2.
Theorem 2. Consider the adaptive law (46) of the critic NN with the regressor matrix P_c in (44). If |P_c| ∈ PE, then (i) for ϵ_HJI = 0, the estimation error W̃_c(i) converges to zero exponentially; (ii) for ϵ_HJI ≠ 0, the estimation error W̃_c(i) converges to a compact set around zero.

Proof. Define the estimation error W̃_c(i) = Ŵ_c(i) − W_c(i), i = 1, ..., l. The proof of Theorem 1 can be extended to establish similar results in the current context; note that the Lyapunov function V_c here is chosen as 0.5 γ₂⁻¹ W̃²_c(i).
Remark 2. According to Theorem 2, a new convergence condition for the estimation error of the critic neural network weights, denoted as W̃_c, is provided. This condition does not rely on the conventional persistent excitation (PE) condition, i.e., Θ ∈ PE. In this paper, no additional exploration signal is required to guarantee Θ ∈ PE. Instead, the satisfaction of |P_c| ∈ PE can be achieved by adjusting the forgetting factor l_c. It is worth noting that the new convergence condition is associated with the matrix P_c, and it can be verified online by calculating the determinant of P_c. The proof of the weak PE property of the new convergence condition is presented in the following section.
Remark 3. The convergence analyses of W̃_I(i,j) and W̃_c(i) are provided in Theorems 1 and 2, respectively. In fact, the convergence of the full matrices W̃_I and W̃_c follows from simple matrix operations, which are omitted in this paper.
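Once the critic weights have converged, evaluating the control pair (36)-(37) is a direct computation. The sketch below assumes, purely for illustration, a quadratic critic basis θ_c(x) = [x₁², x₁x₂, x₂²]ᵀ (consistent with a three-dimensional W_c such as the one in Section 6) together with placeholder input and disturbance dynamics ĝ and Ĝ; none of these specific choices are asserted by the paper.

```python
import numpy as np

def grad_theta_c(x):
    """Jacobian of the assumed basis theta_c(x) = [x1^2, x1*x2, x2^2], shape (3, 2)."""
    x1, x2 = x
    return np.array([[2 * x1, 0.0],
                     [x2,     x1],
                     [0.0, 2 * x2]])

def control_pair(x, W_c, g_hat, G_hat, R, kappa):
    Vx = grad_theta_c(x).T @ W_c                   # estimated gradient of V, cf. (33)
    u = -0.5 * np.linalg.inv(R) @ g_hat(x).T @ Vx  # cf. Eq. (36)
    d = (1.0 / (2 * kappa**2)) * G_hat(x).T @ Vx   # cf. Eq. (37)
    return u, d

W_c = np.array([0.5, 0.0, 1.0])             # ideal critic weights reported in Section 6
g_hat = lambda x: np.array([[0.0], [1.0]])  # placeholder identified input dynamics
G_hat = lambda x: np.array([[0.0], [1.0]])  # placeholder identified disturbance dynamics
u, d = control_pair(np.array([3.0, -1.0]), W_c, g_hat, G_hat, R=np.eye(1), kappa=1.0)
print(u, d)   # [1.] [-1.]
```

With these weights the estimated value function is V̂ = 0.5x₁² + x₂², so V̂_x = [x₁, 2x₂]ᵀ, and the printed pair follows directly from (36)-(37).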
Thus far, the identifier-critic learning-based framework for H∞ optimal control under the relaxed PE condition has been given. For clarity, the design details of the proposed method are shown in Algorithm 1, which can be considered the pseudocode for the simulation part.
Algorithm 1 Identifier-critic learning-based H∞ optimal control algorithm
1: Initialization
2: Initialize system parameters: x(0), Q, R, and running time T;
3: Set the identifier and critic filter operators: H_I and H_c;
4: Set the basis functions of the identifier and critic NNs: θ_I(x, u) and θ_c(x);
5: Initialize and set the filter operator parameters: ρ, l_I, l_c, x_f(0) = 0, θ_If(0) = 0, and ϵ_If(0) = 0;
6: Initialize the identifier NN parameters: γ₁ > 0 and the initial weights Ŵ_I;
while t ≤ T do
  Calculate the filter processing of the identifier NNs by (14);
  Calculate the dynamic regressor extension (DRE) of the identifier NNs by (15);
  Calculate the regressor "mixing" of the identifier NNs by (18);
  Update the weight parameters of the identifier NNs Ŵ_I(i,j) by (20);
  Compute the approximated HJB equation by (39);
  Update the weight parameters of the critic NNs Ŵ_c(i) by (46);
  Update the control pair by (36) and (37);
19: Update the system states x by (28);
20: end while

Stability and Convergence Analysis
In this section, we present the main results, which include the theoretical analysis of weak PE properties under new convergence conditions proposed in Theorems 1 and 2. Furthermore, we provide a stability result for the closed-loop system under the proposed online learning optimal control method.
To facilitate the analysis, the following assumption is made.

Weak PE Properties of New Convergence Conditions
As shown in Theorem 1, Theorem 2, and Remark 3, the convergence of W̃_I and W̃_c is established without the restrictive PE conditions, i.e., θ_I ∈ PE and Θ ∈ PE. These new convergence conditions can easily be checked online, as mentioned in Remarks 1 and 2. Furthermore, we now analyze, from a theoretical standpoint, the superiority of the new convergence conditions over the conventional PE conditions.

Theorem 3. Consider the system (13) with the online identifier NN adaptive law (22) and the critic NN adaptive law (46). (i) The convergence condition |P_I| ∈ PE of Theorem 1 is weaker than θ_I ∈ PE, in the precise sense that θ_I(t) ∈ PE ⇒ |P_I(t)| ∈ PE (47); (ii) the convergence condition |P_c| ∈ PE of Theorem 2 is weaker than Θ ∈ PE, in the precise sense that Θ(t) ∈ PE ⇒ |P_c(t)| ∈ PE (48).

Proof. For (i), suppose that θ_I(t) in (13) is PE, which implies θ_If(t) ∈ PE [25]. From Definition 1, we have ∫_{t−τ}^{t} θ_If(r) θ_Ifᵀ(r) dr ≥ δ_I I for some τ, δ_I > 0. Moreover, since e^{−β_I(t−r)} ≥ e^{−β_I τ} > 0 for r ∈ [t − τ, t], the following inequality holds: ∫_{t−τ}^{t} e^{−β_I(t−r)} θ_If(r) θ_Ifᵀ(r) dr ≥ e^{−β_I τ} δ_I I. Furthermore, for t > τ > 0, the solution (17) dominates this windowed integral. From (17) and the above inequalities, we conclude that P_I(t) ≥ e^{−β_I τ} δ_I I. Hence, the matrix P_I in (16) is positive definite, that is, λ_i(P_I) > 0, i = 1, ..., d. Considering that the determinant of a matrix equals the product of all its eigenvalues, that is, |P_I| = λ_1(P_I) λ_2(P_I) ··· λ_d(P_I), we obtain that λ_i(P_I), and hence |P_I|, is bounded away from zero. Thus, (47) is true.
For (ii), the proof of (48) follows the same process as in (i). This completes the proof.
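The mechanism behind (47) can also be illustrated numerically: for a PE regressor, the forgetting-factor matrix P(t) becomes and stays positive definite, so its determinant, being the product of its eigenvalues, is itself bounded away from zero after a transient. The signal, forgetting factor, and step size below are illustrative choices, not values from the paper.

```python
import numpy as np

l_c = 0.5                                  # illustrative forgetting factor
dt = 1e-3
theta = lambda s: np.array([np.sin(s), np.cos(s)])   # a PE regressor (cf. Definition 1)

P = np.zeros((2, 2))
dets = []
for k in range(int(30.0 / dt)):
    th = theta(k * dt).reshape(2, 1)
    P += dt * (-l_c * P + th @ th.T)       # filtered Gram matrix, as in (16)/(40)
    dets.append(np.linalg.det(P))          # |P| = product of the eigenvalues of P

# After the filter transient, |P(t)| stays bounded away from zero.
print(min(dets[10000:]), max(dets))
```

Since P(t) is an exponentially weighted Gram integral, it is positive semidefinite at every instant, and PE of the regressor pushes all of its eigenvalues above a positive floor, exactly the argument used in the proof above.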

Stability and Convergence Analysis
The stability result for the closed-loop system under the proposed online learning optimal control method is presented in the following theorem.

Theorem 4. Let Assumptions 1 and 2 hold. Consider system (1) with the identifier weight tuning law (22), the H∞ control pair computed by (36) and (37), and the critic NN weight tuning law (46). If |P_I| ∈ PE and |P_c| ∈ PE, then the closed-loop system state, the system identifier estimation error W̃_I, and the critic estimation error W̃_c are uniformly ultimately bounded (UUB). Moreover, the approximate H∞ control pair given by (36) and (37) is close to the optimal control pair (u⋆, d⋆) within small bounds b_u and b_d, that is, ∥û − u⋆∥ ≤ b_u and ∥d̂ − d⋆∥ ≤ b_d.

Proof. We consider a composite Lyapunov function built from the system state, the identifier error, and the critic error terms, weighted by positive constants γ₃, γ₄, γ₅, and γ₆.
By applying matrix operations, using Definition 1, (19), (43), and Young's inequality ab ≤ a²η/2 + b²/(2η) with η > 0, the individual terms can be bounded, where ∥P_I*∥ ≤ b_{P_I*} and ∥P_c*∥ ≤ b_{P_c*}. For J₃ and J₄, the terms involving Ŵ_c can be rewritten using a bounded variable b_ω associated with Ŵ_c. Recall that V_I = P_I* v_I with v̇_I = −l_I v_I + θ_If ϵ_Tfᵀ, and that v̇_c = −l_c v_c + Θ ϵ_HJI; hence, the last term J₆ of (56) can be bounded using bounded variables b_ϖ1 and b_ϖ2 associated with Ŵ_gᵀ θ_g and Ŵ_Gᵀ θ_G, respectively. Consequently, substituting (58), (59), and (61)-(63) into (56) yields (64). Choosing the parameters γ₃, γ₄, γ₅, γ₆, and η to fulfill suitable conditions, (64) can be further written in the form of (65), where k₁, k₂, k₃, k₄, k₅, and b_γ are positive constants, which implies that the NN weight estimation errors W̃_I and W̃_c and the system state x are all UUB.
Lastly, the errors between the proposed H∞ control pair and the ideal one satisfy ∥û − u⋆∥ ≤ b_u and ∥d̂ − d⋆∥ ≤ b_d, where b_u > 0 and b_d > 0 are constants determined by the identifier NN estimation error W̃_I and the critic NN estimation error W̃_c. This proves that the approximate H∞ control pair converges to a set around the optimal solution. This completes the proof.

Numerical Simulation
This section aims to verify the effectiveness of the proposed KREM-based IC learning approach for optimal robust control. We consider an NCT system of the form (1) taken from [12]. The regressor θ_I(x, u) of the identifier NN is chosen accordingly, with the unknown identifier weight matrix W_I to be estimated. The activation function in (33) for the critic NN is selected accordingly, and the ideal critic NN weights are W_c = [0.5, 0, 1]ᵀ. In this numerical example, the remaining parameters are set as follows: the initial system states are x₁(0) = 3 and x₂(0) = −1; Q = I₂ and R = 1; the filter and learning parameters are ρ = 0.001, l_I = 0.1, l_c = 20, γ₁ = 800, and γ₂ = 200 diag{0.3, 1, 1}. It is important to note that in this simulation there is no need to add noise to the control input u(t) to ensure the PE condition, which is often necessary in many existing ADP-based control methods to guarantee θ_I(t) ∈ PE and Θ(t) ∈ PE.
For comparison, we consider the Kreisselmeier regressor extension (KRE)-based identifier-critic network framework [12] for the system (66). Figures 2 and 3 display the convergence of the identifier NN weights and the critic NN weights, respectively, under our KREM-based optimal robust control method and the KRE-based control method [12]. As illustrated in Figure 2, the KREM-based ADP method proposed in this paper exhibits faster convergence than the KRE-based ADP method. Furthermore, it demonstrates element-wise monotonicity, thus preventing oscillations and peaking in the learning curves. The trajectories of the approximate control input û and the estimated disturbance d̂ are presented in Figures 4 and 5, respectively. By applying the optimal H∞ control pair, the system states are stabilized, as depicted in Figure 6.
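The element-wise monotonicity observation can be reproduced on a toy static LRE: a KRE-style gradient law on the E-LRE (18) couples the weight components through the matrix P, while the KREM law acts on each component through the scalar |P|, giving uniform, monotone decay. The matrix P, the true weights, and the gains below are illustrative, and the KRE law shown is the standard gradient law on (18), used here only as a stand-in for the scheme of [12].

```python
import numpy as np

P = np.array([[2.0, 1.5],
              [1.5, 2.0]])                 # illustrative ill-conditioned regressor matrix
W_true = np.array([1.0, 0.0])
Q = P @ W_true                             # noise-free E-LRE:  Q = P W_true, cf. (18)

dt, gamma = 1e-3, 1.0
delta = np.linalg.det(P)                   # |P| = 1.75
Q_bar = delta * np.linalg.inv(P) @ Q       # adj(P) Q = |P| W_true  (the "mixing" step)

W_kre = np.zeros(2)
W_krem = np.zeros(2)
kre_w2 = []                                # trajectory of the second KRE weight
for _ in range(20000):                     # 20 s of Euler integration
    W_kre += dt * (-gamma * (P @ W_kre - Q))                    # coupled KRE-style law
    W_krem += dt * (-gamma * delta * (delta * W_krem - Q_bar))  # decoupled KREM law
    kre_w2.append(W_kre[1])

print(np.round(W_kre, 3), np.round(W_krem, 3))  # both settle near [1. 0.]
print(round(max(kre_w2), 2))   # ≈ 0.31: KRE transient peaking; the KREM weight stays at 0
```

The second true weight is zero, yet the coupled KRE estimate overshoots it substantially before settling, whereas the decoupled KREM estimate never leaves zero, which mirrors the peaking-free learning curves reported in Figure 2.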

Conclusions
This paper has presented a novel adaptive learning approach using neural networks (NNs) to address the problem of optimal robust control for nonlinear continuous-time systems with unknown dynamics. The approach employs a system identifier that utilizes NNs and parameter estimation techniques to approximate the unknown system matrices and disturbances. Additionally, a critic NN learning structure is introduced to obtain an approximate controller for the optimal control problem. Unlike existing identifier-critic NN learning control methods, this approach incorporates adaptive tuning laws based on a regressor extension and mixing technique, which facilitate the learning of the unknown parameters of the two NNs under relaxed persistence of excitation conditions. The convergence conditions of the proposed approach have been theoretically analyzed. Finally, the effectiveness of the proposed learning control approach was validated via a simulation study.

Figure 1. Schematic of the proposed control system.

15: Calculate the dynamic regressor extension (DRE) of the critic NNs by (40);
16: Calculate the regressor "mixing" of the critic NNs by (42);

In Theorem 4, b_u and b_d are positive constants.
, while the PE condition (i.e., θ_I ∈ PE) was required to ensure convergence. However, satisfying the PE condition is generally challenging. In Theorem 1, we provide a new convergence condition, |P_I| ∈ PE. Notably, this new condition is significantly superior to the conventional PE condition for two reasons. (1) We theoretically prove that |P_I| ∈ PE is much weaker than θ_I ∈ PE, as detailed in Section 5. (2) |P_I| is directly related to the determinant of the matrix P_I(t); therefore, checking |P_I| ∈ PE online becomes feasible by calculating the determinant of P_I(t).
and critic NN adaptive law (46): (i) the convergence condition of the estimation error W̃_I in Theorem 1, that is, |P_I| ∈ PE, is weaker than θ_I ∈ PE in the following precise sense: θ_I(t) ∈ PE ⇒ |P_I| ∈ PE; (ii) the convergence condition of the estimation error W̃_c in Theorem 2, that is, |P_c| ∈ PE, is weaker than Θ ∈ PE in the following precise sense: Θ(t) ∈ PE ⇒ |P_c| ∈ PE.