Article

Optimal Robust Control of Nonlinear Systems with Unknown Dynamics via NN Learning with Relaxed Excitation

Rui Luo, Zhinan Peng and Jiangping Hu
1 School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 Institute of Electronic and Information Engineering, University of Electronic Science and Technology of China, Dongguan 523808, China
3 Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou 313001, China
* Author to whom correspondence should be addressed.
Entropy 2024, 26(1), 72; https://doi.org/10.3390/e26010072
Submission received: 11 December 2023 / Revised: 4 January 2024 / Accepted: 10 January 2024 / Published: 14 January 2024
(This article belongs to the Special Issue Intelligent Modeling and Control)

Abstract

This paper presents an adaptive learning structure based on neural networks (NNs) to solve the optimal robust control problem for nonlinear continuous-time systems with unknown dynamics and disturbances. First, a system identifier is introduced to approximate the unknown system matrices and disturbances with the help of NNs and parameter estimation techniques. To obtain the solution of the optimal robust control problem, a critic learning control structure is proposed to compute the approximate controller. Unlike existing identifier–critic NN learning control methods, novel adaptive tuning laws based on Kreisselmeier's regressor extension and mixing technique are designed to estimate the unknown parameters of the two NNs under relaxed persistence of excitation conditions. Furthermore, a theoretical analysis is given to prove that the proposed convergence conditions are significantly weaker than the classical excitation requirements. Finally, the effectiveness of the proposed learning approach is demonstrated via a simulation study.

1. Introduction

Over the past several decades, much attention has been devoted to H∞ control problems, whose aim is to suppress the influence of disturbances on the system. H∞ control focuses on designing a robust controller that regulates and stabilizes the system. In practice, we should not only focus on the control performance but also consider the optimization of the system [1,2]. Optimal H∞ control problems therefore remain an active research topic.
Adaptive dynamic programming (ADP), as one of the optimal control methods, has emerged as a powerful tool for dealing with the optimal control problems of all kinds of dynamic systems [3]. The ADP framework combines dynamic programming with neural network approximation, and it has strong learning and adaptive abilities. Consequently, ADP has developed rapidly in the control community in recent years. Generally speaking, the core of the controller design is to solve a Hamilton–Jacobi–Bellman (HJB) equation for nonlinear systems or an algebraic Riccati equation for linear systems [4]. Unfortunately, the HJB equation is a nonlinear partial differential equation that is difficult to solve directly [5]. Therefore, many efforts have been made to find approximate solutions of the HJB equation using iterative or learning methods. Among the iterative methods, ADP can be classified into two categories: value iteration (VI) [6,7] and policy iteration (PI) [8,9]. Among the learning-based methods, neural network (NN) approximation is generally utilized to learn the optimal or suboptimal solutions of the HJB equation, with actor–critic NNs and critic-only NNs being the standard learning frameworks. However, the abovementioned works require partial or full model information in the controller design loop. To avoid relying on system models, many data-driven or model-free methods have been developed to improve the existing ADP frameworks, such as data-driven reinforcement learning (RL) [7], integral RL (IRL) [10,11], and system identification-based ADP methods [12,13,14].
More recently, considerable progress has been made in applying ADP to the robust controller design of optimal H∞ control problems [15,16,17]. The main approach is to model such problems as a two-player zero-sum game (a min–max optimization problem), where the controller and the disturbance are viewed as players: the controller seeks to minimize the performance index function under the worst-case disturbance [18,19]. However, a drawback of the zero-sum game formulation is that the existence of the saddle point is generally difficult to verify. To overcome this issue, an indirect method motivated by [20] was developed by formulating an optimal regulation problem for a nominal system with a newly designed cost/value function [21]. For instance, Yang et al. proposed an event-triggered robust control strategy for nonlinear systems [22] using the indirect method. Xue et al. studied a tracking control problem for partially unknown continuous-time systems with uncertainties and constraints [23] by transforming the robust control problem into an optimal regulation problem for nominal systems.
However, the existing results on H∞ optimal control design have two main limitations: (1) most controller designs assume that complete or partial knowledge of the system dynamics is available in advance; and (2) although system identification methods, such as identifier–critic or identifier–actor–critic designs, have been proposed to address this issue, they generally require the persistence of excitation (PE) condition to ensure the learning performance of the NN weight updates, and this condition is difficult to check online in practice [18,19,23]. Therefore, weakening the PE condition is a central motivation of this paper.
Motivated by the above observations, in this paper we propose a novel online parameter estimation method based on an identifier–critic learning control framework for the H∞ optimal control of nonlinear systems with unknown dynamics under relaxed PE conditions. The contributions of our work can be summarized as follows:
  • A new online identifier–critic learning control framework with a relaxed PE condition is proposed to address robust control for unknown continuous-time systems subject to unknown disturbances. To reconstruct the system dynamics, neural networks combined with a linear regressor method are established to approximate the unknown system dynamics and disturbances.
  • The approach in this paper differs from the existing weight adaptation laws [18,19,23], where the PE condition is needed to ensure the learning performance of the NN weight parameters. Such a condition is difficult to check online, and a common way to satisfy it is to add external noise to the controller, which may destabilize the system. To overcome this issue, a Kreisselmeier regressor extension and mixing (KREM)-based weight adaptation law is designed for the identifier–critic NNs with new convergence conditions.
  • The weak PE properties of the new convergence conditions are analyzed rigorously and compared with the traditional PE condition. Moreover, the theoretical results show that the stability of the closed-loop system and the convergence of the identifier–critic learning are guaranteed.
The remainder of this article is organized as follows. In Section 2, some preliminaries are introduced and the optimal robust control problem for nonlinear continuous-time systems is formulated. A system identifier design with a relaxed PE condition is constructed in Section 3. Section 4 gives the critic NN design for robust control under a relaxed PE condition. Theoretical analyses of the weak PE properties under the new convergence conditions and of the stability of the closed-loop system are given in Section 5. Simulation results are provided in Section 6, and conclusions are summarized in Section 7.

2. Preliminaries and Problem Formulation

In this section, some notation and definitions are first introduced. Then, the optimal robust control problem for nonlinear continuous-time systems is described.

2.1. Preliminaries

To facilitate readability, some notation is listed below.
$\lambda(\cdot)$: eigenvalue of a matrix
$\{\cdot\}^*$: adjugate (classical adjoint) matrix
$I_n$: $n \times n$ identity matrix
$tr(\cdot)$: trace of a matrix
$\lambda_M(\cdot)$: maximum eigenvalue of a matrix
$\lambda_m(\cdot)$: minimum eigenvalue of a matrix
The following definitions will be used in the sequel.
Definition 1
(Persistence of Excitation [24]). A bounded signal $\psi(t)$ is said to be PE if there exist positive constants $T$ and $\delta_1$ such that
$\int_t^{t+T} \psi(r) \psi^T(r) \, dr \ge \delta_1 I.$
For clarity, we write $\psi(t) \in PE$ when $\psi(t)$ satisfies the PE condition; otherwise, $\psi(t) \notin PE$.
Definition 2
(Uniformly Ultimately Bounded [24]). A time function $x(t)$ is said to be uniformly ultimately bounded (UUB) on a compact set $\Omega_x$ if, for all $x(t_0) = x_0 \in \Omega_x$, there exist a $\delta_2 > 0$ and a number $T(\delta_2, x_0)$ such that $\|x(t)\| < \delta_2$ for all $t \ge t_0 + T$.

2.2. Problem Formulation

Consider the nonlinear continuous-time (NCT) system with disturbances described by the following dynamics:
$\dot{x}(t) = f(x) + g(x)u(t) + G(x)d(t),$  (1)
where $x(t) \in \mathbb{R}^n$ and $u(t) \in \mathbb{R}^m$ denote the system state and control input, respectively, and $d(t) \in \mathbb{R}^q$ represents the external disturbance. The terms $f(x) \in \mathbb{R}^n$, $g(x) \in \mathbb{R}^{n \times m}$, and $G(x) \in \mathbb{R}^{n \times q}$ are the drift dynamics, input dynamics, and disturbance injection dynamics, respectively. In this study, $f(x)$, $g(x)$, and $G(x)$ are assumed to be unknown. Furthermore, it is assumed that $f(x)$, $g(x)$, and $G(x)$ are Lipschitz continuous with $f(0) = 0$, and that the system (1) is stabilizable and controllable.
The goal of this study is to solve an H∞ control problem for the system (1). This problem can be equivalently transformed into a two-player zero-sum game, where the control input $u(t)$ acts as the minimizing player and the disturbance $d(t)$ acts as the maximizing player. The solution of the H∞ control problem corresponds to a saddle point, which constitutes the equilibrium of the two-player zero-sum game.
Define the infinite-horizon performance index function as
$V(x, u, d) = \int_t^{\infty} \left( x^T Q x + u^T R u - \kappa^2 d^T d \right) d\tau,$  (2)
where $\kappa > 0$, $V(0) = 0$, and $Q$ and $R$ are symmetric positive-definite matrices with appropriate dimensions. Let $u^*$ be the optimal control input and $d^*$ be the worst-case disturbance. Our objective is to find the saddle point $(u^*, d^*)$ that optimizes the performance index (2), which is characterized by the following inequality:
$V(u^*, d) \le V(u^*, d^*) \le V(u, d^*).$  (3)
We then define the optimal performance index function $V^*$ as follows:
$V^*(x, u, d) = \min_u \max_d \int_t^{\infty} \left( x^T Q x + u^T R u - \kappa^2 d^T d \right) d\tau.$  (4)
The Hamiltonian of system (1) can be written as
$H(V_x, x, u, d) = V_x^T \left[ f(x) + g(x)u + G(x)d \right] + x^T Q x + u^T R u - \kappa^2 d^T d,$  (5)
where $V_x = \partial V / \partial x \in \mathbb{R}^n$. The Hamilton–Jacobi–Isaacs (HJI) equation associated with this game has the form
$\min_u \max_d H(V_x^*, x, u, d) = 0.$  (6)
Based on the stationarity conditions, the H∞ control pair $(u^*, d^*)$ for (1) has the following form:
$u^* = -\frac{1}{2} R^{-1} g^T(x) V_x^*(x),$  (7)
$d^* = \frac{1}{2\kappa^2} G^T(x) V_x^*(x).$  (8)
Thus, according to (7) and (8), the HJI Equation (6) can be rewritten as
$x^T Q x + V_x^{*T} f(x) - \frac{1}{4} V_x^{*T} g(x) R^{-1} g^T(x) V_x^* + \frac{1}{4\kappa^2} V_x^{*T} G(x) G^T(x) V_x^* = 0.$  (9)
Indeed, the HJI Equation (9) is a highly nonlinear partial differential equation (PDE) and requires complete system information for its solution. To address these challenges, a new identifier–critic (IC) framework with relaxed PE conditions is proposed in the following sections. Furthermore, new adaptive update laws for the identifier and critic NNs are derived with the help of the KREM technique. The block diagram of the proposed control system is shown in Figure 1, and detailed theoretical analysis is presented in the subsequent sections.

3. System Identifier Design with Relaxed PE Condition

In this section, an NN-based identifier is utilized to reconstruct the unknown system dynamics in (1), and the KREM technique is introduced to adjust the identifier weights under relaxed PE conditions. We assume that the unknown system dynamics $f(x)$, $g(x)$, and $G(x)$ in (1) are continuous functions defined on compact sets, so they admit the following NN representations:
$f(x) = W_f \theta_f(x) + \epsilon_f,$  (10)
$g(x) = W_g \theta_g(x) + \epsilon_g,$  (11)
$G(x) = W_G \theta_G(x) + \epsilon_G,$  (12)
where $W_f \in \mathbb{R}^{n \times d_f}$, $W_g \in \mathbb{R}^{n \times d_g}$, and $W_G \in \mathbb{R}^{n \times d_G}$ are the ideal NN weights; $\theta_f(x) \in \mathbb{R}^{d_f}$, $\theta_g(x) \in \mathbb{R}^{d_g \times m}$, and $\theta_G(x) \in \mathbb{R}^{d_G \times q}$ are the basis functions; and $\epsilon_f \in \mathbb{R}^n$, $\epsilon_g \in \mathbb{R}^{n \times m}$, and $\epsilon_G \in \mathbb{R}^{n \times q}$ are the reconstruction errors. According to the Weierstrass theorem and the statements in [10], the approximation errors $\epsilon_f$, $\epsilon_g$, and $\epsilon_G$ approach zero as the numbers of NN neurons $d_f$, $d_g$, and $d_G$ increase to infinity.
Before proceeding, it is essential to establish the following underlying assumption.
Assumption 1.
(1) The basis functions $\theta_f(x)$, $\theta_g(x)$, and $\theta_G(x)$ are bounded, that is, $\|\theta_f(x)\| \le b_{\theta_f}$, $\|\theta_g(x)\| \le b_{\theta_g}$, and $\|\theta_G(x)\| \le b_{\theta_G}$, respectively.
(2) The reconstruction errors $\epsilon_f$, $\epsilon_g$, and $\epsilon_G$ are bounded, that is, $\|\epsilon_f\| \le b_{\epsilon_f}$, $\|\epsilon_g\| \le b_{\epsilon_g}$, and $\|\epsilon_G\| \le b_{\epsilon_G}$, respectively.
Using (10)–(12), the system (1) can be rewritten as
$\dot{x} = W_I^T \theta_I(x, u) + \epsilon_T,$  (13)
where $W_I = [W_f, W_g, W_G]^T \in \mathbb{R}^{d \times n}$ is the augmented weight matrix with $d = d_f + d_g + d_G$, $\theta_I(x, u) = [\theta_f^T(x), u^T \theta_g^T(x), d^T \theta_G^T(x)]^T \in \mathbb{R}^d$ is the augmented regressor vector, and $\epsilon_T = \epsilon_f + \epsilon_g u + \epsilon_G d \in \mathbb{R}^n$ is the model approximation error.
Note that $\dot{x}$ and $W_I$ are unknown. Therefore, we define the filtered variables $x_f$ and $\theta_{If}$ as
$\rho \dot{x}_f + x_f = x, \; x_f(0) = 0; \quad \rho \dot{\theta}_{If} + \theta_{If} = \theta_I, \; \theta_{If}(0) = 0,$  (14)
where $\rho > 0$ is the filter coefficient. From Equations (13) and (14), we can deduce that
$\dot{x}_f = \frac{x - x_f}{\rho} = W_I^T \theta_{If} + \epsilon_{Tf},$  (15)
where $\epsilon_{Tf}$ denotes the filtered version of $\epsilon_T$, i.e., $\rho \dot{\epsilon}_{Tf} + \epsilon_{Tf} = \epsilon_T$. Clearly, (15) is a linear regression equation (LRE), in which $\dot{x}_f$ and $\theta_{If}$ can be computed from (14). In the following, we describe how the KREM technique is applied to estimate $W_I$ using the measurable information $\dot{x}_f$ and $\theta_{If}$.
To estimate the unknown weights $W_I$ in (15) such that the estimates $\hat{W}_I$ converge to their true values under a relaxed PE condition, we construct an extended LRE (E-LRE) based on (15). Define the matrices $P_I \in \mathbb{R}^{d \times d}$ and $Q_I \in \mathbb{R}^{d \times n}$ as follows:
$P_I = H_I[\theta_{If} \theta_{If}^T], \; P_I(0) = 0; \quad Q_I = H_I\Big[\theta_{If} \Big(\frac{x - x_f}{\rho}\Big)^T\Big], \; Q_I(0) = 0,$  (16)
where
$H_I = \frac{1}{p + l_I},$  (17)
$p = d/dt$ is the differential operator, and $l_I > 0$ is a forgetting factor. From (16), we can derive the solution
$P_I = \int_0^t e^{-l_I(t - \tau)} \theta_{If}(\tau) \theta_{If}^T(\tau) \, d\tau, \quad Q_I = \int_0^t e^{-l_I(t - \tau)} \theta_{If}(\tau) \Big( \frac{x(\tau) - x_f(\tau)}{\rho} \Big)^T d\tau.$
It can be verified that $P_I$ and $Q_I$ are bounded for any bounded $\theta_I$ and $x$, owing to the forgetting factor $l_I$. Thus, an E-LRE is obtained:
$Q_I(t) = P_I(t) W_I + v_I,$  (18)
where $v_I = \int_0^t e^{-l_I(t - \tau)} \theta_{If}(\tau) \epsilon_{Tf}^T(\tau) \, d\tau$.
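To make the construction above concrete, the following minimal Python sketch (our own illustration, not the authors' code) integrates the filters (14) and the forgetting-factor dynamics behind (16) with a forward-Euler step; the dimensions, step size, and parameter values are assumptions borrowed from the later simulation section:

```python
import numpy as np

d, n = 7, 2                      # regressor and state dimensions of the later example
dt, rho, l_I = 1e-4, 0.001, 0.1  # Euler step, filter coefficient, forgetting factor

x_f = np.zeros(n)                # filtered state, x_f(0) = 0 as in (14)
theta_If = np.zeros(d)           # filtered regressor, theta_If(0) = 0 as in (14)
P_I = np.zeros((d, d))           # KRE matrix, P_I(0) = 0 as in (16)
Q_I = np.zeros((d, n))           # KRE matrix, Q_I(0) = 0 as in (16)

def kre_step(x, theta_I):
    """One Euler step of the filters (14) and of the dynamics implied by (16):
    dP_I/dt = -l_I * P_I + theta_If theta_If^T,
    dQ_I/dt = -l_I * Q_I + theta_If ((x - x_f)/rho)^T."""
    global x_f, theta_If, P_I, Q_I
    x_f = x_f + dt * (x - x_f) / rho
    theta_If = theta_If + dt * (theta_I - theta_If) / rho
    P_I = P_I + dt * (-l_I * P_I + np.outer(theta_If, theta_If))
    Q_I = Q_I + dt * (-l_I * Q_I + np.outer(theta_If, (x - x_f) / rho))
```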
To obtain identifier weight error dynamics with better convergence properties, we define the variables $\bar{Q}_I(t) \in \mathbb{R}^{d \times n}$, $\bar{P}_I \in \mathbb{R}^{d \times d}$, and $\bar{V}_I \in \mathbb{R}^{d \times n}$ as
$\bar{Q}_I = P_I^* Q_I, \quad \bar{P}_I = P_I^* P_I, \quad \bar{V}_I = P_I^* v_I.$  (19)
Then Equation (18) becomes
$\bar{Q}_I(t) = \bar{P}_I(t) W_I + \bar{V}_I.$  (20)
Note that for any square matrix $M \in \mathbb{R}^{q \times q}$, we have $M^* M = |M| I_q$, even if $M$ is not full rank. Thus, $\bar{P}_I = |P_I| I_d \in \mathbb{R}^{d \times d}$ is a scalar diagonal matrix, so (20) can be decoupled into a series of scalar LREs:
$\bar{Q}_I^{(i,j)}(t) = |P_I|(t) W_I^{(i,j)} + \bar{V}_I^{(i,j)}, \quad i = 1, \dots, d, \; j = 1, \dots, n,$  (21)
where $\bar{Q}_I^{(i,j)}$ and $W_I^{(i,j)}$ denote the $(i,j)$th entries of $\bar{Q}_I$ and $W_I$, respectively.
Then, based on (21), the estimation algorithm for the unknown identifier NN weights is designed as
$\dot{\hat{W}}_I^{(i,j)} = -\gamma_1 |P_I| \big[ |P_I| \hat{W}_I^{(i,j)} - \bar{Q}_I^{(i,j)} \big],$  (22)
where $\gamma_1 > 0$ denotes the adaptive learning gain.
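Continuing the sketch above, the mixing step (19)–(21) and the update law (22) can be implemented as follows. The adjugate is computed here as det(P_I)·inv(P_I), which is valid only for invertible P_I (a cofactor-based adjugate would cover the singular case), and the guard threshold is our own choice:

```python
gamma_1 = 800.0   # adaptive learning gain, as used in the simulation section

def krem_identifier_step(W_hat_I):
    """Mix the E-LRE (18) with adj(P_I) to get the decoupled LREs (21),
    then take one Euler step of the update law (22) for all entries at once."""
    det_PI = np.linalg.det(P_I)
    if abs(det_PI) < 1e-12:                 # P_I not yet invertible: skip the update
        return W_hat_I
    adj_PI = det_PI * np.linalg.inv(P_I)    # adjugate of P_I
    Q_bar = adj_PI @ Q_I                    # Q_bar = |P_I| * W_I + V_bar, entrywise
    dW = -gamma_1 * det_PI * (det_PI * W_hat_I - Q_bar)
    return W_hat_I + dt * dW
```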
The convergence of the identifier update law (22) is established as follows.
Theorem 1.
Consider the system (13) with the online update law (22). If $|P_I| \in PE$, then
(i) for $\epsilon_T = 0$, the estimation error $\tilde{W}_I^{(i,j)}$ converges to zero exponentially;
(ii) for $\epsilon_T \ne 0$, the estimation error $\tilde{W}_I^{(i,j)}$ converges to a compact set around zero.
Proof. 
If $|P_I| \in PE$, then according to Definition 1 we have $\int_t^{t+T} |P_I|^2 \, dr \ge \delta_I > 0$. Define the estimation error $\tilde{W}_I^{(i,j)} = \hat{W}_I^{(i,j)} - W_I^{(i,j)}$, $i = 1, \dots, d$, $j = 1, \dots, n$. From (21) and (22), the identifier weight error dynamics are obtained as
$\dot{\tilde{W}}_I^{(i,j)} = -\gamma_1 |P_I|^2 \tilde{W}_I^{(i,j)} + \gamma_1 |P_I| \bar{V}_I^{(i,j)}.$  (23)
Consider the Lyapunov function $V_I = 0.5 \gamma_1^{-1} \tilde{W}_I^{(i,j)2}$, whose derivative along (23) is
$\dot{V}_I = \frac{1}{\gamma_1} \tilde{W}_I^{(i,j)} \dot{\tilde{W}}_I^{(i,j)} = -|P_I|^2 \tilde{W}_I^{(i,j)2} + |P_I| \tilde{W}_I^{(i,j)} \bar{V}_I^{(i,j)}.$  (24)
When $\epsilon_T = 0$, (24) reduces to
$\dot{V}_I = -|P_I|^2 \tilde{W}_I^{(i,j)2} < -\mu_I V_I,$  (25)
where $\mu_I = 2\gamma_1 \delta_I > 0$. According to the Lyapunov theorem, the weight estimation error $\tilde{W}_I^{(i,j)}$ converges to zero exponentially.
When $\epsilon_T \ne 0$, (24) can be further written as
$\dot{V}_I = -|P_I|^2 \tilde{W}_I^{(i,j)2} + |P_I| \tilde{W}_I^{(i,j)} \bar{V}_I^{(i,j)} = -\big[ |P_I|^2 \tilde{W}_I^{(i,j)} - |P_I| \bar{V}_I^{(i,j)} \big] \tilde{W}_I^{(i,j)}.$  (26)
According to Assumption 1, $|P_I| \bar{V}_I^{(i,j)}$ is bounded, say $|P_I| \bar{V}_I^{(i,j)} < b_{P_I V_I}$. Then,
$\dot{V}_I \le -\big[ |P_I|^2 \|\tilde{W}_I^{(i,j)}\| - b_{P_I V_I} \big] \|\tilde{W}_I^{(i,j)}\|.$  (27)
According to the extended Lyapunov theorem, the estimation error $\tilde{W}_I^{(i,j)}$ is uniformly ultimately bounded and converges to the compact set $\{ \tilde{W}_I^{(i,j)} : \|\tilde{W}_I^{(i,j)}\| \le b_{P_I V_I} / p_I^2 \}$, where $p_I$ denotes a lower bound of $|P_I|$.    □
Remark 1.
In [12], the update law for the unknown weights $W_I$ was designed based on (18), and the PE condition (i.e., $\theta_I \in PE$) was required to ensure convergence. However, satisfying the PE condition is generally challenging. In Theorem 1, we provide a new convergence condition, $|P_I| \in PE$. Notably, this new condition is superior to the conventional PE condition for two reasons: (1) we theoretically prove that $|P_I| \in PE$ is much weaker than $\theta_I \in PE$, as detailed in Section 5; and (2) $|P_I|$ is simply the determinant of the matrix $P_I(t)$, so checking $|P_I| \in PE$ online is feasible by computing this determinant, whereas assessing the standard PE condition directly online is not possible [18,19,23].
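As a rough illustration of the online check described in Remark 1 (our own construction, not from the paper; the window length and threshold are arbitrary), one can accumulate the windowed integral of $|P_I|^2$ from Definition 1 and test it against a threshold:

```python
from collections import deque

class PEMonitor:
    """Approximates the integral of |P_I|^2 over a sliding window of length T_w
    and reports whether it exceeds a chosen excitation level delta."""
    def __init__(self, T_w=1.0, dt=1e-4, delta=1e-8):
        self.buf = deque(maxlen=int(T_w / dt))
        self.dt, self.delta = dt, delta

    def update(self, det_PI):
        self.buf.append(det_PI ** 2 * self.dt)
        return sum(self.buf) >= self.delta   # True while |P_I| looks persistently exciting
```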
Based on the above analysis, the unknown dynamics $f(x)$, $g(x)$, and $G(x)$ can be estimated using (13) and (22), which allows the completely unknown system dynamics to be reconstructed. To obtain the optimal H∞ control pair, a critic NN is introduced in the next section to learn the solution of the HJI equation.

4. Critic NN Design for H∞ Control under Relaxed PE Condition

In this section, the performance index is approximated via a critic NN to obtain the optimal H∞ control pair, and the KREM algorithm is again utilized to design the critic NN update law under the relaxed PE condition. Based on the above identifier, the system (1) can be represented as
$\dot{x} = \hat{W}_f \theta_f(x) + \hat{W}_g \theta_g(x) u + \hat{W}_G \theta_G(x) d(t) + \epsilon_I + \epsilon_T,$  (28)
where $\hat{W}_f$, $\hat{W}_g$, and $\hat{W}_G$ are the estimates of $W_f$, $W_g$, and $W_G$, respectively, and $\epsilon_I = \tilde{W}_I \theta_I$ denotes the identifier error. The Hamiltonian (5) can then be written as
$H = V_x^T \big[ \hat{W}_f \theta_f(x) + \hat{W}_g \theta_g(x) u + \hat{W}_G \theta_G(x) d(t) + \epsilon_I + \epsilon_T \big] + x^T Q x + u^T R u - \kappa^2 d^T d.$  (29)
Then, the HJI Equation (6) becomes
$0 = \min_u \max_d [H(V_x, x, u, d)] = V_x^T \big[ \hat{W}_f \theta_f(x) + \hat{W}_g \theta_g(x) u + \hat{W}_G \theta_G(x) d(t) + \epsilon_I + \epsilon_T \big] + x^T Q x + u^T R u - \kappa^2 d^T d.$  (30)
Therefore, based on (30), the H∞ control pair $(u^*, d^*)$ for the estimated system (28) can be expressed as follows:
$u^* = -\frac{1}{2} R^{-1} [\hat{W}_g \theta_g]^T V_x,$  (31)
$d^* = \frac{1}{2\kappa^2} [\hat{W}_G \theta_G]^T V_x(x).$  (32)
Since the HJI Equation (30) is a nonlinear PDE, similar to (6), we utilize a critic NN to approximate $V(x)$ and its gradient $V_x(x)$ as follows:
$V(x) = W_c^T \theta_c(x) + \epsilon_v,$  (33)
$V_x(x) = \nabla\theta_c^T(x) W_c + \nabla\epsilon_v,$  (34)
where $W_c \in \mathbb{R}^l$ is the unknown constant weight vector, $\theta_c(x) \in \mathbb{R}^l$ is a vector of independent basis functions with gradient $\nabla\theta_c(x) = \partial\theta_c / \partial x$, $l$ is the number of neurons, and $\epsilon_v$ is the approximation error with gradient $\nabla\epsilon_v = \partial\epsilon_v / \partial x$. Note that as the number of independent basis functions increases, both the approximation error and its gradient can approach zero.
Before proceeding, the following assumption is needed.
Assumption 2.
(1) The ideal critic NN weight $W_c$ is bounded, that is, $\|W_c\| < b_{W_c}$.
(2) The basis function $\theta_c(x)$ and its gradient $\nabla\theta_c(x)$ are bounded, that is, $\|\theta_c\| \le b_{\theta_c}$ and $\|\nabla\theta_c\| \le b_{\nabla\theta_c}$.
(3) The approximator reconstruction error $\epsilon_v$ and its gradient $\nabla\epsilon_v$ are bounded, that is, $\|\epsilon_v\| \le b_{\epsilon_v}$ and $\|\nabla\epsilon_v\| \le b_{\nabla\epsilon_v}$.
Since the ideal critic NN weights $W_c$ are unknown, let $\hat{W}_c$ denote the estimate of $W_c$ and $\hat{V}$ the estimate of $V$, where the practical critic NN is given by
$\hat{V}(x) = \hat{W}_c^T \theta_c(x), \quad \hat{V}_x(x) = \nabla\theta_c^T(x) \hat{W}_c.$  (35)
The estimated H∞ control pair $(\hat{u}, \hat{d})$ is then obtained as
$\hat{u} = -\frac{1}{2} R^{-1} [\hat{W}_g \theta_g]^T \hat{V}_x = -\frac{1}{2} R^{-1} [\hat{W}_g \theta_g]^T \nabla\theta_c^T \hat{W}_c,$  (36)
$\hat{d} = \frac{1}{2\kappa^2} [\hat{W}_G \theta_G]^T \hat{V}_x = \frac{1}{2\kappa^2} [\hat{W}_G \theta_G]^T \nabla\theta_c^T \hat{W}_c.$  (37)
To estimate the unknown critic NN weights online using the KREM technique, we construct a linear equation from (30) and (34):
$\epsilon_{HJI} + x^T Q x + \hat{u}^T R \hat{u} - \kappa^2 \hat{d}^T \hat{d} + W_c^T \nabla\theta_c \hat{W}_f \theta_f(x) + W_c^T \nabla\theta_c \hat{W}_g \theta_g(x) \hat{u} + W_c^T \nabla\theta_c \hat{W}_G \theta_G(x) \hat{d} = 0,$  (38)
where $\epsilon_{HJI} = W_c^T \nabla\theta_c (\epsilon_I + \epsilon_T) + \nabla\epsilon_v^T (\hat{W}_f \theta_f + \hat{W}_g \theta_g \hat{u} + \hat{W}_G \theta_G \hat{d} + \epsilon_I + \epsilon_T)$ is a bounded residual HJI equation error. Let $\Theta = \nabla\theta_c [\hat{W}_f \theta_f + \hat{W}_g \theta_g \hat{u} + \hat{W}_G \theta_G \hat{d}]$ and $\Sigma = x^T Q x + \hat{u}^T R \hat{u} - \kappa^2 \hat{d}^T \hat{d}$, so that the following linear equation is obtained:
$\Sigma = -W_c^T \Theta - \epsilon_{HJI}.$  (39)
Similar to the previous section, we define the filtered regressor matrix $P_c \in \mathbb{R}^{l \times l}$ and vector $Q_c \in \mathbb{R}^l$ as follows:
$P_c = H_c[\Theta \Theta^T], \; P_c(0) = 0; \quad Q_c = H_c[\Theta \Sigma], \; Q_c(0) = 0,$  (40)
where
$H_c = \frac{1}{p + l_c},$  (41)
and $l_c > 0$ is the forgetting factor. The solution of (40) can be deduced as
$P_c = \int_0^t e^{-l_c(t - \tau)} \Theta \Theta^T d\tau, \quad Q_c = \int_0^t e^{-l_c(t - \tau)} \Theta \Sigma \, d\tau.$
From (39) and (41), an E-LRE related to $P_c$ and $Q_c$ is obtained:
$Q_c(t) = -P_c(t) W_c - v_c,$  (42)
where $v_c = \int_0^t e^{-l_c(t - \tau)} \Theta(\tau) \epsilon_{HJI}(\tau) \, d\tau$ is bounded. To estimate the unknown parameter $W_c$ in (42) under a relaxed PE condition, define the variables $\bar{Q}_c(t) \in \mathbb{R}^l$, $\bar{P}_c \in \mathbb{R}^{l \times l}$, and $\bar{V}_c \in \mathbb{R}^l$ as
$\bar{Q}_c = P_c^* Q_c, \quad \bar{P}_c = P_c^* P_c, \quad \bar{V}_c = P_c^* v_c.$  (43)
Then Equation (42) becomes
$\bar{Q}_c(t) = -\bar{P}_c(t) W_c - \bar{V}_c.$  (44)
Note that $\bar{P}_c = |P_c| I_l$. Since $\bar{P}_c$ is a scalar matrix, a series of scalar LREs is obtained as
$\bar{Q}_c^{(i)}(t) = -|P_c|(t) W_c^{(i)} - \bar{V}_c^{(i)}, \quad i = 1, \dots, l,$  (45)
where $\bar{Q}_c^{(i)}$, $W_c^{(i)}$, and $\bar{V}_c^{(i)}$ denote the $i$th entries of $\bar{Q}_c$, $W_c$, and $\bar{V}_c$, respectively.
Driven by the parameter error based on (45), the update law for the critic weight $\hat{W}_c^{(i)}$ is designed as
$\dot{\hat{W}}_c^{(i)} = -\gamma_2 |P_c| \big[ |P_c| \hat{W}_c^{(i)} + \bar{Q}_c^{(i)} \big],$  (46)
where $\gamma_2 > 0$ denotes the adaptive learning gain.
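For illustration, the critic-side construction (40)–(46) mirrors the identifier sketch above. Again, the discretization, guard threshold, and the scalar gain (a simplification of the simulation's $200 \cdot diag\{0.3, 1, 1\}$) are our own assumptions; note the sign difference in (46) relative to (22), which stems from the minus signs in (39):

```python
import numpy as np

l_c, gamma_2, dt = 20.0, 200.0, 1e-4   # forgetting factor and (scalar) gain, assumed values
l_dim = 3                              # number of critic neurons in the example
P_c = np.zeros((l_dim, l_dim))
Q_c = np.zeros(l_dim)

def krem_critic_step(W_hat_c, Theta, Sigma):
    """Integrate (40) one Euler step, mix with adj(P_c), then apply (46)."""
    global P_c, Q_c
    P_c = P_c + dt * (-l_c * P_c + np.outer(Theta, Theta))
    Q_c = Q_c + dt * (-l_c * Q_c + Theta * Sigma)
    det_Pc = np.linalg.det(P_c)
    if abs(det_Pc) < 1e-12:                 # P_c not yet invertible: skip the update
        return W_hat_c
    adj_Pc = det_Pc * np.linalg.inv(P_c)    # adjugate of P_c
    Q_bar = adj_Pc @ Q_c                    # scalar LREs (45)
    dW = -gamma_2 * det_Pc * (det_Pc * W_hat_c + Q_bar)   # note the "+" as in (46)
    return W_hat_c + dt * dW
```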
The convergence condition for the proposed critic NN adaptive law is provided in Theorem 2.
Theorem 2.
Consider the adaptive law (46) for the critic NN with the regressor matrix $\bar{P}_c$ in (44). If $|P_c| \in PE$, then
(i) for $\epsilon_{HJI} = 0$, the estimation error $\tilde{W}_c^{(i)}$ converges to zero exponentially;
(ii) for $\epsilon_{HJI} \ne 0$, the estimation error $\tilde{W}_c^{(i)}$ converges to a compact set around zero.
Proof. 
Define the estimation error $\tilde{W}_c^{(i)} = \hat{W}_c^{(i)} - W_c^{(i)}$, $i = 1, \dots, l$. The proof of Theorem 1 can be extended to establish the analogous results in the current context, with the Lyapunov function chosen as $V_c = 0.5 \gamma_2^{-1} \tilde{W}_c^{(i)2}$.    □
Remark 2.
Theorem 2 provides a new convergence condition for the estimation error of the critic NN weights $\tilde{W}_c$ that does not rely on the conventional persistence of excitation condition $\Theta \in PE$. In this paper, no additional exploration signal is required to guarantee $\Theta \in PE$; instead, the satisfaction of $|P_c| \in PE$ can be achieved by adjusting the forgetting factor $l_c$. It is worth noting that the new convergence condition is associated with the matrix $P_c$ and can be verified online by computing the determinant of $P_c$. The proof of the weak PE property of the new convergence condition is presented in the following section.
Remark 3.
The convergence analyses of $\tilde{W}_I^{(i,j)}$ and $\tilde{W}_c^{(i)}$ are provided in Theorems 1 and 2, respectively. The convergence of the full matrices $\tilde{W}_I$ and $\tilde{W}_c$ follows by simple matrix operations, which are omitted in this paper.
Thus far, the identifier–critic learning-based framework for H∞ optimal control under the relaxed PE condition has been established. For clarity, the design details of the proposed method are summarized in Algorithm 1, which serves as the pseudocode for the simulation section.
Algorithm 1 Identifier–critic learning-based H∞ optimal control algorithm
1: Initialization
2: Initialize system parameters: $x(0)$, $Q$, $R$, and running time $T$;
3: Set the identifier and critic filter operators: $H_I$ and $H_c$;
4: Set the basis functions of the identifier and critic NNs: $\theta_I(x, u)$ and $\theta_c(x)$;
5: Initialize and set the filter parameters: $\rho$, $l_I$, $l_c$, $x_f(0) = 0$, $\theta_{If}(0) = 0$, and $\epsilon_{If}(0) = 0$;
6: Initialize the identifier NN parameters: $\gamma_1 > 0$, $\hat{W}_I^{initial} \in (0, 1]$;
7: Initialize the critic NN parameters: $\gamma_2 > 0$, $\hat{W}_c^{initial} \in (0, 1]$;
8: Initialize the control pair by (36) and (37);
9: while $t \le T$ do
10:    Calculate the filtered signals of the identifier NN by (14);
11:    Calculate the dynamic regressor extension (DRE) of the identifier NN by (16);
12:    Calculate the regressor "mixing" of the identifier NN by (19);
13:    Update the identifier NN weights $\hat{W}_I^{(i,j)}$ by (22): $\dot{\hat{W}}_I^{(i,j)} = -\gamma_1 |P_I| [|P_I| \hat{W}_I^{(i,j)} - \bar{Q}_I^{(i,j)}]$;
14:    Compute the approximated HJI equation by (39);
15:    Calculate the dynamic regressor extension (DRE) of the critic NN by (40);
16:    Calculate the regressor "mixing" of the critic NN by (43);
17:    Update the critic NN weights $\hat{W}_c^{(i)}$ by (46): $\dot{\hat{W}}_c^{(i)} = -\gamma_2 |P_c| [|P_c| \hat{W}_c^{(i)} + \bar{Q}_c^{(i)}]$;
18:    Update the control pair by (36) and (37);
19:    Update the system state $x$ by (28);
20: end while
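The following condensed Python driver shows how the steps of Algorithm 1 fit together in one simulation loop. It reuses the sketch functions from the previous sections, and the helpers `control_pair`, `theta_I_fn`, `hji_regressor`, and `plant` are hypothetical placeholders for (36)–(37), the identifier regressor, (38)–(39), and the system dynamics (28), respectively:

```python
import numpy as np

def run_ic_learning(x0, T_end=50.0, dt=1e-4):
    x = np.asarray(x0, dtype=float)
    W_hat_I = np.random.uniform(0.0, 1.0, size=(7, 2))   # identifier weights (algorithm suggests (0, 1])
    W_hat_c = np.random.uniform(0.0, 1.0, size=3)        # critic weights (algorithm suggests (0, 1])
    for _ in range(int(T_end / dt)):
        u, dist = control_pair(x, W_hat_I, W_hat_c)      # control pair via (36)-(37)
        theta_I = theta_I_fn(x, u, dist)                 # identifier regressor theta_I(x, u)
        kre_step(x, theta_I)                             # filters (14) + DRE (16)
        W_hat_I = krem_identifier_step(W_hat_I)          # mixing (19) + update law (22)
        Theta, Sigma = hji_regressor(x, u, dist, W_hat_I, W_hat_c)  # linear equation (38)-(39)
        W_hat_c = krem_critic_step(W_hat_c, Theta, Sigma)           # DRE + mixing + update (40)-(46)
        x = x + dt * plant(x, u, dist)                   # integrate the system (28)
    return W_hat_I, W_hat_c
```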

5. Stability and Convergence Analysis

In this section, we present the main results, including the theoretical analysis of the weak PE properties of the new convergence conditions proposed in Theorems 1 and 2. Furthermore, we provide a stability result for the closed-loop system under the proposed online learning optimal control method.
To facilitate the analysis, the following assumption is made.
Assumption 3.
The system dynamics in (1) satisfy $\|f(x)\| \le b_f \|x\|$, $\|g(x)\| \le b_g$, and $\|G(x)\| \le b_G$, where $b_f > 0$, $b_g > 0$, and $b_G > 0$.

5.1. Weak PE Properties of New Convergence Conditions

As shown in Theorems 1 and 2 and Remark 3, the convergence of $\tilde{W}_I$ and $\tilde{W}_c$ is established without the restrictive PE conditions $\theta_I \in PE$ and $\Theta \in PE$. These new convergence conditions can easily be checked online, as mentioned in Remarks 1 and 2. We now analyze, from a theoretical standpoint, why the new convergence conditions are superior to the conventional PE conditions.
Theorem 3.
Consider the system (13) with the online identifier NN adaptive law (22) and the critic NN adaptive law (46).
(i) The convergence condition for the estimation error $\tilde{W}_I$ in Theorem 1, namely $|P_I| \in PE$, is weaker than $\theta_I \in PE$ in the following precise sense:
$\theta_I(t) \in PE \Rightarrow |P_I| \in PE,$  (47)
$|P_I| \in PE \nRightarrow \theta_I(t) \in PE;$  (48)
(ii) The convergence condition for the estimation error $\tilde{W}_c$ in Theorem 2, namely $|P_c| \in PE$, is weaker than $\Theta \in PE$ in the following precise sense:
$\Theta \in PE \Rightarrow |P_c| \in PE,$  (49)
$|P_c| \in PE \nRightarrow \Theta \in PE.$  (50)
Proof. 
For (i), suppose that $\theta_I(t)$ in (13) is PE, which implies $\theta_{If}(t) \in PE$ [25]. From Definition 1, we have
$\int_t^{t+\tau} \theta_{If}(r) \theta_{If}^T(r) \, dr \ge \delta_I I \;\Rightarrow\; \int_{t-\tau}^t \theta_{If}(r) \theta_{If}^T(r) \, dr \ge \delta_I I \quad \text{for } t > \tau > 0.$  (51)
Moreover, since $e^{-l_I(t-r)} \ge e^{-l_I \tau} > 0$ for $r \in [t - \tau, t]$, the following inequality holds:
$\int_{t-\tau}^t e^{-l_I(t-r)} \theta_{If}(r) \theta_{If}^T(r) \, dr \ge \int_{t-\tau}^t e^{-l_I \tau} \theta_{If}(r) \theta_{If}^T(r) \, dr \ge e^{-l_I \tau} \delta_I I.$  (52)
Furthermore, for $t > \tau > 0$, we also have
$\int_0^t e^{-l_I(t-r)} \theta_{If}(r) \theta_{If}^T(r) \, dr \ge \int_{t-\tau}^t e^{-l_I(t-r)} \theta_{If}(r) \theta_{If}^T(r) \, dr.$  (53)
From (17), (52), and (53), we conclude that
$P_I = \int_0^t e^{-l_I(t-r)} \theta_{If}(r) \theta_{If}^T(r) \, dr \ge e^{-l_I \tau} \int_{t-\tau}^t \theta_{If}(r) \theta_{If}^T(r) \, dr \ge e^{-l_I \tau} \delta_I I.$  (54)
Hence, the matrix $P_I$ in (16) is positive definite, that is, $\lambda_i(P_I) > 0$, $i = 1, \dots, d$. Since the determinant of a matrix equals the product of its eigenvalues, i.e., $|P_I| = \lambda_1(P_I) \lambda_2(P_I) \cdots \lambda_d(P_I)$, we obtain $\lambda_i(P_I) > 0 \Rightarrow \prod_{i=1}^d \lambda_i(P_I) > 0 \Rightarrow |P_I| > 0$. Thus, (47) holds.
The claim (48) is established by the following chain:
$|P_I| \in PE \Rightarrow \int_0^t |P_I|^2(\tau) \, d\tau > 0 \Rightarrow \int_0^t \prod_{i=1}^d \lambda_i^2(P_I) \, d\tau > 0 \Rightarrow \lambda_i(P_I) > 0, \; i = 1, \dots, d \Rightarrow P_I > 0 \nRightarrow \theta_I(t) \in PE;$  (55)
that is, positive definiteness of $P_I$ does not require $\theta_I$ to be persistently exciting.
For (ii), the proof follows the same lines as (i). This completes the proof. □

5.2. Stability and Convergence Analysis

The stability result for the closed-loop system under the proposed online learning optimal control method will be presented in the following theorem.
Theorem 4.
Let Assumptions 1–3 hold. Consider system (1) with the identifier weight tuning law (22), the H∞ control pair computed by (36) and (37), and the critic NN weight tuning law (46). If $|P_I| \in PE$ and $|P_c| \in PE$, then the closed-loop system state, the system identifier estimation error $\tilde{W}_I$, and the critic estimation error $\tilde{W}_c$ are uniformly ultimately bounded (UUB). Moreover, the approximate H∞ control pair given by (36) and (37) remains close to the optimal control pair $(u^*, d^*)$ within small bounds $b_u$ and $b_d$, that is, $\|\hat{u} - u^*\| \le b_u$ and $\|\hat{d} - d^*\| \le b_d$, where $b_u$ and $b_d$ are positive constants.
Proof. 
We consider the Lyapunov function
$J(t) = \frac{1}{2} tr\{\tilde{W}_I^T(t) \gamma_1^{-1} \tilde{W}_I(t)\} + \frac{1}{2} \tilde{W}_c^T(t) \gamma_2^{-1} \tilde{W}_c(t) + \gamma_3 x^T x + \gamma_4 V(x) + \gamma_5 tr\{\bar{V}_I^T \bar{V}_I\} + \gamma_6 \bar{V}_c^T \bar{V}_c = J_1 + J_2 + J_3 + J_4 + J_5 + J_6,$  (56)
where $\gamma_3$, $\gamma_4$, $\gamma_5$, and $\gamma_6$ are positive constants.
By matrix operations, we obtain
$\dot{\tilde{W}}_I = -\gamma_1 |P_I| \big[ |P_I| \tilde{W}_I - \bar{V}_I \big], \quad \dot{\tilde{W}}_c = -\gamma_2 |P_c| \big[ |P_c| \tilde{W}_c - \bar{V}_c \big].$  (57)
According to Definition 1, $|P_I| \in PE$ and $|P_c| \in PE$ imply that $\int_t^{t+T} |P_I|^2 \, dr \ge \delta_I > 0$ and $\int_t^{t+T} |P_c|^2 \, dr \ge \delta_c > 0$. Substituting (19) and (43), and using Young's inequality $ab \le \eta a^2/2 + b^2/(2\eta)$ with $\eta > 0$, we have
$\dot{J}_1 = tr\big\{ -\tilde{W}_I^T |P_I|^2 \tilde{W}_I + |P_I| \tilde{W}_I^T \bar{V}_I \big\} \le -\Big( \delta_I - \frac{1}{2\eta} \Big) \|\tilde{W}_I\|^2 + \frac{\eta}{2} b_{P_I^*}^2 \||P_I| v_I\|^2,$  (58)
$\dot{J}_2 = -\tilde{W}_c^T |P_c|^2 \tilde{W}_c + |P_c| \tilde{W}_c^T \bar{V}_c \le -\Big( \delta_c - \frac{1}{2\eta} \Big) \|\tilde{W}_c\|^2 + \frac{\eta}{2} b_{P_c^*}^2 \||P_c| v_c\|^2,$  (59)
where $\|P_I^*\| \le b_{P_I^*}$ and $\|P_c^*\| \le b_{P_c^*}$.
For $J_3$ and $J_4$,
$\dot{J}_3 + \dot{J}_4 = 2\gamma_3 x^T \dot{x} + \gamma_4 \dot{V}(x) = 2\gamma_3 x^T \big[ f(x) + g(x)\hat{u} + G(x)\hat{d} - g(x)u^* + g(x)u^* - G(x)d^* + G(x)d^* \big] + \gamma_4 \big( -x^T Q x - u^{*T} R u^* + \kappa^2 d^{*T} d^* \big) = 2\gamma_3 x^T \Big[ f(x) + g(x)\Big( -\frac{1}{2} R^{-1} \hat{g}^T(x) \nabla\theta_c^T(x) \hat{W}_c + \frac{1}{2} R^{-1} g^T(x)\big( \nabla\theta_c^T(x) W_c + \nabla\epsilon_v \big) \Big) + g(x)u^* + G(x)\Big( \frac{1}{2\kappa^2} \hat{G}^T(x) \nabla\theta_c^T(x) \hat{W}_c - \frac{1}{2\kappa^2} G^T(x)\big( \nabla\theta_c^T(x) W_c + \nabla\epsilon_v \big) \Big) + G(x)d^* \Big] + \gamma_4 \big( -x^T Q x - u^{*T} R u^* + \kappa^2 d^{*T} d^* \big),$  (60)
where $\hat{g}(x) = \hat{W}_g \theta_g(x)$ and $\hat{G}(x) = \hat{W}_G \theta_G(x)$. Since $g^T \nabla\theta_c^T W_c - \hat{g}^T \nabla\theta_c^T \hat{W}_c = -g^T \nabla\theta_c^T \tilde{W}_c - \tilde{g}^T \nabla\theta_c^T \hat{W}_c$ and $-G^T \nabla\theta_c^T W_c + \hat{G}^T \nabla\theta_c^T \hat{W}_c = G^T \nabla\theta_c^T \tilde{W}_c + \tilde{G}^T \nabla\theta_c^T \hat{W}_c$, (60) can be bounded as
$\dot{J}_3 + \dot{J}_4 \le -\Big( \gamma_4 \lambda_m(Q) - 2\gamma_3 b_f - \frac{4}{\eta} \Big) \|x\|^2 + \Big[ \frac{1}{2\eta} \gamma_3^2 b_g^2 b_\omega^2 b_{\nabla\theta_c}^2 \lambda_M^2(R^{-1}) + \frac{1}{2\eta\kappa^4} \gamma_3^2 b_G^2 b_\omega^2 b_{\nabla\theta_c}^2 \Big] \|\tilde{W}_I\|^2 + \Big[ \frac{1}{2\eta} \gamma_3^2 b_g^4 b_{\nabla\theta_c}^2 \lambda_M^2(R^{-1}) + \frac{1}{2\eta\kappa^4} \gamma_3^2 b_G^4 b_{\nabla\theta_c}^2 \Big] \|\tilde{W}_c\|^2 - \Big( \gamma_4 \lambda_m(R) - \frac{2\gamma_3^2 b_g^2}{\eta} \Big) \|u^*\|^2 + \Big[ \frac{1}{2\eta} \gamma_3^2 b_g^4 \lambda_M^2(R^{-1}) + \frac{\gamma_3^2}{2\eta\kappa^4} b_G^4 \Big] \|\nabla\epsilon_v\|^2 + \Big( \gamma_4 \kappa^2 + \frac{2\gamma_3^2 b_G^2}{\eta} \Big) \|d^*\|^2,$  (61)
where $b_\omega = \|\hat{W}_c\|$ is a bounded variable.
Recalling that $\bar{V}_I = P_I^* v_I$ and $\dot{v}_I = -l_I v_I + \theta_{If} \epsilon_{Tf}^T$, we have
$\dot{J}_5 \le 2\gamma_5 b_{P_I^*}^2 v_I^T \dot{v}_I = 2\gamma_5 b_{P_I^*}^2 v_I^T \big[ -l_I v_I + \theta_{If} \epsilon_{Tf}^T \big] \le -\big( 2\gamma_5 b_{P_I^*}^2 l_I - b_{P_I^*}^2 \eta \big) \|v_I\|^2 + \frac{1}{\eta} \gamma_5^2 b_{P_I^*}^2 \|\theta_{If} \epsilon_{Tf}^T\|^2.$  (62)
Similarly, since $\dot{v}_c = -l_c v_c + \Theta \epsilon_{HJI}$, the last term of (56) can be bounded as
$\dot{J}_6 \le 2\gamma_6 b_{P_c^*}^2 v_c^T \dot{v}_c = 2\gamma_6 b_{P_c^*}^2 v_c^T \big[ -l_c v_c + \Theta \epsilon_{HJI} \big] \le -\big( 2\gamma_6 l_c b_{P_c^*}^2 - 5 b_{P_c^*}^2 \eta \big) \|v_c\|^2 + \frac{1}{\eta} \gamma_6^2 b_{P_c^*}^2 b_{W_c}^2 b_{\nabla\theta_c}^2 \|\Theta\|^2 \|\epsilon_T\|^2 + \frac{1}{\eta} \gamma_6^2 b_{P_c^*}^2 b_{\nabla\theta_c}^2 b_{W_c}^2 \|\theta_I\|^2 \|\Theta\|^2 \|\tilde{W}_I\|^2 + \frac{1}{\eta} \gamma_6^2 b_{P_c^*}^2 b_f^2 b_{\nabla\epsilon_v}^2 \|\Theta\|^2 \|x\|^2 + \frac{1}{4\eta} \gamma_6^2 b_{P_c^*}^2 b_g^2 b_{\varpi_1}^2 b_{\nabla\theta_c}^2 b_\omega^2 \lambda_M^2(R^{-1}) \|\Theta\|^2 \|\nabla\epsilon_v\|^2 + \frac{1}{4\eta\kappa^4} \gamma_6^2 b_{P_c^*}^2 b_G^2 b_{\varpi_2}^2 b_{\nabla\theta_c}^2 b_\omega^2 \|\Theta\|^2 \|\nabla\epsilon_v\|^2,$  (63)
where $b_{\varpi_1} = \|\hat{W}_g \theta_g\|$ and $b_{\varpi_2} = \|\hat{W}_G \theta_G\|$ are bounded variables. Consequently, substituting (58), (59), and (61)–(63) into (56) yields
$\dot{J}(t) = \dot{J}_1 + \dot{J}_2 + \dot{J}_3 + \dot{J}_4 + \dot{J}_5 + \dot{J}_6 \le -\Big( \delta_I - \frac{1}{2\eta} - \frac{1}{2\eta} \gamma_3^2 b_g^2 b_\omega^2 b_{\nabla\theta_c}^2 \lambda_M^2(R^{-1}) - \frac{1}{2\eta\kappa^4} \gamma_3^2 b_G^2 b_\omega^2 b_{\nabla\theta_c}^2 - \frac{1}{\eta} \gamma_6^2 b_{P_c^*}^2 b_{\nabla\theta_c}^2 b_{W_c}^2 \|\theta_I\|^2 \|\Theta\|^2 \Big) \|\tilde{W}_I\|^2 - \Big( \gamma_4 \lambda_m(R) - \frac{2\gamma_3^2 b_g^2}{\eta} \Big) \|u^*\|^2 - \Big( \delta_c - \frac{1}{2\eta} - \frac{1}{2\eta} \gamma_3^2 b_g^4 b_{\nabla\theta_c}^2 \lambda_M^2(R^{-1}) - \frac{1}{2\eta\kappa^4} \gamma_3^2 b_G^4 b_{\nabla\theta_c}^2 \Big) \|\tilde{W}_c\|^2 - \Big( \gamma_4 \lambda_m(Q) - 2\gamma_3 b_f - \frac{4}{\eta} - \frac{1}{\eta} \gamma_6^2 b_{P_c^*}^2 b_f^2 b_{\nabla\epsilon_v}^2 \|\Theta\|^2 \Big) \|x\|^2 + \frac{1}{\eta} \gamma_5^2 b_{P_I^*}^2 \|\theta_{If} \epsilon_{Tf}^T\|^2 - \Big( 2\gamma_5 b_{P_I^*}^2 l_I - b_{P_I^*}^2 \eta - \frac{\eta}{2} b_{P_I^*}^2 |P_I|^2 \Big) \|v_I\|^2 - \Big( 2\gamma_6 l_c b_{P_c^*}^2 - 5 b_{P_c^*}^2 \eta - \frac{\eta}{2} b_{P_c^*}^2 |P_c|^2 \Big) \|v_c\|^2 + \Big( \frac{1}{2\eta} \gamma_3^2 b_g^4 \lambda_M^2(R^{-1}) + \frac{\gamma_3^2}{2\eta\kappa^4} b_G^4 + \frac{1}{4\eta} \gamma_6^2 b_{P_c^*}^2 b_g^2 b_{\varpi_1}^2 b_{\nabla\theta_c}^2 b_\omega^2 \lambda_M^2(R^{-1}) \|\Theta\|^2 + \frac{1}{4\eta\kappa^4} \gamma_6^2 b_{P_c^*}^2 b_G^2 b_{\varpi_2}^2 b_{\nabla\theta_c}^2 b_\omega^2 \|\Theta\|^2 \Big) \|\nabla\epsilon_v\|^2 + \frac{1}{\eta} \gamma_6^2 b_{P_c^*}^2 b_{W_c}^2 b_{\nabla\theta_c}^2 \|\Theta\|^2 \|\epsilon_T\|^2.$  (64)
We choose the parameters $\gamma_3$, $\gamma_4$, $\gamma_5$, $\gamma_6$, and $\eta$ to fulfill the following conditions:
$\eta > \max\Big\{ \frac{\kappa^4 + \kappa^4 \gamma_3^2 b_g^2 b_\omega^2 b_{\nabla\theta_c}^2 \lambda_M^2(R^{-1}) + \gamma_3^2 b_G^2 b_\omega^2 b_{\nabla\theta_c}^2 + 2\kappa^4 \gamma_6^2 b_{P_c^*}^2 b_{\nabla\theta_c}^2 b_{W_c}^2 \|\theta_I\|^2 \|\Theta\|^2}{2\kappa^4 \delta_I}, \; \frac{\kappa^4 + \kappa^4 \gamma_3^2 b_g^4 b_{\nabla\theta_c}^2 \lambda_M^2(R^{-1}) + \gamma_3^2 b_G^4 b_{\nabla\theta_c}^2}{2\kappa^4 \delta_c} \Big\}, \quad \gamma_3^2 < \frac{\gamma_4 \eta \lambda_m(R)}{2 b_g^2}, \quad \gamma_4 > \frac{2\gamma_3 b_f + \frac{4}{\eta} + \frac{1}{\eta} \gamma_6^2 b_{P_c^*}^2 b_f^2 b_{\nabla\epsilon_v}^2 \|\Theta\|^2}{\lambda_m(Q)}, \quad \gamma_5 > \frac{\eta + \frac{\eta}{2} |P_I|^2}{2 l_I}, \quad \gamma_6 > \frac{5\eta + \frac{\eta}{2} |P_c|^2}{2 l_c}.$  (65)
Then, (64) can be further written as
$\dot{J}(t) \le -k_1 \|\tilde{W}_I\|^2 - k_2 \|\tilde{W}_c\|^2 - k_3 \|x\|^2 - k_4 \|v_I\|^2 - k_5 \|v_c\|^2 + b_\gamma,$
where $k_1$, $k_2$, $k_3$, $k_4$, $k_5$, and $b_\gamma$ are positive constants:
$k_1 = \delta_I - \frac{1}{2\eta} - \frac{1}{2\eta} \gamma_3^2 b_g^2 b_\omega^2 b_{\nabla\theta_c}^2 \lambda_M^2(R^{-1}) - \frac{1}{2\eta\kappa^4} \gamma_3^2 b_G^2 b_\omega^2 b_{\nabla\theta_c}^2 - \frac{1}{\eta} \gamma_6^2 b_{P_c^*}^2 b_{\nabla\theta_c}^2 b_{W_c}^2 \|\theta_I\|^2 \|\Theta\|^2,$
$k_2 = \delta_c - \frac{1}{2\eta} - \frac{1}{2\eta} \gamma_3^2 b_g^4 b_{\nabla\theta_c}^2 \lambda_M^2(R^{-1}) - \frac{1}{2\eta\kappa^4} \gamma_3^2 b_G^4 b_{\nabla\theta_c}^2,$
$k_3 = \gamma_4 \lambda_m(Q) - 2\gamma_3 b_f - \frac{4}{\eta} - \frac{1}{\eta} \gamma_6^2 b_{P_c^*}^2 b_f^2 b_{\nabla\epsilon_v}^2 \|\Theta\|^2,$
$k_4 = 2\gamma_5 b_{P_I^*}^2 l_I - b_{P_I^*}^2 \eta - \frac{\eta}{2} b_{P_I^*}^2 |P_I|^2,$
$k_5 = 2\gamma_6 l_c b_{P_c^*}^2 - 5 b_{P_c^*}^2 \eta - \frac{\eta}{2} b_{P_c^*}^2 |P_c|^2,$
$b_\gamma = \Big( \frac{1}{2\eta} \gamma_3^2 b_g^4 \lambda_M^2(R^{-1}) + \frac{\gamma_3^2}{2\eta\kappa^4} b_G^4 + \frac{1}{4\eta} \gamma_6^2 b_{P_c^*}^2 b_g^2 b_{\varpi_1}^2 b_{\nabla\theta_c}^2 b_\omega^2 \lambda_M^2(R^{-1}) \|\Theta\|^2 + \frac{1}{4\eta\kappa^4} \gamma_6^2 b_{P_c^*}^2 b_G^2 b_{\varpi_2}^2 b_{\nabla\theta_c}^2 b_\omega^2 \|\Theta\|^2 \Big) \|\nabla\epsilon_v\|^2 + \frac{1}{\eta} \gamma_5^2 b_{P_I^*}^2 \|\theta_{If} \epsilon_{Tf}^T\|^2 + \frac{1}{\eta} \gamma_6^2 b_{P_c^*}^2 b_{W_c}^2 b_{\nabla\theta_c}^2 \|\Theta\|^2 \|\epsilon_T\|^2.$
Thus, $\dot{J}(t)$ is negative whenever
$\|\tilde{W}_I\| > \sqrt{b_\gamma / k_1}$, or $\|\tilde{W}_c\| > \sqrt{b_\gamma / k_2}$, or $\|x\| > \sqrt{b_\gamma / k_3}$, or $\|v_I\| > \sqrt{b_\gamma / k_4}$, or $\|v_c\| > \sqrt{b_\gamma / k_5}$,
which implies that the NN weight estimation errors $\tilde{W}_I$ and $\tilde{W}_c$ and the system state $x$ are all UUB.
Lastly, the errors between the proposed H∞ control pair and the ideal one are written as
$\hat{u} - u^* = -\frac{1}{2} R^{-1} [\hat{W}_g \theta_g(x)]^T \nabla\theta_c^T(x) \hat{W}_c + \frac{1}{2} R^{-1} g^T \big( \nabla\theta_c^T(x) W_c + \nabla\epsilon_v \big) = -\frac{1}{2} R^{-1} g^T \nabla\theta_c^T(x) \tilde{W}_c + \frac{1}{2} R^{-1} \big[ g - \hat{W}_g \theta_g(x) \big]^T \nabla\theta_c^T(x) W_c + \frac{1}{2} R^{-1} \big[ g - \hat{W}_g \theta_g(x) \big]^T \nabla\theta_c^T(x) \tilde{W}_c + \frac{1}{2} R^{-1} g^T \nabla\epsilon_v,$
$\hat{d} - d^* = \frac{1}{2\kappa^2} [\hat{W}_G \theta_G(x)]^T \nabla\theta_c^T(x) \hat{W}_c - \frac{1}{2\kappa^2} G^T \big( \nabla\theta_c^T(x) W_c + \nabla\epsilon_v \big) = \frac{1}{2\kappa^2} G^T \nabla\theta_c^T(x) \tilde{W}_c - \frac{1}{2\kappa^2} \big[ G - \hat{W}_G \theta_G(x) \big]^T \nabla\theta_c^T(x) W_c - \frac{1}{2\kappa^2} \big[ G - \hat{W}_G \theta_G(x) \big]^T \nabla\theta_c^T(x) \tilde{W}_c - \frac{1}{2\kappa^2} G^T \nabla\epsilon_v,$
which further implies that
$\lim_{t \to +\infty} \|\hat{u} - u^*\| \le \frac{1}{2} \lambda_M(R^{-1}) \big\{ b_g \big( b_{\nabla\theta_c} \|\tilde{W}_c\| + b_{\nabla\epsilon_v} \big) + b_{\nabla\theta_c} b_{W_c} \big( b_{\theta_g} \|\tilde{W}_I\| + b_{\epsilon_g} \big) + b_{\nabla\theta_c} \|\tilde{W}_c\| \big( b_{\theta_g} \|\tilde{W}_I\| + b_{\epsilon_g} \big) \big\} \triangleq b_u,$
$\lim_{t \to +\infty} \|\hat{d} - d^*\| \le \frac{1}{2\kappa^2} \big\{ b_G \big( b_{\nabla\theta_c} \|\tilde{W}_c\| + b_{\nabla\epsilon_v} \big) + b_{\nabla\theta_c} b_{W_c} \big( b_{\theta_G} \|\tilde{W}_I\| + b_{\epsilon_G} \big) + b_{\nabla\theta_c} \|\tilde{W}_c\| \big( b_{\theta_G} \|\tilde{W}_I\| + b_{\epsilon_G} \big) \big\} \triangleq b_d,$
where $b_u > 0$ and $b_d > 0$ are constants determined by the identifier NN estimation error $\tilde{W}_I$ and the critic NN estimation error $\tilde{W}_c$. This proves that the approximate H∞ control pair converges to a set around the optimal solution.
This completes the proof. □

6. Numerical Simulation

This section verifies the effectiveness of the proposed KREM-based IC learning approach for optimal robust control. We consider the following NCT system [12]:
$\dot{x} = f(x) + g(x)u + G(x)d,$  (66)
where $f(x) = \begin{bmatrix} -x_1 + x_2 \\ -0.5 x_1 - 0.5 x_2 (1 - (\cos(2x_1) + 2)^2) \end{bmatrix}$, $g(x) = \begin{bmatrix} 0 \\ \cos(2x_1) + 2 \end{bmatrix}$, $G(x) = \begin{bmatrix} 0 \\ \sin(4x_1) + 2 \end{bmatrix}$.
The regressor of the identifier NN is chosen as
$\theta_I(x, u) = [x_1, \; x_2, \; x_2 (1 - (\cos(2x_1) + 2)^2), \; u \cos(2x_1), \; u, \; d \sin(4x_1), \; d]^T,$
with the unknown identifier weight matrix given by
$W_I = \begin{bmatrix} -1 & 1 & 0 & 0 & 0 & 0 & 0 \\ -0.5 & 0 & -0.5 & 1 & 2 & 1 & 2 \end{bmatrix}.$
The activation function in (33) for the critic NN is selected as
$\theta_c(x) = [x_1^2, \; x_1 x_2, \; x_2^2]^T.$
The ideal critic NN weights are $W_c = [0.5, 0, 1]^T$.
In this numerical example, the remaining parameters are set as follows: the initial system states are $x_1(0) = 3$ and $x_2(0) = 1$; $Q = I_2$ and $R = 1$; the filter coefficients are $\rho = 0.001$, $l_I = 0.1$, and $l_c = 20$; and the learning gains are $\gamma_1 = 800$ and $\gamma_2 = 200 \cdot diag\{0.3, 1, 1\}$. It is important to note that, in this simulation, no noise needs to be added to the control input $u(t)$ to enforce the PE condition, whereas many existing ADP-based control methods require such noise to ensure $\theta_I(t) \in PE$ and $\Theta(t) \in PE$.
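For reference, the example dynamics and basis functions translate directly into code; the following is a minimal sketch under the parameter values above (the exact disturbance signal used in the experiments is not specified in the text, so it is omitted here):

```python
import numpy as np

def f(x):   # drift dynamics of the benchmark system (66)
    return np.array([-x[0] + x[1],
                     -0.5 * x[0] - 0.5 * x[1] * (1.0 - (np.cos(2 * x[0]) + 2.0) ** 2)])

def g(x):   # input dynamics
    return np.array([0.0, np.cos(2 * x[0]) + 2.0])

def G(x):   # disturbance injection dynamics
    return np.array([0.0, np.sin(4 * x[0]) + 2.0])

def theta_c(x):   # critic basis; the ideal weights are W_c = [0.5, 0, 1]^T
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

x0 = np.array([3.0, 1.0])   # initial state as printed in the text
Q, R = np.eye(2), 1.0       # performance index weights
```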
For comparison, we consider the Kreisselmeier regressor extension (KRE)-based identifier–critic network framework [12] for the system (66). Figure 2 and Figure 3 display the convergence of the identifier NN weights and the critic NN weights, respectively, under our KREM-based optimal robust control method and the KRE-based control method [12]. As illustrated in Figure 2, the KREM-based ADP method proposed in this paper exhibits faster convergence than the KRE-based ADP method. Furthermore, it demonstrates element-wise monotonicity, thus preventing oscillations and peaking in the learning curves. The trajectories of the approximate control input $\hat{u}$ and the estimated disturbance $\hat{d}$ are presented in Figure 4 and Figure 5, respectively. By applying the optimal H∞ control pair, the system states are stabilized, as depicted in Figure 6.

7. Conclusions

This paper has presented a novel adaptive learning approach using neural networks (NNs) to address the optimal robust control problem for nonlinear continuous-time systems with unknown dynamics. The approach employs a system identifier that utilizes NNs and parameter estimation techniques to approximate the unknown system matrices and disturbances. Additionally, a critic NN learning structure is introduced to obtain an approximate controller corresponding to the optimal control problem. Unlike existing identifier–critic NN learning control methods, this approach incorporates adaptive tuning laws based on a regressor extension and mixing technique, which enable the learning of the unknown parameters of the two NNs under relaxed persistence of excitation conditions. The convergence conditions of the proposed approach have been analyzed theoretically. Finally, the effectiveness of the proposed learning control approach has been validated via a simulation study.

Author Contributions

Methodology, R.L.; Validation, R.L.; Formal analysis, R.L.; Investigation, R.L. and Z.P.; Writing—original draft, R.L.; Writing—review & editing, Z.P. and J.H.; Supervision, Z.P. and J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62203089 and Grant 62103084, in part by the Project funded by China Postdoctoral Science Foundation under Grant 2021M700695, in part by the Sichuan Science and Technology Program, China under Grant 2022NSFSC0890, Grant 2022NSFSC0865, and Grant 2021YFS0016, and in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2022A1515110135.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. All authors have approved the manuscript and agreed with its submission to this journal.

References

  1. Luo, R.; Peng, Z.; Hu, J. On model identification based optimal control and it's applications to multi-agent learning and control. Mathematics 2023, 11, 906.
  2. Luo, B.; Wu, H.N.; Huang, T. Off-policy reinforcement learning for H∞ control design. IEEE Trans. Cybern. 2014, 45, 65–76.
  3. Werbos, P. Approximate Dynamic Programming for Realtime Control and Neural Modelling; White, D.A., Sofge, D.A., Eds.; Van Nostrand: New York, NY, USA, 1992.
  4. Lewis, F.L.; Vrabie, D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 2009, 9, 32–50.
  5. Vamvoudakis, K.G.; Lewis, F.L. Online actor–critic algorithm to solve the continuous time infinite horizon optimal control problem. Automatica 2010, 46, 878–888.
  6. Wei, Q.; Liu, D.; Lin, H. Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems. IEEE Trans. Cybern. 2015, 46, 840–853.
  7. Peng, Z.; Zhao, Y.; Hu, J.; Luo, R.; Ghosh, B.K.; Nguang, S.K. Input-output data-based output antisynchronization control of multi-agent systems using reinforcement learning approach. IEEE Trans. Ind. Inform. 2021, 17, 7359–7367.
  8. Peng, Z.; Zhao, Y.; Hu, J.; Ghosh, B.K. Data-driven optimal tracking control of discrete-time multi-agent systems with two-stage policy iteration algorithm. Inf. Sci. 2019, 481, 189–202.
  9. Zhang, H.; Jiang, H.; Luo, C.; Xiao, G. Discrete-time nonzero-sum games for multiplayer using policy-iteration-based adaptive dynamic programming algorithms. IEEE Trans. Cybern. 2016, 47, 3331–3340.
  10. Modares, H.; Lewis, F.L. Optimal tracking control of nonlinear partially unknown constrained input systems using integral reinforcement learning. Automatica 2014, 50, 1780–1792.
  11. Yang, X.; Liu, D.; Luo, B.; Li, C. Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning. Inf. Sci. 2016, 369, 731–747.
  12. Lv, Y.; Na, J.; Ren, X. Online H∞ control for completely unknown nonlinear systems via an identifier-critic-based ADP structure. Int. J. Control 2019, 92, 100–111.
  13. Luo, R.; Peng, Z.; Hu, J.; Ghosh, B.K. Adaptive optimal control of affine nonlinear systems via identifier-critic neural network approximation with relaxed PE conditions. Neural Netw. 2023, 167, 588–600.
  14. Luo, R.; Tan, W.; Peng, Z.; Zhang, J.; Hu, J.; Ghosh, B.K. Optimal consensus control for multi-agent systems with unknown dynamics and states of leader: A distributed KREM learning method. IEEE Trans. Circuits Syst. II Express Briefs 2023.
  15. Wei, Q.; Song, R.; Yan, P. Data-driven zero-sum neuro-optimal control for a class of continuous-time unknown nonlinear systems with disturbance using ADP. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 444–458.
  16. Wang, D.; Zhou, Z.; Liu, A.; Qiao, J. Event-triggered robust adaptive critic control for nonlinear disturbed systems. Nonlinear Dyn. 2023, 111, 19963–19977.
  17. Zhao, J.; Na, J.; Gao, G. Adaptive dynamic programming based robust control of nonlinear systems with unmatched uncertainties. Neurocomputing 2020, 395, 56–65.
  18. Vamvoudakis, K.G.; Lewis, F.L. Online solution of nonlinear two-player zero-sum games using synchronous policy iteration. Int. J. Robust Nonlinear Control 2012, 22, 1460–1483.
  19. Peng, Z.; Ji, H.; Zou, C.; Kuang, Y.; Cheng, H.; Shi, K.; Ghosh, B.K. Optimal H∞ tracking control of nonlinear systems with zero-equilibrium-free via novel adaptive critic designs. Neural Netw. 2023, 164, 105–114.
  20. Lin, F.; Brandt, R.D. An optimal control approach to robust control of robot manipulators. IEEE Trans. Robot. Autom. 1998, 14, 69–77.
  21. Yang, X.; He, H.; Zhong, X. Adaptive dynamic programming for robust regulation and its application to power systems. IEEE Trans. Ind. Electron. 2017, 65, 5722–5732.
  22. Yang, X.; He, H. Adaptive critic designs for event-triggered robust control of nonlinear systems with unknown dynamics. IEEE Trans. Cybern. 2018, 49, 2255–2267.
  23. Xue, S.; Luo, B.; Liu, D.; Gao, Y. Event-triggered ADP for tracking control of partially unknown constrained uncertain systems. IEEE Trans. Cybern. 2021, 52, 9001–9012.
  24. Lewis, F.W.; Jagannathan, S.; Yesildirak, A. Neural Network Control of Robot Manipulators and Non-Linear Systems; Taylor & Francis: London, UK, 1999.
  25. Boyd, S.; Sastry, S. Adaptive Control: Stability, Convergence and Robustness; Prentice-Hall: Englewood Cliffs, NJ, USA, 1989.
Figure 1. Schematic of the proposed control system.
Figure 2. Comparison of the convergence of the identifier NN weights $\hat{W}_I$: (a) KREM-based method; (b) KRE-based method in [12].
Figure 3. Comparison of the convergence of the critic NN weights $\hat{W}_c$: (a) KREM-based method; (b) KRE-based method in [12].
Figure 4. Evolution of the approximate control input $\hat{u}$.
Figure 5. Disturbance action $d$.
Figure 6. Trajectories of the system states $x = [x_1, x_2]^T$.
