1. Introduction
Practical systems inevitably contain uncertain parameters and disturbances due to modeling errors, external perturbations, and so on [1]. Thus, it is of great practical significance to study robust control of uncertain systems, and in recent years this control problem has been studied extensively. The literature on robust control of uncertain systems covers both linear and nonlinear systems. For uncertain linear systems, classical approaches mostly rely on algebraic Riccati equations (ARE) or linear matrix inequalities (LMI) [
2,
3,
4,
5,
6]. This body of work covers both matched and mismatched systems. For nonlinear systems, early approaches include feedback linearization, fuzzy modeling, nonlinear
${H}_{\infty}$ control, and so on [
7,
8,
9,
10,
11]. In recent decades, however, neural networks (NN) and policy iteration (PI) in reinforcement learning (RL) have been used to approximate robust control laws numerically [
12,
13,
14].
The PI method in RL was initially used to compute optimal control laws for deterministic systems. Werbos first proposed approximating the solution of Bellman's equation using approximate dynamic programming (ADP) [15]. Many subsequent results apply the PI method to compute optimal control laws for deterministic systems [
16,
17,
18,
19]. The PI algorithm offers two major benefits for such optimal control problems. On the one hand, it can mitigate the "curse of dimensionality" in engineering control [20]. On the other hand, it can compute the optimal control law without knowledge of the system dynamics, which are difficult to obtain accurately in engineering practice. The PI algorithm is therefore a good choice for control problems with unknown models.
Over the last decade, the PI method has also been developed to compute robust controllers for uncertain systems, building on the optimal control approach to robust control [21]. For continuous-time nonlinear systems with input constraints, a novel RL-based algorithm was proposed for the robust control problem in [22]. Based on neural network approximation, an online PI algorithm was developed to solve robust control of a class of uncertain discrete-time nonlinear systems in [23]. Using a data-driven RL algorithm, a robust control scheme was developed for a class of completely unknown dynamic systems with uncertainty in [24]. In addition, there are many other examples of RL-based robust control in the literature, such as [
25,
26,
27]. In all the works listed above, the solution to the Hamilton–Jacobi–Bellman (HJB) equation was approximated by a neural network. Indeed, solving the HJB equation is a key problem in optimal control [28]. The HJB equation is difficult to solve because it is a nonlinear partial differential equation. For nonlinear systems, the HJB equation is solved with neural network approximation in many cases, whereas for linear systems an ARE is solved instead. However, to the best of our knowledge, most existing studies have not considered input uncertainty, which does exist in actual control systems.
In this study, a class of continuous-time nonlinear systems with internal and input uncertainties is considered. The main objective is to establish robust control laws for these uncertain systems. By constructing a suitable optimal control problem, the robust control problem is converted into computing an optimal controller. Online PI algorithms are proposed to compute the robust control law by approximating the optimal cost with a neural network. The convergence of the proposed algorithms is proved, and numerical examples are given to illustrate the effectiveness of the method.
Our main contributions are as follows. First, more general uncertain nonlinear systems are considered, in which uncertainty enters both the system dynamics and the input. For both matched and mismatched uncertain systems, it is proved that the robust control problem can be converted into computing an optimal controller. Second, online PI algorithms are developed to solve the robust control problem. A neural network is utilized to approximate the optimal cost in the PI algorithm, which accomplishes the difficult task of solving the HJB equation.
The rest of this paper is arranged as follows. We formulate the robust control problems and propose some basic results for the issues under consideration in
Section 2. The robust control problem is converted into computing an optimal control law of a nominal or auxiliary system in
Section 3 and
Section 4. Based on approximating optimal cost with neural network, the online PI algorithms are developed to solve the robust control problem in
Section 5. To support the proposed theoretical framework, we provide some numerical examples in
Section 6. In
Section 7, the study is concluded, and the scope for future research is discussed.
2. Preliminaries and Problem Formulation
Consider an uncertain nonlinear system as follows:
where
$x\left(t\right)\in {\mathbb{R}}^{n}$ is the system state,
$u\left(t\right)\in {\mathbb{R}}^{m}$ is the control input,
$f\left(x\left(t\right)\right)\in {\mathbb{R}}^{n}$,
$g\left(x\left(t\right)\right)\in {\mathbb{R}}^{n\times m}$ are known functions,
$\Delta f\left(x\left(t\right)\right)\in {\mathbb{R}}^{n}$,
$\Delta g\left(x\left(t\right)\right)\in {\mathbb{R}}^{n\times m}$ are uncertain disturbance functions.
The control objective is to establish a robust control law $u=u\left(x\right)$ such that the closed-loop system is asymptotically stable for all allowed uncertain disturbances $\Delta f\left(x\left(t\right)\right)$ and $\Delta g\left(x\left(t\right)\right)$.
As is standard, we first make the following assumptions to ensure that the state Equation (
1) is well defined [
1,
29].
Assumption 1. In (1), $f\left(x\right)+g\left(x\right)u$ is Lipschitz continuous with respect to x and u on a set $\Omega \subseteq {\mathbb{R}}^{n}$ containing the origin.

Assumption 2. For the unforced system, $f\left(0\right)=0$ and $\Delta f\left(0\right)=0$; that is, $x=0$ is an equilibrium point of the unforced system.
Definition 1. The system (1) is said to satisfy the system dynamics matched condition if there is a function matrix $h\left(x\right)\in {\mathbb{R}}^{m\times 1}$ such that

Definition 2. The system (1) is said to satisfy the input matched condition if there is a function $m\left(x\right)\in {\mathbb{R}}^{m\times m}$ such that

where $m\left(x\right)\ge 0$.

Definition 3. If the system (1) satisfies conditions (2) and (3) for all allowed disturbances $\Delta f\left(x\right)$ and $\Delta g\left(x\right)$, then the system (1) is called a matched uncertain system.

Definition 4. If the system (1) does not satisfy condition (2) or (3) for some allowed disturbances $\Delta f\left(x\right)$ and $\Delta g\left(x\right)$, then the system (1) is called a mismatched uncertain system.

Next, we consider the robust control problem of the nonlinear system (
1) with matched and mismatched conditions, respectively.
3. Robust Control of Matched Uncertain Nonlinear Systems
This section considers the problem of robust control when the system (
1) meets the matched conditions (
2) and (
3). By constructing appropriate performance indexes, the robust control problem is transformed into computing the optimal control law of a corresponding nominal system. Based on the optimal control of the nominal system, a PI algorithm is proposed to obtain the robust feedback controller.
For the nominal system
find the controller
$u=u\left(x\right)$ to minimize the performance index
where
${f}_{max}\left(x\right)$ is an upper bound function of the uncertainty $h\left(x\right)$; that is,
$\parallel h\left(x\right)\parallel \le {f}_{max}\left(x\right)$.
The definition of admissible control for the optimal control problem is given below [
26].
Definition 5. The control policy $u\left(x\right)$ is called an admissible control of the system (4) with respect to the performance function (5) on a compact set $\Omega \subseteq {\mathbb{R}}^{n}$ if $u\left(x\right)$ is continuous on Ω with $u\left(0\right)=0$, it stabilizes the system (4) on $\Omega $, and the performance function (5) is finite for any $x\in \Omega $.

According to the performance index (
5), the cost function corresponding to the admissible control
$u\left(x\right)$ is given by
Taking the time derivative on both sides of (6), we obtain the Bellman equation
where
$\nabla V$ is the gradient vector of the cost function
$V(x,u)$ with respect to
x.
Define the Hamiltonian function
Minimizing the Hamiltonian function yields the optimal control function
By substituting (
9) into (
7), it follows that the optimal cost
${V}^{*}\left(x\right)$ satisfies the following HJB equation
with the initial condition
${V}^{*}\left(0\right)=0$.
By solving the HJB Equation (10) for the optimal cost ${V}^{*}\left(x\right)$, we obtain the solution to the optimal control problem, and thus the robust control problem can be solved.
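For intuition, consider the linear quadratic special case, where the HJB equation reduces to an algebraic Riccati equation. This is a sketch under assumed structure (the matrices $A$, $B$ and the quadratic bound ${Q}_{0}$ are illustrative, not from the paper): take $f\left(x\right)=Ax$, $g\left(x\right)=B$, ${f}_{max}^{2}\left(x\right)={x}^{T}{Q}_{0}x$, and the ansatz ${V}^{*}\left(x\right)={x}^{T}Px$ with $P>0$. Then

```latex
% LQ special case of (9)-(10): running cost f_max^2(x) + x^T x + u^T u
\nabla V^{*} = 2Px, \qquad
u^{*}(x) = -\tfrac{1}{2}B^{T}\nabla V^{*} = -B^{T}Px, \qquad
A^{T}P + PA + Q_{0} + I - PBB^{T}P = 0 .
```

This is why, as noted in the introduction, an ARE replaces the neural network approximation in the linear case.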
The following theorem shows that the optimal control ${u}^{*}\left(x\right)=-\frac{1}{2}{g}^{T}\left(x\right)\nabla {V}^{*}$ is a robust controller for matched uncertain systems.
Theorem 1. Assume that conditions (2) and (3) hold for system (1) and that the solution ${V}^{*}\left(x\right)$ of HJB Equation (10) exists. Consider the nominal nonlinear system (4) with performance index (5). Then the optimal control policy ${u}^{*}\left(x\right)=-\frac{1}{2}{g}^{T}\left(x\right)\nabla {V}^{*}$ can globally asymptotically stabilize the nonlinear uncertain system (1). That is to say, the closed-loop uncertain system $\dot{x}\left(t\right)=f\left(x\left(t\right)\right)+\Delta f\left(x\left(t\right)\right)+[g\left(x\left(t\right)\right)+\Delta g\left(x\left(t\right)\right)]{u}^{*}\left(x\right)$ is globally asymptotically stable.

Proof. In order to prove stability under the controller
${u}^{*}\left(x\right)=-\frac{1}{2}{g}^{T}\left(x\right)\nabla {V}^{*}$,
${V}^{*}\left(x\right)$ is chosen as the Lyapunov function. Considering the performance index (
5),
${V}^{*}\left(x\right)$ is obviously positive, and
${V}^{*}\left(0\right)=0$. Taking the time derivative of ${V}^{*}\left(x\right)$ along the closed-loop system (
1), it follows that
Using the matched conditions (
2) and (
3), it follows from (
11) that
From HJB Equation (
10), one can obtain
Substituting (
13) into (
12) yields
It follows from
$m\left(x\right)\ge 0$ that
$-\frac{1}{2}\nabla {V}^{*T}g\left(x\right)m\left(x\right){g}^{T}\left(x\right)\nabla {V}^{*}\le 0$. Therefore, from (
14), we have
Therefore, by Lyapunov stability theory [
30], the optimal control
${u}^{*}\left(x\right)=-\frac{1}{2}{g}^{T}\left(x\right)\nabla {V}^{*}$ can make the matched uncertain system (
1) asymptotically stable. Thus, for a constant
$c>0$, there is a neighborhood
$N=\{x:\parallel x\parallel <c\}$ such that if the state
$x\left(t\right)$ enters the neighborhood
N, then
$x\to 0$ when
$t\to \infty $. However,
$x\left(t\right)$ cannot stay out of the domain
N forever; otherwise, for all
$t>0$, there is
$\parallel x\left(t\right)\parallel \ge c$. This implies that
Therefore, when
$t\to \infty $,
${V}^{*}\left[x\left(t\right)\right]\le {V}^{*}\left[x\left(0\right)\right]-{c}^{2}t\to -\infty $. This contradicts that
${V}^{*}\left[x\left(t\right)\right]$ is positive definite. Consequently, the system (
1) is globally asymptotically stable. □
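For convenience, the chain of estimates (11)–(15) omitted above can be sketched as follows, assuming the running cost in (5) has the form ${f}_{max}^{2}\left(x\right)+{x}^{T}x+{u}^{T}u$ (this form is consistent with Example 1 but is our reading, not a quotation):

```latex
% Sketch of the Lyapunov bound in the proof of Theorem 1
\dot V^{*} = \nabla V^{*T}\big(f + g h\big) + \nabla V^{*T} g\,(I+m)\,u^{*},
\qquad u^{*} = -\tfrac{1}{2} g^{T}\nabla V^{*} .
% The HJB equation gives  \nabla V^{*T} f = -f_{max}^{2} - x^{T}x + u^{*T}u^{*};
% combining this with m(x) \ge 0 and  -2u^{*T}h \le u^{*T}u^{*} + f_{max}^{2}  yields
\dot V^{*} \;\le\; -f_{max}^{2} - x^{T}x - u^{*T}u^{*} - 2u^{*T}h \;\le\; -x^{T}x .
```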
Remark 1. For matched nonlinear systems, the robust controller can be obtained by solving HJB Equation (10) for the optimal cost function ${V}^{*}\left[x\left(t\right)\right]$. In Section 5, we will use the PI algorithm to solve the HJB equation, which is a difficult partial differential equation.

4. Robust Control of Nonlinear Systems with Mismatched Uncertainties
In this section, we consider the robust control problem when the system (
1) does not satisfy the matched condition (
2). In this case, the system is a mismatched uncertain nonlinear system. By constructing an appropriate auxiliary system and performance index, the robust control problem for the mismatched uncertain system is transformed into solving the optimal control law of an auxiliary system.
First, the following assumption is given.
Assumption 3. Suppose that the uncertainty of system (1) satisfies $\Delta f\left(x\right)=c\left(x\right)h\left(x\right)$, $\Delta g\left(x\right)=g\left(x\right)m\left(x\right)$, where $c\left(x\right)$ is a known function matrix of appropriate dimensions, $h\left(x\right)$ and $m\left(x\right)$ are uncertain functions, and $m\left(x\right)\ge 0$.

The goal of robust control is to find a control function
$u\left(x\right)$, which makes the closed-loop system
globally asymptotically stable for all uncertainties
$h\left(x\right)$ and
$m\left(x\right)$.
In order to obtain the robust controller, an optimal control problem is constructed as follows. For the auxiliary system
find the controller
$u=u\left(x\right)$,
$v=v\left(x\right)$, such that the performance index
is minimized, where
$\beta $ is the design parameter,
$g{\left(x\right)}^{+}={\left[{g}^{T}\left(x\right)g\left(x\right)\right]}^{-1}{g}^{T}\left(x\right)$ is the pseudoinverse of the matrix function
$g\left(x\right)$. Moreover,
${f}_{max}\left(x\right)$ and
${g}_{max}\left(x\right)$ are nonnegative functions and satisfy the conditions
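As a quick numerical sanity check (not part of the paper's development) of the left-inverse property of $g{\left(x\right)}^{+}$, which Remark 2 below relies on, consider a hypothetical full-column-rank matrix:

```python
import numpy as np

# Hypothetical full-column-rank g(x) evaluated at one state (illustrative only)
g = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])

# Pseudoinverse g^+ = (g^T g)^(-1) g^T
g_plus = np.linalg.inv(g.T @ g) @ g.T
assert np.allclose(g_plus, np.linalg.pinv(g))  # matches the Moore-Penrose pinv

# g^+ g = I (left inverse) ...
assert np.allclose(g_plus @ g, np.eye(2))
# ... but g g^+ != I: it is only the orthogonal projector onto range(g),
# so I - g g^+ (which appears in the auxiliary construction) annihilates range(g)
P = g @ g_plus
assert not np.allclose(P, np.eye(3))
assert np.allclose((np.eye(3) - P) @ g, 0)
```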
According to the performance index (
18), the cost function corresponding to the admissible control
$\left(u\right(x),v(x\left)\right)$ is
The following Bellman equation is obtained by taking the time derivative on both sides of (
20)
where
$\nabla V$ is the gradient vector of
$V\left(x\right)$ with respect to
x,
$\overline{g}\left(x\right)=[g\left(x\right),(I-g\left(x\right)g{\left(x\right)}^{+})c\left(x\right)]$,
$\overline{u}={[{u}^{T},{v}^{T}]}^{T}$.
Define the Hamiltonian function as
Assuming that the minimum value exists and is unique in (
22), the optimal control law is given by
By substituting (
23) into (
21), the HJB equation is given by
and the initial value
${V}^{*}\left(0\right)=0$.
Remark 2. Generally, the pseudoinverse $g{\left(x\right)}^{+}$ of $g\left(x\right)$ exists if the columns of $g\left(x\right)$ are linearly independent when Assumptions 1 and 2 hold [31]. In practical control systems, the function $g\left(x\right)$ usually has full column rank, so the pseudoinverse generally exists. Furthermore, the pseudoinverse satisfies $g{\left(x\right)}^{+}g\left(x\right)=I$; however, it does not in general satisfy $g\left(x\right)g{\left(x\right)}^{+}=I$. In addition, the auxiliary system constructed above is not a nominal system; rather, a compensation control term $v\left(x\right)$ is added to the nominal system.

If we can choose an appropriate parameter
$\beta $, the optimal cost
${V}^{*}\left(x\right)$ can be computed from HJB Equation (
24). Then, we can get the optimal control law of system (
17) with performance index (
18). The following theorem shows that the optimal control
${u}^{*}\left(x\right)=-\frac{1}{2}{g}^{T}\left(x\right)\nabla {V}^{*}$ is a robust controller for uncertain systems.
Theorem 2. Assume that the mismatched uncertain system (16) satisfies Assumption 3 and condition (19). Consider the auxiliary system (17) with the performance index (18). Suppose that there exists a solution ${V}^{*}\left(x\right)$ of HJB Equation (24) for a selected parameter β, and a constant ${\beta}^{\prime}$ satisfying ${\beta}^{\prime}<\beta $, such that

Then the optimal control policy ${u}^{*}\left(x\right)=-\frac{1}{2}{g}^{T}\left(x\right)\nabla {V}^{*}$ can globally asymptotically stabilize the nonlinear uncertain system (16). That is to say, the closed-loop uncertain system $\dot{x}\left(t\right)=f\left(x\right)+c\left(x\right)h\left(x\right)+[g\left(x\right)+g\left(x\right)m\left(x\right)]{u}^{*}\left(x\right)$ is globally asymptotically stable.

Proof. In order to prove the global asymptotic stability of the closed-loop system,
${V}^{*}\left(x\right)$ is chosen as the Lyapunov function. Considering the performance index (
18),
${V}^{*}\left(x\right)$ is obviously positive, and
${V}^{*}\left(0\right)=0$. Taking the time derivative of the function
${V}^{*}\left(x\right)$ along the system (
16), we have
Using
${u}^{*}\left(x\right)=-\frac{1}{2}{g}^{T}\left(x\right)\nabla {V}^{*}$ yields
by
${u}^{*}\left(x\right)=-\frac{1}{2}{g}^{T}\left(x\right)\nabla {V}^{*}$ and
${v}^{*}\left(x\right)=-\frac{1}{2}{c}^{T}\left(x\right){(I-g\left(x\right)g{\left(x\right)}^{+})}^{T}\nabla {V}^{*}$,
It follows from (
24) that
It follows from the basic matrix inequality that
So, it can be obtained from (
26)–(
28) that
Therefore, by Lyapunov stability theory, the optimal control
${u}^{*}\left(x\right)=-\frac{1}{2}{g}^{T}\left(x\right)\nabla {V}^{*}$ can make the closed-loop uncertain nonlinear system asymptotically stable. Thus, for a constant
$c>0$, there is a neighborhood
$N=\{x:\parallel x\parallel <c\}$ such that if the state
$x\left(t\right)$ enters the neighborhood
N, then
$x\to 0$ when
$t\to \infty $. However,
$x\left(t\right)$ cannot stay out of the domain
N forever; otherwise, for all
$t>0$, there is
$\parallel x\left(t\right)\parallel \ge c$. This implies that
Hence, when
$t\to \infty $,
${V}^{*}\left[x\left(t\right)\right]\le {V}^{*}\left[x\left(0\right)\right]-({\beta}^{2}-{\beta}^{\prime 2}){c}^{2}t\to -\infty $. This contradicts the positivity of
${V}^{*}\left[x\left(t\right)\right]$. Therefore, system (
16) is globally asymptotically stable. This completes the proof. □
5. Neural Network Approximation in the PI Algorithm
In the previous two sections, the robust control of uncertain nonlinear systems was transformed into solving the optimal control of a nominal or auxiliary system. However, whether the uncertain system is matched or mismatched, the key issue is how to obtain the solution to the corresponding HJB equation. As is well known, this is a nonlinear partial differential equation that is hard to solve. Moreover, solving the HJB equation may lead to the curse of dimensionality [
21]. In this section, an online PI algorithm is used to solve the HJB equation iteratively, and a neural network is utilized to approximate the optimal cost in the PI algorithm.
5.1. PI Algorithms for Robust Control
For the system with matched uncertainty, the optimal control problem (
4) with (
5) is considered. For any admissible control, the corresponding cost function can be expressed as
where
$T>0$ is a selected constant. Therefore, it follows that
Based on the integral reinforcement relationship (
29) and the optimal control (
9), the PI algorithm of robust control for matched uncertain nonlinear systems is given below.
The convergence of Algorithm 1 is established as follows. The following proposition gives an equivalent form of the Bellman Equation (
30).
Algorithm 1 PI algorithm of robust control for matched uncertain nonlinear systems 
 (1)
Select an upper bound ${f}_{max}\left(x\right)$ satisfying $\parallel h\left(x\right)\parallel \le {f}_{max}\left(x\right)$;  (2)
Initialization: for the nominal nonlinear system (4), select an initial stabilizing control ${u}_{0}\left(x\right)$;  (3)
Policy evaluation: for the control input ${u}_{i}\left(x\right)$, calculate the cost ${V}_{i}\left(x\right)$ from the Bellman equation
 (4)
Policy improvement: compute the control law ${u}_{i+1}\left(x\right)$ using
Repeat the iteration between (30) and (31) until the control input converges.

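To make the evaluation/improvement loop concrete, the sketch below runs the PI iteration in the model-based linear quadratic setting, where policy evaluation (30) reduces to a Lyapunov equation and the iterates converge to the ARE solution (Kleinman's iteration). The matrices A, B, Q are illustrative stand-ins, not taken from the paper, and the model-based form is used only because it makes the two steps explicit; Algorithm 1 itself works from trajectory data without knowing $f\left(x\right)$.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Hypothetical nominal linear plant (not from the paper): x' = A x + B u,
# running cost x^T Q x + u^T u, so the Bellman/HJB pair becomes
# Lyapunov/Riccati equations and PI becomes Kleinman's iteration.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # Hurwitz, so K = 0 is stabilizing
B = np.array([[0.0], [1.0]])
Q = np.eye(2)

K = np.zeros((1, 2))                        # initial stabilizing policy u_0 = 0
for _ in range(10):
    Ac = A - B @ K                          # closed loop under the current policy
    # policy evaluation: (A - B K)^T P + P (A - B K) = -(Q + K^T K)
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ K))
    K = B.T @ P                             # policy improvement (R = I)

# The iterates converge to the ARE solution of the same problem
P_are = solve_continuous_are(A, B, Q, np.eye(1))
assert np.allclose(P, P_are, atol=1e-6)
```

Each pass performs exactly one policy evaluation and one policy improvement; replacing the Lyapunov solve with the least squares fit of Section 5 gives the data-driven version.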
Proposition 1. Suppose that ${u}_{i}\left(x\right)$ is a stabilizing controller of the nominal system (4). Then solving (30) for the cost ${V}_{i}\left(x\right)$ is equivalent to solving the following equation

Proof. Dividing both sides of (
30) by
T and taking the limit as $T\to 0$ yields
From the definition of the limit and L'Hôpital's rule, we can get
Thus, we can deduce (
32) from (
30). On the other hand, along the stable system
$\dot{x}=f\left(x\right)+g\left(x\right){u}_{i}\left(x\right)$, finding the time derivative of
${V}_{i}\left(x\right)$ yields
Integrating both sides from
t to
$t+T$ yields
Therefore, we can get the following result from (
32)
This shows that (32) implies (30). □
According to [
32,
33,
34], if an initial stabilizing control policy ${u}_{0}\left(x\right)$ is given, then every subsequent control policy calculated by the iterative relations (
30) and (
31) is also a stabilizing control policy, and the cost sequence
${V}_{i}\left[x\left(t\right)\right]$ calculated by iteration converges to the optimal cost. By Proposition 1, it is known that (
30) and (
32) are equivalent, so the iterative relations (
30) and (
31) in Algorithm 1 converge to the optimal control and optimal cost.
Similarly, we give a PI algorithm of robust control for nonlinear systems with mismatched uncertainties.
The steps of policy evaluation (
34) and policy improvement (
35) are repeated until the policy improvement step no longer changes the current policy. Once the optimal cost function ${V}^{*}\left(x\right)$ is obtained,
${u}^{*}\left(x\right)=-\frac{1}{2}{g}^{T}\left(x\right)\nabla {V}^{*}\left(x\right)$ is the robust control law.
The convergence proof of Algorithm 2 is similar to that of Algorithm 1 and is not repeated here.
Algorithm 2 PI algorithm of robust control for nonlinear systems with mismatched uncertainties 
 (1)
Decompose the uncertainty properly so that $\Delta f\left(x\right)=c\left(x\right)h\left(x\right)$ and $\Delta g\left(x\right)=g\left(x\right)m\left(x\right)$, select constant parameters $\beta $ and ${\beta}^{\prime}$ such that ${\beta}^{\prime}<\beta $, and then calculate the nonnegative functions ${f}_{max}\left(x\right)$ and ${g}_{max}\left(x\right)$ according to (19);  (2)
For the auxiliary system (17), select an initial stabilizing control policy ${u}_{0}\left(x\right)$;  (3)
Policy evaluation: given a control policy ${u}_{i}\left(x\right)$, solve for the cost ${V}_{i}\left(x\right)$ from the following Bellman equation
 (4)
Policy improvement: Calculate the control policy using the following update law
 (5)
Check whether the condition $2{v}^{*T}\left(x\right){v}^{*}\left(x\right)\le {\beta}^{\prime 2}{x}^{T}x$ is satisfied. If it does not hold, return to step (1) and select larger constants $\beta $ and ${\beta}^{\prime}$.

Remark 3. In Step (3) of Algorithm 1 or Algorithm 2, solving for ${V}_{i}\left[x\left(t\right)\right]$ from (30) or (34) can be transformed into a least squares problem [17]. By collecting enough data online along the system trajectory, the cost function $V\left(x\right)$ can be computed using the least squares principle. However, the cost ${V}_{i}\left[x\left(t\right)\right]$ has no explicit expression in general. In the next subsection, by collecting sufficient online data on the interval $[t,t+T]$ along the system trajectory, the cost ${V}_{i}\left[x\left(t\right)\right]$ is approximated by a neural network in the PI algorithms. Moreover, implementation of the algorithms does not require knowledge of the system dynamics function $f\left(x\right)$.

5.2. Neural Network Approximation of Optimal Cost in PI Algorithm
In the implementation of the PI algorithms, we need data from the nominal system and the least squares method to solve for the cost function. However, the cost function of a nonlinear optimal control problem has no explicit form. Therefore, it is necessary to use a neural network structure to approximate the cost function, carry out policy iteration, update the weights, and thereby obtain an approximate optimal cost function. In this subsection, a neural network is utilized to approximate the optimal cost in the corresponding HJB equation.
Based on the continuous approximation theory of neural networks [
35], a single neural network is utilized to approximate the optimal cost in the HJB equation. For matched uncertain systems, suppose that the solution
${V}^{*}\left(x\right)$ of HJB Equation (
10) is smooth and positive definite, and the optimal cost function on the compact set
$\Omega $ is expressed as
where
$W\in {\mathbb{R}}^{L}$ is an unknown ideal weight, and
$\varphi (.):{\mathbb{R}}^{n}\to {\mathbb{R}}^{L}$ is a vector of linearly independent basis functions. It is assumed that
$\varphi \left(x\right)$ is continuous,
$\varphi \left(0\right)=0$, and
$\epsilon \left(x\right)$ is the neural network reconstruction error. Thus, the gradient of the optimal cost function (
36) can be expressed as
where
$\nabla \epsilon \left(x\right)=\frac{\partial \epsilon}{\partial x}$. By the approximation property of neural networks [
35,
36], when the number of neurons in hidden layer
$L\to \infty $, the approximation error
$\epsilon \left(x\right)\to 0$,
$\nabla \epsilon \left(x\right)\to 0$. Substituting (
36) and (
37) into (
9), the optimal control is rewritten as follows
Assume that
$\widehat{W}$ is an estimated value of the ideal weight
W. Since the ideal weight
W in (
36) is unknown, the cost function of the
$i$th iteration in Algorithm 1 is expressed as
Using the approximation of neural network in cost function, the Bellman Equation (
30) in Algorithm 1 is rewritten as follows
where
$\Psi ={\int}_{t}^{t+T}\left[{f}_{max}^{2}\left(x\right)+{x}^{T}x+{u}_{i}^{T}\left(x\right){u}_{i}\left(x\right)\right]dt$. Since the above formula uses a neural network to approximate the cost function, the residual error caused by the approximation is
In order to obtain the neural network weight parameters of the approximation, the following objective function can be minimized in the least squares sense
that is
${\int}_{\Omega}\frac{d{\epsilon}_{i}(x\left(t\right),T)}{d{\widehat{W}}_{i}}{\epsilon}_{i}(x\left(t\right),T)dx=0$. Using the definition of the inner product, it can be rewritten as
It follows from the properties of the inner product that
where
$\Phi =\langle [\varphi \left(x(t+T)\right)-\varphi \left(x\left(t\right)\right)],{[\varphi \left(x(t+T)\right)-\varphi \left(x\left(t\right)\right)]}^{T}\rangle $. Therefore,
So far, the neural network weight parameters of the approximation function
${V}_{i}\left(x\right)$ can be calculated. Thus, the updated control policy can be obtained from (
35)
According to [
32,
33,
35,
36], using the policy iteration of the RL algorithm, the cost sequence
${V}_{i}\left(x\right)$ converges to the optimal cost
${V}^{*}\left(x\right)$, and the control sequence
${u}_{i}\left(x\right)$ converges to the optimal control function
${u}^{*}\left(x\right)$.
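As a minimal end-to-end illustration of the least squares policy evaluation above (a sketch using a made-up scalar plant rather than the paper's examples), the weight of a one-term cost approximation $V_{i}\left(x\right)=W{x}^{2}$ can be fitted from simulated trajectory data:

```python
import numpy as np

# Data-driven policy evaluation sketch (hypothetical scalar plant, not from
# the paper): estimate W in V_i(x) = W * phi(x), phi(x) = x^2, from the
# integral Bellman relation
#   W * [phi(x(t)) - phi(x(t+T))] = integral_t^{t+T} [q x^2 + u_i^2] ds
# by batch least squares along one closed-loop trajectory.
a, b, q, k = -1.0, 1.0, 1.0, 1.0   # plant x' = a x + b u, policy u_i = -k x
dt, T = 1e-4, 0.1                  # Euler step, reinforcement interval
steps = int(T / dt)

x, rows, targets = 2.0, [], []
for _ in range(20):                # 20 reinforcement intervals
    x_start, cost = x, 0.0
    for _ in range(steps):         # simulate plant, accumulate running cost
        u = -k * x
        cost += (q * x**2 + u**2) * dt
        x += (a * x + b * u) * dt
    rows.append(x_start**2 - x**2)  # phi(x(t)) - phi(x(t+T))
    targets.append(cost)

W_hat = np.linalg.lstsq(np.array(rows)[:, None], np.array(targets), rcond=None)[0][0]

# Analytic check: W solves 2(a - b k) W + q + k^2 = 0, i.e. W = 0.5 here
assert abs(W_hat - 0.5) < 1e-2
```

Note that only the measured state and running cost along the trajectory enter the regression; the drift coefficient a is used only to simulate the plant, which mirrors the model-free character of the PI algorithms.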
For mismatched uncertain systems, a similar neural network approximation can be used.
6. Simulation Examples
In this section, simulation examples are presented to verify the feasibility of the robust control design method for uncertain nonlinear systems.
Example 1. Consider the following uncertain nonlinear system

where $x={[{x}_{1},{x}_{2}]}^{T}$ is the system state, $\Delta f\left(x\right)=\left[\begin{array}{c}0\\ {p}_{1}{x}_{1}cos\left({x}_{2}^{2}\right)\end{array}\right]$ is the uncertain disturbance function of the system, $\Delta g\left(x\right)=\left[\begin{array}{c}0\\ {p}_{2}{x}_{2}^{2}\end{array}\right]$ is the input uncertainty function, ${p}_{1}\in [-2,2]$, and ${p}_{2}\in [0,10]$. Obviously,

where $g\left(x\right)=\left[\begin{array}{c}0\\ 1\end{array}\right]$, $h\left(x\right)={p}_{1}{x}_{1}cos\left({x}_{2}^{2}\right)$, $m\left(x\right)={p}_{2}{x}_{2}^{2}$. Moreover, $\left|h\left(x\right)\right|\le 2\left|{x}_{1}\right|={f}_{max}\left(x\right)$. Thus, the original robust control problem is converted into computing an optimal control law. For the nominal system

find the control function u such that the performance index

is minimized. In order to solve the robust control problem using Algorithm 1, it is assumed that the optimal cost function ${V}^{*}\left(x\right)$ has the neural network structure ${V}^{*}\left(x\right)={W}^{T}\varphi \left(x\right)$, where $W={[{W}_{1},{W}_{2},{W}_{3}]}^{T}$ and $\varphi \left(x\right)={[{x}_{1}^{2},{x}_{1}{x}_{2},{x}_{2}^{2}]}^{T}$. The initial weight is taken as ${W}_{0}={[1,5,1.5]}^{T}$, and the initial state of the system is ${x}_{0}={[2,0.5]}^{T}$. The neural network weights are calculated iteratively in MATLAB. In each iteration, 10 sets of data samples are collected along the nominal system trajectory to solve the batch least squares problem. After five iterations, the weight converges to $[1.9645,2.8990,5.4038]$. The robust control law of the uncertain system (47) is ${u}^{*}=-1.4495{x}_{1}-5.4038{x}_{2}$. The convergence process of the neural network weights is shown in Figure 1, while the evolution of the control signal is shown in Figure 2.
For different values of the uncertain parameters ${p}_{1}$ and ${p}_{2}$ in the uncertain system (47), the state trajectories of the closed-loop system under the robust control law are obtained. Figure 3 shows the trajectory of the closed-loop system when ${p}_{1}=2$, ${p}_{2}=1$. Figure 4 shows the trajectory when ${p}_{1}=1$, ${p}_{2}=4$. Figure 5 shows the trajectory when ${p}_{1}=0$, ${p}_{2}=7$. Figure 6 shows the trajectory when ${p}_{1}=2$, ${p}_{2}=10$. From these figures, we can see that the closed-loop system is stable, which shows the effectiveness of the robust control law. In this example, because the nominal system is linear, MATLAB can be used to solve the corresponding LQR problem directly. With this method, the optimal control is calculated as ${u}^{*}=-1.4496{x}_{1}-5.4038{x}_{2}$. It is almost identical to the result of the neural network approximation, which shows the validity of Algorithm 1.
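The LQR cross-check can be reproduced in a few lines. Since the nominal dynamics of Example 1 are not restated here, the snippet below uses a stand-in A matrix (an assumption for illustration), keeping the weighting $Q=\mathrm{diag}(5,1)$ implied by ${f}_{max}\left(x\right)=2\left|{x}_{1}\right|$ plus the ${x}^{T}x$ term, and $R=I$:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Stand-in nominal dynamics (the paper's A matrix is not reproduced here)
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([5.0, 1.0])   # x^T x term plus f_max^2 = 4 x_1^2
R = np.eye(1)

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)   # u* = -K x, i.e. -(1/2) g^T grad V*

# Verify the ARE residual A^T P + P A - P B K + Q = 0
assert np.allclose(A.T @ P + P @ A - P @ B @ K + Q, 0, atol=1e-8)
```

With the paper's actual nominal A and B, the same two calls would reproduce the reported gain $-1.4496{x}_{1}-5.4038{x}_{2}$.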
Example 2. Consider the following uncertain nonlinear system

where $x={[{x}_{1},{x}_{2}]}^{T}$ is the system state, ${p}_{1}\in [-2,2]$, ${p}_{2}\in [0,5]$, and ${p}_{3}\in [-1,1]$. Let $\Delta f\left(x\right)=\left[\begin{array}{c}{p}_{1}{x}_{1}cos\left({x}_{2}^{2}\right)+{p}_{3}{x}_{2}sin\left({x}_{1}{x}_{2}\right)\\ 0\end{array}\right]$, $\Delta g\left(x\right)=\left[\begin{array}{c}0\\ {p}_{2}{x}_{2}^{2}\end{array}\right]$. It is easy to see that the system (51) is a mismatched system. The uncertain disturbance of the system is decomposed as

where $g\left(x\right)=\left[\begin{array}{c}0\\ 0.5\end{array}\right]$, $c\left(x\right)=\left[\begin{array}{c}1\\ 0\end{array}\right]$, $h\left(x\right)={p}_{1}{x}_{1}cos\left({x}_{2}^{2}\right)+{p}_{3}{x}_{2}sin\left({x}_{1}{x}_{2}\right)$, $m\left(x\right)={p}_{2}{x}_{2}^{2}$. Moreover, ${f}_{max}\left(x\right)$ and ${g}_{max}\left(x\right)$ are calculated as follows:

and

Select the parameter $\beta =1$. Then the original robust control problem is converted into solving an optimal control problem. For the auxiliary system

find the control policy $\overline{u}$ such that the following performance index is minimized

In order to obtain the robust control law using Algorithm 2, it is assumed that the optimal cost function ${V}^{*}\left(x\right)$ has the neural network structure ${V}^{*}\left(x\right)={W}^{T}\varphi \left(x\right)$, where $W={[{W}_{1},{W}_{2},{W}_{3}]}^{T}$ and $\varphi \left(x\right)={[{x}_{1}^{2},{x}_{1}{x}_{2},{x}_{2}^{2}]}^{T}$. The initial weight is taken as ${W}_{0}={[1,3,0.5]}^{T}$, and the initial state of the system is chosen as ${x}_{0}={[2,0.5]}^{T}$. The neural network weights are calculated iteratively in MATLAB. In each iteration, 10 sets of data samples are collected along the nominal system trajectory to solve the batch least squares problem. After six iterations, the weight converges to $W={[2.8983,-0.6859,5.2576]}^{T}$.
The optimal control of the auxiliary system is calculated as ${\overline{u}}^{*}=\left[\begin{array}{c}0.1715{x}_{1}-2.6288{x}_{2}\\ -2.8983{x}_{1}+0.3429{x}_{2}\end{array}\right]$. The robust control law of the original uncertain system is ${u}^{*}=0.1715{x}_{1}-2.6288{x}_{2}$. The convergence process of the neural network weights is shown in Figure 7, while the evolution of the control signal is shown in Figure 8. For different values of the uncertain parameters ${p}_{1}$, ${p}_{2}$, and ${p}_{3}$ in the uncertain system (51), the state trajectories of the closed-loop system under the robust control law are obtained. Figure 9 shows the trajectory of the closed-loop system when ${p}_{1}=1$, ${p}_{2}=1$, and ${p}_{3}=1$. Figure 10 shows the trajectory when ${p}_{1}=1$, ${p}_{2}=2$, and ${p}_{3}=0$. Figure 11 shows the trajectory when ${p}_{1}=0.3$, ${p}_{2}=3$, and ${p}_{3}=1$. Figure 12 shows the trajectory when ${p}_{1}=2$, ${p}_{2}=5$, and ${p}_{3}=1$. From these figures, we can see that the closed-loop system is stable, which shows the effectiveness of the robust control law. The nominal system is also linear, so MATLAB can be used to solve the corresponding LQR problem directly. With this method, the optimal control is calculated as ${\overline{u}}^{*}=\left[\begin{array}{c}0.1713{x}_{1}-2.6286{x}_{2}\\ -2.8983{x}_{1}+0.3430{x}_{2}\end{array}\right]$. It differs little from the neural network approximation result, which shows the validity of Algorithm 2.
The nominal systems in the above two examples are linear. The following example involves a nonlinear nominal system.
Example 3. Consider the following uncertain nonlinear system

where $x={[{x}_{1},{x}_{2}]}^{T}$ is the system state, $\Delta f\left(x\right)=\left[\begin{array}{c}0\\ {p}_{1}{x}_{2}co{s}^{2}\left({x}_{1}\right)\end{array}\right]$ is the uncertain disturbance function of the system, $\Delta g\left(x\right)=\left[\begin{array}{c}0\\ {p}_{2}{x}_{2}^{2}\end{array}\right]$ is the input uncertainty function, ${p}_{1}\in [-2,2]$, and ${p}_{2}\in [0,2]$. Obviously,

where $g\left(x\right)=\left[\begin{array}{c}0\\ 1\end{array}\right]$, $h\left(x\right)={p}_{1}{x}_{2}co{s}^{2}\left({x}_{1}\right)$, $m\left(x\right)={p}_{2}{x}_{2}^{2}$. Moreover, $\left|h\left(x\right)\right|\le 2\left|{x}_{2}\right|={f}_{max}\left(x\right)$. Thus, the original robust control problem is converted into computing an optimal control law. For the nominal system

find the control function u such that the performance index

is minimized. In order to solve the robust control problem using Algorithm 1, it is assumed that the optimal cost function ${V}^{*}\left(x\right)$ has the neural network structure ${V}^{*}\left(x\right)={W}^{T}\varphi \left(x\right)$, where $W={[{W}_{1},{W}_{2},{W}_{3}]}^{T}$ and $\varphi \left(x\right)={[{x}_{1}^{2},{x}_{1}{x}_{2},{x}_{2}^{2}]}^{T}$. The initial weight is taken as ${W}_{0}={[2,5,0.5]}^{T}$, and the initial state of the system is ${x}_{0}={[2,0.5]}^{T}$. The neural network weights are calculated iteratively in MATLAB. In each iteration, 10 sets of data samples are collected along the nominal system trajectory to solve the batch least squares problem. After five iterations, the weight converges to $[25.5830,12.5830,2.6458]$. The robust control law of the uncertain system (55) is ${u}^{*}=-6.2915{x}_{1}-2.6458{x}_{2}$. The convergence process of the neural network weights is shown in Figure 13, while the evolution of the control signal is shown in Figure 14.
For different values of the uncertain parameters ${p}_{1}$ and ${p}_{2}$ in the uncertain system (55), the state trajectories of the closed-loop system under the robust control law are obtained. Figure 15 shows the trajectory of the closed-loop system when ${p}_{1}=1$, ${p}_{2}=0.8$. Figure 16 shows the trajectory when ${p}_{1}=0.5$, ${p}_{2}=1$. Figure 17 shows the trajectory when ${p}_{1}=1$, ${p}_{2}=2$. Figure 18 shows the trajectory when ${p}_{1}=2$, ${p}_{2}=1$. From these figures, we can see that the closed-loop system is stable, which shows the effectiveness of the robust control law.