Adaptive Optimal Robust Control for Uncertain Nonlinear Systems Using Neural Network Approximation in Policy Iteration

Abstract: In this study, an optimal adaptive control approach based on policy iteration (PI) in reinforcement learning (RL) is established to solve robust control problems for nonlinear systems with internal and input uncertainties. First, the robust control problem is converted into an optimal control problem for a nominal or auxiliary system with a predefined performance index. It is demonstrated that the optimal control law renders the considered system globally asymptotically stable for all admissible uncertainties. Second, based on the Bellman optimality principle, online PI algorithms are proposed to calculate robust controllers for matched and mismatched uncertain systems. The approximate structure of the robust control law is obtained by approximating the optimal cost function with a neural network in the PI algorithms. Finally, some numerical examples are provided to illustrate the effectiveness of the proposed algorithms and theoretical results.


Introduction
Practical systems inevitably contain uncertain parameters and disturbances arising from modeling errors, external disturbances, and so on [1]. Thus, it is of great practical significance to study robust control of uncertain systems. In recent years, the control problem of uncertain systems has been extensively studied, and the literature on robust control covers both linear and nonlinear systems. For uncertain linear systems, the classical approaches mostly rely on algebraic Riccati equations (ARE) or linear matrix inequalities (LMI) [2][3][4][5][6]; this literature involves both matched and mismatched systems. For nonlinear systems, early research methods include feedback linearization, fuzzy modeling, nonlinear H∞ control, and so on [7][8][9][10][11]. In recent decades, however, neural networks (NN) and PI in RL have been used to approximate the robust control law numerically [12][13][14].
The PI method in RL was initially utilized to calculate the optimal control law for deterministic systems. Werbos first proposed the idea of approximating the solution of Bellman's equation using approximate dynamic programming (ADP) [15], and many results on using the PI method to calculate optimal control laws for deterministic systems have followed [16][17][18][19]. The PI algorithm offers two major benefits for such optimal control problems. On the one hand, it can effectively mitigate the "curse of dimensionality" in engineering control [20]. On the other hand, it can calculate the optimal control law without knowledge of the system dynamics. Since system dynamics are difficult to obtain accurately in engineering practice, the PI algorithm is a good choice for control problems with unknown models.
Within the last ten years, the PI method has also been developed to calculate robust controllers for uncertain systems, based on the optimal control approach to robust control [21]. For an input-constrained continuous-time nonlinear system, a novel RL-based algorithm was proposed to deal with the robust control problem in [22]. Based on network structure approximation, an online PI algorithm was developed to solve robust control for a class of nonlinear discrete-time uncertain systems in [23]. Using a data-driven RL algorithm, a robust control scheme was developed for a class of completely unknown dynamic systems with uncertainty in [24]. In addition, there are many other examples of literature on robust control based on RL, such as [25][26][27]. In all the literature listed above, the solution of the Hamilton-Jacobi-Bellman (HJB) equation was approximated by a neural network. In fact, solving the HJB equation is a key problem in optimal control [28]. The HJB equation is difficult to solve because it is a nonlinear partial differential equation. For a nonlinear system, the HJB equation is solved with neural network approximation in many cases, while for a linear system the ARE is solved instead. However, to the best of our knowledge, most existing studies have not considered input uncertainty, even though input uncertainty does exist in actual control systems.
In this study, a class of continuous-time nonlinear systems with internal and input uncertainties is considered. The main objective is to establish robust control laws for the uncertain systems. By constructing a suitable optimal control problem, the robust control problem is converted into calculating an optimal controller. Online PI algorithms are proposed to calculate the robust control by approximating the optimal cost with a neural network, and the convergence of the proposed algorithms is proved. Numerical examples are given to illustrate the effectiveness of the method.
Our main contributions are as follows. First, more general uncertain nonlinear systems are considered, in which uncertainty enters both the system dynamics and the input. For both matched and mismatched uncertain systems, it is proved that the robust control problem can be converted into calculating an optimal controller. Second, online PI algorithms are developed to solve the robust control problem. A neural network is utilized to approximate the optimal cost in the PI algorithm, which accomplishes the difficult task of solving the HJB equation.
The rest of this paper is arranged as follows. We formulate the robust control problems and present some basic results in Section 2. In Sections 3 and 4, solving the robust control problem is converted into calculating an optimal control law of a nominal or auxiliary system. Based on approximating the optimal cost with a neural network, online PI algorithms are developed to solve the robust control problem in Section 5. To support the proposed theoretical framework, we provide some numerical examples in Section 6. In Section 7, the study is concluded and directions for future research are discussed.

Preliminaries and Problem Formulation
Consider an uncertain nonlinear system as follows:

ẋ(t) = f(x(t)) + ∆f(x(t)) + [g(x(t)) + ∆g(x(t))]u(t), (1)

where x(t) ∈ R^n is the system state, u(t) ∈ R^m is the control input, f(x(t)) ∈ R^n and g(x(t)) ∈ R^{n×m} are known functions, and ∆f(x(t)) ∈ R^n, ∆g(x(t)) ∈ R^{n×m} are uncertain disturbance functions. The control objective is to establish a robust control law u = u(x) such that the closed-loop system is asymptotically stable for all admissible uncertain disturbances ∆f(x(t)) and ∆g(x(t)).
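To make the problem setting concrete, the following sketch simulates one hypothetical instance of the uncertain system (1) under a fixed stabilizing feedback. The dynamics f and g, the uncertainties ∆f and ∆g, and the gain K are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical instance of system (1): x' = f(x) + Df(x) + [g(x) + Dg(x)] u
# f(x) = A x with A = [[0, 1], [-1, -1]], g(x) = [0, 1]^T (known part)
# Df(x) = p * [0, x1]^T, Dg(x) = [0, 0.2]^T (uncertain part; p unknown to the designer)
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
g = np.array([0.0, 1.0])
p = 0.5                       # one admissible value of the uncertain parameter
K = np.array([2.0, 2.0])      # candidate robust feedback u = -K x (assumed)

def closed_loop(t, x):
    u = -K @ x
    f = A @ x
    df = np.array([0.0, p * x[0]])
    dg = np.array([0.0, 0.2])
    return f + df + (g + dg) * u

sol = solve_ivp(closed_loop, (0.0, 10.0), [1.0, 1.0], rtol=1e-9, atol=1e-12)
print(np.linalg.norm(sol.y[:, -1]))   # state norm after 10 s: near zero
```

For this particular choice the closed-loop matrix is Hurwitz for the simulated p, so the state decays to the origin; a robust design must guarantee this for every admissible uncertainty.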
As a general setting, we first make the following assumptions to ensure that the state Equation (1) is well defined [1,29].

Assumption 1. In (1), f(x) + g(x)u is Lipschitz continuous with respect to x and u on a set Ω ⊆ R^n containing the origin.

Assumption 2. For the unforced system, f(0) = 0 and ∆f(0) = 0; that is, x = 0 is an equilibrium point of the unforced system.

Definition 1. The system (1) is said to satisfy the dynamics matched condition if there is a function matrix h(x) ∈ R^{m×1} such that

∆f(x) = g(x)h(x). (2)

Definition 2. The system (1) is said to satisfy the input matched condition if there is a function m(x) ≥ 0 such that

∆g(x) = g(x)m(x). (3)

Definition 3. If the system (1) satisfies conditions (2) and (3) for all admissible disturbances ∆f(x) and ∆g(x), then the system (1) is called a matched uncertain system.

Definition 4. If the system (1) fails to satisfy condition (2) or (3) for some admissible disturbances ∆f(x) and ∆g(x), then the system (1) is called a mismatched uncertain system.
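The matched condition above can be checked numerically at sample states: a candidate h(x) = g(x)⁺∆f(x) satisfies (2) exactly when ∆f(x) lies in the range of g(x). The matrices below are hypothetical examples.

```python
import numpy as np

def is_matched(g, df, tol=1e-10):
    """Check the dynamics matched condition: does g(x) h(x) = Df(x) have a solution h?"""
    h = np.linalg.pinv(g) @ df          # least-squares candidate h(x) = g(x)^+ Df(x)
    return np.allclose(g @ h, df, atol=tol), h

g = np.array([[0.0], [1.0]])            # g(x) at a sample state (assumed)
df_matched = np.array([0.0, 0.7])       # Df lies in the column space of g
df_mismatched = np.array([0.3, 0.7])    # Df has a component outside range(g)

ok, h = is_matched(g, df_matched)
print(ok, h)                            # True, h = [0.7]
print(is_matched(g, df_mismatched)[0])  # False
```

When the check fails at some state, the system falls under Definition 4 and the mismatched construction of Section 4 applies.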
Next, we consider the robust control problem of nonlinear system (1) with matched and mismatched conditions, respectively.

Robust Control of Matched Uncertain Nonlinear Systems
This section considers the robust control problem when the system (1) meets the matched conditions (2) and (3). By constructing an appropriate performance index, the robust control problem is transformed into calculating the optimal control law of a corresponding nominal system. Based on the optimal control of the nominal system, a PI algorithm is proposed to obtain the robust feedback controller.
For the nominal system

ẋ = f(x) + g(x)u, (4)

find the controller u = u(x) that minimizes the performance index

J(x_0) = ∫_0^∞ [ f_max^2(x) + x^T x + u^T u ] dτ, (5)

where f_max(x) is a nonnegative function satisfying ||h(x)|| ≤ f_max(x). The definition of admissible control in the optimal control problem is given below [26].
Definition 5. The control policy u(x) is called an admissible control of the system (4) with regard to the performance function (5) on compact set Ω ⊆ R n if u(x) is continuous on Ω, u(0) = 0, it can stabilize the system (4) on Ω, and the performance function (5) is limited for any x ∈ Ω.
According to the performance index (5), the cost function corresponding to an admissible control u(x) is given by

V[x(t)] = ∫_t^∞ [ f_max^2(x) + x^T x + u^T u ] dτ. (6)

Taking the time derivative on both sides of (6) yields the Bellman equation

0 = f_max^2(x) + x^T x + u^T u + ∇V^T [ f(x) + g(x)u ], (7)

where ∇V is the gradient vector of the cost function V(x, u) with respect to x.

Define the Hamiltonian function

H(x, u, ∇V) = f_max^2(x) + x^T x + u^T u + ∇V^T [ f(x) + g(x)u ]. (8)

Minimizing the Hamiltonian function with respect to u yields the optimal control function

u*(x) = -(1/2) g^T(x)∇V*. (9)

Substituting (9) into (7), the optimal cost V*(x) satisfies the following HJB equation

0 = f_max^2(x) + x^T x + ∇V*^T f(x) - (1/4) ∇V*^T g(x)g^T(x)∇V*, (10)

with initial condition V*(0) = 0. Solving the optimal cost V*(x) from the HJB Equation (10) gives the solution of the optimal control problem, and thus the robust control problem can be solved.
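The stationarity step leading to (9) can be verified symbolically on a scalar instance. The dynamics ẋ = ax + bu and the quadratic cost ansatz V = px² below are illustrative assumptions; the computation confirms that minimizing the Hamiltonian recovers u* = -(1/2) b ∂V/∂x and reduces the HJB equation to a scalar Riccati equation.

```python
import sympy as sp

x, u, a, b, q, p = sp.symbols('x u a b q p', real=True)
V = p * x**2                                            # quadratic cost ansatz V(x) = p x^2
H = q * x**2 + u**2 + sp.diff(V, x) * (a * x + b * u)   # scalar Hamiltonian, cf. (8)

u_star = sp.solve(sp.diff(H, u), u)[0]                  # stationary point of H in u
print(u_star)                                           # -b*p*x
# This equals -(1/2) b dV/dx, i.e. the scalar version of (9):
print(sp.simplify(u_star + sp.Rational(1, 2) * b * sp.diff(V, x)))  # 0

# Substituting u* back gives the scalar HJB equation, cf. (10):
hjb = sp.expand(H.subs(u, u_star))
print(hjb)   # contains the terms q*x**2, 2*a*p*x**2, and -b**2*p**2*x**2
```

Setting the resulting expression to zero yields the Riccati equation q + 2ap - b²p² = 0 for the cost coefficient p, which is the scalar counterpart of (10).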

Remark 1.
For matched nonlinear systems, the robust controller can be obtained by solving the optimal cost function V*[x(t)] from the HJB Equation (10). In Section 5, we will use the PI algorithm to solve the HJB equation, which is a partial differential equation that is difficult to solve directly.

Robust Control of Nonlinear Systems with Mismatched Uncertainties
In this section, we consider the robust control problem when the system (1) does not satisfy the matched condition (2). In this case, the system is a mismatched nonlinear uncertain system. By constructing an appropriate auxiliary system and performance index, the robust control problem for the mismatched uncertain system is transformed into solving the optimal control law of an auxiliary system.
First, the following assumption is given.

Assumption 3. Suppose that the uncertainty of system (1) can be decomposed as ∆f(x) = c(x)h(x) and ∆g(x) = g(x)m(x), where c(x) is a known function matrix of appropriate dimensions, h(x) and m(x) are uncertain functions, and m(x) ≥ 0.
The goal of robust control is to find a control function u(x) that makes the closed-loop system

ẋ = f(x) + c(x)h(x) + g(x)[I + m(x)]u(x) (16)

globally asymptotically stable for all uncertainties h(x) and m(x).
In order to obtain the robust controller, an optimal control problem is constructed as follows. Consider the auxiliary system (17), in which a compensation control term v(x) is added to the nominal dynamics, together with the performance index (18). Here, f_max(x) and g_max(x) are nonnegative functions satisfying the bound conditions (19). According to the performance index (18), the cost function corresponding to an admissible control is given by (20). The following Bellman equation (21) is obtained by taking the time derivative on both sides of (20), where ∇V is the gradient vector of the cost function with respect to x. Define the Hamiltonian function as in (22). Assuming that the minimum in (22) exists and is unique, the optimal control law is given by

ū*(x) = -(1/2) g^T(x)∇V*. (23)

Substituting (23) into (21) yields the HJB equation (24), with initial value V*(0) = 0.

Remark 2.
Generally, the pseudo-inverse g(x)⁺ of g(x) exists when the columns of g(x) are linearly independent and Assumptions 1 and 2 hold [31]. In practical control systems, g(x) usually has full column rank, so the pseudo-inverse generally exists. Furthermore, the pseudo-inverse satisfies g(x)⁺g(x) = I, but in general g(x)g(x)⁺ ≠ I. In addition, the auxiliary system constructed above is not the nominal system: a compensation control term v(x) is added to the nominal system.
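The one-sided identity noted in the remark is easy to confirm numerically; the full-column-rank matrix below is an assumed example.

```python
import numpy as np

g = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])          # tall, full column rank (assumed example)
g_pinv = np.linalg.pinv(g)          # Moore-Penrose pseudo-inverse, shape 2x3

left = g_pinv @ g                   # = I_2: a left inverse exists
right = g @ g_pinv                  # projector onto range(g), not I_3

print(np.allclose(left, np.eye(2)))        # True
print(np.allclose(right, np.eye(3)))       # False
print(np.allclose(right @ right, right))   # True: g g^+ is idempotent (a projector)
```

This is exactly the situation in the remark: g⁺g = I holds, while gg⁺ only projects onto the range of g.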
If an appropriate parameter β is chosen, the optimal cost V*(x) can be computed from the HJB Equation (24), which gives the optimal control law of system (17) with performance index (18). The following theorem shows that the optimal control u*(x) = -(1/2) g^T(x)∇V* is a robust controller for the uncertain system.
Theorem 2. Assume that the mismatched uncertain system (16) satisfies Assumption 3 and condition (19). Consider the auxiliary system (17) with the performance index (18). Suppose there exists a solution V*(x) of the HJB Equation (24) for a selected parameter β and a constant β̄ satisfying |β̄| < |β| such that condition (25) holds. Then, the optimal control policy u*(x) = -(1/2) g^T(x)∇V* globally asymptotically stabilizes the nonlinear uncertain system (16); that is, the closed-loop uncertain system is globally asymptotically stable.

Proof. To prove the global asymptotic stability of the closed-loop system, V*(x) is chosen as the Lyapunov function. Considering the performance index (18), V*(x) is positive definite and V*(0) = 0. Taking the time derivative of V*(x) along the trajectories of system (16) and using (24) yields the expression in (26). On the other hand, the basic matrix inequality gives the bounds in (27) and (28). Combining (26)-(28), it follows that the derivative of V*(x) is negative for all x ≠ 0. Therefore, by Lyapunov stability theory, the optimal control u*(x) = -(1/2) g^T(x)∇V* makes the closed-loop uncertain nonlinear system asymptotically stable. Moreover, for a constant c > 0, consider the neighborhood N = {x : ||x|| < c}; once the state x(t) enters N, x → 0 as t → ∞. The state x(t) cannot stay outside N forever: otherwise ||x(t)|| ≥ c for all t > 0, so the cost accumulated along the trajectory would grow without bound, which contradicts the finiteness of V*[x(0)]. Therefore, system (16) is globally asymptotically stable. We complete the proof.

Neural Network Approximation in PI Algorithm
In the previous two sections, the robust control of uncertain nonlinear systems was transformed into solving the optimal control of a nominal or auxiliary system. However, whether the uncertain system is matched or mismatched, the key issue is how to obtain the solution of the corresponding HJB equation. As is well known, this is a nonlinear partial differential equation that is hard to solve, and solving it may lead to the curse of dimensionality [21]. In this section, an online PI algorithm is used to solve the HJB equation iteratively, and neural networks are utilized to approximate the optimal cost in the PI algorithm.

PI Algorithms for Robust Control
For the system with matched uncertainty, consider the optimal control problem (4) with (5). For any admissible control, the corresponding cost function can be expressed in the integral reinforcement form

V[x(t)] = ∫_t^{t+T} [ f_max^2(x) + x^T x + u^T u ] dτ + V[x(t + T)], (29)

where T > 0 is a selected constant. Based on the integral reinforcement relationship (29) and the optimal control (9), the PI algorithm of robust control for matched uncertain nonlinear systems is given below.
The convergence of Algorithm 1 is illustrated as follows. The following proposition gives an equivalent form of the Bellman Equation (30).

Algorithm 1 PI algorithm of robust control for matched uncertain nonlinear systems
(1) Select a supremum f_max(x) satisfying ||h(x)|| ≤ f_max(x);
(2) Initialization: for the nominal nonlinear system (4), select an initial stabilizing control u_0(x);
(3) Policy evaluation: for the control input u_i(x), compute the cost V_i(x) from the Bellman equation

V_i[x(t)] = ∫_t^{t+T} [ f_max^2(x) + x^T x + u_i^T u_i ] dτ + V_i[x(t + T)]; (30)

(4) Policy improvement: compute the control law u_{i+1}(x) using

u_{i+1}(x) = -(1/2) g^T(x)∇V_i. (31)

Iterate between (30) and (31) until the control input converges.

Proposition 1. Suppose that u_i(x) is a stabilizing controller of the nominal system (4). Then the cost V_i(x) solved from (30) is equivalent to the solution of the following equation:

0 = f_max^2(x) + x^T x + u_i^T u_i + ∇V_i^T [ f(x) + g(x)u_i ]. (32)
Proof. Dividing both sides of (30) by T and taking the limit as T → 0, the definition of the limit of a function and L'Hospital's rule give

lim_{T→0} (1/T) { V_i[x(t)] - V_i[x(t + T)] } = -∇V_i^T [ f(x) + g(x)u_i ],

while the integral term on the right-hand side of (30) yields f_max^2(x) + x^T x + u_i^T u_i in the limit. Thus, (32) follows from (30). Conversely, taking the time derivative of V_i(x) along the stable system ẋ = f(x) + g(x)u_i(x), equation (32) gives dV_i/dt = -[ f_max^2(x) + x^T x + u_i^T u_i ]; integrating both sides from t to t + T recovers (30). This proves that (30) and (32) are equivalent.
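For a linear policy on a linear nominal system, the equivalence in Proposition 1 can be checked numerically: the quadratic cost obtained from the differential form (32) (a Lyapunov equation) also satisfies the integral form (30) along trajectories. The system, policy, and cost weights below are assumptions chosen for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_lyapunov

# Nominal system x' = A x + B u, policy u_i = -K x, running cost x^T Q x + u^T u
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[1.0, 1.0]])
Q = np.eye(2)

Ac = A - B @ K
# Differential form, cf. (32): Ac^T P + P Ac + Q + K^T K = 0
P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ K))

# Integral form, cf. (30): V[x(t)] = int_t^{t+T} cost dtau + V[x(t+T)]
def aug(t, z):
    x = z[:2]
    u = -K @ x
    return np.concatenate([Ac @ x, [x @ Q @ x + float(u @ u)]])

x0 = np.array([1.0, -0.5])
T = 0.7
sol = solve_ivp(aug, (0.0, T), np.concatenate([x0, [0.0]]), rtol=1e-11, atol=1e-13)
xT, cost_int = sol.y[:2, -1], sol.y[2, -1]
lhs = x0 @ P @ x0                 # V[x(t)] from the Lyapunov solution
rhs = cost_int + xT @ P @ xT      # integral reinforcement + V[x(t+T)]
print(abs(lhs - rhs))             # ~0: the two characterisations of V_i agree
```

The agreement holds for any T > 0 and any initial state, which is exactly the content of the proposition.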
According to [32][33][34], if an initial stabilizing control policy u_0(x) is given, then every follow-up control policy calculated by the iterative relations (30) and (31) is also a stabilizing control policy, and the cost sequence V_i[x(t)] calculated by the iteration converges to the optimal cost. By Proposition 1, (30) and (32) are equivalent, so the iterates of (30) and (31) in Algorithm 1 converge to the optimal control and the optimal cost.
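The convergence of the evaluation/improvement loop can be observed on a linear-quadratic instance, where policy evaluation reduces to a Lyapunov equation and the iterates approach the ARE solution. This model-based sketch (with assumed system matrices and weights) mirrors the structure of Algorithm 1.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -1.0]])   # assumed nominal dynamics (Hurwitz)
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

K = np.zeros((1, 2))                        # initial stabilizing policy (A is already stable)
for i in range(10):
    Ac = A - B @ K
    # Policy evaluation, cf. (30)/(32): (A - B K)^T P + P (A - B K) + Q + K^T R K = 0
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
    # Policy improvement, cf. (31): u_{i+1} = -R^{-1} B^T P x when V_i = x^T P x
    K = np.linalg.solve(R, B.T @ P)

P_are = solve_continuous_are(A, B, Q, R)
print(np.max(np.abs(P - P_are)))            # ~0: PI converged to the ARE solution
```

Each evaluation step is exact here, so the loop is the Newton-Kleinman iteration and converges quadratically; the neural-network version of Section 5 replaces the Lyapunov solve with a least-squares fit from trajectory data.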
Similarly, we give a PI algorithm of robust control for nonlinear systems with mismatched uncertainties.
The policy evaluation (34) and policy improvement (35) steps are iterated until the improvement step no longer changes the current policy. Once the optimal cost function V*(x) is obtained, u*(x) = -(1/2) g^T(x)∇V*(x) is the robust control law. The convergence proof of Algorithm 2 is similar to that of Algorithm 1 and is omitted here.

Algorithm 2 PI algorithm of robust control for nonlinear systems with mismatched uncertainties
(1) Decompose the uncertainty so that ∆f(x) = c(x)h(x) and ∆g(x) = g(x)m(x); select constant parameters β and β̄ such that |β̄| < |β|, and calculate the nonnegative functions f_max(x) and g_max(x) according to (19);
(2) For the auxiliary system (17), select an initial stabilizing control policy u_0(x);
(3) Policy evaluation: given a control policy u_i(x), solve the cost V_i(x) from the Bellman equation (34);
(4) Policy improvement: calculate the control policy using the update law (35);
(5) Check whether the condition of Theorem 2 bounding the uncertainty terms by x^T x is satisfied; if it does not hold, return to step (1) and select larger constants β and β̄.

Remark 3. In Step (3) of Algorithm 1 or Algorithm 2, solving V_i[x(t)] from (30) or (34) can be transformed into a least-squares problem [17]. By collecting enough data online along the system trajectory, the cost function can be calculated using the least-squares principle. However, the cost V_i[x(t)] has no explicit expression. In the next subsection, by reading sufficient online data on the interval [t, t + T] along the system trajectory, the cost V_i[x(t)] is approximated by a neural network in the PI algorithms. Moreover, the implementation of the algorithms does not require knowledge of the system dynamics function f(x).

Neural Network Approximation of Optimal Cost in PI Algorithm
In the implementation of the PI algorithms, data from the system are used together with the least-squares method to solve for the cost function. However, the cost function of a nonlinear optimal control problem has no explicit form. Therefore, a neural network structure is used to approximate the cost function: policy iteration is carried out, the weights are updated, and the approximate optimal cost function is obtained. In this subsection, a neural network is utilized to approximate the optimal cost in the corresponding HJB equation.
Based on the universal approximation property of neural networks [35], a single neural network is utilized to approximate the optimal cost in the HJB equation. For matched uncertain systems, suppose that the solution V*(x) of the HJB Equation (10) is smooth and positive definite, and express the optimal cost function on the compact set Ω as

V*(x) = W^T φ(x) + ε(x), (36)

where W ∈ R^L is an unknown ideal weight vector and φ(·) : R^n → R^L is a linearly independent basis vector function. It is assumed that φ(x) is continuous with φ(0) = 0, and ε(x) is the neural network reconstruction error. Thus, the gradient of the optimal cost function (36) can be expressed as

∇V*(x) = ∇φ^T(x)W + ∇ε(x), (37)

where ∇ε(x) = ∂ε/∂x. By the approximation property of neural networks [35,36], as the number of hidden-layer neurons L → ∞, the approximation errors satisfy ε(x) → 0 and ∇ε(x) → 0. Substituting (36) and (37) into (9), the optimal control is rewritten as

u*(x) = -(1/2) g^T(x)[ ∇φ^T(x)W + ∇ε(x) ].

Let Ŵ be an estimate of the ideal weight W. Since the ideal weight W in (36) is unknown, the cost function of the i-th iteration in Algorithm 1 is expressed as V̂_i(x) = Ŵ_i^T φ(x). Using this neural network approximation of the cost function, the Bellman Equation (30) in Algorithm 1 is rewritten as

Ŵ_i^T [ φ(x(t)) - φ(x(t + T)) ] = ∫_t^{t+T} [ f_max^2(x) + x^T x + u_i^T u_i ] dτ.

Since the cost function is approximated by a neural network, this relation holds only up to a residual error. To obtain the weight parameters of the approximation, the squared residual is minimized in the least-squares sense; using the definition and properties of the inner product, the weight Ŵ_i of the approximate cost V̂_i(x) can be calculated. The updated control policy is then obtained from (31). According to [32,33,35,36], under the policy iteration of the RL algorithm, the cost sequence V_i(x) converges to the optimal cost V*(x), and the control sequence u_i(x) converges to the optimal control u*(x).
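A minimal data-driven version of this least-squares weight update can be sketched for a linear system with the quadratic basis φ(x) = [x1², x1x2, x2²]: along short trajectory segments the weight solves Ŵ_i^T[φ(x(t)) - φ(x(t+T))] = ∫ cost, and the improved gain is read off from the gradient of the basis as in (31). The system matrices, cost weights, and segment length are assumptions; the model is used only to generate the trajectory data.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -1.0]])    # assumed dynamics (only generates data)
B = np.array([[0.0], [1.0]])
T = 0.5                                     # reinforcement interval

phi = lambda x: np.array([x[0]**2, x[0]*x[1], x[1]**2])   # basis phi(x)

def segment(x0, K):
    """Integrate the closed loop and the running cost x^T x + u^2 over [0, T]."""
    def aug(t, z):
        x = z[:2]; u = float(-K @ x)
        return np.concatenate([A @ x + B[:, 0] * u, [x @ x + u * u]])
    z = solve_ivp(aug, (0, T), np.concatenate([x0, [0.0]]),
                  rtol=1e-11, atol=1e-13).y[:, -1]
    return z[:2], z[2]

K = np.zeros((1, 2))                         # initial stabilizing policy (A is Hurwitz)
starts = [np.array(s, dtype=float) for s in ([1, 0], [0, 1], [1, 1], [2, -1], [-1, 2])]
for i in range(8):
    Phi, r = [], []
    for x0 in starts:                        # batch of online data, cf. (30)
        xT, cost = segment(x0, K)
        Phi.append(phi(x0) - phi(xT))
        r.append(cost)
    W, *_ = np.linalg.lstsq(np.array(Phi), np.array(r), rcond=None)
    # Improvement (31): u = -(1/2) g^T grad V = -(1/2)(W[1] x1 + 2 W[2] x2) for g = [0,1]^T
    K = np.array([[0.5 * W[1], W[2]]])

P_are = solve_continuous_are(A, B, np.eye(2), np.eye(1))
K_are = B.T @ P_are
print(np.max(np.abs(K - K_are)))             # small: data-driven PI matches the LQR gain
```

Note that only trajectory samples of x and the integrated cost enter the least-squares step, so the internal dynamics f(x) never appear explicitly, in line with Remark 3.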
For mismatched uncertain systems, a similar neural network approximation can be used.

Simulation Examples
Some simulation examples are presented in this section to verify the feasibility of the robust control design method for uncertain nonlinear systems.

Example 1. Consider the uncertain nonlinear system (47), where ∆f(x) is the uncertain disturbance function of the system and ∆g(x) is the input uncertainty. The matched conditions hold, so the original robust control problem is converted into calculating an optimal control law: for the nominal linear system with dynamics matrix [0 6; -1 1], find the control function u such that the performance index is minimized. In order to solve the robust control problem using Algorithm 1, it is assumed that the optimal cost function V*(x) has the quadratic neural network structure V*(x) = w_1 x_1^2 + w_2 x_1 x_2 + w_3 x_2^2. The initial weight is taken as W_0 = [-1, 5, 1.5]^T, and the initial state of the system is x_0 = [2, -0.5]^T. The neural network weights are calculated iteratively in MATLAB. In each iteration, 10 sets of data samples are collected along the nominal system trajectory to solve the batch least-squares problem. After five iterations, the weight converges to [1.9645, 2.8990, 5.4038]^T. The robust control law of the uncertain system (47) is u* = -1.4495x_1 - 5.4038x_2. The convergence process of the neural network weights is shown in Figure 1, and the control signal is shown in Figure 2. For different values of the uncertain parameters p_1 and p_2 in the uncertain system (47), the state trajectories of the closed-loop system under the robust control law are as follows: Figure 3 shows the trajectory when p_1 = -2, p_2 = 1; Figure 4 when p_1 = -1, p_2 = 4; Figure 5 when p_1 = 0, p_2 = 7; and Figure 6 when p_1 = 2, p_2 = 10. These figures show that the closed-loop system is stable, which demonstrates the effectiveness of the robust control law.
In this example, because the nominal system is linear, MATLAB can be used to solve the corresponding LQR problem directly. With this method, the optimal control is calculated as u* = -1.4496x_1 - 5.4038x_2, which is almost identical to the neural network approximation and confirms the validity of Algorithm 1.

Example 2. Consider the uncertain nonlinear system (51). It is easy to verify that the system (51) is a mismatched system. The uncertain disturbance of the system is decomposed as in Assumption 3, and the functions f_max(x) and g_max(x) are calculated accordingly.
Select the parameter β = 1. Then the original robust control problem is converted into solving an optimal control problem: for the auxiliary system, find the control policy ū such that the corresponding performance index is minimized. In order to obtain the robust control law using Algorithm 2, it is assumed that the optimal cost function V*(x) has the quadratic neural network structure V*(x) = w_1 x_1^2 + w_2 x_1 x_2 + w_3 x_2^2. The initial weight is taken as W_0 = [1, -3, 0.5]^T, and the initial state of the system is chosen as x_0 = [-2, 0.5]^T. The neural network weights are calculated iteratively in MATLAB. In each iteration, 10 sets of data samples are collected along the auxiliary system trajectory to solve the batch least-squares problem. After six iterations, the weight converges to W = [2.8983, -0.6859, 5.2576]^T. From this weight, the optimal control of the auxiliary system and the robust control law of the original uncertain system are obtained. The convergence process of the neural network weights is shown in Figure 7, and the control signal is shown in Figure 8. For different values of the uncertain parameters p_1, p_2 and p_3 in the uncertain system (51), the state trajectories of the closed-loop system under the robust control law are as follows: Figure 9 shows the trajectory when p_1 = -1, p_2 = 1, p_3 = 1; Figure 10 when p_1 = -1, p_2 = 2, p_3 = 0; Figure 11 when p_1 = 0.3, p_2 = 3, p_3 = -1; and Figure 12 when p_1 = -2, p_2 = 5, p_3 = 1. These figures show that the closed-loop system is stable, which demonstrates the effectiveness of the robust control law.
The nominal system is again linear, so MATLAB can be used to solve the corresponding LQR problem directly. With this method, the optimal control is calculated as ū* = [0.1713x_1 - 2.6286x_2, -2.8983x_1 + 0.3430x_2]^T, which differs little from the neural network approximation and confirms the validity of Algorithm 2.

The nominal systems in the above two examples are linear. The following example has a nonlinear nominal system.

Example 3. Consider the uncertain nonlinear system (55), where ∆f(x) is the uncertain disturbance function of the system. The matched conditions hold, so the original robust control problem is converted into calculating an optimal control law: for the nonlinear nominal system, find the control function u such that the performance index is minimized. In order to solve the robust control problem using Algorithm 1, it is assumed that the optimal cost function V*(x) has the quadratic neural network structure V*(x) = w_1 x_1^2 + w_2 x_1 x_2 + w_3 x_2^2. The initial weight is taken as W_0 = [-2, 5, 0.5]^T, and the initial state of the system is x_0 = [2, -0.5]^T. The neural network weights are calculated iteratively in MATLAB. In each iteration, 10 sets of data samples are collected along the nominal system trajectory to solve the batch least-squares problem. After five iterations, the weight converges to [25.5830, 12.5830, 2.6458]^T. The robust control law of the uncertain system (55) is u* = -6.2915x_1 - 2.6458x_2. The convergence process of the neural network weights is shown in Figure 13, and the control signal is shown in Figure 14. For different values of the uncertain parameters p_1 and p_2 in the uncertain system (55), the state trajectories of the closed-loop system under the robust control law are as follows: Figure 15 shows the trajectory when p_1 = 1, p_2 = 0.8; Figure 16 when p_1 = -0.5, p_2 = 1; Figure 17 when p_1 = 1, p_2 = 2.
Figure 18 shows the trajectory of the closed-loop system when p_1 = 2, p_2 = 1. From these figures, we can see that the closed-loop system is stable, which demonstrates the effectiveness of the robust control law.
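The LQR cross-checks quoted in the examples follow the pattern sketched below: when the nominal system is linear and the cost is quadratic, the fixed point that PI should reach can be computed directly with a Riccati solver. The system matrix is the one quoted in Example 1, but the weights Q and R here are assumptions, so the resulting gain is illustrative rather than the paper's exact numbers.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Nominal linear system from Example 1; B and the weights Q, R are assumed
A = np.array([[0.0, 6.0], [-1.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = 2.0 * np.eye(2), np.eye(1)

P = solve_continuous_are(A, B, Q, R)       # solves A^T P + P A - P B R^-1 B^T P + Q = 0
K = np.linalg.solve(R, B.T @ P)            # u* = -K x, the gain PI should recover
print(K)
print(np.linalg.eigvals(A - B @ K))        # closed-loop poles in the open left half-plane
```

Comparing such a directly computed gain against the converged PI weights is exactly the validation performed at the end of Examples 1 and 2.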

Conclusions
In this paper, PI algorithms in RL are proposed to solve the robust control problem for a class of continuous-time uncertain nonlinear systems. The robust control law is obtained without knowing the internal dynamics of the nominal system. The considered robust control problem is converted into solving an optimal control problem for a nominal or auxiliary system with a predefined performance index. Online PI algorithms are established to calculate the robust controllers of matched and mismatched systems. Numerical examples are given to show the effectiveness of the theoretical results. The proposed method may be extended to solve robust tracking problems for nonlinear systems with uncertainty entering the output, which will be the subject of our future research.