Article

An Algorithm for Solving Zero-Sum Differential Game Related to the Nonlinear $H_\infty$ Control Problem

Faculty of Mechanical Engineering and Naval Architecture, University of Zagreb, HR-10000 Zagreb, Croatia
*
Authors to whom correspondence should be addressed.
Algorithms 2023, 16(1), 48; https://doi.org/10.3390/a16010048
Submission received: 18 December 2022 / Revised: 5 January 2023 / Accepted: 9 January 2023 / Published: 10 January 2023
(This article belongs to the Collection Feature Papers in Algorithms)

Abstract

This paper presents an approach for the solution of a zero-sum differential game associated with a nonlinear state-feedback $H_\infty$ control problem. Instead of using approximation methods for solving the corresponding Hamilton–Jacobi–Isaacs (HJI) partial differential equation, we propose an algorithm that calculates the explicit inputs to the dynamic system by directly performing minimization with simultaneous maximization of the same objective function. In order to achieve numerical robustness and stability, the proposed algorithm uses the quasi-Newton method, the conjugate gradient method, a line search method with Wolfe conditions, the Adams approximation method for time discretization and complex-step calculation of derivatives. The algorithm is evaluated in computer simulations on examples of first- and second-order nonlinear systems with analytical solutions of the $H_\infty$ control problem.

1. Introduction

Although the theory of nonlinear $H_\infty$ control [1,2] is well developed and can be considered standardized, developing algorithms for solving this problem that enable practical implementation is a very active area of research. Furthermore, it is well known that the $H_\infty$ control problem can be formulated as a two-player zero-sum differential game [3,4] with the objective function including a parameter $\gamma$ in such a way that the control vector represents the player that minimizes the objective function while the vector of uncertainty represents the player that maximizes the same objective function. Existing research is mainly based on two approaches: the formulation of the problem in the form of linear matrix inequalities (LMI) [5,6,7,8], or the determination of an approximate solution of the associated Hamilton–Jacobi–Isaacs (HJI) equation [9,10,11,12], which in the linear case is equivalent to the generalized Riccati equation [13].
Furthermore, methods for solving the nonlinear $H_\infty$ control problem of singular or descriptor systems have also been developed. The state-feedback scheme, impulse controllability and the well-known implicit function theorem for stability analysis of the finite-time $H_\infty$ control problem of descriptor systems subject to actuator saturation are adopted in [14], while in [15] impulse and hybrid controllers are combined, resulting in less conservative stability conditions than in the state-feedback control strategy.
In the nonlinear programming-based algorithms that have been proposed in the literature, the system dynamics is treated as an equality constraint and is included in the optimization process using the method of Lagrange multipliers. This results in an HJI equation that is very difficult or almost impossible to solve. For this reason, many approximation methods have been developed, in which the computational complexity increases with the number of system states that need to be estimated. In [10,12], reviews of reinforcement adaptive learning and adaptive dynamic programming techniques for solving multiplayer games related to $H_\infty$ control are given. In [16], an improved iterative (successive approximation) method for solving the HJI equation is developed. A game-theoretic differential dynamic programming algorithm for calculating both minimizing and maximizing inputs, based on a Taylor series expansion of the associated HJI equation around a nominal trajectory, is proposed in [11]. Using a critic neural network with time-varying activation functions, the HJI equation is approximately solved in [9]. An event-triggering function for two-player zero-sum games in the presence of control constraints is designed in [17]. Furthermore, in [13], an iterative procedure to find the stabilizing solution of a set of Riccati equations related to discrete-time $H_\infty$ control problems for periodic systems is addressed. A randomized algorithm based on the Ray-Shooting Method for $H_\infty$ optimal synthesis of proportional–integral–derivative (PID) controllers for linear systems is proposed in [18].
In the case of the methods and algorithms mentioned above, the application of LMI and the solution of the Riccati equation require linearization, in which case the optimality of the solution cannot be guaranteed in all operating states of the nonlinear system dynamics. On the other hand, solving the HJI equation can be very complex and therefore difficult to apply in real control tasks.
In this paper, the nonlinear state-feedback $H_\infty$ control problem is formulated as a zero-sum differential game, and we approach its solution without the LMI formalism or the need to approximate the solution of the HJI equation. The main idea of the presented approach is the application of the conjugate gradient method, where the first-order derivatives are calculated by matrix relations backwards in time, which gives a direct numerical calculation of the control and uncertainty variables that explicitly depend on the system states.
In contrast to the approaches proposed in [11,16], which, when a more complex nonlinearity of the dynamic system is included in the objective function, result in an HJI equation so complex that it is practically impossible to solve, in our approach the nonlinear system dynamics is not included in the objective function as an equality constraint. Instead, the state variables, control law and uncertainty are coupled by recursive matrix relations and the chain rule for ordered derivatives, which are used to calculate the objective function gradients that appear in the conjugate gradient method. Furthermore, in contrast to the approaches presented in [10,12], which can be computationally expensive since the tuning of neural network weights is based on a method of weighted residuals that includes calculation of Lebesgue integrals and estimation of state variables, in our approach the computational complexity does not depend on the dimension of the state space, since the procedure proposed for gradient calculation has a backward-in-time structure similar to the backpropagation through time algorithm.
Besides the conjugate gradient method, which is used for computation of the saddle point of the zero-sum differential game, the quasi-Newton method for $L_2$-gain minimization, the line search method with Wolfe conditions, the Adams approximation method for time discretization and the complex-step method for calculation of derivatives are also systematically integrated into an efficient mathematical tool in order to achieve numerical robustness and stability.
The rest of the paper is organized as follows. In Section 2, the preliminaries of nonlinear state-feedback $H_\infty$ control are presented, both from the dissipativity point of view and from the zero-sum differential games point of view. Although the theories of these concepts are well known, providing the basic terms is necessary to understand the contributions contained in the following sections. A complete framework for the derivation of the algorithmic procedure that optimizes the $L_2$-gain of nonlinear dynamic systems without solving the HJI partial differential equation is given in Section 3. In Section 4, the proposed algorithmic procedure is evaluated on examples of nonlinear systems for which the input variables can be determined exactly by solving the HJI equation. Finally, Section 5 concludes the paper.

Notation

The notation used is fairly standard. Matrices and vectors are represented in bold upper and bold lower case, respectively. Scalars are represented in italic lower case. $\mathbf{I}$ is an identity matrix and $\mathbf{0}$ is a null matrix. The dimensions of the matrices and vectors can generally be determined trivially by the context. The symbol $\nabla$ stands for the gradient and is considered as a row vector. The symbol $T$ denotes transposition.
The $\operatorname{vec}(\cdot)$ operator stacks the columns of a matrix one underneath the other. The Kronecker product of two matrices $\mathbf{A}$ ($m \times n$) and $\mathbf{B}$ ($p \times q$), denoted by $\mathbf{A} \otimes \mathbf{B}$, is an $mp \times nq$ matrix defined by $\mathbf{A} \otimes \mathbf{B} = (a_{ij}\mathbf{B})_{ij}$. The definitions of matrix differential calculus and the algebra related to Kronecker products can be found in [19,20].
The Euclidean norm of a vector is defined as $\|\mathbf{x}\| = \sqrt{\mathbf{x}^T \mathbf{x}}$. $L_2(I, \mathbb{R}^n)$ stands for the standard Lebesgue space of vector-valued square-integrable and essentially bounded functions mapping an interval $I \subset \mathbb{R}$ to $\mathbb{R}^n$. This space is equipped with the $L_2$ norm defined by $\|\cdot\|_{L_2} = \sqrt{\int_{t_0}^{t_f} \|\cdot\|^2 \, dt}$. We avoid explicitly showing the dependence of the variables on time when it is not needed.

2. Preliminaries and Problem Formulation

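In this section, we give a review of the basic terms of nonlinear state-feedback $H_\infty$ control and the related differential game theory, mostly based on classic references from this field [1,2,3,4,21,22].
Consider the causal input-affine nonlinear system in the state space defined on some manifold $X \subset \mathbb{R}^n$ in the following form
$$\Sigma : \quad \dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}) + \mathbf{G}_1(\mathbf{x})\mathbf{u} + \mathbf{G}_2(\mathbf{x})\mathbf{d}, \quad \mathbf{x}(t_0) = \mathbf{x}_0, \qquad \mathbf{y} = \mathbf{x}, \qquad \mathbf{z} = \mathbf{h}(\mathbf{x}) + \mathbf{L}(\mathbf{x})\mathbf{u}, \tag{1}$$
where $\mathbf{x} \in X$ is the state vector, $\mathbf{u} \in U \subset \mathbb{R}^m$ is the control vector, and $\mathbf{d}$ is the vector representing internal/external uncertainty belonging to the set $D \subset L_2([t_0, t_f], \mathbb{R}^s)$. The output vector $\mathbf{y} \in \mathbb{R}^n$ contains all directly measured states of system $\Sigma$. The vector $\mathbf{z} \in \mathbb{R}^q$ is the performance variable. Furthermore, the functions $\mathbf{f} : X \to X$, $\mathbf{G}_1 : X \to M^{n \times m}(X)$, $\mathbf{G}_2 : X \to M^{n \times s}(X)$, $\mathbf{h} : X \to \mathbb{R}^q$, $\mathbf{L} : X \to M^{q \times m}(X)$ are real $C^1$-functions of $\mathbf{x}$.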
The following assumptions are employed:
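Assumption 1.
$\mathbf{x} = \mathbf{0}$ is a unique equilibrium point of system $\Sigma$, with $\mathbf{u} = \mathbf{0}$ and $\mathbf{d} = \mathbf{0}$, and for simplicity $\mathbf{f}(\mathbf{0}) = \mathbf{0}$, $\mathbf{h}(\mathbf{0}) = \mathbf{0}$.
Assumption 2.
The vector $\mathbf{h}(\mathbf{x})$ and matrix $\mathbf{L}(\mathbf{x})$ are such that $\mathbf{h}(\mathbf{x})^T \mathbf{L}(\mathbf{x}) = \mathbf{0}$ and $\mathbf{L}(\mathbf{x})^T \mathbf{L}(\mathbf{x}) = \mathbf{I}$ for all $\mathbf{x} \in X$, which implies
$$\mathbf{z} = \begin{bmatrix} \mathbf{h}(\mathbf{x}) \\ \mathbf{u} \end{bmatrix} \;\Rightarrow\; \|\mathbf{z}\|^2 = \|\mathbf{h}(\mathbf{x})\|^2 + \|\mathbf{u}\|^2. \tag{2}$$
By introducing Assumption 2, the so-called singular problem is avoided. More details on solving the singular problem can be found in [23,24].
Assumption 3.
The initial state vector $\mathbf{x}_0$ is known a priori.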
Furthermore, the following definition is introduced:
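Definition 1
($L_2$-gain). The $L_2$-gain from $\mathbf{d}$ to $\mathbf{z}$ of system $\Sigma$ is the infimum of all $\gamma > 0$ that satisfy
$$\|\mathbf{z}\|_{L_2}^2 \le \gamma^2 \|\mathbf{d}\|_{L_2}^2 + \beta(\mathbf{x}_0), \tag{3}$$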
for some bounded $C^0$-function $\beta : U \subset X \to \mathbb{R}$ such that $\beta(\mathbf{0}) = 0$.
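In general, the problem of nonlinear $H_\infty$ control of system $\Sigma$ where all state variables are available (measurable or can be estimated) can be formulated as follows:
Problem 1.
The problem of optimal nonlinear state-feedback $H_\infty$ control of system $\Sigma$ is to determine the control law $\mathbf{u}^* = \boldsymbol{\mu}(\mathbf{x}, t)$ and the “worst case” $\mathbf{d}^* = \boldsymbol{\nu}(\mathbf{x}, t)$ such that $\gamma > 0$ is minimized.
Assumption 4.
The functions $\boldsymbol{\mu}(\mathbf{x}, t)$ and $\boldsymbol{\nu}(\mathbf{x}, t)$ satisfy $\boldsymbol{\mu} \in C^1(X)$, $\boldsymbol{\nu} \in C^1(X)$.
If for the system $\Sigma$ there exists some $\gamma \ge \gamma^*$ that satisfies (3), then a zero-sum differential game can be defined. The optimal value of this game is given by the following expression:
$$J^*(\mathbf{x}_0) = \min_{\mathbf{u}} \max_{\mathbf{d}} \int_{t_0}^{t_f} \left( \|\mathbf{h}(\mathbf{x})\|^2 + \|\mathbf{u}\|^2 - \gamma^2 \|\mathbf{d}\|^2 \right) dt, \tag{4}$$
with dynamic equality constraints (1) on a finite time horizon $t_f > t_0$.
The necessary conditions for the saddle point of the zero-sum differential game (4) follow from the minimum and maximum principles, and are of the form
$$\mathbf{u}^*(\mathbf{x}, t) = -\frac{1}{2} \mathbf{G}_1^T(\mathbf{x}) \nabla_{\mathbf{x}}^T V(\mathbf{x}, t), \tag{5}$$
$$\mathbf{d}^*(\mathbf{x}, t) = \frac{1}{2\gamma^2} \mathbf{G}_2^T(\mathbf{x}) \nabla_{\mathbf{x}}^T V(\mathbf{x}, t), \tag{6}$$
where $V$ is a smooth positive semidefinite solution of the HJI partial differential equation
$$\partial_t V(\mathbf{x}, t) + \nabla_{\mathbf{x}} V(\mathbf{x}, t)\, \mathbf{f}(\mathbf{x}) + \frac{1}{4} \nabla_{\mathbf{x}} V(\mathbf{x}, t) \left[ \frac{1}{\gamma^2} \mathbf{G}_2(\mathbf{x}) \mathbf{G}_2^T(\mathbf{x}) - \mathbf{G}_1(\mathbf{x}) \mathbf{G}_1^T(\mathbf{x}) \right] \nabla_{\mathbf{x}}^T V(\mathbf{x}, t) + \mathbf{h}^T(\mathbf{x}) \mathbf{h}(\mathbf{x}) = 0, \quad V(\mathbf{x}(t_f), t_f) = 0. \tag{7}$$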

3. Synthesis of the Algorithm for the Solution of the Zero-Sum Differential Game

In this section, an approach for determining the control and uncertainty variables for optimal $H_\infty$ control (Problem 1) of the nonlinear dynamic system $\Sigma$ is proposed. The proposed approach does not require solving the HJI Equation (7); instead, the solution is reduced to the direct numerical determination of the saddle point of the related zero-sum differential game.
Control and uncertainty variables are approximated by functions with a linear dependence on a finite number of constant parameters. To calculate these parameters, an approach based on the integration of the quasi-Newton method, the conjugate gradient method, the Adams method and the complex-step derivative approximation into one algorithm is proposed. The goal is to obtain a control variable that explicitly depends on the state variables and in that form is simple for practical implementation. Additionally, the aim is to achieve a numerical solution that uniformly converges towards the optimal solution by increasing the order of complexity of the approximation, i.e., by increasing the number of parameters.
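Since we introduced Assumption 4, based on Weierstraß’s theorem ([25], p. 65) (which refers to polynomial approximation functions) and its generalizations [26,27,28,29] (which refer to non-polynomial forms of nonlinear approximation functions) on the uniform approximation of smooth functions, there are constants $p_j^i, r_j^i \in \mathbb{R}$ such that the i-th component of the control and uncertainty vectors can be written in the following form:
$$\hat{u}_i(\mathbf{x}) = \sum_{j=1}^{n_\theta} p_j^i\, \theta_j^i(\mathbf{x}), \tag{8}$$
$$\hat{d}_i(\mathbf{x}) = \sum_{j=1}^{n_\psi} r_j^i\, \psi_j^i(\mathbf{x}), \tag{9}$$
where $\theta_j^i(\mathbf{x}) \in C^1(X)$, $\psi_j^i(\mathbf{x}) \in C^1(X)$ such that $\theta_j^i(\mathbf{0}) = 0$, $\psi_j^i(\mathbf{0}) = 0$. Linear subspaces generated by the sets $\{\theta_j^i(\mathbf{x})\}$ and $\{\psi_j^i(\mathbf{x})\}$ are dense in the Sobolev norm $W^{1,\infty}$ [30].
For well-chosen functions $\theta_j^i(\mathbf{x})$ and $\psi_j^i(\mathbf{x})$, we have
$$|\hat{u}_i(\mathbf{x}) - u_i(\mathbf{x})| < \varepsilon_{u_i}(\mathbf{x}), \tag{10}$$
$$|\hat{d}_i(\mathbf{x}) - d_i(\mathbf{x})| < \varepsilon_{d_i}(\mathbf{x}), \tag{11}$$
where $\varepsilon_{u_i}(\mathbf{x})$ and $\varepsilon_{d_i}(\mathbf{x})$ are the approximation errors. It follows that $\varepsilon_{u_i}(\mathbf{x}) \to 0$ and $\varepsilon_{d_i}(\mathbf{x}) \to 0$ when $n_\theta \to \infty$ and $n_\psi \to \infty$, respectively, while for fixed $n_\theta$ and $n_\psi$, the approximation errors are bounded by constants on a compact set. It is well known from approximation theory (see for example [31,32]) that it is often possible to determine in advance how many terms of the basis function expansion should be taken for the required accuracy.
Equations (8) and (9) can be written in the following matrix form:
$$\hat{\mathbf{u}}(\mathbf{x}) = \boldsymbol{\Theta}(\mathbf{x}) \boldsymbol{\pi}, \tag{12}$$
$$\hat{\mathbf{d}}(\mathbf{x}) = \boldsymbol{\Psi}(\mathbf{x}) \boldsymbol{\rho}, \tag{13}$$
where
$$\begin{aligned}
\boldsymbol{\Theta}(\mathbf{x}) &\triangleq \begin{bmatrix} \boldsymbol{\theta}^1(\mathbf{x}) & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\theta}^2(\mathbf{x}) & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \boldsymbol{\theta}^m(\mathbf{x}) \end{bmatrix}, &
\boldsymbol{\Psi}(\mathbf{x}) &\triangleq \begin{bmatrix} \boldsymbol{\psi}^1(\mathbf{x}) & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\psi}^2(\mathbf{x}) & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \boldsymbol{\psi}^s(\mathbf{x}) \end{bmatrix}, \\
\boldsymbol{\theta}^i(\mathbf{x}) &\triangleq \begin{bmatrix} \theta_1^i(\mathbf{x}) & \theta_2^i(\mathbf{x}) & \cdots & \theta_{n_\theta}^i(\mathbf{x}) \end{bmatrix}, &
\boldsymbol{\psi}^i(\mathbf{x}) &\triangleq \begin{bmatrix} \psi_1^i(\mathbf{x}) & \psi_2^i(\mathbf{x}) & \cdots & \psi_{n_\psi}^i(\mathbf{x}) \end{bmatrix}, \\
\boldsymbol{\pi} &\triangleq \begin{bmatrix} \mathbf{p}^1 \\ \mathbf{p}^2 \\ \vdots \\ \mathbf{p}^m \end{bmatrix}, \quad
\boldsymbol{\rho} \triangleq \begin{bmatrix} \mathbf{r}^1 \\ \mathbf{r}^2 \\ \vdots \\ \mathbf{r}^s \end{bmatrix}, &
\mathbf{p}^i &\triangleq \begin{bmatrix} p_1^i \\ p_2^i \\ \vdots \\ p_{n_\theta}^i \end{bmatrix}, \quad
\mathbf{r}^i \triangleq \begin{bmatrix} r_1^i \\ r_2^i \\ \vdots \\ r_{n_\psi}^i \end{bmatrix}.
\end{aligned} \tag{14}$$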
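As a small illustration of the matrix form (12), the following Python sketch assembles a block-diagonal $\boldsymbol{\Theta}(\mathbf{x})$ from row vectors of basis functions and evaluates $\hat{\mathbf{u}}(\mathbf{x}) = \boldsymbol{\Theta}(\mathbf{x})\boldsymbol{\pi}$. The eight-monomial basis is borrowed from the second-order example of Section 4.2; the helper names (`basis`, `block_diag_rows`, `control`) and the coefficient values are hypothetical and chosen only for illustration.

```python
def basis(x):
    """Row vector of basis functions theta^i(x); all of them vanish at x = 0."""
    x1, x2 = x
    return [x1, x2, x1 * x2, x1**2 * x2, x1 * x2**2, x1**2 * x2**2, x1**2, x2**2]


def block_diag_rows(rows):
    """Place each row vector on its own block of a block-diagonal matrix."""
    total = sum(len(r) for r in rows)
    out, offset = [], 0
    for r in rows:
        out.append([0.0] * offset + list(r) + [0.0] * (total - offset - len(r)))
        offset += len(r)
    return out


def matvec(M, v):
    """Plain matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def control(x, pi):
    """u_hat(x) = Theta(x) pi for m = 2 control components sharing one basis."""
    theta = basis(x)
    Theta = block_diag_rows([theta, theta])
    return matvec(Theta, pi)


# Illustrative (hypothetical) parameter vectors p^1 and p^2, stacked into pi.
p1 = [-1.0, 0.0, 0.0, 0.0, -3.0, 0.0, 0.0, 0.0]
p2 = [0.0, -6.0, 0.0, -9.0, 0.0, 0.0, 0.0, 0.0]
pi = p1 + p2

u_hat = control((1.0, 1.0), pi)
print(u_hat)
```

Because the parameters enter linearly, the control law is recovered from a single matrix-vector product once the basis has been evaluated at the current state.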
In this paper, based on the previous considerations, the problem that needs to be solved can be formulated as follows:
Problem 2.
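Determine the parameters $\boldsymbol{\pi}$ and $\boldsymbol{\rho}$ such that the $L_2$-gain of the closed-loop system
$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}) + \mathbf{G}_1(\mathbf{x}) \boldsymbol{\Theta}(\mathbf{x}) \boldsymbol{\pi} + \mathbf{G}_2(\mathbf{x}) \boldsymbol{\Psi}(\mathbf{x}) \boldsymbol{\rho}, \tag{15}$$
is minimized. In other words, according to Definition 1, solve the following minimax optimization problem
$$J_\mu^*(\mathbf{x}_0) = \min_{\boldsymbol{\pi}} \max_{\boldsymbol{\rho}} \left( \|\mathbf{h}(\mathbf{x})\|_{L_2}^2 + \|\boldsymbol{\Theta}(\mathbf{x}) \boldsymbol{\pi}\|_{L_2}^2 - \mu \|\boldsymbol{\Psi}(\mathbf{x}) \boldsymbol{\rho}\|_{L_2}^2 \right), \tag{16}$$
which represents a zero-sum differential game, where the minimal $L_2$-gain is $\gamma^* = \sqrt{\mu}$.
First, to propose an algorithm for minimization of the parameter $\mu$ in (16), the first-order derivative of the value of the differential game with respect to $\mu$ is needed. Since $J_\mu^*(\mathbf{x}_0)$ is non-differentiable because it is defined with the min–max operator, the sub-differential is employed. It can easily be shown that the sub-differential of $J_\mu^*(\mathbf{x}_0)$ with respect to $\mu$ is
$$\partial J_\mu^* = -\|\boldsymbol{\Psi}(\mathbf{x}) \boldsymbol{\rho}\|_{L_2}^2. \tag{17}$$
Next, due to the inaccurate calculation of the previously mentioned derivative, the quasi-Newton method is considered, for which superlinear convergence and numerical robustness have been proven (see for example [33,34,35,36]). The k-th quasi-Newton iteration is defined by
$$\mu_{k+1} = \mu_k - \frac{\|\mathbf{h}(\mathbf{x}_k)\|_{L_2}^2 + \|\boldsymbol{\Theta}(\mathbf{x}_k) \boldsymbol{\pi}_k\|_{L_2}^2 - \mu_k \|\boldsymbol{\Psi}(\mathbf{x}_k) \boldsymbol{\rho}_k\|_{L_2}^2}{\partial J_{\mu_k}^*} = \frac{\|\mathbf{h}(\mathbf{x}_k)\|_{L_2}^2 + \|\boldsymbol{\Theta}(\mathbf{x}_k) \boldsymbol{\pi}_k\|_{L_2}^2}{\|\boldsymbol{\Psi}(\mathbf{x}_k) \boldsymbol{\rho}_k\|_{L_2}^2}. \tag{18}$$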
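The second equality in (18) can be checked in one line. The shorthand $A_k$ and $B_k$ below is introduced here only for illustration and is not part of the original notation; the step assumes $B_k \neq 0$, i.e., a nonzero uncertainty signal.

$$\begin{aligned}
A_k &\triangleq \|\mathbf{h}(\mathbf{x}_k)\|_{L_2}^2 + \|\boldsymbol{\Theta}(\mathbf{x}_k)\boldsymbol{\pi}_k\|_{L_2}^2, \qquad B_k \triangleq \|\boldsymbol{\Psi}(\mathbf{x}_k)\boldsymbol{\rho}_k\|_{L_2}^2, \\
\mu_{k+1} &= \mu_k - \frac{A_k - \mu_k B_k}{-B_k} = \mu_k + \frac{A_k}{B_k} - \mu_k = \frac{A_k}{B_k}.
\end{aligned}$$

In other words, with the sub-differential (17) the Newton-type update reduces to the ratio of the performance energy to the uncertainty energy evaluated at the current saddle-point estimate.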
To minimize the $L_2$-gain of the system (15), we propose the following quasi-Newton algorithm (Algorithm 1).
It is important to note that in the second step of Algorithm 1, unlike other known methods of nonlinear optimization, the dynamics of the system (15) is not included in the objective function.
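Algorithm 1 Quasi-Newton method for $L_2$-gain minimization.
Require: $\mu_0 \in \mathbb{R}^+$, $\varepsilon \in \mathbb{R}^+$.
Ensure: $\mu^*$.
1: Set $k \leftarrow 0$.
2: Solve the zero-sum differential game
$$J_{\mu_k}^* = \min_{\boldsymbol{\pi}} \max_{\boldsymbol{\rho}} \left( \|\mathbf{h}(\mathbf{x}_k)\|_{L_2}^2 + \|\boldsymbol{\Theta}(\mathbf{x}_k) \boldsymbol{\pi}_k\|_{L_2}^2 - \mu_k \|\boldsymbol{\Psi}(\mathbf{x}_k) \boldsymbol{\rho}_k\|_{L_2}^2 \right), \tag{19}$$
where $\mathbf{x}_k$, $\boldsymbol{\pi}_k$ and $\boldsymbol{\rho}_k$ are coupled by (15).
3: Calculate
$$\mu_{k+1} = \frac{\|\mathbf{h}(\mathbf{x}_k)\|_{L_2}^2 + \|\boldsymbol{\Theta}(\mathbf{x}_k) \boldsymbol{\pi}_k\|_{L_2}^2}{\|\boldsymbol{\Psi}(\mathbf{x}_k) \boldsymbol{\rho}_k\|_{L_2}^2}. \tag{20}$$
4: If $|\mu_{k+1} - \mu_k| \le \varepsilon$ then terminate. Otherwise, set $k \leftarrow k + 1$ and return to step 2.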
To solve the subproblem in the second step of Algorithm 1, we use the conjugate gradient method as described in Algorithm 2.
Note that, in Algorithm 2, the maximization of the objective function is obtained by changing the sign in front of the gradient with respect to $\boldsymbol{\rho}$ in (23). Additionally, it should be noted that the strategy for solving Problem 2 proposed by Algorithms 1 and 2 requires appropriate initialization, i.e., appropriate selection of the approximation functions and the initial parameters $\boldsymbol{\pi}_0$, $\boldsymbol{\rho}_0$ and $\mu_0$. If they are inadequate, it cannot be guaranteed that the control law corresponds to the “worst case” of uncertainty. In general, it is very difficult to guarantee that a solution for the “worst case” of uncertainty is obtained. However, conditions that would give recommendations for the initialization of the parameters of the proposed algorithms could be derived by applying the concept of inverse min–max optimal control. Solving the problem of inverse min–max optimal control goes beyond the scope of the research presented in this paper and will be part of future research.
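Algorithm 2 Conjugate gradient method for saddle point computation.
Require: $\mathbf{x}_0 \in \mathbb{R}^n$, $\boldsymbol{\xi}_0 \triangleq \begin{bmatrix} \boldsymbol{\pi}_0^T & \boldsymbol{\rho}_0^T \end{bmatrix}^T \in \mathbb{R}^{n_\xi}$, $\mu_k$, $\beta_0 \in \mathbb{R}^+$, $\eta_0 \in \mathbb{R}^+$, $\epsilon \in \mathbb{R}^+$.
Ensure: $\boldsymbol{\pi}^*$, $\boldsymbol{\rho}^*$.
1: Set $j \leftarrow 1$.
2: Perform a conjugate gradient descent/ascent algorithm in the following form
$$\boldsymbol{\xi}_j = \boldsymbol{\xi}_{j-1} + \eta_{j-1} \mathbf{s}_{j-1}, \tag{21}$$
$$\mathbf{s}_j = -\nabla J(\boldsymbol{\xi}_j) + \beta_{j-1} \mathbf{s}_{j-1}, \tag{22}$$
where $\mathbf{s}$ is the search direction vector and
$$\nabla J(\boldsymbol{\xi}_j) = \begin{bmatrix} \nabla_{\boldsymbol{\pi}} J & -\nabla_{\boldsymbol{\rho}} J \end{bmatrix}^T, \tag{23}$$
$$J = \|\mathbf{h}(\mathbf{x}_j)\|_{L_2}^2 + \|\boldsymbol{\Theta}(\mathbf{x}_j) \boldsymbol{\pi}_j\|_{L_2}^2 - \mu_k \|\boldsymbol{\Psi}(\mathbf{x}_j) \boldsymbol{\rho}_j\|_{L_2}^2. \tag{24}$$
3: Determine $\eta_j > 0$ by applying the line search strategy that satisfies Wolfe’s conditions (see Algorithm 3).
4: Determine $\beta_j > 0$ by applying the Dai–Yuan method [37].
5: If $\|\nabla J(\boldsymbol{\xi}_j)\| \le \epsilon$ then terminate. Otherwise, set $j \leftarrow j + 1$ and return to step 2.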
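The descent/ascent update (21)–(23) with a Dai–Yuan coefficient can be sketched in Python on a toy problem. Everything in the sketch is an assumption made for illustration: the decoupled quadratic saddle function $J(p, r) = 2(p-1)^2 - \tfrac{1}{2}(r+2)^2$ is hypothetical, and an exact step length (available in closed form for this quadratic) stands in for the Wolfe line search of Algorithm 3. It is not the authors' implementation, and the decoupled toy case does not exercise the coupling through the system dynamics.

```python
import math


def modified_gradient(xi):
    """[dJ/dp, -dJ/dr] for the toy J(p, r) = 2 (p - 1)^2 - 0.5 (r + 2)^2.

    The sign of the rho-part is flipped, as in (23), so that stepping along
    the negative of this vector descends in p and ascends in r.
    """
    p, r = xi
    grad_p = 4.0 * (p - 1.0)
    grad_r = -(r + 2.0)
    return [grad_p, -grad_r]


# The modified gradient of the toy problem equals A xi - b with A = diag(4, 1).
A_DIAG = [4.0, 1.0]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def exact_step(g, s):
    """Exact step length along s (valid for this quadratic toy problem only)."""
    return -dot(g, s) / sum(a * si * si for a, si in zip(A_DIAG, s))


def cg_saddle(xi0, eps=1e-9, max_iter=50):
    xi = list(xi0)
    g = modified_gradient(xi)
    s = [-gi for gi in g]                      # first direction: steepest descent/ascent
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) <= eps:        # stopping test of step 5
            break
        eta = exact_step(g, s)
        xi = [x + eta * si for x, si in zip(xi, s)]            # update (21)
        g_new = modified_gradient(xi)
        y = [gn - go for gn, go in zip(g_new, g)]
        beta = dot(g_new, g_new) / dot(s, y)                   # Dai-Yuan coefficient
        s = [-gn + beta * si for gn, si in zip(g_new, s)]      # update (22)
        g = g_new
    return xi


p_star, r_star = cg_saddle([0.0, 0.0])
print(p_star, r_star)
```

For this two-parameter quadratic, the iteration reaches the saddle point $(p, r) = (1, -2)$ after two steps up to rounding, which is the expected behavior of conjugate directions with exact steps.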
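Algorithm 3 Line search method with Wolfe conditions.
Require: $\boldsymbol{\xi}_j$, $\mathbf{s}_j$, $J(\boldsymbol{\xi}_j)$, $\nabla J(\boldsymbol{\xi}_j)$, $0 < c_1 < c_2 < 1$, $\nu \in (0, 1)$.
Ensure: $\eta_j$.
1: Set $l \leftarrow 0$.
2: Choose initial $\eta_0$.
3: While $\eta_l$ does not satisfy Wolfe’s conditions
$$J(\boldsymbol{\xi}_j + \eta_l \mathbf{s}_j) \le J(\boldsymbol{\xi}_j) + c_1 \eta_l \nabla J(\boldsymbol{\xi}_j)\, \mathbf{s}_j, \tag{25}$$
$$\nabla J(\boldsymbol{\xi}_j + \eta_l \mathbf{s}_j)\, \mathbf{s}_j \ge c_2 \nabla J(\boldsymbol{\xi}_j)\, \mathbf{s}_j, \tag{26}$$
calculate
(i) $\eta_{l+1} = \nu \eta_l$,
(ii) $l \leftarrow l + 1$.
4: Set $\eta_j = \eta_l$.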
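A minimal Python sketch of the backtracking loop in Algorithm 3 is given below, read as shrinking the step by the factor $\nu$ until both conditions (25) and (26) hold. The function name `wolfe_line_search`, the default `eta0` and the iteration cap `max_iter` are assumptions added here; the cap is a safeguard because the curvature condition (26) can fail for very small steps, in which case pure shrinking would not terminate.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def wolfe_line_search(J, grad_J, xi, s, eta0=1.0, c1=1e-3, c2=0.9, nu=0.8,
                      max_iter=100):
    """Shrink eta by the factor nu until both Wolfe conditions are satisfied.

    J and grad_J take a list of floats; s is the search direction.
    The last trial step is returned if max_iter is reached (assumed safeguard).
    """
    J0 = J(xi)
    slope0 = dot(grad_J(xi), s)            # directional derivative at eta = 0
    eta = eta0
    for _ in range(max_iter):
        trial = [x + eta * si for x, si in zip(xi, s)]
        sufficient_decrease = J(trial) <= J0 + c1 * eta * slope0      # condition (25)
        curvature = dot(grad_J(trial), s) >= c2 * slope0              # condition (26)
        if sufficient_decrease and curvature:
            return eta
        eta *= nu
    return eta


# Usage on the one-dimensional function J(x) = x^2 from x = 1 along s = -grad J.
J = lambda x: x[0] ** 2
grad_J = lambda x: [2.0 * x[0]]
eta = wolfe_line_search(J, grad_J, [1.0], [-2.0])
print(eta)
```

In this usage example the initial step $\eta = 1$ overshoots to $x = -1$ and violates (25), while the first reduced step $\eta = 0.8$ satisfies both conditions.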
Furthermore, the initialization of the parameters $\beta$ and $\eta$ depends on the specific optimization problem. Through a series of numerical experiments in simulations, we determined that the Dai–Yuan conjugate gradient method reaches a similar level of numerical robustness and accuracy for various initial values of $\beta_0$, whereas for the standard gradient algorithm, the choice of the initial parameters largely affects the convergence and can cause numerical instabilities.
To determine the convergence step $\eta_j > 0$, we use a line search method that satisfies the Wolfe conditions, as described in Algorithm 3.
Details related to the line search method and Wolfe conditions can be found in [38].

Gradient Calculation

In this subsection, the relations for calculating the gradients that appear in (22), i.e., (23), are derived. In order to apply the chain rule for derivatives backward in time, the system (15) first needs to be rewritten in discrete-time state-space form. For this purpose, we use the multistep Adams method. Compared to the most popular four-stage Runge–Kutta method, which requires four evaluations of the approximation problem at each step, the Adams method requires only one calculation.
The explicit Adams approximation of system (15) can be formulated as follows. Let the time interval $[t_0, t_f]$ consist of the points $t_i = t_0 + i\tau$ for $i = 0, 1, 2, \ldots, N-1$, such that $\tau = (t_f - t_0)/N$ is the time step length, and let $\hat{\mathbf{x}}$ be the extended $4n$-dimensional state vector ($n$ is the dimension of the state vector $\mathbf{x}$ and 4 is the order of the Adams method). Then, the system can be written in the following discrete-time state-space form:
$$\hat{\mathbf{x}}(i+1) = \hat{\mathbf{f}}(\hat{\mathbf{x}}(i)) + \mathbf{a} \otimes \mathbf{B}(\mathbf{x}(i)) \boldsymbol{\pi} + \mathbf{a} \otimes \mathbf{D}(\mathbf{x}(i)) \boldsymbol{\rho}, \quad \hat{\mathbf{x}}(0) = \hat{\mathbf{x}}_0, \tag{27}$$
where the vector $\mathbf{a}$ contains the non-zero coefficients of the Adams method (see [39], p. 358), and where, for simplicity, we introduced
$$\mathbf{B}(\mathbf{x}) \triangleq \mathbf{G}_1(\mathbf{x}) \boldsymbol{\Theta}(\mathbf{x}), \quad \mathbf{D}(\mathbf{x}) \triangleq \mathbf{G}_2(\mathbf{x}) \boldsymbol{\Psi}(\mathbf{x}). \tag{28}$$
General considerations regarding Adams method are given in [39], while several novel schemes for multistep Adams method are proposed in [40].
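To illustrate why a multistep Adams scheme needs only one new right-hand-side evaluation per step, here is a Python sketch of the classical explicit four-step Adams–Bashforth method for a scalar autonomous ODE. It assumes the standard coefficients 55/24, −59/24, 37/24, −9/24 and bootstraps the first three steps with a Runge–Kutta step; whether the paper uses exactly this variant and this start-up procedure is not stated in the text, so both are assumptions. The list of stored derivative values plays the role that the extended state $\hat{\mathbf{x}}$ presumably plays in (27).

```python
import math

# Classical 4-step Adams-Bashforth coefficients (assumed variant).
AB4 = (55.0 / 24.0, -59.0 / 24.0, 37.0 / 24.0, -9.0 / 24.0)


def rk4_step(f, x, tau):
    """One classical Runge-Kutta step, used only to start the multistep scheme."""
    k1 = f(x)
    k2 = f(x + 0.5 * tau * k1)
    k3 = f(x + 0.5 * tau * k2)
    k4 = f(x + tau * k3)
    return x + tau * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def adams_bashforth4(f, x0, tau, n_steps):
    """Integrate dx/dt = f(x) and return the list [x(0), x(1), ..., x(n_steps)]."""
    xs = [x0]
    for _ in range(min(3, n_steps)):
        xs.append(rk4_step(f, xs[-1], tau))
    fs = [f(x) for x in xs]                       # stored derivative history
    for i in range(3, n_steps):
        increment = (AB4[0] * fs[i] + AB4[1] * fs[i - 1]
                     + AB4[2] * fs[i - 2] + AB4[3] * fs[i - 3])
        x_next = xs[i] + tau * increment
        xs.append(x_next)
        fs.append(f(x_next))                      # the single new evaluation per step
    return xs


# Usage: dx/dt = -x, x(0) = 1, integrated to t = 1 with tau = 0.01.
xs = adams_bashforth4(lambda x: -x, 1.0, 0.01, 100)
print(xs[-1], math.exp(-1.0))
```

After the start-up phase, each step reuses three stored derivative values, which is the property that makes the backward-in-time chain rule of this subsection cheap to evaluate.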
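The discrete-time form of the objective function (24) is
$$J = \tau \sum_{i=0}^{N-1} \left( \mathbf{h}^T(\mathbf{x}(i)) \mathbf{h}(\mathbf{x}(i)) + \boldsymbol{\pi}^T \mathbf{P}(\mathbf{x}(i)) \boldsymbol{\pi} - \mu \boldsymbol{\rho}^T \mathbf{R}(\mathbf{x}(i)) \boldsymbol{\rho} \right), \tag{29}$$
where for simplicity we introduced
$$\mathbf{P}(\mathbf{x}) \triangleq \boldsymbol{\Theta}^T(\mathbf{x}) \boldsymbol{\Theta}(\mathbf{x}), \quad \mathbf{R}(\mathbf{x}) \triangleq \boldsymbol{\Psi}^T(\mathbf{x}) \boldsymbol{\Psi}(\mathbf{x}). \tag{30}$$
The following is a derivation of recursive matrix relations for calculating $\nabla_{\boldsymbol{\pi}} J$ using basic chain rule arithmetic and matrix differential calculus, as well as certain properties of the vectorization of matrices and Kronecker algebra. The relations for calculating $\nabla_{\boldsymbol{\rho}} J$ are obtained in an analogous way.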
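The gradient of the objective function (29) with respect to the vector $\boldsymbol{\pi}$ is
$$\nabla_{\boldsymbol{\pi}} J = \tau \sum_{i=0}^{N-1} \frac{\partial F(i)}{\partial \boldsymbol{\pi}}, \tag{31}$$
where
$$F(i) = \mathbf{h}^T(\mathbf{x}(i)) \mathbf{h}(\mathbf{x}(i)) + \boldsymbol{\pi}^T \mathbf{P}(\mathbf{x}(i)) \boldsymbol{\pi} - \mu \boldsymbol{\rho}^T \mathbf{R}(\mathbf{x}(i)) \boldsymbol{\rho}. \tag{32}$$
Then, the partial derivative with respect to the vector $\boldsymbol{\pi}$ is taken of the expression (32), which gives:
$$\begin{aligned}
\nabla_{\boldsymbol{\pi}} F(i) ={}& 2 \mathbf{h}^T(\mathbf{x}(i)) \cdot \nabla_{\hat{\mathbf{x}}} \mathbf{h}(\mathbf{x}(i)) \cdot \nabla_{\boldsymbol{\pi}} \hat{\mathbf{x}}(i) - \mu \left( \boldsymbol{\rho} \otimes \boldsymbol{\rho} \right)^T \cdot \nabla_{\hat{\mathbf{x}}} \operatorname{vec}(\mathbf{R}(\mathbf{x}(i))) \cdot \nabla_{\boldsymbol{\pi}} \hat{\mathbf{x}}(i) \\
&+ \operatorname{vec}(\mathbf{P}(\mathbf{x}(i)))^T \cdot \nabla_{\boldsymbol{\pi}} \left( \boldsymbol{\pi} \otimes \boldsymbol{\pi} \right) + \left( \boldsymbol{\pi} \otimes \boldsymbol{\pi} \right)^T \cdot \nabla_{\hat{\mathbf{x}}} \operatorname{vec}(\mathbf{P}(\mathbf{x}(i))) \cdot \nabla_{\boldsymbol{\pi}} \hat{\mathbf{x}}(i).
\end{aligned} \tag{33}$$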
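In (33) it is necessary to calculate $\nabla_{\boldsymbol{\pi}} \hat{\mathbf{x}}(i)$, which is obtained from the expression (27) as follows:
$$\begin{aligned}
\nabla_{\boldsymbol{\pi}} \hat{\mathbf{x}}(i) ={}& \mathbf{a} \otimes \mathbf{B}(\mathbf{x}(i-1)) + \nabla_{\hat{\mathbf{x}}} \left( \mathbf{a} \otimes \mathbf{B}(\mathbf{x}(i-1)) \boldsymbol{\pi} \right) \cdot \nabla_{\boldsymbol{\pi}} \hat{\mathbf{x}}(i-1) \\
&+ \nabla_{\hat{\mathbf{x}}} \hat{\mathbf{f}}(\hat{\mathbf{x}}(i-1)) \cdot \nabla_{\boldsymbol{\pi}} \hat{\mathbf{x}}(i-1) + \nabla_{\hat{\mathbf{x}}} \left( \mathbf{a} \otimes \mathbf{D}(\mathbf{x}(i-1)) \boldsymbol{\rho} \right) \cdot \nabla_{\boldsymbol{\pi}} \hat{\mathbf{x}}(i-1),
\end{aligned} \tag{34}$$
where
$$\nabla_{\hat{\mathbf{x}}} \left( \mathbf{a} \otimes \mathbf{B}(\mathbf{x}(i-1)) \boldsymbol{\pi} \right) = \left( \mathbf{a} \otimes \boldsymbol{\pi}^T \otimes \mathbf{I}_n \right) \cdot \nabla_{\hat{\mathbf{x}}} \operatorname{vec} \mathbf{B}(\mathbf{x}(i-1)), \tag{35}$$
$$\nabla_{\hat{\mathbf{x}}} \left( \mathbf{a} \otimes \mathbf{D}(\mathbf{x}(i-1)) \boldsymbol{\rho} \right) = \left( \mathbf{a} \otimes \boldsymbol{\rho}^T \otimes \mathbf{I}_n \right) \cdot \nabla_{\hat{\mathbf{x}}} \operatorname{vec} \mathbf{D}(\mathbf{x}(i-1)). \tag{36}$$
The expressions (34)–(36) represent recursive matrix relations for $i = 1, 2, \ldots, N-1$ with initial conditions
$$\nabla_{\boldsymbol{\pi}} \hat{\mathbf{x}}(0) = \mathbf{0}. \tag{37}$$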
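The relations (35) and (36) appear to rest on the standard vectorization identity $\mathbf{B}\boldsymbol{\pi} = (\boldsymbol{\pi}^T \otimes \mathbf{I}_n) \operatorname{vec}(\mathbf{B})$, which moves the constant parameter vector out of the state-dependent matrix before differentiating. The following Python sketch checks this identity numerically for a small matrix; the helper functions and the numerical values are hypothetical and serve only as an illustration.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def vec(M):
    """Stack the columns of M one underneath the other."""
    n_rows, n_cols = len(M), len(M[0])
    return [M[i][j] for j in range(n_cols) for i in range(n_rows)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# A 2 x 3 matrix B (n = 2) and a parameter vector pi of length 3.
B = [[1, 2, 3],
     [4, 5, 6]]
pi = [1, -1, 2]

lhs = matvec(B, pi)                                  # B pi
rhs = matvec(kron([pi], identity(2)), vec(B))        # (pi^T kron I_n) vec(B)
print(lhs, rhs)
```

Since the factor $\boldsymbol{\pi}^T \otimes \mathbf{I}_n$ does not depend on the state, only $\operatorname{vec}(\mathbf{B})$ has to be differentiated with respect to $\hat{\mathbf{x}}$.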
Note that the functions $\hat{\mathbf{f}}(\hat{\mathbf{x}})$, $\mathbf{h}(\mathbf{x})$, $\mathbf{P}(\mathbf{x})$, $\mathbf{R}(\mathbf{x})$, $\mathbf{B}(\mathbf{x})$ and $\mathbf{D}(\mathbf{x})$, which appear in the previous expressions, are known from the system dynamics and the predetermined basis of approximation functions, and their derivatives can easily be calculated to machine precision by applying the complex-step method.
The first-order derivative of $\hat{\mathbf{f}}(\hat{\mathbf{x}})$ with respect to $\hat{\mathbf{x}}$ using the complex-step approximation is obtained by expanding its component (say, the k-th component) in a Taylor series with a complex argument [41,42]
$$\hat{f}_k(\hat{\mathbf{x}} + \mathrm{i} h \mathbf{e}_j) = \hat{f}_k(\hat{\mathbf{x}}) + \mathrm{i} h \nabla_{\hat{\mathbf{x}}} \hat{f}_k(\hat{\mathbf{x}}) \cdot \mathbf{e}_j + \frac{(\mathrm{i} h)^2}{2!} \mathbf{e}_j^T \cdot \nabla_{\hat{\mathbf{x}}}^2 \hat{f}_k(\hat{\mathbf{x}}) \cdot \mathbf{e}_j + \cdots, \tag{38}$$
where $\mathbf{e}_j$ is the j-th unit vector. Taking only the imaginary parts gives
$$\nabla_{\hat{\mathbf{x}}} \hat{f}_k(\hat{\mathbf{x}}) \cdot \mathbf{e}_j = \frac{\operatorname{Im}\left[ \hat{f}_k(\hat{\mathbf{x}} + \mathrm{i} h \mathbf{e}_j) \right]}{h} + O(h^2). \tag{39}$$
In the above expressions, $\mathrm{i}$ represents the imaginary unit, $\mathrm{i}^2 = -1$, while previously $i$ denoted the i-th time point. For this reason, the argument in $\hat{\mathbf{x}}(i)$ is omitted in these expressions, but it is implied. It can be seen that no subtraction appears in the numerator of the expression (39), i.e., the subtractive cancellation problem has been removed. This means that the step $h$ can be extremely small, so the derivative can be calculated to machine precision.
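The absence of subtractive cancellation can be seen in a short Python comparison. The test function $f(x) = x^3$ and the step $h = 10^{-20}$ are choices made here for illustration, not values taken from the paper.

```python
def f(z):
    """Test function f(z) = z^3, written so that it also accepts complex input."""
    return z * z * z


x, h = 2.0, 1e-20

# Forward difference: x + h rounds to x in double precision, so the
# numerator cancels completely and the estimate collapses to zero.
forward_diff = (f(x + h) - f(x)) / h

# Complex step (39): no subtraction is involved, so a tiny h is harmless.
complex_step = f(complex(x, h)).imag / h

print(forward_diff, complex_step)   # the exact derivative is 3 * x**2 = 12
```

The forward difference loses all information for such a small step, whereas the complex-step estimate stays accurate to rounding error.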
The implementation of a complex-step method can be very simply achieved by using a high-level programming language that does not require a prior definition of the variable types, but it is possible to define complex variables automatically by applying built-in functions.
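The complex-step method for calculating $\nabla_{\hat{\mathbf{x}}} \hat{\mathbf{f}}(\hat{\mathbf{x}})$ is described by the pseudocode given in Algorithm 4. The derivatives of the other known functions can be calculated in an analogous manner.
Algorithm 4 Calculation of derivatives at the i-th time point using the complex-step method.
Require: $n$, $h$, $\hat{\mathbf{x}}$, $\hat{\mathbf{f}}(\hat{\mathbf{x}})$
Ensure: $\nabla_{\hat{\mathbf{x}}} \hat{\mathbf{f}}(\hat{\mathbf{x}})$
1: $\mathbf{x1} \leftarrow \hat{\mathbf{x}}$
2: for $j = 1$ to $n$ do
3:     $x1_j \leftarrow \mathrm{complex}(\hat{x}_j, h)$
4:     $\mathbf{f1} \leftarrow \hat{\mathbf{f}}(\mathbf{x1})$
5:     for $k = 1$ to $n$ do
6:         $\dfrac{\partial \hat{f}_k(\hat{\mathbf{x}})}{\partial \hat{x}_j} \leftarrow \dfrac{\mathrm{imag}(f1_k)}{h}$
7:     end for
8:     $\mathbf{x1} \leftarrow \hat{\mathbf{x}}$
9: end for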
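Algorithm 4 translates almost line by line into a high-level language with built-in complex numbers. The Python sketch below is one possible rendering, not the authors' MATLAB code; the test function `f_example` and the step $h = 10^{-20}$ are hypothetical. Note that the function being differentiated must be written with operations that accept complex arguments (here `cmath.sin` instead of `math.sin`).

```python
import cmath
import math


def complex_step_jacobian(f, x, h=1e-20):
    """Jacobian of f at the real point x by the complex-step method.

    Entry [k][j] approximates d f_k / d x_j, following Algorithm 4:
    perturb one coordinate along the imaginary axis, evaluate f once,
    and divide the imaginary parts by h.
    """
    columns = []
    for j in range(len(x)):
        x1 = [complex(v, 0.0) for v in x]      # reset the perturbation (step 8)
        x1[j] = complex(x[j], h)               # step 3
        f1 = f(x1)                             # step 4
        columns.append([fk.imag / h for fk in f1])   # steps 5-7
    # Transpose the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]


def f_example(x):
    """Hypothetical test function R^2 -> R^2 that accepts complex arguments."""
    return [x[0] * x[0] + cmath.sin(x[1]), x[0] * x[1]]


jac = complex_step_jacobian(f_example, [1.5, 0.3])
print(jac)   # analytic Jacobian: [[2 * 1.5, cos(0.3)], [0.3, 1.5]]
```

Each column of the Jacobian costs one function evaluation, and no difference of nearly equal numbers is ever formed.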

4. Benchmark Examples with Analytic Solution

In this section, the algorithmic procedure described in the previous sections is tested on examples of nonlinear systems where the HJI equation can be solved analytically, thereby exactly determining the control and uncertainty vectors. In this way, it can be directly assessed whether the proposed algorithmic procedure calculates the required solutions with sufficient numerical efficiency in terms of convergence and accuracy. The algorithm is written and implemented in the MATLAB software package. When solving a particular problem, it is necessary to initialize the algorithm in a suitable way.

4.1. Example—First-Order System

Consider a scalar nonlinear system described by the following equation [21]:
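$$\dot{x} = u + \operatorname{arctg}(x) \cdot d, \quad \mathbf{z} = \begin{bmatrix} x \\ u \end{bmatrix}. \tag{40}$$
If $\gamma > \frac{\pi}{2}$, then by solving the corresponding HJI equation (for details see [21]), the analytical feedbacks are
$$u^* = -x \left( 1 - \frac{1}{\gamma^2} \operatorname{arctg}^2(x) \right)^{-\frac{1}{2}}, \quad d^* = \frac{1}{\gamma^2}\, x \operatorname{arctg}(x) \left( 1 - \frac{1}{\gamma^2} \operatorname{arctg}^2(x) \right)^{-\frac{1}{2}}. \tag{41}$$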
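As a consistency check of the analytical feedbacks (41), the following Python sketch recovers the gradient of the value function from (5) and substitutes it into the stationary form of the HJI Equation (7). Several points are assumptions made here: the time derivative of $V$ is taken as zero, $G_1 = 1$, $G_2 = \operatorname{arctg}(x)$, $f = 0$ and $h = x$ are read off from (40), and the value $\gamma = 2$ is arbitrary (any $\gamma > \pi/2$ would do).

```python
import math


def u_star(x, gamma):
    """Analytical control feedback from (41)."""
    return -x / math.sqrt(1.0 - math.atan(x) ** 2 / gamma ** 2)


def d_star(x, gamma):
    """Analytical worst-case uncertainty from (41)."""
    return x * math.atan(x) / (gamma ** 2 * math.sqrt(1.0 - math.atan(x) ** 2 / gamma ** 2))


def value_gradient(x, gamma):
    """dV/dx recovered from (5) with G1 = 1: u* = -(1/2) dV/dx."""
    return -2.0 * u_star(x, gamma)


def hji_residual(x, gamma):
    """Left-hand side of the stationary HJI equation for system (40)."""
    Vx = value_gradient(x, gamma)
    g2 = math.atan(x)
    return 0.25 * Vx * Vx * (g2 * g2 / gamma ** 2 - 1.0) + x * x


def d_from_gradient(x, gamma):
    """Worst-case uncertainty from (6) with G2 = arctg(x)."""
    return math.atan(x) * value_gradient(x, gamma) / (2.0 * gamma ** 2)


gamma = 2.0
points = [-3.0, -1.0, -0.2, 0.0, 0.5, 1.0, 4.0]
residuals = [hji_residual(x, gamma) for x in points]
print(max(abs(r) for r in residuals))
```

The residual vanishes up to rounding at all sampled points, and the uncertainty obtained from (6) coincides with $d^*$ in (41).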
Approximation functions of control and uncertainty variables are chosen in the following form
$$\hat{u}(x) = p_1 x + p_2 x^3 + p_3 x^5 + p_4 x^7 + p_5 x^9, \quad \hat{d}(x) = r_1 x + r_2 x^3 + r_3 x^5 + r_4 x^7 + r_5 x^9, \tag{42}$$
i.e., written in the form (12) and (13) we have
$$\boldsymbol{\Theta}(x) = \boldsymbol{\Psi}(x) = \begin{bmatrix} x & x^3 & x^5 & x^7 & x^9 \end{bmatrix}, \quad \boldsymbol{\pi} = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \end{bmatrix}, \quad \boldsymbol{\rho} = \begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{bmatrix}. \tag{43}$$
In this example, we set the following values of the algorithm parameters: the time interval from $t_0 = 0$ to $t_f = 10$ [s] is divided into $N = 100{,}000$ equal subintervals, i.e., the discretization step is $\tau = 10^{-4}$ [s], and the 4th-order Adams method is applied; the initial condition is $x_0 = 1$; the initial value of the parameter is $\mu_0 = \gamma_0^2 = 5$; the vectors of initial parameters of the approximation functions are $\boldsymbol{\pi}_0 = \boldsymbol{\rho}_0 = \mathbf{0}$; the stopping criterion of the quasi-Newton method is $\varepsilon = 10^{-3}$; the stopping criterion of the conjugate gradient method is $\epsilon = 10^{-6}$; the Dai–Yuan method is used by default; the initial numerical values of the conjugate gradient algorithm parameters are chosen as $\eta_0 = 0.1$ and $\beta_0 = 0.5$; and the line search coefficients are $c_1 = 10^{-3}$, $c_2 = 0.9$ and $\nu = 0.8$.
We obtain the following parameter μ and parameters of approximation functions:
μ * = 2.4667 , π * = 0.9999 0.2045 0.0781 0.0391 0.0112 , ρ * = 0.0414 0.9638 1.7517 1.8462 0.7361 ,
i.e., the minimum $L_2$-gain is $\gamma^* = \sqrt{\mu^*} = 1.5706$.
Figure 1 shows the time dependence of the state variable, i.e., the response of the system (40) from the initial condition $x_0 = 1$, where the control variable and uncertainty variable are of the form (42) with parameters (44). The control and uncertainty variables from (42) obtained by the proposed algorithm with the parameters from (44) are compared with the analytical solutions from the expression (41), as functions of the state variable, in Figure 2. Figure 3 shows the solutions obtained by the derived algorithm in comparison with the analytical solutions (41) as functions of time. From these results, it can be concluded that the numerically obtained solutions approximate the analytical ones well, with negligible error.

4.2. Example—Second-Order System

Consider the second order nonlinear system [43]
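$$\begin{aligned}
\dot{x}_1 &= -\frac{1}{8} \left( 29 x_1 + 87 x_1 x_2^2 \right) - \frac{1}{4} \left( 2 x_2 + 3 x_2 x_1^2 \right) + u_1 + \frac{1}{2} d, \\
\dot{x}_2 &= -\frac{1}{4} \left( x_1 + 3 x_1 x_2^2 \right) + 3 u_2 + d, \\
\mathbf{z} &= \begin{bmatrix} \sqrt{2} \left( 2 x_1 + 6 x_1 x_2^2 \right) & \sqrt{2} \left( 4 x_2 + 6 x_1^2 x_2 \right) & u_1 & u_2 \end{bmatrix}^T.
\end{aligned} \tag{45}$$
If $\gamma^* = 1$, then by solving the corresponding HJI equation (for details see [43]), the analytical feedbacks are
$$\begin{aligned}
u_1^*(\mathbf{x}) &= -x_1 - 3 x_1 x_2^2, \\
u_2^*(\mathbf{x}) &= -6 x_2 - 9 x_1^2 x_2, \\
d^*(\mathbf{x}) &= \frac{1}{2} x_1 + 2 x_2 + 3 x_1^2 x_2 + \frac{3}{2} x_1 x_2^2.
\end{aligned} \tag{46}$$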
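The signs and factors of the system (45) and the feedbacks (46) can be cross-checked numerically. The Python sketch below assumes the reading of (45) with negative drift terms and $\sqrt{2}$ factors in the performance output, takes $\mathbf{G}_1 = \operatorname{diag}(1, 3)$ and $\mathbf{G}_2 = [\tfrac{1}{2}\ \ 1]^T$ from the input terms, recovers the gradient of the value function from (5), and evaluates the stationary form of the HJI Equation (7) with $\gamma = 1$. These readings are assumptions of this sketch rather than statements from [43].

```python
import math

SQRT2 = math.sqrt(2.0)


def drift(x1, x2):
    """Drift vector f(x) of system (45), as read in this sketch."""
    f1 = -(29.0 * x1 + 87.0 * x1 * x2 ** 2) / 8.0 - (2.0 * x2 + 3.0 * x2 * x1 ** 2) / 4.0
    f2 = -(x1 + 3.0 * x1 * x2 ** 2) / 4.0
    return f1, f2


def feedbacks(x1, x2):
    """Analytical feedbacks (46)."""
    u1 = -x1 - 3.0 * x1 * x2 ** 2
    u2 = -6.0 * x2 - 9.0 * x1 ** 2 * x2
    d = 0.5 * x1 + 2.0 * x2 + 3.0 * x1 ** 2 * x2 + 1.5 * x1 * x2 ** 2
    return u1, u2, d


def value_gradient(x1, x2):
    """Gradient of V from (5): u* = -(1/2) G1^T grad V with G1 = diag(1, 3)."""
    u1, u2, _ = feedbacks(x1, x2)
    return -2.0 * u1, -2.0 * u2 / 3.0


def hji_residual(x1, x2, gamma=1.0):
    """Left-hand side of the stationary HJI equation for system (45)."""
    Vx1, Vx2 = value_gradient(x1, x2)
    f1, f2 = drift(x1, x2)
    g2_term = 0.5 * Vx1 + Vx2                   # G2^T grad V^T with G2 = [1/2, 1]^T
    g1_term = Vx1 ** 2 + (3.0 * Vx2) ** 2       # grad V G1 G1^T grad V^T
    h_sq = (SQRT2 * (2.0 * x1 + 6.0 * x1 * x2 ** 2)) ** 2 \
        + (SQRT2 * (4.0 * x2 + 6.0 * x1 ** 2 * x2)) ** 2
    return Vx1 * f1 + Vx2 * f2 + 0.25 * (g2_term ** 2 / gamma ** 2 - g1_term) + h_sq


def d_from_gradient(x1, x2, gamma=1.0):
    """Worst-case uncertainty from (6)."""
    Vx1, Vx2 = value_gradient(x1, x2)
    return (0.5 * Vx1 + Vx2) / (2.0 * gamma ** 2)


points = [(1.0, 1.0), (-0.5, 0.8), (0.3, -0.7), (0.0, 0.0), (1.2, -0.4)]
residuals = [hji_residual(a, b) for a, b in points]
print(max(abs(r) for r in residuals))
```

With these readings the residual is zero up to rounding and (6) reproduces $d^*$ from (46), so the three feedbacks are mutually consistent with the system as written above.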
Approximation functions of control and uncertainty variables are chosen in the following form:
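$$\begin{aligned}
\hat{u}_1(\mathbf{x}) &= p_1^1 x_1 + p_2^1 x_2 + p_3^1 x_1 x_2 + p_4^1 x_1^2 x_2 + p_5^1 x_1 x_2^2 + p_6^1 x_1^2 x_2^2 + p_7^1 x_1^2 + p_8^1 x_2^2, \\
\hat{u}_2(\mathbf{x}) &= p_1^2 x_1 + p_2^2 x_2 + p_3^2 x_1 x_2 + p_4^2 x_1^2 x_2 + p_5^2 x_1 x_2^2 + p_6^2 x_1^2 x_2^2 + p_7^2 x_1^2 + p_8^2 x_2^2, \\
\hat{d}(\mathbf{x}) &= r_1 x_1 + r_2 x_2 + r_3 x_1 x_2 + r_4 x_1^2 x_2 + r_5 x_1 x_2^2 + r_6 x_1^2 x_2^2 + r_7 x_1^2 + r_8 x_2^2,
\end{aligned} \tag{47}$$
i.e., written in the form (12) and (13), we have
$$\begin{aligned}
\boldsymbol{\theta}^1(\mathbf{x}) &= \boldsymbol{\theta}^2(\mathbf{x}) = \boldsymbol{\psi}(\mathbf{x}) = \begin{bmatrix} x_1 & x_2 & x_1 x_2 & x_1^2 x_2 & x_1 x_2^2 & x_1^2 x_2^2 & x_1^2 & x_2^2 \end{bmatrix}, \\
\boldsymbol{\Theta}(\mathbf{x}) &= \begin{bmatrix} \boldsymbol{\theta}^1(\mathbf{x}) & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\theta}^2(\mathbf{x}) \end{bmatrix}, \quad \boldsymbol{\Psi}(\mathbf{x}) = \boldsymbol{\psi}(\mathbf{x}), \\
\mathbf{p}^1 &= \begin{bmatrix} p_1^1 \\ p_2^1 \\ \vdots \\ p_8^1 \end{bmatrix}, \quad \mathbf{p}^2 = \begin{bmatrix} p_1^2 \\ p_2^2 \\ \vdots \\ p_8^2 \end{bmatrix}, \quad \boldsymbol{\pi} = \begin{bmatrix} \mathbf{p}^1 \\ \mathbf{p}^2 \end{bmatrix}, \quad \boldsymbol{\rho} = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_8 \end{bmatrix}.
\end{aligned} \tag{48}$$
In this example, we set the following values of the algorithm parameters: the time interval from $t_0 = 0$ to $t_f = 3$ [s] is divided into $N = 5000$ equal subintervals, i.e., the discretization step is $\tau = 10^{-4}$ [s], and the 4th-order Adams method is applied by default; the vector of initial conditions is $\mathbf{x}_0 = \begin{bmatrix} 1 & 1 \end{bmatrix}^T$; the initial value of the parameter is $\mu_0 = \gamma_0^2 = 5$; the vectors of initial parameters of the approximation functions are $\boldsymbol{\pi}_0 = \boldsymbol{\rho}_0 = \mathbf{1}$; the stopping criterion of the quasi-Newton method is $\varepsilon = 10^{-3}$; the stopping criterion of the conjugate gradient method is $\epsilon = 10^{-3}$; the Dai–Yuan method is used by default; the initial numerical values of the conjugate gradient algorithm parameters are chosen as $\eta_0 = 0.1$ and $\beta_0 = 0.5$; and the line search coefficients are $c_1 = 10^{-3}$, $c_2 = 0.9$ and $\nu = 0.8$.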
We obtain the following parameter μ and parameters of approximation functions:
μ * = 1.0021 , p 1 * = 0.9997 0.0241 0.0785 0.0676 3.0576 0.0422 0.0003 0.0287 , p 2 * = 0.0004 6.0319 0.1224 9.1208 0.1533 0.1016 0.0009 0.0825 , ρ * = 0.4995 2.0478 0.1576 3.1370 1.6151 0.0847 0.0007 0.0579 ,
i.e., the minimum $L_2$-gain is $\gamma^* = \sqrt{\mu^*} = 1.0010$.
Figure 4 shows the time dependence of the state variables, i.e., the response of the system (45) from the initial conditions $\mathbf{x}_0 = \begin{bmatrix} 1 & 1 \end{bmatrix}^T$, where the control variables and uncertainty variable are of the form (47) with parameters (49). Figure 5 shows the solutions obtained by the derived algorithm in comparison with the analytical solutions (46). In this example, as in the previous one, it is evident that the numerical solution approximates the analytical solution well.

5. Conclusions

In this paper, an approach for the solution of a nonlinear H control problem is presented. Instead of using the approximation methods for solving the corresponding HJI equation, the solution is obtained by direct numerical calculation of the control and uncertainty variables that explicitly depend on the system states. In order to achieve numerical efficiency, the proposed algorithmic procedure uses quasi-Newton method, conjugate gradient method, line search method with Wolfe conditions, Adams approximation method for time discretization, and complex-step calculation of derivatives. In spite of the fact that the methods used in this paper are known from the references cited, in our approach, they are integrated together to provide a suitable mathematical tool for numerical solution of the zero-sum differential game related to the nonlinear H control problem.
This approach can be extended in two research directions: (i) the output measurement-feedback H∞ optimal control problem, and (ii) the case where the initial state vector is unknown and treated as an uncertainty, i.e., as the maximizing “player”. It is reasonable to expect that the proposed strategy can be extended to these two cases without a significant increase in its complexity.

Author Contributions

Conceptualization, V.M. and J.K.; methodology, V.M. and J.K.; software, V.M.; validation, V.M., J.K. and M.L.; formal analysis, V.M.; investigation, V.M., J.K. and M.L.; resources, V.M.; data curation, V.M., J.K. and M.L.; writing—original draft preparation, V.M.; writing—review and editing, V.M., J.K. and M.L.; visualization, V.M., J.K. and M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been supported by the European Regional Development Fund, Operational Program Competitiveness and Cohesion 2014–2020, grant number KK.01.1.1.04.0092.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HJI: Hamilton–Jacobi–Isaacs
LMI: Linear Matrix Inequalities
PID: Proportional–Integral–Derivative

References

  1. Helton, J.W.; James, M.R. Extending H∞ Control to Nonlinear Systems; SIAM: Philadelphia, PA, USA, 1999.
  2. Van Der Schaft, A. L2-gain analysis of nonlinear systems and nonlinear state feedback H∞ control. IEEE Trans. Autom. Control 1992, 37, 770–784.
  3. Basar, T.; Olsder, G.J. Dynamic Noncooperative Game Theory; SIAM: Philadelphia, PA, USA, 1999.
  4. Basar, T.; Bernhard, P. H∞ Optimal Control and Related Minimax Design Problems, Second Edition; Birkhäuser: Boston, MA, USA, 1995.
  5. Khanbaghi, M.; Zečević, A. An LMI-based control strategy for large-scale systems with applications to interconnected microgrid clusters. IEEE Access 2022, 10, 111554–111563.
  6. Chen, B.S.; Ma, Y.S.; Lee, M.Y. Stochastic robust H∞ decentralized network formation tracking control of large-scale team satellites via event-triggered mechanism. IEEE Access 2022, 10, 62011–62036.
  7. Chatavi, M.; Vu, M.T.; Mobayen, S.; Fekih, A. H∞ robust LMI-based nonlinear state feedback controller of uncertain nonlinear systems with external disturbances. Mathematics 2022, 10, 3518.
  8. Gritli, H.; Belghith, S. Robust feedback control of the underactuated inertia wheel inverted pendulum under parametric uncertainties and subject to external disturbances: LMI formulation. J. Frankl. Inst. 2018, 355, 9150–9191.
  9. Xi, A.; Cai, Y. A nonlinear finite-time robust differential game guidance law. Sensors 2022, 22, 6650.
  10. Liu, D.; Xue, S.; Zhao, B.; Luo, B.; Wei, Q. Adaptive dynamic programming for control: A survey and recent advances. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 142–160.
  11. Sun, W.; Pan, Y.; Lim, J.; Theodorou, E.A.; Tsiotras, P. Min-max differential dynamic programming: Continuous and discrete time formulations. J. Guid. Control. Dyn. 2018, 41, 2568–2580.
  12. Vamvoudakis, K.G.; Modares, H.; Kiumarsi, B.; Lewis, F.L. Game theory-based control system algorithms with real-time reinforcement learning: How to solve multiplayer games online. IEEE Control Syst. Mag. 2017, 37, 33–52.
  13. Ivanov, I.G.; Bogdanova, B.C. The iterative solution to discrete-time H∞ control problems for periodic systems. Algorithms 2016, 9, 20.
  14. Lu, X.; Li, H. A hybrid control approach to H∞ problem of nonlinear descriptor systems with actuator saturation. IEEE Trans. Autom. Control 2021, 66, 4960–4966.
  15. Lu, X.; Li, H. Prescribed finite-time H∞ control for nonlinear descriptor systems. IEEE Trans. Circuits Syst. II Express Briefs 2021, 68, 2917–2921.
  16. Aliyu, M.D.S. An improved iterative computational approach to the solution of the Hamilton-Jacobi equation in optimal control problems of affine nonlinear systems with application. Int. J. Syst. Sci. 2020, 51, 2625–2634.
  17. Mu, C.; Wang, K. Approximate-optimal control algorithm for constrained zero-sum differential games through event-triggering mechanism. Nonlinear Dyn. 2019, 95, 2639–2657.
  18. Peretz, Y. A Randomized Algorithm for Optimal PID Controllers. Algorithms 2018, 11, 81.
  19. Graham, A. Kronecker Products and Matrix Calculus: With Applications; Ellis Horwood Limited: West Sussex, UK, 1981.
  20. Brewer, J.W. Kronecker products and matrix calculus in system theory. IEEE Trans. Circuits Syst. 1978, 25, 772–781.
  21. Van Der Schaft, A. L2-Gain and Passivity Techniques in Nonlinear Control; Springer: London, UK, 1996.
  22. Isaacs, R. Differential Games. A Mathematical Theory with Application to Warfare and Pursuit, Control and Optimization; John Wiley and Sons, Inc.: New York, NY, USA, 1965.
  23. Astolfi, A. Singular H∞ control for nonlinear systems. Int. J. Robust Nonlinear Control 1997, 7, 727–740.
  24. Maas, W.C.A.; Van der Schaft, A.J. Singular nonlinear H∞ optimal control by state feedback. In Proceedings of the 33rd IEEE Conference on Decision and Control, Lake Buena Vista, FL, USA, 14–16 December 1994; Volume 2, pp. 1415–1420.
  25. Courant, R.; Hilbert, D. Methods of Mathematical Physics: Volume 1; Interscience Publishers, Inc.: New York, NY, USA, 1966.
  26. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control. Signals Syst. 1989, 2, 303–314.
  27. Barron, A.R. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory 1993, 39, 930–945.
  28. Sandberg, I.W. Notes on uniform approximation of time-varying systems on finite time intervals. IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 1998, 45, 863–865.
  29. Sandberg, I.W. Uniform approximation of periodically-varying systems. IEEE Trans. Circuits Syst. I Regul. Pap. 2004, 51, 1631–1638.
  30. Adams, R.A.; Fournier, J.J.F. Sobolev Spaces; Pure and Applied Mathematics; Elsevier Science: Amsterdam, The Netherlands, 2003.
  31. Davis, P.J. Interpolation and Approximation; Dover Publications Inc.: New York, NY, USA, 1975.
  32. Meinardus, G. Approximation of Functions: Theory and Numerical Methods; Translated by Schumaker, L.L.; Springer: Berlin, Germany, 1967.
  33. Pu, D.; Zhang, J. Inexact generalized Newton methods for second order C-differentiable optimization. J. Comput. Appl. Math. 1998, 93, 107–122.
  34. Qi, L. On superlinear convergence of quasi-Newton methods for nonsmooth equations. Oper. Res. Lett. 1997, 20, 223–228.
  35. Pang, J.S.; Qi, L. Nonsmooth equations: Motivation and algorithms. SIAM J. Optim. 1993, 3, 443–465.
  36. Qi, L.; Sun, J. A nonsmooth version of Newton’s method. Math. Program. 1993, 58, 353–367.
  37. Dai, Y.H.; Yuan, Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 1999, 10, 177–182.
  38. Nocedal, J.; Wright, S.J. Numerical Optimization; Springer Science + Business Media, LLC: New York, NY, USA, 2006.
  39. Hairer, E.; Nørsett, S.P.; Wanner, G. Solving Ordinary Differential Equations I: Nonstiff Problems, Second Revised Edition; Springer: Berlin, Germany, 2008.
  40. Pesterev, D.; Druzhina, O.; Pchelintsev, A.; Nepomuceno, E.; Butusov, D. Numerical integration schemes based on composition of adjoint multistep methods. Algorithms 2022, 15, 463.
  41. Squire, W.; Trapp, G. Using complex variables to estimate derivatives of real functions. SIAM Rev. 1998, 40, 110–112.
  42. Fornberg, B. Numerical differentiation of analytic functions. ACM Trans. Math. Softw. 1981, 7, 512–526.
  43. Dierks, T.; Jagannathan, S. Optimal control of affine nonlinear continuous-time systems using an online Hamilton-Jacobi-Isaacs formulation. In Proceedings of the 49th IEEE Conference on Decision and Control, Atlanta, GA, USA, 15–17 December 2010; pp. 3048–3053.
Figure 1. Time dependence of the state variable (Example 4.1).
Figure 2. Control and uncertainty variables in dependence on the state variable (Example 4.1).
Figure 3. Time dependence of the control and uncertainty variables (Example 4.1).
Figure 4. Time dependence of the state variables (Example 4.2).
Figure 5. Time dependence of the control and uncertainty variables (Example 4.2).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Milić, V.; Kasać, J.; Lukas, M. An Algorithm for Solving Zero-Sum Differential Game Related to the Nonlinear H∞ Control Problem. Algorithms 2023, 16, 48. https://doi.org/10.3390/a16010048

AMA Style

Milić V, Kasać J, Lukas M. An Algorithm for Solving Zero-Sum Differential Game Related to the Nonlinear H∞ Control Problem. Algorithms. 2023; 16(1):48. https://doi.org/10.3390/a16010048

Chicago/Turabian Style

Milić, Vladimir, Josip Kasać, and Marin Lukas. 2023. "An Algorithm for Solving Zero-Sum Differential Game Related to the Nonlinear H∞ Control Problem" Algorithms 16, no. 1: 48. https://doi.org/10.3390/a16010048
