Article

A Game-Theoretic Model for a Stochastic Linear Quadratic Tracking Problem

by
Vasile Drăgan
1,2,†,
Ivan Ganchev Ivanov
3,† and
Ioan-Lucian Popa
4,5,*,†
1
“Simion Stoilow” Institute of Mathematics, Romanian Academy, P.O. Box 1-764, 014700 Bucharest, Romania
2
Academy of the Romanian Scientists, Str. Ilfov, Nr. 3, 50044 Bucharest, Romania
3
Faculty of Economics and Business Administration, Sofia University “St. Kl. Ohridski”, 125 Tzarigradsko Chaussee Blvd., bl. 3, 1113 Sofia, Bulgaria
4
Department of Computing, Mathematics and Electronics, Faculty of Computing and Engineering, “1 Decembrie 1918” University of Alba Iulia, 510009 Alba Iulia, Romania
5
Faculty of Mathematics and Computer Science, Transilvania University of Braşov, Iuliu Maniu Street 50, 500091 Braşov, Romania
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Axioms 2023, 12(1), 76; https://doi.org/10.3390/axioms12010076
Submission received: 14 December 2022 / Revised: 6 January 2023 / Accepted: 8 January 2023 / Published: 11 January 2023
(This article belongs to the Special Issue Advances in Uncertain Optimization and Applications)

Abstract:
In this paper, we solve a stochastic linear quadratic tracking problem. The controlled dynamical system is modeled by a system of linear Itô differential equations subject to jump Markov perturbations. We consider the case when there are two decision-makers and each of them wants to minimize the deviation of a preferential output of the controlled dynamical system from a given reference signal. We assume that the two decision-makers do not cooperate. Under these conditions, we state the considered tracking problem as a problem of finding a Nash equilibrium strategy for a stochastic differential game. Explicit formulae of a Nash equilibrium strategy are provided. To this end, we use the solutions of two given terminal value problems (TVPs). The first TVP is associated with a hybrid system formed by two backward nonlinear differential equations coupled by two algebraic nonlinear equations. The second TVP is associated with a hybrid system formed by two backward linear differential equations coupled by two algebraic linear equations.

1. Introduction

Tracking problems are often encountered in many applications and have received attention from the research community in the past few decades [1,2,3,4,5]. In the stochastic context, this problem was studied in [6,7] as well as in [8,9]. For stochastic systems with time delay, the linear quadratic tracking problem has been studied in [10,11]. Applications of tracking problems may be found in economic policy control [12], process control [13], networked control systems [14], control of mobile robots [15], spacecraft hovering [16], etc. Usually, a linear quadratic tracking problem requires the minimization of the $L^2$-norm of the deviation of a signal generated by a controlled linear system from a reference signal. When there exists more than one decision-maker and each of them wants to minimize the deviation of a preferential signal from a given reference signal, the optimal tracking problem may be stated as a problem of finding a Nash equilibrium strategy for a linear quadratic differential game. If the controlled system whose outputs have to track the given reference signal is described by linear stochastic differential equations, one obtains the problem of finding a Nash equilibrium strategy for a stochastic differential game. Lately, stochastic differential games have attracted increasing research interest; see, e.g., [17,18,19,20]. Moreover, for Nash tracking game problems for continuous-time systems over finite intervals, see [21] and the references therein.
In the present work, we consider the case when the controlled system is described by a system of Itô differential equations with coefficients affected by a standard homogeneous Markov process with a finite number of states. We assume that there exist at least two decision-makers. The aim of the k-th decision-maker is to minimize the deviation of an output z k ( · ) of the controlled system from a reference signal r k ( · ) . The class of admissible strategies consists of stochastic processes in an affine state feedback form.
In the derivation of the main results we consider two cases:
(a) the case with only one decision-maker;
(b) the case of two decision-makers.
The result derived from case (a) is used in the case with more than one decision-maker in order to obtain an optimal strategy. In case (b), we study the game theoretic model for two players, where each player wants to find the optimal admissible strategy minimizing the deviation of the controlled signal from the given reference.
We assume that the players do not cooperate. The reasons for which they are not cooperating may be caused by individual motivations or by physical reasons. We provide explicit formulae of a Nash equilibrium strategy. To this end, we use the solutions of two TVPs. The first TVP is associated with a hybrid system formed by two backward nonlinear differential equations coupled with two algebraic nonlinear equations. The second TVP is associated with a hybrid system formed by two backward linear differential equations coupled by two algebraic linear equations.
The paper is organized as follows: Section 2 includes the description of the mathematical model as well as the formulation of the tracking problem as a problem of finding a Nash equilibrium strategy for a stochastic differential game. The main results are derived in Section 3. First, in Section 3.1, we consider the case with one decision-maker. Further, in Section 3.2, we obtain explicit formulae for the optimal strategies in the case of two decision-makers. In Section 4, we briefly discuss two special cases: (i) the case when the controlled system does not contain controlled dependent terms in the diffusion part; (ii) the case when the aim of the decision-makers is to minimize the deviation of the controlled signals from a given final target without restrictions regarding the behavior of the transient states. In Section 5, we provide a numerical example that shows that the proposed procedure is feasible. Finally, in Section 6, we provide some conclusions and future research directions.

2. The Problem

Consider the controlled system having the state space representation described by
d x ( t ) = [ A 0 ( t , η t ) x ( t ) + B 1 ( t , η t ) u 1 ( t ) + B 2 ( t , η t ) u 2 ( t ) ] d t + [ A 1 ( t , η t ) x ( t ) + D 1 ( t , η t ) u 1 ( t ) + D 2 ( t , η t ) u 2 ( t ) ] d w ( t ) ,
x ( t 0 ) = x 0 ,
z k ( t ) = C k ( t , η t ) x ( t ) , k = 1 , 2 ,
$t \in [t_0, t_f] \subset [0, \infty)$, where $x(t) \in \mathbb{R}^n$ is the state vector at time $t$ and $u_k(t) \in \mathbb{R}^{m_k}$, $k = 1, 2$, are the vectors of control parameters. In (1), $\{w(t)\}_{t \geq 0}$ is a one-dimensional standard Wiener process defined on a given probability space $(\Omega, \mathcal{F}, \mathcal{P})$ and $\{\eta_t\}_{t \geq 0}$ is a standard right-continuous Markov process taking values in a finite set $\mathcal{N} = \{1, 2, \ldots, N\}$ and having the transition semigroup $P(t) = e^{Qt}$, $t \geq 0$. The elements $q_{ij}$ of the generator matrix $Q \in \mathbb{R}^{N \times N}$ satisfy

$$q_{ij} \geq 0 \ \text{ if } \ i \neq j, \qquad \sum_{l=1}^{N} q_{il} = 0,$$

for all $(i,j) \in \mathcal{N} \times \mathcal{N}$. For more details regarding the properties of a Wiener process, one can see [22] or [23], whereas for more properties of a Markov process, we refer, for example, to [24,25].
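The generator conditions above are easy to check numerically. The following minimal sketch (with a hypothetical two-state generator matrix $Q$; the values are placeholders, not taken from the paper) verifies them and confirms that the rows of the transition semigroup $P(t) = e^{Qt}$ sum to one, as required of a transition probability matrix.

```python
import numpy as np

# Hypothetical generator matrix of a 2-state Markov process (illustration only).
Q = np.array([[-0.5, 0.5],
              [0.3, -0.3]])

# Generator conditions: q_ij >= 0 for i != j, and each row sums to zero.
off_diag_ok = all(Q[i, j] >= 0 for i in range(2) for j in range(2) if i != j)
rows_sum_zero = np.allclose(Q.sum(axis=1), 0.0)

def expm_series(A, terms=30):
    """Truncated Taylor series for the matrix exponential e^A."""
    P, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        P = P + term
    return P

P1 = expm_series(Q * 1.0)  # transition matrix P(1) = e^{Q*1}
# Rows of a transition probability matrix sum to 1 (since Q has zero row sums).
print(off_diag_ok, rows_sum_zero, np.allclose(P1.sum(axis=1), 1.0))
```

Because $Q\mathbf{1} = 0$, every power $Q^k\mathbf{1}$, $k \geq 1$, vanishes, so even the truncated series preserves the unit row sums exactly.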
Throughout the paper, we assume that $\{w(t)\}_{t \geq 0}$ and $\{\eta_t\}_{t \geq 0}$ are independent stochastic processes. The dependence of the coefficients of the system (1) upon the Markov process $\{\eta_t\}_{t \geq 0}$ highlights the fact that this system may be regarded as the mathematical model of phenomena in which abrupt structural changes are likely to occur. Such variations may be due, for instance, to switches between different operating points, sensor or actuator failures, temporary loss of communication, and so on. For the readers’ convenience, we refer to [9,26,27,28] for extensive discussions on the subject. As usual, we shall write $A_k(t,i)$, $k = 0, 1$, $B_j(t,i)$, $C_j(t,i)$, $D_j(t,i)$, $j = 1, 2$, instead of $A_k(t,\eta_t)$, and so on, whenever $\eta_t = i \in \mathcal{N}$. Assume that $t \mapsto A_k(t,i) : [t_0,t_f] \to \mathbb{R}^{n \times n}$, $k = 0, 1$, and $t \mapsto (B_j(t,i), C_j(t,i), D_j(t,i)) : [t_0,t_f] \to \mathbb{R}^{n \times m_j} \times \mathbb{R}^{n_{z_j} \times n} \times \mathbb{R}^{n \times m_j}$, $j = 1, 2$, are continuous matrix-valued functions. The set $\mathcal{U}_k$ of the admissible controls available to the decision-maker $P_k$, $k = 1, 2$, consists of the stochastic processes in an affine state feedback form, i.e.,
u k ( t ) = F k ( t , η t ) x ( t ) + φ k ( t , η t )
where t F k ( t , i ) : [ t 0 , t f ] R m k × n and t φ k ( t , i ) : [ t 0 , t f ] R m k are arbitrary continuous functions. Applying Theorem 1.1 in Chapter 5 from [22] (see also Section 1.12 from [9]) we obtain:
Corollary 1. 
For each x 0 R n and for each u ( · ) = ( u 1 ( · ) , u 2 ( · ) ) U 1 × U 2 , the initial value problem (IVP) (1a) has a unique solution x u ( · ; t 0 , x 0 ) which is a stochastic process with the properties
(a) 
$x_u(\cdot; t_0, x_0)$ is almost surely continuous at every $t \in [t_0, t_f]$;
(b) 
for each $t \in [t_0, t_f]$, $x_u(t; t_0, x_0)$ is $\mathcal{F}_t$-measurable, where $\mathcal{F}_t \subset \mathcal{F}$ is the $\sigma$-algebra generated by the random variables $w(s)$, $\eta_s$, $0 \leq s \leq t$;
(c) 
$\sup_{t \in [t_0,t_f]} E[|x_u(t;t_0,x_0)|^p] < \infty$, for all $p \geq 1$;
(d) 
x u ( t 0 ; t 0 , x 0 ) = x 0 .
Throughout this work, $E[\cdot]$ stands for the mathematical expectation. Roughly speaking, the aim of the decision-maker $P_k$ is to find a control law (or an admissible strategy) of type (3), $\tilde{u}_k(\cdot) \in \mathcal{U}_k$, which minimizes the deviation of the signal $z_k(\cdot)$ from a given reference signal $r_k(\cdot)$, while the other decision-maker $P_l$ wants to minimize the deviation of the signal $z_l(\cdot)$, $l \neq k$, from another reference signal $r_l(\cdot)$.
Since this problem is an optimization problem with two objective functions, its solution can be viewed as an equilibrium strategy for a non-cooperative differential game with two players. For a rigorous mathematical setting of this optimization problem, let us introduce the following cost functions modeling the performance criterion of each player ($k = 1, 2$):
$$J_k(x_0; u_1(\cdot), u_2(\cdot)) = E\big[(z_k(t_f) - \zeta_k)^T G_k(\eta_{t_f})(z_k(t_f) - \zeta_k)\big] + E\int_{t_0}^{t_f}\big\{(z_k(t) - r_k(t))^T M_k(t,\eta_t)(z_k(t) - r_k(t)) + u_1^T(t) R_{k1}(t,\eta_t) u_1(t) + u_2^T(t) R_{k2}(t,\eta_t) u_2(t)\big\}\, dt. \qquad (4)$$
Here, $r_k(\cdot) : [t_0,t_f] \to \mathbb{R}^{n_{z_k}}$, $k = 1, 2$, is the reference which must be tracked by the signal $z_k(\cdot)$, and $\zeta_k \in \mathbb{R}^{n_{z_k}}$, $k = 1, 2$, is the target for the final value $z_k(t_f)$. The weight matrices involved in (4) satisfy the following assumption:
Hypothesis 1. 
(a) 
$t \mapsto (M_k(t,i), R_{k1}(t,i), R_{k2}(t,i)) : [t_0,t_f] \to \mathcal{S}_{n_{z_k}} \times \mathcal{S}_{m_1} \times \mathcal{S}_{m_2}$, $i \in \mathcal{N}$, are continuous matrix-valued functions;
(b) 
$M_k(t,i) \geq 0$, $R_{kl}(t,i) \geq 0$, $R_{kk}(t,i) > 0$, for all $t \in [t_0,t_f]$, $k, l = 1, 2$, $l \neq k$, and $G_k(i) \geq 0$, $i \in \mathcal{N}$.
Here and in the sequel, $\mathcal{S}_p \subset \mathbb{R}^{p \times p}$ denotes the subspace of symmetric matrices of size $p \times p$, $p \geq 1$.
Definition 1. 
We say that the pair of admissible strategies $(\tilde{u}_1(\cdot), \tilde{u}_2(\cdot)) \in \mathcal{U}_1 \times \mathcal{U}_2$ achieves a Nash equilibrium for the differential game described by the controlled system (1), the performance criterion (4), and the admissible strategies of type (3) if
$$J_1(x_0; \tilde{u}_1(\cdot), \tilde{u}_2(\cdot)) \leq J_1(x_0; u_1(\cdot), \tilde{u}_2(\cdot)), \quad \text{for all } u_1(\cdot) \in \mathcal{U}_1 \qquad (5)$$
and
$$J_2(x_0; \tilde{u}_1(\cdot), \tilde{u}_2(\cdot)) \leq J_2(x_0; \tilde{u}_1(\cdot), u_2(\cdot)), \quad \text{for all } u_2(\cdot) \in \mathcal{U}_2. \qquad (6)$$
In the next section, we shall derive explicit formulae of a Nash equilibrium strategy ( u ˜ 1 ( · ) , u ˜ 2 ( · ) ) for the linear quadratic differential game described by (1), (3), and (4).
Remark 1.
(a) 
We shall see that for the computation of the gain matrices of a Nash equilibrium strategy we need to know a priori the whole reference signal r k ( · ) .
(b) 
When $M_k(t,i) = 0$ for all $(t,i) \in [t_0,t_f] \times \mathcal{N}$, (4) reduces to
$$J_k(x_0; u_1(\cdot), u_2(\cdot)) = E\big[(z_k(t_f) - \zeta_k)^T G_k(\eta_{t_f})(z_k(t_f) - \zeta_k)\big] + E\int_{t_0}^{t_f}\big\{u_1^T(t) R_{k1}(t,\eta_t) u_1(t) + u_2^T(t) R_{k2}(t,\eta_t) u_2(t)\big\}\, dt, \quad k = 1, 2. \qquad (7)$$
The performance criterion (4) could be replaced by one of the form (7) when the decision-maker is interested only in minimizing the deviation of the final value $z_k(t_f)$ from the target $\zeta_k$. The term
E t 0 t f { u 1 T ( t ) R k 1 ( t , η t ) u 1 ( t ) + u 2 T ( t ) R k 2 ( t , η t ) u 2 ( t ) } d t ,
which appears both in (4) and (7), must be viewed as a penalization of the control effort.

3. The Main Results

3.1. The Case with Only One Decision Maker

In order to derive, in an elegant way, the state space representation of a pair of the form (3) which satisfies (5) and (6), respectively, we first study the problem of tracking a reference signal in the case where there is only one decision-maker.
We consider the optimal control problem described by the controlled system:
d x ( t ) = [ A 0 ( t , η t ) x ( t ) + B ( t , η t ) u ( t ) + g 0 ( t , η t ) ] d t + [ A 1 ( t , η t ) x ( t ) + D ( t , η t ) u ( t ) + g 1 ( t , η t ) ] d w ( t ) ,
x ( t 0 ) = x 0 ,
z ( t ) = C ( t , η t ) x ( t ) ,
t [ t 0 , t f ] and the performance criterion
$$J(x_0; u(\cdot)) = E\big[(z(t_f) - \zeta)^T G(\eta_{t_f})(z(t_f) - \zeta)\big] + E\int_{t_0}^{t_f}\big\{(z(t) - r(t))^T M(t,\eta_t)(z(t) - r(t)) + u^T(t) R(t,\eta_t) u(t)\big\}\, dt. \qquad (9)$$
Here, the stochastic processes $\{w(t)\}_{t \geq 0}$ and $\{\eta_t\}_{t \geq 0}$ have the same properties as in the case of system (1). In (8) and (9), $x(t) \in \mathbb{R}^n$ is the state vector at time $t$ and $u(t) \in \mathbb{R}^m$ is the vector of the control parameters.
In this subsection, the set U of admissible controls consists of stochastic processes of the form
u ( t ) = F ( t , η t ) x ( t ) + φ ( t , η t ) ,
$t \mapsto F(t,i) : [t_0,t_f] \to \mathbb{R}^{m \times n}$ and $t \mapsto \varphi(t,i) : [t_0,t_f] \to \mathbb{R}^m$ being arbitrary continuous functions. The optimal control problem which we want to solve in this subsection consists in finding a control $\tilde{u}(\cdot) \in \mathcal{U}$ which minimizes the cost function (9) along the trajectories of the system (8) determined by all admissible controls of the form (10).
Regarding the coefficients of (8) and (9), we suppose:
Hypothesis 2.
(a) 
$t \mapsto (A_0(t,i), A_1(t,i), B(t,i), D(t,i), C(t,i), M(t,i), R(t,i)) : [t_0,t_f] \to \mathbb{R}^{n \times n} \times \mathbb{R}^{n \times n} \times \mathbb{R}^{n \times m} \times \mathbb{R}^{n \times m} \times \mathbb{R}^{n_z \times n} \times \mathcal{S}_{n_z} \times \mathcal{S}_m$, $t \mapsto g_k(t,i) : [t_0,t_f] \to \mathbb{R}^n$, $k = 0, 1$, and $t \mapsto r(t) : [t_0,t_f] \to \mathbb{R}^{n_z}$ are continuous matrix-valued functions;
(b) 
$M(t,i) \geq 0$, $R(t,i) > 0$, for all $t \in [t_0,t_f]$, and $G(i) \geq 0$, $i \in \mathcal{N}$.
Let us consider the function $V : [t_0,t_f] \times \mathbb{R}^n \times \mathcal{N} \to \mathbb{R}$ defined by

$$V(t,x,i) = x^T X(t,i) x - 2 x^T \Psi(t,i) + \mu(t,i) \qquad (11)$$

where $t \mapsto (X(t,i), \Psi(t,i), \mu(t,i)) : [t_0,t_f] \to \mathcal{S}_n \times \mathbb{R}^n \times \mathbb{R}$, $i \in \mathcal{N}$, are continuous and differentiable functions. Applying the Itô formula (see, for example, Theorem 1.10.2 from [9]) to the function (11) and to the stochastic process $x(t)$ satisfying (8a), we obtain
$$E\big[V(t_f,x(t_f),\eta_{t_f}) - V(t_0,x(t_0),\eta_{t_0}) \,\big|\, \eta_{t_0} = i\big] = E\left[\int_{t_0}^{t_f} \begin{pmatrix} x(t) \\ 1 \\ u(t) \end{pmatrix}^T \begin{pmatrix} W_{11}(t,\eta_t) & W_{12}(t,\eta_t) & W_{13}(t,\eta_t) \\ W_{12}^T(t,\eta_t) & W_{22}(t,\eta_t) & W_{23}(t,\eta_t) \\ W_{13}^T(t,\eta_t) & W_{23}^T(t,\eta_t) & W_{33}(t,\eta_t) \end{pmatrix} \begin{pmatrix} x(t) \\ 1 \\ u(t) \end{pmatrix} dt \,\bigg|\, \eta_{t_0} = i\right] \qquad (12)$$
for all $i \in \mathcal{N}$, where

$$\begin{aligned} W_{11}(t,i) &= \dot{X}(t,i) + A_0^T(t,i)X(t,i) + X(t,i)A_0(t,i) + A_1^T(t,i)X(t,i)A_1(t,i) + \sum_{j=1}^{N} q_{ij} X(t,j) & (13a)\\ W_{12}(t,i) &= -\dot{\Psi}(t,i) - A_0^T(t,i)\Psi(t,i) + X(t,i)g_0(t,i) + A_1^T(t,i)X(t,i)g_1(t,i) - \sum_{j=1}^{N} q_{ij}\Psi(t,j) & (13b)\\ W_{13}(t,i) &= X(t,i)B(t,i) + A_1^T(t,i)X(t,i)D(t,i) & (13c)\\ W_{22}(t,i) &= \dot{\mu}(t,i) + \sum_{j=1}^{N} q_{ij}\mu(t,j) + g_1^T(t,i)X(t,i)g_1(t,i) - 2 g_0^T(t,i)\Psi(t,i) & (13d)\\ W_{23}(t,i) &= -\Psi^T(t,i)B(t,i) + g_1^T(t,i)X(t,i)D(t,i) & (13e)\\ W_{33}(t,i) &= D^T(t,i)X(t,i)D(t,i). & (13f) \end{aligned}$$
Taking the expectation in (12) and adding it to (9), we obtain

$$J(x_0; u(\cdot)) + E[V(t_f,x(t_f),\eta_{t_f})] - E[V(t_0,x_0,\eta_{t_0})] = E\big[(z(t_f) - \zeta)^T G(\eta_{t_f})(z(t_f) - \zeta)\big] + E\left[\int_{t_0}^{t_f} \begin{pmatrix} x(t) \\ 1 \\ u(t) \end{pmatrix}^T \begin{pmatrix} \hat{W}_{11}(t,\eta_t) & \hat{W}_{12}(t,\eta_t) & \hat{W}_{13}(t,\eta_t) \\ \hat{W}_{12}^T(t,\eta_t) & \hat{W}_{22}(t,\eta_t) & \hat{W}_{23}(t,\eta_t) \\ \hat{W}_{13}^T(t,\eta_t) & \hat{W}_{23}^T(t,\eta_t) & \hat{W}_{33}(t,\eta_t) \end{pmatrix} \begin{pmatrix} x(t) \\ 1 \\ u(t) \end{pmatrix} dt\right] \qquad (14)$$
where

$$\begin{aligned} \hat{W}_{11}(t,i) &= W_{11}(t,i) + C^T(t,i)M(t,i)C(t,i) & (15a)\\ \hat{W}_{12}(t,i) &= W_{12}(t,i) - C^T(t,i)M(t,i)r(t) & (15b)\\ \hat{W}_{13}(t,i) &= W_{13}(t,i) & (15c)\\ \hat{W}_{22}(t,i) &= W_{22}(t,i) + r^T(t)M(t,i)r(t) & (15d)\\ \hat{W}_{23}(t,i) &= W_{23}(t,i) & (15e)\\ \hat{W}_{33}(t,i) &= R(t,i) + D^T(t,i)X(t,i)D(t,i). & (15f) \end{aligned}$$
Let
X ( · ) = ( X ( · , 1 ) , X ( · , 2 ) , , X ( · , N ) ) ,
Ψ ( · ) = ( Ψ ( · , 1 ) , Ψ ( · , 2 ) , , Ψ ( · , N ) ) ,
μ ( · ) = ( μ ( · , 1 ) , μ ( · , 2 ) , , μ ( · , N ) ) T
be the solutions of the following terminal value problems (TVPs):

$$\begin{aligned} & \dot{X}(t,i) + A_0^T(t,i)X(t,i) + X(t,i)A_0(t,i) + A_1^T(t,i)X(t,i)A_1(t,i) \\ & \quad - \big(X(t,i)B(t,i) + A_1^T(t,i)X(t,i)D(t,i)\big)\big(R(t,i) + D^T(t,i)X(t,i)D(t,i)\big)^{-1}\big(B^T(t,i)X(t,i) + D^T(t,i)X(t,i)A_1(t,i)\big) \\ & \quad + C^T(t,i)M(t,i)C(t,i) + \sum_{j=1}^{N} q_{ij}X(t,j) = 0, \quad t_0 \leq t \leq t_f, & (16a)\\ & X(t_f,i) = C^T(t_f,i)G(i)C(t_f,i), \quad i \in \mathcal{N}, & (16b) \end{aligned}$$
$$\begin{aligned} & \dot{\Psi}(t,i) + \big(A_0(t,i) + B(t,i)\tilde{F}(t,i)\big)^T \Psi(t,i) + \sum_{j=1}^{N} q_{ij}\Psi(t,j) + G(t,X(t,i),i) = 0, & (17a)\\ & \Psi(t_f,i) = C^T(t_f,i)G(i)\zeta, \end{aligned}$$

where

$$\begin{aligned} \tilde{F}(t,i) &= -\big(R(t,i) + D^T(t,i)X(t,i)D(t,i)\big)^{-1}\big(B^T(t,i)X(t,i) + D^T(t,i)X(t,i)A_1(t,i)\big), & (17b)\\ G(t,X(t,i),i) &= C^T(t,i)M(t,i)r(t) - X(t,i)g_0(t,i) - \big(A_1(t,i) + D(t,i)\tilde{F}(t,i)\big)^T X(t,i)g_1(t,i), \end{aligned}$$
$X(t,i)$, $i \in \mathcal{N}$, being the components of the solution of the TVP (16), and
$$\begin{aligned} & \dot{\mu}(t,i) + \sum_{j=1}^{N} q_{ij}\mu(t,j) + h(t,i) = 0, & (18)\\ & \mu(t_f,i) = \zeta^T G(i)\zeta, \end{aligned}$$

where

$$\begin{aligned} h(t,i) = {} & g_1^T(t,i)X(t,i)g_1(t,i) + r^T(t)M(t,i)r(t) - 2 g_0^T(t,i)\Psi(t,i) \\ & - \big(B^T(t,i)\Psi(t,i) - D^T(t,i)X(t,i)g_1(t,i)\big)^T \big(R(t,i) + D^T(t,i)X(t,i)D(t,i)\big)^{-1} \big(B^T(t,i)\Psi(t,i) - D^T(t,i)X(t,i)g_1(t,i)\big), \end{aligned}$$
$X(t,i)$, $\Psi(t,i)$, $i \in \mathcal{N}$, being the components of the solutions of the TVPs (16) and (17), respectively. The main properties of the solutions of the TVPs (16)–(18) are summarized in the following lemma.
Lemma 1. 
Under the assumption (H2) the following hold:
(i) 
the unique solution $X(\cdot) = (X(\cdot,1), \ldots, X(\cdot,N))$ of the TVP (16) is defined on the whole interval $[t_0,t_f]$. Moreover, $X(t,i) = X^T(t,i) \geq 0$, for all $(t,i) \in [t_0,t_f] \times \mathcal{N}$;
(ii) 
the TVPs (17) and (18) have unique solutions $t \mapsto \Psi(t) = (\Psi(t,1), \Psi(t,2), \ldots, \Psi(t,N)) : [t_0,t_f] \to \mathbb{R}^n \times \mathbb{R}^n \times \cdots \times \mathbb{R}^n$ and $t \mapsto \mu(t) = (\mu(t,1), \mu(t,2), \ldots, \mu(t,N))^T : [t_0,t_f] \to \mathbb{R}^N$.
Proof. 
(i)
This follows immediately by applying Corollary 5.2.3 from [9] to the TVP (16).
(ii)
The TVP (17) is associated with a linear nonhomogeneous differential equation with time-varying coefficients. Hence, its solution is defined on the whole interval of definition of its coefficients. According to (i), the coefficients of the differential Equation (17a) are defined on the whole interval $[t_0,t_f]$. Hence, its solution is also defined on the whole interval $[t_0,t_f]$. The conclusion regarding the definition of the solution of the TVP (18) on the interval $[t_0,t_f]$ is obtained in the same way. □
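To make the backward character of the TVP (16) concrete, here is a minimal numerical sketch for a scalar system with a single Markov mode ($N = 1$, so the coupling sum drops out), integrated backward in time with explicit Euler. All coefficient values are hypothetical and chosen only for illustration; this is not the procedure used in the paper's numerical section.

```python
# Backward Euler integration of a scalar, single-mode instance of the
# Riccati TVP (16); every coefficient value is a hypothetical placeholder.
A0, A1, B, D, C, M, R, G = -1.0, 0.2, 1.0, 0.1, 1.0, 1.0, 1.0, 1.0
t0, tf, steps = 0.0, 2.0, 2000
h = (tf - t0) / steps

X = C * G * C            # terminal condition X(t_f) = C^T G C
for _ in range(steps):   # step from t_f down to t_0
    S = X * B + A1 * X * D
    Xdot = -(2 * A0 * X + A1 * X * A1 - S * S / (R + D * X * D) + C * M * C)
    X -= h * Xdot        # X(t - h) ≈ X(t) - h * X'(t)
print(X)                 # approximation of X(t_0)
```

In this scalar setting the solution stays nonnegative and bounded, in line with Lemma 1(i).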
Further, we consider the case when (11) is defined using the solutions of the TVPs (16)–(18). In this case, (13), (15), (16)–(18) allow us to reduce (14) to
$$J(x_0; u(\cdot)) = E[V(t_0,x_0,\eta_{t_0})] + E\int_{t_0}^{t_f} \big(u(t) - \tilde{F}(t,\eta_t)x(t) - \tilde{\varphi}(t,\eta_t)\big)^T \hat{W}_{33}(t,\eta_t)\big(u(t) - \tilde{F}(t,\eta_t)x(t) - \tilde{\varphi}(t,\eta_t)\big)\, dt \qquad (19)$$
for all $u(\cdot)$ of type (10), where $\hat{W}_{33}(t,i)$ is computed as in (15f), $\tilde{F}(t,i)$ is computed as in (17b) based on the solution of the TVP (16), whereas
$$\tilde{\varphi}(t,i) := -\big(R(t,i) + D^T(t,i)X(t,i)D(t,i)\big)^{-1}\big(D^T(t,i)X(t,i)g_1(t,i) - B^T(t,i)\Psi(t,i)\big) \qquad (20)$$
for all ( t , i ) [ t 0 , t f ] × N .
We are now in a position to state and prove the main result of this subsection.
Theorem 1. 
Assume that the assumption H2 is fulfilled. We consider the control law
u ˜ ( t ) = F ˜ ( t , η t ) x ˜ ( t ) + φ ˜ ( t , η t )
where $\tilde{F}(t,i)$ and $\tilde{\varphi}(t,i)$ are computed via (17b) and (20), respectively, based on the solutions $X(\cdot)$ and $\Psi(\cdot)$ of the TVPs (16) and (17), and $\tilde{x}(t)$ is the solution of the closed-loop system obtained by coupling the control (21) to (8a). Under these conditions, the control (21) satisfies the following minimality condition:
$$J(x_0; \tilde{u}(\cdot)) = \min_{u(\cdot) \in \mathcal{U}} J(x_0; u(\cdot)).$$
The minimal value of the cost function (9) in the class of the controls U of type (10) is given by
$$J(x_0; \tilde{u}(\cdot)) = x_0^T E[X(t_0,\eta_{t_0})] x_0 - 2 x_0^T E[\Psi(t_0,\eta_{t_0})] + E[\mu(t_0,\eta_{t_0})].$$
Proof. 
From (15f), via Lemma 1(i), we deduce that under the assumption H2(b) we have $\hat{W}_{33}(t,\eta_t) > 0$, for all $t \in [t_0,t_f]$. The conclusion then follows immediately from (19). □
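Once the TVPs are solved, the minimal cost in Theorem 1 is a plain weighted average over the initial distribution of the Markov process. The following sketch evaluates it for a scalar state and two modes; all numbers are hypothetical placeholders standing in for TVP solution values at $t_0$.

```python
import numpy as np

# Minimal cost J = x0^T E[X(t0)] x0 - 2 x0^T E[Psi(t0)] + E[mu(t0)],
# with E[.] over the initial distribution pi of the Markov process.
pi = np.array([0.6, 0.4])    # hypothetical P(eta_{t0} = i)
X0 = np.array([2.0, 1.5])    # placeholder values X(t0, i)
Psi0 = np.array([0.5, 0.3])  # placeholder values Psi(t0, i)
mu0 = np.array([0.1, 0.2])   # placeholder values mu(t0, i)
x0 = 1.0

J = x0 * (pi @ X0) * x0 - 2 * x0 * (pi @ Psi0) + pi @ mu0
print(J)
```

For a vector state the same formula holds with `pi @ X0` replaced by the mode-wise weighted sum of the matrices $X(t_0,i)$.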

3.2. The Case of Two Decision-Makers

In this subsection, we shall use the result derived in Theorem 1 to obtain the state space representation of an equilibrium strategy of type (3) which satisfies (5). Let $k, l \in \{1, 2\}$ be fixed such that $l \neq k$. Let
u ˜ j ( t ) = F ˜ j ( t , η t ) x ( t ) + φ ˜ j ( t , η t ) , j = 1 , 2
be a candidate for a Nash equilibrium strategy. Taking j = l we rewrite (1) and (4) as
$$\begin{aligned} dx(t) = {} & \big[\big(A_0(t,\eta_t) + B_l(t,\eta_t)\tilde{F}_l(t,\eta_t)\big)x(t) + B_k(t,\eta_t)u_k(t) + B_l(t,\eta_t)\tilde{\varphi}_l(t,\eta_t)\big]\, dt \\ & + \big[\big(A_1(t,\eta_t) + D_l(t,\eta_t)\tilde{F}_l(t,\eta_t)\big)x(t) + D_k(t,\eta_t)u_k(t) + D_l(t,\eta_t)\tilde{\varphi}_l(t,\eta_t)\big]\, dw(t), \quad x(t_0) = x_0, & (22) \end{aligned}$$

$$\begin{aligned} J_{kl}(x_0; u_k(\cdot)) = {} & E\big[(z_k(t_f) - \zeta_k)^T G_k(\eta_{t_f})(z_k(t_f) - \zeta_k)\big] + E\int_{t_0}^{t_f}\big\{(z_k(t) - r_k(t))^T M_k(t,\eta_t)(z_k(t) - r_k(t)) \\ & + \big(\tilde{F}_l(t,\eta_t)x(t) + \tilde{\varphi}_l(t,\eta_t)\big)^T R_{kl}(t,\eta_t)\big(\tilde{F}_l(t,\eta_t)x(t) + \tilde{\varphi}_l(t,\eta_t)\big) + u_k^T(t) R_{kk}(t,\eta_t) u_k(t)\big\}\, dt. & (23) \end{aligned}$$
From Definition 1, it follows that $(\tilde{u}_k(\cdot), \tilde{u}_l(\cdot))$ is a Nash equilibrium strategy for the dynamic game described by (1), (3), and (4) if $\tilde{u}_k(\cdot)$ minimizes the cost (23) along the trajectories of the system (22) determined by the controls of the type
u k ( t ) = F k ( t , η t ) x ( t ) + φ k ( t , η t )
with F k ( · , i ) ,   φ k ( · , i ) arbitrary continuous functions.
In order to obtain the explicit formula of $\tilde{u}_k(\cdot)$ with these properties, we apply Theorem 1 specialized to the case of the optimal tracking problem described by the system (22) and the performance criterion (23). To this end, we shall rewrite the TVPs (16)–(18) with the updates

$$\begin{aligned} & A_0(t,i) \to A_0(t,i) + B_l(t,i)\tilde{F}_l(t,i), \quad A_1(t,i) \to A_1(t,i) + D_l(t,i)\tilde{F}_l(t,i), \\ & B(t,i) \to B_k(t,i), \quad D(t,i) \to D_k(t,i), \quad g_0(t,i) \to B_l(t,i)\tilde{\varphi}_l(t,i), \quad g_1(t,i) \to D_l(t,i)\tilde{\varphi}_l(t,i), \\ & C(t,i) \to \begin{pmatrix} C_k(t,i) \\ \tilde{F}_l(t,i) \end{pmatrix}, \quad M(t,i) \to \begin{pmatrix} M_k(t,i) & 0 \\ 0 & R_{kl}(t,i) \end{pmatrix}, \quad R(t,i) \to R_{kk}(t,i), \\ & G(i) \to \begin{pmatrix} G_k(i) & 0 \\ 0 & 0 \end{pmatrix}, \quad r(t) \to \begin{pmatrix} r_k(t) \\ -\tilde{\varphi}_l(t,i) \end{pmatrix}. \end{aligned}$$
Thus, TVP (16) becomes
$$\begin{aligned} & \dot{X}_k(t,i) + \big(A_0(t,i) + B_l(t,i)\tilde{F}_l(t,i)\big)^T X_k(t,i) + X_k(t,i)\big(A_0(t,i) + B_l(t,i)\tilde{F}_l(t,i)\big) \\ & \quad + \big(A_1(t,i) + D_l(t,i)\tilde{F}_l(t,i)\big)^T X_k(t,i)\big(A_1(t,i) + D_l(t,i)\tilde{F}_l(t,i)\big) + \sum_{j=1}^{N} q_{ij}X_k(t,j) \\ & \quad - \big[X_k(t,i)B_k(t,i) + \big(A_1(t,i) + D_l(t,i)\tilde{F}_l(t,i)\big)^T X_k(t,i)D_k(t,i)\big]\big(R_{kk}(t,i) + D_k^T(t,i)X_k(t,i)D_k(t,i)\big)^{-1} \\ & \quad \cdot \big[B_k^T(t,i)X_k(t,i) + D_k^T(t,i)X_k(t,i)\big(A_1(t,i) + D_l(t,i)\tilde{F}_l(t,i)\big)\big] \\ & \quad + C_k^T(t,i)M_k(t,i)C_k(t,i) + \tilde{F}_l^T(t,i)R_{kl}(t,i)\tilde{F}_l(t,i) = 0, & (25a) \\ & X_k(t_f,i) = C_k^T(t_f,i)G_k(i)C_k(t_f,i), & (25b) \end{aligned}$$

$i \in \mathcal{N}$, $k = 1, 2$, $l = 3 - k$. The analogue of the feedback gain $\tilde{F}(t,i)$ associated with the solution of the TVP (16) via (17b) becomes, in the case of the TVP (25),
$$\tilde{F}_k(t,i) = -\big(R_{kk}(t,i) + D_k^T(t,i)X_k(t,i)D_k(t,i)\big)^{-1}\big[B_k^T(t,i)X_k(t,i) + D_k^T(t,i)X_k(t,i)\big(A_1(t,i) + D_{3-k}(t,i)\tilde{F}_{3-k}(t,i)\big)\big]. \qquad (26)$$
In the case of the tracking problem described by (22) and (24), the TVPs (17) and (18) take the form
$$\begin{aligned} & \dot{\Psi}_k(t,i) + \big(A_0(t,i) + B_1(t,i)\tilde{F}_1(t,i) + B_2(t,i)\tilde{F}_2(t,i)\big)^T \Psi_k(t,i) + \sum_{j=1}^{N} q_{ij}\Psi_k(t,j) + G_k(t,X_k(t,i),i) = 0, \\ & \Psi_k(t_f,i) = C_k^T(t_f,i)G_k(i)\zeta_k, & (27) \end{aligned}$$

where

$$G_k(t,X_k(t,i),i) = C_k^T(t,i)M_k(t,i)r_k(t) - \big[X_k(t,i)B_{3-k}(t,i) + \tilde{F}_{3-k}^T(t,i)R_{k,3-k}(t,i) + \big(A_1(t,i) + D_1(t,i)\tilde{F}_1(t,i) + D_2(t,i)\tilde{F}_2(t,i)\big)^T X_k(t,i)D_{3-k}(t,i)\big]\tilde{\varphi}_{3-k}(t,i),$$

and

$$\begin{aligned} & \dot{\mu}_k(t,i) + \sum_{j=1}^{N} q_{ij}\mu_k(t,j) + h_k(t,i) = 0, \\ & \mu_k(t_f,i) = \zeta_k^T G_k(i)\zeta_k, & (28) \end{aligned}$$

where

$$\begin{aligned} h_k(t,i) = {} & \tilde{\varphi}_{3-k}^T(t,i)D_{3-k}^T(t,i)X_k(t,i)D_{3-k}(t,i)\tilde{\varphi}_{3-k}(t,i) + r_k^T(t)M_k(t,i)r_k(t) + \tilde{\varphi}_{3-k}^T(t,i)R_{k,3-k}(t,i)\tilde{\varphi}_{3-k}(t,i) \\ & - 2\tilde{\Psi}_k^T(t,i)B_{3-k}(t,i)\tilde{\varphi}_{3-k}(t,i) - \big(\tilde{\varphi}_{3-k}^T(t,i)D_{3-k}^T(t,i)X_k(t,i)D_k(t,i) - \tilde{\Psi}_k^T(t,i)B_k(t,i)\big) \\ & \cdot \big(R_{kk}(t,i) + D_k^T(t,i)X_k(t,i)D_k(t,i)\big)^{-1}\big(D_k^T(t,i)X_k(t,i)D_{3-k}(t,i)\tilde{\varphi}_{3-k}(t,i) - B_k^T(t,i)\tilde{\Psi}_k(t,i)\big), \end{aligned}$$
for all $i \in \mathcal{N}$, $k = 1, 2$. In this context, (20) becomes
$$\tilde{\varphi}_k(t,i) = -\big(R_{kk}(t,i) + D_k^T(t,i)X_k(t,i)D_k(t,i)\big)^{-1}\big(D_k^T(t,i)X_k(t,i)D_{3-k}(t,i)\tilde{\varphi}_{3-k}(t,i) - B_k^T(t,i)\tilde{\Psi}_k(t,i)\big) \qquad (29)$$
for all $i \in \mathcal{N}$, $k = 1, 2$.
Remark 2. 
Although the TVP (25) is defined by a Riccati differential equation of the type (16), we cannot be sure that the solution of this problem is defined on the whole interval $[t_0,t_f]$, because the domain of definition of its coefficients depends upon the domain of definition of the gain matrices $\tilde{F}_l(\cdot,i)$.
In the following, we shall regard (25) and (26) as a TVP associated with a hybrid system of nonlinear differential equations and nonlinear algebraic equations
$$\begin{aligned} & \dot{X}_1(t,i) + \big(A_0(t,i) + B_2(t,i)F_2(t,i)\big)^T X_1(t,i) + X_1(t,i)\big(A_0(t,i) + B_2(t,i)F_2(t,i)\big) \\ & \quad + \big(A_1(t,i) + D_2(t,i)F_2(t,i)\big)^T X_1(t,i)\big(A_1(t,i) + D_2(t,i)F_2(t,i)\big) + \sum_{j=1}^{N} q_{ij}X_1(t,j) \\ & \quad - \big[X_1(t,i)B_1(t,i) + \big(A_1(t,i) + D_2(t,i)F_2(t,i)\big)^T X_1(t,i)D_1(t,i)\big]\big(R_{11}(t,i) + D_1^T(t,i)X_1(t,i)D_1(t,i)\big)^{-1} \\ & \quad \cdot \big[B_1^T(t,i)X_1(t,i) + D_1^T(t,i)X_1(t,i)\big(A_1(t,i) + D_2(t,i)F_2(t,i)\big)\big] \\ & \quad + F_2^T(t,i)R_{12}(t,i)F_2(t,i) + C_1^T(t,i)M_1(t,i)C_1(t,i) = 0, & (30a) \\ & \dot{X}_2(t,i) + \big(A_0(t,i) + B_1(t,i)F_1(t,i)\big)^T X_2(t,i) + X_2(t,i)\big(A_0(t,i) + B_1(t,i)F_1(t,i)\big) \\ & \quad + \big(A_1(t,i) + D_1(t,i)F_1(t,i)\big)^T X_2(t,i)\big(A_1(t,i) + D_1(t,i)F_1(t,i)\big) + \sum_{j=1}^{N} q_{ij}X_2(t,j) \\ & \quad - \big[X_2(t,i)B_2(t,i) + \big(A_1(t,i) + D_1(t,i)F_1(t,i)\big)^T X_2(t,i)D_2(t,i)\big]\big(R_{22}(t,i) + D_2^T(t,i)X_2(t,i)D_2(t,i)\big)^{-1} \\ & \quad \cdot \big[B_2^T(t,i)X_2(t,i) + D_2^T(t,i)X_2(t,i)\big(A_1(t,i) + D_1(t,i)F_1(t,i)\big)\big] \\ & \quad + F_1^T(t,i)R_{21}(t,i)F_1(t,i) + C_2^T(t,i)M_2(t,i)C_2(t,i) = 0, & (30b) \\ & \big(R_{11}(t,i) + D_1^T(t,i)X_1(t,i)D_1(t,i)\big)F_1(t,i) + D_1^T(t,i)X_1(t,i)D_2(t,i)F_2(t,i) + B_1^T(t,i)X_1(t,i) + D_1^T(t,i)X_1(t,i)A_1(t,i) = 0, & (30c) \\ & D_2^T(t,i)X_2(t,i)D_1(t,i)F_1(t,i) + \big(R_{22}(t,i) + D_2^T(t,i)X_2(t,i)D_2(t,i)\big)F_2(t,i) + B_2^T(t,i)X_2(t,i) + D_2^T(t,i)X_2(t,i)A_1(t,i) = 0, & (30d) \\ & X_k(t_f,i) = C_k^T(t_f,i)G_k(i)C_k(t_f,i), & (30e) \end{aligned}$$
$i \in \mathcal{N}$, $k = 1, 2$. At the same time, (27) and (29) can be viewed as a TVP associated with a hybrid system formed by two backward linear differential equations and two algebraic linear equations, as
$$\begin{aligned} & \dot{\Psi}_1(t,i) + \big(A_0(t,i) + B_1(t,i)\tilde{F}_1(t,i) + B_2(t,i)\tilde{F}_2(t,i)\big)^T \Psi_1(t,i) + \sum_{j=1}^{N} q_{ij}\Psi_1(t,j) - G_{12}(t,i)\varphi_2(t,i) + C_1^T(t,i)M_1(t,i)r_1(t) = 0, & (31a) \\ & \dot{\Psi}_2(t,i) + \big(A_0(t,i) + B_1(t,i)\tilde{F}_1(t,i) + B_2(t,i)\tilde{F}_2(t,i)\big)^T \Psi_2(t,i) + \sum_{j=1}^{N} q_{ij}\Psi_2(t,j) - G_{21}(t,i)\varphi_1(t,i) + C_2^T(t,i)M_2(t,i)r_2(t) = 0, & (31b) \\ & \big(R_{11}(t,i) + D_1^T(t,i)\tilde{X}_1(t,i)D_1(t,i)\big)\varphi_1(t,i) + D_1^T(t,i)\tilde{X}_1(t,i)D_2(t,i)\varphi_2(t,i) - B_1^T(t,i)\Psi_1(t,i) = 0, & (31c) \\ & D_2^T(t,i)\tilde{X}_2(t,i)D_1(t,i)\varphi_1(t,i) + \big(R_{22}(t,i) + D_2^T(t,i)\tilde{X}_2(t,i)D_2(t,i)\big)\varphi_2(t,i) - B_2^T(t,i)\Psi_2(t,i) = 0, & (31d) \end{aligned}$$
Ψ k ( t f , i ) = C k T ( t f , i ) G k ( i ) ζ k ,
$i \in \mathcal{N}$, $k = 1, 2$, where we denoted
$$\begin{aligned} G_{12}(t,i) &= \tilde{X}_1(t,i)B_2(t,i) + \tilde{F}_2^T(t,i)R_{12}(t,i) + \big(A_1(t,i) + D_1(t,i)\tilde{F}_1(t,i) + D_2(t,i)\tilde{F}_2(t,i)\big)^T \tilde{X}_1(t,i)D_2(t,i), \\ G_{21}(t,i) &= \tilde{X}_2(t,i)B_1(t,i) + \tilde{F}_1^T(t,i)R_{21}(t,i) + \big(A_1(t,i) + D_1(t,i)\tilde{F}_1(t,i) + D_2(t,i)\tilde{F}_2(t,i)\big)^T \tilde{X}_2(t,i)D_1(t,i). \end{aligned}$$
In (31) and (32), $(\tilde{X}_1(t,i), \tilde{X}_2(t,i), \tilde{F}_1(t,i), \tilde{F}_2(t,i))$, $i \in \mathcal{N}$, is a solution of the TVP (30). Applying Theorem 1 in the case of the optimal tracking problems described by the system (22) and the performance criterion (23) for $k = 1$ and $k = 2$, we obtain
Theorem 2. 
Assume:
(a) 
the assumption (H1) is fulfilled;
(b) 
the solutions $(\tilde{X}_1(\cdot,i), \tilde{X}_2(\cdot,i), \tilde{F}_1(\cdot,i), \tilde{F}_2(\cdot,i))$, $i \in \mathcal{N}$, and $(\tilde{\Psi}_1(\cdot,i), \tilde{\Psi}_2(\cdot,i), \tilde{\varphi}_1(\cdot,i), \tilde{\varphi}_2(\cdot,i))$, $i \in \mathcal{N}$, of the TVPs (30) and (31), respectively, are defined on the whole interval $[t_0,t_f]$.
We set
u ˜ j ( t ) = F ˜ j ( t , η t ) x ˜ ( t ) + φ ˜ j ( t , η t ) , j = 1 , 2
$\tilde{x}(\cdot)$ being the solution of the IVP obtained by replacing (33) in (1). Under these conditions, $(\tilde{u}_1(\cdot), \tilde{u}_2(\cdot))$ is an equilibrium strategy for the differential game described by the controlled system (1), the performance criteria (4), and the family of the admissible strategies of type (3). The optimal values of the performance criteria are given by
J k ( x 0 ; u ˜ 1 ( · ) , u ˜ 2 ( · ) ) = x 0 T E [ X ˜ k ( t 0 , η t 0 ) ] x 0 2 x 0 T E [ Ψ ˜ k ( t o , η t 0 ) ] + E [ μ ˜ k ( t 0 , η t 0 ) ] ,
k = 1 , 2 .

4. Several Special Cases

4.1. The Case without Control-Dependent Noise of the Diffusion Part of the Controlled System

We assume that the controlled system (1) is in the special form
$$dx(t) = \big(A_0(t,\eta_t)x(t) + B_1(t,\eta_t)u_1(t) + B_2(t,\eta_t)u_2(t)\big)dt + A_1(t,\eta_t)x(t)\,dw(t), \quad x(t_0) = x_0.$$
In this case the TVPs (30) and (31), respectively, reduce to
$$\dot{X}_1(t,i) + \big(A_0(t,i) - S_2(t,i)X_2(t,i)\big)^T X_1(t,i) + X_1(t,i)\big(A_0(t,i) - S_2(t,i)X_2(t,i)\big) + A_1^T(t,i)X_1(t,i)A_1(t,i) - X_1(t,i)S_1(t,i)X_1(t,i) + X_2(t,i)S_{12}(t,i)X_2(t,i) + \sum_{j=1}^{N} q_{ij}X_1(t,j) + C_1^T(t,i)M_1(t,i)C_1(t,i) = 0$$
$$\dot{X}_2(t,i) + \big(A_0(t,i) - S_1(t,i)X_1(t,i)\big)^T X_2(t,i) + X_2(t,i)\big(A_0(t,i) - S_1(t,i)X_1(t,i)\big) + A_1^T(t,i)X_2(t,i)A_1(t,i) - X_2(t,i)S_2(t,i)X_2(t,i) + X_1(t,i)S_{21}(t,i)X_1(t,i) + \sum_{j=1}^{N} q_{ij}X_2(t,j) + C_2^T(t,i)M_2(t,i)C_2(t,i) = 0$$
$$X_k(t_f,i) = C_k^T(t_f,i)G_k(i)C_k(t_f,i)$$
$$F_k(t,i) = -R_{kk}^{-1}(t,i)B_k^T(t,i)X_k(t,i),$$
$i \in \mathcal{N}$, $k = 1,2$,
$$\dot{\Psi}_1(t,i) + \big(A_0(t,i) - S_1(t,i)X_1(t,i) - S_2(t,i)X_2(t,i)\big)^T\Psi_1(t,i) + \sum_{j=1}^{N} q_{ij}\Psi_1(t,j) + \big(X_1(t,i)S_2(t,i) - X_2(t,i)S_{12}(t,i)\big)\Psi_2(t,i) + C_1^T(t,i)M_1(t,i)r_1(t) = 0$$
$$\dot{\Psi}_2(t,i) + \big(X_2(t,i)S_1(t,i) - X_1(t,i)S_{21}(t,i)\big)\Psi_1(t,i) + \big(A_0(t,i) - S_1(t,i)X_1(t,i) - S_2(t,i)X_2(t,i)\big)^T\Psi_2(t,i) + \sum_{j=1}^{N} q_{ij}\Psi_2(t,j) + C_2^T(t,i)M_2(t,i)r_2(t) = 0$$
$$\Psi_k(t_f,i) = C_k^T(t_f,i)G_k(i)\zeta_k$$
$$\varphi_k(t,i) = -R_{kk}^{-1}(t,i)B_k^T(t,i)\Psi_k(t,i),$$
$i \in \mathcal{N}$, $k = 1,2$. In (35) and (36) we have denoted
$$S_j(t,i) = B_j(t,i)R_{jj}^{-1}(t,i)B_j^T(t,i)$$
$$S_{jk}(t,i) = B_k(t,i)R_{kk}^{-1}(t,i)R_{jk}(t,i)R_{kk}^{-1}(t,i)B_k^T(t,i)$$
$j = 1,2$, $k = 3 - j$. The TVP (28) becomes
$$\dot{\mu}_k(t,i) + \sum_{j=1}^{N} q_{ij}\mu_k(t,j) + \tilde{h}_k(t,i) = 0$$
$$\mu_k(t_f,i) = \zeta_k^T G_k(i)\zeta_k$$
$$\tilde{h}_1(t,i) = r_1^T(t)M_1(t,i)r_1(t) + \tilde{\Psi}_2^T(t,i)S_{12}(t,i)\tilde{\Psi}_2(t,i) - 2\tilde{\Psi}_1^T(t,i)S_2(t,i)\tilde{\Psi}_2(t,i) - \tilde{\Psi}_1^T(t,i)S_1(t,i)\tilde{\Psi}_1(t,i)$$
$$\tilde{h}_2(t,i) = r_2^T(t)M_2(t,i)r_2(t) + \tilde{\Psi}_1^T(t,i)S_{21}(t,i)\tilde{\Psi}_1(t,i) - 2\tilde{\Psi}_2^T(t,i)S_1(t,i)\tilde{\Psi}_1(t,i) - \tilde{\Psi}_2^T(t,i)S_2(t,i)\tilde{\Psi}_2(t,i).$$
In (37), $(\tilde{\Psi}_1(\cdot,i), \tilde{\Psi}_2(\cdot,i))$, $i \in \mathcal{N}$, is the solution of the TVP (36). Applying the result derived in Theorem 2, we obtain
Corollary 2. 
Assume:
(a) 
the assumption (H1)  is fulfilled;
(b) 
the solution $(X_1(\cdot,i), X_2(\cdot,i))$, $i \in \mathcal{N}$, of the TVP (35a)–(35c) is defined on the whole interval $[t_0, t_f]$.
We set
$$u_j(t) = -R_{jj}^{-1}(t,\eta_t)B_j^T(t,\eta_t)\big(X_j(t,\eta_t)x(t) + \Psi_j(t,\eta_t)\big),$$
$x(\cdot)$ being the solution of the IVP obtained by substituting (38) into (34). Under these conditions, $(u_1(\cdot), u_2(\cdot))$ is a Nash equilibrium strategy for the differential game described by the controlled system (34), the performance criteria (4), and the admissible strategies of type (3). The optimal values of the performance criteria (4) are given by
$$J_k(x_0; u_1(\cdot), u_2(\cdot)) = x_0^T E[X_k(t_0,\eta_{t_0})]x_0 - 2x_0^T E[\Psi_k(t_0,\eta_{t_0})] + E[\mu_k(t_0,\eta_{t_0})],$$
k = 1 , 2 ,   Ψ k ( · , i ) ,   μ k ( · , i ) being the solutions of the TVPs (36) and (37), respectively.
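As with the Riccati system, the linear backward system (36) can be integrated by an explicit Euler sweep from the terminal condition. The following sketch uses randomly generated placeholder coefficients, assumed constant in time and with a single Markov mode (so the jump-coupling sums drop out); it only illustrates the recursion, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, tf = 4, 1e-3, 1.0

# Placeholder coefficients (NOT the paper's matrices).
Acl = -np.eye(n) + 0.1 * rng.standard_normal((n, n))  # plays A0 - S1 X1 - S2 X2
K12 = 0.1 * rng.standard_normal((n, n))               # plays X1 S2 - X2 S12
K21 = 0.1 * rng.standard_normal((n, n))               # plays X2 S1 - X1 S21
c1, c2 = rng.standard_normal(n), rng.standard_normal(n)  # plays C_k^T M_k r_k

psi1 = rng.standard_normal(n)   # terminal value, plays C_1^T G_1 zeta_1
psi2 = rng.standard_normal(n)   # terminal value, plays C_2^T G_2 zeta_2

# Since Psi_k' = -(Acl^T Psi_k + coupling + source), stepping backward from
# (j+1)h down to jh adds h times the bracket.
for _ in range(int(tf / h)):
    d1 = Acl.T @ psi1 + K12 @ psi2 + c1
    d2 = K21 @ psi1 + Acl.T @ psi2 + c2
    psi1, psi2 = psi1 + h * d1, psi2 + h * d2
```

With mode-dependent coefficients one would carry one vector pair per mode and add the $\sum_j q_{ij}\Psi_k(t,j)$ terms inside the loop.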

4.2. The Case when the Performance Criterion (4) Is Replaced by a Performance Criterion of Type (7)

In this case, the aim of the decision-makers is to minimize the mean square deviation of the final value of the output $z_k(t_f)$ from the target $\zeta_k$, $k = 1,2$. In this situation, the equilibrium strategy is obtained by solving the TVPs (30) and (31), respectively, when the controlled system is of type (1), or the TVPs (35) and (36), respectively, when the controlled system is of type (34). In both cases, $M_k(t,i) = 0$ for all $(t,i) \in [t_0,t_f] \times \mathcal{N}$.

5. A Numerical Experiment

For the numerical experiment, we considered the time-invariant case of the system (34) with the performance criteria (4) and without Markovian jumping. We rewrite Equations (35a) and (35b) in the form:
$$\dot{X}_1(t) = -\mathcal{R}_1(t, X_1(t), X_2(t)) = -\big(C_1^T M_1 C_1 + (A_0 - S_2X_2(t))^T X_1(t) + X_1(t)(A_0 - S_2X_2(t)) + A_1^T X_1(t)A_1 - X_1(t)S_1X_1(t) + X_2(t)S_{12}X_2(t)\big)$$
$$\dot{X}_2(t) = -\mathcal{R}_2(t, X_1(t), X_2(t)) = -\big(C_2^T M_2 C_2 + (A_0 - S_1X_1(t))^T X_2(t) + X_2(t)(A_0 - S_1X_1(t)) + A_1^T X_2(t)A_1 - X_2(t)S_2X_2(t) + X_1(t)S_{21}X_1(t)\big)$$
$$X_k(t_f) = C_k^T G_k C_k, \quad k = 1,2$$
$$F_k(t) = -R_{kk}^{-1}B_k^T X_k(t), \quad k = 1,2.$$
In this case, we have $M_k(t,i) = M_k \geq 0$, $R_{kk}(t,i) = R_{kk} > 0$, $G_k(i) = G_k \geq 0$, $R_{k\ell}(t,i) = R_{k\ell} \geq 0$, $r_k(t) = 0$, $0 \leq t \leq t_f$, $k, \ell = 1, 2$, $k \neq \ell$.
The matrix coefficients of the controlled system (1) are:
$$A_0 = \begin{bmatrix} 0.4 & 0 & 0.3 & 0.7 \\ 0.3 & 0.2 & 0 & 0.1 \\ 0.8 & 0.25 & 0.25 & 0 \\ 0.1 & 0.2 & 0 & 0.5 \end{bmatrix}, \quad A_1 = \begin{bmatrix} 0 & 0.2 & 0 & 0.1 \\ 0 & 0 & 0.3 & 0.15 \\ 0.145 & 0.06 & 0.2 & 0 \\ 0 & 0.3 & 0 & 0.5 \end{bmatrix}, \quad A_0, A_1 \in \mathbb{R}^{4\times 4},$$
$$B_1 = \begin{bmatrix} 0.5 & 0 \\ 1 & 2.5 \\ 0.5 & 2 \\ 1 & 3 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 0 & 0.5 \\ 0.75 & 1.5 \\ 1 & 2 \\ 1 & 0 \end{bmatrix}, \quad B_1, B_2 \in \mathbb{R}^{4\times 2},$$
$$C_1 = \begin{bmatrix} 1.0 & 0.25 & 0.75 & 0.5 \\ 0.25 & 0.75 & 0.5 & 0.25 \end{bmatrix}, \quad C_2 = \begin{bmatrix} 1.2 & 0.5 & 0.75 & 0.045 \\ 0.5 & 0.15 & 0.35 & 0.55 \end{bmatrix}, \quad C_1, C_2 \in \mathbb{R}^{2\times 4},$$
$$D_1 = 0, \quad D_2 = 0.$$
The weight matrices for the performance criteria are of the form:
$$G_1 = \begin{bmatrix} 0.2778 & 0.094 \\ 0.094 & 0.166 \end{bmatrix}, \quad G_2 = \begin{bmatrix} 0.2778 & 0.133 \\ 0.133 & 0.31 \end{bmatrix},$$
$$M_1 = \begin{bmatrix} 3.2 & 0.5 \\ 0.5 & 2.5 \end{bmatrix}, \quad M_2 = \begin{bmatrix} 0.75 & 0.05 \\ 0.05 & 0.95 \end{bmatrix},$$
$$R_{11} = \begin{bmatrix} 0.8 & 0.3 \\ 0.3 & 1.5 \end{bmatrix}, \quad R_{22} = \begin{bmatrix} 0.95 & 0.65 \\ 0.65 & 1.25 \end{bmatrix},$$
$$R_{12} = \begin{bmatrix} 0.6 & 0.3 \\ 0.3 & 1.2 \end{bmatrix}, \quad R_{21} = \begin{bmatrix} 0.8 & 0.2 \\ 0.2 & 1.0 \end{bmatrix},$$
with $G_k, M_k, R_{jk} \in \mathbb{R}^{2\times 2}$, $j, k = 1, 2$.
Moreover, $r_k(t) = 0$, $k = 1,2$, the targets are $\zeta_1 = [0.3;\ 0.8]$, $\zeta_2 = [0.6;\ 0.9]$, and $t \in [0,1]$. The initial point is chosen to be $x_0 = [0.4;\ 0.01;\ 0.2;\ 0.25] \in \mathbb{R}^{4}$.
To compute $X_1(t)$, $X_2(t)$ we can use the explicit Euler discretization method, sweeping backward from $t_f$ ($k = 1,2$):
$$\tilde{X}_k(jh) = \tilde{X}_k((j+1)h) + h\,\mathcal{R}_k\big((j+1)h, \tilde{X}_1((j+1)h), \tilde{X}_2((j+1)h)\big)$$
with $\tilde{X}_k(Nh) = C_k^T G_k C_k$, $j = N-1, N-2, \ldots, 1, 0$, $N = [t_f/h]$, $k = 1,2$.
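A minimal NumPy sketch of this backward sweep, using the matrix data listed above, may clarify the recursion; it is our own illustrative implementation (step size $h = 10^{-3}$ chosen here for concreteness), not the authors' code:

```python
import numpy as np

A0 = np.array([[0.4, 0, 0.3, 0.7], [0.3, 0.2, 0, 0.1],
               [0.8, 0.25, 0.25, 0], [0.1, 0.2, 0, 0.5]])
A1 = np.array([[0, 0.2, 0, 0.1], [0, 0, 0.3, 0.15],
               [0.145, 0.06, 0.2, 0], [0, 0.3, 0, 0.5]])
B1 = np.array([[0.5, 0], [1, 2.5], [0.5, 2], [1, 3]])
B2 = np.array([[0, 0.5], [0.75, 1.5], [1, 2], [1, 0]])
C1 = np.array([[1.0, 0.25, 0.75, 0.5], [0.25, 0.75, 0.5, 0.25]])
C2 = np.array([[1.2, 0.5, 0.75, 0.045], [0.5, 0.15, 0.35, 0.55]])
G1 = np.array([[0.2778, 0.094], [0.094, 0.166]])
G2 = np.array([[0.2778, 0.133], [0.133, 0.31]])
M1 = np.array([[3.2, 0.5], [0.5, 2.5]]); M2 = np.array([[0.75, 0.05], [0.05, 0.95]])
R11 = np.array([[0.8, 0.3], [0.3, 1.5]]); R22 = np.array([[0.95, 0.65], [0.65, 1.25]])
R12 = np.array([[0.6, 0.3], [0.3, 1.2]]); R21 = np.array([[0.8, 0.2], [0.2, 1.0]])

# Weighting matrices S_j, S_jk as in Section 4.1.
S1 = B1 @ np.linalg.solve(R11, B1.T)
S2 = B2 @ np.linalg.solve(R22, B2.T)
S12 = B2 @ np.linalg.inv(R22) @ R12 @ np.linalg.inv(R22) @ B2.T
S21 = B1 @ np.linalg.inv(R11) @ R21 @ np.linalg.inv(R11) @ B1.T

def rhs(X1, X2):
    """R_k(t, X1, X2), so that dX_k/dt = -R_k, cf. the rewritten (35a)-(35b)."""
    Acl2, Acl1 = A0 - S2 @ X2, A0 - S1 @ X1
    Rk1 = (C1.T @ M1 @ C1 + Acl2.T @ X1 + X1 @ Acl2
           + A1.T @ X1 @ A1 - X1 @ S1 @ X1 + X2 @ S12 @ X2)
    Rk2 = (C2.T @ M2 @ C2 + Acl1.T @ X2 + X2 @ Acl1
           + A1.T @ X2 @ A1 - X2 @ S2 @ X2 + X1 @ S21 @ X1)
    return Rk1, Rk2

h, tf = 1e-3, 1.0
X1t, X2t = C1.T @ G1 @ C1, C2.T @ G2 @ C2       # terminal values X_k(tf)
for _ in range(int(tf / h)):                    # sweep j = N-1, ..., 0
    Rk1, Rk2 = rhs(X1t, X2t)
    X1t, X2t = X1t + h * Rk1, X2t + h * Rk2     # X(jh) = X((j+1)h) + h R_k

F1 = -np.linalg.solve(R11, B1.T @ X1t)          # F_k(t0) = -R_kk^{-1} B_k^T X_k(t0)
F2 = -np.linalg.solve(R22, B2.T @ X2t)
```

Each Euler step preserves the symmetry of $X_1$, $X_2$, which gives a cheap consistency check on the implementation.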
We consider the following algorithm to compute the behavior of the controlled signals z ˜ k , k = 1 , 2 .
Step 1. 
The aim of this step is to compute the gain matrices $\tilde{F}_k(jh)$ and the vectors $\tilde{\varphi}_k(jh)$, $j = 0, 1, \ldots, N$, $k = 1,2$.
Step 2. 
The aim is to compute $E[|\tilde{z}_k(jh)|^2]$ for $j = 0, 1, \ldots, N$; $k = 1,2$. We have:
$$E[|\tilde{z}_k(jh)|^2] = \mathrm{Tr}[C_k\Sigma(jh)C_k^T],$$
where $\Sigma(jh) = E[\tilde{x}(jh)\tilde{x}^T(jh)]$.
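The trace identity used in Step 2 is simply $E|C_kx|^2 = E[\mathrm{Tr}(C_k x x^T C_k^T)] = \mathrm{Tr}[C_k E(xx^T) C_k^T]$. A quick Monte Carlo sanity check with arbitrary placeholder data (not the matrices of the experiment):

```python
import numpy as np

rng = np.random.default_rng(1)
C = rng.standard_normal((2, 4))   # placeholder output matrix
L = rng.standard_normal((4, 4))
Sigma = L @ L.T                   # a valid second-moment matrix E[x x^T]

# Monte Carlo estimate of E|Cx|^2 for x ~ N(0, Sigma)
xs = rng.multivariate_normal(np.zeros(4), Sigma, size=200_000)
mc = np.mean(np.sum((xs @ C.T) ** 2, axis=1))

exact = np.trace(C @ Sigma @ C.T)  # Tr[C Sigma C^T]
```

For the closed-loop system, $\Sigma(jh)$ itself would be propagated from the state dynamics; here only the trace identity is checked.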
Consider two cases:
(A).
The base variant using the above matrix coefficients. We executed Step 1 and Step 2. The computed values $E[|\tilde{z}_k(jh)|^2]$, $k = 1,2$, for the signals $\tilde{z}_k$ of the players are given in Figure 1 and Figure 2 for the first player and the second player, respectively. Moreover, we obtained the following values of $E[|z_k(1) - \zeta_k|^2]$ for the players $k = 1,2$:
$$E[|z_1(1) - \zeta_1|^2] = 0.5033, \qquad E[|z_2(1) - \zeta_2|^2] = 1.0173. \qquad (39)$$
(B).
We want to compute the output of the closed-loop system $z_k(\cdot)$ obtained with a control law other than the optimal one, $u_k(jh) = F_k(jh)x(jh) + \varphi_k(jh)$, $k = 1,2$, $j = 0, 1, \ldots, N-1$, where $F_k(jh)$ and $\varphi_k(jh)$ differ from the optimal gains. We use the same matrix coefficients. After Step 1, we obtain the optimal values $\tilde{F}_k(jh)$ and $\tilde{\varphi}_k(jh)$, $k = 1,2$. Then, we form the perturbed values $\tilde{F}_k^k(jh)$, $\tilde{\varphi}_k^k(jh)$, $k = 1,2$, as follows ($j = 0, 1, \ldots, N-1$):
$$\tilde{F}_1^1(jh) = (\tilde{F}_1(jh) + \tilde{F}_2(jh))/2, \quad \tilde{F}_2^2(jh) = (\tilde{F}_1(jh) - \tilde{F}_2(jh))/2,$$
$$\tilde{\varphi}_1^1(jh) = (\tilde{\varphi}_1(jh) - \tilde{\varphi}_2(jh))/2, \quad \tilde{\varphi}_2^2(jh) = (\tilde{\varphi}_1(jh) + \tilde{\varphi}_2(jh))/2.$$
The computations continue with Step 2, using $\tilde{F}_k^k(jh)$, $\tilde{\varphi}_k^k(jh)$, $k = 1,2$. The computed values of $E[|z_k(1) - \zeta_k|^2]$, $k = 1,2$, are
$$E[|z_1(1) - \zeta_1|^2] = 8.5203, \qquad E[|z_2(1) - \zeta_2|^2] = 4.5174. \qquad (40)$$
One sees from (39) and (40) that the deviations from the targets obtained with the optimal control are smaller than those obtained with the alternative control.

6. Conclusions

In this work, we studied the problem of minimizing the deviation of some outputs of a controlled dynamical system from given reference signals. We considered the case where the dynamical system is controlled by two decision-makers that do not cooperate. One of the two decision-makers wants to minimize the deviation of a preferential output of the dynamical system from a given reference signal, whereas the other decision-maker wants to minimize the deviation of another output of the same controlled dynamical system from another reference signal. This problem was viewed as the problem of designing a Nash equilibrium strategy for an affine quadratic differential game with two players. Since the controlled dynamical system is subject to multiplicative white noise perturbations and Markovian jumping, we had to find a Nash equilibrium strategy for a stochastic affine quadratic differential game. We obtained explicit formulae for the equilibrium strategy. To this end, the solutions of two TVPs were involved. The first TVP is associated with a hybrid system formed by two nonlinear backward differential equations coupled with two nonlinear algebraic equations, namely, the TVP (30). The second TVP is associated with a hybrid system formed by two backward linear differential equations coupled with two affine matrix algebraic equations, that is, the TVP (31). The first TVP is the same as the one involved in the description of the Nash equilibrium strategy for an LQ differential game. The second TVP takes into consideration the reference signals $r_k(\cdot)$ together with the final targets $\zeta_k$, $k = 1, 2$.
There are a few directions that can be considered for future research:
  • Direct extensions of this article include the case when two or more players (with different cost functionals) are willing to cooperate, and the case when $t_f \to \infty$ for the tracking problem associated with a controlled system of type (1).
  • Another direction of future research is the tracking problem with preview in the case when the controlled dynamical system is affected by state-multiplicative and/or control-multiplicative white noise perturbations. To our knowledge, this case has not yet been considered in the existing literature. Some results in this direction have been reported, for example, in [2,6,7] for the case of only one decision-maker and in [29,30] for the case with more than one decision-maker.
  • Finally, one can consider the linear quadratic tracking problem with a delay component (for one or more players) for Itô stochastic systems. Some results in this direction have been reported, for example, in [10,11].

Author Contributions

Conceptualization, V.D., I.G.I. and I.-L.P.; methodology, V.D., I.G.I. and I.-L.P.; investigation, V.D., I.G.I. and I.-L.P.; writing—original draft preparation, V.D., I.G.I. and I.-L.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Axelband, E. The structure of the optimal tracking problem for distributed-parameter systems. IEEE Trans. Autom. Control 1968, 13, 50–56.
  2. Cohen, A.; Shaked, U. Linear discrete-time H∞-optimal tracking with preview. IEEE Trans. Autom. Control 1997, 42, 270–276.
  3. Emami-Naeini, A.; Franklin, G. Deadbeat control and tracking of discrete-time systems. IEEE Trans. Autom. Control 1982, 27, 176–181.
  4. Liu, D.; Liu, X. Optimal and minimum-energy optimal tracking of discrete linear time-varying systems. Automatica 1995, 31, 1407–1419.
  5. Shaked, U.; de Souza, C.E. Continuous-time tracking problems in an H∞ setting: A game theory approach. IEEE Trans. Autom. Control 1995, 40, 841–852.
  6. Gershon, E.; Shaked, U.; Yaesh, I. H∞ tracking of linear systems with stochastic uncertainties and preview. IFAC Proc. Vol. 2002, 35, 407–412.
  7. Gershon, E.; Limebeer, D.J.N.; Shaked, U.; Yaesh, I. Stochastic H∞ tracking with preview for state-multiplicative systems. IEEE Trans. Autom. Control 2004, 49, 2061–2068.
  8. Dragan, V.; Morozan, T. Discrete-time Riccati type equations and the tracking problem. ICIC Express Lett. 2008, 2, 109–116.
  9. Dragan, V.; Morozan, T.; Stoica, A.M. Mathematical Methods in Robust Control of Linear Stochastic Systems; Springer: New York, NY, USA, 2013.
  10. Han, C.; Wang, W. Optimal LQ tracking control for continuous-time systems with pointwise time-varying input delay. Int. J. Control Autom. Syst. 2017, 15, 2243–2252.
  11. Jin, N.; Liu, S.; Zhang, H. Tracking problem for Itô stochastic system with input delay. In Proceedings of the Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 1370–1374.
  12. Pindyck, R. An application of the linear quadratic tracking problem to economic stabilization policy. IEEE Trans. Autom. Control 1972, 17, 287–300.
  13. Alba-Flores, R.; Barbieri, E. Real-time infinite horizon linear quadratic tracking controller for vibration quenching in flexible beams. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Taipei, Taiwan, 8–11 October 2006; pp. 38–43.
  14. Wang, Y.-L.; Yang, G.-H. Robust H∞ model reference tracking control for networked control systems with communication constraints. Int. J. Control Autom. Syst. 2009, 7, 992–1000.
  15. Ou, M.; Li, S.; Wang, C. Finite-time tracking control for multiple non-holonomic mobile robots based on visual servoing. Int. J. Control 2013, 86, 2175–2188.
  16. Fu, Y.-M.; Lu, Y.; Zhang, T.; Zhang, M.-R.; Li, C.-J. Trajectory tracking problem for Markov jump systems with Itô stochastic disturbance and its application in orbit manoeuvring. IMA J. Math. Control Inf. 2018, 35, 1201–1216.
  17. Basar, T. Nash equilibria of risk-sensitive nonlinear stochastic differential games. J. Optim. Theory Appl. 1999, 100, 479–498.
  18. Buckdahn, R.; Cardaliaguet, P.; Rainer, C. Nash equilibrium payoffs for nonzero-sum stochastic differential games. SIAM J. Control Optim. 2004, 43, 624–642.
  19. Sun, H.Y.; Li, M.; Zhang, W.H. Linear quadratic stochastic differential game: Infinite time case. ICIC Express Lett. 2010, 5, 1449–1454.
  20. Sun, H.; Yan, L.; Li, L. Linear quadratic stochastic differential games with Markov jumps and multiplicative noise: Infinite time case. Int. J. Innov. Comput. Inf. Control 2015, 11, 348–361.
  21. Nakura, G. Nash tracking game with preview by state feedback for linear continuous-time systems. In Proceedings of the 50th ISCIE International Symposium on Stochastic Systems Theory and Its Applications, Kyoto, Japan, 1–2 November 2018; pp. 49–55.
  22. Friedman, A. Stochastic Differential Equations and Applications; Academic Press: New York, NY, USA, 1975; Volume I.
  23. Øksendal, B. Stochastic Differential Equations; Springer: Berlin/Heidelberg, Germany, 2003.
  24. Chung, K.L. Markov Chains with Stationary Transition Probabilities; Springer: Berlin/Heidelberg, Germany, 1967.
  25. Doob, J.L. Stochastic Processes; Wiley: New York, NY, USA, 1967.
  26. Boukas, E.K. Stochastic Switching Systems: Analysis and Design; Birkhäuser: Boston, MA, USA, 2005.
  27. Costa, O.L.V.; Fragoso, M.D.; Marques, R.P. Discrete-Time Markov Jump Linear Systems; Probability and Its Applications; Springer: London, UK, 2005.
  28. Costa, O.L.V.; Fragoso, M.D.; Todorov, M.G. Continuous-Time Markov Jump Linear Systems; Probability and Its Applications; Springer: Berlin/Heidelberg, Germany, 2013.
  29. Nakura, G. Soft-constrained Nash tracking game with preview by state feedback for linear continuous-time Markovian jump systems. In Proceedings of the 2018 57th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE), Nara, Japan, 11–14 September 2018; pp. 450–455.
  30. Nakura, G. Nash tracking game with preview by state feedback for linear continuous-time Markovian jump systems. In Proceedings of the 50th ISCIE International Symposium on Stochastic Systems Theory and Its Applications, Kyoto, Japan, 1–2 November 2018; pp. 56–63.
Figure 1. Plot of traces of the first player: $E[|\tilde{z}_1(jh)|^2]$ in $[0, t_f] = [0, 1]$.
Figure 2. Plot of traces of the second player: $E[|\tilde{z}_2(jh)|^2]$ in $[0, t_f] = [0, 1]$.