Pareto Optimal Strategy under $H_\infty$ Constraint for Discrete-Time Stochastic Systems

Abstract: This paper investigates the Pareto optimal strategy of discrete-time stochastic systems under an $H_\infty$ constraint, in which the weighting matrices of the weighted sum cost function can be indefinite. Combining $H_\infty$ control theory with indefinite LQ control theory, the generalized difference Riccati equations (GDREs) are obtained. By means of the solution of the GDREs, the Pareto optimal strategy with $H_\infty$ constraint is derived, and necessary and sufficient conditions for the existence of the strategy are presented. Then the Pareto optimal solution under the worst-case disturbance is obtained. Finally, the effectiveness of the obtained results is illustrated by a numerical example.


Introduction
With the increasing scale of modern industry and economy, it is unavoidable to deal with multi-player and multi-objective optimal control problems [1]. There will inevitably be cooperation or competition between different players. As an important tool to solve this problem, game theory has been widely studied by scholars [2][3][4][5]. Game theory includes the noncooperative game and the cooperative game. In a noncooperative game, one player makes independent decisions without considering the benefits of the other players. On the contrary, the cooperative game reasonably coordinates the interests of each player within a specific rule. As the concept of win-win cooperation gains popularity, the cooperative game has also become a popular topic.
As an important type of cooperative game, the Pareto game was first used in economic theory [6,7], and it is now also used in engineering, for example in path planning [8], crude oil scheduling [9] and mobile edge computing [10]. Hence, the Pareto game has been widely investigated. Engwerda [11] gave a characterization of all Pareto solutions when the weighting matrices of the cost function are positive definite and then generalized this result to indefinite criteria [12]. Further, Reddy [13] studied the conditions for the existence of Pareto optimal strategies over an infinite horizon, and systematically analyzed the relationship between Pareto optimality and weighted sum minimization. As Pareto optimal control theory for deterministic systems has matured, scholars have turned to Pareto optimal control of stochastic systems. Lin et al. [14] derived necessary and sufficient conditions for the Pareto optimal strategy of stochastic systems. For discrete-time stochastic systems, Zhu et al. [15] gave sufficient conditions for the existence of the strategy sets over a finite horizon, and Peng et al. [16,17] studied the Pareto optimality of linear and nonlinear systems over an infinite horizon, respectively. Ahmed et al. [18] studied Pareto optimal control with external disturbances, and gave the form of Pareto optimal control under $H_\infty$ constraint for continuous-time stochastic systems by means of linear matrix inequalities. Jiang et al. [19] introduced the generalized differential Riccati …
• Using the weighted sum method of Pareto optimization combined with $H_\infty$ control theory, the GDREs are obtained. Based on the obtained GDREs, we get the Pareto efficient strategies under $H_\infty$ constraint, which not only achieve Pareto optimality but also reduce the influence of external disturbances.
• Based on the solvability of the GDREs, we derive necessary and sufficient conditions for the existence of $H_\infty$-constrained Pareto optimal control for discrete-time stochastic systems. Then we derive all Pareto solutions for all Pareto efficient strategies.
• We investigate the indefinite linear-quadratic difference game with external disturbance and the stochastic bounded real lemma (SBRL) with a nonzero initial value. The weighting matrices of the cost functional are allowed to be indefinite in this paper.
The rest of this paper is organized as follows: Section 2 presents the system description and some useful preliminaries. In Section 3, Pareto optimality under $H_\infty$ constraint is investigated. Section 4 presents an example of space heating to illustrate the obtained results. The conclusion of this paper is given in Section 5.
Notations: $A'$: the transpose of the matrix or vector $A$; $A^{\dagger}$: the Moore-Penrose pseudoinverse of $A$; $A > 0$ ($A \ge 0$): $A$ is a positive definite (positive semi-definite) symmetric matrix; $E(\cdot)$: the mathematical expectation operator; $\mathbb{R}^n$: the set of $n$-dimensional real vectors; $\mathbb{R}^{m \times n}$: the set of $m \times n$ real matrices; $I_n$: the $n \times n$ identity matrix;

System Descriptions and Preliminaries
Consider the finite-horizon discrete-time linear stochastic system with multiple players, referred to as system (1), where $x_k \in \mathbb{R}^n$ represents the system state, $u_{i,k} \in \mathbb{R}^{m_i}$ is the $i$th control input at time $k$, $v_k \in \mathbb{R}^{n_v}$ is the disturbance signal, and $z_k \in \mathbb{R}^{n_z}$ is the controlled output. Denote the joint action of the controllers by $u_k := \mathrm{col}(u_{1,k}, \ldots, u_{N,k})$. The coefficient matrices of system (1) are matrix-valued continuous functions with appropriate dimensions. $\{w_k\}_{k \in N_T}$ is a sequence of independent one-dimensional real random variables defined on a given complete filtered probability space $\{\Omega, \ldots\}$.
Before giving the definition of the Pareto optimal strategy with $H_\infty$ constraint, we need to analyze the Pareto optimality and the $H_\infty$ performance of discrete-time systems separately. We first introduce some definitions and lemmas on Pareto optimal strategies. In this part, the disturbance is not considered: letting $v_k \equiv 0$, system (1) reduces to system (2). For system (2), each player (or controller) $u_{i,k}$ wants to minimize a cost functional $J_i$ with weighting matrices $Q_i$ and $R_{ij}$, where $i, j \in \mathcal{N}$, $Q_i, R_{ij} \in \mathcal{S}^n$ and $R_{ij}^{-1}$ exists.

Definition 1 ([19]). Denote $J_i(u, x_0) = J_i(u_{1,k}, \ldots, u_{N,k}; x_0)$ and the joint control $u := (u_{1,k}, \ldots, u_{N,k}) \in \mathcal{U}$, where $\mathcal{U}$ is the set of all admissible controls. A control $u^*$ is called Pareto efficient for system (2) if the set of inequalities $J_i(u, x_0) \le J_i(u^*, x_0)$, $i \in \mathcal{N} := \{1, 2, \ldots, N\}$, with at least one of the inequalities strict, does not hold for any $u \in \mathcal{U}$. The vector $(J_1(u^*, x_0), \ldots, J_N(u^*, x_0))$ corresponding to a Pareto efficient $u^*$ is a Pareto solution, and all Pareto solutions form the Pareto frontier.
If $u^*$ is Pareto efficient, we cannot find another admissible $u$ that improves one or more $J_i(u, x_0)$, $i \in \mathcal{N}$, while no $J_j(u, x_0)$, $j \in \mathcal{N} \setminus \{i\}$, gets worse. To characterize Pareto efficiency, we need the following two lemmas.
Then u * is Pareto efficient.
Lemma 2 ([12]). Assume that the control strategy set $\mathcal{U}$ is convex and the cost functionals $J_i(u, x_0)$, $i \in \mathcal{N}$, are convex with respect to $u$. If an admissible $u^*$ is Pareto efficient, then there exists an $\alpha = (\alpha_1, \ldots, \alpha_N) \in \mathcal{A}$ such that (4) holds.
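The weighted sum method behind Lemmas 1 and 2 can be illustrated on a toy problem. The following Python sketch is not the system studied in this paper: the two scalar costs, the targets `A1` and `A2`, and all helper names are assumptions chosen only for illustration. It minimizes $\alpha J_1 + (1 - \alpha) J_2$ for several weights and checks numerically that none of the resulting cost pairs is dominated by any candidate on a grid.

```python
import numpy as np

# Toy two-player problem (an assumption for illustration, not the paper's system):
# J1(u) = (u - A1)^2 and J2(u) = (u - A2)^2 for a scalar decision u.
A1, A2 = 1.0, -1.0

def costs(u):
    """Return the cost pair (J1, J2) for a scalar decision u."""
    return (u - A1) ** 2, (u - A2) ** 2

def weighted_sum_minimizer(alpha):
    """Minimizer of alpha*J1 + (1 - alpha)*J2 for 0 <= alpha <= 1."""
    # The weighted sum is convex in u; setting its derivative to zero gives
    # a convex combination of the two targets.
    return alpha * A1 + (1.0 - alpha) * A2

def is_dominated(point, candidates, tol=1e-12):
    """True if a candidate is no worse in both costs and strictly better in one."""
    j1, j2 = point
    for o1, o2 in candidates:
        no_worse = o1 <= j1 + tol and o2 <= j2 + tol
        strictly_better = o1 < j1 - tol or o2 < j2 - tol
        if no_worse and strictly_better:
            return True
    return False

# Sweep the weight and collect the cost pairs of the weighted sum minimizers.
alphas = np.linspace(0.05, 0.95, 19)
front = [costs(weighted_sum_minimizer(a)) for a in alphas]

# Compare against a grid of arbitrary decisions.
candidates = [costs(u) for u in np.linspace(-3.0, 3.0, 601)]
undominated = [not is_dominated(p, candidates) for p in front]
print("all weighted sum minimizers undominated:", all(undominated))
```

Along the sweep, $J_1$ decreases while $J_2$ increases, which is the trade-off that a Pareto frontier expresses. Because both toy costs are convex, this is the setting in which Lemma 2 states that every Pareto efficient strategy can be recovered by some weight.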

Remark 1.
Lemma 1 is only a sufficient condition for obtaining Pareto efficient strategies, and it cannot guarantee that all Pareto efficient strategies can be obtained by (4). If $Q_i \ge 0$ and $R_{ij} > 0$, $i, j \in \mathcal{N}$, then by the triangle inequality [11] the corresponding cost functionals $J_i$, $i \in \mathcal{N}$, are convex. According to Lemma 2, if the control strategy set $\mathcal{U}$ and the cost functionals $J_i$, $i \in \mathcal{N}$, are convex, every Pareto efficient strategy $u^*$ can be obtained by the weighted sum method.
In this paper, we consider the case where $Q_i$ and $R_{ij}$ may be indefinite matrices, which requires us to first ensure that $J_i$ is convex.

Lemma 3 ([14]). Consider the system (2). The cost functionals
Under the assumption that $\mathcal{U}$ is convex and $\min_{u \in \mathcal{U}} J_i(u, 0) = 0$, $i \in \mathcal{N}$, the convexity of the cost functionals is guaranteed, which further ensures that all Pareto efficient strategies with indefinite matrices $Q_i$ and $R_{ij}$ can be obtained by minimizing the weighted sum cost functional.
For the weighted sum cost functional
Next, let the control input $u \equiv 0$ and consider the discrete-time stochastic perturbed system (6) for $H_\infty$ analysis.
The perturbed operator of system (6) is denoted by $L_T$, and its norm $\|L_T\|$ is defined in (7). In (7), the initial weighting matrix $S = S' > 0$ is introduced to measure the uncertainty of the initial state $x_0$. It can be seen that $\|L_T\|$ represents the effect of the initial value and the external disturbance on the system output. When we require $\|L_T\| < \gamma$, a robust cost functional $J_v(v, x_0)$ is obtained, which establishes a relationship between the disturbance attenuation problem and the solvability of the GDRE.
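For orientation, one form of the operator norm and of the robust cost functional is sketched below. It is an assumption here, following the usual finite-horizon definition with initial-state weighting, and it is chosen to be consistent with the comparison made in the numerical example, where the accumulated output energy is compared against $\gamma^2$ times the accumulated disturbance energy plus $\gamma^2 x_0' S x_0$. It should be checked against (7) and the robust cost functional of the paper.

```latex
\|L_T\| := \sup_{(x_0, v) \neq 0}
  \frac{\left( \sum_{k=0}^{T} E\|z_k\|^2 \right)^{1/2}}
       {\left( x_0' S x_0 + \sum_{k=0}^{T} E\|v_k\|^2 \right)^{1/2}},
\qquad
J_v(v, x_0) := \gamma^2 x_0' S x_0
  + \sum_{k=0}^{T} E\left[ \gamma^2 \|v_k\|^2 - \|z_k\|^2 \right].
```

Under this assumed form, $\|L_T\| < \gamma$ implies $J_v(v, x_0) > 0$ for every nonzero pair $(x_0, v)$, which is why the disturbance that minimizes $J_v$ is referred to as the worst case.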
For notational convenience, we rewrite the discrete-time system (1) in the compact form (9). Based on the above analysis, we define the Pareto optimal strategy for the discrete-time system (9) with $H_\infty$ constraint.

Definition 2.
Consider the controlled stochastic system (9). For a given disturbance attenuation level $\gamma > 0$, find a state feedback joint control $u^*$ with $u_k^* = K^k x_k$ such that (1) for the closed-loop system (10), the norm of the perturbed operator satisfies $\|L_T\| < \gamma$ (11). When such $(u^*, v^*)$ exists, we say that the Pareto optimal strategy for the discrete-time system (9) with $H_\infty$ constraint is solvable.

Main Results
In this section, we first study $H_\infty$ control and Pareto optimal control separately. Then, by solving the coupled GDREs, the Pareto optimal control under the worst-case disturbance is obtained.
In order to obtain the worst-case disturbance, we need the stochastic bounded real lemma (SBRL), which plays a crucial role in $H_\infty$ analysis. Below, we give some lemmas that are essential for our main results.

Lemma 4 ([27]). Suppose $P^k$, $k \in N_{T+1}$, are arbitrary real symmetric matrices. Then for any $x_0 \in \mathbb{R}^n$ in system (6), we have where

Lemma 5 ([27]). Suppose $P^k$, $k \in N_{T+1}$, are arbitrary real symmetric matrices. It can be further derived that, for any $x_0 \in \mathbb{R}^n$ in system (6): Then $M(P^k)$ can be simplified as

Lemma 6 ([27]). For $c, b \in \mathbb{R}^n$, $A = A'$ and $A^{-1}$ exists, we have
Lemma 5 rewrites the cost functional $J_v(v, x_0)$ so that Lemma 6 can be applied. Finally, the cost functional $J_v(v, x_0)$ is transformed into Equation (16). Accordingly, the minimum value of $J_v(v, x_0)$ and the corresponding worst-case disturbance are apparent.
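The role of the completing squares step can be checked numerically. The sketch below assumes the standard form of the identity, $c'Ac + b'c + c'b = (c + A^{-1}b)'A(c + A^{-1}b) - b'A^{-1}b$ for symmetric invertible $A$; whether this matches the exact statement of Lemma 6 is an assumption. The matrix, the vectors and the function names are illustrative only, and $A$ is deliberately built to be indefinite, since the identity needs invertibility rather than definiteness.

```python
import numpy as np

def complete_square_gap(A, b, c):
    """Difference between both sides of the assumed completing squares identity."""
    Ainv_b = np.linalg.solve(A, b)           # A^{-1} b without forming the inverse
    lhs = c @ A @ c + b @ c + c @ b
    shifted = c + Ainv_b
    rhs = shifted @ A @ shifted - b @ Ainv_b
    return lhs - rhs

def stationary_value_gap(A, b):
    """At c* = -A^{-1} b the quadratic should equal -b' A^{-1} b."""
    c_star = -np.linalg.solve(A, b)
    value = c_star @ A @ c_star + b @ c_star + c_star @ b
    return value - (-(b @ np.linalg.solve(A, b)))

rng = np.random.default_rng(0)
n = 4
# Symmetric, invertible and indefinite A = Q diag(d) Q' with mixed-sign d.
Qm, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Qm @ np.diag([2.0, -1.0, 3.0, -0.5]) @ Qm.T
A = 0.5 * (A + A.T)                          # remove rounding asymmetry
b = rng.standard_normal(n)
c = rng.standard_normal(n)

gap = complete_square_gap(A, b, c)
gap_star = stationary_value_gap(A, b)
print("identity gap:", gap, "stationary value gap:", gap_star)
```

Once a cost is written in this completed-square form, the stationary point $c^* = -A^{-1}b$ and the corresponding value $-b'A^{-1}b$ can be read off directly, which is how the worst-case disturbance becomes apparent in the text above.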

Lemma 7.
(SBRL) Consider the discrete-time stochastic system (6) and the perturbed operator (7). We have $\|L_T\| < \gamma$ for some disturbance attenuation level $\gamma > 0$ and initial weighting matrix $S = S' > 0$ if and only if the GDRE (15) has a unique solution $P^k \le 0$ on $N_T$ with $P^0 + \gamma^2 S > 0$.
Necessity part: It was proved in [31] that, for arbitrary $x_0 \in \mathbb{R}^n$, if $\|L_T\| < \gamma$, then (15) admits a solution $P^k \le 0$ on $N_T$. Next, we prove $P^0 + \gamma^2 S > 0$ by contradiction.

Remark 2.
In this paper, the Pareto solution we studied is valid for any initial value x 0 . In order to maintain consistency, we extend the SBRL in [31] to the case where the initial value x 0 can be arbitrary.
According to Lemmas 1 and 2, the Pareto optimal strategy can be obtained by minimizing the weighted sum objective functional $J_\alpha(u, x_0)$, which is a single-objective optimization problem. Because $R_\alpha$ and $Q_\alpha$ are allowed to be indefinite, if $J_\alpha(u, x_0)$ is taken as the cost functional of an LQ problem, Pareto optimal control can be regarded as the solution of an indefinite LQ problem for a discrete-time stochastic system.
Consider the discrete-time stochastic system without disturbance (17), with the corresponding weighted sum cost functional (18). The LQ problem aims to find a control strategy that minimizes the weighted sum cost functional (18). Note that the LQ problem may be ill-posed under the constraint (17) since $R_\alpha$ and $Q_\alpha$ may be indefinite. Therefore, the following two definitions are given.

Definition 4 ([25]). The LQ problem
It can be seen that if the LQ problem is attainable, it means that there must exist a corresponding optimal control u * .
The properties of the Moore-Penrose pseudoinverse will be used to solve the indefinite LQ problem.

Lemma 8 ([32]). Given a matrix $C \in \mathbb{R}^{m \times n}$, there exists a unique matrix $C^{\dagger} \in \mathbb{R}^{n \times m}$ satisfying
$$C C^{\dagger} C = C, \quad C^{\dagger} C C^{\dagger} = C^{\dagger}, \quad (C C^{\dagger})' = C C^{\dagger}, \quad (C^{\dagger} C)' = C^{\dagger} C.$$
In Lemma 8, $C^{\dagger}$ is called the Moore-Penrose pseudoinverse of $C$.
Lemma 9 ([25]). For the system (17) and the indefinite weighted sum cost functional (18), the following are equivalent:
(1) The GDRE (19) is solved by a symmetric matrix sequence $\{P^k\}$, $k \in N_T$.
(2) The LQ problem is well-posed.
(3) The LQ problem is attainable.
If any of the above three conditions is satisfied, the LQ problem is attainable by a control law expressed in terms of $P^0, \ldots, P^T$, where $P^0, \ldots, P^T$ solve the GDRE (19).
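The four defining conditions of the Moore-Penrose pseudoinverse in Lemma 8 can be checked numerically. The sketch below uses NumPy's `numpy.linalg.pinv`; the rank-deficient test matrix and the helper name `penrose_residuals` are assumptions made for illustration.

```python
import numpy as np

def penrose_residuals(C):
    """Return the pseudoinverse of C and the residuals of the four Penrose conditions."""
    Cp = np.linalg.pinv(C)
    residuals = [
        np.linalg.norm(C @ Cp @ C - C),        # C C+ C = C
        np.linalg.norm(Cp @ C @ Cp - Cp),      # C+ C C+ = C+
        np.linalg.norm((C @ Cp).T - C @ Cp),   # C C+ is symmetric
        np.linalg.norm((Cp @ C).T - Cp @ C),   # C+ C is symmetric
    ]
    return Cp, residuals

# A 2 x 3 matrix of rank 1, so an ordinary inverse does not exist.
C = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
C_pinv, residuals = penrose_residuals(C)
print("shape of pseudoinverse:", C_pinv.shape)
print("largest residual:", max(residuals))
```

Note that the pseudoinverse of an $m \times n$ matrix has shape $n \times m$. In an indefinite LQ problem the matrix to be inverted in the Riccati recursion may be singular, which is why a pseudoinverse is used instead of an ordinary inverse.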
Based on the above analysis of the indefinite LQ problem and the SBRL with nonzero initial value, we study the Pareto optimal control with $H_\infty$ constraint.
Theorem 1. Consider the discrete-time stochastic system (9) with multiple players, control inputs $u_{i,k}$, $i = 1, 2, \ldots, N$, $k \in N_T$, and the external disturbance $v_k$. Set $\gamma > 0$, the weighting factor $\alpha \in \mathcal{A}$ and $S = S' > 0$. If the GDREs (20)-(23) have a solution $(P_u^k, P_v^k; K_p^k, K_\gamma^k)$ with $P_v^k \le 0$, $P_v^0 + \gamma^2 S > 0$ and $P_u^k \in \mathcal{S}^n$, $k \in N_T$, then the discrete-time finite horizon Pareto optimal control with $H_\infty$ constraint is solvable. The Pareto efficient strategy under the worst-case disturbance $v_k^* = K_\gamma^k x_k$ is $u_k^* = K_p^k x_k$.
Proof of Theorem 1. Sufficiency part: Applying $u_k = u_k^* = K_p^k x_k$ to system (9), where $K_p^k$ is defined in (23), we obtain the closed-loop system (24). Because system (24) and system (6) have the same structure, the lemmas for system (6) also apply to system (24). Similar to the proof of Lemma 7, applying Lemma 5 and the completing squares method to system (24) and considering the corresponding cost functional $J_v(u_k^*, v_k, x_0)$, we obtain an expression for $J_v$ in completed-square form. Since Equation (20) holds, combined with (25), we obtain that the robust cost functional is positive, which means that inequality (11) holds, i.e., $\|L_T\| < \gamma$.
Accordingly, consider the weighted sum cost functional. Replacing $x_{k+1}$ with $\hat{A}_k x_k + B_k u_k + \hat{D}_k x_k w_k + Z_k u_k w_k$, using the completing squares method and considering Equation (22), the weighted sum cost functional is rewritten in completed-square form. According to Lemmas 1 and 9, when the worst-case disturbance $v_k^*$ is imposed on system (9), the Pareto efficient strategy is given by $u_k^* = K_p^k x_k$.
Necessity part: Assume that the Pareto optimal strategy for the discrete-time system (9) with worst-case disturbance $v_k^* = K_\gamma^k x_k$ is $u_k^* = K_p^k x_k$. This means that when $u_k = u_k^* = K_p^k x_k$ is applied to system (9), we have $\|L_T\| < \gamma$. According to Lemma 7, we conclude that Equation (20) has a unique solution $P_v^k$. The proof is completed.
Remark 4. Theorem 2 shows how to obtain the value of $J_i(u^*, v^*, x_0)$, $i = 1, 2, \ldots, N$, for each controller $u_i^*$. According to the definition of Pareto solutions, $J_i(u^*, v^*, x_0)$ is not uniquely determined: when $\alpha \in \mathcal{A}$ changes, $J_i(u^*, v^*, x_0)$ also changes, and the set of all Pareto solutions constitutes the Pareto frontier.

Remark 5.
Because of the presence of $\tilde{A}_k$, $\tilde{D}_k$, $\hat{A}_k$ and $\hat{D}_k$ in Equations (21) and (23), $K_\gamma^k$ and $K_p^k$ are coupled. To avoid overly complex solutions, system (1) is reduced to a system with only state-dependent noise, denoted system (34). Equations (21) and (23) can then be rewritten as two coupled equations, (35) and (36). Substituting $K_p^k$ into Equation (35) yields, after some calculation, an explicit expression for $K_\gamma^k$, Equation (37).

Example
In this section, we give a numerical example to show in more detail how to calculate the Pareto optimal control with $H_\infty$ constraint.
Example 1. Consider system (34) with $N = 2$; the corresponding coefficients are given in Table 1. Each of the two players has its own cost functional, the two are combined into the corresponding weighted sum cost functional, and the robust cost functional $J_v(u, v, x_0)$ is given as in (13). Set the initial temperature $x_0 = [-2 \;\; -3]'$, $\alpha_1 = 0.8$, the disturbance attenuation level $\gamma = 0.8$, $T = 10$ and $S = I$.
The first step in solving this Pareto game is to calculate $K_\gamma^{10}$ and $K_p^{10}$ by solving Equations (37) and (36) with $P_v^{11} = 0$ and $P_u^{11} = 0$. Then, by solving Equations (20) and (22), we get $P_v^{10}$ and $P_u^{10}$. Continuing this backward recursion until $k = 0$, we obtain the evolutions of
$$P_v^k = \begin{bmatrix} P_{v11} & P_{v12} \\ P_{v12} & P_{v22} \end{bmatrix}, \quad P_u^k = \begin{bmatrix} P_{u11} & P_{u12} \\ P_{u12} & P_{u22} \end{bmatrix},$$
$K_\gamma^k$ and $K_p^k$ shown in Figures 1 and 2, respectively, which show the convergence of the solutions of the coupled GDREs (20)-(23). From Figures 3 and 4, we can see that the constraints $P_v^k \le 0$ and $P_v^0 + \gamma^2 S > 0$ are satisfied, which means that the obtained $P_v^k$ and $P_u^k$ are valid.
According to the obtained $K_\gamma^k$ and $K_p^k$, we get the Pareto efficient strategy $u_k^* = K_p^k x_k$ under the worst-case disturbance $v_k^* = K_\gamma^k x_k$ and the corresponding $x_k^*$, as shown in Figures 5 and 6. It can be seen from Figure 6 that the system state settles after $k = 5$. Therefore, the control input $u_k^*$ obtained by Theorem 1 achieves rapid convergence of the system state, even when an external disturbance exists. Next, let $v = \bar{v} = 0.8^k \sin(k)$; we get the corresponding values of $\sum_{k=0}^{10} E[\bar{z}_k' \bar{z}_k]$ and $\sum_{k=0}^{10} E[\gamma^2 \bar{v}_k' \bar{v}_k] + \gamma^2 x_0' S x_0$ as shown in Figure 7. As a comparison, we apply the same disturbance to the algorithm in [25], which does not consider the $H_\infty$ constraint, and obtain the corresponding $\sum_{k=0}^{10} E[\hat{z}_k' \hat{z}_k]$ as shown by the dotted line in Figure 7. For the same disturbance, the system output under the control input obtained in this paper has a smaller sum, and is less affected by the disturbance. In Figure 7, $\sum_{k=0}^{10} E[\bar{z}_k' \bar{z}_k]$ is always smaller than $\sum_{k=0}^{10} E[\gamma^2 \bar{v}_k' \bar{v}_k] + \gamma^2 x_0' S x_0$, which is consistent with $\|L_T\| < \gamma$, that is, the influence of the disturbance on the system output is kept within the set range. According to Definition 2, we can say that $u^*$ satisfies the $H_\infty$ constraint.
Finally, let $\alpha$ vary on $[0, 1]$. By solving Equation (31), we obtain different Pareto solutions, which constitute the Pareto frontier shown in Figure 8. It can be seen from Figure 8 that, on the Pareto frontier, if $J_1(u_1^*, u_2^*, v^*, x_0)$ gets better then $J_2(u_1^*, u_2^*, v^*, x_0)$ gets worse, as every solution on the Pareto frontier satisfies Definition 1. It also shows that the methods proposed in this paper are effective in obtaining the Pareto solutions and the Pareto frontier. No Pareto solution on the frontier can be improved for all players simultaneously. Therefore, we can make tradeoffs on the Pareto frontier to determine the desired $J_1(u_1^*, u_2^*, v^*, x_0)$ and $J_2(u_1^*, u_2^*, v^*, x_0)$.
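The backward computation used in the example (start from a zero terminal value at $k = T + 1$, compute the gain, then the Riccati matrix, and repeat down to $k = 0$) can be sketched in Python. This is a simplified sketch, not the coupled GDREs (20)-(23): it assumes a single weighted sum LQ problem without disturbance, time-invariant matrices, and state-dependent noise only, $x_{k+1} = A x_k + B u_k + D x_k w_k$ with $E w_k = 0$ and $E w_k^2 = 1$. The function name and all numerical values are hypothetical, and the sign conditions required for indefinite weights are not checked.

```python
import numpy as np

def backward_riccati(A, B, D, Q, R, T):
    """Backward recursion for the cost sum_{k=0}^{T} E[x_k' Q x_k + u_k' R u_k].

    Assumed dynamics: x_{k+1} = A x_k + B u_k + D x_k w_k, with w_k zero mean
    and unit variance. Returns lists P[0..T+1] and K[0..T] with u_k = K[k] x_k.
    """
    n = A.shape[0]
    P = [None] * (T + 2)
    K = [None] * (T + 1)
    P[T + 1] = np.zeros((n, n))                # zero terminal value
    for k in range(T, -1, -1):
        Pn = P[k + 1]
        H = R + B.T @ Pn @ B                   # matrix to be (pseudo)inverted
        G = B.T @ Pn @ A
        K[k] = -np.linalg.pinv(H) @ G          # feedback gain at time k
        Pk = Q + A.T @ Pn @ A + D.T @ Pn @ D + G.T @ K[k]
        P[k] = 0.5 * (Pk + Pk.T)               # remove rounding asymmetry
    return P, K

# Hypothetical two-state data with T = 10, matching only the horizon of the example.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
D = 0.1 * np.eye(2)
Q = np.eye(2)
R = np.array([[1.0]])
P, K = backward_riccati(A, B, D, Q, R, T=10)
print("P at k = 0:")
print(P[0])
print("K at k = 0:", K[0])
```

In the setting of this paper, two such recursions (one for $P_v^k$, one for $P_u^k$) are coupled through the gains $K_\gamma^k$ and $K_p^k$, so each backward step solves for both gains before updating both matrices.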

Conclusions
This paper has studied the Pareto optimal strategy under $H_\infty$ constraint for finite horizon discrete-time stochastic systems, where the SBRL with nonzero initial value has been obtained. By means of four coupled GDREs, the necessary and sufficient conditions for the existence of the Pareto optimal strategy under worst-case disturbance have been given. The Pareto solutions of each player corresponding to the optimal strategy have been studied. Simulation results of a numerical example have shown the effectiveness of the main results. In the future, we can extend the obtained Pareto optimal strategy under $H_\infty$ constraint to the infinite horizon case [13] or apply it to mean-field stochastic systems [23] and time-delay systems [33]. We can also explore model-free Pareto optimal control under $H_\infty$ constraint through reinforcement learning methods [34].