Cooperative Stochastic Games with Mean-Variance Preferences

Abstract: In stochastic games, a player's payoff is a random variable. Most papers take the expected payoff as the payoff, which amounts to assuming that the players are risk neutral. However, there may exist risk-sensitive players who take risk into account when evaluating their stochastic payoffs. In this paper, we propose a model of stochastic games with mean-variance payoff functions, in which a player's payoff is the sum of the expectation and the standard deviation multiplied by a coefficient characterizing the player's attitude to risk. We construct a cooperative version of a stochastic game with mean-variance preferences by defining the characteristic function using a maxmin approach. The imputation in a cooperative stochastic game with mean-variance preferences is assumed to be a random vector. We construct the core of a cooperative stochastic game with mean-variance preferences. The paper extends existing models of discrete-time stochastic games and approaches to finding cooperative solutions in these games.


Introduction
The class of stochastic games was introduced by L. Shapley [1]. He considered two-player zero-sum stochastic games with a finite state space and finite strategy spaces and proved the existence of optimal stationary strategies when a player's payoff is the expected discounted payoff. This result was extended to n-player games by Fink [2], Takahashi [3], and Sobel [4]. The results on stochastic games obtained over the past 60 years, mostly for the non-cooperative case, are discussed by Neyman and Sorin [5], Solan [6], and Solan and Vieille [7].
Players' payoffs in stochastic games are random variables. In most papers, the expected discounted payoff is taken as the measure of a player's payoff in a stochastic game. In this case, the players are assumed to be risk neutral. Consequently, risk-averse and risk-loving attitudes cannot be modeled using expectation alone as a criterion. We propose a model of a stochastic game in which the players' attitudes to risk are taken into account. We use a mean-variance preference, which is popular in decision-making theory. In particular, this type of preference for random payoffs is used in financial risk management and in the portfolio decision theory proposed by Markowitz [8]. The risk allocation problem in portfolio management based on a cooperative game theory approach is also examined by Csóka et al. [9].
We consider n-person stochastic games with finite sets of actions in any state of the game. We propose a mean-variance stochastic game in discrete time in which the payoff function is a linear combination of the expectation and the standard deviation of the payoff. The goal of the paper is to construct the cooperative version of a mean-variance stochastic game and to define the core as a cooperative solution of such a game. We first define the players' payoffs according to mean-variance preferences in a non-cooperative setting, and then define a cooperative game in the form of a characteristic function. Then, we determine a solution of a cooperative stochastic game using the method proposed by Suijs and Borm [10] and Suijs et al. [11]. The contribution of the paper is a theory of cooperative stochastic games with mean-variance preferences. The research question about the stability of the cooperative solution over time is open and is briefly discussed in Section 5.
First, we consider a non-cooperative stochastic game with mean-variance utility functions. Second, we define the cooperative version of the game by constructing the characteristic function using the maxmin approach (see von Neumann and Morgenstern [12]). This approach assumes that the value of the characteristic function for any coalition is the maxmin payoff in the zero-sum stochastic game between the coalition and the anti-coalition. As the players may have different coefficients on the standard deviation in their utility functions, summing the utilities does not make sense. When a coalition is formed, the maximal utility of the coalition payoff over the coalition members is used to measure the utility of the coalition payoff. Other approaches to defining the characteristic function in dynamic games are proposed by Gromova and Petrosyan [13] and Petrosjan and Zaccour [14] and examined by Reddy and Zaccour [15]. However, applying these approaches requires adjustments that account for the stochastic nature of payoffs and the mean-variance utility functions. Cooperative stochastic games with expected payoff as a utility function were introduced by Petrosjan [16] and Petrosjan and Baranova [17], and then extended by Parilina [18] and Parilina and Tampieri [19], who examine the stability of cooperative solutions in stochastic dynamic settings. As proposed by Suijs et al. [11], we consider the core as a solution of a cooperative game. Conditions for node-consistency of the core in dynamic games played over event trees are obtained by Parilina and Zaccour [20]. Recently, sufficient conditions for strong subgame consistency of the core were obtained by Parilina and Petrosyan [21].
Then, we define a solution of a cooperative stochastic game. The idea is briefly described below. Suijs et al. introduce a new class of cooperative games arising from cooperative decision making problems in the stochastic environment [10,11]. They introduce some types of preferences to order stochastic payoffs and model the games with risk neutral, as well as risk averse and risk loving players. Suijs et al. in [11] define the core of the cooperative game with stochastic payoffs and obtain the necessary and sufficient conditions for the non-emptiness of the core. The deterministic equivalent of a stochastic payoff with some properties is introduced by Suijs and Borm [10]. The nucleolus for this type of games is introduced by Suijs et al. [11]. The Shapley-like solutions for the cooperative game with stochastic payoffs are proposed by Timmer et al. [22]. The model of transferable utility game with uncertainty is introduced by Habis and Herings [23]. They propose a solution concept of a weak sequential core for this class of games and show that it is non-empty if all ex post TU-games are convex.
The goal of the paper is to propose a scheme for constructing a cooperative version of a stochastic game in which not only the expectation but also the standard deviation of players' payoffs is taken into account. The idea of considering risk-sensitive players has recently appeared in different variations in papers on dynamic and differential games. Risk-sensitive nonzero-sum continuous-time stochastic games are considered by Wei [24], where the existence of a Nash equilibrium is proved. The risk-sensitive average payoff criterion is studied for zero-sum games (see Bäuerle and Rieder [25]) and for nonzero-sum discrete-time stochastic games with a denumerable state space (see Wei and Chen [26]), where the existence of a Nash equilibrium is examined. Risk-sensitive Nash equilibria in stochastic games are discussed by Nowak [27]. The class of risk-sensitive stochastic games with overlapping generations is considered by Jaśkiewicz and Nowak [28]. Recently, Nash equilibria for stochastic games of resource extraction in which a risk coefficient is included in the utility function were found by Asienkiewicz and Balbus [29]. In mean-field games, a model with both expectation and variance in the payoff criteria is proposed by Saldi et al. [30].
The paper is organized as follows. The class of mean-variance stochastic games in discrete time is introduced in Section 2. The propositions providing formulae to calculate a variance and covariance of players' payoffs in stochastic games are given in Section 3. In Section 4, we define a cooperative mean-variance stochastic game and an imputation as a cooperative solution of the game. We briefly conclude in Section 5. The proofs of all propositions are given in the appendix.

Stochastic Game
Define a finite stochastic game $G$ using the following notation. The finite set of players is $N = \{1, \dots, n\}$, and the finite set of states is $S = \{1, 2, \dots, t\}$. In any state $s \in S$, an $n$-person normal-form game $\langle N, \{A^s_i\}_{i \in N}, \{K^s_i\}_{i \in N} \rangle$ is defined, where $A^s_i$ is the finite set of Player $i$'s actions in state $s$, and $K^s_i(a^s_1, \dots, a^s_n) = K^s_i(a^s) \in \mathbb{R}$ is the payoff function of Player $i$ in state $s$, $i \in N$, $s \in S$. The action profile $a^s = (a^s_1, \dots, a^s_n)$ belongs to the set $A^s = \prod_{i \in N} A^s_i$. The transition map $\pi$ is a function $T \to \Delta(S)$, where $T = \{(s, a^s) \mid s \in S;\ a^s \in A^s\}$ and $\Delta(S)$ is the set of probability distributions on $S$. For each pair $(s, a^s)$, the mapping $\pi$ identifies the vector of probabilities $(\pi(1 \mid s; a^s), \dots, \pi(s' \mid s; a^s), \dots, \pi(t \mid s; a^s))$, where $\pi(s' \mid s; a^s)$ is the probability that state $s'$ is realized at the next stage if action profile $a^s$ has been realized at the present stage (state $s$), $\pi(s' \mid s; a^s) \geq 0$, and $\sum_{s' \in S} \pi(s' \mid s; a^s) = 1$ for any $a^s \in A^s$ and $s \in S$.
Let $p^0 = (p^0_1, \dots, p^0_t)$ be the vector of the initial distribution on $S$, where $p^0_s$ is the probability that state $s$ is realized at the first stage of the game, $p^0_s \geq 0$ for any $s \in S$, and $\sum_{s \in S} p^0_s = 1$. Let all players have the same discount factor $\delta \in (0, 1)$ for their payoffs.
Definition 1. The stochastic game $G$ is defined by the tuple
$$\langle N, S, \pi, p^0, \delta \rangle. \quad (1)$$
Definition 2. A stochastic subgame $G^s$ is a finite stochastic game (1) with initial distribution vector $p^0$ such that $p^0_s = 1$. Subgame $G^s$ is a stochastic game starting from state $s \in S$.
A stochastic game is played as follows. First, chance moves and chooses the initial state of the game according to vector $p^0$. Let the initial state be $s \in S$. Second, the first stage of game $G$ is realized, i.e., the normal-form game $\langle N, \{A^s_i\}_{i \in N}, \{K^s_i\}_{i \in N} \rangle$ is played. Players simultaneously choose their actions from the action sets $\{A^s_i\}_{i \in N}$. Let Player $i$'s action in state $s$ be $a^s_i \in A^s_i$, $s \in S$. Therefore, an action profile $a^s = (a^s_1, \dots, a^s_n)$ is realized. Then, the transition to the next stage takes place. The probability that state $s'$ appears at the next stage is equal to $\pi(s' \mid s; a^s)$, $s' \in S$. After that, the game proceeds in the described way.
Assume that players know all the parameters of the game. We consider the class of stationary strategies. Let Player $i$'s strategy in game $G$ be a mapping $\eta_i(\cdot)$ such that $\eta_i(s) = a^s_i \in A^s_i$ for any $s \in S$. Denote the set of stationary strategies of Player $i$ as $H_i$. A stationary strategy defines an action at any stage depending only on which state is realized at that stage, i.e., $\eta_i : s \mapsto a^s_i \in A^s_i$, $s \in S$, and not on the stage number or the history of the game. A vector of stationary strategies is a strategy profile $\eta = (\eta_1, \dots, \eta_n) \in H_1 \times \dots \times H_n$ in stochastic game $G$. Obviously, $\eta$ is also a strategy profile in any subgame $G^s$.
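A stationary strategy profile fixes one action profile per state, so it induces a Markov chain on the state set together with a stage-payoff vector for each player. The following minimal sketch illustrates this for two players. The game data, the names `TRANS`, `PAYOFF` and `induced_chain`, and the zero-based indexing of states and actions are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical two-player, two-state game; none of these numbers come from the paper.
# TRANS[s][a1][a2] is the distribution over next states after profile (a1, a2) in state s.
TRANS = [
    [[[0.7, 0.3], [0.4, 0.6]],
     [[0.5, 0.5], [0.2, 0.8]]],
    [[[0.1, 0.9], [0.6, 0.4]],
     [[0.3, 0.7], [0.9, 0.1]]],
]
# PAYOFF[s][a1][a2] = (stage payoff of Player 1, stage payoff of Player 2).
PAYOFF = [
    [[(5.0, 3.0), (1.0, 6.0)],
     [(4.0, 2.0), (2.0, 2.0)]],
    [[(0.0, 4.0), (3.0, 1.0)],
     [(6.0, 0.0), (2.0, 5.0)]],
]


def induced_chain(trans, payoff, eta):
    """Return (P, K) induced by a stationary profile eta of two players.

    eta[i][s] is the action index chosen by player i in state s.
    P[s] is the row of transition probabilities out of state s.
    K[i][s] is the stage payoff of player i in state s.
    """
    P, K = [], [[] for _ in eta]
    for s in range(len(trans)):
        a1, a2 = eta[0][s], eta[1][s]
        P.append(list(trans[s][a1][a2]))
        for i in range(len(eta)):
            K[i].append(payoff[s][a1][a2][i])
    return P, K


# Player 1 plays action 0 in state 0 and action 1 in state 1; Player 2 does the reverse.
eta = ((0, 1), (1, 0))
P, K = induced_chain(TRANS, PAYOFF, eta)
```

Because the strategies are stationary, the pair `(P, K)` is all that is needed to evaluate the distribution of the players' total payoffs under `eta`.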

Mean-Variance Payoff Functions
Players' payoffs in stochastic games are random variables, and they are functions of the stationary strategy profile $\eta$. Denote Player $i$'s payoff in stochastic game $G$ as $\xi_i = \xi_i(\eta)$ and Player $i$'s payoff in stochastic subgame $G^s$ as $\xi^s_i = \xi^s_i(\eta)$ for any $s \in S$. The random variables $\xi_i$ and $\bar\xi_i = (\xi^1_i, \dots, \xi^t_i)$ are linked by the initial distribution $p^0$: the payoff $\xi_i$ coincides with $\xi^s_i$ with probability $p^0_s$, $s \in S$.
To find a player's "optimal" strategy, it is necessary to determine how to compare random payoffs. The most common way is to measure a random payoff by its mathematical expectation, assuming the risk neutrality of the players. Strictly speaking, $\xi_i$ is preferred to $\psi_i$, written $\xi_i \succsim_E \psi_i$, if and only if $E(\xi_i) \geq E(\psi_i)$, where $E(\xi)$ is the mathematical expectation of random variable $\xi$. This preference is complete and transitive, but it does not allow one to model different attitudes to risk.
Example 1. One may compare random payoffs using the principle of stochastic dominance. Let $\xi_i, \psi_i \in L^1(\mathbb{R})$ be random payoffs of Player $i$, and let $F_{\xi_i}(\cdot)$ and $F_{\psi_i}(\cdot)$ be the cumulative distribution functions of $\xi_i$ and $\psi_i$, respectively. It is said that $\xi_i$ stochastically dominates $\psi_i$, written $\xi_i \succsim_F \psi_i$, if the inequality $F_{\xi_i}(x) \leq F_{\psi_i}(x)$ holds for any $x \in \mathbb{R}$. This preference is incomplete; therefore, some random payoffs are incomparable under it. Another disadvantage of this preference is its computational complexity for the class of stochastic games. Thus, its application to the theory of stochastic games is questionable.
We use the method of ordering random payoffs proposed in [10,11]. Let every player $i$ have a utility function $u_i$ of a random payoff such that $u_i : L^1(\mathbb{R}) \to \mathbb{R}$. In [10], the preferences $(\succsim_i)_{i \in N}$ are considered for cooperative games with stochastic payoffs, and they are such that for any player $i \in N$ there exists a utility function $u_i : L^1(\mathbb{R}) \to \mathbb{R}$ satisfying five conditions, referred to below as properties 1-5. The expression $\xi \sim_i u_i(\xi)$ means that the random payoff $\xi$ is equivalent to $u_i(\xi)$, and $\xi \succsim_i \psi$ means that random payoff $\xi$ is preferred or equivalent to random payoff $\psi$ for Player $i$. In [10], the function $u_i(\cdot)$ is called a deterministic equivalent of the preference. In our work, we call $u_i(\xi)$ the utility function of a random payoff $\xi$. Obviously, the utility function $u_i(\xi_i) = E(\xi_i)$ satisfies properties 1-5.

Example 2.
One may consider a preference $\succsim_\alpha$ defined through a utility function of the random payoff $\xi$. This utility function also satisfies properties 1-5.
Let Player $i$'s utility function of random payoff $\xi_i$ be
$$u_i(\xi_i) = E(\xi_i) + b_i \sqrt{\mathrm{Var}(\xi_i)}, \quad (2)$$
where $\mathrm{Var}(\xi_i)$ is the variance of $\xi_i$, and parameter $b_i \in \mathbb{R}$ reflects the player's attitude to risk. If $b_i = 0$, Player $i$ is risk neutral. If $b_i > 0$, Player $i$ is risk loving, and if $b_i < 0$, Player $i$ is risk averse. Denote the preference corresponding to utility function (2) as $\succsim_{b_i}$. The utility function (2) satisfies properties 1-5. Preference $\succsim_{b_i}$ is complete but not implied by $\succsim_F$. The preference $\succsim_{b_i}$ is attractive for applications because of its computational simplicity and the possibility of modeling attitudes to risk. If $b_i = 0$ for any $i \in N$, utility function (2) becomes $u_i(\xi_i) = E(\xi_i)$, which corresponds to preference $\succsim_E$.
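To illustrate how the risk parameter changes the ranking of a random payoff, the sketch below evaluates the mean-variance utility, i.e., the expectation plus $b_i$ times the standard deviation, for a finite discrete payoff. The lottery and the function name `mean_variance_utility` are hypothetical and serve only as an illustration.

```python
import math


def mean_variance_utility(outcomes, probs, b):
    """Return E(xi) + b * sqrt(Var(xi)) for a payoff with finitely many outcomes."""
    mean = sum(p * x for x, p in zip(outcomes, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))
    return mean + b * math.sqrt(var)


# Hypothetical lottery: payoff 0 or 10 with equal probability (mean 5, standard deviation 5).
outcomes, probs = [0.0, 10.0], [0.5, 0.5]
risk_neutral = mean_variance_utility(outcomes, probs, 0.0)   # b = 0
risk_averse = mean_variance_utility(outcomes, probs, -1.0)   # b < 0
risk_loving = mean_variance_utility(outcomes, probs, 1.0)    # b > 0
```

For this lottery, the risk-neutral player values the payoff at its mean, while the risk-averse and risk-loving players shift that value down or up by one standard deviation.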

Stochastic Games with Mean-Variance Preferences
We define a stochastic game in which any player $i \in N$ has a utility function $u_i : L^1(\mathbb{R}) \to \mathbb{R}$ of form (2).
Definition 3. A stochastic game $G_b$ with mean-variance preferences is defined by a tuple (3) that combines the components of stochastic game (1) with $b = (b_1, \dots, b_n) \in \mathbb{R}^n$, a profile of parameters determining players' utility functions (2) for any $i \in N$.
By $G^s_b$, $s \in S$, we denote the subgame of stochastic game (3) with $p^0 = (p^0_1, \dots, p^0_t)$ such that $p^0_s = 1$. In the next section, we present the formulae to calculate the utilities of random payoffs in stochastic game (3) and any subgame $G^s_b$. We also consider the case when players form a coalition and calculate its total stochastic payoff.

Remark 1.
All the eigenvalues of the stochastic matrix $\Pi(\eta)$ have absolute value at most 1. For the existence of the matrix $\tilde\Pi(\eta) = (I - \delta\Pi(\eta))^{-1}$, it is necessary and sufficient that the determinant of the matrix $\Pi(\eta) - I/\delta$ be nonzero, since $I - \delta\Pi(\eta) = -\delta(\Pi(\eta) - I/\delta)$. Therefore, the matrix $\Pi(\eta)$ must not have an eigenvalue equal to $1/\delta$. This condition is satisfied because $1/\delta > 1$, so $1/\delta$ cannot be an eigenvalue of the stochastic matrix $\Pi(\eta)$.
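The expectation of Player $i$'s payoff in stochastic game $G$ equals
$$E(\xi_i(\eta)) = p^0 \tilde\Pi(\eta) K_i(\eta), \quad (6)$$
where $\tilde\Pi(\eta) = (I - \delta\Pi(\eta))^{-1}$ and $K_i(\eta)$ is the column vector of Player $i$'s stage payoffs in states $1, \dots, t$ under strategy profile $\eta$. The variance $\mathrm{Var}(\xi_i(\eta))$ of Player $i$'s payoff $\xi_i(\eta)$ in game $G$ can be found using the following proposition.
Proposition 1. Let Player $i$'s strategy in stochastic game $G$ or $G_b$ be $\eta_i \in H_i$, $i \in N$. The variance of Player $i$'s payoff is
$$\mathrm{Var}(\xi_i(\eta)) = 2\,(p^0 \tilde\Pi(\eta) \circ K_i(\eta))(\tilde\Pi(\eta) - 0.5I) K_i(\eta) - (p^0 \tilde\Pi(\eta) K_i(\eta))^2, \quad (7)$$
where the operation $A \circ B$ is the Hadamard product: for $m \times n$ matrices $A$ and $B$ with entries in $\mathbb{R}$, $(A \circ B)_{jk} = A_{jk} B_{jk}$.
For further calculations, in particular, for the construction of a cooperative form of stochastic game $G_b$ and subgames $G^s_b$, it is required to determine the random payoff of any coalition $C \subset N$, which is denoted as $\xi_C$ and equals
$$\xi_C(\eta) = \sum_{i \in C} \xi_i(\eta).$$
Note that players' payoffs $\xi^s_i$ and $\xi^s_j$ are not independent random variables for $i \neq j$, $i, j \in C$. The following proposition gives the formulae to calculate the expectation and variance of the random payoff $\xi_C(\eta)$ of any coalition $C$.
Proposition 2. The expectation of payoff $\xi_C(\eta)$ of coalition $C$ in stochastic game $G$ or $G_b$ is
$$E(\xi_C(\eta)) = \sum_{i \in C} E(\xi_i(\eta)). \quad (8)$$
The variance of payoff $\xi_C(\eta)$ of coalition $C$ in stochastic game $G$ or $G_b$ is
$$\mathrm{Var}(\xi_C(\eta)) = \sum_{i \in C} \mathrm{Var}(\xi_i(\eta)) + 2 \sum_{l < m,\ l, m \in C} \mathrm{Cov}(\xi_l(\eta), \xi_m(\eta)), \quad (9)$$
where the variance $\mathrm{Var}(\xi_i(\eta))$ of Player $i$'s payoff is defined by formula (7), and $\mathrm{Cov}(\xi_l(\eta), \xi_m(\eta))$ is the covariance of Players $l$ and $m$'s payoffs, which is equal to
$$\mathrm{Cov}(\xi_l(\eta), \xi_m(\eta)) = (p^0 \tilde\Pi(\eta) \circ K_l(\eta))(\tilde\Pi(\eta) - 0.5I) K_m(\eta) + (p^0 \tilde\Pi(\eta) \circ K_m(\eta))(\tilde\Pi(\eta) - 0.5I) K_l(\eta) - (p^0 \tilde\Pi(\eta) K_l(\eta))(p^0 \tilde\Pi(\eta) K_m(\eta)). \quad (10)$$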

Proof. See Appendix A.2.
Corollary 2. The variance of payoff $\xi^s_C(\eta)$ of coalition $C$ in subgame $G^s$ or $G^s_b$, $s \in S$, can be found by formula (9), substituting $p^0 = (p^0_1, \dots, p^0_t)$ such that $p^0_s = 1$ and $p^0_{s'} = 0$ for any $s' \in S$, $s' \neq s$.
Equation (6) and Proposition 1 provide formulae to calculate the utilities of players' payoffs in stochastic games with mean-variance preferences for any strategy profile η. Proposition 2 provides formulae to calculate the utilities of coalition payoffs in stochastic games with mean-variance preferences.
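The expectation and variance of a player's payoff under a fixed stationary profile can be computed with two linear solves. The sketch below is an illustration under stated assumptions, not the authors' implementation: it reads $\delta$ as the probability that the game continues after each stage (the reading suggested by the appendix, where the game "ends at stage $k$"), treats the payoff as the sum of stage payoffs until the game stops, and uses $(I - \delta\Pi)^{-1}$ as the resolvent matrix. The helper names and the numerical data are hypothetical.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def mean_and_variance(p0, P, K, delta):
    """Mean and variance of the total payoff for transition matrix P and stage payoffs K.

    m1 = (I - delta*P)^{-1} K holds the expected payoffs per initial state;
    m2 = (I - delta*P)^{-1} (K o (2*m1 - K)) holds the second moments per initial state.
    """
    n = len(K)
    A = [[(1.0 if i == j else 0.0) - delta * P[i][j] for j in range(n)] for i in range(n)]
    m1 = solve(A, K)
    m2 = solve(A, [K[s] * (2.0 * m1[s] - K[s]) for s in range(n)])
    mean = sum(p * x for p, x in zip(p0, m1))
    second = sum(p * x for p, x in zip(p0, m2))
    return mean, second - mean ** 2


def utility(p0, P, K, delta, b):
    """Mean-variance utility E + b * sqrt(Var) of the total payoff."""
    mean, var = mean_and_variance(p0, P, K, delta)
    return mean + b * math.sqrt(max(var, 0.0))


# Hypothetical data: two states, one fixed stationary profile.
p0 = [0.5, 0.5]
P = [[0.4, 0.6], [0.3, 0.7]]
K = [1.0, 6.0]
delta = 0.9
mean, var = mean_and_variance(p0, P, K, delta)
u_risk_averse = utility(p0, P, K, delta, -1.0)
```

Solving linear systems instead of forming the inverse explicitly keeps the sketch short. The second-moment vector follows from conditioning on the first stage, which gives the same expression as the Hadamard-product form $2(p^0\tilde\Pi \circ K)(\tilde\Pi - 0.5I)K$.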

Cooperative Stochastic Games with Mean-Variance Preferences
In this section, we construct a cooperative version of stochastic game $G_b$ and of subgame $G^s_b$ starting from state $s \in S$. To this end, we determine a characteristic function $v(\cdot) : 2^N \to \mathbb{R}$ satisfying two properties:
1. $v(\emptyset) = 0$;
2. (superadditivity) for any disjoint coalitions $C, C' \subset N$, $v(C \cup C') \geq v(C) + v(C')$.
The value $v(C)$ can be interpreted as the "worth" or "power" of coalition $C$ when its members play together as a unit. We use the maxmin approach to determine a superadditive characteristic function [12].
The characteristic function for subgame $G^s_b$ is denoted as $v^s(\cdot)$ and defined as
$$v^s(C) = \max_{\eta_C} \min_{\eta_{N \setminus C}} \max_{i \in C} u_i\bigl(\xi^s_C(\eta_C, \eta_{N \setminus C})\bigr) \quad (11)$$
for any coalition $C \subset N$. Here, $\eta_C \in \prod_{i \in C} H_i$ is a stationary strategy of coalition $C$ and $\eta_{N \setminus C} \in \prod_{i \in N \setminus C} H_i$ is a stationary strategy of coalition $N \setminus C$. Strategies $(\eta_C, \eta_{N \setminus C})$ form a strategy profile in any subgame $G^s_b$ and in game $G_b$. In Equation (11) and in further calculations, the utility function of any player $i$ is of form (2), as required in the definition of stochastic game $G_b$. The right-hand side of (11) is the maxmin of the maximal utility among the players from $C$. The maxmin value is the value that coalition $C$ can guarantee by strategy $\eta_C$ when coalition $N \setminus C$ plays against coalition $C$ using strategy $\eta_{N \setminus C}$.
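The characteristic function for game $G_b$ is denoted by $v(\cdot)$ and is defined for coalition $C$ by the following expression:
$$v(C) = \max_{\eta_C} \min_{\eta_{N \setminus C}} \max_{i \in C} u_i\bigl(\xi_C(\eta_C, \eta_{N \setminus C})\bigr). \quad (12)$$
This function is equal to the maximal value of the utility function of the random payoff $\xi_C(\eta_C, \eta_{N \setminus C})$ (among the players from coalition $C$) that coalition $C$ can guarantee by itself using stationary strategy $\eta_C \in \prod_{i \in C} H_i$ if coalition $N \setminus C$ plays against $C$ in a zero-sum game using strategy $\eta_{N \setminus C} \in \prod_{i \in N \setminus C} H_i$. Cooperation in stochastic game $G_b$ assumes that players form the grand coalition $N$ and play together to maximize the maximal utility of their total payoff $\xi_N$. We call the stationary strategy profile $\eta^*$ in game $G_b$ a cooperative one if
$$\max_{i \in N} u_i(\xi_N(\eta^*)) = \max_{\eta} \max_{i \in N} u_i(\xi_N(\eta)). \quad (13)$$
Therefore, the value of characteristic function $v(N)$ can be rewritten as
$$v(N) = \max_{i \in N} u_i(\xi_N(\eta^*)). \quad (14)$$
The value $v^s(N)$ of the characteristic function $v^s(\cdot)$ for the grand coalition in subgame $G^s_b$ is given by Equation (15).
Definition 4. A cooperative stochastic game with mean-variance preferences is a tuple $\langle N, v \rangle$ with characteristic function $v : 2^N \to \mathbb{R}$ defined by (12) and (14), and $v(\emptyset) = 0$.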

Definition 5.
A cooperative stochastic subgame with mean-variance preferences is a tuple $\langle N, v^s \rangle$ with characteristic function $v^s : 2^N \to \mathbb{R}$ defined by (11) and (15), and $v^s(\emptyset) = 0$.
In cooperation, players with mean-variance preferences realize a cooperative stationary strategy profile $\eta^*$ given by (13).
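For small games, the maxmin characteristic function can be computed by enumerating all stationary strategies. The sketch below does this for a two-player game and is only an illustration under assumptions: it interprets $\delta$ as the continuation probability, takes the coalition payoff as the sum of its members' stage payoffs until the game stops, and measures coalition utility by the largest member utility $E + b_i\sqrt{\mathrm{Var}}$. The payoff and transition data and all function names are hypothetical; only $\delta = 0.9$, $p^0 = (0.5, 0.5)$ and $b = (-2, -1)$ follow the parameters stated for the numerical example.

```python
import math
from itertools import product


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def mean_var(p0, P, K, delta):
    """Mean and variance of the total payoff with stage payoffs K and transitions P."""
    n = len(K)
    A = [[(1.0 if i == j else 0.0) - delta * P[i][j] for j in range(n)] for i in range(n)]
    m1 = solve(A, K)
    m2 = solve(A, [K[s] * (2.0 * m1[s] - K[s]) for s in range(n)])
    mean = sum(p * x for p, x in zip(p0, m1))
    return mean, sum(p * x for p, x in zip(p0, m2)) - mean ** 2


def coalition_utility(game, eta, C):
    """Largest member utility of the coalition payoff xi_C under profile eta."""
    trans, payoff, p0, delta, b = game
    states = range(len(trans))
    P = [trans[s][eta[0][s]][eta[1][s]] for s in states]
    K = [sum(payoff[s][eta[0][s]][eta[1][s]][i] for i in C) for s in states]
    mean, var = mean_var(p0, P, K, delta)
    sd = math.sqrt(max(var, 0.0))
    return max(mean + b[i] * sd for i in C)


def strategies(game, i):
    """All stationary strategies of player i: one action index per state."""
    trans = game[0]
    counts = [len(t) if i == 0 else len(t[0]) for t in trans]
    return list(product(*[range(c) for c in counts]))


def char_value(game, C):
    """Maxmin value of coalition C, given as a tuple of player indices."""
    H0, H1 = strategies(game, 0), strategies(game, 1)
    if len(C) == 2:  # grand coalition: plain maximization over all profiles
        return max(coalition_utility(game, (e0, e1), C) for e0 in H0 for e1 in H1)
    if C == (0,):
        return max(min(coalition_utility(game, (e0, e1), C) for e1 in H1) for e0 in H0)
    return max(min(coalition_utility(game, (e0, e1), C) for e0 in H0) for e1 in H1)


# Hypothetical game data: trans[s][a1][a2] is a next-state distribution,
# payoff[s][a1][a2] is the pair of stage payoffs.
trans = [[[[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.2, 0.8]]],
         [[[0.1, 0.9], [0.6, 0.4]], [[0.3, 0.7], [0.9, 0.1]]]]
payoff = [[[(5.0, 3.0), (1.0, 6.0)], [(4.0, 2.0), (2.0, 2.0)]],
          [[(0.0, 4.0), (3.0, 1.0)], [(6.0, 0.0), (2.0, 5.0)]]]
game = (trans, payoff, [0.5, 0.5], 0.9, (-2.0, -1.0))
v = {C: char_value(game, C) for C in [(0,), (1,), (0, 1)]}
```

Enumeration grows exponentially with the number of states and actions, so this brute-force approach is meant only for small illustrative games.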
The main problem of cooperative game theory is to find a "fair" imputation of coalition N's total payoff. In a classical cooperative game theory, values of characteristic function are non-random variables and there is a variety of solution concepts to determine, in some sense, "fair" imputation.
The concept of a random payoff imputation is proposed in [11], where the components of the imputation are assumed to be random variables. We give a definition of an imputation of the random payoff $\xi_N(\eta^*)$.
According to Definition 6, an imputation of the payoff $\xi_N(\eta^*)$ of coalition $N$ in game $G_b$ is a random vector characterized by an allocation of the expected payoff $E(\xi_N(\eta^*))$ and an allocation of the random variable $\xi_N(\eta^*) - E(\xi_N(\eta^*))$, which can be interpreted as the risk of random payoff $\xi_N(\eta^*)$. Denote the set of imputations in cooperative game $G_b$ as $\Phi(N)$.
We use the statement from [11] about necessary and sufficient conditions that an imputation belongs to the core of cooperative stochastic game G b with mean-variance preferences as a definition of the core.
holds for any coalition C ⊂ N.
To illustrate the idea of defining the core in a cooperative stochastic game with mean-variance preferences, we provide a numerical example demonstrating the calculations step by step. Transition probabilities are defined in two matrices, where element $(i, j)$ of the first (second) matrix consists of the transition probabilities from state 1 (2) to states 1 and 2, respectively, given that Player 1 chooses action $i$ and Player 2 chooses action $j$ in state 1 (2). Let the discount factor $\delta$ be 0.9 and the vector of the initial distribution $p^0$ on the state set $\{1, 2\}$ be $(0.5, 0.5)$. Suppose that both players are risk averse, with $b_1 = -2$ and $b_2 = -1$.
Using condition (13), we obtain $\eta^* = (\eta_1, \eta_2)$, where $\eta_1 = (0, 0)$, $\eta_2 = (1, 1)$, i.e., Player 1 chooses action 2 in both states, and Player 2 chooses action 1 in both states. The maximal value of the maximal utility function is $v(N) = \max_{i \in N} u_i(\xi_N(\eta^*)) = 94.1709$. Therefore, the imputation defined by $(d, r)$ is in the core of game $G_b$ according to Definition 7 if and only if the core inequality holds for any $C \subset N$, in particular for $C = \{1\}$ and $C = \{2\}$. Together with the conditions that follow from Definition 6 of the imputation, these inequalities form system (16), which defines the core of the cooperative stochastic game with mean-variance preferences. The region of $(r_1, d_1)$, $r_1 \in [0, 1]$, satisfying the first inequality of this system is depicted in Figure 1. The core is non-empty and contains an infinite number of imputations. An imputation from the core is determined by $d_1, d_2, r_1, r_2 \in \mathbb{R}$ satisfying (16).

Conclusions
We have constructed a model of cooperative stochastic games with mean-variance preferences. The utility function for a mean-variance preference takes into account a player's attitude to risk. The risk sensitivity of a player is modeled by a linear combination of the mathematical expectation and the standard deviation of the payoff. We have provided formulae to calculate players' and coalitions' utilities for any stationary strategy profile. We have defined a cooperative stochastic game with mean-variance preferences in the form of a characteristic function determined by a maxmin approach. A particular cooperative solution proposed for the game is the core. A natural research question arising in the area of dynamic games is the sustainability and time-consistency of the cooperative solution. The problem of stability of a cooperative agreement in stochastic games with mean preferences is discussed by Parilina and Tampieri [19], Parilina [18], and Parilina and Petrosyan [21], where conditions of stable cooperation are examined. Another research question is how to determine other cooperative solutions, such as the Shapley value or the nucleolus, for stochastic games with mean-variance preferences and how to sustain them over time.

Appendix A.1. Proof of Proposition 1
The second moment $E(\xi_i^2)$ of Player $i$'s payoff is expressed through the values $W_k$ (Equation (A1)), where $W_k$ is the expectation of the squared payoff of the $i$th player in stochastic game $G$ under the condition that the game ends at stage $k$. We calculate $W_k$ recurrently and establish its general form by mathematical induction. It is obvious that $W_1 = p^0 (K_i \circ K_i)$. The values $W_2$ and $W_3$ are obtained by direct computation, in which $\Pi_j$ denotes the $j$th row of matrix $\Pi$. Suppose that the general formula (A2) for $W_k$ is true for any $k = 1, 2, \dots, n$; we prove it for $k = n + 1$. Expressing $W_{n+1}$ through $W_n$ and taking into account Equation (A2) for $k = n$, which is true by the induction assumption, we obtain the expression for $W_{n+1}$, which completes the induction.
Taking into account Equation (A1) and the expression for $W_k$, we obtain $E(\xi_i^2)$ as the sum of two series. Computing the two sums separately and combining the results, we obtain expression (A3) for $E(\xi_i^2)$. Since $p^0 \tilde\Pi (K_i \circ K_i)$ is equal to $(p^0 \tilde\Pi \circ K_i) K_i$ by the property of the Hadamard product, where $\tilde\Pi = (I - \delta\Pi)^{-1}$, Equation (A3) can be rewritten as follows:
$$E(\xi_i^2) = 2\,(p^0 \tilde\Pi \circ K_i)(\tilde\Pi - 0.5I) K_i.$$
Subtracting $(E(\xi_i))^2 = (p^0 \tilde\Pi K_i)^2$, we obtain the final formula for the variance of Player $i$'s payoff:
$$\mathrm{Var}(\xi_i) = 2\,(p^0 \tilde\Pi \circ K_i)(\tilde\Pi - 0.5I) K_i - (p^0 \tilde\Pi K_i)^2.$$

Appendix A.2. Proof of Proposition 2
As in Appendix A.1, we omit the strategy profile in the notation. So, we have $K^s_i = K^s_i(a^s)$, $E(\xi_i) = E(\xi_i(\eta))$, $\Pi = \Pi(\eta)$, $\tilde\Pi = \tilde\Pi(\eta)$, $\xi = \xi(\eta)$, $\pi(l \mid m; a^m) = \pi(l \mid m)$. Equation (8) holds because the expectation of payoff $\xi_C$ is equal to the sum of the expectations of the payoffs $\xi_i$ of the players from coalition $C$, and it follows from (5).
To prove Equation (9), we take into account that $\xi_C = \sum_{i \in C} \xi_i$ and use the following formula for the variance of coalition $C$'s payoff in stochastic game $G$:
$$\mathrm{Var}(\xi_C) = \sum_{i \in C} \mathrm{Var}(\xi_i) + 2 \sum_{l < m,\ l, m \in C} \mathrm{Cov}(\xi_l, \xi_m). \quad (A4)$$
The variance of $\xi_i$ can be calculated by Equation (7), so we need a formula for the covariance of the $l$th and $m$th players' payoffs. The covariance $\mathrm{Cov}(\xi_l, \xi_m)$ can be derived as follows: $\mathrm{Cov}(\xi_l, \xi_m) = E(\xi_l \xi_m) - E(\xi_l) E(\xi_m)$.
The way of determining the covariance $\mathrm{Cov}(\xi_l, \xi_m)$ is similar to the way of determining $E(\xi_i^2)$ in Appendix A.1, so we do not perform all computations in detail. The expectation of the product of Players $l$ and $m$'s payoffs is expressed through the values $W_k$ (Equation (A5)), where $W_k$ is the expectation of the product of the payoffs under the condition that the game ends at stage $k$. It is obvious that $W_1 = p^0 (K_l \circ K_m)$.
The expression for $W_2$ and the recurrent equation for $W_k$ are obtained by computations similar to those in the previous proof. Substituting them into (A5) gives Equation (A6), and simplifying its summands yields Equation (A7). Since $p^0 \tilde\Pi (K_l \circ K_m)$ is equal to $(p^0 \tilde\Pi \circ K_m) K_l$ and to $(p^0 \tilde\Pi \circ K_l) K_m$ by the property of the Hadamard product, Equation (A7) can be rewritten as follows:
$$E(\xi_l \xi_m) = (p^0 \tilde\Pi \circ K_l)(\tilde\Pi - 0.5I) K_m + (p^0 \tilde\Pi \circ K_m)(\tilde\Pi - 0.5I) K_l. \quad (A8)$$
Following Equations (7), (A4) and (A8), the covariance of the payoffs of Players $l$ and $m$ in stochastic game $G$ can be found by the formula
$$\mathrm{Cov}(\xi_l, \xi_m) = (p^0 \tilde\Pi \circ K_l)(\tilde\Pi - 0.5I) K_m + (p^0 \tilde\Pi \circ K_m)(\tilde\Pi - 0.5I) K_l - (p^0 \tilde\Pi K_l)(p^0 \tilde\Pi K_m).$$
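The covariance formula can be checked numerically against the variance decomposition of a two-player coalition. The sketch below is an illustration under assumptions: it takes every accented matrix in the formula to be the resolvent $(I - \delta\Pi)^{-1}$, evaluates the Hadamard-product terms as entrywise products followed by a linear solve, and uses hypothetical data and function names.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def covariance(p0, P, Kl, Km, delta):
    """Cov(xi_l, xi_m) = E(xi_l * xi_m) - E(xi_l) * E(xi_m).

    With R = (I - delta*P)^{-1}, the cross moment equals
    p0 R (Kl o R Km + Km o R Kl - Kl o Km),
    which is the expanded form of the two Hadamard-product terms.
    """
    n = len(Kl)
    A = [[(1.0 if i == j else 0.0) - delta * P[i][j] for j in range(n)] for i in range(n)]
    ml, mm = solve(A, Kl), solve(A, Km)          # R Kl and R Km
    rhs = [Kl[s] * mm[s] + Km[s] * ml[s] - Kl[s] * Km[s] for s in range(n)]
    cross = sum(p * x for p, x in zip(p0, solve(A, rhs)))
    mean_l = sum(p * x for p, x in zip(p0, ml))
    mean_m = sum(p * x for p, x in zip(p0, mm))
    return cross - mean_l * mean_m


# Hypothetical data: two states, stage payoffs of two players under a fixed profile.
p0 = [0.5, 0.5]
P = [[0.4, 0.6], [0.3, 0.7]]
K1, K2 = [1.0, 6.0], [6.0, 0.0]
delta = 0.9

cov_12 = covariance(p0, P, K1, K2, delta)
var_1 = covariance(p0, P, K1, K1, delta)   # variance is the covariance of a payoff with itself
var_2 = covariance(p0, P, K2, K2, delta)
K_sum = [a + b for a, b in zip(K1, K2)]
var_coalition = covariance(p0, P, K_sum, K_sum, delta)
```

With these values, `var_coalition` should agree with `var_1 + var_2 + 2 * cov_12` up to rounding, which is the two-player case of the coalition variance formula.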