On a Simplified Method of Defining Characteristic Function in Stochastic Games

Abstract: In this paper, we propose a new method of constructing a cooperative stochastic game in the form of a characteristic function when a non-cooperative stochastic game is initially given. The set of states and the set of actions of every player are finite. The construction of the characteristic function is based on calculating the maxmin values of zero-sum games between a coalition and its anti-coalition in each state of the game. The proposed characteristic function has some advantages over previously defined characteristic functions for stochastic games: in particular, it is simple to compute, and the core calculated with its values is strongly subgame-consistent.


Introduction
When a non-cooperative game is initially defined and the players start acting as a single coalition to maximize their joint payoff (or minimize their joint costs), the problem of constructing a cooperative version of the game arises. The classical approach is to define the cooperative game in the form of a characteristic function that assigns a value to every coalition of players. Based on this function, one can then calculate an imputation, i.e., an allocation of the joint payoff among the players. The components of an imputation may differ when they are calculated from different characteristic functions. Therefore, the way of defining this function matters, because it influences the players' payoffs in the cooperative game. Moreover, some approaches to defining the characteristic function are hard to apply in dynamic or differential games because of computational difficulties. Additionally, the way the characteristic function is constructed also influences the consistency properties of cooperative solutions realized over time.
The choice of approach to defining the characteristic function also depends on the background of the problem if it comes from an applied area. Existence and uniqueness issues are also relevant when choosing how to construct the characteristic function. Several approaches can be applied to stochastic games. The so-called maxmin and minmax approaches define the value of the function for coalition S as the maxmin and minmax payoff, respectively, of coalition S in the zero-sum game against the coalition of all left-out players [1,2]. Another approach, proposed in [3,4], defines the value of coalition S as its payoff in the Nash equilibrium of the non-cooperative game between coalition S and the left-out players acting individually. A two-step procedure for calculating the characteristic function is proposed in [5]: the authors first find a non-cooperative n-player equilibrium and then let coalition S optimize its payoff, assuming that the left-out players use the equilibrium actions found at the first step. The properties of this function are examined in [6,7]. Another two-stage approach is proposed in [8]: first, the strategies maximizing the total payoff of the players are found. Subsequently, the players from coalition S use these strategies, while the players outside the coalition use the strategies minimizing the total payoff of the players from S. The joint payoff of the players from the coalition is the value of the characteristic function for this coalition.
A new simplified method of constructing the characteristic function in multistage games is introduced in [9]. The authors examine the properties of this function and prove that the corresponding core is strongly subgame-consistent in the multistage game. In general, this property cannot be proved when the characteristic function is constructed with classical approaches, such as maxmin or minmax.
In this paper, we adapt the method of constructing the characteristic function proposed in [9] to stochastic games. Based on the values of the characteristic function, one can determine the core. Moreover, the core satisfies the strong subgame consistency property, which refines subgame consistency for set-valued cooperative solutions. The problem of subgame consistency was originally examined for differential games in [10,11]. The construction of a special payment scheme, called the imputation distribution procedure (see [11]), allows one to cope with the time inconsistency of cooperative solutions. This problem is studied for stochastic games in [12][13][14] in the case of single-valued cooperative solutions. A node-consistent core is constructed for dynamic games played over event trees in [15]. Strong subgame consistency of a set-valued cooperative solution, such as the core, guarantees that the players obtain, in total, a solution from the initially defined core. That is, at any intermediate time period, the sum of the payments received up to the current period and any core element of the subgame starting from the next period belongs to the initial core. The strong subgame consistency condition is proposed in [16]. A subcore satisfying the strong subgame consistency property is constructed for multistage games in [17]. The problem of subgame consistency is relevant to different classes of dynamic and differential games: it is examined in [18] for stochastic games with finite duration, in [19] for differential games with a finite time horizon, and in [20] for multistage games. In this paper, we construct the characteristic function for a stochastic game in a special way and calculate the core using the values of this function. This core satisfies the strong subgame consistency property. To prove this result, we define an imputation distribution procedure, which determines the payments to the players in every state realized in the game process.
The rest of the paper is organized as follows. We describe the model of stochastic games in Section 2.1. In Section 2.2, we define the new approximated characteristic function for state games and then extend this approach to stochastic games in Section 2.3. In Section 3, we formulate the definition of the imputation distribution procedure for stochastic games and describe the idea of strong subgame consistency of the core. We briefly conclude in Section 4.

Model
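Consider a non-cooperative stochastic game G given by

$$G = \langle N, \Omega, \{\Gamma(\omega)\}_{\omega\in\Omega}, p, \delta, \pi_0 \rangle, \qquad (1)$$

where
• N = {1, . . . , n} is the set of players.
• Ω = {ω_1, . . . , ω_k} is the finite set of states.
• Γ(ω) is the game in normal form associated with state ω. The set of players N is the same in every state ω. Let $A_i^\omega$ be the finite set of actions of player i ∈ N in state ω, $a_i^\omega \in A_i^\omega$ an action of player i in this state, and $K_i^\omega : \prod_{j\in N} A_j^\omega \to \mathbb{R}$ the payoff function of player i in state ω.
• $p(\cdot \mid \omega, a^\omega) \in \Delta(\Omega)$ is the probability distribution of the next state when the action profile $a^\omega \in A^\omega = \prod_{j\in N} A_j^\omega$ is realized in state ω, where Δ(Ω) denotes the set of probability distributions over Ω.
• δ ∈ (0, 1) is the discount factor.
• π_0 ∈ Δ(Ω) is the initial probability distribution over the set of states.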
Denote by $G^\omega$ the subgame of G starting from state ω, i.e., game (1) with the initial distribution π_0 such that $\pi_0^\omega = 1$ and $\pi_0^{\omega'} = 0$ for any state $\omega' \neq \omega$. We assume that in the stochastic game G the players use stationary strategies, and denote by $H_i$ the set of stationary strategies of player i. A stationary strategy $\eta_i \in H_i$ of player i assigns a (possibly mixed) action $\eta_i(\omega) \in \Delta(A_i^\omega)$ to every state ω. The vector $\eta = (\eta_1, \ldots, \eta_n) \in H = \prod_{j\in N} H_j$ is a stationary strategy profile in the stochastic game G. Clearly, a stationary strategy $\eta_i$ of player i ∈ N in game G is also a stationary strategy of this player in any subgame $G^\omega$.
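The data of the model can be held in a small container. The following is a minimal Python sketch under our own assumptions: the class name `StochasticGame`, its field names, and the dictionary-based encoding of payoffs and transitions are hypothetical, not part of the paper. It stores the ingredients of the game, enumerates the pure action profiles of a state, checks that every p(·|ω, a^ω) is a probability distribution over Ω, and builds the degenerate initial distribution of a subgame G^ω.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class StochasticGame:
    """Hypothetical container for the stochastic game; all field names are illustrative."""
    players: list     # N, e.g. [1, 2, 3]
    states: list      # Omega, e.g. ["w1", "w2"]
    actions: dict     # actions[(state, player)] -> list of actions A_i^w
    payoff: dict      # payoff[state][profile] -> tuple (K_1^w(a), ..., K_n^w(a))
    transition: dict  # transition[state][profile] -> {next_state: probability}
    delta: float      # discount factor in (0, 1)

    def profiles(self, state):
        """All pure action profiles a^w in the product of A_j^w over players j."""
        return list(product(*(self.actions[(state, i)] for i in self.players)))

    def degenerate(self, state):
        """Initial distribution of subgame G^w: mass 1 on `state`, 0 elsewhere."""
        return {w: (1.0 if w == state else 0.0) for w in self.states}

    def validate(self, tol=1e-9):
        """Check delta in (0, 1) and that each p(.|w, a^w) is a distribution over Omega."""
        if not 0 < self.delta < 1:
            return False
        for w in self.states:
            for a in self.profiles(w):
                probs = self.transition[w][a]
                if not set(probs) <= set(self.states):
                    return False
                if any(p < 0 for p in probs.values()):
                    return False
                if abs(sum(probs.values()) - 1) > tol:
                    return False
                if len(self.payoff[w][a]) != len(self.players):
                    return False
        return True
```

Keeping transitions as sparse dictionaries mirrors the fact that most transitions in small examples are deterministic.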
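By the payoff of player i we mean the expected payoff in the stochastic subgame $G^\omega$, given by

$$E_i(\omega, \eta) = K_i^\omega(a^\omega) + \delta \sum_{\omega' \in \Omega} p(\omega' \mid \omega, a^\omega)\, E_i(\omega', \eta), \qquad (2)$$

where $\eta \in H = \prod_{j\in N} H_j$ is a stationary strategy profile such that $\eta(\omega) = a^\omega \in \prod_{j\in N} A_j^\omega$. Rewriting Equation (2) in vector form, we obtain

$$E_i(\eta) = K_i(\eta) + \delta\, \Pi(\eta)\, E_i(\eta), \qquad (3)$$

where $E_i(\eta) = (E_i(\omega_1, \eta), \ldots, E_i(\omega_k, \eta))^\top$ and $K_i(\eta) = (K_i^{\omega_1}(a^{\omega_1}), \ldots, K_i^{\omega_k}(a^{\omega_k}))^\top$. The matrix of transition probabilities is formed as

$$\Pi(\eta) = \begin{pmatrix} p(\omega_1 \mid \omega_1, a^{\omega_1}) & \cdots & p(\omega_k \mid \omega_1, a^{\omega_1}) \\ \vdots & \ddots & \vdots \\ p(\omega_1 \mid \omega_k, a^{\omega_k}) & \cdots & p(\omega_k \mid \omega_k, a^{\omega_k}) \end{pmatrix},$$

in which each row contains the transition probabilities from the corresponding state.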
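Equation (3) implies an explicit formula for the expected payoff of player i when the stationary strategy profile η is realized:

$$E_i(\eta) = (I_k - \delta \Pi(\eta))^{-1} K_i(\eta), \qquad (4)$$

where $I_k$ is the identity matrix of size k × k. The inverse matrix $(I_k - \delta\Pi(\eta))^{-1}$ always exists for a discount factor δ ∈ (0, 1): since Π(η) is a stochastic matrix, the spectral radius of δΠ(η) is at most δ < 1.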
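The explicit formula amounts to solving one linear system $(I_k - \delta\Pi(\eta))E = K_i(\eta)$ per player rather than forming the inverse. The sketch below uses hypothetical helper names (`solve`, `expected_payoffs`) and Gaussian elimination from the standard library only, so that exact `Fraction` inputs give exact results; in practice one would typically call a library solver such as `numpy.linalg.solve` instead.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            raise ValueError("singular matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def expected_payoffs(Pi, K, delta):
    """Vector of expected payoffs of one player: (I_k - delta * Pi)^{-1} K."""
    k = len(Pi)
    A = [[(1 if r == c else 0) - delta * Pi[r][c] for c in range(k)] for r in range(k)]
    return solve(A, K)


h = Fraction(1, 2)
E = expected_payoffs([[h, h], [0, 1]], [1, 3], Fraction(9, 10))
```

With two states, Π = [[1/2, 1/2], [0, 1]], stage payoffs (1, 3) and δ = 9/10, the player expects 30 from ω_2 (the absorbing state) and 290/11 from ω_1.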
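Taking into account the probability distribution π_0, we calculate the expected payoff of player i in game G as

$$E_i(\pi_0, \eta) = \pi_0 (I_k - \delta \Pi(\eta))^{-1} K_i(\eta), \qquad (5)$$

where π_0 is treated as a row vector. If the players cooperate, they find the cooperative strategy profile η* maximizing the total expected payoff, i.e.,

$$\eta^* \in \arg\max_{\eta \in H} \sum_{i\in N} E_i(\pi_0, \eta). \qquad (6)$$

Notice that η* can be chosen as a pure stationary strategy profile, because maximizing the total discounted payoff is a finite discounted Markov decision problem. Denote the action profile prescribed by η* in state ω by $\eta^*(\omega) = a^{\omega*} = (a_1^{\omega*}, \ldots, a_n^{\omega*})$. We also assume that the profile η* satisfies

$$\sum_{i\in N} E_i(\omega, \eta^*) = \max_{\eta\in H} \sum_{i\in N} E_i(\omega, \eta)$$

for any state ω ∈ Ω, which means that the cooperative strategy profile maximizes the total payoff of the players regardless of which state is initial. This assumption is not restrictive: in a finite discounted Markov decision problem, there exists a pure stationary policy that is optimal for every initial state simultaneously.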
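The cooperative profile η* and the requirement that it be optimal from every initial state can be illustrated with standard value iteration for the grand coalition's discounted problem. This is a swapped-in textbook technique, not a computation described in the paper, and the function name and input format are hypothetical: `profiles[w]` lists the pure action profiles of state w, `total[w][a]` stores the sum of the players' stage payoffs $\sum_{i\in N} K_i^w(a)$, and `transition[w][a]` maps next states to probabilities.

```python
def cooperative_profile(states, profiles, total, transition, delta, tol=1e-10):
    """Value iteration for the maximal total discounted payoff over pure stationary profiles.

    Returns (V, eta): V[w] approximates the maximal total payoff from initial state w,
    and eta[w] is a greedy (pure) action profile in state w.
    """
    V = {w: 0.0 for w in states}
    while True:
        # Q[w][a]: total stage payoff plus discounted expected continuation value
        Q = {w: {a: total[w][a] + delta * sum(p * V[x] for x, p in transition[w][a].items())
                 for a in profiles[w]}
             for w in states}
        V_new = {w: max(Q[w].values()) for w in states}
        if max(abs(V_new[w] - V[w]) for w in states) < tol:
            eta = {w: max(Q[w], key=Q[w].get) for w in states}
            return V_new, eta
        V = V_new


states = ["w1", "w2"]
profiles = {"w1": ["stay", "move"], "w2": ["rest"]}
total = {"w1": {"stay": 1.0, "move": 0.0}, "w2": {"rest": 3.0}}
transition = {"w1": {"stay": {"w1": 1.0}, "move": {"w2": 1.0}}, "w2": {"rest": {"w2": 1.0}}}
V, eta = cooperative_profile(states, profiles, total, transition, 0.9)
```

In this made-up instance, moving from ω_1 to ω_2 forgoes the stage payoff 1 but earns 0.9 · 30 = 27 > 10, and the same greedy pure profile is optimal from either initial state, which is the property the assumption above relies on.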
To define a cooperative game when the non-cooperative stochastic game is given, we use the classical approach and define it in the form of a characteristic function $v: 2^N \to \mathbb{R}$ whose values estimate the "power" of every coalition, i.e., subset of players. In [21], the value of the characteristic function for coalition S in the subgame starting at state ω is defined as the maxmin value

$$\max_{\eta_S} \min_{\eta_{N\setminus S}} \sum_{i\in S} E_i(\omega, \eta_S, \eta_{N\setminus S}),$$

which is the value of the zero-sum stochastic subgame $G^\omega_S$ starting at state ω, in which coalition S is the maximizing player and coalition N\S is the minimizing player. The existence of the value of game $G^\omega_S$ is proved in [22].

Approximated Characteristic Function for State Games
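Before defining the characteristic function in a new form, we need some additional calculations. First, we consider state games and propose a scheme for calculating the approximated characteristic function values in every state. Define the characteristic function for a state ω ∈ Ω, i.e., for the one-shot game Γ(ω) in normal form, using the maxmin approach:

$$v(\omega, S) = \begin{cases} \max_{a_S^\omega} \min_{a_{N\setminus S}^\omega} \sum_{i\in S} K_i^\omega(a_S^\omega, a_{N\setminus S}^\omega), & S \neq N, \emptyset, \\ \max_{a^\omega} \sum_{i\in N} K_i^\omega(a^\omega), & S = N, \\ 0, & S = \emptyset, \end{cases} \qquad (7)$$

where $a_S^\omega \in \prod_{j\in S} A_j^\omega$ and the maxmin in (7) is found in pure strategies. Let C(ω) be the core of the game in state ω defined with characteristic function (7), i.e.,

$$C(\omega) = \Big\{ \alpha \in \mathbb{R}^n : \sum_{i\in N} \alpha_i = v(\omega, N),\ \sum_{i\in S} \alpha_i \ge v(\omega, S)\ \forall S \subset N \Big\}. \qquad (8)$$

Remark 1. We assume that the core C(ω) is non-empty for every state ω. By the Bondareva-Shapley theorem, the core C(ω) is non-empty if and only if

$$\sum_{S \subseteq N,\, S \neq \emptyset} \psi(S)\, v(\omega, S) \le v(\omega, N) \qquad (9)$$

holds for any function $\psi: 2^N \setminus \{\emptyset\} \to \mathbb{R}_+$ such that $\sum_{S \ni i} \psi(S) = 1$ for every i ∈ N. Here the characteristic function v(ω, S) is defined by (7). We refer to the book [25] for further discussion of the non-emptiness of the core.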
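Characteristic function (7) and the non-emptiness condition of Remark 1 are easy to compute for small state games. The sketch below is illustrative and uses hypothetical names: players are indexed 0, …, n−1, `payoff` maps a pure action profile (a tuple) to the payoff vector, and `maxmin_value` enumerates pure strategies as (7) prescribes. For three players, the Bondareva-Shapley condition reduces to the five minimal balanced collections {{1},{2},{3}}, {{1,2},{3}}, {{1,3},{2}}, {{2,3},{1}} and {{1,2},{1,3},{2,3}} (with weights 1/2), which `core_nonempty_3` checks; for larger n one would solve a linear program instead.

```python
from itertools import combinations, product


def maxmin_value(actions, payoff, S):
    """Pure-strategy maxmin of coalition S in a one-shot game, as in (7).

    actions[i] is the action list of player i; payoff[profile] is a payoff tuple.
    For S = N the inner minimum runs over the empty profile, giving the max total payoff.
    """
    if not S:
        return 0
    n = len(actions)
    members = sorted(S)
    others = [j for j in range(n) if j not in S]
    best = None
    for a_S in product(*(actions[i] for i in members)):
        worst = None
        for a_O in product(*(actions[j] for j in others)):
            profile = [None] * n
            for i, a in zip(members, a_S):
                profile[i] = a
            for j, a in zip(others, a_O):
                profile[j] = a
            total = sum(payoff[tuple(profile)][i] for i in S)
            worst = total if worst is None else min(worst, total)
        best = worst if best is None else max(best, worst)
    return best


def characteristic_function(actions, payoff):
    """Values v(w, S) for all coalitions S, keyed by frozenset."""
    n = len(actions)
    return {frozenset(S): maxmin_value(actions, payoff, set(S))
            for r in range(n + 1) for S in combinations(range(n), r)}


def core_nonempty_3(v):
    """Bondareva-Shapley test for n = 3 via the minimal balanced collections."""
    f = frozenset
    bounds = [
        v[f({0})] + v[f({1})] + v[f({2})],
        v[f({0, 1})] + v[f({2})],
        v[f({0, 2})] + v[f({1})],
        v[f({1, 2})] + v[f({0})],
        (v[f({0, 1})] + v[f({0, 2})] + v[f({1, 2})]) / 2,
    ]
    return all(b <= v[f({0, 1, 2})] for b in bounds)
```

In a test game where every player receives 1 only when all three choose action 1, v(ω, S) = 0 for S ≠ N while v(ω, N) = 3, and the core is non-empty; the majority game with v(S) = 1 for |S| ≥ 2 and v(N) = 1 fails the last inequality (3/2 > 1).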
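Second, for any coalition S ⊆ N, define the maximal value of characteristic function (7) over the set Ω:

$$\hat{w}(S) = \max_{\omega \in \Omega} v(\omega, S), \qquad (10)$$

which is the maximal value that coalition S can obtain in the state games. The next step is to define the approximated value of the characteristic function in every state as follows. For any state ω ∈ Ω, let the approximated characteristic function w(ω, S) be given by

$$w(\omega, S) = \begin{cases} \hat{w}(S), & S \neq N, \\ \sum_{i\in N} K_i^\omega(a^{\omega*}), & S = N. \end{cases} \qquad (11)$$

In Equation (11), the total payoff of the players adopting the cooperative action profile $a^{\omega*}$ is assigned to the grand coalition, while the approximated values $\hat{w}(S)$ given by (10) (the maximal values over all states) are assigned to every coalition S different from N. Denote the core constructed with the values of characteristic function (11) by D(ω):

$$D(\omega) = \Big\{ \alpha \in \mathbb{R}^n : \sum_{i\in N} \alpha_i = w(\omega, N),\ \sum_{i\in S} \alpha_i \ge w(\omega, S)\ \forall S \subset N \Big\}, \qquad (12)$$

and assume that it is non-empty for every state ω.

Lemma 1. If the condition

$$\max_{a^\omega} \sum_{i\in N} K_i^\omega(a^\omega) = \sum_{i\in N} K_i^\omega(a^{\omega*}) \quad \text{for any } \omega \in \Omega \qquad (13)$$

is true and the core D(ω) is non-empty for any ω, then D(ω) ⊂ C(ω).

Proof. If condition (13) holds, then w(ω, N) = v(ω, N). Subsequently, for any coalition S ≠ N, we have $w(\omega, S) = \hat{w}(S) \ge v(\omega, S)$ by (10), so every α ∈ D(ω) satisfies all the constraints in (8), i.e., α ∈ C(ω).

Condition (13) states that the maximal total payoff of the players in state ω coincides with their payoff when they adopt the actions prescribed by the cooperative strategy profile. In dynamic games, it may fail in general. If condition (13) does not hold, the main result of the paper can still be proved, but this requires modifying the way the characteristic function is defined. We leave this case for future research.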
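Formulas (10) to (12) can be sketched directly on top of per-state values v(ω, S). In this hypothetical helper, `v_by_state[w]` maps each coalition (a frozenset of player indices 0, …, n−1) to v(w, S), and `coop_total[w]` is the total payoff $\sum_{i\in N} K_i^w(a^{w*})$ of the cooperative action profile; `in_core` tests membership in the core of any characteristic function, so it serves both C(ω) and D(ω). The numbers in the check are made up.

```python
from itertools import combinations


def in_core(x, v, tol=1e-9):
    """x is in the core of v: efficiency plus coalitional rationality for proper S."""
    n = len(x)
    if abs(sum(x) - v[frozenset(range(n))]) > tol:
        return False
    return all(sum(x[i] for i in S) >= v[frozenset(S)] - tol
               for r in range(1, n) for S in combinations(range(n), r))


def approximated_cf(v_by_state, coop_total):
    """Return (w_hat, w): w_hat(S) = max over states of v(w, S), as in (10),
    and w(w, S) as in (11), i.e. w_hat(S) for S != N and the cooperative total for N."""
    states = list(v_by_state)
    coalitions = list(v_by_state[states[0]])
    N = max(coalitions, key=len)
    w_hat = {S: max(v_by_state[st][S] for st in states) for S in coalitions}
    w = {st: {S: (coop_total[st] if S == N else w_hat[S]) for S in coalitions}
         for st in states}
    return w_hat, w


f = frozenset
vs = {"w1": {f(): 0, f({0}): 1, f({1}): 2, f({0, 1}): 6},
      "w2": {f(): 0, f({0}): 3, f({1}): 0, f({0, 1}): 5}}
w_hat, w = approximated_cf(vs, {"w1": 6, "w2": 5})
```

These data satisfy condition (13), and the allocation (1.5, 4.5) lies in C(ω_1) but not in D(ω_1), consistent with the inclusion D(ω) ⊂ C(ω).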

Remark 3.
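We assume that the approximated core D(ω) is non-empty for any ω. The conditions for its non-emptiness are similar to those given in Remark 1, with v(ω, S) replaced by w(ω, S) in Equation (9).

Example. Consider a stochastic game with three players and two states, Ω = {ω_1, ω_2}. The players' payoffs are given by the following matrices:
• in state ω_1: (4, 4, 2) (0, 0, 0)
• in state ω_2:
Player 1 chooses a row, player 2 chooses a column, and player 3 chooses a matrix.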
The transition probabilities are given by the matrices:
• for state ω_1: (1, 0)
• for state ω_2:
The first (second) element of every entry is the probability of transition from the given state under the given action profile to state ω_1 (state ω_2). Transitions are random only in state ω_1 under the action profiles (a_1, b_1, c_1), (a_2, b_1, c_2), and (a_1, b_2, c_2), and in state ω_2 under the action profile (α_1, ζ_1, γ_2). All other transitions are deterministic.
The discount factor equals 0.9. The cooperative strategy profile η* = (η*_1, η*_2, η*_3) is such that

$$\eta^*(\omega_1) = (a_1, b_1, c_1), \qquad \eta^*(\omega_2) = (\alpha_2, \zeta_2, \gamma_2),$$

i.e., it prescribes every player to choose the first action in state ω_1 and the second action in state ω_2. The cooperative strategy profile defines a Markov chain whose structure is depicted in Figure 1. The players' payoffs are (10, 10, 8) in state ω_1 and (7, 5, 7) in state ω_2. The maximal total payoff of the players in each state game coincides with the payoff that the players get in that state under the cooperative strategy profile η*, so condition (13) holds. However, Theorem 1 is also true when this condition is not satisfied.
First, we calculate the characteristic function v(ω, S) by Equation (7) and its approximation w(ω, S) by (11) for the state games. The values of these functions are given in Table 1. The cores of the state games C(ω) and D(ω), calculated with the values of functions v(ω, S) and w(ω, S) by Formulae (8) and (12), are non-empty for every ω and are shown in Figures 2 and 3 for ω_1 and ω_2, respectively.

Figure 2. The core C(ω_1) (gray region) and the approximated core D(ω_1) (blue region inside the gray region) for the ω_1 state game.

Figure 3. The core C(ω_2) (gray region) and the approximated core D(ω_2) (blue region inside the gray region) for the ω_2 state game.

New Approximated Characteristic Function for Stochastic Games
We propose a new method of determining the characteristic function for stochastic games based on the values of the approximated characteristic function defined for the states and given by Formula (11).
We assume that, in any state of the game, coalition S can obtain at most $\hat{w}(S)$. Accordingly, this is the maximal value that the coalition can get regardless of the current state. Summing this value over the infinite horizon with discount factor δ, we obtain an approximation, or upper bound, of the payoff that coalition S can get in the stochastic subgame starting from state ω:

$$\bar{w}(\omega, S) = \begin{cases} \sum_{t=0}^{\infty} \delta^t \hat{w}(S) = \dfrac{\hat{w}(S)}{1-\delta}, & S \neq N, \\ \sum_{i\in N} E_i(\omega, \eta^*), & S = N. \end{cases} \qquad (15)$$

Notice that, according to Equation (15), we keep the value of the characteristic function for the grand coalition without approximation. The reason is that, when allocating the joint payoff, the players should redistribute the value that they actually obtain under the cooperative strategy profile, not an approximated one. The cooperative stochastic subgame is defined by the set of players N and function (15). In the following, we omit the set of players and refer to the cooperative stochastic subgame as $\bar{w}(\omega, S)$ given by (15).
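Function (15) only needs ŵ(S), δ, and the cooperative total payoff in each state. A minimal sketch with hypothetical names: `w_hat` maps coalitions (frozensets) to ŵ(S), and `coop_value[w]` is the total expected payoff of the players from state w under η*, which has to be computed beforehand, for example as $(I_k - \delta\Pi(\eta^*))^{-1}$ applied to the vector of cooperative stage payoffs. The numbers in the example call are made up.

```python
from fractions import Fraction


def stochastic_cf(w_hat, coop_value, delta):
    """Approximated characteristic function (15) for every initial state.

    Coalitions S != N get the geometric sum w_hat(S) * (1 + delta + delta^2 + ...)
    = w_hat(S) / (1 - delta); the grand coalition keeps its cooperative value.
    """
    N = max(w_hat, key=len)
    return {state: {S: (coop_value[state] if S == N else w_hat[S] / (1 - delta))
                    for S in w_hat}
            for state in coop_value}


# Example with made-up numbers; Fraction keeps the division exact.
f = frozenset
w_bar = stochastic_cf({f(): 0, f({0}): 3, f({1}): 2, f({0, 1}): 6},
                      {"w1": 55, "w2": 50}, Fraction(9, 10))
```

The values for S ≠ N do not depend on the initial state, which is what makes this function much cheaper to compute than a maxmin value of a stochastic game.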
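Let $\bar{D}(\omega)$ be the core calculated with the values of function (15), i.e.,

$$\bar{D}(\omega) = \Big\{ \alpha \in \mathbb{R}^n : \sum_{i\in N} \alpha_i = \bar{w}(\omega, N),\ \sum_{i\in S} \alpha_i \ge \bar{w}(\omega, S)\ \forall S \subset N \Big\}. \qquad (16)$$

Let $\bar{D}(\omega)$ be non-empty for any ω ∈ Ω. We can compare the core $\bar{D}(\omega)$ constructed with the values of approximated function (15) with the core defined with the values of the characteristic function constructed by the classical approach. For any subgame $G^\omega$, we define the characteristic function using the maxmin approach:

$$\bar{v}(\omega, S) = \begin{cases} \max_{\eta_S} \min_{\eta_{N\setminus S}} \sum_{i\in S} E_i(\omega, \eta_S, \eta_{N\setminus S}), & S \neq N, \\ \sum_{i\in N} E_i(\omega, \eta^*), & S = N. \end{cases} \qquad (17)$$

Let $\bar{C}(\omega)$ be the non-empty core of subgame $G^\omega$ constructed with the values of function (17).

Lemma 2. If

$$\hat{w}(S) < \min_{\omega \in \Omega} v(\omega, N) \quad \text{for any } S \neq N, \qquad (18)$$

then $\bar{D}(\omega) \subset \bar{C}(\omega)$ for any ω ∈ Ω.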
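Proof. If condition (18) is not satisfied, then the core $\bar{D}(\omega)$ is empty by construction. Consider any imputation $\bar{\alpha}(\omega) \in \bar{D}(\omega)$ and let us prove that it belongs to the set $\bar{C}(\omega)$. First, $\sum_{i\in N} \bar{\alpha}_i(\omega) = \bar{w}(\omega, N) = \bar{v}(\omega, N)$. Second, we prove that $\sum_{i\in S} \bar{\alpha}_i(\omega) \ge \bar{v}(\omega, S)$, taking into account that $\sum_{i\in S} \bar{\alpha}_i(\omega) \ge \bar{w}(\omega, S)$ for any S ≠ N. It suffices to prove that $\bar{w}(\omega, S) \ge \bar{v}(\omega, S)$. By definition, $\bar{v}(\omega, S) = \max_{\eta_S} \min_{\eta_{N\setminus S}} \sum_{i\in S} E_i(\omega, \eta_S, \eta_{N\setminus S})$. Let $(\eta_S, \eta_{N\setminus S})$ be a profile at which the maxmin is reached. Summing the functional equation (3) over i ∈ S at this profile and solving for the payoff vector, we obtain, componentwise,

$$\sum_{i\in S} E_i(\eta_S, \eta_{N\setminus S}) = (I_k - \delta\Pi(\eta_S, \eta_{N\setminus S}))^{-1} \sum_{i\in S} K_i(\eta_S, \eta_{N\setminus S}) \le (I_k - \delta\Pi(\eta_S, \eta_{N\setminus S}))^{-1} \hat{w}(S)\, \mathbf{1}_k = \frac{\hat{w}(S)}{1-\delta}\, \mathbf{1}_k,$$

where $\mathbf{1}_k$ is the k-dimensional vector of ones, the stage payoffs of coalition S are bounded by $\hat{w}(S)$, and the matrix $(I_k - \delta\Pi(\eta_S, \eta_{N\setminus S}))^{-1} = \sum_{t=0}^{\infty} \delta^t \Pi^t(\eta_S, \eta_{N\setminus S})$ has non-negative entries. In the last step, we use the property that the sum of the elements in any row of matrix $(I_k - \delta\Pi(\eta_S, \eta_{N\setminus S}))^{-1}$ equals 1/(1 − δ), because $\Pi(\eta_S, \eta_{N\setminus S})$ is a stochastic matrix. Hence $\bar{v}(\omega, S) \le \bar{w}(\omega, S)$, and the lemma is proved.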
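The two matrix facts used at the end of this proof, non-negativity of $(I_k - \delta\Pi)^{-1}$ and row sums equal to 1/(1 − δ) (which follows from $(I_k - \delta\Pi)\mathbf{1}_k = (1-\delta)\mathbf{1}_k$), can be checked on an instance. The sketch below, with hypothetical helper names, inverts $I_k - \delta\Pi$ exactly with `Fraction` arithmetic for a made-up stochastic matrix and verifies the componentwise bound for stage payoffs of coalition S that do not exceed ŵ(S). It is a sanity check on one instance, not a proof.

```python
from fractions import Fraction


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def resolvent(Pi, delta):
    """(I_k - delta * Pi)^{-1} for a stochastic matrix Pi."""
    k = len(Pi)
    return inverse([[int(r == c) - delta * Pi[r][c] for c in range(k)] for r in range(k)])


def bound_holds(R, stage, w_hat, delta):
    """Check that R applied to the stage-payoff vector is <= w_hat / (1 - delta) componentwise."""
    return all(sum(r * s for r, s in zip(row, stage)) <= w_hat / (1 - delta) for row in R)


h = Fraction(1, 2)
Pi = [[h, h], [Fraction(1, 4), Fraction(3, 4)]]
delta = Fraction(9, 10)
R = resolvent(Pi, delta)
```

Here every row of R sums exactly to 10 = 1/(1 − 0.9), and stage payoffs (2, 3) bounded by ŵ(S) = 3 yield totals below 30.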

Remark 4.
We assume that the approximated core $\bar{D}(\omega)$ of the stochastic game is non-empty for every initial state ω. If condition (18) in Lemma 2 is satisfied, the non-emptiness of the approximated cores D(ω) for all ω implies the non-emptiness of the approximated core $\bar{D}(\omega)$. This follows from the definition of characteristic function $\bar{w}(\omega, S)$ and Formula (10). Moreover, by Lemma 2, the non-emptiness of the approximated core $\bar{D}(\omega)$ implies the non-emptiness of the core $\bar{C}(\omega)$.

Imputation Distribution Procedure
Under cooperation, the players follow the cooperative strategy profile η* and agree on the core as the cooperative solution of the game, i.e., the set of possible imputations of the joint payoff. We assume that the core of any subgame $G^\omega$ is calculated with function (15), i.e., it is $\bar{D}(\omega)$. Consider an imputation $\bar{\alpha}(\omega) \in \bar{D}(\omega)$. If the players are paid step by step according to the initially given payoff functions $K_i^\omega$, i ∈ N, we cannot guarantee that they obtain the components of imputation $\bar{\alpha}(\omega)$ as their expected payoffs in subgame $G^\omega$. Therefore, we define a scheme of state payments that, in total, gives the players the components of imputation $\bar{\alpha}(\omega)$.

Definition 1 [10,11]. The collection of vectors $(\beta_i : i \in N)$, where $\beta_i = (\beta_i(\omega_1), \ldots, \beta_i(\omega_k))$ and $\beta_i(\omega)$ is the payment to player i in state ω of the cooperative stochastic game, is called an imputation distribution procedure (IDP) of imputation $\bar{\alpha}(\omega) \in \bar{D}(\omega)$ if

$$B_i^\omega(\beta) = \bar{\alpha}_i(\omega), \quad i \in N, \qquad (19)$$

where $B_i^\omega(\beta)$ is the expected discounted sum of payments to player i in the stochastic subgame starting from state ω, according to procedure β.
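The expected discounted sum of payments to player i made according to the IDP can be calculated by the formula (see [14])

$$B_i^\omega(\beta) = \pi_0 (I_k - \delta \Pi(\eta^*))^{-1} \beta_i^\top, \qquad (20)$$

where π_0 is such that $\pi_0^\omega = 1$ and $\pi_0^{\omega'} = 0$ for any $\omega' \neq \omega$.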
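If imputations ᾱ(ω') ∈ D̄(ω') have been selected for all states (an assumption of this sketch), then requiring $B_i^\omega = \bar{\alpha}_i(\omega)$ in every state, with $B_i = (I_k - \delta\Pi(\eta^*))^{-1}\beta_i$ as a vector over states, gives $\beta_i = (I_k - \delta\Pi(\eta^*))\bar{\alpha}_i$, that is, $\beta_i(\omega) = \bar{\alpha}_i(\omega) - \delta\sum_{\omega'} p(\omega' \mid \omega, a^{\omega*})\bar{\alpha}_i(\omega')$. The hypothetical helpers below compute these payments and recover B by iterating B ← β + δΠ(η*)B, which converges to $(I_k - \delta\Pi(\eta^*))^{-1}\beta$ because δ < 1. The imputations in the check are made up.

```python
def idp(Pi, alpha, delta):
    """Payments beta(w) = alpha(w) - delta * sum_{w'} Pi[w][w'] * alpha(w').

    Pi: k x k transition matrix under eta*; alpha[w]: imputation (length n) chosen in state w.
    """
    k, n = len(Pi), len(alpha[0])
    return [[alpha[w][i] - delta * sum(Pi[w][c] * alpha[c][i] for c in range(k))
             for i in range(n)]
            for w in range(k)]


def expected_payments(Pi, beta, delta, steps=500):
    """Approximate B = (I - delta*Pi)^{-1} beta by iterating B <- beta + delta * Pi * B."""
    k, n = len(Pi), len(beta[0])
    B = [[0.0] * n for _ in range(k)]
    for _ in range(steps):
        B = [[beta[w][i] + delta * sum(Pi[w][c] * B[c][i] for c in range(k))
              for i in range(n)]
             for w in range(k)]
    return B


Pi = [[0.5, 0.5], [0.0, 1.0]]
alpha = [[100.0, 90.0, 80.0], [60.0, 70.0, 50.0]]
beta = idp(Pi, alpha, 0.9)
B = expected_payments(Pi, beta, 0.9)
```

For the absorbing state ω_2 the payment is simply (1 − δ)ᾱ(ω_2) = (6, 7, 5); in ω_1 it is (28, 18, 21.5). Note that payments may differ considerably from the stage payoffs $K_i^\omega(a^{\omega*})$.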
In the following section, we describe a property of the imputations from the core and of the corresponding IDPs, which allows us to narrow the set of IDPs.

Strongly Subgame-Consistent Core
We formulate the strong subgame consistency property of the core and propose sufficient conditions for the strong subgame consistency of the core in stochastic games with characteristic function (15). We suppose that the cores of the stochastic game G and of every subgame $G^\omega$, ω ∈ Ω, are non-empty.
Under cooperation, the players agree on the joint implementation of the cooperative strategy profile η* and expect to obtain the components of an imputation belonging to the core $\bar{D}(\omega)$ in the subgame starting from ω. On reaching an intermediate state ω ∈ Ω, player i chooses the action $a_i^{\omega*}$ prescribed by the cooperative strategy profile η* and gets the payoff $K_i^\omega(a^{\omega*})$. If the players recalculate the solution in the current subgame, i.e., find a solution of the cooperative subgame $G^\omega$, we would assume that the cooperative solution is chosen from the core $\bar{D}(\omega)$. It is reasonable to require that the payment received by a player in state ω, added to the expected discounted sum of any imputations from the cores $\bar{D}(\omega')$ of the states ω' ∈ Ω following state ω, gives an imputation from the core $\bar{D}(\omega)$. If this property holds for every intermediate state ω ∈ Ω, then the core of the cooperative stochastic game with characteristic function (15) is strongly subgame-consistent.
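To determine a strongly subgame-consistent core, we need to define the so-called expected core at state ω, i.e., the set of expected imputations belonging to the cores that are the cooperative solutions of the subsequent subgames. We define the expected core of state ω ∈ Ω as follows:

$$E\bar{D}(\omega) = \Big\{ \sum_{\omega' \in \Omega} p(\omega' \mid \omega, a^{\omega*})\, \bar{\alpha}(\omega') : \bar{\alpha}(\omega') \in \bar{D}(\omega'),\ \omega' \in \Omega \Big\}.$$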

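Remark 6. The inclusion

$$\beta(\omega_1) \oplus \delta\, E\bar{D}(\omega_1) \subset \bar{D}(\omega_1)$$

is understood as follows: $\beta(\omega_1) \in \mathbb{R}^n$, $E\bar{D}(\omega_1) \subset \mathbb{R}^n$, $\bar{D}(\omega_1) \subset \mathbb{R}^n$, and for a ∈ R^n and a set C ⊂ R^n the operation a ⊕ C is defined as the set {a + c : c ∈ C}.

Theorem 1. The core $\bar{D}(\omega)$, if it is non-empty, is strongly subgame-consistent.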
Theorem 1 gives a method for constructing the payment scheme for any element of the core $\bar{D}$ defined by (16) with the values of function (15).
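The payment scheme can be sanity-checked on a toy instance. The sketch below assumes an imputation ᾱ(ω) has been selected in every state, computes the payments $\beta(\omega) = \bar{\alpha}(\omega) - \delta\sum_{\omega'} p(\omega' \mid \omega, a^{\omega*})\bar{\alpha}(\omega')$, and tests whether $\beta(\omega) + \delta\sum_{\omega'} p(\omega' \mid \omega, a^{\omega*})\alpha'(\omega')$ lies in D̄(ω) for every combination of given sample points α'(ω') ∈ D̄(ω'). All names and numbers are hypothetical: two players, δ = 1/2, ŵ({i}) = 1 (so w̄(ω, {i}) = 2), and cooperative values 10 and 8. Because the cores are convex polytopes and the map is affine, checking their extreme points suffices; here the cores are segments and the samples include both endpoints.

```python
from fractions import Fraction
from itertools import combinations, product


def in_core(x, v):
    """Exact core membership: efficiency and sum_{i in S} x_i >= v(S) for proper S."""
    n = len(x)
    if sum(x) != v[frozenset(range(n))]:
        return False
    return all(sum(x[i] for i in S) >= v[frozenset(S)]
               for r in range(1, n) for S in combinations(range(n), r))


def idp(Pi, alpha, delta):
    """beta(w) = alpha(w) - delta * sum_{w'} Pi[w][w'] * alpha(w')."""
    k, n = len(Pi), len(alpha[0])
    return [[alpha[w][i] - delta * sum(Pi[w][c] * alpha[c][i] for c in range(k))
             for i in range(n)] for w in range(k)]


def strongly_consistent_on_samples(Pi, v_bar, alpha, samples, delta):
    """Test beta(w) + delta * E[alpha'] in bar D(w) for all sampled alpha'(w') in bar D(w')."""
    k, n = len(Pi), len(alpha[0])
    beta = idp(Pi, alpha, delta)
    for w in range(k):
        for choice in product(*samples):  # one sample imputation per state
            y = [beta[w][i] + delta * sum(Pi[w][c] * choice[c][i] for c in range(k))
                 for i in range(n)]
            if not in_core(y, v_bar[w]):
                return False
    return True


f = frozenset
h = Fraction(1, 2)
Pi = [[h, h], [0, 1]]
v_bar = [{f({0}): 2, f({1}): 2, f({0, 1}): 10}, {f({0}): 2, f({1}): 2, f({0, 1}): 8}]
alpha = [(5, 5), (4, 4)]
samples = [[(2, 8), (8, 2), (5, 5)], [(2, 6), (6, 2), (4, 4)]]
```

Exact `Fraction` arithmetic makes the efficiency test an equality check rather than a tolerance comparison.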

Remark 7.
The new method of constructing the characteristic function, i.e., the approximated characteristic function proposed in this paper, not only yields a strongly subgame-consistent subset of the core, but also simplifies calculations. In the example, each player has two actions in every state. Therefore, each player has four pure stationary strategies in the stochastic game, and there are 4^3 = 64 pure stationary strategy profiles. Calculating the maxmin payoff of a coalition in such games is a computationally complicated problem. The new approach avoids these calculations by using the values of the approximated characteristic function defined in the state games to determine the function for the stochastic game.
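The counts in this remark follow from the product rule: a pure stationary strategy picks one action per state, so player i has $\prod_{\omega}|A_i^\omega|$ of them. The hypothetical helpers below (`math.prod` needs Python 3.8 or later) compare this with the number of pure action profiles that the state-game approach enumerates. Note that maxmin values of zero-sum stochastic games are in general attained in mixed stationary strategies, so counting pure profiles understates the cost of the classical approach.

```python
from math import prod


def pure_stationary_profiles(action_counts):
    """Number of pure stationary strategy profiles.

    action_counts[i][w] = |A_i^w|; player i has prod_w |A_i^w| pure stationary strategies.
    """
    return prod(prod(counts) for counts in action_counts)


def state_game_profiles(action_counts):
    """Number of pure action profiles summed over all state games."""
    n_states = len(action_counts[0])
    return sum(prod(counts[w] for counts in action_counts) for w in range(n_states))


counts = [[2, 2], [2, 2], [2, 2]]  # three players, two actions in each of two states
```

For the example, this gives 64 profiles in the stochastic game versus 2 · 8 = 16 action profiles across the two state games.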

Conclusions
We have proposed a new method of constructing the characteristic function in stochastic games. The method simplifies calculations in comparison with previously introduced approaches. An additional advantage of the method is that the core calculated with the values of this characteristic function is strongly subgame-consistent. This property positively characterizes the realization of imputations from the core in a dynamic game process. Strong subgame consistency applies to set-valued cooperative solutions, such as the core. As a possible direction for future research, one can consider additional simplifications of the characteristic function definition that keep the strong subgame consistency of the core while further reducing the number of calculations needed to define the cooperative stochastic game.