Article

On a Simplified Method of Defining Characteristic Function in Stochastic Games

Department of Mathematical Game Theory and Statistical Decisions, Saint Petersburg State University, 7/9 Universitetskaya nab., Saint Petersburg 199034, Russia
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2020, 8(7), 1135; https://doi.org/10.3390/math8071135
Submission received: 29 May 2020 / Revised: 1 July 2020 / Accepted: 9 July 2020 / Published: 11 July 2020
(This article belongs to the Special Issue Game Theory)

Abstract

We propose a new method of constructing a cooperative stochastic game in characteristic function form when a non-cooperative stochastic game is initially given. The set of states and each player's set of actions are finite. The construction of the characteristic function is based on calculating the maximin values of zero-sum games between a coalition and its anti-coalition for each state of the game. The proposed characteristic function has advantages over previously defined characteristic functions for stochastic games: it is simple to compute, and the core calculated with its values is strongly subgame-consistent.

1. Introduction

When a non-cooperative game is initially defined, the problem of constructing a cooperative version of the game becomes relevant if players start acting as a single coalition to maximize their joint payoff or minimize joint costs. The classical approach is to define the cooperative game in the form of a characteristic function that assigns a value to any coalition of players. Based on this function, one can then calculate an imputation that allocates the joint payoff among the players. The components of the imputations may vary if we calculate them from different characteristic functions. Therefore, the way this function is defined is important and influences the players' payoffs in the cooperative game. Moreover, some approaches to defining the characteristic function cannot be applied in dynamic or differential games because of computational difficulties. The way the characteristic function is constructed also influences the consistency properties of cooperative solutions that are realized in dynamics.
The choice of approach to defining the characteristic function also depends on the background of the considered problem if it arises from an applied area. Existence and uniqueness issues are also relevant when one chooses the way of constructing the characteristic function. Different approaches can be applied to stochastic games. The so-called maxmin and minmax approaches define the value of the function for coalition S as the maxmin and minmax payoff of coalition S in a zero-sum game against the coalition of all left-out players [1,2]. Another approach is proposed in [3,4], where the value of coalition S is defined as its payoff in the Nash equilibrium of the non-cooperative game between coalition S and the left-out players acting individually. A two-step procedure for calculating the characteristic function is proposed in [5]: the authors first find an n-player non-cooperative equilibrium and then allow coalition S to optimize its payoff, assuming that the left-out players use their Nash equilibrium actions found at the first step. The properties of this function are examined in [6,7]. Another two-stage approach is proposed in [8], in which the strategies maximizing the total payoff of the players are found first. These strategies are then used by the players from coalition S, while the players outside the coalition use the strategies minimizing the total payoff of the players from S. The joint payoff of the players from the coalition equals the value of the characteristic function for this coalition.
A new simplified method of constructing the characteristic function in multistage games is introduced in [9]. The authors examine the properties of this function and prove that the corresponding core is strongly subgame-consistent in a multistage game. This property cannot be proved in the general case when the characteristic function is constructed with classical approaches such as maxmin or minmax.
In this paper, we adapt the method of constructing the characteristic function proposed in [9] to stochastic games. Based on the values of the characteristic function, one can determine the core. Moreover, the core satisfies the strong subgame consistency property, which is a refinement of subgame consistency for set-valued cooperative solutions. The problem of subgame consistency was originally examined for differential games in [10,11]. The construction of a special payment scheme, called the imputation distribution procedure (see [11]), makes it possible to cope with the time inconsistency of cooperative solutions. This problem is described for stochastic games in [12,13,14] in the case of single-valued cooperative solutions. The node-consistent core is constructed for dynamic games played over event trees in [15]. The strong subgame consistency of a set-valued cooperative solution, like the core, guarantees that the players obtain, in total, a solution from the initially defined core. It means that, in any intermediate time period, the sum of the payments obtained up to the current period and any core element of the subgame starting from the next time period is again an element of the initial core. The strong subgame consistency condition is proposed in [16]. A subcore satisfying the strong subgame consistency property is constructed for multistage games in [17]. The problem of subgame consistency is relevant for different classes of dynamic and differential games; it is examined in [18] for stochastic games with finite duration, in [19] for differential games with finite time horizon, and in [20] for multistage games. In this paper, we construct the characteristic function for a stochastic game in a special way and calculate the core using the values of this function. The core satisfies the strong subgame consistency property. To prove this result, we define the imputation distribution procedure, which determines the payments to the players in any state realized in the game process.
The rest of the paper is organized as follows. We describe the model of stochastic games in Section 2.1. In Section 2.2, we define the new approximated characteristic function for state games, and then extend this approach to stochastic games in Section 2.3. We formulate the definition of the imputation distribution procedure for stochastic games and describe the idea of strong subgame consistency of the core in Section 3. We briefly conclude in Section 4.

2. Cooperative Stochastic Games

2.1. Model

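Consider a non-cooperative stochastic game $G$ given by
$$G = \left\langle N, \Omega, \{\Gamma(\omega)\}_{\omega \in \Omega}, \pi_0, \{p(\omega' \mid \omega, a^{\omega})\}_{\omega, \omega' \in \Omega,\ a^{\omega} \in \prod_{i \in N} A_i^{\omega}}, \delta \right\rangle, \tag{1}$$
where
  • $N = \{1, \ldots, n\}$ is the set of players.
  • $\Omega = \{\omega_1, \ldots, \omega_k\}$ is the finite set of states.
  • $\Gamma(\omega)$ is the game in normal form associated with state $\omega$. The set of players $N$ is common to all states. Let $A_i^{\omega}$ be the finite set of actions of player $i \in N$ in state $\omega$, $a_i^{\omega} \in A_i^{\omega}$ be an action of player $i \in N$ in this state, and $K_i^{\omega} : \prod_{j \in N} A_j^{\omega} \to \mathbb{R}$ be the payoff function of player $i$ in state $\omega$.
  • $p(\cdot \mid \omega, a^{\omega}) : \Omega \times A^{\omega} \to \Delta(\Omega)$ is the transition function from state $\omega$ when action profile $a^{\omega} \in \prod_{j \in N} A_j^{\omega}$ is realized, where $\Delta(\Omega)$ is the set of probability distributions over $\Omega$.
  • $\pi_0 = (\pi_0^{\omega_1}, \ldots, \pi_0^{\omega_k})$ is the initial state distribution.
  • $\delta \in (0, 1)$ is a common discount factor.
Denote by $G^{\omega}$ the subgame of $G$ starting from state $\omega$, defined by (1) with $\pi_0$ such that $\pi_0^{\omega} = 1$ and $\pi_0^{\omega'} = 0$ for any state $\omega' \neq \omega$.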
We assume that, in stochastic game $G$, the players use stationary strategies, and $H_i$ denotes the set of stationary strategies of player $i$. A stationary strategy $\eta_i$ of player $i$ assigns an action (possibly mixed) $a_i^{\omega} \in \Delta(A_i^{\omega})$ to any state $\omega$. The vector $(\eta_1, \ldots, \eta_n) \in \prod_{j \in N} H_j$ is a stationary strategy profile in stochastic game $G$. A stationary strategy $\eta_i$ of player $i \in N$ in game $G$ is also a stationary strategy of this player in any subgame $G^{\omega}$.
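By the payoff of player $i$, we mean the expected payoff in stochastic subgame $G^{\omega}$ given by
$$E_i^{\omega}(\eta) = K_i^{\omega}(a^{\omega}) + \delta \sum_{\omega' \in \Omega} p(\omega' \mid \omega, a^{\omega}) E_i^{\omega'}(\eta), \tag{2}$$
where $\eta \in H = \prod_{j \in N} H_j$ is a stationary strategy profile such that $\eta(\omega) = a^{\omega} \in \prod_{j \in N} A_j^{\omega}$. We rewrite Equation (2) in vector form and obtain
$$E_i(\eta) = K_i(a) + \delta \Pi(\eta) E_i(\eta), \tag{3}$$
where $E_i(\eta) = (E_i^{\omega_1}(\eta), \ldots, E_i^{\omega_k}(\eta))$ and $K_i(a) = (K_i^{\omega_1}(a^{\omega_1}), \ldots, K_i^{\omega_k}(a^{\omega_k}))$. The matrix of transition probabilities is formed in the following way:
$$\Pi(\eta) = \begin{pmatrix} p(\omega_1 \mid \omega_1, a^{\omega_1}) & \cdots & p(\omega_k \mid \omega_1, a^{\omega_1}) \\ p(\omega_1 \mid \omega_2, a^{\omega_2}) & \cdots & p(\omega_k \mid \omega_2, a^{\omega_2}) \\ \vdots & \ddots & \vdots \\ p(\omega_1 \mid \omega_k, a^{\omega_k}) & \cdots & p(\omega_k \mid \omega_k, a^{\omega_k}) \end{pmatrix},$$
in which each row contains the transition probabilities from the corresponding state.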
Equation (3) implies an explicit formula for the expected payoff of player $i$ when the stationary strategy profile $\eta$ is realized:
$$E_i(\eta) = \left(I_k - \delta \Pi(\eta)\right)^{-1} K_i(a),$$
where $I_k$ is the identity matrix of size $k \times k$. The inverse matrix $\left(I_k - \delta \Pi(\eta)\right)^{-1}$ always exists for discount factor $\delta \in (0, 1)$.
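A standard argument for this invertibility, sketched here as background rather than taken from the paper: the matrix $\Pi(\eta)$ is row-stochastic, so $\|\delta \Pi(\eta)\|_{\infty} = \delta < 1$ and the Neumann series converges,

$$\left(I_k - \delta \Pi(\eta)\right)^{-1} = \sum_{t=0}^{\infty} \delta^{t}\, \Pi(\eta)^{t}.$$

Read this way, the explicit formula for $E_i(\eta)$ is the discounted sum $\sum_{t \geq 0} \delta^{t} \Pi(\eta)^{t} K_i(a)$ of the expected stage payoffs.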
Taking into account the probability distribution $\pi_0$, we calculate the expected payoff in game $G$ as
$$\bar{E}_i(\eta) = \pi_0 E_i(\eta) = \pi_0 \left(I_k - \delta \Pi(\eta)\right)^{-1} K_i(a).$$
If players cooperate, they find the cooperative strategy profile $\eta^*$ maximizing the total expected payoff, that is,
$$\eta^* = \arg\max_{\eta \in H} \sum_{i \in N} \bar{E}_i(\eta).$$
Note that $\eta^*$ is a pure stationary strategy profile, i.e., $\eta_i^*(\omega) = a_i^{\omega *} \in A_i^{\omega}$, $\omega \in \Omega$. We also assume that the profile $\eta^*$ is such that $\max_{\eta \in H} \sum_{i \in N} E_i^{\omega}(\eta) = \sum_{i \in N} E_i^{\omega}(\eta^*)$ for any state $\omega \in \Omega$, which means that the cooperative strategy profile maximizes the total payoff of the players regardless of which state is initial. This assumption is usually satisfied for stochastic games.
To define a cooperative game when the non-cooperative stochastic game is given, we use the classical approach and define it in the form of a characteristic function $v : 2^N \to \mathbb{R}^1$ whose values estimate the "power" of any coalition, i.e., any subset of players. In [21], the characteristic function value for coalition $S$ in the subgame starting at any state $\omega$ is defined as the maxmin value, that is,
$$v^{\omega}(S) = \operatorname{val}\, G_S^{\omega},$$
where $G_S^{\omega}$ is a zero-sum stochastic subgame starting at state $\omega$, in which coalition $S$ is the maximizing player and coalition $N \setminus S$ is the minimizing player. The existence of the value of game $G_S^{\omega}$ for stochastic games is proved in [22].

2.2. Approximated Characteristic Function for State Games

Before we define a characteristic function in a new form, we need some additional calculations. First, we consider state games and propose a scheme for calculating the approximated characteristic function values for any state. Define the characteristic function for a state $\omega \in \Omega$, i.e., for the one-shot game $\Gamma(\omega)$ given in normal form, using the maxmin approach:
$$v(\omega, S) = \max_{a_S^{\omega} \in \prod_{j \in S} A_j^{\omega}} \ \min_{a_{N \setminus S}^{\omega} \in \prod_{j \in N \setminus S} A_j^{\omega}} \ \sum_{i \in S} K_i^{\omega}(a_S^{\omega}, a_{N \setminus S}^{\omega}), \tag{7}$$
where the maxmin in (7) is found in pure strategies.
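The pure-strategy maxmin in (7) can be computed by plain enumeration. The following is a minimal sketch under stated assumptions: the function name `maxmin_value`, the dictionary layout of the payoffs, and the small two-player game are all hypothetical and are not taken from the paper.

```python
from itertools import product


def maxmin_value(payoff, action_sets, coalition):
    """Pure-strategy maxmin value of `coalition` in a one-shot game.

    payoff      -- dict: full action profile (tuple) -> tuple of player payoffs
    action_sets -- list with one list of actions per player
    coalition   -- tuple of 0-based player indices forming S
    """
    n = len(action_sets)
    outsiders = [j for j in range(n) if j not in coalition]

    def full_profile(coal_actions, out_actions):
        profile = [None] * n
        for i, act in zip(coalition, coal_actions):
            profile[i] = act
        for j, act in zip(outsiders, out_actions):
            profile[j] = act
        return tuple(profile)

    best = None
    for coal_actions in product(*(action_sets[i] for i in coalition)):
        # Worst case for S over all joint actions of N \ S.
        worst = min(
            sum(payoff[full_profile(coal_actions, out_actions)][i]
                for i in coalition)
            for out_actions in product(*(action_sets[j] for j in outsiders))
        )
        best = worst if best is None else max(best, worst)
    return best


# Hypothetical two-player state game (illustrative numbers only).
ACTIONS = [["x", "y"], ["x", "y"]]
PAYOFF = {
    ("x", "x"): (3, 3), ("x", "y"): (0, 4),
    ("y", "x"): (4, 0), ("y", "y"): (1, 1),
}

v_1 = maxmin_value(PAYOFF, ACTIONS, (0,))
v_2 = maxmin_value(PAYOFF, ACTIONS, (1,))
v_N = maxmin_value(PAYOFF, ACTIONS, (0, 1))
print(v_1, v_2, v_N)
```

For the grand coalition the set of outsiders is empty, so the inner minimum runs over a single empty action tuple and the value reduces to the maximal total payoff.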
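Let $C(\omega)$ be a non-empty core of the game defined in state $\omega$ using characteristic function (7), that is,
$$C(\omega) = \Big\{ (\alpha_1(\omega), \ldots, \alpha_n(\omega)) : \sum_{i \in S} \alpha_i(\omega) \geq v(\omega, S),\ \forall S \subset N,\ \sum_{i \in N} \alpha_i(\omega) = v(\omega, N) \Big\}. \tag{8}$$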
Remark 1.
We assume that the conditions under which the core $C(\omega)$ exists for any state $\omega$ are satisfied. The core $C(\omega)$ is non-empty if and only if for any function $\psi : 2^N \setminus \{\emptyset\} \to [0, 1]$, where $\sum_{S \in 2^N :\, S \ni i} \psi(S) = 1$ for any $i \in N$, the condition (see [23,24])
$$\sum_{S \in 2^N \setminus \{\emptyset\}} \psi(S)\, v(\omega, S) \leq v(\omega, N) \tag{9}$$
holds. The characteristic function $v(\omega, S)$ is defined by (7). We refer to the book [25] for further discussion of non-emptiness of the core.
Second, for any coalition $S \subset N$, define the maximal value of characteristic function (7) over the set $\Omega$:
$$\hat{w}(S) = \max_{\omega \in \Omega} v(\omega, S), \tag{10}$$
which is the maximal value that coalition $S$ can obtain in state games.
The next step is to define the approximated value of the characteristic function for any state. For any state $\omega \in \Omega$, let the approximated characteristic function $w(\omega, S)$ be given as
$$w(\omega, S) = \begin{cases} \sum_{i \in S} K_i^{\omega}(a^{\omega *}), & \text{if } S = N, \\ \hat{w}(S), & \text{if } S \subset N,\ S \neq N. \end{cases} \tag{11}$$
In Equation (11), the total payoff of the players adopting the cooperative action profile $a^{\omega *}$ is assigned to the grand coalition. The approximated values $\hat{w}(S)$ given by (10), i.e., the maximal values over all states, are assigned to any coalition $S$ different from $N$. Denote the core constructed with the values of characteristic function (11) by $D(\omega)$ and assume that it is non-empty for any state $\omega$:
$$D(\omega) = \Big\{ (\alpha_1(\omega), \ldots, \alpha_n(\omega)) : \sum_{i \in S} \alpha_i(\omega) \geq w(\omega, S),\ \forall S \subset N,\ \sum_{i \in N} \alpha_i(\omega) = w(\omega, N) \Big\}. \tag{12}$$
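Steps (10) and (11) amount to a maximum over states followed by a table lookup. The sketch below illustrates this under stated assumptions: the function name `approximate_state_cf`, the state labels, and all numbers are hypothetical and do not come from the paper's example.

```python
def approximate_state_cf(v, coop_total):
    """Build w_hat(S) and w(state, S) from state-game values.

    v          -- dict: state -> dict: coalition (frozenset) -> v(state, S)
    coop_total -- dict: state -> total stage payoff under cooperative actions
    """
    states = list(v)
    grand = max(v[states[0]], key=len)  # the grand coalition N
    proper = [S for S in v[states[0]] if S != grand]

    # w_hat(S): largest maxmin value of S over all states, as in (10).
    w_hat = {S: max(v[s][S] for s in states) for S in proper}

    # w(state, S): w_hat(S) for proper coalitions, cooperative total for N,
    # as in (11).
    w = {s: {**w_hat, grand: coop_total[s]} for s in states}
    return w_hat, w


# Hypothetical two-player, two-state values (illustrative numbers only).
P1, P2, N = frozenset({1}), frozenset({2}), frozenset({1, 2})
V = {
    "s1": {P1: 1, P2: 0, N: 6},
    "s2": {P1: 2, P2: 1, N: 5},
}
COOP_TOTAL = {"s1": 6, "s2": 5}

w_hat, w = approximate_state_cf(V, COOP_TOTAL)

# Necessary condition used in Lemma 1: w_hat(S) < min over states of v(state, N).
lemma_condition = all(val < min(V[s][N] for s in V) for val in w_hat.values())
print(w_hat[P1], w_hat[P2], w["s1"][N], lemma_condition)
```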
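Lemma 1.
Let the inequality $\hat{w}(S) < \min_{\omega \in \Omega} v(\omega, N)$ hold for any coalition $S \subset N$, $S \neq N$. If the condition
$$\sum_{i \in N} K_i^{\omega}(a^{\omega *}) = \max_{a^{\omega} \in \prod_{j \in N} A_j^{\omega}} \sum_{i \in N} K_i^{\omega}(a^{\omega}) \tag{13}$$
is true and the core $D(\omega)$ is non-empty for any $\omega$, then $D(\omega) \subset C(\omega)$.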
Proof.
If there exists a coalition $S \subset N$, $S \neq N$, such that $\hat{w}(S) \geq \min_{\omega \in \Omega} v(\omega, N)$, then the core $D(\omega)$ is empty. Assuming the non-emptiness of the core $D(\omega)$, we consider any imputation $\alpha(\omega) \in D(\omega)$. If condition (13) is true, it means that $\sum_{i \in N} \alpha_i(\omega) = v(\omega, N) = w(\omega, N)$.
Then, for any coalition $S \subset N$, $S \neq N$, we have $\sum_{i \in S} \alpha_i(\omega) \geq w(\omega, S) = \hat{w}(S) = \max_{\omega' \in \Omega} v(\omega', S) \geq v(\omega, S)$, which proves that $\alpha(\omega) \in C(\omega)$. □
Remark 2.
Condition (13) states that the maximal total payoff of the players in state ω coincides with their payoff if players adopt actions prescribed by the cooperative strategy profile. It may not be satisfied in general case in dynamic games. If condition (13) is not true, the main result of the paper can be proved, but it requires a modification in the method of characteristic function definition. We leave this case for future research.
Remark 3.
We assume that the approximated core $D(\omega)$ is non-empty for any $\omega$. The conditions under which it is non-empty are similar to the ones given in Remark 1, but in Equation (9) the characteristic function $w(\omega, S)$ given by (11) is used. If the conditions of Lemma 1 are satisfied, then $D(\omega) \subset C(\omega)$, and non-emptiness of the approximated core $D(\omega)$ implies non-emptiness of the core $C(\omega)$.
Example 1.
Consider a three-player stochastic game with two states, $\omega_1$ and $\omega_2$. The action sets of players 1, 2, and 3 are $\{a_1, a_2\}$, $\{b_1, b_2\}$, and $\{c_1, c_2\}$ in state $\omega_1$, and $\{\alpha_1, \alpha_2\}$, $\{\zeta_1, \zeta_2\}$, and $\{\gamma_1, \gamma_2\}$ in state $\omega_2$, respectively. The payoff functions are given by the following matrices:
  • in state $\omega_1$:
    $$c_1:\ \begin{array}{c|cc} & b_1 & b_2 \\ \hline a_1 & (10, 10, 8) & (15, 0, 0) \\ a_2 & (0, 15, 0) & (5, 5, 5) \end{array} \qquad c_2:\ \begin{array}{c|cc} & b_1 & b_2 \\ \hline a_1 & (0, 0, 15) & (4, 4, 2) \\ a_2 & (2, 4, 4) & (0, 0, 0) \end{array}$$
  • in state $\omega_2$:
    $$\gamma_1:\ \begin{array}{c|cc} & \zeta_1 & \zeta_2 \\ \hline \alpha_1 & (2, 1, 1) & (0, 4, 2) \\ \alpha_2 & (4, 0, 2) & (7, 5, 3) \end{array} \qquad \gamma_2:\ \begin{array}{c|cc} & \zeta_1 & \zeta_2 \\ \hline \alpha_1 & (2, 3, 0) & (3, 4, 3) \\ \alpha_2 & (4, 2, 4) & (7, 5, 7) \end{array}$$
Player 1 chooses a row, player 2 chooses a column, and player 3 chooses a matrix.
The transition probabilities are written in the matrices:
  • for state $\omega_1$:
    $$c_1:\ \begin{array}{c|cc} & b_1 & b_2 \\ \hline a_1 & (0.5, 0.5) & (0, 1) \\ a_2 & (0, 1) & (0, 1) \end{array} \qquad c_2:\ \begin{array}{c|cc} & b_1 & b_2 \\ \hline a_1 & (0, 1) & (0.5, 0.5) \\ a_2 & (0.5, 0.5) & (1, 0) \end{array}$$
  • for state $\omega_2$:
    $$\gamma_1:\ \begin{array}{c|cc} & \zeta_1 & \zeta_2 \\ \hline \alpha_1 & (0, 1) & (1, 0) \\ \alpha_2 & (1, 0) & (1, 0) \end{array} \qquad \gamma_2:\ \begin{array}{c|cc} & \zeta_1 & \zeta_2 \\ \hline \alpha_1 & (0.2, 0.8) & (0, 1) \\ \alpha_2 & (0, 1) & (1, 0) \end{array}$$
The first (second) element in any entry of a matrix is the probability of transition from the given state and action profile to state $\omega_1$ (state $\omega_2$). The transitions are probabilistic in state $\omega_1$ when the players choose action profiles $(a_1, b_1, c_1)$, $(a_2, b_1, c_2)$, and $(a_1, b_2, c_2)$, and in state $\omega_2$ when the players choose action profile $(\alpha_1, \zeta_1, \gamma_2)$. All other transitions are deterministic.
The discount factor equals 0.9.
The cooperative strategy profile $\eta^* = (\eta_1^*, \eta_2^*, \eta_3^*)$ is such that
$$\eta_1^* = (a_1, \alpha_2), \quad \eta_2^* = (b_1, \zeta_2), \quad \eta_3^* = (c_1, \gamma_2), \tag{14}$$
which prescribes each player to choose the first action in state $\omega_1$ and the second action in state $\omega_2$. The cooperative strategy profile defines a Markov chain whose structure is depicted in Figure 1.
The players' payoffs are $(10, 10, 8)$ in state $\omega_1$ and $(7, 5, 7)$ in state $\omega_2$. Thus, the maximal total payoff of the players in each state game coincides with the total payoff that the players get in that state when implementing the cooperative strategy profile $\eta^*$. However, Theorem 1 is also true when this condition is not satisfied.
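The cooperative profile of this example can be cross-checked by brute force. The sketch below is an illustration under stated assumptions, not the authors' computation: each matrix cell is reduced to the total stage payoff of the three players and the probability of moving to $\omega_1$, cells are indexed in the order they are listed in the matrices, and the names `STATE_1`, `STATE_2`, and `values` are hypothetical. Since all players coordinate, a pure stationary cooperative profile is just one cell per state.

```python
from itertools import product

DELTA = 0.9

# One entry per matrix cell of Example 1, in the order the cells are listed:
# (total stage payoff of the three players, probability of moving to w1).
# Index 0 in STATE_1 is (a1, b1, c1); index 7 in STATE_2 is
# (alpha2, zeta2, gamma2).
STATE_1 = [(28, 0.5), (15, 0.0), (15, 0.0), (15, 0.0),
           (15, 0.0), (10, 0.5), (10, 0.5), (0, 1.0)]
STATE_2 = [(4, 0.0), (6, 1.0), (6, 1.0), (15, 1.0),
           (5, 0.2), (10, 0.0), (10, 0.0), (19, 1.0)]


def values(cell_1, cell_2):
    """Total expected discounted payoffs (V1, V2) when the players use the
    joint action `cell_1` in state w1 and `cell_2` in state w2."""
    r1, p1 = STATE_1[cell_1]
    r2, p2 = STATE_2[cell_2]
    # Solve (I - delta * Pi) V = r for the 2x2 case by Cramer's rule.
    a11, a12 = 1 - DELTA * p1, -DELTA * (1 - p1)
    a21, a22 = -DELTA * p2, 1 - DELTA * (1 - p2)
    det = a11 * a22 - a12 * a21
    v1 = (r1 * a22 - a12 * r2) / det
    v2 = (a11 * r2 - a21 * r1) / det
    return v1, v2


profiles = list(product(range(8), range(8)))  # 64 pure stationary profiles
best_from_w1 = max(profiles, key=lambda cells: values(*cells)[0])
best_from_w2 = max(profiles, key=lambda cells: values(*cells)[1])
best_v1, best_v2 = values(*best_from_w1)
print(best_from_w1, best_from_w2, round(best_v1, 2), round(best_v2, 2))
```

The same pair of cells is optimal from either initial state, and the two totals agree with the grand-coalition values 252.07 and 245.86 that appear in the cores of Example 2.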
First, we calculate the characteristic function $v(\omega, S)$ by Equation (7) and its approximation $w(\omega, S)$ by (11) for state games. The values of these functions are presented in Table 1.
The cores of state games $C(\omega)$ and $D(\omega)$, calculated with the values of functions $v(\omega, S)$ and $w(\omega, S)$ by Formulae (8) and (12), are non-empty for any $\omega$ and are shown in Figure 2 and Figure 3 for $\omega_1$ and $\omega_2$, respectively.

2.3. New Approximated Characteristic Function for Stochastic Games

We propose a new method of determining characteristic function for stochastic games based on the values of approximated characteristic function defined in states and given by Formula (11).
We assume that coalition $S$ may obtain at most $\hat{w}(S)$ at any state of the game. Accordingly, this value is the maximal value that the coalition can get, regardless of the state that currently appears. If we sum this value over an infinite horizon with discount factor $\delta$, we obtain an approximation, or an upper bound, of the payoff that coalition $S$ can get in the stochastic subgame starting from state $\omega$, that is,
$$\bar{w}(\omega, S) = \begin{cases} \hat{w}(S) + \delta \hat{w}(S) + \ldots = \dfrac{1}{1 - \delta}\, \hat{w}(S), & \text{if } S \subset N,\ S \neq N, \\ \sum_{i \in N} E_i^{\omega}(\eta^*), & \text{if } S = N. \end{cases} \tag{15}$$
Note that, according to Equation (15), we keep the value of the characteristic function for the grand coalition without approximation. The reason is that, when we define the allocation of the joint payoff, the players should redistribute the value that they actually obtain using the cooperative strategy profile, not an approximated one. The cooperative stochastic subgame is defined by the set of players $N$ and function (15). In the following, we omit the set of players and refer to the cooperative stochastic subgame as $\bar{w}(\omega, S)$ given by (15).
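For proper coalitions, (15) is just a geometric series. The following minimal sketch compares the closed form with a truncated discounted sum; the values in `W_HAT` are an assumption, backed out of the core bounds listed in Example 2 as bound times $(1 - \delta)$, and the function names are hypothetical.

```python
DELTA = 0.9

# w_hat(S) for the proper coalitions of the three-player example. These
# numbers are an assumption, recovered from the core bounds in Example 2
# as bound * (1 - delta); players are labeled 1, 2, 3.
W_HAT = {(1,): 2.0, (2,): 1.0, (3,): 1.0,
         (1, 2): 12.0, (1, 3): 10.0, (2, 3): 10.0}


def w_bar(w_hat_value, delta):
    """Closed form of w_hat + delta * w_hat + delta**2 * w_hat + ..."""
    return w_hat_value / (1.0 - delta)


def partial_sum(w_hat_value, delta, horizon):
    """Discounted sum truncated after `horizon` stages."""
    return sum(w_hat_value * delta ** t for t in range(horizon))


W_BAR = {S: w_bar(val, DELTA) for S, val in W_HAT.items()}

for S in sorted(W_BAR, key=lambda c: (len(c), c)):
    print(S, round(W_BAR[S], 2), round(partial_sum(W_HAT[S], DELTA, 500), 2))
```

The value for the grand coalition is deliberately left out: by (15) it is the exact cooperative payoff, not a discounted bound.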
Let $\bar{D}(\omega)$ be the core calculated with the values of function (15), i.e.,
$$\bar{D}(\omega) = \Big\{ (\bar{\alpha}_1(\omega), \ldots, \bar{\alpha}_n(\omega)) : \sum_{i \in S} \bar{\alpha}_i(\omega) \geq \bar{w}(\omega, S),\ \forall S \subset N,\ \sum_{i \in N} \bar{\alpha}_i(\omega) = \bar{w}(\omega, N) \Big\}. \tag{16}$$
Let $\bar{D}(\omega)$ be non-empty for any $\omega \in \Omega$. We can compare the core $\bar{D}(\omega)$ constructed with the values of approximated function (15) and the core defined with the values of the characteristic function constructed with the classical approach. For any subgame $G^{\omega}$, we define the characteristic function using the maxmin approach:
$$\bar{v}(\omega, S) = \max_{\eta_S \in \prod_{j \in S} H_j} \ \min_{\eta_{N \setminus S} \in \prod_{j \in N \setminus S} H_j} \ \sum_{i \in S} E_i^{\omega}(\eta_S, \eta_{N \setminus S}). \tag{17}$$
Let $\bar{C}(\omega)$ be a non-empty core of subgame $G^{\omega}$ constructed with the values of function (17).
Lemma 2.
Let the inequality
$$\hat{w}(S) < \min_{\omega \in \Omega} v(\omega, N) \tag{18}$$
hold for any coalition $S \subset N$, $S \neq N$, and let $\bar{D}(\omega)$ be non-empty for any $\omega$. Then $\bar{D}(\omega) \subset \bar{C}(\omega)$.
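Proof.
If $\hat{w}(S) < \min_{\omega \in \Omega} v(\omega, N)$ is not satisfied, then the core $\bar{D}(\omega)$ is empty by construction. Consider any imputation $\bar{\alpha}(\omega) \in \bar{D}(\omega)$ and prove that it belongs to the set $\bar{C}(\omega)$.
First, $\sum_{i \in N} \bar{\alpha}_i(\omega) = \bar{w}(\omega, N) = \sum_{i \in N} E_i^{\omega}(\eta^*) = \bar{v}(\omega, N)$.
Second, we prove that $\sum_{i \in S} \bar{\alpha}_i(\omega) \geq \bar{v}(\omega, S)$, taking into account that $\sum_{i \in S} \bar{\alpha}_i(\omega) \geq \bar{w}(\omega, S)$ for any $S \subset N$. To this end, we prove that $\bar{w}(\omega, S) \geq \bar{v}(\omega, S)$.
By definition, we have
$$\bar{v}(\omega, S) = \max_{\eta_S} \min_{\eta_{N \setminus S}} \sum_{i \in S} E_i^{\omega}(\eta_S, \eta_{N \setminus S}),$$
and we write the functional equation for the right-hand side of this equality:
$$\max_{\eta_S} \min_{\eta_{N \setminus S}} \sum_{i \in S} E_i^{\omega}(\eta_S, \eta_{N \setminus S}) = \max_{\eta_S} \min_{\eta_{N \setminus S}} \Big\{ \sum_{i \in S} K_i^{\omega}(a_S^{\omega}, a_{N \setminus S}^{\omega}) + \delta\, p(\omega, a^{\omega}) \sum_{i \in S} E_i(\eta_S, \eta_{N \setminus S}) \Big\},$$
where $p(\omega, a^{\omega})$ is the vector $(p(\omega' \mid \omega, a^{\omega}) : \omega' \in \Omega)$.
Let the profile $(\eta_S, \eta_{N \setminus S})$ be such that the maxmin is reached at this profile. Then we can write:
$$\sum_{i \in S} E_i^{\omega}(\eta_S, \eta_{N \setminus S}) = \left(I_k - \delta \Pi(\eta_S, \eta_{N \setminus S})\right)^{-1} \sum_{i \in S} K_i(a_S, a_{N \setminus S}) \leq \frac{1}{1 - \delta} \max_{\omega \in \Omega} \max_{a_S} \min_{a_{N \setminus S}} \sum_{i \in S} K_i^{\omega}(a_S^{\omega}, a_{N \setminus S}^{\omega}) = \frac{1}{1 - \delta} \max_{\omega \in \Omega} v(\omega, S) = \bar{w}(\omega, S).$$
In the last inequality, we use a property of stochastic matrices: the sum of the elements in any row of the matrix $\left(I_k - \delta \Pi(\eta_S, \eta_{N \setminus S})\right)^{-1}$ equals $1 / (1 - \delta)$, because $\Pi(\eta_S, \eta_{N \setminus S})$ is a stochastic matrix. The lemma is proved. □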
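The row-sum property used in the last step can be checked directly. The short derivation below is a sketch based on the Neumann series; it writes $\mathbf{1}$ for the all-ones vector, a notation not used in the paper. Since $\Pi \mathbf{1} = \mathbf{1}$ for any stochastic matrix $\Pi$,

$$\left(I_k - \delta \Pi\right)^{-1} \mathbf{1} = \sum_{t=0}^{\infty} \delta^{t}\, \Pi^{t} \mathbf{1} = \sum_{t=0}^{\infty} \delta^{t}\, \mathbf{1} = \frac{1}{1 - \delta}\, \mathbf{1}.$$

Because all entries of $\left(I_k - \delta \Pi\right)^{-1}$ are non-negative, each component of $\left(I_k - \delta \Pi\right)^{-1} x$ is at most $\frac{1}{1 - \delta} \max_{\omega} x_{\omega}$ for any vector $x$.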
Remark 4.
We assume non-emptiness of the approximated core $\bar{D}(\omega)$ in the stochastic game with any initial state $\omega$. If condition (18) in Lemma 2 is satisfied, the non-emptiness of the approximated cores $D(\omega)$ for any $\omega$ implies the non-emptiness of the approximated core $\bar{D}(\omega)$. This follows from the definition of characteristic function $\bar{w}(\omega, S)$ and Formula (10). Moreover, the non-emptiness of the approximated core $\bar{D}(\omega)$ implies the non-emptiness of the core $\bar{C}(\omega)$.
Example 2.
(continuation of Example 1) We continue the calculations for the stochastic game described in Example 1. Define the characteristic function $\bar{v}$ by (17) and the approximated characteristic function $\bar{w}$ by (15). The values of these functions are given in Table 2.
The cores $\bar{C}(\omega)$ and $\bar{D}(\omega)$ constructed with the values of functions $\bar{v}$ and $\bar{w}$, respectively, are non-empty and depicted in Figure 4 and Figure 5 for initial states $\omega_1$ and $\omega_2$, respectively. One can notice that $\bar{D}(\omega) \subset \bar{C}(\omega)$ for any $\omega$.
The approximated core $\bar{D}(\omega_1)$ is defined as the set
$$\bar{D}(\omega_1) = \{ (\bar{\alpha}_1, \bar{\alpha}_2, \bar{\alpha}_3) : \bar{\alpha}_1 + \bar{\alpha}_2 + \bar{\alpha}_3 = 252.07,\ \bar{\alpha}_1 + \bar{\alpha}_2 \geq 120.00,\ \bar{\alpha}_1 + \bar{\alpha}_3 \geq 100.00,\ \bar{\alpha}_2 + \bar{\alpha}_3 \geq 100.00,\ \bar{\alpha}_1 \geq 20.00,\ \bar{\alpha}_2 \geq 10.00,\ \bar{\alpha}_3 \geq 10.00 \}.$$
The approximated core $\bar{D}(\omega_2)$ is defined as the set
$$\bar{D}(\omega_2) = \{ (\bar{\alpha}_1, \bar{\alpha}_2, \bar{\alpha}_3) : \bar{\alpha}_1 + \bar{\alpha}_2 + \bar{\alpha}_3 = 245.86,\ \bar{\alpha}_1 + \bar{\alpha}_2 \geq 120.00,\ \bar{\alpha}_1 + \bar{\alpha}_3 \geq 100.00,\ \bar{\alpha}_2 + \bar{\alpha}_3 \geq 100.00,\ \bar{\alpha}_1 \geq 20.00,\ \bar{\alpha}_2 \geq 10.00,\ \bar{\alpha}_3 \geq 10.00 \}.$$
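Membership in these two sets is easy to test numerically. The sketch below is an illustration only: the function name `in_core` and the 0-based player indexing are hypothetical, the bounds are copied from the two sets above, and the tested imputations are the ones chosen later in Example 3 plus one made-up vector that violates a bound.

```python
from itertools import combinations


def in_core(alpha, bounds, total, tol=1e-6):
    """Check the core conditions: efficiency plus all coalition bounds.

    alpha  -- tuple of payoffs, one per player
    bounds -- dict: coalition (tuple of 0-based indices) -> lower bound
    total  -- value of the grand coalition
    """
    n = len(alpha)
    if abs(sum(alpha) - total) > tol:
        return False
    for size in range(1, n):
        for S in combinations(range(n), size):
            if sum(alpha[i] for i in S) < bounds[S] - tol:
                return False
    return True


# Lower bounds shared by both approximated cores (players indexed 0, 1, 2).
BOUNDS = {
    (0,): 20.0, (1,): 10.0, (2,): 10.0,
    (0, 1): 120.0, (0, 2): 100.0, (1, 2): 100.0,
}

ok_w1 = in_core((100.00, 100.00, 52.07), BOUNDS, 252.07)
ok_w2 = in_core((50.00, 95.86, 100.00), BOUNDS, 245.86)
# Made-up vector: efficient, but player 1 receives less than 20.
bad = in_core((10.00, 142.07, 100.00), BOUNDS, 252.07)
print(ok_w1, ok_w2, bad)
```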

3. Strongly Subgame-Consistent Core in Stochastic Games

3.1. Imputation Distribution Procedure

In cooperation, players follow the cooperative strategy profile $\eta^*$ and agree on the core as the cooperative solution of the game, i.e., the set of possible imputations of the joint payoff. We assume that the core for any subgame $G^{\omega}$ is calculated based on function (15), that is, it is $\bar{D}(\omega)$. Consider an imputation $\bar{\alpha}(\omega) \in \bar{D}(\omega)$. If the players are paid step by step according to the initially given payoff functions $K_i^{\omega}$, $i \in N$, we cannot guarantee that they will get the components of imputation $\bar{\alpha}(\omega)$ as an expected payoff in subgame $G^{\omega}$. Therefore, we define a scheme of state payments under which the players obtain, in total, the components of imputation $\bar{\alpha}(\omega)$.
Definition 1.
[10,11] We call the collection of vectors $(\beta_i : i \in N)$, where $\beta_i = (\beta_i(\omega_1), \ldots, \beta_i(\omega_k))$ and $\beta_i(\omega)$ is a payment to player $i$ in state $\omega$ in the cooperative stochastic game, an imputation distribution procedure (IDP) of imputation $\bar{\alpha}(\omega) \in \bar{D}(\omega)$ if
  • $\sum_{i \in N} \beta_i(\omega) = \sum_{i \in N} K_i^{\omega}(a^{\omega *})$ for any $\omega \in \Omega$;
  • $\bar{\alpha}_i(\omega) = B_i^{\omega}$, where $B_i^{\omega}$ is the expected discounted sum of payments to player $i$ in the stochastic subgame starting from state $\omega$, according to procedure $\beta$.
The expected sum of payments to player $i$ made according to the IDP can be calculated by the formula (see [14]):
$$B_i^{\omega} = \pi_0 \left(I - \delta \Pi(\eta^*)\right)^{-1} \beta_i,$$
where $\pi_0$ is such that $\pi_0^{\omega} = 1$ and $\pi_0^{\omega'} = 0$ for any $\omega' \neq \omega$.
Remark 5.
The IDP determined in Definition 1 for an imputation $\bar{\alpha}(\omega) \in \bar{D}(\omega)$ may be non-unique.
In the following section, we describe a property of the imputations from the core and the corresponding IDP that allows narrowing the set of IDPs.

3.2. Strongly Subgame-Consistent Core

We formulate the property of strong subgame consistency of the core and propose sufficient conditions for strong subgame consistency of the core in stochastic games with characteristic function (15). We suppose that the cores of stochastic game $G$ and any subgame $G^{\omega}$, $\omega \in \Omega$, are non-empty.
In cooperation, players agree on the joint implementation of cooperative strategy profile $\eta^*$ and expect to obtain the components of an imputation belonging to the core $\bar{D}(\omega)$ in the subgame starting from $\omega$. Reaching an intermediate state $\omega' \in \Omega$, player $i$ chooses action $a_i^{\omega' *}$ prescribed by cooperative strategy profile $\eta^*$ and gets payoff $K_i^{\omega'}(a^{\omega' *})$. If the players recalculate the solution in the current subgame and find a solution of cooperative subgame $G^{\omega'}$, we assume that the cooperative solution is chosen from the core $\bar{D}(\omega')$. It is reasonable to require that the payment received by a player in state $\omega'$, summed with the expected sum of any imputations from the cores $\bar{D}(\omega'')$, $\omega'' \in \Omega$, of the subgames following state $\omega'$, is an imputation from the core $\bar{D}(\omega')$. If this property holds for any intermediate state $\omega' \in \Omega$, then the core of the cooperative stochastic game with characteristic function (15) is strongly subgame-consistent.
To determine a strongly subgame-consistent core, we need to define the so-called expected core at state $\omega$, i.e., the set of expected imputations belonging to the cores that are cooperative solutions of the following subgames. We determine the expected core of state $\omega \in \Omega$ as follows:
$$E\bar{D}(\omega) = \Big\{ \delta \sum_{\omega' \in \Omega} p(\omega' \mid \omega, a^{\omega *})\, \bar{\alpha}(\omega'),\ \bar{\alpha}(\omega') \in \bar{D}(\omega') \Big\}.$$
Definition 2.
We call the core $\bar{D}(\omega)$ a strongly subgame-consistent solution of the cooperative stochastic game with approximated characteristic function $\bar{w}(\omega, S)$ starting from state $\omega$ if for any imputation $\bar{\alpha}(\omega) \in \bar{D}(\omega)$ there exists an IDP $\beta = (\beta_i : i \in N)$, where $\beta_i = (\beta_i(\omega) : \omega \in \Omega)$, satisfying the condition
$$\beta \oplus E\bar{D} \subset \bar{D}, \tag{19}$$
where $E\bar{D}$ is the vector $(E\bar{D}(\omega_1), \ldots, E\bar{D}(\omega_k))$ of expected cores for states $\omega_1, \ldots, \omega_k$, respectively, and $\bar{D}$ is a vector whose elements are sets, i.e., $\bar{D} = (\bar{D}(\omega_1), \ldots, \bar{D}(\omega_k))$.
Remark 6.
Inclusion (19) is written in vector form. To explain it, we write the first row of vector inclusion (19):
$$\beta(\omega_1) \oplus E\bar{D}(\omega_1) \subset \bar{D}(\omega_1),$$
where $\beta(\omega_1) \in \mathbb{R}^n$, $E\bar{D}(\omega_1) \subset \mathbb{R}^n$, and $\bar{D}(\omega_1) \subset \mathbb{R}^n$. The operation $a \oplus C$, where $a \in \mathbb{R}^n$ and $C$ is a set in $\mathbb{R}^n$, is defined as the set $\{a + c \text{ for all } c \in C\}$.
Theorem 1.
The core $\bar{D}(\omega)$, if it exists, is strongly subgame-consistent.
Proof.
Following Definition 2, we need to prove that there exists an IDP of the elements from the core $\bar{D}(\omega)$ defined in (16) that satisfies the two properties from Definition 1 and for which inclusion (19) is true.
For any imputation $\bar{\alpha}(\omega) \in \bar{D}(\omega)$, $\omega \in \Omega$, let the IDP be calculated as
$$\beta_i = \left(I_k - \delta \Pi(\eta^*)\right) \bar{\alpha}_i, \tag{20}$$
where $\beta_i = (\beta_i(\omega_1), \ldots, \beta_i(\omega_k))$ and $\bar{\alpha}_i = (\bar{\alpha}_i(\omega_1), \ldots, \bar{\alpha}_i(\omega_k))$.
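First, we prove that $\beta$, defined in (20), satisfies properties 1 and 2 in Definition 1.
  • Summing $\beta_i$ over the set of players, we obtain
    $$\sum_{i \in N} \beta_i = \left(I_k - \delta \Pi(\eta^*)\right) \sum_{i \in N} \bar{\alpha}_i = \left(I_k - \delta \Pi(\eta^*)\right) \left(\bar{v}(\omega_1, N), \ldots, \bar{v}(\omega_k, N)\right) = \left(I_k - \delta \Pi(\eta^*)\right) \left(I_k - \delta \Pi(\eta^*)\right)^{-1} \sum_{i \in N} K_i(a^*) = \sum_{i \in N} K_i(a^*),$$
    or, for any $\omega \in \Omega$, the equality $\sum_{i \in N} \beta_i(\omega) = \sum_{i \in N} K_i^{\omega}(a^{\omega *})$ is true.
  • We prove that $\bar{\alpha}_i(\omega) = B_i^{\omega}$, or in vector form $\bar{\alpha}_i = B_i$, where $B_i = (B_i(\omega_1), \ldots, B_i(\omega_k))$. We have
    $$B_i = \left(I_k - \delta \Pi(\eta^*)\right)^{-1} \beta_i = \left(I_k - \delta \Pi(\eta^*)\right)^{-1} \left(I_k - \delta \Pi(\eta^*)\right) \bar{\alpha}_i = \bar{\alpha}_i.$$
Therefore, the collection of payment vectors $\beta_i$, $i \in N$, is a distribution procedure of the imputation with components $\bar{\alpha}_i$.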
Now, we prove that inclusion (19) holds. Let $\beta_i$ be given by Equation (20), where $\bar{\alpha}_i = (\bar{\alpha}_i(\omega_1), \ldots, \bar{\alpha}_i(\omega_k))$ and $\bar{\alpha}(\omega_j) \in \bar{D}(\omega_j)$ for any $j = 1, \ldots, k$. Consider the sum $\beta(\omega) + \varepsilon(\omega)$, where $\varepsilon(\omega)$ is any vector from the expected core $E\bar{D}(\omega)$. Substituting the expression of $\beta_i$ from Equation (20) and an element of the expected core into the sum, we get
$$\beta + \varepsilon = \left(I_k - \delta \Pi(\eta^*)\right) \bar{\alpha} + \delta \Pi(\eta^*) \bar{\alpha} = \bar{\alpha} \in \bar{D},$$
which proves the theorem. □
Theorem 1 gives a method of constructing the payment scheme for any element of the core $\bar{D}$ defined by (16) using the values of function (15).
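Example 3.
(continuation of Examples 1 and 2) We demonstrate how to define an IDP using the method from the proof of Theorem 1. Let the core imputations $\bar{\alpha}(\omega_1) = (100.00, 100.00, 52.07) \in \bar{D}(\omega_1)$ and $\bar{\alpha}(\omega_2) = (50.00, 95.86, 100.00) \in \bar{D}(\omega_2)$ be chosen for $\omega_1$ and $\omega_2$. To calculate the IDP by Formula (20), we need the matrix $\Pi(\eta^*)$, which is
$$\Pi(\eta^*) = \begin{pmatrix} 0.5 & 0.5 \\ 1 & 0 \end{pmatrix}$$
for cooperative strategy profile $\eta^*$ determined by (14). The first row is the transition $(0.5, 0.5)$ under $(a_1, b_1, c_1)$ in state $\omega_1$, and the second row is the transition $(1, 0)$ under $(\alpha_2, \zeta_2, \gamma_2)$ in state $\omega_2$.
Using Formula (20) with $\bar{\alpha}_1 = (100.00, 50.00)$, $\bar{\alpha}_2 = (100.00, 95.86)$, $\bar{\alpha}_3 = (52.07, 100.00)$, we obtain
$$\beta_1 = (32.50, -40.00), \quad \beta_2 = (11.86, 5.86), \quad \beta_3 = (-16.36, 53.14),$$
where the first component of vector $\beta_i$ is the payment to player $i$ in state $\omega_1$ and the second component is the payment in state $\omega_2$. One can check that the collection of vectors $(\beta_i : i \in N)$ satisfies the conditions from Definition 1 of the IDP: the payments sum to $28$ in state $\omega_1$ and to $19$ in state $\omega_2$, which are the total cooperative payoffs in these states.
The approximated cores $\bar{D}(\omega_1)$ and $\bar{D}(\omega_2)$ are strongly subgame-consistent, which is proved in Theorem 1.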
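The payments of this example can be recomputed from Formula (20) in a few lines. The sketch below is an illustration under one stated assumption: the second row of $\Pi(\eta^*)$ is taken as $(1, 0)$, the transition listed for $(\alpha_2, \zeta_2, \gamma_2)$ in the $\gamma_2$ transition matrix of Example 1. The names `PI`, `idp`, and `stage_totals` are hypothetical.

```python
DELTA = 0.9

# Transition matrix under the cooperative profile. Assumption: row 2 is the
# transition (1, 0) listed for (alpha2, zeta2, gamma2) in Example 1.
PI = [[0.5, 0.5],
      [1.0, 0.0]]


def idp(alpha_i):
    """beta_i = (I - delta * Pi) alpha_i, computed row by row."""
    k = len(alpha_i)
    return [
        alpha_i[s] - DELTA * sum(PI[s][t] * alpha_i[t] for t in range(k))
        for s in range(k)
    ]


# alpha_i = (share of player i from state w1, share of player i from state w2).
ALPHAS = [[100.00, 50.00], [100.00, 95.86], [52.07, 100.00]]

betas = [idp(alpha_i) for alpha_i in ALPHAS]
# Property 1 of Definition 1: payments in each state add up to the total
# cooperative stage payoff (10 + 10 + 8 = 28 and 7 + 5 + 7 = 19).
stage_totals = [sum(beta[s] for beta in betas) for s in range(2)]

for beta in betas:
    print([round(x, 2) for x in beta])
print([round(x, 2) for x in stage_totals])
```

Negative components are possible: a player with a large continuation share may have to pay in the current state so that the totals match the imputation.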
Remark 7.
The new method of constructing the characteristic function, the so-called approximated characteristic function, proposed in the paper not only makes it possible to find a strongly subgame-consistent subset of the core but also simplifies calculations. In the example, each player has two actions in any state. Therefore, each player has four pure stationary strategies in the stochastic game, and there are 64 strategy profiles in the game. The calculation of the maxmin payoff of a coalition in such games is a complicated computational problem. The new approach avoids these calculations by using the values of the approximated characteristic function defined in state games to determine the function for the stochastic game.
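The counts in this remark follow from simple enumeration, sketched below; the variable names are illustrative. A pure stationary strategy picks one action per state, and a profile picks one strategy per player.

```python
from itertools import product

ACTIONS_PER_STATE = [2, 2]  # two actions in each of the two states
NUM_PLAYERS = 3

# A pure stationary strategy assigns one action index to every state.
pure_strategies = list(product(*(range(m) for m in ACTIONS_PER_STATE)))

# A strategy profile assigns one pure stationary strategy to every player.
profiles = list(product(pure_strategies, repeat=NUM_PLAYERS))

print(len(pure_strategies), len(profiles))
```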

4. Conclusions

We have proposed a new method of constructing the characteristic function in stochastic games. The method simplifies calculations in comparison with previously introduced approaches. An additional advantage of the method is that the core calculated with the values of this characteristic function satisfies strong subgame consistency. This property positively characterizes the realization of the imputations from the core in a dynamic game process. The property of strong subgame consistency applies to set-valued cooperative solutions, like the core. As a direction for future research, one can consider additional simplifications in the definition of the characteristic function that not only keep the strong subgame consistency of the core but also reduce the number of calculations needed to define the cooperative stochastic game.

Author Contributions

Conceptualization, E.P. and L.P.; methodology, E.P. and L.P.; software, E.P. and L.P.; validation, E.P. and L.P.; formal analysis, E.P. and L.P.; investigation, E.P. and L.P.; resources, E.P. and L.P.; data curation, E.P. and L.P.; writing—original draft preparation, E.P. and L.P.; writing—review and editing, E.P. and L.P.; visualization, E.P. and L.P.; supervision, E.P. and L.P.; project administration, E.P. and L.P.; funding acquisition, E.P. and L.P. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the Russian Science Foundation, grant no. 17-11-01079.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Von Neumann, J.; Morgenstern, O. Theory of Games and Economic Behavior; Princeton University Press: Princeton, NJ, USA, 1944.
  2. Aumann, R.J.; Peleg, B. Von Neumann-Morgenstern Solutions to Cooperative Games without Side Payments. Bull. Am. Math. Soc. 1960, 66, 173–179.
  3. Chander, P.; Tulkens, H. The core of an economy with multilateral environmental externalities. Int. J. Game Theory 1997, 26, 379–401.
  4. Chander, P. The gamma-core and coalition formation. Int. J. Game Theory 2007, 35, 539–556.
  5. Petrosyan, L.; Zaccour, G. Time-consistent Shapley value allocation of pollution cost reduction. J. Econ. Dyn. Control 2003, 27, 381–398.
  6. Zaccour, G. Computation of Characteristic function values for linear-state differential games. J. Optim. Theory Appl. 2003, 117, 183–194.
  7. Reddy, P.V.; Zaccour, G. A friendly computable characteristic function. Math. Soc. Sci. 2016, 82, 18–25.
  8. Petrosyan, L.A.; Gromova, E.V. On an approach to constructing a characteristic function in cooperative differential games. Autom. Remote Control 2017, 78, 1680–1692.
  9. Pankratova, Y.B.; Petrosyan, L.A. New Characteristic Function for Multistage Dynamic Games. 2018. Available online: https://cyberleninka.ru/article/n/new-characteristic-function-for-multistage-dynamic-games (accessed on 10 June 2020).
  10. Petrosjan, L.A. Consistency of Solutions of Differential Games with Many Players; Vestnik of Leningrad University, Series Mathematics, Mechanics, Astronomy: Saint Petersburg, Russia, 1977; pp. 46–52.
  11. Petrosjan, L.A.; Danilov, N.A. Time-Consistent Solutions of Non-Antagonistic Differential Games with Transferable Payoffs; Vestnik Leningrad University: Saint Petersburg, Russia, 1979; pp. 52–59.
  12. Petrosjan, L.A. Cooperative stochastic games. In Advances in Dynamic Games. Annals of the International Society of Dynamic Games; Haurie, A., Muto, S., Petrosjan, L.A., Raghavan, T.E.S., Eds.; Birkhäuser: Boston, MA, USA, 2006; Volume 8, pp. 52–59.
  13. Avrachenkov, K.; Cottatellucci, L.; Maggi, L. Cooperative Markov decision processes: Time consistency, greedy players satisfaction, and cooperation maintenance. Int. J. Game Theory 2013, 42, 239–262.
  14. Parilina, E.M. Stable cooperation in stochastic games. Autom. Remote Control 2015, 76, 1111–1122.
  15. Parilina, E.; Zaccour, G. Node-Consistent Core for Games Played over Event Trees. Automatica 2015, 53, 304–311.
  16. Petrosjan, L.A. Construction of Strongly Time Consistent Solutions in Cooperative Differential Games; Birkhäuser: Cham, Switzerland, 1992; pp. 33–38.
  17. Petrosyan, L. Strong Strategic Support of Cooperation in Multistage Games. Int. Game Theory Rev. 2019, 21, 1940004.
  18. Parilina, E.M.; Petrosyan, L.A. Strongly Subgame-Consistent Core in Stochastic Games. Autom. Remote Control 2018, 79, 1515–1527.
  19. Petrosian, O.; Gromova, E.; Pogozhev, S. Strong Time-Consistent Subset of the Core in Cooperative Differential Games with Finite Time Horizon. Autom. Remote Control 2018, 79, 1912–1928.
  20. Sedakov, A.A. On the Strong Time Consistency of the Core. Autom. Remote Control 2018, 79, 757–767.
  21. Parilina, E.; Tampieri, A. Stability and Cooperative Solution in Stochastic Games. Theory Decis. 2018, 84, 601–625.
  22. Shapley, L.S. Stochastic Games. Proc. Natl. Acad. Sci. USA 1953, 39, 1095–1100. Available online: https://www.pnas.org/content/39/10/1095.short (accessed on 28 May 2020).
  23. Bondareva, O.N. Some applications of linear programming methods to the theory of cooperative games. Problemy Kybernetiki 1963, 10, 119–139.
  24. Shapley, L.S. On balanced sets and cores. Nav. Res. Logist. Q. 1967, 14, 453–460.
  25. Peleg, B.; Sudhölter, P. Introduction to the Theory of Cooperative Games; Springer: Berlin/Heidelberg, Germany, 2007.
Figure 1. The transition probabilities defined by cooperative strategy profile $\eta^*$.
Figure 2. The core $C(\omega_1)$ (gray region) and approximated core $D(\omega_1)$ (blue region inside gray region) for the $\omega_1$ state game.
Figure 3. The core $C(\omega_2)$ (gray region) and approximated core $D(\omega_2)$ (blue region inside gray region) for the $\omega_2$ state game.
Figure 4. The core $\bar{C}(\omega_1)$ (gray region) and approximated core $\bar{D}(\omega_1)$ (blue region inside gray region) in the stochastic game with initial state $\omega_1$.
Figure 5. The core $\bar{C}(\omega_2)$ (gray region) and approximated core $\bar{D}(\omega_2)$ (blue region inside gray region) in the stochastic game with initial state $\omega_2$.
Table 1. Values of characteristic function $v$ and approximated characteristic function $w$ for states $\omega_1$ and $\omega_2$.
| $S$ | {1,2,3} | {1,2} | {1,3} | {2,3} | {1} | {2} | {3} |
|---|---|---|---|---|---|---|---|
| $v(\omega_1, S)$ | 28 | 8 | 10 | 10 | 0 | 0 | 0 |
| $v(\omega_2, S)$ | 19 | 12 | 6 | 8 | 2 | 1 | 1 |
| $w(\omega_1, S)$ | 28 | 12 | 10 | 10 | 2 | 1 | 1 |
| $w(\omega_2, S)$ | 19 | 12 | 10 | 10 | 2 | 1 | 1 |
Table 2. Values of characteristic function $\bar{v}$ and approximated characteristic function $\bar{w}$ for the stochastic game starting from states $\omega_1$ and $\omega_2$.
| $S$ | {1,2,3} | {1,2} | {1,3} | {2,3} | {1} | {2} | {3} |
|---|---|---|---|---|---|---|---|
| $\bar{v}(\omega_1, S)$ | 252.07 | 92.41 | 62.01 | 64.00 | 9.47 | 9.00 | 9.00 |
| $\bar{v}(\omega_2, S)$ | 245.86 | 95.17 | 77.87 | 60.00 | 10.52 | 10.00 | 10.00 |
| $\bar{w}(\omega_1, S)$ | 252.07 | 120.00 | 100.00 | 100.00 | 20.00 | 10.00 | 10.00 |
| $\bar{w}(\omega_2, S)$ | 245.86 | 120.00 | 100.00 | 100.00 | 20.00 | 10.00 | 10.00 |
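The tabulated values can be used to check why the approximated core in Figures 2 to 5 lies inside the core. A core is the set of allocations of the grand-coalition value that give every coalition at least its characteristic function value, so if two functions agree on the grand coalition and one is at least as large on every other coalition, its core is contained in the other's. The sketch below tests this sufficient condition on the numbers from Tables 1 and 2; the dictionary layout and the helper name `dominates` are illustrative choices made here, not part of the paper.

```python
# Coalition order used in every row: {1,2,3}, {1,2}, {1,3}, {2,3}, {1}, {2}, {3}.
# Values are transcribed from Tables 1 and 2.
table1 = {
    "v": {"omega1": [28, 8, 10, 10, 0, 0, 0],
          "omega2": [19, 12, 6, 8, 2, 1, 1]},
    "w": {"omega1": [28, 12, 10, 10, 2, 1, 1],
          "omega2": [19, 12, 10, 10, 2, 1, 1]},
}
table2 = {
    "v": {"omega1": [252.07, 92.41, 62.01, 64.00, 9.47, 9.00, 9.00],
          "omega2": [245.86, 95.17, 77.87, 60.00, 10.52, 10.00, 10.00]},
    "w": {"omega1": [252.07, 120.00, 100.00, 100.00, 20.00, 10.00, 10.00],
          "omega2": [245.86, 120.00, 100.00, 100.00, 20.00, 10.00, 10.00]},
}

def dominates(table):
    """Return True if, in every state, w equals v on the grand coalition
    and w is at least v on every proper coalition."""
    for state, v_row in table["v"].items():
        w_row = table["w"][state]
        if w_row[0] != v_row[0]:
            return False
        if any(w < v for v, w in zip(v_row[1:], w_row[1:])):
            return False
    return True

print(dominates(table1))  # state games
print(dominates(table2))  # stochastic games
```

Both checks succeed on the tabulated values, which is consistent with the blue regions lying inside the gray regions in the figures.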

Share and Cite

MDPI and ACS Style

Parilina, E.; Petrosyan, L. On a Simplified Method of Defining Characteristic Function in Stochastic Games. Mathematics 2020, 8, 1135. https://doi.org/10.3390/math8071135

