1. Introduction
In a dynamic n-person game the players first choose their “optimal” strategies at the initial position
(which form the optimal strategy profile for the whole game), and then have the option, at any intermediate position, to change their strategies
and switch to other strategies if these constitute the locally optimal strategy profile for the subgame starting at that position.
The time consistency property (first introduced in References [
1,
2,
3] for differential games) ensures that the players have no incentive to change their strategies in any subgame along the optimal game evolution, and hence plays an important role in designing the players’ optimal behavior in non-cooperative and cooperative dynamic games (see, e.g., References [
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21], for details).
We consider
n-person finite multistage games in extensive form (see, e.g., References [
5,
17,
22,
23]) with perfect information and with chance moves. Note that much research has already been done on time consistent solutions (or closely related concepts) in extensive-form games (see, e.g., References [
4,
6,
13,
17,
21]). The time consistency concept was extended to dynamic games played over event trees in References [
14,
16,
20] as well as to multicriteria extensive-form cooperative games (without chance moves) in References [
7,
8,
10,
11,
15]. The property of "time consistency in the whole game" was extended to multicriteria extensive-form cooperative games with chance moves in Reference [
9] (note that in these games an optimal pure strategy profile generates not a unique optimal trajectory in the game tree but rather a whole optimal bundle of trajectories).
In this paper, we mainly focus on the dynamic aspects of cooperation in a dynamic extensive-form game with chance moves, and propose a mechanism of the players’ sustainable cooperation which satisfies three properties. First, a fragment of each cooperative trajectory from the optimal bundle for the original game should “remain optimal” at each subgame along the cooperative game evolution, that is, it should belong to the subgame optimal bundle of cooperative trajectories. Second, the cooperative payoff-to-go at each subgame should be no less than the non-cooperative payoff-to-go for all players. Finally, when the players re-evaluate their expected cooperative payoff after each chance move, they should have no incentive to change the original cooperative agreement.
To this aim, we first need to provide a rule for choosing a unique cooperative strategy profile as well as the unique optimal bundle of cooperative trajectories. We introduce the
Players’ Rank Based (PRB) algorithm and prove that this algorithm generates the unique optimal bundle of cooperative trajectories which satisfies time consistency. Note that a rather close approach—the so-called Refined Leximin (RL) algorithm—was introduced recently in Reference [
8]. Let us note the main differences between these two algorithms. The RL algorithm is applicable to multicriteria games without chance moves and is based on the ranking of the criteria, while the PRB algorithm is designed for single-criterion extensive-form games with chance moves and employs the players’ ranks. Further, the RL algorithm selects a unique cooperative trajectory while the PRB algorithm generates the unique optimal bundle of cooperative trajectories in the game tree. To the best of the authors’ knowledge, other approaches to choosing an optimal bundle of cooperative trajectories in extensive-form games with chance moves have not been considered yet.
Then, to construct a characteristic function (which describes the worth of each coalition in cooperative game) we use an equilibrium-based approach, namely the
γ-characteristic function introduced in Reference [
24]. Hence, the players have to accept a specific method for choosing a unique Subgame Perfect Equilibrium (SPE) [
25] in an extensive-form game with chance moves. To solve this problem we provide a novel refinement of the backwards induction procedure (see, e.g., References [
5,
17,
23])—the so-called
Attitude SPE algorithm. A similar approach to constructing a unique SPE in extensive-form games with perfect information was explored in References [
17,
26,
27] and was called the Type Equilibrium (TE) algorithm. Both algorithms are refinements of the general backwards induction procedure that take into account the attitudes of each player towards the other players. Let us point out the main differences between these algorithms. The TE algorithm is applicable to games without chance moves and to the case when the payoffs are only determined in terminal nodes. In addition, the TE algorithm constructs an SPE that is “unique” in the sense of payoffs (i.e., there may exist several optimal trajectories which generate the same equilibrium payoffs), while the Attitude SPE algorithm selects a unique SPE strategy profile as well as a unique bundle of trajectories. Another rather close approach to finding a unique SPE, the so-called Indifferent Equilibrium (IE) algorithm, was introduced in Reference [
28]. Again, the IE algorithm is applicable only to games without chance moves and to the special case when the payoffs are determined in terminal nodes. Moreover, the IE algorithm in general constructs an SPE in behavior strategies while the proposed Attitude SPE algorithm always generates an SPE in pure strategies.
It is worth noting that other approaches to analyzing an extensive-form game, apart from the backwards induction procedure and its refinements mentioned above, require the researcher first to obtain a strategic representation of the original extensive game and then to analyze this strategic (or normal-form) game (see, e.g., References [
29,
30,
31]). For instance, the software tool “Game Theory Explorer” [
29] builds the strategic-form representation and then applies the modified Lemke-Howson algorithm [
32] to find all Nash equilibria. The majority of existing algorithms are designed to find Nash equilibria in mixed strategies for 2-person games and do not construct an SPE in pure strategies. Moreover, as noted in Reference [
31], in general the strategic-form representation is exponential in the size of the original game tree. In contrast, the proposed Attitude SPE algorithm is a rather simple recursive algorithm which works directly on the n-person extensive-form game (with perfect information) and computes a unique SPE in pure strategies.
After computing the
γ-characteristic function we suppose that the players adopt some single-valued cooperative solution
(for instance, the Shapley value [
33], the nucleolus [
34], etc.) which satisfies the individual and collective rationality property. Finally, to guarantee the sustainability of the achieved long-term cooperative agreement we employ the
Imputation Distribution Procedure (IDP) based approach (see, e.g., References [
3,
12,
14,
16,
17,
18,
20,
35]), that is, a payment schedule to redistribute the
ith player’s expected cooperative payoff along the optimal bundle of cooperative trajectories. In this paper, we mainly focus on the following good properties an IDP may satisfy: subgame efficiency, strict balance condition [
10,
15,
17] and an appropriate refinement of the time consistency property, called
subgame consistency. The point is that the “time consistency in the whole game” property [
9,
14,
16,
20] is based on an a priori assessment of the
ith player’s expected optimal payoff (before the game
starts). However, when the players make a decision in the subgame
after the chance move occurs, they need to re-estimate their expected optimal payoffs-to-go since the original optimal bundle of cooperative trajectories shrinks after each chance node. To deal with this feature of games with chance moves we adopt the notion of subgame consistency that was first proposed in Reference [
36] for cooperative stochastic differential games and then extended to stochastic dynamic games in References [
37,
38].
Since we derive a suitable definition of subgame consistency for another class of games, the proposed Definition 6 differs from the ones provided in References [
37,
38] but captures the same idea. Let us point out the main differences with References [
37,
38]. While D. Yeung and L. Petrosyan do not consider the issue of multiple equilibria and study stochastic games in which there exists a unique Nash equilibrium in each subgame, we focus on the problem of how to select a unique (subgame perfect) Nash equilibrium in an extensive-form game with chance moves and derive the corresponding algorithm. Second, the characteristic function has not been constructed in References [
37,
38] and, hence, the players are restricted to using the simplest cooperative solutions (for instance, they may share equally the excess of the total expected cooperative payoff over the expected sum of individual non-cooperative payoffs), whereas we provide a method for calculating the
γ-characteristic function. Hence, the players may use different solution concepts based on the characteristic function approach. Finally, it turns out that the incremental IDP specified for extensive-form games with chance moves in Reference [
9] satisfies not only subgame consistency but also subgame efficiency and the strict balance condition.
Therefore, the suggested PRB algorithm, the Attitude SPE algorithm combined with the γ-characteristic function, and the incremental payment schedule for any single-valued cooperative solution (meeting individual and collective rationality) together constitute the required mechanism of the players’ sustainable cooperation, which satisfies the three properties mentioned above for any extensive-form game with chance moves.
It is worth noting that extensive-form games, as well as dynamic games played over event trees, differential games and multistage games with discrete dynamics, are used to model various real-world situations where several decision makers (or players) with different objectives may cooperate (see, e.g., References [
5,
12,
14,
16,
17,
20,
39,
40,
41,
42,
43,
44]). Hence, the proposed approach to implementing a long-term cooperative agreement may have a number of possible applications.
The rest of the paper is organized as follows:
Section 2 recalls the main ingredients of the class of games of interest. In
Section 3, we specify the Attitude SPE algorithm that allows constructing a unique SPE in an extensive-form game with chance moves. In
Section 4, we provide the PRB algorithm and prove that the optimal bundle of cooperative trajectories generated by this algorithm satisfies time consistency.
Section 5 reveals a drawback of the IDP “time consistency in the whole game” property and presents a subgame consistency definition that is applicable for extensive-form games with chance moves. We prove that incremental IDP satisfies a number of good properties and consider an example of a 3-person multistage game with chance moves to illustrate the incremental IDP implementation.
Section 6 provides a brief review of the results and discussion.
2. Extensive-Form Game with Chance Moves
We consider a finite multistage game in extensive form following References [
6,
13,
17,
22,
23]. First we need to define the basic notation and briefly recall some properties of extensive-form games that will be used in the sequel:
is the set of all players.
K is the game tree with the root and the set of all nodes P.
is the set of all direct successors (descendants) of the node x, and is the unique predecessor (parent) of the node such that .
is the set of all decision nodes of the ith player (at these nodes the player i chooses the following node), , for all , .
denotes the set of all terminal nodes (final positions), .
is the set of all nodes at which chance moves, where denotes the probability of transition from node to node . We suppose that for each it holds that . Lastly, .
is the trajectory (or the path) in the game tree, , ; where index t in denotes the ordinal number of this node within the trajectory and can be interpreted as the "time index".
is the payoff of the ith player at the node . We assume that for all , , and the payoffs are non-negative, that is, .
In the following, we will use
to denote the class of all finite multistage
n-person games with chance moves in extensive form defined above, where
denotes a game with root
. Note that
is an extensive-form game with perfect information (see, e.g., References [
17,
22,
23] for details).
Since all the solutions we are interested in throughout the paper are attainable when the players restrict themselves to the class of pure strategies we will focus on this class of strategies. The pure strategy of the ith player is a function with domain that specifies for each node the next node which the player i has to choose at x. Let denote the (finite) set of all ith player’s pure strategies, .
Denote by the conditional probability that node is reached if node x has been already reached (the probability of transition from x to y) while the players use the strategies . Note that for all , , and for all if and if . For chance moves, that is, if for all for each .
Then one can calculate the probability
of realization of the trajectory
,
,
,
, when the players use the strategies
from the strategy profile
.
Denote by the finite set (or the bundle) of the trajectories which are generated by strategy profile . Note that for all , for all , , .
Let denote the ith player’s vector payoff corresponding to the trajectory .
Denote by
the (expected) value of the
ith player’s payoff function which corresponds to the strategy profile
. Let
denote the set of all terminal nodes of the trajectories
.
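The objects introduced above (the bundle of trajectories generated by a pure strategy profile, the realization probability of each trajectory, and the expected payoff) can be illustrated with a minimal Python sketch. The toy tree, the node names and the helper names `bundle` and `expected_payoffs` are hypothetical; the sketch also assumes that a player's payoff along a trajectory is the sum of her payoffs at the nodes of that trajectory.

```python
from fractions import Fraction

# Hypothetical toy game: node -> list of direct successors (terminal nodes have none).
children = {"x0": ["a", "b"], "a": ["z1", "z2"], "b": [], "z1": [], "z2": []}
# Owner of each non-terminal node: 0 marks a chance node, i >= 1 a decision node of player i.
owner = {"x0": 1, "a": 0}
# Transition probabilities at chance nodes.
chance = {"a": {"z1": Fraction(1, 2), "z2": Fraction(1, 2)}}
# Payoff vectors (player 1, player 2) collected at every node.
payoff = {"x0": (0, 0), "a": (1, 0), "b": (2, 2), "z1": (4, 0), "z2": (0, 6)}


def bundle(node, profile, prob=Fraction(1), path=()):
    """Yield (trajectory, probability) pairs generated by a pure strategy profile.

    `profile` maps each decision node to the successor chosen there.
    """
    path = path + (node,)
    succ = children[node]
    if not succ:                      # terminal node: the trajectory is complete
        yield path, prob
    elif owner[node] == 0:            # chance node: every successor is possible
        for y in succ:
            yield from bundle(y, profile, prob * chance[node][y], path)
    else:                             # decision node: follow the chosen successor
        yield from bundle(profile[node], profile, prob, path)


def expected_payoffs(root, profile):
    """Expected payoff vector: probability-weighted sum of trajectory payoffs."""
    n = len(payoff[root])
    total = [Fraction(0)] * n
    for path, p in bundle(root, profile):
        for i in range(n):
            total[i] += p * sum(payoff[x][i] for x in path)
    return tuple(total)


profile = {"x0": "a"}                 # player 1 moves to the chance node a
trajectories = list(bundle("x0", profile))
values = expected_payoffs("x0", profile)
```

Here the profile generates a bundle of two trajectories, each realized with probability 1/2, which is the situation described in the text where a single pure strategy profile no longer yields a single trajectory.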
Remark 1 ([
9])
. If the pure strategy profiles u and v generate different bundles and of the trajectories, that is, , then .
According to References [
17,
22,
23] each intermediate node
generates a subgame
with the subgame tree
and the subgame root
as well as a factor-game
with the factor-game tree
. Decomposition of the original extensive game
at node
onto the subgame
and the factor-game
generates the corresponding decomposition of the pure (and mixed) strategies (see References [
17,
22] for details).
Let
,
, denote the restriction of
on the subgame tree
, and
,
, denote the restriction of the
ith player’s pure strategy
in
on
. The pure strategy profile
generates the bundle of the subgame trajectories
Similarly to (
2), let us denote by
the expected value of the
ith player’s payoff in
, and by
the set of all possible
ith player’s pure strategies in the subgame
,
. Note that for each trajectory
,
,
,
where
denotes a fragment of trajectory
implemented before the subgame
starts, and
denotes the probability that node
is reached when the players employ the strategies
,
. It is worth noting that factor-game
is usually defined for given strategy profile
in the subgame
since we assume that
(see, e.g., References [
17,
22] for details). Moreover, given intermediate node
, the bundle
can be divided in two subsets, that is,
where
, and
,
. Then, taking (
1), (
3), (
4) and (
5) into account, we get
Note that, since
, one can compose the
ith player’s pure strategy
in the original game
from her strategies
in the subgame
and
in the factor-game
[
17,
22].
3. Refined Backwards Induction Procedure to Construct a Unique SPE
Definition 1 ([
45])
. A strategy profile is a Nash Equilibrium (NE) in , if
Let denote the set of all pure strategy Nash equilibria in .
Definition 2 ([
25])
. A strategy profile u is a subgame perfect (Nash) equilibrium (SPE) in , if it holds that , i.e., the restriction of u on each subgame forms a Nash equilibrium in this subgame.
To construct an SPE in an extensive-form game with perfect information one may employ the so-called backwards induction procedure (see, e.g., References [
12,
17,
22,
23,
46,
47]).
However, the backwards induction procedure may generate multiple subgame perfect equilibria with different payoffs to the players in an extensive-form game (see, e.g., References [
5,
12,
17,
23]). To choose a unique SPE and unique corresponding bundle of trajectories we use an approach based on the players’ attitude vectors. Namely, let the
ith player’s attitude vector
be a permutation of numbers
meeting the condition
. If
one may interpret the player
j as an "
ith player’s associate of level
k".
In the paper we will use these attitude vectors when constructing SPE via backwards induction procedure in the following way. Let
,
denote the
ith player’s expected payoff in the subgame
,
while
be a SPE in this subgame. Assume that there exist multiple nodes
such that
, that is, player
i is indifferent to the choice of particular node
from
while the
ith player’s choice may affect the other players’ payoffs. If
, suppose that the
ith player aims first to maximize the
jth player’s expected payoff
when choosing a unique node
y from
. If again there are several nodes
y with the same value
the
ith player then aims to maximize the expected payoff
of such player
l that
, and so on. Note that a similar approach to constructing a unique SPE in extensive-form games with perfect information but without chance moves was explored in References [
17,
26,
27] for the case when the payoffs are only determined in terminal nodes.
Now let us provide a rigorous specification of this backwards induction procedure refinement which we will refer to as the Attitude SPE or A-SPE algorithm.
Attitude SPE algorithm. Suppose that the players’ attitude vectors are common knowledge, i.e., each player knows these vectors, and all the players are aware of it. Let the length of the trajectory equal , and let the length of the multistage game equal the maximal length of a trajectory in . We construct the unique subgame perfect equilibrium in by induction on the length L of the subgame .
- Step
: Consider a subgame of the length . If , , we have two cases.
- Case 1:
there exists a unique such that Then suppose that
- Case 2:
there exist
nodes
such that
Then suppose that the
ith player chooses the terminal position
such that
Let
denote the set of all nodes
meeting (
7). If
consists of unique node
then
,
,
. Otherwise, suppose that the
ith player chooses terminal node
such that
Let
denote the set of all final nodes
satisfying (
8), and so on.
⋮
Finally, if contains unique node , then , , . Otherwise, suppose that player i chooses the final node from with minimal ordinal number k.
Note that for all cases , .
If then and we do not need to define a strategy of any player at x, while Hence, the players’ behavior and the expected payoffs are defined for all subgames of the length 1. In addition, for games , of length we assume that , .
- Step
: Suppose that at each subgame of the length or less the unique SPE has been already constructed (“inductive assumption”), and , , is the corresponding vector of all the players’ payoffs.
- Step
: Consider the game
of the length
. Note that for all
the length of the subgame
is less than
L. If
then
for all
since
due to induction assumption, and each player
can deviate from
only in the subgames
,
.
If for some , we have two cases.
- Case 1:
there exists a unique
such that
Then we suppose that if , .
- Case 2:
there exist
nodes
such that
Then we suppose that the
ith player chooses
such that
Let
denote the set of all nodes
satisfying (
12). If
consists of unique node
then we suppose that
;
if
,
,
. Otherwise, suppose that the
ith player chooses node
such that
Let
denote the set of all nodes
meeting (
13), and so on.
⋮
Finally, if contains several nodes , denote by the minimal number of terminal nodes of the trajectories generated by subgame perfect equilibria in the subgames , (see Remark 1). Note that there exists a unique trajectory from to in the game , and let . Again, we suppose that ; if , , .
Now we prove that in both cases no player has a profitable deviation in
from the strategy profile
constructed above.
for all
,
due to (
10), (
11) and the induction assumption that
,
.
For other players
,
, we have
for all
since
, and the only deviation of player
,
from
in the subgame
may affect the players’ payoffs.
Hence, taking (
9), (
14) and (
15) into account we obtain by induction that the strategy profile
constructed above forms the unique subgame perfect equilibrium in
.
Proposition 1. If the players’ attitude vectors are common knowledge, the Attitude SPE algorithm constructs a unique subgame perfect equilibrium in pure strategies for any extensive-form game with chance moves as well as a unique bundle of trajectories .
It is worth noting that the existence of a (subgame perfect) pure strategy equilibrium in extensive-form games with perfect information and chance moves was first proved in References [
46,
47] for the special case when the payoffs are only defined in terminal nodes. Hence, Proposition 1 could be considered as a corollary of these results. However, we provide a rigorous algorithm for constructing a unique
in extensive-form game with chance moves as well as a (unique) corresponding bundle of trajectories. We will use this algorithm, in particular, to calculate the characteristic function of the cooperative extensive-form game in
Section 4.
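The recursion behind the Attitude SPE algorithm can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the 3-player tree, the payoffs and the names `attitude_spe` and `terminal_index` are hypothetical, each attitude vector is taken to list the player herself first and then her associates in decreasing priority, and the last tie-break picks the successor whose equilibrium bundle contains the terminal node with the smallest index.

```python
from fractions import Fraction

# Hypothetical 3-player tree; owner 0 marks a chance node.
children = {"x0": ["a", "b"], "a": ["z1", "z2"], "b": ["z3", "z4"],
            "z1": [], "z2": [], "z3": [], "z4": []}
owner = {"x0": 1, "a": 2, "b": 0}
chance = {"b": {"z3": Fraction(1, 2), "z4": Fraction(1, 2)}}
payoff = {"x0": (0, 0, 0), "a": (0, 0, 0), "b": (1, 0, 0),
          "z1": (3, 1, 0), "z2": (1, 1, 5), "z3": (4, 0, 0), "z4": (0, 2, 0)}
terminal_index = {"z1": 1, "z2": 2, "z3": 3, "z4": 4}


def attitude_spe(x, attitude, strategy):
    """Backward induction from node x with attitude-based tie-breaking.

    Returns (expected payoff-to-go vector in the subgame at x,
             smallest terminal index in the equilibrium bundle of that subgame)
    and records the chosen successor of every decision node in `strategy`.
    """
    h = payoff[x]
    if not children[x]:
        return tuple(Fraction(v) for v in h), terminal_index[x]
    solved = {y: attitude_spe(y, attitude, strategy) for y in children[x]}
    if owner[x] == 0:
        # Chance node: average the successors' values with the given probabilities.
        cont = [sum(chance[x][y] * solved[y][0][i] for y in children[x])
                for i in range(len(h))]
        low = min(solved[y][1] for y in children[x])
    else:
        order = attitude[owner[x]]    # own payoff first, then associates (assumption)

        def key(y):
            value, idx = solved[y]
            # Lexicographic comparison; a smaller terminal index wins the final tie.
            return tuple(value[j - 1] for j in order) + (-idx,)

        best = max(children[x], key=key)
        strategy[x] = best
        cont, low = solved[best]
    return tuple(h[i] + cont[i] for i in range(len(h))), low


friendly_to_1 = {1: (1, 2, 3), 2: (2, 1, 3), 3: (3, 1, 2)}
friendly_to_3 = {1: (1, 2, 3), 2: (2, 3, 1), 3: (3, 1, 2)}
profile_a, profile_b = {}, {}
value_a, _ = attitude_spe("x0", friendly_to_1, profile_a)
value_b, _ = attitude_spe("x0", friendly_to_3, profile_b)
```

In this toy game player 2 is indifferent between z1 and z2, so her attitude vector alone decides the choice at node a, and that in turn changes what player 1 does at the root. Because the recursion visits every node, the resulting strategy is defined in all subgames, not only along the equilibrium bundle.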
Let us use the following example to demonstrate how the Attitude SPE algorithm works.
Example 1. (A 3-player multistage game with chance moves).
Let , , , , . The players’ payoffs and probabilities , are written in the game tree.
Suppose that the players’ attitude vectors are , and .
When using the Attitude SPE algorithm, at each node , , the ith player has to choose the alternative marked in bold violet in Figure 1. Note that Hence, , and due to player 2’s attitude vector .
The A-SPE algorithm generates unique SPE , where , ; , ; , , while . We will use this SPE later in Section 4 when calculating the γ-characteristic function.
4. Cooperative Strategies and Trajectories
If the players agree to cooperate in the game
, first they are expected to maximize the total payoff
of the grand coalition. Let
denote the set of all pure strategy profiles
u, such that
The set
is known to be nonempty and it may contain multiple strategy profiles (see, e.g., Reference [
17]). Hence, the players need to agree on a specific approach they are going to use to choose a unique optimal cooperative strategy profile
as well as the corresponding optimal bundle of cooperative trajectories in the game tree. To this aim we introduce the so-called
Players’ Rank Based (PRB) algorithm. Note that a rather close approach, which uses the ranking of the criteria to choose a unique cooperative trajectory, was proposed recently in Reference [
8] for multicriteria extensive-form games without chance moves. Namely, suppose that the players have agreed on the so-called "rank" of each player within the grand coalition
N, and
means that the rank of player
i equals
k,
.
Players’ rank based (PRB) algorithm.
- Step
0. Consider the set
. If all strategy profiles
generate the same bundle of trajectories
(see, e.g., References [
17,
22,
23] for discussion on a certain redundancy of the pure strategy definition in extensive game), let the players choose any strategy profile
as the cooperative strategy profile and
denote the corresponding bundle of cooperative trajectories.
- Step
. Otherwise, that is, if the strategy profiles from
generate different (and hence, disjoint—see Remark 1) bundles of the trajectories, calculate
Let denote the set of all strategy profiles u such that . If all strategy profiles generate the same bundle of trajectories , the players may choose any strategy profile as the cooperative strategy profile. Otherwise proceed to the next step.
- Step
. Consider the set . If all strategy profiles generate the same bundle of trajectories , the players may choose any strategy profile as the cooperative strategy profile. Otherwise, proceed to the next step.
- Step
().
- ⋮
- Step
. Finally, if the strategy profiles from generate different bundles of the trajectories, we suppose that the players choose such that contains the trajectory with minimal number l of the terminal node (see Remark 1).
Henceforth, we will refer to the strategy profile and the bundle of the trajectories as the optimal cooperative strategy profile and the optimal bundle of cooperative trajectories respectively.
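The selection logic of the PRB algorithm can be sketched by brute-force enumeration on a small tree. The sketch rests on assumptions that should be checked against the formal steps: after maximizing the total expected payoff, ties are broken by maximizing the expected payoff of the player of rank 1, then of rank 2, and so on, and any remaining tie is resolved in favor of the bundle containing the terminal node with the smallest index. The tree, the payoffs and the names `outcome` and `prb` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical 2-player tree; owner 0 marks a chance node.
children = {"x0": ["a", "b"], "a": ["z1", "z2"], "b": ["z3", "z4"],
            "z1": [], "z2": [], "z3": [], "z4": []}
owner = {"x0": 1, "a": 0, "b": 2}
chance = {"a": {"z1": Fraction(1, 2), "z2": Fraction(1, 2)}}
payoff = {"x0": (0, 0), "a": (0, 0), "b": (0, 0),
          "z1": (6, 0), "z2": (0, 2), "z3": (1, 3), "z4": (2, 2)}
terminal_index = {"z1": 1, "z2": 2, "z3": 3, "z4": 4}


def outcome(profile, x="x0", prob=Fraction(1)):
    """Expected payoff vector and terminal nodes of the bundle generated from x."""
    total = [prob * v for v in payoff[x]]
    if not children[x]:
        return total, {x}
    if owner[x] == 0:
        branches = [(y, prob * chance[x][y]) for y in children[x]]
    else:
        branches = [(profile[x], prob)]
    ends = set()
    for y, p in branches:
        sub, sub_ends = outcome(profile, y, p)
        total = [a + b for a, b in zip(total, sub)]
        ends |= sub_ends
    return total, ends


def prb(rank):
    """Pick a cooperative profile; `rank` lists the players from rank 1 downward."""
    decision = [x for x, i in owner.items() if i != 0]
    best_key, best = None, None
    for choice in product(*(children[x] for x in decision)):
        profile = dict(zip(decision, choice))
        value, ends = outcome(profile)
        key = (sum(value),                                  # total payoff first
               *(value[i - 1] for i in rank),               # then players by rank
               -min(terminal_index[z] for z in ends))       # then smallest terminal index
        if best_key is None or key > best_key:              # keep the first maximizer
            best_key, best = key, (profile, tuple(value), ends)
    return best


profile_21, value_21, ends_21 = prb(rank=(2, 1))   # player 2 has rank 1
profile_12, value_12, ends_12 = prb(rank=(1, 2))   # player 1 has rank 1
```

All three bundles in this toy game yield the same total payoff, so the players' ranks alone determine the selected bundle. Profiles that differ only off the selected bundle share the same key, and the sketch simply keeps the first one, in line with the remark that the players may choose any such profile.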
In the dynamic setting it is significant that a specific method which the players agreed to accept in order to choose a unique optimal cooperative strategy profile
as well as the corresponding optimal bundle of cooperative trajectories satisfies time consistency (see, e.g., References [
1,
2,
6,
13,
17]), that is, a fragment of the optimal bundle of the cooperative trajectories in the subgame should remain optimal in this subgame. Suppose that at each subgame
along the cooperative trajectories, that is
,
, the players choose the strategy profile
such that
Let
denote the set of all pure strategy profiles
which satisfy (
17) and the players use the same approach to choose a unique optimal cooperative strategy profile
in the subgame as for the original game
(namely, the PRB algorithm).
Proposition 2. A cooperative strategy profile for based on the PRB algorithm satisfies time consistency. Namely, let satisfy (16), and let be the optimal bundle of cooperative trajectories. Then for each subgame , with , , it holds that while , that is, belongs to the optimal bundle of cooperative trajectories in the subgame .
Proof. The optimal bundle of cooperative trajectories
generated by
can be divided into two subsets
and
while
,
. Then, taking (
5) and (
6) into account we get
and (
16) for
takes the form
Suppose that
does not satisfy (
18), that is, there exists
such that
Denote by
the bundle of all trajectories in the subgame
generated by
. Then (
21) takes the form
Denote by , , the ith player’s compound pure strategy in . The strategy profile generates the strategy bundle that can be divided into two disjoint subsets and , where the second subset for coincides with the second subset for since , and
Adding
to both sides of (
22) we get
Then we can multiply both sides of (
23) on
and then add
to both sides of the last inequality. Taking into account (
4)–(
6) and (
20) we obtain
for the constructed strategy profile
. The last inequality contradicts the fact that
, hence (
18) is valid.
Arguing in a similar way (for the case when different strategy profiles from generate different bundles of the trajectories) we can verify that — a fragment of the cooperative trajectory , starting at — belongs to the optimal bundle of cooperative trajectories in the subgame , that is, . □
We will assume in this paper that all the players have agreed to apply the PRB algorithm in order to choose the cooperative strategy profile
that generates the optimal bundle
of cooperative trajectories in
. The next step of cooperation is to define a characteristic function
. There are different notions of characteristic functions (see, e.g., References [
23,
24,
48]); in this paper we adopt the so-called
γ-characteristic function introduced in Reference [
24]. Namely, we assume that
is given by the SPE (based on the Attitude SPE algorithm) outcome of S in the non-cooperative game between the members of S, who maximize their joint payoff, and the non-members, who play individually.
The
γ-characteristic function
for the subgame
,
,
along the optimal bundle of cooperative trajectories can be constructed using the same approach. Note that
Let denote the extensive-form cooperative game with γ-characteristic function, and denote the corresponding subgame.
We assume that the players adopt a single-valued cooperative solution
(for instance, the Shapley value [
33], the nucleolus [
34], etc.) for the cooperative game
which satisfies the collective rationality property
and the individual rationality property
In addition, we assume that the same properties (
25) and (
26) are valid for the cooperative solutions
at each subgame
,
.
It is worth noting that the last assumption as well as the choice of the γ-characteristic function ensure that every player has an incentive to cooperate at each subgame along the optimal game evolution since the ith player’s cooperative payoff-to-go at , , is at least equal to her non-cooperative counterpart: .
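As an illustration of a single-valued cooperative solution meeting both rationality properties, the following Python sketch computes the Shapley value from a given characteristic function and checks collective and individual rationality. The worths in `v` are hypothetical numbers chosen for the example; they are not derived from any game in this paper, and the function name `shapley` is likewise an assumption.

```python
from fractions import Fraction
from itertools import permutations


def shapley(players, v):
    """Shapley value: average marginal contribution over all orderings of the players.

    `v` maps every coalition (a frozenset, including the empty one) to its worth.
    """
    orders = list(permutations(players))
    phi = {i: Fraction(0) for i in players}
    for order in orders:
        coalition = frozenset()
        for i in order:
            phi[i] += Fraction(v[coalition | {i}] - v[coalition])
            coalition = coalition | {i}
    return {i: phi[i] / len(orders) for i in players}


players = (1, 2, 3)
# Hypothetical characteristic function values.
v = {frozenset(): 0,
     frozenset({1}): 1, frozenset({2}): 2, frozenset({3}): 0,
     frozenset({1, 2}): 5, frozenset({1, 3}): 3, frozenset({2, 3}): 4,
     frozenset({1, 2, 3}): 9}

phi = shapley(players, v)
# Collective rationality: the components sum to the worth of the grand coalition.
collectively_rational = sum(phi.values()) == v[frozenset(players)]
# Individual rationality: nobody receives less than her stand-alone worth.
individually_rational = all(phi[i] >= v[frozenset({i})] for i in players)
```

Enumerating all orderings is exponential in the number of players, which is acceptable for small examples such as a 3-person game. The Shapley value is always collectively rational, while individual rationality depends on the characteristic function and is therefore checked explicitly.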
5. Subgame Consistency and Incremental IDP
Let
,
;
,
,
denote the Imputation Distribution Procedure (IDP) for the cooperative solution
or the payment schedule (see, e.g., References [
3,
8,
9,
10,
11,
12,
14,
15,
16,
17,
18,
20] for details). The IDP approach means that all the players have agreed to allocate the total cooperative payoff
between the players along the optimal bundle
of cooperative trajectories
according to some specific rule which is called IDP. Namely,
denotes the actual current payment which the player
i receives at position
(instead of
) if the players employ the IDP
. Moreover, one can design such an IDP
that all the players will be interested in cooperation in any subgame
,
,
, that is, at any intermediate time instant.
Definition 3. The IDP satisfies subgame efficiency, if at any intermediate node , , , it holds that:
Equation (
27) means that the expected sum of the payments to player
i along the optimal subgame
evolution equals what she is entitled to in this subgame. Then the IDP for each player can be reasonably implemented as a rule for step-by-step allocation of the
ith player’s current expected optimal payoff. Note that for
the subgame efficiency definition coincides with the efficiency at initial node
or the efficiency in the whole game
condition (see References [
9,
14,
16,
20]).
Definition 4 ([
10])
. The IDP satisfies the strict balance condition if for each node ,
Equation (
28) ensures the “admissibility” of the IDP, that is, the sum of payments to the players in any node
is equal to the sum of payoffs that they can collect in this node.
The next advantageous dynamic property of an IDP—the time consistency, introduced in Reference [
3]—was extended to dynamic games played over event trees in References [
14,
16,
20] as well as to multicriteria extensive-form cooperative games (with chance moves) in Reference [
9].
To write down properly the time consistency condition for some intermediate node , , , in multistage game with chance moves we need to pay attention to all chance nodes on the path .
Namely, let us enumerate the chance nodes from in order of their occurrence on the path , that is, , , .
Definition 5 ([
9])
. The IDP for the cooperative solution is called time consistent in the whole game if at any intermediate node , , , for all , it holds that …
Note that for the special case when
, that is, if
follows the chance node
Equation (
30) takes the simpler form
A similar remark holds for Equation (
31), and so forth.
Roughly speaking, Definition 5 implies that the payments collected by the ith player (according to the payment schedule ) before reaching some intermediate node , plus the expected ith player’s component of the Shapley value in the subgame starting at , plus this player’s expected Shapley value components in the other subgames along the cooperative trajectories which do not contain , together correspond to what player i is entitled to in the original game .
It is worth noting that Definition 5 indeed provides a reasonable consistency requirement which a good payment schedule
should satisfy when the player evaluates IDP
at the initial node
, that is, before the game
starts (and the words “in the whole game” in Definition 5 properly reflect this feature). However, when the player wishes to evaluate IDP
in the subgame
, that is, after reaching some intermediate node
(in case when
) this player is unlikely to take into account the expected future payoffs in all the subgames which are unattainable if the node
has been already reached, that is, the last addends in the LHS of (
30) and (
31). To overcome this problem we suggest that the players use a notion of subgame consistency, a refinement of time consistency that was first proposed in Reference [
36] for cooperative stochastic differential games and then extended to stochastic dynamic games in References [
37,
38]. Let us provide a rigorous definition of the IDP subgame consistency for extensive-form games with chance moves that is applicable in all the subgames along the optimal bundle of cooperative trajectories.
Definition 6. The IDP is called subgame consistent if at any intermediate node , , , for all , it holds that
case (no chance nodes before the subgame root ):
case (only one chance node before ):
case (two chance nodes before ):
⋮
case (no chance nodes after ):
The subgame consistency definition differs from the “time consistency in the whole game” property (see References [
9,
14,
16,
20]) which is based on an a priori assessment of the
ith player’s expected optimal payoff (before the game starts). However, when the players make a decision in the subgame after the chance move occurs, they need to recalculate the expected optimal payoff since the original optimal bundle of cooperative trajectories shrinks after each chance node. Note that we cannot write out the subgame consistency condition for
, that is, for the nodes
that immediately follow the chance nodes.
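The need to recalculate the expected payoff once a chance move has been resolved can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Node` class, the function `expected_payoff_to_go`, the tiny four-node bundle, and all payoffs and probabilities are illustrative assumptions and are not taken from Example 1. A player node on the bundle has a single successor with probability 1, while a chance node has several successors.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Node:
    """A node of a (hypothetical) bundle of cooperative trajectories."""
    name: str
    payoff: int = 0  # stage payoff of one player collected at this node
    successors: list = field(default_factory=list)  # list of (probability, Node)


def expected_payoff_to_go(node):
    """Expected sum of stage payoffs from `node` to the end of the bundle."""
    return node.payoff + sum(
        p * expected_payoff_to_go(child) for p, child in node.successors
    )


# Hypothetical bundle: x0 -> chance node c -> {a, b} with equal probabilities.
a = Node("a", payoff=10)
b = Node("b", payoff=4)
chance = Node("c", successors=[(Fraction(1, 2), a), (Fraction(1, 2), b)])
root = Node("x0", payoff=2, successors=[(1, chance)])

a_priori = expected_payoff_to_go(root)  # 2 + (10 + 4) / 2 = 9
after_a = expected_payoff_to_go(a)      # 10: the branch through b is unattainable
after_b = expected_payoff_to_go(b)      # 4: the branch through a is unattainable
```

Before the game starts the player expects 9, of which 7 is the expectation over both branches. Once the chance move has selected a branch, only that branch remains in the bundle, so the payoff-to-go has to be re-evaluated on the reduced bundle.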
One can suggest different imputation distribution procedures that may or may not satisfy the useful properties listed above. A review of different IDPs for multistage games (without chance moves), as well as an analysis of their properties, can be found in References [
10,
12,
15,
17]. Below we consider the refinement of the so-called incremental IDP (see, e.g., References [
10,
14,
16,
17,
20,
21]) that was recently introduced for multistage games with chance moves [
9].
Definition 7 ([
9])
. The incremental IDP for the cooperative solution in a multistage game with chance moves is defined as follows: for , , ; for .
Remark 2. Formulas (36), (37) are similar to the imputation distribution procedures suggested in References [14,16,20] for (single-criterion) stochastic discrete-time dynamic games played over event trees. If , Equation (36) takes the simpler form , where , that coincides with the “classical” incremental IDP.
Let us again use the 3-person extensive-form game from Example 1 to demonstrate the proposed scheme of cooperation.
Example 2 (Cooperative behavior in the 3-player game from Example 1).
Suppose that the players have agreed on the following ranks: , and . When implementing the PRB algorithm we get the optimal bundle which contains four cooperative trajectories (marked in bold, deep blue in Figure 2): , , and . Note that the players use the ranks when making a decision at node . To demonstrate the implementation of the incremental IDP and its properties we will adopt the Shapley value as a single-valued cooperative solution. The values of the γ-characteristic function for the original game and the Shapley value are
Consider, for instance, the incremental IDP along the longest cooperative trajectory from . If we calculate the γ-characteristic functions using the attitude SPE algorithm for the subgames, we get the following results.
Subgame :
Subgame :
Subgame :
Subgame :
Subgame :
Subgame :
Subgame :
Finally, .
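For readers who wish to reproduce this kind of calculation, the Shapley value of a given characteristic function can be computed by averaging marginal contributions over all orderings of the players. The Python sketch below is only an illustration: the function name `shapley_value` and the characteristic function `v_example` are hypothetical, and the numbers are not the γ-characteristic function values of Example 2.

```python
from fractions import Fraction
from itertools import permutations


def shapley_value(players, v):
    """Shapley value of a game given as a dict: frozenset coalition -> worth."""
    orders = list(permutations(players))
    phi = {i: Fraction(0) for i in players}
    for order in orders:
        coalition = frozenset()
        for i in order:
            # Marginal contribution of player i when joining `coalition`.
            phi[i] += v[coalition | {i}] - v[coalition]
            coalition = coalition | {i}
    return {i: phi[i] / len(orders) for i in players}


# Hypothetical 3-player characteristic function (illustrative numbers only).
v_example = {
    frozenset(): 0,
    frozenset({1}): 6,
    frozenset({2}): 0,
    frozenset({3}): 0,
    frozenset({1, 2}): 12,
    frozenset({1, 3}): 18,
    frozenset({2, 3}): 12,
    frozenset({1, 2, 3}): 36,
}

phi_example = shapley_value([1, 2, 3], v_example)
# Players 1, 2, 3 receive 15, 9 and 12, which sums to the grand-coalition worth 36.
```

Enumerating all orderings is affordable for three players. For larger games one would typically switch to the closed-form coalition weights or to sampling.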
One can calculate the incremental IDP using (36) and (37):

| | | | | | | |
|---|---|---|---|---|---|---|
| 6 | 0 | | 12 | 36 | | 12 |
| 0 | 6 | 4 | 0 | | 12 | 0 |
| 0 | 0 | 18 | 0 | 6 | 24 | 0 |
,
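The mechanics of an incremental payment schedule can be sketched in Python under a simplifying assumption: the current payment at a node equals the player's imputation component in the subgame rooted at that node minus the expected component in the subgames rooted at its successors on the bundle (a single successor for a player node, several for a chance node). This one-step rule is an assumed stand-in that reduces to the classical incremental IDP when there are no chance moves; it is not a transcription of Formulas (36), (37). All node names, probabilities and values are hypothetical.

```python
from fractions import Fraction


def incremental_idp(phi, successors):
    """Current payments of one player at every node of the bundle.

    phi: node -> the player's imputation component in the subgame at that node.
    successors: node -> list of (probability, next node); absent for terminals.
    """
    beta = {}
    for node, value in phi.items():
        expected_next = sum(p * phi[nxt] for p, nxt in successors.get(node, []))
        beta[node] = value - expected_next
    return beta


def expected_payments_to_go(node, beta, successors):
    """Expected sum of current payments collected from `node` onward."""
    return beta[node] + sum(
        p * expected_payments_to_go(nxt, beta, successors)
        for p, nxt in successors.get(node, [])
    )


# Hypothetical bundle: x0 -> chance node c -> {a, b} with equal probabilities.
successors = {"x0": [(1, "c")], "c": [(Fraction(1, 2), "a"), (Fraction(1, 2), "b")]}
phi = {"x0": 15, "c": 18, "a": 24, "b": 12}  # illustrative subgame values

beta = incremental_idp(phi, successors)
# beta["x0"] is 15 - 18 = -3, so a current payment can be negative.
total_from_root = expected_payments_to_go("x0", beta, successors)
total_from_a = expected_payments_to_go("a", beta, successors)
```

In this toy bundle the expected payments-to-go from any node coincide with the player's imputation component at that node, because the differences telescope. The negative payment at the root shows how the drawback discussed in the text can arise.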
Note that the subgame consistency conditions at nodes , and according to (32)–(34) respectively take the form:
It is known that the classical incremental IDP for multistage (and differential) games may imply negative current payments to some players at some positions (see References [
4,
10,
17,
38] for details). As one can observe in Example 2, this drawback of the incremental IDP may appear in extensive-form games with chance moves as well. Two approaches to overcoming this possible disadvantage were suggested in References [
4,
10]. Unfortunately, as was first proved in Reference [
10], it is in general impossible to design a time consistent IDP which satisfies both the balance condition and the non-negativity constraint.
Proposition 3. The incremental IDP (36), (37) satisfies the strict balance condition (28), the subgame efficiency condition (27), and the subgame consistency conditions (32)–(35).
Proof. The incremental IDP
was proved to satisfy the strict balance condition (
28) in Reference [
9]. The proof of subgame consistency can be carried out by direct verification. For instance, consider the case when
. Then, using Remark 2 we get
Obviously, (
33) is satisfied.
The proof that IDP (
36), (
37) satisfies subgame efficiency (
27) is based on direct calculations but is rather cumbersome in the general case (i.e., for an arbitrary game
). Let us demonstrate how it works for the game in Example 2. For instance, let us verify that the incremental IDP meets the subgame efficiency condition at node
.
Note that
while
,
and
. Then, using (
32), (
33), Remark 2, equality
and the notation
we obtain
□
According to Proposition 3, the incremental payment schedule (
36), (
37) can be used to implement a long-term cooperative agreement in an extensive-form game with chance moves.
6. Conclusions
In this paper we aimed to design a mechanism of the players’ sustainable long-term cooperation that satisfies a number of good properties. To this end, we formalised the players’ rank based algorithm for selecting a unique optimal bundle of cooperative trajectories, and proved that the corresponding cooperative strategy profile satisfies time consistency. To calculate the γ-characteristic function one needs a specific method for constructing a unique (subgame perfect) equilibrium in any extensive-form game with chance moves. Hence, we formalised a refinement of the backwards induction procedure based on the players’ attitude vectors, the so-called attitude SPE algorithm.
As a result of re-examining the “IDP time consistency in the whole game” concept, we suggest adopting the concept of subgame consistency, introduced in Reference [
36] for differential stochastic games and then extended to dynamic stochastic games in References [
37,
38]. The definition of subgame consistency for extensive-form games with chance moves is provided. This property takes into account an interesting feature of the games under consideration: when the players make a decision in the subgame
after the chance move occurs, they need to recalculate their expected optimal payoffs-to-go since the original optimal bundle of cooperative trajectories shrinks after each chance node. It is worth noting that a similar approach, based on the IDP subgame consistency notion, could be applied to dynamic games played over event trees ([
14,
16,
20]). We proved that the incremental IDP specified for multistage games with chance moves in Reference [
9] satisfies subgame consistency and subgame efficiency as well as the strict balance condition.
It follows from Propositions 1–3 that the two specified algorithms, combined with the
γ-characteristic function and the incremental payment schedule, together constitute a mechanism of the players’ sustainable cooperation that satisfies a number of good properties and could be used in extensive-form games with chance moves. Note that the main result of the paper, Proposition 3, depends neither on the specific method which the players employ to calculate the characteristic function nor on the specific single-valued cooperative solution meeting (
25) and (
26).
Since this is the first time that subgame consistent solutions are examined for extensive-form games with chance moves, further research along this line is expected. It is surely of interest to develop an appropriate software application that implements the proposed algorithms for an arbitrary extensive-form game with chance moves. Possibly, one can use the so-called Game Theory Explorer [
30] when developing such software tools for 2-person extensive games. Further, it might be interesting to run experiments with large-scale datasets once a software application is developed that constructs the unique SPE, the optimal bundle of cooperative trajectories, the γ-characteristic function, and so forth.
Let us note some preliminary suggestions on how one can use such a software application to run simulations. First, one can vary the main parameter, the length of the game tree, and additional parameters such as the game structure, the players’ payoffs, the transition probabilities, and so forth, to obtain practical estimates of the complexity and scalability of the proposed algorithms. Second, one can generate external disturbances of the stage payoffs and probabilities and vary the players’ attitude vectors to carry out a sensitivity analysis of the proposed non-cooperative and cooperative solutions. Further, it is of interest to obtain experimental estimates of the price of anarchy and the price of stability for the class of games under consideration. Finally, one can use such a software application to check whether the additional properties (non-negativity, irrational-behavior-proof conditions, etc.) of the proposed incremental IDP and other payment schedules (see, e.g., Reference [
15]) are satisfied for a given extensive-form game with chance moves.