Subgame Consistent Cooperative Behavior in an Extensive form Game with Chance Moves

We design a mechanism for the players' sustainable cooperation in a multistage $n$-person extensive-form game with chance moves. When the players agree to cooperate in a dynamic game, they have to ensure the time consistency of the long-term cooperative agreement. We provide the players' rank based (PRB) algorithm for choosing a unique cooperative strategy profile and prove that the corresponding optimal bundle of cooperative trajectories satisfies time consistency, that is, at every subgame along the optimal game evolution a part of each original cooperative trajectory belongs to the subgame optimal bundle. We propose a refinement of the backwards induction procedure based on the players' attitude vectors to find a unique subgame perfect equilibrium, and use this algorithm to calculate a characteristic function. Finally, to ensure the sustainability of the cooperative agreement in a multistage game, we employ the imputation distribution procedure (IDP) based approach, that is, we design an appropriate payment schedule to redistribute each player's optimal payoff along the optimal bundle of cooperative trajectories. We extend the subgame consistency notion to extensive-form games with chance moves and prove that the incremental IDP satisfies subgame consistency, subgame efficiency and the balance condition. An example of a 3-person multistage game is provided to illustrate the proposed cooperation mechanism.


Introduction
In a dynamic $n$-person game the players first choose their "optimal" strategies at the initial position $x_0$ (which form the optimal strategy profile for the whole game), and then have the option to switch, at any intermediate position $x_t$, to other strategies if these strategies constitute a locally optimal strategy profile for the subgame starting at $x_t$. The time consistency property (first introduced in References [1-3] for differential games) ensures that the players have no incentive to change their strategies at any subgame along the optimal game evolution, and hence plays an important role in designing the optimal players' behavior in non-cooperative and cooperative dynamic games (see, e.g., References [2-21] for details).
We consider finite multistage $n$-person games in extensive form (see, e.g., References [5,17,22,23]) with perfect information and with chance moves. Note that much research has already been done on time consistent solutions (or close concepts) in extensive-form games (see, e.g., References [4,6,13,17,21]). The time consistency concept was extended to dynamic games played over event trees in References [14,16,20], as well as to multicriteria extensive-form cooperative games (without chance moves) in References [7,8,10,11,15]. The property of "time consistency in the whole game" was extended to multicriteria extensive-form cooperative games with chance moves in Reference [9] (note that in these games an optimal pure strategy profile generates not a unique optimal trajectory in the game tree but rather a whole optimal bundle of trajectories).
In this paper, we mainly focus on the dynamic aspects of cooperation in an extensive-form game with chance moves, and design a mechanism of the players' sustainable cooperation that satisfies three properties. First, a fragment of each cooperative trajectory from the optimal bundle for the original game $\Gamma_{x_0}$ should "remain optimal" at each subgame $\Gamma_{x_t}$ along the cooperative game evolution, that is, it should belong to the subgame optimal bundle of cooperative trajectories. Second, the cooperative payoff-to-go at the subgame $\Gamma_{x_t}$ is no less than the non-cooperative payoff-to-go for every player. Finally, when the players re-evaluate their expected cooperative payoffs after each chance move, they have no incentive to deviate from the original cooperative agreement.
To this aim, we first need to provide a rule for choosing a unique cooperative strategy profile as well as a unique optimal bundle of cooperative trajectories. We introduce the Players' Rank Based (PRB) algorithm and prove that this algorithm generates a unique optimal bundle of cooperative trajectories which satisfies time consistency. Note that a rather close approach, the so-called Refined Leximin (RL) algorithm, was introduced recently in Reference [8]. The two algorithms differ in the following respects. The RL algorithm is applicable to multicriteria games without chance moves and is based on a ranking of the criteria, while the PRB algorithm is designed for single-criterion extensive-form games with chance moves and employs the players' ranks. Further, the RL algorithm chooses a unique cooperative trajectory, while the PRB algorithm generates a unique optimal bundle of cooperative trajectories in the game tree. To the best of the authors' knowledge, other approaches to choosing an optimal bundle of cooperative trajectories in extensive-form games with chance moves have not been considered yet.
Then, to construct a characteristic function (which describes the worth of each coalition in the cooperative game) we use an equilibrium-based approach, namely the γ-characteristic function introduced in Reference [24]. Hence, the players have to accept a specific method for choosing a unique Subgame Perfect Equilibrium (SPE) [25] in an extensive-form game with chance moves. To solve this problem we provide a novel refinement of the backwards induction procedure (see, e.g., References [5,17,23]), the so-called Attitude SPE algorithm. A similar approach to constructing a unique SPE in extensive-form games with perfect information was explored in References [17,26,27] and was called the Type Equilibrium (TE) algorithm. Both algorithms are refinements of the general backwards induction procedure that take into account the attitudes of each player towards the other players. They differ as follows. The TE algorithm is applicable to games without chance moves and to the case when the payoffs are determined only in terminal nodes. In addition, the TE algorithm constructs an SPE that is "unique" in the sense of payoffs (i.e., there may exist several optimal trajectories which generate the same equilibrium payoffs), while the Attitude SPE algorithm chooses a unique SPE strategy profile as well as a unique bundle of trajectories. Another rather close approach to finding a unique SPE, the so-called Indifferent Equilibrium (IE) algorithm, was introduced in Reference [28]. Again, the IE algorithm is applicable only to games without chance moves and to the special case when the payoffs are determined in terminal nodes. Moreover, the IE algorithm in general constructs an SPE in behavior strategies, while the proposed Attitude SPE algorithm always generates an SPE in pure strategies.
It is worth noting that other approaches to analyzing an extensive-form game, apart from the backwards induction procedure and its refinements mentioned above, require the researcher first to obtain a strategic representation of the original extensive game and then to analyze this strategic (or normal-form) game (see, e.g., References [29-31]). For instance, the software tool "Game Theory Explorer" [29] builds the strategic-form representation and then applies a modified Lemke-Howson algorithm [32] to find all Nash equilibria. The majority of existing algorithms are developed to find Nash equilibria in mixed strategies for 2-person games and do not construct an SPE in pure strategies. Moreover, as noted in Reference [31], the strategic-form representation is in general exponential in the size of the original game tree. In contrast, the proposed Attitude SPE algorithm is a rather simple recursive algorithm which works directly with the $n$-person extensive-form game (with perfect information) and computes a unique SPE in pure strategies.
After computing the γ-characteristic function we suppose that the players adopt some single-valued cooperative solution ϕ (for instance, the Shapley value [33], the nucleolus [34], etc.) which satisfies the individual and collective rationality properties. Finally, to guarantee the sustainability of the achieved long-term cooperative agreement, we employ the Imputation Distribution Procedure (IDP) based approach (see, e.g., References [3,12,14,16-18,20,35]), that is, a payment schedule to redistribute the ith player's expected cooperative payoff along the optimal bundle of cooperative trajectories. In this paper, we mainly focus on the following desirable properties an IDP may satisfy: subgame efficiency, the strict balance condition [10,15,17], and an appropriate refinement of the time consistency property called subgame consistency. The point is that the "time consistency in the whole game" property [9,14,16,20] is based on an a priori assessment of the ith player's expected optimal payoff (before the game $\Gamma_{x_0}$ starts). However, when the players make a decision in the subgame $\Gamma_{x_t}$ after a chance move occurs, they need to re-estimate their expected optimal payoffs-to-go, since the original optimal bundle of cooperative trajectories shrinks after each chance node. To deal with this feature of games with chance moves we adopt the notion of subgame consistency, which was first proposed in Reference [36] for cooperative stochastic differential games and then extended to stochastic dynamic games in References [37,38].
Since we derive a suitable definition of subgame consistency for a different class of games, the proposed Definition 6 differs from the ones provided in References [37,38] but captures the same idea. The main differences from References [37,38] are the following. While D. Yeung and L. Petrosyan do not consider the issue of multiple equilibria and study stochastic games in which there exists a unique Nash equilibrium in each subgame, we focus on the problem of how to select a unique (subgame perfect) Nash equilibrium in an extensive-form game with chance moves and derive the corresponding algorithm. Second, the characteristic function is not constructed in References [37,38] and, hence, the players are restricted to the simplest cooperative solutions (for instance, they may share equally the excess of the total expected cooperative payoff over the expected sum of individual non-cooperative payoffs), whereas we provide a method for calculating the γ-characteristic function. Hence, the players may use different solution concepts based on the characteristic function approach. Finally, it turns out that the incremental IDP specified for extensive-form games with chance moves in Reference [9] satisfies not only subgame consistency but also subgame efficiency and the strict balance condition. Therefore, the suggested PRB algorithm, the Attitude SPE algorithm combined with the γ-characteristic function, and the incremental payment schedule for any single-valued cooperative solution (meeting individual and collective rationality) together constitute the required mechanism of the players' sustainable cooperation, which satisfies the three properties mentioned above for any extensive-form game with chance moves.
It is worth noting that extensive-form games, as well as dynamic games played over event trees, differential games and multistage games with discrete dynamics, are used to model various real-world situations where several decision makers (or players) with different objectives may cooperate (see, e.g., References [5,12,14,16,17,20,39-44]). Hence, the proposed approach to implementing a long-term cooperative agreement may have a number of applications.
The rest of the paper is organized as follows. Section 2 recalls the main ingredients of the class of games of interest. In Section 3, we specify the Attitude SPE algorithm that allows constructing a unique SPE in an extensive-form game with chance moves. In Section 4, we provide the PRB algorithm and prove that the optimal bundle of cooperative trajectories generated by this algorithm satisfies time consistency. Section 5 reveals a drawback of the IDP "time consistency in the whole game" property and presents a definition of subgame consistency that is applicable to extensive-form games with chance moves. We prove that the incremental IDP satisfies a number of desirable properties and consider an example of a 3-person multistage game with chance moves to illustrate the incremental IDP implementation. Section 6 provides a brief review of the results and a discussion.

Extensive-Form Game with Chance Moves
We consider a finite multistage game in extensive form following References [6,13,17,22,23]. First, we define the basic notation and briefly recall some properties of extensive-form games that will be used in the sequel:
• $N = \{1, \dots, n\}$ is the set of all players.
• $K$ is the game tree with the root $x_0$ and the set of all nodes $P$.
• $S(x)$ is the set of all direct successors (descendants) of the node $x$, and $S^{-1}(y)$ is the unique predecessor (parent) of the node $y \neq x_0$ such that $y \in S(S^{-1}(y))$.
• $P_i$ is the set of all decision nodes of the ith player (at these nodes player $i$ chooses the following node), $P_i \cap P_j = \emptyset$ for all $i, j \in N$, $i \neq j$.
• $P_{n+1}$ denotes the set of all terminal nodes (final positions), $S(z_j) = \emptyset$ for all $z_j \in P_{n+1}$.
• $P_0$ is the set of all nodes at which a chance move occurs, where $\pi(y \mid x) > 0$ denotes the probability of transition from node $x \in P_0$ to node $y \in S(x)$. We suppose that for each $x \in P_0$ it holds that $\sum_{y \in S(x)} \pi(y \mid x) = 1$.
• $\omega = (x_0, \dots, x_{t-1}, x_t, \dots, x_T)$ is a trajectory (or path) in the game tree, where $x_t \in S(x_{t-1})$, $t = 1, \dots, T$, and $x_T \in P_{n+1}$; the index $t$ in $x_t$ denotes the ordinal number of this node within the trajectory $\omega$ and can be interpreted as the "time index".
• $h_i(x)$ is the payoff of the ith player at the node $x \in P$. We assume that for all $i \in N$ and $x \in P$ the payoffs are non-negative, that is, $h_i(x) \ge 0$.
In the following, we will use $G_{cm}(n)$ to denote the class of all finite multistage $n$-person extensive-form games with chance moves defined above, where $\Gamma_{x_0} \in G_{cm}(n)$ denotes a game with root $x_0$. Note that $\Gamma_{x_0}$ is an extensive-form game with perfect information (see, e.g., References [17,22,23] for details).
Since all the solutions we are interested in throughout the paper are attainable when the players restrict themselves to pure strategies, we will focus on this class of strategies. The pure strategy $u_i(\cdot)$ of the ith player is a function with domain $P_i$ that specifies for each node $x \in P_i$ the next node $u_i(x) \in S(x)$ which player $i$ chooses at $x$. Let $U_i$ denote the (finite) set of all ith player's pure strategies, $U = \prod_{i \in N} U_i$. Denote by $p(y \mid x, u)$ the conditional probability that node $y \in S(x)$ is reached if node $x$ has already been reached (the probability of transition from $x$ to $y$) while the players use the strategies $u_i$, $i \in N$. Note that for all $x \in P_i$, $i = 1, \dots, n$, and for all $y \in S(x)$ we have $p(y \mid x, u) = 1$ if $y = u_i(x)$ and $p(y \mid x, u) = 0$ otherwise, while $p(y \mid x, u) = \pi(y \mid x)$ for $x \in P_0$. Then one can calculate the probability $p(\omega, u)$ of realization of the trajectory $\omega = (x_0, \dots, x_T)$ as $p(\omega, u) = \prod_{t=0}^{T-1} p(x_{t+1} \mid x_t, u)$ when the players use the strategies $u_i$ from the strategy profile $u = (u_1, \dots, u_n)$.
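Denote by $\Omega(u) = \{\omega_k(u) \mid p(\omega_k, u) > 0\}$ the finite set (or the bundle) of the trajectories $\omega_k$ which are generated by the strategy profile $u \in U$. Note that for all $\omega_k(u) \in \Omega(u)$, $u_j(x_\tau) = x_{\tau+1}$ for all $x_\tau \in \omega_k(u) \cap P_j$, $j \in N$.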

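Denote by
$$H_i(u) = \sum_{\omega_k \in \Omega(u)} p(\omega_k, u)\, h_i(\omega_k), \qquad h_i(\omega_k) = \sum_{x \in \omega_k} h_i(x),$$
the (expected) value of the ith player's payoff function which corresponds to the strategy profile $u = (u_1, \dots, u_n)$. Let $\Omega_{n+1}(u) = \Omega(u) \cap P_{n+1}$ denote the set of all terminal nodes of the trajectories from $\Omega(u)$.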
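The definitions of $p(y \mid x, u)$, $\Omega(u)$ and $H_i(u)$ above can be checked on a toy tree with a short Python sketch. Everything in it is a hypothetical illustration rather than part of the paper: the dictionary `TREE`, its node names, and the helpers `transition_prob`, `bundle` and `expected_payoffs`. Owner `0` marks a chance node, owner `None` a terminal node, and a pure strategy profile is stored as a dictionary mapping each player to the successor chosen at each of that player's decision nodes; the trajectory probability is the product of the one-step transition probabilities along the trajectory.

```python
from fractions import Fraction

# Hypothetical toy tree: node -> (owner, successors, chance probabilities, payoffs h(x)).
# owner 0 = chance node, owner None = terminal node, owners 1..n = players.
TREE = {
    "x0": (1, ["x1", "x2"], None, (0, 0)),
    "x1": (0, ["z1", "x3"], [Fraction(1, 2), Fraction(1, 2)], (1, 1)),
    "x2": (None, [], None, (2, 2)),
    "x3": (2, ["z2", "z3"], None, (0, 1)),
    "z1": (None, [], None, (3, 1)),
    "z2": (None, [], None, (1, 4)),
    "z3": (None, [], None, (2, 2)),
}


def transition_prob(x, y, u):
    """p(y | x, u): pi(y | x) at a chance node, 1 or 0 at a decision node."""
    owner, children, probs, _ = TREE[x]
    if owner == 0:
        return probs[children.index(y)]
    return Fraction(1) if u[owner][x] == y else Fraction(0)


def bundle(u, root="x0"):
    """Omega(u): list of (trajectory, p(trajectory, u)) with positive probability."""
    result = []

    def walk(x, path, p):
        owner, children, _, _ = TREE[x]
        if owner is None:                      # terminal node: trajectory is complete
            result.append((path, p))
            return
        for y in children:
            q = transition_prob(x, y, u)
            if q > 0:                          # drop zero-probability branches
                walk(y, path + [y], p * q)     # p(omega, u) as a product of transitions

    walk(root, [root], Fraction(1))
    return result


def expected_payoffs(u, n=2, root="x0"):
    """H_i(u): sum over the bundle of p(omega, u) times h_i summed along omega."""
    H = [Fraction(0)] * n
    for path, p in bundle(u, root):
        for i in range(n):
            H[i] += p * sum(TREE[x][3][i] for x in path)
    return H


u = {1: {"x0": "x1"}, 2: {"x3": "z2"}}
print(bundle(u))
print(expected_payoffs(u))
```

Exact `Fraction` arithmetic keeps probability comparisons such as `q > 0` free of rounding error. With the profile above, the bundle consists of two trajectories of probability 1/2 each, and the expected payoffs are $(3, 4)$.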

Remark 1 ([9]). If the pure strategy profiles $u$ and $v$ generate different bundles $\Omega(u)$ and $\Omega(v)$ of the trajectories, that is, $\Omega(u) \neq \Omega(v)$, then …

According to References [17,22,23], each intermediate node $x_t \in P \setminus P_{n+1}$ generates a subgame $\Gamma_{x_t}$ with the subgame tree $K_{x_t}$ and the subgame root $x_t$, as well as a factor-game $\Gamma_D$ with the factor-game tree … The decomposition of the original extensive game $\Gamma_{x_0}$ at node $x_t$ into the subgame $\Gamma_{x_t}$ and the factor-game $\Gamma_D$ generates the corresponding decomposition of the pure (and mixed) strategies (see References [17,22] for details). Let $H^{x_t}_i(u^{x_t})$ denote the expected value of the ith player's payoff in $\Gamma_{x_t}$, and $U^{x_t}_i$ the set of all possible ith player's pure strategies in the subgame $\Gamma_{x_t}$ … where $\omega_{x_t} = (x_0, x_1, \dots, x_{t-1}, x_t)$ denotes the fragment of trajectory $\omega$ implemented before the subgame $\Gamma_{x_t}$ starts, and $p(\omega_{x_t}, u) = p(x_t, u)$ denotes the probability that node $x_t$ is reached when the players employ the strategies $u_i$, $i \in N$. It is worth noting that the factor-game $\Gamma_D = \Gamma_D(u^{x_t})$ is usually defined for a given strategy profile $u^{x_t}$ in the subgame $\Gamma_{x_t}$, since we assume that … (see, e.g., References [17,22] for details). Moreover, given an intermediate node $x_t$, … Then, taking (1), (3), (4) and (5) into account, we get … Note that, since …

Refined Backwards Induction Procedure to Construct a Unique SPE
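A strategy profile $u$ is called a subgame perfect equilibrium (SPE) in $\Gamma_{x_0}$ if the restriction of $u$ to each subgame $\Gamma_x$ forms a Nash equilibrium (NE) in this subgame.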
However, the backwards induction procedure may generate multiple subgame perfect equilibria in an extensive-form game with different payoffs to the players (see, e.g., References [5,12,17,23]). To choose a unique SPE and a unique corresponding bundle of trajectories, we use an approach based on the players' attitude vectors. Namely, let the ith player's attitude vector $F_i = \{f_i(1), \dots, f_i(n)\}$ be a permutation of the numbers $\{1, \dots, n\}$ meeting the condition $f_i(i) = 1$. If $f_i(j) = k$, one may interpret player $j$ as the "ith player's associate of level $k$".
In this paper we will use these attitude vectors when constructing an SPE via the backwards induction procedure in the following way. Let $x \in P_i$, let $u^y$ be an SPE in the subgame $\Gamma_y$, $y \in S(x)$, and let $H^y_i(u^y)$ denote the ith player's expected payoff in this subgame. Assume that there exist multiple nodes $y_1, \dots, y_q \in S(x)$, $q > 1$, such that $H^{y_1}_i(u^{y_1}) = \dots = H^{y_q}_i(u^{y_q}) = \max_{y \in S(x)} H^y_i(u^y)$, that is, player $i$ is indifferent to the choice of a particular node $y$ from $\{y_1, \dots, y_q\}$, while the ith player's choice may affect the other players' payoffs. If $f_i(j) = 2$, suppose that the ith player aims first to maximize the jth player's expected payoff $H^y_j(u^y)$ when choosing a unique node $y$ from $\{y_1, \dots, y_q\}$. If again there are several nodes $y$ with the same value $H^y_j(u^y)$, the ith player aims second to maximize the expected payoff $H^y_l(u^y)$ of the player $l$ such that $f_i(l) = 3$, and so on. Note that a similar approach to constructing a unique SPE in extensive-form games with perfect information but without chance moves was explored in References [17,26,27] for the case when the payoffs are determined only in terminal nodes. Now let us provide a rigorous specification of this refinement of the backwards induction procedure, which we will refer to as the Attitude SPE or A-SPE algorithm.
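The tie-breaking rule described above is a lexicographic maximization over the players ordered by the attitude vector $F_i$. The following Python sketch shows a single decision of player $i$; the function `attitude_choice` and its input format are hypothetical. The final fallback (keep the first candidate in list order) is an assumption of the sketch: it stands in for the ordering rule that the A-SPE algorithm applies when ties survive all $n$ levels.

```python
def attitude_choice(i, F_i, candidates):
    """
    Tie-breaking rule of player i at a decision node (a sketch).

    i          -- the player who moves (players are numbered 1..n)
    F_i        -- attitude vector: F_i[j - 1] = f_i(j), a permutation of 1..n with f_i(i) = 1
    candidates -- list of (successor, payoff_vector) pairs, payoff_vector[j - 1] = H_j^y(u^y)
    """
    n = len(F_i)
    # Players sorted by attitude level: i itself, then the level-2 associate, and so on.
    order = sorted(range(1, n + 1), key=lambda j: F_i[j - 1])
    if order[0] != i:
        raise ValueError("attitude vector must satisfy f_i(i) = 1")
    remaining = list(candidates)
    for j in order:
        best = max(payoffs[j - 1] for _, payoffs in remaining)
        remaining = [(y, payoffs) for y, payoffs in remaining if payoffs[j - 1] == best]
        if len(remaining) == 1:
            break
    # If ties survive all n levels, keep the first candidate (hypothetical final rule).
    return remaining[0][0]


# Player 2 with f_2(1) = 3, f_2(2) = 1, f_2(3) = 2: player 3 is the level-2 associate.
choice = attitude_choice(2, (3, 1, 2), [("a", (5, 7, 1)), ("b", (1, 7, 4)), ("c", (9, 6, 9))])
print(choice)  # prints b
```

Player 2 is indifferent between `a` and `b` (payoff 7 in both), so the choice is settled by the payoff of player 3, the level-2 associate.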
Attitude SPE algorithm. Suppose that the players' attitude vectors $F_1, F_2, \dots, F_n$ are common knowledge, i.e., each player knows these vectors, and all the players are aware of it. Let the length of the trajectory $\omega = (x_0, \dots, x_t, x_{t+1}, \dots, x_T)$ be equal to $T$ (the number of moves along $\omega$), and let the length of the multistage game $\Gamma_{x_0}$ be equal to the maximal length of a trajectory $\omega$ in $\Gamma_{x_0}$. We construct the unique subgame perfect equilibrium $u = (u_1, \dots, u_n)$ in $\Gamma_{x_0}$ by induction on the length $L$ of the subgame $\Gamma_x$.
Step $L = 1$: Consider a subgame $\Gamma_x$ of length $L = 1$. If $x \in P_i$, $i = 1, \dots, n$, we have two cases.
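Case 1: there exists a unique terminal node $z_k \in S(x)$ such that $h_i(z_k) = \max_{z \in S(x)} h_i(z)$. Then $u_i(x) = z_k$.

Case 2: there exist several terminal nodes $z \in S(x)$ maximizing $h_i(z)$; let $S_{i,1}(x)$ denote the set of these nodes. Then suppose that the ith player chooses a terminal position $z_k \in S_{i,1}(x)$ such that
$$h_j(z_k) = \max_{z \in S_{i,1}(x)} h_j(z), \qquad f_i(j) = 2. \qquad (7)$$
Let $S_{i,2}(x)$ denote the set of all nodes $z_k \in S_{i,1}(x)$ meeting (7). If $S_{i,2}(x)$ consists of a unique node $z_k$, then $u_i(x) = z_k$. Otherwise, suppose that the ith player chooses a node $z_k \in S_{i,2}(x)$ such that
$$h_l(z_k) = \max_{z \in S_{i,2}(x)} h_l(z), \qquad f_i(l) = 3. \qquad (8)$$
Let $S_{i,3}(x)$ denote the set of all final nodes $z_k \in S_{i,2}(x)$ satisfying (8), and so on.
If $S_{i,n}(x)$ still contains several nodes, suppose that player $i$ chooses the final node $z_k$ from $S_{i,n}(x)$ with the minimal ordinal number $k$.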
Note that in all cases $H_j(u) = h_j(z_k)$, $j \in N$.
If $x \in P_0$, then $S(x) = P^x_{n+1}$ and we do not need to define a strategy of any player at $x$, while $H_j(u) = \sum_{z_k \in S(x)} \pi(z_k \mid x)\, h_j(z_k)$. Hence, the players' behavior $u^x = (u^x_1, \dots, u^x_n) \in NE(\Gamma_x)$ and the expected payoffs $H^x_j(u^x)$, $j \in N$, are defined for all subgames $\Gamma_x$ of length 1. In addition, for the games $\Gamma_y$, $y \in P_{n+1}$, of length $L = 0$ we assume that $H^y_i(u^y) = h_i(y)$, $i \in N$.
Steps $2, \dots, L-1$: Suppose that in each subgame $\Gamma_y$ of length $L-1$ or less the unique SPE $u^y = (u^y_1, \dots, u^y_n)$ has already been constructed ("inductive assumption"), and $H^y_i(u^y)$, $i \in N$, is the corresponding vector of all the players' payoffs.
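Step $L$: Consider the game $\Gamma_{x_0}$ of length $L \ge 1$. Note that for all $y \in S(x_0)$ the length of the subgame $\Gamma_y$ is less than $L$. If $x_0 \in P_0$, let $u_j = (u^y_j)_{y \in S(x_0)} \in \prod_{y \in S(x_0)} U^y_j = U_j$; then
$$H_j(u) \ge H_j(u'_j, u_{-j}) \quad \text{for all } u'_j \in U_j, \; j \in N, \qquad (9)$$
since $u^y \in NE(\Gamma_y)$ due to the induction assumption, and each player $j \in N$ can deviate from $u_j$ only in the subgames $\Gamma_y$, $y \in S(x_0)$. If $x_0 \in P_i$ for some $i \in N$, we have two cases.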
Case 1: there exists a unique $y \in S(x_0)$ such that
$$H^y_i(u^y) = \max_{y' \in S(x_0)} H^{y'}_i(u^{y'}). \qquad (10)$$
Then we suppose that $u_i(x_0) = y$; $u_j(x) = u^y_j(x)$ if $x \in P_j \cap K_y$, $y \in S(x_0)$, $j = 1, \dots, n$.
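Case 2: there exist $q > 1$ nodes $y_1, \dots, y_q \in S(x_0)$ such that
$$H^{y_1}_i(u^{y_1}) = \dots = H^{y_q}_i(u^{y_q}) = \max_{y \in S(x_0)} H^y_i(u^y). \qquad (11)$$
Let $S_{i,1}(x_0) = \{y_1, \dots, y_q\}$. Then we suppose that the ith player chooses $y \in S_{i,1}(x_0)$ such that
$$H^y_j(u^y) = \max_{y' \in S_{i,1}(x_0)} H^{y'}_j(u^{y'}), \qquad f_i(j) = 2. \qquad (12)$$
Let $S_{i,2}(x_0)$ denote the set of all nodes $y \in S_{i,1}(x_0)$ satisfying (12). If $S_{i,2}(x_0)$ consists of a unique node $y$, then we suppose that $u_i(x_0) = y$; $u_j(x) = u^y_j(x)$ if $x \in P_j \cap K_y$, $y \in S(x_0)$, $j = 1, \dots, n$. Otherwise, suppose that the ith player chooses a node $y \in S_{i,2}(x_0)$ such that
$$H^y_l(u^y) = \max_{y' \in S_{i,2}(x_0)} H^{y'}_l(u^{y'}), \qquad f_i(l) = 3. \qquad (13)$$
Let $S_{i,3}(x_0)$ denote the set of all nodes $y \in S_{i,2}(x_0)$ meeting (13), and so on.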
Finally, if $S_{i,n}(x_0)$ contains several nodes $y_m$, denote by $l = \min\{k \mid z_k \in P_{n+1} \cap \Omega(u^{y_m}),\ y_m \in S_{i,n}(x_0)\}$ the minimal ordinal number of the terminal nodes of the trajectories generated by the subgame perfect equilibria $u^{y_m}$ in the subgames $\Gamma_{y_m}$, $y_m \in S_{i,n}(x_0)$ (see Remark 1). Note that there exists a unique trajectory $\omega = (x_0, \dots, z_l)$ from $x_0$ to $z_l$ in the game $\Gamma_{x_0}$; let $y = \omega \cap S_{i,n}(x_0)$. Again, we suppose that $u_i(x_0) = y$; $u_j(x) = u^y_j(x)$ if $x \in P_j \cap K_y$, $y \in S(x_0)$, $j = 1, \dots, n$. Now we prove that in both cases no player has a profitable deviation in $\Gamma_{x_0}$ from the strategy profile $u = (u_1, \dots, u_n)$ constructed above.
For player $i$ we have
$$H^{u_i(x_0)}_i\big(u^{u_i(x_0)}\big) \ge H^y_i(u'^y_i, u^y_{-i}) \qquad (14)$$
for all $y \in S(x_0)$, $u'^y_i \in U^y_i$, due to (10), (11) and the induction assumption that $u^y \in NE(\Gamma_y)$, $y \in S(x_0)$. For the other players $j \in N$, $j \neq i$, we have
$$H_j(u) \ge H_j(u'_j, u_{-j}) \qquad (15)$$
for all $u'_j \in U_j$, since $u^y \in NE(\Gamma_y)$, and only a deviation of player $j \in N$, $j \neq i$, from $u_j$ in the subgame $\Gamma_y$, $y = u_i(x_0)$, may affect the players' payoffs. Hence, taking (9), (14) and (15) into account, we obtain by induction that the strategy profile $u = (u_1, \dots, u_n)$ constructed above forms the unique subgame perfect equilibrium in $\Gamma_{x_0}$.

Proposition 1.
If the players' attitude vectors $F_1, F_2, \dots, F_n$ are common knowledge, the Attitude SPE algorithm constructs a unique subgame perfect equilibrium $u = (u_1, \dots, u_n)$ in pure strategies for any extensive-form game $\Gamma_{x_0} \in G_{cm}(n)$ with chance moves, as well as a unique bundle of trajectories $\Omega(u)$.
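A compact recursive rendering of the A-SPE algorithm may help in following the induction on the subgame length. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the tree `TREE`, the list `TERMINALS` that fixes the ordinal numbers $k$ of the terminal nodes $z_k$, the attitude table `F` (with `F[i][j - 1]` equal to $f_i(j)$) and the function `a_spe` are hypothetical. The node payoffs $h(x)$ are added at every node, so that the returned vector agrees with $H_i$ as a sum of payoffs along trajectories.

```python
from fractions import Fraction

# Hypothetical 2-player tree: node -> (owner, successors, chance probabilities, payoffs h(x)).
# owner 0 = chance node, owner None = terminal node.
TREE = {
    "x0": (1, ["x1", "x2"], None, (0, 0)),
    "x1": (0, ["x3", "z3"], [Fraction(1, 2), Fraction(1, 2)], (0, 0)),
    "x3": (2, ["z1", "z2"], None, (0, 0)),
    "x2": (2, ["z4", "z5"], None, (0, 0)),
    "z1": (None, [], None, (2, 3)),
    "z2": (None, [], None, (4, 3)),
    "z3": (None, [], None, (2, 1)),
    "z4": (None, [], None, (1, 2)),
    "z5": (None, [], None, (3, 2)),
}
TERMINALS = ["z1", "z2", "z3", "z4", "z5"]   # position in this list = ordinal number k of z_k
F = {1: (1, 2), 2: (2, 1)}                   # F[i][j - 1] = f_i(j)


def a_spe(x):
    """Return (H, strategy, k_min) for the subgame rooted at x:
    H        -- expected payoff vector of the constructed SPE (h(x) included),
    strategy -- chosen successor at every decision node of the subgame,
    k_min    -- smallest ordinal number of a terminal node in the SPE bundle."""
    owner, children, probs, h = TREE[x]
    n = len(h)
    if owner is None:                                    # subgame of length 0
        return tuple(Fraction(v) for v in h), {}, TERMINALS.index(x)
    sub = {y: a_spe(y) for y in children}                # inductive assumption
    strategy = {}
    for _, s, _ in sub.values():
        strategy.update(s)
    if owner == 0:                                       # chance node: expectation
        H = tuple(h[j] + sum(p * sub[y][0][j] for y, p in zip(children, probs))
                  for j in range(n))
        return H, strategy, min(sub[y][2] for y in children)
    order = sorted(range(1, n + 1), key=lambda j: F[owner][j - 1])
    remaining = list(children)
    for j in order:                                      # S_{i,1}, S_{i,2}, ..., S_{i,n}
        best = max(sub[y][0][j - 1] for y in remaining)
        remaining = [y for y in remaining if sub[y][0][j - 1] == best]
    # Remaining ties: successor whose bundle contains the terminal node with minimal k.
    choice = min(remaining, key=lambda y: sub[y][2])
    strategy[x] = choice
    H = tuple(h[j] + sub[choice][0][j] for j in range(n))
    return H, strategy, sub[choice][2]


H, strategy, _ = a_spe("x0")
print(H, strategy)
```

In this toy game player 1 is indifferent at `x0`, and so is player 2 (the level-2 associate), so the last rule applies: the subgame at `x1` contains `z2`, which has the smallest ordinal number, and the sketch selects `x1`, giving the payoff vector $(3, 2)$.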
It is worth noting that the existence of a (subgame perfect) pure strategy equilibrium in extensive-form games with perfect information and chance moves was first proved in References [46,47] for the special case when the payoffs are defined only in terminal nodes. Hence, Proposition 1 could be considered a corollary of these results. However, we provide a rigorous algorithm for constructing a unique SPE in an extensive-form game with chance moves, as well as the (unique) corresponding bundle of trajectories. We will use this algorithm, in particular, to calculate the characteristic function of the cooperative extensive-form game in Section 4.
Let us use the following example to demonstrate how the Attitude SPE algorithm works.
When using the Attitude SPE algorithm, at each node $x \in P_i$, $i = 1, 2, 3$, the ith player chooses the alternative marked in bold violet in Figure 1. Note that … Hence, $S_{2,1}(x^1_2) = \{z_2, x_3\}$, and $u_2(x^1_2) = z_2$ due to player 2's attitude vector $F_2$. The A-SPE algorithm generates a unique SPE $u = (u_1, u_2, u_3)$ with the payoff vector $(11, 22, 18)$. We will use this SPE later in Section 4 when calculating the γ-characteristic function.

Cooperative Strategies and Trajectories
If the players agree to cooperate in the game $\Gamma_{x_0}$, first they are expected to maximize the total payoff $\sum_{i=1}^n H_i(u)$ of the grand coalition. Let $U(\Gamma_{x_0})$ denote the set of all pure strategy profiles $u$ such that
$$\sum_{i=1}^n H_i(u) = \max_{v \in U} \sum_{i=1}^n H_i(v). \qquad (16)$$
The set $U(\Gamma_{x_0})$ is known to be nonempty, and it may contain multiple strategy profiles (see, e.g., Reference [17]). Hence, the players need to agree on a specific approach they are going to use to choose a unique optimal cooperative strategy profile $\bar u \in U(\Gamma_{x_0})$ as well as the corresponding optimal bundle of cooperative trajectories in the game tree. To this aim we introduce the so-called Players' Rank Based (PRB) algorithm. Note that a rather close approach, using the ranking of the criteria to choose a unique cooperative trajectory, was proposed recently in Reference [8] for multicriteria extensive-form games without chance moves. For the PRB algorithm, suppose that the players have agreed on the so-called "rank" of each player within the grand coalition $N$, where $r(k) = i$ means that the rank of player $i$ equals $k$, $k = 1, \dots, n$.
Step 0. Consider the set $U(\Gamma_{x_0})$. If all strategy profiles $u \in U(\Gamma_{x_0})$ generate the same bundle of trajectories $\Omega(u)$ (see, e.g., References [17,22,23] for a discussion of a certain redundancy of the pure strategy definition in extensive games), let the players choose any strategy profile $u \in U(\Gamma_{x_0})$ as the cooperative strategy profile $\bar u$, and let $\Omega(\bar u)$ denote the corresponding bundle of cooperative trajectories.
Step $k = 1$. Otherwise, that is, if the strategy profiles from $U(\Gamma_{x_0})$ generate different (and hence disjoint, see Remark 1) bundles of trajectories, calculate
$$\bar H_{r(1)} = \max_{u \in U(\Gamma_{x_0})} H_{r(1)}(u).$$
Let $U_{r(1)}(\Gamma_{x_0})$ denote the set of all strategy profiles $u \in U(\Gamma_{x_0})$ such that $H_{r(1)}(u) = \bar H_{r(1)}$. If all strategy profiles $u \in U_{r(1)}(\Gamma_{x_0})$ generate the same bundle of trajectories $\Omega(u)$, the players may choose any strategy profile $u \in U_{r(1)}(\Gamma_{x_0})$ as the cooperative strategy profile. Otherwise, proceed to the next step.
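Step $k = 2$. Consider the set $U_{r(1)}(\Gamma_{x_0})$ and calculate $\bar H_{r(2)} = \max_{u \in U_{r(1)}(\Gamma_{x_0})} H_{r(2)}(u)$. Let $U_{r(2)}(\Gamma_{x_0})$ denote the set of all strategy profiles $u \in U_{r(1)}(\Gamma_{x_0})$ such that $H_{r(2)}(u) = \bar H_{r(2)}$. If all strategy profiles $u \in U_{r(2)}(\Gamma_{x_0})$ generate the same bundle of trajectories $\Omega(u)$, the players may choose any strategy profile $u \in U_{r(2)}(\Gamma_{x_0})$ as the cooperative strategy profile. Otherwise, proceed to the next step.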
Henceforth, we will refer to the strategy profile $\bar u \in U(\Gamma_{x_0})$ chosen by the PRB algorithm and the bundle of trajectories $\Omega(\bar u)$ as the optimal cooperative strategy profile and the optimal bundle of cooperative trajectories, respectively.
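For small trees, the PRB algorithm can be sketched by brute force: enumerate every pure strategy profile, keep the maximizers of the total payoff of the grand coalition, and then filter by the ranked players' payoffs until a single bundle remains. The tree, the function names `outcome` and `prb`, and the representation of a profile as a map from decision nodes to chosen successors are hypothetical. The enumeration is exponential in the number of decision nodes, and if bundles still differ after all $n$ ranks, the sketch simply returns the remaining candidates, since that final case is not covered by the steps reproduced here.

```python
from fractions import Fraction
from itertools import product

# Hypothetical 2-player tree: node -> (owner, successors, chance probabilities, payoffs h(x)).
TREE = {
    "x0": (1, ["z1", "x1"], None, (0, 0)),
    "x1": (0, ["x2", "z3"], [Fraction(1, 2), Fraction(1, 2)], (0, 0)),
    "x2": (2, ["z2", "z4"], None, (0, 0)),
    "z1": (None, [], None, (4, 2)),
    "z2": (None, [], None, (5, 3)),
    "z3": (None, [], None, (1, 3)),
    "z4": (None, [], None, (2, 6)),
}
N_PLAYERS = 2


def outcome(u, root="x0"):
    """Return (bundle, H) for profile u (decision node -> chosen successor)."""
    H = [Fraction(0)] * N_PLAYERS
    trajectories = []

    def walk(x, path, p, acc):
        owner, children, probs, h = TREE[x]
        acc = [a + b for a, b in zip(acc, h)]         # payoffs summed along the path
        path = path + (x,)
        if owner is None:
            trajectories.append(path)
            for i in range(N_PLAYERS):
                H[i] += p * acc[i]
        elif owner == 0:
            for y, q in zip(children, probs):
                walk(y, path, p * q, acc)
        else:
            walk(u[x], path, p, acc)

    walk(root, (), Fraction(1), [0] * N_PLAYERS)
    return frozenset(trajectories), tuple(H)


def prb(rank):
    """rank = [r(1), r(2), ...]: players listed from rank 1 downwards."""
    decision = [x for x, node in TREE.items() if node[0] not in (None, 0)]
    profiles = []
    for choice in product(*(TREE[x][1] for x in decision)):
        u = dict(zip(decision, choice))
        profiles.append((u, *outcome(u)))
    best = max(sum(H) for _, _, H in profiles)
    candidates = [c for c in profiles if sum(c[2]) == best]   # the set U(Gamma_x0)
    for player in rank:                                        # Steps k = 1, 2, ...
        if len({bundle for _, bundle, _ in candidates}) == 1:
            break                                              # a single bundle remains
        top = max(c[2][player - 1] for c in candidates)
        candidates = [c for c in candidates if c[2][player - 1] == top]
    return candidates


for u, bundle, H in prb([2, 1]):
    print(u, sorted(bundle), H)
```

With ranks `[2, 1]` a single profile survives; with ranks `[1, 2]` two profiles remain, but they differ only at the unreached node `x2` and generate the same bundle, which is exactly the redundancy that Step 0 tolerates.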
In the dynamic setting it is important that the specific method which the players agreed to accept in order to choose a unique optimal cooperative strategy profile $\bar u \in U(\Gamma_{x_0})$, as well as the corresponding optimal bundle of cooperative trajectories, satisfies time consistency (see, e.g., References [1,2,6,13,17]), that is, a fragment of the optimal bundle of cooperative trajectories in a subgame should remain optimal in this subgame. Suppose that at each subgame $\Gamma_{x_t}$ along the cooperative trajectories, that is, $x_t \in \omega(\bar u)$, $\omega(\bar u) \in \Omega(\bar u)$, the players choose a strategy profile $u^{x_t} \in U^{x_t}$ such that
$$\sum_{i \in N} H^{x_t}_i(u^{x_t}) = \max_{v^{x_t} \in U^{x_t}} \sum_{i \in N} H^{x_t}_i(v^{x_t}). \qquad (17)$$
Let $U(\Gamma_{x_t})$ denote the set of all pure strategy profiles $u^{x_t} \in U^{x_t}$ which satisfy (17), and suppose the players use the same approach (namely, the PRB algorithm) to choose a unique optimal cooperative strategy profile $\bar u^{x_t} \in U(\Gamma_{x_t})$ in the subgame as for the original game $\Gamma_{x_0}$.

Proposition 2.
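A cooperative strategy profile for $\Gamma_{x_0} \in G_{cm}(n)$ based on the PRB algorithm satisfies time consistency. Namely, let $\bar u \in U$ satisfy (16), and let $\Omega(\bar u)$ be the optimal bundle of cooperative trajectories. Then for each subgame $\Gamma_{x_t}$, $x_t \in \omega(\bar u) = (x_0, \dots, x_t, x_{t+1}, \dots, x_T) \in \Omega(\bar u)$, the restriction $\bar u^{x_t}$ of $\bar u$ to $\Gamma_{x_t}$ satisfies
$$\sum_{i \in N} H^{x_t}_i(\bar u^{x_t}) = \max_{v^{x_t} \in U^{x_t}} \sum_{i \in N} H^{x_t}_i(v^{x_t}), \qquad (18)$$
while $\omega_{x_t} = (x_t, x_{t+1}, \dots, x_T) \in \Omega(\bar u^{x_t})$, that is, $\omega_{x_t}$ belongs to the optimal bundle of cooperative trajectories in the subgame $\Gamma_{x_t}$.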
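Proof. The optimal bundle of cooperative trajectories $\Omega(\bar u)$ generated by $\bar u \in U(\Gamma_{x_0})$ can be divided into two subsets $\{\omega_m\} = \{\omega \in \Omega(\bar u) \mid x_t \in \omega\}$ and $\{\chi_l\} = \{\omega \in \Omega(\bar u) \mid x_t \notin \omega\}$. Then, taking (5) and (6) into account, we get for each $i \in N$
$$H_i(\bar u) = p(x_t, \bar u)\big(h_i(x_0, \dots, x_{t-1}) + H^{x_t}_i(\bar u^{x_t})\big) + \sum_l p(\chi_l, \bar u_D)\, h_i(\chi_l), \qquad (19)$$
and (16) for $\bar u$ takes the form
$$\sum_{i \in N}\Big[p(x_t, \bar u)\big(h_i(x_0, \dots, x_{t-1}) + H^{x_t}_i(\bar u^{x_t})\big) + \sum_l p(\chi_l, \bar u_D)\, h_i(\chi_l)\Big] = \max_{u \in U} \sum_{i \in N} H_i(u). \qquad (20)$$
Suppose that $\bar u^{x_t}$ does not satisfy (18), that is, there exists $v^{x_t} \in U^{x_t}$ such that
$$\sum_{i \in N} H^{x_t}_i(v^{x_t}) > \sum_{i \in N} H^{x_t}_i(\bar u^{x_t}). \qquad (21)$$
Denote by $W_i = (\bar u_{D,i}, v^{x_t}_i)$, $i \in N$, the ith player's compound pure strategy in $\Gamma_{x_0}$. The strategy profile $W = (W_1, \dots, W_n)$ generates the bundle $\Omega(W)$, which can be divided into two disjoint subsets $\{\lambda_m\} = \{\omega \in \Omega(W) \mid x_t \in \omega\}$ and $\{\chi_l\} = \{\omega \in \Omega(W) \mid x_t \notin \omega\}$, where the second subset for $\Omega(W)$ coincides with the second subset for $\Omega(\bar u)$ since $W_D = \bar u_D$, and $\lambda_m = (x_0, \dots, x_t) \cup (x_t, \dots, x_{T(m)}) = (x_0, \dots, x_t) \cup \lambda^{x_t}_m$. Since $W^{x_t} = v^{x_t}$, inequality (21) can be rewritten as
$$\sum_{i \in N} H^{x_t}_i(W^{x_t}) > \sum_{i \in N} H^{x_t}_i(\bar u^{x_t}). \qquad (22)$$
Adding $\sum_{i \in N} h_i(x_0, x_1, \dots, x_{t-1})$ to both sides of (22), we get
$$\sum_{i \in N}\big(h_i(x_0, \dots, x_{t-1}) + H^{x_t}_i(W^{x_t})\big) > \sum_{i \in N}\big(h_i(x_0, \dots, x_{t-1}) + H^{x_t}_i(\bar u^{x_t})\big). \qquad (23)$$
Then we can multiply both sides of (23) by $p(x_t, \bar u) = p(x_t, \bar u_D) = p(x_t, W_D) = p(x_t, W) > 0$ and add $\sum_{i \in N} \sum_l p(\chi_l, \bar u_D)\, h_i(\chi_l)$ to both sides of the resulting inequality. Taking into account (4)-(6) and (20), we obtain
$$\sum_{i \in N} H_i(W) > \sum_{i \in N} H_i(\bar u) = \max_{u \in U} \sum_{i \in N} H_i(u)$$
for the constructed strategy profile $W \in U$. The last inequality contradicts the fact that $\bar u \in U(\Gamma_{x_0})$; hence, (18) is valid.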
Arguing in a similar way (for the case when different strategy profiles from $U(\Gamma_{x_t})$ generate different bundles of trajectories), we can verify that $\omega_{x_t} = (x_t, \dots, x_T)$, the fragment of the cooperative trajectory $\omega \in \Omega(\bar u)$ starting at $x_t$, belongs to the optimal bundle of cooperative trajectories in the subgame $\Gamma_{x_t}$, that is, $\omega_{x_t} \in \Omega(\bar u^{x_t})$.
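The decomposition used in the proof of Proposition 2 means that the maximal total payoff-to-go obeys a Bellman-type recursion: the grand coalition maximizes at decision nodes and averages at chance nodes. The Python sketch below uses this to check numerically, on a hypothetical tree, that every profile maximizing the total payoff in $\Gamma_{x_0}$ also maximizes it in every subgame reached with positive probability, which is the optimality part of Proposition 2; it does not model the PRB selection of the subgame bundle. The functions `value`, `total`, `reached` and `is_time_consistent` are hypothetical names.

```python
from fractions import Fraction
from itertools import product

# Hypothetical 2-player tree: node -> (owner, successors, chance probabilities, payoffs h(x)).
TREE = {
    "x0": (1, ["x1", "z1"], None, (1, 0)),
    "x1": (0, ["x2", "x3"], [Fraction(1, 3), Fraction(2, 3)], (0, 1)),
    "x2": (2, ["z2", "z3"], None, (0, 0)),
    "x3": (1, ["z4", "z5"], None, (1, 1)),
    "z1": (None, [], None, (3, 3)),
    "z2": (None, [], None, (2, 2)),
    "z3": (None, [], None, (1, 4)),
    "z4": (None, [], None, (0, 1)),
    "z5": (None, [], None, (2, 2)),
}


def value(x):
    """Maximal expected total payoff-to-go of the grand coalition in the subgame at x."""
    owner, children, probs, h = TREE[x]
    if owner is None:
        return Fraction(sum(h))
    if owner == 0:
        return sum(h) + sum(p * value(y) for y, p in zip(children, probs))
    return sum(h) + max(value(y) for y in children)


def total(x, u):
    """Expected total payoff-to-go at x when every decision node follows profile u."""
    owner, children, probs, h = TREE[x]
    if owner is None:
        return Fraction(sum(h))
    if owner == 0:
        return sum(h) + sum(p * total(y, u) for y, p in zip(children, probs))
    return sum(h) + total(u[x], u)


def reached(x, u):
    """All nodes reached with positive probability under u, starting from x."""
    owner, children, _, _ = TREE[x]
    nxt = [] if owner is None else children if owner == 0 else [u[x]]
    return [x] + [z for y in nxt for z in reached(y, u)]


def is_time_consistent(u):
    """True if u is optimal for the grand coalition in every subgame it reaches."""
    return all(total(x, u) == value(x) for x in reached("x0", u))


decision = [x for x, node in TREE.items() if node[0] not in (None, 0)]
profiles = [dict(zip(decision, c)) for c in product(*(TREE[x][1] for x in decision))]
best = max(total("x0", u) for u in profiles)
optimal = [u for u in profiles if total("x0", u) == best]
print(best, optimal, [is_time_consistent(u) for u in optimal])
```

Brute-force enumeration and backward induction agree on the optimal total, and the only optimal profile plays optimally at `x2` and `x3`, both of which are reached after the chance move at `x1`.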
We will assume in this paper that all the players have agreed to apply the PRB algorithm in order to choose the cooperative strategy profile $\bar u = (\bar u_1, \dots, \bar u_n)$ that generates the optimal bundle $\Omega(\bar u)$ of cooperative trajectories in $\Gamma_{x_0} \in G_{cm}(n)$. The next step of cooperation is to define a characteristic function $V^{x_0}(S)$. There are different notions of characteristic functions (see, e.g., References [23,24,48]); in this paper we adopt the so-called γ-characteristic function introduced in Reference [24]. Namely, we assume that $V^{x_0}(S)$ is the payoff of $S$ in the SPE (constructed by the Attitude SPE algorithm) of the noncooperative game in which the members of $S$ maximize their joint payoff while the non-members play individually.
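The γ-characteristic function $V^{x_t}$ for the subgame $\Gamma_{x_t}$, $x_t \in \omega_m(\bar u) = (x_0, \dots, x_t, \dots, x_{T(m)})$, $\omega_m(\bar u) \in \Omega(\bar u)$, along the optimal bundle of cooperative trajectories can be constructed using the same approach. Note that, by the definition of the γ-characteristic function and (18), $V^{x_t}(N) = \sum_{i \in N} H^{x_t}_i(\bar u^{x_t})$. Let $\Gamma_{x_0}(N, V^{x_0})$ denote the extensive-form cooperative game $\Gamma_{x_0} \in G_{cm}(n)$ with the γ-characteristic function, and $\Gamma_{x_t}(N, V^{x_t})$ denote the corresponding subgame.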
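The γ-characteristic function can be sketched as a family of backward inductions, one per coalition $S$: the members of $S$ act as a single player maximizing their joint payoff, and every non-member maximizes its own payoff. In this hypothetical Python sketch ties are broken in favor of the first listed successor (Python's built-in `max` returns the first maximal item), which is a simplification standing in for the Attitude SPE algorithm; the tree and the function names `spe_payoffs` and `gamma_characteristic_function` are assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical 2-player tree: node -> (owner, successors, chance probabilities, payoffs h(x)).
TREE = {
    "x0": (1, ["x1", "z1"], None, (0, 0)),
    "x1": (0, ["x2", "z4"], [Fraction(1, 2), Fraction(1, 2)], (0, 0)),
    "x2": (2, ["z2", "z3"], None, (0, 0)),
    "z1": (None, [], None, (2, 1)),
    "z2": (None, [], None, (5, 1)),
    "z3": (None, [], None, (1, 2)),
    "z4": (None, [], None, (2, 2)),
}
PLAYERS = (1, 2)


def spe_payoffs(x, S):
    """Backward induction when the members of S act as one player maximizing their
    joint payoff and every non-member maximizes its own payoff.
    Ties go to the first listed successor (a simplification, not the A-SPE rule)."""
    owner, children, probs, h = TREE[x]
    if owner is None:
        return [Fraction(v) for v in h]
    subs = [spe_payoffs(y, S) for y in children]
    if owner == 0:                                   # chance node: expectation
        return [h[j] + sum(p * s[j] for p, s in zip(probs, subs)) for j in range(len(h))]
    if owner in S:
        best = max(subs, key=lambda s: sum(s[j - 1] for j in S))
    else:
        best = max(subs, key=lambda s: s[owner - 1])
    return [h[j] + best[j] for j in range(len(h))]


def gamma_characteristic_function():
    """V(S) = total SPE payoff of the members of S, for every coalition S."""
    V = {}
    for k in range(len(PLAYERS) + 1):
        for S in combinations(PLAYERS, k):
            payoffs = spe_payoffs("x0", set(S))
            V[frozenset(S)] = sum((payoffs[j - 1] for j in S), Fraction(0))
    return V


V = gamma_characteristic_function()
print(V)
```

For this tree the sketch yields $V(\{1\}) = 2$, $V(\{2\}) = 1$ and $V(\{1, 2\}) = 5$, so the grand coalition gains 2 over the sum of the singleton values.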
We assume that the players adopt a single-valued cooperative solution $\varphi^{x_0}$ (for instance, the Shapley value [33], the nucleolus [34], etc.) for the cooperative game $\Gamma_{x_0}(N, V^{x_0})$ which satisfies the collective rationality property
$$\sum_{i \in N} \varphi^{x_0}_i = V^{x_0}(N) \qquad (25)$$
and the individual rationality property
$$\varphi^{x_0}_i \ge V^{x_0}(\{i\}), \quad i \in N. \qquad (26)$$
In addition, we assume that the same properties (25) and (26) hold for the cooperative solutions $\varphi^{x_t}$ at each subgame $\Gamma_{x_t}(N, V^{x_t})$, $t = 0, \dots, T-1$.
It is worth noting that the last assumption, as well as the choice of the γ-characteristic function, ensures that every player has an incentive to cooperate at each subgame along the optimal game evolution, since the ith player's cooperative payoff-to-go in $\Gamma_{x_t}(N, V^{x_t})$, $t = 0, \dots, T-1$, is at least equal to her non-cooperative counterpart: $\varphi^{x_t}_i \ge V^{x_t}(\{i\}) = H^{x_t}_i(u^{x_t})$, where $u^{x_t}$ is the SPE constructed by the Attitude SPE algorithm in $\Gamma_{x_t}$.
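Conditions (25) and (26) can be checked mechanically once $V$ is known. The sketch below computes the Shapley value from its standard coalition-weighted formula and tests both properties; the 3-player characteristic function values are hypothetical. The Shapley value is always collectively rational (efficient), whereas individual rationality is guaranteed for superadditive games but not in general, which is why the second check is worth running.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

# Hypothetical 3-player characteristic function (values are illustrative only).
V = {
    frozenset(): Fraction(0),
    frozenset({1}): Fraction(1), frozenset({2}): Fraction(2), frozenset({3}): Fraction(0),
    frozenset({1, 2}): Fraction(4), frozenset({1, 3}): Fraction(3), frozenset({2, 3}): Fraction(3),
    frozenset({1, 2, 3}): Fraction(9),
}
PLAYERS = (1, 2, 3)


def shapley(players, V):
    """phi_i = sum over S not containing i of |S|!(n-|S|-1)!/n! * (V(S + i) - V(S))."""
    n = len(players)
    phi = {}
    for i in players:
        others = [j for j in players if j != i]
        phi[i] = Fraction(0)
        for k in range(n):
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for S in combinations(others, k):
                S = frozenset(S)
                phi[i] += weight * (V[S | {i}] - V[S])
    return phi


def collectively_rational(players, V, phi):   # condition (25)
    return sum(phi.values()) == V[frozenset(players)]


def individually_rational(players, V, phi):   # condition (26)
    return all(phi[i] >= V[frozenset({i})] for i in players)


phi = shapley(PLAYERS, V)
print(phi, collectively_rational(PLAYERS, V, phi), individually_rational(PLAYERS, V, phi))
```

Here the game is superadditive, so both checks pass; the resulting shares are $19/6$, $11/3$ and $13/6$, which add up to $V(N) = 9$.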

Subgame Consistency and Incremental IDP
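Let $\beta = \{\beta_i(x_\tau)\}$, $i = 1, \dots, n$; $\tau = 1, \dots, T(l)$, $x_\tau \in \omega_l(\bar u)$, $\omega_l(\bar u) \in \Omega(\bar u)$, denote the Imputation Distribution Procedure (IDP), or payment schedule, for the cooperative solution $(\varphi^{x_0}_i)_{i \in N}$ (see, e.g., References [3,8-12,14-18,20] for details). The IDP approach means that all the players have agreed to allocate the total cooperative payoff $V^{x_0}(N)$ among the players along the optimal bundle $\Omega(\bar u)$ of cooperative trajectories $\omega_l(\bar u)$ according to a specific rule, which is called the IDP. Namely, $\beta_i(x_\tau)$ denotes the actual current payment which player $i$ receives at position $x_\tau$ (instead of $h_i(x_\tau)$) if the players employ the IDP $\beta$. Moreover, one can design an IDP $\beta$ such that all the players are interested in cooperation in any subgame $\Gamma_{x_t}$ along the optimal bundle of cooperative trajectories.

The IDP $\beta$ is called subgame efficient if for every subgame $\Gamma_{x_t}$, $x_t \in \omega(\bar u)$, $\omega(\bar u) \in \Omega(\bar u)$, and for all $i \in N$
$$\varphi^{x_t}_i = \sum_{\omega \in \Omega(\bar u^{x_t})} p(\omega, \bar u^{x_t}) \sum_{x_\tau \in \omega} \beta_i(x_\tau). \qquad (27)$$
The IDP $\beta$ satisfies the strict balance condition if for every node $x_\tau$ of the optimal bundle
$$\sum_{i \in N} \beta_i(x_\tau) = \sum_{i \in N} h_i(x_\tau). \qquad (28)$$
Equation (27) means that the expected sum of the payments to player $i$ along the optimal evolution of the subgame $\Gamma_{x_t}$ equals what she is entitled to in this subgame. Then the IDP for each player can be reasonably implemented as a rule for the step-by-step allocation of the ith player's current expected optimal payoff. Note that for $t = 0$ the subgame efficiency definition coincides with the efficiency at the initial node $x_0$, or the efficiency in the whole game $\Gamma_{x_0}$ condition (see References [9,14,16,20]).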
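The balance condition
$$\sum_{i \in N} \beta_i(x_\tau) = \sum_{i \in N} h_i(x_\tau) \tag{28}$$
for every node $x_\tau \in \omega_l(u)$, $\omega_l(u) \in \Omega(u)$, ensures the "admissibility" of the IDP, that is, the sum of the payments to the players at any node $x_\tau$ equals the sum of the payoffs that they can collect at this node.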
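Conditions (27) and (28) can be tested numerically on a small bundle. The Python sketch below assumes a hypothetical 2-person bundle with one chance node and hypothetical values of $\varphi^{x}$ in every subgame (none of the numbers come from the paper); the encoding of nodes as a dictionary and the function names are illustrative. It computes expected payments-to-go recursively and shows that the naive schedule $\beta_i(x) = h_i(x)$ is balanced but not subgame efficient, which is why a redistribution rule is needed.

```python
# Hypothetical 2-person toy bundle: node -> (stage payoffs h(x), [(probability, successor), ...]).
# A personal node on the cooperative path has a single successor with probability 1;
# a chance node lists all of its successors.
TREE = {
    "x0": ((1.0, 1.0), [(1.0, "x1")]),
    "x1": ((0.0, 2.0), [(0.5, "a"), (0.5, "b")]),  # chance node
    "a": ((4.0, 0.0), []),                         # terminal nodes
    "b": ((2.0, 2.0), []),
}
# Hypothetical cooperative solution phi^x of every subgame (each vector sums to V^x(N)).
PHI = {"x0": (5.0, 3.0), "x1": (3.5, 2.5), "a": (4.0, 0.0), "b": (2.0, 2.0)}


def expected_payments(tree, beta, node):
    """Expected sum of the payments beta_i from `node` to the end of its subgame."""
    total = list(beta[node])
    for prob, child in tree[node][1]:
        for i, value in enumerate(expected_payments(tree, beta, child)):
            total[i] += prob * value
    return tuple(total)


def is_subgame_efficient(tree, beta, phi, tol=1e-9):
    """Condition (27) checked at every node of the bundle."""
    return all(
        abs(paid - entitled) <= tol
        for x in tree
        for paid, entitled in zip(expected_payments(tree, beta, x), phi[x])
    )


def is_balanced(tree, beta, tol=1e-9):
    """Condition (28): at each node the total payment equals the total stage payoff."""
    return all(abs(sum(beta[x]) - sum(tree[x][0])) <= tol for x in tree)


# Naive schedule: every player simply receives her own stage payoff, beta_i(x) = h_i(x).
beta_h = {x: payoffs for x, (payoffs, _) in TREE.items()}
print(expected_payments(TREE, beta_h, "x0"))  # (4.0, 4.0), whereas phi^{x0} = (5.0, 3.0)
print(is_balanced(TREE, beta_h), is_subgame_efficient(TREE, beta_h, PHI))  # True False
```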
Another useful dynamic property of an IDP, time consistency, introduced in Reference [3], was extended to dynamic games played over event trees in References [14,16,20] and to multicriteria extensive-form cooperative games (with chance moves) in Reference [9].
Note that in the particular case $x_t \in S(x_{t(1)})$, that is, if $x_t$ follows the chance node $x_{t(1)}$, Equation (30) takes a simpler form. A similar remark applies to Equation (31), and so forth. Roughly speaking, Definition 5 means that the payments collected by player i (according to the payment schedule β) before reaching some intermediate node $x_t$, plus her expected component of the Shapley value in the subgame $\Gamma^{x_t}$ starting at $x_t$, plus her expected Shapley value components in the other subgames along the cooperative trajectories that do not contain $x_t$, add up to what player i is entitled to in the original game $\Gamma^{x_0}(N, V^{x_0})$.
It is worth noting that Definition 5 indeed provides a reasonable consistency requirement that a good payment schedule β should satisfy when the player evaluates the IDP β at the initial node $x_0$, that is, before the game $\Gamma^{x_0}(N, V^{x_0})$ starts (and the words "in the whole game" in Definition 5 properly reflect this feature). However, when the player intends to evaluate the IDP β in the subgame $\Gamma^{x_t}$, that is, after reaching some intermediate node $x_t$ (in the case $\theta \geq 1$), she will hardly take into account the expected future payoffs in all the subgames that are unattainable once the node $x_t$ has been reached, that is, the last addends in the LHS of (30) and (31). To overcome this problem, we suggest that the players use the notion of subgame consistency, a refinement of time consistency first proposed in Reference [36] for cooperative stochastic differential games and then extended to stochastic dynamic games in References [37,38]. Let us provide a rigorous definition of IDP subgame consistency for extensive-form games with chance moves that is applicable in all the subgames along the optimal bundle of cooperative trajectories.
Definition 6. The IDP $\beta = \{\beta_i(x_\tau)\}$ is called subgame consistent if at any intermediate node $x_t \in \omega(u)$, $\omega(u) \in \Omega(u)$, $1 \leq t \leq T$, for all $i \in N$, it holds that:
case $1 \leq t \leq t(1)$ (no chance nodes before the root $x_t$ of the subgame $\Gamma^{x_t}$):
case $t(1) + 1 < t \leq t(2)$ (only one chance node $y_1 = x_{t(1)}$ before $x_t$):
case $t(2) + 1 < t \leq t(3)$ (two chance nodes before $x_t$):
. . .
case $t(\theta) + 1 < t \leq T$ (no chance nodes after $x_t$):
The subgame consistency definition differs from the "time consistency in the whole game" property (see References [9,14,16,20]), which is based on an a priori assessment of the ith player's expected optimal payoff (before the game starts).
However, when the players make a decision in the subgame after the chance move occurs, they need to recalculate the expected optimal payoff, since the original optimal bundle of cooperative trajectories shrinks after each chance node. Note that we cannot write out the subgame consistency condition for $t = t(1) + 1, t(2) + 1, \ldots, t(\theta) + 1$, that is, for the nodes $x_t$ that immediately follow the chance nodes.
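The shrinking of the bundle can be illustrated on a toy example. The Python sketch below assumes a hypothetical bundle given as a list of trajectories with their probabilities, together with one player's hypothetical stage payoffs; the names `subgame_bundle` and `expected_payoff_to_go` are illustrative, not taken from the paper. Once the chance node x1 resolves to b, the trajectory through a becomes unattainable and is dropped, and the expected payoff-to-go is recomputed with probabilities conditioned on reaching b. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction as F

# Hypothetical optimal bundle: (probability of the trajectory, sequence of nodes).
# x1 and b are chance nodes: x1 -> a (1/2) or b (1/2); b -> b1 (2/5) or b2 (3/5).
BUNDLE = [
    (F(1, 2), ("x0", "x1", "a")),
    (F(1, 5), ("x0", "x1", "b", "b1")),
    (F(3, 10), ("x0", "x1", "b", "b2")),
]
# Hypothetical stage payoffs h_i(x) of one fixed player i.
H_I = {"x0": 1, "x1": 0, "a": 4, "b": 1, "b1": 3, "b2": 0}


def subgame_bundle(bundle, node):
    """Trajectories passing through `node`, with probabilities conditioned on reaching it."""
    kept = [(p, path) for p, path in bundle if node in path]
    reach = sum(p for p, _ in kept)
    return [(p / reach, path) for p, path in kept]


def expected_payoff_to_go(bundle, node, h):
    """Player i's expected payoff from `node` onward, computed on the shrunk bundle."""
    total = F(0)
    for p, path in subgame_bundle(bundle, node):
        total += p * sum(h[x] for x in path[path.index(node):])
    return total


print(expected_payoff_to_go(BUNDLE, "x0", H_I))  # 41/10: all three trajectories count
print(expected_payoff_to_go(BUNDLE, "b", H_I))   # 11/5: only the trajectories through b remain
```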
One can suggest different imputation distribution procedures that may or may not satisfy the useful properties listed above. A review of different IDPs for multistage games (without chance moves), together with an analysis of their properties, can be found in References [10,12,15,17]. Below we consider a refinement of the so-called incremental IDP (see, e.g., References [10,14,16,17,20,21]) that was recently introduced for multistage games with chance moves [9].
Definition 7 ([9]). The incremental IDP for the cooperative solution $\varphi^{x_0}$ in the multistage game with chance moves $\Gamma^{x_0}$ is defined as follows: for $x_t \in \omega_l(u) = (x_0, \ldots, x_t, \ldots, x_{T(l)})$, $\omega_l(u) \in \Omega(u)$, $t = 0, \ldots, T(l) - 1$;
It is known that the classical incremental IDP for multistage (and differential) games may yield negative current payments to some players at some positions (see References [4,10,17,38] for details). As one can observe in Example 2, this drawback of the incremental IDP may appear in extensive-form games with chance moves as well. Two approaches to overcoming this possible disadvantage were suggested in References [4,10]. Unfortunately, as was first proved in Reference [10], it is in general impossible to design a time consistent IDP that satisfies both the balance condition and the non-negativity constraint.
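The Python sketch below assumes that the incremental IDP takes the standard telescoping form $\beta_i(x_t) = \varphi^{x_t}_i - \sum_{y} p(y)\,\varphi^{y}_i$, where y runs over the cooperative successors of $x_t$ (a single successor with probability 1 at a personal node, all outcomes at a chance node), and $\beta_i = \varphi_i$ at terminal nodes. This form, the node encoding and the values of $\varphi$ are assumptions for illustration; the exact schedule (36), (37) of Reference [9] may differ in detail. The hypothetical $\varphi$ values are chosen so that $\sum_i \varphi^{x}_i = \sum_i h_i(x) + \mathbb{E}[\sum_i \varphi^{y}_i]$, which is what makes the balance check pass; with them the sketch exhibits subgame efficiency, balance, and negative current payments.

```python
# Hypothetical 2-person toy bundle: node -> (stage payoffs h(x), [(probability, successor), ...]).
TREE = {
    "x0": ((1.0, 1.0), [(1.0, "x1")]),             # personal node: cooperative move to x1
    "x1": ((0.0, 2.0), [(0.5, "a"), (0.5, "b")]),  # chance node
    "a": ((4.0, 0.0), []),                         # terminal nodes
    "b": ((2.0, 2.0), []),
}
# Hypothetical subgame solutions phi^x; each vector sums to V^x(N), and
# V^x(N) = sum_i h_i(x) + E[V^y(N)] over the successors y of x.
PHI = {"x0": (5.0, 3.0), "x1": (2.5, 3.5), "a": (4.0, 0.0), "b": (2.0, 2.0)}


def incremental_idp(tree, phi):
    """Assumed form: beta_i(x) = phi_i^x - sum_y p(y) * phi_i^y; beta = phi at terminal nodes."""
    beta = {}
    for x, (_, successors) in tree.items():
        expected_next = [0.0] * len(phi[x])
        for prob, y in successors:
            for i, value in enumerate(phi[y]):
                expected_next[i] += prob * value
        beta[x] = tuple(own - nxt for own, nxt in zip(phi[x], expected_next))
    return beta


def expected_payments(tree, beta, node):
    """Expected sum of the payments beta_i from `node` to the end of its subgame."""
    total = list(beta[node])
    for prob, child in tree[node][1]:
        for i, value in enumerate(expected_payments(tree, beta, child)):
            total[i] += prob * value
    return tuple(total)


beta = incremental_idp(TREE, PHI)
print(beta["x0"], beta["x1"])  # (2.5, -0.5) (-0.5, 2.5): negative current payments occur
print(expected_payments(TREE, beta, "x0"))  # (5.0, 3.0), i.e. phi^{x0}: subgame efficiency
# Balance: the total payment at each node equals the total stage payoff there.
print(all(abs(sum(beta[x]) - sum(h)) < 1e-9 for x, (h, _) in TREE.items()))  # True
```

Because each payment is a difference of consecutive subgame solutions, the expected payments telescope back to $\varphi^{x_t}$ in every subgame, while nothing prevents an individual difference from being negative.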
According to Proposition 3, the incremental payment schedule (36), (37) can be used to implement a long-term cooperative agreement in an extensive-form game with chance moves.

Conclusions
In this paper we proposed a mechanism of the players' sustainable long-term cooperation that satisfies a number of desirable properties. To this end, we formalised the players' rank based algorithm for selecting a unique optimal bundle of cooperative trajectories, and proved that the corresponding cooperative strategy profile satisfies time consistency. To calculate the γ-characteristic function, one needs a specific method for constructing a unique (subgame perfect) equilibrium in any extensive-form game with chance moves. Hence, we formalised a refinement of the backwards induction procedure based on the players' attitude vectors, the so-called attitude SPE algorithm.
As a result of re-examining the "IDP time consistency in the whole game" concept, we suggest adopting the concept of subgame consistency, introduced in Reference [36] for stochastic differential games and then extended to stochastic dynamic games in References [37,38]. We provided a definition of subgame consistency for extensive-form games with chance moves. This property takes into account an important feature of the games under consideration: when the players make a decision in the subgame $\Gamma^{x_t}$ after a chance move occurs, they need to recalculate their expected optimal payoffs-to-go, since the original optimal bundle of cooperative trajectories shrinks after each chance node. It is worth noting that a similar approach, based on the IDP subgame consistency notion, could be applied to dynamic games played over event trees ([14,16,20]). We proved that the incremental IDP specified for multistage games with chance moves in Reference [9] satisfies subgame consistency and subgame efficiency as well as the strict balance condition.
It follows from Propositions 1–3 that the two specified algorithms, combined with the γ-characteristic function and the incremental payment schedule, together constitute a mechanism of the players' sustainable cooperation that satisfies a number of desirable properties and could be used in extensive-form games with chance moves. Note that the main result of the paper, Proposition 3, depends neither on the specific method that the players employ to calculate the characteristic function nor on the specific single-valued cooperative solution satisfying (25) and (26).
Since this is the first time that subgame consistent solutions are examined for extensive-form games with chance moves, further research along this line is expected. It would be of interest to develop an appropriate software application implementing the proposed algorithms for arbitrary extensive-form games with chance moves. One could possibly use the so-called Game Theory Explorer [30] when developing such software tools for 2-person extensive games. Further, it might be interesting to run experiments with large-scale datasets once a software application that constructs the unique SPE, the optimal bundle of cooperative trajectories, the γ-characteristic function, and so forth has been developed.
Let us outline some preliminary suggestions on how such a software application could be used to run simulations. First, one can vary the main parameter, the length of the game tree, and additional parameters such as the game structure, the players' payoffs, the transition probabilities, and so forth, to obtain practical estimates of the complexity and scalability of the proposed algorithms. Second, one can generate external disturbances of the stage payoffs and probabilities and vary the players' attitude vectors to carry out a sensitivity analysis of the proposed non-cooperative and cooperative solutions. Further, it is of interest to obtain experimental estimates of the price of anarchy and the price of stability for the class of games under consideration. Finally, one can use such a software application to check whether additional properties (non-negativity, irrational-behavior-proof conditions, etc.) of the proposed incremental IDP and other payment schedules (see, e.g., Reference [15]) are satisfied for a given extensive-form game with chance moves.