Subgame Consistent Cooperative Behavior in an Extensive form Game with Chance Moves

Kuzyutin, Denis; Smirnova, Nadezhda

doi:10.3390/math8071061


by Denis Kuzyutin ^1,2 and Nadezhda Smirnova ^1,2,*

¹ Faculty of Applied Mathematics and Control Processes, Saint Petersburg State University, Universitetskaya nab. 7/9, 199034 St. Petersburg, Russia

² St. Petersburg School of Mathematics, Physics and Computer Science, National Research University Higher School of Economics (HSE), Soyuza Pechatnikov ul. 16, 190008 St. Petersburg, Russia

^* Author to whom correspondence should be addressed.

Mathematics 2020, 8(7), 1061; https://doi.org/10.3390/math8071061

Submission received: 28 May 2020 / Revised: 21 June 2020 / Accepted: 23 June 2020 / Published: 1 July 2020

(This article belongs to the Special Issue Game Theory)


Abstract

We design a mechanism of the players’ sustainable cooperation in a multistage n-person game in extensive form with chance moves. When the players agree to cooperate in a dynamic game, they have to ensure time consistency of the long-term cooperative agreement. We provide the players’ rank based (PRB) algorithm for choosing a unique cooperative strategy profile and prove that the corresponding optimal bundle of cooperative strategies satisfies time consistency, that is, at every subgame along the optimal game evolution a part of each original cooperative trajectory belongs to the subgame optimal bundle. We propose a refinement of the backwards induction procedure based on the players’ attitude vectors to find a unique subgame perfect equilibrium and use this algorithm to calculate a characteristic function. Finally, to ensure the sustainability of the cooperative agreement in a multistage game we employ the imputation distribution procedure (IDP) based approach, that is, we design an appropriate payment schedule to redistribute each player’s optimal payoff along the optimal bundle of cooperative trajectories. We extend the subgame consistency notion to extensive-form games with chance moves and prove that the incremental IDP satisfies subgame consistency, subgame efficiency and the balance condition. An example of a 3-person multistage game is provided to illustrate the proposed cooperation mechanism.

Keywords:

time consistency; multistage game; chance moves; subgame perfect equilibria; cooperative trajectory; imputation distribution procedure

1. Introduction

In a dynamic n-person game the players first choose their “optimal” strategies at the initial position $x_{0}$ (which form the optimal strategy profile for the whole game), and then have an option to change their strategies at any intermediate position $x_{t}$ and switch to other strategies if these strategies constitute the locally optimal strategy profile for the subgame starting at $x_{t}$. The time consistency property (first introduced in References [1,2,3] for differential games) ensures that the players will not have an incentive to change their strategies at any subgame along the optimal game evolution, and hence plays an important role in designing the optimal players’ behavior in non-cooperative and cooperative dynamic games (see, e.g., References [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21], for details).

We consider n-person finite multistage games in extensive form (see, e.g., References [5,17,22,23]) with perfect information and with chance moves. Note that much research has already been done on time consistent solutions (or close concepts) in extensive-form games (see, e.g., References [4,6,13,17,21]). The time consistency concept was extended to dynamic games played over event trees in References [14,16,20] as well as to multicriteria extensive-form cooperative games (without chance moves) in References [7,8,10,11,15]. The property of "time consistency in the whole game" was extended to multicriteria extensive-form cooperative games with chance moves in Reference [9] (note that in these games an optimal pure strategy profile does not generate a unique optimal trajectory in the game tree but rather a whole optimal bundle of trajectories).

In this paper, we mainly focus on the dynamic aspects of cooperation in a dynamic extensive-form game with chance moves, and propose a mechanism of the players’ sustainable cooperation which satisfies three properties. First, a fragment of each cooperative trajectory from the optimal bundle for the original game $Γ^{x_{0}}$ should “remain optimal” at each subgame $Γ^{x_{t}}$ along the cooperative game evolution, that is, it should belong to the subgame optimal bundle of cooperative trajectories. Second, the cooperative payoff-to-go at the subgame $Γ^{x_{t}}$ is no less than the non-cooperative payoff-to-go for all players. Finally, when the players re-evaluate their expected cooperative payoff after each passed chance move, they have no incentive to change the original cooperative agreement.

To this aim, we first need to provide a rule for choosing a unique cooperative strategy profile as well as the unique optimal bundle of cooperative trajectories. We introduce the Players’ Rank Based (PRB) algorithm and prove that this algorithm generates the unique optimal bundle of cooperative trajectories which satisfies time consistency. Note that a rather close approach, the so-called Refined Leximin (RL) algorithm, was introduced recently in Reference [8]. The two algorithms differ in the following respects. The RL algorithm is applicable to multicriteria games without chance moves and is based on the ranking of the criteria, while the PRB algorithm is designed for single-criterion extensive-form games with chance moves and employs the players’ ranks. Further, the RL algorithm selects a unique cooperative trajectory while the PRB algorithm generates the unique optimal bundle of the cooperative trajectories in the game tree. To the best of the authors’ knowledge, other approaches to choosing an optimal bundle of the cooperative trajectories in extensive-form games with chance moves have not been considered yet.

Then, to construct a characteristic function (which describes the worth of each coalition in a cooperative game) we use an equilibrium-based approach, namely the $γ$-characteristic function introduced in Reference [24]. Hence, the players have to accept a specific method for choosing a unique Subgame Perfect Equilibrium (SPE) [25] in an extensive-form game with chance moves. To solve this problem we provide a novel refinement of the backwards induction procedure (see, e.g., References [5,17,23]), the so-called Attitude SPE algorithm. A similar approach to constructing a unique SPE in an extensive-form game with perfect information was explored in References [17,26,27] and was called the Type Equilibrium (TE) algorithm. Both algorithms are refinements of the general backwards induction procedure that take into account the attitudes of each player towards other players. Let us point out the main differences between these algorithms. The TE algorithm is applicable to games without chance moves and to the case when the payoffs are only determined in terminal nodes. In addition, the TE algorithm constructs an SPE that is “unique” in the sense of payoffs (i.e., there may exist several optimal trajectories which generate the same equilibrium payoffs) while the Attitude SPE algorithm selects a unique SPE strategy profile as well as a unique bundle of trajectories. Another rather close approach to finding a unique SPE, the so-called Indifferent Equilibrium (IE) algorithm, was introduced in Reference [28]. Again, the IE algorithm is applicable only to games without chance moves and to the particular case when the payoffs are determined in terminal nodes. Moreover, the IE algorithm in general constructs an SPE in behavior strategies while the proposed Attitude SPE algorithm always generates an SPE in pure strategies.

It is worth noting that other approaches to analyzing an extensive-form game, apart from the backwards induction procedure and its refinements mentioned above, require the researcher first to obtain a strategic representation of the original extensive game and then to analyze this strategic (or normal-form) game (see, e.g., References [29,30,31]). For instance, the software tool “Game Theory Explorer” [29] builds the strategic-form representation and then applies the modified Lemke-Howson algorithm [32] to find all Nash equilibria. The majority of existing algorithms are developed to find Nash equilibria in mixed strategies for 2-person games and do not allow constructing an SPE in pure strategies. Moreover, as noted in Reference [31], in general the strategic-form representation is exponential in the size of the original game tree. In contrast, the proposed Attitude SPE algorithm is a rather simple recursive algorithm which deals with the n-person extensive-form game (with perfect information) itself and allows computing a unique SPE in pure strategies.

After computing the $γ$-characteristic function we suppose that the players adopt some single-valued cooperative solution $φ$ (for instance, the Shapley value [33], the nucleolus [34], etc.) which satisfies the individual and collective rationality property. Finally, to guarantee the sustainability of the achieved long-term cooperative agreement we employ the Imputation Distribution Procedure (IDP) based approach (see, e.g., References [3,12,14,16,17,18,20,35]), that is, a payment schedule to redistribute the ith player’s expected cooperative payoff along the optimal bundle of cooperative trajectories. In this paper, we mainly focus on the following good properties an IDP may satisfy: subgame efficiency, the strict balance condition [10,15,17] and an appropriate refinement of the time consistency property, called subgame consistency. The point is that the “time consistency in the whole game” property [9,14,16,20] is based on an a priori assessment of the ith player’s expected optimal payoff (before the game $Γ^{x_{0}}$ starts). However, when the players make a decision in the subgame $Γ^{x_{t}}$ after the chance move occurs, they need to re-estimate their expected optimal payoffs-to-go since the original optimal bundle of cooperative trajectories shrinks after each chance node. To deal with this feature of games with chance moves we adopt the notion of subgame consistency that was first proposed in Reference [36] for cooperative stochastic differential games and then extended to stochastic dynamic games in References [37,38].

Since we derive a suitable definition of subgame consistency for another class of games, the proposed Definition 6 differs from those provided in References [37,38] but captures the same idea. Let us point out the main differences with References [37,38]. While D. Yeung and L. Petrosyan do not consider the issue of multiple equilibria and study stochastic games in which there exists a unique Nash equilibrium in each subgame, we focus on the problem of how to select a unique (subgame perfect) Nash equilibrium in an extensive-form game with chance moves and derive the corresponding algorithm. Second, the characteristic function has not been constructed in References [37,38] and, hence, the players are restricted to using the simplest cooperative solutions (for instance, they may share equally the excess of the total expected cooperative payoff over the expected sum of individual non-cooperative payoffs), whereas we provide a method for calculating the $γ$-characteristic function. Hence, the players may use different solution concepts based on the characteristic function approach. Finally, it turns out that the incremental IDP specified for extensive-form games with chance moves in Reference [9] satisfies not only subgame consistency but also subgame efficiency and the strict balance condition.

Therefore, the suggested PRB algorithm, the Attitude SPE algorithm combined with the $γ$-characteristic function, and the incremental payment schedule for any single-valued cooperative solution (meeting individual and collective rationality) together constitute the required mechanism of the players’ sustainable cooperation that satisfies the three properties mentioned above for any extensive-form game with chance moves.

It is worth noting that extensive-form games, as well as dynamic games played over event trees, differential games and multistage games with discrete dynamics, are used to model various real-world situations where several decision makers (or players) with different objectives may cooperate (see, e.g., References [5,12,14,16,17,20,39,40,41,42,43,44]). Hence, the proposed approach to implementing a long-term cooperative agreement may have a number of possible applications.

The rest of the paper is organized as follows: Section 2 recalls the main ingredients of the class of games of interest. In Section 3, we specify the Attitude SPE algorithm that allows constructing a unique SPE in an extensive-form game with chance moves. In Section 4, we provide the PRB algorithm and prove that the optimal bundle of cooperative trajectories generated by this algorithm satisfies time consistency. Section 5 reveals a drawback of the IDP “time consistency in the whole game” property and presents a subgame consistency definition that is applicable to extensive-form games with chance moves. We prove that the incremental IDP satisfies a number of good properties and consider an example of a 3-person multistage game with chance moves to illustrate the incremental IDP implementation. Section 6 provides a brief review of the results and discussion.

2. Extensive-Form Game with Chance Moves

We consider a finite multistage game in extensive form following References [6,13,17,22,23]. First we need to define the basic notation and briefly recall some properties of extensive-form games that will be used in the sequel:

$N = \{1, \dots, n\}$ is the set of all players.
K is the game tree with the root $x_{0}$ and the set of all nodes P.
$S (x)$ is the set of all direct successors (descendants) of the node x, and $S^{- 1} (y)$ is the unique predecessor (parent) of the node $y \neq x_{0}$ such that $y \in S (S^{- 1} (y))$.
$P_{i}$ is the set of all decision nodes of the ith player (at these nodes the player i chooses the following node), $P_{i} \cap P_{j} = \emptyset$ for all $i, j \in N$, $i \neq j$.
$P_{n + 1} = \{z^{j}\}_{j = 1}^{m}$ denotes the set of all terminal nodes (final positions), $S (z^{j}) = \emptyset$ $\forall z^{j} \in P_{n + 1}$.
$P_{0}$ is the set of all nodes at which chance moves, where $π (y | x) > 0$ denotes the probability of transition from node $x \in P_{0}$ to node $y \in S (x)$. We suppose that for each $x \in P_{0}$ it holds that $S (x) \cap P_{0} = \emptyset$. Lastly, $\bigcup_{i = 0}^{n + 1} P_{i} = P$.
$ω = (x_{0}, \dots, x_{t - 1}, x_{t}, \dots, x_{T})$ is the trajectory (or the path) in the game tree, $x_{t - 1} = S^{- 1} (x_{t}), 1 \leq t \leq T$, $x_{T} = z^{j} \in P_{n + 1}$; the index t in $x_{t}$ denotes the ordinal number of this node within the trajectory $ω$ and can be interpreted as the "time index".
$h_{i} (x) = (h_{i / 1} (x), \dots, h_{i / r} (x))$ is the payoff of the ith player at the node $x \in P$. We assume that for all $i \in N$, $k = 1, \dots, r$, and $x \in P$ the payoffs are non-negative, that is, $h_{i / k} (x) \geq 0$.

In the following, we will use $G^{cm} (n)$ to denote the class of all finite multistage n-person games with chance moves in extensive form defined above, where $Γ^{x_{0}} \in G^{cm} (n)$ denotes a game with root $x_{0}$. Note that $Γ^{x_{0}}$ is an extensive-form game with perfect information (see, e.g., References [17,22,23] for details).

Since all the solutions we are interested in throughout the paper are attainable when the players restrict themselves to the class of pure strategies, we will focus on this class of strategies. The pure strategy $u_{i} (\cdot)$ of the ith player is a function with domain $P_{i}$ that specifies for each node $x \in P_{i}$ the next node $u_{i} (x) \in S (x)$ which the player i has to choose at x. Let $U_{i}$ denote the (finite) set of all ith player’s pure strategies, $U = \prod_{i \in N} U_{i}$.

Denote by $p (y | x, u)$ the conditional probability that node $y \in S (x)$ is reached if node x has been already reached (the probability of transition from x to y) while the players use the strategies $u_{i}, i \in N$. Note that for all $x \in P_{i}$, $i = 1, \dots, n$, and for all $y \in S (x)$, $p (y | x, u) = 1$ if $u_{i} (x) = y$ and $p (y | x, u) = 0$ if $u_{i} (x) \neq y$. For chance moves, that is, if $x \in P_{0}$, $p (y | x, u) = π (y | x)$ for all $y \in S (x)$ and for each $u \in U$.

Then one can calculate the probability $p (ω, u)$ of realization of the trajectory $ω = (x_{0}, \dots, x_{τ}, x_{τ + 1}, \dots, x_{T})$, $x_{T} \in P_{n + 1}$, $x_{τ + 1} \in S (x_{τ})$, $τ = 0, \dots, T - 1$, when the players use the strategies $u_{i}$ from the strategy profile $u = (u_{1}, \dots, u_{n})$:

$p (ω, u) = p (x_{1} | x_{0}, u) \cdot p (x_{2} | x_{1}, u) \cdot \dots \cdot p (x_{T} | x_{T - 1}, u) = \prod_{τ = 0}^{T - 1} p (x_{τ + 1} | x_{τ}, u) .$

(1)

Denote by $Ω (u) = \{ω_{k} (u) \mid p (ω_{k}, u) > 0\}$ the finite set (or the bundle) of the trajectories $ω_{k}$ which are generated by the strategy profile $u \in U$. Note that for all $ω_{k} (u) \in Ω (u)$, $u_{j} (x_{τ}) = x_{τ + 1}$ for all $x_{τ} \in ω_{k} (u) \cap P_{j}$, $j \in N$, $0 \leq τ \leq T - 1$.

Let ${\tilde{h}}_{i} (ω) = \sum_{τ = 0}^{T} h_{i} (x_{τ})$ denote the ith player’s vector payoff corresponding to the trajectory $ω = (x_{0}, \dots, x_{t}, x_{t + 1}, \dots, x_{T})$.

Denote by

$H_{i} (u) = \sum_{ω_{k} \in Ω (u)} p (ω_{k}, u) \cdot {\tilde{h}}_{i} (ω_{k}) = \sum_{ω_{k} \in Ω (u)} p (ω_{k}, u) \cdot \sum_{τ = 0}^{T (k)} h_{i} (x_{τ})$

(2)

the (expected) value of the ith player’s payoff function which corresponds to the strategy profile $u = (u_{1}, \dots, u_{n})$. Let $Ω_{n + 1} (u) = \{Ω (u) \cap P_{n + 1}\}$ denote the set of all terminal nodes of the trajectories $ω_{k} (u) \in Ω (u)$.
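Formulas (1) and (2) can be illustrated with a minimal Python sketch. The toy tree, the node names, the chance probabilities and the scalar (single-criterion) payoffs below are hypothetical and chosen only for illustration; a strategy profile is represented, as an assumption of this sketch, by a dictionary mapping each decision node to the chosen successor.

```python
from fractions import Fraction

# Hypothetical toy tree (not from the paper): node -> (owner, successors).
# owner None marks a terminal node, 0 marks a chance node, 1..n are players.
TREE = {
    "x0": (1, ["a", "b"]),
    "a": (0, ["z1", "z2"]),
    "b": (None, []),
    "z1": (None, []),
    "z2": (None, []),
}
# pi(y | x) for the only chance node "a".
CHANCE = {("a", "z1"): Fraction(1, 4), ("a", "z2"): Fraction(3, 4)}
# Scalar payoffs (h_1(x), h_2(x)) of players 1 and 2 at every node.
PAYOFF = {"x0": (0, 0), "a": (1, 0), "b": (2, 2), "z1": (4, 0), "z2": (0, 4)}


def bundle(root, profile):
    """Return [(trajectory, probability)] for all trajectories with p > 0."""
    owner, succ = TREE[root]
    if owner is None:
        return [((root,), Fraction(1))]
    if owner == 0:
        moves = [(y, CHANCE[(root, y)]) for y in succ]  # p(y|x,u) = pi(y|x)
    else:
        moves = [(profile[root], Fraction(1))]          # p(y|x,u) = 1 iff u_i(x) = y
    result = []
    for y, p in moves:
        for path, q in bundle(y, profile):
            result.append(((root,) + path, p * q))      # product form of (1)
    return result


def expected_payoffs(root, profile, n_players=2):
    """Expected payoffs H_i(u) as in (2): probability-weighted path sums."""
    totals = [Fraction(0)] * n_players
    for path, p in bundle(root, profile):
        for i in range(n_players):
            totals[i] += p * sum(PAYOFF[x][i] for x in path)
    return tuple(totals)


print(expected_payoffs("x0", {"x0": "a"}))
print(expected_payoffs("x0", {"x0": "b"}))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that ties between expected payoffs, which matter for the tie-breaking rules discussed later, are not blurred by floating-point rounding.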

Remark 1 ([9]). If the pure strategy profiles u and v generate different bundles $Ω (u)$ and $Ω (v)$ of the trajectories, that is, $Ω (u) \neq Ω (v)$, then $Ω_{n + 1} (u) \cap Ω_{n + 1} (v) = \emptyset$.

According to References [17,22,23] each intermediate node $x_{t} \in P \setminus P_{n + 1}$ generates a subgame $Γ^{x_{t}}$ with the subgame tree $K^{x_{t}}$ and the subgame root $x_{t}$ as well as a factor-game $Γ^{D}$ with the factor-game tree $K^{D} = (K \setminus K^{x_{t}}) \cup \{x_{t}\}$. Decomposition of the original extensive game $Γ^{x_{0}}$ at node $x_{t}$ into the subgame $Γ^{x_{t}}$ and the factor-game $Γ^{D}$ generates the corresponding decomposition of the pure (and mixed) strategies (see References [17,22] for details).

Let $P_{i}^{x_{t}}$ ($P_{i}^{D}$), $i \in N$, denote the restriction of $P_{i}$ on the subgame tree $K^{x_{t}}$ ($K^{D}$), and $u_{i}^{x_{t}}$ ($u_{i}^{D}$), $i \in N$, denote the restriction of the ith player’s pure strategy $u_{i} (\cdot)$ in $Γ^{x_{0}}$ on $P_{i}^{x_{t}}$ ($P_{i}^{D}$). The pure strategy profile $u^{x_{t}} = (u_{1}^{x_{t}}, \dots, u_{n}^{x_{t}})$ generates the bundle of the subgame trajectories $Ω^{x_{t}} (u^{x_{t}}) = \{ω_{k}^{x_{t}} (u^{x_{t}}) \mid p (ω_{k}^{x_{t}}, u^{x_{t}}) > 0\}$.

Similarly to (2), let us denote by

$H_{i}^{x_{t}} (u^{x_{t}}) = \sum_{ω_{k}^{x_{t}} \in Ω^{x_{t}} (u^{x_{t}})} p (ω_{k}^{x_{t}}, u^{x_{t}}) \cdot \sum_{τ = t}^{T (k)} h_{i} (x_{τ}) = \sum_{ω_{k}^{x_{t}} \in Ω^{x_{t}} (u^{x_{t}})} p (ω_{k}^{x_{t}}, u^{x_{t}}) \cdot {\tilde{h}}_{i} (ω_{k}^{x_{t}})$

(3)

the expected value of the ith player’s payoff in $Γ^{x_{t}}$, and by $U_{i}^{x_{t}}$ the set of all possible ith player’s pure strategies in the subgame $Γ^{x_{t}}$, $U^{x_{t}} = \prod_{i \in N} U_{i}^{x_{t}}$. Note that for each trajectory $ω = (x_{0}, \dots, x_{t}, x_{t + 1}, \dots, x_{T})$, $1 \leq t \leq T - 1$, $x_{T} \in P_{n + 1}$,

$\begin{aligned} p (ω, u) &= \prod_{τ = 0}^{t - 1} p (x_{τ + 1} | x_{τ}, u) \cdot \prod_{τ = t}^{T - 1} p (x_{τ + 1} | x_{τ}, u) \\ &= p ({\underset{̲}{ω}}^{x_{t}}, u) \cdot p (ω^{x_{t}}, u) = p ({\underset{̲}{ω}}^{x_{t}}, u^{D}) \cdot p (ω^{x_{t}}, u^{x_{t}}), \end{aligned}$

(4)

where ${\underset{̲}{ω}}^{x_{t}} = (x_{0}, x_{1}, \dots, x_{t - 1}, x_{t})$ denotes the fragment of the trajectory $ω$ implemented before the subgame $Γ^{x_{t}}$ starts, and $p ({\underset{̲}{ω}}^{x_{t}}, u) = p (x_{t}, u)$ denotes the probability that node $x_{t}$ is reached when the players employ the strategies $u_{i}$, $i \in N$. It is worth noting that the factor-game $Γ^{D} = Γ^{D} (u^{x_{t}})$ is usually defined for a given strategy profile $u^{x_{t}}$ in the subgame $Γ^{x_{t}}$ since we assume that

$h_{i}^{D} (x_{0}, x_{1}, \dots, x_{t - 1}, x_{t}) = \sum_{τ = 0}^{t - 1} h_{i} (x_{τ}) + H_{i}^{x_{t}} (u^{x_{t}}) = {\tilde{h}}_{i} ({\underset{̲}{ω}}^{x_{t}} \setminus \{x_{t}\}) + H_{i}^{x_{t}} (u^{x_{t}})$

(5)

(see, e.g., References [17,22] for details). Moreover, given an intermediate node $x_{t}$, the bundle $Ω (u) = \{ω_{k} (u) \mid p (ω_{k}, u) > 0\}$ can be divided into two subsets, that is,

$Ω (u) = \{Ψ_{m}\} \cup \{χ_{l}\},$

where $x_{t} \in Ψ_{m}$, $x_{t} \notin χ_{l}$, and $\{Ψ_{m}\} \cap \{χ_{l}\} = \emptyset$. Then, taking (1), (3), (4) and (5) into account, we get

$\begin{aligned} H_{i} (u) &= \sum_{m} p (Ψ_{m}, u) \cdot {\tilde{h}}_{i} (Ψ_{m}) + \sum_{l} p (χ_{l}, u) \cdot {\tilde{h}}_{i} (χ_{l}) \\ &= \sum_{m} p (x_{t}, u) \cdot p (Ψ_{m}^{x_{t}}, u^{x_{t}}) \cdot [{\tilde{h}}_{i} ({\underset{̲}{Ψ}}_{m}^{x_{t}} \setminus \{x_{t}\}) + {\tilde{h}}_{i} (Ψ_{m}^{x_{t}})] + \sum_{l} p (χ_{l}, u) \cdot {\tilde{h}}_{i} (χ_{l}) \\ &= p (x_{t}, u^{D}) \cdot {\tilde{h}}_{i} (x_{0}, \dots, x_{t - 1}) \cdot \sum_{m} p (Ψ_{m}^{x_{t}}, u^{x_{t}}) + p (x_{t}, u^{D}) \cdot \sum_{m} p (Ψ_{m}^{x_{t}}, u^{x_{t}}) \cdot {\tilde{h}}_{i} (Ψ_{m}^{x_{t}}) + \sum_{l} p (χ_{l}, u) \cdot {\tilde{h}}_{i} (χ_{l}) \\ &= p (x_{t}, u^{D}) \cdot {\tilde{h}}_{i} (x_{0}, \dots, x_{t - 1}) + p (x_{t}, u^{D}) \cdot H_{i}^{x_{t}} (u^{x_{t}}) + \sum_{l} p (χ_{l}, u) \cdot {\tilde{h}}_{i} (χ_{l}) \\ &= p (x_{t}, u^{D}) \cdot h_{i}^{D} (x_{0}, \dots, x_{t}) + \sum_{l} p (χ_{l}, u) \cdot {\tilde{h}}_{i} (χ_{l}) . \end{aligned}$

(6)

Note that, since $P_{i} = P_{i}^{x_{t}} \cup P_{i}^{D}$, one can compose the ith player’s pure strategy $W_{i} = (u_{i}^{D}, v_{i}^{x_{t}}) \in U_{i}$ in the original game $Γ^{x_{0}}$ from her strategies $v_{i}^{x_{t}} \in U_{i}^{x_{t}}$ in the subgame $Γ^{x_{t}}$ and $u_{i}^{D} \in U_{i}^{D}$ in the factor-game $Γ^{D}$ [17,22].
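The restriction and composition of pure strategies described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the tree and node names are hypothetical, a pure strategy is stored as a dictionary from decision nodes to chosen successors, and all three decision nodes are assumed to belong to the same player.

```python
def subtree_nodes(tree, root):
    """Return all nodes of the subgame tree rooted at `root` (root included)."""
    nodes, stack = set(), [root]
    while stack:
        x = stack.pop()
        nodes.add(x)
        stack.extend(tree[x])
    return nodes


def restrict(strategy, nodes):
    """Restrict a pure strategy (decision node -> chosen successor) to `nodes`."""
    return {x: y for x, y in strategy.items() if x in nodes}


def compose(u_factor, v_sub):
    """Glue a factor-game strategy and a subgame strategy into one strategy."""
    overlap = set(u_factor) & set(v_sub)
    if overlap:
        raise ValueError(f"domains must be disjoint, got {sorted(overlap)}")
    return {**u_factor, **v_sub}


# Hypothetical tree: node -> list of direct successors.
TREE = {"x0": ["x1", "x2"], "x1": ["z1", "z2"], "x2": ["z3", "z4"],
        "z1": [], "z2": [], "z3": [], "z4": []}
u = {"x0": "x1", "x1": "z1", "x2": "z3"}  # strategy in the whole game
sub = subtree_nodes(TREE, "x1")           # nodes of the subgame tree at x1
u_D = restrict(u, set(TREE) - sub)        # decision nodes outside the subgame
v_sub = {"x1": "z2"}                      # a different strategy in the subgame
w = compose(u_D, v_sub)                   # composed strategy in the whole game
print(w)
```

The node $x_{t}$ itself is a terminal position of the factor-game, so the sketch keeps only decision nodes outside the subgame tree in the factor-game part; the two domains are then disjoint and their union covers all decision nodes of the player.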

3. Refined Backwards Induction Procedure to Construct a Unique SPE

Definition 1 ([45]). A strategy profile $u = (u_{1}, u_{2}, \dots, u_{n})$ is a Nash Equilibrium (NE) in $Γ^{x_{0}} \in G^{cm} (n)$, if

$H_{i} (v_{i}, u_{- i}) \leq H_{i} (u_{i}, u_{- i}), \forall v_{i} \in U_{i}, \forall i \in N .$

Let $NE (Γ^{x_{0}})$ denote the set of all pure strategy Nash equilibria in $Γ^{x_{0}}$.

Definition 2 ([25]). A strategy profile u is a subgame perfect (Nash) equilibrium (SPE) in $Γ^{x_{0}} \in G^{cm} (n)$, if $\forall x \in P \setminus P_{n + 1}$ it holds that $u^{x} \in NE (Γ^{x})$, i.e., the restriction of u on each subgame $Γ^{x}$ forms an NE in this subgame.

To construct an SPE in an extensive-form game with perfect information one may employ the so-called backwards induction procedure (see, e.g., References [12,17,22,23,46,47]).

However, the backwards induction procedure may generate multiple subgame perfect equilibria in an extensive-form game with different payoffs to the players (see, e.g., References [5,12,17,23]). To choose a unique SPE and the unique corresponding bundle of trajectories we use an approach based on the players’ attitude vectors. Namely, let the ith player’s attitude vector $F_{i} = \{f_{i} (1), \dots, f_{i} (n)\}$ be a permutation of the numbers $\{1, \dots, n\}$ meeting the condition $f_{i} (i) = 1$. If $f_{i} (j) = k$ one may interpret the player j as an "ith player’s associate of level k".

In the paper we will use these attitude vectors when constructing an SPE via the backwards induction procedure in the following way. Let $x \in P_{i}$, and let $H_{i}^{y} ({\underset{̲}{u}}^{y})$ denote the ith player’s expected payoff in the subgame $Γ^{y}$, $y \in S (x)$, where ${\underset{̲}{u}}^{y}$ is an SPE in this subgame. Assume that there exist multiple nodes $y_{1}, \dots, y_{q}$ such that $h_{i} (y_{1}) + H_{i}^{y_{1}} ({\underset{̲}{u}}^{y_{1}}) = \dots = h_{i} (y_{q}) + H_{i}^{y_{q}} ({\underset{̲}{u}}^{y_{q}})$, that is, player i is indifferent to the choice of a particular node $\bar{y}$ from $\{y_{1}, \dots, y_{q}\}$ while the ith player’s choice may affect the other players’ payoffs. If $f_{i} (j) = 2$, suppose that the ith player aims to maximize first the jth player’s expected payoff $H_{j}^{y} ({\underset{̲}{u}}^{y})$ when choosing a unique node y from $y_{1}, \dots, y_{q}$. If again there are several nodes y with the same value $H_{j}^{y} ({\underset{̲}{u}}^{y})$, the ith player aims to maximize next the expected payoff $H_{l}^{y} ({\underset{̲}{u}}^{y})$ of the player l such that $f_{i} (l) = 3$, and so on. Note that a similar approach to constructing a unique SPE in an extensive-form game with perfect information but without chance moves was explored in References [17,26,27] for the case when the payoffs are only determined in terminal nodes.
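The tie-breaking rule just described amounts to a lexicographic comparison of payoff vectors in the order given by the attitude vector. A minimal Python sketch follows; the function names, the candidate nodes and their payoff vectors are hypothetical, and players are indexed from 0 inside the tuples purely for convenience.

```python
def attitude_key(attitude, payoffs):
    """Reorder a payoff vector by attitude level: own payoff first, then level 2, ...

    attitude[j] is f_i(j + 1), a permutation of 1..n with the owner at level 1.
    """
    order = sorted(range(len(attitude)), key=lambda j: attitude[j])
    return tuple(payoffs[j] for j in order)


def choose(attitude, candidates):
    """Pick the node whose payoff vector is lexicographically largest.

    candidates is a list of (node, payoff_vector). A complete tie is resolved
    here by list order, which is a simplification of this sketch.
    """
    return max(candidates, key=lambda c: attitude_key(attitude, c[1]))[0]


# Player 1 decides; the three payoff vectors are illustrative assumptions.
candidates = [("y1", (5, 4, 1)), ("y2", (5, 2, 3)), ("y3", (4, 9, 9))]
print(choose((1, 3, 2), candidates))  # player 3 is the level-2 associate
print(choose((1, 2, 3), candidates))  # player 2 is the level-2 associate
```

With attitude vector (1, 3, 2) player 1 is indifferent between y1 and y2 and resolves the tie in favor of player 3, so y2 is chosen; with (1, 2, 3) the tie is resolved in favor of player 2, so y1 is chosen.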

Now let us provide a rigorous specification of this backwards induction procedure refinement which we will refer to as the Attitude SPE or A-SPE algorithm.

Attitude SPE algorithm. Suppose that the players’ attitude vectors $F_{1}, F_{2}, \dots, F_{n}$ are common knowledge, i.e., each player knows these vectors, and all the players are aware of it. Let the length of the trajectory $ω = (x_{0}, \dots, x_{t}, x_{t + 1}, \dots, x_{T})$ be equal to $T$ (so that a subgame whose root is a terminal node has length 0), and let the length of the multistage game $Γ^{x_{0}}$ be equal to the maximal length of a trajectory $ω$ in $Γ^{x_{0}}$. We construct the unique subgame perfect equilibrium $\underset{̲}{u} = ({\underset{̲}{u}}_{1}, \dots, {\underset{̲}{u}}_{n})$ in $Γ^{x_{0}}$ by induction on the length L of the subgame $Γ^{x}$.

Step $L = 1$: Consider a subgame $Γ^{x}$ of the length $L = 1$. If $x \in P_{i}$, $i = 1, \dots, n$, we have two cases.

Case 1: there exists a unique $z^{k} \in S (x) = P_{n + 1}^{x}$ such that $h_{i} (z^{k}) = \max_{z \in S (x)} h_{i} (z)$. Then suppose that ${\underset{̲}{u}}_{i} (x) = z^{k}$, $p (z^{k} | x, \underset{̲}{u}) = 1$, $p (z | x, \underset{̲}{u}) = 0$ $\forall z \in S (x) \setminus \{z^{k}\}$.
Case 2: there exist $q > 1$ nodes $z^{k_{q}} \in S (x) = P_{n + 1}^{x}$ such that $h_{i} (z^{k_{1}}) = h_{i} (z^{k_{2}}) = \dots = h_{i} (z^{k_{q}}) = \max_{z \in S (x)} h_{i} (z)$. Then suppose that the ith player chooses the terminal position $z^{k} \in \{z^{k_{1}}, \dots, z^{k_{q}}\} = S^{i, 1} (x)$ such that

$h_{j} (z^{k}) = \max_{z \in S^{i, 1} (x)} h_{j} (z), \text{ where } f_{i} (j) = 2 .$

(7)

Let $S^{i, 2} (x)$ denote the set of all nodes $z^{k} \in S^{i, 1} (x)$ meeting (7). If $S^{i, 2} (x)$ consists of a unique node $z^{k}$ then ${\underset{̲}{u}}_{i} (x) = z^{k}$, $p (z^{k} | x, \underset{̲}{u}) = 1$, $p (z | x, \underset{̲}{u}) = 0$ $\forall z \in S (x) \setminus \{z^{k}\}$. Otherwise, suppose that the ith player chooses the terminal node $z^{k} \in S^{i, 2} (x)$ such that

$h_{l} (z^{k}) = \max_{z \in S^{i, 2} (x)} h_{l} (z), \text{ where } f_{i} (l) = 3 .$

(8)

Let $S^{i, 3} (x)$ denote the set of all final nodes $z^{k} \in S^{i, 2} (x)$ satisfying (8), and so on.
⋮
Finally, if $S^{i, n} (x)$ contains a unique node $z^{k}$, then ${\underset{̲}{u}}_{i} (x) = z^{k}$, $p (z^{k} | x, \underset{̲}{u}) = 1$, $p (z | x, \underset{̲}{u}) = 0$ $\forall z \in S (x) \setminus \{z^{k}\}$. Otherwise, suppose that player i chooses the final node $z^{k}$ from $S^{i, n} (x)$ with the minimal ordinal number k.

Note that for all cases $H_{j} (\underset{̲}{u}) = h_{j} (z^{k})$, $j \in N$.

If $x \in P_{0}$ then $S (x) = P_{n + 1}^{x}$ and we do not need to define a strategy of any player at x, while

$H_{j} (\underset{̲}{u}) = \sum_{z^{k} \in S (x)} π (z^{k} | x) \cdot h_{j} (z^{k}) .$

Hence, the players’ behavior ${\underset{̲}{u}}^{x} = ({\underset{̲}{u}}_{1}^{x}, \dots, {\underset{̲}{u}}_{n}^{x}) \in NE (Γ^{x})$ and the expected payoffs $H_{j}^{x} ({\underset{̲}{u}}^{x}), j \in N$, are defined for all subgames $Γ^{x}$ of the length 1. In addition, for games $Γ^{y}$, $y \in P_{n + 1}$, of length $L = 0$ we assume that $H_{i}^{y} ({\underset{̲}{u}}^{y}) = h_{i} (y)$, $i \in N$.

Step $2, \dots, L - 1$: Suppose that at each subgame $Γ^{y}$ of the length $(L - 1)$ or less the unique SPE ${\underset{̲}{u}}^{y} = ({\underset{̲}{u}}_{1}^{y}, \dots, {\underset{̲}{u}}_{n}^{y})$ has been already constructed (“inductive assumption”), and $H_{i}^{y} ({\underset{̲}{u}}^{y})$, $i \in N$, is the corresponding vector of all the players’ payoffs.

Step $L$: Consider the game $Γ^{x_{0}}$ of the length $L \geq 1$. Note that for all $y \in S (x_{0})$ the length of the subgame $Γ^{y}$ is less than L. If $x_{0} \in P_{0}$ then

$H_{j} (\underset{̲}{u}) = \sum_{y \in S (x_{0})} π (y | x_{0}) \cdot (h_{j} (y) + H_{j}^{y} ({\underset{̲}{u}}^{y})) \geq \sum_{y \in S (x_{0})} π (y | x_{0}) \cdot (h_{j} (y) + H_{j}^{y} (u_{j}^{y}, {\underset{̲}{u}}_{- j}^{y})) = H_{j} (u_{j}^{y}, {\underset{̲}{u}}_{- j}^{y})$

(9)

for all $u_{j} = u_{j}^{y} \in U_{j}^{y} = U_{j}$ since ${\underset{̲}{u}}^{y} \in NE (Γ^{y})$ due to the induction assumption, and each player $j \in N$ can deviate from ${\underset{̲}{u}}_{j}$ only in the subgames $Γ^{y}$, $y \in S (x_{0})$.

If $x_{0} \in P_{i}$ for some $i \in N$, we have two cases.

Case 1: there exists a unique $\bar{y} \in S (x_{0})$ such that

$h_{i} (\bar{y}) + H_{i}^{\bar{y}} ({\underset{̲}{u}}^{\bar{y}}) = \max_{y \in S (x_{0})} (h_{i} (y) + H_{i}^{y} ({\underset{̲}{u}}^{y})) .$

(10)

Then we suppose that ${\underset{̲}{u}}_{i} (x_{0}) = \bar{y}$; ${\underset{̲}{u}}_{j} (x) = {\underset{̲}{u}}_{j}^{y} (x)$ if $x \in P_{j} \cap K^{y}, y \in S (x_{0})$, $j = 1, \dots, n$.
Case 2: there exist $q > 1$ nodes ${\bar{y}}_{1}, \dots, {\bar{y}}_{q} \in S (x_{0})$ such that

$h_{i} ({\bar{y}}_{1}) + H_{i}^{{\bar{y}}_{1}} ({\underset{̲}{u}}^{{\bar{y}}_{1}}) = \dots = h_{i} ({\bar{y}}_{q}) + H_{i}^{{\bar{y}}_{q}} ({\underset{̲}{u}}^{{\bar{y}}_{q}}) = \max_{y \in S (x_{0})} (h_{i} (y) + H_{i}^{y} ({\underset{̲}{u}}^{y})) .$

(11)

Then we suppose that the ith player chooses $\bar{y} \in \{{\bar{y}}_{1}, \dots, {\bar{y}}_{q}\} = S^{i, 1} (x_{0})$ such that

$h_{j} (\bar{y}) + H_{j}^{\bar{y}} ({\underset{̲}{u}}^{\bar{y}}) = \max_{y \in S^{i, 1} (x_{0})} (h_{j} (y) + H_{j}^{y} ({\underset{̲}{u}}^{y})), \text{ where } f_{i} (j) = 2 .$

(12)

Let $S^{i, 2} (x_{0})$ denote the set of all nodes $\bar{y} \in S^{i, 1} (x_{0})$ satisfying (12). If $S^{i, 2} (x_{0})$ consists of a unique node $\bar{y}$ then we suppose that ${\underset{̲}{u}}_{i} (x_{0}) = \bar{y}$; ${\underset{̲}{u}}_{j} (x) = {\underset{̲}{u}}_{j}^{y} (x)$ if $x \in P_{j} \cap K^{y}$, $y \in S (x_{0})$, $j = 1, \dots, n$. Otherwise, suppose that the ith player chooses the node $\bar{y} \in S^{i, 2} (x_{0})$ such that

$h_{l} (\bar{y}) + H_{l}^{\bar{y}} ({\underset{̲}{u}}^{\bar{y}}) = \max_{y \in S^{i, 2} (x_{0})} (h_{l} (y) + H_{l}^{y} ({\underset{̲}{u}}^{y})), \text{ where } f_{i} (l) = 3 .$

(13)

Let $S^{i, 3} (x_{0})$ denote the set of all nodes $\bar{y} \in S^{i, 2} (x_{0})$ meeting (13), and so on.

⋮

Finally, if $S^{i, n} (x_{0})$ contains several nodes ${\bar{y}}_{m}$, denote by

$\underset{̲}{l} = \min_{{\bar{y}}_{m} \in S^{i, n} (x_{0})} \{l \mid z^{l} \in P_{n + 1}^{{\bar{y}}_{m}} \cap Ω ({\underset{̲}{u}}^{{\bar{y}}_{m}})\}$

the minimal number of the terminal nodes of the trajectories generated by the subgame perfect equilibria ${\underset{̲}{u}}^{{\bar{y}}_{m}}$ in the subgames $Γ^{{\bar{y}}_{m}}$, ${\bar{y}}_{m} \in S^{i, n} (x_{0})$ (see Remark 1). Note that there exists a unique trajectory $ω = (x_{0}, \dots, z^{\underset{̲}{l}})$ from $x_{0}$ to $z^{\underset{̲}{l}}$ in the game $Γ^{x_{0}}$, and let $\bar{y} = ω \cap S^{i, n} (x_{0})$. Again, we suppose that ${\underset{̲}{u}}_{i} (x_{0}) = \bar{y}$; ${\underset{̲}{u}}_{j} (x) = {\underset{̲}{u}}_{j}^{y} (x)$ if $x \in P_{j} \cap K^{y}$, $y \in S (x_{0})$, $j = 1, \dots, n$.

Now we prove that in both cases no player has a profitable deviation in $Γ^{x_{0}}$ from the strategy profile $\underset{̲}{u} = ({\underset{̲}{u}}_{1}, \dots, {\underset{̲}{u}}_{n})$ constructed above. For player $i$, who moves at $x_{0}$, we have

$H_{i} (\underset{̲}{u}) = h_{i} (\bar{y}) + H_{i}^{\bar{y}} ({\underset{̲}{u}}^{\bar{y}}) ⩾ h_{i} (y) + H_{i}^{y} ({\underset{̲}{u}}^{y}) ⩾ h_{i} (y) + H_{i}^{y} (u_{i}^{y}, {\underset{̲}{u}}_{- i}^{y})$

(14)

for all $y \in S (x_{0})$, $u_{i}^{y} \in U_{i}^{y}$, due to (10), (11) and the induction assumption that ${\underset{̲}{u}}^{y} \in NE (Γ^{y})$, $y \in S (x_{0})$.

For the other players $j \in N$, $j \neq i$, we have

$H_{j} (\underset{̲}{u}) = h_{j} (\bar{y}) + H_{j}^{\bar{y}} ({\underset{̲}{u}}^{\bar{y}}) ⩾ h_{j} (\bar{y}) + H_{j}^{\bar{y}} (u_{j}^{\bar{y}}, {\underset{̲}{u}}_{- j}^{\bar{y}}) = H_{j} (u_{j}, {\underset{̲}{u}}_{- j})$

(15)

for all $u_{j} \in U_{j}$, since ${\underset{̲}{u}}^{\bar{y}} \in NE (Γ^{\bar{y}})$ and only a deviation of player $j \in N$, $j \neq i$, from ${\underset{̲}{u}}_{j}$ within the subgame $Γ^{\bar{y}}$ may affect the players’ payoffs.

Hence, taking (9), (14) and (15) into account, we obtain by induction that the strategy profile $\underset{̲}{u} = ({\underset{̲}{u}}_{1}, \dots, {\underset{̲}{u}}_{n})$ constructed above forms a unique subgame perfect equilibrium in $Γ^{x_{0}}$.

Proposition 1.

If the players’ attitude vectors $F_{1}, F_{2}, \dots, F_{n}$ are common knowledge, the Attitude SPE algorithm constructs a unique subgame perfect equilibrium $\underset{̲}{u} = ({\underset{̲}{u}}_{1}, \dots, {\underset{̲}{u}}_{n})$ in pure strategies for any extensive-form game $Γ^{x_{0}} \in G^{cm} (n)$ with chance moves, as well as a unique bundle of trajectories $Ω (\underset{̲}{u})$.

It is worth noting that the existence of a (subgame perfect) pure strategy equilibrium in extensive-form games with perfect information and chance moves was first proved in References [46,47] for the particular case when the payoffs are only defined at terminal nodes. Hence, Proposition 1 could be considered a corollary of these results. However, we provide a rigorous algorithm for constructing a unique $SPE$ in an extensive-form game with chance moves as well as the (unique) corresponding bundle of trajectories. We will use this algorithm, in particular, to calculate the characteristic function of the cooperative extensive-form game in Section 4.

Let us use the following example to demonstrate how the Attitude SPE algorithm works.

Example 1.

(A 3-player multistage game with chance moves).

Let $P_{0} = \{{\bar{x}}_{1}, {\bar{x}}_{3}\}$, $P_{1} = \{{\bar{x}}_{0}, {\bar{x}}_{4}^{2}\}$, $P_{2} = \{{\bar{x}}_{2}^{1}, {\bar{x}}_{5}\}$, $P_{3} = \{x_{2}^{2}, x_{4}^{3}\}$, $P_{n + 1} = \{z_{1}, \dots, z_{10}\}$. The players’ payoffs and the probabilities $π (y | x)$, $x \in P_{0}$, are written in the game tree.

Suppose that the players’ attitude vectors are $F_{1} = (f_{1} (1), f_{1} (2), f_{1} (3)) = (1, 3, 2)$, $F_{2} = (2, 3, 1)$ and $F_{3} = (3, 1, 2)$.

When using the Attitude SPE algorithm, at each node $x \in P_{i}$, $i = 1, 2, 3$, the ith player has to choose the alternative marked in bold violet in Figure 1. Note that

$H^{x_{3}} ({\underset{̲}{u}}^{x_{3}}) = \begin{pmatrix} 12 \\ 0 \\ 0 \end{pmatrix} + \frac{1}{6} \begin{pmatrix} 0 \\ 24 \\ 0 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} 24 \\ 0 \\ 24 \end{pmatrix} + \frac{1}{3} \begin{pmatrix} 0 \\ 18 \\ 12 \end{pmatrix} = \begin{pmatrix} 24 \\ 10 \\ 16 \end{pmatrix} \text{ and } H^{z_{2}} = h (z_{2}) = \begin{pmatrix} 0 \\ 10 \\ 20 \end{pmatrix} .$

Hence, $S^{2, 1} (x_{2}^{1}) = \{z_{2}, x_{3}\}$, and ${\underset{̲}{u}}_{2} (x_{2}^{1}) = z_{2}$ due to player 2’s attitude vector $F_{2}$.
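The computation above can be replayed numerically. The following Python sketch is only an illustration: the helper names `expected_payoff` and `attitude_choice` are hypothetical, and it assumes that the attitude vector $F_{2} = (2, 3, 1)$ is read as the order in which player 2 consults the players' payoffs (her own first), which is the reading consistent with the choice of $z_{2}$ in the example.

```python
from fractions import Fraction as F

# Payoff collected at the chance node x_3 and the continuation payoffs of its
# three branches, with the probabilities quoted in Example 1.
h_x3 = (12, 0, 0)
branches = [
    (F(1, 6), (0, 24, 0)),
    (F(1, 2), (24, 0, 24)),
    (F(1, 3), (0, 18, 12)),
]


def expected_payoff(own, lottery):
    """Node payoff plus the probability-weighted continuation payoffs."""
    return tuple(
        own[i] + sum(p * v[i] for p, v in lottery) for i in range(len(own))
    )


def attitude_choice(candidates, order):
    """Lexicographic tie-breaking among successors (hypothetical helper).

    candidates: dict successor -> payoff vector h(y) + H^y
    order: players (1-based) in the assumed priority order of the mover,
           the mover herself first.
    Returns the successors that survive all refinement rounds.
    """
    alive = list(candidates)
    for player in order:
        best = max(candidates[y][player - 1] for y in alive)
        alive = [y for y in alive if candidates[y][player - 1] == best]
        if len(alive) == 1:
            break
    return alive


H_x3 = expected_payoff(h_x3, branches)
candidates = {"z2": (0, 10, 20), "x3": H_x3}
choice = attitude_choice(candidates, order=(2, 3, 1))
print(H_x3, choice)
```

Player 2 is indifferent between the two successors (10 in both cases), so the choice is settled by the next player in her priority order; if several successors survived all rounds, the algorithm's final rule (minimal terminal node number) would still have to be applied.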

The A-SPE algorithm generates a unique SPE $\underset{̲}{u} = ({\underset{̲}{u}}_{1}, {\underset{̲}{u}}_{2}, {\underset{̲}{u}}_{3})$, where ${\underset{̲}{u}}_{1} (x_{0}) = x_{1}$, ${\underset{̲}{u}}_{1} (x_{4}^{2}) = z_{8}$; ${\underset{̲}{u}}_{2} (x_{2}^{1}) = z_{2}$, ${\underset{̲}{u}}_{2} (x_{5}) = z_{9}$; ${\underset{̲}{u}}_{3} (x_{2}^{2}) = z_{4}$, ${\underset{̲}{u}}_{3} (x_{4}^{3}) = z_{6}$, while $H (\underset{̲}{u}) = (11, 22, 18)$. We will use this SPE later in Section 4 when calculating the γ-characteristic function.

4. Cooperative Strategies and Trajectories

If the players agree to cooperate in the multistage game $Γ^{x_{0}}$, they are first expected to maximize the total payoff $\sum_{i = 1}^{n} H_{i} (u)$ of the grand coalition. Let $\bar{U} (Γ^{x_{0}})$ denote the set of all pure strategy profiles $u$ such that

$\sum_{i \in N} H_{i} (u) = max_{v \in U} \sum_{i \in N} H_{i} (v) = \bar{H} .$

(16)

The set $\bar{U} (Γ^{x_{0}})$ is known to be nonempty and it may contain multiple strategy profiles (see, e.g., Reference [17]). Hence, the players need to agree on a specific approach they are going to use to choose a unique optimal cooperative strategy profile $\bar{u} \in \bar{U} (Γ^{x_{0}})$ as well as the corresponding optimal bundle of cooperative trajectories in the game tree. To this aim we introduce the so-called Players’ Rank Based (PRB) algorithm. Note that a rather close approach, which uses the ranking of the criteria to choose a unique cooperative trajectory, was proposed recently in Reference [8] for multicriteria extensive-form games without chance moves. Namely, suppose that the players have agreed on the so-called "rank" of each player within the grand coalition $N$, where $r (k) = i$ means that the rank of player $i$ equals $k$, $k = 1, \dots, n$.

Players’ rank based (PRB) algorithm.

Step 0. Consider the set $\bar{U} (Γ^{x_{0}})$. If all strategy profiles $u \in \bar{U} (Γ^{x_{0}})$ generate the same bundle of trajectories $Ω (u)$ (see, e.g., References [17,22,23] for a discussion of a certain redundancy of the pure strategy definition in extensive games), let the players choose any strategy profile $\bar{u} \in \bar{U} (Γ^{x_{0}})$ as the cooperative strategy profile and let $Ω (\bar{u})$ denote the corresponding bundle of cooperative trajectories.
Step 1. Otherwise, that is, if the strategy profiles from $\bar{U} (Γ^{x_{0}})$ generate different (and hence disjoint, see Remark 1) bundles of trajectories, calculate

$max_{v \in \bar{U} (Γ^{x_{0}})} H_{r (1)} (v) = {\bar{H}}_{r (1)} .$

Let ${\bar{U}}_{r (1)} (Γ^{x_{0}})$ denote the set of all strategy profiles $u \in \bar{U} (Γ^{x_{0}})$ such that $H_{r (1)} (u) = {\bar{H}}_{r (1)}$. If all strategy profiles $u \in {\bar{U}}_{r (1)} (Γ^{x_{0}})$ generate the same bundle of trajectories $Ω (u)$, the players may choose any strategy profile $\bar{u} \in {\bar{U}}_{r (1)} (Γ^{x_{0}})$ as the cooperative strategy profile. Otherwise, proceed to the next step.
Step $k$ ($k = 2, \dots, n$). Consider the set ${\bar{U}}_{r (k - 1)} (Γ^{x_{0}})$ and calculate

$max_{v \in {\bar{U}}_{r (k - 1)} (Γ^{x_{0}})} H_{r (k)} (v) = {\bar{H}}_{r (k)} .$

Let ${\bar{U}}_{r (k)} (Γ^{x_{0}})$ denote the set of all strategy profiles $u \in {\bar{U}}_{r (k - 1)} (Γ^{x_{0}})$ such that $H_{r (k)} (u) = {\bar{H}}_{r (k)}$. If all strategy profiles $u \in {\bar{U}}_{r (k)} (Γ^{x_{0}})$ generate the same bundle of trajectories $Ω (u)$, the players may choose any strategy profile $\bar{u} \in {\bar{U}}_{r (k)} (Γ^{x_{0}})$ as the cooperative strategy profile. Otherwise, proceed to the next step.
⋮
Step $n + 1$. Finally, if the strategy profiles from ${\bar{U}}_{r (n)} (Γ^{x_{0}})$ still generate different bundles of trajectories, we suppose that the players choose $\bar{u} \in {\bar{U}}_{r (n)} (Γ^{x_{0}})$ such that $Ω (\bar{u}) = \{ω_{m} (\bar{u}) = (x_{0}, \dots, x_{T (m)} = z_{l}) | p (ω_{m}, \bar{u}) > 0\}$ contains the trajectory $ω (\bar{u})$ with the minimal number $l$ of the terminal node $z_{l}$ (see Remark 1).
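The steps above amount to a lexicographic filter over the set of total-payoff maximizers. The sketch below is a hedged illustration in Python, not the authors' implementation: the function `prb_select`, the dictionary keys and the four candidate profiles are hypothetical, and each profile is summarized only by its payoff vector and by the numbers of the terminal nodes in the bundle it generates.

```python
def prb_select(profiles, rank):
    """Players' rank based selection (illustrative sketch).

    profiles: list of dicts with hypothetical keys
        "name", "H" (payoff vector) and "terminals" (numbers l of the
        terminal nodes z_l in the bundle generated by the profile)
    rank: rank[k - 1] = r(k), the 1-based player holding rank k
    """
    # Keep only the profiles maximizing the total payoff, cf. (16).
    top = max(sum(p["H"]) for p in profiles)
    alive = [p for p in profiles if sum(p["H"]) == top]
    # Steps 1..n: refine by the payoff of the player holding rank k.
    for player in rank:
        if len({frozenset(p["terminals"]) for p in alive}) == 1:
            return alive[0]  # all remaining profiles give the same bundle
        best = max(p["H"][player - 1] for p in alive)
        alive = [p for p in alive if p["H"][player - 1] == best]
    # Last step: pick the bundle containing the terminal node with minimal number.
    return min(alive, key=lambda p: min(p["terminals"]))


# Hypothetical data: three profiles tie on the total payoff 20.
profiles = [
    {"name": "a", "H": (10, 5, 5), "terminals": {1, 4}},
    {"name": "b", "H": (10, 6, 4), "terminals": {2, 5}},
    {"name": "c", "H": (8, 6, 6), "terminals": {3}},
    {"name": "d", "H": (9, 5, 5), "terminals": {6}},
]
print(prb_select(profiles, rank=(1, 2, 3))["name"])
print(prb_select(profiles, rank=(3, 2, 1))["name"])
```

With the ranks $r(1) = 1$, $r(2) = 2$, $r(3) = 3$ the tie between profiles a and b is resolved by player 2's payoff; reversing the ranks selects profile c at the first step.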

Henceforth, we will refer to the strategy profile $\bar{u} \in \bar{U} (Γ^{x_{0}})$ and the bundle of trajectories $Ω (\bar{u})$ as the optimal cooperative strategy profile and the optimal bundle of cooperative trajectories, respectively.

In the dynamic setting it is important that the specific method which the players agreed to use in order to choose a unique optimal cooperative strategy profile $\bar{u} \in \bar{U} (Γ^{x_{0}})$ as well as the corresponding optimal bundle of cooperative trajectories satisfies time consistency (see, e.g., References [1,2,6,13,17]), that is, a fragment of the optimal bundle of cooperative trajectories in a subgame should remain optimal in this subgame. Suppose that at each subgame $Γ^{{\bar{x}}_{t}}$ along the cooperative trajectories, that is, ${\bar{x}}_{t} \in ω (\bar{u})$, $ω (\bar{u}) \in Ω (\bar{u})$, the players choose the strategy profile $u^{{\bar{x}}_{t}} \in U^{{\bar{x}}_{t}}$ such that

$u^{{\bar{x}}_{t}} \in \underset{v^{{\bar{x}}_{t}} \in U^{{\bar{x}}_{t}}}{arg max} \sum_{i \in N} H_{i}^{{\bar{x}}_{t}} (v^{{\bar{x}}_{t}}) .$

(17)

Let $\bar{U} (Γ^{{\bar{x}}_{t}})$ denote the set of all pure strategy profiles $u^{{\bar{x}}_{t}} \in U^{{\bar{x}}_{t}}$ which satisfy (17), and suppose that the players use the same approach to choose a unique optimal cooperative strategy profile ${\bar{u}}^{{\bar{x}}_{t}} \in \bar{U} (Γ^{{\bar{x}}_{t}})$ in the subgame as for the original game $Γ^{x_{0}}$ (namely, the PRB algorithm).

Proposition 2.

A cooperative strategy profile for $Γ^{x_{0}} \in G^{cm} (n)$ based on the PRB algorithm satisfies time consistency. Namely, let $\bar{u} \in U$ satisfy (16), and let $Ω (\bar{u})$ be the optimal bundle of cooperative trajectories. Then for each subgame $Γ^{{\bar{x}}_{t}}$, ${\bar{x}}_{t} \in ω (\bar{u}) = ({\bar{x}}_{0}, \dots, {\bar{x}}_{t}, {\bar{x}}_{t + 1}, \dots, {\bar{x}}_{T})$, $1 ⩽ t < T$, with ${\bar{x}}_{0} = x_{0}$, $ω (\bar{u}) \in Ω (\bar{u})$, it holds that

$\sum_{i \in N} H_{i}^{{\bar{x}}_{t}} ({\bar{u}}^{{\bar{x}}_{t}}) = max_{v^{{\bar{x}}_{t}} \in U^{{\bar{x}}_{t}}} \sum_{i \in N} H_{i}^{{\bar{x}}_{t}} (v^{{\bar{x}}_{t}}),$

(18)

while $ω^{{\bar{x}}_{t}} = ({\bar{x}}_{t}, {\bar{x}}_{t + 1}, \dots, {\bar{x}}_{T}) \in Ω ({\bar{u}}^{{\bar{x}}_{t}})$, that is, $ω^{{\bar{x}}_{t}}$ belongs to the optimal bundle of cooperative trajectories in the subgame $Γ^{{\bar{x}}_{t}}$.

Proof.

The optimal bundle of cooperative trajectories $Ω (\bar{u})$ generated by $\bar{u} \in \bar{U} (Γ^{x_{0}})$ can be divided into two subsets $\{Ψ_{m}\} = \{ω \in Ω (\bar{u}) | {\bar{x}}_{t} \in ω\}$ and $\{χ_{l}\} = \{ω \in Ω (\bar{u}) | {\bar{x}}_{t} \notin ω\}$, where $\{Ψ_{m}\} \cap \{χ_{l}\} = \emptyset$ and $\{Ψ_{m}\} \cup \{χ_{l}\} = Ω (\bar{u})$. Then, taking (5) and (6) into account, we get

$\begin{matrix} H_{i} (\bar{u}) = \sum_{m} p (Ψ_{m}, \bar{u}) \cdot {\tilde{h}}_{i} (Ψ_{m}) + \sum_{l} p (χ_{l}, \bar{u}) \cdot {\tilde{h}}_{i} (χ_{l}) = \\ = p ({\bar{x}}_{t}, \bar{u}) \cdot [{\tilde{h}}_{i} ({\bar{x}}_{0}, {\bar{x}}_{1}, \dots, {\bar{x}}_{t - 1}) + H_{i}^{{\bar{x}}_{t}} ({\bar{u}}^{{\bar{x}}_{t}})] + \sum_{l} p (χ_{l}, {\bar{u}}^{D}) \cdot {\tilde{h}}_{i} (χ_{l}), \end{matrix}$

(19)

and (16) for $\bar{u}$ takes the form

$\begin{matrix} \sum_{i \in N} p ({\bar{x}}_{t}, \bar{u}) \cdot ({\tilde{h}}_{i} ({\bar{x}}_{0}, {\bar{x}}_{1}, \dots, {\bar{x}}_{t - 1}) + H_{i}^{{\bar{x}}_{t}} ({\bar{u}}^{{\bar{x}}_{t}})) + \\ + \sum_{i \in N} \sum_{l} p (χ_{l}, {\bar{u}}^{D}) \cdot {\tilde{h}}_{i} (χ_{l}) = max_{v \in U} \sum_{i \in N} H_{i} (v) . \end{matrix}$

(20)

Suppose that ${\bar{u}}^{{\bar{x}}_{t}}$ does not satisfy (18), that is, there exists $v^{{\bar{x}}_{t}} \in U^{{\bar{x}}_{t}}$ such that

$\sum_{i \in N} H_{i}^{{\bar{x}}_{t}} ({\bar{u}}^{{\bar{x}}_{t}}) < \sum_{i \in N} H_{i}^{{\bar{x}}_{t}} (v^{{\bar{x}}_{t}}) .$

(21)

Denote by $Ω (v^{{\bar{x}}_{t}}) = \{λ_{m}^{{\bar{x}}_{t}} = ({\bar{x}}_{t}, \dots, {\bar{x}}_{T (m)}) | p (λ_{m}^{{\bar{x}}_{t}}, v^{{\bar{x}}_{t}}) > 0\}$ the bundle of all trajectories in the subgame $Γ^{{\bar{x}}_{t}}$ generated by $v^{{\bar{x}}_{t}}$. Then (21) takes the form

$\sum_{i \in N} \sum_{m} p (Ψ_{m}^{{\bar{x}}_{t}}, {\bar{u}}^{{\bar{x}}_{t}}) \cdot {\tilde{h}}_{i}^{{\bar{x}}_{t}} (Ψ_{m}^{{\bar{x}}_{t}}) < \sum_{i \in N} \sum_{m} p (λ_{m}^{{\bar{x}}_{t}}, v^{{\bar{x}}_{t}}) \cdot {\tilde{h}}_{i}^{{\bar{x}}_{t}} (λ_{m}^{{\bar{x}}_{t}}) .$

(22)

Denote by $W_{i} = ({\bar{u}}_{i}^{D}, v_{i}^{{\bar{x}}_{t}})$, $i \in N$, the ith player’s compound pure strategy in $Γ^{x_{0}}$. The strategy profile $W = (W_{1}, \dots, W_{n})$ generates the bundle of trajectories $Ω (W)$ that can be divided into two disjoint subsets $\{λ_{m}\} = \{ω \in Ω (W) | {\bar{x}}_{t} \in ω\}$ and $\{χ_{l}\} = \{ω \in Ω (W) | {\bar{x}}_{t} \notin ω\}$, where the second subset for $Ω (W)$ coincides with the second subset for $Ω (\bar{u})$ since $W^{D} = {\bar{u}}^{D}$, and

$λ_{m} = ({\bar{x}}_{0}, \dots, {\bar{x}}_{t}) \cup ({\bar{x}}_{t}, \dots, {\bar{x}}_{T (m)}) = ({\bar{x}}_{0}, \dots, {\bar{x}}_{t}) \cup λ_{m}^{{\bar{x}}_{t}} .$

Adding $\sum_{i \in N} {\tilde{h}}_{i} ({\bar{x}}_{0}, {\bar{x}}_{1}, \dots, {\bar{x}}_{t - 1})$ to both sides of (22) we get

$\sum_{i \in N} ({\tilde{h}}_{i} ({\bar{x}}_{0}, \dots, {\bar{x}}_{t - 1}) + H_{i}^{{\bar{x}}_{t}} ({\bar{u}}^{{\bar{x}}_{t}})) < \sum_{i \in N} ({\tilde{h}}_{i} ({\bar{x}}_{0}, \dots, {\bar{x}}_{t - 1}) + H_{i}^{{\bar{x}}_{t}} (v^{{\bar{x}}_{t}})) .$

(23)

Then we can multiply both sides of (23) by $p ({\bar{x}}_{t}, \bar{u}) = p ({\bar{x}}_{t}, {\bar{u}}^{D}) = p ({\bar{x}}_{t}, W^{D}) = p ({\bar{x}}_{t}, W) > 0$ and then add $\sum_{i \in N} \sum_{l} p (χ_{l}, {\bar{u}}^{D}) \cdot {\tilde{h}}_{i} (χ_{l})$ to both sides of the resulting inequality. Taking into account (4)–(6) and (20) we obtain

$\sum_{i \in N} H_{i} (\bar{u}) < \sum_{i \in N} H_{i} (W)$

for the constructed strategy profile $W \in U$. The last inequality contradicts the fact that $\bar{u} \in \bar{U} (Γ^{x_{0}})$, hence (18) is valid.

Arguing in a similar way (for the case when different strategy profiles from $\bar{U} (Γ^{{\bar{x}}_{t}})$ generate different bundles of trajectories) we can verify that $ω^{{\bar{x}}_{t}} = ({\bar{x}}_{t}, \dots, {\bar{x}}_{T})$, the fragment of the cooperative trajectory $ω \in Ω (\bar{u})$ starting at ${\bar{x}}_{t}$, belongs to the optimal bundle of cooperative trajectories in the subgame $Γ^{{\bar{x}}_{t}}$, that is, $ω^{{\bar{x}}_{t}} \in Ω ({\bar{u}}^{{\bar{x}}_{t}})$. □

We will assume in this paper that all the players have agreed to apply the PRB algorithm in order to choose the cooperative strategy profile $\bar{u} = ({\bar{u}}_{1}, \dots, {\bar{u}}_{n})$ that generates the optimal bundle $Ω (\bar{u})$ of cooperative trajectories in $Γ^{x_{0}} \in G^{cm} (n)$. The next step of cooperation is to define a characteristic function $V^{x_{0}} (S)$. There are different notions of characteristic functions (see, e.g., References [23,24,48]); in this paper we adopt the so-called $γ$-characteristic function introduced in Reference [24]. Namely, we assume that $V^{x_{0}} (S)$ is given by the SPE (based on the Attitude SPE algorithm) outcome of $S$ in the noncooperative game between the members of $S$, who maximize their joint payoff, and the non-members, who play individually.

The $γ$-characteristic function $V^{{\bar{x}}_{t}}$ for the subgame $Γ^{{\bar{x}}_{t}}$, ${\bar{x}}_{t} \in ω_{m} (\bar{u}) = ({\bar{x}}_{0}, \dots, {\bar{x}}_{t}, \dots, {\bar{x}}_{T (m)})$, $ω_{m} (\bar{u}) \in Ω (\bar{u})$, along the optimal bundle of cooperative trajectories can be constructed using the same approach. Note that

$V^{{\bar{x}}_{t}} (N) = \sum_{ω_{m}^{{\bar{x}}_{t}} \in Ω ({\bar{u}}^{{\bar{x}}_{t}})} p (ω_{m}^{{\bar{x}}_{t}}, {\bar{u}}^{{\bar{x}}_{t}}) \cdot \sum_{τ = t}^{T (m)} \sum_{i \in N} h_{i} ({\bar{x}}_{τ}), t = 0, 1, \dots, T (m) .$

(24)
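Since Proposition 2 states that the restriction of the cooperative profile to a subgame is again optimal there, the value $V^{{\bar{x}}_{t}} (N)$ can also be obtained by a backward recursion over the game tree. The Python sketch below is our own illustration of that recursion, not a procedure stated in the paper; the nested-dictionary encoding of the tree, the function name `coalition_value` and the toy tree are assumptions.

```python
from fractions import Fraction as F


def coalition_value(node):
    """Maximal expected total payoff of the grand coalition from `node` on.

    node: dict with key "h" (payoff vector collected at the node) and,
    for non-terminal nodes, "children"; chance nodes also carry "probs".
    """
    total_here = sum(node["h"])
    kids = node.get("children", [])
    if not kids:  # terminal node
        return total_here
    values = [coalition_value(child) for child in kids]
    if "probs" in node:  # chance node: expectation over the successors
        return total_here + sum(p * v for p, v in zip(node["probs"], values))
    return total_here + max(values)  # personal node: joint maximization


# Hypothetical two-stage tree with one chance node.
chance = {
    "h": (0, 1, 0),
    "probs": [F(1, 2), F(1, 2)],
    "children": [{"h": (4, 0, 0)}, {"h": (6, 3, 3)}],
}
root = {"h": (1, 0, 0), "children": [{"h": (2, 2, 2)}, chance]}
print(coalition_value(chance), coalition_value(root))
```

In the toy tree the coalition prefers the chance node (expected continuation 9) to the terminal node (6), so the root value is 10 and the value of the subgame at the chance node is 9.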

Let $Γ^{x_{0}} (N, V^{x_{0}})$ denote the extensive-form cooperative game $Γ^{x_{0}} \in G^{cm} (n)$ with the $γ$-characteristic function, and let $Γ^{{\bar{x}}_{t}} (N, V^{{\bar{x}}_{t}})$ denote the corresponding subgame.

We assume that the players adopt a single-valued cooperative solution $φ^{x_{0}}$ (for instance, the Shapley value [33], the nucleolus [34], etc.) for the cooperative game $Γ^{x_{0}} (N, V^{x_{0}})$ which satisfies the collective rationality property

$\sum_{i = 1}^{n} φ_{i}^{x_{0}} = V^{x_{0}} (N) = \sum_{ω_{m} \in Ω (\bar{u})} p (ω_{m}, \bar{u}) \cdot \sum_{τ = 0}^{T (m)} \sum_{i \in N} h_{i} ({\bar{x}}_{τ})$

(25)

and the individual rationality property

$φ_{i}^{x_{0}} ⩾ V^{x_{0}} (\{i\}), i = 1, \dots, n .$

(26)
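As a hedged illustration of properties (25) and (26), the following Python sketch computes the Shapley value of a 3-player game by averaging marginal contributions over all orderings. The characteristic function values below are purely hypothetical (they are not the values of the example game), and `shapley` is an illustrative helper name.

```python
from fractions import Fraction as F
from itertools import permutations


def shapley(players, v):
    """Shapley value as the average marginal contribution over all orderings."""
    phi = {i: F(0) for i in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for i in order:
            phi[i] += v[coalition | {i}] - v[coalition]
            coalition = coalition | {i}
    return {i: phi[i] / len(orders) for i in players}


# Hypothetical characteristic function (superadditive by construction).
players = (1, 2, 3)
v = {
    frozenset(): 0,
    frozenset({1}): 10, frozenset({2}): 20, frozenset({3}): 15,
    frozenset({1, 2}): 36, frozenset({1, 3}): 30, frozenset({2, 3}): 40,
    frozenset({1, 2, 3}): 66,
}
phi = shapley(players, v)
print(phi)
print(sum(phi.values()) == v[frozenset(players)])          # collective rationality
print(all(phi[i] >= v[frozenset({i})] for i in players))   # individual rationality
```

Exact rational arithmetic is used so that the efficiency check is an equality rather than a floating-point comparison.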

In addition, we assume that the same properties (25) and (26) are valid for the cooperative solutions $φ^{{\bar{x}}_{t}}$ at each subgame $Γ^{{\bar{x}}_{t}} (N, V^{{\bar{x}}_{t}})$, $t = 0, \dots, T - 1$.

It is worth noting that the last assumption as well as the choice of the $γ$-characteristic function ensure that every player has an incentive to cooperate at each subgame along the optimal game evolution, since the ith player’s cooperative payoff-to-go at $Γ^{{\bar{x}}_{t}} (N, V^{{\bar{x}}_{t}})$, $t = 0, \dots, T - 1$, is at least equal to her non-cooperative counterpart: $φ_{i}^{{\bar{x}}_{t}} ⩾ H_{i}^{{\bar{x}}_{t}} ({\underset{̲}{u}}^{{\bar{x}}_{t}})$.

5. Subgame Consistency and Incremental IDP

Let $β = \{β_{i} ({\bar{x}}_{τ})\}$, $i = 1, \dots, n$; $τ = 0, \dots, T (l)$, ${\bar{x}}_{τ} \in ω_{l} (\bar{u})$, $ω_{l} (\bar{u}) \in Ω (\bar{u})$, denote the Imputation Distribution Procedure (IDP), or the payment schedule, for the cooperative solution ${(φ_{i}^{{\bar{x}}_{0}})}_{i \in N}$ (see, e.g., References [3,8,9,10,11,12,14,15,16,17,18,20] for details). The IDP approach means that all the players have agreed to allocate the total cooperative payoff $V^{x_{0}} (N)$ between the players along the optimal bundle $Ω (\bar{u})$ of cooperative trajectories $ω_{l} (\bar{u})$ according to some specific rule, which is called an IDP. Namely, $β_{i} ({\bar{x}}_{τ})$ denotes the actual current payment which player $i$ receives at position ${\bar{x}}_{τ}$ (instead of $h_{i} ({\bar{x}}_{τ})$) if the players employ the IDP $β$. Moreover, one can design an IDP $β$ such that all the players will be interested in cooperation in any subgame $Γ^{{\bar{x}}_{τ}}$, ${\bar{x}}_{τ} \in ω_{l} (\bar{u})$, $ω_{l} (\bar{u}) \in Ω (\bar{u})$, that is, at any intermediate time instant.

Definition 3.

The IDP $β = \{β_{i} ({\bar{x}}_{τ})\}$ satisfies subgame efficiency if at any intermediate node ${\bar{x}}_{t} \in ω (\bar{u})$, $ω (\bar{u}) \in Ω (\bar{u})$, $0 ⩽ t < T$, it holds that

$\sum_{ω_{m}^{{\bar{x}}_{t}} \in Ω ({\bar{u}}^{{\bar{x}}_{t}})} p (ω_{m}^{{\bar{x}}_{t}}, {\bar{u}}^{{\bar{x}}_{t}}) \cdot \sum_{τ = t}^{T (m)} β_{i} ({\bar{x}}_{τ}) = φ_{i}^{{\bar{x}}_{t}}, i \in N .$

(27)

Equation (27) means that the expected sum of the payments to player $i$ along the optimal evolution of the subgame $Γ^{{\bar{x}}_{t}}$ equals what she is entitled to in this subgame. Then the IDP for each player can be reasonably implemented as a rule for step-by-step allocation of the ith player’s current expected optimal payoff. Note that for $t = 0$ the subgame efficiency definition coincides with the condition of efficiency at the initial node $x_{0}$, or efficiency in the whole game $Γ^{x_{0}}$ (see References [9,14,16,20]).

Definition 4 ([10]).

The IDP $β = \{β_{i} ({\bar{x}}_{τ})\}$ satisfies the strict balance condition if for each node ${\bar{x}}_{τ} \in ω_{m} (\bar{u})$, $ω_{m} (\bar{u}) \in Ω (\bar{u})$, $τ = 0, \dots, T (m)$, it holds that

$\sum_{i \in N} β_{i} ({\bar{x}}_{τ}) = \sum_{i \in N} h_{i} ({\bar{x}}_{τ}) .$

(28)

Equation (28) ensures the “admissibility” of the IDP, that is, the sum of payments to the players in any node ${\bar{x}}_{τ}$ is equal to the sum of payoffs that they can collect in this node.

The next advantageous dynamic property of an IDP, time consistency, was introduced in Reference [3] and extended to dynamic games played over event trees in References [14,16,20] as well as to multicriteria extensive-form cooperative games (with chance moves) in Reference [9].

To write down properly the time consistency condition for some intermediate node ${\bar{x}}_{t} \in ω (\bar{u}) = ({\bar{x}}_{0}, {\bar{x}}_{1}, \dots, {\bar{x}}_{t - 1}, {\bar{x}}_{t}, {\bar{x}}_{t + 1}, \dots, {\bar{x}}_{T})$, $ω (\bar{u}) \in Ω (\bar{u})$, $1 ⩽ t < T$, in the multistage game $Γ^{x_{0}}$ with chance moves, we need to pay attention to all chance nodes on the path $({\bar{x}}_{0}, \dots, {\bar{x}}_{t - 1}) = {\underset{̲}{ω}}^{x_{t}} \setminus \{{\bar{x}}_{t}\}$.

Namely, let us enumerate the chance nodes from $P_{0} \cap ({\underset{̲}{ω}}^{x_{t}} \setminus \{{\bar{x}}_{t}\})$ in order of their occurrence on the path $({\bar{x}}_{0}, \dots, {\bar{x}}_{t - 1})$, that is, $y_{1} = {\bar{x}}_{t (1)}$, $y_{2} = {\bar{x}}_{t (2)}, \dots, y_{θ} = {\bar{x}}_{t (θ)}$, $0 ⩽ t (1) < t (2) < \dots < t (θ) < t$.

Definition 5 ([9]).

The IDP $β = \{β_{i} ({\bar{x}}_{τ})\}$ for the cooperative solution $φ^{x_{0}}$ is called time consistent in the whole game $Γ^{x_{0}} (N, V^{x_{0}}) \in G^{cm} (n)$ if at any intermediate node ${\bar{x}}_{t} \in ω (\bar{u})$, $ω (\bar{u}) \in Ω (\bar{u})$, $1 ⩽ t < T$, for all $i \in N$, it holds that

case $θ = 0$ (no chance nodes on the path $({\bar{x}}_{0}, \dots, {\bar{x}}_{t - 1})$ ):

$\sum_{τ = 0}^{t - 1} β_{i} ({\bar{x}}_{τ}) + φ_{i}^{{\bar{x}}_{t}} = φ_{i}^{x_{0}},$

(29)
case $θ = 1$ (only one chance node $y_{1} = {\bar{x}}_{t (1)}$ before ${\bar{x}}_{t}$ ):

$\begin{matrix} \sum_{τ = 0}^{t (1)} β_{i} ({\bar{x}}_{τ}) + p ({\bar{x}}_{t (1) + 1}, \bar{u}) \cdot \{\sum_{τ = t (1) + 1}^{t - 1} β_{i} ({\bar{x}}_{τ}) + φ_{i}^{{\bar{x}}_{t}}\} + \\ + \sum_{x^{k} \in S ({\bar{x}}_{t (1)}) \setminus \{{\bar{x}}_{t (1) + 1}\}} p (x^{k}, \bar{u}) \cdot φ_{i}^{x^{k}} = φ_{i}^{x_{0}}, \end{matrix}$

(30)
case $θ = 2$ (two chance nodes $y_{1} = {\bar{x}}_{t (1)}$ , $y_{2} = {\bar{x}}_{t (2)}$ before ${\bar{x}}_{t}$ ):

$\begin{matrix} \sum_{τ = 0}^{t (1)} β_{i} ({\bar{x}}_{τ}) + p ({\bar{x}}_{t (1) + 1}, \bar{u}) \cdot \{\sum_{τ = t (1) + 1}^{t (2)} β_{i} ({\bar{x}}_{τ}) + p ({\bar{x}}_{t (2) + 1} | {\bar{x}}_{t (2)}, \bar{u}) \times \\ \times [\sum_{τ = t (2) + 1}^{t - 1} β_{i} ({\bar{x}}_{τ}) + φ_{i}^{{\bar{x}}_{t}}] + \sum_{x^{m} \in S ({\bar{x}}_{t (2)}) \setminus \{{\bar{x}}_{t (2) + 1}\}} p (x^{m} | {\bar{x}}_{t (2)}, \bar{u}) \cdot φ_{i}^{x^{m}}\} + \\ + \sum_{x^{k} \in S ({\bar{x}}_{t (1)}) \setminus \{{\bar{x}}_{t (1) + 1}\}} p (x^{k}, \bar{u}) \cdot φ_{i}^{x^{k}} = φ_{i}^{x_{0}}, \end{matrix}$

(31)

…

Note that in the particular case when ${\bar{x}}_{t} \in S ({\bar{x}}_{t (1)})$, that is, if ${\bar{x}}_{t}$ immediately follows the chance node ${\bar{x}}_{t (1)}$, Equation (30) takes the simpler form

$\sum_{τ = 0}^{t (1)} β_{i} ({\bar{x}}_{τ}) + \sum_{x^{k} \in S ({\bar{x}}_{t (1)})} p (x^{k}, \bar{u}) \cdot φ_{i}^{x^{k}} = φ_{i}^{x_{0}} .$

A similar note is valid for Equation (31), and so forth.

Roughly speaking, Definition 5 implies that the payments collected by the ith player (according to the payment schedule $β$) before reaching some intermediate node ${\bar{x}}_{t}$, plus the expected ith player’s component of the Shapley value in the subgame $Γ^{{\bar{x}}_{t}}$ starting at ${\bar{x}}_{t}$, plus this player’s expected Shapley value components in the other subgames along the cooperative trajectories which do not contain ${\bar{x}}_{t}$, correspond to what player $i$ is entitled to in the original game $Γ^{x_{0}} (N, V^{x_{0}})$.

It is worth noting that Definition 5 indeed provides a reasonable consistency requirement which a good payment schedule $β$ should satisfy when the player evaluates the IDP $β$ at the initial node $x_{0}$, that is, before the game $Γ^{x_{0}} (N, V^{x_{0}})$ starts (and the words “in the whole game” in Definition 5 properly reflect this feature). However, when the player intends to evaluate the IDP $β$ in the subgame $Γ^{{\bar{x}}_{t}}$, that is, after reaching some intermediate node ${\bar{x}}_{t}$ (in the case when $θ ⩾ 1$), this player is unlikely to take into account the expected future payoffs in all the subgames which are unattainable once the node ${\bar{x}}_{t}$ has been reached, that is, the last addends in the LHS of (30) and (31). To overcome this problem we suggest that the players use the notion of subgame consistency, a refinement of time consistency that was first proposed in Reference [36] for cooperative stochastic differential games and then extended to stochastic dynamic games in References [37,38]. Let us provide a rigorous definition of the IDP subgame consistency for extensive-form games with chance moves that is applicable in all the subgames along the optimal bundle of cooperative trajectories.

Definition 6.

The IDP $β = \{β_{i} ({\bar{x}}_{τ})\}$ is called subgame consistent if at any intermediate node ${\bar{x}}_{t} \in ω (\bar{u})$, $ω (\bar{u}) \in Ω (\bar{u})$, $1 ⩽ t ⩽ T$, for all $i \in N$, it holds that

case $1 ⩽ t ⩽ t (1)$ (no chance nodes before the subgame $Γ^{{\bar{x}}_{t}}$ root ${\bar{x}}_{t}$ ):

$\sum_{τ = 0}^{t - 1} β_{i} ({\bar{x}}_{τ}) + φ_{i}^{{\bar{x}}_{t}} = φ_{i}^{x_{0}},$

(32)
case $t (1) + 1 < t ⩽ t (2)$ (only one chance node $y_{1} = {\bar{x}}_{t (1)}$ before ${\bar{x}}_{t}$ ):

$\sum_{τ = t (1) + 1}^{t - 1} β_{i} ({\bar{x}}_{τ}) + φ_{i}^{{\bar{x}}_{t}} = φ_{i}^{{\bar{x}}_{t (1) + 1}},$

(33)
case $t (2) + 1 < t ⩽ t (3)$ (two chance nodes before ${\bar{x}}_{t}$ ):

$\sum_{τ = t (2) + 1}^{t - 1} β_{i} ({\bar{x}}_{τ}) + φ_{i}^{{\bar{x}}_{t}} = φ_{i}^{{\bar{x}}_{t (2) + 1}},$

(34)
⋮
case $t (θ) + 1 < t ⩽ T$ (no chance nodes after ${\bar{x}}_{t}$ ):

$\sum_{τ = t (θ) + 1}^{t - 1} β_{i} ({\bar{x}}_{τ}) + φ_{i}^{{\bar{x}}_{t}} = φ_{i}^{{\bar{x}}_{t (θ) + 1}} .$

(35)

The subgame consistency definition differs from the “time consistency in the whole game” property (see References [9,14,16,20]), which is based on an a priori assessment of the ith player’s expected optimal payoff (before the game starts). However, when the players make a decision in the subgame after the chance move occurs, they need to recalculate the expected optimal payoff since the original optimal bundle of cooperative trajectories shrinks after each chance node. Note that we cannot write out the subgame consistency condition for $t = t (1) + 1, t (2) + 1, \dots, t (θ) + 1$, that is, for the nodes ${\bar{x}}_{t}$ that immediately follow the chance nodes.

One can suggest different imputation distribution procedures that may or may not satisfy the useful properties listed above. A review of different IDPs for multistage games (without chance moves) as well as an analysis of their properties can be found in References [10,12,15,17]. Below we consider the refinement of the so-called incremental IDP (see, e.g., References [10,14,16,17,20,21]) that was recently introduced for multistage games with chance moves [9].

Definition 7 ([9]).

The incremental IDP for the cooperative solution $φ^{x_{0}}$ in the multistage game with chance moves $Γ^{x_{0}}$ is defined as follows:

$β_{i} (x_{t}) = φ_{i}^{x_{t}} - \sum_{x_{t + 1}^{k} \in S (x_{t})} p (x_{t + 1}^{k} | x_{t}, \bar{u}) \cdot φ_{i}^{x_{t + 1}^{k}}$

(36)

for $x_{t} \in ω_{l} (\bar{u}) = (x_{0}, \dots, x_{t}, \dots, x_{T (l)})$, $ω_{l} (\bar{u}) \in Ω (\bar{u})$, $t = 0, \dots, T (l) - 1$;

$β_{i} (x_{T (l)}) = φ_{i}^{x_{T (l)}}$

(37)

for $x_{T (l)} \in Ω (\bar{u}) \cap P_{n + 1}$.

Remark 2.

Formulas (36), (37) are similar to the imputation distribution procedures suggested in References [14,16,20] for (single-criterion) stochastic discrete-time dynamic games played over event trees. If $x_{t} \in P_{i}$, $i = 1, \dots, n$, then Equation (36) takes the simpler form $β_{i} (x_{t}) = φ_{i}^{x_{t}} - φ_{i}^{x_{t + 1}}$, where ${\bar{u}}_{i} (x_{t}) = x_{t + 1}$, which coincides with the “classical” incremental IDP.
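Formulas (36) and (37) translate almost directly into code. The following Python sketch is a minimal illustration under stated assumptions: the function name `incremental_idp`, the dictionary-based encoding of the cooperative bundle and the two-player toy data are all hypothetical and only serve to show how the payment at a personal, chance or terminal node is obtained.

```python
from fractions import Fraction as F


def incremental_idp(phi, successors, node):
    """Payment vector at `node` according to (36)-(37).

    phi: dict node -> solution vector of the subgame rooted at that node
    successors: dict node -> list of (probability, child) pairs realized
        under the cooperative profile; a personal node has one successor
        with probability 1, a terminal node has none.
    """
    here = phi[node]
    expected_next = [
        sum(p * phi[child][i] for p, child in successors.get(node, []))
        for i in range(len(here))
    ]
    return tuple(a - b for a, b in zip(here, expected_next))


# Hypothetical data: personal node x0, chance node x1, terminal nodes z1, z2.
phi = {"x0": (4, 6), "x1": (3, 5), "z1": (8, 0), "z2": (0, 4)}
successors = {
    "x0": [(1, "x1")],
    "x1": [(F(1, 4), "z1"), (F(3, 4), "z2")],
}
for x in ("x0", "x1", "z1", "z2"):
    print(x, incremental_idp(phi, successors, x))
```

At the personal node the payment is the plain difference of consecutive solution vectors, as in Remark 2, while at the chance node the expected solution over the successors is subtracted; terminal nodes pay the full remaining entitlement.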

Let us again use the 3-person extensive-form game from Example 1 to demonstrate the proposed scheme of cooperation.

Example 2.

(Cooperative behavior in the 3-player game from Example 1).

Suppose that the players have agreed on the following ranks: $r (1) = 1$, $r (2) = 2$ and $r (3) = 3$. When implementing the PRB algorithm we get the optimal bundle $Ω (\bar{u})$, which contains four cooperative trajectories (marked in bold, deep blue in Figure 2): $ω_{1} = ({\bar{x}}_{0}, {\bar{x}}_{1}, {\bar{x}}_{2}^{1}, {\bar{x}}_{3}, {\bar{x}}_{4}^{2}, {\bar{x}}_{5}, {\bar{x}}_{6})$, $ω_{2} = ({\bar{x}}_{0}, {\bar{x}}_{1}, x_{2}^{2}, z_{3})$, $ω_{3} = ({\bar{x}}_{0}, {\bar{x}}_{1}, {\bar{x}}_{2}^{1}, {\bar{x}}_{3}, x_{4}^{1})$ and $ω_{4} = ({\bar{x}}_{0}, {\bar{x}}_{1}, {\bar{x}}_{2}^{1}, {\bar{x}}_{3}, x_{4}^{3}, z_{7})$. Note that the players use the ranks when making a decision at node ${\bar{x}}_{5}$.

To demonstrate the implementation of the incremental IDP and its properties we will adopt the Shapley value as a single-valued cooperative solution. The values of the γ-characteristic function $V^{x_{0}}$ for the original game $Γ^{x_{0}} (N, V^{x_{0}})$ and the Shapley value $φ^{x_{0}}$ are

Consider, for instance, the incremental IDP along the longest cooperative trajectory $ω_{1} = ({\bar{x}}_{0}, \dots, {\bar{x}}_{6})$ from $Ω (\bar{u})$. If we calculate the γ-characteristic functions using the Attitude SPE algorithm for the subgames, we get the following results.

Subgame $Γ^{{\bar{x}}_{1}} (N, V^{{\bar{x}}_{1}})$:

Subgame $Γ^{{\bar{x}}_{2}^{1}} (N, V^{{\bar{x}}_{2}^{1}})$:

Subgame $Γ^{x_{2}^{2}} (N, V^{x_{2}^{2}})$:

Subgame $Γ^{{\bar{x}}_{3}} (N, V^{{\bar{x}}_{3}})$:

Subgame $Γ^{{\bar{x}}_{4}^{2}} (N, V^{{\bar{x}}_{4}^{2}})$:

Subgame $Γ^{x_{4}^{3}} (N, V^{x_{4}^{3}})$:

Subgame $Γ^{{\bar{x}}_{5}} (N, V^{{\bar{x}}_{5}})$:

Finally, $φ^{{\bar{x}}_{6}} = h^{{\bar{x}}_{6}} = (12, 0, 0)$.

One can calculate the incremental IDP $\{β_{i} ({\bar{x}}_{τ}), {\bar{x}}_{τ} \in ω_{1}\}$ using (36) and (37):

Payment	${\bar{x}}_{0}$	${\bar{x}}_{1}$	${\bar{x}}_{2}^{1}$	${\bar{x}}_{3}$	${\bar{x}}_{4}^{2}$	${\bar{x}}_{5}$	${\bar{x}}_{6}$
$β_{1} ({\bar{x}}_{τ})$	6	0	$- 10$	12	36	$- 12$	12
$β_{2} ({\bar{x}}_{τ})$	0	6	4	0	$- 6$	12	0
$β_{3} ({\bar{x}}_{τ})$	0	0	18	0	6	24	0

Note that the subgame consistency conditions at nodes ${\bar{x}}_{1}$, ${\bar{x}}_{3}$ and ${\bar{x}}_{5}$, according to (32)–(34) respectively, take the form:

$β_{i} ({\bar{x}}_{0}) + φ_{i}^{{\bar{x}}_{1}} = φ_{i}^{x_{0}}, i \in N, \text{ or } \begin{pmatrix} 6 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 19 \frac{1}{6} \\ 17 \frac{1}{6} \\ 25 \frac{2}{3} \end{pmatrix} = \begin{pmatrix} 25 \frac{1}{6} \\ 17 \frac{1}{6} \\ 25 \frac{2}{3} \end{pmatrix},$

$β_{i} ({\bar{x}}_{2}^{1}) + φ_{i}^{{\bar{x}}_{3}} = φ_{i}^{{\bar{x}}_{2}^{1}}, i \in N, \text{ or } \begin{pmatrix} - 10 \\ 4 \\ 18 \end{pmatrix} + \begin{pmatrix} 31 \\ 13 \\ 20 \end{pmatrix} = \begin{pmatrix} 21 \\ 17 \\ 38 \end{pmatrix},$

$β_{i} ({\bar{x}}_{4}^{2}) + φ_{i}^{{\bar{x}}_{5}} = φ_{i}^{{\bar{x}}_{4}^{2}}, i \in N, \text{ or } \begin{pmatrix} 36 \\ - 6 \\ 6 \end{pmatrix} + \begin{pmatrix} 0 \\ 12 \\ 24 \end{pmatrix} = \begin{pmatrix} 36 \\ 6 \\ 30 \end{pmatrix} .$

It is known that the classical incremental IDP for multistage (and differential) games may imply negative current payments to some players at some positions (see References [4,10,17,38] for details). As one can observe in Example 2, this drawback of the incremental IDP may appear in extensive-form games with chance moves as well. Two approaches to overcoming this possible disadvantage were suggested in References [4,10]. Unfortunately, as was first proved in Reference [10], in general it is impossible to design a time consistent IDP which satisfies both the balance condition and the non-negativity constraint.
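The payments at the personal nodes of Example 2, and the negative entries among them, can be re-derived from the Shapley values quoted in the consistency checks. The Python sketch below is only a verification aid; the node labels (`"x2_1"` for ${\bar{x}}_{2}^{1}$ and so on) and the helper `diff` are our own naming, and the chance nodes ${\bar{x}}_{1}$ and ${\bar{x}}_{3}$ are left out because the Shapley values of all their successors are not listed in the text.

```python
from fractions import Fraction as F

# Shapley values along the longest cooperative trajectory, as quoted in Example 2.
phi = {
    "x0": (F(151, 6), F(103, 6), F(77, 3)),   # 25 1/6, 17 1/6, 25 2/3
    "x1": (F(115, 6), F(103, 6), F(77, 3)),   # 19 1/6, 17 1/6, 25 2/3
    "x2_1": (21, 17, 38),
    "x3": (31, 13, 20),
    "x4_2": (36, 6, 30),
    "x5": (0, 12, 24),
    "x6": (12, 0, 0),
}
# Incremental IDP from the table of Example 2.
beta = {
    "x0": (6, 0, 0), "x1": (0, 6, 0), "x2_1": (-10, 4, 18), "x3": (12, 0, 0),
    "x4_2": (36, -6, 6), "x5": (-12, 12, 24), "x6": (12, 0, 0),
}
# Personal nodes and their successors on the trajectory (Remark 2 applies).
personal_steps = [("x0", "x1"), ("x2_1", "x3"), ("x4_2", "x5"), ("x5", "x6")]


def diff(a, b):
    """Componentwise difference of two payoff vectors."""
    return tuple(x - y for x, y in zip(a, b))


checks = {x: diff(phi[x], phi[y]) == beta[x] for x, y in personal_steps}
checks["x6"] = tuple(phi["x6"]) == beta["x6"]          # terminal rule (37)
negative = sorted(x for x, b in beta.items() if min(b) < 0)
print(checks)
print(negative)
```

The check confirms the table entries at the personal and terminal nodes and shows that some player receives a negative payment at three of the seven nodes.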

Proposition 3.

The incremental IDP (36), (37) satisfies strict balance condition (28), the subgame efficiency condition (27), and the subgame consistency conditions (32)–(35).

Proof.

The incremental IDP $β$ was proved to satisfy the strict balance condition (28) in Reference [9]. The proof of subgame consistency can be carried out by direct verification. For instance, consider the case when $t (1) + 1 < t ⩽ t (2)$. Then, using Remark 2, we get

\sum_{τ = t (1) + 1}^{t - 1} β_{i} ({\bar{x}}_{τ}) = (φ_{i}^{{\bar{x}}_{t (1) + 1}} - φ_{i}^{{\bar{x}}_{t (1) + 2}}) + \dots + (φ_{i}^{{\bar{x}}_{t - 1}} - φ_{i}^{{\bar{x}}_{t}}) = φ_{i}^{{\bar{x}}_{t (1) + 1}} - φ_{i}^{{\bar{x}}_{t}} .

Obviously, (33) is satisfied.
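The telescoping argument can be illustrated numerically. The sketch below assumes, as in the displayed sum, that on a segment without chance moves each payment equals the difference of consecutive values $φ_{i}$; the values in `phi` are hypothetical and do not come from Example 2.

```python
def incremental_payments(phi_along_path):
    """Payment at each node = phi at that node minus phi at the next node."""
    return [a - b for a, b in zip(phi_along_path, phi_along_path[1:])]


# Hypothetical values of phi_i at consecutive nodes of one deterministic segment.
phi = [31, 24, 27, 15, 9]
beta = incremental_payments(phi)

# Every partial sum of payments collapses to the first phi minus the current phi.
for t in range(1, len(phi)):
    assert sum(beta[:t]) == phi[0] - phi[t]

print(beta)
```

Individual payments may be negative (here the second one), yet the partial sums still collapse to a difference of two values of $φ_{i}$.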

The proof that the IDP (36), (37) satisfies subgame efficiency (27) is based on direct calculations but is rather cumbersome in the general case (i.e., for an arbitrary game $Γ^{x_{0}}$). Let us demonstrate how it works for the game in Example 2. For instance, we verify that the incremental IDP meets the subgame efficiency condition at node ${\bar{x}}_{3}$.

Note that

Ω ({\bar{u}}^{{\bar{x}}_{3}}) = \{ω_{1}^{{\bar{x}}_{3}} = ({\bar{x}}_{3}, x_{4}^{1}); ω_{2}^{{\bar{x}}_{3}} = ({\bar{x}}_{3}, {\bar{x}}_{4}^{2}, {\bar{x}}_{5}, {\bar{x}}_{6}); ω_{3}^{{\bar{x}}_{3}} = ({\bar{x}}_{3}, x_{4}^{3}, z_{7})\}

while $p (ω_{1}^{{\bar{x}}_{3}}, {\bar{u}}^{{\bar{x}}_{3}}) = π (x_{4}^{1} | {\bar{x}}_{3})$, $p (ω_{2}^{{\bar{x}}_{3}}, {\bar{u}}^{{\bar{x}}_{3}}) = π ({\bar{x}}_{4}^{2} | {\bar{x}}_{3})$ and $p (ω_{3}^{{\bar{x}}_{3}}, {\bar{u}}^{{\bar{x}}_{3}}) = π (x_{4}^{3} | {\bar{x}}_{3})$. Then, using (32), (33), Remark 2, the equality $\sum_{x_{4}^{k} \in S ({\bar{x}}_{3})} π (x_{4}^{k} | {\bar{x}}_{3}) = 1$ and the notation

Φ_{i}^{4} = \sum_{k = 1}^{3} π (x_{4}^{k} | {\bar{x}}_{3}) \cdot φ_{i}^{x_{4}^{k}},

we obtain

\begin{aligned} & \sum_{ω_{k}^{{\bar{x}}_{3}} \in Ω ({\bar{u}}^{{\bar{x}}_{3}})} p (ω_{k}^{{\bar{x}}_{3}}, {\bar{u}}^{{\bar{x}}_{3}}) \cdot \sum_{τ = 3}^{T (k)} β_{i} ({\bar{x}}_{τ}) \\ & \quad = π (x_{4}^{1} | {\bar{x}}_{3}) \cdot [(φ_{i}^{{\bar{x}}_{3}} - Φ_{i}^{4}) + φ_{i}^{x_{4}^{1}}] \\ & \qquad + π ({\bar{x}}_{4}^{2} | {\bar{x}}_{3}) \cdot [(φ_{i}^{{\bar{x}}_{3}} - Φ_{i}^{4}) + (φ_{i}^{{\bar{x}}_{4}^{2}} - φ_{i}^{{\bar{x}}_{5}}) + (φ_{i}^{{\bar{x}}_{5}} - φ_{i}^{{\bar{x}}_{6}}) + φ_{i}^{{\bar{x}}_{6}}] \\ & \qquad + π (x_{4}^{3} | {\bar{x}}_{3}) \cdot [(φ_{i}^{{\bar{x}}_{3}} - Φ_{i}^{4}) + (φ_{i}^{x_{4}^{3}} - φ_{i}^{z_{7}}) + φ_{i}^{z_{7}}] \\ & \quad = φ_{i}^{{\bar{x}}_{3}} \cdot \sum_{k = 1}^{3} π (x_{4}^{k} | {\bar{x}}_{3}) + π (x_{4}^{1} | {\bar{x}}_{3}) \cdot [- Φ_{i}^{4} + φ_{i}^{x_{4}^{1}}] + π ({\bar{x}}_{4}^{2} | {\bar{x}}_{3}) \cdot [- Φ_{i}^{4} + φ_{i}^{{\bar{x}}_{4}^{2}}] + π (x_{4}^{3} | {\bar{x}}_{3}) \cdot [- Φ_{i}^{4} + φ_{i}^{x_{4}^{3}}] \\ & \quad = φ_{i}^{{\bar{x}}_{3}} - Φ_{i}^{4} \cdot \sum_{k = 1}^{3} π (x_{4}^{k} | {\bar{x}}_{3}) + \sum_{k = 1}^{3} π (x_{4}^{k} | {\bar{x}}_{3}) \cdot φ_{i}^{x_{4}^{k}} = φ_{i}^{{\bar{x}}_{3}} . \end{aligned}

□

According to Proposition 3, the incremental payment schedule (36), (37) can be used to implement a long-term cooperative agreement in an extensive-form game with chance moves.
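The subgame efficiency calculation at a chance node can also be reproduced numerically for one player. The sketch below mirrors the shape of the subgame at ${\bar{x}}_{3}$ (three chance outcomes, with branches of length one, three and two), but all probabilities and values of $φ_{i}$ are hypothetical, and the names `path_payments`, `branches` and `big_phi` are assumptions made here.

```python
from fractions import Fraction as F

# Hypothetical transition probabilities at the chance node (they sum to 1).
pi = [F(1, 2), F(1, 3), F(1, 6)]

# Hypothetical values of phi_i along each branch, from the chance successor
# down to the terminal node of that branch.
branches = [
    [F(10)],                 # successor is already terminal
    [F(36), F(30), F(12)],   # two further moves before the terminal node
    [F(8), F(5)],            # one further move before the terminal node
]

phi_root = F(20)  # hypothetical phi_i at the chance node

# Expected value of phi_i over the chance successors (the role of Phi_i^4).
big_phi = sum(p * branch[0] for p, branch in zip(pi, branches))


def path_payments(branch):
    """Incremental payments along one trajectory starting at the chance node."""
    payments = [phi_root - big_phi]                               # at the chance node
    payments += [a - b for a, b in zip(branch, branch[1:])]      # intermediate nodes
    payments.append(branch[-1])                                  # terminal node
    return payments


# Expected sum of payments over the bundle of trajectories.
expected_total = sum(p * sum(path_payments(b)) for p, b in zip(pi, branches))
print(expected_total == phi_root)
```

The equality holds for any choice of the hypothetical numbers, because the payment at the chance node absorbs the difference between `phi_root` and the expected continuation value.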

6. Conclusions

In this paper we aimed to design a mechanism of the players’ sustainable long-term cooperation that satisfies a number of good properties. To this end we formalised the players’ rank based algorithm for selecting a unique optimal bundle of cooperative trajectories, and proved that the corresponding cooperative strategy profile satisfies time consistency. To calculate the $γ$-characteristic function one needs a specific method for constructing a unique (subgame perfect) equilibrium in any extensive-form game with chance moves. Hence, we formalised a refinement of the backwards induction procedure based on the players’ attitude vectors, the so-called attitude SPE algorithm.

As a result of reexamining the “IDP time consistency in the whole game” concept, we suggest adopting the concept of subgame consistency, introduced in Reference [36] for differential stochastic games and then extended to dynamic stochastic games in References [37,38]. A definition of subgame consistency for extensive-form games with chance moves is provided. This property takes into account an interesting feature of the games under consideration: when the players make a decision in the subgame $Γ^{x_{t}}$ after a chance move occurs, they need to recalculate their expected optimal payoffs-to-go, since the original optimal bundle of cooperative trajectories shrinks after each chance node. It is worth noting that a similar approach, based on the IDP subgame consistency notion, could be applied to dynamic games played over event trees [14,16,20]. We proved that the incremental IDP specified for multistage games with chance moves in Reference [9] satisfies subgame consistency and subgame efficiency as well as the strict balance condition.

It follows from Propositions 1–3 that the two specified algorithms, combined with the $γ$-characteristic function and the incremental payment schedule, together constitute a mechanism of the players’ sustainable cooperation that satisfies a number of good properties and could be used in extensive-form games with chance moves. Note that the main result of the paper, Proposition 3, does not depend on the specific method which the players employ to calculate the characteristic function, nor on the specific single-valued cooperative solution meeting (25) and (26).

Since this is the first time that subgame consistent solutions are examined for extensive-form games with chance moves, further research along this line is expected. It is surely of interest to develop a software application that implements the proposed algorithms in an arbitrary extensive-form game with chance moves. Possibly, one can use the so-called Game Theory Explorer [30] when developing such software tools for 2-person extensive games. Further, it might be interesting to run experiments with large-scale datasets once a software application that constructs the unique SPE, the optimal bundle of cooperative trajectories, the $γ$-characteristic function, and so forth, has been developed.

Let us note some preliminary suggestions on how one can use such a software application to run simulations. First, one can vary the main parameter (the length of the game tree) and additional parameters such as the game structure, the players’ payoffs, the transition probabilities, and so forth, to obtain practical estimates of the complexity and scalability of the proposed algorithms. Second, one can generate external disturbances of the stage payoffs and probabilities and vary the players’ attitude vectors to carry out a sensitivity analysis of the proposed non-cooperative and cooperative solutions. Further, it is of interest to get experimental estimates of the price of anarchy and the price of stability for the class of games under consideration. Finally, one can use such a software application to check whether the additional properties (non-negativity, irrational-behavior-proof conditions, etc.) of the proposed incremental IDP and other payment schedules (see, e.g., Reference [15]) are satisfied for a given extensive-form game with chance moves.

Author Contributions

Conceptualization, D.K.; methodology, D.K.; formal analysis, D.K.; investigation, D.K. and N.S.; writing—original draft preparation, D.K. and N.S.; writing—review and editing, D.K. and N.S.; visualization, D.K. and N.S.; supervision, D.K. All authors have read and agreed to the published version of the manuscript.

Funding

The reported study was funded by RFBR under the research project 18-00-00727 (18-00-00725).

Acknowledgments

We would like to thank three anonymous Reviewers and Leon Petrosyan for their valuable comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Haurie, A. A note on nonzero-sum differential games with bargaining solution. J. Optim. Theory Appl. 1976, 18, 31–39.
2. Petrosyan, L. Time-consistency of solutions in multi-player differential games. Astronomy 1977, 4, 46–52.
3. Petrosyan, L.A.; Danilov, N.N. Stability of solutions in non-zero sum differential games with transferable payoffs. Astronomy 1979, 1, 52–59.
4. Gromova, E.V.; Plekhanova, T.M. On the regularization of a cooperative solution in a multistage game with random time horizon. Discret. Appl. Math. 2019, 255, 40–55.
5. Haurie, A.; Krawczyk, J.B.; Zaccour, G. Games and Dynamic Games; Scientific World: Singapore, 2012.
6. Kuzyutin, D. On the problem of the stability of solutions in extensive games. Vestn. St. Petersburg Univ. Math. 1995, 4, 18–23.
7. Kuzyutin, D. On the consistency of weak equilibria in multicriteria extensive games. In Contributions to Game Theory and Management; Petrosyan, L.A., Zenkevich, N.A., Eds.; St. Petersburg State Univ. Press: St. Petersburg, Russia, 2012; Volume V, pp. 168–177.
8. Kuzyutin, D.; Gromova, E.; Pankratova, Y. Sustainable cooperation in multicriteria multistage games. Oper. Res. Lett. 2018, 46, 557–562.
9. Kuzyutin, D.; Gromova, E.; Smirnova, N. On the cooperative behavior in multistage multicriteria game with chance moves. In Mathematical Optimization Theory and Operations Research (MOTOR 2020); Intern. Conference Proceedings; LNCS Series; Springer: Berlin, Germany, 2020; forthcoming.
10. Kuzyutin, D.; Nikitina, M. Time consistent cooperative solutions for multistage games with vector payoffs. Oper. Res. Lett. 2017, 45, 269–274.
11. Kuzyutin, D.; Nikitina, M. An irrational behavior proof condition for multistage multicriteria games. In Constructive Nonsmooth Analysis and Related Topics (Dedic. to the Memory of V.F.Demyanov); CNSA 2017; IEEE: New York, NY, USA, 2017; pp. 178–181.
12. Petrosyan, L.A.; Danilov, N.N. Cooperative Differential Games and Their Applications; Publishing House of Tomsk University: Tomsk, Russia, 1985.
13. Petrosyan, L.A.; Kuzyutin, D.V. On the stability of E-equilibrium in the class of mixed strategies. Vestn. St. Petersburg Univ. Math. 1995, 3, 54–58.
14. Parilina, E.; Zaccour, G. Node-consistent Shapley value for games played over event trees with random terminal time. J. Optim. Theory Appl. 2017, 175, 236–254.
15. Kuzyutin, D.; Smirnova, N.; Gromova, E. Long-term implementation of the cooperative solution in multistage multicriteria game. Oper. Res. Perspect. 2019, 6, 100107.
16. Parilina, E.; Zaccour, G. Node-consistent core for games played over event trees. Automatica 2015, 55, 304–311.
17. Petrosyan, L.; Kuzyutin, D. Games in Extensive Form: Optimality and Stability; Saint Petersburg University Press: St. Petersburg, Russia, 2000.
18. Petrosyan, L.; Zaccour, G. Time-consistent Shapley value allocation of pollution cost reduction. J. Econ. Dyn. Control 2003, 27, 381–398.
19. Sedakov, A. On the Strong Time Consistency of the Core. Autom. Remote Control 2018, 79, 757–767.
20. Reddy, P.; Shevkoplyas, E.; Zaccour, G. Time-consistent Shapley value for games played over event trees. Automatica 2013, 49, 1521–1527.
21. Zakharov, V.; Dementieva, M. Multistage Cooperative Games and Problem of Time Consistency. Int. Game Theory Rev. 2004, 6, 157–170.
22. Kuhn, H. Extensive games and the problem of information. Ann. Math. 1953, 28, 193–216.
23. Myerson, R. Game Theory. Analysis of Conflict; Harvard University Press: Cambridge, MA, USA, 1997.
24. Chandler, P.; Tulkens, H. The core of an economy with multilateral environmental externalities. Int. J. Game Theory 1997, 26, 379–401.
25. Selten, R. Reexamination of the Perfectness Concept for Equilibrium Points in Extensive Games. Int. J. Game Theory 1975, 4, 25–55.
26. Petrosyan, L.; Zenkevich, N. Game Theory; World Scientific Publisher: Singapore; London, UK, 1996.
27. Kuzyutin, D.; Osokina, O.; Romanenko, I. On the consistency of optimal behavior in extensive games. In Game Theory and Applications; Petrosyan, L., Mazalov, V., Eds.; Nova Science Publ.: New York, NY, USA, 1997; pp. 107–116.
28. Petrosyan, L.A.; Mamkina, S.I. Games with changing coalitional partition. Vestn. St. Petersburg Univ. Math. 2004, 3, 60–69.
29. McKelvey, R.; McLennan, A.; Turocy, T. Gambit: Software Tools for Game Theory. Version 16.0.1. 2016. Available online: http://www.gambit-project.org (accessed on 25 June 2020).
30. Savani, R.; von Stengel, B. Game theory explorer—software for the applied game theorist. Comput. Manag. Sci. 2015, 12, 5–33.
31. Von Stengel, B. Computing equilibria for two-person games. In Handbook of Game Theory; Aumann, R., Hart, S., Eds.; North-Holland: Amsterdam, Netherlands, 2002; Volume 3, pp. 1723–1759.
32. Lemke, C. Bimatrix equilibrium points and mathematical programming. Manag. Sci. 1965, 11, 681–689.
33. Shapley, L. A value for n-person games. In Contributions to the Theory of Games, II; Kuhn, H., Tucker, A.W., Eds.; Princeton University Press: Princeton, NJ, USA, 1953; pp. 307–317.
34. Schmeidler, D. The nucleolus of a characteristic function game. SIAM J. Appl. Math. 1969, 17, 1163–1170.
35. Petrosian, O.; Zakharov, V. IDP-Core: Novel Cooperative Solution for Differential Games. Mathematics 2020, 8, 721.
36. Yeung, D.; Petrosyan, L. Subgame consistent cooperative solutions in stochastic differential games. J. Optim. Theory Appl. 2004, 120, 651–666.
37. Yeung, D.; Petrosyan, L. Subgame consistent solutions for cooperative stochastic dynamic games. J. Optim. Theory Appl. 2010, 145, 579–596.
38. Yeung, D.; Petrosyan, L. Subgame-consistent cooperative solutions in randomly furcating stochastic dynamic games. Math. Comput. Model. 2013, 57, 976–991.
39. Breton, M.; Dahmouni, I.; Zaccour, G. Equilibria in a two-species fishery. Math. Biosci. 2019, 309, 78–91.
40. Crettez, B.; Hayek, N.; Zaccour, G. Do charities spend more on their social programs when they cooperate than when they compete? Eur. J. Oper. Res. 2020, 283, 1055–1063.
41. Finus, M. Game Theory and International Environmental Cooperation; Edward, E., Ed.; Edward Elgar Publ.: Northampton, MA, USA, 2001.
42. Mazalov, V.V.; Rettiyeva, A.N. The discrete-time bioresource sharing model. J. Appl. Math. Mech. 2011, 75, 180–188.
43. Ougolnitsky, G.; Usov, A. Spatially distributed differential game theoretic model of fisheries. Mathematics 2019, 7, 732.
44. Yeung, D.; Petrosyan, L. Subgame Consistent Economic Optimization: An Advanced Cooperative Dynamic Game Analysis; Springer: New York, NY, USA, 2012.
45. Nash, J.F. Equilibrium points in n-person games. Proc. Nat. Acad. Sci. USA 1950, 36, 48–49.
46. Birch, B.J. On games with almost complete information. Proc. Camb. Philos. Soc. 1955, 51, 275–287.
47. Dalkey, N. Equivalence of information patterns and essentially determinate games. Contrib. Theory Games 1953, 21, 217–244.
48. Gromova, E.V.; Petrosyan, L.A. On an approach to constructing a characteristic function in cooperative differential games. Autom. Remote Control 2017, 78, 1680–1692.

Figure 1. 3-person extensive-form game: A-Subgame Perfect Equilibria (SPE) algorithm implementation.

Figure 2. 3-player extensive-form game: cooperative behavior.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
