Critical Discount Factor Values in Discounted Supergames

This paper examines the subgame-perfect equilibria in symmetric 2 × 2 supergames. We solve for the smallest discount factor value for which the players obtain all the feasible and individually rational payoffs as equilibrium payoffs. We show that the critical discount factor values are not that high in many games and that they generally depend on how large the payoff set is compared to the set of feasible payoffs. We analyze how the stage-game payoffs affect the required level of patience and organize the games into groups based on similar behavior. We study how the different strategies affect the set of equilibria by comparing pure, mixed and correlated strategies. This helps us better understand how discounting affects the set of equilibria, and we can identify the games where extreme patience is required and the type of payoffs that are difficult to obtain. We also observe discontinuities in the critical values, which means that small changes in the stage-game payoffs may dramatically affect the equilibrium payoffs.


Introduction
The folk theorem tells us that any feasible and individually rational (FIR) payoff is an equilibrium payoff when the players are patient enough [1][2][3][4]. However, the players may not be extremely patient; rather, they may have some intermediate value for the discount factor. This paper finds the smallest discount factor value for which the FIR payoffs are the equilibrium payoffs in the symmetric 2 × 2 supergames. This extends the folk theorem by solving exactly how patient the players need to be as a function of the stage-game payoffs. This reveals why and in what type of games extreme patience is required in the folk theorem and what payoffs are difficult to obtain.
This paper compares three types of strategies: pure strategies with and without public correlation and mixed strategies without public correlation. We want to examine how these assumptions affect the results. In some applications, it may be reasonable to assume that the players may only use pure strategies and that they may not be able to coordinate their actions using a correlation device. The pure-strategy model has been examined in [5][6][7][8]. These papers characterize the subgame-perfect equilibria with a set-valued fixed-point equation, which forms the basis of our analysis. The mixed-strategy model has been studied in the more general model of imperfect public monitoring [9][10][11]; see also [12,13] for the model examined in this paper.
The computation of the payoff set has been examined in [14][15][16][17][18]. These papers assume public correlation, which makes the payoff set convex and simplifies the computation dramatically.
References [19,20] have developed a method for computing pure-strategy equilibria without public correlation.
The main result of this paper is to solve for the critical values, i.e., the smallest discount factor values for which the players obtain all the FIR payoffs as equilibrium payoffs. The results are based on solving analytically and geometrically the fixed-point equation of [7,8]. This idea has been presented in [21], where the critical values are solved in a class of prisoner's dilemma games under public randomization; see also Sections 2.5.3 and 2.5.6 in [11]. The pure-strategy model without public correlation is a quite straightforward extension of [21], but the mixed-strategy model requires analyzing totally new types of strategies with complicated payoffs; see [12,13]. We restrict our analysis to the symmetric 2 × 2 games since the set of pure-strategy equilibria may be empty in asymmetric games, it may be difficult to find the smallest equilibrium payoffs [22], and it is tedious to examine all the asymmetric games.
The results of this paper can be used (i) to identify the games where the players have trouble in obtaining all the payoffs and to find the payoffs that are difficult to obtain, (ii) to understand better how the equilibrium payoffs depend on the discount factor, and (iii) to find a range of discount factor values for which the computation of equilibria is easy. For example, we know that the payoff set coincides with the FIR payoffs for all the discount factor values above the critical value, and there is no need to compute the payoff set for these values. We also observe discontinuities in the critical values, which means that the set of equilibria may not behave well, i.e., small changes in the stage-game payoffs may dramatically affect the payoff set.
We organize the 2 × 2 games into groups based on the equations that determine the critical values and provide a useful visualization of the critical values in different games. From the figures, it is easy to see when a high level of patience is needed and to compare the different strategies.

Stage Games
In a repeated game, a stage game is played again and again by the same players. A stage game is defined by a finite set of players $N = \{1, \ldots, n\}$, a finite set of pure actions $A_i$ for each player $i \in N$, and the players' utilities for each action profile $u : A \to \mathbb{R}^n$, where $A = \times_{i \in N} A_i$ is the set of pure-action profiles. Moreover, a pure action of player $i$ is denoted by $a_i \in A_i$ and a pure-action profile by $a \in A$.
Each player $i \in N$ may randomize over his pure actions $a_i \in A_i$. This defines a mixed action $q_i$ such that $q_i(a_i) \geq 0$ for each $a_i \in A_i$ and $\sum_{a_i \in A_i} q_i(a_i) = 1$. The set of probability distributions over $A_i$ is denoted by $Q_i$, and $Q = \times_{i \in N} Q_i$. A mixed-action profile is denoted by $q = (q_1, \ldots, q_n) \in Q$. The support of a mixed action is the set of pure actions that are played with a strictly positive probability: $\mathrm{Supp}(q_i) = \{a_i \in A_i : q_i(a_i) > 0\}$. We also define $\mathrm{Supp}(q) = \times_{i \in N} \mathrm{Supp}(q_i)$, and for each $a \in \mathrm{Supp}(q)$, we let $\pi_q(a)$ be the probability that the action profile $a$ is realized if the mixed-action profile $q$ is played: $\pi_q(a) = \prod_{j \in N} q_j(a_j)$. In pure strategies, we make the restriction that $q_i(a_i) = 1$ for one action $a_i \in A_i$.
In a model with public correlation, the players observe a realization $\omega \in [0, 1]$ of a public lottery and they can condition their actions on the signal $\omega$. For example, two players may agree to take action $a^1$ if $\omega \leq 1/2$ and $a^2$ otherwise. This way the players can coordinate their actions such that they randomize between the outcomes $(a^1, a^1)$ and $(a^2, a^2)$, and avoid the outcomes $(a^1, a^2)$ and $(a^2, a^1)$. The latter outcomes would be realized with positive probability if standard mixed strategies were used and no public correlation device was available.
The stage-game payoffs are given by the function $u : Q \to \mathbb{R}^n$. For example, if the players choose a mixed-action profile $q \in Q$, then player $i$ receives an expected payoff of
$$u_i(q) = \sum_{a \in A} u_i(a)\, \pi_q(a).$$
Let $q_{-i} \in Q_{-i} = \times_{j \in N, j \neq i} Q_j$ denote the actions of player $i$'s opponents. Now, an action profile $q$ is a Nash equilibrium of the stage game if no player has a profitable deviation, i.e.,
$$u_i(q) \geq u_i(q_i', q_{-i}) \quad \text{for all } q_i' \in Q_i \text{ and } i \in N.$$
The twelve symmetric strict ordinal 2 × 2 games are presented in Figure 1; see ref. [23] for the taxonomy. Strict ordinality means that all of the payoffs must be unequal and there can be no indifferences. The two actions are C (cooperate) and D (defect), and they give the players the payoffs a = 1, b, c and d = 0; the corresponding action profiles are also denoted by the letters a, b, c, d. For example, if the players choose the action profile b = (C, D), the players receive the payoffs b and c.
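The definitions above can be illustrated with a minimal Python sketch. It assumes the normalization a = 1 and d = 0 from the text; the function names and the numerical values b = -0.5 and c = 1.5 are hypothetical and chosen only for illustration. Because the expected payoff is linear in a player's own mixed action, checking pure deviations is enough for the Nash test.

```python
def stage_payoffs(b, c):
    """Payoff table of a symmetric 2x2 game with a = 1 and d = 0 (row player first)."""
    return {
        ("C", "C"): (1.0, 1.0),
        ("C", "D"): (b, c),
        ("D", "C"): (c, b),
        ("D", "D"): (0.0, 0.0),
    }

def profile_probability(q, profile):
    """pi_q(a): product of each player's probability of playing their part of a."""
    p = 1.0
    for q_i, a_i in zip(q, profile):
        p *= q_i.get(a_i, 0.0)
    return p

def expected_payoff(u, q):
    """u_i(q) = sum over profiles a of u_i(a) * pi_q(a), for both players."""
    totals = [0.0, 0.0]
    for profile, payoff in u.items():
        p = profile_probability(q, profile)
        totals[0] += p * payoff[0]
        totals[1] += p * payoff[1]
    return tuple(totals)

def is_nash(u, q, tol=1e-9):
    """True if no player gains by switching to a pure action."""
    base = expected_payoff(u, q)
    for i in range(2):
        for action in ("C", "D"):
            deviation = list(q)
            deviation[i] = {action: 1.0}
            if expected_payoff(u, tuple(deviation))[i] > base[i] + tol:
                return False
    return True

u = stage_payoffs(b=-0.5, c=1.5)          # hypothetical prisoner's dilemma
defect = ({"D": 1.0}, {"D": 1.0})
cooperate = ({"C": 1.0}, {"C": 1.0})
print(is_nash(u, defect), is_nash(u, cooperate))  # True False
```

In this example mutual defection is the stage-game equilibrium, while mutual cooperation is not, because a player can deviate to the payoff c > 1.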

Repeated Games
We examine a model where the stage game is repeated infinitely many times; such games are sometimes called supergames. We assume that the players observe all the past realized pure actions but not the probabilities that the other players are using in their mixed strategies. The public past play is denoted by the set of histories $H^k = A^k = \prod_k A$, where $H^0 = A^0 = \{\emptyset\}$ contains only the empty history and corresponds to the beginning of the game. Thus, the history contains all the pure actions that were played in the previous rounds. The set of all possible histories is $H = \bigcup_{k=0}^{\infty} H^k$. A behavior strategy $\sigma_i$ of player $i \in N$ is a mapping that assigns a probability distribution over player $i$'s pure actions to each possible history, $\sigma_i : H \to Q_i$. The set of player $i$'s strategies is $\Sigma_i$. The players' strategies form a strategy profile $\sigma = (\sigma_1, \ldots, \sigma_n)$, a strategy profile of all players except player $i$ is denoted by $\sigma_{-i}$, and the set of strategy profiles is given by $\Sigma = \times_{i \in N} \Sigma_i$. A pure strategy assigns a pure action to each possible history, $\sigma_i : H \to A_i$. With public correlation, the players observe a public signal $\omega^k \in [0, 1]$ on each round $k$ before making their decisions, and thus the history contains all the past pure actions, the past signals and the current signal.
We assume that the players discount the future payoffs with a common discount factor $\delta \in [0, 1)$. They have the same discount factor since we examine symmetric games. The expected discounted payoff of a strategy profile $\sigma$ to player $i$ is
$$U_i(\sigma) = (1 - \delta) \sum_{k=0}^{\infty} \delta^k u_i^k(\sigma),$$
where $u_i^k(\sigma)$ is the payoff of player $i$ at round $k$ induced by the strategy profile $\sigma$. A strategy profile $\sigma$ is a Nash equilibrium if no player has a profitable deviation, i.e.,
$$U_i(\sigma) \geq U_i(\sigma_i', \sigma_{-i}) \quad \text{for all } \sigma_i' \in \Sigma_i \text{ and } i \in N,$$
and it is a subgame-perfect equilibrium (SPE) if it induces a Nash equilibrium in every subgame, i.e.,
$$U_i(\sigma|h) \geq U_i(\sigma_i', \sigma_{-i}|h) \quad \text{for all } \sigma_i' \in \Sigma_i,\ i \in N \text{ and } h \in H,$$
where $\sigma|h$ is the restriction of the strategy profile after history $h \in H$. From now on, by equilibrium we mean subgame-perfect equilibrium.
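As a rough illustration of the discounted payoff, the following sketch evaluates a path that plays a finite prefix once and then repeats a cycle forever. It assumes that payoffs are normalized by the factor (1 - δ), which is consistent with the augmented payoffs (1 − δ)u(a) + δx(a) used later in the text; the function name and the example paths are hypothetical.

```python
def discounted_payoff(cycle, delta, prefix=()):
    """Normalized discounted payoff (1 - delta) * sum_k delta**k * u_k of one player.

    The path plays the stage payoffs in `prefix` once and then repeats the
    stage payoffs in `cycle` forever.
    """
    if not 0 <= delta < 1:
        raise ValueError("delta must lie in [0, 1)")
    if not cycle:
        raise ValueError("cycle must contain at least one stage payoff")
    head = sum(delta**k * u for k, u in enumerate(prefix))
    one_cycle = sum(delta**k * u for k, u in enumerate(cycle))
    # Geometric series over the repetitions of the cycle.
    tail = delta**len(prefix) * one_cycle / (1 - delta**len(cycle))
    return (1 - delta) * (head + tail)

# Alternating between the profiles b and c with hypothetical payoffs b = -0.5, c = 1.5.
print(discounted_payoff((-0.5, 1.5), delta=0.5))
# One-shot deviation to c = 1.5 followed by mutual defection (payoff 0) forever.
print(discounted_payoff((0.0,), delta=0.5, prefix=(1.5,)))
```

With this normalization, a constant path that yields the stage payoff 1 in every round has the discounted payoff 1 for any δ, so repeated-game payoffs are directly comparable with stage-game payoffs.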

Critical Discount Factor Values
Let $V$ be the compact set of SPE payoffs; we also write $V(\delta)$ when we want to emphasize the players' discount factor $\delta$. By $V^C$, $V^P$ and $V^M$ we refer to the equilibrium payoffs in pure strategies with and without public correlation, and in mixed strategies, respectively.
Player $i$'s minimum equilibrium payoff, which is also called the punishment payoff, is denoted by $v_i^-(\delta) = \min\{v_i : v \in V(\delta)\}$; the minimum exists when $V(\delta)$ is nonempty, and this is the case in the symmetric 2 × 2 supergames. Similarly, the maximum equilibrium payoff is $v_i^+(\delta) = \max\{v_i : v \in V(\delta)\}$. Please note that the minimax payoffs can be different in pure and mixed strategies. However, it holds under perfect monitoring that [4,11]. The player's minimum and maximum payoffs in a compact set $W$ are denoted by $v_i^-(W) = \min\{w_i : w \in W\}$ and $v_i^+(W) = \max\{w_i : w \in W\}$. Let $V^\dagger = \mathrm{co}(\{u(a) : a \in A\})$ be the set of feasible payoffs, where co denotes the convex hull of the set. The set of feasible and individually rational (FIR) payoffs is $V^* = \{v \in V^\dagger : v_i \geq \underline{v}_i \text{ for all } i \in N\}$, where $\underline{v}_i$ denotes player $i$'s minimax payoff. Let us denote the critical discount factor by $\delta^F = \min\{\delta : V(\delta) = V^*\}$, which gives the smallest discount factor value for which the payoff set coincides with the FIR payoffs. Please note that $V^*$ is convex and thus $V(\delta) = V^*$ for all $\delta \geq \delta^F$ by Theorem 3. Please note that the minimum equilibrium payoff $v_i^-(\delta)$ may be strictly higher than the minimax payoff $\underline{v}_i$, and then it is impossible to obtain all the FIR payoffs for the given $\delta < 1$. The minimum pure-strategy payoffs have been studied in [24], and [22] presents an algorithm for finding the punishment paths and payoffs.
For most of the symmetric 2 × 2 games, the minimum equilibrium payoffs in pure strategies are equal to the minimax payoffs for all discount factors. No conflict, its anti-game and anti-stag hunt games are the only exceptions. In these games, the minimum payoffs are equal to the minimax values when the players are patient enough. This issue does not affect the results, since it can be shown that the required levels of patience are smaller than the critical values.
The minimax payoffs in mixed strategies are the same as in pure strategies, except in leader, battle of the sexes, coordination and anti-coordination games. In these games, the minimax payoff is given by a mixed-strategy Nash equilibrium. Thus, $v^-(\delta) = \underline{v}$ for all $\delta$ in mixed strategies. It can be shown that the FIR payoffs are not obtained for any $\delta < 1$ in these games.
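The difference between pure and mixed minimax payoffs can be checked numerically with a small sketch. It assumes the normalization a = 1 and d = 0 and uses hypothetical function names and payoff values. The opponent plays C with probability p, and player 1's best-reply value is the maximum of two linear functions of p, so the minimum is found at an endpoint or at the point where player 1 is indifferent.

```python
def pure_minimax(b, c):
    """Player 1's minimax payoff when the opponent is restricted to pure actions."""
    against_c = max(1.0, c)   # opponent plays C: player 1 gets a = 1 or c
    against_d = max(b, 0.0)   # opponent plays D: player 1 gets b or d = 0
    return min(against_c, against_d)

def mixed_minimax(b, c):
    """Player 1's minimax payoff when the opponent plays C with probability p."""
    def best_reply_value(p):
        return max(p * 1.0 + (1 - p) * b, p * c)

    candidates = [0.0, 1.0]
    denom = 1.0 - b - c            # zero when the two payoff lines are parallel
    if denom != 0:
        p_star = -b / denom        # player 1 is indifferent between C and D
        if 0 < p_star < 1:
            candidates.append(p_star)
    return min(best_reply_value(p) for p in candidates)

# Hypothetical prisoner's dilemma: both minimax payoffs equal d = 0.
print(pure_minimax(-0.5, 1.5), mixed_minimax(-0.5, 1.5))
# Hypothetical game in which (C, C) and (D, D) are both stage-game equilibria:
# the mixed minimax (about -0.2) is strictly below the pure minimax (0).
print(pure_minimax(-1.0, -0.5), mixed_minimax(-1.0, -0.5))
```

In the second example the minimizing probability p = 0.4 is exactly the one that makes player 1 indifferent, which is how a mixed-strategy Nash equilibrium pins down the minimax payoff.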

The Characterization of Equilibria
A pair $(a, w)$ of an action profile $a \in A$ and a continuation payoff $w \in W$ is admissible with respect to $W$ if it satisfies
$$(1 - \delta) u_i(a) + \delta w_i \geq \max_{a_i' \in A_i} \left[ (1 - \delta) u_i(a_i', a_{-i}) + \delta v_i^-(W) \right] \quad \text{for all } i \in N.$$
This incentive-compatibility condition means that it is better for player $i$ to take action $a_i$ and get the continuation payoff $w_i$ than to deviate and then obtain $v_i^-(W)$. For a set of continuation payoffs $W$, the set of supportable action profiles is denoted by
$$F^\delta(W) = \{a \in A : (a, w) \text{ is admissible for some } w \in W\}.$$
For $a \in F^\delta(W)$, we denote the set of admissible continuation payoffs as
$$C_a^\delta(W) = \{w \in W : (a, w) \text{ is admissible}\}.$$
Let $D_a^\delta : \mathbb{R}^n \to \mathbb{R}^n$ be an affine mapping that corresponds to an action profile $a \in A$ and a discount factor $\delta$,
$$D_a^\delta(w) = (I - T) u(a) + T w,$$
where $I$ is an $n \times n$ identity matrix and $T$ is an $n \times n$ diagonal matrix with the discount factor $\delta$ on the diagonal. The mapping $D_a^\delta$ is also defined for sets; then the addition is the Minkowski sum and $D_a^\delta(\emptyset) = \emptyset$. Finally, we denote the admissible payoffs that start with an action profile $a \in A$ by
$$B_a^\delta(W) = D_a^\delta(C_a^\delta(W)).$$
Theorem 1. The set of pure-strategy subgame-perfect equilibrium payoffs $V^P$ is the unique largest compact set satisfying the fixed-point equation [8,11]
$$W = B^\delta(W) = \bigcup_{a \in A} B_a^\delta(W).$$
The payoff set under public correlation $V^C$ is given by the largest compact set satisfying [11]
$$W = \mathrm{co}(B^\delta(W)).$$
Let us now characterize the set of equilibria in mixed strategies [12,13]. In a repeated game, the play at each round is strategically equivalent to playing an augmented stage game, where the continuation payoffs are included in the payoffs. For each action profile $a \in A$, the payoff in the augmented game is given by
$$\tilde{u}^\delta(a) = (1 - \delta) u(a) + \delta x(a),$$
where $x(a)$ is the continuation payoff after $a$. Please note that in pure strategies there are only two continuation payoffs for each player: $w_i$ if the player follows the equilibrium path or $v_i^-$ if the player deviates. In mixed strategies, there can be a different continuation payoff $x(a)$ after each action profile $a \in A$. Let $M(u(a))$ denote the set of Nash equilibrium payoffs in a stage game with payoffs $u(a)$, $a \in A$.
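The incentive-compatibility check and the affine mapping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: payoffs are normalized by (1 - δ), a deviation is followed by the punishment payoff of the deviating player, and the function names, the payoff table and the punishment vector (0, 0) are hypothetical.

```python
ACTIONS = ("C", "D")

def affine_map(u_a, w, delta):
    """Componentwise (1 - delta) * u(a) + delta * w for a payoff vector w."""
    return tuple((1 - delta) * u_i + delta * w_i for u_i, w_i in zip(u_a, w))

def is_admissible(u, a, w, punishment, delta, tol=1e-12):
    """Check that following a with continuation w beats every one-shot deviation."""
    for i in range(2):
        follow = (1 - delta) * u[a][i] + delta * w[i]
        for deviation in ACTIONS:
            profile = list(a)
            profile[i] = deviation
            deviate = (1 - delta) * u[tuple(profile)][i] + delta * punishment[i]
            if follow < deviate - tol:
                return False
    return True

# Hypothetical prisoner's dilemma with a = 1, b = -0.5, c = 1.5, d = 0.
u = {
    ("C", "C"): (1.0, 1.0),
    ("C", "D"): (-0.5, 1.5),
    ("D", "C"): (1.5, -0.5),
    ("D", "D"): (0.0, 0.0),
}
cc = ("C", "C")
print(is_admissible(u, cc, (1.0, 1.0), (0.0, 0.0), delta=0.4))  # True
print(is_admissible(u, cc, (1.0, 1.0), (0.0, 0.0), delta=0.3))  # False
print(affine_map(u[cc], (0.0, 0.0), delta=0.5))                 # (0.5, 0.5)
```

In this example the pair of mutual cooperation and the continuation payoff (1, 1) becomes admissible once δ reaches 1/3, since the deviation payoff (1 − δ) · 1.5 then no longer exceeds 1.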
Now, we are ready to state the characterization for the subgame-perfect equilibrium payoffs [12,13]. This result has been derived earlier in a more general model of imperfect monitoring [11].
Theorem 2. The payoff set $V^M$ is the largest compact fixed point of $B$:
$$W = B(W) = \bigcup_{x(a) \in W,\ a \in A} M\big((1 - \delta) u(a) + \delta x(a)\big).$$
This means that the payoff set $V^M$ corresponds to the set of equilibria in augmented stage games where the payoffs are $(1 - \delta) u(a) + \delta x(a)$ and each $x(a)$ can be chosen from the set $V^M$.
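The augmented stage game can be illustrated with a short sketch that builds the payoffs (1 − δ)u(a) + δx(a) and lists the pure equilibria of the resulting 2 × 2 game. The continuation payoffs x(a) below are hypothetical choices made for illustration (cooperation is rewarded with (1, 1) and every other outcome is followed by (0, 0)); the sketch does not verify that they are equilibrium payoffs, and it only searches for pure equilibria rather than the full set M.

```python
from itertools import product

ACTIONS = ("C", "D")

def augmented_payoffs(u, x, delta):
    """(1 - delta) * u(a) + delta * x(a) for every pure-action profile a."""
    return {
        a: tuple((1 - delta) * u[a][i] + delta * x[a][i] for i in range(2))
        for a in u
    }

def pure_equilibria(game, tol=1e-12):
    """Pure-action profiles of a 2x2 game from which no player gains by deviating."""
    found = []
    for a in product(ACTIONS, repeat=2):
        stable = True
        for i in range(2):
            for deviation in ACTIONS:
                other = list(a)
                other[i] = deviation
                if game[tuple(other)][i] > game[a][i] + tol:
                    stable = False
        if stable:
            found.append(a)
    return found

# Hypothetical prisoner's dilemma with a = 1, b = -0.5, c = 1.5, d = 0.
u = {
    ("C", "C"): (1.0, 1.0),
    ("C", "D"): (-0.5, 1.5),
    ("D", "C"): (1.5, -0.5),
    ("D", "D"): (0.0, 0.0),
}
# Hypothetical continuation payoffs x(a).
x = {a: (0.0, 0.0) for a in u}
x[("C", "C")] = (1.0, 1.0)

print(pure_equilibria(augmented_payoffs(u, x, delta=0.5)))  # [('C', 'C'), ('D', 'D')]
print(pure_equilibria(augmented_payoffs(u, x, delta=0.2)))  # [('D', 'D')]
```

The example shows how the continuation payoffs change the strategic situation: with δ = 0.5 cooperation is an equilibrium of the augmented game, whereas with δ = 0.2 the immediate gain from defecting dominates.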

Monotonicity and Helpful Results
In this subsection, by $B(V)$ we refer to the operator for pure strategies without public correlation, pure strategies with public correlation, or mixed strategies, depending on which strategies are in question. A set $W$ is called self-generating if $W \subseteq B^\delta(W)$. The following result follows directly from Theorems 1 and 2 and Equation (13).
The following shows that the payoff set is monotone in the discount factor when it is convex [8,11,24,25].
This is the right-hand side of Equation (8) for the column that contains the maximum payoff in the game. Remark 1.
for all a and i where player i can deviate to the maximum payoff of the stage game.
The following result describes how the sets $B_a$, $a \in A$, may cover the boundaries of $V^*$. The result implies that we need as many sets $B_a$, $a \in A$, to cover the FIR payoffs as there are corner points in $V^*$.
Proposition 2. The set $B_a^\delta(V^*)$, $a \in A$, may only cover the corner point of $V^*$ closest to $u(a)$. It cannot cover the other corner points or the boundary of $V^*$ between these other corner points.
Proof. By the definition of $D_a^\delta$, the image of $V^*$ is a copy of $V^*$ contracted by the factor $\delta$ and is thus strictly smaller than $V^*$. The translation part $(I - T) u(a)$ moves the set towards $u(a)$.

Results for Different Strategies
We now present the results for the three different strategies; the proofs are given in the appendix. Section 4 gives an example of the proofs in a prisoner's dilemma game. The proofs for the mixed strategies in Appendix C are novel. The principle behind the proofs is the same, but finding all the mixed-strategy payoffs is more complicated.
The results are based on Theorems 1 and 2 and Equation (13), which tell us that the payoff set coincides with the FIR payoffs when the admissible payoffs cover all the payoffs in $V^*$. To find the critical value $\delta^F$, we need to find the smallest discount factor for which this happens. The main idea is to find the last payoff point $v^F \in V^*$ that is covered when the discount factor is increased to $\delta^F$. The value $\delta^F$ is typically solved from a condition that two sets, say $B_b$ and $B_c$, intersect. In the proofs, we show both the necessary and sufficient conditions for $\delta^F$: the point $v^F$ is not covered for a smaller discount factor value and all the other FIR payoffs are covered for the given value $\delta^F$.
The results are given in Tables 1-3. They show the values of $\delta^F$ for different groups of games. Please note that the group boundaries cross the game boundaries. For example, there are two groups of prisoner's dilemma (PD) games in pure strategies without public correlation: the quadrilateral PDs belong to Group I (see the example in Figure 2a) and the triangle PDs to Group IIIb, and the boundary between these groups is given by the equation $c = 2 - b$. Also, different games may belong to the same group: the triangle PDs and triangle chicken games all belong to Group IIIb. Figure 3 shows the groups for the different strategies. The thick grey lines show that there are discontinuities of $\delta^F$ between the groups. This means that the value of $\delta^F$ is not continuous and there may be a jump when $b$ and $c$ are changed. Figure 4 shows the values of $\delta^F$ in different games.
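The split of the prisoner's dilemma games into quadrilateral and triangle cases can be sketched as a simple classifier. It assumes pure strategies without public correlation, the normalization a = 1 and d = 0, and the prisoner's dilemma ordering c > 1 > 0 > b; the function name is hypothetical, and the treatment of payoffs exactly on the boundary c = 2 − b is an assumption because the text does not assign them to either group.

```python
def prisoners_dilemma_group(b, c):
    """Group of a prisoner's dilemma in pure strategies without public correlation."""
    if not (c > 1 and b < 0):
        raise ValueError("not a prisoner's dilemma under a = 1, d = 0")
    if c < 2 - b:
        return "I"      # quadrilateral: u(a) = (1, 1) is a corner of the FIR set
    if c > 2 - b:
        return "IIIb"   # triangle: u(a) lies inside the hull of the other payoffs
    return "boundary"   # c = 2 - b; left unclassified in this sketch

print(prisoners_dilemma_group(-0.5, 1.5))  # I
print(prisoners_dilemma_group(-0.5, 3.0))  # IIIb
```

The condition c < 2 − b is equivalent to b + c < 2, which says that the point (1, 1) lies beyond the line segment joining u(b) = (b, c) and u(c) = (c, b).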
Table 1. The values of the discount factor $\delta^F$ without public correlation. Some games have multiple groups, and the equation that gives the boundary is shown on the right.

Game First Group Second Group Group Boundary
Prisoner's dilemma I:
Battle of the sexes IIIc:

Game First Group Second Group Third Group
Prisoner's dilemma II: $\delta^F = (c - 1)/c$ Ib:
Anti-stag hunt

Table 3. The values of the discount factor $\delta^F$ in mixed strategies.

Game First Group Second Group Third Group
Prisoner's dilemma Ib:

The groups are based on the equations that determine the value of $\delta^F$ and the location of the last payoff $v^F$. We note that our classification is heuristic, and the games could be organized differently into groups. In pure strategies without public correlation, $v^F$ is found on the upper edge of $V^*$ between $u(a)$ and $u(b)$ in Group I, on the bottom edge between $u(b)$ and $u(d)$ in Group II, and in the middle in Groups IIIa-d. In Group IV, we have $\delta^F = 1$, and in Group V, $\delta^F = 0$.
With public correlation, the groups are based on the last corner point to be covered: in Group I, the last corner is in the northwest, corresponding to $u(b)$ or $u(c)$; in Group II, it is the $u(a)$ corner; and in Group III, it is the $u(d)$ corner.
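As a hedged illustration of the case where the $u(a)$ corner is the last one to be covered, the following worked inequality considers a prisoner's dilemma with $a = 1$, $d = 0$ and $c > 1$. It assumes that the punishment payoff equals the minimax value 0 and that the corner $u(a) = (1, 1)$ is supported by playing $a$ with the continuation payoff $(1, 1)$, while the best deviation yields $c$ once. Under these assumptions the incentive-compatibility condition reduces to the following.

```latex
\begin{align*}
(1 - \delta) \cdot 1 + \delta \cdot 1 &\geq (1 - \delta)\, c + \delta \cdot 0 \\
1 &\geq (1 - \delta)\, c \\
\delta &\geq \frac{c - 1}{c}
\end{align*}
```

For instance, $c = 3/2$ gives $\delta \geq 1/3$. The bound agrees with the prisoner's dilemma entry $\delta^F = (c - 1)/c$ that appears in the tables.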
In mixed strategies, the groups are the following. In Group Ia, the last point to be covered is on the boundary between $u(b)$ and $u(a)$, i.e., at the intersection of $B_a$ and $B_b$. Group Ib corresponds to the triangle games, where $B_b$ and $B_c$ intersect. In Group II, a corner point determines the value of $\delta^F$: the $u(b)$ corner is the last to be covered in Group IIa and the $u(d)$ corner in Group IIb. The intersection of $B_d$ and $B_b$ determines the value for Group III.
The overview of the values of $\delta^F$ is similar for all the strategies. Groups IV and V are the same, and the locations of the high and low values are about the same. However, the values of $\delta^F$ are much lower with public correlation. The scale runs from 0 to 1 with public correlation, and from 1/2 or 2/3 to 1 with the other strategies. The smallest and the highest values within the game classes are shown with $z_1$ to $z_4$. The difference between the pure and mixed strategies is surprisingly small; the differences are typically smaller than 0.05 in quadrilateral games and smaller than 0.15 in triangle games.
We note that for all groups, there are some payoff parameters for which $\delta^F \to 1$, except in Group V where $\delta^F = 0$ for all payoffs. For example, $\delta^F \to 1$ when $b \to -\infty$ for prisoner's dilemma, stag hunt and coordination games. This means that we cannot extend the folk theorem unless these extreme payoff parameters can be ruled out.
We can observe that the value of $\delta^F$ depends on how large $V^*$ is compared to $V^\dagger$. When $V^*$ is small, it is difficult to play certain actions in the game and the value of $\delta^F$ is high. For example, it is difficult to play the actions $b$ and $c$ in a prisoner's dilemma when $b \to -\infty$. Geometrically, this means that $V^*$ stays almost the same but $V^\dagger$ keeps increasing, making the proportion of $V^*$ to $V^\dagger$ smaller. On the other hand, if $V^*$ is large, then $\delta^F$ is smaller. Please note that Groups IV and V are exceptions, where $\delta^F$ is a constant and thus independent of $V^*$ and $V^\dagger$.
We also note that there are discontinuities in $\delta^F$ on some of the group boundaries; see the thick grey lines in Figure 3. The discontinuity means that small changes in the payoffs can dramatically affect the payoff set and how large $\delta^F$ is. The discontinuity between the prisoner's dilemma games is surprising, and it shows that a small geometric change from the triangle shape to the quadrilateral shape may affect whether the Pareto efficient payoffs are obtained or not.
For mixed strategies, we have a few remarks. The necessary and sufficient conditions for $\delta^F$ are more difficult to prove, since the strategies and their payoffs are more complicated. For Groups IIb and III, the values are only upper bounds since we only use the sufficient conditions. In all leader, battle of the sexes, coordination and anti-coordination games, $\delta^F = 1$, but in the figure we use the values for which the pure-strategy FIR payoffs are obtained as a comparison.

Group I in Pure Strategies without Public Correlation
This group is defined by the parameters $1 < c < 2 - b$ and $b < 0$. These are the prisoner's dilemmas where $V^*$ is a quadrilateral with an obtuse angle at the $u(a)$ corner; see Figure 2a. It is enough to examine the upper half of $V^*$ where $v_2 > v_1$, since $V^*$ is symmetric with respect to the center line. Thus, the last point $v^F$ that is covered is only defined in the upper half of $V^*$.
The point $v^F_I$ in Group I is located on the upper edge between $u(a)$ and $u(b)$. This point and the corresponding discount factor $\delta^F_I$ are solved from the intersection of $B_a$ and $B_b$. It is enough to consider only player 1's payoffs: On the second line, the first payoff $v_1^B(B_a(\delta^F))$ can be solved from the right-hand side of the admissibility condition.
We first show that $v^F \notin B_a^\delta(V^*)$, $a \in A$, if $\delta < \delta^F$. Since the sets $B_a$ and $B_b$ intersect at $\delta^F$, it means that $v^F$ does not belong to either $B_a$ or $B_b$ for $\delta < \delta^F$. Moreover, the sets $B_c$ and $B_d$ cannot cover $v^F$ on the boundary of $V^*$ by Proposition 2. Now, let us show that every We show that all corner points $(v^M(\delta^F), v^M(\delta^F))$, $(0, 0)$ and $(0, v^M(\delta^F))$ belong to $B_d$. $(0, 0)$ is the Nash equilibrium and belongs to $B_d$ for any $\delta$. Also, (0 16), this is equal to $b \leq 0$ and this is true in this group. Finally

Conclusions
This paper examines the discount factor values for which the subgame-perfect equilibrium payoffs coincide with the FIR payoffs in the symmetric 2 × 2 supergames. The main motivation is to study whether the folk theorem could be extended in a class of games and to find out the reasons why a high level of patience such as $\delta \to 1$ is required in some games. We find that the main reason is that it is impossible to obtain payoffs close to the minimax values: (1) this happens in Group IV in all strategies, (2) it is a result of the fact that the mixed-strategy punishment payoff is strictly smaller than the pure-strategy punishment in leader, battle of the sexes, coordination and its anti-game in mixed strategies, and (3) it
) and $v^D(W)$ be the corners of a quadrilateral set $W$ corresponding to the payoffs $u(a)$, $u(b)$, $u(c)$ and $u(d)$. For example, if $u(a)$ and $u(b)$ are the payoffs in the northeast and northwest corners of $V^\dagger$, then $v^A(B_b(\delta))$ and $v^B(B_a(\delta))$ are the northeast and the northwest corners of the sets $B_b$ and $B_a$; see Figure 2a. Moreover, let

Figure 2. Admissible payoffs in Groups I and IIIc. The shaded areas show the $B_a$ sets for $a \in A$. (a) Group I; (b) Group IIIc.

Figure 3. Illustration of the groups in pure, correlated and mixed strategies. The thick grey lines show the discontinuities between the classes.

Figure 4. Values of $\delta^F$ with pure, correlated and mixed strategies. The darker shade means that the value of $\delta^F$ is higher.
at the upper edge and B b covers the edge all the way to

1 and $v_2 \geq v^M(\delta^F)$ then $v$ belongs to either $B_b$ or $B_d$ because the slope between $v^A(B_d)$ and $v^B(B_d)$ is greater than the slope between $v^A(B_b)$ and $v^B(B_b)$. Finally, the region where $0 \leq v_2 \leq v^M(\delta^F)$ is examined with Group II and it is covered since $\delta^F_{IIIa} > \delta^F_{II}$ in this group.

Table 2. The values of the discount factor $\delta^F$ with public correlation.