3.1. General Remarks
The type space is now assumed to be discrete. The relevant expected-utility term, for each type, is defined as in the continuous case, except that the integral over the prior is replaced with a sum. Thus, we keep the same notations and consider the corresponding discrete expressions.
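Concretely, and with generic notation that is not the paper's (writing T for the opponent's finite type space, p for its discrete prior, and u for the utility of an action a), the substitution reads
\[ \int_{T} u(a, t)\, d\mu(t) \;\longrightarrow\; \sum_{t \in T} p(t)\, u(a, t). \]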
When there are only two actions, the notion of threshold strategy remains exactly the same; however, since the type space is discrete, two threshold strategies with different thresholds can prescribe the same action to every type.
Definition 11 (Equivalent strategies and thresholds). Two strategies for player i are said to be equivalent (written ∼) if they prescribe the same action to every type. Two thresholds are said to be equivalent if the associated threshold strategies are equivalent. Notice that the binary relation ∼ restricted to threshold strategies is obviously an equivalence relation.
We also note that, in contrast to the relation between strategies, the relation between thresholds is not an equivalence relation. Indeed, if a threshold leads to a mixed threshold strategy, we cannot write that this threshold is equivalent to itself, because the action prescribed in the mixed case is not constrained by Definition 7. In other words, two strategies with the same threshold can be different (as soon as the mixed case is reached by some type). Note that a threshold strategy is mixed if and only if there is a type at which the mixed case is reached, i.e., a type that falls exactly on the threshold.
Given a threshold strategy, we consider the set of threshold strategies equivalent to it; similarly, given a threshold, we consider the set of thresholds equivalent to it.
Proposition 6. Suppose that a threshold for agent i leads to a pure threshold strategy (i.e., the mixed case is not reached by any type). Then, the set of thresholds equivalent to it has the following properties:
- 1. It is a non-empty set.
- 2. It is stable under multiplication by any (strictly) positive scalar.
- 3. It is a convex set.
- 4. If the type space is finite, it contains vectors that are not collinear with the original threshold.
Proof. 1. Since the threshold leads to a pure strategy, it is equivalent to itself; thus, the set is non-empty.
- 2. Since the strategy only depends on the sign of the expression associated with the threshold, multiplying this expression by a strictly positive scalar has no impact on the resulting action.
- 3. Take two equivalent thresholds and any convex combination of them; for every type, the combined expression has the same sign as the original ones, so the combination also belongs to the set.
- 4. Assume that the type space is finite and consider a perturbation of the threshold in a direction that is not collinear with it. Since the type space is finite and the strategy is pure, a small enough perturbation does not change the sign of the associated expression for any type; hence, the perturbed threshold prescribes the same action to every type as the original one, which concludes the proof. □
3.2. The Double Game Prisoner’s Dilemma
We keep the same framework for the DGPD with a discrete type space. For each agent, the expected utilities of both actions are defined as in the continuous case, with the integrals replaced by sums over the opponent's types.
Proposition 7. Consider agent i and let two consecutive types of its type space be given. All threshold strategies with a threshold lying between these two consecutive types are equivalent.
Proof. This follows from the fact that there exists no type strictly between two consecutive types; so, as long as we keep the threshold between those two values, we end up with the same actions for agent i. □
Note that for threshold strategies, the third (mixed) case can only be reached if the threshold coincides with one of the types; otherwise, the strategy is pure.
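As an illustration of Proposition 7 and of the remark above, the following sketch adopts a hypothetical D/C convention (types strictly below the threshold play D, types strictly above play C, and a type equal to the threshold is the unconstrained mixed case); the numbers are ours and chosen only for illustration. It shows that two thresholds lying strictly between the same pair of consecutive types induce identical actions.

```python
def dc_actions(types, threshold):
    """Map each type to its action under a D/C threshold strategy.

    Types strictly below the threshold play D, types strictly above play C;
    a type equal to the threshold is the unconstrained (mixed) case and is
    marked as such.  This convention is an assumption for illustration only.
    """
    actions = {}
    for t in types:
        if t < threshold:
            actions[t] = "D"
        elif t > threshold:
            actions[t] = "C"
        else:
            actions[t] = "mixed"   # the threshold coincides with a type
    return actions


types = [0.1, 0.4, 0.7, 1.0]                                # hypothetical finite type space
print(dc_actions(types, 0.45) == dc_actions(types, 0.65))   # True: both thresholds lie in (0.4, 0.7)
print(dc_actions(types, 0.7))                               # mixed case: the threshold hits a type
```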
The search for a pure Nash equilibrium in the discrete case requires a different method, since we do not have a general result for two-action multigames that can be used as in the case of a continuous type space. To go further, we assume that the type space is finite. Recall that, for player i, we consider the probability that the other agent plays C given that it follows its threshold strategy.
Proposition 8 (monotonicity). Consider agent i and suppose that the other agent follows a threshold strategy. This threshold strategy induces a value (the probability just recalled) for agent i. If the other agent changes its strategy by increasing its threshold, then the induced value decreases.
Proof. By increasing its threshold value, the other agent decreases the probability of playing C and thus decreases the induced value for agent i. □
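This can be checked on a toy example. The sketch below assumes the D/C reading of threshold strategies used for the DGPD (types above the threshold play C) and a hypothetical type space and prior; the treatment of a type exactly at the threshold does not affect the monotonicity illustrated here.

```python
def prob_C(types, prior, threshold):
    """Probability that a D/C threshold strategy plays C: mass of the types above the threshold."""
    return sum(p for t, p in zip(types, prior) if t > threshold)


types = [0.1, 0.4, 0.7, 1.0]        # hypothetical finite type space
prior = [0.25, 0.25, 0.25, 0.25]    # hypothetical uniform prior
print(prob_C(types, prior, 0.3))    # 0.75
print(prob_C(types, prior, 0.8))    # 0.25  -- a larger threshold yields a smaller probability of C
```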
For integers m ≤ n, let ⟦m, n⟧ denote the set of integers {m, m + 1, …, n}.
Lemma 1. Suppose f : ⟦a, b⟧ → ⟦c, d⟧ and g : ⟦c, d⟧ → ⟦a, b⟧ are both increasing (or both decreasing). Then, there exists k ∈ ⟦a, b⟧ such that g(f(k)) = k.
Proof. We first notice that the composition g ∘ f maps ⟦a, b⟧ into itself and is increasing, and that g(f(b)) ≤ b. Hence, there exists a least integer k ∈ ⟦a, b⟧ such that g(f(k)) ≤ k. If k = a, then g(f(a)) = a since g(f(a)) ≥ a as well; otherwise, the minimality of k gives g(f(k − 1)) ≥ k, so, by monotonicity, g(f(k)) ≥ k, and it follows that g(f(k)) = k. □
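The proof is constructive, which the following sketch makes explicit (the function names are ours, and the interval is taken to be {1, …, n} for simplicity): scanning for the first index at which the increasing composition g∘f no longer exceeds the index yields a fixed point.

```python
def fixed_point_of_composition(f, g, n):
    """Return some k in {1, ..., n} with g(f(k)) == k.

    Assumes f and g are both increasing (or both decreasing) maps between
    integer intervals, so that h = g∘f is an increasing map from
    {1, ..., n} into itself.
    """
    for k in range(1, n + 1):
        if g(f(k)) <= k:     # first crossing of the diagonal
            return k         # minimality of k and monotonicity of g∘f force g(f(k)) == k
    raise ValueError("monotonicity assumptions violated: no fixed point found")


# Tiny example with two non-decreasing maps on {1, ..., 5}:
f = {1: 1, 2: 3, 3: 3, 4: 4, 5: 5}.get
g = {1: 2, 2: 2, 3: 3, 4: 4, 5: 5}.get
print(fixed_point_of_composition(f, g, 5))   # 3, since g(f(3)) = g(3) = 3
```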
Theorem 5. For any DGPD with finite type space, there exists a pure Bayesian Nash equilibrium comprising threshold strategies such that, for both agents, the threshold lies in the unit interval containing its type space.
Proof. We consider the case in which agent i's best-response threshold is increasing in the other agent's threshold (the reasoning for the decreasing case is similar). Suppose that the other agent plays a threshold strategy. Then, agent i's best response is also a threshold strategy. By the monotonicity results of Propositions 2 and 8, if the other agent increases its threshold, then the threshold of agent i's best response also increases (in the decreasing case, it decreases).
Now, consider the partition of the unit interval into subintervals delimited by the agents' types. For agent i's threshold strategy and the other agent's best response, there exist (1) an index of the subinterval (with respect to agent i's type space) containing agent i's threshold and (2) an index of the subinterval (with respect to the other agent's type space) containing the best-response threshold. Note that if a threshold coincides with a type, then it belongs to two adjacent subintervals; in this case, we arbitrarily choose one of the two.
We define the two transition functions that map, in each direction, the index of the subinterval containing one agent's threshold to the index of the subinterval containing the other agent's best-response threshold. In order to show that there exists a pure Nash equilibrium, we need to show that the composition of these two transition functions has a fixed point. Equivalently, we search for an index k such that applying both transition functions in turn leads back to k. From Lemma 1, with the two transition functions playing the roles of f and g, we can find such a solution k. □
3.4. On a Simple Multigame Classification
In this section, we consider multigames with a finite type space. Both actions are still denoted by C and D (even if they are not necessarily associated with cooperation and defection). The payoff matrix U is not constrained, in contrast to the DGPD configuration. First, we define a simple classification of such games according to the properties of U.
Definition 12. We refer to the agents' type spaces (together with their priors) as the type space configuration. Any payoff matrix U belongs to one of the following sets:
- 1. The full set: payoff matrices U such that, for any type space configuration, the game has a pure Nash equilibrium.
- 2. The solutionless set: payoff matrices U such that, for any type space configuration, the game has no pure Nash equilibrium.
- 3. The hybrid set: payoff matrices U such that the existence of a pure NE depends on the type space configuration.
Proposition 9. The full set is not empty: it contains the matrices U satisfying the DGPD payoff constraints. The hybrid set is also not empty.
Proof. The first set is obviously not empty in view of Theorem 5. The hybrid set is not empty as we can find two games G and G′ sharing the same payoff matrix such that one has a pure NE and the other does not (see the examples below). □
Consider the uniform double game G in which the utilities of both agents are given by Table 14, for a given choice of type spaces and priors.
This game has no pure Bayesian Nash equilibrium. Now consider a slight variation G′ of the game G in which the utilities of both agents are still given by Table 14 but the type spaces and priors are modified.
One can then exhibit a C/D threshold strategy for each agent such that the resulting pair is a pure Bayesian Nash equilibrium for the game G′.
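Such examples can be verified mechanically. Since equivalent thresholds can be restricted to one representative per interval between consecutive types (Proposition 7), a brute-force check over representative threshold-strategy pairs suffices. The sketch below is generic and hypothetical: `expected_utility(i, s_i, s_j)` is a placeholder for agent i's expected utility under the strategy pair, whose actual definition depends on the game and is not reproduced here.

```python
from itertools import product

def pure_nash_equilibria(candidates_1, candidates_2, expected_utility):
    """Brute-force pure NE search among candidate threshold strategies.

    candidates_i enumerates one representative threshold strategy per
    equivalence class for agent i (cf. Proposition 7); expected_utility(i, s_i, s_j)
    is a placeholder for agent i's expected utility when it plays s_i against s_j.
    """
    equilibria = []
    for s1, s2 in product(candidates_1, candidates_2):
        best_1 = all(expected_utility(1, s1, s2) >= expected_utility(1, d1, s2)
                     for d1 in candidates_1)
        best_2 = all(expected_utility(2, s2, s1) >= expected_utility(2, d2, s1)
                     for d2 in candidates_2)
        if best_1 and best_2:
            equilibria.append((s1, s2))
    return equilibria
```

On the examples above, one would expect such a check to return an empty list for G and at least the pair of C/D strategies described above for G′.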
Note that we do not give any particular characterization of the solutionless set. We postulate that this set is empty (see Section 3.5 on algorithmic results). Recall the expressions of the agents' expected utilities in the discrete configuration.
Observe that, with no assumption on the payoff matrix, there is no guarantee that the expected utilities of C and D will cross for a given opponent strategy. Moreover, the crossing point (if it exists) is not guaranteed to lie inside the type space, as illustrated by Figure 4 and Figure 5. The crossing point cuts the type space into two regions, one in which the best action is C and the other in which the best action is D. If the left region is C, then the resulting strategy is a C/D strategy (Figure 4), and if the left region is D, then the resulting strategy is a D/C strategy (Figure 5). We call this the strategy type. For any DGPD configuration, the best response is always a D/C strategy.
As with the DGPD, we define the threshold function for agent i as the location of the crossing point of the two expected utility functions. When both utility functions have the same slope, the crossing point, and hence the threshold function, is not defined. As long as the two utility functions are not equal, there is still a single best action (either C or D) for every type; in this case, the best response can still be represented as a threshold strategy. Otherwise, both utility functions are equal (there is an infinite number of crossing points). Because of the latter case, we can no longer state that every best response comprises threshold strategies.
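The discussion can be made concrete with a small sketch: if, for a fixed opponent strategy, agent i's expected utilities for C and D are affine functions of its type, the threshold is their crossing point and is undefined exactly when the slopes coincide. The coefficients below are generic placeholders rather than the paper's formulas.

```python
def crossing_point(a_C, b_C, a_D, b_D):
    """Crossing point of u_C(t) = a_C + b_C * t and u_D(t) = a_D + b_D * t.

    Returns None when the slopes are equal: in that degenerate case either one
    action dominates for every type, or the two utilities coincide everywhere.
    """
    if b_C == b_D:
        return None
    return (a_D - a_C) / (b_C - b_D)
```

Whether the region below the crossing point favors C or D (the strategy type) depends on the sign of the slope difference.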
Definition 13 (D/C threshold strategy). A D/C threshold strategy with a given threshold for agent i is a strategy that plays D for types below the threshold and C for types above it.
Now, we can express the threshold as a function of the probability induced by the opponent's strategy. The case of equal slopes is reached at a single forbidden value of this probability; therefore, the case of equal utility functions can only occur when the opponent's strategy induces exactly this forbidden value.
We notice that the graph of the threshold function is split into two regions by the forbidden value (Figure 6 and Figure 7). Each region is associated with a different strategy type. Thus, if the forbidden value is not in [0, 1], then agent i will always play the same strategy type notwithstanding its opponent's strategy. In Figure 7, the forbidden value is outside [0, 1], so agent i will only play a D/C strategy. In Figure 6, the forbidden value is inside [0, 1], so agent i may play both strategy types depending on the opponent's strategy: on one side of the forbidden value agent i plays a C/D strategy, and on the other side a D/C strategy. Suppose now that the forbidden value is outside [0, 1] for both agents. Both agents then always stick to the same strategy type and there are three cases: (1) both play C/D, (2) both play D/C (like the DGPD), and (3) one plays C/D while the other plays D/C.
In order to characterize the variations of the threshold function, we introduce a quantity defined from the payoff entries whose sign governs its monotonicity.
Proposition 10 (threshold function monotonicity). The threshold function is monotonic and satisfies the following:
- 1. if the quantity is strictly positive, then the threshold function is increasing,
- 2. if the quantity is strictly negative, then the threshold function is decreasing,
- 3. if the quantity is zero, then the threshold function is constant.
Proof. Simply compute the derivative of the threshold function, which is a homography, and obtain the condition on the sign of this quantity. □
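For completeness, here is the computation behind the proof, written for a generic homography; the Greek letters are placeholders for the relevant combinations of payoff entries and are not the paper's notation:
\[ h(p) \;=\; \frac{\alpha p + \beta}{\gamma p + \delta}, \qquad h'(p) \;=\; \frac{\alpha\delta - \beta\gamma}{(\gamma p + \delta)^2}, \]
so the sign of the derivative does not depend on p: h is increasing when αδ − βγ > 0, decreasing when αδ − βγ < 0, and constant when αδ − βγ = 0.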
Proposition 11. Let U be the payoff matrix of a two-player double game such that
- 1. for each agent, the forbidden value of its threshold function lies outside [0, 1],
- 2. the two agents' monotonicity quantities have the same sign if both agents play the same strategy type, and opposite signs otherwise.
Then, for any type space configuration, the game has a pure Bayesian Nash equilibrium.
Proof. The two conditions imply that (1) each agent has a unique strategy type and (2) their best-response threshold functions have the same monotonicity, i.e., both are increasing or both are decreasing. Thus, we can follow the same reasoning as for the DGPD to prove the existence of a pure Bayesian NE. □
3.5. Algorithmic Results
In this section, we develop efficient algorithms to find pure Bayesian Nash equilibria for finite type spaces. The first part focuses on an optimized version for the DGPD, while the second one focuses on a more general version. In the third part, the complexities of both algorithms are evaluated and compared to each other.
3.5.1. Algorithm for DGPD
Recall that the agents’ best-response sets only comprise threshold strategies. Thus, given a finite type space for each agent, the search space comprises finitely many non-equivalent threshold strategy pairs. The pure Bayesian Nash equilibrium search consists of finding a fixed point (two strategies that are mutual best responses of each other) among those combinations.
For clarity, we formulate a graphical method to represent the solution search that we call a strategy diagram (Figure 8). There are two unit intervals, one for each agent's type space. An arrow from an interval of agent i's axis to an interval of the other agent's axis indicates that if agent i plays a threshold strategy with a threshold in the first interval, the best response of the other agent is a threshold strategy with a threshold in the second interval. A solution is then simply represented by two compatible arrows, as displayed in Figure 9.
We now introduce Algorithm 1 for the NE search on the finite DGPD. For every candidate threshold strategy of agent 1, we compute the associated best response of agent 2 and then compute the best response of agent 1 given the latter. Whenever, for one of agent 1's threshold strategies, the computed best response is equivalent to it, we obtain a pure NE. The procedure compute_cumul_proba() returns the cumulative probabilities of a given probability distribution. finder() returns the index of the type space interval that contains a given threshold. search_space_boundaries() computes lower and upper bounds on the candidate thresholds and then the associated indices in the type space; this helps us optimize the overall algorithm by reducing the search space to thresholds between those bounds. Finally, threshold_i() is the threshold function of agent i.
Algorithm 1 Exhaustive NE search
Require: both agents' type spaces, priors, and the payoff matrix
for each candidate threshold interval of agent 1 in the reduced search space do
    compute agent 2's best-response threshold and then agent 1's best response to the latter
    if agent 1's best-response threshold is equivalent to the candidate then return the pair of thresholds end if
end for
return False
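For concreteness, here is a rough Python transcription of this search. It is a sketch under several assumptions: the D/C convention (types above the threshold play C), a candidate threshold identified by the number k of types lying strictly below it, and threshold_1/threshold_2 taken as given best-response threshold functions of the opponent's probability of playing C; search_space_boundaries is omitted and the whole range is scanned instead.

```python
import bisect

def compute_cumul_proba(prior):
    """Cumulative probabilities of a discrete prior (list of point masses)."""
    cumul, total = [], 0.0
    for p in prior:
        total += p
        cumul.append(total)
    return cumul

def finder(types, threshold):
    """Index of the type-space interval containing the threshold (binary search):
    the number of types strictly below the threshold."""
    return bisect.bisect_left(types, threshold)

def mass_above(cumul, k):
    """Probability mass of the types with index >= k: the types playing C under
    the assumed D/C convention when k types lie below the threshold."""
    return 1.0 - (cumul[k - 1] if k > 0 else 0.0)

def dgpd_ne_search(types_1, prior_1, types_2, prior_2, threshold_1, threshold_2):
    """Exhaustive NE search for the finite DGPD (sketch).

    threshold_i(p) stands for agent i's best-response threshold when the other
    agent plays C with probability p; its closed form depends on the payoff
    matrix and is taken as given here.
    """
    cumul_1, cumul_2 = compute_cumul_proba(prior_1), compute_cumul_proba(prior_2)
    for k in range(len(types_1) + 1):                   # candidate: k types of agent 1 below the threshold
        tau_2 = threshold_2(mass_above(cumul_1, k))     # agent 2's best response
        j = finder(types_2, tau_2)
        tau_1 = threshold_1(mass_above(cumul_2, j))     # agent 1's best response to it
        if finder(types_1, tau_1) == k:                 # equivalent to the candidate: fixed point
            return tau_1, tau_2
    return False
```

With the reduction provided by search_space_boundaries, the loop would run only over the indices between the two precomputed bounds instead of the whole range.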
Figure 10 and Figure 11 illustrate the outcome of the NE search for two situations that we encounter. Note that in the second situation we reverse the second agent's axis, as the best-response threshold monotonicity is reversed compared to the first situation. Also note that there could be multiple solutions (Figure 11). If we only seek one solution, then we stop at the first one found to reduce the computational cost.
3.5.2. General Algorithm
We now describe a different algorithm that can handle a broader range of games but is more expensive. It can also handle DGPD games but is computationally less efficient as it explores a larger search space without exploiting the DGPD structure.
Apart from the fact that we have to explore both C/D and D/C strategies, the double game algorithm is very similar to the DGPD algorithm. For each type space interval of agent 1, we compute the best response of agent 2 and the best response of agent 1 given the latter. We check that we fall back on the initial strategy by ensuring that the thresholds are in the same interval and that the strategy types are the same. Of course, we cannot benefit from the reduced search space bounds, as they make no sense in the general context. The overall procedure is summarized in Figure 12.
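A hypothetical outer loop for this general case might look as follows; best_response_1 and best_response_2 are placeholders returning a (strategy type, threshold) pair and hide the threshold functions, the forbidden-value handling, and the payoff matrix.

```python
import bisect

def general_ne_search(types_1, types_2, best_response_1, best_response_2):
    """General double-game NE search (sketch).

    best_response_i(strategy_type, interval_index) is assumed to return the
    (strategy_type, threshold) pair of agent i's best response when the other
    agent plays the given strategy type with a threshold in the given interval.
    """
    for k in range(len(types_1) + 1):                  # interval of agent 1's candidate threshold
        for s_type in ("C/D", "D/C"):                  # both strategy types are explored
            r2_type, tau_2 = best_response_2(s_type, k)
            j = bisect.bisect_left(types_2, tau_2)     # interval of agent 2's threshold
            r1_type, tau_1 = best_response_1(r2_type, j)
            # a pure NE: we fall back on the same interval and the same strategy type
            if r1_type == s_type and bisect.bisect_left(types_1, tau_1) == k:
                return (r1_type, tau_1), (r2_type, tau_2)
    return False
```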
Given this general algorithm we explore some properties of payoff matrices. We generated hundreds of random payoff matrices and thousands of type space configurations and ran Nash equilibrium searches. We experimentally classified the payoff matrices thanks to the NE search results. Interestingly, we notice that none of the generated matrices belongs to the solutionless set as we postulated earlier. Secondly, we find that there exist matrices in the full set that do not satisfy the conditions of Proposition 11: the set of conditions is sufficient, but not necessary, for belonging to the full set.
Next, using a modified version of the NE search that does not stop at the first solution, we computed the average number of solutions for different type space configurations. Let the input size denote the number of elements in the type space, taken to be the same for both players.
Figure 13 shows the results for a sample of payoff matrices randomly chosen from the full set (thanks to our previous classification). The first trivial result is that the average never goes under 1 (otherwise the matrix would not belong to the full set). Secondly, it seems that the average number of solutions can be constant, increase, or decrease with the input size. In many cases, we even observe that the average number tends to stabilize around a value which appears to be an integer.
We repeated this experiment with payoff matrices randomly picked from the hybrid set.
Figure 14 shows a sample of results for such matrices. As we said earlier for the full set, it seems that the averages can either increase, decrease, or stay constant and that they converge to different values. This time, those values are smaller than 1 and may not be integers.
3.5.3. Complexity Comparison
In this section, we discuss the complexity of each algorithm and compare their performances.
First of all, the left part of Figure 15 displays the complexity of the DGPD NE search for different variations of the inputs: either both input sizes n (first agent) and m (second agent) vary, or only one of them does. Notice that when the first agent's input size is constant, the complexity is also constant. When n varies, the complexity is linear and is almost independent of whether m varies or not. In fact, the main driver of the complexity is the main loop, which iterates over the elements of the first agent's type space. The procedure finder() has a low computational cost as it relies on a binary search. In practice, for imbalanced type space sizes, one should always consider the agent having the smallest type space as the first agent.
The right part of Figure 15 illustrates the improved performance of the DGPD-specific algorithm over the more general one. The latter also has a linear complexity, but with a much higher slope. One explanation comes from the fact that, for each element of the first agent's type space, the general algorithm considers both C/D and D/C strategies as starting points. Also, it does not reduce the search space with the bounds used in the DGPD case.
For the general algorithm, we also compare the complexity for different variations in the input size, as displayed in Figure 16 (left). Again, the complexity heavily depends on the first agent's type space size. In contrast to the DGPD algorithm, we notice that the complexity when only n varies is slightly below the complexity when both sizes vary. On the right side of Figure 16, we see that this behavior cannot be explained by a difference in the number of iterations of the main loop. It is probably due to the cost of finder(), which increases (logarithmically) when m increases. This effect might also affect the DGPD algorithm, but it is not significant in our experiments.