Human Strategic Decision Making in Parametrized Games

Ganzfried, Sam

doi:10.3390/math10071147

Ganzfried Research, Miami Beach, FL 33139, USA

Mathematics 2022, 10(7), 1147; https://doi.org/10.3390/math10071147

Submission received: 7 November 2021 / Revised: 14 January 2022 / Accepted: 31 March 2022 / Published: 2 April 2022

(This article belongs to the Special Issue Game Theory and Artificial Intelligence)

Abstract

Many real-world games contain parameters which can affect payoffs, action spaces, and information states. For fixed values of the parameters, the game can be solved using standard algorithms. However, in many settings agents must act without knowing the values of the parameters that will be encountered in advance. Often the decisions must be made by a human under time and resource constraints, and it is unrealistic to assume that a human can solve the game in real time. We present a new framework that enables human decision makers to make fast decisions without the aid of real-time solvers. We demonstrate applicability to a variety of situations including settings with multiple players and imperfect information.

Keywords: game theory; interpretable AI; parametrized game; imperfect information

1. Introduction

Strong algorithms have been developed for game classes with many elements of complexity. For example, algorithms were recently able to defeat human professional players in two-player [1,2] and six-player no-limit Texas hold ’em [3]. These games have imperfect information, sequential actions, very large state spaces, and the latter has more than two players (solving multiplayer games is more challenging than two-player zero-sum games from a complexity-theoretic perspective). However, these algorithms all require an extremely large amount of computational resources for offline and/or online computations and for optimizing neural network hyperparameters. The algorithms also have a further limitation in that they are using all these resources just to solve for one very specific version of the game (e.g., Libratus and DeepStack assumed that all players start the hand with 200 times the big blind, and Pluribus assumed that all players start the hand with 100 times the big blind). In real poker, the stack sizes of the players will fluctuate as players win or lose hands, and will often differ from the starting values. While one could apply the same algorithms for any specific starting stack and blind values, there are too many possibilities to be able to run the algorithms on all of them.

In many real-world situations, there are some game parameters that will be encountered during gameplay that are unknown in advance (such as the stack sizes in poker). Furthermore, often the real-time decision maker will be a human, who may not have the ability to perform complex computations (even though such computations may have been performed offline in advance). For example, football teams have used statistical and game-theoretic models to decide whether or not to punt on a fourth down. In advance of game play, algorithms can be run that solve for optimal solutions in these models, perhaps by utilizing databases of historical play. However, in real time, the coach has only a matter of seconds (or minutes if a timeout is used) to make the decision. The optimal decision may depend on several factors that are not known until real time; for example, the score, how much time is remaining, the overall strength of the two teams, etc. The coach will need to weigh any offline algorithmic solution with newly-observed values of these parameters to make quick real-time decisions. Similar situations are also encountered frequently in national security. In recent years game-theoretic algorithms have been increasingly applied to security domains. These algorithms must be applied offline with specific values of parameters used, which may differ from the actual values encountered in real time (e.g., an attacker may use more resources than anticipated).

In these settings it is not sufficient to develop a successful algorithm for solving the game for one specific value of parameters; it is also necessary to develop a procedure for a decision maker to compute an optimal solution in real time for the newly-observed value of the parameters. Often the decision maker is a human, who may have limited technical expertise. It is not realistic to assume that the human decision maker can perform a complex algorithmic computation, tune or optimize a neural network, or perform a search over a large historical database during the seconds or minutes available to make the decision. While it is realistic to assume that a human can perform a table lookup to implement a previously computed strategy (even one stored in a large binary file), as described above this strategy may not apply to the observed parameter values. In addition, it is reasonable to assume that the human has access to a “cheat sheet” which contains a relatively small set of general rules to be applied depending on the parameter values encountered. It is also reasonable to assume that the decision maker can perform basic arithmetic (perhaps with the aid of a calculator). In this paper we present a novel framework for enabling human decision makers to act strategically in these settings.

2. Parametrized Games

We consider a setting where a decision maker must determine a strategy for a game $G_{λ}$, which contains a vector of parameters $λ$ drawn from some domain $Λ$ (typically subsets of the set of real numbers or integers). For any specific values of the parameters $λ$, $G_{λ}$ can be solved using standard algorithms. However, it is infeasible to solve the game in advance for all possible parameter values (there may be infinitely many).

The parameters can affect different components of the game. They can affect the payoffs, set of players, strategy spaces, and/or private information. Often our goal will be to solve the game by computing a standard game-theoretic solution concept such as a Nash equilibrium. A second goal that we will also consider is opponent exploitation—computing a best response to perceived strategies of the opponents. The model of the opponents’ strategies is determined in real time based on observations of the opponents’ play, perhaps utilizing a prior distribution based on historical data. In this situation, the parameters can also affect the strategies of the opponents. In general our framework can allow for optimizing any objective, though we will focus on the natural objectives of computing Nash equilibrium or optimizing performance against assessments of opponents’ strategies.

Definition 1. A strategic-form game G is a tuple $(N, S, u)$ with finite set of players $N = \{1, \dots, n\}$, finite set of pure strategies $S_{i}$ for each player $i \in N$, and real-valued utility for each player for each strategy vector (aka strategy profile), $u_{i} : \times_{j} S_{j} \to \mathbb{R}$.

Definition 2. A payoff-parametrized strategic-form game is a tuple $(\{G_{λ}\}, Λ)$ where for each real-valued vector of parameters $λ \in Λ$, $G_{λ}$ is a tuple $(N, S, u_{λ})$ with finite set of players $N = \{1, \dots, n\}$, finite set of pure strategies $S_{i}$ for each player $i \in N$, and real-valued utility for each player for each strategy vector (aka strategy profile), $u_{λ, i} : \times_{j} S_{j} \to \mathbb{R}$.

Definition 3. A strategy-parametrized strategic-form game is a tuple $(\{G_{λ}\}, Λ)$ where for each real-valued vector of parameters $λ \in Λ$, $G_{λ}$ is a tuple $(N, S_{λ}, u)$ with finite set of players N, finite set of pure strategies $S_{λ, i}$ for each player $i \in N$, and real-valued utility for each player for each strategy vector (aka strategy profile), $u_{i} : \times_{j} S_{λ, j} \to \mathbb{R}$.

We can analogously define a player-parametrized game where the set of players is determined by $λ$.

We can also consider games that simultaneously consider several different types of parametrization. In general we will denote a strategic-form game that has payoff, strategy, and/or player parametrization as a parametrized strategic-form game, $(\{G_{λ}\}, Λ)$, with $G_{λ} = (λ, N_{λ}, S_{λ}, u_{λ})$ for $λ \in Λ$.
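As a minimal sketch of how Definitions 1 and 2 might be represented in code, a parametrized game can be modeled as a function from a parameter vector to a concrete strategic-form game. The names `StrategicGame` and `scaled_matching_pennies` are hypothetical, and the example family (matching pennies with a stake parameter) is an illustration rather than a game from the paper.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple

Profile = Tuple[int, ...]  # one pure-strategy index per player


@dataclass(frozen=True)
class StrategicGame:
    """A strategic-form game (N, S, u) with finitely many pure strategies."""
    num_strategies: Sequence[int]                     # |S_i| for each player i
    utility: Callable[[Profile], Tuple[float, ...]]   # profile -> payoff vector

    def profiles(self) -> List[Profile]:
        """Enumerate all pure-strategy profiles."""
        return list(product(*(range(k) for k in self.num_strategies)))


# A parametrized game maps a parameter vector lambda to a concrete game G_lambda.
ParametrizedGame = Callable[[Tuple[float, ...]], StrategicGame]


def scaled_matching_pennies(lam: Tuple[float, ...]) -> StrategicGame:
    """Hypothetical payoff-parametrized family: lam[0] is the stake."""
    stake = lam[0]

    def utility(profile: Profile) -> Tuple[float, ...]:
        row, col = profile
        u1 = stake if row == col else -stake
        return (u1, -u1)  # zero sum

    return StrategicGame(num_strategies=(2, 2), utility=utility)
```

Strategy or player parametrization could be sketched the same way by letting `num_strategies` depend on the parameter vector.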

While the strategic form can be used to model simultaneous actions, another representation, called the extensive form, is generally preferred when modeling settings that have sequential moves. The extensive form can also model simultaneous actions, as well as chance events and imperfect information (i.e., situations where some information is available to only some of the agents and not to others). Extensive-form games consist primarily of a game tree; each non-terminal node has an associated player (possibly chance) that makes the decision at that node, and each terminal node has associated utilities for the players. Additionally, game states are partitioned into information sets, where the player to act cannot distinguish among the states in the same information set. Therefore, in any given information set, a player must choose actions with the same distribution at each state contained in the information set. A pure strategy for player i is a mapping that selects an action at each information set belonging to player i.

In typical imperfect-information extensive-form games, the initial move is a chance move that assigns private information to each player (from a publicly-known distribution). For example, this could be private cards in poker, item valuations in auctions, or resource values in security games. Then players perform a sequence of publicly-observable actions until a terminal node is reached. Thus, for each sequence of public actions p, each player i selects a strategy that is dependent on their own private information, $τ_{i}$. Instead of viewing this decision as a selection of separate actions at each information set that follows the action sequence p, one can view this as a mapping that assigns an action for each possible value of the private information $τ_{i}$.

If the set of possible values of the private information $τ_{i}$ is dictated by a vector of parameters $λ$, then we say that the game is information-parametrized.

3. Parametric Decision Lists

Our goal for parametrized games is to develop a “cheat sheet” that allows a human decision maker to quickly select a strategy in real time for any possible value of the parameters $λ$.

We propose a new structure, which we call a parametric decision list (PDL), which contains a small set of rules that dictate which strategy should be played for every possible parameter value in a way that can be easily understood and implemented by a human. Similarly to a standard decision list, a parametric decision list consists of a series of conditions, each resulting in the output of a strategy. For game $G_{λ}$, each condition will be of the form “if $f_{i}(λ) \; o_{i} \; 0$”, where $f_{i}$ is a vector-valued function of the parameters $λ$, and $o_{i}$ is a vector of comparison operators from the set $\{<, \leq, >, \geq, =, \neq\}$. For example, if $λ = (λ_{1}, λ_{2})$, $f_{i} = (4 λ_{1} + 2 λ_{2}, 3 λ_{1} - 5 λ_{2})$, and $o_{i} = (\geq, <)$, then the condition would correspond to “if $4 λ_{1} + 2 λ_{2} \geq 0$ and $3 λ_{1} - 5 λ_{2} < 0$.” If the condition is satisfied, then (mixed) strategy $s_{i}$ is output. We can view the initial condition as corresponding to an “If” statement, subsequent conditions as “Else if,” and the final condition as “Else”.

Definition 4. A parametric decision list L for $G_{λ}$ is a tuple $L = (F, O, S)$, where $F = (f_{i})$ is a sequence of functions $f_{i} : \mathbb{R}^{|λ|} \to \mathbb{R}^{w}$, $O = (o_{i})$ is a sequence of vectors of primitive comparison operators $o_{i}$ with $|O| = |F|$ and $w = |o_{i}|$, and $S = (s_{i})$ is a sequence of $|F| + 1$ (mixed) strategies.

We define the depth of parametric decision list L to equal the number of strategies, $|S| = |F| + 1$. The first $|F|$ strategies correspond to the cases in which each of the $|F|$ conditions is met, and the final strategy corresponds to the default case when none are met (aka, the “else” condition). The width of L is equal to w, the length of the vectors $o_{i}$. Each function $f_{i}$ outputs a w-dimensional vector $f_{i}(λ)$. Then each component j is compared to 0 using operator $o_{ij}$. If all conditions of the operators are met, then the list dictates following strategy $s_{i}$.
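The evaluation procedure described above can be sketched as follows. This is an illustrative sketch rather than an implementation from the paper: the function `evaluate_pdl`, the string encoding of operators, and the placeholder strategy labels are all assumptions made for the example.

```python
import operator
from typing import Callable, List, Sequence, Tuple

# Comparison operators allowed in a condition "f_i(lambda) o_i 0".
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

Params = Sequence[float]
Rule = Tuple[Callable[[Params], Sequence[float]], Sequence[str]]


def evaluate_pdl(rules: List[Rule], strategies: List[object], lam: Params):
    """Return the strategy selected by a PDL with |F| rules and |F| + 1 strategies."""
    assert len(strategies) == len(rules) + 1
    for (f, ops), strategy in zip(rules, strategies):
        values = f(lam)
        # A rule fires only if every component satisfies its operator against 0.
        if all(OPS[op](v, 0) for v, op in zip(values, ops)):
            return strategy
    return strategies[-1]  # default ("else") strategy


# The example condition from the text: 4*l1 + 2*l2 >= 0 and 3*l1 - 5*l2 < 0.
rules = [(lambda lam: (4 * lam[0] + 2 * lam[1], 3 * lam[0] - 5 * lam[1]),
          (">=", "<"))]
strategies = ["s_1", "s_default"]  # placeholder labels for (mixed) strategies
```

With these definitions, `evaluate_pdl(rules, strategies, (1.0, 1.0))` selects the first strategy, since both components of the condition hold; this list has depth 2 and width 2.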

We say that parametrized game $G_{λ}$ with objective function $g_{λ}$ is $(d, w, ϵ)$-implementable if there exists a parametric decision list L with depth at most d and width at most w that achieves an objective value $g_{λ}(s_{L}) \geq g_{λ}^{*} - ϵ$ for all $λ \in Λ$ for the strategy $s_{L}$ determined by L, where $g_{λ}^{*}$ is the optimal value of the objective $g_{λ}$ for $G_{λ}$. The two primary objective functions we will be considering are the exploitability of a strategy in a two-player zero-sum game, which is defined as the difference between the game value and payoff of the strategy against a best response to it, and performance of a strategy against a specific strategy (or distribution of strategies) for the opponents.

4. Parameter Sampling

A second approach for generating a set of rules for a human decision-maker would be to repeatedly sample values for parameters $λ_{i}$ and compute the optimal strategy $s_{i}$ in the parametrized game $G_{λ_{i}}$ using standard approaches. Then when game $G_{λ^{*}}$ is encountered in real time, we output the solution $s_{i}$ corresponding to the value of i that minimizes $d(λ_{i}, λ^{*})$, where d is an appropriate distance metric. This sampling can be done uniformly at random over a suitably-chosen domain, or according to a more informative prior distribution if one is available. Assuming that the number of sampled games is relatively small and it is not too difficult to compute the distance function, this can potentially be another approach for human decision-making in parametrized games.
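A minimal sketch of this two-phase procedure is shown below, assuming Euclidean distance for d and treating the game solver as a caller-supplied black box. The names `build_table` and `lookup`, and the `solve` and `sampler` callables, are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Params = Tuple[float, ...]


def build_table(solve: Callable[[Params], object],
                sampler: Callable[[], Params],
                t: int) -> List[Tuple[Params, object]]:
    """Offline phase: sample t parameter vectors and solve each game G_lambda_i."""
    samples = [sampler() for _ in range(t)]
    return [(lam, solve(lam)) for lam in samples]


def lookup(table: List[Tuple[Params, object]], lam_star: Params):
    """Online phase: return the strategy stored for the nearest sampled parameters.

    Assumes Euclidean distance; any other metric d could be substituted.
    """
    nearest = min(table, key=lambda entry: math.dist(entry[0], lam_star))
    return nearest[1]
```

For a human decision maker, the online phase amounts to scanning a short table for the closest entry, so the table size t directly trades accuracy against ease of use.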

Theorem 1 shows that as the number of samples grows large, this approach produces an optimal strategy if the payoffs are continuous functions of the parameters. The analysis is for the minimum exploitability metric in two-player zero-sum games, though similar analysis can also apply for other objectives. For simplicity we assume that the parameter $λ$ is one-dimensional and use the absolute value for the distance function, though an analogous result can be shown for the multi-dimensional case using an arbitrary distance metric.

Lemma 1. Suppose all payoffs of $G''$ are within ϵ of the payoffs of $G'$. Let $s'$ be a strategy profile in $G'$. Then $|u_{i}^{G'}(s_{i}', s_{-i}') - u_{i}^{G''}(s_{i}', s_{-i}')| \leq ϵ$.

Proof. The utility against $s_{-i}'$ equals the sum of the utilities against each of the opponent’s pure strategies $s_{-i}$ multiplied by the weight that $s_{-i}'$ places on $s_{-i}$. Since each of the utilities in $G''$ is within $ϵ$ of the utilities in $G'$, the weighted sums must be within $ϵ$ of each other. □

Lemma 2. Suppose all payoffs of $G''$ are within ϵ of the payoffs of $G'$. Let $s_{i}'$ be a strategy for player i of $G'$, let $s_{-i}^{*}$ be a nemesis strategy against $s_{i}'$ in $G'$, and let $s_{-i}^{**}$ be a nemesis strategy against $s_{i}'$ in $G''$. Then $|u_{i}^{G'}(s_{i}', s_{-i}^{*}) - u_{i}^{G''}(s_{i}', s_{-i}^{**})| \leq ϵ$.

Proof. Suppose that $u_{i}^{G'}(s_{i}', s_{-i}^{*}) > u_{i}^{G''}(s_{i}', s_{-i}^{**}) + ϵ$. By Lemma 1, $u_{i}^{G'}(s_{i}', s_{-i}^{**}) \leq u_{i}^{G''}(s_{i}', s_{-i}^{**}) + ϵ < u_{i}^{G'}(s_{i}', s_{-i}^{*})$, which contradicts the fact that $s_{-i}^{*}$ is a nemesis strategy against $s_{i}'$ in $G'$. We obtain a similar contradiction if $u_{i}^{G''}(s_{i}', s_{-i}^{**}) > u_{i}^{G'}(s_{i}', s_{-i}^{*}) + ϵ$. So we conclude that $|u_{i}^{G'}(s_{i}', s_{-i}^{*}) - u_{i}^{G''}(s_{i}', s_{-i}^{**})| \leq ϵ$. □

Lemma 3. Suppose all payoffs of $G''$ are within ϵ of the payoffs of $G'$. Let $v_{i}'$ be the game value of $G'$ to player i and $v_{i}''$ be the value of $G''$. Then $|v_{i}' - v_{i}''| \leq ϵ$.

Proof. Let $s'$ be a Nash equilibrium strategy profile in $G'$. Then $u_{i}^{G'}(s_{i}', s_{-i}') = v_{i}'$. Let $s_{-i}^{**}$ be a nemesis against $s_{i}'$ in $G''$. Since $s_{-i}'$ is a nemesis against $s_{i}'$ in $G'$, Lemma 2 gives $|u_{i}^{G'}(s_{i}', s_{-i}') - u_{i}^{G''}(s_{i}', s_{-i}^{**})| \leq ϵ$. So $|v_{i}' - u_{i}^{G''}(s_{i}', s_{-i}^{**})| \leq ϵ$, and therefore $v_{i}' \leq u_{i}^{G''}(s_{i}', s_{-i}^{**}) + ϵ$. We know that $u_{i}^{G''}(s_{i}', s_{-i}^{**}) \leq v_{i}''$. So $v_{i}' \leq v_{i}'' + ϵ$. Similar reasoning shows that $v_{i}'' \leq v_{i}' + ϵ$. So $|v_{i}' - v_{i}''| \leq ϵ$. □

Lemma 4. Suppose $s'$ is a Nash equilibrium of $G'$, and all payoffs of $G''$ are within ϵ of the payoffs of $G'$. Then the exploitability of $s'$ in $G''$ is at most $2ϵ$.

Proof. Let $s'$ be a Nash equilibrium of $G'$, and $s_{-i}^{**}$ be a nemesis strategy to $s_{i}'$ in $G''$. Let $v_{i}'$ denote the value of $G'$ and $v_{i}''$ denote the value of $G''$. The exploitability of $s_{i}'$ in $G''$ equals $v_{i}'' - u_{i}^{G''}(s_{i}', s_{-i}^{**})$. By the triangle inequality and Lemmas 2 and 3,

$$
\begin{aligned}
|v_{i}'' - u_{i}^{G''}(s_{i}', s_{-i}^{**})| &\leq |v_{i}'' - v_{i}'| + |v_{i}' - u_{i}^{G'}(s_{i}', s_{-i}')| + |u_{i}^{G'}(s_{i}', s_{-i}') - u_{i}^{G''}(s_{i}', s_{-i}^{**})| \\
&\leq ϵ + 0 + ϵ = 2ϵ.
\end{aligned}
$$

□

Denote $u(s, λ)$ as $f_{s}(λ)$. Without loss of generality suppose $λ \in [0, 1]$ and that we repeatedly sample $λ_{i}$ from $U(0, 1)$ and compute optimal strategy $s_{i}^{*}$ in the game $G_{λ_{i}}$. Then when we encounter $λ^{*}$ in real time, calculate $i^{*} = \arg\min_{i} |λ^{*} - λ_{i}|$, and play $s_{i^{*}}^{*}$. Suppose the game has $n = 2$ players, has m actions per player, is zero sum, and that we take t samples. Let $ϵ_{t}$ denote the exploitability of $s_{i^{*}}^{*}$ in $G_{λ^{*}}$.

Theorem 1. If $f_{s}$ is continuous in λ for all $s \in S$, then $\lim_{t \to \infty} E[ϵ_{t}] = 0$.

Proof. Let $ϵ > 0$ be arbitrary, and set $ϵ' = \frac{ϵ}{3}$. From continuity of $f_{s}$, there exists $δ_{s} > 0$ such that $|f_{s}(λ) - f_{s}(λ')| < ϵ'$ for all $λ'$ such that $|λ' - λ| < δ_{s}$. Let $δ = \min_{s \in S} δ_{s}$. Let $T = \frac{\ln(\frac{ϵ}{6 - 2ϵ})}{\ln(1 - 2δ)}$, and let $t \geq T$ be arbitrary.

Suppose that $λ^{*}$ is the actual value of $λ$ encountered. The probability that at least one of the sampled values $λ_{i}$ satisfies $|λ_{i} - λ^{*}| \leq δ$ equals $1 - (1 - 2δ)^{t}$. If this occurs, then $ϵ_{t} \leq 2ϵ' = \frac{2ϵ}{3}$ by Lemma 4. Otherwise, the exploitability is at most 2 (since we assume all payoffs are in [0, 1]). So

$$
\begin{aligned}
E[ϵ_{t}] &\leq \frac{2ϵ}{3} \left(1 - (1 - 2δ)^{t}\right) + 2 (1 - 2δ)^{t} \\
&= \frac{2ϵ}{3} + \frac{(6 - 2ϵ)(1 - 2δ)^{t}}{3} \\
&\leq \frac{2ϵ}{3} + \frac{(6 - 2ϵ)(1 - 2δ)^{T}}{3} \\
&= \frac{2ϵ}{3} + \frac{(6 - 2ϵ)\left(\frac{ϵ}{6 - 2ϵ}\right)}{3} \\
&= \frac{2ϵ}{3} + \frac{ϵ}{3} = ϵ.
\end{aligned}
$$

□
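To make the sample-size threshold T from this proof concrete, the following sketch evaluates it numerically along with the right-hand side of the bound on $E[ϵ_{t}]$. The function names are hypothetical, and δ is treated as a known input even though in practice it depends on the continuity of the payoff functions.

```python
import math


def samples_needed(eps: float, delta: float) -> int:
    """Smallest integer t with t >= T = ln(eps / (6 - 2 eps)) / ln(1 - 2 delta)."""
    # eps < 2 keeps the numerator's logarithm negative; delta < 1/2 keeps 1 - 2 delta > 0.
    assert 0 < eps < 2 and 0 < delta < 0.5
    return math.ceil(math.log(eps / (6 - 2 * eps)) / math.log(1 - 2 * delta))


def expected_exploitability_bound(eps: float, delta: float, t: int) -> float:
    """Right-hand side of the first inequality bounding E[eps_t] in the proof."""
    miss = (1 - 2 * delta) ** t  # probability that no sample lands within delta
    return (2 * eps / 3) * (1 - miss) + 2 * miss


t = samples_needed(eps=0.3, delta=0.05)
bound = expected_exploitability_bound(0.3, 0.05, t)  # at most 0.3 once t >= T
```

The threshold grows quickly as δ shrinks, which is consistent with the later observation that the sampling approach may need many samples for a small approximation error.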

Theorem 2. Suppose that we have t samples $λ_{i}$ producing exploitability $ϵ_{t}$. Then $G_{λ}$ is $(t, t, ϵ_{t})$-implementable using the minimum exploitability objective.

Proof. We can construct parametric decision list L as follows. First, we construct the function $f_{1} : Λ \to \mathbb{R}^{k}$. The first component of $f_{1}$ is $|λ - λ_{2}| - |λ - λ_{1}|$ with operation $o_{11}$ equal to $\geq$. In other words, this corresponds to the condition $|λ - λ_{2}| - |λ - λ_{1}| \geq 0$, which states that $λ_{1}$ is at least as close to $λ$ as $λ_{2}$ is. The component of $f_{1}$ for each sample $i > 1$ is $|λ - λ_{i}| - |λ - λ_{1}|$, with $o_{1i}$ also $\geq$. For the strategy, we set $s_{1} = s_{1}^{*}$. These conditions put together tell us to play strategy $s_{1}$ if $|λ - λ_{1}| \leq |λ - λ_{i}|$ for all $i > 1$. In general the ith component of $f_{j}$ is $|λ - λ_{i}| - |λ - λ_{j}|$, with $o_{ji}$ equal to $\geq$ and $s_{j} = s_{j}^{*}$. We can omit the final set of conditions for $f_{t}$ since it is implied by the first $t - 1$ sets of conditions all failing, and output $s_{t} = s_{t}^{*}$ as the default strategy at depth $t$. The constructed parametric decision list L has depth and width t, and produces a strategy with exploitability of $ϵ_{t}$. □
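The construction in this proof can be sketched in code for a one-dimensional parameter. The sketch orients each condition so that rule j fires when sample j is at least as close to λ as every other sample, i.e., $|λ - λ_{j}| \leq |λ - λ_{i}|$ for all i, and it checks agreement with a direct nearest-sample lookup. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def pdl_from_samples(samples: Sequence[float], strategies: Sequence[object]
                     ) -> Tuple[List[Callable[[float], List[float]]], List[object]]:
    """Build the rules for t one-dimensional samples; the last sample is the default."""
    rules = []
    for j in range(len(samples) - 1):
        lam_j = samples[j]
        others = [lam_i for i, lam_i in enumerate(samples) if i != j]
        # Each component is |lam - lam_i| - |lam - lam_j|, later compared with >= 0.
        rules.append(lambda lam, lam_j=lam_j, others=others:
                     [abs(lam - lam_i) - abs(lam - lam_j) for lam_i in others])
    return rules, list(strategies)


def evaluate(rules, strategies, lam: float):
    """Play the first strategy whose conditions all hold, else the default."""
    for f, strategy in zip(rules, strategies):
        if all(v >= 0 for v in f(lam)):
            return strategy
    return strategies[-1]


def nearest(samples: Sequence[float], strategies: Sequence[object], lam: float):
    """Direct nearest-sample lookup, used here only as a reference."""
    idx = min(range(len(samples)), key=lambda i: abs(lam - samples[i]))
    return strategies[idx]
```

In this sketch the list has depth t and each rule has t − 1 components, which is within the width bound of t stated in the theorem.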

5. Comparison of Approaches in 2 × 2 Games

In general it may be challenging to construct a small parametric decision list that achieves an approximately optimal value of the objective function. Similarly, for the sampling approach we may require a large number of samples to obtain a small approximation error. The sampling approach could be improved by first sampling as many values for the parameters as possible, then clustering the games generated (e.g., using k-means) into k clusters. We then implement the strategy corresponding to the parameter values from the cluster mean with smallest distance from the parameters we encounter in real time. This approach would require an effective distance metric between parameter vectors that can be efficiently computed, while such a metric is not required for the parametric decision list approach. Furthermore, it may be challenging to determine the optimal value of k, and there is no guarantee that this approach with k clusters will produce small error.

In this section we compare the two approaches for the problem of computing Nash equilibrium strategies (i.e., using the objective of minimizing exploitability) in two-player zero-sum strategic-form games with two pure strategies per player. We can represent a two-player $2 \times 2$ game as a matrix M depicted in Equation (1), where the parameters correspond to the payoffs of the row and column players. For general-sum games we can view this game as having 8 parameters (a–h), while for zero-sum games there are 4, since $b = -a$, $d = -c$, $f = -e$, $h = -g$. While we can easily solve a specific game given the payoff parameters, we seek to construct a small set of rules that allow a human to easily obtain a solution for arbitrary parameter values.

$$M = \begin{bmatrix} (a, b) & (c, d) \\ (e, f) & (g, h) \end{bmatrix} \qquad (1)$$

We first explore the sampling approach. We generated 100,000 $2 \times 2$ 2-player zero-sum games with payoffs for the row player chosen uniformly at random in $[-1, 1]$. We then used the first k games from this set as training data, for various values of k. For each value of k, we generated 10,000 new test games with uniform random payoffs. For each test game, we determine which of the k training games is “closest.” For the distance metric, we use the L2-norm over vectors for the values $(a, c, e, g)$. We then compute the exploitability of the previously-computed equilibrium strategies from the closest training game in the new test game. The average exploitability over the 10,000 games is plotted as a function of k in Figure 1. From the figure we can see that as the number of training games gets large the average exploitability approaches zero, as expected. Surprisingly, training on just the first two games actually produces a very small average exploitability of 0.0129, while training on all 100,000 produces exploitability 0.0077. The exploitability for 3 training games is significantly higher than that for 2, and jumps up sharply to a peak of 0.159 for 20 training games before descending towards zero. This erratic behavior shows that, on the one hand, the L2 distance metric has limitations for this problem and does not lead average exploitability to decrease monotonically with number of training games as we may expect. However, it also shows that it may be possible to generate a very small sample of training games (just 2) that produces a very low average exploitability.
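A rough sketch of this kind of experiment is given below. It is not the code used to produce the reported numbers: it assumes the standard closed-form solution of $2 \times 2$ zero-sum games, tracks only the row player's strategy, and uses hypothetical function names, so its output should not be expected to match the figures above.

```python
import math
import random


def solve_row(game):
    """Return (p, value): probability of row 1 in a maximin strategy, and the game value.

    game = (a, c, e, g) holds the row player's payoffs [[a, c], [e, g]].
    """
    a, c, e, g = game
    row_mins = (min(a, c), min(e, g))
    col_maxes = (max(a, e), max(c, g))
    if max(row_mins) == min(col_maxes):      # saddle point: a pure strategy is optimal
        return (1.0 if row_mins[0] >= row_mins[1] else 0.0), max(row_mins)
    denom = a - c - e + g                    # nonzero whenever no saddle point exists
    return (g - e) / denom, (a * g - c * e) / denom


def exploitability(p, game):
    """Game value minus the worst-case payoff of playing row 1 with probability p."""
    a, c, e, g = game
    worst = min(p * a + (1 - p) * e, p * c + (1 - p) * g)
    return solve_row(game)[1] - worst


def average_exploitability(k, num_test, rng):
    """Train on k random games, then test nearest-game strategies on fresh games."""
    def draw():
        return tuple(rng.uniform(-1, 1) for _ in range(4))

    train = [draw() for _ in range(k)]
    solutions = [solve_row(game)[0] for game in train]
    total = 0.0
    for _ in range(num_test):
        test = draw()
        # L2 distance over the payoff vector (a, c, e, g), as in the text.
        nearest = min(range(k), key=lambda i: math.dist(train[i], test))
        total += exploitability(solutions[nearest], test)
    return total / num_test
```

A call such as `average_exploitability(100, 1000, random.Random(0))` gives one noisy estimate for k = 100; repeating it over a range of k values would trace out a curve comparable in spirit to Figure 1.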

As it turns out, there actually exists a small parametric decision list (PDL) that computes an exact Nash equilibrium for $2 \times 2$ two-player general-sum strategic-form games (we can view zero-sum games as a special case). This is depicted below, using the notation for the parameters from Equation (1). We actually output equilibrium strategies for both players 1 and 2 in our PDL, though in practice we would only need to specify the strategy for the one player we are interested in. The final condition outputs a mixed strategy for both players, where the row player plays his strategies with probability p and $1 - p$ while the column player plays his strategies with probability q and $1 - q$. This PDL provides a $(4, 2, 0)$ implementation of the problem of computing a Nash equilibrium in this game class. A proof of correctness is in Appendix A.

If $a \geq e$ and $b \geq d$ then (1,1).
Else if $c \geq g$ and $d \geq b$ then (1,2).
Else if $e \geq a$ and $f \geq h$ then (2,1).
Else if $g \geq c$ and $h \geq f$ then (2,2).
Else $((p, 1 - p), (q, 1 - q))$ for $p = \frac{h - f}{b - f + h - d}$ , $q = \frac{g - c}{a - c + g - e}$ .
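The list above translates directly into code. The sketch below assumes integer payoffs so that the mixed probabilities can be computed exactly with `fractions.Fraction`; the helper `is_nash`, which checks pure-strategy deviations, is a hypothetical addition for verification and is not part of the PDL.

```python
from fractions import Fraction


def nash_2x2(a, b, c, d, e, f, g, h):
    """Equilibrium of M = [[(a, b), (c, d)], [(e, f), (g, h)]] following the PDL.

    Returns (row_strategy, col_strategy), each a pair of probabilities.
    Assumes integer payoffs so that Fraction arithmetic is exact.
    """
    if a >= e and b >= d:
        return (1, 0), (1, 0)
    if c >= g and d >= b:
        return (1, 0), (0, 1)
    if e >= a and f >= h:
        return (0, 1), (1, 0)
    if g >= c and h >= f:
        return (0, 1), (0, 1)
    p = Fraction(h - f, b - f + h - d)
    q = Fraction(g - c, a - c + g - e)
    return (p, 1 - p), (q, 1 - q)


def is_nash(a, b, c, d, e, f, g, h, row, col):
    """Check that neither player gains from a pure-strategy deviation."""
    row_pay = (col[0] * a + col[1] * c, col[0] * e + col[1] * g)  # row 1, row 2
    col_pay = (row[0] * b + row[1] * f, row[0] * d + row[1] * h)  # col 1, col 2
    u1 = row[0] * row_pay[0] + row[1] * row_pay[1]
    u2 = col[0] * col_pay[0] + col[1] * col_pay[1]
    return max(row_pay) <= u1 and max(col_pay) <= u2
```

For matching pennies (`a, b, c, d, e, f, g, h = 1, -1, -1, 1, -1, 1, 1, -1`), none of the four pure conditions holds and the final rule yields p = q = 1/2.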

6. Parametrized Game Examples

In this section we present several examples of realistic games that depict various forms of our model. In Section 6.1, we present a game we call Simplified Final Jeopardy, which is a simplified two-player variant of the problem of determining how much to wager in final jeopardy (we can view it as the three-player version in which the third player has $0). The three-player version is played for large amounts of money on the popular game show. We will assume that the player balances are fixed. Our model has two parameters which denote assessments of the probabilities that each player will answer correctly. We can assume that the assessments are based on observations of play throughout the game, as well as the category. These parameters affect the payoffs of the players. So the game exemplifies payoff parametrization. This game is two-player zero-sum, and we use the Nash equilibrium approximation/minimum exploitability objective.

In Section 6.2 we consider a generalization of a simplified poker game that has been widely studied. Kuhn poker was one of the first games studied by game theorists [4]. More recently, it has received significant attention in the artificial intelligence community as a tractable test problem for equilibrium-finding [5,6,7,8,9,10] and opponent-exploitation [11] algorithms. In the standard version, there are two players, each dealt one card from a three-card deck. We consider a variant in which the deck has n cards. (Previously a version with a 13-card deck has been studied [12].) The cards represent private information to the players, and therefore the game exemplifies information parametrization, though no strategy or payoff parametrization.

Finally in Section 6.3, we study a game model based on the game show Weakest Link. In the Weakest Link game show, eight contestants answer a series of trivia questions to accumulate a “bank” of money, with one contestant (the “weakest link”) voted off at each round. When there are two contestants remaining, they face off for a series of five questions each, with the winner receiving the entire amount that was banked. In theory the champion could win “up to a million dollars”, though in practice the total bank ends up being in the $50k–100k range. When three contestants remain, the players face an interesting strategic decision of deciding whom to vote for. Our model has five parameters: the total amount in the bank, assessments of the probability of winning against each opponent in the final round, and assessments of voting strategies of the opposing players (the probabilities that they will vote for each player). Thus, this is a three-player game using the opponent exploitation metric with strategy parametrization over the opponents’ strategies as well as payoff parametrization.

These three games exemplify several of the different types of parametrization we have discussed: payoff, strategy, information, and opponent strategy. They exemplify the main objectives we have considered: minimizing exploitability for two-player zero-sum games, and maximizing opponent exploitation in multiplayer games. They also exemplify several different game classes (two-player zero-sum, imperfect-information, and multiplayer). These are simplified models of real popular games that are frequently played for large amounts of money. The purpose of these examples is to demonstrate the realistic applicability of the new concepts and frameworks we have presented.

In each of the games human players must make their decision under extreme time pressure in real time without any computational assistance (though of course they can prepare a strategy in advance). While the optimal strategy can be computed easily in advance for fixed values of the parameters, the players must be prepared to face any possible values for the parameters. It turns out that for the games we consider we are able to construct parametric decision lists with small width and depth that exactly solve the problem based on the derivation of closed-form solutions. While in general larger games will clearly often not have closed-form solutions, the new concepts and frameworks can still be applied, though they may require the development of new focused algorithms.

For each of the games we consider, we present the rules, as well as a small PDL that exactly optimizes the objective. Full derivations and additional analysis are in Appendix A, Appendix B, Appendix C and Appendix D.

6.1. Final Jeopardy

In the simplified final jeopardy game, two players each have an amount $X_{i}$ and must select a non-negative amount $w_{i} \leq X_{i}$ to wager. We will assume that $X_{1} = 5$, $X_{2} = 3$, and the $w_{i}$ must be non-negative integers. The player who finishes with a higher amount wins and obtains payoff 1, while the losing player obtains payoff 0 (we can then subtract 0.5 from each payoff to make the game zero sum). If there is a tie, then we assume each player obtains payoff 0.5. Finally, there are parameters $p_{i}$ that denote the probability that the players expect player i to correctly answer the question. We assume that these values are correct and are common knowledge.

For specific fixed values of the parameters $p_{1}, p_{2}$, the game is a two-player zero-sum strategic-form game, and can be solved easily using standard algorithms. But such an approach is not helpful for a human player who must be prepared to be able to quickly construct his strategy in real time for any possible values of the parameters. The parametrized game is a $6 \times 4$ strategic-form game where the payoffs are functions of the parameters, and it is not obvious how to compute equilibrium strategies for all possible parameter values. As it turns out, we can construct the following small PDL which determines exact equilibrium strategies for all values of the parameters for player 1 (a derivation is provided in Appendix B).

If $p_{2} = 0$ wager 0.
Else if $p_{1} = 0$ wager 0.
Else if $p_{1} = 1$ wager 2.
Else if $p_{2} \geq \frac{1}{2}$ wager 2.
Otherwise wager 1 with probability $x = \frac{(1 - p_{1}) (1 - 2 p_{2})}{1 - p_{1} + p_{1} p_{2}}$ and wager 2 with probability $1 - x$.

The PDL for player 2 is the following:

If $p_{2} = 0$ wager 0.
Else if $p_{1} = 0$ wager 3.
Else if $p_{1} = 1$ wager 0.
Else if $p_{1} \geq \frac{1}{2}$ and $p_{2} \geq \frac{1}{2}$ wager 2.
Else if $p_{2} \geq \frac{1}{2}$ wager 3.
Otherwise wager 0 with probability $y = \frac{p_{1} p_{2}}{1 + p_{1} p_{2} - p_{1}}$ and wager 3 with probability $1 - y$.
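The two lists above can be written as small lookup functions and checked numerically against the $6 \times 4$ game. The sketch assumes that the two players answer correctly independently of each other, which the game description does not state explicitly; the function names and the numerical tolerance are also assumptions.

```python
from itertools import product

X1, X2 = 5, 3  # fixed balances from the text


def pdl_player1(p1, p2):
    """Player 1's wager distribution {wager: probability} from the PDL."""
    if p2 == 0:
        return {0: 1.0}
    if p1 == 0:
        return {0: 1.0}
    if p1 == 1:
        return {2: 1.0}
    if p2 >= 0.5:
        return {2: 1.0}
    x = (1 - p1) * (1 - 2 * p2) / (1 - p1 + p1 * p2)
    return {1: x, 2: 1 - x}


def pdl_player2(p1, p2):
    """Player 2's wager distribution {wager: probability} from the PDL."""
    if p2 == 0:
        return {0: 1.0}
    if p1 == 0:
        return {3: 1.0}
    if p1 == 1:
        return {0: 1.0}
    if p1 >= 0.5 and p2 >= 0.5:
        return {2: 1.0}
    if p2 >= 0.5:
        return {3: 1.0}
    y = p1 * p2 / (1 + p1 * p2 - p1)
    return {0: y, 3: 1 - y}


def payoff1(w1, w2, p1, p2):
    """Player 1's expected payoff (win 1, tie 0.5, loss 0) for pure wagers.

    Assumes the two players' answers are independent events.
    """
    total = 0.0
    for c1, c2 in product((True, False), repeat=2):
        prob = (p1 if c1 else 1 - p1) * (p2 if c2 else 1 - p2)
        f1 = X1 + w1 if c1 else X1 - w1
        f2 = X2 + w2 if c2 else X2 - w2
        total += prob * (1.0 if f1 > f2 else 0.5 if f1 == f2 else 0.0)
    return total


def is_equilibrium(p1, p2, tol=1e-9):
    """Check that no pure wager improves on the PDL strategies for either player."""
    s1, s2 = pdl_player1(p1, p2), pdl_player2(p1, p2)
    vs_s2 = [sum(q * payoff1(w1, w2, p1, p2) for w2, q in s2.items())
             for w1 in range(X1 + 1)]
    vs_s1 = [sum(q * payoff1(w1, w2, p1, p2) for w1, q in s1.items())
             for w2 in range(X2 + 1)]
    value = sum(q * vs_s2[w1] for w1, q in s1.items())
    return max(vs_s2) <= value + tol and min(vs_s1) >= value - tol
```

For instance, at $p_{1} = 0.5$, $p_{2} = 0.25$ the lists give $x = 0.4$ and $y = 0.2$, and `is_equilibrium(0.5, 0.25)` confirms that neither player has a profitable pure deviation under the independence assumption.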

6.2. Generalized Kuhn Poker

The rules of three-card Kuhn poker are as follows:

Two players: A and B
Both players ante $1
Deck containing three cards: 1, 2, and 3
Each player is dealt one card uniformly at random
Player A acts first and can either bet $1 or check
  – If A bets, player B can call or fold
    ∗ If A bets and B calls, then whoever has the higher card wins the $4 pot
    ∗ If A bets and B folds, then A wins the entire $3 pot
  – If A checks, B can bet $1 or check
    ∗ If A checks and B bets, then A can call or fold
      · If A checks, B bets, and A calls, then whoever has the higher card wins the $4 pot
      · If A checks, B bets, and A folds, B wins the $3 pot
    ∗ If A and B check, then whoever has the higher card wins the $2 pot

An analysis of the equilibria is provided in Appendix C. The equilibrium strategies contain some elements of deceptive behavior, as is present in larger variants of poker. For example, player A sometimes checks with a 3 as a trap or slowplay, and both players sometimes bet with a 1 as a bluff.

Generalized Kuhn poker has the same rules as standard Kuhn poker except that the deck contains n cards instead of 3. We will denote the game with n cards by

G_{n} .

As it turns out, we can represent equilibrium strategies for both players in

G_{n}

by the following PDL, which is derived in Appendix C.

1.

Player A’s strategy in the first round:

A always bets if $x \leq ⌊ \frac{n - 1}{9} ⌋$
If $n \neq 1 mod 9$ , then A bets with $x = ⌈ \frac{n - 1}{9} ⌉$ with probability $\frac{n - 1}{9} - ⌊ \frac{n - 1}{9} ⌋$
A always checks if $⌈ \frac{n - 1}{9} ⌉ < x < ⌊ \frac{2 n + 4}{3} ⌋$
A always bets if $x \geq ⌈ \frac{2 n + 4}{3} ⌉$
If $n \neq 1 mod 3$ , then A bets with $x = ⌊ \frac{2 n + 4}{3} ⌋$ with probability $⌈ \frac{2 n + 4}{3} ⌉ - \frac{2 n + 4}{3}$

2.

Player B’s strategy facing a bet:

B always calls if $y \geq ⌈ \frac{n - 1}{3} ⌉$
If $n \neq 1 mod 3$ , then B calls with $y = ⌊ \frac{n - 1}{3} ⌋$ with probability $⌈ \frac{n - 1}{3} ⌉ - \frac{n - 1}{3}$
B always folds if $y < ⌊ \frac{n - 1}{3} ⌋$

3.

Player B’s strategy facing a check:

B always bets if $y \leq ⌊ \frac{n - 1}{6} ⌋$
If $n \neq 1 mod 6$ , then B bets with $y = ⌈ \frac{n - 1}{6} ⌉$ with probability $\frac{n - 1}{6} - ⌊ \frac{n - 1}{6} ⌋$
B always checks if $⌈ \frac{n - 1}{6} ⌉ < y < ⌊ \frac{n + 3}{2} ⌋$
B always bets if $y \geq ⌈ \frac{n + 3}{2} ⌉$
If $n \neq 1 mod 2$ , then B bets with $y = ⌊ \frac{n + 3}{2} ⌋$ with probability $⌈ \frac{n + 3}{2} ⌉ - \frac{n + 3}{2}$

4.

Player A’s strategy after A checks and B bets:

A always calls if $x \geq ⌈ \frac{n + 5}{3} ⌉$
If $n \neq 1 mod 3,$ then A calls with $x = ⌊ \frac{n + 5}{3} ⌋$ with probability $⌈ \frac{n + 5}{3} ⌉ - \frac{n + 5}{3}$
A always folds otherwise
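
As an illustration of how such a rule can be evaluated for a concrete deck size, the following Python sketch transcribes item 1 (player A's first-round strategy) as stated above. The function name `a_first_round_bet_probability` is hypothetical, and exact fractions are used so that the floor and ceiling comparisons are not affected by rounding.

```python
from fractions import Fraction
from math import ceil, floor


def a_first_round_bet_probability(n, x):
    """Probability that player A bets holding card x (1..n) in G_n."""
    low = Fraction(n - 1, 9)       # threshold for the low (bluffing) cards
    high = Fraction(2 * n + 4, 3)  # threshold for the high cards
    if x <= floor(low):
        return Fraction(1)
    if n % 9 != 1 and x == ceil(low):
        return low - floor(low)
    if x >= ceil(high):
        return Fraction(1)
    if n % 3 != 1 and x == floor(high):
        return ceil(high) - high
    # All remaining cards lie strictly between the two thresholds: A checks.
    return Fraction(0)


# Example: the standard three-card game.
for card in (1, 2, 3):
    print(card, a_first_round_bet_probability(3, card))
```

For $n = 3$ the sketch returns betting probabilities $\frac{2}{9}$, $0$, and $\frac{2}{3}$ for cards 1, 2, and 3, which coincides with the member $α = \frac{2}{3}$ of the three-card equilibrium family listed in Appendix C.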

6.3. Weakest Link

Our final example game is a model for the final voting round (three players remain) in the show Weakest Link. Suppose that the total amount of money to be awarded to the winner is

W > 0

(the loser gets $0). Suppose that if you are head-to-head against opponent 1 you will win with probability

p_{1}

, and against opponent 2 you will win with probability

p_{2}

. Assume that player 2 is stronger than player 1, so that

p_{1} > p_{2}

. Finally, assume that player 1 will vote for you with probability

y_{1}

(and therefore will vote for player 2 with probability 1-

y_{1}

), and that player 2 will vote for you with probability

y_{2}

. We assume that no player votes for themselves. See Appendix D for further details.

If there is a three-way tie, we will assume that each player is voted out with probability 1/3. (In reality, the “statistically strongest link” from the previous round gets to cast a tie-breaking vote, but we will ignore this aspect of the problem to simplify the analysis.) Under this assumption, our expected payoff in the case of a tie equals

\frac{1}{3} (W p_{1}) + \frac{1}{3} (W p_{2}) + \frac{1}{3} \cdot 0 = \frac{W (p_{1} + p_{2})}{3} .

Using this game model, our analysis (provided in Appendix D) shows that we should vote for player 1 if the following condition is met (and otherwise should vote for player 2):

2 y_{1} p_{2} + y_{2} p_{2} + 3 y_{1} y_{2} p_{1} \geq 2 y_{2} p_{1} + y_{1} p_{1} + 3 y_{1} y_{2} p_{2} .

This constitutes an optimal depth 1 PDL for the objective. If

p_{2} \leq \frac{p_{1}}{2}

, then it is always optimal to vote for player 2 (the stronger player) regardless of your beliefs of the strategies taken by the other players.

7. Related Research

The current state of the art for creating strong game-theoretic strategies is to train an algorithm on a supercomputer for a significant period of time for one specific value of the game parameters. For example, strong agents were recently developed for two-player no-limit Texas hold ’em assuming that both players start with 200 times the big blind [1,2]. The strategies are typically stored in a large binary file and looked up during runtime. In real poker the values of the stack sizes relative to the blinds often change, and can be viewed as parameters. The standard approach would require performing a massive computation for each possible value of the parameters, which is intractable. Human poker players must devise a strategy for any possible combination of stack sizes. So for the realistic version of poker and many other games, which are naturally modeled as parametrized games, the standard existing approaches are inadequate.

There has been some recent work exploring the construction of human-understandable strategy rules in the parametrized setting. One paper showed that equilibrium strategies for endgames in two-player limit Texas hold ’em conform to one of three qualitative models, which enabled improved equilibrium computation algorithms [6]. Recent work in other imperfect-information poker games has applied machine learning algorithms (decision trees and regression) to compute human-understandable rules for fundamental situations: when a player should make a very large or small bet, and when a player should call a bet by the opponent [13,14]. The former can be viewed as a special case of the current work, and the latter computes a single general “rule of thumb”, whereas the approach in this paper constructs a full strategy for a human player for all possible values of the game parameters.

There has also been some recent study of theoretical properties of certain classes of parametrized games in game theory literature [15,16].

8. Conclusions

We presented a new framework that enables human decision makers to make fast decisions without the aid of real-time solvers. In many settings it is unrealistic to assume that a human strategic player can perform complex computations in real time in a matter of minutes or seconds. Many real-world settings also contain parameters whose values are unknown until runtime, and it is infeasible to solve for all possible parameter values using the standard existing game-solving approaches. If a concise parametric decision list can be constructed for a given game and objective, then a human can quickly execute the corresponding strategy for any value of the parameters encountered. We presented several examples of realistic scenarios that demonstrate applicability to a variety of situations including settings with multiple players and imperfect information, and to different objectives such as minimizing exploitability and maximizing exploitation of opponents.

While we have constructed optimal PDLs analytically for several example games, this may not be possible in general. In the future we plan to explore algorithms for computing small PDLs that achieve low objective error. Such algorithms are needed to achieve large-scale applicability of the new framework. Given the similarities to decision trees and decision lists, algorithms for computing those models may be useful for PDLs [17]. We would also like to perform experiments on human subjects to determine the practical usefulness of the PDL representation for human strategic decision-making in realistic complex games.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Uniform Random 2 × 2 Two-Player Strategic-Form Games

Consider the game defined by matrix M in Equation (A1). If the top-left cell is a pure-strategy equilibrium, then we must have

a \geq e

and

b \geq d

. The analysis is identical for the other pure strategy profiles. Next, suppose there is a Nash equilibrium where one player has support of size 2 and the other player has support of size 1. Without loss of generality, suppose the column player’s strategy has support of size 1 (Left) and the row player’s strategy has support of size 2 (suppose it puts probability p on Top and

1 - p

on Bottom). Player 1 must be indifferent between his two strategies, so

a = e .

Player 2 cannot prefer R to L, so

p b + (1 - p) f \geq p d + (1 - p) h .

If

b < d

and

f < h

, then we would have

p b + (1 - p) f < p d + (1 - p) h,

which produces a contradiction. So we must have

b \geq d

or

f \geq h

. If

b \geq d

, then

(T, L)

is a pure-strategy equilibrium, and if

f \geq h

then

(B, L)

is a pure-strategy equilibrium. So the existence of a Nash equilibrium with support size 1 for one player and 2 for the other player implies the existence of a pure-strategy Nash equilibrium. Therefore, if no pure-strategy Nash equilibrium exists, then there must be a Nash equilibrium where both players’ strategies have support size 2. In this case it can be shown straightforwardly that the row player plays T with probability

\frac{h - f}{b - f + h - d}

and column player plays L with probability

\frac{g - c}{a - c + g - e}

. This produces the PDL given below.

M = [\begin{matrix} (a, b) & (c, d) \\ (e, f) & (g, h) \end{matrix}]

(A1)

If $a \geq e$ and $b \geq d$ then (1,1).
Else if $c \geq g$ and $d \geq b$ then (1,2).
Else if $e \geq a$ and $f \geq h$ then (2,1).
Else if $g \geq c$ and $h \geq f$ then (2,2).
Else $((p, 1 - p), (q, 1 - q))$ for $p = \frac{h - f}{b - f + h - d}$ , $q = \frac{g - c}{a - c + g - e}$ .
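
The decision list above can be written as a short function. This is a minimal Python sketch, assuming integer or `Fraction` payoffs (so that the mixed-strategy probabilities are exact); the name `solve_2x2` and the tuple return format are illustrative assumptions.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d, e, f, g, h):
    """Apply the PDL to M = [[(a, b), (c, d)], [(e, f), (g, h)]].

    Returns (row_strategy, column_strategy), each a pair of probabilities
    over (first action, second action).
    """
    one, zero = Fraction(1), Fraction(0)
    if a >= e and b >= d:
        return (one, zero), (one, zero)  # (1,1): Top, Left
    if c >= g and d >= b:
        return (one, zero), (zero, one)  # (1,2): Top, Right
    if e >= a and f >= h:
        return (zero, one), (one, zero)  # (2,1): Bottom, Left
    if g >= c and h >= f:
        return (zero, one), (zero, one)  # (2,2): Bottom, Right
    # No pure-strategy equilibrium: both players mix with full support.
    p = Fraction(h - f, b - f + h - d)
    q = Fraction(g - c, a - c + g - e)
    return (p, 1 - p), (q, 1 - q)


# Matching pennies has no pure equilibrium, so the last rule fires.
print(solve_2x2(1, -1, -1, 1, -1, 1, 1, -1))
```

The division in the final branch is only reached when all four pure-strategy tests fail, which is the case in which the derivation above guarantees an equilibrium with support size 2 for both players.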

Appendix B. Simplified Two-Player Final Jeopardy

In the two-player Final Jeopardy game, players have an amount

X_{i}

and each player must select a non-negative amount

w_{i} \leq X_{i}

to wager, where

X_{i}

and

w_{i}

are non-negative integers. The player who finishes with a higher amount wins and obtains payoff 1, while the losing player obtains payoff 0 (we can then subtract 0.5 from each payoff to make the game zero sum). If there is a tie, then we assume each player obtains payoff 0.5. Finally, there are parameters

p_{i}

that denote the probability that the players expect player i to correctly answer the question. We assume that these values are correct and are common knowledge.

In the simplified version we consider, the values

X_{1} = 5

,

X_{2} = 3

are fixed. For specific fixed values of the parameters

p_{1}, p_{2}

, the game is a two-player zero-sum strategic-form game and can be solved easily using standard algorithms. But such an approach does not help a human player who must be able to construct his strategy quickly, in real time, for any possible values of

p_{1}

and

p_{2}

.

If player 1 wagers 0 and player 2 wagers 0, then player 1 wins with probability 1. So player 1’s expected payoff is $1 - 0.5 = 0.5 .$ (Note that we are counting a win as having payoff 0.5, a loss as having payoff -0.5, and a tie as having payoff 0, so that the game is zero sum.)
If player 1 wagers 0 and player 2 wagers 1, then player 1 also wins with probability 1. So player 1’s expected payoff is $1 - 0.5 = 0.5 .$
If player 1 wagers 0 and player 2 wagers 2, then player 1 wins with probability $1 - p_{2}$ , and the players tie with probability $p_{2} .$ So player 1’s expected payoff is $1 - p_{2} + 0.5 p_{2} - 0.5 = 0.5 - 0.5 p_{2} .$
If player 1 wagers 0 and player 2 wagers 3, then player 1 wins with probability $1 - p_{2}$ , and player 2 wins with probability $p_{2} .$ So player 1’s expected payoff is $1 - p_{2} - 0.5 = 0.5 - p_{2} .$
If player 1 wagers 1 and player 2 wagers 0, then player 1 wins with probability 1. So player 1’s expected payoff is $1 - 0.5 = 0.5 .$
If player 1 wagers 1 and player 2 wagers 1, then player 1 wins with probability $p_{1} + (1 - p_{1}) (1 - p_{2})$ , and the players tie with probability $(1 - p_{1}) p_{2} .$ So player 1’s expected payoff is

$p_{1} + (1 - p_{1}) (1 - p_{2}) + 0.5 (1 - p_{1}) p_{2} - 0.5 = p_{1} + 1 - p_{1} - p_{2} + p_{1} p_{2} + 0.5 p_{2} - 0.5 p_{1} p_{2} - 0.5 = 0.5 p_{1} p_{2} - 0.5 p_{2} + 0.5 .$
If player 1 wagers 1 and player 2 wagers 2, then player 1 wins with probability $p_{1} + (1 - p_{1}) (1 - p_{2})$ , and player 2 wins with probability $(1 - p_{1}) p_{2} .$ So player 1’s expected payoff is

$p_{1} + (1 - p_{1}) (1 - p_{2}) - 0.5 = p_{1} + 1 - p_{1} - p_{2} - 0.5 = 0.5 - p_{2} .$
If player 1 wagers 1 and player 2 wagers 3, then player 1 wins with probability $1 - p_{2}$ , player 2 wins with probability $(1 - p_{1}) p_{2}$ , and the players tie with probability $p_{1} p_{2} .$ So player 1’s expected payoff is

$1 - p_{2} + 0.5 p_{1} p_{2} - 0.5 = 0.5 + 0.5 p_{1} p_{2} - p_{2} .$
If player 1 wagers 2 and player 2 wagers 0, then player 1 wins with probability $p_{1}$ , and the players tie with probability $1 - p_{1} .$ So player 1’s expected payoff is

$p_{1} + 0.5 (1 - p_{1}) - 0.5 = p_{1} + 0.5 - 0.5 p_{1} - 0.5 = 0.5 p_{1} .$
If player 1 wagers 2 and player 2 wagers 1, then player 1 wins with probability $p_{1} + (1 - p_{1}) (1 - p_{2})$ , and player 2 wins with probability $(1 - p_{1}) p_{2} .$ So player 1’s expected payoff is

$p_{1} + (1 - p_{1}) (1 - p_{2}) - 0.5 = p_{1} + 1 - p_{1} - p_{2} + p_{1} p_{2} - 0.5 = 0.5 - p_{2} + p_{1} p_{2} .$
If player 1 wagers 2 and player 2 wagers 2, then player 1 wins with probability $p_{1} + (1 - p_{1}) (1 - p_{2})$ , and player 2 wins with probability $(1 - p_{1}) p_{2} .$ So player 1’s expected payoff is

$p_{1} + (1 - p_{1}) (1 - p_{2}) - 0.5 = p_{1} + 1 - p_{1} - p_{2} + p_{1} p_{2} - 0.5 = 0.5 - p_{2} + p_{1} p_{2} .$
If player 1 wagers 2 and player 2 wagers 3, then player 1 wins with probability $p_{1} + (1 - p_{1}) (1 - p_{2})$ , and player 2 wins with probability $(1 - p_{1}) p_{2} .$ So player 1’s expected payoff is

$p_{1} + (1 - p_{1}) (1 - p_{2}) - 0.5 = p_{1} + 1 - p_{1} - p_{2} + p_{1} p_{2} - 0.5 = 0.5 - p_{2} + p_{1} p_{2} .$
If player 1 wagers 3 and player 2 wagers 0, then player 1 wins with probability $p_{1}$ , and player 2 wins with probability $1 - p_{1} .$ So player 1’s expected payoff is $p_{1} - 0.5 .$
If player 1 wagers 3 and player 2 wagers 1, then player 1 wins with probability $p_{1}$ , player 2 wins with probability $(1 - p_{1}) p_{2},$ and the players tie with probability $(1 - p_{1}) (1 - p_{2}) .$ So player 1’s expected payoff is

$p_{1} + 0.5 (1 - p_{1}) (1 - p_{2}) - 0.5 = p_{1} + 0.5 - 0.5 p_{1} - 0.5 p_{2} + 0.5 p_{1} p_{2} - 0.5 = 0.5 p_{1} p_{2} + 0.5 p_{1} - 0.5 p_{2} .$
If player 1 wagers 3 and player 2 wagers 2, then player 1 wins with probability $p_{1} + (1 - p_{1}) (1 - p_{2})$ , and player 2 wins with probability $(1 - p_{1}) p_{2} .$ So player 1’s expected payoff is

$p_{1} + (1 - p_{1}) (1 - p_{2}) - 0.5 = p_{1} + 1 - p_{1} - p_{2} + p_{1} p_{2} - 0.5 = 0.5 - p_{2} + p_{1} p_{2} .$
If player 1 wagers 3 and player 2 wagers 3, then player 1 wins with probability $p_{1} + (1 - p_{1}) (1 - p_{2})$ , and player 2 wins with probability $(1 - p_{1}) p_{2} .$ So player 1’s expected payoff is

$p_{1} + (1 - p_{1}) (1 - p_{2}) - 0.5 = p_{1} + 1 - p_{1} - p_{2} + p_{1} p_{2} - 0.5 = 0.5 - p_{2} + p_{1} p_{2} .$
If player 1 wagers 4 and player 2 wagers 0, then player 1 wins with probability $p_{1}$ , and player 2 wins with probability $1 - p_{1} .$ So player 1’s expected payoff is $p_{1} - 0.5 .$
If player 1 wagers 4 and player 2 wagers 1, then player 1 wins with probability $p_{1}$ , and player 2 wins with probability $1 - p_{1} .$ So player 1’s expected payoff is $p_{1} - 0.5 .$
If player 1 wagers 4 and player 2 wagers 2, then player 1 wins with probability $p_{1}$ , player 2 wins with probability $(1 - p_{1}) p_{2},$ and the players tie with probability $(1 - p_{1}) (1 - p_{2}) .$ So player 1’s expected payoff is

$p_{1} + 0.5 (1 - p_{1}) (1 - p_{2}) - 0.5 = p_{1} + 0.5 - 0.5 p_{1} - 0.5 p_{2} + 0.5 p_{1} p_{2} - 0.5 = 0.5 p_{1} p_{2} + 0.5 p_{1} - 0.5 p_{2} .$
If player 1 wagers 4 and player 2 wagers 3, then player 1 wins with probability $p_{1} + (1 - p_{1}) (1 - p_{2})$ , and player 2 wins with probability $(1 - p_{1}) p_{2} .$ So player 1’s expected payoff is

$p_{1} + (1 - p_{1}) (1 - p_{2}) - 0.5 = p_{1} + 1 - p_{1} - p_{2} + p_{1} p_{2} - 0.5 = 0.5 - p_{2} + p_{1} p_{2} .$
If player 1 wagers 5 and player 2 wagers 0, then player 1 wins with probability $p_{1}$ , and player 2 wins with probability $1 - p_{1} .$ So player 1’s expected payoff is $p_{1} - 0.5 .$
If player 1 wagers 5 and player 2 wagers 1, then player 1 wins with probability $p_{1}$ , and player 2 wins with probability $1 - p_{1} .$ So player 1’s expected payoff is $p_{1} - 0.5 .$
If player 1 wagers 5 and player 2 wagers 2, then player 1 wins with probability $p_{1}$ , and player 2 wins with probability $1 - p_{1} .$ So player 1’s expected payoff is $p_{1} - 0.5 .$
If player 1 wagers 5 and player 2 wagers 3, then player 1 wins with probability $p_{1}$ , player 2 wins with probability $(1 - p_{1}) p_{2},$ and the players tie with probability $(1 - p_{1}) (1 - p_{2}) .$ So player 1’s expected payoff is

$p_{1} + 0.5 (1 - p_{1}) (1 - p_{2}) - 0.5 = p_{1} + 0.5 - 0.5 p_{1} - 0.5 p_{2} + 0.5 p_{1} p_{2} - 0.5 = 0.5 p_{1} p_{2} + 0.5 p_{1} - 0.5 p_{2} .$

The game corresponds to the following payoff matrix, where the payoffs are for player 1 (we assume that player 1 is the row player and player 2 is the column player).

[\begin{matrix} 0.5 & 0.5 & 0.5 - 0.5 p_{2} & 0.5 - p_{2} \\ 0.5 & 0.5 p_{1} p_{2} - 0.5 p_{2} + 0.5 & 0.5 - p_{2} & 0.5 + 0.5 p_{1} p_{2} - p_{2} \\ 0.5 p_{1} & 0.5 - p_{2} + p_{1} p_{2} & 0.5 - p_{2} + p_{1} p_{2} & 0.5 - p_{2} + p_{1} p_{2} \\ p_{1} - 0.5 & 0.5 p_{1} p_{2} + 0.5 p_{1} - 0.5 p_{2} & 0.5 - p_{2} + p_{1} p_{2} & 0.5 - p_{2} + p_{1} p_{2} \\ p_{1} - 0.5 & p_{1} - 0.5 & 0.5 p_{1} p_{2} + 0.5 p_{1} - 0.5 p_{2} & 0.5 - p_{2} + p_{1} p_{2} \\ p_{1} - 0.5 & p_{1} - 0.5 & p_{1} - 0.5 & 0.5 p_{1} p_{2} + 0.5 p_{1} - 0.5 p_{2} \end{matrix}]

(A2)

The payoff matrix for player 2 is the following (it is the same matrix with all payoffs negated):

[\begin{matrix} - 0.5 & - 0.5 & - 0.5 + 0.5 p_{2} & - 0.5 + p_{2} \\ - 0.5 & - 0.5 p_{1} p_{2} + 0.5 p_{2} - 0.5 & - 0.5 + p_{2} & - 0.5 - 0.5 p_{1} p_{2} + p_{2} \\ - 0.5 p_{1} & - 0.5 + p_{2} - p_{1} p_{2} & - 0.5 + p_{2} - p_{1} p_{2} & - 0.5 + p_{2} - p_{1} p_{2} \\ - p_{1} + 0.5 & - 0.5 p_{1} p_{2} - 0.5 p_{1} + 0.5 p_{2} & - 0.5 + p_{2} - p_{1} p_{2} & - 0.5 + p_{2} - p_{1} p_{2} \\ - p_{1} + 0.5 & - p_{1} + 0.5 & - 0.5 p_{1} p_{2} - 0.5 p_{1} + 0.5 p_{2} & - 0.5 + p_{2} - p_{1} p_{2} \\ - p_{1} + 0.5 & - p_{1} + 0.5 & - p_{1} + 0.5 & - 0.5 p_{1} p_{2} - 0.5 p_{1} + 0.5 p_{2} \end{matrix}]

(A3)

(0,0) is a Nash equilibrium if $p_{2} = 0 .$
Else (0,3) is a Nash equilibrium if $p_{1} = 0 .$
Else (2,0) is a Nash equilibrium if $p_{1} = 1 .$
Else (2,2) is a Nash equilibrium if $p_{1} \geq \frac{1}{2}, p_{2} \geq \frac{1}{2} .$
Else (2,3) is a Nash equilibrium if $p_{1} < \frac{1}{2}, p_{2} \geq \frac{1}{2} .$
Else, if $p_{2} < \frac{1}{2}$ , the following profile is a Nash equilibrium: P1 wagers 1 with probability $x = \frac{(1 - p_{1}) (1 - 2 p_{2})}{1 - p_{1} + p_{1} p_{2}}$ and wagers 2 with probability $1 - x$ , and P2 wagers 0 with probability $y = \frac{p_{1} p_{2}}{1 + p_{1} p_{2} - p_{1}}$ and wagers 3 with probability $1 - y .$

Expected payoff for player 1 against this strategy is:

0.5 y + (1 - y) (0.5 + 0.5 p_{1} p_{2} - p_{2})

= 0.5 y + 0.5 + 0.5 p_{1} p_{2} - p_{2} - 0.5 y - 0.5 y p_{1} p_{2} + y p_{2}

= 0.5 + 0.5 p_{1} p_{2} - p_{2} - 0.5 y p_{1} p_{2} + y p_{2}

= 0.5 + 0.5 p_{1} p_{2} - p_{2} + y (p_{2} - 0.5 p_{1} p_{2})

= 0.5 + 0.5 p_{1} p_{2} - p_{2} + \frac{p_{1} p_{2} (p_{2} - 0.5 p_{1} p_{2})}{1 + p_{1} p_{2} - p_{1}}

= \frac{(0.5 + 0.5 p_{1} p_{2} - p_{2}) (1 + p_{1} p_{2} - p_{1}) + p_{1} p_{2} (p_{2} - 0.5 p_{1} p_{2})}{1 + p_{1} p_{2} - p_{1}}

= \frac{0.5 + 0.5 p_{1} p_{2} - 0.5 p_{1} + 0.5 p_{1} p_{2} + 0.5 p_{1}^{2} p_{2}^{2} - 0.5 p_{1}^{2} p_{2} - p_{2} - p_{1} p_{2}^{2} + p_{1} p_{2} + p_{1} p_{2}^{2} - 0.5 p_{1}^{2} p_{2}^{2}}{1 + p_{1} p_{2} - p_{1}}

= \frac{0.5 + 2 p_{1} p_{2} - 0.5 p_{1} - 0.5 p_{1}^{2} p_{2} - p_{2}}{1 + p_{1} p_{2} - p_{1}}

Expected payoff for player 1 wagering 0 against this strategy:

0.5 y + (1 - y) (0.5 - p_{2}) = 0.5 y + 0.5 - p_{2} - 0.5 y + y p_{2} = 0.5 - p_{2} + y p_{2}

Suppose this exceeds the payoff of playing the above strategy. Then

0.5 - p_{2} + y p_{2} > 0.5 + 0.5 p_{1} p_{2} - p_{2} + y (p_{2} - 0.5 p_{1} p_{2})

0 > 0.5 p_{1} p_{2} - 0.5 y p_{1} p_{2}

0 > 0.5 p_{1} p_{2} (1 - y)

which is always false, since

p_{1} p_{2} \geq 0

and

y \leq 1 .

Expected payoff for player 1 wagering 3 against this strategy:

y (p_{1} - 0.5) + (1 - y) (0.5 - p_{2} + p_{1} p_{2})

= y p_{1} - 0.5 y + 0.5 - p_{2} + p_{1} p_{2} - 0.5 y + y p_{2} - y p_{1} p_{2}

= y p_{1} - y + 0.5 - p_{2} + p_{1} p_{2} + y p_{2} - y p_{1} p_{2}

= 0.5 - p_{2} + p_{1} p_{2} + y (p_{1} - 1 + p_{2} - p_{1} p_{2})

Suppose this exceeds the payoff of playing the above strategy. Then

0.5 - p_{2} + p_{1} p_{2} + y (p_{1} - 1 + p_{2} - p_{1} p_{2}) > 0.5 + 0.5 p_{1} p_{2} - p_{2} + y (p_{2} - 0.5 p_{1} p_{2})

0.5 p_{1} p_{2} + y (p_{1} - 1 - 0.5 p_{1} p_{2}) > 0

0.5 p_{1} p_{2} + \frac{p_{1} p_{2} (p_{1} - 1 - 0.5 p_{1} p_{2})}{1 + p_{1} p_{2} - p_{1}} > 0

0.5 p_{1} p_{2} (1 + p_{1} p_{2} - p_{1}) + p_{1} p_{2} (p_{1} - 1 - 0.5 p_{1} p_{2}) > 0

0.5 p_{1} p_{2} + 0.5 p_{1}^{2} p_{2}^{2} - 0.5 p_{1}^{2} p_{2} + p_{1}^{2} p_{2} - p_{1} p_{2} - 0.5 p_{1}^{2} p_{2}^{2} > 0

- 0.5 p_{1} p_{2} + 0.5 p_{1}^{2} p_{2} > 0

- 0.5 + 0.5 p_{1} > 0

p_{1} > 1

which is a contradiction.

Expected payoff for player 1 wagering 4 against this strategy is identical to the expected payoff of wagering 3, so the same argument will apply.

Expected payoff for player 1 wagering 5 against this strategy:

y (p_{1} - 0.5) + (1 - y) (0.5 p_{1} p_{2} + 0.5 p_{1} - 0.5 p_{2})

= y p_{1} - 0.5 y + 0.5 p_{1} p_{2} + 0.5 p_{1} - 0.5 p_{2} - 0.5 y p_{1} p_{2} - 0.5 y p_{1} + 0.5 y p_{2}

= 0.5 p_{1} p_{2} + 0.5 p_{1} - 0.5 p_{2} + y (0.5 p_{1} - 0.5 - 0.5 p_{1} p_{2} + 0.5 p_{2})

Suppose this exceeds the payoff of playing the above strategy. Then

0.5 p_{1} p_{2} + 0.5 p_{1} - 0.5 p_{2} + y (0.5 p_{1} - 0.5 - 0.5 p_{1} p_{2} + 0.5 p_{2}) > 0.5 + 0.5 p_{1} p_{2} - p_{2} + y (p_{2} - 0.5 p_{1} p_{2})

- 0.5 + 0.5 p_{1} + 0.5 p_{2} + y (0.5 p_{1} - 0.5 - 0.5 p_{2}) > 0

- 0.5 + 0.5 p_{1} + 0.5 p_{2} + \frac{p_{1} p_{2} (0.5 p_{1} - 0.5 - 0.5 p_{2})}{1 + p_{1} p_{2} - p_{1}} > 0

(- 0.5 + 0.5 p_{1} + 0.5 p_{2}) (1 + p_{1} p_{2} - p_{1}) + p_{1} p_{2} (0.5 p_{1} - 0.5 - 0.5 p_{2}) > 0

- 0.5 + p_{1} + 0.5 p_{2} - 0.5 p_{1}^{2} - 1.5 p_{1} p_{2} + p_{1}^{2} p_{2} > 0

0.5 (1 - p_{1}) (p_{1} + p_{2} - 2 p_{1} p_{2} - 1) > 0

Since

p_{1} < 1

in this case, this requires

p_{1} + p_{2} - 2 p_{1} p_{2} > 1 .

But

1 - p_{1} - p_{2} + 2 p_{1} p_{2} = (1 - p_{1}) (1 - p_{2}) + p_{1} p_{2} \geq 0 .

So we have a contradiction, and we have shown that player 1 can’t profitably deviate.

Expected payoff for player 2 against the strategy of player 1:

- 0.5 p_{1} + x (0.5 p_{1} - 0.5)

= \frac{- 0.5 - 2 p_{1} p_{2} + 0.5 p_{1} + 0.5 p_{1}^{2} p_{2} + p_{2}}{1 + p_{1} p_{2} - p_{1}}

Expected payoff for player 2 wagering 1 against this strategy:

x (- 0.5 p_{1} p_{2} + 0.5 p_{2} - 0.5) + (1 - x) (- 0.5 + p_{2} - p_{1} p_{2})

= - 0.5 x p_{1} p_{2} + 0.5 x p_{2} - 0.5 x - 0.5 + p_{2} - p_{1} p_{2} + 0.5 x - x p_{2} + x p_{1} p_{2}

= 0.5 x p_{1} p_{2} - 0.5 x p_{2} - 0.5 + p_{2} - p_{1} p_{2}

= - 0.5 + p_{2} - p_{1} p_{2} + x (0.5 p_{1} p_{2} - 0.5 p_{2})

Suppose this exceeds the payoff of playing the above strategy. Then

- 0.5 + p_{2} - p_{1} p_{2} + x (0.5 p_{1} p_{2} - 0.5 p_{2}) > - 0.5 p_{1} + x (0.5 p_{1} - 0.5)

- 0.5 + p_{2} - p_{1} p_{2} + 0.5 p_{1} + x (0.5 p_{1} p_{2} - 0.5 p_{2} - 0.5 p_{1} + 0.5) > 0

(- 0.5 + p_{2} - p_{1} p_{2} + 0.5 p_{1}) (1 - p_{1} + p_{1} p_{2}) + (1 - p_{1}) (1 - 2 p_{2}) (0.5 p_{1} p_{2} - 0.5 p_{2} - 0.5 p_{1} + 0.5) > 0

- 0.5 (1 - p_{1}) (1 - 2 p_{2}) (1 - p_{1} + p_{1} p_{2}) + 0.5 {(1 - p_{1})}^{2} (1 - 2 p_{2}) (1 - p_{2}) > 0

0.5 (1 - p_{1}) (1 - 2 p_{2}) [(1 - p_{1}) (1 - p_{2}) - (1 - p_{1} + p_{1} p_{2})] > 0

- 0.5 p_{2} (1 - p_{1}) (1 - 2 p_{2}) > 0

This is never true for

p_{2} < \frac{1}{2} .

So we have a contradiction.

Expected payoff for player 2 wagering 2 against this strategy:

x (- 0.5 + p_{2}) + (1 - x) (- 0.5 + p_{2} - p_{1} p_{2})

= - 0.5 x + x p_{2} - 0.5 + p_{2} - p_{1} p_{2} + 0.5 x - x p_{2} + x p_{1} p_{2}

= - 0.5 + p_{2} - p_{1} p_{2} + x p_{1} p_{2}

Suppose this exceeds the payoff of playing the above strategy. Then

(- 0.5 + p_{2} - p_{1} p_{2}) (1 - p_{1} + p_{1} p_{2}) + p_{1} p_{2} (1 - p_{1}) (1 - 2 p_{2}) > 0

- 0.5 + p_{2} - p_{1} p_{2} + 0.5 p_{1} - p_{1} p_{2} + p_{1}^{2} p_{2} - 0.5 p_{1} p_{2} + p_{1} p_{2}^{2} - p_{1}^{2} p_{2}^{2} + p_{1} p_{2} - p_{1}^{2} p_{2} - 2 p_{1} p_{2}^{2} + 2 p_{1}^{2} p_{2}^{2} > 0

- 0.5 + p_{2} - 1.5 p_{1} p_{2} + 0.5 p_{1} - p_{1} p_{2}^{2} + p_{1}^{2} p_{2}^{2} > 0

Taking the derivative with respect to

p_{2}

and setting to 0 gives

p_{2} = \frac{2 - 3 p_{1}}{4 p_{1} - 4 p_{1}^{2}}

To obtain

0 < p_{2} < \frac{1}{2}

, we must have

0.5 < p_{1} < \frac{2}{3} .

- 0.5 + \frac{2 - 3 p_{1}}{4 p_{1} - 4 p_{1}^{2}} - 1.5 p_{1} \frac{2 - 3 p_{1}}{4 p_{1} - 4 p_{1}^{2}} + 0.5 p_{1} - p_{1} {(\frac{2 - 3 p_{1}}{4 p_{1} - 4 p_{1}^{2}})}^{2} + p_{1}^{2} {(\frac{2 - 3 p_{1}}{4 p_{1} - 4 p_{1}^{2}})}^{2} > 0

- 0.5 {(4 p_{1} - 4 p_{1}^{2})}^{2} + (2 - 3 p_{1}) (4 p_{1} - 4 p_{1}^{2}) - 1.5 p_{1} (2 - 3 p_{1}) (4 p_{1} - 4 p_{1}^{2}) + 0.5 p_{1} {(4 p_{1} - 4 p_{1}^{2})}^{2} - p_{1} {(2 - 3 p_{1})}^{2} + p_{1}^{2} {(2 - 3 p_{1})}^{2} > 0

- 0.5 p_{1} {(4 - 4 p_{1})}^{2} + (2 - 3 p_{1}) (4 - 4 p_{1}) - 1.5 p_{1} (2 - 3 p_{1}) (4 - 4 p_{1}) + 0.5 p_{1}^{2} {(4 - 4 p_{1})}^{2} - {(2 - 3 p_{1})}^{2} + p_{1} {(2 - 3 p_{1})}^{2} > 0

- 8 p_{1} + 16 p_{1}^{2} - 8 p_{1}^{3} + 8 - 8 p_{1} - 12 p_{1} + 12 p_{1}^{2} - 12 p_{1} + 12 p_{1}^{2} + 18 p_{1}^{2} - 18 p_{1}^{3} + 8 p_{1}^{2} - 16 p_{1}^{3} + 8 p_{1}^{4} - 4 + 12 p_{1} - 9 p_{1}^{2} + 4 p_{1} - 12 p_{1}^{2} + 9 p_{1}^{3} > 0

8 p_{1}^{4} - 33 p_{1}^{3} + 45 p_{1}^{2} - 24 p_{1} + 4 > 0

The LHS is always negative for

0.5 < p_{1} < \frac{2}{3} .

So we have a contradiction, and we have shown that player 2 can’t profitably deviate.
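
The deviation arguments in this appendix can also be spot-checked numerically for the mixed case. The sketch below is a hedged illustration rather than part of the derivation: it computes expected payoffs directly from the rules of the game (totals $X_{1} = 5$, $X_{2} = 3$, win $= 0.5$, tie $= 0$, loss $= -0.5$) instead of reading them from the payoff matrix, and then tests every pure deviation against the mixed profile. The helper names `payoff` and `check_mixed_case` are hypothetical.

```python
from fractions import Fraction
from itertools import product

X1, X2 = 5, 3  # fixed totals in the simplified game


def payoff(w1, w2, p1, p2):
    """Expected payoff to player 1 when the wagers are w1 and w2."""
    total = Fraction(0)
    for c1, c2 in product((True, False), repeat=2):
        prob = (p1 if c1 else 1 - p1) * (p2 if c2 else 1 - p2)
        s1 = X1 + w1 if c1 else X1 - w1
        s2 = X2 + w2 if c2 else X2 - w2
        # +1/2 for a win, 0 for a tie, -1/2 for a loss.
        total += prob * Fraction((s1 > s2) - (s1 < s2), 2)
    return total


def check_mixed_case(p1, p2):
    """Return (value, p1_cannot_gain, p2_cannot_gain) for 0 < p1 < 1, 0 < p2 < 1/2."""
    d = 1 - p1 + p1 * p2
    x = (1 - p1) * (1 - 2 * p2) / d  # P1 wagers 1 w.p. x, 2 w.p. 1 - x
    y = p1 * p2 / d                  # P2 wagers 0 w.p. y, 3 w.p. 1 - y

    def row(w1):  # P1's payoff for wager w1 against P2's mixture
        return y * payoff(w1, 0, p1, p2) + (1 - y) * payoff(w1, 3, p1, p2)

    def col(w2):  # P1's payoff when P2 wagers w2 against P1's mixture
        return x * payoff(1, w2, p1, p2) + (1 - x) * payoff(2, w2, p1, p2)

    value = row(1)
    p1_ok = all(row(w1) <= value for w1 in range(X1 + 1))
    p2_ok = all(col(w2) >= value for w2 in range(X2 + 1))
    return value, p1_ok, p2_ok


print(check_mixed_case(Fraction(1, 2), Fraction(1, 4)))
```

With $p_{1} = \frac{1}{2}$ and $p_{2} = \frac{1}{4}$ the mixing probabilities are $x = \frac{2}{5}$ and $y = \frac{1}{5}$, the value to player 1 is $\frac{7}{20}$, and neither player has a profitable pure deviation. A check of this kind covers only the sampled parameter values and does not replace the algebraic argument.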

Appendix C. Generalized Kuhn Poker

Kuhn poker was one of the first games studied by game theorists and was developed by Harold Kuhn in 1950 [4]. More recently, it has received significant attention in the artificial intelligence community as a tractable test problem for equilibrium-finding [5,6,7,8,9,10] and opponent-exploitation [11] algorithms. In the standard version, there are two players, each dealt one card from a three-card deck. We consider a variant in which the deck has n cards. (A version with a 13-card deck has been studied previously [12].)

Appendix C.1. Three Card Kuhn Poker

Two players: A and B
Both players ante $1
Deck containing three cards: 1, 2, and 3
Each player is dealt one card uniformly at random
Player A acts first and can either bet $1 or check
–
If A bets, player B can call or fold
∗
If A bets and B calls, then whoever has the higher card wins the $4 pot
∗
If A bets and B folds, then A wins the entire $3 pot
–
If A checks, B can bet $1 or check.
∗
If A checks and B bets, then A can call or fold.
·
If A checks, B bets, and A calls, then whoever has the higher card wins the $4 pot
·
If A checks, B bets, and A folds, then B wins the $3 pot
∗
If A checks and B checks, then whoever has the higher card wins the $2 pot

It is known that for any

0 \leq α \leq 1

the following strategy profile is an equilibrium (and that these are all the equilibria) [4].

A bets with a 1 in the first round with probability $\frac{α}{3}$
A always checks with a 2 in the first round
A bets with a 3 in the first round with probability $α$
If A bets in the first round, then:
–
B always folds with a 1
–
B calls with a 2 with probability $\frac{1}{3}$
–
B always calls with a 3
If A checks in the first round, then:
–
B bets with a 1 with probability $\frac{1}{3}$
–
B always checks with a 2
–
B always bets with a 3
If A checks and B bets, then:
–
A always folds with a 1
–
A calls with a 2 with probability $\frac{α}{3} + \frac{1}{3}$
–
A always calls with a 3

Several immediate observations can be made from the equilibria of three-card Kuhn poker.

There are infinitely many equilibria
There are no pure strategy equilibria
Equilibrium strategies contain some elements of deceptive behavior. For example, player A sometimes checks with a 3 as a trap or slowplay, and both players sometimes bet with a 1 as a bluff.

Generalized Kuhn poker (GKP) has the same rules as standard Kuhn poker except that the deck contains n cards instead of 3. We will denote the game with n cards by

G_{n} .

Unlike the

n = 3

case, no closed-form solution has previously been derived for general

n .

We compute the solution to this game for the first time and present it below.

Appendix C.2. Solution to Generalized Kuhn Poker

Player A’s strategy in the first round:
– A always bets if $x \leq ⌊ \frac{n - 1}{9} ⌋$
– If $n \not\equiv 1 \pmod{9}$, then A bets with $x = ⌈ \frac{n - 1}{9} ⌉$ with probability $\frac{n - 1}{9} - ⌊ \frac{n - 1}{9} ⌋$
– A always checks if $⌈ \frac{n - 1}{9} ⌉ < x < ⌊ \frac{2 n + 4}{3} ⌋$
– A always bets if $x \geq ⌈ \frac{2 n + 4}{3} ⌉$
– If $n \not\equiv 1 \pmod{3}$, then A bets with $x = ⌊ \frac{2 n + 4}{3} ⌋$ with probability $⌈ \frac{2 n + 4}{3} ⌉ - \frac{2 n + 4}{3}$
Player B’s strategy facing a bet:
– B always calls if $y \geq ⌈ \frac{n - 1}{3} ⌉$
– If $n \not\equiv 1 \pmod{3}$, then B calls with $y = ⌊ \frac{n - 1}{3} ⌋$ with probability $⌈ \frac{n - 1}{3} ⌉ - \frac{n - 1}{3}$
– B always folds if $y < ⌊ \frac{n - 1}{3} ⌋$
Player B’s strategy facing a check:
– B always bets if $y \leq ⌊ \frac{n - 1}{6} ⌋$
– If $n \not\equiv 1 \pmod{6}$, then B bets with $y = ⌈ \frac{n - 1}{6} ⌉$ with probability $\frac{n - 1}{6} - ⌊ \frac{n - 1}{6} ⌋$
– B always checks if $⌈ \frac{n - 1}{6} ⌉ < y < ⌊ \frac{n + 3}{2} ⌋$
– B always bets if $y \geq ⌈ \frac{n + 3}{2} ⌉$
– If $n \not\equiv 1 \pmod{2}$, then B bets with $y = ⌊ \frac{n + 3}{2} ⌋$ with probability $⌈ \frac{n + 3}{2} ⌉ - \frac{n + 3}{2}$
Player A’s strategy after A checks and B bets:
– A always calls if $x \geq ⌈ \frac{n + 5}{3} ⌉$
– If $n \not\equiv 1 \pmod{3}$, then A calls with $x = ⌊ \frac{n + 5}{3} ⌋$ with probability $⌈ \frac{n + 5}{3} ⌉ - \frac{n + 5}{3}$
– A always folds otherwise
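To make the threshold rules easier to read, here is a minimal sketch that transcribes only Player A's first-round rule, exactly as listed above, into a function returning the betting probability for card $x$ in $G_{n}$. The names `a_first_round_bet_probability` and `bluff_and_value_mass` are hypothetical, and cards are assumed to be numbered 1 through n. The helper sums the betting probabilities over the low ("bluffing") cards and the high ("value betting") cards.

```python
from fractions import Fraction
from math import ceil, floor


def a_first_round_bet_probability(n, x):
    """Probability that A bets card x in the first round of G_n (rule as listed)."""
    low = Fraction(n - 1, 9)        # boundary of the low betting region
    high = Fraction(2 * n + 4, 3)   # boundary of the high betting region
    if x <= floor(low):
        return Fraction(1)          # always bet the lowest cards
    if x == ceil(low):
        return low - floor(low)     # mixed card; only reached when low is not an integer
    if x >= ceil(high):
        return Fraction(1)          # always bet the highest cards
    if x == floor(high):
        return ceil(high) - high    # mixed card; only reached when high is not an integer
    return Fraction(0)              # check everything in between


def bluff_and_value_mass(n):
    """Total betting probability on the low cards and on the remaining cards."""
    low_cut = ceil(Fraction(n - 1, 9))
    bluff = sum(a_first_round_bet_probability(n, x) for x in range(1, low_cut + 1))
    value = sum(a_first_round_bet_probability(n, x) for x in range(low_cut + 1, n + 1))
    return bluff, value


print([a_first_round_bet_probability(3, x) for x in (1, 2, 3)])
print(bluff_and_value_mass(12))
```

For $n = 3$ the function gives betting probabilities 2/9, 0, and 2/3 for cards 1, 2, and 3, which coincides with the three-card profile at $α = \frac{2}{3}$. The two masses come out as $\frac{n - 1}{9}$ and $\frac{n - 1}{3}$, the quantities used in the bluffing-frequency calculation of the proof.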

Appendix C.3. Proof of Correctness

B is facing a bet from A. Note that for all $n \geq 3$, we have

$⌈\frac{n - 1}{9}⌉ \leq ⌊\frac{n - 1}{3}⌋ \leq ⌈\frac{n - 1}{3}⌉ \leq ⌊\frac{2 n + 4}{3}⌋$

– B is dealt $y \geq ⌈ \frac{n - 1}{3} ⌉$
A is bluffing with probability

$\frac{(\frac{n - 1}{9})}{(\frac{n - 1}{9}) + (\frac{n - 1}{3})} = \frac{1}{4}.$

B wins the pot whenever A is bluffing, and either wins or loses when A is value betting. Therefore, his expected payoff of calling is at least

$\frac{1}{4} \cdot \$3 - \frac{3}{4} \cdot \$1 = \$0.$

Since his expected payoff of folding would be $0, calling is a best response.
– B is dealt $y = ⌊ \frac{n - 1}{3} ⌋$
The analysis of the previous case shows that B will obtain an expected payoff of $0 by both calling and folding, and is therefore indifferent between the two actions.
– B is dealt $y < ⌊ \frac{n - 1}{3} ⌋$
Now B loses whenever A is value betting, and either wins or loses when A is bluffing. So his expected payoff of calling is at most $0, and folding is a best response.
A checked in the first round and is facing a bet from B. Note that for all $n \geq 3$, we have

$⌈\frac{n - 1}{6}⌉ \leq ⌊\frac{n - 1}{3}⌋ \leq ⌈\frac{n - 1}{3}⌉ \leq ⌊\frac{n + 3}{2}⌋$

– A is dealt $x \geq ⌈ \frac{n - 1}{3} ⌉$
B is bluffing with probability

$\frac{(\frac{n - 1}{6})}{(\frac{n - 1}{6}) + (\frac{n - 1}{2})} = \frac{1}{4}.$

A wins the pot whenever B is bluffing, and either wins or loses when B is value betting. Therefore, his expected payoff of calling is at least

$\frac{1}{4} \cdot \$3 - \frac{3}{4} \cdot \$1 = \$0.$

Since his expected payoff of folding would be $0, calling is a best response.
– A is dealt $x = ⌊ \frac{n - 1}{3} ⌋$
The analysis of the previous case shows that A will obtain an expected payoff of $0 by both calling and folding, and is therefore indifferent between the two actions.
– A is dealt $x < ⌊ \frac{n - 1}{3} ⌋$
Now A loses whenever B is value betting, and either wins or loses when B is bluffing. So his expected payoff of calling is at most $0, and folding is a best response.

Proof of optimality can be shown similarly for the other cases, which we omit for brevity.
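The arithmetic behind the two calling cases can be checked with a short sketch. It takes the bluffing and value-betting masses quoted in the proof as given, and assumes (as in the proof) that a successful call wins the $3 pot while an unsuccessful call loses the $1 bet. The helper names `bluff_fraction` and `call_ev_when_beating_only_bluffs` are illustrative, not taken from the paper.

```python
from fractions import Fraction


def bluff_fraction(bluff_mass, value_mass):
    """Share of the bettor's betting range that consists of bluffs."""
    return bluff_mass / (bluff_mass + value_mass)


def call_ev_when_beating_only_bluffs(bluff_prob, pot_won=3, call_cost=1):
    """Expected payoff of calling with a hand that beats bluffs and loses to value bets."""
    return bluff_prob * pot_won - (1 - bluff_prob) * call_cost


for n in (3, 10, 52):
    # A's first-round betting range: (n-1)/9 bluffs against (n-1)/3 value bets.
    q_a = bluff_fraction(Fraction(n - 1, 9), Fraction(n - 1, 3))
    # B's betting range after a check: (n-1)/6 bluffs against (n-1)/2 value bets.
    q_b = bluff_fraction(Fraction(n - 1, 6), Fraction(n - 1, 2))
    print(n, q_a, q_b, call_ev_when_beating_only_bluffs(q_a))
```

The factor $n - 1$ cancels in both ratios, so the bluffing share is 1/4 and the bluff-catching call is worth exactly $0 for every $n$.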

Appendix D. Weakest Link

In the Weakest Link game show, eight contestants answer a series of trivia questions to accumulate a “bank” of money, with one contestant (the “weakest link”) voted off at each round. When there are two contestants remaining, they face off for a series of five questions each, with the winner receiving the entire amount that was banked. In theory the champion could win “up to a million dollars”, but in practice the total bank ends up being in the 40k–80k range.

For the first several rounds, it makes a lot of sense to vote for players who are actually the “weakest”, since they will be less likely to answer questions correctly and contribute to increasing the amount in the bank throughout the game. But in the final voting round (when three contestants remain), it becomes pretty clear that you should actually vote off the “strongest” player so that you can go up against a weaker opponent in the final round.

However, this analysis for the final voting round makes several assumptions, which may not hold in practice. First, it assumes that it is clear to you, and to the other players, who the strongest player remaining actually is (and if it is you, then you are screwed). This may be difficult to assess over the relatively small sample of questions each player is given during the game. For example, player A may have correctly answered more questions than player B, but A might have also received easier questions. Furthermore, while it may be evident to you that player A is the strongest contestant, it may not be obvious to player B. If B incorrectly perceives you to be stronger than A and votes for you, while A votes for B, then it is clearly optimal for you to vote for B despite the fact that B is weaker than A.

A second issue is that, regardless of whether the opponents’ perceptions of abilities are correct, they may not understand that they actually want to eliminate the strongest player as opposed to the weakest player in the final voting round. While it seems obvious that one wants to go head-to-head against the weaker remaining opponent, often I see players voting for the clearly weaker remaining opponent. In the interview of the final contestant eliminated, often their explanation makes it clear that they are not aware of the fact that players would prefer to vote off the strongest player in the last round.

Considering these additional factors, it may actually be optimal under certain circumstances to vote for the weaker remaining contestant, as opposed to the “obviously optimal strategy” of voting for the strongest one (I already gave one example above).

To construct our model, suppose that the total amount of money in the bank to be awarded to the winner is $W > 0$ (the loser gets \$0). Suppose that if you are head-to-head against opponent 1 you will win with probability $p_{1}$, and against opponent 2 you will win with probability $p_{2}$. Assume that player 2 is stronger than player 1, so that $p_{1} > p_{2}$. Finally, assume that player 1 will vote for you with probability $y_{1}$ (and therefore will vote for player 2 with probability $1 - y_{1}$), and that player 2 will vote for you with probability $y_{2}$ (and therefore will vote for player 1 with probability $1 - y_{2}$). We also assume that no player votes for themselves.

If there is a three-way tie, we will assume that each player is voted out with probability 1/3. (In reality, the “statistically strongest link” from the previous round gets to cast a tie-breaking vote, which is obviously very relevant, but I will ignore this aspect of the problem to simplify the analysis.) Under this assumption, our expected payoff in the case of a tie equals:

$\frac{1}{3} (W \cdot p_{1}) + \frac{1}{3} (W \cdot p_{2}) + \frac{1}{3} \cdot 0 = \frac{W (p_{1} + p_{2})}{3}$

Given this model and assumptions, we now compute your optimal voting strategy. Observe that if both players vote for you, then your vote is irrelevant, since you will be eliminated regardless. So the only relevant cases to consider are when P1 votes for you and P2 votes for P1, and when P2 votes for you and P1 votes for P2.

Case 1: P1 votes for you, P2 votes for P1:
– If you vote for P1, you will go head-to-head against P2 and obtain expected payoff $p_{2} W$.
– If you vote for P2, it will be a three-way tie and you will obtain expected payoff $\frac{W (p_{1} + p_{2})}{3}$, which was calculated above.
Case 2: P2 votes for you, P1 votes for P2:
– If you vote for P2, you will go head-to-head against P1 and obtain expected payoff $p_{1} W$.
– If you vote for P1, it will be a three-way tie and you will obtain expected payoff $\frac{W (p_{1} + p_{2})}{3}$.

Assuming we are in either Case 1 or Case 2 (since the other cases are irrelevant, as shown above), the unnormalized probability that we are in Case 1 is $y_{1} (1 - y_{2})$, and the unnormalized probability that we are in Case 2 is $y_{2} (1 - y_{1})$. We need to normalize these so they sum to 1, so the conditional probability that we are in Case 1 is:

$\frac{y_{1} (1 - y_{2})}{y_{1} (1 - y_{2}) + y_{2} (1 - y_{1})},$

and the conditional probability that we are in Case 2 is:

$\frac{y_{2} (1 - y_{1})}{y_{1} (1 - y_{2}) + y_{2} (1 - y_{1})}.$

Putting this all together, our expected payoff of voting for player 1 is:

$\frac{y_{1} (1 - y_{2})}{y_{1} (1 - y_{2}) + y_{2} (1 - y_{1})} \cdot (W \cdot p_{2}) + \frac{y_{2} (1 - y_{1})}{y_{1} (1 - y_{2}) + y_{2} (1 - y_{1})} \cdot \frac{W (p_{1} + p_{2})}{3}$

$= \frac{y_{1} (1 - y_{2}) \cdot (W \cdot p_{2}) + y_{2} (1 - y_{1}) \cdot \frac{W (p_{1} + p_{2})}{3}}{y_{1} (1 - y_{2}) + y_{2} (1 - y_{1})}$

Similarly, our expected payoff of voting for player 2 is:

$\frac{y_{1} (1 - y_{2})}{y_{1} (1 - y_{2}) + y_{2} (1 - y_{1})} \cdot \frac{W (p_{1} + p_{2})}{3} + \frac{y_{2} (1 - y_{1})}{y_{1} (1 - y_{2}) + y_{2} (1 - y_{1})} \cdot (W \cdot p_{1})$

$= \frac{y_{1} (1 - y_{2}) \cdot \frac{W (p_{1} + p_{2})}{3} + y_{2} (1 - y_{1}) \cdot (W \cdot p_{1})}{y_{1} (1 - y_{2}) + y_{2} (1 - y_{1})}$

So we should vote for player 1 if

$\frac{y_{1} (1 - y_{2}) \cdot (W \cdot p_{2}) + y_{2} (1 - y_{1}) \cdot \frac{W (p_{1} + p_{2})}{3}}{y_{1} (1 - y_{2}) + y_{2} (1 - y_{1})} \geq \frac{y_{1} (1 - y_{2}) \cdot \frac{W (p_{1} + p_{2})}{3} + y_{2} (1 - y_{1}) \cdot (W \cdot p_{1})}{y_{1} (1 - y_{2}) + y_{2} (1 - y_{1})}$

We can multiply both sides by the (positive) denominator to eliminate it and obtain an equivalent condition:

$y_{1} (1 - y_{2}) \cdot (W \cdot p_{2}) + y_{2} (1 - y_{1}) \cdot \frac{W (p_{1} + p_{2})}{3} \geq y_{1} (1 - y_{2}) \cdot \frac{W (p_{1} + p_{2})}{3} + y_{2} (1 - y_{1}) \cdot (W \cdot p_{1})$

If we multiply through and expand both sides, we obtain:

$y_{1} W p_{2} - y_{1} y_{2} W p_{2} + \frac{y_{2} W p_{1}}{3} + \frac{y_{2} W p_{2}}{3} - \frac{y_{1} y_{2} W p_{1}}{3} - \frac{y_{1} y_{2} W p_{2}}{3}$

$\geq \frac{y_{1} W p_{1}}{3} + \frac{y_{1} W p_{2}}{3} - \frac{y_{1} y_{2} W p_{1}}{3} - \frac{y_{1} y_{2} W p_{2}}{3} + y_{2} W p_{1} - y_{1} y_{2} W p_{1}$

The terms $- \frac{y_{1} y_{2} W p_{1}}{3} - \frac{y_{1} y_{2} W p_{2}}{3}$ appear on both sides and cancel, so we can simplify this to obtain:

$y_{1} W p_{2} - y_{1} y_{2} W p_{2} + \frac{y_{2} W p_{1}}{3} + \frac{y_{2} W p_{2}}{3} \geq \frac{y_{1} W p_{1}}{3} + \frac{y_{1} W p_{2}}{3} + y_{2} W p_{1} - y_{1} y_{2} W p_{1}$

Multiplying both sides by 3:

$3 y_{1} W p_{2} - 3 y_{1} y_{2} W p_{2} + y_{2} W p_{1} + y_{2} W p_{2} \geq y_{1} W p_{1} + y_{1} W p_{2} + 3 y_{2} W p_{1} - 3 y_{1} y_{2} W p_{1}$

Simplifying further:

$2 y_{1} W p_{2} + y_{2} W p_{2} + 3 y_{1} y_{2} W p_{1} \geq 2 y_{2} W p_{1} + y_{1} W p_{1} + 3 y_{1} y_{2} W p_{2}$

Dividing all terms by W, we see that we should vote for player 1 iff:

$2 y_{1} p_{2} + y_{2} p_{2} + 3 y_{1} y_{2} p_{1} \geq 2 y_{2} p_{1} + y_{1} p_{1} + 3 y_{1} y_{2} p_{2}$

One immediate observation is that the optimal strategy does not depend on W.

We can further show that if $p_{2} \leq \frac{p_{1}}{2}$, then it is always optimal to vote for player 2 (the stronger player), regardless of our beliefs about the strategies of the other players. For brevity we omit the proof of this result.
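The voting rule derived in this appendix can be written as a short sketch. The function names `expected_payoffs` and `should_vote_for_player_1` are illustrative. The first computes the two conditional expected payoffs (given Case 1 or Case 2), and the second applies the final inequality, which does not involve W. The example numbers are made up for illustration only.

```python
from fractions import Fraction


def expected_payoffs(W, p1, p2, y1, y2):
    """Return (payoff of voting for player 1, payoff of voting for player 2),
    conditional on being in Case 1 or Case 2 of the model."""
    case1 = y1 * (1 - y2)  # P1 votes for you, P2 votes for P1
    case2 = y2 * (1 - y1)  # P2 votes for you, P1 votes for P2
    total = case1 + case2
    if total == 0:
        raise ValueError("neither Case 1 nor Case 2 has positive probability")
    tie = W * (p1 + p2) / 3  # expected payoff of a three-way tie
    vote_p1 = (case1 * W * p2 + case2 * tie) / total
    vote_p2 = (case1 * tie + case2 * W * p1) / total
    return vote_p1, vote_p2


def should_vote_for_player_1(p1, p2, y1, y2):
    """Final condition of the derivation (independent of W)."""
    lhs = 2 * y1 * p2 + y2 * p2 + 3 * y1 * y2 * p1
    rhs = 2 * y2 * p1 + y1 * p1 + 3 * y1 * y2 * p2
    return lhs >= rhs


# Hypothetical inputs: player 1 (weaker) is very likely to vote for you, player 2 is not.
p1, p2 = Fraction(3, 5), Fraction(1, 2)
y1, y2 = Fraction(9, 10), Fraction(1, 10)
print(should_vote_for_player_1(p1, p2, y1, y2))
print(expected_payoffs(Fraction(50000), p1, p2, y1, y2))
```

With these illustrative inputs the rule recommends voting for the weaker player 1, which is the kind of situation described informally earlier in this appendix. When $p_{2} \leq \frac{p_{1}}{2}$, the same function recommends voting for player 2 for any interior values of $y_{1}$ and $y_{2}$.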

References

1. Moravčík, M.; Schmid, M.; Burch, N.; Lisý, V.; Morrill, D.; Bard, N.; Davis, T.; Waugh, K.; Johanson, M.; Bowling, M. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science 2017, 356, 508–513.
2. Brown, N.; Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 2017, 359, 418–424.
3. Brown, N.; Sandholm, T. Superhuman AI for multiplayer poker. Science 2019, 365, 885–890.
4. Kuhn, H.W. Simplified Two-Person Poker. In Contributions to the Theory of Games; Kuhn, H.W., Tucker, A.W., Eds.; Princeton University Press: Princeton, NJ, USA, 1950; Volume 1, pp. 97–103.
5. Abou Risk, N.; Szafron, D. Using Counterfactual Regret Minimization to Create Competitive Multiplayer Poker Agents. In Proceedings of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Toronto, ON, Canada, 10–14 May 2010; pp. 159–166.
6. Ganzfried, S.; Sandholm, T. Computing equilibria by incorporating qualitative models. In Proceedings of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Toronto, ON, Canada, 10–14 May 2010.
7. Gordon, G.J. No-Regret Algorithms for Structured Prediction Problems; Technical Report CMU-CALD-05-112; Carnegie Mellon University: Pittsburgh, PA, USA, 2005.
8. Hawkin, J.; Holte, R.; Szafron, D. Automated action abstraction of imperfect information extensive-form games. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), San Francisco, CA, USA, 7–11 August 2011; pp. 681–687.
9. Koller, D.; Pfeffer, A. Generating and solving imperfect information games. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI), Montreal, QC, Canada, 20–25 August 1995; pp. 1185–1192.
10. Koller, D.; Pfeffer, A. Representations and Solutions for Game-Theoretic Problems. Artif. Intell. 1997, 94, 167–215.
11. Hoehn, B.; Southey, F.; Holte, R.C.; Bulitko, V. Effective Short-Term Opponent Exploitation in Simplified Poker. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Pittsburgh, PA, USA, 9–13 July 2005; pp. 783–788.
12. Gordon, G. One-Card Poker. 2013. Available online: http://www.cs.cmu.edu/~ggordon/poker/ (accessed on 7 November 2021).
13. Ganzfried, S.; Yusuf, F. Computing Human-Understandable Strategies: Deducing Fundamental Rules of Poker Strategy. Games 2017, 8, 49.
14. Ganzfried, S.; Chiswick, M. Most Important Fundamental Rule of Poker Strategy. In Proceedings of the Florida Artificial Intelligence Research Society Conference (FLAIRS), North Miami Beach, FL, USA, 17–20 May 2020.
15. Page, F. Parameterized Games, Minimal Nash Correspondences, and Connectedness; LSE Research Online Documents on Economics 65102; London School of Economics and Political Science, LSE Library: London, UK, 2015.
16. Flesch, J.; Predtetchinski, A. Parameterized games of perfect information. Ann. Oper. Res. 2020, 287, 683–699.
17. Angelino, E.; Larus-Stone, N.; Alabi, D.; Seltzer, M.; Rudin, C. Learning Certifiably Optimal Rule Lists for Categorical Data. J. Mach. Learn. Res. 2018, 18, 1–78.

Figure 1. Exploitability vs. number of training games for two-player zero-sum games with uniform-random payoffs in [−1, 1], with results averaged over 10,000 test games for each number of training games.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
