1. Introduction
Games of chance have always played a crucial role in the development and teaching of probability (Dagobert, 1946; Rubel, 2008) [1,2], game theory (Brokaw and Mertz, 2004) [3] and now, increasingly, computer programming (Dlab and Hoic-Bozic, 2021) [4], machine learning (Hazra and Anjaria, 2022) [5], and especially reinforcement learning (Konstantia et al., 2018; Zha et al., 2021) [6,7]. Games of chance can be viewed as simplified versions of real-world problems. Developing algorithms for their optimal solutions can often inspire the subsequent solutions to real-world problems, as shown in AlphaGo (Silver et al., 2016) [8] and AlphaFold (Jumper et al., 2021) [9]. Among the various games of chance, one common type is dice games in which six-sided dice are involved prominently in the play of the game. There are over 40 common types of dice games, including Yahtzee, backgammon, and the dice game Pig, which we shall focus on in this work. Back in 1979, a reinforcement learning algorithm-based AI backgammon player famously beat the world champion at all games for the first time (Berliner, 1980) [10]. This was tremendous and has since inspired and propelled the development of increasingly more-sophisticated reinforcement learning algorithms and applications not only in games, but also various real-world applications [11,12,13,14,15,16,17,18,19,20]. In the following, we focus on developing optimal play strategies for the two-dice game of Pig using ingredients of reinforcement learning.
The dice game Pig was first documented in the book Scarne on Dice by the American magician John Scarne in 1945 [21]. There, Scarne introduced the one-die Pig game with a simple rule of play: Two players race to reach a certain point total, say 100 points, and the first one to reach the said total wins the game. At each turn, a player repeatedly rolls a die until either a one is rolled (and the player scores nothing for that turn and it becomes the opponent’s turn), or the player holds out of his/her own volition and scores the sum of all the rolls in that turn. That is, at any time during a player’s turn, the player is faced with two decisions: roll or hold. If the player rolls a 1, he/she must stop, and the turn total will be 0. If the player rolls a number other than 1, namely, 2 to 6, this number will be added to the turn total, and the player can again choose to roll or hold; if they choose to hold, it becomes the opponent’s turn. Pig is a “jeopardy dice game”, as one can jeopardize his/her previous gains by continuing to roll for greater gains (if the next roll is not 1) or ruin (if 1 is rolled) [22]. The one-die Pig game is simple, but the optimal strategy to play this simple game is not so simple and has only been solved in recent years using the Markov decision process and dynamic programming [23,24]. These authors even wrote an online interactive app so that people can play one-die Pig with the computer:
http://cs.gettysburg.edu/projects/pig/piggame.html (accessed on 1 June 2022)
The one-die Pig game has since evolved into several variations, with the most popular one being the two-dice Pig game, where two dice are rolled instead of one. Commercial variants of the two-dice Pig game include Pass the Pigs, Pig Dice, and Skunk. Being a more complicated setting than the one-die Pig game, the optimal strategy of two-dice Pig had not been derived prior to our work. The only attempt back in 1973 yielded sub-optimal strategies [25]. In this work, we derive the optimal strategy for the original version of the two-dice Pig game using the Markov decision process and dynamic programming. The objective of two-dice Pig remains the same as the one-die game: the first player to reach the said total points, say 100 points, wins. Each player’s turn consists of repeatedly rolling two dice. After each roll, the player is faced with the same two choices: roll or hold. The rules of play for the standard two-dice Pig are summarized in Figure 1:
If the player holds, the turn total, which is defined as the sum of the rolls during the turn, is added to the player’s score, and it becomes the opponent’s turn.
If the player rolls, then one of the following applies: If neither die shows a one, his/her sum is added to the turn total, and the player’s turn continues. If a single one is rolled, the player’s turn ends with the loss of his/her turn total. If two ones are rolled, the player’s turn ends with the loss of his/her turn total as well as his/her entire score.
3. Optimal Strategy for the Standard Two-Dice Pig Game
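Throughout this paper, the default goal is 100 unless otherwise specified. For a given policy $\pi$, define the winning probability for a given state $s$ to be $P_{\pi}(s)$. For another policy $\pi'$, we say policy $\pi'$ is no better than policy $\pi$ if, for all states $s$:
$$P_{\pi'}(s) \le P_{\pi}(s).$$
The value $V_{\pi}(s)$ for state $s$ given policy $\pi$ is defined as the winning probability:
$$V_{\pi}(s) = P_{\pi}(s) = \begin{cases} P_{\pi}(s, \text{roll}), & \pi(s) = \text{roll}, \\ P_{\pi}(s, \text{hold}), & \pi(s) = \text{hold}, \end{cases}$$
where $P_{\pi}(s, \text{roll})$ and $P_{\pi}(s, \text{hold})$ are the winning probabilities for rolling and holding, respectively, for policy $\pi$. For the standard two-dice Pig game, these probabilities together with the policy are: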
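One way to write this system from the rules in Figure 1 is sketched below. The notation is an assumption made for this sketch: the state is written as $s = (i, j, k)$, with $i$ the score of the player about to act, $j$ the opponent's score, and $k$ the turn total; $a$ and $b$ are the faces of the two dice; the opponent's winning probability is read from the opponent's point of view, with the two scores swapped; and $P_{\pi}(i, j, k) = 1$ whenever $i + k \ge 100$. The three terms of the roll equation correspond to two ones (1 outcome out of 36), a single one (10 outcomes), and no ones (25 outcomes).

```latex
\begin{align*}
P_{\pi}(i,j,k,\text{roll}) &= \frac{1}{36}\bigl[1 - P_{\pi}(j,0,0)\bigr]
  + \frac{10}{36}\bigl[1 - P_{\pi}(j,i,0)\bigr]
  + \frac{1}{36}\sum_{a=2}^{6}\sum_{b=2}^{6} P_{\pi}(i,j,k+a+b), \\
P_{\pi}(i,j,k,\text{hold}) &= 1 - P_{\pi}(j,i+k,0), \\
\pi(i,j,k) &= \operatorname*{arg\,max}_{x \in \{\text{roll},\,\text{hold}\}} P_{\pi}(i,j,k,x).
\end{align*}
```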
If both players are rational, i.e., Player A and the opponent Player B both play optimally, then the solution of $\pi$ in the above system is the Nash equilibrium, or the optimal policy $\pi^*$. The optimal policy $\pi^*$ can be obtained through value iteration (Algorithm 1).
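Algorithm 1: Value Iteration Algorithm
Initialize $P(s)$ arbitrarily for all states $s$ and a small threshold $\epsilon > 0$ determining accuracy of estimation
while $\Delta \ge \epsilon$ do (with $\Delta$ reset to 0 at the start of each pass)
    for each state $s$ do:
        $p \leftarrow P(s)$   ▹ old probability
        $P(s) \leftarrow \max\{P(s, \text{roll}), P(s, \text{hold})\}$   ▹ new probability
        $\Delta \leftarrow \max\{\Delta, |p - P(s)|\}$
    end for
end while
Output a deterministic policy $\pi$ such that $\pi(s) = \arg\max_{x \in \{\text{roll}, \text{hold}\}} P(s, x)$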
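The value iteration loop can be sketched in Python as follows. This is an illustrative sketch rather than the implementation used for the results in this paper. It makes three assumptions that are not stated in the text: states are indexed as `(i, j, k)` (own score, opponent score, turn total), a player may not hold with an empty turn total, and the default goal is reduced to 20 so that a pure-Python sweep finishes quickly. The function name `solve_two_dice_pig` is hypothetical.

```python
from itertools import product

# Sums of two dice when neither die shows a one (25 equally likely outcomes).
NON_ONE_SUMS = [a + b for a, b in product(range(2, 7), repeat=2)]


def solve_two_dice_pig(goal=20, eps=1e-9):
    """Value iteration for standard two-dice Pig.

    P[i][j][k] approximates the probability that the player about to act
    wins, given own score i, opponent score j and turn total k (i + k < goal).
    """
    P = [[[0.5] * (goal - i) for _ in range(goal)] for i in range(goal)]

    def win(i, j, k):
        # Boundary condition: banking k would already reach the goal.
        return 1.0 if i + k >= goal else P[i][j][k]

    def p_roll(i, j, k):
        two_ones = (1.0 - win(j, 0, 0)) / 36.0            # entire score lost
        single_one = 10.0 * (1.0 - win(j, i, 0)) / 36.0   # turn total lost
        no_ones = sum(win(i, j, k + s) for s in NON_ONE_SUMS) / 36.0
        return two_ones + single_one + no_ones

    def p_hold(i, j, k):
        return 1.0 - win(j, i + k, 0)

    delta = float("inf")
    while delta >= eps:
        delta = 0.0
        for i in range(goal):
            for j in range(goal):
                for k in range(goal - i):
                    old = P[i][j][k]                       # old probability
                    roll = p_roll(i, j, k)
                    # Assumption: holding with k == 0 is not allowed.
                    new = roll if k == 0 else max(roll, p_hold(i, j, k))
                    P[i][j][k] = new                       # new probability
                    delta = max(delta, abs(old - new))

    policy = {
        (i, j, k): "hold" if k > 0 and p_hold(i, j, k) > p_roll(i, j, k) else "roll"
        for i in range(goal) for j in range(goal) for k in range(goal - i)
    }
    return P, policy


if __name__ == "__main__":
    P, policy = solve_two_dice_pig(goal=20)
    print("First-player winning probability:", round(P[0][0][0], 4))
```

The updates are done in place, so each sweep already uses the newest estimates. For the full goal of 100 the same loop applies, but a vectorized or compiled implementation would be preferable to pure Python.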
Figure 2 shows the roll/hold boundary for each possible state $s$. As Opponent B’s score approaches the goal of 100, Player A plays more aggressively on each turn, especially when Player A’s score is relatively low. On the contrary, when Opponent B’s score is close to 0, Player A plays conservatively.
Figure 3 shows the decision boundary given Player A’s score when Opponent B’s score is fixed. For a better comparison, we combine these results in Figure 4 with the additional case of an opponent score of 90. This produces the same observation we mentioned above. At the extreme case of an opponent score of 90, Player A’s optimal strategy is to simply reach the goal of 100 in one turn, as Opponent B has a very high probability of winning the game in his/her turn.
Intuitively, we believe that the player who goes first has some advantage. Figure 5 shows the winning probabilities of the first player, Player A, when the goal is increasing. Indeed, the first player has a certain advantage in terms of a higher winning probability, and such advantage vanishes gradually as we increase the goal. When the goal is 100, the winning probability for Player A is 52.18%, while the winning probability drops to 50.76% when the goal is 200. Detailed winning probabilities for the first player when both players play optimally are summarized in Table A1.
If the opponent Player B tries to fool us by playing less than optimally, should Player A still use the optimal strategy? Will the optimal strategy further increase Player A’s winning chance?
Let $P_A^{\pi}$ and $P_B^{\pi'}$ denote the winning probabilities of Player A with policy $\pi$ and the opponent Player B with policy $\pi'$, respectively. Further assume the optimal policy is $\pi^*$. Then, we have that $\pi'$ is no better than $\pi^*$ for any other policy $\pi'$. In particular, $P_B^{\pi'}(s) \le P_B^{\pi^*}(s)$ for all states $s$. Define $P_A^{\pi^*, \pi'}(s)$ to be the winning probability for Player A under policy $\pi^*$ with the opponent Player B under policy $\pi'$ for state $s$; then
$$P_A^{\pi^*, \pi'}(s) \ge P_A^{\pi^*, \pi^*}(s).$$
Therefore, no matter how the opponent Player B plays, which is unknown to Player A, Player A should always play with the optimal strategy. The worse the opponent Player B plays, the higher the winning chance for Player A. However, if the opponent Player B’s strategy $\pi'$ is known, there may exist a better strategy for Player A such that the winning probability for Player A is maximized against $\pi'$.
A simple policy is the “hold at 20” policy, where a player holds as soon as the turn total reaches at least 20 if the score needed to reach the goal is more than 20 (otherwise, the player holds as soon as the turn total reaches the score needed to reach the goal). This generalizes to the strategy of “hold at n”, where n can be any integer between 0 and the goal, say 100. Very large and very small values of n are not meaningful for the “hold at n” strategy, so we examine the range of n from 10 to 35.
Figure 6 demonstrates that for the standard two-dice Pig game, the best choice is “hold at 23”: the winning probability of the optimal strategy against it is the lowest among all the values of n examined. However, the optimal strategy wins over all “hold at n” strategies. Even when the player who uses the optimal strategy goes second, his/her winning probability is still higher than 50% for all n. Detailed winning probabilities against “hold at n” are summarized in Table A2.
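The “hold at n” policy is simple enough to state in a few lines of code. The sketch below is illustrative only: it pits two “hold at n” players against each other by Monte Carlo simulation, whereas the results above compare “hold at n” with the optimal strategy. The function names `hold_at`, `play_game`, and `win_rate` are hypothetical.

```python
import random


def hold_at(n, goal=100):
    """Return a 'hold at n' policy for standard two-dice Pig.

    The policy holds once the turn total reaches n, or once it reaches
    the score still needed to reach the goal, whichever is smaller.
    """
    def policy(score, turn_total):
        return turn_total >= min(n, goal - score)   # True means hold
    return policy


def play_game(policy_a, policy_b, goal=100, rng=random):
    """Simulate one game; return 0 if the first player wins, otherwise 1."""
    scores = [0, 0]
    policies = [policy_a, policy_b]
    player = 0
    while True:
        turn_total = 0
        while True:
            d1, d2 = rng.randint(1, 6), rng.randint(1, 6)
            if d1 == 1 and d2 == 1:
                scores[player] = 0          # two ones: entire score lost
                break
            if d1 == 1 or d2 == 1:
                break                       # single one: turn total lost
            turn_total += d1 + d2
            if policies[player](scores[player], turn_total):
                scores[player] += turn_total
                break
        if scores[player] >= goal:
            return player
        player = 1 - player


def win_rate(n_first, n_second, games=10_000, seed=0):
    """Estimate how often the first player wins, 'hold at n_first' vs 'hold at n_second'."""
    rng = random.Random(seed)
    first, second = hold_at(n_first), hold_at(n_second)
    wins = sum(play_game(first, second, rng=rng) == 0 for _ in range(games))
    return wins / games


if __name__ == "__main__":
    print("hold at 23 (first) vs hold at 20:", win_rate(23, 20))
```

Replacing one of the two policies with a lookup into a solved optimal policy would give the kind of comparison reported in Table A2.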
4. “Double Trouble”, a Variation of the Standard Two-Dice Pig
This variation is the same as the standard two-dice Pig except for the addition of Rule #4 below (summarized in Figure 7 with red boxes indicating the difference between the standard two-dice Pig and the “Double Trouble” variation):
1. Two standard dice are rolled. If neither shows a one, the sum is added to the player’s turn total.
2. If a single one is rolled, the player scores nothing and the turn ends.
3. If two ones are rolled, the player’s entire score is lost, and the turn ends.
4. If a double other than two ones is rolled, the point total is added to the turn total as with any roll, but the player is obligated to roll again.
With Rule #4 added to the standard two-dice Pig game, the framework for finding the optimal strategy needs to be changed accordingly. Indeed, such change is not trivial, as explained below, and, hence, we nickname this variation “Double Trouble”.
To apply value iteration, there are two issues we need to address first:
1. Insufficient information. Suppose the current state is (30, 10, 20), indicating Player A’s entire score is 30, the opponent Player B’s entire score is 10, and the turn total for Player A is 20. Further, assume Player A decides to roll again (there is nothing special if Player A chooses to hold). The new rolls {3, 5} and {4, 4} are totally different, although the turn totals are the same: $20 + 3 + 5 = 20 + 4 + 4 = 28$. According to the optimal strategy obtained for the standard two-dice Pig game, Player A is supposed to hold in both cases. However, with the addition of Rule #4, Player A is forced to roll again in the latter situation, while for the first case, Player A can hold to increase his/her entire score. Therefore, for the “Double Trouble” variation, by looking at the turn total even without rolling a single one, one does not have enough information to decide whether to roll or to hold.
2. Infinite states. For the standard two-dice Pig game, we have the constraints $i + k < 100$ and $j < 100$, whose violation corresponds to the winning condition and the losing condition, respectively. As a result, the states are finite. By adding Rule #4, this variation has the potential to cause an infinite states problem. Assume the current state is (90, 20, 10) and Player A rolls {5, 5} in this turn. Originally, Player A could simply hold and win the game. In “Double Trouble”, unfortunately, Player A is not allowed to hold when a double (other than two ones) occurs. Player A has to roll again. Imagine Player A keeps on rolling doubles (other than two ones) without the opportunity to hold. In theory, Player A could reach a state even like (90, 20, 1000) with extremely low probability. Therefore, the turn total $k$ here may blow up to infinity, and $i + k \ge 100$ is no longer the winning condition.
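To distinguish whether a double (other than two ones) has been rolled, as the decision depends on this information, we need to introduce another element to track the information. Instead of the original three-element tuple state $(i, j, k)$, here, we propose a four-element tuple $(i, j, k, l)$ to represent the state, where the first three elements $(i, j, k)$ stay the same, and the fourth element $l$ is a binary variable recording whether a non-ones double has been rolled in the most recent roll, where $l = 0$ means no non-ones double has occurred, and $l = 1$ indicates a non-ones double has been rolled. The new probabilities at each given state $(i, j, k, l)$ can be calculated by:
$$P(i, j, k, l) = \begin{cases} \max\{P(i, j, k, l, \text{roll}),\; P(i, j, k, l, \text{hold})\}, & l = 0, \\ P(i, j, k, l, \text{roll}), & l = 1, \end{cases}$$
where $P(i, j, k, l, \text{roll})$ and $P(i, j, k, l, \text{hold})$ are the winning probabilities for rolling and holding at that state, as determined by Rules 1 to 4.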
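Written out from Rules 1 to 4, the winning probabilities for rolling and holding in “Double Trouble” can take the following form. This is a sketch under assumed notation: the state is $(i, j, k, l)$ with $i$ the score of the player about to act, $j$ the opponent's score, $k$ the turn total, and $l$ the non-ones-double flag; $a$ and $b$ are the faces of the two dice; and the opponent's winning probability is read from the opponent's point of view with the two scores swapped. The four terms of the roll equation correspond to two ones (1 outcome out of 36), a single one (10 outcomes), a non-ones double (5 outcomes, after which $l = 1$), and any other roll (20 outcomes, after which $l = 0$).

```latex
\begin{align*}
P(i,j,k,l,\text{roll}) &= \frac{1}{36}\bigl[1 - P(j,0,0,0)\bigr]
  + \frac{10}{36}\bigl[1 - P(j,i,0,0)\bigr] \\
&\quad + \frac{1}{36}\sum_{a=2}^{6} P(i,j,k+2a,1)
  + \frac{1}{36}\sum_{\substack{a,b=2 \\ a \neq b}}^{6} P(i,j,k+a+b,0), \\
P(i,j,k,0,\text{hold}) &= 1 - P(j,i+k,0,0).
\end{align*}
```

Holding is only available when $l = 0$, which is why no hold equation is given for $l = 1$.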
When a player decides to roll, there are four different situations: (i) a single one; (ii) two ones; (iii) a non-ones double; or (iv) none of the above. We shall prove that (iv) is a necessary and sufficient condition for winning the game with the next single roll given state $(i, j, k, l)$ with $i + k \ge 100$. Suppose Player A’s next roll has two different faces and neither of them is one, which is exactly situation (iv). Since $i + k \ge 100$, Player A can simply hold and win, so (iv) is sufficient. To win the game in one roll, the player must be able to hold, rendering doubles impossible. After the player holds, the winning condition $i + k \ge 100$ must be satisfied to win the game. Thus, the new rolled faces cannot include a one, otherwise $k = 0$ so that $i + k < 100$ (here we do not consider $i \ge 100$ because it is meaningless). Therefore, (iv) is also a necessary condition. Through the above simple proof, as long as $i + k \ge 100$, the probability of winning with the next roll for a state $(i, j, k, l)$ does not depend on the exact value of $k$, since the probability of situation (iv) is always $20/36 = 5/9$, independent of $k$.
To avoid the infinite state space, which only occurs when $i + k \ge 100$, we claim that under the condition $i + k \ge 100$, the turn total $k$ no longer matters for winning the game. For situations (i) and (ii), the new turn total will become 0 regardless of the current turn total $k$. For situation (iv), the player can hold and win the game no matter what the current turn total $k$ is, because $i + k \ge 100$. For situation (iii), the turn total $k$ will increase and the player is obligated to roll again. If situation (iii) keeps happening, the turn total $k$ becomes irrelevant because the player can do nothing except roll. The probability that situation (iii) lasts forever is $\lim_{n \to \infty} (5/36)^n = 0$. Once one of the situations (i), (ii), or (iv) happens (with probability one), the turn total $k$ does not matter according to the previous result. Therefore, the turn total $k$ is important only when $i + k < 100$. After that, we can ignore the exact value of $k$.
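The probabilities of the four situations can be checked by enumerating all 36 equally likely outcomes of two fair dice. The short sketch below does this with exact fractions; the helper name `classify` and the labels it returns are hypothetical.

```python
from fractions import Fraction
from itertools import product


def classify(d1, d2):
    """Map a roll of two dice to one of the four situations (i)-(iv)."""
    if d1 == 1 and d2 == 1:
        return "ii"     # two ones
    if d1 == 1 or d2 == 1:
        return "i"      # a single one
    if d1 == d2:
        return "iii"    # a non-ones double
    return "iv"         # none of the above


counts = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = classify(d1, d2)
    counts[key] = counts.get(key, 0) + 1

# Exact probability of each situation on a single roll.
probs = {key: Fraction(count, 36) for key, count in counts.items()}

# Probability of n consecutive non-ones doubles, which shrinks geometrically.
ten_doubles_in_a_row = probs["iii"] ** 10

if __name__ == "__main__":
    for key in ("i", "ii", "iii", "iv"):
        print(key, probs[key])
    print("ten non-ones doubles in a row:", float(ten_doubles_in_a_row))
```

The enumeration gives 10/36 for (i), 1/36 for (ii), 5/36 for (iii), and 20/36 for (iv), so an unbroken run of forced rolls becomes vanishingly unlikely as it gets longer.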
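As the value of $k$ is no longer important for determining the winning probability when $i + k \ge 100$, we have $P(i, j, k, l) = P(i, j, k', l)$ for any $k$ and $k'$ satisfying $i + k \ge 100$ and $i + k' \ge 100$. In particular, $P(i, j, k, l) = P(i, j, 100 - i, l)$ for any $k \ge 100 - i$. Therefore, for $i + k \ge 100$, every such state can be represented by the single state $(i, j, 100 - i, l)$. By doing so, we reduce the number of states to be finite so that there is no need to consider the case $i + k \ge 100$ during the value iteration, but we set it as one of the boundary conditions. The boundary conditions under the “Double Trouble” setting are: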
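A form of these boundary conditions that follows from the argument above is sketched here, under assumed notation: the state is $(i, j, k, l)$ with $l$ the non-ones-double flag, and the opponent's winning probability is read from the opponent's point of view with the two scores swapped. When $l = 0$ the player can hold and win. When $l = 1$ the player must roll, and the constant value $w = P(i, j, k, 1)$ satisfies $w = \frac{20}{36} + \frac{5}{36} w + \frac{10}{36}\bigl[1 - P(j,i,0,0)\bigr] + \frac{1}{36}\bigl[1 - P(j,0,0,0)\bigr]$, where the four terms are situations (iv), (iii), (i), and (ii). Solving for $w$ gives the second line.

```latex
\begin{align*}
P(i,j,k,0) &= 1, & i + k &\ge 100, \\
P(i,j,k,1) &= \frac{20 + 10\bigl[1 - P(j,i,0,0)\bigr] + \bigl[1 - P(j,0,0,0)\bigr]}{31}, & i + k &\ge 100.
\end{align*}
```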
The optimal strategy can then be found using value iteration with the new updates (3) and (4), as well as the new boundary conditions (6).
Figure 8 shows the winning probabilities of the optimal strategy against the “hold at n” strategy, both when the optimal-strategy player goes first and when that player goes second. As with standard two-dice Pig, even if the player goes second, the optimal strategy outperforms the “hold at n” strategy for any choice of n.
Table A3 shows the detailed winning probabilities of the optimal strategy against the “hold at n” strategy. The best n for this “Double Trouble” variation is 18. In fact, from the plot we can conclude that the winning probabilities from hold at 18 to hold at 23 are very close. It is reasonable that the best n decreases from 23 in the standard two-dice Pig game to 18 in this variant, as any non-ones double will force the player to roll again. In real game play, hold at 18 does not mean that a player holds at exactly 18 in each turn, as this is impossible. Considering the non-ones double case, the turn total can be much larger than 18, even if the player adopts the “hold at 18” strategy.
Will forcing a player to roll again when a non-ones double occurs speed up game-play in terms of fewer total turns? Here, total turns is defined as the sum of the number of turns that each player has played. We simulate 10,000,000 games. For one-die Pig, with one player using the optimal strategy and the other player using the “hold at 25” strategy, the average number of turns played is 19.19 if the optimal strategy player goes first and 19.15 if the optimal strategy player goes second. Meanwhile, the average number of turns played increases slightly to 19.25 if both players adopt the optimal strategy. For the standard two-dice Pig game, the average number of turns played is 23.38 when the optimal strategy goes first against the “hold at 23” strategy and 23.35 when the “hold at 23” strategy goes first. If both players play optimally, the average number of turns increases slightly to 23.68. For the two-dice Pig variation, “Double Trouble”, the average numbers of turns played for the optimal strategy player going first and second are 24.23 and 24.21, respectively, versus “hold at 18”. If both players play optimally, the average number of turns increases slightly to 24.49. Whether the optimal strategy goes first or second does not impact the total turns significantly, but the optimal strategy going second always has a smaller number of turns compared to that of the optimal strategy going first. As for the standard deviation, both versions of the two-dice Pig game (approximately 11.0) have much larger standard deviations than the one-die Pig game (approximately 6.5), which is consistent with our intuition, as more dice cause higher randomness. In particular, the “Double Trouble” two-dice Pig variation (around 11.3) has a larger standard deviation than the standard two-dice Pig game (around 10.8). The exact means and standard deviations for each strategy combination are summarized in Table 1.
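The turn-count experiment can be illustrated with a small simulation. The sketch below is not the setup behind Table 1: as a simplifying assumption, both players use the same “hold at n” strategy rather than the optimal strategy, and far fewer games are simulated, so its numbers are not expected to match the table. The function names `simulate_turns` and `turn_statistics` are hypothetical.

```python
import random
import statistics


def simulate_turns(n, goal=100, double_trouble=False, rng=random):
    """Play one game between two 'hold at n' players; return total turns played."""
    scores = [0, 0]
    player, turns = 0, 0
    while True:
        turns += 1
        turn_total = 0
        while True:
            d1, d2 = rng.randint(1, 6), rng.randint(1, 6)
            if d1 == 1 and d2 == 1:
                scores[player] = 0          # two ones: entire score lost
                break
            if d1 == 1 or d2 == 1:
                break                       # single one: turn total lost
            turn_total += d1 + d2
            # Rule #4: a non-ones double obliges the player to roll again.
            forced = double_trouble and d1 == d2
            if not forced and turn_total >= min(n, goal - scores[player]):
                scores[player] += turn_total
                break
        if scores[player] >= goal:
            return turns
        player = 1 - player


def turn_statistics(n, double_trouble, games=20_000, seed=0):
    """Return (mean, standard deviation) of total turns over simulated games."""
    rng = random.Random(seed)
    turns = [simulate_turns(n, double_trouble=double_trouble, rng=rng)
             for _ in range(games)]
    return statistics.mean(turns), statistics.stdev(turns)


if __name__ == "__main__":
    print("standard, hold at 23:      ", turn_statistics(23, False))
    print("Double Trouble, hold at 18:", turn_statistics(18, True))
```

Passing a seeded `random.Random` instance keeps each run reproducible while leaving the global random state untouched.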
One interesting fact is that both players playing optimally does not reduce the number of turns in any of the three cases, especially in both versions of the two-dice Pig game. This phenomenon is due to the penalty for rolling ones: any one loses at least the turn total, and under the “snake eyes” rule, double ones drop the entire score to zero. For one-die Pig, the entire score can never decrease, which is not true for the two-dice Pig game. We refer to playing optimally in this paper as maximizing the probability of winning rather than minimizing the number of turns played. Maximizing the winning probability may also increase the chances of rolling ones, including the “snake eyes”, which increases the number of turns played. Another fact is that the two-dice Pig variation, “Double Trouble”, will have more turns on average compared to the standard two-dice Pig. Instead of reaching the goal more quickly, forcing a player to roll again after a non-ones double increases the risk of rolling ones, which resets the turn total to zero (or even resets the entire score if double ones are rolled), and thus the turn is wasted.