Article

Deriving the Optimal Strategy for the Two Dice Pig Game via Reinforcement Learning

by Tian Zhu 1,* and Merry H. Ma 2
1 Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, NY 11794, USA
2 Stony Brook School, 1 Chapman Pkwy, Stony Brook, NY 11790, USA
* Author to whom correspondence should be addressed.
Academic Editor: Stéphane Mussard
Stats 2022, 5(3), 805-818; https://doi.org/10.3390/stats5030047
Received: 15 July 2022 / Revised: 9 August 2022 / Accepted: 13 August 2022 / Published: 17 August 2022
(This article belongs to the Special Issue Feature Paper Special Issue: Reinforcement Learning)
Games of chance have historically played a critical role in the development and teaching of probability theory and game theory, and, in the modern age, of computer programming and reinforcement learning. In this paper, we derive the optimal strategy for playing the two-dice game Pig, both the standard version and its variant with doubles, coined “Double-Trouble”, using fundamental concepts of reinforcement learning, especially the Markov decision process and dynamic programming. We further compare the newly derived optimal strategy to other popular play strategies in terms of winning chances and the order of play. In particular, we compare it to the popular “hold at n” strategy, which is considered close to the optimal strategy for the best choice of n, for each type of Pig game. For the standard two-player, two-dice, sequential Pig game examined here, we found that “hold at 23” is the best choice, with an average winning chance against the optimal strategy of 0.4747. For the “Double-Trouble” version, we found that “hold at 18” is the best choice, with an average winning chance against the optimal strategy of 0.4733. Furthermore, the duration of play, measured in turns, is also examined for practical purposes. For optimal vs. optimal, or optimal vs. the best “hold at n” strategy, we found that the average number of turns is 19, 23, and 24 for the one-die Pig, standard two-dice Pig, and “Double-Trouble” two-dice Pig games, respectively. We hope our work will inspire students of all ages to explore the field of reinforcement learning, which is crucial for the development of artificial intelligence and robotics and, subsequently, for the future of humanity.
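The dynamic-programming approach the abstract describes can be illustrated with a minimal value-iteration sketch for the simpler one-die Pig game (with a reduced target of 25 so it converges quickly). The state is (player score, opponent score, turn total), and the recursion below is a standard textbook formulation of optimal Pig play, not the authors' own code:

```python
def solve_pig(target=25, tol=1e-6):
    """Value iteration for one-die Pig: P[(i, j, k)] is the probability that
    the player about to act wins, holding score i, with opponent score j and
    current turn total k (all states below the target)."""
    P = {(i, j, k): 0.5
         for i in range(target)
         for j in range(target)
         for k in range(target - i)}

    def win_prob(i, j, k):
        # Reaching the target (by holding or rolling) is an immediate win.
        return 1.0 if i + k >= target else P[(i, j, k)]

    while True:
        delta = 0.0
        for (i, j, k) in P:
            # Hold: bank the turn total, then the opponent moves.
            hold = 1.0 - win_prob(j, i + k, 0)
            # Roll: a 1 forfeits the turn total; faces 2-6 add to it.
            roll = (1.0 - win_prob(j, i, 0)) / 6.0 \
                 + sum(win_prob(i, j, k + r) for r in range(2, 7)) / 6.0
            new = max(hold, roll)
            delta = max(delta, abs(new - P[(i, j, k)]))
            P[(i, j, k)] = new
        if delta < tol:
            return P
```

`P[(0, 0, 0)]` then gives the first player's winning probability under mutually optimal play; it exceeds 0.5, reflecting the first-mover advantage that makes the order of play relevant in the comparisons above.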
Keywords: dynamic programming; game theory; Markov decision process; optimization; two-dice pig game; value iteration
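To make the “hold at n” comparison concrete, the following is a Monte Carlo sketch of the standard two-dice game under the usual rules (a single 1 forfeits the turn total; double 1s forfeit the entire score). The function names and simulation parameters are illustrative and not taken from the paper:

```python
import random

TARGET = 100  # first player to reach 100 points wins

def play_turn(score, hold_at, rng):
    """Play one turn of standard two-dice Pig with a 'hold at n' policy.
    Returns the player's new total score."""
    turn_total = 0
    while True:
        d1, d2 = rng.randint(1, 6), rng.randint(1, 6)
        if d1 == 1 and d2 == 1:
            return 0            # double ones: the entire score is lost
        if d1 == 1 or d2 == 1:
            return score        # single one: only the turn total is lost
        turn_total += d1 + d2
        # Hold once the turn total reaches n, or once holding wins the game.
        if turn_total >= hold_at or score + turn_total >= TARGET:
            return score + turn_total

def play_game(n1, n2, rng):
    """Players alternate turns; returns 0 if player 1 wins, 1 otherwise."""
    scores, holds, player = [0, 0], [n1, n2], 0
    while True:
        scores[player] = play_turn(scores[player], holds[player], rng)
        if scores[player] >= TARGET:
            return player
        player = 1 - player

def win_rate(n1, n2, games=20000, seed=0):
    """Estimate player 1's winning chance for 'hold at n1' vs 'hold at n2'."""
    rng = random.Random(seed)
    return sum(play_game(n1, n2, rng) == 0 for _ in range(games)) / games
```

For example, `win_rate(23, 23)` estimates the first player's advantage when both sides use the “hold at 23” policy that the paper identifies as the best fixed threshold for this game.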

MDPI and ACS Style

Zhu, T.; Ma, M.H. Deriving the Optimal Strategy for the Two Dice Pig Game via Reinforcement Learning. Stats 2022, 5, 805-818. https://doi.org/10.3390/stats5030047

