# The Game Is Not over Yet—Go in the Post-AlphaGo Era


## Abstract


## 1. Introduction

## 2. The Idea of Go

#### Komi

## 3. Go AI Short History

#### 3.1. Programming Human Expertise

#### 3.2. Monte Carlo Methods

#### 3.3. Deep Reinforcement Learning

## 4. Using AIs

#### 4.1. Playing against AIs

- **Play handicap games:** One of the best features of Go is its natural handicap system, which works by allowing Black to place extra stones on the board at the beginning of the game.
- **Limit the AIs:** There are numerous attempts to tweak AI engines to play weaker, or one can play against an earlier network. Both of these methods may suffer from the problem of the AI making strange, unreasonable moves.

#### 4.2. Analysis

## 5. Impact

#### 5.1. Shifting View of Strong Play

#### 5.2. Social Changes

- **player:** somebody who enjoys the competition and is motivated by winning games and climbing the ranking ladder; and
- **scholar:** somebody who enjoys the game as an intellectual challenge, deriving pleasure from the game's content and from learning itself, not directly from the fact of winning.

## 6. Can Humans Catch Up?

#### 6.1. Human versus AI Learning

#### 6.2. Taboos and Dogmas

## 7. Perfection: Mathematical and Relative

#### 7.1. Solved Games

- **Ultra-weakly solved:** The result of perfect play is known, but not the strategy. An example is Hex, where a simple mathematical (strategy-stealing) proof shows that the first player wins, but we do not know how to achieve that result.
- **Weakly solved:** The perfect result and a strategy achieving it are known from the starting position. Checkers is solved this way; consequently, there are still positions for which we do not know the perfect result.
- **Strongly solved:** For all legal board positions, we know the perfect result and can demonstrate a sequence of moves leading to it.
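To make the distinction concrete, here is a minimal, illustrative sketch of what strong solving means, using a toy subtraction game rather than Go (the game, the function names, and the bounds are our own, chosen purely for illustration): a memoized minimax computes the perfect result for every position, not just the starting one.

```python
from functools import lru_cache

# Illustrative toy example (not Go): 'strongly solving' the subtraction game
# in which players alternately remove 1-3 tokens from a heap and whoever
# takes the last token wins. Strong solving means knowing the perfect
# result -- and a perfect move -- for *every* position.

@lru_cache(maxsize=None)
def solve(heap: int) -> bool:
    """True if the player to move wins with perfect play from `heap`."""
    if heap == 0:
        return False  # the previous player took the last token and won
    # A position is winning if some move leads to a losing position.
    return any(not solve(heap - take) for take in (1, 2, 3) if take <= heap)

def perfect_move(heap: int) -> int:
    """A winning move if one exists, otherwise any legal move."""
    for take in (1, 2, 3):
        if take <= heap and not solve(heap - take):
            return take
    return 1

# Classify every position up to heap size 12: exactly the multiples of 4
# are losses for the player to move.
table = {h: solve(h) for h in range(13)}
```

By contrast, weak solving would only require a perfect move along one line of play from the starting position, and ultra-weak solving only the value of the starting position without any constructive strategy.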

#### 7.2. Relative Perfection

- Winning chances are balanced in the beginning, as close to 50% as possible. This is achieved by evaluating the empty board and adjusting the komi to a suitable integer value.
- The game has to end in a draw (for a given ruleset).
- At each turn, the playing entity chooses one of the best known moves; it is not allowed to choose a weaker move in order to attain a draw artificially.
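The first criterion can be sketched in a few lines. The `winrate_black` stub and its numbers below are purely hypothetical stand-ins for an engine's empty-board evaluation; only the selection logic is the point.

```python
# Sketch of the first criterion: choose an integer komi that brings the
# empty-board evaluation as close to 50% as possible. `winrate_black` is a
# hypothetical stand-in; in practice the values would come from an engine
# evaluating the empty board at each candidate komi.

def winrate_black(komi: float) -> float:
    # Invented numbers for illustration: each point of komi lowers Black's
    # empty-board winning probability slightly from a base value.
    return 0.62 - 0.017 * komi

def balanced_integer_komi(winrate, candidates=range(0, 15)):
    """The integer komi whose empty-board winrate is closest to 50%."""
    return min(candidates, key=lambda k: abs(winrate(k) - 0.5))

komi = balanced_integer_komi(winrate_black)  # 7 for this toy evaluation
```

An integer komi is essential here: with the usual fractional komi, a draw (criterion 2) would be impossible by construction.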

#### 7.2.1. Motivations for Relative Perfection

#### 7.2.2. Creating ‘Perfect’ Games

- **Extremely long thinking times:** While the raw policy output of the deep neural networks in modern AIs already has superhuman strength, the Monte Carlo tree search can further refine the policy and yield even better moves. Theoretically, Monte Carlo tree search converges to an optimal solution given enough time [29]. Therefore, giving the engines long thinking times can approximate perfect play.
- **Handcrafted self-plays:** A human observer can perform an ongoing analysis of the board positions, restart the search if the position has several good moves, and choose between equally good options. This way the human acts as a meta-level for the search algorithm.
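The convergence claim can be illustrated with the bandit view behind UCT [29]. The sketch below is a minimal UCB1 loop over abstract 'moves' with fixed win probabilities (our own toy numbers, not a Go engine): as the simulation budget grows, the visit counts concentrate on the best move.

```python
import math
import random

# Minimal sketch of the UCT selection rule on a toy bandit: each 'move'
# has a fixed (hidden) win probability, and the search must discover the
# best one purely from randomized simulations.

def uct_best_move(win_probs, simulations, c=1.4, seed=0):
    rng = random.Random(seed)
    wins = [0] * len(win_probs)
    visits = [0] * len(win_probs)
    for t in range(1, simulations + 1):
        # UCB1: exploit high observed winrates, but keep exploring
        # rarely tried moves via the sqrt(log t / n) bonus.
        scores = [
            float("inf") if visits[i] == 0
            else wins[i] / visits[i] + c * math.sqrt(math.log(t) / visits[i])
            for i in range(len(win_probs))
        ]
        i = scores.index(max(scores))
        visits[i] += 1
        wins[i] += rng.random() < win_probs[i]  # simulate one playout
    # As in Monte Carlo tree search, recommend the most visited move.
    return visits.index(max(visits))

best = uct_best_move([0.2, 0.5, 0.8], simulations=3000)
```

With a longer 'thinking time' (a larger simulation budget) the recommendation becomes increasingly reliable, which is the intuition behind approximating perfect play with extremely long searches.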

#### 7.2.3. Observations in Handcrafting

- **‘Priming the pump’:** One easy method to produce a flat winning-percentage graph is to build up a large search tree by allowing millions of visits in a position, preferably one with a forced move, so that all simulations go into a single move. Then one can play relatively quickly by following the most visited move until the built-up visit count disperses, usually when there are two moves with equivalent values. On the $9\times 9$ board, building up the search tree in the opening almost takes the game to its endgame. However, the search can become biased, and the choices are sometimes not as good as they look.
- **Restarting the search:** The Monte Carlo tree search is a randomized process. Therefore, it should not be surprising that restarting the search gives different recommendations for the next move.
- **Making the endgame sensible:** The neural networks we use now were trained to maximize the winning percentage and, to some extent, the score lead. They do not value the aesthetics of the moves. This becomes a problem in the endgame, when the engines suggest moves that do not actually change the score. The theory of the endgame can be discussed with mathematical precision [1], and the endgame can even be played perfectly [30]. There is probably room for developing an engine that switches from the neural network to a coded algorithm to play the endgame with no unnecessary moves. In the meantime, we can use the human observer to select aesthetic moves.
- **The engine changes its mind:** It is a frequent phenomenon that, after millions of visits, well after a best move has been established, a better option emerges: the engine ‘changes its mind’. Clearly, this is due to the discovery of a better variation somewhere deeper in the search tree. This does not matter much when playing against human players, but in self-play it leads to a more balanced game.
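The restarting observation suggests a simple practical remedy that can be sketched: run several independent searches from the same position and tally their recommendations. Everything below (the move names, winrates, and noise model) is an invented stand-in for a real engine; only the tallying idea matters.

```python
import random
from collections import Counter

# Sketch of 'restarting the search': a randomized search may recommend
# different moves on different runs, so tallying independent restarts
# exposes how stable a recommendation really is. The moves, winrates, and
# noise model below are invented stand-ins for a real engine.

MOVES = ["D4", "Q16", "C3"]
TRUE_VALUE = {"D4": 0.52, "Q16": 0.51, "C3": 0.40}

def noisy_search(rng, n_playouts=200):
    """One randomized 'search': pick the move with the best noisy estimate."""
    def estimate(move):
        # Simulation noise shrinks as the playout budget grows.
        return TRUE_VALUE[move] + rng.gauss(0, 0.3 / n_playouts ** 0.5)
    return max(MOVES, key=estimate)

def tally_restarts(restarts=50, seed=1):
    rng = random.Random(seed)
    return Counter(noisy_search(rng) for _ in range(restarts))

tally = tally_restarts()  # e.g. the close moves D4 and Q16 split the restarts
```

When two moves split the tally nearly evenly, a human observer can step in as the meta-level described above and choose between the equally good options.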

## 8. How Compressible Is Go Knowledge?

#### 8.1. OmegaGo

#### 8.2. Which Complexity Measure?

## 9. Conclusions

- we are free from the previously unknown constraints of thinking;
- we have better analysis tools and access to high-quality statistical patterns; and
- we have causal reasoning abilities.

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
| --- | --- |
| AI | the field of artificial intelligence, or a software package with artificial intelligence capabilities |
| AG, AGZ, AZ | AlphaGo, AlphaGo Zero, AlphaZero |
| OGS | Online-Go Server |

## References

- Törmänen, A. Rational Endgame; Hebsacker Verlag: Scheeßel, Germany, 2019.
- Power, J. Invincible, the Game of Shusaku; Game Collections Series; Kiseido Publishing Company: Kanagawa-Ken, Japan, 1998.
- Wolf, T. The program GoTools and its computer-generated tsume go database. In Proceedings of the Game Programming Workshop in Japan ’94, Hakone, Japan, 21–23 October 1994; pp. 84–96.
- Kishimoto, A.; Müller, M. Search versus knowledge for solving life and death problems in Go. In Proceedings of the AAAI, Pittsburgh, PA, USA, 9–13 July 2005; pp. 1374–1379.
- Millen, J.K. Programming the game of Go. In Byte Magazine; UBM Technology Group: San Francisco, CA, USA, 1981; Volume 6.
- Baudiš, P.; Gailly, J.L. PACHI: State of the Art Open Source Go Program. In Advances in Computer Games; van den Herik, H., Plaat, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7168, pp. 24–38.
- Dehaene, S. Consciousness and the Brain: Deciphering How the Brain Codes Our Thoughts; Penguin Publishing Group: New York, NY, USA, 2014.
- Sutton, R.; Barto, A. Reinforcement Learning: An Introduction, 2nd ed.; Adaptive Computation and Machine Learning Series; MIT Press: Cambridge, MA, USA, 2018.
- Kruger, J.; Dunning, D. Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. J. Personal. Soc. Psychol. **1999**, 77, 1121.
- Hotta, Y.; Obata, T. Hikaru No Go; Original Japanese Version Published in 1998; VIZ: San Francisco, CA, USA, 2004; Volume 23.
- Schaeffer, J. One Jump Ahead: Computer Perfection at Checkers, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2008.
- Egri-Nagy, A.; Törmänen, A. Derived metrics for the game of Go—Intrinsic network strength assessment and cheat-detection. arXiv **2020**, arXiv:2009.01606.
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature **2016**, 529, 484–489.
- Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of Go without human knowledge. Nature **2017**, 550, 354–359.
- Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science **2018**, 362, 1140–1144. Available online: https://science.sciencemag.org/content/362/6419/1140.full.pdf (accessed on 12 November 2020).
- Pascutto, G.C. Leela Zero: Go Engine with No Human-Provided Knowledge, Modeled after the AlphaGo Zero Paper. 2019. Available online: https://zero.sjeng.org (accessed on 12 November 2020).
- Wu, D.J. Accelerating Self-Play Learning in Go. arXiv **2019**, arXiv:1902.10565.
- Kahneman, D. Thinking, Fast and Slow; Farrar, Straus and Giroux: New York, NY, USA, 2011.
- Törmänen, A. Invisible—The Games of AlphaGo; Hebsacker Verlag: Scheeßel, Germany, 2017.
- Zweig, S.; Gay, P.; Rotenberg, J. Chess Story; New York Review Books Classics; Also known as The Royal Game, originally published in 1943; New York Review Books: New York, NY, USA, 2011.
- Brady, F. Endgame: Bobby Fischer’s Remarkable Rise and Fall—From America’s Brightest Prodigy to the Edge of Madness; Crown: New York, NY, USA, 2011.
- Pearl, J.; Mackenzie, D. The Book of Why: The New Science of Cause and Effect; Penguin Books: New York, NY, USA, 2018.
- Schaeffer, J.; Burch, N.; Björnsson, Y.; Kishimoto, A.; Müller, M.; Lake, R.; Lu, P.; Sutphen, S. Checkers Is Solved. Science **2007**, 317, 1518–1522.
- Werf, E.; Herik, H.; Uiterwijk, J. Solving Go on Small Boards. ICGA J. **2003**, 26.
- Werf, E.; Winands, M. Solving Go for Rectangular Boards. ICGA J. **2009**, 32, 77–88.
- Tromp, J.; Farnebäck, G. Combinatorics of Go. In Computers and Games; Springer: Berlin/Heidelberg, Germany, 2007; pp. 84–99.
- Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 3rd ed.; Prentice Hall Press: Englewood Cliffs, NJ, USA, 2009.
- Cobb, W. Reflections on the Game of Go: The Empty Board 1994–2004; Slate and Shell: Richmond, VA, USA, 2005.
- Kocsis, L.; Szepesvári, C. Bandit Based Monte-Carlo Planning. In Machine Learning: ECML 2006; Fürnkranz, J., Scheffer, T., Spiliopoulou, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 282–293.
- Berlekamp, E.; Wolfe, D. Mathematical Go: Chilling Gets the Last Point; CRC Press: Boca Raton, FL, USA, 1994.
- Rhodes, J.; Nehaniv, C. Applications of Automata Theory and Algebra: Via the Mathematical Theory of Complexity to Biology, Physics, Psychology, Philosophy, and Games; World Scientific: Singapore, 2010.
- Lichtenstein, D.; Sipser, M. GO Is Polynomial-Space Hard. J. ACM **1980**, 27, 393–401.
- Wolfe, D. Go endgames are PSPACE-hard. In More Games of No Chance; MSRI Publications: Cambridge, UK, 2002; Volume 42, pp. 125–136.

**Figure 1.**Patterns in GnuGo are stored in text files. These are two examples with explanation on the left. The first example is about blocking a possible incursion. The second is a standard tactical move, a net. The configuration on the right is a possible match for the second pattern.

**Figure 2.**White’s play in this figure follows Pattern Attack1 in Figure 1. However, in the first diagram, White’s move is unnecessary and inefficient, and, in the second diagram, the move does not work because of the combined effect of Black’s stones below and to the right. These two examples showcase one of the main challenges in preprogramming Go patterns into an AI.

**Figure 3.** The shoulder hit on the 5th line goes against the previous wisdom of the balance between the 3rd line (securing territory) and the 4th line (creating influence). The early 3-3 invasion was considered a weak move until strong AIs started to play it earlier and earlier (now as early as the 2nd move). Crawling on the second line again contradicts the idea of the 3rd–4th line balance, but in this example it worked very well.

**Figure 4.**Best available results on the $5\times 5$ board for the first move of Black [24], assuming area scoring. Color indicates winner and the number is the score lead.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Egri-Nagy, A.; Törmänen, A.
The Game Is Not over Yet—Go in the Post-AlphaGo Era. *Philosophies* **2020**, *5*, 37.
https://doi.org/10.3390/philosophies5040037
