Algorithms for Game AI
- Search algorithms are used for real-time planning. They expand a search tree from the current game state to evaluate future states under different sequences of actions. Some algorithms, such as A* [12], use heuristic functions that evaluate game states based on human domain knowledge to guide the selection of unexplored nodes. In general, deeper search yields better policies because looking more steps ahead reduces value-estimation errors, but it also costs more computation; in practice, the search depth is usually fixed or determined by iterative deepening to bound the time cost. In competitive multi-agent games, where agents must act against opponents while maximizing their own payoffs, adversarial search algorithms such as Minimax [13] are often used. Monte Carlo Tree Search (MCTS) [14] is one of the most popular search algorithms due to its efficiency and robustness without requiring domain knowledge, and many variants have been proposed to further improve its efficiency by incorporating upper confidence bounds [15], domain knowledge [16], and parallelization [17] (a minimal MCTS sketch appears after this list).
- Optimization algorithms refine solutions towards specific objectives and are used to train agents with parameterized models. Linear programming optimizes an objective function under a set of constraints; evolutionary methods, such as Genetic Algorithms (GA) and Evolution Strategies (ES) [18], inspired by natural selection, maintain a population of individuals in which the fitter ones are more likely to reproduce and pass on parts of their structure; gradient descent is the foundation of deep learning, enabling efficient parameter tuning of modern artificial neural networks. These algorithms train policy or value models that encode prior knowledge before actual gameplay, which can be combined with real-time planning algorithms during the inference phase.
- Supervised learning is a data-driven method for learning patterns and relationships from labeled data. In game AI, the data are usually game states or observations, and the task is to learn a policy model or value model that predicts the action to take or the estimated value of the current state. These algorithms require a large amount of labeled data in the form of state–action or state–value pairs, usually collected from human gameplay or generated by other game-playing algorithms. There are accordingly two types of applications based on the data source. Applying supervised learning to human data captures implicit human knowledge and stores it in policy or value models. The other type, known as policy distillation [19], relies on data generated by other models and is used to distill a lightweight model from a larger one to improve computational efficiency, or to consolidate multiple task-specific policies into a single unified policy.
- Reinforcement learning (RL) studies how agents should take actions in an environment to maximize cumulative rewards over time. RL typically models the environment as a Markov Decision Process (MDP), in which the transition probabilities and rewards satisfy the Markov property, i.e., they depend only on the current state and action. When the transition model is known, generalized policy iteration methods such as value iteration use dynamic programming to solve for the optimal policy and its value function based on the Bellman equation (a value-iteration sketch appears after this list). In most cases, however, the environment model is unknown, and model-free algorithms are preferable because they learn from experience gathered by interacting with the environment. There are two kinds of model-free algorithms:
  - Value-based algorithms optimize the policy by approximating the values of states or state–action pairs and selecting better actions based on these values. The value function can be updated in different ways [20]: a Monte Carlo (MC) algorithm updates it using the cumulative reward obtained up to the end of the episode, whereas a temporal difference (TD) algorithm updates it from the current reward and the value of the next state in a bootstrapping manner (see the tabular Q-learning sketch after this list). Algorithms such as DQN [21] use deep neural networks to approximate the state–action value function and can therefore be applied to games with large state spaces.
  - Policy-based algorithms directly learn parameterized policies by following the gradient of a performance measure. For example, REINFORCE [22] samples full episode trajectories with Monte Carlo methods and uses the estimated returns to form the policy-gradient loss (see the REINFORCE sketch after this list). Because such pure policy-gradient methods suffer from high variance, actor–critic algorithms [23] have been proposed: an actor learns the parameterized policy while a critic learns a value function, so that policy updates can use the value estimates to reduce variance. Actor–critic algorithms include DDPG [24], A3C [25], IMPALA [26], TRPO [27], and PPO [28], to name but a few.
- Regret-based algorithms seek the Nash equilibrium in games with imperfect information. The basic idea is to choose actions so as to minimize the regret of not having chosen other actions in previous plays (the regret-matching sketch after this list illustrates the core update). Counterfactual regret minimization (CFR) [29] traverses the game tree over multiple iterations to accumulate the regrets of each state–action pair. Many variants of CFR, such as Discounted CFR [30] and MCCFR [31], improve efficiency by incorporating sampling, discounting, and reweighting techniques. For games with large state spaces, algorithms such as Deep CFR [32] and DREAM [33] use neural networks to approximate the regrets and policies.
- Self-play algorithms are often used in competitive multi-agent environments. In fictitious play (FP) [34], each agent computes a best response to its opponents' empirical average policies, a process proven to converge to a Nash equilibrium in certain classes of games such as two-player zero-sum games (a fictitious-play sketch on a matrix game follows this list). For games with large state spaces, neural fictitious self-play (NFSP) [35] adopts neural networks as the policy model, using reinforcement learning to compute best responses and supervised learning to learn the average policy. Double oracle (DO) [36] starts from a small set of policies: in each iteration, the Nash equilibrium of the game restricted to the current policy set is computed, and each agent adds its best response to that equilibrium to the set. Policy-space response oracles (PSRO) [37] provide a unified view of these methods, maintaining a policy pool against which new policies are trained and added; this has become common practice in multi-agent RL training.
- Multi-agent RL (MARL) algorithms extend single-agent RL to multi-agent settings, usually following the paradigm of centralized training with decentralized execution (CTDE). CTDE trains multiple agents jointly in a centralized manner but keeps them independent at execution time, providing a communication mechanism during training that mitigates the non-stationarity of fully independent learning. For example, value-based algorithms such as Value Decomposition Networks (VDN) [38] and QMIX [39] are variants of DQN for cooperative multi-agent settings that learn a centralized state–action value function by combining individual Q-networks through summation and mixing networks, respectively (a VDN-style sketch appears after this list). Multi-agent DDPG (MADDPG) [40] is a policy-based algorithm that generalizes DDPG to multi-agent settings, and many variants have been proposed to improve its performance and efficiency.
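To make these algorithm families more concrete, the following minimal Python sketches illustrate their core computations. They are illustrative toys under stated assumptions, not the implementations used in the systems cited above. The first sketches MCTS with upper confidence bounds (the UCT variant); the `Game` interface (`legal_actions`, `apply`, `is_terminal`, `reward`) and the exploration constant are assumptions of this sketch, and rewards are treated as single-player values for simplicity.

```python
import math
import random

class Node:
    """A search-tree node holding visit statistics."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}        # action -> Node
        self.visits = 0
        self.value_sum = 0.0

def uct_select(node, c=1.4):
    # Choose the child maximizing the UCB1 score (exploitation + exploration).
    return max(node.children.items(),
               key=lambda kv: kv[1].value_sum / (kv[1].visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (kv[1].visits + 1e-9)))

def rollout(game, state):
    # Random playout until the game ends; returns the terminal reward
    # (assumed to be from the searching player's perspective).
    while not game.is_terminal(state):
        state = game.apply(state, random.choice(game.legal_actions(state)))
    return game.reward(state)

def mcts(game, root_state, n_iterations=1000):
    root = Node(root_state)
    for _ in range(n_iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while node.children and len(node.children) == len(game.legal_actions(node.state)):
            _, node = uct_select(node)
        # 2. Expansion: add one unexplored child, if any.
        untried = [a for a in game.legal_actions(node.state) if a not in node.children]
        if untried and not game.is_terminal(node.state):
            action = random.choice(untried)
            node.children[action] = Node(game.apply(node.state, action), parent=node)
            node = node.children[action]
        # 3. Simulation: estimate the node's value with a random rollout.
        value = rollout(game, node.state)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Play the most visited root action.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

The four phases (selection, expansion, simulation, backpropagation) repeat until the computation budget is exhausted, after which the most visited root action is played; two-player adversarial variants additionally negate values at alternating tree levels.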
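Next, a minimal value-iteration sketch for a tabular MDP with a known transition model, corresponding to the dynamic-programming case mentioned in the RL item; the transition format `P[s][a] -> [(prob, next_state, reward, done), ...]` follows a Gym-style toy-text convention and is an assumption of this sketch.

```python
def value_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-6):
    """P[s][a] is a list of (prob, next_state, reward, done) tuples."""
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup over all actions.
            q = [sum(p * (r + gamma * (0.0 if done else V[s2]))
                     for p, s2, r, done in P[s][a]) for a in range(n_actions)]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Extract the greedy policy from the converged value function.
    policy = [max(range(n_actions),
                  key=lambda a: sum(p * (r + gamma * (0.0 if done else V[s2]))
                                    for p, s2, r, done in P[s][a]))
              for s in range(n_states)]
    return V, policy
```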
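The bootstrapped TD update behind value-based methods can be shown with tabular Q-learning; the environment is assumed to follow the Gymnasium `reset()`/`step()` interface with a discrete action space. DQN replaces the table with a neural network and adds experience replay and a target network.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning; assumes hashable (e.g., discrete) observations."""
    Q = defaultdict(float)                      # (state, action) -> value
    actions = list(range(env.action_space.n))
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy behavior policy.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # TD target bootstraps on the estimated value of the next state.
            target = reward if terminated else \
                reward + gamma * max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```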
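A minimal REINFORCE sketch in PyTorch follows, matching the Monte Carlo policy-gradient description; the two-layer policy network and the Gymnasium-style environment interface are illustrative assumptions. The high variance of the sampled returns used here is precisely what actor–critic methods reduce with a learned critic.

```python
import torch
import torch.nn as nn

def reinforce(env, obs_dim, n_actions, episodes=500, gamma=0.99, lr=1e-2):
    """Monte Carlo policy gradient with a small MLP policy."""
    policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(episodes):
        log_probs, rewards = [], []
        obs, _ = env.reset()
        done = False
        while not done:
            # Sample an action from the current stochastic policy.
            logits = policy(torch.as_tensor(obs, dtype=torch.float32))
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, terminated, truncated, _ = env.step(action.item())
            rewards.append(float(reward))
            done = terminated or truncated
        # Discounted returns computed backwards over the full trajectory.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        # Policy-gradient loss: -sum_t log pi(a_t|s_t) * G_t.
        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy
```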
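The core update of regret-based methods is regret matching, sketched below on rock–paper–scissors in self-play, in the spirit of standard CFR tutorials; full CFR additionally traverses the game tree and weights regrets by counterfactual reach probabilities, which this toy example omits.

```python
import random

PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]   # row player's payoff (R, P, S)

def strategy_from_regrets(regrets):
    # Regret matching: play actions in proportion to their positive regret.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / 3] * 3

def regret_matching_selfplay(iterations=50000):
    regrets = [[0.0] * 3, [0.0] * 3]
    strategy_sum = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(regrets[p]) for p in (0, 1)]
        acts = [random.choices(range(3), weights=strategies[p])[0] for p in (0, 1)]
        for p in (0, 1):
            for a in range(3):
                strategy_sum[p][a] += strategies[p][a]
        # Realized payoff of the row player in the zero-sum game.
        u0 = PAYOFF[acts[0]][acts[1]]
        # Accumulate the regret of not having played each alternative action.
        for a in range(3):
            regrets[0][a] += PAYOFF[a][acts[1]] - u0
            regrets[1][a] += -PAYOFF[acts[0]][a] - (-u0)
    # Average strategies approach the uniform Nash equilibrium (1/3, 1/3, 1/3).
    return [[s / sum(strategy_sum[p]) for s in strategy_sum[p]] for p in (0, 1)]

print(regret_matching_selfplay())
```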
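Fictitious play can be illustrated on matching pennies, where each player best-responds to the opponent's empirical average strategy and both averages approach the (0.5, 0.5) equilibrium; the payoff matrix, initialization, and tie-breaking below are illustrative choices.

```python
A = [[1, -1], [-1, 1]]   # matching pennies, row player's payoff (column gets -A)

def fictitious_play(iterations=10000):
    row_counts, col_counts = [1, 0], [0, 1]   # arbitrary initial play counts
    for _ in range(iterations):
        # Each player's belief: the opponent's empirical average strategy.
        col_avg = [c / sum(col_counts) for c in col_counts]
        row_avg = [c / sum(row_counts) for c in row_counts]
        # Best responses against those average strategies.
        row_br = max(range(2), key=lambda i: sum(A[i][j] * col_avg[j] for j in range(2)))
        col_br = max(range(2), key=lambda j: sum(-A[i][j] * row_avg[i] for i in range(2)))
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    # Empirical averages approach the mixed Nash equilibrium (0.5, 0.5).
    return ([c / sum(row_counts) for c in row_counts],
            [c / sum(col_counts) for c in col_counts])

print(fictitious_play())
```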
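Finally, a VDN-style sketch in PyTorch shows the value-decomposition idea: the centralized TD loss is computed on the sum of per-agent Q-values, while each agent acts greedily on its own network at execution time. Network sizes and tensor shapes are assumptions of this sketch; QMIX replaces the summation with a monotonic mixing network conditioned on the global state.

```python
import torch
import torch.nn as nn

class VDN(nn.Module):
    """Joint Q-value as the sum of per-agent Q-networks (the VDN mixer)."""
    def __init__(self, n_agents, obs_dim, n_actions):
        super().__init__()
        self.q_nets = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
            for _ in range(n_agents)])

    def forward(self, obs, actions):
        # obs: (batch, n_agents, obs_dim); actions: (batch, n_agents), dtype long.
        per_agent_q = [net(obs[:, i]).gather(1, actions[:, i:i + 1]).squeeze(1)
                       for i, net in enumerate(self.q_nets)]
        # Centralized training target uses the summed joint Q-value.
        return torch.stack(per_agent_q, dim=1).sum(dim=1)

def act(model, obs):
    # Decentralized execution: each agent is greedy w.r.t. its own Q-network.
    # obs: (n_agents, obs_dim)
    return [net(obs[i]).argmax().item() for i, net in enumerate(model.q_nets)]
```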
Funding
Conflicts of Interest
List of Contributions
- Wang, W.; Sun, D.; Jiang, F.; Chen, X.; Zhu, C. Research and Challenges of Reinforcement Learning in Cyber Defense Decision-Making for Intranet Security. Algorithms 2022, 15, 134. https://doi.org/10.3390/a15040134.
- Sanjaya, R.; Wang, J.; Yang, Y. Measuring the Non-Transitivity in Chess. Algorithms 2022, 15, 152. https://doi.org/10.3390/a15050152.
- Yang, X.; Wang, Z.; Zhang, H.; Ma, N.; Yang, N.; Liu, H.; Zhang, H.; Yang, L. A Review: Machine Learning for Combinatorial Optimization Problems in Energy Areas. Algorithms 2022, 15, 205. https://doi.org/10.3390/a15060205.
- Lu, Y.; Li, W. Techniques and Paradigms in Modern Game AI Systems. Algorithms 2022, 15, 282. https://doi.org/10.3390/a15080282.
- Lu, Y.; Li, W.; Li, W. Official International Mahjong: A New Playground for AI Research. Algorithms 2023, 16, 235. https://doi.org/10.3390/a16050235.
- Li, Z.; Chen, X.; Fu, J.; Xie, N.; Zhao, T. Reducing Q-Value Estimation Bias via Mutual Estimation and Softmax Operation in MADRL. Algorithms 2024, 17, 36. https://doi.org/10.3390/a17010036.
- Li, J.; Xie, N.; Zhao, T. Optimizing Reinforcement Learning Using a Generative Action-Translator Transformer. Algorithms 2024, 17, 37. https://doi.org/10.3390/a17010037.
- Schaa, H.; Barriga, N.A. Evaluating the Expressive Range of Super Mario Bros Level Generators. Algorithms 2024, 17, 307. https://doi.org/10.3390/a17070307.
- Zhang, L.; Zou, H.; Zhu, Y. An Efficient Optimization of the Monte Carlo Tree Search Algorithm for Amazons. Algorithms 2024, 17, 334. https://doi.org/10.3390/a17080334.
- Hsieh, Y.-H.; Kao, C.-C.; Yuan, S.-M. Imitating Human Go Players via Vision Transformer. Algorithms 2025, 18, 61. https://doi.org/10.3390/a18020061.
- Penelas, G.; Barbosa, L.; Reis, A.; Barroso, J.; Pinto, T. Machine Learning for Decision Support and Automation in Games: A Study on Vehicle Optimal Path. Algorithms 2025, 18, 106. https://doi.org/10.3390/a18020106.
References
- Campbell, M.; Hoane, A.J., Jr.; Hsu, F.H. Deep blue. Artif. Intell. 2002, 134, 57–83. [Google Scholar] [CrossRef]
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef]
- Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of go without human knowledge. Nature 2017, 550, 354–359. [Google Scholar] [CrossRef]
- Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 2018, 362, 1140–1144. [Google Scholar] [CrossRef] [PubMed]
- Moravčík, M.; Schmid, M.; Burch, N.; Lisý, V.; Morrill, D.; Bard, N.; Davis, T.; Waugh, K.; Johanson, M.; Bowling, M. Deepstack: Expert-level artificial intelligence in heads-up no-limit poker. Science 2017, 356, 508–513. [Google Scholar] [CrossRef] [PubMed]
- Brown, N.; Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 2018, 359, 418–424. [Google Scholar] [CrossRef]
- Brown, N.; Sandholm, T. Superhuman AI for multiplayer poker. Science 2019, 365, 885–890. [Google Scholar] [CrossRef]
- Vinyals, O.; Babuschkin, I.; Chung, J.; Mathieu, M.; Jaderberg, M.; Czarnecki, W.M.; Dudzik, A.; Huang, A.; Georgiev, P.; Powell, R.; et al. Alphastar: Mastering the real-time strategy game starcraft ii. Deep. Blog 2019, 2, 20. [Google Scholar]
- Berner, C.; Brockman, G.; Chan, B.; Cheung, V.; Dębiak, P.; Dennison, C.; Farhi, D.; Fischer, Q.; Hashme, S.; Hesse, C.; et al. Dota 2 with large scale deep reinforcement learning. arXiv 2019, arXiv:1912.06680. [Google Scholar]
- Ye, D.; Chen, G.; Zhang, W.; Chen, S.; Yuan, B.; Liu, B.; Chen, J.; Liu, Z.; Qiu, F.; Yu, H.; et al. Towards playing full moba games with deep reinforcement learning. Adv. Neural Inf. Process. Syst. 2020, 33, 621–632. [Google Scholar]
- Xia, B.; Ye, X.; Abuassba, A.O. Recent research on ai in games. In Proceedings of the 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, 15–19 June 2020; pp. 505–510. [Google Scholar]
- Hart, P.E.; Nilsson, N.J.; Raphael, B. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 100–107. [Google Scholar] [CrossRef]
- Stockman, G.C. A minimax algorithm better than alpha-beta? Artif. Intell. 1979, 12, 179–196. [Google Scholar] [CrossRef]
- Browne, C.B.; Powley, E.; Whitehouse, D.; Lucas, S.M.; Cowling, P.I.; Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; Colton, S. A survey of monte carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 2012, 4, 1–43. [Google Scholar] [CrossRef]
- Kocsis, L.; Szepesvári, C. Bandit based monte-carlo planning. In Proceedings of the European Conference on Machine Learning, Berlin, Germany, 18–22 September 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 282–293. [Google Scholar]
- Gelly, S.; Silver, D. Combining online and offline knowledge in UCT. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; pp. 273–280. [Google Scholar]
- Chaslot, G.M.B.; Winands, M.H.; van Den Herik, H.J. Parallel monte-carlo tree search. In Proceedings of the Computers and Games: 6th International Conference, CG 2008, Proceedings 6, Beijing, China, 29 September–1 October 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 60–71. [Google Scholar]
- Rechenberg, I. Evolutionsstrategien. In Proceedings of the Simulationsmethoden in der Medizin und Biologie: Workshop, Hannover, Germany, 29 September–1 October 1977; Springer: Berlin/Heidelberg, Germany, 1978; pp. 83–114. [Google Scholar]
- Rusu, A.A.; Colmenarejo, S.G.; Gulcehre, C.; Desjardins, G.; Kirkpatrick, J.; Pascanu, R.; Mnih, V.; Kavukcuoglu, K.; Hadsell, R. Policy distillation. arXiv 2015, arXiv:1511.06295. [Google Scholar]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998; Volume 1, pp. 9–11. [Google Scholar]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
- Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256. [Google Scholar] [CrossRef]
- Konda, V.; Tsitsiklis, J. Actor-critic algorithms. In Proceedings of the Neural Information Processing Systems, Denver, CO, USA, 29 November–4 December 1999; pp. 1008–1014. [Google Scholar]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.P.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA, 19–24 June 2016; pp. 1928–1937. [Google Scholar]
- Espeholt, L.; Soyer, H.; Munos, R.; Simonyan, K.; Mnih, V.; Ward, T.; Doron, Y.; Firoiu, V.; Harley, T.; Dunning, I.; et al. Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1407–1416. [Google Scholar]
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust region policy optimization. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 1889–1897. [Google Scholar]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
- Zinkevich, M.; Johanson, M.; Bowling, M.; Piccione, C. Regret minimization in games with incomplete information. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; pp. 1729–1736. [Google Scholar]
- Brown, N.; Sandholm, T. Solving imperfect-information games via discounted regret minimization. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 1829–1836. [Google Scholar]
- Lanctot, M.; Waugh, K.; Zinkevich, M.; Bowling, M. Monte Carlo sampling for regret minimization in extensive games. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009; pp. 1078–1086. [Google Scholar]
- Brown, N.; Lerer, A.; Gross, S.; Sandholm, T. Deep counterfactual regret minimization. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 793–802. [Google Scholar]
- Steinberger, E.; Lerer, A.; Brown, N. Dream: Deep regret minimization with advantage baselines and model-free learning. arXiv 2020, arXiv:2006.10410. [Google Scholar]
- Heinrich, J.; Lanctot, M.; Silver, D. Fictitious self-play in extensive-form games. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 805–813. [Google Scholar]
- Heinrich, J.; Silver, D. Deep reinforcement learning from self-play in imperfect-information games. arXiv 2016, arXiv:1603.01121. [Google Scholar]
- McMahan, H.B.; Gordon, G.J.; Blum, A. Planning in the presence of cost functions controlled by an adversary. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003; pp. 536–543. [Google Scholar]
- Lanctot, M.; Zambaldi, V.; Gruslys, A.; Lazaridou, A.; Tuyls, K.; Perolat, J.; Silver, D.; Graepel, T. A unified game-theoretic approach to multiagent reinforcement learning. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4190–4203. [Google Scholar]
- Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-decomposition networks for cooperative multi-agent learning. arXiv 2017, arXiv:1706.05296. [Google Scholar]
- Rashid, T.; Samvelyan, M.; De Witt, C.S.; Farquhar, G.; Foerster, J.; Whiteson, S. Monotonic value function factorisation for deep multi-agent reinforcement learning. J. Mach. Learn. Res. 2020, 21, 1–51. [Google Scholar]
- Lowe, R.; Wu, Y.I.; Tamar, A.; Harb, J.; Pieter Abbeel, O.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6379–6390. [Google Scholar]
- Wang, W.; Sun, D.; Jiang, F.; Chen, X.; Zhu, C. Research and Challenges of Reinforcement Learning in Cyber Defense Decision-Making for Intranet Security. Algorithms 2022, 15, 134. [Google Scholar] [CrossRef]
- Sanjaya, R.; Wang, J.; Yang, Y. Measuring the Non-Transitivity in Chess. Algorithms 2022, 15, 152. [Google Scholar] [CrossRef]
- Yang, X.; Wang, Z.; Zhang, H.; Ma, N.; Yang, N.; Liu, H.; Zhang, H.; Yang, L. A Review: Machine Learning for Combinatorial Optimization Problems in Energy Areas. Algorithms 2022, 15, 205. [Google Scholar] [CrossRef]
- Lu, Y.; Li, W. Techniques and Paradigms in Modern Game AI Systems. Algorithms 2022, 15, 282. [Google Scholar] [CrossRef]
- Lu, Y.; Li, W.; Li, W. Official International Mahjong: A New Playground for AI Research. Algorithms 2023, 16, 235. [Google Scholar] [CrossRef]
- Li, Z.; Chen, X.; Fu, J.; Xie, N.; Zhao, T. Reducing Q-Value Estimation Bias via Mutual Estimation and Softmax Operation in MADRL. Algorithms 2024, 17, 36. [Google Scholar] [CrossRef]
- Li, J.; Xie, N.; Zhao, T. Optimizing Reinforcement Learning Using a Generative Action-Translator Transformer. Algorithms 2024, 17, 37. [Google Scholar] [CrossRef]
- Schaa, H.; Barriga, N.A. Evaluating the Expressive Range of Super Mario Bros Level Generators. Algorithms 2024, 17, 307. [Google Scholar] [CrossRef]
- Zhang, L.; Zou, H.; Zhu, Y. An Efficient Optimization of the Monte Carlo Tree Search Algorithm for Amazons. Algorithms 2024, 17, 334. [Google Scholar] [CrossRef]
- Hsieh, Y.-H.; Kao, C.-C.; Yuan, S.-M. Imitating Human Go Players via Vision Transformer. Algorithms 2025, 18, 61. [Google Scholar] [CrossRef]
- Penelas, G.; Barbosa, L.; Reis, A.; Barroso, J.; Pinto, T. Machine Learning for Decision Support and Automation in Games: A Study on Vehicle Optimal Path. Algorithms 2025, 18, 106. [Google Scholar] [CrossRef]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018; pp. 1856–1865. [Google Scholar]