Algorithms for Game AI
- Search algorithms are used for real-time planning. They expand a search tree from the current game state to evaluate future states under different sequences of actions. Some algorithms, such as A* [12], use heuristic functions that evaluate game states based on human domain knowledge to guide the selection of unexplored nodes. In general, deeper search yields better policies because looking more steps ahead reduces value-estimation errors, but it also costs more computation; in practice, the search depth is usually fixed or determined by iterative deepening to bound the time cost. In competitive multi-agent games, where agents must act against opponents while maximizing their own payoffs, adversarial search algorithms such as Minimax [13] are often used. Monte Carlo Tree Search (MCTS) [14] is one of the most popular search algorithms due to its efficiency and robustness without requiring domain knowledge, and many variants have been proposed to further improve its efficiency by incorporating upper confidence bounds [15], domain knowledge [16], and parallelization [17] (a minimal MCTS sketch appears after this list).
- Optimization algorithms refine solutions towards specific objectives and are used to train agents with parameterized models. Linear programming optimizes an objective function under a set of constraints; evolutionary methods, such as Genetic Algorithms (GA) and Evolution Strategies (ES) [18], inspired by natural selection, maintain a population of individuals in which the fitter ones are more likely to reproduce and pass on parts of their structure; gradient descent is the foundation of deep learning, enabling efficient parameter tuning of modern artificial neural networks. These algorithms train policy or value models that encode prior knowledge before actual gameplay, which can be combined with real-time planning algorithms during the inference phase.
- Supervised learning is a data-driven method for learning patterns and relationships from labeled data. In game AI, the data are usually game states or observations, and the task is to learn a policy model or value model that predicts the action to take or the estimated value of the current state. These algorithms require a large amount of labeled data in the form of state–action or state–value pairs, usually collected from human gameplay or generated by other game-playing algorithms. There are accordingly two types of applications based on the data source. Applying supervised learning to human data captures implicit human knowledge and stores it in policy or value models. The other type, known as policy distillation [19], relies on data generated by other models and is used to distill a lightweight model from a larger one to improve computational efficiency, or to consolidate multiple task-specific policies into a single unified policy.
- Reinforcement learning (RL) studies how agents should take actions in an environment to maximize cumulative rewards over time. RL typically models the environment as a Markov Decision Process (MDP), in which the transition probabilities and rewards satisfy the Markov property, i.e., they depend only on the current state and action. When the transition model is known, generalized policy iteration methods such as value iteration use dynamic programming to solve for the optimal policy and its value function based on the Bellman equation (a value-iteration sketch appears after this list). In most cases, however, the environment model is unknown, and model-free algorithms are preferable because they learn from experience gathered by interacting with the environment. There are two kinds of model-free algorithms:
  - Value-based algorithms optimize the policy by approximating the values of states or state–action pairs and selecting better actions based on these values. The value function can be updated in different ways [20]: a Monte Carlo (MC) algorithm updates it using the cumulative reward obtained up to the end of the episode, whereas a temporal difference (TD) algorithm updates it from the current reward and the value of the next state in a bootstrapping manner (see the tabular Q-learning sketch after this list). Algorithms such as DQN [21] use deep neural networks to approximate the state–action value function and can therefore be applied to games with large state spaces.
  - Policy-based algorithms directly learn parameterized policies by following the gradient of a performance measure. For example, REINFORCE [22] samples full episode trajectories with Monte Carlo methods and uses the estimated returns to form the policy-gradient loss (see the REINFORCE sketch after this list). Because such pure policy-gradient methods suffer from high variance, actor–critic algorithms [23] have been proposed: an actor learns the parameterized policy while a critic learns a value function, so that policy updates can use the value estimates to reduce variance. Actor–critic algorithms include DDPG [24], A3C [25], IMPALA [26], TRPO [27], and PPO [28], to name but a few.
- Regret-based algorithms seek the Nash equilibrium in games with imperfect information. The basic idea is to choose actions so as to minimize the regret of not having chosen other actions in previous plays (the regret-matching sketch after this list illustrates the core update). Counterfactual regret minimization (CFR) [29] traverses the game tree over multiple iterations to accumulate the regrets of each state–action pair. Many variants of CFR, such as Discounted CFR [30] and MCCFR [31], improve efficiency by incorporating sampling, discounting, and reweighting techniques. For games with large state spaces, algorithms such as Deep CFR [32] and DREAM [33] use neural networks to approximate the regrets and policies.
- Self-play algorithms are often used in competitive multi-agent environments. In fictitious play (FP) [34], each agent computes a best response to its opponents' empirical average policies, a process proven to converge to a Nash equilibrium in certain classes of games such as two-player zero-sum games (a fictitious-play sketch on a matrix game follows this list). For games with large state spaces, neural fictitious self-play (NFSP) [35] adopts neural networks as the policy model, using reinforcement learning to compute best responses and supervised learning to learn the average policy. Double oracle (DO) [36] starts from a small set of policies: in each iteration, the Nash equilibrium of the game restricted to the current policy set is computed, and each agent adds its best response to that equilibrium to the set. Policy-space response oracles (PSRO) [37] provide a unified view of these methods, maintaining a policy pool against which new policies are trained and added; this has become common practice in multi-agent RL training.
- Multi-agent RL (MARL) algorithms extend single-agent RL to multi-agent settings, usually following the paradigm of centralized training with decentralized execution (CTDE). CTDE trains multiple agents jointly in a centralized manner but keeps them independent at execution time, providing a communication mechanism during training that mitigates the non-stationarity of fully independent learning. For example, value-based algorithms such as Value Decomposition Networks (VDN) [38] and QMIX [39] are variants of DQN for cooperative multi-agent settings that learn a centralized state–action value function by combining individual Q-networks through summation and mixing networks, respectively (a VDN-style sketch appears after this list). Multi-agent DDPG (MADDPG) [40] is a policy-based algorithm that generalizes DDPG to multi-agent settings, and many variants have been proposed to improve its performance and efficiency.
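To make these algorithm families more concrete, the following minimal Python sketches illustrate their core computations. They are illustrative toys under stated assumptions, not the implementations used in the systems cited above. The first sketches MCTS with upper confidence bounds (the UCT variant); the `Game` interface (`legal_actions`, `apply`, `is_terminal`, `reward`) and the exploration constant are assumptions of this sketch, and rewards are treated as single-player values for simplicity.

```python
import math
import random

class Node:
    """A search-tree node holding visit statistics."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}        # action -> Node
        self.visits = 0
        self.value_sum = 0.0

def uct_select(node, c=1.4):
    # Choose the child maximizing the UCB1 score (exploitation + exploration).
    return max(node.children.items(),
               key=lambda kv: kv[1].value_sum / (kv[1].visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (kv[1].visits + 1e-9)))

def rollout(game, state):
    # Random playout until the game ends; returns the terminal reward
    # (assumed to be from the searching player's perspective).
    while not game.is_terminal(state):
        state = game.apply(state, random.choice(game.legal_actions(state)))
    return game.reward(state)

def mcts(game, root_state, n_iterations=1000):
    root = Node(root_state)
    for _ in range(n_iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while node.children and len(node.children) == len(game.legal_actions(node.state)):
            _, node = uct_select(node)
        # 2. Expansion: add one unexplored child, if any.
        untried = [a for a in game.legal_actions(node.state) if a not in node.children]
        if untried and not game.is_terminal(node.state):
            action = random.choice(untried)
            node.children[action] = Node(game.apply(node.state, action), parent=node)
            node = node.children[action]
        # 3. Simulation: estimate the node's value with a random rollout.
        value = rollout(game, node.state)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Play the most visited root action.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

The four phases (selection, expansion, simulation, backpropagation) repeat until the computation budget is exhausted, after which the most visited root action is played; two-player adversarial variants additionally negate values at alternating tree levels.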
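Next, a minimal value-iteration sketch for a tabular MDP with a known transition model, corresponding to the dynamic-programming case mentioned in the RL item; the transition format `P[s][a] -> [(prob, next_state, reward, done), ...]` follows a Gym-style toy-text convention and is an assumption of this sketch.

```python
def value_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-6):
    """P[s][a] is a list of (prob, next_state, reward, done) tuples."""
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup over all actions.
            q = [sum(p * (r + gamma * (0.0 if done else V[s2]))
                     for p, s2, r, done in P[s][a]) for a in range(n_actions)]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Extract the greedy policy from the converged value function.
    policy = [max(range(n_actions),
                  key=lambda a: sum(p * (r + gamma * (0.0 if done else V[s2]))
                                    for p, s2, r, done in P[s][a]))
              for s in range(n_states)]
    return V, policy
```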
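The bootstrapped TD update behind value-based methods can be shown with tabular Q-learning; the environment is assumed to follow the Gymnasium `reset()`/`step()` interface with a discrete action space. DQN replaces the table with a neural network and adds experience replay and a target network.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning; assumes hashable (e.g., discrete) observations."""
    Q = defaultdict(float)                      # (state, action) -> value
    actions = list(range(env.action_space.n))
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy behavior policy.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # TD target bootstraps on the estimated value of the next state.
            target = reward if terminated else \
                reward + gamma * max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```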
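A minimal REINFORCE sketch in PyTorch follows, matching the Monte Carlo policy-gradient description; the two-layer policy network and the Gymnasium-style environment interface are illustrative assumptions. The high variance of the sampled returns used here is precisely what actor–critic methods reduce with a learned critic.

```python
import torch
import torch.nn as nn

def reinforce(env, obs_dim, n_actions, episodes=500, gamma=0.99, lr=1e-2):
    """Monte Carlo policy gradient with a small MLP policy."""
    policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(episodes):
        log_probs, rewards = [], []
        obs, _ = env.reset()
        done = False
        while not done:
            # Sample an action from the current stochastic policy.
            logits = policy(torch.as_tensor(obs, dtype=torch.float32))
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, terminated, truncated, _ = env.step(action.item())
            rewards.append(float(reward))
            done = terminated or truncated
        # Discounted returns computed backwards over the full trajectory.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        # Policy-gradient loss: -sum_t log pi(a_t|s_t) * G_t.
        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy
```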
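The core update of regret-based methods is regret matching, sketched below on rock–paper–scissors in self-play, in the spirit of standard CFR tutorials; full CFR additionally traverses the game tree and weights regrets by counterfactual reach probabilities, which this toy example omits.

```python
import random

PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]   # row player's payoff (R, P, S)

def strategy_from_regrets(regrets):
    # Regret matching: play actions in proportion to their positive regret.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / 3] * 3

def regret_matching_selfplay(iterations=50000):
    regrets = [[0.0] * 3, [0.0] * 3]
    strategy_sum = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(regrets[p]) for p in (0, 1)]
        acts = [random.choices(range(3), weights=strategies[p])[0] for p in (0, 1)]
        for p in (0, 1):
            for a in range(3):
                strategy_sum[p][a] += strategies[p][a]
        # Realized payoff of the row player in the zero-sum game.
        u0 = PAYOFF[acts[0]][acts[1]]
        # Accumulate the regret of not having played each alternative action.
        for a in range(3):
            regrets[0][a] += PAYOFF[a][acts[1]] - u0
            regrets[1][a] += -PAYOFF[acts[0]][a] - (-u0)
    # Average strategies approach the uniform Nash equilibrium (1/3, 1/3, 1/3).
    return [[s / sum(strategy_sum[p]) for s in strategy_sum[p]] for p in (0, 1)]

print(regret_matching_selfplay())
```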
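Fictitious play can be illustrated on matching pennies, where each player best-responds to the opponent's empirical average strategy and both averages approach the (0.5, 0.5) equilibrium; the payoff matrix, initialization, and tie-breaking below are illustrative choices.

```python
A = [[1, -1], [-1, 1]]   # matching pennies, row player's payoff (column gets -A)

def fictitious_play(iterations=10000):
    row_counts, col_counts = [1, 0], [0, 1]   # arbitrary initial play counts
    for _ in range(iterations):
        # Each player's belief: the opponent's empirical average strategy.
        col_avg = [c / sum(col_counts) for c in col_counts]
        row_avg = [c / sum(row_counts) for c in row_counts]
        # Best responses against those average strategies.
        row_br = max(range(2), key=lambda i: sum(A[i][j] * col_avg[j] for j in range(2)))
        col_br = max(range(2), key=lambda j: sum(-A[i][j] * row_avg[i] for i in range(2)))
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    # Empirical averages approach the mixed Nash equilibrium (0.5, 0.5).
    return ([c / sum(row_counts) for c in row_counts],
            [c / sum(col_counts) for c in col_counts])

print(fictitious_play())
```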
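Finally, a VDN-style sketch in PyTorch shows the value-decomposition idea: the centralized TD loss is computed on the sum of per-agent Q-values, while each agent acts greedily on its own network at execution time. Network sizes and tensor shapes are assumptions of this sketch; QMIX replaces the summation with a monotonic mixing network conditioned on the global state.

```python
import torch
import torch.nn as nn

class VDN(nn.Module):
    """Joint Q-value as the sum of per-agent Q-networks (the VDN mixer)."""
    def __init__(self, n_agents, obs_dim, n_actions):
        super().__init__()
        self.q_nets = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
            for _ in range(n_agents)])

    def forward(self, obs, actions):
        # obs: (batch, n_agents, obs_dim); actions: (batch, n_agents), dtype long.
        per_agent_q = [net(obs[:, i]).gather(1, actions[:, i:i + 1]).squeeze(1)
                       for i, net in enumerate(self.q_nets)]
        # Centralized training target uses the summed joint Q-value.
        return torch.stack(per_agent_q, dim=1).sum(dim=1)

def act(model, obs):
    # Decentralized execution: each agent is greedy w.r.t. its own Q-network.
    # obs: (n_agents, obs_dim)
    return [net(obs[i]).argmax().item() for i, net in enumerate(model.q_nets)]
```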
Funding
Conflicts of Interest
List of Contributions
- Wang, W.; Sun, D.; Jiang, F.; Chen, X.; Zhu, C. Research and Challenges of Reinforcement Learning in Cyber Defense Decision-Making for Intranet Security. Algorithms 2022, 15, 134. https://doi.org/10.3390/a15040134.
- Sanjaya, R.; Wang, J.; Yang, Y. Measuring the Non-Transitivity in Chess. Algorithms 2022, 15, 152. https://doi.org/10.3390/a15050152.
- Yang, X.; Wang, Z.; Zhang, H.; Ma, N.; Yang, N.; Liu, H.; Zhang, H.; Yang, L. A Review: Machine Learning for Combinatorial Optimization Problems in Energy Areas. Algorithms 2022, 15, 205. https://doi.org/10.3390/a15060205.
- Lu, Y.; Li, W. Techniques and Paradigms in Modern Game AI Systems. Algorithms 2022, 15, 282. https://doi.org/10.3390/a15080282.
- Lu, Y.; Li, W.; Li, W. Official International Mahjong: A New Playground for AI Research. Algorithms 2023, 16, 235. https://doi.org/10.3390/a16050235.
- Li, Z.; Chen, X.; Fu, J.; Xie, N.; Zhao, T. Reducing Q-Value Estimation Bias via Mutual Estimation and Softmax Operation in MADRL. Algorithms 2024, 17, 36. https://doi.org/10.3390/a17010036.
- Li, J.; Xie, N.; Zhao, T. Optimizing Reinforcement Learning Using a Generative Action-Translator Transformer. Algorithms 2024, 17, 37. https://doi.org/10.3390/a17010037.
- Schaa, H.; Barriga, N.A. Evaluating the Expressive Range of Super Mario Bros Level Generators. Algorithms 2024, 17, 307. https://doi.org/10.3390/a17070307.
- Zhang, L.; Zou, H.; Zhu, Y. An Efficient Optimization of the Monte Carlo Tree Search Algorithm for Amazons. Algorithms 2024, 17, 334. https://doi.org/10.3390/a17080334.
- Hsieh, Y.-H.; Kao, C.-C.; Yuan, S.-M. Imitating Human Go Players via Vision Transformer. Algorithms 2025, 18, 61. https://doi.org/10.3390/a18020061.
- Penelas, G.; Barbosa, L.; Reis, A.; Barroso, J.; Pinto, T. Machine Learning for Decision Support and Automation in Games: A Study on Vehicle Optimal Path. Algorithms 2025, 18, 106. https://doi.org/10.3390/a18020106.
References
- Campbell, M.; Hoane, A.J., Jr.; Hsu, F.H. Deep blue. Artif. Intell. 2002, 134, 57–83. [Google Scholar] [CrossRef]
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef]
- Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of go without human knowledge. Nature 2017, 550, 354–359. [Google Scholar] [CrossRef]
- Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 2018, 362, 1140–1144. [Google Scholar] [CrossRef] [PubMed]
- Moravčík, M.; Schmid, M.; Burch, N.; Lisý, V.; Morrill, D.; Bard, N.; Davis, T.; Waugh, K.; Johanson, M.; Bowling, M. Deepstack: Expert-level artificial intelligence in heads-up no-limit poker. Science 2017, 356, 508–513. [Google Scholar] [CrossRef] [PubMed]
- Brown, N.; Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 2018, 359, 418–424. [Google Scholar] [CrossRef]
- Brown, N.; Sandholm, T. Superhuman AI for multiplayer poker. Science 2019, 365, 885–890. [Google Scholar] [CrossRef]
- Vinyals, O.; Babuschkin, I.; Chung, J.; Mathieu, M.; Jaderberg, M.; Czarnecki, W.M.; Dudzik, A.; Huang, A.; Georgiev, P.; Powell, R.; et al. Alphastar: Mastering the real-time strategy game starcraft ii. Deep. Blog 2019, 2, 20. [Google Scholar]
- Berner, C.; Brockman, G.; Chan, B.; Cheung, V.; Dębiak, P.; Dennison, C.; Farhi, D.; Fischer, Q.; Hashme, S.; Hesse, C.; et al. Dota 2 with large scale deep reinforcement learning. arXiv 2019, arXiv:1912.06680. [Google Scholar]
- Ye, D.; Chen, G.; Zhang, W.; Chen, S.; Yuan, B.; Liu, B.; Chen, J.; Liu, Z.; Qiu, F.; Yu, H.; et al. Towards playing full moba games with deep reinforcement learning. Adv. Neural Inf. Process. Syst. 2020, 33, 621–632. [Google Scholar]
- Xia, B.; Ye, X.; Abuassba, A.O. Recent research on ai in games. In Proceedings of the 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, 15–19 June 2020; pp. 505–510. [Google Scholar]
- Hart, P.E.; Nilsson, N.J.; Raphael, B. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 100–107. [Google Scholar] [CrossRef]
- Stockman, G.C. A minimax algorithm better than alpha-beta? Artif. Intell. 1979, 12, 179–196. [Google Scholar] [CrossRef]
- Browne, C.B.; Powley, E.; Whitehouse, D.; Lucas, S.M.; Cowling, P.I.; Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; Colton, S. A survey of monte carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 2012, 4, 1–43. [Google Scholar] [CrossRef]
- Kocsis, L.; Szepesvári, C. Bandit based monte-carlo planning. In Proceedings of the European Conference on Machine Learning, Berlin, Germany, 18–22 September 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 282–293. [Google Scholar]
- Gelly, S.; Silver, D. Combining online and offline knowledge in UCT. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; pp. 273–280. [Google Scholar]
- Chaslot, G.M.B.; Winands, M.H.; van Den Herik, H.J. Parallel monte-carlo tree search. In Proceedings of the Computers and Games: 6th International Conference, CG 2008, Proceedings 6, Beijing, China, 29 September–1 October 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 60–71. [Google Scholar]
- Rechenberg, I. Evolutionsstrategien. In Proceedings of the Simulationsmethoden in der Medizin und Biologie: Workshop, Hannover, Germany, 29 September–1 October 1977; Springer: Berlin/Heidelberg, Germany, 1978; pp. 83–114. [Google Scholar]
- Rusu, A.A.; Colmenarejo, S.G.; Gulcehre, C.; Desjardins, G.; Kirkpatrick, J.; Pascanu, R.; Mnih, V.; Kavukcuoglu, K.; Hadsell, R. Policy distillation. arXiv 2015, arXiv:1511.06295. [Google Scholar]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998; Volume 1, pp. 9–11. [Google Scholar]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
- Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256. [Google Scholar] [CrossRef]
- Konda, V.; Tsitsiklis, J. Actor-critic algorithms. In Proceedings of the Neural Information Processing Systems, Denver, CO, USA, 29 November–4 December 1999; pp. 1008–1014. [Google Scholar]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.P.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA, 19–24 June 2016; pp. 1928–1937. [Google Scholar]
- Espeholt, L.; Soyer, H.; Munos, R.; Simonyan, K.; Mnih, V.; Ward, T.; Doron, Y.; Firoiu, V.; Harley, T.; Dunning, I.; et al. Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1407–1416. [Google Scholar]
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust region policy optimization. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 1889–1897. [Google Scholar]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
- Zinkevich, M.; Johanson, M.; Bowling, M.; Piccione, C. Regret minimization in games with incomplete information. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; pp. 1729–1736. [Google Scholar]
- Brown, N.; Sandholm, T. Solving imperfect-information games via discounted regret minimization. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 1829–1836. [Google Scholar]
- Lanctot, M.; Waugh, K.; Zinkevich, M.; Bowling, M. Monte Carlo sampling for regret minimization in extensive games. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009; pp. 1078–1086. [Google Scholar]
- Brown, N.; Lerer, A.; Gross, S.; Sandholm, T. Deep counterfactual regret minimization. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 793–802. [Google Scholar]
- Steinberger, E.; Lerer, A.; Brown, N. Dream: Deep regret minimization with advantage baselines and model-free learning. arXiv 2020, arXiv:2006.10410. [Google Scholar]
- Heinrich, J.; Lanctot, M.; Silver, D. Fictitious self-play in extensive-form games. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 805–813. [Google Scholar]
- Heinrich, J.; Silver, D. Deep reinforcement learning from self-play in imperfect-information games. arXiv 2016, arXiv:1603.01121. [Google Scholar]
- McMahan, H.B.; Gordon, G.J.; Blum, A. Planning in the presence of cost functions controlled by an adversary. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003; pp. 536–543. [Google Scholar]
- Lanctot, M.; Zambaldi, V.; Gruslys, A.; Lazaridou, A.; Tuyls, K.; Perolat, J.; Silver, D.; Graepel, T. A unified game-theoretic approach to multiagent reinforcement learning. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4190–4203. [Google Scholar]
- Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-decomposition networks for cooperative multi-agent learning. arXiv 2017, arXiv:1706.05296. [Google Scholar]
- Rashid, T.; Samvelyan, M.; De Witt, C.S.; Farquhar, G.; Foerster, J.; Whiteson, S. Monotonic value function factorisation for deep multi-agent reinforcement learning. J. Mach. Learn. Res. 2020, 21, 1–51. [Google Scholar]
- Lowe, R.; Wu, Y.I.; Tamar, A.; Harb, J.; Pieter Abbeel, O.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6379–6390. [Google Scholar]
- Wang, W.; Sun, D.; Jiang, F.; Chen, X.; Zhu, C. Research and Challenges of Reinforcement Learning in Cyber Defense Decision-Making for Intranet Security. Algorithms 2022, 15, 134. [Google Scholar] [CrossRef]
- Sanjaya, R.; Wang, J.; Yang, Y. Measuring the Non-Transitivity in Chess. Algorithms 2022, 15, 152. [Google Scholar] [CrossRef]
- Yang, X.; Wang, Z.; Zhang, H.; Ma, N.; Yang, N.; Liu, H.; Zhang, H.; Yang, L. A Review: Machine Learning for Combinatorial Optimization Problems in Energy Areas. Algorithms 2022, 15, 205. [Google Scholar] [CrossRef]
- Lu, Y.; Li, W. Techniques and Paradigms in Modern Game AI Systems. Algorithms 2022, 15, 282. [Google Scholar] [CrossRef]
- Lu, Y.; Li, W.; Li, W. Official International Mahjong: A New Playground for AI Research. Algorithms 2023, 16, 235. [Google Scholar] [CrossRef]
- Li, Z.; Chen, X.; Fu, J.; Xie, N.; Zhao, T. Reducing Q-Value Estimation Bias via Mutual Estimation and Softmax Operation in MADRL. Algorithms 2024, 17, 36. [Google Scholar] [CrossRef]
- Li, J.; Xie, N.; Zhao, T. Optimizing Reinforcement Learning Using a Generative Action-Translator Transformer. Algorithms 2024, 17, 37. [Google Scholar] [CrossRef]
- Schaa, H.; Barriga, N.A. Evaluating the Expressive Range of Super Mario Bros Level Generators. Algorithms 2024, 17, 307. [Google Scholar] [CrossRef]
- Zhang, L.; Zou, H.; Zhu, Y. An Efficient Optimization of the Monte Carlo Tree Search Algorithm for Amazons. Algorithms 2024, 17, 334. [Google Scholar] [CrossRef]
- Hsieh, Y.-H.; Kao, C.-C.; Yuan, S.-M. Imitating Human Go Players via Vision Transformer. Algorithms 2025, 18, 61. [Google Scholar] [CrossRef]
- Penelas, G.; Barbosa, L.; Reis, A.; Barroso, J.; Pinto, T. Machine Learning for Decision Support and Automation in Games: A Study on Vehicle Optimal Path. Algorithms 2025, 18, 106. [Google Scholar] [CrossRef]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018; pp. 1856–1865. [Google Scholar]