The Role of a Reward in Shaping Multiple Football Agents’ Behavior: An Empirical Study
Abstract
1. Introduction
2. Reinforcement Learning: Background
2.1. Sequential Decision Making
2.2. Reinforcement Learning
3. Experimental Setting to Probe the Role of a Reward Using Football Scenarios
3.1. Environment: AI World Cup
3.2. Engineering Artificial Football Agents
4. Results
4.1. Simulation Settings
4.2. Experimental Results
4.3. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References