Modeling Theory of Mind in Dyadic Games Using Adaptive Feedback Control
Abstract
1. Introduction
2. Methods
2.1. Game Theoretic Tasks
| | Cooperate | Defect |
|---|---|---|
| Cooperate | R, R | S, T |
| Defect | T, S | P, P |
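
As a minimal sketch of how the generic payoff matrix above can be read, the following Python snippet maps a pair of actions to a pair of payoffs. The function name `payoffs`, the action codes `"C"`/`"D"`, and the numeric values in the usage line are illustrative assumptions rather than anything specified by the paper; R, S, T and P are the four matrix entries, with the row player's payoff listed first in each cell.

```python
from typing import Dict, Tuple

def payoffs(row_action: str, col_action: str,
            R: float, S: float, T: float, P: float) -> Tuple[float, float]:
    """Return (row player's payoff, column player's payoff) for one round.

    Actions are "C" (cooperate) or "D" (defect); these codes are an
    assumption made for this sketch.
    """
    table: Dict[Tuple[str, str], Tuple[float, float]] = {
        ("C", "C"): (R, R),  # both cooperate
        ("C", "D"): (S, T),  # row cooperates, column defects
        ("D", "C"): (T, S),  # row defects, column cooperates
        ("D", "D"): (P, P),  # both defect
    }
    return table[(row_action, col_action)]

# Illustrative values only: row defects against a cooperating column player.
print(payoffs("D", "C", R=2, S=0, T=3, P=1))  # (3, 0)
```

Because the matrix is symmetric, swapping the two actions simply swaps the two payoffs.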
2.2. Control-Based Reinforcement Learning
2.3. Agent Models
2.3.1. TD-Learning Model
2.3.2. Rational Model
2.3.3. Predictive Model
2.3.4. Internal Model
2.3.5. Deterministic Agent Models
Greedy
Cooperative/Nice
Tit-for-Tat
2.4. Experimental Setup
3. Results
3.1. Experiment 1: Versus a Deterministic-Greedy Agent
3.2. Experiment 2: Versus a Deterministic-Nice Agent
3.3. Experiment 3: Versus a Tit-for-Tat Agent
3.4. Experiment 4: Versus the TD-Learning Agent
3.5. Experiment 5: Continuous-Time Effects on Prediction Accuracy
3.6. Comparison against Human Data
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
ToM | Theory of Mind |
DAC | Distributed Adaptive Control |
CRL | Control-Based Reinforcement Learning |
TD-Learning | Temporal-Difference Learning |
TFT | Tit-for-Tat |
| | Cooperate | Defect |
|---|---|---|
| Cooperate | 2, 2 | 0, 3 |
| Defect | 3, 0 | 1, 1 |

| | Cooperate | Defect |
|---|---|---|
| Cooperate | 3, 3 | 0, 2 |
| Defect | 2, 0 | 1, 1 |

| | Cooperate | Defect |
|---|---|---|
| Cooperate | 2, 2 | 1, 3 |
| Defect | 3, 1 | 0, 0 |

| | Cooperate | Defect |
|---|---|---|
| Cooperate | 3, 3 | 1, 2 |
| Defect | 2, 1 | 0, 0 |

| | A | B |
|---|---|---|
| A | 0, 0 | 1, 4 |
| B | 4, 1 | 0, 0 |
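
The four Cooperate/Defect matrices above differ only in how the four payoffs are ordered. The sketch below is one way to make that explicit. It assumes the R, S, T, P layout of the generic matrix in Section 2.1 and uses the conventional game-theory names for each ordering. Those names, the function `classify`, and the dictionary keys are assumptions of this sketch, since the tables themselves carry no captions here. The A/B matrix is left out because its actions are not labeled cooperate/defect.

```python
def classify(R: float, S: float, T: float, P: float) -> str:
    """Name a symmetric 2x2 game from the strict ordering of its payoffs.

    R: both cooperate, S: cooperate against a defector,
    T: defect against a cooperator, P: both defect.
    """
    if T > R > P > S:
        return "Prisoner's Dilemma"
    if R > T > P > S:
        return "Stag Hunt"
    if T > R > S > P:
        return "Hawk-Dove"
    if R > T > S > P:
        return "Harmony"
    return "unclassified"

# (R, S, T, P) read off the four Cooperate/Defect tables, in order of appearance.
tables = {
    "first": (2, 0, 3, 1),
    "second": (3, 0, 2, 1),
    "third": (2, 1, 3, 0),
    "fourth": (3, 1, 2, 0),
}

for label, (R, S, T, P) in tables.items():
    print(label, "->", classify(R, S, T, P))
```

Only strict orderings are handled; a matrix with tied payoffs falls through to "unclassified" rather than being assigned a name by guesswork.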
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Freire, I.T.; Arsiwalla, X.D.; Puigbò, J.-Y.; Verschure, P. Modeling Theory of Mind in Dyadic Games Using Adaptive Feedback Control. Information 2023, 14, 441. https://doi.org/10.3390/info14080441