Adaptation to Other Agent’s Behavior Using Meta-Strategy Learning by Collision Avoidance Simulation
Abstract
1. Introduction
2. Background
3. Active and Passive Strategy Acquisition Experiments
3.1. Methods
3.2. Results
4. Experiment of Cooperative Behavior Acquisition Using Meta-Strategy
4.1. Methods
4.2. Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Trials | Episodes | Steps | Policy Function | Temperature Drops | Sub-Strategy Alpha |
---|---|---|---|---|---|
5 | 3000 | 500 | softmax | 500 episodes | same as meta-strategy |
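The settings above name a softmax policy function and a temperature drop tied to 500 episodes. As a rough illustration only, and not the authors' implementation, Boltzmann (softmax) action selection with a stepwise temperature schedule could be sketched as below. Reading "500 episodes" as "drop once every 500 episodes" is an assumption, and the initial temperature `t0`, the factor `decay`, the floor `t_min`, and all function names are hypothetical; the source states only the policy type and the interval.

```python
import math
import random


def softmax_probs(q_values, temperature):
    """Boltzmann distribution over actions for the given action values."""
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_at(episode, t0=1.0, decay=0.5, drop_every=500, t_min=0.01):
    """Stepwise schedule: scale the temperature by `decay` every `drop_every` episodes.

    t0, decay and t_min are illustrative assumptions; only the 500-episode
    interval comes from the settings table.
    """
    return max(t_min, t0 * decay ** (episode // drop_every))


def select_action(q_values, temperature, rng=random):
    """Sample an action index in proportion to its softmax probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A high temperature makes the choice close to uniform, while a low temperature makes it close to greedy, so lowering the temperature in steps shifts the agent from exploration toward exploitation over the 3000 episodes.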
Condition | Avg | Std |
---|---|---|
without cooperative rewards | 342.9 | 36.3 |
with cooperative rewards | 280.7 | 89.1 |
meta-strategy | 384.6 | 24.5 |
Trials | Episodes | Steps | Policy Function | Temperature Drops | Sub-Strategy Alpha |
---|---|---|---|---|---|
3 | 25,000 × (5 or 6) | 500 | softmax + ϵ | 10,000 episodes | less than meta-strategy |
Rewards | Set 4 (Passive Suitable), Avg | Set 4 (Passive Suitable), Std | Set 5 (Active Suitable), Avg | Set 5 (Active Suitable), Std |
---|---|---|---|---|
(0, 0) | 389.1 | 20.9 | 389.4 | 15.5 |
(2, 1) | 390.3 | 22.5 | 389.3 | 15.1 |
(20, 10) | 387.2 | 19.2 | 372.7 | 31.0 |

Rewards | Set 5 (Passive Suitable), Avg | Set 5 (Passive Suitable), Std | Set 6 (Active Suitable), Avg | Set 6 (Active Suitable), Std |
---|---|---|---|---|
r (0, 0) | 391.2 | 18.7 | 387.5 | 14.6 |
r (2, 1) | 392.3 | 21.2 | 381.0 | 15.0 |
r (20, 10) | 391.1 | 20.6 | 358.7 | 35.8 |
Rewards | Set 4 (Passive Suitable), Avg | Set 4 (Passive Suitable), Std | Set 5 (Active Suitable), Avg | Set 5 (Active Suitable), Std |
---|---|---|---|---|
(0, 0) | 128.8 | 98.5 | 372.6 | 39.9 |
(2, 1) | 92.8 | 111.9 | 362.7 | 41.1 |
(20, 10) | 237.1 | 91.4 | 352.8 | 53.1 |

Rewards | Set 5 (Passive Suitable), Avg | Set 5 (Passive Suitable), Std | Set 6 (Active Suitable), Avg | Set 6 (Active Suitable), Std |
---|---|---|---|---|
r (0, 0) | 249.2 | 103.5 | 344.6 | 62.1 |
r (2, 1) | 295.6 | 94.8 | 353.9 | 54.2 |
r (20, 10) | 355.0 | 63.9 | 354.2 | 47.6 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Miyamoto, K.; Watanabe, N.; Takefuji, Y. Adaptation to Other Agent’s Behavior Using Meta-Strategy Learning by Collision Avoidance Simulation. Appl. Sci. 2021, 11, 1786. https://doi.org/10.3390/app11041786