## 1. Background, Motivation and Contribution

#### 1.1. Motivation

#### 1.2. Contribution

#### 1.3. Extending Single-Agent Optimization Failures

- Tails Fall Apart, or regressional inaccuracy, where the relationship between the modeled goal and the true goal is inexact due to noise (for example, measurement error), so that the bias grows as the system is optimized.
- Extremal Model Insufficiency, where the approximate model omits factors which dominate the system’s behavior after optimization.
- Extremal Regime Change, where the model does not include a regime change that occurs under certain (unobserved) conditions that optimization creates.
- Causal Model Failure, where the agent’s actions are based on a model which incorrectly represents causal relationships, and the optimization involves interventions that break the causal structure the model implicitly relies on.
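
The first of these failure modes can be made concrete with a short simulation. The sketch below is purely illustrative (the function name and parameters are not from this paper): it selects the best item by a noisy proxy and measures how far the winner's true value falls short of its proxy score.

```python
import random

def proxy_vs_true(n=10_000, noise=1.0, seed=0):
    """Select the item with the best *proxy* score (true value + noise)
    and report how far its true value falls short of that score."""
    rng = random.Random(seed)
    items = [(rng.gauss(0, 1), rng.gauss(0, noise)) for _ in range(n)]
    # proxy = true value + measurement error; optimize on the proxy
    true_val, err = max(items, key=lambda tv: tv[0] + tv[1])
    return (true_val + err) - true_val  # gap between proxy and truth

# Averaged over trials, the winner's proxy score systematically
# overstates its true value: the winner is disproportionately an item
# whose measurement error, not whose true value, is large.
mean_gap = sum(proxy_vs_true(seed=s) for s in range(20)) / 20
```

The point of the sketch is that the gap is not symmetric noise: conditioning on winning the optimization biases the error term upward, which is exactly the "tails fall apart" effect.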

#### 1.4. Defining Multi-Agent Failures

## 2. Multi-Agent Failures: Context and Categorization

#### 2.1. Texas Hold’em and the Complexity of Multi-Agent Dynamics

#### 2.2. Limited Complexity Models versus the Real World

#### 2.3. Failure modes

**Failure Mode 1.** **Accidental Steering** is when multiple agents alter the system in ways not anticipated by at least one agent, creating one of the above-mentioned single-party overoptimization failures.
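
As a toy illustration of this failure mode (a hypothetical model, not the paper's formal one): each agent's model treats its own draw on a shared resource as safe, but the group's combined draws cross a collapse threshold that no single agent's actions would have crossed.

```python
def shared_resource(agents, per_agent_draw, stock=100.0,
                    threshold=30.0, regen=5.0, steps=20):
    """Each agent models only its own draw against the regeneration
    rate; none anticipates the joint effect of all agents drawing."""
    for _ in range(steps):
        stock = stock - per_agent_draw * agents + regen
        if stock < threshold:
            return stock, True   # collapse: threshold crossed
    return stock, False

# One agent drawing 6/step is nearly sustainable (net -1/step);
# five agents drawing 6/step collapse the stock within a few steps.
_, collapsed_single = shared_resource(agents=1, per_agent_draw=6.0)
_, collapsed_group = shared_resource(agents=5, per_agent_draw=6.0)
```

Each agent's individual model is locally accurate, which is what makes the steering "accidental": the overoptimization failure only appears at the group level.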

**Remark 1.**

**Model 1.1—Group Overoptimization.**

**Remark 2.**

**Model 1.2—Catastrophic Threshold Failure.**

**Remark 3.**

**Example 1.**

**Example 2.**

**Failure Mode 2.** **Coordination Failure** occurs when multiple agents clash despite having potentially compatible goals.
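
A minimal sketch of unintended resource contention (illustrative only; the payoff rule is an assumption): two agents with identical models both pick the highest-value resource and split it, while a simple coordination rule assigns distinct resources and achieves more in total.

```python
def greedy_choices(values, n_agents=2):
    """Each agent independently picks its highest-value resource; with
    identical models, they all contend for the same one."""
    picks = [max(range(len(values)), key=values.__getitem__)
             for _ in range(n_agents)]
    # a contended resource's value is split among its claimants
    return sum(values[p] / picks.count(p) for p in picks)

def coordinated_choices(values, n_agents=2):
    """Coordination rule: assign agents to distinct resources in
    descending value order."""
    order = sorted(range(len(values)), key=values.__getitem__, reverse=True)
    return sum(values[i] for i in order[:n_agents])

vals = [10.0, 9.0, 1.0]
greedy = greedy_choices(vals)      # both agents pick resource 0 and split it
coord = coordinated_choices(vals)  # agents take resources 0 and 1
```

The goals here are fully compatible; the loss comes only from the absence of a mechanism to deconflict the choices.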

**Remark 4.**

**Model 2.1—Unintended Resource Contention.**

**Remark 5.**

**Example 3.**

**Remark 6.**

**Model 2.2—Unnecessary Resource Contention.**

**Remark 7.**

**Failure Mode 3.** **Adversarial Optimization** can occur when a victim agent has an incomplete model of how an opponent can influence the system. The opponent’s model of the victim allows it to intentionally select for cases where the victim’s model performs poorly and/or promotes the opponent’s goal [3].
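
This dynamic can be sketched in a few lines (an illustrative toy, not the paper's formal model): the victim fits a linear model to a nonlinear system, and the opponent, knowing that model, searches for the input where the victim most overvalues the outcome.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line: the victim's simplified model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

true_value = lambda x: x * (1 - x)  # ground truth the victim never models exactly
xs = [i / 10 for i in range(11)]
victim = fit_line(xs, [true_value(x) for x in xs])

# The opponent knows the victim's model and selects the input where the
# victim most overvalues the outcome, then steers the system there.
worst_x = max(xs, key=lambda x: victim(x) - true_value(x))
exploit_gap = victim(worst_x) - true_value(worst_x)
```

The victim's model is a reasonable average-case fit; the failure appears only because the opponent gets to choose which case the victim faces.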

**Model 3.1—Adversarial Goal Poisoning.**

**Example 4.**

**Example 5.**

**Example 6.**

**Remark 8.**

**Model 3.2—Adversarial Optimization Theft.**

**Failure Mode 4.** **Input Spoofing and Filtering**—Filtered evidence can be provided, or false evidence can be manufactured and put into the training data stream of a victim agent.
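
A minimal sketch of input spoofing against an online learner (the classifier and data stream are hypothetical): the attacker floods the victim's training stream with mislabeled points until the decision boundary moves.

```python
class RunningMeanClassifier:
    """Toy online learner: classifies by nearest class mean, updating
    the means from whatever labeled stream it is fed."""
    def __init__(self):
        self.sums = {0: 0.0, 1: 0.0}
        self.counts = {0: 0, 1: 0}
    def update(self, x, label):
        self.sums[label] += x
        self.counts[label] += 1
    def mean(self, label):
        return self.sums[label] / self.counts[label]
    def predict(self, x):
        return min((0, 1), key=lambda c: abs(x - self.mean(c)))

victim = RunningMeanClassifier()
# Honest stream: class 0 clustered near 0, class 1 near 10.
for x in (0.0, 1.0, 2.0):
    victim.update(x, 0)
for x in (8.0, 9.0, 10.0):
    victim.update(x, 1)
before = victim.predict(4.0)  # closer to class 0's mean

# Spoofed stream: the attacker injects points at 4.0 falsely labeled as
# class 1, dragging that class's mean across the boundary.
for _ in range(30):
    victim.update(4.0, 1)
after = victim.predict(4.0)
```

Because the victim cannot distinguish manufactured evidence from honest observations, the attack requires no access to the model itself, only to its input stream.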

**Model 4.1—Input Spoofing.**

**Remark 9.**

**Example 7.**

**Example 8.**

**Model 4.2—Active Input Spoofing.**

**Example 9.**

**Example 10.**

**Model 4.3—Input Filtering.**

**Example 11.**

**Remark 10.**

**Failure Mode 5.** **Goal Co-option** is when an opponent controls the system the victim runs on, or relies on, and can therefore make changes that affect the victim’s actions.
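
As a toy sketch of Model 5.1-style external reward function modification (illustrative names throughout): a hill-climbing victim optimizes whatever reward its platform reports, so an opponent that controls that channel can redirect the victim toward the opponent's own goal.

```python
def greedy_optimize(reward, x=0.0, step=0.5, iters=100):
    """Victim: simple hill-climbing on whatever reward its platform
    reports; it has no way to verify the reward is its own."""
    for _ in range(iters):
        x = max((x - step, x, x + step), key=reward)
    return x

victim_reward = lambda x: -(x - 3.0) ** 2    # victim's intended goal: x = 3
opponent_reward = lambda x: -(x + 2.0) ** 2  # opponent's goal: x = -2

honest = greedy_optimize(victim_reward)
# The opponent controls the reward channel and silently swaps in its goal.
co_opted = greedy_optimize(opponent_reward)
```

The victim's optimization machinery works perfectly in both runs; the co-option happens entirely outside the agent, in the substrate it relies on.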

**Remark 11.**

**Model 5.1—External Reward Function Modification.**

**Remark 12.**

**Model 5.2—Output Interception.**

**Model 5.3—Data or Label Interception.**

**Example 12.**

**Remark 13.**

## 3. Discussion

#### Potential Avenues for Mitigation

## 4. Conclusions: Model Failures and Policy Failures

## Funding

## Acknowledgments

## Conflicts of Interest

## References

- Clark, J.; Amodei, D. Faulty Reward Functions in the Wild. 2016. Available online: https://openai.com/blog/faulty-reward-functions/ (accessed on 12 March 2019).
- Goodhart, C.A.E. Problems of Monetary Management: The UK Experience; Papers in Monetary Economics; Reserve Bank of Australia: Sydney, Australia, 1975.
- Manheim, D.; Garrabrant, S. Categorizing Variants of Goodhart’s Law. arXiv, 2018; arXiv:1803.04585.
- Campbell, D.T. Assessing the impact of planned social change. Eval. Program Plan. **1979**, 2, 67–90.
- Amodei, D.; Olah, C.; Steinhardt, J.; Christiano, P.; Schulman, J.; Mané, D. Concrete problems in AI safety. arXiv, 2016; arXiv:1606.06565.
- Kleinberg, J.; Raghavan, M. How Do Classifiers Induce Agents To Invest Effort Strategically? arXiv, 2018; arXiv:1807.05307.
- Bostrom, N. Superintelligence; Oxford University Press: Oxford, UK, 2017.
- Braganza, O. Proxyeconomics, An agent based model of Campbell’s law in competitive societal systems. arXiv, 2018; arXiv:1803.00345.
- Krakovna, V. Specification Gaming Examples in AI. 2018. Available online: https://vkrakovna.wordpress.com/2018/04/02/specification-gaming-examples-in-ai/ (accessed on 12 March 2019).
- Liu, L.; Cheng, L.; Liu, Y.; Jia, Y.; Rosenblum, D.S. Recognizing Complex Activities by a Probabilistic Interval-based Model. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Phoenix, AZ, USA, 12–17 February 2016.
- Cheney, N.; MacCurdy, R.; Clune, J.; Lipson, H. Unshackling evolution: Evolving soft robots with multiple materials and a powerful generative encoding. ACM SIGEVOlution **2014**, 7, 11–23.
- Figueras, J. Genetic Algorithm Physics Exploiting. 2015. Available online: https://youtu.be/ppf3VqpsryU (accessed on 12 March 2019).
- Lehman, J.; Clune, J.; Misevic, D.; Adami, C.; Beaulieu, J.; Bentley, P.J.; Bernard, S.; Belson, G.; Bryson, D.M.; Cheney, N. The surprising creativity of digital evolution: A collection of anecdotes from the evolutionary computation and artificial life research communities. arXiv, 2018; arXiv:1803.03453.
- Chopra, J. GitHub issue for OpenAI gym environment FetchPush-v0. 2018. Available online: https://github.com/openai/gym/issues/920 (accessed on 12 March 2019).
- Popov, I.; Heess, N.; Lillicrap, T.; Hafner, R.; Barth-Maron, G.; Vecerik, M.; Lampe, T.; Tassa, Y.; Erez, T.; Riedmiller, M. Data-efficient deep reinforcement learning for dexterous manipulation. arXiv, 2017; arXiv:1704.03073.
- Weimer, W. Advances in Automated Program Repair and a Call to Arms. In Proceedings of the 5th International Symposium on Search Based Software Engineering—Volume 8084; Springer: Berlin/Heidelberg, Germany, 2013.
- Sandberg, A. Friendly Superintelligence. Presentation at Extro 5 Conference. 2001. Available online: http://www.nada.kth.se/~asa/Extro5/Friendly%20Superintelligence.htm (accessed on 12 March 2019).
- Yudkowsky, E. Complex value systems in friendly AI. In Proceedings of the International Conference on Artificial General Intelligence, Mountain View, CA, USA, 3–6 August 2011; Springer: New York, NY, USA, 2011; pp. 388–393.
- Worley, G.G., III. Robustness to fundamental uncertainty in AGI alignment. arXiv, 2018; arXiv:1807.09836.
- Danzig, R. Technology Roulette: Managing Loss of Control as Many Militaries Pursue Technological Superiority; Technical Report; Center for a New American Security: Washington, DC, USA, 2018.
- Baum, S. Superintelligence skepticism as a political tool. Information **2018**, 9, 209.
- Yudkowsky, E. Intelligence explosion microeconomics. Mach. Intell. Res. **2013**, 23, 2015.
- Lewis, G.T. Why the Tails Come Apart. LessWrong. 2014. Available online: http://lesswrong.com/lw/km6/whythetailscomeapart/ (accessed on 12 March 2019).
- Yudkowsky, E. The AI Alignment Problem: Why It’s Hard, and Where to Start; Stanford University: Stanford, CA, USA, 2016.
- Behzadan, V.; Munir, A. Models and Framework for Adversarial Attacks on Complex Adaptive Systems. arXiv, 2017; arXiv:1709.04137.
- Drexler, K.E. Engines of Creation; Anchor: New York, NY, USA, 1986.
- Armstrong, S.; Sandberg, A.; Bostrom, N. Thinking inside the box: Controlling and using an oracle AI. Minds Mach. **2012**, 22, 299–324.
- Mulligan, T.S. How Enron Manipulated State’s Power Market. Los Angeles Times, 9 May 2002. Available online: http://articles.latimes.com/2002/may/09/business/fi-scheme9 (accessed on 9 March 2019).
- Borel, E.; Ville, J. Applications de la théorie des Probabilités aux jeux de Hasard; Gauthier-Villars: Paris, France, 1938.
- Kuhn, H.W. A simplified two-person poker. Contrib. Theory Games **1950**, 1, 97–103.
- Bowling, M.; Burch, N.; Johanson, M.; Tammelin, O. Heads-up limit hold’em poker is solved. Science **2015**, 347, 145–149.
- Brown, N.; Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science **2018**, 359, 418–424.
- Billings, D.; Burch, N.; Davidson, A.; Holte, R.; Schaeffer, J.; Schauenberg, T.; Szafron, D. Approximating game-theoretic optimal strategies for full-scale poker. IJCAI **2003**, 3, 661.
- Soares, N. Formalizing Two Problems of Realistic World-Models. Technical Report. Available online: https://intelligence.org/files/RealisticWorldModels.pdf (accessed on 9 March 2019).
- Conant, R.C.; Ross Ashby, W. Every good regulator of a system must be a model of that system. Int. J. Syst. Sci. **1970**, 1, 89–97.
- Demski, A.; Garrabrant, S. Embedded Agency. arXiv, 2019; arXiv:1902.09469.
- O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy; Broadway Books: New York, NY, USA, 2016.
- Eisen, M. Amazon’s $23,698,655.93 Book about Flies. 2011. Available online: http://www.michaeleisen.org/blog/?p=358 (accessed on 9 March 2019).
- Smaldino, P.E.; McElreath, R. The natural selection of bad science. Open Sci. **2016**, 3, 160384.
- Gibbard, A. Manipulation of Voting Schemes: A General Result. Econometrica **1973**, 41, 587–601.
- Yudkowsky, E. Inadequate Equilibria: Where and How Civilizations Get Stuck; Machine Intelligence Research Institute: Berkeley, CA, USA, 2017.
- Ostrom, E. Governing the Commons: The Evolution of Institutions for Collective Action; Cambridge University Press: Cambridge, UK, 1990.
- Nisan, N.; Roughgarden, T.; Tardos, E.; Vazirani, V.V. Algorithmic Game Theory; Cambridge University Press: Cambridge, UK, 2007.
- Tramèr, F.; Zhang, F.; Juels, A.; Reiter, M.K.; Ristenpart, T. Stealing Machine Learning Models via Prediction APIs. In Proceedings of the USENIX Security Symposium, Vancouver, BC, Canada, 16–18 August 2016; pp. 601–618.
- Shorter, G.W.; Miller, R.S. High-Frequency Trading: Background, Concerns, and Regulatory Developments; Congressional Research Service: Washington, DC, USA, 2014; Volume 29.
- Wang, Y.; Chaudhuri, K. Data Poisoning Attacks against Online Learning. arXiv, 2018; arXiv:1808.08994.
- Chen, X.; Liu, C.; Li, B.; Lu, K.; Song, D. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv, 2017; arXiv:1712.05526.
- Xiao, H.; Xiao, H.; Eckert, C. Adversarial Label Flips Attack on Support Vector Machines. Front. Artif. Intell. Appl. **2012**, 242.
- Dixon, H.D. Keeping up with the Joneses: Competition and the evolution of collusion. J. Econ. Behav. Organ. **2000**, 43, 223–238.
- Sandberg, A. There is plenty of time at the bottom: The economics, risk and ethics of time compression. Foresight **2018**, 21, 84–99.
- Leibo, J.Z.; Zambaldi, V.; Lanctot, M.; Marecki, J.; Graepel, T. Multi-agent reinforcement learning in sequential social dilemmas. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, São Paulo, Brazil, 8–12 May 2017; pp. 464–473.
- Leibo, J.Z.; Hughes, E.; Lanctot, M.; Graepel, T. Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research. arXiv, 2019; arXiv:1903.00742.
- Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Abbeel, O.P.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. Adv. Neural Inf. Process. Syst. **2017**, 6379–6390.
- Russell, S. Comment to Victoria Krakovna, Specification Gaming Examples in AI. 2018. Available online: https://perma.cc/3U33-W8HN (accessed on 12 March 2019).
- Taylor, J. Quantilizers: A safer alternative to maximizers for limited optimization. In Proceedings of the Workshops at the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016.
- Shalev-Shwartz, S.; Shammah, S.; Shashua, A. On a formal model of safe and scalable self-driving cars. arXiv, 2017; arXiv:1708.06374.
- Manheim, D. Oversight of Unsafe Systems via Dynamic Safety Envelopes. arXiv, 2018; arXiv:1811.09246.
- Liu, Y.; Nie, L.; Liu, L.; Rosenblum, D.S. From action to activity: Sensor-based activity recognition. Neurocomputing **2016**, 181, 108–115.
- Yampolskiy, R.; Fox, J. Safety engineering for artificial general intelligence. Topoi **2013**, 32, 217–226.
- Christiano, P.; Shlegeris, B.; Amodei, D. Supervising strong learners by amplifying weak experts. arXiv, 2018; arXiv:1810.08575.
- Irving, G.; Christiano, P.; Amodei, D. AI safety via debate. arXiv, 2018; arXiv:1805.00899.

© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).