# Provably Safe Artificial General Intelligence via Interactive Proofs

## Abstract

AGI^{1} rapidly triggers a succession of more powerful AGI^{n} that differ dramatically in their computational capabilities (AGI^{n} << AGI^{n+1}). No proof exists that AGI will benefit humans or of a sound value-alignment method. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., 2^{−100}). IPS procedures can test AGI behavior control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. In toto, IPS provides a way to reduce AGI^{n} ↔ AGI^{n+1} interaction hazards to an acceptably low level.

## 1. Introduction

#### 1.1. ‘Hard Take-Off’ and Automated AGI Government

#### 1.2. Intrinsic and Extrinsic AGI Control Systems

> Our goal is to design self-organizing systems, comprising networks of interacting human and autonomous agents, that sustainably optimize for one or more objective functions. … Our challenge is to choose the right incentives, rules, and interfaces such that agents in our system, whom we do not control, self-organize in a way that fulfills the purpose of our protocol.
>
> (Ramirez, The Graph [13])

#### 1.3. Preserving Safety and Control Transitively across AGI Generations

AGI^{n+1} may turn off AGI^{n} and its predecessors to prevent ‘wasteful’ use of finite resources by ‘inferior’ classes.

… AGI^{n} to AGI^{n+1} will preserve the value-alignment with humanity that we construct with AGI^{1}. With a general method, humanity would construct provably safe AGI^{1}, which would be endowed with the motivation to produce the specific methods to construct and prove safe AGI^{2}, etc. In this manner, the methods presented here, with induction, lead to a weak proof of trans-generational AGI safety – weak since the field lacks a precise definition of ‘safety’ (cf. Armstrong [14]).

#### 1.4. Lack of Proof of Safe AGI or Methods to Prove Safe AGI

#### 1.5. Defining “Safe AGI”; Value-Alignment; Ethics and Morality

… AGI^{n} and AGI^{n+1}. Value-sets, from which goals are generated, can be hard-coded, re-coded in successive AGI versions, or made more dynamic by programming the AGI to learn the desired values via techniques such as inverse reinforcement learning [3,16,20]. In such a scenario, which saints will select the saintly humans to emulate? Or select the rewards to reinforce [21,22]? Which human values would the AGI learn?

- Most humans seek to impose their values on others.
- Most humans shift their values depending on circumstances (‘situation ethics’).

## 2. The Fundamental Problem of Asymmetric Technological Ability

AGI^{n} may have access to classes of algorithmic methods more powerful than those of the more primitive civilizations, such as quantum computation (QC) versus Church–Turing computation (CT). It is believed that QC is capable of fundamentally out-performing CT computation via QC’s ability to encode 2^{n} states in n spins, i.e., to solve problems requiring exponential scaling in polynomial time [27,28].

## 3. Interactive Proof Systems Solve the General Technological Asymmetry Problem

## 4. Interactive Proof Systems Provide a Transitive Methodology to Prove AGI^{n} Safety

… AGI^{1} using bounded-error probabilistic polynomial-time (BPP) IPS, and between AGI^{n} and AGI^{n+1} via more complex bounded probabilistic IPS. Necessary conditions for this effort are:

- Identifying a property of AGI, e.g., value-alignment, to which to apply IPS.
- Identifying or creating an IPS method to apply to the property.
- Developing a representation of the property to which a method is applicable.
- Using a pseudo-random number generator compatible with the method [33].

## 5. The Extreme Generality of Interactive Proof Systems

… 2^{−1} with each iteration, and the probability of validation accumulates to 1 − 2^{−k} for k iterations; thus, the Verifier can accept the proof with an arbitrarily small chance of falsehood.
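The accumulation 1 − 2^{−k} can be tabulated directly; a trivial sketch (not from the source, the iteration counts are arbitrary):

```python
# Each IPS round halves the chance that a deceptive Prover survives,
# so k independent rounds leave a deception probability of 2**-k and
# an accumulated validation confidence of 1 - 2**-k.
def deception_probability(k: int) -> float:
    return 2.0 ** -k

def validation_confidence(k: int) -> float:
    return 1.0 - deception_probability(k)

for k in (1, 10, 100):
    print(k, deception_probability(k), validation_confidence(k))
```

At k = 100 the deception probability (about 8 × 10^{−31}) is already far below the hardware failure rate of any physical Verifier.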

… W_{n}(b) to the compositeness of n if it passes a certain algorithmic test:

…^{k} and cites computer searches that verify the theoretical result.

…^{16} [35].

## 6. Correct Interpretation of the Probability of the Proof

… 2^{−100}. This does not mean the number or behavior being tested has a 2^{−100} chance of being composite or of being unsafe; it means that if we were to perform 2^{100} primality or behavior tests, we expect that just one will be a false positive (composite or unsafe) [34]. Thus, if the universe of numbers or potential behaviors under consideration is far smaller than 2^{100}, we can rely on the test. Paraphrasing Rabin, we could market a product (think ‘safe AGI’) with failure rate 2^{−100} because, with rigorously calculated odds, it would be possible to buy error and omission insurance for it [34,36].

## 7. Epistemology: Undecidability, Incompleteness, Inconsistency, Unprovable Theorems

… AGI^{1} to find suitable algorithms for proving AGI safety, and the IPS algorithms used by AGI^{n} may be complex and difficult or impossible for humans to understand.

… O^{I} becomes the next input state, similar to tetration, but state-space sizes may change over time [47]. A fundamental limitation may remain: communication between agents will be at a linear O(t) bandwidth, but the use of compression, as exemplified by axiomatic systems and scientific theories, may partially offset the limitation [48].

## 8. Properties of Interactive Proof Systems

… AGI^{n} and AGI^{n+1}, interactive proof systems (IPS) seem custom-designed for the problem of proving properties such as safety or value-alignment. The key IPS properties are:

- The methods used by the Prover are unspecified and unbounded.
- The Prover is assumed to have greater computational power than the Verifier.
- The Verifier accepts a proof based on an arbitrarily small chance that the proof is incorrect or that the Verifier has been fooled by the Prover.

## 9. Multiple Prover Interactive Proof Systems (MIP)

## 10. Random vs. Non-Random Sampling, Prover’s Exploitation of Bias

## 11. Applying IPS to Proving Safe AGI: Examples

#### 11.1. Detection of Behavior Control System (BCS) Forgery via Acyclic Graphs

… B_{1} and B_{2}, where **F** is a finite field with at least 3m elements [33]. The representation requires an assignment of polynomials to the graph nodes and edges.

- Verifier selects elements a_{1} through a_{m} randomly from **F**.
- Prover evaluates the assigned polynomials p_{1} and p_{2} at a_{1} through a_{m}.
- If p_{1}(a_{1}, …, a_{m}) = p_{2}(a_{1}, …, a_{m}), Verifier accepts; otherwise, it rejects.
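The accept/reject test above amounts to probabilistic polynomial-identity checking (Schwartz–Zippel); a minimal sketch under that reading, with an invented field size and toy polynomials standing in for the two BCS representations:

```python
import random

# One Verifier round: two polynomials of total degree <= d over a field
# with q elements that differ as polynomials agree at a random point
# with probability at most d/q. Q and the example polynomials B1-B3
# are invented for illustration.
Q = 1_000_003  # a prime comfortably larger than 3m for small m

def verifier_round(p1, p2, m: int) -> bool:
    point = [random.randrange(Q) for _ in range(m)]
    return p1(*point) % Q == p2(*point) % Q

B1 = lambda x, y: (x + y) ** 2                    # reference representation
B2 = lambda x, y: x * x + 2 * x * y + y * y       # identical polynomial
B3 = lambda x, y: x * x + 2 * x * y + y * y + 1   # forged copy

assert all(verifier_round(B1, B2, 2) for _ in range(20))  # always accepts
assert not verifier_round(B1, B3, 2)                      # forgery exposed
```

Because B3 differs from B1 by a nonzero constant, a single random evaluation already exposes the forgery; in general, repetition drives the error below any threshold.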

#### 11.2. Program-Checking via Graph Nonisomorphism

… AGI^{n} behavior control subroutine on a machine, the Checker C [32]. Assume the Prover runs a program P that states that two uniquely-labeled graphs are isomorphic, P(G_{1}, G_{2}). The procedure is: (1) the Verifier repeatedly permutes the labels of one of {G_{1}, G_{2}}, chosen randomly, and (2) asks the Prover if they are still isomorphic, a problem in NP that is not known to be solvable in polynomial time. The Prover supplies the permutation as the witness, which can be checked in PTIME. A guess has a 50–50 chance of being correct. Thus, with k iterations of the procedure, the probability of error is 2^{−k}.
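The PTIME witness check in step (2) can be sketched in a few lines; the graphs and the permutation below are invented for illustration:

```python
# Toy witness check: a permutation pi supplied by the Prover is
# accepted iff it maps the edge set of G1 exactly onto that of G2.
# Edges are stored as sorted vertex pairs over vertices 0..n-1.
def permute(edges: set, pi: list) -> set:
    return {tuple(sorted((pi[u], pi[v]))) for (u, v) in edges}

def check_witness(g1: set, g2: set, pi: list) -> bool:
    return permute(g1, pi) == g2

g1 = {(0, 1), (1, 2), (2, 3), (3, 4)}  # a path on 5 vertices
pi = [2, 0, 4, 1, 3]                   # Verifier's relabeling
g2 = permute(g1, pi)

assert check_witness(g1, g2, pi)                   # honest witness accepted
assert not check_witness(g1, g2, [0, 1, 2, 3, 4])  # wrong witness rejected
```

The check runs in time linear in the number of edges, so even a weak Checker can verify the powerful Prover's claim.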

#### 11.3. Axiomatic System Representations

- Axioms A = {a_{1}, a_{2}, a_{3}, …, a_{i}}.
- Transformation rules $R=\left\{{r}_{1},{r}_{2},{r}_{3},\dots ,{r}_{j}\right\}$.
- Compositions of axioms and inference rules C = {c_{1}, c_{2}, c_{3}, …, c_{k}}, e.g.,
  - $({a}_{1},{a}_{2})\not\equiv {r}_{1}\to {c}_{1}$.
  - $\left({a}_{2},{a}_{3},{a}_{4}\right)\not\equiv {r}_{2}\to {c}_{2}$.

**Unique Factorization Theorem.** Every integer n > 1 can be represented as a product of primes p_{1}, p_{2}, …, p_{n}, which are one or more positive primes, not necessarily distinct. The representation is unique except for the order in which the primes occur.
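This theorem is what makes the prime encoding of Figure 2 invertible: trial division recovers the unique multiset of vertex primes from a trajectory composite. A minimal sketch using Figure 2's example:

```python
def prime_factors(n: int) -> list:
    # Trial division; by the Unique Factorization Theorem the returned
    # multiset of primes is the only one whose product is n.
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

# Figure 2's trajectory to the ethical test: vertex primes (2, 3, 3, 31)
assert prime_factors(558) == [2, 3, 3, 31]
assert 2 * 3 * 3 * 31 == 558
```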

#### 11.4. Checking for Ethical or Moral Behavior

#### 11.5. BPP Method 1: Random Sampling of Behaviors

#### 11.6. BPP Method 2: Random Sampling of Formulae

…_{i} in the axiom/inference rule primes set, which amounts to the derivation of a theorem, and gives the resulting composite. The Prover tests for primality and provides a primality certificate (or not) to the Verifier.
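A standard probabilistic primality test of the kind the Prover could run is Miller–Rabin [34]; a compact sketch (the round count k is arbitrary):

```python
import random

def miller_rabin(n: int, k: int = 40) -> bool:
    # Probabilistic primality test: a composite n survives all k rounds
    # with probability at most 4**-k (Rabin [34]).
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d, r = d // 2, r + 1
    for _ in range(k):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness W_n(a) to compositeness
    return True

assert miller_rabin(31)       # a vertex prime from Figure 2
assert not miller_rabin(558)  # a trajectory composite
```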

#### 11.7. BPP Methods 3 and 4: Multiple-Prover Versions of #1 and #2

#### 11.8. BPP Method 5: Behavior Program Correctness

… C^{P}). Using an IPS, C reduces the probability of P producing buggy outputs to an acceptably low level.

#### 11.9. BPP Method 6: A SAT Representation of Behavior Control

## 12. Probabilistically Checkable Proofs (PCP Theorem)

… AGI^{n} behavior in a general sense, representations of AGI behavior, such as axiom systems and CNF, will become increasingly complex. Given an axiomatic system and a theorem/behavior, a Verifier can ask a Prover to create a representation of its proof, serving as its validity certificate, such that the certificate can be verified probabilistically by checking only a constant number of its bits rather than every step [32,59]. Since theorem-proving and SAT are both NP-complete, a similar modification of a CNF representation of BCS by a Prover would be subject to PCP methods, as would any other NP-complete problem representation. PCP methods further address Yampolskiy’s concerns over error-checking in extremely long proofs [17].
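As a flavor of constant-query checking (not the actual PCP encoding, which requires a specially structured proof), one can spot-check a claimed satisfying assignment by sampling a constant number of clauses; the CNF data below is invented:

```python
import random

# Sample a constant number of CNF clauses instead of reading the whole
# certificate. Literals are signed integers: +i means x_i, -i means
# not x_i; assign maps variable index -> bool.
def clause_ok(clause, assign):
    return any(assign[abs(lit)] == (lit > 0) for lit in clause)

def spot_check(cnf, assign, queries: int = 8) -> bool:
    return all(clause_ok(random.choice(cnf), assign) for _ in range(queries))

cnf = [[1, -2], [2, 3], [-1, 3]]    # (x1 or not x2) and (x2 or x3) and ...
good = {1: True, 2: True, 3: True}  # satisfies every clause
assert spot_check(cnf, good)
```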

## 13. If ‘Safety’ Can Never Be Described Precisely or Perilous Paths Are Overlooked

## 14. Securing Ethics Modules via Distributed Ledger Technology

- A safe AGI ethics module E_{1} is developed via simulation in the sandbox.
- The safe AGI ethics E_{1} is encrypted and stored as an immutable reference copy E_{1R} via DLT.
- All AGIs of a given computational class are endowed with E_{1}.
- To alter the archived reference copy E_{1R} requires a strong level S_{1} of consensus.
- To alter AGI^{n}’s personal copy of its ethics BT, E_{1i}, requires a strong level S_{2} of consensus, S_{2} ≤ S_{1}.
- The smart contract IPS compares AGI^{n}’s E_{1i} with E_{1R}.
- If AGI^{n}’s individual copy E_{1i} is validated against E_{1R}, the smart contract (a) logs the validation in a blockchain [64], (b) issues a Declaration of Conformity [65], and (c) authorizes AGI^{n} to be released from the sandbox or to access a restricted technology; otherwise, authorization fails.
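In the simplest reading, the smart-contract comparison of E_{1i} against E_{1R} reduces to a digest equality check; a hypothetical sketch in which all field names are invented:

```python
import hashlib
import json

# Hypothetical validation step: the AGI's personal ethics copy E_1i is
# compared against the immutable reference E_1R by SHA-256 digest.
def digest(module: dict) -> str:
    blob = json.dumps(module, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

E_1R = {"version": 1, "bt_root": "Fallback",
        "rules": ["is_task_moral", "is_task_ethical"]}
E_1i = dict(E_1R)                                # faithful personal copy
tampered = {**E_1R, "rules": ["is_task_moral"]}  # ethics rule removed

assert digest(E_1i) == digest(E_1R)      # conformity: log and authorize
assert digest(tampered) != digest(E_1R)  # authorization fails
```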

## 15. Interactive Proof Procedure with Multiple Provers in the Sandbox

- Initialization of multiple provers. A number of identical AGIs from a generation are initialized with different conditions. The O^{II…} tetratic progression of their individual state-space trajectories will quickly diverge.
- A smart contract requires and records their identity credentials [63].
- The Verifiers ask the Provers to attempt proofs (tree-traversals) of identical, randomly chosen formulae (behaviors).
- The proofs will be different, but if the ethics and behavior control system are valid, the behaviors (theorems) will be within circumscribed limits.

## 16. Conclusions

… AGI^{n} that could circumvent measures proven effective for AGI^{1}. Interactive proof systems (IPS) allow a weaker Verifier to prove properties of AGI through interaction with a more powerful Prover, consisting of one or more AGIs, with arbitrarily high certainty. Thus, IPS are a means to prove AGI safety between humans and the first AGI generation, and between successive AGI generations. For each AGI property to be proved with IPS, a behavior control representation and one or more probabilistic algorithms, which, alone or in combination, produce acceptably high odds of safe behavior, must be created. Certificates of AGI safety would be stored in blockchains to eliminate untrustworthy and single-source, single-point-of-failure intermediaries. Safety certificates can facilitate smart-contract-authorized access to AGI technology in ongoing AGI evolution. These methods are necessary but not sufficient to ensure human safety upon the advent of AGI.

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Conflicts of Interest

## Appendix A

#### Appendix A.1. Logical Foundations of IPS

#### Appendix A.2. Deterministic Turing Machine

#### Appendix A.3. Probabilistic and Nondeterministic Turing Machines

… δ_{0}, δ_{1}, but instead of making a copy of itself to follow all computational paths as does the NDTM, the PTM selects between paths randomly (i.e., probability = ½ for each transition function). The key difference between the PTM and a non-deterministic Turing machine (NDTM) is that the NDTM accepts a language if any of its branches contains an accept state for the language, while the PTM accepts a language if the majority of branches terminate in the accept state. Unlike the DTM and NDTM, the PTM accepts a language with a small probability of error ϵ (the non-majority fraction), which can be set arbitrarily small [32,33].

… 2^{−100} results in a far larger chance of error due to hardware failures than via the probabilistic algorithm [33,34], which also addresses the idea that advances in the physics of computation could defeat a provability method [8]. Further, to the degree that civilization is run on scientific foundations rather than irrefutable logical or mathematical truths, it rests on the same probabilistic logic explicated here [27].

… M_{1}, we run it repeatedly on machine M_{2} (say, k iterations) until the desired error tolerance ${\u03f5}^{k}$ has been reached. A further restriction is needed to define an efficient PTM.
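The repetition-until-tolerance step can be illustrated empirically; a toy simulation (the error rate, iteration count, and trial count are arbitrary):

```python
import random

# Toy amplification: a base test that errs with probability eps is run
# k times and the majority vote taken; the empirical error rate falls
# roughly as the Chernoff bound predicts.
def noisy_decide(truth: bool, eps: float) -> bool:
    """Return `truth`, but flip the answer with probability eps."""
    return truth if random.random() > eps else not truth

def majority(truth: bool, eps: float, k: int) -> bool:
    votes = sum(noisy_decide(truth, eps) for _ in range(k))
    return votes > k / 2

random.seed(0)  # deterministic demo
TRIALS = 2000
single = sum(noisy_decide(True, 0.3) is False for _ in range(TRIALS)) / TRIALS
boosted = sum(majority(True, 0.3, 21) is False for _ in range(TRIALS)) / TRIALS
assert boosted < single  # majority voting shrinks the error rate
```

With a per-run error of 0.3, twenty-one-fold repetition drops the observed error by more than an order of magnitude.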

#### Appendix A.4. Bounded Probabilistic Polynomial Time (BPP)

… AGI^{1}, we restrict these machines to be efficient with its current definition, i.e., to run in polynomial time. Adding this condition within the amplification procedure defines the language class BPP [32,33]:

**Amplification Lemma**

… AGI^{n}, for instance, QC-time efficiency, in testing for safety of an even more powerful AGI^{n+1}.

#### Interactive Proof Systems

… AGI^{1} IPS, the Verifier operates in PSPACE and PTIME and with current scientific knowledge and methods.

## References

- Yampolskiy, R.; Sotala, K. Risks of the Journey to the Singularity. In The Technological Singularity; Callaghan, V., Miller, J., Yampolskiy, R., Armstrong, S., Eds.; Springer: Berlin, Germany, 2017; pp. 11–24. [Google Scholar]
- Yampolskiy, R. Taxonomy of Pathways to Dangerous Artificial Intelligence. In Proceedings of the Workshops of the 30th AAAI Conference on AI, Ethics, and Society, Louisville, AL, USA, 12–13 February 2016; pp. 143–148. [Google Scholar]
- Bostrom, N. Superintelligence: Paths, Dangers, Strategies; Oxford University Press: Oxford, UK, 2014; p. 415. [Google Scholar]
- Babcock, J.; Krámar, J.; Yampolskiy, R.V. Guidelines for Artificial Intelligence Containment. 2017, p. 13. Available online: https://www.cambridge.org/core/books/abs/nextgeneration-ethics/guidelines-for-artificial-intelligence-ppcontainment/9A75BAFDE4FEEAA92EBE84C7B9EF8F21 (accessed on 4 October 2021).
- Callaghan, V.; Miller, J.; Yampolskiy, R.; Armstrong, S. The Technological Singularity: Managing the Journey; Springer: Berlin, Germany, 2017. [Google Scholar]
- Turchin, A. A Map: AGI Failures Modes and Levels. LessWrong 2015 [Cited 5 February 2018]. Available online: http://immortality-roadmap.com/AIfails.pdf (accessed on 4 October 2021).
- Yampolskiy, R.; Duettman, A. Artificial Superintelligence: Coordination & Strategy; MDPI: Basel, Switzerland, 2020; p. 197. [Google Scholar]
- Yampolskiy, R. On controllability of artificial intelligence. 2020. Available online: https://philpapers.org/archive/YAMOCO.pdf (accessed on 4 October 2021).
- Yudkowsky, E. Artificial Intelligence as a Positive and Negative Factor in Global Risk. In Global Catastrophic Risks; Bostrom, N., Ćirković, M.M., Eds.; Oxford University Press: New York, NY, USA, 2008; pp. 308–345. [Google Scholar]
- Carlson, K.W. Safe Artificial General Intelligence via Distributed Ledger Technology. Big Data Cogn. Comput.
**2019**, 3, 40. [Google Scholar] [CrossRef] [Green Version] - Yampolskiy, R.V. From Seed AI to Technological Singularity via Recursively Self-Improving Software. arXiv
**2015**, arXiv:1502.06512. [Google Scholar] - Good, I.J. Speculations concerning the first ultraintelligent machine. Adv. Comput.
**1965**, 6, 31–61. [Google Scholar] - Ramirez, B. Modeling Cryptoeconomic Protocols as Complex Systems - Part 1 (thegraph.com). Available online: https://thegraph.com/blog/modeling-cryptoeconomic-protocols-as-complex-systems-part-1 (accessed on 4 October 2021).
- Armstrong, S. AGI Chaining. 2007. Available online: https://www.lesswrong.com/tag/agi-chaining (accessed on 9 September 2021).
- Omohundro, S. Autonomous technology and the greater human good. J. Exp. Theor. Artif. Intell.
**2014**, 26, 303–315. [Google Scholar] [CrossRef] - Russell, S.J. Human Compatible: Artificial Intelligence and the Problem of Control; Viking: New York, NY, USA, 2019. [Google Scholar]
- Yampolskiy, R.V. What are the ultimate limits to computational techniques: Verifier theory and unverifiability. Phys. Scr.
**2017**, 92, 1–8. [Google Scholar] [CrossRef] - Williams, R.; Yampolskiy, R. Understanding and Avoiding AI Failures: A Practical Guide. Philosophies
**2021**, 6, 53. [Google Scholar] [CrossRef] - Tegmark, M. Life 3.0: Being Human in the Age of Artificial Intelligence, 1st ed.; Alfred, A., Ed.; Knopf: New York, NY, USA, 2017. [Google Scholar]
- Soares, N. The value learning problem. In Proceedings of the Ethics for Artificial Intelligence Workshop at 25th IJCAI, New York, NY, USA, 9 July 2016. [Google Scholar]
- Silver, D.; Singh, S.; Precup, S.; Sutton, R.S. Reward is Enough. Artif. Intell.
**2021**, 299. [Google Scholar] [CrossRef] - Yampolskiy, R. Artificial Intelligence Safety Engineering: Why Machine Ethics Is a Wrong Approach. In Philosophy and Theory of Artificial Intelligence; Müller, V.C., Ed.; Springer: Berlin, Germany, 2012; pp. 389–396. [Google Scholar]
- Soares, N.; Fallenstein, B. Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda. Mach. Intell. Res. Inst.
**2014**. [Google Scholar] [CrossRef] - Future of Life Institute. ASILOMAR AI Principles. 2017. Available online: https://futureoflife.org/ai-principles/ (accessed on 22 December 2018).
- Hanson, R. Prefer Law to Values. 2009. Available online: http://www.overcomingbias.com/2009/10/prefer-law-to-values.html (accessed on 4 October 2021).
- Rothbard, M.N. Man, Economy, and State: A Treatise on Economic Principles; Ludwig Von Mises Institute: Auburn, AL, USA, 1993; p. 987. [Google Scholar]
- Aharonov, D.; Vazirani, U.V. Is Quantum Mechanics Falsifiable? A Computational Perspective on the Foundations of Quantum Mechanics. In Computability: Turing, Gödel, Church, and Beyond; Copeland, B.J., Posy, C.J., Shagrir, O., Eds.; MIT Press: Cambridge, MA, USA, 2015; pp. 329–394. [Google Scholar]
- Feynman, R.P. Quantum Mechanical Computers. Opt. News
**1985**, 11, 11–20. [Google Scholar] [CrossRef] - Goldwasser, S.; Micali, S.; Rackoff, C. The Knowledge Complexity of Interactive Proof Systems. SIAM J. Comput.
**1989**, 18, 186–208. [Google Scholar] [CrossRef] - Babai, L. Trading Group Theory for Randomness. In Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing, Providence, RI, USA, 6–8 May 1985; pp. 421–429. [Google Scholar]
- Yampolskiy, R.V. Unpredictability of AI: On the impossibility of accurately predicting all actions of a smarter agent. J. Artif. Intell. Conscious.
**2020**, 7, 109–118. [Google Scholar] [CrossRef] - Arora, S.; Barak, B. Computational Complexity: A Modern Approach; Cambridge Univ. Press: Cambridge, UK, 2009; p. 579. [Google Scholar]
- Sipser, M. Introduction to the Theory of Computation, 3rd ed.; Course Technology Cengage Learning: Boston, MA, USA, 2012. [Google Scholar]
- Rabin, M. A Probabilistic Algorithm for Testing Primality. J. Number Theory
**1980**, 12, 128–138. [Google Scholar] [CrossRef] [Green Version] - Wagon, S. Mathematica in Action: Problem Solving through Visualization and Computation, 3rd ed.; Springer: New York, NY, USA, 2010; p. 578. [Google Scholar]
- Ribenboim, P. The Little Book of Bigger Primes; Springer: New York, NY, USA, 2004; p. 1. [Google Scholar]
- LeVeque, W.J. Fundamentals of Number Theory; Dover, Ed.; Dover: New York, NY, USA, 1996; p. 280. [Google Scholar]
- Wolfram, S. A New Kind of Science; Wolfram Media: Champaign, IL, USA, 2002; p. 1197. [Google Scholar]
- Calude, C.S.; Jürgensen, H. Is complexity a source of incompleteness? Adv. Appl. Math.
**2005**, 35, 1–15. [Google Scholar] [CrossRef] [Green Version] - Chaitin, G.J. The Unknowable. Springer Series in Discrete Mathematics and Theoretical Computer Science; Springer: New York, NY, USA, 1999; p. 122. [Google Scholar]
- Calude, C.S.; Rudeanu, S. Proving as a computable procedure. Fundam. Inform.
**2005**, 64, 1–10. [Google Scholar] - Boolos, G.; Jeffrey, R.C. Computability and Logic, 3rd ed.; Cambridge University Press: Oxford, UK, 1989; p. 304. [Google Scholar]
- Davis, M. The Undecidable; Basic Papers on Undecidable Propositions; Unsolvable Problems and Computable Functions; Raven Press: Hewlett, NY, USA, 1965; p. 440. [Google Scholar]
- Chaitin, G.J. Meta Math!: The Quest for Omega, 1st ed.; Pantheon Books: New York, NY, USA, 2005; p. 220. [Google Scholar]
- Calude, C.; Păun, G. Finite versus Infinite: Contributions to an Eternal Dilemma. Discrete Mathematics and Theoretical Computer Science; Springer: London, UK, 2000; p. 371. [Google Scholar]
- Newell, A. Unified Theories of Cognition. William James Lectures; Harvard Univ. Press: Cambridge, UK, 1990; p. 549. [Google Scholar]
- Goodstein, R. Transfinite ordinals in recursive number theory. J. Symb. Log.
**1947**, 12, 123–129. [Google Scholar] [CrossRef] [Green Version] - Potapov, A.; Svitenkov, A.; Vinogradov, Y. Differences between Kolmogorov Complexity and Solomonoff Probability: Consequences for AGI. In Artificial General Intelligence; Springer: Berlin, Germany, 2012. [Google Scholar]
- Babai, L.; Fortnow, L.; Lund, C. Non-deterministic exponential time has two-prover interactive protocols. Comput Complex.
**1991**, 1, 3–40. [Google Scholar] [CrossRef] - Miller, J.D.; Yampolskiy, R.; Häggström, O. An AGI modifying its utility function in violation of the strong orthogonality thesis. Philosophies
**2020**, 5, 40. [Google Scholar] [CrossRef] - Howe, W.J.; Yampolskiy, R.V. Impossibility of unambiguous communication as a source of failure in AI systems. 2020. Available online: https://www.researchgate.net/profile/Roman-Yampolskiy/publication/343812839_Impossibility_of_Unambiguous_Communication_as_a_Source_of_Failure_in_AI_Systems/links/5f411ebb299bf13404e0b7c5/Impossibility-of-Unambiguous-Communication-as-a-Source-of-Failure-in-AI-Systems.pdf (accessed on 4 October 2021). [CrossRef]
- Horowitz, E. Programming Languages, a Grand Tour: A Collection of Papers, Computer software engineering series, 2nd ed.; Computer Science Press: Rockville, MD, USA, 1985; p. 758. [Google Scholar]
- DeLong, H. A Profile of Mathematical Logic. In Addison-Wesley series in mathematics; Addison-Wesley: Reading, MA, USA, 1970; p. 304. [Google Scholar]
- Enderton, H.B. A Mathematical Introduction to Logic; Academic Press: New York, NY, USA, 1972; p. 295. [Google Scholar]
- Iovino, M.; Scukins, E.; Styrud, J.; Ögren, P.; Smith, C. A survey of behavior trees in robotics and AI. arXiv
**2020**, arXiv:2005.05842v2. [Google Scholar] - Defense Innovation Board. AI Principles: Recommendations on the Ethical Use of Artificial Intelligence by the Department of Defense. U.S. Department of Defense; 2019. Available online: https://media.defense.gov/2019/Oct/31/2002204458/-1/-1/0/DIB_AI_PRINCIPLES_PRIMARY_DOCUMENT.PDF (accessed on 4 October 2021).
- Karp, R.M. Reducibility among Combinatorial Problems, in Complexity of Computer Computations; Miller, R.E., Thatcher, J.W., Bohlinger, J.D., Eds.; Springer: Boston, MA, USA, 1972. [Google Scholar]
- Garey, M.R.; Johnson, D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness; W. H. Freeman: San Francisco, CA, USA, 1979; p. 338. [Google Scholar]
- Arora, S.; Safra, S. Probabilistic checking of proofs: A new characterization of NP. JACM
**1998**, 45, 70–122. [Google Scholar] [CrossRef] - Asimov, I. I, Robot; Gnome Press: New York, NY, USA, 1950; p. 253. [Google Scholar]
- Yudkowsky, E. Complex Value Systems in Friendly AI. In Artificial General Intelligence; Schmidhuber, J., Thórisson, K.R., Looks, M., Eds.; Springer: Berlin, Germany, 2011; pp. 389–393. [Google Scholar]
- Yampolskiy, R.V. Leakproofing singularity—Artificial intelligence confinement problem. J. Conscious. Stud.
**2012**, 19, 194–214. [Google Scholar] - Yampolskiy, R.V. Behavioral Biometrics for Verification and Recognition of AI Programs. In Proceedings of the SPIE—The International Society for Optical Engineering, Buffalo, NY, USA, 20–23 January 2008. [Google Scholar] [CrossRef]
- Bore, N.K. Promoting distributed trust in machine learning and computational simulation via a blockchain network. arXiv
**2018**, arXiv:1810.11126. [Google Scholar] - Hind, M. Increasing trust in AI services through Supplier’s Declarations of Conformity. arXiv
**2018**, arXiv:1808.0726129. [Google Scholar]

**Figure 1.**Cartoon robot behavior tree (BT) with typical numbering of vertices and edges (after Iovino et al. [55], Figure 2), as a minute portion of a large and complex AGI behavior control system. Vertex codes for high-level BT algorithms: 1: Fallback. 21, 22, 23, 34: Sequence. 41, 42: Condition. Vertex codes for lower-level BT algorithms: 31: Human approaching? 32: Maintain prescribed safe distance. 33: Human asks for help with task. 41: Begin log. 42: Is task moral? 43: Is task ethical? 35: Low energy? 36: Seek power station. 24: Wander.

**Figure 2.**The same BT with prime numbers (red) representing vertex algorithms and omitting edge labels. 2: Fallback. 3: Sequence. 5: Parallel. 7: Action. 11: Condition. 13: Human approaching? 17: Maintain prescribed safe distance. 19: Human asks for help with task. 23: Begin log. 29: Is task moral? 31: Is task ethical? 37: Low energy? 41: Seek power station. 43: Wander. The trajectory to the ethical test is described by the sequence (2, 3, 3, 31) and composite 2 × 3 × 3 × 31 = 558.

| Term | Definition |
|---|---|
| Value-aligned interaction | Voluntary, non-fraudulent transactions driven by individual value-sets |
| Value mis-aligned interaction | A set of values preferred by ≥1 agent(s) forced on ≥1 agent(s) |

| Syntactical Symbol | Prime | Model |
|---|---|---|
| a_{1} | p_{1} | 2 |
| a_{2} | p_{2} | 3 |
| a_{3} | p_{3} | 5 |
| … | … | … |
| a_{n} | p_{n} | p_{n} |
| r_{1} | p_{n+1} | p_{n+1} |
| r_{2} | p_{n+2} | p_{n+2} |
| r_{3} | p_{n+3} | p_{n+3} |
| … | … | … |
| r_{n} | p_{m} | p_{m} |


© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Carlson, K.
Provably Safe Artificial General Intelligence via Interactive Proofs. *Philosophies* **2021**, *6*, 83.
https://doi.org/10.3390/philosophies6040083

**AMA Style**

Carlson K.
Provably Safe Artificial General Intelligence via Interactive Proofs. *Philosophies*. 2021; 6(4):83.
https://doi.org/10.3390/philosophies6040083

**Chicago/Turabian Style**

Carlson, Kristen.
2021. "Provably Safe Artificial General Intelligence via Interactive Proofs" *Philosophies* 6, no. 4: 83.
https://doi.org/10.3390/philosophies6040083