# A Philosophical Treatise of Universal Induction

## Abstract


This article is dedicated to Ray Solomonoff (1926–2009), the discoverer and inventor of Universal Induction.

## 1. Introduction

#### 1.1. Overview of Article

## 2. Broader Context

#### 2.1. Induction versus Deduction

#### 2.2. Prediction versus Induction

#### 2.3. Prediction, Concept Learning, Classification, Regression

#### 2.4. Prediction with Expert Advice versus Bayesian Learning

#### 2.5. No Free Lunch versus Occam’s Razor

#### 2.6. Non-Monotonic Reasoning

#### 2.7. Solomonoff Induction

## 3. Probability

#### 3.1. Frequentist

#### 3.2. Objectivist

#### Kolmogorov’s Probability Axioms.

- If $A$ and $B$ are events, then the intersection $A\cap B$, the union $A\cup B$, and the difference $A\setminus B$ are also events.
- The sample space $\Omega$ and the empty set $\{\}$ are events.
- There is a function $P$ that assigns non-negative real numbers, called probabilities, to each event.
- $P(\Omega)=1$ and $P(\{\})=0$.
- $P(A\cup B)=P(A)+P(B)-P(A\cap B)$.
- For a decreasing sequence $A_1\supset A_2\supset A_3\supset\cdots$ of events with $\bigcap_n A_n=\{\}$ we have $\lim_{n\to\infty}P(A_n)=0$.
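On a finite sample space the first five axioms can be checked mechanically (the continuity axiom is vacuous there, since every decreasing sequence of events eventually stabilizes). A minimal Python sketch, using a hypothetical fair-die example that is not from the article:

```python
from itertools import chain, combinations

# Hypothetical example: uniform distribution for a fair six-sided die.
omega = frozenset(range(1, 7))
weight = {outcome: 1 / 6 for outcome in omega}

def P(event: frozenset) -> float:
    """Probability of an event as the sum of its outcome weights."""
    return sum(weight[o] for o in event)

A = frozenset({1, 2, 3})
B = frozenset({3, 4})

# P(Omega) = 1 and P({}) = 0.
assert abs(P(omega) - 1) < 1e-12
assert P(frozenset()) == 0

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
assert abs(P(A | B) - (P(A) + P(B) - P(A & B))) < 1e-12

# Non-negativity over the full event algebra (here: the power set of omega).
events = [frozenset(s) for s in
          chain.from_iterable(combinations(omega, r)
                              for r in range(len(omega) + 1))]
assert all(P(E) >= 0 for E in events)
```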

#### 3.3. Subjectivist

#### Cox’s Axioms for Beliefs.

- The degree of belief in an event $B$, given that event $A$ has occurred, can be characterized by a real-valued function $\mathrm{Bel}(B|A)$.
- $\mathrm{Bel}(\Omega\setminus B|A)$ is a twice differentiable function of $\mathrm{Bel}(B|A)$ for $A\ne\{\}$.
- $\mathrm{Bel}(B\cap C|A)$ is a twice differentiable function of $\mathrm{Bel}(C|B\cap A)$ and $\mathrm{Bel}(B|A)$ for $B\cap A\ne\{\}$.
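Cox's theorem shows that any belief function satisfying these axioms is, after a suitable rescaling, an ordinary probability; in particular, the functional dependence in the third axiom is realized by the product rule $\mathrm{Bel}(B\cap C|A)=\mathrm{Bel}(C|B\cap A)\,\mathrm{Bel}(B|A)$, which conditional probability satisfies exactly. A small numeric sketch on a finite space (the sets chosen are our own illustration, not the article's):

```python
from fractions import Fraction

# A hypothetical 12-point sample space with the uniform measure.
omega = set(range(12))

def P(event):
    return Fraction(len(event & omega), len(omega))

def cond(event, given):
    """Conditional probability P(event | given), assuming P(given) > 0."""
    return P(event & given) / P(given)

A = set(range(8))       # conditioning event, P(A) = 8/12
B = {0, 1, 2, 3, 9}
C = {2, 3, 4, 10}

# Product rule realizing Cox's third axiom:
#   Bel(B ∩ C | A) = Bel(C | B ∩ A) * Bel(B | A)
lhs = cond(B & C, A)
rhs = cond(C, B & A) * cond(B, A)
assert lhs == rhs == Fraction(1, 4)
```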

## 4. Bayesianism for Prediction

#### 4.1. Notation

#### 4.2. Thomas Bayes

#### 4.3. Models, Hypotheses and Environments

#### 4.4. Bayes Theorem

#### 4.5. Partial Hypotheses

#### 4.6. Sequence Prediction

#### 4.7. Bayes Mixture

#### 4.8. Expectation

#### 4.9. Convergence Results

#### 4.10. Bayesian Decisions

#### 4.11. Continuous Environment Classes

#### 4.12. Choosing the Model Class

## 5. History

#### 5.1. Epicurus

#### 5.2. Sextus Empiricus and David Hume

#### 5.3. William of Ockham

#### 5.4. Pierre-Simon Laplace and the Rule of Succession

#### 5.5. Confirmation Problem

#### 5.6. Patrick Maher Does not Capture the Logic of Confirmation

#### 5.7. Black Ravens Paradox

Observing an object $x$ for which both $A(x)$ and $B(x)$ are true confirms the hypothesis "all $x$ which are $A$ are also $B$", or $\forall x\, A(x)\Rightarrow B(x)$. This is known as Nicod's condition, which has been seen as a highly intuitive property, but it is not universally accepted [9]. However, even if there are particular situations where it does not hold, it is certainly true in the majority of situations, and in these situations the following problem remains.
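Nicod's condition can be seen at work in a Bayesian setting with a toy model (our illustration, not the article's formal treatment): pit a universal hypothesis against a near-miss alternative, and each observed black raven shifts posterior mass toward the universal one.

```python
# Two candidate environments for P(black | raven); the names and numbers
# are hypothetical choices for illustration only.
likelihood = {
    "all ravens black": 1.0,
    "90% of ravens black": 0.9,
}
prior = {h: 0.5 for h in likelihood}

posterior = dict(prior)
for _ in range(10):  # observe ten ravens, all of them black
    unnorm = {h: posterior[h] * likelihood[h] for h in likelihood}
    z = sum(unnorm.values())
    posterior = {h: p / z for h, p in unnorm.items()}

# Each confirming instance raises the universal hypothesis' posterior.
assert posterior["all ravens black"] > prior["all ravens black"]
```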

#### 5.8. Alan Turing

#### 5.9. Andrey Kolmogorov

## 6. How to Choose the Prior

#### 6.1. Subjective versus Objective Priors

#### 6.2. Indifference Principle

#### 6.3. Reparametrization Invariance

#### 6.4. Regrouping Invariance

#### 6.5. Universal Prior

## 7. Solomonoff Universal Prediction

#### 7.1. Universal Bayes Mixture

#### 7.2. Deterministic Representation

#### 7.3. Old Evidence and New Hypotheses

#### 7.4. Black Ravens Paradox Using Solomonoff

## 8. Prediction Bounds

#### 8.1. Total Bounds

#### 8.2. Instantaneous Bounds

#### 8.3. Future Bounds

#### 8.4. Universal is Better than Continuous $\mathcal{M}$

## 9. Approximations and Applications

#### 9.1. Golden Standard

"in spite of its incomputability, Algorithmic Probability can serve as a kind of 'Gold standard' for inductive systems" —Ray Solomonoff, 1997

#### 9.2. Minimum Description Length Principle

#### 9.3. Resource Bounded Complexity and Prior

#### 9.4. Context Tree Weighting

#### 9.5. Universal Similarity Measure

#### 9.6. Universal Artificial Intelligence

## 10. Discussion

#### 10.1. Prior Knowledge

#### 10.2. Dependence on Universal Turing Machine U

#### 10.3. Advantages and Disadvantages

- General total bounds for generic class, prior and loss function as well as instantaneous and future bounds for both the i.i.d. and universal cases.
- The bound for continuous classes and the more general result that $M$ works well even in non-computable environments.
- Solomonoff satisfies both reparametrization and regrouping invariance.
- Solomonoff solves many persistent philosophical problems such as the zero prior and confirmation problem for universal hypotheses. It also deals with the problem of old evidence and we argue that it should solve the black ravens paradox.
- The issue of incorporating prior knowledge is also elegantly dealt with by providing two methods which theoretically allow any knowledge with any degree of relevance to be most effectively exploited.

#### 10.4. Conclusions

## References

- McGinn, C. Can we solve the mind-body problem? Mind **1989**, 98, 349–366.
- Asmis, E. Epicurus' Scientific Method; Cornell Univ. Press: Ithaca, NY, USA, 1984.
- Ockham, W. Philosophical Writings: A Selection, 2nd ed.; Hackett Publishing Company: Indianapolis, IN, USA, 1990.
- Hume, D. A Treatise of Human Nature, Book I, Edited version; Selby-Bigge, L.A., Nidditch, P.H., Eds.; Oxford University Press: Oxford, UK, 1978.
- McGrayne, S.B. The Theory that Would Not Die; Yale University Press: New Haven, CT, USA, 2011.
- Gabbay, D.M.; Hartmann, S.; Woods, J. (Eds.) Handbook of Inductive Logic; North Holland: Amsterdam, The Netherlands, 2011.
- Solomonoff, R.J. A Formal Theory of Inductive Inference: Parts 1 and 2. Inform. Contr. **1964**, 7, 1–22 and 224–254.
- Hutter, M. On Universal Prediction and Bayesian Confirmation. Theor. Comput. Sci. **2007**, 384, 33–48.
- Maher, P. Probability Captures the Logic of Scientific Confirmation. In Contemporary Debates in Philosophy of Science; Hitchcock, C., Ed.; Blackwell Publishing: Malden, MA, USA, 2004; Chapter 3, pp. 69–93.
- Hutter, M.; Poland, J. Adaptive Online Prediction by Following the Perturbed Leader. J. Mach. Learn. Res. **2005**, 6, 639–660.
- Wolpert, D.H.; Macready, W.G. No Free Lunch Theorems for Optimization. IEEE Trans. Evol. Comput. **1997**, 1, 67–82.
- Mitchell, T.M. The Need for Biases in Learning Generalizations. In Readings in Machine Learning; Shavlik, J., Dietterich, T., Eds.; Morgan Kaufmann: San Mateo, CA, USA, 1990; pp. 184–192.
- Good, I.J. Explicativity, corroboration, and the relative odds of hypotheses. In Good Thinking: The Foundations of Probability and Its Applications; University of Minnesota Press: Minneapolis, MN, USA, 1983.
- Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, MA, USA, 2003; p. 758.
- Reichenbach, H. The Theory of Probability: An Inquiry into the Logical and Mathematical Foundations of the Calculus of Probability, 2nd ed.; University of California Press: Berkeley, CA, USA, 1949.
- Kolmogorov, A.N. Foundations of the Theory of Probability, 2nd ed.; Chelsea Pub Co: New York, NY, USA, 1956.
- Hutter, M. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability; Springer: Berlin, Germany, 2005.
- Schmidhuber, J. A Computer Scientist's View of Life, the Universe, and Everything. In Foundations of Computer Science: Potential—Theory—Cognition; LNCS 1337; Springer: Berlin, Germany, 1997; pp. 201–208.
- Earman, J. Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory; MIT Press: Cambridge, MA, USA, 1993.
- Schick, F. Dutch Bookies and Money Pumps. J. Philos. **1986**, 83, 112–119.
- Good, I.J. 46656 Varieties of Bayesians. Letter in American Statistician **1971**, 25, 62–63; Reprinted in Good Thinking; University of Minnesota Press, 1982; pp. 20–21.
- Bayes, T. An essay towards solving a problem in the doctrine of chances. Phil. Trans. Biol. Sci. **1763**, 53, 370–418; Reprinted in Biometrika **1958**, 45, 296–315.
- Zabell, S.L. The Rule of Succession. Erkenntnis **1989**, 31, 283–321.
- Hutter, M. Convergence and Loss Bounds for Bayesian Sequence Prediction. IEEE Trans. Inform. Theor. **2003**, 49, 2061–2067.
- Hutter, M. Optimality of Universal Bayesian Prediction for General Loss and Alphabet. J. Mach. Learn. Res. **2003**, 4, 971–1000.
- Clarke, B.S.; Barron, A.R. Information-theoretic asymptotics of Bayes methods. IEEE Trans. Inform. Theor. **1990**, 36, 453–471.
- Li, M.; Vitányi, P.M.B. An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed.; Springer: Berlin, Germany, 2008.
- Empiricus, S. Sextus Empiricus, with an English Translation by R.G. Bury; Heinemann: London, UK, 1933.
- Empiricus, S. Sextus Empiricus: Outlines of Scepticism, 2nd ed.; Annas, J., Barnes, J., Eds.; Cambridge University Press: Cambridge, UK, 2000.
- Good, I.J. The Paradox of Confirmation. Br. J. Philos. Sci. **1960**, 11, 145–149.
- Hutter, M. Algorithmic Information Theory: A brief non-technical guide to the field. Scholarpedia **2007**, 2, 2519.
- Hutter, M. Open Problems in Universal Induction & Intelligence. Algorithms **2009**, 3, 879–906.
- Wallace, C.S. Statistical and Inductive Inference by Minimum Message Length; Springer: Berlin, Germany, 2005.
- Hutter, M. A Complete Theory of Everything (will be subjective). Algorithms **2010**, 3, 329–350.
- Solomonoff, R.J. Complexity-Based Induction Systems: Comparisons and Convergence Theorems. IEEE Trans. Inform. Theor. **1978**, IT-24, 422–432.
- Blackwell, D.; Dubins, L. Merging of opinions with increasing information. Ann. Math. Stat. **1962**, 33, 882–887.
- Barnsley, M.F.; Hurd, L.P. Fractal Image Compression; A.K. Peters/CRC Press: Boca Raton, FL, USA, 1993.
- Cilibrasi, R.; Vitányi, P.M.B. Similarity of Objects and the Meaning of Words. In Proceedings of the 3rd Annual Conference on Theory and Applications of Models of Computation (TAMC'06), Beijing, China, 15–20 May 2006; LNCS 3959; Springer: Berlin, Germany, 2006; pp. 21–45.
- Chernov, A.; Hutter, M.; Schmidhuber, J. Algorithmic Complexity Bounds on Future Prediction Errors. Inform. Comput. **2007**, 205, 242–261.
- Grünwald, P.D. The Minimum Description Length Principle; The MIT Press: Cambridge, MA, USA, 2007.
- Schmidhuber, J. The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions. In Proceedings of the 15th Conference on Computational Learning Theory (COLT'02), Sydney, Australia, 8–10 July 2002; LNAI 2375; Springer: Berlin, Germany, 2002; pp. 216–228.
- Lempel, A.; Ziv, J. On the Complexity of Finite Sequences. IEEE Trans. Inform. Theor. **1976**, 22, 75–81.
- Willems, F.M.J.; Shtarkov, Y.M.; Tjalkens, T.J. The Context Tree Weighting Method: Basic Properties. IEEE Trans. Inform. Theor. **1995**, 41, 653–664.
- Veness, J.; Ng, K.S.; Hutter, M.; Uther, W.; Silver, D. A Monte Carlo AIXI Approximation. J. Artif. Intell. Res. **2011**, 40, 95–142.
- Li, M.; Chen, X.; Li, X.; Ma, B.; Vitányi, P.M.B. The similarity metric. IEEE Trans. Inform. Theor. **2004**, 50, 3250–3264.
- Cilibrasi, R.; Vitányi, P.M.B. Clustering by compression. IEEE Trans. Inform. Theor. **2005**, 51, 1523–1545.
- Legg, S.; Hutter, M. Universal Intelligence: A Definition of Machine Intelligence. Mind. Mach. **2007**, 17, 391–444.

**Figure 1.** Schematic graph of prefix Kolmogorov complexity $K(x)$ with string $x$ interpreted as integer.

| | Induction | ⇔ | Deduction |
|---|---|---|---|
| Type of inference: | generalization/prediction | ⇔ | specialization/derivation |
| Framework: | probability axioms | $\widehat{=}$ | logical axioms |
| Assumptions: | prior | $\widehat{=}$ | non-logical axioms |
| Inference rule: | Bayes rule | $\widehat{=}$ | modus ponens |
| Results: | posterior | $\widehat{=}$ | theorems |
| Universal scheme: | Solomonoff probability | $\widehat{=}$ | Zermelo-Fraenkel set theory |
| Universal inference: | universal induction | $\widehat{=}$ | universal theorem prover |

© 2011 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Rathmanner, S.; Hutter, M. A Philosophical Treatise of Universal Induction. *Entropy* **2011**, *13*, 1076-1136.
https://doi.org/10.3390/e13061076
