# A Complete Theory of Everything (Will Be Subjective)

## Abstract


“... in spite of its incomputability, Algorithmic Probability can serve as a kind of ‘Gold Standard’ for induction systems” — Ray Solomonoff (1997)

“There is a theory which states that if ever anyone discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable. There is another theory which states that this has already happened.” — Douglas Adams, The Hitchhiker’s Guide to the Galaxy (1979)

## 1. Introduction

## 2. Theories of Something, Everything & Nothing

**(G) Geocentric model.**In the well-known geocentric model, the Earth is at the center of the universe and the Sun, the Moon, and all planets and stars move around Earth. The ancient model assumed concentric spheres, but increasing precision in observations and measurements revealed a quite complex geocentric picture with planets moving with variable speed on epicycles. This Ptolemaic system predicted the celestial motions quite well for its time, but was relatively complex in the common sense and in the sense of involving many parameters that had to be fitted experimentally.

**(H) Heliocentric model.**In the modern (later) heliocentric model, the Sun is at the center of the solar system (or universe), with all planets (and stars) moving in ellipses around the Sun. Copernicus developed a complete model, much simpler than the Ptolemaic system, which interestingly did not offer better predictions initially, but Kepler’s refinements ultimately outperformed all geocentric models. The price for this improvement was to expel the observers (humans) from the center of the universe to one out of 8 moving planets. While today this price seems small, historically it was quite high. Indeed we will compute the exact price later.

**(E) Effective theories.**After the celestial mechanics of the planets had been understood, ever more complex phenomena could be captured with increasing coverage. Newton’s mechanics unifies celestial and terrestrial gravitational phenomena. When unified with special relativity theory, one arrives at Einstein’s general relativity, predicting large-scale phenomena like black holes and the big bang. On the small scale, electrical and magnetic phenomena are unified by Maxwell’s equations for electromagnetism. Quantum mechanics and electromagnetism have further been unified into quantum electrodynamics (QED). QED is the most powerful theory ever invented, in terms of precision and coverage of phenomena. It is a theory of all physical and chemical processes, except for radioactivity and gravity.

**(P) Standard model of particle physics.**Salam, Glashow and Weinberg extended QED to include the weak interactions, responsible for radioactive decay. Together with quantum chromodynamics (QCD) [12], which describes the nucleus, this constitutes the current standard model (SM) of particle physics. It describes all known non-gravitational phenomena in our universe. There is no experiment indicating any limitation (precision, coverage). It has about 20 unexplained parameters (mostly masses and coupling constants) that have to be (and are) experimentally determined (although some regularities can be explained [13]). The effective theories of the previous paragraph can be regarded as approximations of SM, hence SM, although founded on a subatomic level, also predicts medium scale phenomena.

**(S) String theory.**Pure gravitational and pure quantum phenomena are perfectly predictable by general relativity and the standard model, respectively. Phenomena involving both, like the big bang, require a proper final unification. String theory is the candidate for a final unification of the standard model with the gravitational force. As such it describes the universe at its largest and smallest scale, and all scales in-between. String theory is essentially parameter-free, but is immensely difficult to evaluate and it seems to allow for many solutions (spatial compactifications). For these and other reasons, there is currently no uniquely accepted cosmological model.

**(C) Cosmological models.**Our concept of what the universe is seems to expand ever further. In ancient times there were Earth, Sun, Moon, and a few planets, surrounded by a sphere of shiny points (the fixed stars). The current textbook universe started in a big bang and consists of billions of galaxy clusters, each containing billions of stars, probably many with a planetary system. But this is just the visible universe. According to inflation models, which are needed to explain the homogeneity of our universe, the “total” universe is vastly larger than the visible part.

**(M) Multiverse theories.**Many theories (can be argued to) imply a multitude of essentially disconnected universes (in the conventional sense), often each with their own (quite different) characteristics [14]. In Wheeler’s oscillating universe a new big bang follows the assumed big crunch, and this repeats indefinitely. Lee Smolin proposed that every black hole recursively produces new universes on the “other side” with quite different properties. Everett’s many-worlds interpretation of quantum mechanics postulates that the wave function doesn’t collapse but the universe splits (decoheres) into different branches, one for each possible outcome of a measurement. Some string theorists have suggested that possibly all compactifications in their theory are realized, each resulting in a different universe.

**(U) Universal ToE.**The last two multiverse suggestions contain the seed of a general idea. If theory X contains some unexplained elements Y (quantum or compactification or other indeterminism), one postulates that every realization of Y results in its own universe, and we just happen to live in one of them. Often the anthropic principle is used in some hand-waving way to argue why we are in this and not that universe [8]. Taking this to the extreme, Schmidhuber [9,15] postulates a multiverse (which I call universal universe) that consists of every computable universe (note there are “just” countably many computer programs). Clearly, if our universe is computable, then it is contained in the universal universe, so we have a ToE already in our hands. Similar in spirit but neither constructive nor formally well-defined is Tegmark’s mathematical multiverse [16].

**(R) Random universe.**Actually there is a much simpler way of obtaining a ToE. Consider an infinite sequence of random bits (fair coin tosses). It is easy to see that any finite pattern, i.e., any finite binary sequence, occurs (actually infinitely often) in this string. Now consider our observable universe quantized at e.g. Planck level, and code the whole space-time universe into a huge bit string. If the universe ends in a big crunch, this string is finite. (Think of a digital high resolution 3D movie of the universe from the big bang to the big crunch). This big string also appears somewhere in our random string, hence our random string is a perfect ToE. This is reminiscent of the Boltzmann brain idea that in a sufficiently large random universe, there exist low entropy regions that resemble our own universe and/or brain (observer) [17, Sec.3.8].
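This claim is easy to check empirically on a small scale. The sketch below uses a seeded PRNG as a stand-in for the fair coin tosses; the pattern and the string length are arbitrary illustrative choices:

```python
import random

random.seed(0)  # fixed seed: a pseudo-random stream stands in for fair coin tosses
bits = "".join(random.choice("01") for _ in range(100_000))

# An arbitrary 8-bit "snapshot"; it is expected about 100_000 / 2**8 ~ 390 times.
pattern = "11011010"
occurrences = bits.count(pattern)
print(pattern in bits, occurrences)
```

Of course a finite sample only illustrates the point; the theorem itself concerns infinite sequences, where every finite pattern occurs with probability one.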

**(A) All-a-Carte models.**The existence of true randomness is controversial and complicates many considerations. So ToE (R) may be rejected on this ground, but there is a simple deterministic, computable variant: glue the natural numbers written in binary format, 1,10,11,100,101,110,111,1000,1001,..., into one long string. The resulting sequence (a binary analogue of Champernowne’s number) contains every finite binary string, so like (R) it perfectly “describes” our universe.
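The construction is trivial to program. A minimal sketch (the cut-off at 9 numbers is arbitrary; the function name is mine):

```python
def all_a_carte_bits(n_numbers):
    """Concatenate the binary representations of 1, 2, ..., n_numbers."""
    return "".join(format(k, "b") for k in range(1, n_numbers + 1))

print(all_a_carte_bits(9))  # 1101110010111011110001001
```

Already this short prefix contains substrings like “0010” that never appear at the start of any single number, which is why arbitrary patterns show up so quickly.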

**Remarks.**I presume that every reader of this section at some point regarded the remaining models as bogus. In a sense this paper is about a rational criterion for deciding whether a model is sane or insane. The problem is that the line of sanity differs between different people and different historical times.

## 3. Predictive Power & Observer Localization

**Particle physics.**The standard model has more power and hence is closer to a ToE than all effective theories (E) together. String theory plus the right choice of compactification reduces to the standard model, so has the same or superior power. The key point here is the inclusion of the “right choice of compactification”. Without it, string theory is in some respect less powerful than SM.

**Baby universes.**Let us now turn to the cosmological models, in particular Smolin’s baby universe theory, in which infinitely many universes with different properties exist. The theory “explains” why a universe with our properties exists (since it includes universes with all kinds of properties), but it has little predictive power. The baby universe theory plus a specification of which universe we happen to live in would determine the values of the inter-universe variables for our universe, and hence have much more predictive power. So localizing ourselves increases the predictive power of the theory.

**Universal ToE.**Let us consider the even larger universal multiverse. Assuming our universe is computable, the multiverse generated by UToE contains and hence perfectly describes our universe. But this is of little use, since we can’t use UToE for prediction. If we knew our “position” in this multiverse, we would know in which (sub)universe we are. This is equivalent to knowing the program that generates our universe. This program may be close to any of the conventional cosmological models, which indeed have a lot of predictive power. Since locating ourselves in UToE is equivalent to, and hence as hard as, finding a conventional ToE of our universe, we have not gained much.

**All-a-Carte models**also contain and hence perfectly describe our universe. If and only if we can localize ourselves can we actually use them for predictions. (For instance, if we knew we were in the center of universe 001011011 we could predict that we will ‘see’ 0010 when ‘looking’ to the left and 1011 when looking to the right.) Let u be a snapshot of our space-time universe; a truly gargantuan string. Locating ourselves means to (at least) locate u in the multiverse. We know that u occurs as the uth number in Champernowne’s sequence (interpreting u as a binary number), hence locating u is equivalent to specifying u. So a ToE based on normal numbers is only useful if accompanied by the gargantuan snapshot u of our universe. In light of this, an “All-a-Carte” ToE (without knowing u) is rather a theory of nothing than a theory of everything.
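The equivalence between locating u and specifying u can be made concrete: the start position of the number u in the concatenated sequence is computable from u alone, and conversely u can be read off from that position. A toy sketch (helper names are mine):

```python
def offset_of(k):
    """Start index of binary(k) in the concatenation of 1, 10, 11, 100, ..."""
    return sum(len(format(j, "b")) for j in range(1, k))

sequence = "".join(format(j, "b") for j in range(1, 200))

u = "10110"            # a toy "universe snapshot"
k = int(u, 2)          # interpret u as a binary number (here 22)
start = offset_of(k)
print(sequence[start:start + len(u)] == u)  # True
```

So the “map” and the “location” carry exactly the same information, which is the point of the argument above.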

**Localization within our universe.**The loss of predictive power when enlarging a universe to a multiverse model has nothing to do with multiverses per se. Indeed, the distinction between a universe and a multiverse is not absolute. For instance, Champernowne’s number could also be interpreted as a single universe, rather than a multiverse. It could be regarded as an extreme form of the infinite fantasia land from the NeverEnding Story, where everything happens somewhere. Champernowne’s number constitutes a perfect map of the All-a-Carte universe, but the map is useless unless you know where you are. Similarly but less extreme, the inflation model produces a universe that is vastly larger than its visible part, and different regions may have different properties.

**Egocentric to Geocentric model.**Consider now the “small” scale of our daily life. A young child believes it is the center of the world. Localization is trivial. It is always at “coordinate” (0,0,0). Later it learns that it is just one among a few billion other people and as little or much special as any other person thinks of themself. In a sense we replace our egocentric coordinate system by one with origin (0,0,0) in the center of Earth. The move away from an egocentric world view has many social advantages, but dis-answers one question: Why am I this particular person and not any other? (It also comes at the cost of constantly having to balance egoistic with altruistic behavior.)

**Geocentric to Heliocentric model.**While we are expelled from the center of the world as individuals in the geocentric model, at least the human race as a whole remains at the center of the world, with the remaining (dead?) universe revolving around us. The heliocentric model puts the Sun at (0,0,0) and degrades Earth to planet number 3 out of 8. The astronomic advantages are clear, but the move dis-answers one question: Why this planet and not one of the others? Typically we are muzzled by questionable anthropic arguments [8,18]. (Another scientific cost is the necessity now to switch between coordinate systems, since the ego- and geocentric views are still useful.)

**Heliocentric to modern cosmological model.**The next coup of astronomers was to degrade our Sun to one star among billions in our Milky Way, and our Milky Way to one galaxy out of billions of others. It is generally accepted that the question of why we are in this particular galaxy and this particular solar system is essentially unanswerable.

**Summary.**The exemplary discussion above has hopefully convinced the reader that we indeed lose something (some predictive power) when progressing to overly large universe and multiverse models. Historically, the higher predictive power of the large-universe models (in which we are seemingly randomly placed) overshadowed the few extra questions they raised compared to the smaller ego/geo/helio-centric models. (We are not concerned here with the psychological disadvantages/damage, which may be large.) But the discussion of the (physical, universal, random, and all-a-carte) multiverse theories has shown that pushing this progression too far will at some point harm predictive power. We saw that this has to do with the increasing difficulty of localizing the observer.

## 4. Complete ToEs (CToEs)

**A complete ToE needs specification of**

- (i) initial conditions
- (e) state evolution
- (l) localization of observer
- (n) random noise
- (o) perception ability of observer

**Epistemology.**I assume that the observers’ experience of the world consists of a single temporal binary sequence which gets longer with time. This is definitely true if the observer is a robot equipped with sensors like a video camera whose signal is converted to a digital data stream, fed into a digital computer and stored in a binary file of increasing length. In humans, the signal transmitted by the optic and other sensory nerves could play the role of the digital data stream. Of course (most) human observers do not possess photographic memory. We can deal with this limitation in various ways: digitally record and make accessible upon request the nerve signals from birth till now, or allow for uncertain or partially remembered observations. Classical philosophical theories of knowledge [19] (e.g., as justified true belief) operate on a much higher conceptual level and therefore require stronger (and hence more disputable) philosophical presuppositions. In my minimalist “spartan” information-theoretic epistemology, a bit-string is the only observation, and all higher ontologies are constructed from it and are pure “imagination”.

**Predictive power and elegance.**Whatever the intermediary guiding principles for designing theories/models (elegance, symmetries, tractability, consistency), the ultimate judge is predictive success. Unfortunately we can never be sure whether a given ToE makes correct predictions in the future. After all, we cannot rule out that the world suddenly changes tomorrow in a totally unexpected way (cf. the quote at the beginning of this article). We have to compare theories based on their predictive success in the past. It is also clear that the latter is not enough: For every model we can construct an alternative model that behaves identically in the past but makes different predictions from, say, year 2020 on. Popper’s falsifiability dogma is of little help here. Beyond postdictive success, the guiding principle in designing and selecting theories, especially in physics, is elegance and mathematical consistency. The predictive power of the first heliocentric model was not superior to the geocentric one, but it was much simpler. In more profane terms, it had significantly fewer parameters that needed to be specified.

**Ockham’s razor**suitably interpreted tells us to choose the simpler among two or more otherwise equally good theories. For justifications of Ockham’s razor, see [6] and Section 8. Some even argue that by definition, science is about applying Ockham’s razor, see [20]. For a discussion in the context of theories in physics, see [21]. It is beyond the scope of this paper to repeat these considerations. In Section 4 and Section 8 I will show that simpler theories more likely lead to correct predictions, and therefore Ockham’s razor is suitable for finding ToEs.

**Complexity of a ToE.**In order to apply Ockham’s razor in a non-heuristic way, we need to quantify simplicity or complexity. Roughly, the complexity of a theory can be defined as the number of symbols one needs to write the theory down. More precisely, write down a program for the state evolution together with the initial conditions, and define the complexity of the theory as the size in bits of the file that contains the program. This quantification is known as algorithmic information or Kolmogorov complexity [6] and is consistent with our intuition, since an elegant theory will have a shorter program than an inelegant one, and extra parameters need extra space to code, resulting in longer programs [4,5].
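Kolmogorov complexity itself is incomputable, but a standard compressor gives a crude, computable upper bound on description length and conveys the intuition. A sketch (the example strings and lengths are arbitrary):

```python
import random
import zlib

# Compressed size as a crude, computable proxy for description length.
elegant = ("01" * 5_000).encode()  # a highly regular "theory": short program
random.seed(0)
random_bits = bytes(random.getrandbits(8) for _ in range(10_000))  # incompressible

print(len(zlib.compress(elegant)), len(zlib.compress(random_bits)))
```

The regular string compresses to a few dozen bytes, while the random one stays near its raw size: elegance shows up as a shorter code.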

**Standard model versus string theory.**To keep the discussion simple, let us pretend that standard model (SM) + gravity (G) and string theory (S) both qualify as ToEs. SM+Gravity is a mixture of a few relatively elegant theories, but contains about 20 parameters that need to be specified. String theory is truly elegant, but ensuring that it reduces to the standard model needs sophisticated extra assumptions (e.g., the right compactification).

**CToE selection principle.**It is trivial to write down a program for an All-a-Carte multiverse (A). It is also not too hard to write a program for the universal multiverse (U), see Section 6. Lengthwise (A) easily wins over (U), and (U) easily wins over (P) and (S), but as discussed (A) and (U) have serious defects. On the other hand, these theories can only be used for predictions after extra specifications: Roughly, for (A) this amounts to tabling the whole universe, (U) requires defining a ToE in the conventional sense, (P) needs 20 or so parameters and (S) a compactification scheme. Hence localization-wise (P) and (S) easily win over (U), and (U) easily wins over (A). Given this trade-off, it now nearly suggests itself that we should include the description length of the observer location in our ToE evaluation measure. That is, among competing complete ToEs we should select the one whose total description length, program plus observer localization, is minimal.

**ToE versus UToE.**Consider any (C)ToE and its program q, e.g., (P) or (S). Since (U) runs all programs including q, specifying q means localizing (C)ToE q in (U). So (U)+q is a CToE whose length is just some constant bits (the simulation part of (U)) more than that of (C)ToE q. So whatever (C)ToE physicists come up with, (U) is nearly as good as this theory. This essentially clarifies the paradoxical status of (U). Naked, (U) is a theory of nothing, but in combination with another ToE it becomes a good CToE, albeit one slightly longer, hence worse, than the latter.

**Localization within our universe.**So far we have only localized our universe in the multiverse, but not ourselves in the universe. To localize our Sun, we could, e.g., sort (and index) stars by their creation date, which the model (i)+(e) provides. Most stars last for 1-10 billion years (say an average of 5 billion years). The universe is 14 billion years old, so most stars may be 3rd generation (Sun definitely is), so the total number of stars that have ever existed should very roughly be 3 times the current number of stars of about ${10}^{11}\times {10}^{11}$. Probably “3” is very crude, but this doesn’t really matter for sake of the argument. In order to localize our Sun we only need its index, which can be coded in about ${\mathrm{log}}_{2}(3\times {10}^{11}\times {10}^{11})\doteq 75$ bits. Similarly we can sort and index planets and observers. To localize earth among the 8 planets needs 3 bits. To localize yourself among 7 billion humans needs 33 bits. Alternatively one could simply specify the $(x,y,z,t)$ coordinate of the observer, which requires more but still only very few bits. These localization penalties are tiny compared to the difference in predictive power (to be quantified later) of the various theories (ego/geo/helio/cosmo). This explains and justifies theories of large universes in which we occupy a random location.
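The bit counts above follow directly from base-2 logarithms of the quoted population sizes:

```python
from math import ceil, log2

stars_ever = 3 * 10**11 * 10**11   # ~3 generations x 10^11 galaxies x 10^11 stars each
print(ceil(log2(stars_ever)))      # 75 bits to index our Sun
print(ceil(log2(8)))               # 3 bits to index Earth among 8 planets
print(ceil(log2(7 * 10**9)))       # 33 bits to index one human among 7 billion
```

Even with crude population estimates, the logarithm makes these localization penalties robust: being off by a factor of 10 costs only about 3.3 extra bits.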

## 5. Complete ToE - Formalization

**Objective ToE.**Since we essentially identify a ToE with a program generating a universe, we need to fix some general-purpose programming language on a general-purpose computer. In theoretical computer science, the standard model is the so-called Universal Turing Machine ($\mathrm{UTM}$) [6]. It takes a program coded as a finite binary string $q\in {\{0,1\}}^{*}$, executes it, and outputs a finite or infinite binary string $u\in {\{0,1\}}^{*}\cup {\{0,1\}}^{\infty}$. The details do not matter to us, since the drawn conclusions are typically independent of them. In this section we only consider q with infinite output.

**Observational process and subjective complete ToE.**As we have demonstrated, it is also important to localize the observer. In order to avoid potential qualms with modeling human observers, consider as a surrogate a (conventional, not extra-cosmic) video camera filming=observing parts of the world. The camera may be fixed on Earth or installed on an autonomous robot. It records part of the universe u, denoted by $o={o}_{1:\infty}$. (If the lifetime of the observer is finite, we append zeros to the finite observation ${o}_{1:N}$.)

**CToE selection principle.**So far, s and q were fictitious subjects and universe programs. Let ${o}_{1:t}^{true}$ be the past observations of some concrete observer in our universe, e.g., your own personal experience of the world from birth till today. The future observations ${o}_{t+1:\infty}^{true}$ are of course unknown. By definition, ${o}_{1:t}$ contains all available experience of the observer, including e.g., outcomes of scientific experiments, school education, read books, etc.

## 6. Universal ToE - Formalization

**Definition of Universal ToE.**The Universal ToE generates all computable universes. The generated multiverse can be depicted as an infinite matrix in which each row corresponds to one universe.

**Partial ToEs.**Cutting the universes into bits and interweaving them into one string might appear messy, but is unproblematic for two reasons: First, the bijection $i=\langle q,k\rangle $ is very simple, so any particular universe string ${u}^{q}$ can easily be recovered from $\breve{u}$. Second, such an extraction will be included in the localization / observational process s, i.e., s will contain a specification of the relevant universe q and which bits k are to be observed.
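Both points can be illustrated with the classical Cantor pairing function as the bijection $i=\langle q,k\rangle$; the toy "universes" below are arbitrary bit patterns of my own choosing:

```python
def pair(q, k):
    """Cantor pairing: a simple bijection i = <q, k> between N x N and N."""
    return (q + k) * (q + k + 1) // 2 + k

# Interweave rows u^q into one string: bit i with i = <q, k> is u^q_k.
rows = {q: [(q * (k + 1)) % 2 for k in range(10)] for q in range(5)}  # toy universes
size = max(pair(q, k) for q in range(5) for k in range(10)) + 1
interwoven = [None] * size
for q, row in rows.items():
    for k, bit in enumerate(row):
        interwoven[pair(q, k)] = bit

# Recovering row q is trivial: read off positions <q, 0>, <q, 1>, ...
recovered = [interwoven[pair(3, k)] for k in range(10)]
print(recovered == rows[3])  # True
```

The extraction loop at the end is exactly the kind of simple program r that the observer process s can contain.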

**ToE versus UToE.**We can formalize the argument in the last section of simulating a ToE by UToE as follows: If $(q,s)$ is a CToE, then $(\breve{q},\tilde{s})$ based on UToE $\breve{q}$ and observer $\tilde{s}:=rqs$, where program r extracts ${u}^{q}$ from $\breve{u}$ and then ${o}^{qs}$ from ${u}^{q}$, is an equivalent but slightly larger CToE, since $\mathrm{UTM}(\tilde{s},\breve{u})={o}^{qs}=\mathrm{UTM}(s,{u}^{q})$ by definition of $\tilde{s}$, and $\mathrm{Length}(\breve{q})+\mathrm{Length}(\tilde{s})=\mathrm{Length}(q)+\mathrm{Length}(s)+O(1)$.

**The best CToE.**Finally, one may define the best CToE (of an observer with experience ${o}_{1:t}^{true}$) as the CToE $(q,s)$ that is consistent with ${o}_{1:t}^{true}$ and minimizes $\mathrm{Length}(q)+\mathrm{Length}(s)$.

## 7. Extensions

**Partial theories.**Not all interesting theories are ToEs. Indeed, most theories are only partial models of aspects of our world.

**Approximate theories.**Most theories are not perfect but only approximate reality, even in their limited domain. The geocentric model is less accurate than the heliocentric model, Newton’s mechanics approximates general relativity, etc. Approximate theories can be viewed as a version of partial theories. For example, consider predicting locations of planets with locations being coded by (truncated) real numbers in binary representation, then Einstein gets more bits right than Newton. The remaining erroneous bits could be tabled as above. Errors are often more subtle than simple bit errors, in which case correction programs rather than just tables are needed.

**Celestial example.**The ancient celestial models just capture the movement of some celestial bodies, and even those only imperfectly. Nevertheless it is interesting to compare them. Let us take as our corpus of observations ${o}_{1:t}^{true}$, say, all astronomical tables available in the year 1600, and ignore all other experience.

**Probabilistic theories.**Contrary to a deterministic theory that predicts the future from the past for sure, a probabilistic theory assigns to each future a certain chance that it will occur. Equivalently, a deterministic universe is described by some string u, while a probabilistic universe is described by some probability distribution $Q(u)$, the a priori probability of u. (In the special case of $Q({u}^{\prime})=1$ for ${u}^{\prime}=u$ and 0 else, Q describes the deterministic universe u.) Similarly, the observational process may be probabilistic. Let $S(o|u)$ be the probability of observing o in universe u. Together, $(Q,S)$ is a probabilistic CToE that predicts observation o with probability $P(o)={\sum}_{u}S(o|u)Q(u)$. A computable probabilistic CToE is one for which there exist programs (of lengths $\mathrm{Length}(Q)$ and $\mathrm{Length}(S)$) that compute the functions $Q(\cdot)$ and $S(\cdot|\cdot)$.
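A toy numerical instance of the mixture $P(o)=\sum_u S(o|u)Q(u)$, with two candidate universes and a bit-flip observation channel (all numbers are my own illustrative choices):

```python
# Prior Q over two toy universes, and a channel S(o|u) flipping each bit w.p. 0.1.
Q = {"0000": 0.5, "1111": 0.5}

def S(o, u, eps=0.1):
    """Probability of observing o in universe u under independent bit flips."""
    flips = sum(a != b for a, b in zip(o, u))
    return eps**flips * (1 - eps)**(len(o) - flips)

def P(o):
    """P(o) = sum_u S(o|u) Q(u)."""
    return sum(S(o, u) * Q[u] for u in Q)

print(round(P("0001"), 4))  # 0.0369
```

Observation "0001" is far more plausible as a noisy reading of universe "0000" than of "1111", and the mixture reflects exactly that.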

**Probabilistic examples.**Assume $S(o|o)=1\;\forall o$ and consider the observation sequence ${o}_{1:t}^{true}={u}_{1:t}^{true}=11001001000011111101101010100$. If we assume this is a sequence of fair coin flips, then $Q({o}_{1:t})=P({o}_{1:t})={2}^{-t}$ are very simple functions, but $|{\mathrm{log}}_{2}P({o}_{1:t})|=t$ is large. If we assume that ${o}_{1:t}^{true}$ is the binary expansion of π (which it is), then the corresponding deterministic Q is somewhat more complex, but $|{\mathrm{log}}_{2}P({o}_{1:t}^{true})|=0$. So for sufficiently large t, the deterministic model of π is selected, since it leads to a shorter code (5) than the fair-coin-flip model.
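One can verify that this string really is the beginning of π's binary expansion with exact integer arithmetic, e.g., via Machin's formula (a self-contained sketch; the guard-bit count is an arbitrary safety margin):

```python
def arctan_inv(x, one):
    """Return arctan(1/x) scaled by `one`, via 1/x - 1/(3x^3) + 1/(5x^5) - ..."""
    power = one // x
    total = power
    x2 = x * x
    n = 1
    while power:
        power //= x2
        n += 2
        term = power // n
        total += -term if n % 4 == 3 else term
    return total

bits, guard = 27, 16                 # 27 fractional bits, plus guard bits for rounding
one = 1 << (bits + guard)
pi_scaled = 16 * arctan_inv(5, one) - 4 * arctan_inv(239, one)  # Machin's formula
print(bin(pi_scaled >> guard)[2:])   # 11001001000011111101101010100
```

The 29 printed bits ("11" for the integer part 3, then 27 fractional bits) match the observation sequence in the text.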

**Theories with parameters.**Many theories in physics depend on real-valued parameters. Since observations have finite accuracy, it is sufficient to specify these parameters to some finite accuracy. Hence the theories including their finite-precision parameters can be coded in finite length. There are general results and techniques [4,5] that allow a comfortable handling of all this. For instance, for smooth parametric models, a parameter accuracy of $O(1/\sqrt{n})$ is needed, which requires $\frac{1}{2}{\mathrm{log}}_{2}n+O(1)$ bits per parameter. The explicable $O(1)$ term depends on the smoothness of the model and prevents ‘cheating’ (e.g. zipping two parameters into one).
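For a feel of the numbers, the per-parameter cost of roughly half a base-2 logarithm of the sample size grows very slowly:

```python
from math import log2

# Bits needed to code one parameter to accuracy O(1/sqrt(n)), ignoring the O(1) term.
for n in (100, 10_000, 1_000_000):
    print(n, round(0.5 * log2(n), 1))  # 3.3, 6.6, and 10.0 bits respectively
```

So even a million observations charge only about 10 bits per real-valued parameter, which is why finitely coded theories suffice.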

**Infinite and continuous universes.**So far we have assumed that each time-slice through our universe can be described in finitely many bits and time is discrete. Assume our universe were the infinite continuous 3+1 dimensional Minkowski space ${\mathbb{R}}^{4}$ occupied by (tiny) balls (“particles”). Consider all points $(x,y,z,t)\in {\mathbb{R}}^{4}$ with rational coordinates, and let $i=\langle x,y,z,t\rangle $ be a bijection to the natural numbers, similar to the dovetailing in Section 6. Let ${u}_{i}=1$ if $(x,y,z,t)$ is occupied by a particle and 0 otherwise. The string ${u}_{1:\infty}$ is an exact description of this universe. The above idea generalizes to any so-called separable mathematical space. Since all spaces occurring in established physical theories are separable, there is currently no ToE candidate that requires uncountable universes. Maybe continuous theories are just convenient approximations of deeper discrete theories. An even more fundamental argument put forward in this context by [9] is that the Löwenheim–Skolem theorem (an apparent paradox) implies that Zermelo–Fraenkel set theory (ZFC) has a countably infinite model. Since all physical theories so far are formalizable in ZFC, it follows that they all have a countable model. For some strange reason (possibly a historical artifact), the uncountable interpretation adopted so far just seems more convenient.

**Multiple theories.**Some proponents of pluralism and some opponents of reductionism argue that we need multiple theories on multiple scales for different (overlapping) application domains. They argue that a ToE is not desirable and/or not possible. Here I give a reason why we need one single fundamental theory (with all other theories having to be regarded as approximations): Consider two theories ($T1$ and $T2$) with (proclaimed) application domains $A1$ and $A2$, respectively.

## 8. Justification of Ockham’s Razor

Ockham’s razor could be regarded as correct if among all considered theories, the one selected by Ockham’s razor is the one that most likely leads to correct predictions.

**Assumptions.**Assume we live in the universal multiverse $\breve{u}$ that consists of all computable universes, i.e., UToE is a correct/true/perfect ToE. Since every computable universe is contained in UToE, it is, at least under the computability assumption, impossible to disprove this assumption. The second assumption we make is that our location in the multiverse is random. We can divide this into two steps: First, the universe ${u}^{q}$ in which we happen to be is chosen randomly. Second, our “location” s within ${u}^{q}$ is chosen at random. We call this the universal self-sampling assumption. The crucial difference to the informal anthropic self-sampling assumption used in doomsday arguments is discussed below.

A priori it is equally likely to be in any of the universes ${u}^{q}$ generated by some program $q\in {\{0,1\}}^{*}$.

**Counting consistent universes.**Let ${o}_{1:t}^{true}={u}_{1:t}^{true}$ be the universe observed so far and

**Probabilistic prediction.**Given observations ${u}_{1:t}^{true}$ we now determine the probability of being in a universe that continues with ${u}_{t+1:n}$, where $n>t$. Similarly to the previous paragraph we can approximately count the number of such universes:

**Ockham’s razor.**Relation (6) implies that the most likely continuation ${\widehat{u}}_{t+1:n}:={\mathrm{argmax}}_{{u}_{t+1:n}}P\left({u}_{t+1:n}\right|{u}_{1:t}^{true})$ is (approximately) the one that minimizes ${l}_{n}$. By definition, ${q}_{min}$ is the shortest program in ${Q}_{L}={\bigcup}_{{u}_{t+1:n}}{Q}_{L}^{n}$. Therefore

We are most likely in a universe that is (equivalent to) the simplest universe consistent with our past observations.

Ockham’s razor is correct under the universal self-sampling assumption.

**Discussion.**It is important to note that the universal self-sampling assumption has not by itself any bias towards simple models q. Indeed, most q in ${Q}_{L}$ have length close to L, and since we sample uniformly from ${Q}_{L}$ this actually represents a huge bias towards large models for $L\to \infty $.

**Comparison to anthropic self-sampling.**Our universal self-sampling assumption is related to anthropic self-sampling [18] but crucially different. The anthropic self-sampling assumption states that a priori you are equally likely to be any of the (human) observers in our universe. First, we sample from any universe and any location (living or dead) in the multiverse, and not only among human (or reasonably intelligent) observers. Second, we have no problem of deciding what counts as a reasonable (human) observer. Third, our principle is completely formal.

**No Free Lunch (NFL) myth.**Wolpert [23] considers algorithms for finding the minimum of a function, and compares their average performance. The simplest performance measure is the number of function evaluations needed to find the global minimum. The average is taken uniformly over the set of all functions from and to some fixed finite domain. Since sampling uniformly leads with (very) high probability to a totally random function (white noise), it is clear that on average no optimization algorithm can perform better than exhaustive search, and no reasonable algorithm (that is one that probes every function argument at most once) performs worse. That is, all reasonable optimization algorithms are equally bad on average. This is the essence of Wolpert’s NFL theorem and all variations thereof I am aware of, including the ones for less uniform distributions.
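The averaging argument is easy to verify computationally on a tiny domain. The sketch below is my own illustration, not Wolpert’s setup verbatim: it enumerates every function from a 4-element domain to a 3-element codomain and measures, for two different fixed probe orders, the average number of evaluations until the global minimum is first evaluated. By the permutation symmetry of the uniform average over functions, every such “reasonable” strategy scores exactly the same:

```python
from itertools import product
from fractions import Fraction

def evals_to_min(f, order):
    """Number of probes until the global minimum of f is first evaluated."""
    m = min(f)
    for i, x in enumerate(order, start=1):
        if f[x] == m:
            return i

domain, values = range(4), range(3)
orders = [(0, 1, 2, 3), (3, 1, 0, 2)]   # two 'reasonable' probe strategies
avgs = []
for order in orders:
    # average cost uniformly over all |values|^|domain| functions
    costs = [evals_to_min(f, order) for f in product(values, repeat=len(domain))]
    avgs.append(Fraction(sum(costs), len(costs)))
print(avgs)
```

Swapping in any other permutation of the domain leaves the average unchanged, which is the NFL statement in miniature: averaged uniformly over all functions, no probe-once strategy beats any other.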

**Some technical details${}^{*}$.**Readers not familiar with Algorithmic Information Theory might want to skip this paragraph. $P(u)$ in (6) tends for $L\to \infty $ to Solomonoff’s a priori distribution $M(u)$. In the definition of M [26] only programs of length $=L$, rather than $\le L$, are considered, but since $\lim_{L\to \infty}\frac{1}{L}\sum_{l=1}^{L}a_{l}=\lim_{L\to \infty}a_{L}$ if the latter limit exists, the two definitions are equivalent. Modern definitions involve a ${2}^{-l(q)}$-weighted sum over prefix programs q, which is also equivalent [6]. Finally, $M(u)$ is also equal to the probability that a universal monotone Turing machine with uniform random noise on the input tape outputs a string starting with u [20]. Further, $l\equiv \mathrm{Length}(q_{min})=Km(u)$ is the monotone complexity of $u:={u}_{1:t}^{true}$. It is a deep result in Algorithmic Information Theory that $Km(u)\approx -\log_2 M(u)$. For most u equality holds within an additive constant, but for some u only within logarithmic accuracy [6]. Taking the ratio of $M(u)\approx 2^{-Km(u)}$ for $u={u}_{1:t}^{true}{u}_{t+1:n}$ and $u={u}_{1:t}^{true}$ yields (6).
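Written out, the ratio argument in the last sentence reads as follows, with $l_n := Km(u_{1:t}^{true}u_{t+1:n})$ and $l := Km(u_{1:t}^{true})$ matching the code lengths used for (6) above:

```latex
P(u_{t+1:n} \mid u_{1:t}^{true})
  \;\approx\; \frac{M(u_{1:t}^{true}\,u_{t+1:n})}{M(u_{1:t}^{true})}
  \;\approx\; \frac{2^{-Km(u_{1:t}^{true}\,u_{t+1:n})}}{2^{-Km(u_{1:t}^{true})}}
  \;=\; 2^{-(l_n - l)}
```

so maximizing the continuation probability amounts to minimizing the monotone complexity $l_n$ of the completed history, as claimed in the Ockham’s razor paragraph.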

## 9. Discussion

**Summary.**I have demonstrated that a theory that perfectly describes our universe or multiverse, rather than being a Theory of Everything (ToE), might also be a theory of nothing. I have shown that a predictively meaningful theory can be obtained if the theory is augmented by the localization of the observer. This resulted in a truly Complete Theory of Everything (CToE), which consists of a conventional (objective) ToE plus a (subjective) observer process. Ockham’s razor, quantified in terms of code-length minimization, has been invoked to select the “best” theory (UCToE).

**Assumptions.**The construction of the subjective complete theory of everything rested on the following assumptions: $\left(i\right)$ The observer’s experience of the world consists of a single temporal binary sequence ${o}_{1:t}^{true}$. All other physical and epistemological concepts are derived. $\left(ii\right)$ There exists an objective world independent of any particular observer in it. $\left(iii\right)$ The world is computable, i.e., there exists an algorithm (a finite binary string) which when executed outputs the whole space-time universe. This assumption implies in particular that temporally stable binary strings exist. $\left(iv\right)$ The observer is a computable process within the objective world. $\left(v\right)$ The algorithms for universe and observer are chosen at random, which I called the universal self-sampling assumption.

**Implications.**As demonstrated, under these assumptions, the scientific quest for a theory of everything can be formalized. As a side result, this makes it possible to separate objective knowledge q from subjective knowledge s. One might even try to argue that if q for the best $(q,s)$ pair is non-trivial, this is evidence for the existence of an objective reality. Another side result is that there is no hard distinction between a universe and a multiverse; the difference is qualitative and semantic. Last but not least, another implication is the validity of Ockham’s razor.

**Conclusion.**Respectable researchers, including Nobel Laureates, have dismissed and embraced every single model of the world mentioned in Section 2, at different times in history and concurrently. (Excluding All-a-Carte ToEs, which I haven’t seen discussed before.) As I have shown, the Universal ToE is the sanity critical point.

## List of Notation

| Symbol | Meaning |
|---|---|
| G,H,E,P,S,C,M,U,R,A,... | specific models/theories defined in Section 2 |
| T | $\in \{$G,H,E,P,S,C,M,U,R,A,...$\}$ theory/model |
| ToE | Theory of Everything (in any sense) |
| ToE candidate | a theory that might be a partial, perfect, or wrong ToE |
| UToE | Universal ToE |
| CToE | Complete ToE (i+e+l+n+o) |
| UCToE | Universal Complete ToE |
| theory | model which can explain ≈ describe ≈ predict ≈ compress observations |
| universe | typically refers to the visible/observed universe |
| multiverse | un- or only weakly connected collection of universes |
| predictive power | precision and coverage |
| precision | the accuracy of a theory |
| coverage | how many phenomena a theory can explain/predict |
| prediction | refers to unseen, usually future observations |
| computability assumption | that our universe is computable |
| ${q}^{T}\in {\{0,1\}}^{*}$ | the program that generates the universe modeled by theory T |
| ${u}^{q}\in {\{0,1\}}^{\infty}$ | the universe generated by program q: $\phantom{\rule{1.em}{0ex}}{u}^{q}=\mathrm{UTM}\left(q\right)$ |
| $\mathrm{UTM}$ | Universal Turing Machine |
| $s\in {\{0,1\}}^{*}$ | observation model/program; extracts o from u |
| ${o}^{qs}\in {\{0,1\}}^{\infty}$ | subject s’s observations in universe ${u}^{q}$: $\phantom{\rule{1.em}{0ex}}{o}^{qs}=\mathrm{UTM}(s,{u}^{q})$ |
| ${o}_{1:t}^{true}$ | true past observations |
| $\breve{q},\breve{u}$ | program and universe of UToE |
| $S(o\mid u)$ | probability of observing o in universe u |
| $Q(u)$ | probability of universe u (according to some prob. theory T) |
| $P(o)$ | probability of observing o |

## References

- Hutter, M. Human Knowledge Compression Prize, 2006. Available online: http://prize.hutter1.net/ (accessed on 20 September 2010).
- Stove, D.C. Popper and After: Four Modern Irrationalists; Pergamon Press: Oxford, UK, 1982.
- Gardner, M. A Skeptical Look at Karl Popper. Skeptical Inquirer **2001**, 25, 13–14, 72.
- Wallace, C.S. Statistical and Inductive Inference by Minimum Message Length; Springer: Berlin, Germany, 2005.
- Grünwald, P.D. The Minimum Description Length Principle; The MIT Press: Cambridge, MA, USA, 2007.
- Li, M.; Vitányi, P.M.B. An Introduction to Kolmogorov Complexity and its Applications, 3rd ed.; Springer: Berlin, Germany, 2008.
- Hutter, M. On Universal Prediction and Bayesian Confirmation. Theor. Comput. Sci. **2007**, 384, 33–48.
- Smolin, L. Scientific Alternatives to the Anthropic Principle. 2004; arXiv:hep-th/0407213v3.
- Schmidhuber, J. Algorithmic Theories of Everything. Report IDSIA-20-00; IDSIA: Manno (Lugano), Switzerland, 2000; arXiv:quant-ph/0011122.
- Harrison, E. Cosmology: The Science of the Universe, 2nd ed.; Cambridge University Press: Cambridge, UK, 2000.
- Barrow, J.D.; Davies, P.C.W.; Harper, C.L. Science and Ultimate Reality; Cambridge University Press: Cambridge, UK, 2004.
- Hutter, M. Instantons in QCD: Theory and Application of the Instanton Liquid Model. PhD thesis, Faculty for Theoretical Physics, LMU Munich, Munich, Germany, 1996.
- Blumhofer, A.; Hutter, M. Family Structure from Periodic Solutions of an Improved Gap Equation. Nucl. Phys. **1997**, B484, 80–96; missing figures in B494 (1997) 485.
- Tegmark, M. Parallel Universes. In Science and Ultimate Reality; Cambridge University Press: Cambridge, UK, 2004; pp. 459–491.
- Schmidhuber, J. A Computer Scientist’s View of Life, the Universe, and Everything. In Foundations of Computer Science: Potential - Theory - Cognition; LNCS Vol. 1337; Springer: Berlin, Germany, 1997; pp. 201–208.
- Tegmark, M. The Mathematical Universe. Found. Phys. **2008**, 38, 101–150.
- Barrow, J.; Tipler, F. The Anthropic Cosmological Principle; Oxford University Press: Oxford, UK, 1986.
- Bostrom, N. Anthropic Bias; Routledge: Oxford, UK, 2002.
- Alchin, N. Theory of Knowledge, 2nd ed.; John Murray Press: London, UK, 2006.
- Hutter, M. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability; Springer: Berlin, Germany, 2005.
- Gell-Mann, M. The Quark and the Jaguar: Adventures in the Simple and the Complex; W.H. Freeman & Company: New York, NY, USA, 1994.
- Schmidhuber, J. Hierarchies of Generalized Kolmogorov Complexities and Nonenumerable Universal Measures Computable in the Limit. Int. J. Found. Comput. Sci. **2002**, 13, 587–612.
- Wolpert, D.H.; Macready, W.G. No Free Lunch Theorems for Optimization. IEEE Trans. Evolut. Comput. **1997**, 1, 67–82.
- Stork, D. Foundations of Occam’s Razor and Parsimony in Learning. NIPS 2001 Workshop, 2001. Available online: http://www.rii.ricoh.com/~stork/OccamWorkshop.html (accessed on 20 September 2010).
- Schmidhuber, J.; Hutter, M. Universal Learning Algorithms and Optimal Search. NIPS 2001 Workshop, 2002. Available online: http://www.hutter1.net/idsia/nipsws.htm (accessed on 20 September 2010).
- Solomonoff, R.J. A Formal Theory of Inductive Inference: Parts 1 and 2. Inf. Contr. **1964**, 7, 1–22 and 224–254.
- Hutter, M. Sequential Predictions based on Algorithmic Complexity. J. Comput. Syst. Sci. **2006**, 72, 95–117.

© 2010 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Hutter, M.
A Complete Theory of Everything (Will Be Subjective). *Algorithms* **2010**, *3*, 329-350.
https://doi.org/10.3390/a3040329
