1. Introduction
Consider a large statistical system formed by a mixture of particles of different kinds that remains in thermal equilibrium for an arbitrarily long span of time. According to statistical mechanics, there are fluctuations at thermal equilibrium, and in principle all configurations can be reached by such fluctuations given enough time. Consider one of these random fluctuations giving rise—just by chance—precisely to a brain like ours, complete with our memories, information and perception, but no corresponding external physical world. Such a brain—indistinguishable from ours, but arising as a fleeting fluctuation—is called a “Boltzmann brain” (BB) [1,2,3]. Such a brain would have exactly our current memories and perceptions, and it would know and feel precisely what we know and feel right now.
Is it possible to know that we are not a BB? An intuitive answer is that yes, we can know that we are not a BB, since the probability for such a fluctuation to occur is fantastically small, and the expected time for it to happen is colossally longer than the current age of the universe. However, recent literature [1,2,4,5,6] has questioned this simple answer. Counter-counter-arguments have then replied that the reasoning behind the BB hypothesis is “cognitively unstable”, that it is in fact “unstable reasoning” [7,8,9,10].
To understand the issues involved in these disputes, it is worth describing the standard argument for the BB hypothesis in a bit more detail. To simplify matters, for the most part we restrict attention to classical physics, since much of the controversy also imposes that restriction. The associated standard argument for the BB hypothesis starts by pointing out that we currently have two particular sets of information:
- (i)
A particular set of microscopic laws of physics, including in particular Newton’s laws. Note that all of these laws are symmetric under time inversion.
- (ii)
The set of all present observations that we have concerning the physical universe. This includes all of our current records, recorded data, and memories.
What can we say about the physical universe, based on these two sets of information? The set of laws (i) admits a large family of solutions. For simplicity, for the moment we restrict ourselves to those solutions that are consistent with (ii), in the sense that (ii) has non-infinitesimal likelihood under those solutions. Similarly, let us assign equal probability to all of the solutions that we restrict ourselves to.
Building on the widely accepted set of laws of modern classical and quantum physics, the fluctuation theorems of stochastic thermodynamics and open quantum thermodynamics have established that the evolution of entropy in all statistical physics systems is a stochastic process [11,12,13]. It is also widely accepted that (in non-cosmological contexts) this process:
Is time translation invariant;
Is symmetric under time-inversion;
Has the property that if it is conditioned on a value of entropy $s$ at one particular time $t$, and not conditioned on anything else, and if $s$ is sufficiently lower than the maximal value of entropy, then the expected value of entropy at times greater than $t$ is greater than $s$.
We call this set of three presumptions the entropy conjecture. We call any process satisfying the first two of these presumptions a Boltzmann process (whether or not it involves physical entropy).
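In symbols (writing $S_t$ for the value of the universe’s entropy at time $t$; this schematic notation is ours, and a fully formal Markov-process version is given in Section 3.2), the three presumptions read:
$$P(S_{t+k} \in A \mid S_t = s) \;=\; P(S_{t'+k} \in A \mid S_{t'} = s) \quad \text{for all } t, t', k \quad \text{(time translation invariance)};$$
$$P(S_{t+k} \in A \mid S_t = s) \;=\; P(S_{t-k} \in A \mid S_t = s) \quad \text{for all } k > 0 \quad \text{(time-inversion symmetry)};$$
$$\mathbb{E}\left[S_{t'} \mid S_t = s\right] \;>\; s \quad \text{for all } t' > t, \ \text{whenever } s \text{ is sufficiently far below the maximal entropy.}$$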
In addition to the fact that (i) implies that the universe’s entropy evolves as a Boltzmann process, (ii) includes the current value of the universe’s entropy, and in particular the fact that it is far lower than its maximum. (In other words, our current data tell us the current value of the universe’s entropy, albeit only with limited precision.) Crucially, the H theorem says that with extraordinarily high probability, entropy increases both into the future and into the past, from any such time at which entropy is known to have some specific low value. (Indeed, the entropy conjecture is time-symmetric about such a special time.) In other words, the most probable situation, given the current observations, is that we happen to be precisely at a special point in the dynamics of the universe’s entropy: the most probable situation is that we are just an entropy fluctuation, which is to say that we are a BB.
This standard argument for the BB hypothesis initially arose in the context of de Sitter quantum cosmology [1,2,4,5]. If such a universe evolves into the future, it asymptotes to a vacuum state with a finite, non-zero temperature. This equilibrium state has fluctuations that act similarly to thermal fluctuations in a classical gas at equilibrium, satisfying an entropy conjecture and corresponding recurrence theorems.
Indeed, suppose that the set of laws (i) implies thermalization of the universe in the distant future, so that the universe settles into a long-lived equilibrium state (Boltzmann’s “thermal death”). Then, even if the universe started off in a low entropy state, (ii) will be realized an infinite number of times (up to arbitrary precision) in the distant future. This then raises the question of how we could conclude that the present moment, with our data (ii), is not one of these fluctuations, which occur with probability 1.
One may respond to these arguments that we also know the second law of thermodynamics, and from this we know that entropy must grow with time. This directly contradicts the BB hypothesis. However, our knowledge of the second law arose from consideration of our data records about the past: how do we know that these are data records of the past and are not themselves due to a fluctuation? A common response is to simply assume that entropy was in fact low in the past and has been growing since. But this amounts to a (very) improbable assumption, given the entropy conjecture and (ii). Those data are vastly more likely to be the result of a fluctuation than to be the result of low entropy in the distant past.
Round and round the arguments go. A common problem with all these arguments is that they are, at best, semi-formal. The only way to resolve their apparent conflicts is to use fully rigorous reasoning, grounded in stochastic process theory. That is what we do here.
In a nutshell, our analysis starts by recognizing that what Boltzmann proved (though he did not have the tools to formulate it this way) is that the stochastic process of entropy (or in his formulation, the H quantity) is a time-stationary Markov process, symmetric under time inversion, which is biased towards high values.
Crucially, to compute a distribution over possible trajectories of a random variable that evolves according to any stochastic process, it is first necessary to say precisely what (if anything) that distribution is conditioned on. In the context of the entropy conjecture, that means specifying a (perhaps empty) set of pairs of a value of entropy, and an associated time when the universe had that value of entropy.
Any such specified conditioning events cannot arise from the entropy conjecture; indeed, they are not provided by physics at all. Rather, such events must be determined using Bayesian reasoning, based on our current observational data and the prior we adopt over the possible laws of the universe.
As we show, this simple observation, formulated in a fully rigorous manner, disentangles the ambiguities and circular reasoning that are rife in the literature on Boltzmann brains, and more generally in the literature on the past hypothesis (PH), and even in the literature on the second law. In particular, our analysis shows that the PH and the BB hypothesis are, formally speaking, identical, in that both condition on only a subset of our current data, differing only in which such data they condition on. We also show that arguments about “unstable reasoning” do not in fact formally prove anything concerning the validity of the BB hypothesis, only something about one particular argument that has been offered for that validity. We also formalize some of the properties of all circular reasoning, illustrating that they apply to common arguments for the second law.
We emphasize that we do not make any arguments for or against the BB hypothesis, the PH, or the second law. Indeed, as we prove, ultimately any such arguments must rely on choices about what data to condition the H theorem’s stochastic process on, and that choice cannot be provided by physics. Rather, we provide a novel, fully formal framework for investigating those concepts and their very subtle relations.
We also emphasize that we do not dispute the major role of cosmology in the second law of thermodynamics [14,15,16], nor do we investigate precisely which models predict BBs and which do not. Indeed, there are even major arguments that a framework for studying BBs depending strictly on the entropy conjecture, without cosmology, is incomplete. It has long been known that the BB issue is highly sensitive to schemas for assigning probabilities to values of the scale factor in FLRW scalar field cosmology models [6,17,18]. We simply start from the working basis of pure statistical physics to unify and contextualize a recent genre of arguments in the literature.
We begin in the next section by summarizing one of the more nuanced, semi-formal counter-arguments that can be made against the BB hypothesis. A central concern with the counter-argument presented there is how we can reason from current data to conclusions about how experiments were initialized in the past, and from there infer laws of the universe. The key conundrum we highlight is that this inference process itself relies on the very laws being inferred, and so is circular. (Arguments in the literature that reach similar conclusions have characterized the standard argument for the BB hypothesis as “unstable reasoning”; see [7,8] for some work related to this counter-argument.)
In Section 3 we begin by introducing the mathematical machinery we will need to disentangle the arguments concerning the BB hypothesis. Our first contribution is to introduce a restriction on sigma algebras that is necessary if (as in all analyses of the PH, the second law, etc.) we use them to investigate probability measures over possible “laws of physics”. We then show how to fully formalize the entropy conjecture as a Markov process; this is our second contribution. It builds crucially on the results in [19].
Our third contribution is to use this formalization of the entropy conjecture to clarify what it cannot tell us. In contrast, our fourth contribution is to clarify what the entropy conjecture does in fact tell us concerning the BB hypothesis specifically. Next, our fifth contribution is to formalize our reasoning in Section 2 within a general framework. We then use that framework’s formalization of the entropy conjecture to analyze what arguments concerning unstable reasoning can and cannot tell us about the BB hypothesis. After this we present our sixth contribution, which is a variant of the BB hypothesis we call the “1000 CE hypothesis”. As we show, formally speaking, the 1000 CE hypothesis lies exactly between the standard BB hypothesis and the PH. This illustrates the underlying identity between the standard BB hypothesis and the PH.
We end with a discussion, and then present some of the myriad future areas of research that our investigation suggests.
2. An Objection to the Standard Argument for Boltzmann Brains
We write $L$ for the (classical) laws of physics as we currently understand them (the set (i) above), and $D$ for our current observational data (the set (ii) above). In addition, we write $BB$ for the Boltzmann brain hypothesis. Finally, as shorthand, we write any statement that a particular conditional probability $P(B \mid A)$ is close to 1 as $A \Rightarrow B$.
To recap from the introduction, the standard argument holds that the conditional probability $P(BB \mid L, D)$ of the BB hypothesis, given the laws of physics and the data observed in the present, is near 1. Therefore, in terms of our shorthand, that standard argument says that
$$L, D \;\Rightarrow\; BB. \qquad (1)$$
The problem with the standard argument that we wish to highlight is that we infer $L$ from $D$—but that inference in turn invokes those very laws we wish to infer. Specifically, it is now understood that the way our brains work physically must ultimately rely on the second law of thermodynamics in order to infer that our current data $D$ accurately records the results of past experimental tests [20,21], even though (obviously) we are not conscious of using the second law when we access our memories. In turn, the second law is contained in $L$. Thus, in particular, we need to use the second law to infer from our current data the results of our past experimental tests of the second law itself. Therefore, we have circular reasoning in the very conditioning events we wish to use to ground the standard argument for the BB hypothesis.
To elaborate on this problem, let us focus on the laws $L$. How do we know the laws of the universe that we claim to know? By knowing only some present data $D$, i.e., properties of the current universe, we cannot infer these laws. This is true even when $D$ contains the results of experimental tests of those laws. This is because the laws $L$ are relations between the values of the physical variables at different times; hence we cannot know or infer anything about them unless we also know data about the world at a time different from the present. In other words, we also need to know how those experimental tests were initialized in the past, in addition to the results of those tests recorded in $D$. That is, we do not directly have the past data $D_{\rm past}$; rather, we only have present records that purport to describe $D_{\rm past}$.
How do we access the past data $D_{\rm past}$? As described in [20,22], to do this we have to trust the reliability of records or memories we have at present that concern these past data. For notational simplicity, we indicate these memories as parts of $D$. Let us call $M$ the assumption that such memories of past data are statistically reliable, i.e., that they accurately reflect the ways that our experimental apparatus(es) were established in the past [23]. Therefore the combination $D, M$ establishes $D_{\rm past}$. Therefore writing $L, D, M$ is equivalent to writing
$$L, D_{\rm past}, D, M. \qquad (2)$$
Now, let us consider the two cases separately: whether we make the assumption $M$ or we do not. If we do, then by Equation (2) we also have the past data $D_{\rm past}$, and therefore $P(BB \mid L, D_{\rm past}, D, M)$ is close to 0. Bayes’ theorem then says
$$P(BB \mid L, D, M) \;=\; \frac{P(M \mid BB, L, D)\, P(BB \mid L, D)}{P(M \mid L, D)}.$$
Even though the second term in the numerator on the RHS, $P(BB \mid L, D)$, is close to 1 (by hypothesis), the first term, $P(M \mid BB, L, D)$, is close to 0: if we were a BB, our apparent records of $D_{\rm past}$ would themselves be part of the fluctuation, and so would not be reliable. Therefore $P(BB \mid L, D, M)$ is close to 0. Therefore, returning to our simplified notation,
$$L, D, M \;\Rightarrow\; \neg BB.$$
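A minimal numerical sketch of this update may make the mechanism vivid; all of the probability values below are hypothetical, chosen only for illustration, and the variable names are ours:
p_bb_given_ld = 0.999       # P(BB | L, D): the standard argument makes this close to 1
p_m_given_bb = 1e-12        # P(M | BB, L, D): reliable records are essentially impossible for a BB
p_m_given_not_bb = 0.99     # P(M | not-BB, L, D): records are reliable if we are not a fluctuation
p_not_bb_given_ld = 1.0 - p_bb_given_ld
# Law of total probability for the denominator P(M | L, D):
p_m_given_ld = p_m_given_bb * p_bb_given_ld + p_m_given_not_bb * p_not_bb_given_ld
# Bayes' theorem for P(BB | L, D, M):
p_bb_given_ldm = p_m_given_bb * p_bb_given_ld / p_m_given_ld
print(p_bb_given_ldm)  # roughly 1e-9: conditioning on M drives the posterior towards 0
However small or large the illustrative numbers are chosen to be, the qualitative conclusion is the same: a likelihood for $M$ that is essentially zero under $BB$ overwhelms a prior for $BB$ that is arbitrarily close to 1.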
What about if we instead do not assume $M$? In this case, we have no argument in support of $L$. Of course, $L$ does not follow from $D$ alone, without $M$. Without an assumption about the reliability of inferences about the past from current data, i.e., without an assumption about the reliability of memory systems [20,22], we have no reason to believe in BBs. Data at a single moment of time are compatible with any dynamical law, namely with both $L$ and with laws different from $L$. Absent other assumptions, they characterize the state of the universe at a single time, at best. They are not informative about the laws of the universe.
One may object: we can consider $L$ not as a component of our knowledge, but rather as a fact of the world. That is, suppose we say: the world is truly governed by the laws $L$, irrespective of our knowledge. What can we then deduce from this fact, combined with the fact that we know $D$? This appears to circumvent the observation that our knowledge of $L$ depends on our knowledge of records and on the assumption $M$. But this objection does not hold. By themselves, events either happen or do not; facts are either true or false. Reasoning about likelihood can only be based on incomplete knowledge and assumptions. The only reasonable question here is the likelihood that we assign to the possibility of being BBs, and this only depends on what we consider reliable. Our considering $L$ to be reliable is grounded in our considering records to be reliable, hence in our not being in a fluctuation. That is, considering $L$ to be reliable relies on the assumption $M$—which in turn ultimately relies on $L$. So the reasoning is circular.
3. Relation Between the Entropy Conjecture, Boltzmann Brains, and the Second Law
The discussion in the preceding sections may leave a sense of confusion. Many parts of the arguments recounted above involving the BB hypothesis might seem to involve circular reasoning. Such reasoning can only (!) establish that the priors it relies on are actually inconsistent. In particular, the fact that an argument engages in circular reasoning does not tell us anything about how valid its conclusions are, just that the priors it relies on do not themselves establish those conclusions. (See Appendix B for a Bayesian definition of “circular reasoning”, and an analysis of what it does and does not imply.)
Another possible concern is that the arguments are only semi-formal, relying on “common experience”. However, in any investigation concerning foundational issues in thermodynamics (like whether BBs accord with our present data records), if one does not use fully formal mathematical reasoning it is extremely easy to subtly assume what one is in fact trying to prove. For example, that mistake is made in very many of the papers in the literature trying to prove the second law. Specifically, the analyses in many of these papers implicitly privilege the initial condition of a physical system (e.g., how it was set up) over its final condition (e.g., how it finishes). That asymmetry in boundary conditions assumed by those arguments is, ultimately, the basis for the “derivation” of the second law. However, our privileging the initial conditions in turn arises from the psychological arrow of time—which is commonly believed to be a consequence of the asymmetry of the second law. Therefore such a supposed derivation involves circular reasoning. The same mistake of circular reasoning has also been made in many earlier investigations of the BB hypothesis.
Fortunately, there is a formally rigorous framework that was recently introduced in the context of proving the second law, specifically to avoid this mistake of circular reasoning [19]. In the rest of this section we build on that rigorous framework to fully formalize our arguments concerning the second law and BBs in earlier sections. This formal rigor ensures that there are no “gotchas” hiding in our central argument, and in particular that we do not implicitly assume the very thing we are trying to prove. This formalization also makes clear just how different our argument is from any that has previously been considered in the literature.
Before presenting this formal version of our central argument, we need to say something about our terminology. Throughout this text we will often refer to “Newtonian laws” as shorthand for all of classical physics. With some more care about the precise meaning of “space-time events”, this shorthand could be extended even further to also include quantum mechanics and general relativity.
3.1. Using Probability Distributions to Reason About Newton’s Laws
To begin, note that much of the literature (and in particular our central argument) ascribes probabilities to various possible laws of physics, e.g., to the second law. However, to meaningfully speak about a probability distribution over laws, we need to at least sketch how physical laws can be formulated in terms of sets in a sigma algebra that that probability distribution is defined over. In particular, to fully clarify the (in)validity of the BB hypothesis, the PH, and the second law, we need to pay scrupulous attention to how we reason from current data to probabilistic conclusions concerning the laws of the universe.
In this paper, for completeness, we provide a rigorous definition of what it means to have a probability measure over a space of possible physical laws of the universe. Our definition involves a sigma algebra $\Sigma$ defined over a state space whose elements we will call “space-time events”. For maximal generality, we do not define that state space any further. We then define “laws” to be special kinds of partitions over the sigma algebra $\Sigma$. This is our first contribution—a fully rigorous formalization of how to define probability measures over laws of the universe [24].
However, the details of this contribution of ours, and in particular of how we identify elements of such partitions of $\Sigma$ as possible laws of the universe, do not directly arise in our analysis of the Boltzmann brain hypothesis. Accordingly, the details of our definition are consigned to Appendix A. The reader who wishes to confirm that we do not run afoul of these restrictions on the sigma algebra $\Sigma$ is invited to consult that appendix.
With this issue resolved, we have the machinery to reason from current data to conclusions concerning the laws of the universe, i.e., to investigate posterior distributions over laws given various data sets, e.g., “P(Newton’s laws hold | observational data D)”. In this paper we will mostly be interested in such observational data sets D that comprise the current state of multiple “scientific records” stored in some of our books, academic papers, computer memories, etc. Central to our analysis will be the suppositions (properly formalized via information theory) that there is high probability that every such record that we consider accurately provides the results of experiments that take place at some time other than the present.
We label the first such observational data set we are interested in as $D_N$. $D_N$ consists of current scientific records which lead to the conclusion that our universe obeys Newtonian mechanics (assuming those records do in fact provide the results of the associated experiments, which are in our past). In other words, they are the current state of some physical “memory system” that we suppose (!) is statistically correlated with the state of some experimental outcomes in the past [25], where those experiments involved Newton’s laws. Stated more formally, if $D_N$—the current state of an associated physical memory system—does in fact accurately provide the results of experiments conducted in our past, then P(Newton’s laws hold | observational data $D_N$) would be high. (In this paper we will not need to precisely quantify what it means for some probability to be “high”.) Below we will refer to this situation as $D_N$ being “reliable”.
Unfortunately though, just by itself, $D_N$ cannot establish that it “does in fact accurately provide the results of experiments conducted in our past with high probability”. As a prosaic example of this problem, no set of lab notebooks with squiggles in them can, by themselves, imply that those squiggles record the results of experiments concerning Newtonian mechanics that were actually done in the past—ultimately, one must simply assume that they have such implications with high probability. No a priori reasoning can allow us to conclude that any such $D_N$ causes P(Newton’s laws hold | observational data $D_N$) to be high.
To get past this roadblock we need to, in essence, assume that $D_N$ does indeed accurately provide the results of past experiments. More precisely, we need to adopt a prior distribution over the set of space-time events that says that we can indeed use the records $D_N$, which are the current state of an associated memory system, to infer with high probability the results of those earlier experiments, which in turn imply Newton’s laws. While these details do not concern us here, it is important to note that we can make this reasoning concerning prior distributions and our data fully formal. To do so, in the terminology of [20] we say that $D_N$ is reliable if it is the current state of a memory system, and there is high restricted mutual information between $D_N$ and the results of those particular experiments in our past which are related to Newton’s laws [26].
It is important to emphasize that every paper concerning the second law, the past hypothesis, or Boltzmann brains, has assumed the laws of physics hold. Many of those papers go on to make additional assumptions as well, either explicitly or implicitly. Here we are maximally conservative, not making any extra assumptions at all, beyond the laws of physics.
As shorthand, below we will say “our prior is that the data $D_N$ is reliable” if the combination of our prior over the set of space-time events, together with the likelihood function over $D_N$ conditioned on those events, gives a high posterior probability that $D_N$ is in fact reliable. Under such circumstances, we have high posterior probability that the contents of $D_N$ accurately reflect the results of earlier experiments. From now on, unless explicitly stated otherwise, we will assume that we always have such a data set $D_N$.
3.2. The Entropy Conjecture
Now that we have elaborated on how a data set providing the current state of a memory system can lead us to believe Newton’s laws, we can start to investigate the thermodynamic implications of those laws. One of the myriad contributions of Boltzmann was, of course, his argument that the entropy conjecture follows from Newton’s laws. (More precisely, of course, he argued that Newton’s laws lead to the H theorem; the entropy conjecture can be viewed loosely as a generalization of that theorem.) In modern language, his argument concerns stochastic processes, Markov processes in particular. (See [27,28] for background on stochastic processes.) However, Boltzmann derived his results well before Kolmogorov formalized the notion of stochastic processes, and in particular before anyone had started to systematically investigate Markov processes. As a result, even though the mathematics behind his analysis was correct, his formulation of those results does not explicitly involve stochastic processes.
Unfortunately, this precise failure to formulate arguments concerning the dynamics of the universe’s entropy as a stochastic process has led to widespread failure to fully grasp the mathematics of the entropy conjecture, and how to use it to reason formally about the second law, BBs, etc. To undertake such reasoning one must formulate the entropy conjecture in terms of stochastic processes.
To that end, we now present a generalized version of the entropy conjecture, as a specific class of Markov processes. While we will use the term “entropy” to mean the random variable of this process, the underlying mathematics of this kind of process is more general. Write this Markov process as $x_t$, defined over a (compact, real-valued) state space $X$. We write $x_t$ for the value of that process at the time $t$, i.e., the entropy of the universe then. (Note that while $X$ is the state space of a random variable, $t$ is an index, distinguishing instances of that random variable.) Refer to a distribution conditioned on $n$ time-indexed entropy values as an “$n$-time-conditioned” distribution. For example, $P(x_t \mid x_{t_1}, x_{t_2})$ is a two-time-conditioned distribution over $x_t$.
We will say a process is a Boltzmann process if it has a time-symmetric and time-translation invariant one-time-conditioned distribution:
$$P(x_{t+k} = x' \mid x_t = x) \;=\; P(x_{t-k} = x' \mid x_t = x) \;=\; P(x_{t'+k} = x' \mid x_{t'} = x), \qquad (6)$$
where $k > 0$ and $t'$ is arbitrary. The first equality expresses time symmetry (also called time reversal invariance) about a particular time $t$. The second expresses time translation invariance. The particular stochastic process derived in Boltzmann’s H theorem can be viewed as a kind of Boltzmann process. With $k$ implicit, we refer to the conditional distribution in Equation (6) as the kernel that generates the Boltzmann process. (Because of the time-translation invariance and time symmetry of the process, the arguments of the kernel are not indexed by any specific values of time.)
The definition of a Boltzmann process involves probability distributions conditioned on values of the process at single times only. However, in many situations we want to compute the conditional probability of the value of a Boltzmann process conditioned on its values at two times. In particular, in this paper we will be interested in $P(x_t \mid x_{t_1}, x_{t_2})$ for times $t$ such that $t_1 \le t \le t_2$. Fortunately, as is shown in [19], for Boltzmann processes we can express the two-time-conditioned distribution purely in terms of the kernel of the process:
$$P(x_t = x \mid x_{t_1} = a,\, x_{t_2} = b) \;=\; \frac{P(x_{t_2} = b \mid x_t = x)\, P(x_t = x \mid x_{t_1} = a)}{P(x_{t_2} = b \mid x_{t_1} = a)} \qquad (7)$$
for all $t_1 \le t \le t_2$.
We refer to this result as the “Boltzmann generation lemma”. Note that due to Markovianity, conditioning on an additional value $x_{t_3}$ has no effect on this conditional distribution if $t_3$ lies outside the interval $[t_1, t_2]$. Therefore, the lemma provides a broad suite of $n$-time-conditioned Boltzmann distributions, for $n$ arbitrarily large [29].
As an aside, an important property of Boltzmann processes is that their marginal distribution must be the same at all times $t$, i.e., $P(x_t)$ must be independent of $t$. (This follows from the fact that the kernel is time-translation invariant.) It immediately follows that that marginal distribution equals the fixed point distribution of the kernel [19].
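As a concrete toy illustration of these properties (our own construction, not taken from [19]; the definition above concerns a compact real-valued state space, whereas this sketch uses a discrete-time chain on a finite set, with arbitrary parameter choices), one can build a reversible “entropy” random walk whose stationary distribution is heavily biased towards high values and check the defining symmetries numerically:
import numpy as np
# Toy "entropy" chain on states 0..N: a Metropolis random walk whose stationary
# distribution pi(s) is proportional to exp(beta * s), i.e., biased towards high values.
N, beta = 50, 0.3
pi = np.exp(beta * np.arange(N + 1))
pi /= pi.sum()
K = np.zeros((N + 1, N + 1))  # K[s, s2] = P(x_{t+1} = s2 | x_t = s), independent of t
for s in range(N + 1):
    for s2 in (s - 1, s + 1):
        if 0 <= s2 <= N:
            K[s, s2] = 0.5 * min(1.0, pi[s2] / pi[s])
    K[s, s] = 1.0 - K[s].sum()
# Time-translation invariance holds by construction: K does not depend on t.
# Stationarity: pi is a fixed point of the kernel, so the marginal is the same at all times.
assert np.allclose(pi @ K, pi)
# Time symmetry: for the stationary chain the backward kernel P(x_{t-1} = s2 | x_t = s),
# computed via Bayes' theorem from pi and K, equals the forward kernel.
K_backward = (pi[None, :] * K.T) / pi[:, None]
assert np.allclose(K_backward, K)
Because the stationary distribution is concentrated at high values, conditioning this chain on a low value at a single time and propagating either forward or backward yields an expected value that rises in both time directions; the sketch after the next subsection’s definitions makes that explicit.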
As implied in the discussion above, the standard argument for the BB hypothesis involves Boltzmann processes that are conditioned on the pair $(t_{\rm now}, s_{\rm now})$: a single, specific value of entropy $s_{\rm now}$ at a single, specific moment in time $t_{\rm now}$ (namely, the present). As we describe in detail below, successive stages in that standard argument all involve conditioning on that pair $(t_{\rm now}, s_{\rm now})$. In particular, even if we exploit Bayes’ theorem to condition on other quantities, those other quantities we condition on are all in addition to conditioning on $(t_{\rm now}, s_{\rm now})$.
Accordingly, in our analysis of the BB hypothesis we will focus on (Boltzmann) stochastic processes that are “pre-conditioned”, i.e., always conditioned on some specific set of events $C$, whatever other set of events they might be conditioned on as well. It will be convenient to introduce some special terminology to refer to such stochastic processes.
Let $x_t$ be a stochastic process over a state space $X$, and let $C$ be an associated set of events, i.e., a set of time–value pairs. We will say that a different stochastic process $y_t$ is equivalent to the Boltzmann process $x_t$, nailed to the (nailed) set of events $C$, if
$$P_y(y_{t_1} = x_1, \ldots, y_{t_n} = x_n) \;=\; P_x(x_{t_1} = x_1, \ldots, x_{t_n} = x_n \mid C) \qquad (8)$$
for all sets of times $\{t_1, \ldots, t_n\}$ and associated sets of joint states $\{x_1, \ldots, x_n\}$, where the subscripts indicate which process the probabilities refer to. In other words, the unconditioned distribution over joint states under the process $y_t$ is the same as that of the Boltzmann process $x_t$ at those times conditioned on $C$. Intuitively, we “nail down” some of the time-state pairs of the underlying Boltzmann process $x_t$ to produce $y_t$. It immediately follows from the definition in Equation (8) that for all sets of events $B$,
$$P_y(\,\cdot \mid B) \;=\; P_x(\,\cdot \mid B, C). \qquad (9)$$
In the sequel, especially when discussing nailed data sets, it will sometimes be convenient to rewrite the pair of an index $t$ and associated value $x$ as a pair $(t, x)$, or even abuse notation and write it as $x_t$. As shorthand, we will say that a stochastic process over a state space $X$ is singleton-equivalent if it is equivalent to a Boltzmann process nailed to a single event, $C = \{(t, x)\}$.
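To make nailing concrete, the following self-contained sketch (reusing the toy kernel construction given earlier; the nailed times and entropy values are hypothetical) contrasts a chain nailed to a single low-entropy event at the “present” with one nailed both to a very low value at a “Big Bang” time and to the present value, using the bridge form of Equation (7):
import numpy as np
# Same toy Boltzmann-process kernel as in the previous snippet.
N, beta = 50, 0.3
pi = np.exp(beta * np.arange(N + 1)); pi /= pi.sum()
K = np.zeros((N + 1, N + 1))
for s in range(N + 1):
    for s2 in (s - 1, s + 1):
        if 0 <= s2 <= N:
            K[s, s2] = 0.5 * min(1.0, pi[s2] / pi[s])
    K[s, s] = 1.0 - K[s].sum()
states = np.arange(N + 1)
def kernel_power(n):
    # n-step kernel obtained by composing the one-step kernel n times
    return np.linalg.matrix_power(K, n)
# (a) Nailed to the singleton (t_now, s_now) with a low s_now: expected entropy k steps
#     away from the nailed time. For this stationary reversible chain the backward k-step
#     kernel equals the forward one, so the same numbers apply in both time directions.
s_now = 10
for k in (1, 5, 20):
    print(k, kernel_power(k)[s_now] @ states)  # rises in *both* directions away from s_now
# (b) Nailed to two events, (t_bang, s_bang) and (t_now, s_now), separated by T steps.
#     Bridge formula, as in Equation (7): P(x_t = x | x_bang, x_now) is proportional to
#     P(x_now | x_t = x) * P(x_t = x | x_bang).
s_bang, T = 2, 60
expected_path = []
for t in range(T + 1):
    p = kernel_power(T - t)[:, s_now] * kernel_power(t)[s_bang, :]
    p /= p.sum()
    expected_path.append(p @ states)
print(np.round(expected_path, 1))
# With these nailed values the expected trajectory climbs from s_bang towards s_now;
# its minimum is at the "Big Bang" end, not at the "present".
This is, of course, only a caricature of the physically relevant processes, but it makes the logical point of this section tangible: what the posterior expected entropy trajectory looks like is fixed by the nailed set, not by the kernel alone.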
As an important example, the so-called “past hypothesis” (PH) is the informal idea that the second law of thermodynamics can be derived from the entropy conjecture if we condition the Boltzmann process of the universe’s entropy on the value of the universe’s entropy at the time of the Big Bang [15,21,22,30]. Stating this more formally, the PH involves a singleton-equivalent stochastic process, where the singleton concerns the time of the Big Bang. (This description of the PH is elaborated on in a fully formal manner below.)
We now have the tools to precisely state what Boltzmann did show—and what he did not. Stated formally, what Boltzmann showed was that if Newtonian mechanics holds, then the H function evolves as a Markov process having a time-symmetric and time-translation invariant kernel, i.e., as a Boltzmann process. Therefore, the assumption that $D_N$ is reliable tells us that the stochastic process governing the dynamics of H is a Boltzmann process. This clarification of how to properly formalize the entropy conjecture is our second contribution.
Note that Boltzmann did not prove that the H function evolves as some specific singleton-equivalent stochastic process. In particular, he did not prove that it evolves as a Boltzmann process nailed to a singleton data set $\{(t_{\rm now}, s_{\rm now})\}$, where $t_{\rm now}$ is the present. Nor did he prove, for the PH, that the H function evolves as a Boltzmann process nailed to a singleton data set $\{(t_{\rm BigBang}, s_{\rm BigBang})\}$, where $t_{\rm BigBang}$ is the time of the Big Bang or some such. This further clarification of what the entropy conjecture does not say is our third contribution.
We need more than just a fully formal entropy conjecture to analyze the BB hypothesis, though. To properly understand that hypothesis we also need a properly motivated basis for determining just what nailed data set we should use. To do this, we can invoke the copious literature on desiderata motivating Bayesian reasoning. Many compelling arguments, ranging from de Finetti’s Dutch book arguments [31] to Cox’s axiomatization of a “calculus of reasoning under uncertainty” [32] to the derivation of subjective Bayesian decision theory based on Savage’s axioms [33], all establish the normative rule that whenever we perform scientific reasoning, we should only consider probability distributions that are conditioned on precisely all of the current observational data that we have, and not on anything more than that. (See [34] for a comprehensive, though now somewhat dated, review of the relevant literature on Bayesian reasoning.)
This normative rule is a formalization of what we meant above when we said “all evidence can be doubted, but doubting selectively leads to shaky conclusions.” In the context of the introductory argument, whether or not we conditioned on reliability determined whether or not we could negate the BB hypothesis. Here, we emphasize that if we assume a prior of reliability, then all data assumed reliable needs to be conditioned on. In other words, the normative rule implies that we should include in the nailed set all time-entropy pairs that we have high belief in, and no others. We will refer to this injunction as the (nailed set) construction rule.
3.3. The Boltzmann Brain Hypothesis and the Entropy Conjecture
We now have the machinery to rigorously investigate the BB hypothesis. To begin, we define the BB hypothesis formally as a triple of suppositions:
- (i) The entropy conjecture holds for a kernel generating a Boltzmann process $x_t$;
- (ii) The stochastic process giving the entropy values of the universe is equivalent to $x_t$ nailed to the singleton $\{(t_{\rm now}, s_{\rm now})\}$, where $t_{\rm now}$ is the present and $s_{\rm now}$ is the universe’s entropy at present;
- (iii) The expectation value of the stationary distribution of $x_t$ is far greater [35] than $s_{\rm now}$.
The immediate implication of the BB hypothesis would be that entropy increases going into our past, i.e., that we are BBs.
As pointed out above, Boltzmann proved that the first of the suppositions defining the BB hypothesis follows from Newton’s laws (and therefore from the reliability of $D_N$). However, the other two suppositions are just as crucial for having a “BB” as the term is usually understood. For current purposes it is safe to simply assume that item (iii) of the BB hypothesis holds, by incorporating that assumption into our prior. (While cosmological arguments indicate that that assumption holds, there are subtleties that are irrelevant for current purposes.)
Item (ii) of the BB hypothesis is far more problematic, however. The entropy conjecture does not specify that the stochastic process generating the trajectory of entropy values of our universe is singleton-equivalent. Therefore, the posterior probability of the laws of the universe conditioned only on the data $D_N$ is not peaked about the BB hypothesis, as that hypothesis is defined above. This clarification of what the entropy conjecture does (not) imply concerning the BB hypothesis is our fourth contribution.
In light of this clarification, is there any way we might infer the BB hypothesis from the assumption only that ($D_N$ is reliable and therefore) the entropy conjecture holds, as the standard arguments for the BB hypothesis in the literature claim to do? To answer this question, first define $D_{BB}$ to be some set of scientific records with the property that if the entropy conjecture holds (i.e., if the first supposition of the BB hypothesis holds), and $D_{BB}$ is reliable, then the other two suppositions of the BB hypothesis hold with high posterior probability, conditioned on both $D_N$ and $D_{BB}$. We will use the term “standard argument” for the BB hypothesis to mean the assumption that there is such a $D_{BB}$, and so suppositions (ii) and (iii) of the BB hypothesis hold.
Unfortunately, whether or not we have such a $D_{BB}$, we definitely do have yet another, third data set, which we will call $D_E$, and which (via the nailed set construction rule) contradicts supposition (ii) of the BB hypothesis. It is convenient to partition this set of data into two distinct subsets, $D_{\rm now}$ and $D_{\rm PH}$, and discuss them separately.
First, whatever else we might infer concerning the entropy of the universe at various times, we certainly have at least as much data leading us to infer the value of the universe’s entropy at the present, $s_{\rm now}$. $D_{\rm now}$ is the data leading us to conclude that the entropy at the present is that particular value. (More formally, we adopt a prior such that $D_{\rm now}$ is reliable, where $D_{\rm now}$ implies that equality with high probability.) Invoking the construction rule, this means that the singleton $(t_{\rm now}, s_{\rm now})$ should be added to the nailed set.
The second component of $D_E$, $D_{\rm PH}$, is a set of data with the property that if the entropy conjecture holds (and assuming a prior such that $D_{\rm PH}$ is reliable), that set of data would imply that at the time of the Big Bang, or shortly thereafter, the universe had some specific entropy value $s_{\rm BigBang}$ that is far lower than $s_{\rm now}$. A very large number of such data sets are provided by modern observational cosmology. Invoking the construction rule, this means that the pair $(t_{\rm BigBang}, s_{\rm BigBang})$ should also be added to the nailed set, in addition to $(t_{\rm now}, s_{\rm now})$. This implies that our nailed data set should have two time-indexed entropy values in it.
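Concretely, by the Boltzmann generation lemma of Equation (7), the posterior expected entropy at any intermediate time under such a two-moment nailed set can be written (schematically, suppressing measure-theoretic details) as
$$\mathbb{E}\big[x_t \,\big|\, x_{t_{\rm BigBang}} = s_{\rm BigBang},\; x_{t_{\rm now}} = s_{\rm now}\big] \;=\; \int_X dx\; x\; \frac{P(x_{t_{\rm now}} = s_{\rm now} \mid x_t = x)\, P(x_t = x \mid x_{t_{\rm BigBang}} = s_{\rm BigBang})}{P(x_{t_{\rm now}} = s_{\rm now} \mid x_{t_{\rm BigBang}} = s_{\rm BigBang})}, \qquad t_{\rm BigBang} \le t \le t_{\rm now}.$$
It is the behaviour of this expectation as a function of $t$, for the given kernel and nailed values, that determines whether expected entropy has its minimum at the present (as the BB hypothesis requires) or decreases monotonically back towards the Big Bang.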
As we describe below, depending on the precise parameters of the associated Boltzmann process and the precise values in this two-moment nailed data set, the associated posterior expectation of the values of the universe’s entropy would not monotonically increase as we move further into our past from the present. Therefore, at least for those precise values in the nailed data set, the three suppositions that jointly lead to the BB hypothesis would not all hold. Indeed, $D_E$ would not just contradict the usual argument establishing the BB hypothesis; it would in fact rule out the BB hypothesis. (See Figure 1.)
As an aside, suppose that before the advent of observational data indicating the existence of the Big Bang in our past, we did not have the data set $D_{\rm PH}$, only $D_{\rm now}$. Suppose that in those times before Hubble, scientists nonetheless had our current theoretical understanding of the Big Bang, the PH, why memory is temporally asymmetric, etc. Arguably, if that had been the case, then those scientists should have concluded (absent other arguments) that we were Boltzmann brains. See Section 4.
3.4. The Past Hypothesis and the Second Law
Does the kind of analysis given above provide a reason to doubt the PH? What about the second law itself? The answers, respectively, are “yes”, and “depends what you mean”.
To see this, first we introduce a fully formal definition of the PH, just like our formal definition of the BB hypothesis, as a set of three logically independent suppositions:
- (i) The entropy conjecture holds for a kernel generating a Boltzmann process $x_t$;
- (ii) The stochastic process giving the entropy values of the universe is equivalent to $x_t$ nailed to the singleton $\{(t_{\rm BigBang}, s_{\rm BigBang})\}$, where $t_{\rm BigBang}$ is the time of the Big Bang (or some specific time shortly thereafter) and $s_{\rm BigBang}$ is the entropy of the universe at $t_{\rm BigBang}$;
- (iii) The expectation value of the stationary distribution of $x_t$ is far greater than $s_{\rm BigBang}$.
Note that this formal definition of the PH is almost identical to the formal definition of the BB hypothesis, with only the second of the three assumptions being (slightly) different in the two cases.
To begin, in direct analogy to the case with the BB hypothesis, simply assume that item (iii) of the PH holds by incorporating that assumption into our prior. Next, suppose we had data and adopted associated priors that led us to conclude that item (i) of the PH holds, and that the nailed data set should include $(t_{\rm BigBang}, s_{\rm BigBang})$. For example, this could be because we have data $D_N$ and $D_{\rm PH}$, and assume that both are reliable. Suppose though that we did not have any data $D_{\rm now}$.
If we made no other assumptions and had no other data, then by the construction rule we would conclude that the stochastic process of the values of entropy is a Boltzmann process singleton-equivalent to $\{(t_{\rm BigBang}, s_{\rm BigBang})\}$. In other words, it would establish item (ii) of the PH to go with items (i) and (iii). Therefore, the PH would hold in its entirety.
That in turn would lead us to conclude that the value of the universe’s entropy between the Big Bang and now was monotonically increasing in expectation, and that it would continue to increase into our future. (More precisely, that data together with those assumptions would mean that the expected value of the marginal distributions of that process at single times $t$ increased monotonically with $t$.)
Therefore, a prior that $D_N$ and $D_{\rm PH}$ are reliable, together with the assumption that item (iii) of the PH holds, and no other data, would imply that the PH holds, and therefore that the second law holds. Note that the second law would in turn mean that entropy decreases the further back we go into our past from the present. However, as mentioned above, the BB hypothesis would imply that entropy increases going backward in time from the present. Thus, the PH would both imply the second law and rule out the BB hypothesis. This is precisely the case considered in Section 2 with the assumption $M$.
It is important to emphasize though that in addition to $D_N$ and $D_{\rm PH}$, we have yet another type of data that we typically assume to be reliable. In particular, it seems hard to dispute that $s_{\rm now}$, the value of the universe’s entropy at the current time, is known at least as reliably as $s_{\rm BigBang}$, the universe’s entropy at the time of the Big Bang. (In this regard, note that modern cosmology only infers $s_{\rm BigBang}$ indirectly, through an extremely long line of reasoning involving lots of experimental data of many different kinds.)
The pair $(t_{\rm now}, s_{\rm now})$ is precisely the time-entropy pair given by the first component of $D_E$, discussed above when analyzing the formal foundations of the BB hypothesis. According to the construction rule, our analysis is flawed if we condition on only one or the other of the two pairs $(t_{\rm now}, s_{\rm now})$ and $(t_{\rm BigBang}, s_{\rm BigBang})$; that is the flaw in the BB hypothesis discussed above. Note though that this flaw in the standard argument for the BB hypothesis is precisely replicated in the PH. Both lines of reasoning violate the construction rule (and therefore contradict all the desiderata underlying Bayesian reasoning), since they selectively leave out some reliable data from the nailed data set while keeping some other data. In fact, arguably the PH is more guilty of this transgression than the standard argument for the BB hypothesis, since the reliability of the first component of $D_E$ seems more indisputable than that of the second component of $D_E$. This means that any physicist willing to accept the PH should instead adopt the BB hypothesis (absent other arguments).
Of course, it is possible to combine the suppositions of both hypotheses, simply by supposing the full data set $D_E$, including both subsets, is reliable. Doing this would lead us to conclude that the evolution of the universe’s entropy is equivalent to a Boltzmann process nailed to the pair $\{(t_{\rm BigBang}, s_{\rm BigBang}), (t_{\rm now}, s_{\rm now})\}$, as illustrated in Figure 1. Reference [19] analyzes Boltzmann processes that are nailed to such a pair of time–entropy pairs. It is shown there that for certain parameters of the Boltzmann process and certain values in the nailed data set, the associated posterior expected entropy would have been increasing as we start to go backwards in time, into our very recent past. This would imply that the second law is wrong in our recent past. Expected entropy would still decrease, though, once we go far enough backward in time towards the Big Bang. Therefore, the second law would be correct on long enough timescales into our past. Moreover, we would still have the usual second law going forward from the present. Thus, all experimental tests of the second law we might do in our future would (according to the entropy conjecture, etc.) still confirm the second law. Such a scenario could be viewed as a highly attenuated variant of the BB hypothesis, one not necessarily having any implications about carefully arranged sensory inputs to our brains and/or carefully arranged photons streaming in to our telescopes, etc.
3.5. What the Entropy Conjecture Does Not Imply
Interestingly, the second law provides a major way—perhaps the only way, in fact—for us to conclude that any scientific records are reliable. This is because of the role of the second law in establishing reliability [22,30], or stated more formally, because of the role of the second law in real-world type-3 memory systems [20]. Therefore, assuming the second law holds, it provides the only way we know of to conclude that $D_N$ is reliable, along with establishing the reliability of either subset of $D_E$, or even of some $D_{BB}$. Given the results of Section 3.4, this means that the only known way to establish that the second law holds is, ultimately, to assume that the second law holds.
Therefore, the second law is consistent with the priors that imply it—but only by virtue of the fact that we can adopt priors that $D_N$ and $D_E$ are reliable, or that the second law holds (or even both). But we cannot somehow avoid making any prior assumptions at all, in an attempt to bootstrap ourselves into placing high probability on having $D_N$ or $D_E$ be reliable and on the second law of thermodynamics holding. (See Appendix B.1 for a Bayesian formalization of circular reasoning and the associated derivation of what circular reasoning does and does not imply.)
Suppose that we misunderstood the foregoing and thought that if $D_N$ were reliable, that would somehow not only imply the entropy conjecture but also imply:
- (a) The associated evolution of the entropy of the universe is a Boltzmann process that is singleton-equivalent, for a time $t_{\rm now}$ set to the present, with $s_{\rm now}$ set to the entropy of the universe at the present;
- (b) If the second law holds (both at the present and at times in our past), then it follows that $D_N$ is in fact reliable. (In other words, there is a type-3 memory system that would rely on the second law in order to establish that $D_N$ is reliable.)
Suppose as well that the second law is true. That would mean (by (b)) that $D_N$ is reliable, and therefore that Newton’s laws hold. This in turn would mean (as Boltzmann showed) that the entropy conjecture holds. By (a), we could then conclude that the BB hypothesis holds.
However, this derivation of the BB hypothesis would mean that the second law does not hold. That would contradict the second supposition (b) of this very derivation of the BB hypothesis. Note that this is more than using circular reasoning to try to establish some hypothesis or other. Rather, it is using reasoning that contradicts itself to try to establish a particular hypothesis. This seeming self-contradiction is a version of what is sometimes called unstable reasoning. (See Appendix B.2.)
Of course, the foregoing flaw in a particular derivation of the BB hypothesis does not mean that the BB hypothesis is false—it just means that the reasoning for that hypothesis given above contradicts itself. More precisely, as we prove in Appendix B.1, unstable reasoning just means that one is implicitly trying to use a joint prior distribution over all associated random variables that is not a proper probability distribution. Therefore, it is simply saying that one needs to adopt a different prior. This clarification that issues of unstable reasoning have no bearing on the BB hypothesis is our fifth contribution.
In addition, though, it is interesting to note that nobody has ever explicitly invoked the second law as in (b) to justify the reliability of any data concerning the past they wish to use, be that data $D_N$ or anything else. Rather, we now understand that all memory systems that allow us to treat the current state of a memory as providing information concerning the past must, ultimately, rely on the second law in their underlying physical mechanism [36]. Moreover, before this modern understanding, say a century ago, there would have been no reason to suppose that the second law was in any way required for us to (have faith that we can) use our experimental data as though it provides accurate information about the past state of our experimental apparatuses. There would not have been any reason to suppose that the second law is itself in any way needed to derive the second law itself. And no current textbook or paper in the literature uses the second law that way. “Unstable reasoning” never explicitly arises in any of the arguments for the BB hypothesis one can find in the literature.
Finally, it is important to emphasize that there is a simple variant of the BB hypothesis that does not suffer from any form of “cognitive instability”. Note that, formally speaking, the analysis in Section 2 says that the BB hypothesis, defined as a Boltzmann process singleton-equivalent to the present time, when $s_{\rm now}$ is observed, does not hold (if we assume $M$). Moreover, it actually also implies that the dynamics of the universe’s entropy is equivalent to a Boltzmann process nailed to a pair of times, namely the times of $D$ and $D_{\rm past}$. However, as discussed above, this could in turn imply that entropy increases going into the past from the time of $D_{\rm past}$, the time when we started the experiments that led us to believe $L$. In other words, that argument not only says that the present is not a Boltzmann brain—it also says that the time in our past when we started our experiments is a Boltzmann brain, i.e., that entropy increases going into the past from that time we started our experiments, as well as going forward from it, to our present.
It is illuminating to push this scenario further, by introducing a slight variant of the BB hypothesis. This variant is just like the standard BB hypothesis in that it supposes that the universe fluctuates from a state of maximal entropy down to some minimal entropy value, after which the entropy of the universe starts growing again. The difference is that rather than supposing that the minimum of the entropy of the universe occurs at the present, under the 1000 CE BB hypothesis it occurs at 1000 CE [37]. More formally, the “1000 CE BB hypothesis” is the same three suppositions that were used to define the BB hypothesis in Section 3.3, except that $t_{\rm now}$ is changed to the year 1000 CE:
- (i) The entropy conjecture holds for a kernel generating a Boltzmann process $x_t$;
- (ii) The stochastic process giving the entropy values of the universe is equivalent to $x_t$ nailed to the singleton $\{(t_{1000}, s_{1000})\}$, where $t_{1000}$ is the year 1000 CE and $s_{1000}$ is the universe’s entropy at that time;
- (iii) The expectation value of the stationary distribution of $x_t$ is far greater than $s_{1000}$.
Under the 1000 CE BB hypothesis, the second law would hold, so our memory systems (scientific records) would accurately record the results of all experiments conducted in the last thousand years. Therefore the data set $D_N$, which implies the entropy conjecture, would be reliable. Therefore cognitive instability arguments do not apply, even though a version of the BB hypothesis is still true [38].
Note that we could keep pushing the time of the entropy minimum further and further back into the past, from the present to 1000 CE and ultimately all the way back to the Big Bang. At that point we have a variant of the BB hypothesis which is just the fluctuation hypothesis version of the PH [39]. Moreover, as discussed above, we do have a data set—$D_{\rm PH}$—that we think gives us some confirmation that entropy was indeed quite low at the Big Bang. Therefore one must be careful before making sweeping claims about whether any particular variation of the BB hypothesis could or could not be confirmed by data, since the PH is itself only such a variant. These formal definitions of the BB hypothesis, the PH, and the 1000 CE hypothesis are our last contribution.
As a final comment, as pointed out above, this same kind of data problem holds for the second law, the PH, etc., in that none of them can be either confirmed or refuted solely from data. Any data-based assessment of any of them ultimately relies on making assumptions for prior probabilities. In our analysis in this paper, we have chosen to only adopt the prior that $D_N$ is reliable, which we take to be a necessary minimal assumption for any investigation of these issues [40]. Of course, this prior removes the need to invoke the second law to establish the reliability of $D_N$. Therefore there is no cognitive instability under this prior.
4. How and Why Intuition Leads Us Astray
Many of the mathematical arguments and derivations presented above defy common sense and intuition. In part, one might dismiss this issue. After all, since the advent of quantum physics and the general theory of relativity, physicists have become accustomed to following the math wherever it may lead, even when it results in counterintuitive conclusions.
Nonetheless, it is worth teasing apart the precise reasons why our results are counterintuitive, to allay associated fears. Some instances of specific intuitions being defied were discussed above, in both the introduction and Section 3. Since these are such subtle points, however, it is worth describing some of the other strong intuitions that are at odds with our formal results, why those intuitions are so strong, and just how it is that they can be so misleading.
First, most obviously, almost everyone feels, very deeply, that their memories are accurate, that they correctly depict some information about a real external world and its states in the recent past. To understand how this deep-felt intuition could be completely wrong, it is necessary to proceed through several steps, to understand the source of this intuition:
To begin, note that human memory is, formally speaking, a physical system (namely, our brains) whose current state provides statistical information about the likely state of some external physical system, at a different time (namely, information about the external world that we believe we remember). Or at least, we presume that to be the case.
Formalizing memory this way, we immediately notice a mystery: why do we have so much confidence that the current state of our brains provides such accurate information about the past state of the external world, but not about the future state of the external world? Phrased differently, why do we believe our innate ability at retrodiction is so much greater than our ability at prediction?
Presumably, our belief in the accuracy of our memories of the past does not have anything to do with our knowledge of the laws of physics. Training in theoretical thermodynamics is certainly not required for people to believe their memories are accurate (!). Rather, this belief seems to be built into the functioning of our brains, into how our neocortices are wired, so to speak. Why are our brains constructed this way, so that we trust only our memories of the past?
To address that mystery requires us to provide a physical model of memory. However, human memory involves some extraordinarily complicated processes in the human brain, processes we know precious little about [41].
Accordingly, the great majority of investigations of the human memory system have not considered it directly. Rather, they have considered far simpler examples of memory systems that we can investigate in detail, which provide non-biological instances of memory [15,20,21,22,30,42]. A common example of such a memory system is the surface of the moon. The surface of the moon is the memory system, akin to the human brain, whose current state (namely, the craters on the moon’s surface) provides information about the state of physical systems external to it in the past (namely, that there were meteors on a collision course with the moon in the past). Another common example is a line of footprints on an otherwise smooth beach. The beach is the memory system, akin to the human brain, whose current state (namely, the footprints on it) provides information about the state of physical systems external to it in the past (namely, that there was a person walking across that beach).
What do we learn from these examples of simple memory systems concerning the asymmetry of our memory? Do the common features of these examples of memory systems help us understand why we have so much more confidence in retrodiction rather than prediction?
In fact, these examples do have one striking feature in common: all of them seem to rely on the second law of thermodynamics. The temporal asymmetry of these memory systems, and by extension of our human brains, seems to reflect nothing more than the simple fact that entropy increases in time. Indeed, the precise reason we are so very sure about some of our memories seems to reflect the ironclad nature of the second law.
In other words, the reason we seem to have so much confidence in our memory ultimately resides in the fact that those memories rely on the second law.
This line of reasoning would seem to mean that we should have complete faith in our memories, given that the second law cannot be violated. Unfortunately, though, this line of reasoning raises a question: how is it that the physical universe, whose (relevant) microscopic laws are all time-reversal symmetric, could give rise to the asymmetry of the second law? The arguments supporting the second law offered by Boltzmann and those who followed him were all based on the microscopic laws of physics, and therefore had to be time-inversion symmetric. Formally, they all state that if at a certain time t entropy (or a similar quantity, like Boltzmann’s H function) is low, then entropy increases going both forward from t and backward from t [15]. This implies that entropy should increase going both forward and backward from the present.
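A minimal numerical sketch of this last statement (ours, using the standard Ehrenfest urn model as a stand-in for the universe's entropy; the parameters and thresholds below are arbitrary) conditions a time-symmetric, stationary process on a low entropy value at a single time and then looks at the mean entropy on both sides of that time:

```python
import math, random

N, T = 20, 200_000                  # particles in the Ehrenfest urn; length of the run
def entropy(n):
    """Boltzmann entropy (log multiplicity) of the macrostate 'n particles in the left box'."""
    return math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)

random.seed(0)
n, traj = N // 2, []
for _ in range(T):                  # time-translation-invariant, time-reversible dynamics
    n += 1 if random.random() < (N - n) / N else -1
    traj.append(n)

S_max = entropy(N // 2)
# Condition on a single time at which entropy happens to be low, and on nothing else:
low_times = [t for t in range(40, T - 40) if entropy(traj[t]) < 0.75 * S_max]
for dt in (-30, -10, 0, 10, 30):
    avg = sum(entropy(traj[t + dt]) for t in low_times) / len(low_times)
    print(f"offset {dt:+3d}: mean entropy {avg:5.2f}  (max {S_max:.2f})")
# The mean entropy rises on BOTH sides of the conditioned dip,
# exactly as the time symmetry of the process requires.
```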
In other words, our analysis of why we believe our memories to be accurate has reduced the mystery of human memory to the infamous paradox ascribed to Loschmidt.
What would this time-symmetric reasoning of Loschmidt (and others) mean physically? What would it mean for entropy to increase going into our past, despite our apparent memories of the past, which seem to rely on entropy having been lower then than it is now? Among other things, it would have to mean that we have no reason to believe our memories are accurate: they would not tell us anything about the state of the external world in the past; they would only appear to give us that information.
In short, our intuition concerning our memories leads us greatly astray. In quantum mechanics and relativity, that which seems “obviously” true is deeply problematized, whether by the results of experiment (quantum mechanics) or by pure mathematical reasoning (general relativity). The same, evidently, is true of thermodynamics.
Just as when we analyze event horizons or the quantum Zeno effect, when we analyze the second law, memory, and Boltzmann brains we are almost guaranteed to go astray unless we rely on mathematical reasoning to guide us, consigning intuition to a purely secondary role.
Reasonable as the arguments just presented might be in the abstract, how, concretely, can they hold? How could all of our memories concerning the past be fallacious? How could entropy increase into our past, as required by the time-symmetric nature of all derivations of the second law that are consistent with the microscopic laws of physics, rather than decrease? How could it be that our memories are wrong?
Such flaws in our memory would require some exquisite fine-tuning: all the neurons in our brains would have to happen to be in states encoding particular memories of events that never in fact occurred. Amazingly though, standard arguments of statistical physics tell us that this is almost infinitely more likely than for entropy to continue to decrease into our past, as demanded by the second law.
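Schematically, the comparison invoked here is the standard Einstein–Boltzmann fluctuation estimate (the symbols below are ours, not a calculation taken from the body of this paper): the probability of a spontaneous fluctuation with entropy deficit $\Delta S$ below equilibrium scales as
\[
P_{\mathrm{fluct}} \;\propto\; e^{-\Delta S / k_B},
\]
so the odds of a brain-sized fluctuation relative to a fluctuation encompassing the entire low-entropy past are of order
\[
\frac{P_{\mathrm{brain}}}{P_{\mathrm{low\text{-}entropy\ past}}}
 \;\sim\; \exp\!\left[\frac{\Delta S_{\mathrm{past}} - \Delta S_{\mathrm{brain}}}{k_B}\right] \;\gg\; 1,
\]
since the entropy deficit of the entire past universe dwarfs that of a single brain.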
This conclusion is precisely the Boltzmann brain hypothesis. As we show above, it too has subtleties that can only be addressed through careful mathematical reasoning.
5. Discussion
In this paper, we extended recent advances in stochastic process theory in order to disentangle the second law, the reliability of human memory, the BB hypothesis, and the Past Hypothesis (PH) with full formal rigor.
To do this, we first established the following preliminary results (some of which are similar to earlier results in the literature):
- (i)
We showed that the standard argument in favor of the BB hypothesis can be seen as contradicting itself, by implicitly assuming the second law;
- (ii)
We formalized Boltzmann’s entropy conjecture as the claim that the H function evolves as a particular type of stochastic process, a “Boltzmann process”, and we derived some new, useful properties of Boltzmann processes (a schematic statement of the defining conditions appears after this list);
- (iii)
We pointed out that Boltzmann’s reasoning does not prove that the H function evolves as a Boltzmann process conditioned on the value of H at a single time (any more than he proved that it evolves as a Boltzmann process conditioned on the value of H at two times, or at zero times). This disproves the common interpretation of what Boltzmann proved;
- (iv)
We introduced formal definitions, in terms of Boltzmann processes, of the BB hypothesis, of the Past Hypothesis (PH), and of a variant of the BB hypothesis that does not suffer from any form of cognitive instability, which we call the 1000CE hypothesis;
- (v)
We pointed out that, due to (iii), the entropy conjecture has nothing to say one way or the other about the legitimacy of the BB hypothesis, the second law, the PH, or the 1000CE hypothesis.
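For reference, one schematic way to state the two defining conditions of a Boltzmann process invoked in (ii) is the following (our notation here; the formal definitions in the body of the paper are the authoritative ones): for the entropy process $S(t)$, every finite collection of times $t_1,\dots,t_n$, and every shift $\tau$,
\[
\bigl(S(t_1+\tau),\dots,S(t_n+\tau)\bigr) \,\overset{d}{=}\, \bigl(S(t_1),\dots,S(t_n)\bigr)
\qquad \text{(time-translation invariance)},
\]
\[
\bigl(S(-t_1),\dots,S(-t_n)\bigr) \,\overset{d}{=}\, \bigl(S(t_1),\dots,S(t_n)\bigr)
\qquad \text{(time-inversion symmetry)},
\]
where $\overset{d}{=}$ denotes equality in distribution.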
We then used these preliminary results to disentangle the BB hypothesis, the time-asymmetry of memory, the entropy conjecture, and the second law, with more formal rigor than has been done before. Specifically, first, we demonstrated that neither the reliability of experimental data nor our possessing Type-3 memory can be disentangled from the second law. Second, we showed that the validity of the entropy conjecture is independent of the BB hypothesis, the PH, and the 1000CE hypothesis, since it is an independent ingredient in each of those hypotheses. Third, we showed that those hypotheses are all variants of one another, simply differing in the times they assume for the extremum of an entropy fluctuation—with the caveat that the BB hypothesis has a seeming self-contradiction baked in.
Our use of stochastic process theory—never before used to analyze these issues—was necessary to avoid the implicit assumptions in many arguments in the literature that often result in circular reasoning. Stochastic process theory also (correctly) forces us to specify the prior probabilities one must use to analyze the issues we consider.
This rigor demonstrates that, ultimately, the only reason we have for assigning more credence to the PH, or even to the second law, than we assign to the BB hypothesis is the credence we assign to our cosmological observations concerning the big bang. This means in turn that before the early 20th century, when we first made those cosmological observations, we should have treated the BB hypothesis as just as probable as the second law. Moreover, whatever reliability we now assign to those cosmological observations implying that the entropy at the big bang was low, we should arguably assign at least as much reliability to our observations concerning the current entropy of the universe. Once we do that, however, the Boltzmann process governing the evolution of the universe’s entropy should be conditioned on two moments in time (the big bang and the present). Conditioned that way, however, the Boltzmann process of the universe would rule out not just the BB hypothesis, but also the naive versions of the PH and the second law currently in vogue.
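As a purely illustrative toy version of this last point (our construction, reusing the Ehrenfest-urn sketch given earlier; it is not a claim about actual cosmological dynamics), one can condition such a process on its values at two times by rejection sampling and inspect the typical behavior in between:

```python
import math, random

N, T, TRIALS = 20, 400, 20_000
def entropy(n):
    """Boltzmann entropy of the macrostate 'n particles in the left box'."""
    return math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)

def run(n0):
    """One Ehrenfest-urn trajectory of length T, started from the macrostate n0."""
    n, path = n0, [n0]
    for _ in range(T):
        n += 1 if random.random() < (N - n) / N else -1
        path.append(n)
    return path

random.seed(1)
# Condition on TWO times: very low entropy at t = 0 (a stand-in for the big bang) and
# moderately low entropy at t = T (a stand-in for the present), via rejection sampling.
kept = [p for p in (run(1) for _ in range(TRIALS)) if p[-1] in (5, 6)]
for t in (0, T // 4, T // 2, 3 * T // 4, T):
    avg = sum(entropy(p[t]) for p in kept) / len(kept)
    print(f"t = {t:3d}: mean entropy {avg:5.2f}")
# Conditioned paths typically rise to near-maximal entropy in between and then come back
# down to the 'present' value, so entropy is not monotone between the two conditioning times.
```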
We emphasize that since the entropy conjecture cannot imply anything regarding the probability of the BB hypothesis that does not ultimately rely on priors, any argument for or against that hypothesis must in the end involve an argument in favor of one specific prior. In particular, at present there is no fully rigorous argument, relying only on established physics, that dispels the possibility of the BB hypothesis. The view we have established in this paper is that the BB hypothesis is neither established by the standard argument nor refuted by the standard criticisms.
It is important to appreciate that our careful investigation of the BB hypothesis uncovers many deep issues in the philosophy of science. One is the issue of how—from the perspective of either philosophy or the axiomatizations of Bayesian probability—one can justify conditioning on any moment in addition to the present, and, if one can, why on just one such additional moment.
Suppose, though, that we do so, e.g., due to associated assumptions concerning reliability. This would still leave the issue of whether (as implied by the construction rule) we really should nail stochastic processes to all time–entropy pairs that we have very strong reason to believe in. All of this in fact provides ways to rescue the BB hypothesis, by justifying a prior under which we nail the Boltzmann process to the present moment alone.