Review

Typical = Random

Institute for Mathematics, Astrophysics, and Particle Physics (IMAPP) and Radboud Center for Natural Philosophy (RCNP), Radboud University, 6525 XZ Nijmegen, The Netherlands
Axioms 2023, 12(8), 727; https://doi.org/10.3390/axioms12080727
Submission received: 15 June 2023 / Revised: 28 June 2023 / Accepted: 30 June 2023 / Published: 27 July 2023

Abstract: This expository paper advocates an approach to physics in which “typicality” is identified with a suitable form of algorithmic randomness. To this end various theorems from mathematics and physics are reviewed. Their original versions state that some property $\Phi(x)$ holds for P-almost all $x \in X$, where P is a probability measure on some space X. Their more refined (and typically more recent) formulations show that $\Phi(x)$ holds for all P-random $x \in X$. The computational notion of P-randomness used here generalizes the one introduced by Martin-Löf in 1966 in a way now standard in algorithmic randomness. Examples come from probability theory, analysis, dynamical systems/ergodic theory, statistical mechanics, and quantum mechanics (especially hidden variable theories). An underlying philosophical theme, inherited from von Mises and Kolmogorov, is the interplay between probability and randomness, especially: which comes first?

Dedicated to the memory of Marinus Winnink (1936–2023)

1. Introduction

The introduction of probability in statistical mechanics in the 19th century by Maxwell and Boltzmann raised questions about both the meaning of this concept by itself and its relationship to randomness and entropy [1,2,3,4,5]. Roughly speaking, both initially felt that probabilities in statistical mechanics were dynamically generated by particle trajectories, a view which led to ergodic theory. But subsequently Boltzmann [6] introduced his counting arguments as a new start; these led to his famous formula for the entropy S = k log W on his gravestone in Vienna, i.e.,
probability first, entropy second.
This was turned on its head by Einstein [7], who had rediscovered much of statistical mechanics by himself in his early work, always stressing the role of fluctuations. He expressed the probability of energy fluctuations in terms of entropy seen as a primary concept. This suggests:
entropy first, probability second.
From the modern point of view of large deviation theory [8,9,10,11], what happens is that for finite N some stochastic process $(X_N)$ fluctuates around its limiting value $\bar X$ as $N \to \infty$ (if it has one), and, under favorable circumstances that often obtain in statistical mechanics, the “large”, i.e., $O(1)$, fluctuations (as opposed to the $O(1/\sqrt{N})$ fluctuations, which are described by the central limit theorem [12]) can be computed via an entropy function $S(x)$ whose argument x lies in the (common) codomain X of $X_N : \Omega \to X$. Since the domain $\Omega$ of $X_N$ carries a probability measure to begin with, it seems an illusion that entropy could be defined without some prior notion of probability.
Similar questions may be asked about the connection between probability and randomness (and, closing the triangle, of course also about the relationship between randomness and entropy). First, in his influential (but flawed) work on the foundations of probability, von Mises [13,14] initially defined randomness through a Kollektiv (which, with hindsight, was a precursor to a random sequence). From this, he extracted a notion of probability via asymptotic relative frequencies. See also [5,15,16,17]. Von Plato [5] (p. 190) writes that ‘He [von Mises] was naturally aware of the earlier attempts of Einstein and others at founding statistical physics on classical dynamics’ and justifies this view in his Section 6.3. Thus:
randomness first, probability second.
Kolmogorov [18], on the other hand, (impeccably) defined probability first (via measure theory), in terms of which he hoped to understand randomness. In other words, his (initial) philosophy was:
probability first, randomness second.
Having realized that this was impossible, thirty years later Kolmogorov [19,20] arrived at the concept of randomness named after him, using tools from computer and information science that actually had roots in the work of von Mises (as well as of Turing, Shannon, and others). See [5,15,21,22,23,24]. So:
Kolmogorov randomness first, measure-theoretic probability second.
But I will argue that even Kolmogorov randomness seems to rely on some prior concept of probability, see Section 3 and in particular the discussion surrounding Theorem 4; and this is obviously the case for Martin-Löf randomness, both in its original form for binary sequences (which is essentially equivalent to Kolmogorov randomness as extended from finite strings to infinite sequences, see Theorem 3) and in its generalizations (see Section 3). So I will defend the view that after all we have
some prior probability measure first, Martin-Löf randomness second.
In any case, there isn’t a single concept of randomness [25], not even within the algorithmic setting [26]; although the above slogan probably applies to most of them.
Motivated by the above discussion and its potential applications to physics, the aim of this paper is to review the interplay between probability, (algorithmic) randomness, and entropy via examples from probability itself, analysis, dynamical systems and (Boltzmann-style) statistical mechanics, and quantum mechanics. Some basic relations are explained in Section 2. In Section 3 I review algorithmic randomness beyond binary sequences. Section 4 introduces some key “intuition pumps”: these are results in which ‘for P-almost every x: $\Phi(x)$’ in some “classical” result can be replaced by ‘for all P-random x: $\Phi(x)$’ in an “effective” counterpart thereof; this replacement may even be seen as the essence of algorithmic randomness. In Section 5 I apply this idea to statistical mechanics, and in Section 6 I make some brief comments on quantum mechanics. The paper closes with a brief summary.

2. Some Background on Entropy and Probability

Consider Figure 1, which connects and illustrates the main examples in this paper.
Here $N \in \mathbb{N}$ is meant to be some large natural number, whereas $q \in \{2, 3, \ldots\}$, the cardinality of
$A = \{a_0, \ldots, a_{q-1}\}$,
could be anything (finite), but is small ($q = 2$) in the already interesting case of binary strings. In what follows, $A^N$ is the set of all functions $\sigma : N \to A$, where
$N = \{0, 1, \ldots, N-1\}$,
as usual in set theory. Such a function is also called a string over A, having length
$\ell(\sigma) \equiv |\sigma| = N$.
We write either $\sigma(n)$ or $\sigma_n$ for its value at $n \in N$, and may write $\sigma$ as $\sigma_0 \sigma_1 \cdots \sigma_{N-1}$. In particular, if $A = 2 = \{0, 1\}$, then $\sigma$ is a binary string. I write
$A^* \equiv A^{<\omega} = \bigcup_{N \in \mathbb{N}} A^N$,
so that for example $2^* = \bigcup_{N \in \mathbb{N}} 2^N$ is the set of all binary strings. Thus a (binary) string $\sigma$ is finite, whereas a (binary) sequence s is infinite. The set of all binary sequences is denoted by $2^\omega$, and likewise $A^\omega$ consists of all functions $s : \mathbb{N} \to A$. For $s \in A^\omega$, I write $s|N$ for $s_0 s_1 \cdots s_{N-1} \in A^N$, to be sharply distinguished from $s_n \equiv s(n) \in A$. Using this notation, I now review various ways of looking at Figure 1. Especially in the first two items below it is hard to avoid overlap with e.g., the reviews [27,28], which I recommend for further information.
  • In statistical mechanics as developed by Boltzmann in 1877 [6], and more generally in what one might call “Boltzmann-style statistical mechanics”, which is based on typicality arguments [29], N is the number of (distinguishable) particles under consideration, and A could be a finite set of single-particle energy levels. More generally, $a \in A$ is some property each particle may separately have, such as its location in cell $X_a$ relative to some partition
    $X = \bigsqcup_{a \in A} X_a$
    of the single-particle phase space or configuration space X accessible to each particle. Here $X_a \subseteq X$ and different $X_a$ are disjoint, which fact is expressed by the symbol $\bigsqcup$ in (5). One might replace $\bigsqcup$ by $\bigcup$ as long as one knows that the subsets $X_a$ are mutually disjoint (and measurable as appropriate). The microstate $\sigma \in A^N$ is a function
    $\sigma : \{0, 1, \ldots, N-1\} \to A$
    that specifies which property (among the possibilities in A) each particle has. Thus also spin chains fall under this formalism, where $\sigma_n \in A$ is some internal degree of freedom at site n. In Boltzmann-style arguments it is often assumed that each microstate is equally likely, which corresponds to the probability $P_f^N$ on $A^N$ defined by
    $P_f^N(\{\sigma\}) = |A|^{-N} = q^{-N}$,
    for each $\sigma \in A^N$. This is the Bernoulli measure on $A^N$ induced by the flat prior $p = f$ on A,
    $f(a) = 1/q$,
    for each $a \in A$. More generally, $P_p^N$ is the Bernoulli measure on $A^N$ induced by some probability distribution p on A; that is, the product measure of N copies of p; some people write $P_p^N = p^{\times N}$. This extends to the idealized case $A^\omega$, as follows. For $\sigma \in A^N$ we define
    $[\sigma] := \sigma A^\omega = \{s \in A^\omega \mid \sigma \sqsubseteq s\}$,
    where $\sigma \sqsubseteq s$ means that $s = \sigma\tau$ for some $\tau \in A^\omega$ (in words: $\sigma \in A^*$ is a prefix of $s \in A^\omega$). On these basic measurable (and open) sets we define a probability measure $P_p^\omega$ by
    $P_p^\omega([\sigma]) = P_p^N(\{\sigma\})$.
    In particular, if $\sigma \in A^N$, then its unbiased probability is simply given by
    $P_f^\omega([\sigma]) = |A|^{-N}$.
    It is important to keep track of p even if it is flat: making no (apparent) assumption (which $p = f$ is often taken to be) is an important assumption! For example, Boltzmann’s famous counting argument [6] really reads as follows [30,31,32,33]. The formula
    $S = k \log W$
    on Boltzmann’s grave should more precisely be something like
    $S_B^N(\mu) = \log W_N(\mu)$,
    where I omit the constant k and take $\mu \in \mathrm{Prob}(A)$ to be the relevant argument of the (extensive) Boltzmann entropy $S_B^N$ (see below). Furthermore, $W_N(\mu)$ is the probability (“Wahrscheinlichkeit”) of $\mu$, which Boltzmann, assuming the flat prior (8) on A, took as
    $W_N(\mu) = N(\mu)\, |A|^{-N}$,
    where $N(\mu)$ is the number of microstates $\sigma \in A^N$ whose corresponding empirical measure
    $L_N(\sigma) = \frac{1}{N} \sum_{n=0}^{N-1} \delta_{\sigma(n)}$
    equals $\mu$. Here, for any $b \in A$, $\delta_b \in \mathrm{Prob}(A)$ is the point measure at b, i.e., $\delta_b(a) = \delta_{ab}$, the Kronecker delta. The number $N(\mu)$ is only nonzero if $\mu \in \mathrm{Prob}_N(A)$, which consists of all probability distributions $\nu$ on A that arise as $\nu = L_N(\sigma)$ for some $\sigma \in A^N$. This, in turn, means that $\nu(a) = \tilde\nu(a)/N$ for some $\tilde\nu(a) \in \{0, \ldots, N\}$, with $\sum_{a \in A} \tilde\nu(a) = N$. In that case,
    $N(\mu) = \frac{N!}{\prod_{a \in A} (N\mu(a))!}$.
    The term $|A|^{-N}$ in (14) of course equals $P_f^N(\{\sigma\})$ for any $\sigma \in A^N$ and hence certainly for any $\sigma$ for which $L_N(\sigma) = \mu$. For such $\sigma$, for general Bernoulli measures $P_p^N$ on $A^N$ we have
    $P_p^N(\{\sigma\}) = e^{N \sum_{a \in A} \mu(a) \log p(a)} = e^{-N(h(\mu) + I(\mu|p))}$,
    in terms of the Shannon entropy and the Kullback–Leibler distance (or divergence), given by
    $h(\mu) := -\sum_{a \in A} \mu(a) \log \mu(a)$;
    $I(\mu|p) := \sum_{a \in A} \mu(a) \log \frac{\mu(a)}{p(a)}$,
    respectively. These are simply related: for the flat prior (8) we have
    $I(\mu|f) = -h(\mu) + \log |A|$.
    In general, computing $S_B^N(\mu)$ from (13) and, again assuming $L_N(\sigma) = \mu$, we obtain
    $W_N(\mu) = P_p^N(L_N = \mu) = N(\mu)\, P_p^N(\{\sigma\}) = \frac{N!\, e^{-N(h(\mu) + I(\mu|p))}}{\prod_{a \in A} (N\mu(a))!}$,
    from which Stirling’s formula (or the technique in [Section 2.1] in [31]) gives
    $s_B(\mu|p) := \lim_{N \to \infty} \frac{S_B^N(\mu_N)}{N} = -I(\mu|p)$,
    where $\mu_N \in \mathrm{Prob}_N(A)$ is any sequence of probability distributions on A that (weakly) converges to $\mu \in \mathrm{Prob}(A)$, i.e., the variable in $s_B(\,\cdot\,|p)$. For the flat prior (8), Equation (20) yields
    $s_B(\mu|f) = h(\mu) - \log |A|$.
    As an aside, note that the Kullback–Leibler distance or relative entropy (19) is defined more generally for probability measures $\mu$ and p on some measure space $(A, \Sigma)$. As usual, we write $\mu \ll p$ iff $\mu$ is absolutely continuous with respect to p, i.e., $p(B) = 0$ implies $\mu(B) = 0$ for $B \in \Sigma$. In that case, the Radon–Nikodym derivative $d\mu/dp$ exists, and one has
    $I(\mu|p) := \int_A dp\, \frac{d\mu}{dp} \log \frac{d\mu}{dp}$.
    If $\mu$ is not absolutely continuous with respect to p, one puts $I(\mu|p) := \infty$. The nature of the empirical measure (15) and the Kullback–Leibler distance (19) comes out well in hypothesis testing. In order to test the hypothesis $H_0$ that $\mu = \mu_0$ by an N-fold trial $\sigma \in A^N$, one accepts $H_0$ iff $I(L_N(\sigma)|\mu_0) < \eta$, for some $\eta > 0$. This test is optimal in the sense of Hoeffding [Section 3.5] in [31]. But let us return to the main story.
    The stochastic process $X_N : \Omega \to X$ whose large fluctuations are described by (22) is
    $X = \mathrm{Prob}(A)$; $\quad \Omega = A^\omega$; $\quad P = P_p^\omega$; $\quad X_N = L_N$.
    Then $L_N \to p$ almost surely, and large fluctuations around this value are described by
    $\lim_{N \to \infty} \frac{1}{N} \log P_p^N(L_N \in \Gamma) = -I(\Gamma|p) := -\inf_{\mu \in \Gamma} I(\mu|p) = \sup_{\mu \in \Gamma} s_B(\mu|p)$,
    where $\Gamma \subseteq \mathrm{Prob}(A)$ is open, or more generally, is such that $\Gamma \subseteq \overline{\mathrm{int}(\Gamma)}$. Less precisely,
    $P_p^N(L_N \in \Gamma) \asymp e^{-N I(\Gamma|p)}$ as $N \to \infty$,
    which implies that $P_p^N(L_N \in \Gamma) \to 1$ if $p \in \Gamma$, whereas $P_p^N(L_N \in \Gamma)$ is exponentially damped if $p \notin \Gamma$. Note that the rate function $\mu \mapsto I(\mu|p)$ defined in (19) and (26) is convex and positive, whereas the entropy (22) is concave and negative. Thus the former is to be minimized, its infimum (even minimum) over $\mu \in \mathrm{Prob}(A)$ being zero at $\mu = p$, whereas the latter is to be maximized, its supremum (even maximum) at the same value being zero. The first term in (23) hides the negativity of the Boltzmann entropy (here for a flat prior), but the second term drives it below zero. Positivity of $I(\mu|p)$ follows from (or actually is) the Gibbs inequality. Equation (26) is a special case of Sanov’s theorem, which works for arbitrary Polish spaces (instead of our finite set A); see [8,31]. Einstein [7] computes the probability of large fluctuations of the energy, rather than of the empirical measure, as Boltzmann did in 1877 [6], but these are closely related. (A small numerical illustration of the empirical measure and these entropy formulas appears at the end of this section.)
    Interpreting A as a set of energy levels, the relevant stochastic process $X_N : \Omega \to X$ still has $\Omega$ and P as in (25), but this time $X_N : \Omega \to \mathbb{R}$ is the average energy, defined by
    $X_N(\sigma) \equiv E_N(\sigma) = \frac{1}{N} \sum_{n=0}^{N-1} \sigma(n)$.
    This makes the relevant entropy $s_C$ (which is the original entropy from Clausius-style thermodynamics!) a function of $u \in \mathbb{R}$, interpreted as energy: instead of (26), one obtains
    $\lim_{N \to \infty} \frac{1}{N} \log P_p^N(E_N \in \Delta) = \sup_{u \in \Delta} s_C(u|p)$;
    $s_C(u|p) := \sup\{\, s_B(\mu|p) \mid \mu \in \mathrm{Prob}(A),\ \sum_{a \in A} \mu(a) \cdot a = u \,\}$,
    which “maximal entropy principle” is a special case of Cramér’s theorem [8,31]. If $\bar u = \sum_{a \in A} p(a) \cdot a$ lies in $\Delta \subseteq \mathbb{R}$, then $P_p^N(E_N \in \Delta) \to 1$. If not, this probability is exponentially small in N. To obtain the classical thermodynamics of non-interacting particles [32], one may add that the free energy, given by
    $\beta f(\beta|p) = -\log \sum_{a \in A} p(a)\, e^{-\beta a}$,
    is essentially the Fenchel transform [34] of the entropy $s_C(u|p)$, in that
    $\beta f(\beta|p) = \inf_{u \in \mathbb{R}} \{\beta u - s_C(u|p)\}$; $\quad s_C(u|p) = \inf_{\beta \in \mathbb{R}} \{\beta u - \beta f(\beta|p)\}$.
    For $\beta > 0$, the first equality is a refined version of “$F = E - TS$”.
  • In information theory as developed by Shannon [35] (see also [36,37,38]) the “N” in our diagram is the number of letters drawn from an alphabet A by sampling a given probability distribution $p \in \mathrm{Prob}(A)$, the space of all probability distributions on A. So each microstate $\sigma \in A^N$ is a word with N letters. The entropy of p, i.e.,
    $h_2(p) := -\sum_{a \in A} p(a) \log_2 p(a) = \sum_{a \in A} p(a)\, I_2(a)$,
    plays a key role in Shannon’s approach. It is the expectation value $h_2(p) = \langle I_2 \rangle_p$ of the function
    $I_2(a) := -\log_2 p(a)$,
    interpreted as the information contained in $a \in A$, relative to p. This interpretation is evident for the flat distribution $p = f$ on an alphabet with $|A| = 2^n$ letters, in which case $I_2(a) = n$ for each $a \in A$, which is the minimal number of bits needed to (losslessly) encode a. The general case is covered by the noiseless coding theorem for prefix (or uniquely decodable) codes. A map $C : A \to 2^*$ is a prefix code if it is injective and $C(a)$ is never a prefix of $C(b)$ for any $a \neq b$ in A, that is, there is no $\tau \in 2^*$ such that $C(b) = C(a)\tau$. A prefix code is uniquely decodable. Let $C : A \to 2^*$ be a prefix code, let $\ell(C(a))$ be the length of the codeword $C(a)$, with expectation
    $L(C, p) = \sum_{a \in A} p(a)\, \ell(C(a))$.
    An optimal code minimizes this. Then:
    • Any prefix code satisfies $h_2(p) \leq L(C, p)$;
    • There exists an optimal prefix code C, which satisfies $L(C, p) \leq h_2(p) + 1$;
    • One has $h_2(p) = L(C, p)$ iff $\ell(C(a)) = I_2(a)$ for each $a \in A$ (if this is possible).
    Of course, the equality $\ell(C(a)) = I_2(a)$ can only be satisfied if $p(a) = 2^{-k}$ for some integer $k \in \mathbb{N}$. Otherwise, one can find a code for which $\ell(C(a)) = \lceil I_2(a) \rceil$, the smallest integer $\geq I_2(a)$. See e.g., [Section 5.4] in [36].
    Thus the information content $I_2(a)$ is approximately the length of the code-word $C(a)$ in some optimal coding C (a small illustration of these coding bounds also appears at the end of this section). Passing to our case of interest of N-letter words over A, in case of a memoryless source one simply has the Bernoulli measure $P_p^N$ on $A^N$, with entropy
    $H_2(P_p^N) = -\sum_{\sigma \in A^N} P_p^N(\sigma) \log_2 P_p^N(\sigma) = N h_2(p)$.
    Extending the letter-code $C : A \to 2^*$ to a word-code $C_N : A^N \to 2^*$ by concatenation, i.e., $C_N(a_{i_0} \cdots a_{i_{N-1}}) = C(a_{i_0}) \cdots C(a_{i_{N-1}})$, and replacing $L(C_N, P_p^N)$, which diverges as $N \to \infty$, by the average codeword length per symbol $L(C_N, P_p^N)/N$, an optimal code C satisfies
    $\lim_{N \to \infty} \frac{L(C_N, P_p^N)}{N} = h_2(p)$.
    In what follows, the Asymptotic Equipartition Property or AEP will be important. In its (probabilistically) weak form, which is typically used in information theory, this states that
    $\forall \varepsilon > 0: \quad \lim_{N \to \infty} P_p^N\big(\big\{\sigma \in A^N \mid P_p^N(\sigma) \in [2^{-N(h_2(p)+\varepsilon)},\, 2^{-N(h_2(p)-\varepsilon)}]\big\}\big) = 1$.
    Its strong form, which is the (original) Shannon–McMillan–Breiman theorem, reads
    $P_p^\omega\big(\big\{s \in A^\omega \mid \lim_{N \to \infty} -\tfrac{1}{N} \log_2 P_p^N(s|N) = h_2(p)\big\}\big) = 1$.
    Either way, the idea is that for large N, with respect to $P_p^N$ “most” strings $\sigma \in A^N$ have “almost” the same probability $2^{-N h_2(p)}$, whilst the others are negligible [lecture 2] in [30]. For $A = 2$ with flat prior $p = f$ this yields a tautology: all strings $\sigma \in 2^N$ have $P_f^N(\sigma) = 2^{-N}$. See e.g., [Sections 3.1 and 16.8] in [36]. The strong form follows from ergodic theory, cf. (53).
  • In dynamical systems along the lines of the ubiquitous Kolmogorov [39], one starts with a triple $(X, P, T)$, where X (more precisely $(X, \Sigma)$, but I usually suppress the $\sigma$-algebra $\Sigma$) is a measure space, P is a probability measure on X (more precisely, on $\Sigma$), and $T : X \to X$ is a measurable (but not necessarily invertible) map, required to preserve P in the sense that $P(T^{-1}B) = P(B)$ for any $B \in \Sigma$. A measurable coarse-graining (5) defines a map
    $\xi : X \to A^\omega$; $\quad \xi(x)_n = a \in A$ iff $T^n x \in X_a$,
    in terms of which the given triple $(X, P, T)$ is coarse-grained by a new triple $(A^\omega, \xi_*P, S)$. Here $\xi_*P(B) = P(\xi^{-1}B)$ is the induced probability on $A^\omega$, whilst S is the (unilateral) shift
    $S : A^\omega \to A^\omega$; $\quad (Ss)_n := s_{n+1} \quad (n = 0, 1, \ldots)$.
    A fine-grained path $(x, Tx, T^2x, \ldots) \in X^\omega$ is coarse-grained to $\xi(x) \in A^\omega$, and truncating the latter at $t = N-1$ gives $\xi(x)|N \in A^N$. Hence the configuration in Figure 1 states that our particle starts from $x \in X_{a(0)} \subseteq X$ at $t = 0$, moves to $Tx \in X_{a(1)} \subseteq X$ at $t = 1$, etc., and at time $t = N-1$ finds itself at $T^{N-1}x \in X_{a(N-1)} \subseteq X$. In other words, a coarse-grained path $\sigma \in A^N$ tells us exactly that $T^n x \in X_{\sigma_n}$, for $n = 0, 1, \ldots, N-1$ ($\sigma_n \in A$). Note that the shift satisfies
    $S \circ \xi = \xi \circ T$,
    so if $\xi$ were invertible, then nothing would be lost in coarse-graining; using the bilateral shift on $2^{\mathbb{Z}}$ instead of the unilateral one in the main text, this is the case for example with the Baker’s map on $X = [0,1) \times [0,1)$ with $A = 2$ and partition $\{X_1 = [0, 1/2] \times [0,1),\ X_2 = (1/2, 1) \times [0,1)\}$.
    The point of Kolmogorov’s approach is to refine the partition (5), which I now denote by
    $\pi = \{X_a,\ a \in A\} \subseteq \Sigma \subseteq \mathcal{P}(X)$,
    to a finer partition $\pi_N = \{X_\sigma,\ \sigma \in A^N\} \subseteq \Sigma$ of X, which consists of all non-empty subsets
    $X_{\sigma_0 \cdots \sigma_{N-1}} := X_{\sigma_0} \cap T^{-1} X_{\sigma_1} \cap \cdots \cap T^{-(N-1)} X_{\sigma_{N-1}}$.
    Indeed, if we know x, then we know both the (truncated) fine- and coarse-grained paths
    $(x, Tx, \ldots, T^{N-1}x) \in X^N$; $\quad \xi(x)|N = \sigma_0 \cdots \sigma_{N-1} \in A^N \quad (T^n x \in X_{\sigma_n},\ n \in N)$.
    But if we just know that $x \in X_a$, we cannot construct even the coarse-grained path $\xi(x)|N$. To do so, we must know that $x \in X_\sigma$, for some $\sigma = \sigma_0 \cdots \sigma_{N-1} \in A^N$ (provided $X_\sigma \neq \emptyset$). In other words, the unique element $\pi_N(x) = X_{\sigma(x)}$ of the partition $\pi_N$ that contains x bijectively corresponds to a coarse-grained path $\sigma(x) \in A^N$, and hence we may take $P(\pi_N(x))$ to be the probability of the coarse-grained path $\sigma(x)$. This suggests an information function
    $I(X, P, T, \pi_N)(x) := -\log_2 P(\pi_N(x))$,
    cf. (34), and, as in (33), an average (=expected) information or entropy function
    $H(X, P, T, \pi_N) := \langle I(X, P, T, \pi_N) \rangle_P = \int_X dP(x)\, I(X, P, T, \pi_N)(x) = -\sum_{Y \in \pi_N} P(Y) \log_2 P(Y)$.
    As $H(X, P, T, \pi_{M+N}) \leq H(X, P, T, \pi_M) + H(X, P, T, \pi_N)$, this (extensive) entropy has an (intensive) limit
    $h(X, P, T, \pi) := \lim_{N \to \infty} \frac{1}{N} H(X, P, T, \pi_N)$,
    in terms of which the Kolmogorov–Sinai entropy of our system $(X, P, T)$ is defined by
    $h(X, P, T) := \sup_\pi h(X, P, T, \pi)$,
    where the supremum is taken over all finite measurable partitions of X, as above.
    We say that $(X, P, T)$ is ergodic if for every T-invariant set $A \in \Sigma$ (i.e., $T^{-1}A = A$), either $P(A) = 0$ or $P(A) = 1$. For later use, I now state three equivalent conditions for ergodicity of $(X, P, T)$; see e.g., [Section 4.1] in [40]. Namely, $(X, P, T)$ is ergodic if and only if for P-almost every x:
    $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \delta_{T^n x} = P \quad$ (weakly in $\mathrm{Prob}(X)$);
    $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) = \int_X dP\, f \quad$ for each $f \in L^1(X, P)$;
    $\lim_{N \to \infty} \frac{1}{N} \big|\{n \in \{0, \ldots, N-1\} : T^n x \in B\}\big| = P(B) \quad$ for each $B \in \Sigma$.
    The empirical measure (15) is a special case of (50). Equation (51) is a special case of Birkhoff’s ergodic theorem: it is Birkhoff’s theorem assuming ergodicity; in general, the limit on the l.h.s. lies in $L^1(X, P)$ and is not constant P-a.e. Each of these is a corollary of the others, e.g., (50) and (51) are basically the same statement, and one obtains (52) from (51) by taking $f = 1_B$. Note that the apparent logical form of (51) is: ‘for all f and all x’, which suggests that the universal quantifiers can be interchanged, but this is false: the actual logical form is: ‘for all f there exists a set of P-measure zero’, which in general cannot be interchanged (indeed, in standard proofs the measure-zero set explicitly depends on f). Nonetheless, in some cases a measure zero set independent of f can be found, e.g., for compact metric spaces and continuous f, cf. [Theorem 3.2.6] in [40]. Similar f-independence will be true for the computable case reviewed below, which speaks in its favour. Likewise for (52). Equation (51) implies the general Shannon–McMillan–Breiman theorem, which in turn implies the previous one (39) for information theory by taking $(X = A^\omega, P = P_p^\omega, T = S)$. Namely:
    Theorem 1.
    If $(X, P, T)$ is ergodic, then for P-almost every $x \in X$ one has
    $h(X, P, T, \pi) = -\lim_{N \to \infty} \frac{1}{N} \log_2 P(\pi_N(x))$.
    See e.g., [Theorem 9.3.1] in [40]. Comparing this with (48) and (47), the average value of the information $I(X, P, T, \pi_N)(x)$ w.r.t. P can be computed from its value at a single point x, as long as this point is “typical”. As in the explanation of the original theorem in information theory, equation (53) implies that all typical paths (with respect to P) have about the same probability $2^{-N h(X, P, T, \pi)}$.
    See [41] for Kolmogorov’s contributions to dynamical systems and ergodic theory. Relevant textbooks include for example [40,42].
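The Boltzmann/Sanov counting in the first item above is easy to check numerically. The following minimal sketch (in Python; the variable names and parameter values are mine, not the paper’s) draws a microstate from a Bernoulli measure, forms its empirical measure $L_N(\sigma)$, and compares the exact exponent $-\tfrac{1}{N}\log P_p^N(L_N = \mu)$ with the rate function $I(\mu|p)$:

    import numpy as np
    from math import lgamma

    rng = np.random.default_rng(0)

    q = 3                                   # size of the alphabet A = {0, ..., q-1}
    p = np.array([0.5, 0.3, 0.2])           # prior p on A (illustrative values)
    N = 2000                                # number of particles / letters

    def shannon_entropy(mu):
        # h(mu) = -sum_a mu(a) log mu(a)   (natural logarithm)
        mu = mu[mu > 0]
        return float(-np.sum(mu * np.log(mu)))

    def kl_divergence(mu, p):
        # I(mu|p) = sum_a mu(a) log(mu(a)/p(a))
        mask = mu > 0
        return float(np.sum(mu[mask] * np.log(mu[mask] / p[mask])))

    # draw a microstate sigma ~ P_p^N and form its empirical measure L_N(sigma)
    sigma = rng.choice(q, size=N, p=p)
    counts = np.bincount(sigma, minlength=q)
    mu = counts / N

    # exact P_p^N(L_N = mu): multinomial coefficient N(mu) times prod_a p(a)^(N mu(a))
    log_N_mu = lgamma(N + 1) - sum(lgamma(c + 1) for c in counts)
    log_prob = log_N_mu + float(np.sum(counts * np.log(p)))

    print("h(mu)                  =", shannon_entropy(mu))
    print("I(mu|p)                =", kl_divergence(mu, p))
    print("-(1/N) log P(L_N = mu) =", -log_prob / N)   # ~ I(mu|p) up to O(log(N)/N)

The last two printed numbers agree up to Stirling corrections of order $\log(N)/N$, which is exactly the content of the exponential asymptotics discussed above.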
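The noiseless coding bounds $h_2(p) \leq L(C, p) \leq h_2(p) + 1$ from the second item can likewise be illustrated with a textbook Huffman construction; this is my own sketch of one optimal prefix code, not a construction used in the paper:

    import heapq
    from math import log2

    def huffman_code(p):
        # build a binary prefix code for the distribution p (a list of probabilities)
        if len(p) == 1:
            return ["0"]
        heap = [(prob, i, [i]) for i, prob in enumerate(p)]   # (weight, tie-break, letters)
        heapq.heapify(heap)
        codes = [""] * len(p)
        while len(heap) > 1:
            w1, t1, left = heapq.heappop(heap)
            w2, t2, right = heapq.heappop(heap)
            for i in left:
                codes[i] = "0" + codes[i]    # letters in the lighter subtree get a leading 0
            for i in right:
                codes[i] = "1" + codes[i]    # letters in the heavier subtree get a leading 1
            heapq.heappush(heap, (w1 + w2, min(t1, t2), left + right))
        return codes

    p = [0.5, 0.25, 0.15, 0.1]
    C = huffman_code(p)
    h2 = -sum(pa * log2(pa) for pa in p)              # Shannon entropy h_2(p)
    L = sum(pa * len(c) for pa, c in zip(p, C))       # expected codeword length L(C, p)
    print(C)                                          # no codeword is a prefix of another
    print(h2, "<=", L, "<=", h2 + 1)                  # the noiseless coding bounds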

3. P-Randomness

The concept of P-randomness (where P is a probability measure on some measure space ( X , Σ ) ) was introduced by Martin-Löf in 1966 [43] for the case X = 2 ω and P = P f ω , i.e., the unbiased Bernoulli measure on the space of infinite coin flips. Following [Section 3] in [44], a more general definition of P-randomness which is elegant and appropriate to my goals is as follows:
Definition 1.
1. 
A topological space X is effective if it has a countable base $\mathcal{B} \subseteq \mathcal{O}(X)$ with a bijection
$B : \mathbb{N} \to \mathcal{B}$.
An effective probability space $(X, \mathcal{B}, P)$ is an effective topological space X with a Borel probability measure P, i.e., defined on the open sets $\mathcal{O}(X)$.
2. 
An open set $V \in \mathcal{O}(X)$ as in 1. is computable if for some computable function $f : \mathbb{N} \to \mathbb{N}$,
$V = \bigcup_{n \in \mathbb{N}} B(f(n))$.
Here f may be assumed to be total without loss of generality. In other words,
$V = \bigcup_{n \in E} B(n)$
for some c.e. set $E \subseteq \mathbb{N}$ (where c.e. means computably enumerable, i.e., $E \subseteq \mathbb{N}$ is the image of a total computable function $f : \mathbb{N} \to \mathbb{N}$).
3. 
A sequence $(V_n)$ of opens $V_n \in \mathcal{O}(X)$ is computable if
$V_n = \bigcup_{m \in \mathbb{N}} B(g(n, m))$
for some (total) computable function $g : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$; that is,
$V_n = \bigcup_{m : (n, m) \in G} B(m)$
for some c.e. $G \subseteq \mathbb{N}^2$. Without loss of generality we may and will assume that the (double) sequence $V_{(n,m)} = B(n) \cap B(m)$ is computable.
4. 
A (randomness) test is a computable sequence $(V_n)$ as in 3. for which for all $n \in \mathbb{N}$ one has
$P(V_n) \leq 2^{-n}$.
One may (and will) also assume without loss of generality that for all n we have
$V_{n+1} \subseteq V_n$.
5. 
A point $x \in X$ is P-random if $x \notin N$ for any subset $N \subseteq X$ of the form
$N = \bigcap_n V_n$,
where $(V_n)$ is some test (since $P(\bigcap_n V_n) = 0$, such an N is called an effective null set).
6. 
A measure P in an effective probability space $(X, \mathcal{B}, P)$ is upper semi-computable if the set
$U(P) := \{(F, q) \in \mathcal{P}_f(\mathbb{N}) \times \mathbb{Q} \mid P(\bigcup_{n \in F} B(n)) < q\}$
is c.e. (relative to some computable isomorphisms $\mathcal{P}_f(\mathbb{N}) \cong \mathbb{N}$ and $\mathbb{Q} \cong \mathbb{N}$). Also, P is lower semi-computable if the set $L(P)$, defined like (62) with $> q$ instead of $< q$, is c.e. Finally, P is computable if it is upper and lower semi-computable, in which case $(X, \mathcal{B}, P)$ is called a computable probability space (and similarly for upper and lower computability).
Note that parts 1 to 5 do not impose any computability requirement on P, but even so it easily follows that $P(R) = 1$, where $R \subseteq X$ is the set of all P-random points in X. However, if P is upper semi-computable, one has a generalization of a further central result of Martin-Löf [43] (p. 605).
Definition 2.
A universal test is a test $(U_n)$ such that for any test $(V_n)$ there is a constant $c = c(U, V) \in \mathbb{N}$ such that for each $n \in \mathbb{N}$ we have $V_{n+c} \subseteq U_n$.
Universal tests exist provided P is upper semi-computable, which in turn implies that $x \in X$ is P-random iff $x \notin \bigcap_n U_n$. See [Theorem 3.10] in [44]. Compared with computable metric spaces [45], which for all purposes of this paper could have been used, too, Hertling and Weihrauch [44], whom I follow, avoid the choice of a countable dense subset of X. The latter is unnatural already in the case $X = A^\omega$, where $\sigma \in A^N$ has to be injected into $A^\omega$ via a map like $\sigma \mapsto \sigma a^\omega$ for some fixed $a \in A$ (where $a^\omega$ repeats a infinitely often). On the other hand, the map $A^* \to \mathcal{O}(A^\omega)$, $\sigma \mapsto [\sigma]$, where $[\sigma] = \sigma A^\omega$, is quite natural (here $\mathcal{O}(X)$ is the topology of X). If P is computable as defined in clause 6, P is a computable point in the effective space of all probability measures on X [46,47,48], where a point x in an effective topological space is deemed computable if $\{x\} = \bigcap_n V_n$ for some computable sequence $(V_n)$.
A key example is $X = A^\omega$ over a finite alphabet A, with topology $\mathcal{O}(X)$ generated by the cylinder sets $[\sigma] = \sigma A^\omega$, where $\sigma \in A^N$ and $N \in \mathbb{N}$. The usual lexicographical order on A then gives a bijection $L : \mathbb{N} \to A^*$ (ordering strings first by length and then lexicographically), and hence a numbering
$B : \mathbb{N} \to \mathcal{B}$; $\quad n \mapsto [L(n)] = L(n) A^\omega$.
The Bernoulli measures $P_p^\omega$ on $A^\omega$ then have the same computability properties as $p \in \mathrm{Prob}(A)$. In particular, the flat prior f makes $P_f^\omega$ computable, and in case that $A = 2$, the computability properties of $p \in [0, 1] \cong \mathrm{Prob}(2)$ are transferred to $P_p^\omega$. This is all we need for my main theme.
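To make Definition 1 concrete for $X = 2^\omega$, here is a minimal sketch (mine, not from the paper) of a randomness test in the sense of clause 4: $V_n = [1^n]$, the cylinder of sequences beginning with n ones, satisfies $P_f^\omega(V_n) = 2^{-n}$ and $V_{n+1} \subseteq V_n$, and its intersection $\bigcap_n V_n = \{111\cdots\}$ is an effective null set, so the all-ones sequence is not $P_f^\omega$-random:

    def P_f_cylinder(sigma):
        # P_f^omega([sigma]) = 2^(-|sigma|) for the flat Bernoulli measure on 2^omega
        return 2.0 ** (-len(sigma))

    def in_V(prefix, n):
        # V_n = [1^n]; membership of a sequence in V_n is decided by its first n bits
        return prefix[:n] == "1" * n

    # (V_n) is a test: P_f(V_n) = 2^(-n), and V_(n+1) is contained in V_n
    for n in range(1, 6):
        print(n, P_f_cylinder("1" * n))

    # the effective null set of this test is the single point 111...; every other
    # sequence has a finite prefix that escapes some V_n
    print(all(in_V("1" * 50, n) for n in range(1, 50)))         # True: 111... fails the test
    print(all(in_V("110100110010", n) for n in range(1, 12)))   # False: escapes V_3 already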
In case of a flat prior f on A, the above notion of randomness of sequences in A ω is equivalent to the definition of Martin-Löf [43], which I will now review, following Calude [49]. Though equivalent to the definition of Martin-Löf random sequences in books like [22,50,51], the construction in [49] is actually closer in spirit to Martin-Löf [43] and has the advantage of being compatible with Earman’s principle (see below) even before redefining randomness in terms of Kolmogorov complexity. The definition in the other three books just cited lacks this feature.
Calude’s definition is based on first defining random strings  σ A . Since A has the discrete topology, we simply take B to consist of all singletons { σ } , σ A , with B = L as in (63). Since unlike A N or A ω the set A does not carry a useful probability measure P, we replace (59) by
P f N ( V n A N ) | A | n | A | 1 .
Definition 3.
A sequential test is a computable sequence $(V_n)$ of subsets $V_n \subseteq A^*$ such that:
1. 
The inequality (64) holds;
2. 
$V_{n+1} \subseteq V_n$ (as in Definition 1 (4));
3. 
$\sigma \in V_n$ and $\sigma \sqsubseteq \tau$ imply $\tau \in V_n$ (i.e., extensions of $\sigma \in V_n$ also belong to $V_n$).
Since (64) is the same as
$|V_n \cap A^N| \leq \frac{|A|^{N-n}}{|A| - 1}$,
via Equation (65), Definition 3 implies that $V_n \cap A^N = \emptyset$ for all $N < n$. A simple example of a sequential test for $A = 2$ is $V_n = \{\sigma \in 2^* \mid 1^n \sqsubseteq \sigma\}$, i.e., the set of all strings starting with n copies of 1. There exists a universal sequential test $(U_n)$ such that for any sequential test $(V_n)$ there is a $c = c(U, V) \in \mathbb{N}$ such that for each $n \in \mathbb{N}$ we have $V_{n+c} \subseteq U_n$. See [Theorem 6.16 and Definition 6.17] in [49]. For this (or indeed any) test U we define $m_U(\sigma) := 0$ if $\sigma \notin U_1$, and otherwise
$m_U(\sigma) := \max\{n \in \mathbb{N} \mid \sigma \in U_n\}$.
By the comment after (65) we have $m_U(\sigma) \leq |\sigma| < \infty$, since $\sigma \in A^N$ for some N. If $m_U(\sigma) < q$ for some $q \in \mathbb{N}$, then $\sigma \notin U_q$ by definition. Since $U_{n+1} \subseteq U_n$, this implies $\sigma \notin U_{q'}$ for all $q' > q$, so that also $m_U(\sigma) < q'$. But as we have just seen, we may restrict these values to $q \leq |\sigma|$.
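For the example $V_n = \{\sigma \in 2^* \mid 1^n \sqsubseteq \sigma\}$ just mentioned, the critical level $m_V$ of Equation (66) is simply the length of the initial run of 1’s. The following small sketch (mine; it hard-codes this particular test rather than a universal one) computes $m_V$ and checks the counting bound (65):

    def in_V(sigma, n):
        # sigma lies in V_n iff 1^n is a prefix of sigma
        return len(sigma) >= n and sigma[:n] == "1" * n

    def m_V(sigma):
        # critical level: m_V(sigma) = max{n : sigma in V_n}, and 0 if sigma is not in V_1
        n = 0
        while in_V(sigma, n + 1):
            n += 1
        return n

    # the counting bound (65): |V_n intersect 2^N| = 2^(N-n) <= |A|^(N-n)/(|A|-1) for N >= n
    N, n = 8, 3
    count = sum(in_V(format(k, "08b"), n) for k in range(2 ** N))
    print(count, 2 ** (N - n))                            # both 32

    print(m_V("110100"), m_V("111101"), m_V("000110"))    # 2, 4, 0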
Definition 4.
1. 
A string $\sigma \in A^*$ is q-random (for some $q \in \mathbb{N}$) if $m_U(\sigma) < q \leq |\sigma|$.
2. 
A sequence $s \in A^\omega$ is Calude random (with respect to $P = P_f^\omega$) if there is a constant $q \in \mathbb{N}$ such that each finite segment $s|N \in A^N \subset A^*$ is q-random, i.e., such that for all N,
$m_U(s|N) < q$.
Note that the lower q is, the higher the randomness of $\sigma$, as it lies in fewer sets $U_n$. It is easy to show that Calude randomness is equivalent to any of the following three conditions (the third of these is taken as the definition of randomness in [Theorem 6.16 and Definition 6.25] in [49]):
$\lim_{N \to \infty} m_U(s|N) < \infty$; $\quad \limsup_{N \to \infty} m_U(s|N) < \infty$;
$\lim_{N \to \infty} m_V(s|N) < \infty$ for all sequential tests $(V_n)$.
Theorem 2.
A sequence $s \in A^\omega$ is $P_f^\omega$-random (cf. Definition 1 (5)) iff it is Calude random.
This follows from [Theorem 6.35] in [49] and Theorem 3 below. The point is that randomness of sequences s A ω can be expressed in terms of randomness of finite initial segments of s. This is also true via another (much better known) reformulation of Martin-Löf randomness.
Definition 5.
A sequence $s \in A^\omega$ is Chaitin–Levin–Schnorr random if there is a constant $c \in \mathbb{N}$ such that each finite segment $s|N \in A^N$ is prefix Kolmogorov c-random, in the sense that for all N,
$K(s|N) \geq N - c$.
Here $K(\sigma)$ is the prefix Kolmogorov complexity of $\sigma$ with respect to a fixed universal prefix Turing machine; changing this machine only changes the constant c in the same way for all strings $\sigma$ (which makes the value of c somewhat arbitrary). Recall that $K(\sigma)$ of $\sigma \in A^*$ is defined as the length of the shortest program that outputs $\sigma$ and then halts, running on a universal prefix Turing machine T (i.e., the domain $D(T)$ of T consists of a prefix-free subset of $2^*$, so if $x \in D(T)$ then $y \notin D(T)$ whenever x is a proper prefix of y). Fix some universal prefix Turing machine T, and define
$K(\sigma) := \min\{|x| : x \in 2^*,\ T(x) = \sigma\}$.
Then $\sigma \in A^*$ is c-prefix Kolmogorov random, for some $\sigma$-independent $c \in \mathbb{N}$, if
$K(\sigma) \geq |\sigma| - c$.
Note that Calude [49] writes H ( σ ) for what, following [22] and others, I call K ( σ ) .
A key result in algorithmic randomness [Theorem 6.2.3] in [51], then, is:
Theorem 3.
A sequence s A ω is P f ω -random iff it is Chaitin–Levin–Schnorr random.
According to [p. 3308, footnote 1] in [52] (with references adapted):
Theorem 3 was announced by Chaitin [53] and attributed to Schnorr (who was the referee of the paper) without proof. The first published proof (in a form generalized to arbitrary computable measures) appeared in the work of Gács [54].
Since Levin [55] also states Theorem 3, the names Chaitin–Levin–Schnorr seem fair.
Hence both Definitions 4 and 5 are compatible on their own terms with Earman’s Principle:
While idealizations are useful and, perhaps, even essential to progress in physics, a sound principle of interpretation would seem to be that no effect can be counted as a genuine physical effect if it disappears when the idealizations are removed. (Earman, [56] (p. 191)).
By Theorem 2, Definition 4 is a special case of Definition 1 (5) and hence it depends on the initial probability $P_f^\omega$ on $A^\omega$. On the other hand, both (65) and the equivalence between Definitions 3 and 4 suggest that $P_f^\omega$-randomness does not depend on $P_f^\omega$! To assess this, let us look at a version of Theorem 3 for arbitrary computable measures P on $2^\omega$ [54,55].
Theorem 4.
Let P be a computable probability measure on $2^\omega$. Then $s \in 2^\omega$ is P-random iff there is a constant $c \in \mathbb{N}$ such that for all N,
$K(s|N) \geq -\log_2(P([s|N])) - c$.
If $P = P_f^\omega$, then $P([s|N]) = 2^{-N}$, and so (73) reduces to (70). Thus the absence of a P-dependence in Definition 5 is only apparent, since it implicitly depends on the assumption $p = f$. It seems, then, that Kolmogorov did not achieve his goal of defining randomness in a non-probabilistic way! Indeed, note also that the definition of $K(\sigma)$ depends on the hidden assumption that the length function $\sigma \mapsto |\sigma|$ on $2^*$ assigns equal length to 0 and 1 (Chris Porter, email 13 June 2023).
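$K(\sigma)$ itself is uncomputable (only upper semi-computable), but a crude classroom proxy for the idea behind (70)–(73) is the output length of a general-purpose compressor: incompressible strings behave like random ones. The sketch below is my own illustration using zlib, not a computation of K:

    import os
    import zlib

    def compressed_len_bits(bits):
        # length in bits of a zlib encoding of a string of '0'/'1' characters;
        # a computable upper-bound stand-in for K (which itself is uncomputable)
        return 8 * len(zlib.compress(bits.encode(), 9))

    N = 4096
    pseudo_random = "".join("1" if b & 1 else "0" for b in os.urandom(N))
    regular = "01" * (N // 2)

    print(N, compressed_len_bits(pseudo_random), compressed_len_bits(regular))
    # the pseudo-random string needs roughly N bits (each '0'/'1' character occupies
    # 8 raw bits, which the compressor squeezes down to about 1 per character), while
    # the periodic string compresses to a small fraction of that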
Another interesting example is Brownian motion, which is related to binary sequences via the random walk [12,57]. Brownian motion may be defined as a Gaussian stochastic process $(B_t)_{t \in [0,1]}$ in $\mathbb{R}$ with variance t and covariance $\langle B_s B_t \rangle = \min(s, t)$. We also assume that $B_0 = 0$. An equivalent axiomatization states that for each n-tuple $(t_1, \ldots, t_n)$ with $0 \leq t_1 \leq \cdots \leq t_n$ the increments $B_{t_n} - B_{t_{n-1}}$, …, $B_{t_2} - B_{t_1}$ are independent, that for each t one has
$P(B_{t+h} - B_t \in [a, b]) = (2\pi h)^{-1/2} \int_a^b dx\, e^{-x^2/2h}$,
and that $t \mapsto B_t$ is continuous with probability one. If we add that $B_0 = 0$, these axioms imply
$P(B_t \in [a, b]) = (2\pi t)^{-1/2} \int_a^b dx\, e^{-x^2/2t}$.
We switch from $2 = \{0, 1\}$ to $\underline{2} = \{-1, 1\}$. Take $C[0,1] \equiv C([0,1], \mathbb{R})$, seen as a Banach space in the supremum norm $\|f\|_\infty = \sup\{|f(x)|,\ x \in [0,1]\}$ and hence as a metric space (i.e., $d(f, g) = \|f - g\|_\infty$) with ensuing Borel structure. For each $N = 1, 2, \ldots$, define a map
$R_N : \underline{2}^N \to C[0,1]$; $\quad R_N(\underline{\sigma})(0) := 0$; $\quad R_N(\underline{\sigma})\big(\tfrac{n}{N}\big) := \frac{1}{\sqrt{N}} \sum_{k=1}^n \underline{\sigma}_k \quad (n = 1, \ldots, N)$,
and $R_N(\underline{\sigma})$ is defined at all other points $t \neq n/N$ of $[0, 1]$ via linear interpolation (i.e., by drawing straight lines between $R_N(\underline{\sigma})((n-1)/N)$ and $R_N(\underline{\sigma})(n/N)$ for each $n = 1, \ldots, N$; I omit the formula). Thus $R_N(\underline{\sigma})$ is a random walk with N steps in which each time jump $t = 0, 1, \ldots, N$ is compressed from unit duration to $1/N$ (so that the time span $[0, N]$ becomes $[0, 1]$), and each spatial step size is compressed from $\pm 1$ to $\pm 1/\sqrt{N}$. Now equip $\underline{2}^N$ with the fair Bernoulli probability measure $P_f^N$. Then $R_N$ induces a probability measure $P_W^N$ on $C[0,1]$ in the usual way, i.e.,
$P_W^N(A) = P_f^N(R_N^{-1}(A))$,
for measurable $A \subseteq C[0,1]$. The point, then, is that there is a unique probability measure $P_W$ on $C[0,1]$, called Wiener measure, such that $P_W^N \to P_W$ weakly as $N \to \infty$. The concept of weak convergence of probability measures on (complete separable) metric spaces X used here is defined as follows: a sequence $(P_N)$ of probability measures on X converges weakly to P iff
$\lim_{N \to \infty} \int_X dP_N\, f = \int_X dP\, f$,
for each $f \in C_b(X)$. This is equivalent to $P_N(A) \to P(A)$ for each measurable $A \subseteq X$ for which $P(\partial A) = 0$. See [58] for both the general theory and its application to Brownian motion, which may now be realized on $(C[0,1], P_W)$ as $B_t = \mathrm{ev}_t$, where the evaluation maps are defined by
$\mathrm{ev}_t : C[0,1] \to \mathbb{R}$; $\quad \mathrm{ev}_t(f) = f(t)$.
In fact, the set of all paths of the kind $R_N(\underline{\sigma})$, $\underline{\sigma} \in \underline{2}^N$, is uniformly dense in $C_0[0,1]$, the set of all $B \in C[0,1]$ that vanish at $t = 0$ (on which $P_W$ is supported). Namely, for $B \in C_0[0,1]$ and $N > 0$, recursively define
$t_1 := \min\{t \in [0,1],\ |B(t)| = 1/\sqrt{N}\}$; $\quad t_2 := \min\{t \geq t_1,\ |B(t) - B(t_1)| = 1/\sqrt{N}\}$; …
$t_n := \min\{t \geq t_{n-1},\ |B(t) - B(t_{n-1})| = 1/\sqrt{N}\}$,
until $n = N$. In terms of these, define $\underline{\sigma}^{(N)} \in \underline{2}^N$ by $\underline{\sigma}_0^{(N)} = 0$ and then, again recursively until $n = N$,
$\underline{\sigma}_1^{(N)} = \sqrt{N}\, B(t_1)$, $\quad \underline{\sigma}_2^{(N)} = \sqrt{N}\,(B(t_2) - B(t_1))$, …, $\quad \underline{\sigma}_n^{(N)} = \sqrt{N}\,(B(t_n) - B(t_{n-1}))$.
Then $R_N(\underline{\sigma}^{(N)}) \to B$ as $N \to \infty$, but $\|B - R_N(\underline{\sigma}^{(N)})\|_\infty$ is just $O(N^{-1/18})$, cf. [Section 6.4.1] in [12]. This enables us to turn $(C[0,1], P_W)$ (with suppressed Borel structure given by the metric) into an effective probability space. In Definition 1 (1), we take the countable base $\mathcal{B}$ to consist of all open balls with rational radii around points $R_N(\underline{\sigma}^{(N)})$, where $N \in \mathbb{N}$ and $\underline{\sigma}^{(N)} \in \underline{2}^N$, numbered via lexicographical ordering of $\underline{2}^*$ and computable isomorphisms $\mathbb{Q}^+ \cong \mathbb{N}$ and $\mathbb{N}^2 \cong \mathbb{N}$. The following theorem [59] then characterizes the ensuing notion of $P_W$-randomness. See also [52,60,61].
Theorem 5.
A path $B \in C[0,1]$ is $P_W$-random iff $B = \lim_{N \to \infty} B_N$ (w.r.t. $\|\cdot\|_\infty$) effectively for some sequence $B_N = R_N(\underline{\sigma}^{(N)})$ in $C[0,1]$ for which there is a constant $c \in \mathbb{N}$ such that for all N,
$K(\underline{\sigma}^{(N)}) \geq N - c$.
Here effective convergence $B_N \to B$ means that
$\forall m \in \mathbb{N}\ \exists N(m)\ \forall N > N(m): \quad \|B - B_N\|_\infty < 1/m$,
where $m \mapsto N(m)$ is computable (so $1/m \in \mathbb{Q}^+$ plays the role of $\varepsilon \in \mathbb{R}^+$).
Compare with Definition 5. It might be preferable if there were a single sequence $\underline{s} \in \underline{2}^\omega$ for which $K(\underline{s}|N) \geq N - c$, cf. (70), but unfortunately this is not the case [60,61]. Nonetheless, Theorem 5 is satisfactory from the point of view of Earman’s principle above, in that randomness of a Brownian path is characterized by randomness properties of its finite approximants $B_N$; indeed, each $\underline{\sigma}^{(N)} \in \underline{2}^N$ is c-Kolmogorov random, even for the same value of c for all N.
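Both directions of the correspondence behind Theorem 5 are easy to simulate: $R_N$ maps a string of $\pm 1$ steps to a rescaled walk, and the crossing times $t_1, t_2, \ldots$ read a string back off a given path. The rough sketch below is mine; the “Brownian” path is itself produced by a very fine random walk rather than true Brownian motion, and the printed distance is only an indicator:

    import numpy as np

    rng = np.random.default_rng(1)

    def walk(steps, scale):
        # piecewise-linear path through the rescaled partial sums of `steps` on [0, 1]
        values = np.concatenate(([0.0], np.cumsum(steps) * scale))
        return np.linspace(0.0, 1.0, len(steps) + 1), values

    def extract_string(B, N):
        # read a string in {-1, +1}^* off a path B via successive 1/sqrt(N) level
        # crossings, as in the recursive definition of t_1, t_2, ... in the text
        steps, last = [], B[0]
        for b in B:
            if abs(b - last) >= 1.0 / np.sqrt(N) and len(steps) < N:
                steps.append(1 if b > last else -1)
                last = b
        return np.array(steps)

    # stand-in for a P_W-typical path: a very fine rescaled random walk
    M = 200_000
    t_fine, B = walk(rng.choice([-1, 1], size=M), 1.0 / np.sqrt(M))

    # coarse-grain to (at most) N crossings and rebuild the approximating walk
    N = 64
    sigma_N = extract_string(B, N)
    t_coarse, B_N = walk(sigma_N, 1.0 / np.sqrt(N))

    print(len(sigma_N), "crossings found, first steps:", sigma_N[:8])
    # rough indicator of ||B - R_N(sigma^(N))||_infty; the theoretical rate O(N^(-1/18))
    # is slow, so this number need not be small for moderate N
    print(np.max(np.abs(np.interp(t_coarse, t_fine, B) - B_N)))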

4. From ‘for P-Almost Every x’ to ‘for All P-Random x

Although results of the kind reviewed here pervade the literature on algorithmic randomness (and, as remarked in the Introduction, might be said to be a key goal of this theory), their importance for physics still remains to be explored. The idea is best illustrated by the following example, which was the first of its kind. For binary sequences in $2^\omega$ equipped with the flat Bernoulli measure $P_f^\omega \equiv P_{1/2}^\omega$, see (8) etc., the strong law of large numbers (cf. [Section 2.1.2] in [12]) states that
$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} s_n = 1/2$,
for $P_f^\omega$-almost every $s \in 2^\omega$ (or: $P_f^\omega$-almost surely). Recall that this means that there exists a measurable subset $A \subseteq 2^\omega$ with $P_f^\omega(A) = 1$ such that (85) holds for each $s \in A$ (equivalently: there exists $B \subseteq 2^\omega$ with $P_f^\omega(B) = 0$ such that (85) holds for each $s \notin B$). Theorems like this provide no information about A (or B). Martin-Löf randomness (cf. Definition 1) provides this information (usually at the cost of additional computability assumptions), where I recall that, as explained more generally after Definition 1, the set $R \subseteq X$ of all P-random elements in X has $P(R) = 1$. In the case at hand, the computability assumption behind this result is satisfied since we use a computable flat prior $p = f$, under which $(2^\omega, P_f^\omega)$ is a computable probability space in the sense of Definition 1.
Theorem 6.
The strong law of large numbers (85) holds for all $P_f^\omega$-random sequences $s \in 2^\omega$.
See [43] (p. 619), and in detail [Theorem 6.57] in [49]. The law of the iterated logarithm [Section 2.3] in [12] also holds in this sense [62]. More generally, the classical theorem stating that $P_f^\omega$-almost all sequences $s \in A^\omega$ are Borel normal can be sharpened to the statement that all $P_f^\omega$-random sequences $s \in A^\omega$ are Borel normal. See [Theorem 6.61] in [49] (a sequence $s \in A^\omega$ is called Borel normal if each string $\sigma \in A^N$ occurs in s with the asymptotic relative frequency $|A|^{-N}$ given by $P_f^\omega$). The most spectacular result in this direction is arguably [Theorem 6.50] in [49]:
Theorem 7.
Any $\sigma \in A^*$ occurs infinitely often in every $P_f^\omega$-random sequence $s \in A^\omega$.
The original version of this “Monkey typewriter theorem” states that any $\sigma \in A^*$ occurs infinitely often in $P_f^\omega$-almost all sequences $s \in A^\omega$. But I wonder if this theorem matches Earman’s principle; I see no interesting and valid version for finite strings (but put this as a challenge to the reader).
The proof of all such theorems, including those to be mentioned below, is by contradiction: x not having the property Φ ( x ) in question, e.g., (85), would make x fail some randomness test.
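Theorems 6 and 7 concern genuinely infinite sequences, but Earman’s principle suggests looking at their finite shadows: for long strings the relative frequencies should already be close to their limiting values, while a deliberately biased string shows the kind of deviation a randomness test detects. A small sketch of such frequency checks (mine; pseudo-random bits are of course computable and hence not Martin-Löf random, they merely illustrate the statistics):

    import os
    from collections import Counter

    def bits_from_os(n):
        # n pseudo-random bits (an illustration only: computable, hence not ML-random)
        return "".join("1" if b & 1 else "0" for b in os.urandom(n))

    def block_frequencies(s, k):
        # relative frequencies of the length-k blocks occurring in s (alphabet A = 2)
        blocks = [s[i:i + k] for i in range(len(s) - k + 1)]
        counts = Counter(blocks)
        return {b: counts[b] / len(blocks) for b in sorted(counts)}

    N = 200_000
    s = bits_from_os(N)
    print(sum(int(c) for c in s) / N)       # ~ 1/2, cf. the strong law of large numbers (85)
    print(block_frequencies(s, 2))          # each of 00, 01, 10, 11 ~ 1/4 (Borel normality)

    biased = ("1" * 3 + "0") * (N // 4)     # a highly non-random sequence
    print(sum(int(c) for c in biased) / N)  # 3/4: fails the frequency test for p = f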
Interesting examples also come from analysis. The pertinent computable probability space is $([0,1], \lambda)$, where $\lambda$ is Lebesgue measure, and for the basic opens $\mathcal{B}$ in Definition 1 one takes open intervals with rational endpoints, suitably numbered (here I suppress the usual Borel $\sigma$-algebra on $[0,1]$, which is generated by the standard topology). Alternatively, the map
$2^\omega \to [0,1]$; $\quad s \mapsto \sum_{n=0}^{\infty} s_n 2^{-(n+1)}$,
induces an isomorphism of probability spaces $(2^\omega, P_{1/2}^\omega) \to ([0,1], \lambda)$, though not a bijection of sets $2^\omega \to [0,1]$, since the dyadic numbers (i.e., $x = m/2^n$ for $n \in \mathbb{N}$ and $m = 1, 2, \ldots, 2^n - 1$) have no unique binary expansions (the potential non-uniqueness of binary expansions is irrelevant for the purposes of this section, since dyadic numbers are not random). Although (86) is not a homeomorphism, it nonetheless maps the usual $\sigma$-algebra of measurable subsets of $2^\omega$ to its counterpart for $[0,1]$. By [Corollary 5.2] in [44] we then have:
Theorem 8.
Let $x = \sum_n s_n 2^{-n-1}$. Then $x \in [0,1]$ is λ-random iff $s \in 2^\omega$ is $P_f^\omega$-random.
This matches Theorem 5 in reducing a seemingly different setting for randomness to the case of binary sequences. See [48] for a general perspective on this phenomenon.
Further to Definition 1 above, taken from [44], in what follows I also adopt [Definition 4.2] in [44] as a definition (or characterization) of computability (which in this approach presupposes continuity):
Definition 6.
If $(X, \mathcal{B})$ and $(X', \mathcal{B}')$ are effective topological spaces, then $f : X \to X'$ is computable iff for each $U \in \mathcal{B}'$ the inverse image $f^{-1}(U) \subseteq X$ is open and computable.
One of the clearest theorems relating analysis to randomness in the spirit of our theme is the following [Theorem 6.7] in [63] (see also [Section 4] in [64]), which sharpens a classical result to the effect that any function $f : [0,1] \to \mathbb{R}$ of bounded variation is almost everywhere differentiable. First, recall that f has bounded variation if there is a constant $C < \infty$ such that for any finite collection of points $0 \leq x_0 < x_1 < \cdots < x_n < x_{n+1} \leq 1$ one has $\sum_{k=0}^{n} |f(x_{k+1}) - f(x_k)| < C$. By the Jordan decomposition theorem, this turns out to be the case iff $f = g - h$ where g and h are non-decreasing.
Theorem 9.
If $f : [0,1] \to \mathbb{R}$ is computable and has bounded variation, then f is differentiable at any λ-random $x \in [0,1]$. Moreover, $x \in [0,1]$ is λ-random iff $f'(x)$ exists for every such f.
Theorems like this give us even more than we asked for (which was the mere ability to replace ‘for P-almost every x’ by ‘for all P-random x’): they characterize random points in terms of a certain property that all members of a specific class of computable functions have. There is a similar result in which bounded variation is replaced by absolute continuity. Theorem 9 also has a counterpart in which f is non-decreasing, but here the conclusion involves computable randomness of x instead of Martin-Löf randomness (see [65] for computable randomness, originally defined by Schnorr via martingales, which is weaker than Martin-Löf randomness, i.e., Martin-Löf randomness implies computable randomness). A similar classical theorem returns Schnorr randomness (where $x \in X$ is Schnorr random if in Definition 1 (4) we replace (59) by $P(V_n) = 2^{-n}$; this gives fewer tests to pass, and hence, once again, a weaker sense of randomness than Martin-Löf randomness), see [48,63]:
Theorem 10.
If $f \in L^1[0,1]$ is computable, then $\lim_{h \to 0} \frac{1}{2h} \int_{x-h}^{x+h} dy\, f(y)$ exists for all λ-random $x \in [0,1]$, and the above limit exists for each computable $f \in L^1$ iff $x \in [0,1]$ is Schnorr random.
Now for ergodic theory. Recall the equivalent characterizations of ergodicity stated in equations (50) to (52). We then have (cf. [Theorem 8] in [66], [Theorem 3.2.2] in [67], [Theorem 1.3] in [68]):
Theorem 11.
Let $(X, P, T)$ be ergodic with P and T computable. Then (50), restricted to
$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \delta_{T^n x}(V) = P(V)$,
for each computable open $V \subseteq X$ (cf. Definition 1 (3)), holds for all P-random $x \in X$. Moreover, $x \in X$ is P-random iff (87) holds for every computable T and every computable open $V \subseteq X$.
The first author to prove such results was V’yugin [69]. See also the reviews [70,71]. In Theorem 11 one could replace (87) with the property that x satisfy (Poincaré) recurrence, in the sense that for each computable open $V \subseteq X$ (not necessarily containing x) there is some $n \in \mathbb{N}$ such that $T^n(x) \in V$. If (51) instead of (50) is used, a result like Theorem 11 obtains that characterizes Schnorr randomness. The Shannon–McMillan–Breiman theorem (53) also falls under this scope. We say that a partition $\pi$ of X is computable if each $X_a \subseteq X$ is a computable open set. The defining equation (5) is then replaced by $P(\bigsqcup_{a \in A} X_a) = 1$.
Theorem 12.
If P and T are computable and T is ergodic, and also the partition π of X is computable, then for every P-random $x \in X$ one has
$h(X, P, T, \pi) = \limsup_{N \to \infty} -\frac{1}{N} \log_2 P(\pi_N(x))$.
See [Corollary 6.1.1] in [67]; note the lim sup here.
Things become more interesting if we replace the information function $-\log_2 P(\pi_N(x))$ by the (prefix) Kolmogorov complexity $K(\xi_N(x))$. Recall the map (40), which we may truncate to maps
$\xi_N : X \to A^N$; $\quad \xi_N(x) = \xi(x)|N$,
so that $\xi_N(x)_n = a$ (for $n = 0, \ldots, N-1$) identifies the subspace $X_a$ of the partition our particle occupies after n time steps. Under the same assumptions as Theorem 12, we then have:
$h(X, P, T, \pi) = \limsup_{N \to \infty} \frac{1}{N} K(\xi_N(x))$,
for all P-random $x \in X$ (and hence for P-almost every $x \in X$), cf. (48). See [67,72,73,74]. Taking the supremum over all computable partitions π, the Kolmogorov–Sinai entropy of $(X, P, T)$ equals the limiting Kolmogorov complexity of any increasingly fine coarse-grained P-random path for $(X, T)$. Note that the right-hand side of (90) is independent of P, which the left-hand side is not; however, the condition for the validity of (90), namely that x be P-random, depends on P. The equality
$\limsup_{N \to \infty} \frac{1}{N} \langle K \circ \xi_N \rangle_P = \limsup_{N \to \infty} \frac{1}{N} K(\xi_N(x))$,
for all P-random $x \in X$ also illustrates our theme; it shows that (at least asymptotically) each P-random x generates a coarse-grained path $\xi_N(x)$ that has “average” Kolmogorov complexity. Applying these results to $X = A^\omega$, with T the unilateral shift and $P = P_p^\omega$ the Bernoulli measure on $A^\omega$ given by a probability distribution p on some alphabet A, gives a similar expression for the Shannon entropy (18): for all $P_p^\omega$-random $s \in A^\omega$ (and hence for $P_p^\omega$-almost every $s \in A^\omega$),
$\lim_{N \to \infty} \frac{1}{N} K(s|N) = h_2(p)$.
Note the lim instead of the lim sup , which is apparently justified in this special case. Porter [75] states my equation (92) as his Theorem 3.2 and labels it “folklore”, referencing however [55,76]. See also [28,77] for further connections between entropy and algorithmic complexity.
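Equation (92) says that for $P_p$-random sequences the per-symbol Kolmogorov complexity equals the Shannon entropy $h_2(p)$. K is uncomputable, but replacing it by a compressor’s output length gives a suggestive, purely heuristic finite-N picture; the sketch below is my illustration, not a computation of K:

    import zlib
    import numpy as np

    rng = np.random.default_rng(2)

    def h2(p):
        # Shannon entropy h_2(p) of a Bernoulli(p) bit, in bits
        if p in (0.0, 1.0):
            return 0.0
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def compressed_bits_per_symbol(bits):
        # zlib output length per symbol: a computable upper-bound stand-in for K(s|N)/N
        packed = np.packbits(bits)                 # store 8 symbols per byte
        return 8 * len(zlib.compress(packed.tobytes(), 9)) / len(bits)

    N = 2_000_000
    for p in (0.5, 0.25, 0.1):
        s = (rng.random(N) < p).astype(np.uint8)   # a Bernoulli(p) sample, not ML-random
        print(p, round(h2(p), 3), round(compressed_bits_per_symbol(s), 3))
    # the compressed length per symbol tracks h_2(p), in the spirit of (92)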
Finally, here are some nice examples involving Brownian motion. Three classical results are:
Theorem 13.
1. 
For $P_W$-almost every $B \in C[0,1]$ there exists $h_0 > 0$ such that
$|B(t+h) - B(t)| \leq \sqrt{2h \log(1/h)}$,
for all $0 < h < h_0$ and all $0 \leq t \leq 1 - h$, and $\sqrt{2}$ is the best constant for which this is true.
2. 
$P_W$-almost every $B \in C[0,1]$ is locally Hölder continuous with index $0 < \alpha < 1/2$.
3. 
$P_W$-almost every $B \in C[0,1]$ is not differentiable at any $t \in [0,1]$.
See e.g., [Sections 1.2 and 1.3] in [57]. A path $f \in C[0,1]$ is locally Hölder continuous with index $\alpha > 0$ if there is $\varepsilon > 0$ such that if $|s - t| < \varepsilon$, then $|f(s) - f(t)| \leq C|s - t|^{\alpha}$ for some $C > 0$. This implies the same property for any $0 < \alpha' < \alpha$. The value $\alpha < 1/2$ is optimal: $P_W$-almost every $B \in C[0,1]$ fails to be locally Hölder continuous with index $\alpha > 1/2$ [Remark 1.21] in [57]. Concerning the critical value $\alpha = 1/2$, the best one can say is that $P_W$-almost every B satisfies $\inf_{t \in [0,1]} \limsup_{h \to 0} \big(|B(t+h) - B(t)|/\sqrt{h}\big) = 1$ (ibid., [Theorem 10.30] in [57]).
Theorem 14.
Theorem 13 holds verbatim (even without any computability assumption on $t \in [0,1]$!) if ‘for $P_W$-almost every $B \in C[0,1]$’ is replaced by ‘for every $P_W$-random $B \in C[0,1]$’.
For continuity see [Section 3] in [78] and [Section 2.3] in [79]. For non-differentiability see [Theorem 7] in [78]. See also [80].
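A crude finite check of the modulus of continuity in Theorem 13.1 is easy to run on a simulated path; the sketch below (mine, using a fine random walk as a stand-in for a $P_W$-typical path) computes the maximal increment ratio for a few window sizes h:

    import numpy as np

    rng = np.random.default_rng(3)

    # a fine rescaled random walk standing in for a P_W-typical Brownian path on [0, 1]
    M = 1_000_000
    B = np.concatenate(([0.0], np.cumsum(rng.choice([-1.0, 1.0], size=M)) / np.sqrt(M)))

    def max_increment_ratio(B, h_steps, M):
        # max over t on the grid of |B(t+h) - B(t)| / sqrt(2 h log(1/h)), with h = h_steps/M
        h = h_steps / M
        increments = np.abs(B[h_steps:] - B[:-h_steps])
        return increments.max() / np.sqrt(2 * h * np.log(1 / h))

    for h_steps in (10_000, 1_000, 100):
        print(h_steps / M, round(max_increment_ratio(B, h_steps, M), 3))
    # the ratios stay of order one, consistent with the Levy modulus of continuity
    # with best constant sqrt(2) (Theorems 13.1 and 14)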

5. Applications to Statistical Mechanics

It is the author’s view that many of the most important questions still remain unanswered in very fundamental and important ways. (Sklar [2] (p. 413)).
What many “chefs” regard as absolutely essential and indispensable, is argued to be insufficient or superfluous by many others. (Uffink [3] (p. 925)).
The theme of the previous section is the mathematical key to a physical understanding of the notorious phenomenon of irreversibility, for the moment in classical statistical mechanics. The literature on this topic is enormous; I recommend [2,3,4,29]. My discussion is based on the pioneering work of Hiura and Sasa [81]. But before getting there, I would like to very briefly review Boltzmann’s take on the general problem of irreversibility (which in my view is correct). In Boltzmann’s approach, irreversibility of macroscopic phenomena in a microscopic world governed by Newton’s (time) reversible equations is a consequence of:
1.
Coarse-graining (only certain macroscopic quantities behave irreversibly);
2.
Probability (irreversible behaviour is just very likely–or, in infinite systems, almost sure).
Boltzmann launched two different scenarios to make this work, both extremely influential. First, in 1872 [82] the coarse-graining of an N-particle system moving in some volume $V \subset \mathbb{R}^3$ was done by introducing a time-dependent macroscopic distribution function $f_t$, which for each time t is defined on the single-particle phase space $V \times \mathbb{R}^3 \subset \mathbb{R}^6$, and which is a probability density in the sense that $N \cdot \int_A d^3r\, d^3v\, f_t(r, v)$ is the “average” number of particles inside a region $A \subseteq V \times \mathbb{R}^3$ at time t (the normalization is $\int d^3r\, d^3v\, f_t(r, v) = 1$). Boltzmann argued that under various assumptions, notably his Stosszahlansatz (“molecular chaos”), which assumes probabilistic independence of two particles before they collide, as well as the absence of collisions between three or more particles, and finally some form of smoothness, $f_t$ solves the Boltzmann equation
$\partial_t f_t + v \cdot \nabla_r f_t = C(f_t)$,
whose right-hand side is a quadratic integral expression in f taking the effect of two-body collisions (or other two-particle interactions) into account. He then showed that the “entropy”
$S(t) = -\int_{\mathbb{R}^6} dr\, dv\, f_t(r, v) \ln f_t(r, v)$
satisfies $dS/dt \geq 0$ whenever $f_t$ solves (94), and saw this as a proof of irreversibility.
Historically, there were two immediate objections to this result (see also the references above). First, there is some tension between this irreversibility and the reversibility of Newton’s equations satisfied by the microscopic variables $(r_0(t), v_0(t), \ldots, r_{N-1}(t), v_{N-1}(t))$ on which $f_t$ is based (Loschmidt’s Umkehreinwand). Second, in a finite system any N-particle configuration eventually returns to a configuration arbitrarily close to its initial value (Zermelo’s Wiederkehreinwand). A general form of this phenomenon of Poincaré recurrence (e.g., Viana and Oliveira, 2016, Theorem 1.2.1) states that if $(X, P, T)$ is a dynamical system, where P is T-invariant and $T : X \to X$ is just assumed to be measurable, and $A \subseteq X$ has positive measure, then for P-almost every $x \in A$ there exist infinitely many $n \in \mathbb{N}$ for which $T^n(x) \in A$. These problems made Boltzmann’s conclusions look dubious, perhaps even circular (irreversibility having been put in by hand via the assumptions leading to the Boltzmann equation). Despite the famous later work by Lanford [83,84] on the derivation of the Boltzmann equation for short times, these issues remain controversial [85]. But I see a promising way forward, as follows [86,87,88,89]. From the point of view of Boltzmann [6], as rephrased in Section 2 (first bullet), the distribution function is just the empirical measure (15) for an N-particle system with $A = \mathbb{R}^6$ (hence A is uncountably infinite, but nonetheless the measure-theoretic situation remains unproblematic). Each N-particle configuration
$x^{(N)}(t) := (r_0(t), v_0(t), \ldots, r_{N-1}(t), v_{N-1}(t))$
at time t determines a probability measure $L_N(x^{(N)}(t))$ on A via
$L_N(x^{(N)}(t)) = \frac{1}{N} \sum_{n=0}^{N-1} \delta_{(r_n(t), v_n(t))}$.
Physicists prefer densities (with respect to Lebesgue measure $dr\, dv$) and Dirac δ-functions, writing
$f_t^{(N)} : A^N \to \mathrm{Dis}(A)$;
$f_t^{(N)}(r_0, v_0, \ldots, r_{N-1}, v_{N-1}) : (r, v) \mapsto \frac{1}{N} \sum_{k=0}^{N-1} \delta(r - r_k(t))\, \delta(v - v_k(t))$,
where $\mathrm{Dis}(A)$ is the space of probability distributions on A. The connection is [Section 1.3] in [87]
$dL_N(x^{(N)}(t)) = f_t^{(N)}(r_0, v_0, \ldots, r_{N-1}, v_{N-1})\, dr\, dv$.
The hope, then, is that $f_t^{(N)}$ has a limit $f_t$ as $N \to \infty$ that has some smoothness and satisfies the Boltzmann equation. To accomplish this, the idea is to turn $(f_t^{(N)})_{t \geq 0}$ into a stochastic process taking values in $\mathrm{Dis}(A)$ or in $\mathrm{Prob}(A)$, based on a probability space $(A^\omega, P^\omega)$, indexed by $N \in \mathbb{N}$, and study the limit $N \to \infty$. More precisely, in a dilute gas (for which the Boltzmann equation was designed in the first place) one has $a^3 \ll 1/\rho \ll \ell^3$, where a is the atom size (or some other length scale), $\rho = N/V$ is the particle density, and $\ell$ is the mean free path (between collisions). Defining $\varepsilon = 1/(\rho \ell^3)$, the limit “$N \to \infty$” is the Boltzmann–Grad limit $N \to \infty$ and $\varepsilon \to 0$ at constant $\varepsilon N$.
The probability measure $P^\omega$ on $A^\omega$ should, via “propagation of chaos”, implement the Stosszahlansatz. The simplest way to do this [83] is to take some initial value $f_0$ for the envisaged solution $f_t$ of the Boltzmann equation. This induces a probability measure $p \in \mathrm{Prob}(A)$ on A, which in turn yields the corresponding Bernoulli measure $P_p^\omega$ on $A^\omega$. This construction may be generalized by taking some $\mu \in \mathrm{Prob}(\mathrm{Prob}(A))$ and averaging Bernoulli measures $P_p^\omega$ with respect to $\mu$, as in de Finetti’s theorem [90]. Alternatively, take permutation-invariant probability measures $(P^{(N)})$ on $A^N$ for which the empirical measures $L_N$ on A converge (in law) to some $p \in \mathrm{Prob}(A)$ as $N \to \infty$ [87]. By [Prop. 2.2] in [91] this is equivalent to the factorization property, for all $g_1, g_2 \in C_b(A)$,
$\lim_{N \to \infty} \langle g_1 \otimes g_2 \otimes 1_A \otimes \cdots \otimes 1_A \rangle_{P^{(N)}} = \langle g_1 \rangle_\mu\, \langle g_2 \rangle_\mu$.
Either way, one’s hope is that $P^\omega$-almost surely the random variables $f_t^{(N)}$ have a smooth limit $f_t$ as $N \to \infty$, which limit satisfies the Boltzmann equation, so that the macroscopic time-evolution $t \mapsto f_t$ is induced by the microscopic time-evolution $t \mapsto x(t)$, at least for the $P^\omega$-a.e. $x \in X$ for which $\lim_{N \to \infty} f_t^{(N)}(x)$ exists, where x is some configuration of infinitely many particles in $\mathbb{R}^3$, including their velocities, cf. (96)–(100). This would even derive the Boltzmann equation.
Using large deviation theory, Bouchet [88] showed all this assuming the Stosszahlansatz (albeit at a mathematically heuristic level). This is impressive, but the argument would only be complete if one could prove that, in the spirit of the previous section, $\lim_{N \to \infty} f_t^{(N)}(x)$ exists for all $P^\omega$-random $x \in X$, preferably by showing first that the Stosszahlansatz and the other assumptions used in the derivation of the Boltzmann equation (such as the absence of multiple-particle collisions) hold for all $P^\omega$-random x. In particular, this would make it clear that the Stosszahlansatz is really a randomness assumption. Earman’s principle applies: Bouchet [88] showed that for finite N, the Boltzmann equation holds approximately for a set of initial conditions $x \in A^N$ with high probability $P^N$. The resolution of the Umkehreinwand is then standard, see e.g., [Chapter 8] in [29]. Similarly, the Wiederkehreinwand is countered by noting that in an infinite system the recurrence time is infinite, whilst in a large system it is astronomically large (beyond the age of the universe).
While its realization for the Boltzmann equation may still be remote (for mathematical rather than conceptual reasons or matters of principle), this scenario can be demonstrated in the Kac ring model [92] (see also [81,93,94]), which is a caricature of the Boltzmann equation. Namely:
  • The microstates of the Kac ring model for finite N are pairs
    $(x^{(N)}, y^{(N)}) \in 2^{2N+1} \times 2^{2N+1} =: A_N; \quad x^{(N)} = (x_{-N}, \ldots, x_N); \quad y^{(N)} = (y_{-N}, \ldots, y_N),$
    with $x_n \in 2$, $y_n \in 2$. Here $x_n$ is seen as a spin that can be "up" ($x_n = 1$) or "down" ($x_n = 0$), whereas $y_n$ denotes the presence ($y_n = 1$) or absence ($y_n = 0$) of a scatterer, located between $x_n$ and $x_{n+1}$. These replace the variables $(r_0, v_0, \ldots, r_{N-1}, v_{N-1}) \in \mathbb{R}^{6N}$ for the Boltzmann equation. In the thermodynamic limit we then have
    $(x^{(N)}, y^{(N)}) \;\xrightarrow{\,N \to \infty\,}\; (x, y) \in 2^{\mathbb{Z}} \times 2^{\mathbb{Z}} =: A_\omega.$
  • The macrostates of the model, which replace the distribution function (99), form a pair
    $m^{(N)}: A_N \to [0,1], \quad m^{(N)}(x^{(N)}, y^{(N)}) := \frac{1}{2N+1}\sum_{k=-N}^{N} x_k;$
    $s^{(N)}: A_N \to [0,1], \quad s^{(N)}(x^{(N)}, y^{(N)}) := \frac{1}{2N+1}\sum_{k=-N}^{N} y_k.$
  • The microdynamics, which replaces the time evolution $(r_0(t), v_0(t), \ldots, r_{N-1}(t), v_{N-1}(t))$ generated by Newton's equations with some potential, is now discretized and given by the maps
    $T^{(N)}: A_N \to A_N; \quad T^{(N)}(x, y)_{n+1} := \begin{cases} (x_n, y_n) & (y_n = 0); \\ (1 - x_n, y_n) & (y_n = 1), \end{cases}$
    where $n = -N, \ldots, N$, with periodic boundary conditions, i.e.,
    $(x_{N+1}, y_{N+1}) = (x_{-N}, y_{-N}).$
    The same formulae define the thermodynamic limit $T: A_\omega \to A_\omega$. The idea is that in one time step the spin $x_n$ moves one place to the right and flips iff it hits a scatterer ($y_n = 1$).
  • The macrodynamics, which replaces the solution of the Boltzmann equation, is given by
    $\Phi: [0,1] \times [0,1] \to [0,1] \times [0,1]; \quad \Phi(\bar{m}, \bar{s}) = \big( (1 - 2\bar{s})(\bar{m} - \tfrac{1}{2}) + \tfrac{1}{2},\ \bar{s} \big);$
    In particular, for $t \in \mathbb{N}$ one has
    $\Phi^t(\bar{m}, \bar{s}) = \big( (1 - 2\bar{s})^t (\bar{m} - \tfrac{1}{2}) + \tfrac{1}{2},\ \bar{s} \big),$
    and hence every initial state $(\bar{m}, \bar{s})$ with $\bar{s} \in (0,1)$ reaches the "equilibrium" state $(\tfrac{1}{2}, \bar{s})$, as
    $\lim_{t \to \infty} \Phi^t(\bar{m}, \bar{s}) = (\tfrac{1}{2}, \bar{s}).$
  • The macrodynamics (109) is induced by the microdynamics (107), that is,
$(m^{(N)}, s^{(N)})\big(T^{(N)}(x^{(N)}, y^{(N)})\big) = \Phi\big( (m^{(N)}, s^{(N)})(x^{(N)}, y^{(N)}) \big),$
provided the counterpart of the Stosszahlansatz for this model holds. For $N < \infty$ this reads
$\#_{x=1}(t+1) = (1 - \bar{s})\, \#_{x=1}(t) + \bar{s}\, \#_{x=0}(t),$
i.e., the number of spins $x = 1$ after one time step equals the number of spins that already had $x = 1$ and did not scatter (where the probability of non-scattering is estimated as $1 - \bar{s}$, i.e., the average density of voids), plus the number of spins $x = 0$ that have flipped because they hit a scatterer (where the probability of scattering is estimated as the average density $\bar{s}$ of scatterers). This kind of averaging of course overlooks the details of the actual locations of the scatterers relative to the locations of the spins with specific values. It is trivial to find configurations $(x, y)$ where it is violated, but these become increasingly rare as $N \to \infty$ (see the numerical sketch following this list).
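Concretely, here is a minimal numerical sketch (in Python) of the Kac ring as described verbally in the third bullet above: each spin hops one site to the right per time step and is flipped exactly when it passes a scatterer, while the scatterers stay put. The ring size and the densities $\bar{m} = 0.9$, $\bar{s} = 0.3$ are arbitrary illustrative choices. For a single initial configuration drawn with these densities, the empirical magnetization tracks the first component of $\Phi^t(\bar{m}, \bar{s})$ for times $t \ll N$, in line with Theorem 15 below:

    import numpy as np

    def kac_step(x, y):
        """One time step: the spin at site n hops to site n+1 (periodically) and is
        flipped iff it passes a scatterer (y[n] = 1); the scatterers do not move."""
        return np.roll(x ^ y, 1), y

    def macro(m0, s, t):
        """First component of Phi^t(m0, s), i.e., the 'Boltzmann equation' of the model."""
        return (1.0 - 2.0 * s) ** t * (m0 - 0.5) + 0.5

    rng = np.random.default_rng(0)
    n_sites, m_bar, s_bar, t_max = 200_001, 0.9, 0.3, 20
    x = (rng.random(n_sites) < m_bar).astype(np.int64)   # spins,      P(x_n = 1) = m_bar
    y = (rng.random(n_sites) < s_bar).astype(np.int64)   # scatterers, P(y_n = 1) = s_bar

    for t in range(t_max + 1):
        print(f"t={t:2d}  micro m={x.mean():.4f}  macro={macro(m_bar, s_bar, t):.4f}")
        x, y = kac_step(x, y)

For times comparable to the ring size the agreement breaks down and the exact recurrence of the microdynamics reasserts itself, which is the finite-N shadow of the Wiederkehreinwand.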
I now state Theorem 3.5 in [81], due to Hiura and Sasa, which sharpens earlier results by Kac [92] by replacing a 'for P-almost every x' result with a 'for all P-random x' result that provides much more precise information on randomness. First, recall that if $(x, y) \in A_\omega$ is $P_{\bar{m}}^\omega \times P_{\bar{s}}^\omega$-random, then
$\lim_{N \to \infty} (m^{(N)}, s^{(N)})(x^{(N)}, y^{(N)}) = (\bar{m}, \bar{s}).$
Theorem 15.
For all computable macrostates $(\bar{m}, \bar{s}) \in [0,1] \times [0,1]$ and all $P_{\bar{m}}^\omega \times P_{\bar{s}}^\omega$-random microstates $(x, y) \in A_\omega$, the macrodynamics (109) is induced by the microdynamics (107) as $N \to \infty$:
$\lim_{N \to \infty} (m^{(N)}, s^{(N)})\big(T^{(N)}(x^{(N)}, y^{(N)})\big) = \Phi(\bar{m}, \bar{s}).$
It follows that the "Boltzmann equation" (110) holds, and that the macrodynamics is autonomous: the dynamics of the macrostates $(\bar{m}, \bar{s})$ does not explicitly depend on the underlying microstates.
Theorem 15 uses biased Martin-Löf randomness on $A_\omega$ and hence defines "typicality" out of equilibrium. As we have seen, equilibrium corresponds to $\bar{m} = \tfrac{1}{2}$ (for arbitrary $\bar{s} \in (0,1)$), for which the corresponding $P_{1/2}^\omega$-random states are arguably the most random ones: for it follows from (73) that if s is $P_p^\omega$-random for some $p \in (0,1)$ and $s'$ is $P_f^\omega$-random, then
$K(s|N) \lesssim K(s'|N),$
so that the approach to equilibrium $\bar{m} \to 1/2$ increases (algorithmic) randomness, as expected. Similarly [81,93], the fine-grained (microscopic) entropy of some $P^\omega \in \mathrm{Prob}(A_\omega)$ is defined by
$h(P^\omega) := \limsup_{N \to \infty} \Big( -\frac{1}{N} \sum_{(x^{(N)}, y^{(N)}) \in A_N} P^\omega([x^{(N)}, y^{(N)}]) \ln P^\omega([x^{(N)}, y^{(N)}]) \Big).$
For example, as in (36), the Bernoulli measure $P^\omega = P_{(\bar{m}, \bar{s})}^\omega$ has fine-grained entropy
$h(P_{(\bar{m}, \bar{s})}^\omega) = h_2(\bar{m}, \bar{s}) = h_2(\bar{m}) + h_2(\bar{s}); \quad h_2(\bar{m}) = -\bar{m} \log_2 \bar{m} - (1 - \bar{m}) \log_2(1 - \bar{m}),$
on which (92) gives an algorithmic perspective: for all $P_{(\bar{m}, \bar{s})}^\omega$-random microstates $(x, y)$ we have
$h(P_{(\bar{m}, \bar{s})}^\omega) = \limsup_{N \to \infty} \frac{K((x, y)|N)}{N}.$
On the other hand, the coarse-grained (macroscopic) entropy (22) for the flat prior $p = f$ on A and the probability $\mu = \mu_{(\bar{m}, \bar{s})}$ on $2 \times 2$ defined by $\mu_{(\bar{m}, \bar{s})}(1, 0) = \bar{m} \cdot (1 - \bar{s})$ etc. is given by
$s_B(\mu_{(\bar{m}, \bar{s})}) = \frac{h_2(\bar{m}, \bar{s})}{2 \ln 2},$
cf. (23). Despite the similarity of (118) and (120), we should keep them apart. Irreversibility of the macroscopic dynamics does not contradict reversibility of the microscopic dynamics, even though the fine-grained and coarse-grained entropies practically coincide here. In this case, $(T^{(N)})^{-1}$ is given by (107) with $n+1 \rightsquigarrow n-1$ (where $\rightsquigarrow$ stands for 'is replaced by') and $y_n = 0/1 \rightsquigarrow y_{n-1} = 0/1$, so that the spin now moves to the left. Defining time reversal $\tau: A_\omega \to A_\omega$ by
$\tau(x, y)_n := (x_{-n}, y_{-n-1}),$
so that $\tau \circ T^{(N)} = (T^{(N)})^{-1} \circ \tau$ as appropriate, one even has $\Phi \circ \tau = \Phi$. But the real point is that if $(x, y)$ is $P_{(\bar{m}, \bar{s})}^\omega$-random, then "typically", $\tau(x, y)$ is not. In fact, the entire (neo) Boltzmannian program can be carried out in this model [29,81,93]. In particular, the fine-grained entropy (118) is invariant under the microscopic time evolution T, whereas the coarse-grained entropy (120) increases along solutions of the "Boltzmann equation" (110).
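The increase of the coarse-grained entropy along the macrodynamics is easy to check numerically: since $\Phi$ drives $\bar{m}_t \to \tfrac{1}{2}$ while leaving $\bar{s}$ fixed, $h_2(\bar{m}_t)$ (and hence any increasing function of $h_2(\bar{m}_t, \bar{s})$, such as (120)) climbs to its maximum. A small sketch, with the same illustrative parameters as before:

    import numpy as np

    def h2(p):
        """Binary Shannon entropy in bits (values clipped to avoid log(0))."""
        p = np.clip(p, 1e-12, 1.0 - 1e-12)
        return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

    m_bar, s_bar = 0.9, 0.3
    for t in range(11):
        m_t = (1.0 - 2.0 * s_bar) ** t * (m_bar - 0.5) + 0.5  # first component of Phi^t
        print(f"t={t:2d}  m_t={m_t:.4f}  h2(m_t) + h2(s) = {h2(m_t) + h2(s_bar):.4f} bits")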

6. Applications to Quantum Mechanics

There is yet another interpretation of the diagram at the beginning of Section 2: in quantum mechanics a string $\sigma \in A^N$ denotes the outcome of a run of N repeated measurements of the same observable A with finite spectrum A in the same quantum state, so that the possible outcomes $a \in A$ are distributed according to the Born rule: if H is the Hilbert space pertinent to the experiment, $A \in B(H)$ is the observable that is being measured, with spectrum $A = \sigma(A)$ and spectral projections $E_a$ onto the eigenspace $H_a$ for eigenvalue a, and $\hat{\rho}$ is the density operator describing the quantum state, then $p(a) = \mathrm{Tr}(\hat{\rho} E_a)$. It can be shown that if we consider the run as a single experiment, the probability of outcome σ is $P_p^N(\sigma)$, as in a classical Bernoulli trial. This extends to the idealized case of an infinitely repeated experiment, described by the probability measure $P_p^\omega$ on $A^\omega$ [95,96]. In particular, for a "fair quantum toss" (in which $A = 2$ with $p(1) = p(0) = 1/2$), it follows that the outcome sequences sample the probability space $(2^\omega, P_f^\omega)$, just as in the classical case.
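The "run as a single Bernoulli trial" picture is easy to illustrate numerically. The sketch below (in Python) computes the Born probabilities $p(a) = \mathrm{Tr}(\hat{\rho} E_a)$ for a two-level toy observable and then samples a long i.i.d. run; the state, the projections, and the run length are invented for illustration and are not taken from [95,96]:

    import numpy as np

    rng = np.random.default_rng(42)

    # Toy observable on C^2 with spectrum {+1, -1} and its spectral projections E_a.
    E = {+1: np.diag([1.0, 0.0]), -1: np.diag([0.0, 1.0])}
    psi = np.array([np.cos(0.3), np.sin(0.3)])      # illustrative pure state
    rho = np.outer(psi, psi)                        # density operator |psi><psi|

    p = {a: float(np.trace(rho @ E[a]).real) for a in E}   # Born rule p(a) = Tr(rho E_a)

    # A run of N repeated measurements is modeled as an i.i.d. sample from p, i.e.,
    # the string sigma in A^N occurs with probability P_p^N(sigma).
    N = 100_000
    outcomes = rng.choice(list(p), size=N, p=list(p.values()))
    for a in p:
        print(f"outcome {a:+d}: Born probability {p[a]:.4f}, relative frequency {(outcomes == a).mean():.4f}")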
For quantum mechanics itself, this sampling property implies that $P_f^\omega$-almost every outcome sequence $s \in 2^\omega$ is $P_f^\omega$-random. The theme of Section 4 then leads to the circular conclusion that all $P_f^\omega$-random outcome sequences are $P_f^\omega$-random. Nonetheless, this circularity comes back with a vengeance if we turn to hidden variable theories, notably Bohmian mechanics [97]. Let me first summarize my original argument [96,98], and then reply to a potential counterargument.
In hidden variable theories there is a space Λ of hidden variables, and if the theory has the right to call itself "deterministic", then there must be functions $h: \mathbb{N} \to \Lambda$ and $g: \Lambda \to A$ such that
$s = g \circ h.$
The existence of g expresses the idea that the value of λ determines the outcome of the experiment. The function g tacitly incorporates all details of the experiment that may affect its outcome, except the hidden variable λ (which is the argument of g). Such details may include the setting, a possible context, and the quantum state. The existence of g therefore does not contradict the Kochen–Specker theorem (which excludes context-independence). But g is just one ingredient that makes a hidden variable theory deterministic. The other is the function h that gives the value of λ in experiment number n in a long run, for each n. Furthermore, in any hidden variable theory the probability of the outcome of some measurement, in case the hidden variable λ is unknown, is given by averaging the determined outcomes given by g with respect to some probability measure $\mu_\psi$ on Λ defined by the quantum state ψ that is supposed to describe the experiment within quantum mechanics.
Theorem 16.
The functions g and h cannot both be provided by any deterministic theory (and hence deterministic hidden variable theories that exactly reproduce the Born rule cannot exist).
Proof. 
The Born rule is needed to prove that outcome sequences $s \in A^\omega$ are $P_p^\omega$-distributed ([Theorem 3.4.1] in [96]). If g and h were explicitly given by some deterministic theory T, then the sequence s would be described explicitly via (122). By (what I call) Chaitin's second incompleteness theorem, the sequence s cannot then be $P_p^\omega$-random. □
The theorem used here states that if $s \in A^\omega$ is $P_p^\omega$-random, then ZFC (or any sufficiently comprehensive mathematical theory T, as meant in the proof of Theorem 16) can compute only finitely many digits of s. See e.g., [Theorem 8.7] in [49], which is stated for Chaitin's famous random number Ω but whose proof holds for any $P_p^\omega$-random sequence. Consistent with Earman's principle, Theorem 16 does not rely on the idealization of infinitely long runs of measurements, since for finite runs Chaitin's (first) incompleteness theorem leads to a similar contradiction. The latter theorem states that for any sound mathematical theory T containing enough arithmetic there is a constant $C \in \mathbb{N}$ such that T cannot prove any sentence of the form $K(\sigma) > C$, although infinitely many such sentences are true. In other words, T can only prove randomness of finitely many strings, although infinitely many strings are in fact random. See e.g., [Theorem 8.4] in [49].
The upshot is that a deterministic theory cannot produce random sequences. Against this, fans of deterministic hidden variable theories could argue that the (unilateral) Bernoulli shift S on $2^\omega$ (equipped with $P_f^\omega$ for simplicity) is deterministic and yet is able to produce random sequences. Indeed, following a suggestion by Jos Uffink (who is not even a Bohmian!), this can be done as follows (readers familiar with [99] will notice that the following scenario would actually be optimal for these authors!). With $\Lambda = A = 2$, and the simplest experiment for which $g: 2 \to 2$ is the identity (so that the measurement just reveals the actual value of the pertinent hidden variable), take an initial condition $s' \in 2^\omega$, and define $h: \mathbb{N} \to 2$ by
$h(n) = s'(n).$
Then $s = s'$. In other words, imagine that experiment number $n \in \mathbb{N}$ takes place at time $t = n$, at which time the hidden variable takes the value $\lambda = s'(n)$. The measurement run then just reads the tape $s'$. Trivially, if the initial condition $s'$ is $P_p^\omega$-random, then so is the outcome sequence s.
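The tape-reading scenario fits in a few lines of code. In the sketch below (purely illustrative), a pseudo-random generator stands in for the postulated random initial condition $s'$; the deterministic ingredients g and h merely copy the tape, so whatever randomness the outcomes have is imported wholesale from the tape:

    import numpy as np

    rng = np.random.default_rng(7)
    s_prime = rng.integers(0, 2, size=20)     # the "hidden variable" tape (initial condition)
    g = lambda lam: lam                       # measurement reveals the hidden variable
    h = lambda n: s_prime[n]                  # hidden variable used in experiment number n

    s = np.array([g(h(n)) for n in range(len(s_prime))])   # outcome sequence s = g o h
    print("tape s' :", *s_prime)
    print("run   s :", *s)
    print("s equals s':", bool(np.all(s == s_prime)))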
According to the key Bohmian doctrine [99], the randomness of outcomes in the deterministic world envisaged in Bohmian mechanics originates in the random initial condition of the universe, which is postulated to be in "quantum equilibrium". In the above toy example, the configuration space (which in Bohmian mechanics is $\mathbb{R}^{3N}$) is replaced and idealized by $2^\omega$, i.e., the role of the position variable $q \in \mathbb{R}^{3N}$ is now played by $s' \in 2^\omega$; the dynamics (replacing the Schrödinger equation) is the shift map S; and the "quantum equilibrium condition" (which is nothing but the Born rule) then postulates that the initial value $s'$ is distributed according to the Born rule, which here is the fair Bernoulli measure $P_f^\omega$. The Bohmian explanation of randomness then comes down to the claim that despite the determinism inherent in the dynamics S as well as in the measurement theory g:
Each experimental outcome s(n) is random because the hidden variable λ is randomly distributed.
Since s′ = s, this simply says that s is random because s is random.
Even in less simplistic scenarios, using the language of computation theory (taking computability as a metaphor for determinism) we may say: deterministic hidden variable theories need a random oracle to reproduce the randomness required for quantum mechanics. This defeats their determinism.

7. Summary

This paper was motivated by a number of (closely related) questions, including these:
1. Is it probability or randomness that "comes first"? How are these concepts related?
2. Could the notion of "typicality" as it is used in Boltzmann-style statistical mechanics [29] be replaced by some precise mathematical form of randomness?
3. Are "typical" trajectories in "chaotic" dynamical systems (i.e., those with high Kolmogorov–Sinai entropy) random in the same, or some similar, sense?
Here, “typical” means “extremely probable”, which may be idealized to “occurring almost surely”.
My attempts to address these questions are guided by what I call Earman’s principle, stated after Theorem 3 in Section 3, which regulates the connection between actual and idealized physical theories. On this score, P-randomness (see Section 3) does quite well, cf. Theorems 2 and 3, although I have some misgivings about the physical relevance of its mathematical origins in the theory of computation, which for physical applications should be replaced by some abstract logical form of determinism.
Various mathematical examples provide situations where some property $\Phi(x)$ that holds for P-almost every $x \in X$ (where P is some probability measure on X) in fact holds for all P-random $x \in X$, at least under some further computability assumptions, see Section 4. The main result in Section 5, i.e., Theorem 15 due to Hiura and Sasa [81], as well as the much better known results about the relationship between entropy, dynamical systems, and P-randomness reviewed in Section 2 and Section 4, notably Theorem 1 and Equation (90), provide positive answers to questions 2 and 3. This, in turn, paves the way for an explanation of emergent phenomena like irreversibility and chaos, and suggests that the answer to question 1 is that at least the computational concept of P-randomness requires a prior probability P.

Funding

This research received no external funding.

Data Availability Statement

There are no relevant data.

Acknowledgments

This paper originated in a talk at the "Kolmogorov Workshop" in Geneva, 16 February 2023. I thank Alexei Grinbaum for the invitation, as well as various participants for questions. My presentation of Martin-Löf randomness benefited from discussions with Cristian Calude. I also thank Chris Porter and Sylvia Wenmackers for helpful correspondence. The recent volume Algorithmic Randomness: Progress and Prospects, eds. J.Y. Franklin and C.P. Porter (Cambridge University Press, 2020) [100], greatly facilitated this work.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Brush, S.G. The Kind of Motion We Call Heat; North-Holland: Amsterdam, The Netherlands, 1976. [Google Scholar]
  2. Sklar, L. Physics and Chance: Philosophical Issues in the Foundations of Statistical Mechanics; Cambridge University Press: Cambridge, UK, 1993. [Google Scholar]
  3. Uffink, J. Compendium of the foundations of classical statistical physics. In Handbook of the Philosophy of Science; Butterfield, J., Earman, J., Eds.; North-Holland: Amsterdam, The Netherlands, 2007; Volume 2: Philosophy of Physics; Part B; pp. 923–1074. [Google Scholar]
  4. Uffink, J. Boltzmann’s Work in Statistical Physics. The Stanford Encyclopedia of Philosophy; Summer 2022 Edition; Zalta, E.N., Ed.; Stanford University: Stanford, CA, USA, 2022; Available online: https://plato.stanford.edu/archives/sum2022/entries/statphys-Boltzmann/ (accessed on 15 June 2023).
  5. Von Plato, J. Creating Modern Probability; Cambridge University Press: Cambridge, UK, 1994. [Google Scholar]
  6. Boltzmann, L. Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung respektive den Sätzen über das Wärmegleichgewicht. Wien. Berichte 1877, 76, 373–435, Reprinted in Boltzmann, L. Wissenschaftliche Abhandlungen; Hasenöhrl, F., Ed.; Chelsea: London, UK, 1969; Volume II, p. 39. English translation in Sharp, K.; Matschinsky, F., Entropy 2015, 17, 1971–2009. [Google Scholar] [CrossRef] [Green Version]
  7. Einstein, A. Zum gegenwärtigen Stand des Strahlungsproblems. Phys. Z. 1909, 10, 185–193, Reprinted in The Collected Papers of Albert Einstein; Stachel, J., et al., Eds.; Princeton University Press: Princeton, NJ, USA, 1990; Volume 2; Doc. 56, pp. 542–550. Available online: https://einsteinpapers.press.princeton.edu/vol2-doc/577 (accessed on 15 June 2023); English Translation Supplement. pp. 357–375. Available online: https://einsteinpapers.press.princeton.edu/vol2-trans/371 (accessed on 15 June 2023).
  8. Ellis, R.S. Entropy, Large Deviations, and Statistical Mechanics; Springer: Berlin/Heidelberg, Germany, 1985. [Google Scholar]
  9. Ellis, R.S. An overview of the theory of large deviations and applications to statistical mechanics. Scand. Actuar. J. 1995, 1, 97–142. [Google Scholar] [CrossRef]
  10. Lanford, O.E. Entropy and Equilibrium States in Classical Statistical Mechanics; Lecture Notes in Physics; Springer: Berlin/Heidelberg, Germany, 1973; Volume 20, pp. 1–113. [Google Scholar]
  11. Martin-Löf, A. Statistical Mechanics and the Foundations of Thermodynamics; Lecture Notes in Physics; Springer: Berlin/Heidelberg, Germany, 1979; Volume 101, pp. 1–120. [Google Scholar]
  12. McKean, H. Probability: The Classical Limit Theorems; Cambridge University Press: Cambridge, UK, 2014. [Google Scholar]
  13. Von Mises, R. Grundlagen der Wahrscheinlichkeitsrechnung. Math. Z. 1919, 5, 52–99. [Google Scholar] [CrossRef] [Green Version]
  14. Von Mises, R. Wahrscheinlichkeit, Statistik, und Wahrheit, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 1936. [Google Scholar]
  15. Van Lambalgen, M. Random Sequences. Ph.D. Thesis, University of Amsterdam, Amsterdam, The Netherlands, 1987. Available online: https://www.academia.edu/23899015/RANDOM_SEQUENCES (accessed on 15 June 2023).
  16. Van Lambalgen, M. Randomness and foundations of probability: Von Mises’ axiomatisation of random sequences. In Statistics, Probability and Game Theory: Papers in Honour of David Blackwell; IMS Lecture Notes–Monograph Series; IMS: Beachwood, OH, USA, 1996; Volume 30, pp. 347–367. [Google Scholar]
  17. Porter, C.P. Mathematical and Philosophical Perspectives on Algorithmic Randomness. Ph.D. Thesis, University of Notre Dame, Notre Dame, IN, USA, 2012. Available online: https://www.cpporter.com/wp-content/uploads/2013/08/PorterDissertation.pdf (accessed on 15 June 2023).
  18. Kolmogorov, A.N. Grundbegriffe der Wahrscheinlichkeitsrechnung; Springer: Berlin/Heidelberg, Germany, 1933. [Google Scholar]
  19. Kolmogorov, A.N. Three Approaches to the Quantitative Definition of Information. Probl. Inf. Transm. 1965, 1, 3–11. Available online: http://alexander.shen.free.fr/library/Kolmogorov65_Three-Approaches-to-Information.pdf (accessed on 15 June 2023). [CrossRef]
  20. Kolmogorov, A.N. Logical Basis for information theory and probability theory. IEEE Trans. Inf. Theory 1968, 14, 662–664. [Google Scholar] [CrossRef] [Green Version]
  21. Cover, T.M.; Gács, P.; Gray, R.M. Kolmogorov’s contributions to information theory and algorithmic complexity. Ann. Probab. 1989, 17, 840–865. [Google Scholar] [CrossRef]
  22. Li, M.; Vitányi, P.M.B. An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
  23. Porter, C.P. Kolmogorov on the role of randomness in probability theory. Math. Struct. Comput. Sci. 2014, 24, e240302. [Google Scholar] [CrossRef] [Green Version]
  24. Zvonkin, A.K.; Levin, L.A. The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russ. Math. Surv. 1970, 25, 83–124. [Google Scholar] [CrossRef] [Green Version]
  25. Landsman, K. Randomness? What randomness? Found. Phys. 2020, 50, 61–104. [Google Scholar] [CrossRef] [Green Version]
  26. Porter, C.P. The equivalence of definitions of algorithmic randomness. Philos. Math. 2021, 29, 153–194. [Google Scholar] [CrossRef]
  27. Georgii, H.-O. Probabilistic aspects of entropy. In Entropy; Greven, A., Keller, G., Warnecke, G., Eds.; Princeton University Press: Princeton, NJ, USA, 2003; pp. 37–54. [Google Scholar]
  28. Grünwald, P.D.; Vitányi, P.M.B. Kolmogorov complexity and Information theory. With an interpretation in terms of questions and answers. J. Logic, Lang. Inf. 2003, 12, 497–529. [Google Scholar] [CrossRef]
  29. Bricmont, J. Making Sense of Statistical Mechanics; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
  30. Austin, T. Math 254A: Entropy and Ergodic Theory. 2017. Available online: https://www.math.ucla.edu/~tim/entropycourse.html (accessed on 15 June 2023).
  31. Dembo, A.; Zeitouni, O. Large Deviations: Techniques and Applications, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 1998. [Google Scholar]
  32. Dorlas, T.C. Statistical Mechanics: Fundamentals and Model Solutions, 2nd ed.; CRC: Boca Raton, FL, USA, 2022. [Google Scholar]
  33. Ellis, R.S. The theory of large deviations: From Boltzmann’s 1877 calculation to equilibrium macrostates in 2D turbulence. Physica D 1999, 133, 106–136. [Google Scholar] [CrossRef]
  34. Borwein, J.M.; Zhu, Q.J. Techniques of Variational Analysis; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  35. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef] [Green Version]
  36. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: Hoboken, NJ, USA, 2006. [Google Scholar]
  37. Lesne, A. Shannon entropy: A rigorous notion at the crossroads between probability, information theory, dynamical systems and statistical physics. Math. Struct. Comput. Sci. 2014, 24, e240311. [Google Scholar] [CrossRef] [Green Version]
  38. MacKay, D.J. Information Theory, Inference, and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
  39. Kolmogorov, A.N. New metric invariant of transitive dynamical systems and endomorphisms of Lebesgue spaces. Dokl. Russ. Acad. Sci. 1958, 119, 861–864. [Google Scholar]
  40. Viana, M.; Oliveira, K. Foundations of Ergodic Theory; Cambridge University Press: Cambridge, UK, 2016. [Google Scholar]
  41. Charpentier, E.; Lesne, A.; Nikolski, N.K. Kolmogorov’s Heritage in Mathematics; Springer: Berlin/Heidelberg, Germany, 2007. [Google Scholar]
  42. Castiglione, P.; Falcioni, M.; Lesne, A.; Vulpiani, A. Chaos and Coarse Graining in Statistical Mechanics; Cambridge University Press: Cambridge, UK, 2008. [Google Scholar]
  43. Martin-Löf, P. The definition of random sequences. Inf. Control 1966, 9, 602–619. [Google Scholar] [CrossRef] [Green Version]
  44. Hertling, P.; Weihrauch, K. Random elements in effective topological spaces with measure. Inform. Comput. 2003, 181, 32–56. [Google Scholar] [CrossRef] [Green Version]
  45. Hoyrup, M.; Rojas, C. Computability of probability measures and Martin-Löf randomness over metric spaces. Inf. Comput. 2009, 207, 830–847. [Google Scholar] [CrossRef] [Green Version]
  46. Gács, P.; Hoyrup, M.; Rojas, C. Randomness on computable probability spaces—A dynamical point of view. Theory Comput. Syst. 2011, 48, 465–485. [Google Scholar] [CrossRef] [Green Version]
  47. Bienvenu, L.; Gács, P.; Hoyrup, M.; Rojas, C. Algorithmic tests and randomness with respect to a class of measures. Proc. Steklov Inst. Math. 2011, 274, 34–89. [Google Scholar] [CrossRef] [Green Version]
  48. Hoyrup, M.; Rute, J. Computable measure theory and algorithmic randomness. In Handbook of Computability and Complexity in Analysis; Springer: Berlin/Heidelberg, Germany, 2021; pp. 227–270. [Google Scholar]
  49. Calude, C.S. Information and Randomness: An Algorithmic Perspective, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2002. [Google Scholar]
  50. Nies, A. Computability and Randomness; Oxford University Press: Oxford, UK, 2009. [Google Scholar]
  51. Downey, R.; Hirschfeldt, D.R. Algorithmic Randomness and Complexity; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
  52. Kjos-Hansen, B.; Szabados, T. Kolmogorov complexity and strong approximation of Brownian motion. Proc. Am. Math. Soc. 2011, 139, 3307–3316. [Google Scholar] [CrossRef] [Green Version]
  53. Chaitin, G.J. A theory of program size formally identical to information theory. J. ACM 1975, 22, 329–340. [Google Scholar] [CrossRef]
  54. Gács, P. Exact expressions for some randomness tests. In Theoretical Computer Science 4th GI Conference; Weihrauch, K., Ed.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1979; Volume 67, pp. 124–131. [Google Scholar]
  55. Levin, L.A. On the notion of a random sequence. Sov. Math.-Dokl. 1973, 14, 1413–1416. [Google Scholar]
  56. Earman, J. Curie’s Principle and spontaneous symmetry breaking. Int. Stud. Phil. Sci. 2004, 18, 173–198. [Google Scholar] [CrossRef]
  57. Mörters, P.; Peres, Y. Brownian Motion; Cambridge University Press: Cambridge, UK, 2010. [Google Scholar]
  58. Billingsley, P. Convergence of Probability Measures; Wiley: Hoboken, NJ, USA, 1968. [Google Scholar]
  59. Asarin, E.A.; Pokrovskii, A.V. Use of the Kolmogorov complexity in analysing control system dynamics. Autom. Remote Control 1986, 47, 21–28. [Google Scholar]
  60. Fouché, W.L. Arithmetical representations of Brownian motion I. J. Symb. Log. 2000, 65, 421–442. [Google Scholar] [CrossRef]
  61. Fouché, W.L. The descriptive complexity of Brownian motion. Adv. Math. 2000, 155, 317–343. [Google Scholar] [CrossRef] [Green Version]
  62. Vovk, V.G. The law of the iterated logarithm for random Kolmogorov, or chaotic, sequences. Theory Probab. Its Appl. 1987, 32, 413–425. [Google Scholar] [CrossRef]
  63. Brattka, V.; Miller, J.S.; Nies, A. Randomness and differentiability. Trans. Am. Math. Soc. 2016, 368, 581–605. [Google Scholar] [CrossRef] [Green Version]
  64. Rute, J. Algorithmic Randomness and Constructive/Computable Measure Theory; Franklin & Porter: New York, NY, USA, 2020; pp. 58–114. [Google Scholar]
  65. Downey, R.; Griffiths, E.; Laforte, G. On Schnorr and computable randomness, martingales, and machines. Math. Log. Q. 2004, 50, 613–627. [Google Scholar] [CrossRef]
  66. Bienvenu, L.; Day, A.R.; Hoyrup, M.; Mezhirov, I.; Shen, A. A constructive version of Birkhoff’s ergodic theorem for Martin-Löf random points. Inf. Comput. 2012, 210, 21–30. [Google Scholar] [CrossRef] [Green Version]
  67. Galatolo, S.; Hoyrup, M.; Rojas, C. Effective symbolic dynamics, random points, statistical behavior, complexity and entropy. Inf. Comput. 2010, 208, 23–41. [Google Scholar] [CrossRef] [Green Version]
  68. Pathak, N.; Rojas, C.; Simpson, S. Schnorr randomness and the Lebesgue differentiation theorem. Proc. Am. Math. Soc. 2014, 142, 335–349. [Google Scholar] [CrossRef] [Green Version]
  69. V’yugin, V. Effective convergence in probability and an ergodic theorem for individual random sequences. SIAM Theory Probab. Its Appl. 1997, 42, 39–50. [Google Scholar] [CrossRef]
  70. Towsner, H. Algorithmic Randomness in Ergodic Theory; Franklin & Porter: New York, NY, USA, 2020; pp. 40–57. [Google Scholar]
  71. V’yugin, V. Ergodic theorems for algorithmically random points. arXiv 2022, arXiv:2202.13465. [Google Scholar]
  72. Brudno, A.A. Entropy and the complexity of the trajectories of a dynamic system. Trans. Mosc. Math. Soc. 1983, 44, 127–151. [Google Scholar]
  73. White, H.S. Algorithmic complexity of points in dynamical systems. Ergod. Theory Dyn. Syst. 1993, 15, 353–366. [Google Scholar] [CrossRef]
  74. Batterman, R.W.; White, H. Chaos and algorithmic complexity. Found. Phys. 1996, 26, 307–336. [Google Scholar] [CrossRef]
  75. Porter, C.P. Biased Algorithmic Randomness; Franklin and Porter: New York, NY, USA, 2020; pp. 206–231. [Google Scholar]
  76. Brudno, A.A. The complexity of the trajectories of a dynamical system. Russ. Math. Surv. 1978, 33, 207–208. [Google Scholar] [CrossRef]
  77. Schack, R. Algorithmic information and simplicity in statistical physics. Int. J. Theor. Phys. 1997, 36, 209–226. [Google Scholar] [CrossRef] [Green Version]
  78. Fouché, W.L. Dynamics of a generic Brownian motion: Recursive aspects. Theor. Comput. Sci. 2008, 394, 175–186. [Google Scholar] [CrossRef] [Green Version]
  79. Allen, K.; Bienvenu, L.; Slaman, T. On zeros of Martin-Löf random Brownian motion. J. Log. Anal. 2014, 6, 1–34. [Google Scholar]
  80. Fouché, W.L.; Mukeru, S. On local times of Martin-Löf random Brownian motion. arXiv 2022, arXiv:2208.01877. [Google Scholar]
  81. Hiura, K.; Sasa, S. Microscopic reversibility and macroscopic irreversibility: From the viewpoint of algorithmic randomness. J. Stat. Phys. 2019, 177, 727–751. [Google Scholar] [CrossRef] [Green Version]
  82. Boltzmann, L. Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen. Wien. Berichte 1872, 66, 275–370, Reprinted in Boltzmann, L. Wissenschaftliche Abhandlungen; Hasenöhrl, F., Ed.; Chelsea: London, UK, 1969; Volume I, p. 23. English translation in Brush, S. The Kinetic Theory of Gases: An Anthology of Classic Papers with Historical Commentary; Imperial College Press: London, UK, 2003; pp. 262–349. [Google Scholar]
  83. Lanford, O.E. Time evolution of large classical systems. In Dynamical Systems, Theory and Applications; Lecture Notes in Theoretical Physics; Moser, J., Ed.; Springer: Berlin/Heidelberg, Germany, 1975; Volume 38, pp. 1–111. [Google Scholar]
  84. Lanford, O.E. On the derivation of the Boltzmann equation. Astérisque 1976, 40, 117–137. [Google Scholar]
  85. Ardourel, V. Irreversibility in the derivation of the Boltzmann equation. Found. Phys. 2017, 47, 471–489. [Google Scholar] [CrossRef]
  86. Villani, C. A review of mathematical topics in collisional kinetic theory. In Handbook of Mathematical Fluid Dynamics; Friedlander, S., Serre, D., Eds.; Elsevier: Amsterdam, The Netherlands, 2002; Volume 1, pp. 71–306. [Google Scholar]
  87. Villani, C. (Ir)reversibility and Entropy. In Time Progress in Mathematical Physics; Duplantier, B., Ed.; (Birkhäuser): Basel, Switzerland, 2013; Volume 63, pp. 19–79. [Google Scholar]
  88. Bouchet, F. Is the Boltzmann equation reversible? A Large Deviation perspective on the irreversibility paradox. J. Stat. Phys. 2020, 181, 515–550. [Google Scholar] [CrossRef]
  89. Bodineau, T.; Gallagher, I.; Saint-Raymond, L.; Simonella, S. Statistical dynamics of a hard sphere gas: Fluctuating Boltzmann equation and large deviations. arXiv 2020, arXiv:2008.10403. [Google Scholar]
  90. Aldous, D.L. Exchangeability and Related Topics; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1985; Volume 1117, pp. 1–198. [Google Scholar]
  91. Sznitman, A. Topics in Propagation of Chaos; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1991; Volume 1464, pp. 164–251. [Google Scholar]
  92. Kac, M. Probability and Related Topics in Physical Sciences; Wiley: Hoboken, NJ, USA, 1959. [Google Scholar]
  93. Maes, C.; Netocny, K.; Shergelashvili, B. A selection of nonequilibrium issues. arXiv 2007, arXiv:math-ph/0701047. [Google Scholar]
  94. De Bièvre, S.; Parris, P.E. A rigorous demonstration of the validity of Boltzmann’s scenario for the spatial homogenization of a freely expanding gas and the equilibration of the Kac ring. J. Stat. Phys. 2017, 168, 772–793. [Google Scholar] [CrossRef] [Green Version]
  95. Landsman, K. Foundations of Quantum Theory: From Classical Concepts to Operator Algebras; Springer Open: Berlin/Heidelberg, Germany, 2017; Available online: https://www.springer.com/gp/book/9783319517766 (accessed on 15 June 2023).
  96. Landsman, K. Indeterminism and undecidability. In Undecidability, Uncomputability, and Unpredictability; Aguirre, A., Merali, Z., Sloan, D., Eds.; Springer: Berlin/Heidelberg, Germany, 2021; pp. 17–46. [Google Scholar]
  97. Goldstein, S. Bohmian Mechanics. The Stanford Encyclopedia of Philosophy. 2017. Available online: https://plato.stanford.edu/archives/sum2017/entries/qm-bohm/ (accessed on 15 June 2023).
  98. Landsman, K. Bohmian mechanics is not deterministic. Found. Phys. 2022, 52, 73. [Google Scholar] [CrossRef]
  99. Dürr, D.; Goldstein, S.; Zanghi, N. Quantum equilibrium and the origin of absolute uncertainty. J. Stat. Phys. 1992, 67, 843–907. [Google Scholar] [CrossRef] [Green Version]
  100. Franklin, J.Y.; Porter, C.P. (Eds.) Algorithmic Randomness: Progress and Prospects; Cambridge University Press: Cambridge, UK, 2020. [Google Scholar]