Typical = random

This expository paper advocates an approach to physics in which "typicality" is identified with a suitable form of algorithmic randomness. To this end various theorems from mathematics and physics are reviewed. Their original versions state that some property F(x) holds for P-almost all x in X, where P is a probability measure on some space X. Their more refined (and typically more recent) formulations show that F(x) holds for all P-random x in X. The computational notion of P-randomness used here generalizes the one introduced by Martin-Löf in 1966 in a way now standard in algorithmic randomness. Examples come from probability theory, analysis, dynamical systems/ergodic theory, statistical mechanics, and quantum mechanics (especially hidden variable theories). An underlying philosophical theme, inherited from von Mises and Kolmogorov, is the interplay between probability and randomness, especially: which comes first?


Introduction
The introduction of probability into statistical mechanics in the 19th century by Maxwell and Boltzmann immediately raised questions about both the meaning of this concept by itself and its relationship to randomness and entropy (Brush, 1976; Sklar, 1993; von Plato, 1994; Uffink, 2007, 2022). Roughly speaking, both initially felt that probabilities in statistical mechanics were dynamically generated by particle trajectories, a view which led to ergodic theory. But subsequently Boltzmann (1877) introduced his counting arguments as a new start; these led to his famous formula for the entropy S = k log W on his gravestone in Vienna, i.e., probability first, entropy second.
This was turned on its head by Einstein (1909), who had rediscovered much of statistical mechanics by himself in his early work, always stressing the role of fluctuations. He expressed the probability of energy fluctuations in terms of entropy, seen as a primary concept. This suggests: entropy first, probability second.
From the modern point of view of large deviation theory (Lanford, 1973; A. Martin-Löf, 1979, the older brother of P. Martin-Löf; Ellis, 1985), what happens is that for finite N some stochastic process (X_N) fluctuates around its limiting value X as N → ∞ (if it has one), and, under favorable circumstances that often obtain in statistical mechanics, the "large", i.e. O(1), fluctuations (as opposed to the O(1/√N) fluctuations, which are described by the central limit theorem, cf. McKean, 2014) can be computed via an entropy function S(x) whose argument x lies in the (common) codomain X of X_N : Ω → X. Since the domain Ω of X_N carries a probability measure to begin with, it seems an illusion that entropy could be defined without some prior notion of probability.
Similar questions may be asked about the connection between probability and randomness (and, closing the triangle, of course also about the relationship between randomness and entropy). First, in his influential (but flawed) work on the foundations of probability, von Mises (1919, 1936) initially defined randomness through a Kollektiv (which, with hindsight, was a precursor to a random sequence). From this, he extracted a notion of probability via asymptotic relative frequencies. See also van Lambalgen (1987, 1996), von Plato (1994), and Porter (2012). Von Plato (1994, p. 190) writes that 'He [von Mises] was naturally aware of the earlier attempts of Einstein and others at founding statistical physics on classical dynamics' and justifies this view in his §6.3. Thus: randomness first, probability second. Kolmogorov (1933), on the other hand, (impeccably) defined probability first (via measure theory), in terms of which he hoped to understand randomness. In other words, his (initial) philosophy was: probability first, randomness second.
Having realized that this was impossible, thirty years later Kolmogorov arrived at the concept of randomness named after him, using tools from computer and information science that actually had roots in the work of von Mises (as well as of Turing, Shannon, and others). See van Lambalgen (1987), Cover et al. (1989), von Plato (1994), Li & Vitányi (2008), and Porter (2014).

Figure 1: Sample configuration of N objects, each of which can be in q different states.

But I will argue that even Kolmogorov randomness seems to rely on some prior concept of probability, see §3 and in particular the discussion surrounding Theorem 3.8; and this is obviously the case for Martin-Löf randomness, both in its original form for binary sequences (which is essentially equivalent to Kolmogorov randomness as extended from finite strings to infinite sequences, see Theorem 3.7) and in its generalizations (see §3). So I will defend the view that after all we have some prior probability measure first, Martin-Löf randomness second.
In any case, there isn't a single concept of randomness (Landsman, 2020), not even within the algorithmic setting (Porter, 2021), although the above slogan probably applies to most of them. Motivated by the above discussion and its potential applications to physics, the aim of this paper is to review the interplay between probability, (algorithmic) randomness, and entropy via examples from probability itself, analysis, dynamical systems and (Boltzmann-style) statistical mechanics, and quantum mechanics. Some basic relations are explained in the next §2. In §3 I review algorithmic randomness beyond binary sequences. Section 4 introduces some key "intuition pumps": these are results in which 'for P-almost every x: Φ(x)' in some "classical" result can be replaced by 'for all P-random x: Φ(x)' in an "effective" counterpart thereof; this replacement may even be seen as the essence of algorithmic randomness. In section 5 I apply this idea to statistical mechanics, and close in §6 with some brief comments on quantum mechanics, followed by a brief summary.
2 Some background on entropy and probability

Consider the following diagram, which connects and illustrates the main examples in this paper.
Here N ∈ N is meant to be some large natural number, whereas q ∈ {2, 3, ...}, the cardinality of

A = {a_0, ..., a_{q−1}}, (2.1)

could be anything (finite), but is small (q = 2) in the already interesting case of binary strings. In what follows, A^N is the set of all functions σ : N → A, where the natural number N is identified with the set {0, 1, ..., N − 1}, as usual in set theory. Such a function is also called a string over A, having length N. We write either σ(n) or σ_n for its value at n ∈ N, and may write σ as σ_0 σ_1 ··· σ_{N−1}. In particular, if A = 2 = {0, 1}, then σ is a binary string. I write A* = ⋃_{N∈N} A^N, so that 2* = ⋃_{N∈N} 2^N is the set of all binary strings. Thus a (binary) string σ is finite, whereas a (binary) sequence s is infinite. The set of all binary sequences is denoted by 2^ω, and likewise A^ω consists of all functions s : N → A, defined on all of the natural numbers. Using this notation, I now review various ways of looking at the above diagram. Especially in the first two items below it is hard to avoid overlap with e.g. Georgii (2003) and Grünwald & Vitányi (2003), which I recommend for further information.
• In statistical mechanics as developed by Boltzmann (1877), and more generally in what one might call "Boltzmann-style statistical mechanics", which is based on typicality arguments (Bricmont, 2022), N is the number of (distinguishable) particles under consideration, and A could be a finite set of single-particle energy levels. More generally, a ∈ A is some property each particle may separately have, such as its location in cell X_a relative to some partition

X = ⊔_{a∈A} X_a (2.5)

of the single-particle phase space or configuration space X accessible to each particle. Here X_a ⊂ X and different X_a are disjoint, which fact is expressed by the symbol ⊔ in (2.5). One might replace ⊔ by ∪ as long as one knows that the subsets X_a are mutually disjoint (and measurable as appropriate). The microstate σ ∈ A^N is a function σ : {0, 1, ..., N − 1} → A, written n ↦ σ(n) or n ↦ σ_n, that specifies which property (among the possibilities in A) each particle has. Thus also spin chains fall under this formalism, where σ_n ∈ A is some internal degree of freedom at site n. In Boltzmann-style arguments it is often assumed that each microstate is equally likely, which corresponds to the probability P^N_f on A^N defined by

P^N_f(σ) = q^{−N}

for each σ ∈ A^N. This is the Bernoulli measure on A^N induced by the flat prior p = f on A, i.e.

f(a) = 1/q (2.7)

for each a ∈ A.
More generally, P^N_p is the Bernoulli measure on A^N induced by some probability distribution p on A; that is, the product measure of N copies of p; some people write P^N_p = p^{×N}. This extends to the idealized case A^ω, as follows. For σ ∈ A^N we define the cylinder set [σ] := {s ∈ A^ω : σ ≺ s}, where σ ≺ s means that s = στ for some τ ∈ A^ω (in words: σ ∈ A* is a prefix of s ∈ A^ω). On these basic measurable (and open) sets we define a probability measure P^ω_p by

P^ω_p([σ]) := P^N_p(σ).

It is important to keep track of p even if it is flat: making no (apparent) assumption (which p = f is often taken to be) is an important assumption! For example, Boltzmann's (1877) famous counting argument really reads as follows (Ellis, 1995; Dembo & Zeitouni, 1998; Austin, 2017; Dorlas, 2022). The formula

S = k log W (2.9)

on Boltzmann's grave should more precisely be something like

S^B_N(µ) = log W_N(µ), (2.10)

where I omit the constant k and take µ ∈ Prob(A) to be the relevant argument of the (extensive) Boltzmann entropy S^B_N (see below). Furthermore, W_N(µ) is the probability ("Wahrscheinlichkeit") of µ, which Boltzmann, assuming the flat prior (2.7) on A, took as

W_N(µ) = N(µ)/q^N,

where N(µ) is the number of microstates σ ∈ A^N whose corresponding empirical measure

L_N(σ) := (1/N) Σ_{n=0}^{N−1} δ_{σ_n} (2.12)

equals µ. Recall the Kullback–Leibler distance

I(µ|p) := Σ_{a∈A} µ(a) log(µ(a)/p(a)). (2.16)

These are simply related: for the flat prior (2.7) we have

S^B_N(µ) = log N(µ) − N log q. (2.17)

In general, computing S^B_N(µ) from (2.10) via Stirling's formula, now with W_N(µ) := P^N_p(L_N = µ), and again assuming L_N(σ) = µ, we obtain

lim_{N→∞} (1/N) S^B_N(µ_N) = s_B(µ|p) := −I(µ|p), (2.19)

where µ_N ∈ Prob_N(A) is any sequence of probability distributions on A that (weakly) converges to µ ∈ Prob(A), i.e. µ is the variable in s_B(·|p). For the flat prior (2.7), eq. (2.17) yields

s_B(µ|f) = h(µ) − log q, (2.20)

where h(µ) := −Σ_{a∈A} µ(a) log µ(a) is the Shannon entropy of µ. As an aside, note that the Kullback–Leibler distance or relative entropy (2.16) is defined more generally for probability measures µ and p on some measure space (A, Σ). As usual, we write µ ≪ p iff µ is absolutely continuous with respect to p, i.e.
p(B) = 0 implies µ(B) = 0 for B ∈ Σ. In that case, the Radon–Nikodym derivative dµ/dp exists, and one has

I(µ|p) = ∫_A dµ log(dµ/dp).

If µ is not absolutely continuous with respect to p, one puts I(µ|p) := ∞. The nature of the empirical measure (2.12) and the Kullback–Leibler distance (2.16) comes out well in hypothesis testing. In order to test the hypothesis H_0 that µ = µ_0 by an N-fold trial σ ∈ A^N, one accepts H_0 iff I(L_N(σ)|µ_0) < η, for some η > 0. This test is optimal in the sense of Hoeffding; see Dembo & Zeitouni (1998), §3.5. We now return to the main story.
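The empirical measure, the Kullback–Leibler distance, and the Hoeffding-style acceptance rule just described can be sketched in a few lines of Python; this is only an illustration (the function names and the threshold η are my choices, not the paper's):

```python
import math
from collections import Counter

def empirical_measure(sigma, alphabet):
    """Empirical measure L_N(sigma): relative frequency of each letter of sigma."""
    counts = Counter(sigma)
    N = len(sigma)
    return {a: counts.get(a, 0) / N for a in alphabet}

def kl_divergence(mu, p):
    """Kullback-Leibler distance I(mu|p); infinite unless mu << p."""
    total = 0.0
    for a, m in mu.items():
        if m == 0:
            continue  # 0 log 0 = 0 by convention
        if p.get(a, 0) == 0:
            return math.inf  # mu is not absolutely continuous w.r.t. p
        total += m * math.log(m / p[a])
    return total

def accept_H0(sigma, mu0, alphabet, eta):
    """Accept H0 (mu = mu0) iff I(L_N(sigma)|mu0) < eta."""
    return kl_divergence(empirical_measure(sigma, alphabet), mu0) < eta
```

For instance, a long string of alternating symbols is accepted under the fair hypothesis, while a constant string is rejected, and the Gibbs inequality I(µ|p) ≥ 0 holds by construction.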
The stochastic process X_N : Ω_N → X whose large fluctuations are described by (2.19) is

X_N = L_N : A^N → Prob(A),

with Ω_N = A^N carrying the probability measure P^N_p. Then L_N → p almost surely, and large fluctuations around this value are described by

lim_{N→∞} (1/N) log P^N_p(L_N ∈ Γ) = −inf_{µ∈Γ} I(µ|p), (2.23)

where Γ ⊂ Prob(A) is open, or more generally, is contained in the closure of its interior. Less precisely,

P^N_p(L_N ≈ µ) ≈ e^{−N I(µ|p)},

which implies that such fluctuations are exponentially small in N unless µ = p. Note that the rate function µ ↦ I(µ|p) defined in (2.16) and appearing in (2.23) is convex and positive, whereas the entropy (2.19) is concave and negative. Thus the former is to be minimized, its infimum (even minimum) over µ ∈ Prob(A) being zero at µ = p, whereas the latter is to be maximized, its supremum (even maximum) at the same value being zero. The first term in (2.20) hides the negativity of the Boltzmann entropy (here for a flat prior), but the second term drives it below zero. Positivity of I(µ|p) follows from (or actually is) the Gibbs inequality. Eq. (2.23) is a special case of Sanov's theorem, which works for arbitrary Polish spaces (instead of our finite set A); see Ellis (1985) or Dembo & Zeitouni (1998). Einstein (1909) computed the probability of large fluctuations of the energy, rather than of the empirical measure, as in Boltzmann (1877), but these are closely related.
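The exponential rate in the Sanov asymptotics (2.23) can be checked numerically for a fair coin; the following Python sketch compares the exact binomial tail probability of the event Γ = {µ : µ(1) ≥ 0.7} with the rate I(µ*|p) at the minimizer µ* = (0.3, 0.7). The choice of event, sample size, and tolerance are mine, purely for illustration:

```python
import math

def kl(mu, p):
    """I(mu|p) for distributions on a finite set (tuples of weights), natural log."""
    return sum(m * math.log(m / q) for m, q in zip(mu, p) if m > 0)

def log_binomial_tail(N, k0, p1):
    """log P(S_N >= k0) for S_N ~ Binomial(N, p1), via lgamma and log-sum-exp."""
    logs = [math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
            + k * math.log(p1) + (N - k) * math.log(1 - p1)
            for k in range(k0, N + 1)]
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in logs))

# Fair coin; the infimum of I(.|p) over Gamma = {mu : mu(1) >= 0.7}
# is attained on the boundary, at mu* = (0.3, 0.7).
p, mu_star = (0.5, 0.5), (0.3, 0.7)
sanov_rate = kl(mu_star, p)                          # ~0.0823
N = 1000
empirical_rate = -log_binomial_tail(N, 700, 0.5) / N  # -(1/N) log P(L_N in Gamma)
```

At N = 1000 the two rates already agree to within the usual polynomial (sub-exponential) correction.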
Interpreting A as a set of energy levels, the relevant stochastic process X_N : Ω_N → X is now the average energy

X_N(σ) := (1/N) Σ_{n=0}^{N−1} σ_n,

but this time taking values in X = R (or some suitable finite subset thereof). This makes the relevant entropy s_C (which is the original entropy from Clausius-style thermodynamics!) a function of u ∈ R, interpreted as energy: instead of (2.23), one obtains

lim_{N→∞} (1/N) log P^N_p(X_N ≈ u) = s_C(u|p) := sup{s_B(µ|p) : µ ∈ Prob(A), Σ_{a∈A} µ(a) a = u},

and this "maximum entropy principle" is a special case of Cramér's theorem (Ellis, 1985; Dembo & Zeitouni, 1998).
If u is not the P-expected energy, this probability is exponentially small in N. To obtain the classical thermodynamics of non-interacting particles (Dorlas, 2022), one may add that the free energy is essentially the Fenchel transform (Borwein & Zhu, 2005) of the entropy s_C(u|p), in that

βF(β) = inf_{u∈R} (βu − s_C(u|p)) = −log Σ_{a∈A} p(a) e^{−βa}. (2.30)

For β > 0, the first equality is a refined version of "F = E − TS".
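The Fenchel (Legendre) duality behind (2.30) is easy to test numerically in a two-level toy model; the following Python sketch (levels, prior, and grid are my own choices) computes the Legendre transform of the cumulant generating function on a grid and compares it with the closed-form Cramér rate:

```python
import math

# Two energy levels a in {0, 1} with flat prior p = (1/2, 1/2).
# Cumulant generating function Lambda(t) = log sum_a p(a) e^{t a};
# Cramer's theorem gives the rate I_C(u) = sup_t (t u - Lambda(t)),
# with s_C(u|p) = -I_C(u).

def Lambda(t):
    return math.log(0.5 * (1.0 + math.exp(t)))

def rate(u, ts):
    """Legendre transform sup_t (t u - Lambda(t)), approximated over a grid ts."""
    return max(t * u - Lambda(t) for t in ts)

ts = [i / 1000.0 for i in range(-10000, 10001)]  # grid on [-10, 10]
u = 0.7
I_C = rate(u, ts)
# Closed form for Bernoulli levels: I_C(u) = u log u + (1-u) log(1-u) + log 2.
closed = u * math.log(u) + (1 - u) * math.log(1 - u) + math.log(2)
```

The grid supremum reproduces the closed form, illustrating that rate function and free energy determine each other by convex duality.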
• In information theory (Shannon, 1948), the basic quantity is the Shannon entropy

h_2(p) := Σ_{a∈A} p(a) I_2(a) = −Σ_{a∈A} p(a) log_2 p(a), (2.31)

where

I_2(a) := −log_2 p(a) (2.32)

is interpreted as the information contained in a ∈ A, relative to p. This interpretation is evident for the flat distribution p = f on an alphabet with |A| = 2^n letters, in which case I_2(a) = n for each a ∈ A, which is the minimal number of bits needed to (losslessly) encode a.
The general case is covered by the noiseless coding theorem for prefix (or uniquely decodable) codes. A code is a map C : A → 2*, whose average codeword length with respect to p is

L(C, p) := Σ_{a∈A} p(a) |C(a)|.

An optimal code minimizes this. Then: 1. Any prefix code satisfies h_2(p) ≤ L(C, p); 2. There exists an optimal prefix code C, which satisfies L(C, p) ≤ h_2(p) + 1. Thus the information content I_2(a) is approximately the length of the code-word C(a) in some optimal coding C. Passing to our case of interest of N-letter words over A, in case of a memoryless source one simply has the Bernoulli measure P^N_p on A^N. Extending the letter-code C to word-codes C_N : A^N → 2*, and replacing L(C_N, P^N_p), which diverges as N → ∞, by the average codeword length per symbol L(C_N, P^N_p)/N, an optimal code C satisfies

lim_{N→∞} L(C_N, P^N_p)/N = h_2(p).

In what follows, the Asymptotic Equipartition Property or AEP will be important. In its (probabilistically) weak form, which is typically used in information theory, this states that for each ε > 0,

lim_{N→∞} P^N_p({σ ∈ A^N : |−(1/N) log_2 P^N_p(σ) − h_2(p)| < ε}) = 1.

Its strong form, which is the (original) Shannon–McMillan–Breiman theorem, reads

lim_{N→∞} −(1/N) log_2 P^ω_p([s_{|N}]) = h_2(p) for P^ω_p-almost every s ∈ A^ω. (2.37)

Either way, the idea is that for large N, w.r.t. P^N_p "most" strings σ ∈ A^N have "almost" the same probability 2^{−N h_2(p)}, whilst the others are negligible (Austin, 2017, lecture 2). For A = 2 with flat prior p = f this yields a tautology: all strings σ ∈ 2^N have P^N_f(σ) = 2^{−N}. See e.g. Cover & Thomas (2006), §3.1 and §16.8. The strong form follows from ergodic theory, cf. (2.51).
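The concentration expressed by the AEP shows up quickly in simulation; the following Python sketch (distribution, sample size, and seed are my own choices) samples one long string from a biased coin and compares its per-symbol information rate with h_2(p):

```python
import math, random

random.seed(42)

p = {0: 0.2, 1: 0.8}                                # biased coin on A = {0, 1}
h2 = -sum(q * math.log2(q) for q in p.values())     # Shannon entropy, ~0.7219 bits

def sample_string(N):
    """One string of length N from the Bernoulli measure P^N_p."""
    return [1 if random.random() < p[1] else 0 for _ in range(N)]

def per_symbol_information(sigma):
    """-(1/N) log2 P^N_p(sigma): the empirical information rate of sigma."""
    return -sum(math.log2(p[a]) for a in sigma) / len(sigma)

N = 10_000
info_rate = per_symbol_information(sample_string(N))
```

A single "typical" string thus has probability close to 2^(−N h_2(p)), as the strong AEP asserts.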
• In dynamical systems along the lines of the ubiquitous Kolmogorov (1958), one starts with a triple (X, P, T), where X (more precisely (X, Σ), but I usually suppress the σ-algebra Σ) is a measure space, P is a probability measure on X (more precisely, on Σ), and T : X → X is a measurable (but not necessarily invertible) map, required to preserve P in the sense that P(T^{−1}B) = P(B) for any B ∈ Σ. A measurable coarse-graining (2.5) defines a map

ξ : X → A^ω, ξ(x)_n := a iff T^n x ∈ X_a,

in terms of which the given triple (X, P, T) is coarse-grained by a new triple (A^ω, ξ_*P, S). Here S : A^ω → A^ω is the (unilateral) shift, S(s)_n = s_{n+1}, and ξ_*P is the image of P under ξ. A fine-grained path (x, Tx, T²x, ...) ∈ X^ω is coarse-grained to ξ(x) ∈ A^ω, and truncating the latter at t = N − 1 gives ξ(x)_{|N} ∈ A^N. Hence the configuration in the picture states that our particle starts from x ∈ X_{a(0)} ⊂ X at t = 0, moves to Tx ∈ X_{a(1)} ⊂ X at t = 1, etc., and at time t = N − 1 sits in X_{a(N−1)}. In other words, a coarse-grained path σ ∈ A^N tells us exactly that T^n x ∈ X_{σ_n}, for n = 0, 1, ..., N − 1 (σ_n ∈ A). Note that the shift satisfies ξ ∘ T = S ∘ ξ, so if ξ were invertible, then nothing would be lost in coarse-graining; using the bilateral shift on 2^Z instead of the unilateral one in the main text, this is the case for example with the Baker's map on X = [0, 1) × [0, 1) with A = 2 and the partition of X into its left and right halves. The point of Kolmogorov's approach is to refine the partition (2.5), which I now denote by

π = {X_a}_{a∈A},

to a finer partition

π_N = {X_σ}_{σ∈A^N},

which consists of all non-empty subsets

X_σ := ⋂_{n=0}^{N−1} T^{−n} X_{σ_n}.

Indeed, if we know x, then we know both the (truncated) fine- and coarse-grained paths. But if we just know that x ∈ X_a, we cannot construct even the coarse-grained path ξ(x)_{|N}.
To do so, we must know that x ∈ X_σ, for some σ ∈ A^N. In other words, the unique element π_N(x) = X_{σ(x)} of the partition π_N that contains x bijectively corresponds to a coarse-grained path σ(x) ∈ A^N, and hence we may take P(π_N(x)) to be the probability of the coarse-grained path σ(x). This suggests an information function

I_{(X,P,T,π_N)}(x) := −log P(π_N(x)),

cf. (2.32), and, as in (2.31), an average (= expected) information or entropy function

H_{(X,P,T,π_N)} := ∫_X dP(x) I_{(X,P,T,π_N)}(x) = −Σ_{σ∈A^N} P(X_σ) log P(X_σ). (2.45)

As H_{(X,P,T,π_{M+N})} ≤ H_{(X,P,T,π_M)} + H_{(X,P,T,π_N)}, this (extensive) entropy has an (intensive) limit

h_{(X,P,T,π)} := lim_{N→∞} (1/N) H_{(X,P,T,π_N)},

in terms of which the Kolmogorov–Sinai entropy of our system (X, P, T) is defined by

h_{(X,P,T)} := sup_π h_{(X,P,T,π)},

where the supremum is taken over all finite measurable partitions of X, as above.
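The entropies H(π_N)/N can be estimated for a concrete system. The doubling map T(x) = 2x mod 1 on [0, 1) with the partition {[0, 1/2), [1/2, 1)} is conjugate to the shift on binary expansions, so a typical coarse-grained orbit is a sequence of fair coin flips; its Kolmogorov–Sinai entropy is log 2. The following Python sketch (block length, orbit length, and seed are my choices) estimates H(π_k)/k from empirical block frequencies:

```python
import math, random
from collections import Counter

random.seed(7)

# Symbolic representation of a typical orbit of the doubling map:
# a sequence of fair coin flips, on which T acts as the shift.
bits = [random.getrandbits(1) for _ in range(200_000)]

def block_entropy(seq, k):
    """Plug-in estimate of H(pi_k) = -sum_sigma P(X_sigma) log P(X_sigma) over k-blocks."""
    counts = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# H(pi_k)/k should approach the Kolmogorov-Sinai entropy log 2.
h_est = block_entropy(bits, 5) / 5
```

The estimate sits within statistical error of log 2 ≈ 0.693, as expected for this generating partition.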
We say that (X, P, T) is ergodic if for every T-invariant set A ∈ Σ (i.e. T^{−1}A = A), either P(A) = 0 or P(A) = 1. For later use I now state a number of equivalent conditions, each of which holds iff (X, P, T) is ergodic (Viana & Oliveira, 2016, §4.1). For P-almost every x:

(1/N) Σ_{n=0}^{N−1} δ_{T^n x} → P (weakly); (2.48)

(1/N) Σ_{n=0}^{N−1} f(T^n x) → ∫_X f dP for all f ∈ L^1(X, P); (2.49)

(1/N) Σ_{n=0}^{N−1} 1_B(T^n x) → P(B) for all B ∈ Σ. (2.50)

The empirical measure (2.12) is a special case of (2.48). Eq. (2.49) is Birkhoff's ergodic theorem assuming ergodicity; in general, the l.h.s. converges to some function in L^1(X, P) which need not be constant P-a.e. Each of these conditions is a corollary of the others; e.g. (2.48) and (2.49) are basically the same statement, and one obtains (2.50) from (2.49) by taking f = 1_B. Note that the apparent logical form of (2.49) is: 'for all f and all x', which suggests that the universal quantifiers can be interchanged, but this is false: the actual logical form is: 'for all f there exists a set of P-measure zero', which in general cannot be interchanged (indeed, in standard proofs the measure-zero set explicitly depends on f). Nonetheless, in some cases a measure-zero set independent of f can be found, e.g. for compact metric spaces and continuous f, cf. Viana & Oliveira (2016), Theorem 3.2.6. Similar f-independence will be true for the computable case reviewed below, which speaks in its favour. Likewise for (2.50).
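Birkhoff's theorem (2.49) is easy to watch at work for the Bernoulli shift; the following Python sketch (observable, parameters, and seed are my choices) compares the time average of an observable depending on two coordinates with its space average:

```python
import random

random.seed(1)

# Bernoulli shift on A^omega with A = {0, 1}, p(1) = 0.3, T = the shift S.
# Take f(s) = s_0 * s_1, so the space average is
# integral f dP = p(1)^2 = 0.09.
p1 = 0.3
s = [1 if random.random() < p1 else 0 for _ in range(100_001)]

def time_average(s, N):
    """(1/N) sum_{n<N} f(S^n s) with f(s) = s_0 * s_1."""
    return sum(s[n] * s[n + 1] for n in range(N)) / N

avg = time_average(s, 100_000)
```

For this (ergodic) system the time average along a single typical orbit indeed approximates the space average 0.09.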
Eq. (2.49) implies the general Shannon–McMillan–Breiman theorem, which in turn implies the previous one (2.37) for information theory by taking (X = A^ω, P = P^ω_p, T = S). Namely, assuming ergodicity, for P-almost every x,

lim_{N→∞} −(1/N) log P(π_N(x)) = h_{(X,P,T,π)}. (2.51)

Thus, by (2.45), the average value of the information I_{(X,P,T,π_N)}(x) w.r.t. P can be computed from its value at a single point x, as long as this point is "typical". As in the explanation of the original theorem in information theory, eq. (2.51) implies that all typical paths (with respect to P) have about the same probability ≈ exp(−N h_{(X,P,T,π)}).

3 P-Randomness
The concept of P-randomness (where P is a probability measure on some measure space (X, Σ)) was introduced by Martin-Löf (1966) for the case X = 2^ω and P = P^ω_f, i.e. the unbiased Bernoulli measure on the space of infinite coin flips. Following Hertling & Weihrauch (2003), §3, a more general definition of P-randomness which is elegant and appropriate to my goals is as follows:

Definition 3.1 1. An effective probability space (X, B, P) is an effective topological space X, i.e. a topological space with a numbered countable base B = (B_n)_{n∈N} of its topology O(X), equipped with a Borel probability measure P, i.e. defined on the open sets O(X).
2. An open set V ∈ O(X) as in 1 is computable if

V = ⋃_{n∈N} B_{f(n)}

for some computable function f : N → N. Here f may be assumed to be total without loss of generality. In other words, V = ⋃_{n∈E} B_n for some c.e. set E ⊂ N (where c.e. means computably enumerable, i.e. E ⊂ N is the image of a total computable function f : N → N).
3. A sequence (V_n)_{n∈N} of open sets in O(X) is computable if

V_n = ⋃_{m∈N} B_{g(n,m)}

for some (total) computable function g : N² → N; equivalently, if V_n = ⋃_{m:(n,m)∈G} B_m for some c.e. G ⊂ N². Without loss of generality we may and will assume that the (double-indexed) function g is total.

4. A test is a computable sequence (V_n) of open sets for which for all n ∈ N one has

P(V_n) ≤ 2^{−n}. (3.6)

One may (and will) also assume without loss of generality that for all n we have V_{n+1} ⊆ V_n.

5. A point x ∈ X is P-random if x ∉ ⋂_{n∈N} V_n, where (V_n) is any test (since P(⋂_n V_n) = 0, such a set N = ⋂_n V_n is called an effective null set).
6. A measure P in an effective probability space (X, B, P) is upper semi-computable if the set

U(P) := {(E, q) ∈ P_f(N) × Q : P(⋃_{n∈E} B_n) < q} (3.9)

is c.e. (relative to some computable isomorphisms P_f(N) ≅ N and Q ≅ N, where P_f(N) is the set of finite subsets of N). Also, P is lower semi-computable if the set L(P), defined like (3.9) with > q instead of < q, is c.e. Finally, P is computable if it is upper and lower semi-computable, in which case (X, B, P) is called a computable probability space (and similarly for upper and lower computability).
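The simplest nontrivial instance of items 3 to 5 on X = 2^ω with the fair Bernoulli measure is the test V_n = [1^n], the cylinder of all sequences starting with n ones; then P(V_n) = 2^(−n) and ⋂_n V_n is the effective null set {111...}. A minimal Python sketch (function names are mine, for illustration only):

```python
def P_flat_cylinder(prefix):
    """P^omega_f([sigma]) = 2^{-|sigma|} for the flat Bernoulli measure on 2^omega."""
    return 2.0 ** (-len(prefix))

def in_V(s_prefix, n):
    """Does the cylinder determined by s_prefix lie inside V_n = [1^n]?"""
    return len(s_prefix) >= n and all(b == 1 for b in s_prefix[:n])

ones = [1] * 50          # prefix of the sequence 111..., which fails every level
mixed = [1, 0] * 25      # prefix of a sequence that escapes the test at level 2
```

The all-ones sequence lies in every V_n, hence is not P-random, while any sequence containing a 0 escapes the test at some finite level, in line with Earman's principle below.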
Note that parts 1 to 5 do not impose any computability requirement on P, but even so it easily follows that P(R) = 1, where R ⊂ X is the set of all P-random points in X. However, if P is upper semi-computable, one has a generalization of a further central result of Martin-Löf (1966), p. 605: there exists a universal test (U_n), i.e. one for which ⋂_n V_n ⊆ ⋂_n U_n for every test (V_n).
Universal tests, then, exist provided P is upper semi-computable, which in turn implies that x ∈ X is P-random iff x ∉ ⋂_n U_n. See Hertling & Weihrauch (2003), Theorem 3.10. Compared with the computable metric spaces of Hoyrup & Rojas (2009), which for all purposes of this paper could have been used, too, Hertling & Weihrauch (2003), whom I follow here, avoid the choice of a countable dense subset of X. The latter is unnatural already in the case X = A^ω, where σ ∈ A* ≅ N has to be injected into A^ω via a map like σ ↦ σa^ω for some fixed a ∈ A (where a^ω repeats a infinitely often). On the other hand, the map σ ↦ [σ] from A* to O(X), where [σ] = σA^ω, is quite natural (here O(X) is the topology of X). If P is computable as defined in clause 6, P is a computable point in the effective space of all probability measures on X (Hoyrup & Rute, 2021), where a point x in an effective topological space is deemed computable if {x} = ⋂_n V_n for some computable sequence (V_n).
A key example is X = A^ω over a finite alphabet A, with topology O(X) generated by the cylinder sets [σ] = σA^ω, where σ ∈ A^N and N ∈ N. The usual lexicographical order on A* then gives a bijection L : N → A*, and hence a numbering

B_n := [L(n)]. (3.10)

The Bernoulli measures P^ω_p on A^ω then have the same computability properties as p ∈ Prob(A). In particular, the flat prior f makes P^ω_f computable, and in case that A = 2, the computability properties of p ∈ [0, 1] ≅ Prob(2) are transferred to P^ω_p. This is all we need for my main theme. In case of a flat prior f on A, the above notion of randomness of sequences in A^ω is equivalent to the definition in Martin-Löf (1966), which I will now review, following Calude (2002). Though equivalent to the definition of Martin-Löf random sequences in books like Li & Vitányi (2008), Nies (2009), and Downey & Hirschfeldt (2010), the construction in Calude (2002) is actually closer in spirit to Martin-Löf (1966) and has the advantage of being compatible with Earman's principle (see below) even before redefining randomness in terms of Kolmogorov complexity. The definition in the other three books just cited lacks this feature. Calude's definition is based on first defining random strings σ ∈ A*. Since A* has the discrete topology, we simply take B to consist of all singletons {σ}, σ ∈ A*, with B = L as in (3.10). Since unlike A^N or A^ω the set A* does not carry a useful probability measure P, we replace (3.6) by the counting condition (3.11), which defines a sequential test (V_n). Since (3.11) is the same as (3.12), this definition implies that V_n ∩ A^N = ∅ for all N < n. A simple example of a sequential test is given by taking V_n to be the set of all strings starting with n copies of 1. There exists a universal sequential test (U_n) such that for any sequential test (V_n) there is a c = c(U, V) ∈ N such that for each n ∈ N we have V_{n+c} ⊂ U_n. See Calude (2002), Theorem 6.16 and Definition 6.17. For this (or indeed any) test U we define m_U(σ) := 0 if σ ∉ U_1, and otherwise

m_U(σ) := max{m ∈ N : σ ∈ U_m}.

By the comment after (3.12) we have m_U(σ) ≤ |σ| < ∞, since σ ∈ A^N for some N. If m_U(σ) < q for some q ∈ N, then σ ∉ U_q by definition. Since U_{n+1} ⊂ U_n, this implies σ ∉ U_{q′} for all q′ > q, so that also m_U(σ) < q′. But as we have just seen, we may restrict these values to q′ ≤ |σ|.

Definition 3.4 1. A string σ ∈ A* is called q-random, for q ∈ N, if m_U(σ) < q.
2. A sequence s ∈ A^ω is Calude random (with respect to P = P^ω_f) if there is a constant q ∈ N such that each finite segment s_{|N} ∈ A^N ⊂ A* is q-random, i.e., such that for all N,

m_U(s_{|N}) < q. (3.14)

Note that the lower q is, the higher the randomness of σ, as it lies in fewer sets U_n. It is easy to show that Calude randomness is equivalent to Kolmogorov randomness in the sense of Definition 3.6 below, which is taken as the definition of randomness by Calude (2002), Definition 6.25. Here K(σ) is the prefix Kolmogorov complexity of σ with respect to a fixed universal prefix Turing machine; changing this machine only changes the constant c in the same way for all strings σ (which makes the value of c somewhat arbitrary). Recall that K(σ) of σ ∈ A* is defined as the length of the shortest program that outputs σ and then halts, running on a universal prefix Turing machine T (i.e., the domain D(T) of T consists of a prefix subset of 2*, so if x ∈ D(T) then y ∉ D(T) whenever x ≺ y). Fix some universal prefix Turing machine T, and define K(σ) accordingly.

Definition 3.6 A sequence s ∈ 2^ω is Kolmogorov random if there is a constant c ∈ N such that for all N,

K(s_{|N}) ≥ N − c. (3.17)

The equivalence between Definitions 3.4 and 3.6 is (essentially) Theorem 3.7. According to Kjos-Hanssen & Szabados (2011), p. 3308, footnote 1 (references adapted):

[Theorem 3.7] was announced by Chaitin (1975) and attributed to Schnorr (who was the referee of the paper) without proof. The first published proof (in a form generalized to arbitrary computable measures) appeared in the work of Gács (1979).
Since Levin (1973) also states Theorem 3.7, the names Chaitin–Levin–Schnorr seem fair. Hence both Definitions 3.4 and 3.6 are compatible on their own terms with Earman's Principle:

While idealizations are useful and, perhaps, even essential to progress in physics, a sound principle of interpretation would seem to be that no effect can be counted as a genuine physical effect if it disappears when the idealizations are removed. (Earman, 2004, p. 191)

By Theorem 3.5, Definition 3.4 is a special case of Definition 3.1.5 and hence it depends on the initial probability P^ω_f on A^ω. On the other hand, both (3.12) and the equivalence between Definitions 3.4 and 3.6 suggest that P^ω_f-randomness does not depend on P^ω_f! To assess this, let us look at a version of Theorem 3.7 for arbitrary computable measures P on 2^ω (Levin, 1973; Gács, 1979).

Theorem 3.8 Let P be a computable probability measure on 2^ω. Then s ∈ 2^ω is P-random iff there is a constant c ∈ N such that for all N,

K(s_{|N}) ≥ −log_2 P([s_{|N}]) − c. (3.18)

For the flat measure P = P^ω_f one has −log_2 P^ω_f([s_{|N}]) = N, and so (3.18) reduces to (3.17). Thus the absence of a P-dependence in Definition 3.6 is only apparent, since it implicitly depends on the assumption p = f. It seems, then, that Kolmogorov did not achieve his goal of defining randomness in a non-probabilistic way! Indeed, note also that the definition of K(σ) depends on the hidden assumption that the length function σ ↦ |σ| on 2* assigns equal length to 0 and 1 (Chris Porter, email June 13, 2023).
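K(σ) itself is uncomputable, but any real compressor yields a computable upper bound on it, and the incompressibility idea behind (3.17) can at least be illustrated empirically. The following Python sketch uses zlib, which is of course only a crude stand-in for a universal prefix machine; the data sizes and seed are my own choices:

```python
import random, zlib

random.seed(0)

N = 100_000
random_bytes = bytes(random.getrandbits(8) for _ in range(N))  # pseudorandom data
regular_bytes = b"01" * (N // 2)                               # highly regular data

# Compressed sizes: computable upper bounds on the (byte-level) complexity.
c_random = len(zlib.compress(random_bytes, 9))
c_regular = len(zlib.compress(regular_bytes, 9))
```

Pseudorandom data does not compress (the "program" is about as long as the output), while the periodic string compresses to a tiny fraction of its length, mirroring the dichotomy that (3.17) makes precise.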
Another interesting example is Brownian motion, which is related to binary sequences via the random walk (Mörters & Peres, 2010; McKean, 2014). Brownian motion may be defined as a Gaussian stochastic process (B_t)_{t∈[0,1]} in R with B_0 = 0, variance ⟨B_t²⟩ = t, and covariance ⟨B_s B_t⟩ = min(s, t). An equivalent axiomatization states that for each n-tuple (t_1, ..., t_n) with 0 ≤ t_1 ≤ ··· ≤ t_n the increments B_{t_n} − B_{t_{n−1}}, ..., B_{t_2} − B_{t_1} are independent, that each increment B_t − B_s (s ≤ t) is Gaussian with mean 0 and variance t − s, and that t ↦ B_t is continuous with probability one. If we add that B_0 = 0, these axioms imply the previous characterization. We switch from 2 = {0, 1} to 2 = {−1, 1}. Take C[0, 1] ≡ C([0, 1], R), seen as a Banach space in the supremum norm ∥f∥_∞ = sup{|f(x)| : x ∈ [0, 1]} and hence as a metric space (i.e. d(f, g) = ∥f − g∥_∞) with ensuing Borel structure. For each N = 1, 2, ..., define a map R_N : 2^N → C[0, 1] by

R_N(σ)(n/N) := (σ_0 + ··· + σ_{n−1})/√N (n = 0, 1, ..., N),

and R_N(σ) is defined at all other points t ≠ n/N of [0, 1] via linear interpolation (i.e. by drawing straight lines between R_N(σ)((n − 1)/N) and R_N(σ)(n/N) for each n = 1, ..., N; I omit the formula). Thus R_N(σ) is a random walk with N steps in which each time jump t = 0, 1, ..., N is compressed from unit duration to 1/N (so that the time span [0, N] becomes [0, 1]), and each spatial step size is compressed from ±1 to ±1/√N. Now equip 2^N with the fair Bernoulli probability measure P^N_f. Then R_N induces a probability measure P^N_W on C[0, 1] in the usual way, i.e.
P^N_W(A) := P^N_f(R_N^{−1}(A))

for measurable A ⊂ C[0, 1]. The point, then, is that there is a unique probability measure P_W on C[0, 1], called Wiener measure, such that P^N_W → P_W weakly as N → ∞. The concept of weak convergence of probability measures on (complete separable) metric spaces X used here is defined as follows: a sequence (P_N) of probability measures on X converges weakly to P iff

lim_{N→∞} ∫_X f dP_N = ∫_X f dP

for each f ∈ C_b(X). This is equivalent to P_N(A) → P(A) for each measurable A ⊂ X for which P(∂A) = 0. See Billingsley (1968) for both the general theory and its application to Brownian motion, which may now be realized on (C[0, 1], P_W) as B_t = ev_t, where the evaluation maps ev_t : C[0, 1] → R are defined by ev_t(f) := f(t). In fact, the set of all paths of the kind R_N(σ), σ ∈ 2^N, is uniformly dense in C_0[0, 1], the set of all B ∈ C[0, 1] that vanish at t = 0 (on which P_W is supported). Namely, for B ∈ C_0[0, 1] and N > 0, recursively define the times in (3.25) until n = N; in terms of these, define σ(N) ∈ 2^N recursively, starting from σ(N)_0 = 0 and continuing until n = N. This enables us to turn (C[0, 1], P_W) (with suppressed Borel structure given by the metric) into an effective probability space. In Definition 3.1.1, we take the countable base B to consist of all open balls with rational radii around points R_N(σ(N)), where N ∈ N* and σ(N) ∈ 2^N, numbered via lexicographical ordering of 2* and computable isomorphisms Q+ ≅ N and N² ≅ N.
The following theorem (Asarin & Pokrovskii, 1986) characterizes the ensuing notion of P_W-randomness. See also Fouché (2000ab) and Kjos-Hanssen & Szabados (2011).

Theorem 3.9 A path B ∈ C_0[0, 1] is P_W-random iff there is a c ∈ N such that B is the effective limit of a sequence (B_N = R_N(σ(N))) in which each string σ(N) ∈ 2^N is c-Kolmogorov random, i.e. K(σ(N)) ≥ N − c.

Here effective convergence B_N → B means that

∥B_N − B∥_∞ ≤ 1/m for all N ≥ N(m), (3.28)

where m ↦ N(m) is computable (so 1/m ∈ Q+ plays the role of ε ∈ R+). Compare with Definition 3.6. It might be preferable if there were a single sequence s ∈ 2^ω for which K(s_{|N}) ≥ N − c, cf. (3.17), but unfortunately this is not the case (Fouché, 2000ab). Nonetheless, Theorem 3.9 is satisfactory from the point of view of Earman's principle above, in that randomness of a Brownian path is characterized by randomness properties of its finite approximants B_N; indeed, each σ(N) ∈ 2^N is c-Kolmogorov random, even for the same value of c for all N.
4 From 'for P-almost every x' to 'for all P-random x'

Although results of the kind reviewed here pervade the literature on algorithmic randomness (and, as remarked in the Introduction, might be said to be a key goal of this theory), their importance for physics still remains to be explored. The idea is best illustrated by the following example, which was the first of its kind. For binary sequences in 2^ω equipped with the flat Bernoulli measure P^ω_f, the strong law of large numbers states that

lim_{N→∞} (1/N) Σ_{n=0}^{N−1} s_n = 1/2 (4.1)

for P^ω_f-almost every s ∈ 2^ω (or: P^ω_f-almost surely). Recall that this means that there exists a measurable subset A ⊂ 2^ω with P^ω_f(A) = 1 such that (4.1) holds for each s ∈ A (equivalently: there exists B ⊂ 2^ω with P^ω_f(B) = 0 such that (4.1) holds for each s ∉ B). Theorems like this provide no information about A (or B). Martin-Löf randomness (cf. Definition 3.1) provides this information (usually at the cost of additional computability assumptions): eq. (4.1) holds for all P^ω_f-random s ∈ 2^ω, where I recall that, as explained more generally after Definition 3.1, the set R ⊂ X of all P-random elements in X has P(R) = 1. In the case at hand, the computability assumption behind this result is satisfied since we use a computable flat prior p = f under which (2^ω, P^ω_f) is a computable probability space in the sense of Definition 3.1. The law of the iterated logarithm also holds in this sense (Vovk, 1987). More generally, the classical theorem stating that P^ω_f-almost all sequences s ∈ A^ω are Borel normal can be sharpened to the statement that all P^ω_f-random sequences s ∈ A^ω are Borel normal. See Calude (2002), Theorem 6.61 (a sequence s ∈ A^ω is called Borel normal if each string σ ∈ A^N occurs in s with the asymptotic relative frequency |A|^{−N} given by P^ω_f). The most spectacular result in this direction is arguably:

Theorem 4.2 Any σ ∈ A* occurs infinitely often in every P^ω_f-random sequence s ∈ A^ω.
See Calude (2002), Theorem 6.50; the original version of this "monkey typewriter theorem" of course states that any σ ∈ A* occurs infinitely often in P^ω_f-almost all sequences s ∈ A^ω. But I wonder if this theorem matches Earman's principle: I see no interesting and valid version for finite strings.
The proof of all such theorems, including those to be mentioned below, is by contradiction: x not having the property Φ(x) in question, e.g. (4.1), would make x fail some randomness test.
Interesting examples also come from analysis. The pertinent computable probability space is ([0, 1], λ), where λ is Lebesgue measure, and for the basic opens B in Definition 3.1 one takes open intervals with rational endpoints, suitably numbered (here I suppress the usual Borel σ-algebra B on [0, 1], which is generated by the standard topology). Alternatively, the binary expansion map (4.2) induces an isomorphism of probability spaces (2^ω, P^ω_{1/2}) → ([0, 1], λ), though not a bijection of sets 2^ω → [0, 1], since the dyadic numbers (i.e. x = m/2^n for n ∈ N and m = 1, 2, ..., 2^n − 1) have no unique binary expansions (the potential non-uniqueness of binary expansions is irrelevant for the purposes of this section, since dyadic numbers are not random). Although (4.2) is not a homeomorphism, it nonetheless maps the usual σ-algebra of measurable subsets of 2^ω to its counterpart for [0, 1]. By Corollary 5.2 in Hertling & Weihrauch (2003) we then have: x ∈ [0, 1] is λ-random iff its binary expansion s ∈ 2^ω is P^ω_{1/2}-random. This matches Theorem 3.9 in reducing a seemingly different setting for randomness to the case of binary sequences. See Hoyrup & Rute (2021) for a general perspective on this phenomenon. One of the clearest theorems relating analysis to randomness in the spirit of our theme is the following (Brattka, Miller, & Nies, 2015, Theorem 6.7), which sharpens a classical result to the effect that any function f : [0, 1] → R of bounded variation is almost everywhere differentiable. First, recall that f has bounded variation if there is a constant C < ∞ such that for any finite collection of points 0 ≤ t_1 < t_2 < ... < t_n ≤ 1 one has ∑_{i=1}^{n−1} |f(t_{i+1}) − f(t_i)| ≤ C. By the Jordan decomposition theorem, this turns out to be the case iff f = g − h where g and h are nondecreasing.
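The binary expansion map (4.2) and the two expansions of a dyadic number can be made concrete in a short sketch using exact rational arithmetic:

```python
from fractions import Fraction

def binary_to_real(bits):
    """Partial sum of the binary expansion map 2^omega -> [0,1]:
    x = sum_n s_n 2^{-(n+1)}, truncated to the given prefix."""
    return sum(Fraction(b, 2 ** (n + 1)) for n, b in enumerate(bits))

# The dyadic number 1/2 has two preimages under (4.2): 1000... and 0111...
upper = binary_to_real([1] + [0] * 20)   # prefix of 1000...
lower = binary_to_real([0] + [1] * 20)   # prefix of 0111...

print(upper)                # 1/2 exactly
print(Fraction(1, 2) - lower)  # the truncation gap 2^{-21}
```

As noted above, this non-uniqueness is harmless for randomness questions, since dyadic numbers are computable and hence not random.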
Theorems like this give us even more than we asked for (which was the mere ability to replace 'for P-almost every x' by 'for all P-random x'): they characterize random points in terms of a certain property to be had by a specific class of computable functions. I here use the definition (or characterization) of computability due to Hertling & Weihrauch (2003), Definition 4.2: if (X, B) and (X′, B′) are effective topological spaces (see Definition 3.1.1 above), then f : X → X′ is computable if the preimages of the basic opens of X′ are effectively open in X, uniformly in their indices. There is also a similar result in which bounded variation is replaced by absolute continuity. Theorem 4.4 also has a counterpart in which f is non-decreasing, but here the conclusion is that x is computably random instead of Martin-Löf random (see Downey, Griffiths, & Laforte (2004) for computable randomness, originally defined by Schnorr via martingales, which is weaker than Martin-Löf randomness, i.e. Martin-Löf randomness implies computable randomness). Another classical result in the same direction returns Schnorr randomness (recall that x ∈ X is Schnorr random if in Definition 3.1.4 we replace (3.6) by P(V_n) = 2^{−n}; this gives fewer tests to pass, and hence, once again, a weaker sense of randomness than Martin-Löf randomness).
lim_{h→0} (1/2h) ∫_{x−h}^{x+h} f(y) dy exists for all λ-random x ∈ [0, 1], and the above limit exists for each computable f ∈ L¹ iff x ∈ [0, 1] is Schnorr random. We now turn to ergodic theory. Here, my favourite example is the following. Recall the equivalent characterizations of ergodicity stated in eqs. (2.48) to (2.50) in §2.
Theorem 4.6 Let (X, P, T) be ergodic with P and T computable. Then (2.48), restricted to characteristic functions of computable opens V ⊂ X (cf. Definition 3.1.3), i.e. (4.3), holds for all P-random x ∈ X. Moreover, x ∈ X is P-random iff (4.3) holds for every computable T and every computable open V ⊂ X.
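Theorem 4.6 can be illustrated numerically, with the caveat that floating-point simulation is only a heuristic stand-in for the computability assumptions. The example below (my choice, not from the text) takes T to be the uniquely ergodic rotation of the circle by the computable irrational √2 − 1 and V a rational interval, so the Birkhoff average of 1_V along any orbit converges to λ(V):

```python
import math

def birkhoff_average(T, x, indicator, N):
    """Time average (1/N) * sum_{n<N} 1_V(T^n x) along the orbit of x."""
    hits = 0
    for _ in range(N):
        hits += indicator(x)
        x = T(x)
    return hits / N

alpha = math.sqrt(2) - 1            # computable irrational rotation number
T = lambda x: (x + alpha) % 1.0     # ergodic rotation of the circle [0,1)
V = lambda x: 0.1 < x < 0.4         # a "computable open" interval with lambda(V) = 0.3

avg = birkhoff_average(T, x=0.0, indicator=V, N=100_000)
print(avg)   # close to lambda(V) = 0.3
```

The rotation is chosen deliberately: its unique ergodicity makes the time average converge for every starting point, so the simulation cannot accidentally land on an exceptional orbit. For mixing maps like the doubling map, floating-point orbits degenerate and a symbolic implementation would be needed.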
so that ξ_N(x)_n = a (for n = 0, ..., N − 1) identifies the subspace X_a of the partition our particle occupies after n time steps. Under the same assumptions as Theorem 4.7, we then have (4.6) for all P-random x ∈ X (and hence for P-almost every x ∈ X), cf. (2.46). See Brudno (1983), White (1993), Batterman & White (1996), and Galatolo, Hoyrup, & Rojas (2010). Taking the supremum over all computable partitions π, the Kolmogorov-Sinai entropy of (X, P, T) equals the limiting Kolmogorov complexity of any increasingly fine coarse-grained P-random path for (X, T). Note that the right-hand side of (4.6) is independent of P, which the left-hand side is not; however, the condition for the validity of (4.6), namely that x be P-random, depends on P. The corresponding lim sup equality, valid for all P-random x ∈ X, also illustrates our theme; it shows that (at least asymptotically) each P-random x generates a coarse-grained path ξ_N(x) that has "average" Kolmogorov complexity. Applying these results to X = A^ω, with T the unilateral shift and P = P^ω_p the Bernoulli measure on A^ω given by a probability distribution p on some alphabet A, gives a similar expression (4.8) for the Shannon entropy (2.15), valid for all P^ω_p-random s ∈ A^ω (and hence for P^ω_p-almost every s ∈ A^ω). Note the lim instead of the lim sup, which is apparently justified in this special case. Porter (2020) states my eq. (4.8) as his Theorem 3.2 and labels it "folklore", referencing however Levin & Zvonkin (1970) and Brudno (1978). See also Schack (1998) and Grünwald & Vitányi (2003) for further connections between entropy and algorithmic complexity. Finally, here are some nice examples involving Brownian motion. Three classical results are: Theorem 4.8 1. For P_W-almost every B ∈ C[0, 1] there exists h_0 > 0 such that for all 0 < h < h_0 and all 0 ≤ t ≤ 1 − h, |B(t + h) − B(t)| ≤ √(2h log(1/h)), and √2 is the best constant for which this is true.
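The complexity-entropy relation (4.8) can be made vivid with a general-purpose compressor used as a crude upper bound on the complexity rate K(s|N)/N. This is an illustration only (compressed length is not Kolmogorov complexity, and the choice of bz2 is mine): the compressed size per symbol of a Bernoulli(p) string tracks the Shannon entropy h(p) qualitatively.

```python
import bz2
import math
import random

def entropy_bits(p):
    """Shannon entropy h(p) of a Bernoulli(p) source, in bits per symbol."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def compressed_rate(p, n=100_000, seed=1):
    """Compressed size in bits per symbol of a Bernoulli(p) 0/1 string:
    a rough upper-bound proxy for the complexity rate K(s|N)/N."""
    rng = random.Random(seed)
    s = ''.join('1' if rng.random() < p else '0' for _ in range(n))
    return 8 * len(bz2.compress(s.encode())) / n

for p in (0.05, 0.25, 0.5):
    print(f"p={p}: h(p) = {entropy_bits(p):.3f}, compressed rate = {compressed_rate(p):.3f}")
```

The compressed rate exceeds h(p) by an implementation-dependent overhead, but it increases with the entropy of the source, as (4.8) leads one to expect for the exact complexity rate of random sequences.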

Applications to statistical mechanics
It is the author's view that many of the most important questions still remain unanswered in very fundamental and important ways. (Sklar, 1993)

What many "chefs" regard as absolutely essential and indispensable, is argued to be insufficient or superfluous by many others. (Uffink, 2007, p. 925)

The theme of the previous section is the mathematical key to a physical understanding of the notorious phenomenon of irreversibility, for the moment in classical statistical mechanics. The literature on this topic is enormous; I recommend Sklar (1993), Uffink (2007), and Bricmont (2022). My discussion is based on the pioneering work of Hiura & Sasa (2019). But before getting there, I would like to very briefly review Boltzmann's take on the general problem of irreversibility (which in my view is correct). In Boltzmann's approach, irreversibility of macroscopic phenomena in a microscopic world governed by Newton's (time-)reversible equations is a consequence of: 1. Coarse-graining (only certain macroscopic quantities behave irreversibly); 2. Probability (irreversible behaviour is just very likely, or, in infinite systems, almost sure).
Boltzmann launched two different scenarios to make this work, both extremely influential. First, in Boltzmann (1872) the coarse-graining of an N-particle system moving in some volume V ⊂ R³ was done by introducing a time-dependent macroscopic distribution function f_t, which for each time t is defined on the single-particle phase space V × R³ ⊂ R⁶, and which is a probability density in the sense that N ∫_A d³r d³v f_t(r, v) is the "average" number of particles inside a region A ⊂ V × R³ at time t (the normalization is ∫ d³r d³v f_t(r, v) = 1). Boltzmann argued that under various assumptions, notably his Stosszahlansatz ("molecular chaos"), which assumes probabilistic independence of two particles before they collide, as well as the absence of collisions between three or more particles, and finally some form of smoothness, f_t solves the Boltzmann equation, whose right-hand side is a quadratic integral expression in f_t taking the effect of two-body collisions (or other two-particle interactions) into account. He then showed that the "entropy" satisfies dS/dt ≥ 0 whenever f_t solves his equation, and saw this as a proof of irreversibility. Historically, there were two immediate objections to this result (see also the references above). First, there is some tension between this irreversibility and the reversibility of Newton's equations satisfied by the microscopic variables (r_0(t), v_0(t), ..., r_{N−1}(t), v_{N−1}(t)) on which f_t is based (Loschmidt's Umkehreinwand). Second, in a finite system any N-particle configuration eventually returns to a configuration arbitrarily close to its initial value (Zermelo's Wiederkehreinwand). A general form of this phenomenon of Poincaré recurrence (e.g. Viana & Oliveira, 2016, Theorem 1.2.1) states that if (X, P, T) is a dynamical system, where P is T-invariant and T : X → X is just assumed to be measurable, and A ⊂ X has positive measure, then for P-almost every x ∈ A there exist infinitely many n ∈ N for which T^n(x) ∈ A. These problems made Boltzmann's conclusions look dubious, perhaps even circular (irreversibility having been put in by hand via the assumptions leading to the Boltzmann equation). Despite the famous later work by Lanford (1975, 1976) on the derivation of the Boltzmann equation for short times, these issues remain controversial; see e.g. the debate between Uffink & Valente (2015) and Ardourel (2017). But I see a promising way forward, as follows (Villani, 2002, 2013; Bouchet, 2020; Bodineau et al., 2020). From the point of view of Boltzmann (1877), as rephrased in §2 (first bullet), the distribution function is just the empirical measure (2.12) for an N-particle system with A = R⁶ (hence A is uncountably infinite, but nonetheless the measure-theoretic situation remains unproblematic). Each N-particle configuration x^{(N)}(t) := (r_0(t), v_0(t), ..., r_{N−1}(t), v_{N−1}(t)) (5.3) at time t determines a probability measure L_N(x^{(N)}(t)) on A via (2.12). Physicists prefer densities (with respect to Lebesgue measure dr dv) and Dirac δ-functions, writing f^{(N)}_t(r, v) = (1/N) ∑_{i<N} δ(r − r_i(t)) δ(v − v_i(t)), where Dis(A) is the space of probability distributions on A.
The connection between the two is eq. (5.7) (Villani, 2010, §1.3). The hope, then, is that f^{(N)}_t has a limit f_t as N → ∞ that has some smoothness and satisfies the Boltzmann equation. To accomplish this, the idea is to turn (f^{(N)}_t)_{t≥0} into a stochastic process taking values in Dis(A) or in Prob(A), based on a probability space (A^ω, P^ω), indexed by N ∈ N, and study the limit N → ∞. More precisely, in a dilute gas (for which the Boltzmann equation was designed) one has a³ ≪ 1/ρ ≪ ℓ³, where a is the atom size (or some other length scale), ρ = N/V is the particle density, and ℓ is the mean free path (between collisions). Defining ε = 1/(ρℓ³), the limit "N → ∞" is the Boltzmann-Grad limit N → ∞ and ε → 0 at constant εN.
The simplest way to put probability measures on A^N and A^ω is to start from some initial value f_0, which is the density of a probability measure p on A, and hence defines a Bernoulli probability measure P^ω_p on X = A^ω. There are two ways to block correlations in the spirit of the Stosszahlansatz. One is to take permutation-invariant probability measures (P^{(N)}) on A^N for which the empirical measures L_N on A converge (in law) to some p ∈ Prob(A) as N → ∞ (Villani, 2013); this is equivalent to the factorization of the marginals of P^{(N)} into products of p as N → ∞; see Sznitman (1991), Prop. 2.2. Alternatively, take μ ∈ Prob(Prob(A)) and average the Bernoulli measures P^ω_p with respect to μ, as in de Finetti's theorem (cf. Aldous, 1985). Either way, one's hope is that P^ω-almost surely the random variables f^{(N)}_t have a smooth limit f_t as N → ∞, which limit distribution function satisfies the Boltzmann equation, so that the macroscopic time-evolution t → f_t is induced by the microscopic time-evolution t → x(t), at least for the P^ω-a.e. x ∈ X for which lim_{N→∞} f^{(N)}_t(x) exists, where x is some configuration of infinitely many particles in R³, including their velocities, cf. (5.3)-(5.7). This would even derive the Boltzmann equation.
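The second (de Finetti) construction amounts to two-stage sampling: first draw p from μ, then sample i.i.d. from the Bernoulli measure P^ω_p. A minimal sketch for A = 2 (the uniform prior on p is an arbitrary illustrative choice), showing that the empirical measure L_N concentrates on whichever p was drawn:

```python
import random

def exchangeable_sample(mu_sampler, n, rng):
    """Draw p ~ mu, then an i.i.d. Bernoulli(p) string of length n:
    a de Finetti-style mixture of the Bernoulli measures P^omega_p over mu."""
    p = mu_sampler(rng)
    return p, [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(42)
p, xs = exchangeable_sample(lambda r: r.uniform(0, 1), 100_000, rng)
empirical = sum(xs) / len(xs)
print(p, empirical)   # the empirical frequency concentrates on the drawn p
```

The sampled sequence is exchangeable but not i.i.d. unconditionally: its entries are correlated through the shared, initially unknown p, which is exactly the kind of correlation the factorization condition controls.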
Using large deviation theory, Bouchet (2020) showed all this assuming the Stosszahlansatz. This is very impressive, but the argument would be complete only if one could prove that, in the spirit of the previous section, lim_{N→∞} f^{(N)}_t(x) exists for all P^ω-random x ∈ X, preferably by showing first that the Stosszahlansatz and the other assumptions used in the derivation of the Boltzmann equation (such as the absence of multiple-particle collisions) hold for all P^ω-random x. In particular, this would make it clear that the Stosszahlansatz is really a randomness assumption. Earman's principle applies: Bouchet (2020) showed that for finite N, the Boltzmann equation holds approximately for a set of initial conditions x ∈ A^N with high probability P^N. The resolution of the Umkehreinwand is then standard, see e.g. Bricmont (2022), Chapter 8. Similarly, the Wiederkehreinwand is countered by noting that in an infinite system the recurrence time is infinite, whilst in a large system it is astronomically large (beyond the age of the universe).
While its realization for the Boltzmann equation may still be remote (for mathematical rather than conceptual reasons or matters of principle), this scenario can be demonstrated in the Kac ring model (Hiura & Sasa, 2019). The original reference for the Kac ring model is Kac (1959); useful literature prior to the use of algorithmic randomness in Hiura & Sasa (2019) includes Maes, Netocný, & Shergelashvili (2007) and De Bièvre & Parris (2017). It is a caricature of the Boltzmann equation rather in the spirit of Boltzmann (1877), i.e. his second approach to the problem of irreversibility (the state counting techniques reviewed in §2 come from this second paper). Namely: • The microstates of the Kac ring model for finite N are pairs (x^{(N)}, y^{(N)}) ∈ 2^{2N+1} × 2^{2N+1} ≡ A_N; x^{(N)} = (x_{−N}, ..., x_N); y^{(N)} = (y_{−N}, ..., y_N), (5.8) with x_n ∈ 2, y_n ∈ 2. Here x_n is seen as a spin that can be "up" (x_n = 1) or "down" (x_n = 0), whereas y_n denotes the presence (y_n = 1) or absence (y_n = 0) of a scatterer, located between x_n and x_{n+1}. These replace the variables (r_0, v_0, ..., r_{N−1}, v_{N−1}) ∈ R^{6N} for the Boltzmann equation. In the thermodynamic limit we then have pairs (x, y) of infinite binary sequences. • The macrostates of the model, which replace the distribution function (5.6), form a pair (m^{(N)}, s^{(N)}), where m^{(N)} is the fraction of spins that are up (5.9) and s^{(N)} is the fraction of sites carrying a scatterer (5.10). • The microdynamics, replacing the time evolution (r_0(t), v_0(t), ..., r_{N−1}(t), v_{N−1}(t)) generated by Newton's equations with some potential, is now discretized, and is given by maps T^{(N)}(x^{(N)}, y^{(N)}) = (x′^{(N)}, y^{(N)}) with x′_{n+1} = x_n + y_n mod 2, (5.12) where n = −N, ..., N, with periodic boundary conditions, i.e. (x_{N+1}, y_{N+1}) = (x_{−N}, y_{−N}).
The same formulae define the thermodynamic limit T : A^ω → A^ω. The idea is that in one time step the spin x_n moves one place to the right and flips iff it hits a scatterer (y_n = 1).
• The macrodynamics, which replaces the solution of the Boltzmann equation, is given by Φ(m, s) = ((1 − s)m + s(1 − m), s). (5.13) In particular, for t ∈ N one has Φ^t(m, s) = (1/2 + (1 − 2s)^t (m − 1/2), s), (5.14) and hence every initial state (m, s) with s ∈ (0, 1) reaches the "equilibrium" state (1/2, s), as lim_{t→∞} Φ^t(m, s) = (1/2, s). (5.15) • The macrodynamics (5.13) is induced by the microdynamics (5.12), that is, (m^{(N)}, s^{(N)})(T^{(N)}(x^{(N)}, y^{(N)})) = Φ((m^{(N)}, s^{(N)})(x^{(N)}, y^{(N)})), (5.16) provided the counterpart of the Stosszahlansatz for this model holds. For N < ∞ this reads as a counting identity: the number of spins x = 1 after one time step equals the number of spins that already had x = 1 and did not scatter (where the probability of non-scattering is estimated to be 1 − s, i.e., the average density of voids), plus the number of spins x = 0 that have flipped because they hit a scatterer (where the probability of scattering is estimated to be the average density s of scatterers). This kind of averaging of course overlooks the details of the actual locations of the scatterers versus the locations of the spins with specific values. It is trivial to find configurations (x, y) where it is violated, but these become increasingly rare as N → ∞.
It follows that the "Boltzmann equation" (5.14) holds, and that the macrodynamics is autonomous: the dynamics of the macrostates (m, s) does not explicitly depend on the underlying microstates. Theorem 5.1 uses biased Martin-Löf randomness on A^ω and hence defines "typicality" out of equilibrium. As we have seen, equilibrium corresponds to m = 1/2 (for arbitrary s ∈ (0, 1)), for which the corresponding P^ω_{1/2}-random states are arguably the most random ones: for it follows from (3.18) that if s is P^ω_p-random for some p ∈ (0, 1) and s′ is P^ω_f-random, then the asymptotic Kolmogorov complexity of s′ dominates that of s, so that the approach to equilibrium m → 1/2 increases (algorithmic) randomness, as expected.
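As a numerical sanity check (mine, not part of Hiura & Sasa's analysis), one can simulate the Kac ring for a pseudorandomly drawn scatterer configuration and compare the microscopically evolved magnetization with the macrodynamics m_t = 1/2 + (1 − 2s)^t (m_0 − 1/2) described above:

```python
import random

def kac_step(x, y):
    """One Kac-ring time step: the spin at site n moves to site n+1,
    flipping iff a scatterer sits between them (periodic boundary conditions)."""
    n = len(x)
    return [x[i - 1] ^ y[i - 1] for i in range(n)]   # x[-1], y[-1] close the ring

def macro_m(t, m0, s):
    """Iterated macrodynamics: m_t = 1/2 + (1 - 2s)^t (m_0 - 1/2)."""
    return 0.5 + (1 - 2 * s) ** t * (m0 - 0.5)

rng = random.Random(7)
N, s = 100_001, 0.2
x = [1] * N                                           # all spins up: m_0 = 1
y = [1 if rng.random() < s else 0 for _ in range(N)]  # random scatterer configuration

history = {}
for t in range(1, 21):
    x = kac_step(x, y)
    history[t] = sum(x) / N                           # microscopic magnetization

print(history[5], macro_m(5, 1.0, s))    # micro closely tracks macro for t << N
print(history[20], macro_m(20, 1.0, s))  # both near the equilibrium value 1/2
```

For "typical" (here: pseudorandom) scatterer configurations the agreement is excellent as long as t ≪ N; running the exactly reversible microdynamics for a full revolution of the ring would instead exhibit the recurrence behind the Wiederkehreinwand.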

Applications to quantum mechanics
There is yet another interpretation of the diagram at the beginning of §2: in quantum mechanics a string σ ∈ A^N denotes the outcome of a run of N repeated measurements of the same observable A with finite spectrum A in the same quantum state, so that the possible outcomes a ∈ A are distributed according to the Born rule: if H is the Hilbert space pertinent to the experiment, A ∈ B(H) is the observable that is being measured, with spectrum A = σ(A) and spectral projections E_a onto the eigenspace H_a for eigenvalue a, and ρ is the density operator describing the quantum state, then p(a) = Tr(ρE_a). It can be shown that if we consider the run as a single experiment, the probability of outcome σ is P^N_p(σ), as in a classical Bernoulli trial. This extends to the idealized case of an infinitely repeated experiment, described by the probability measure P^ω_p on A^ω (Landsman, 2021). In particular, for a "fair quantum toss" (in which A = 2 with p(1) = p(0) = 1/2), it follows that the outcome sequences sample the probability space (2^ω, P^ω_f), just as in the classical case. For quantum mechanics itself, this implies that P^ω_f-almost every outcome sequence s ∈ 2^ω is P^ω_f-random. The theme of §4 then leads to the circular conclusion that all P^ω_f-random outcome sequences are P^ω_f-random. Nonetheless, this circularity comes back with a vengeance if we turn to hidden variable theories, notably Bohmian mechanics (cf. Goldstein, 2017). Let me first summarize my original argument (Landsman, 2021, 2022), and then reply to a potential counterargument.
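The Born probabilities p(a) = Tr(ρE_a) are easy to compute concretely. A minimal sketch for a two-level system (observable σ_z with spectral projections onto its ±1 eigenspaces; the pure state and the angle θ are illustrative choices of mine, not from the text):

```python
import math

def mat_mul(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

def born_probability(rho, E):
    """Born rule: p(a) = Tr(rho E_a) for the spectral projection E_a."""
    return trace(mat_mul(rho, E))

theta = math.pi / 3
psi = [math.cos(theta), math.sin(theta)]                        # real unit vector
rho = [[psi[i] * psi[j] for j in range(2)] for i in range(2)]   # pure state |psi><psi|
E_up = [[1, 0], [0, 0]]     # projection onto the +1 eigenspace of sigma_z
E_down = [[0, 0], [0, 1]]   # projection onto the -1 eigenspace

p_up = born_probability(rho, E_up)
p_down = born_probability(rho, E_down)
print(p_up, p_down)   # cos^2(theta) and sin^2(theta), summing to 1
```

A run of N independent samples from this distribution is then exactly the classical Bernoulli trial P^N_p mentioned above.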
In hidden variable theories there is a space Λ of hidden variables, and if the theory has the right to call itself "deterministic", then there must be functions h : N → Λ and g : Λ → A such that s = g ∘ h. (6.1) The existence of g expresses the idea that the value of λ determines the outcome of the experiment. The function g tacitly incorporates all details of the experiment that may affect its outcome, except the hidden variable λ (which is the argument of g). Such details may include the setting, a possible context, and the quantum state. The existence of g therefore does not contradict the Kochen-Specker theorem (which only excludes context-independent value assignments). But g is just one ingredient that makes a hidden variable theory deterministic. The other is the function h that gives the value of λ in experiment No. n in a long run, for each n. Furthermore, in any hidden variable theory the probability of the outcome of some measurement if the hidden variable λ is unknown is given by averaging the determined outcomes given by g with respect to some probability measure μ_ψ on Λ defined by the quantum state ψ supposed to describe the experiment within quantum mechanics.
Theorem 6.1 The functions g and h cannot both be provided by any deterministic theory (and hence deterministic hidden variable theories that exactly reproduce the Born rule cannot exist).
Proof. The Born rule is needed to prove that outcome sequences s ∈ A^ω are P^ω_p-distributed (Landsman, 2021, Theorem 3.4.1). If g and h were explicitly given by some deterministic theory T, then the sequence s would be described explicitly via (6.1). By (what I call) Chaitin's second incompleteness theorem, the sequence s cannot then be P^ω_p-random. Q.E.D. The theorem used here states that if s ∈ A^ω is P^ω_p-random, then ZFC (or any sufficiently comprehensive mathematical theory T, as meant in the proof of Theorem 6.1) can compute only finitely many digits of s. See e.g. Calude (2002), Theorem 8.7, which is stated for Chaitin's famous random number Ω but whose proof holds for any P^ω_p-random sequence. Consistent with Earman's principle, Theorem 6.1 does not rely on the idealization of infinitely long runs of measurements, since for finite runs Chaitin's (first) incompleteness theorem leads to a similar contradiction. The latter theorem states that for any sound mathematical theory T containing enough arithmetic there is a constant C ∈ N such that T cannot prove any sentence of the form K(σ) > C, although infinitely many such sentences are true. In other words, T can only prove randomness of finitely many strings, although infinitely many strings are in fact random. See e.g. Calude (2002), Theorem 8.4.
The upshot is that a deterministic theory cannot produce random sequences. Against this, fans of deterministic hidden variable theories could argue that the (unilateral) Bernoulli shift S on 2^ω (equipped with P^ω_f for simplicity) is deterministic and yet is able to produce random sequences. Indeed, following a suggestion by Jos Uffink (who is not even a Bohmian!), this can be done as follows; readers familiar with Dürr, Goldstein, & Zanghi (1992) will notice that the scenario in the main text would actually be optimal for these authors. With Λ = A = 2, and the simplest experiment for which g : 2 → 2 is the identity (so that the measurement just reveals the actual value of the pertinent hidden variable), take an initial condition s′ ∈ 2^ω, and define h : N → 2 by h(n) = s′(n). (6.2) Then s = s′. In other words, imagine that experiment number n ∈ N takes place at time t = n, at which time the hidden variable takes the value λ = s′(n). The measurement run then just reads the tape s′. Trivially, if the initial condition s′ is P^ω_p-random, then so is the outcome sequence s. According to Dürr, Goldstein, & Zanghi (1992), the randomness of outcomes in the deterministic world envisaged in Bohmian mechanics originates in the random initial condition of the universe, which is postulated to be in "quantum equilibrium". In the above toy example, the configuration space (which in Bohmian mechanics is R^{3N}) is replaced and idealized by 2^ω, i.e. the role of the position variable q ∈ R^{3N} is now played by s ∈ 2^ω; the dynamics (replacing the Schrödinger equation) is S; and the "quantum equilibrium condition" (which is nothing but the Born rule) then postulates that the initial value s′ is distributed according to the Born rule, which here is the fair Bernoulli measure P^ω_f. The Bohmian explanation of randomness then comes down to the claim that, despite the determinism inherent in the dynamics S as well as in the measurement theory g, each experimental outcome s(n) is random because the hidden variable λ is randomly distributed. Since s′ = s, this simply says that s is random because s is random.
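The tape-reading scenario (6.1)-(6.2) is trivial to spell out in code. In the sketch below a pseudorandom tape stands in, imperfectly, for a P^ω_f-random s′; the point is only that the outcome run relays the tape's randomness rather than producing any of its own:

```python
import random

# A toy 'deterministic' hidden-variable model of a fair quantum coin toss:
# the hidden variable in run n is the n-th bit of a fixed tape s', and the
# response function g is the identity, so the run just reads the tape.
rng = random.Random(123)
tape = [rng.randint(0, 1) for _ in range(10_000)]   # stand-in for a random s'

g = lambda lam: lam        # the measurement reveals the hidden variable
h = lambda n: tape[n]      # the hidden variable in experiment No. n

outcomes = [g(h(n)) for n in range(len(tape))]      # s = g o h, eq. (6.1)
print(outcomes == tape)                             # s = s': randomness merely relocated
print(sum(outcomes) / len(outcomes))                # relative frequency near 1/2
```

Any statistical test the outcome sequence passes is passed solely because the tape passes it, which is the circularity diagnosed in the text.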
Even in less simplistic scenarios, using the language of computation theory (taking computability as a metaphor for determinism), we may say: deterministic hidden variable theories need a random oracle to reproduce the randomness required by quantum mechanics. This defeats their determinism.

Summary
I have some misgivings about the physical relevance of the mathematical origins of algorithmic randomness in the theory of computation, which for physical applications should be replaced by some abstract logical form of determinism.
Various mathematical examples provide situations where some property Φ(x) that holds for P-almost every x ∈ X (where P is some probability measure on X) in fact holds for all P-random x ∈ X, at least under some further computability assumptions; see §4. The main result in §5, i.e. Theorem 5.1 due to Hiura & Sasa (2019), as well as the much better known results about the relationship between entropy, dynamical systems, and P-randomness reviewed in §2 and §4, notably Theorem 2.1 and eq. (4.6), provide positive answers to questions 2 and 3. This, in turn, paves the way for an explanation of emergent phenomena like irreversibility and chaos, and suggests that the answer to question 1 is that at least the computational concept of P-randomness requires a prior probability P!

. 13 )
The term |A|^{−N} in (2.11) of course equals P^N(σ) for any σ ∈ A^N, and hence certainly for any σ for which L_N(σ) = μ. For such σ, for general Bernoulli measures P_p on A^N we have P^N_p(σ) = e^{N ∑_{a∈A} μ(a) log p(a)} = e^{−N(h(μ)+I(μ|p))}, (2.14) in terms of the Shannon entropy and the Kullback-Leibler distance (or divergence), given by h(μ) := −∑_{a∈A} μ(a) log μ(a); (2.15)

h₂(p) := −∑_{a∈A} p(a) log₂ p(a) = ∑_{a∈A} p(a) I₂(a), (2.31) plays a key role in Shannon's approach. It is the expectation value h₂(p) = ⟨I₂⟩_p of the function I₂(a) := −log₂ p(a). Let C be a prefix code, let ℓ(C(a)) be the length of the codeword C(a), with expectation L(C, p) = ∑_{a∈A} p(a) ℓ(C(a)).

Theorem 3.5 A sequence s ∈ A^ω is P^ω_f-random (cf. Definition 3.1.5) iff it is Calude random. This follows from Theorem 6.35 in Calude (2002), §6.3, and Theorem 3.7 below. The point is that randomness of sequences s ∈ A^ω can be expressed in terms of randomness of finite initial segments of s. This is also true via another (much better known) reformulation of Martin-Löf randomness. Definition 3.6 A sequence s ∈ A^ω is Chaitin-Levin-Schnorr random if there is a constant c ∈ N such that each finite segment s|_N ∈ A^N is prefix Kolmogorov c-random, in the sense that for all N, K(s|_N) ≥ N − c. (3.17) Calude (2002) writes H(σ) for what I call K(σ), following Li & Vitányi (2008) and others. A key result of algorithmic randomness, then, is (e.g. Downey & Hirschfeldt, Theorem 6.2.3): Theorem 3.7 A sequence s ∈ A^ω is P^ω_f-random iff it is Chaitin-Levin-Schnorr random.
hand, the coarse-grained (macroscopic) entropy (2.19) for the flat prior p = f on A and the probability μ = μ_{(m,s)} on 2 × 2 defined by μ

See Galatolo, Hoyrup, & Rojas (2010), Theorem 3.2.2, and Pathak, Rojas, & Simpson (2014), Theorem 1.3. The first author to prove such results was V'yugin (1997). See also the reviews by Towsner (2020) and V'yugin (2022). In Theorem 4.6 one could replace (4.3) with the property that x satisfy (Poincaré) recurrence, in the sense that for each computable open V ⊂ X (not necessarily containing x) there is some n ∈ N such that T^n(x) ∈ V. If (2.49) instead of (2.48) is used, a result like Theorem 4.6 obtains that characterizes Schnorr randomness. The Shannon-McMillan-Breiman theorem (2.51) also falls under this scope. We say that a partition π of X is computable if each X_a ⊂ X is a computable open set. The defining equation (2.5) is then replaced by P(⋃_{a∈A} X_a) = 1.