Probability Axioms and Set Theory Paradoxes

In this paper, we show that Zermelo–Fraenkel set theory with Choice (ZFC) conflicts with basic intuitions about randomness. Our background assumptions are the Zermelo–Fraenkel axioms without Choice (ZF) together with a fragment of Kolmogorov's probability theory. Using these minimal assumptions, we prove that a weak form of Choice contradicts two common-sense assumptions about probability, both based on simple notions of symmetry and independence.


A Puzzle
We begin with a paradox involving the Axiom of Choice (AC) and an infinite set of fair coins. An early version of this result appeared as a problem in the American Mathematical Monthly [1]; a version closer to ours can be found in [2].
Let I = 2^ω denote the set of all binary-valued functions on ω = {0, 1, 2, . . .}. Let ψ : I → I be a randomly constructed function. By this, we mean that for each r ∈ I and n ∈ ω, ψ(r)(n) is determined by a fair coin toss. (If you prefer, you may also interpret the elements of I as binary expansions of real numbers 0 ≤ r ≤ 1. We note that each dyadic rational 0 < r < 1 occurs in two ways; e.g., 1/2 = 0.1000... = 0.0111....) We now ask: "If r̃ ∈ I is chosen at random (i.e., r̃(n) is determined by a fair coin toss for each n ∈ ω), is it possible to guess the value of ψ(r̃) given the values ψ(r) for all r ≠ r̃?" The intuitively obvious answer is "no". Since each value of ψ was chosen independently, the restriction of ψ to I \ {r̃} should carry no information about ψ(r̃). Hence, if we are limited to information about ψ↾(I \ {r̃}), the odds of guessing ψ(r̃) correctly should be 0, no matter what strategy we employ.
But are they? Consider the following argument, which can be fully formalized within Zermelo–Fraenkel set theory with Choice (ZFC). Define an equivalence relation on I^I by setting f ∼ g if and only if f(r) = g(r) for all but finitely many r ∈ I. By AC, there exists an S ⊆ I^I that intersects each ∼-class in exactly one point. For each g ∈ I^I, let g* be the unique function in S ∩ [g], and let ∆_g denote the finite set {r ∈ I : g(r) ≠ g*(r)}. Note that g* is uniquely determined by the restriction of g to any cofinite subset of I. Hence, ψ* is determined by ψ↾(I \ {r̃}), so we can employ the strategy of guessing that ψ(r̃) = ψ*(r̃). This strategy fails if and only if r̃ ∈ ∆_ψ. Since ∆_ψ is a finite set depending only on ψ (not on r̃), a randomly chosen r̃ ∈ I is almost certain not to lie in ∆_ψ. Thus, using only information about ψ↾(I \ {r̃}), our strategy is almost certain to guess the correct value of ψ(r̃). This paradox invites us to reexamine the assumptions underlying ZFC.

Axioms and Mathematical Intuition
Over the last 100 years, set theory (ZFC) has become widely accepted as a foundation for mathematics. If mathematics is to be a search for objective truth, then the correctness of these foundational axioms is essential. Axiomatic systems, such as ZFC, turn our mathematical intuitions into precise statements. Hence, their validity ultimately depends on whether these intuitions are correct. Further, mathematics will always have meaningful questions that cannot be resolved on the basis of currently accepted axioms. For example, the MRDP theorem [3], which shows Hilbert's 10th problem to be unsolvable, implies that no consistent, recursively axiomatized formal system can prove all true statements of the form "∀x ∈ ω^m (F(x) ≠ 0)" for F ∈ Z[X_1, . . . , X_m]. The desire to move beyond these limitations naturally motivates the invention of new axioms, which can only be justified using informal arguments. In this way, formal mathematics rests atop informal mathematics.
Of course, people's mathematical intuitions do not always agree. A prime example of this is the Axiom of Choice, first articulated by Zermelo in 1904. Most mathematicians find this axiom to be self-evident. Indeed, prior to its codification as part of ZFC, mathematicians had already been using it implicitly in their proofs (including even strong opponents of the axiom like Lebesgue [4]). Yet this axiom has been the subject of rich controversy due to its non-constructive nature and counter-intuitive consequences. Further, accepting (or not accepting) the Axiom of Choice has significant implications for many areas of mathematics. Hence, the Axiom of Choice illustrates that the question of whether to accept a foundational principle can be both subtle and important. For a standard reference on the Axiom of Choice, as well as the other ZFC axioms, see [5].
Compared with other scientists, it may be especially important for mathematicians to take an active interest in the foundations of their subject. Since many mathematical statements cannot be empirically tested, incorrect mathematics will not necessarily "self-correct" as an incorrect theory of physics might. Therefore, the validity of mathematics rests on our willingness to critique our own basic assumptions. In this paper, we study two axioms that, while highly intuitive, are incompatible with ZFC. These axioms stem from simple, intuitive arguments concerning probability, and we believe they merit the consideration of anyone interested in the foundations of mathematics.
Freiling [6] proposed the following axiom.
Axiom 1 (Freiling's Axiom of Symmetry). For every function f mapping [0, 1] to countable subsets of [0, 1], there exist x, y ∈ [0, 1] such that x ∉ f(y) and y ∉ f(x).
Freiling motivates this axiom as follows: Suppose we choose x, y ∈ [0, 1] uniformly at random. Since f(y) is a countable set, it is almost certain that x ∉ f(y). By symmetry, it is almost certain that y ∉ f(x). Hence, for a randomly chosen pair x, y, it is almost certain that both x ∉ f(y) and y ∉ f(x), and we conclude that some pair must satisfy this condition. Incredibly, under ZFC, this axiom is equivalent to the negation of the continuum hypothesis.
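The counting behind Freiling's argument can be checked in a finite analogue of our own devising (not taken from [6]): replace [0, 1] by {0, ..., n − 1} and "countable" by "of size at most k". At most nk pairs (x, y) have x ∈ f(y), and at most nk have y ∈ f(x), so when 2k < n some pair must avoid both. The sketch below draws a hypothetical random f and verifies this; the helper names are illustrative only.

```python
import random
from itertools import product


def count_bad_pairs(n, f):
    """Number of pairs (x, y) with x in f[y] or y in f[x]."""
    return sum(1 for x, y in product(range(n), repeat=2) if x in f[y] or y in f[x])


def find_free_pair(n, f):
    """Return some (x, y) with x not in f[y] and y not in f[x], or None if none exists."""
    for x, y in product(range(n), repeat=2):
        if x not in f[y] and y not in f[x]:
            return (x, y)
    return None


# Hypothetical finite stand-in for [0, 1]: n points, each f[y] a random set of size k.
n, k = 200, 10
rng = random.Random(1)
f = [set(rng.sample(range(n), k)) for _ in range(n)]

bad = count_bad_pairs(n, f)
print(bad <= 2 * n * k, bad / n**2)  # the bound holds; the "bad" fraction is at most 2k/n
print(find_free_pair(n, f))
```

In the continuum the same reasoning is carried by measure rather than counting, which is exactly the step Freiling's intuition supplies.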
We are not the first to observe that this type of reasoning also has the potential to contradict ZFC itself. If we make the (plausible) assumption that any subset of [0, 1] with cardinality < 2^ℵ0 must have probability 0, we might suggest the following stronger principle.
Axiom 2. For every function f mapping [0, 1] to subsets of [0, 1] of cardinality < 2^ℵ0, there exist x, y ∈ [0, 1] such that x ∉ f(y) and y ∉ f(x).
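However, as Freiling notes, this axiom implies that [0, 1] cannot be well-ordered (assuming the other ZF axioms): if ≺ well-orders [0, 1] with order type 2^ℵ0 (regarded as an initial ordinal), then f(y) = {x : x ⪯ y} maps each point to a set of cardinality < 2^ℵ0, yet for every pair x, y, either x ∈ f(y) or y ∈ f(x). Ultimately, Freiling rejects Axiom 2, saying that he knows of no compelling reason to support the belief that small sets have probability 0.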
A second way in which Freiling's intuition may be used to argue against AC is explored in [7]. Van Lambalgen's approach is to add a new "almost all" quantifier, Q, to the language of set theory. Among van Lambalgen's axioms for this quantifier is the following, which directly formalizes Freiling's symmetry argument:
Axiom 3 (cf. Axiom Q6 from [7]). For any formula φ, QxQyφ ↔ QyQxφ.
In [7], van Lambalgen shows that his axioms for Q, together with ZF, imply that 2^ω is not well-orderable.
The results of this paper show further ways in which Freiling's reasoning can conflict with ZFC. We note two important ways in which our arguments differ from those just mentioned. First, both Freiling and van Lambalgen derive a contradiction with ZFC by arguing that a set equinumerous with the continuum cannot be well-ordered. By contrast, our contradictions require only the following (even weaker) special case of AC.
Axiom 4 (Weak Axiom of Choice). Every partition of 2^ω into countable sets has a set of representatives.
Second, we will reconsider the usual axioms of probability. We will argue that some of Kolmogorov's axioms do not have a clear justification, and that this calls into question any argument against ZFC based on them. We will then present a more parsimonious framework for probability and derive our contradictions within that framework.

Freiling's Argument for ¬CH
We are greatly indebted to Freiling for his innovative work on the continuum hypothesis. In [6], he proposed a negative solution to CH based on intuitive probability axioms, such as Axiom 1. We view our axioms as natural extensions of his.
The eventual aim of this paper is to show that Freiling's reasoning (taken to its logical conclusion) is incompatible with ZFC, which he assumes in his argument. Therefore, we believe that Freiling's approach to CH does not ultimately work. Nonetheless, his argument is delightful and we include a version of it here for context. The remainder of this section is independent of the rest of the paper and may be skipped without any loss of understanding.
Theorem 1 (Adapted from [6]). For all n < ω, there exists A_n ⊆ ω_n^{n+2} such that: (i) for every (x_0, . . . , x_n) ∈ ω_n^{n+1}, the set {x_{n+1} ∈ ω_n : (x_0, . . . , x_{n+1}) ∈ A_n} is finite; and (ii) ⋃_{σ∈Sym(n+2)} σ · A_n = ω_n^{n+2}.
Proof. We proceed by induction. For n = 0, we may take A_0 = {(m, n) ∈ ω^2 : n ≤ m}. Now, assume the result holds for n − 1. For each α < ω_n, there exists an injection …
The following is adapted from [6]:
Axiom 5. For every positive integer n, and for any A ⊆ I^n, the following cannot simultaneously be true: …
The justification for Axiom 5 is similar to that given for Axiom 1. Suppose we choose (x_0, . . . , x_{n−1}) ∈ I^n by choosing each x_i uniformly at random and independently. If we choose x_0, . . . , x_{n−2} first, then by (i), it is almost certain that x_{n−1} will be such that (x_0, . . . , x_{n−1}) ∉ A. Yet the order in which we choose the x_i should be irrelevant, so we conclude that (x_0, . . . , x_{n−1}) is almost certainly not in A, period. By symmetry, for any σ ∈ Sym(n), (x_σ(0), . . . , x_σ(n−1)) is almost certain not to lie in A. It follows that (x_0, . . . , x_{n−1}) is almost certain not to lie in ⋃_{σ∈Sym(n)} σ · A, which implies that this union does not equal I^n.
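As a quick sanity check on the base case A_0 = {(m, n) ∈ ω^2 : n ≤ m}, the sketch below truncates ω to {0, ..., N − 1} (an assumption made purely so the sets are finite). It confirms that each section {n : (m, n) ∈ A_0} = {0, ..., m} is finite and that A_0, together with its mirror image under the coordinate swap, covers the whole grid. Since the section over m does not depend on N, the same holds on all of ω^2.

```python
def check_base_case(N):
    """Check the two properties of A_0 = {(m, n) : n <= m} on {0, ..., N-1}^2."""
    grid = {(m, n) for m in range(N) for n in range(N)}
    A0 = {(m, n) for (m, n) in grid if n <= m}
    mirrored = {(n, m) for (m, n) in A0}  # image of A_0 under the swap in Sym(2)
    covers = (A0 | mirrored) == grid      # A_0 together with its mirror covers the grid
    # Section over each fixed first coordinate m: {n : (m, n) in A_0} = {0, ..., m}
    section_sizes = [sum(1 for (mm, _) in A0 if mm == m) for m in range(N)]
    return covers, section_sizes


covers, sizes = check_base_case(50)
print(covers, sizes[:5])  # True [1, 2, 3, 4, 5]
```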

Revisiting Kolmogorov's Axioms
If we wish to use probability as a source of foundational principles, as Freiling does, we are obliged to examine the assumptions underlying probability theory itself. Modern probability was formalized in the 1930s by Kolmogorov, using the concepts of measure theory. Despite its great success, the measure-theoretic framework is not beyond scrutiny. In particular, we believe that Kolmogorov's axioms cannot be reasonably deduced from first principles. According to Kolmogorov, a probability space is a triple (X, Σ, P), where X is a set, Σ ⊆ 𝒫(X) is a σ-algebra and P : Σ → R is a (σ-additive) measure satisfying P(X) = 1 (for more on probability spaces, see [8]).
Kolmogorov makes two significant assumptions that we believe are not well justified. The first is that probabilities are elements of R, rather than some other ordered field extending Q, such as a hyperreal field (this is explored in [9]). The modern conception of R was motivated by the desire to give a rigorous foundation to analysis (which was primarily developed for physics). Thus, the applicability of real numbers to probability is by no means self-evident. In fact, there are examples of events that prima facie appear to require nonzero infinitesimal probabilities, which are impossible in Archimedean fields such as R. Hence, if real numbers are well suited to probability, this ought to be argued for, rather than taken as a hypothesis.
Secondly, we can examine Kolmogorov's assumptions regarding the structure of measurable sets. It is a point in his favor that he does not assume that all subsets of X have defined probabilities (a reasonable precaution given results such as the Banach-Tarski theorem [10]). Given this, it would be natural to call a set "measurable" precisely when its probability can be calculated using some set of basic assumptions (e.g., additivity for disjoint sets).
On the other hand, there seems to be no clear reason why measurable sets should be closed under unions and intersections. There is no general method for assigning a probability to the union or intersection of two events in terms of their individual probabilities. Hence, the assumption that measurable sets form an algebra (let alone a σ-algebra) needlessly constrains the universe of allowed probability spaces, and may exclude some desirable spaces.
There are clear practical advantages to having measurable sets be closed under unions and intersections. However, we have good reason not to accept axioms simply because they are plausible and convenient. In the case of R^3, it is quite reasonable to think that ordinary volume ought to be invariant under the operation of breaking a set into two pieces and applying a Euclidean transformation to one of those pieces such that it remains disjoint from the other. Hence, one may be tempted to require that the family of measurable sets be closed under this type of operation. Yet it is well known that (assuming AC) this cannot hold for any measure that assigns the usual volumes to all boxes. Such examples urge us to use extreme caution when asserting closure properties for measurable sets.

Uniform Probability on 2^ω
For the reasons given above, we choose for our basic framework a formulation of probability that differs somewhat from Kolmogorov's. In particular, we will not insist that measurable sets form an algebra, nor will we insist that probabilities be real numbers. We will then show that the conflicts between ZFC and our probability axioms arise even within this simplified framework. (Another approach to probability, which is more philosophically cautious than Kolmogorov's, is qualitative probability. In this formulation, a "not more likely than" relation on events is axiomatized, rather than a probability function. For one possible axiomatization, see [11]. We note that our axioms and results can be adapted to this setting.) Our definitions and axioms are motivated by a prototypical example: selecting a random element of 2^ω by flipping a fair coin for each n ∈ ω, and associating 0 and 1 with heads and tails, respectively. We refer to this intuitive idea as the fair coin space.
For at least some A ⊆ 2^ω, the probability that a random element of the fair coin space lies in A seems to have an exact answer. For example, if A = {x ∈ 2^ω : x_0 = x_3 = 0}, then membership in A depends only on the outcomes of two of the coins, and so the probability of A appears to be 1/4.
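For events that depend on only finitely many coins, such as A above, the intuitive probability can be computed exactly by enumerating the outcomes of those coins. The helper `prob_finite` below is a hypothetical illustration, not part of the paper's framework; it also shows that including irrelevant coins in the enumeration does not change the answer.

```python
from fractions import Fraction
from itertools import product


def prob_finite(event, coords):
    """Exact fair-coin probability of an event that depends only on the coins in coords.

    `event` receives a dict mapping each coin index in coords to its outcome (0 or 1).
    """
    outcomes = list(product((0, 1), repeat=len(coords)))
    hits = sum(1 for bits in outcomes if event(dict(zip(coords, bits))))
    return Fraction(hits, len(outcomes))


def in_A(x):  # A = {x : x_0 = x_3 = 0}
    return x[0] == 0 and x[3] == 0


print(prob_finite(in_A, coords=(0, 3)))        # 1/4
print(prob_finite(in_A, coords=(0, 1, 2, 3)))  # 1/4: extra coins do not change the answer
```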
In the remainder of this section, we present several new definitions and axioms, which are motivated by further considering the fair coin space and exploiting its symmetry (the coins are fair and identical) and independence (the coins do not influence each other).
Definition 1. …
(i) If ∆ ⊆ ω is finite and δ ∈ 2^∆, then P({z ∈ 2^ω : z↾∆ = δ}) = 1/2^{|∆|}.

(ii) If A, B ∈ Σ and A ⊆ B, then P(A) ≤ P(B).
We refer to 2^ω as the sample space, Σ as the event space and P as the probability function. Elements of Σ and 𝒫(2^ω) \ Σ are called measurable and non-measurable sets, respectively.
For convenience, we will avoid explicit mention of the set Σ when the meaning is clear. Thus, expressions such as "P(A) = P(B)" should be understood to mean that either both sides of the equation are defined and equal, or both are undefined.
Let I ⊆ ω and let A = {x ∈ 2^ω : x↾I ∈ A_0} for some A_0 ⊆ 2^I. Given x ∈ 2^ω, whether or not x ∈ A depends only on the restriction of x to I. Therefore, we may reasonably identify "the probability that a random element of 2^I is in A_0" with "the probability that a random element of 2^ω is in A". In fact, with a slight abuse of notation, we will write P(A_0) to mean P(A).
Notation 1. Let I ⊆ ω, and let A ⊆ 2^I. We will write P(A) as shorthand for P({x ∈ 2^ω : x↾I ∈ A}).
Notation 2. Let I, J ⊆ ω be disjoint, and let A ⊆ 2^{I∪J}. For x ∈ 2^I, we will use A_x to denote the set {y ∈ 2^J : x ∪ y ∈ A}.
Remark 1. It will frequently be convenient to replace the set 2^ω with 2^N, where N is some countably infinite set. We will call the resulting space an MPS on 2^N. Similarly, if I and J are countable sets, we can speak of an MPS on 2^I × 2^J, using the natural identification between this space and 2^{I⊔J}.
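When only finitely many coins are involved, the sections A_x of Notation 2 behave exactly as the averaging intuition suggests: P(A) is the average of P(A_x) over x ∈ 2^I, so if every section has probability q, then P(A) = q. The sketch below assumes small hypothetical index sets `I_IDX` and `J_IDX` in place of I and J, and two hypothetical events, one whose sections all have probability 1/2 and one whose sections vary with x.

```python
from fractions import Fraction
from itertools import product

I_IDX = (0, 2, 5)  # hypothetical finite stand-ins for the disjoint index sets I and J
J_IDX = (1, 3)


def assignments(idx):
    """All functions idx -> {0, 1}, represented as dicts."""
    return [dict(zip(idx, bits)) for bits in product((0, 1), repeat=len(idx))]


def prob(event, idx):
    """Uniform (fair-coin) probability of `event` on 2^idx."""
    pts = assignments(idx)
    return Fraction(sum(1 for z in pts if event(z)), len(pts))


def section_prob(event, x):
    """P(A_x), where A_x = {y in 2^J : x ∪ y in A} (Notation 2)."""
    return prob(lambda y: event({**x, **y}), J_IDX)


def A_parity(z):  # every section A_x has probability exactly 1/2
    return (z[1] ^ z[3]) == (z[0] ^ z[5])


def A_corner(z):  # sections have probability 1/2 or 0, depending on x
    return z[0] == 1 and z[1] == 1


for A in (A_parity, A_corner):
    avg = sum(section_prob(A, x) for x in assignments(I_IDX)) / 2 ** len(I_IDX)
    print(A.__name__, prob(A, I_IDX + J_IDX), avg)  # A_parity 1/2 1/2, then A_corner 1/4 1/4
```

In the finite case this is just counting; the Freiling property below asks for the same conclusion on 2^ω, where it no longer follows automatically.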

Two Axioms
We now introduce two new intuitively appealing axioms. In Sections 4.1 and 4.2, we show that each of these is incompatible with Axiom 4.

Definition 2.
An MPS has the Freiling property if the following holds: Let I and J be disjoint subsets of ω, and let A ⊆ 2^{I∪J}. If P(A_x) = q for every x ∈ 2^I, then P(A) = q.
We believe that this definition, which is based on Freiling's Axiom of Symmetry [6], ought to hold in the fair coin space. To argue this, we will apply a variant of Freiling's own argument. Suppose I, J, A and q are as in Definition 2. Since the fair coin space is symmetric, it should not matter in what order we flip the coins when choosing a random element. Therefore, let us start with the coins corresponding to elements of I, followed by those corresponding to elements of J. Once x ∈ 2^I has been determined, our hypothesis tells us that the probability that we will choose y ∈ 2^J such that x ∪ y ∈ A is exactly q. This is true regardless of the value of x, and hence we may say that it is true before x is chosen. Thus, P(A) = q, as claimed.
Definition 3. Let N be a countable set. We let G(2^N) denote the group of transformations of 2^N generated by the functions σ : 2^N → 2^N such that either (i) there exists an r ∈ 2^N such that σ(x) = x ⊕ r for all x ∈ 2^N, where ⊕ denotes the bitwise XOR operation, or (ii) there exists a permutation π : N → N such that σ(x) = x ∘ π for all x ∈ 2^N.
A probability symmetry of 2^N is an element of G(2^N). We let · denote the natural action of G(2^N) on 2^N.
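To make the generators of G(2^N) concrete, the following sketch takes a small finite N (an assumption; the definition allows any countable N) and composes one generator of each type, with a hypothetical mask r and permutation pi. It checks that the composite is a bijection of 2^N and that it sends the set fixed on two coordinates to another set fixed on two coordinates, so the fair-coin share of the set (here 8 of 32 points) is unchanged.

```python
from itertools import product

N = 5
CUBE = list(product((0, 1), repeat=N))  # 2^N for N = {0, ..., 4}


def xor_by(r):
    """Generator of type (i): x -> x ⊕ r (bitwise XOR)."""
    def sigma(x):
        return tuple(a ^ b for a, b in zip(x, r))
    return sigma


def compose_with(pi):
    """Generator of type (ii): x -> x ∘ pi, i.e. (x ∘ pi)(n) = x(pi(n))."""
    def sigma(x):
        return tuple(x[pi[n]] for n in range(N))
    return sigma


def cylinder(delta, values):
    """{z in 2^N : z restricted to delta equals values}."""
    return {z for z in CUBE if all(z[i] == v for i, v in zip(delta, values))}


# A hypothetical element of G(2^N): XOR with r, then compose with pi.
r = (1, 0, 1, 1, 0)
pi = (3, 0, 4, 1, 2)  # pi(0) = 3, pi(1) = 0, pi(2) = 4, pi(3) = 1, pi(4) = 2


def g(x):
    return compose_with(pi)(xor_by(r)(x))


C = cylinder((0, 2), (1, 1))
gC = {g(z) for z in C}
print({g(x) for x in CUBE} == set(CUBE))  # True: g is a bijection of 2^N
print(gC == cylinder((1, 4), (0, 0)))     # True: the image is fixed on coordinates 1 and 4
```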

Definition 4.
An MPS on 2^ω has the dependent symmetry property if the following holds: Let I ∪ J be any bipartition of ω; we will identify 2^ω with 2^I × 2^J in the natural way. Then for all A ⊆ 2^I, B ⊆ 2^J, C ⊆ (2^I \ A) × 2^J and σ ∈ G(2^J),
P((A × B) ∪ C) = P((A × σ·B) ∪ C).
Again, we will argue that this definition ought to be satisfied by the fair coin space. Fix σ ∈ G(2^J). Consider randomly choosing an element (x, y) ∈ 2^I × 2^J using two different methods. The first is just that of the fair coin space. For the second, suppose we choose x and y just as before, but if x ∈ A, we replace y with σ^{−1}·y (so the choice of whether to apply σ^{−1} does not depend on y). From the point of view of probability, there ought to be no difference between these methods of choosing (x, y). Hence, the probabilities that points chosen by each method lie in (A × B) ∪ C should be equal if they are defined. Since a point produced by the second method lies in (A × B) ∪ C exactly when the originally chosen pair lies in (A × σ·B) ∪ C, this motivates the equality in Definition 4.
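In a finite setting, the two sampling methods described above are related by a bijection of 2^I × 2^J, so their probabilities agree exactly under counting measure. The sketch below assumes small hypothetical coordinate counts, hypothetical sets A, B and C (with C chosen disjoint from A × 2^J), and one hypothetical symmetry σ built from both kinds of generator in Definition 3. It computes the chance that each method lands in (A × B) ∪ C, and also the probability of (A × σ·B) ∪ C.

```python
from fractions import Fraction
from itertools import product

XI = list(product((0, 1), repeat=3))  # hypothetical finite stand-in for 2^I
YJ = list(product((0, 1), repeat=4))  # hypothetical finite stand-in for 2^J
PI = (2, 0, 3, 1)                     # a permutation of the 4 J-coordinates
R = (1, 0, 1, 1)                      # an XOR mask


def sigma(y):
    """A hypothetical symmetry in G(2^J): compose with PI, then XOR with R."""
    return tuple(y[PI[j]] ^ R[j] for j in range(len(y)))


SIGMA_INV = {sigma(y): y for y in YJ}  # sigma is a bijection, so invert it by table

A = {x for x in XI if x[0] == 1}
B = {y for y in YJ if y[0] == y[1]}
C = {(x, y) for x in XI if x not in A for y in YJ if sum(y) >= 2}  # C avoids A x 2^J
E = {(x, y) for x in A for y in B} | C                               # (A x B) ∪ C


def method_two(x, y):
    """Second sampling method: replace y by sigma^{-1}(y) exactly when x is in A."""
    return (x, SIGMA_INV[y]) if x in A else (x, y)


total = len(XI) * len(YJ)
p_first = Fraction(len(E), total)
p_second = Fraction(sum(1 for x in XI for y in YJ if method_two(x, y) in E), total)
sigma_B = {sigma(y) for y in B}
p_shifted = Fraction(len({(x, y) for x in A for y in sigma_B} | C), total)
print(p_first == p_second == p_shifted)  # True
```

The restriction of C to points outside A × 2^J matters: the second method never moves those points, so membership in C is unaffected by the switch.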
Since we believe that the fair coin space is an MPS having the properties given in Definitions 2 and 4, we propose the following axioms.

Axiom 6.
There exists an MPS with the Freiling property.

Axiom 7.
There exists an MPS with the dependent symmetry property.
Both of these axioms assert the existence of sets with specific properties. Moreover, we have given informal arguments as to why these properties ought to be satisfiable. Hence, our axioms may be viewed as instances of the "maximize" principle explored by Maddy in [4]. This principle essentially states that the set-theoretic universe ought to be very "full", containing as many sets as possible without generating contradictions (such as the one obtained by declaring {x : x ∉ x} a set).

Remark 2.
The usual product measure on 2^ω satisfies Definition 1 (though not Axioms 6 or 7). Hence, our arguments may be adapted to that setting.

A Vitali-Type Paradox
In this and the following sections, we will assume the axioms of Zermelo–Fraenkel set theory without Choice or Replacement. When axioms beyond these are used, we will say so explicitly.
Lemma 1. Let (Σ, P) be an MPS with the Freiling property. Let ∆ ⊆ ω be finite, and let A ⊆ 2^ω have the following property: for every f : ω \ ∆ → 2, there exists a (unique) δ_f : ∆ → 2 such that for all g : ∆ → 2, f ∪ g ∈ A ⇔ g = δ_f. Then P(A) = 1/2^{|∆|}.
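A finite analogue shows where the value 1/2^{|∆|} comes from: a set A of the kind described in Lemma 1 contains exactly one point f ∪ δ_f for each f, so it takes up a 1/2^{|∆|} share of the space. The sketch below replaces ω by {0, ..., k − 1} (an assumption) and picks each δ_f arbitrarily (here pseudo-randomly), mirroring the fact that the lemma places no constraint on how δ_f depends on f. The function name is hypothetical.

```python
import random
from fractions import Fraction
from itertools import product


def lemma1_finite(k, delta, seed=0):
    """Build A ⊆ 2^{0..k-1} with exactly one completion δ_f per f, and return |A| / 2^k."""
    rng = random.Random(seed)
    rest = [i for i in range(k) if i not in delta]
    A = set()
    for f_bits in product((0, 1), repeat=len(rest)):
        delta_f = [rng.randrange(2) for _ in delta]  # arbitrary choice of δ_f
        z = [0] * k
        for i, b in zip(rest, f_bits):
            z[i] = b
        for i, b in zip(delta, delta_f):
            z[i] = b
        A.add(tuple(z))
    return Fraction(len(A), 2 ** k)


print(lemma1_finite(10, delta=(1, 4, 6)))  # 1/8
```

In 2^ω itself, such an A need not be constructible from a simple rule, which is where the Weak Axiom of Choice enters the paradox.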

The Banach-Tarski Paradox in 2^ω
In this section, we let F_2 = F(a, b) denote the free group on two generators, a and b. The following is an adaptation of the proof of the famous Banach-Tarski paradoxical decomposition [10].
Lemma 2 (Dependent Symmetries). Let I ∪ J be a bipartition of ω. Let (Σ, P) be an MPS on 2^ω (≈ 2^I × 2^J) with the dependent symmetry property. Let A_1, . . . , A_n ⊆ 2^I be pairwise disjoint, and let B_1, . . . , B_n ⊆ 2^J. If σ_i ∈ G(2^J) for each i ≤ n, then
P(⋃_{i≤n} A_i × B_i) = P(⋃_{i≤n} A_i × σ_i·B_i).
Proof. This follows immediately from n applications of Axiom 7.

Lemma 3.
There exists a bijection π : F_2 → F_2 such that π(e) = e and for all g, h ∈ F_2 \ {e}, there exists an integer n > 1 such that π(g^n) = h^n.
Proof. Let {(g_i, h_i) : i < ω} be an enumeration of (F_2 \ {e})^2. First, note that for every g ∈ F_2 \ {e}, the set {g^n : n ∈ Z^+} is infinite, since F_2 is torsion-free. For i < ω, recursively define n_i to be the least integer ≥ 2 such that g_i^{n_i} ∉ {g_j^{n_j} : j < i} and h_i^{n_i} ∉ {h_j^{n_j} : j < i}. Let π be the partial function on F_2 given by π = {(e, e)} ∪ {(g_i^{n_i}, h_i^{n_i}) : i < ω}; by the choice of the n_i, it is well defined and injective. Since F_2 \ {e, g_i^{n_i}, h_i^{n_i} : i < ω} is infinite (e.g., it contains a^n b for all n), we can extend π to a bijection π : F_2 → F_2.
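The recursive construction in this proof can be run for finitely many stages. The sketch below is our own illustration, and its helper names are hypothetical: it represents elements of F_2 as freely reduced words, with "A" and "B" standing for a^{-1} and b^{-1}, takes a finite list of pairs (which can begin some enumeration of (F_2 \ {e})^2), and checks that the resulting partial map is injective, uses exponents n_i ≥ 2, and avoids a^m b for small m, so room remains to extend it to a bijection.

```python
from itertools import count, islice, product

INV = {"a": "A", "A": "a", "b": "B", "B": "b"}  # "A" and "B" stand for a^-1 and b^-1


def reduce_word(w):
    """Freely reduce a word by cancelling adjacent inverse pairs (stack method)."""
    out = []
    for c in w:
        if out and out[-1] == INV[c]:
            out.pop()
        else:
            out.append(c)
    return "".join(out)


def power(g, n):
    return reduce_word(g * n)  # g^n; the empty string is the identity e


def nontrivial_words():
    """Enumerate the non-identity elements of F_2 as reduced words, by length."""
    for length in count(1):
        for letters in product("aAbB", repeat=length):
            w = "".join(letters)
            if reduce_word(w) == w:
                yield w


def partial_pi(pairs):
    """First len(pairs) stages of the recursive construction of pi."""
    pi, used_dom, used_rng, chosen = {"": ""}, set(), set(), []
    for g, h in pairs:
        n = 2  # least n >= 2 avoiding earlier g_j^{n_j} and h_j^{n_j}
        while power(g, n) in used_dom or power(h, n) in used_rng:
            n += 1
        used_dom.add(power(g, n))
        used_rng.add(power(h, n))
        pi[power(g, n)] = power(h, n)
        chosen.append(n)
    return pi, chosen


words = list(islice(nontrivial_words(), 8))
pairs = list(product(words, repeat=2))
pi, ns = partial_pi(pairs)
print(len(pi), len(set(pi.values())))  # equal: the partial map is injective
```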
Lemma 4. Let G be a group acting on a set X, and let Y = {x ∈ X : |G_x| = 1}, where G_x denotes the stabilizer of x. Then G fixes Y setwise, and G acts freely on Y.
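Lemma 4 can be seen in miniature with the cyclic group Z_4 rotating the coordinates of 2^4, which is an action by probability symmetries of type (ii) in Definition 3. The finite example below is our own illustration: it computes Y, the set of points with trivial stabilizer, and checks both conclusions of the lemma.

```python
from itertools import product

N = 4
X = list(product((0, 1), repeat=N))  # 2^N with N = {0, 1, 2, 3}
G = range(N)                         # the cyclic group Z_N, acting by rotation


def act(k, x):
    """Rotate x by k places: (k · x)(n) = x((n + k) mod N), a permutation of coordinates."""
    return x[k:] + x[:k]


stabilizer = {x: [k for k in G if act(k, x) == x] for x in X}
Y = {x for x in X if len(stabilizer[x]) == 1}  # points whose stabilizer is trivial

print(len(Y))                                               # 12
print(all(act(k, y) in Y for y in Y for k in G))            # True: G maps Y into itself
print(all(act(k, y) != y for y in Y for k in G if k != 0))  # True: the action on Y is free
```

The four excluded points are 0000, 1111, 0101 and 1010, whose stabilizers contain a nontrivial rotation.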
Let α, β be a, b, respectively. To complete the proof, it is sufficient (by symmetry) to show that b^m·A ∩ b^n·A = ∅ for all m ≠ n ∈ ω. Suppose y ∈ b^m·A ∩ b^n·A. Then y = (b^m g_1)·z_1 = (b^n g_2)·z_2 for some z_1, z_2 ∈ Γ and g_1, g_2 ∈ X_a. Then z_1 and z_2 are in the same F_2-orbit of S_C, and so in fact z_1 = z_2 = z. We now have (g_2^{−1} b^{m−n} g_1)·z = z, which implies g_2^{−1} b^{m−n} g_1 = e, since z ∈ S_C. Thus, b^{m−n} = g_2 g_1^{−1} ∈ X_a, which yields m = n.
Proof. Let x, y, z be three elements not in F_2, and let (Σ, P) be an MPS on 2^{{x,y,z}} × 2^{F_2} with the dependent symmetry property. Let A, B ⊆ 2^{F_2} and α, β, γ ∈ G(2^{F_2}) be as in Lemma 5; … are both pairwise disjoint unions. Using Lemma 5, we have the following: … which is further equal to … By Definition 1(iii), this is at most P({000} × 2^{F_2}), which equals 1/8 by Definition 1(i); a contradiction.

Conclusions
We have shown that the Axiom of Choice is incompatible with highly intuitive assumptions about probability (Axiom 6 or Axiom 7). Our work improves on other similar results in two important ways. First, our contradictions rely on a weaker version of AC than that used in [6,7], which both assume the well-orderability of R. Second, we analyze the philosophical "weak points" in Kolmogorov's axiomatization of probability, and show that our results do not depend on these.
While the standard resolution to these paradoxes is to reject Axioms 6 and 7, we believe that there are three plausible options: (i) reject the Infinity or Powerset axioms, so 2^ω cannot be constructed, (ii) reject Axiom 4, or (iii) reject Axioms 6 and 7.
For those with finitistic leanings, option (i) will be the obvious choice, and our paradoxes may be viewed as consequences of invalid reasoning about "completed infinities". For mathematicians who accept completed infinities, but believe that valid mathematical objects must be explicitly constructed, option (ii) may be appealing.
Finally, if we reject Axioms 6 and 7, we can keep all of ZFC. Unfortunately, these axioms seem to follow immediately from our most basic intuitions about random events. Therefore, the authors believe that rejecting them amounts to a rejection of randomness as a valid mathematical concept. This leads one to question whether probability can be meaningfully formalized at all.
Each of these attempts at a resolution has drawbacks, and it is not clear to the authors what approach is best. The aim of this paper has been to make these issues more widely understood, especially by mathematicians working outside of set theory. We hope that in so doing, we will stimulate an interesting conversation about the conflicts between probability, infinite sets, and the Axiom of Choice.