Contextuality in Classical Physics and Its Impact on the Foundations of Quantum Mechanics

It is shown that the hallmark quantum phenomenon of contextuality is present in classical statistical mechanics (CSM). It is first shown that the occurrence of contextuality is equivalent to there being observables that can differentiate between pure and mixed states. CSM is then formulated in the formalism of quantum mechanics (FQM), using what is commonly known as the Koopman–von Neumann formulation (KvN). In KvN, it can be shown that such a differentiation between pure and mixed states is possible. As contextuality is a probabilistic phenomenon, and as it is exhibited in both classical physics and ordinary quantum mechanics (OQM), it is concluded that the foundational issues regarding quantum mechanics are really issues regarding the foundations of probability.


Introduction
Feynman [1] pointed out that the interference phenomenon observed in the double-slit experiment (DSE) can be seen as a purely probabilistic phenomenon, namely as a violation of the formula of total probability (FTP). As one still measures probabilities in the DSE in the same way, i.e., as frequencies of occurrence, Feynman concluded that FQM constitutes a probability theory on par with the classical one, only with different rules of computation. By 'classical', Feynman meant that the computational rules are equivalent to those of the measure-theoretic formulation due to Kolmogorov (KPM) [2]. Hence, Feynman referred to the result of the DSE as FQM violating classical probability theory. However, Ballentine [3] and Koopman [4], though not dismissing the probabilistic origin of the interference phenomenon in the DSE, disagreed with Feynman's claim that it corresponds to a violation of the FTP and of classical probability theory. They correctly pointed out that the violation is only apparent, arising from a naive application of the FTP. To explain briefly, we consider the DSE in a more generic form, referred to as double-slit-type experiments (DSTE). In a DSTE, two observables $A$ and $B$ are considered. For simplicity, $A$ is assumed dichotomous, with possible outcomes $a_i$, $i = 1, 2$. These observables are measured in three different contexts of measurement, $C_{1,2}$, $C_1$, and $C_2$, where a context of measurement of an observable is a specification of the physical conditions under which this observable is measured.
By construction, each context $C_i$, $i = 1, 2$, is such that the probability distribution over $A$, as measured under the circumstances that define $C_i$, satisfies

$$P(A = a_i \,|\, C_i) = 1. \tag{1}$$

Based on the measured probability distributions, one then calculates the following function of $b$:

$$P(B = b \,|\, C_{1,2}) - \big[P(B = b \,|\, C_1)P(A = a_1 \,|\, C_{1,2}) + P(B = b \,|\, C_2)P(A = a_2 \,|\, C_{1,2})\big], \tag{2}$$

where notation analogous to that in (1) has been applied for the different probability distributions. The DSE corresponds to the particular DSTE for which $A$ is the observable of slit passage and $B$ the observable of where on the screen the particle hits, with the respective contexts being: $C_{1,2}$, both slits open; $C_1$, only slit 1 open; and $C_2$, only slit 2 open. In the DSE, the actual result is a non-zero (2), which is what constitutes the interference phenomenon. If the context is ignored in the notation, this looks like a violation of the FTP. However, as Koopman and Ballentine correctly pointed out, there is in reality no violation of the FTP: Feynman had only failed to take the contexts into account. It is true, as Feynman had shown [1], that classical mechanics cannot account for the DSE, so the DSE is 'unclassical' in that sense. In the purely probabilistic sense, however, there is nothing unclassical about it. In this article, it will be shown that this typical quantum phenomenon is present in CSM as well; the above argument demonstrates that there is no inherent contradiction in this result. Note that Koopman and Ballentine only disagreed with Feynman regarding his use of the term 'classical'. Their argument did not dismiss that FQM provides computational rules for calculating the term (2). Hence, their views can be combined, which will be done in this article: FQM will be seen as a framework for computing probabilities in which the context dependence of observables is explicitly taken into account. This view of FQM will be made manifest by considering it in the framework of contextual probability, as developed by Khrennikov in [5].
In this framework, violations of the type (2) are viewed as measures of contextuality, and for a certain class of such violations, FQM serves as a particular framework in which they can be represented. More specifically, this is done by considering the generic DSTE together with the generalized formula of total probability (generalized FTP), defined as

$$P(B = b \,|\, C_{1,2}) = \sum_{i=1}^{2} P(B = b \,|\, C_i)P(A = a_i \,|\, C_{1,2}) + 2\lambda_b\sqrt{P(B = b \,|\, C_1)P(A = a_1 \,|\, C_{1,2})P(B = b \,|\, C_2)P(A = a_2 \,|\, C_{1,2})}, \tag{3}$$

where the interference coefficient $\lambda_b$ serves as a measure of contextuality. A trivial interference coefficient, $\lambda_b = 0$ for all $b$, means that no contextuality is present in the particular DSTE. On the other hand, if

$$\lambda_b \neq 0 \ \text{for some } b, \quad \text{with } |\lambda_b| \le 1 \text{ for all } b, \tag{4}$$

then we have contextuality. In [5], it is shown that if a DSTE exhibits interference coefficients of the form (4), then it can be represented in FQM. In this case, the observables $A$ and $B$ are represented as mutually non-commuting self-adjoint operators. This will also be demonstrated here in Section 2. The purpose of doing so is to show that, in FQM, the occurrence of contextuality is equivalent to being able to physically tell the difference between pure and mixed states. The main result of Section 2 is that the way in which probability distributions transform under time evolution can be used to distinguish between mixed and pure states. It is in this sense that contextuality will be shown to occur in CSM. In Section 3, KvN [6–8] will be presented and shown to correspond to CSM, in the sense that all solutions of the KvN Schrödinger equation give, via Born's rule, solutions of the classical Liouville equation. As such, we can apply the statistical mechanical notion of equilibrium states as states that are stationary in time, and of non-equilibrium states as states that are not. In particular, the equilibrium states are identified with eigenvectors $\psi_n$ of the Liouvillian $T$, i.e., the KvN generator of time evolution, as the probability distributions associated with these transform trivially under time evolution.
The probability distributions associated with non-trivial superpositions of such eigenstates,

$$\sum_n c_n \psi_n, \tag{5}$$

are non-stationary. Hence, they can be identified as non-equilibrium states. By applying the principle of maximum Gibbs entropy, we will in fact be able to identify the main equilibrium states, the microcanonical and canonical ensembles, as corresponding to such eigenstates. Hence, it will be shown that KvN, when combined with the principles of statistical mechanics, corresponds to CSM. As the non-equilibrium states (5) transform non-trivially under time evolution while the corresponding mixed state

$$\sum_n |c_n|^2 |\psi_n\rangle\langle\psi_n| \tag{6}$$

does not, we are able to tell the difference between pure and mixed states. Hence, contextuality will be shown to be present in CSM, and it is therefore not a phenomenon confined to OQM, where 'OQM' refers to what is obtained through some variant of canonical quantization. In Section 4, the impact of this on the foundations of quantum mechanics is discussed. In short, as contextuality is not restricted to OQM and as it is quantifiable as a purely probabilistic phenomenon, it is concluded that the issues regarding the foundations of quantum mechanics are really about the foundations of probability.
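To make the arithmetic of (2) and (3) concrete, the following Python sketch solves the generalized FTP for $\lambda_b$ from measured relative frequencies. It is our own illustration rather than a procedure taken from [5]; the function name `interference_coefficient` and the frequency values in the usage lines are hypothetical.

```python
import math

def interference_coefficient(p_b_c12, p_b_c1, p_b_c2, p_a1_c12, p_a2_c12):
    """Solve the generalized FTP (3) for the interference coefficient lambda_b.

    Arguments are relative frequencies from the three contexts:
      p_b_c12 = P(B=b|C_12), p_b_c1 = P(B=b|C_1), p_b_c2 = P(B=b|C_2),
      p_a1_c12 = P(A=a_1|C_12), p_a2_c12 = P(A=a_2|C_12).
    """
    term1 = p_b_c1 * p_a1_c12
    term2 = p_b_c2 * p_a2_c12
    violation = p_b_c12 - (term1 + term2)   # the quantity (2)
    denominator = 2.0 * math.sqrt(term1 * term2)
    if denominator == 0.0:
        raise ValueError("lambda_b is undefined when one of the classical terms vanishes")
    return violation / denominator

# Hypothetical frequencies, chosen only to illustrate the arithmetic.
lam = interference_coefficient(0.3, 0.2, 0.2, 0.5, 0.5)
print(f"lambda_b = {lam:.3f}")   # non-zero means contextual; here also |lambda_b| <= 1
```

With these made-up numbers, the violation (2) equals 0.1 and $\lambda_b \approx 0.5$; replacing $P(B = b|C_{1,2}) = 0.3$ by $0.2$ gives $\lambda_b = 0$, i.e., no contextuality for that outcome.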

Time Evolution as an Indicator of Contextuality
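Let $A$ and $B$ be two quantum observables on some Hilbert space $\mathcal{H}$. For the sake of the argument, it suffices to assume that $A$ is non-degenerate with an orthonormal eigenbasis $\{\phi_a\}_{a \in \mathcal{A}}$, where $\mathcal{A}$ is the set of eigenvalues of $A$ and the eigenvectors are labeled by their respective eigenvalue $a$. We impose no such restriction on $B$. We let $\psi_b$ denote an arbitrary (generalized) eigenvector of $B$ with associated eigenvalue $b$. By Born's rule, the probability distribution over $B$, given the initial state $\psi$, is

$$P(B = b \,|\, \psi) = |\langle \psi_b | \psi \rangle|^2. \tag{7}$$

Analogously, for $A$,

$$P(A = a \,|\, \psi) = |\langle \phi_a | \psi \rangle|^2. \tag{8}$$

By utilizing the completeness relation in terms of $\{\phi_a\}_{a \in \mathcal{A}}$, we obtain

$$P(B = b \,|\, \psi) = \Big|\sum_{a \in \mathcal{A}} \langle \psi_b | \phi_a \rangle\langle \phi_a | \psi \rangle\Big|^2 = \sum_{a \in \mathcal{A}} P(B = b \,|\, \phi_a)P(A = a \,|\, \psi) + 2\sum_{a < a'} \operatorname{Re}\big(\langle \psi_b | \phi_a \rangle\langle \phi_a | \psi \rangle\langle \psi | \phi_{a'} \rangle\langle \phi_{a'} | \psi_b \rangle\big).$$

Now, for each pair $a < a'$ for which the product below is non-zero, there exists a unique number $\theta_{b,a,a'} \in [0, 2\pi)$ such that

$$\langle \psi_b | \phi_a \rangle\langle \phi_a | \psi \rangle\langle \psi | \phi_{a'} \rangle\langle \phi_{a'} | \psi_b \rangle = \sqrt{P(B = b \,|\, \phi_a)P(A = a \,|\, \psi)P(B = b \,|\, \phi_{a'})P(A = a' \,|\, \psi)}\; e^{i\theta_{b,a,a'}},$$

from which it follows that the real part of this product equals the same square root multiplied by $\cos\theta_{b,a,a'}$. As only the cosine enters, $\theta_{b,a,a'}$ may equivalently be restricted to values in $[0, \pi]$, where it is again unique. Hence, we end up with

$$P(B = b \,|\, \psi) = \sum_{a \in \mathcal{A}} P(B = b \,|\, \phi_a)P(A = a \,|\, \psi) + 2\sum_{a < a'} \cos\theta_{b,a,a'}\sqrt{P(B = b \,|\, \phi_a)P(A = a \,|\, \psi)P(B = b \,|\, \phi_{a'})P(A = a' \,|\, \psi)}. \tag{17}$$

Hence, it has been shown that FQM indeed satisfies the generalized FTP (3). Before moving on to the main point of this section, it is worth pointing out that we have implicitly interpreted quantum states as corresponding to contexts of measurement here. This is already evident from the choice of notation in (7) and (8). Note also that the meaning of a conditional probability, as the probability of the outcome $A = a$ given the conditions $C$, does not by itself necessitate that it satisfy

$$P(A = a \,|\, C) = \frac{P(A = a, C)}{P(C)},$$

as it is defined in KPM, i.e., that $C$ can be attributed to some random variable on the same measure space as $A$. Indeed, Kolmogorov himself [2] pointed out that probability measures

$$P(\cdot \,|\, C) \tag{19}$$

are based on the complex of all experimental conditions and that not all observables are representable as random variables on the same measure space. Hence, neither the notation in (7) and (8) nor the interpretation of quantum states as contexts causes any contradiction. Now, assume that we have performed some DSTE from which we obtained the probability distributions

$$P(B = b \,|\, C_{1,2}), \quad P(B = b \,|\, C_1), \quad P(B = b \,|\, C_2), \quad P(A = a_i \,|\, C_{1,2}).$$

We wish to represent this DSTE in FQM. This means that $A$ and $B$ are to correspond to self-adjoint operators, which we, for simplicity, assume to be non-degenerate.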
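The derivation leading to (17) can be checked numerically. The sketch below assumes a finite-dimensional Hilbert space $\mathbb{C}^n$ with $A$ diagonal in the standard basis and seeded random choices of $\psi$ and $\psi_b$, and compares Born's rule (7) with the right-hand side of (17). All function names are hypothetical, and only the Python standard library is used.

```python
import cmath
import math
import random

def inner(u, v):
    """<u|v>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def random_unit_vector(n, rng):
    v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]

def check_eq17(n=3, seed=0):
    """Return (Born's rule (7), right-hand side of (17)) for random psi and psi_b.

    Assumption: A is diagonal in the standard basis of C^n, so phi_a = e_a.
    """
    rng = random.Random(seed)
    psi = random_unit_vector(n, rng)      # state (context) psi
    psi_b = random_unit_vector(n, rng)    # eigenvector of B for the outcome b
    phi = [[1.0 if i == a else 0.0 for i in range(n)] for a in range(n)]

    born = abs(inner(psi_b, psi)) ** 2
    p_b = [abs(inner(psi_b, phi[a])) ** 2 for a in range(n)]   # P(B=b|phi_a)
    p_a = [abs(inner(phi[a], psi)) ** 2 for a in range(n)]     # P(A=a|psi), eq. (8)
    rhs = sum(x * y for x, y in zip(p_b, p_a))                 # classical FTP part
    for a in range(n):
        for a2 in range(a + 1, n):
            z = (inner(psi_b, phi[a]) * inner(phi[a], psi)
                 * inner(psi, phi[a2]) * inner(phi[a2], psi_b))
            theta = abs(cmath.phase(z))                        # representative in [0, pi]
            rhs += 2.0 * math.cos(theta) * math.sqrt(p_b[a] * p_a[a] * p_b[a2] * p_a[a2])
    return born, rhs

born, rhs = check_eq17()
print(born, rhs)
```

The two printed numbers should agree up to floating-point rounding, for any dimension and seed.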
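As (1) holds by construction of the experiment, $C_1$ and $C_2$ are to be represented as the eigenvectors $\phi_{a_1}$ and $\phi_{a_2}$ of $A$, respectively. In addition, we must have

$$P(B = b \,|\, C_i) = |\langle \psi_b | \phi_{a_i} \rangle|^2, \quad i = 1, 2.$$

Furthermore, on the basis of $P(A = a \,|\, C_{1,2})$ alone, $C_{1,2}$ may be represented either as the pure state

$$\psi = \sum_{i=1}^{2} \sqrt{P(A = a_i \,|\, C_{1,2})}\, e^{i\theta_{a_i}} \phi_{a_i}, \tag{22}$$

up to some choice of the phases $\theta_{a_i}$ in $[0, 2\pi)$, or as the mixed state

$$\rho = \sum_{i=1}^{2} P(A = a_i \,|\, C_{1,2})\, |\phi_{a_i}\rangle\langle\phi_{a_i}|.$$

However, as

$$P(B = b \,|\, \rho) = \operatorname{Tr}\big(\rho\, |\psi_b\rangle\langle\psi_b|\big) = \sum_{i=1}^{2} P(B = b \,|\, C_i)P(A = a_i \,|\, C_{1,2}),$$

we obtain via a straightforward calculation that

$$P(B = b \,|\, \psi) - P(B = b \,|\, \rho) = 2\cos\theta_{b,a_1,a_2}\sqrt{P(B = b \,|\, C_1)P(A = a_1 \,|\, C_{1,2})P(B = b \,|\, C_2)P(A = a_2 \,|\, C_{1,2})}, \tag{25}$$

where $\theta_{b,a_1,a_2} \in [0, 2\pi)$ is the number such that

$$\langle \psi_b | \phi_{a_1} \rangle\langle \phi_{a_1} | \psi \rangle\langle \psi | \phi_{a_2} \rangle\langle \phi_{a_2} | \psi_b \rangle = \big|\langle \psi_b | \phi_{a_1} \rangle\langle \phi_{a_1} | \psi \rangle\langle \psi | \phi_{a_2} \rangle\langle \phi_{a_2} | \psi_b \rangle\big|\, e^{i\theta_{b,a_1,a_2}}.$$

By comparing (25) with (3), we see that $\lambda_b = \cos\theta_{b,a_1,a_2}$. Contextuality, $\lambda_b \neq 0$, therefore means precisely that we are able to differentiate between the pure state $\psi$ and the mixed state $\rho$ by measuring $B$.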
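A two-level instance of (25) can be sketched as follows, assuming $\phi_{a_1} = (1, 0)$ and $\phi_{a_2} = (0, 1)$; the function name `pure_vs_mixed` and the test vectors are illustrative only. The function returns the difference between the $B$-probabilities of the pure and mixed representations, the right-hand side of (25), and $\lambda_b = \cos\theta_{b,a_1,a_2}$.

```python
import cmath
import math

def pure_vs_mixed(p1, theta1, theta2, psi_b):
    """Compare P(B=b|psi) for the pure state (22) with P(B=b|rho) for the diagonal mixture.

    Assumption: two-level system with phi_{a_1} = (1, 0) and phi_{a_2} = (0, 1).
    Returns (difference, right-hand side of (25), lambda_b = cos theta).
    """
    p2 = 1.0 - p1                                        # P(A=a_2|C_12)
    psi = [math.sqrt(p1) * cmath.exp(1j * theta1),
           math.sqrt(p2) * cmath.exp(1j * theta2)]       # pure state (22)
    amp = [psi_b[0].conjugate(), psi_b[1].conjugate()]   # <psi_b|phi_{a_i}>
    p_b_c1, p_b_c2 = abs(amp[0]) ** 2, abs(amp[1]) ** 2  # P(B=b|C_1), P(B=b|C_2)
    p_pure = abs(amp[0] * psi[0] + amp[1] * psi[1]) ** 2
    p_mixed = p_b_c1 * p1 + p_b_c2 * p2                  # Tr(rho |psi_b><psi_b|)
    z = amp[0] * psi[0] * (amp[1] * psi[1]).conjugate()  # the product defining theta
    lam = math.cos(cmath.phase(z))
    predicted = 2.0 * lam * math.sqrt(p_b_c1 * p1 * p_b_c2 * p2)
    return p_pure - p_mixed, predicted, lam

s = 1 / math.sqrt(2)
for theta2 in (0.0, math.pi / 3, math.pi / 2):
    diff, pred, lam = pure_vs_mixed(0.5, 0.0, theta2, [s, s])
    print(f"theta2={theta2:.3f}: difference={diff:.4f}, (25)={pred:.4f}, lambda_b={lam:.4f}")
```

For $\psi_b = (1, 1)/\sqrt{2}$ and equal weights, the difference is $\tfrac{1}{2}\cos\theta_2$, so pure and mixed states become indistinguishable by $B$ exactly when $\lambda_b = 0$.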
Recall that in the identification (22) of $C_{1,2}$ with $\psi$, there is an ambiguity in the complex phases $\theta_{a_i}$. This ambiguity can be thought of as the identification being only up to a unitary transformation $W$ of the form

$$W = \sum_{a \in \mathcal{A}} e^{i\alpha_a} |\phi_a\rangle\langle\phi_a|, \quad \alpha_a \in \mathbb{R}.$$

As $\rho$ transforms trivially under all such transformations, i.e., $W\rho W^\dagger = \rho$, we obtain from (25) that

$$P(B = b \,|\, W\psi) - P(B = b \,|\, W\rho W^\dagger) = 2\cos\theta^{W}_{b,a_1,a_2}\sqrt{P(B = b \,|\, C_1)P(A = a_1 \,|\, C_{1,2})P(B = b \,|\, C_2)P(A = a_2 \,|\, C_{1,2})},$$

where we have defined

$$\theta^{W}_{b,a_1,a_2} := \theta_{b,a_1,a_2} + \alpha_{a_1} - \alpha_{a_2}.$$

That is, this difference between how pure and mixed states transform under such unitary transformations is equivalent to contextuality. The relevance of this here is that, in the case where $A$ corresponds to the generator of time evolution, the time evolution $e^{-iAt}$ is such a transformation $W$, with $\alpha_a = -at$. The equilibrium states of CSM will be shown to correspond, in KvN, to eigenvectors of the generator of time evolution $T$, and as there certainly exist states in CSM that transform non-trivially under time evolution, it will have been shown that contextuality is not just an OQM phenomenon but occurs in classical physics as well. Note that it was not discussed above whether there exist mixed states, diagonal with respect to some basis other than $\{|a\rangle\}$, whose probability distributions with respect to $B$ transform identically under time evolution. In Appendix A, it is shown that there are no such states.
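The following sketch illustrates the time-evolution version of this argument for a two-level system. It assumes hypothetical eigenvalues $0$ and $1$ for the generator $A$, equal weights $P(A = a_i|C_{1,2}) = 1/2$, all phases $\theta_a = 0$, and $\psi_b = (1, 1)/\sqrt{2}$; the function names are illustrative. The mixed state is evolved component by component, since each $\phi_a$ only acquires a global phase.

```python
import cmath
import math

def prob_b(state, psi_b):
    """Born's rule P(B=b|state) = |<psi_b|state>|^2."""
    return abs(sum(x.conjugate() * y for x, y in zip(psi_b, state))) ** 2

def evolve(state, spectrum, t):
    """Apply exp(-iAt) with A diagonal in {phi_a}: a transformation of the form W."""
    return [cmath.exp(-1j * a * t) * c for a, c in zip(spectrum, state)]

spectrum = [0.0, 1.0]                    # hypothetical eigenvalues a_1, a_2 of A
weights = [0.5, 0.5]                     # P(A=a_i|C_12)
basis = [[1.0, 0.0], [0.0, 1.0]]         # phi_{a_1}, phi_{a_2}
psi = [math.sqrt(w) for w in weights]    # pure representative, all theta_a = 0
psi_b = [1 / math.sqrt(2), 1 / math.sqrt(2)]

def pure_at(t):
    return prob_b(evolve(psi, spectrum, t), psi_b)

def mixed_at(t):
    # rho = sum_a w_a |phi_a><phi_a|: evolve each component and mix the probabilities
    return sum(w * prob_b(evolve(phi, spectrum, t), psi_b) for w, phi in zip(weights, basis))

for t in (0.0, math.pi / 2, math.pi):
    print(f"t={t:.3f}  pure={pure_at(t):.3f}  mixed={mixed_at(t):.3f}")
```

For these choices, the pure-state probability equals $(1 + \cos t)/2$, whereas the mixed-state probability stays at $1/2$ for all $t$.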
As a side note, the ambiguity of the complex phases in representing contexts as quantum states has crucial foundational importance for FQM and should be seen as a feature rather than as a redundancy. For instance, in a previous article [9], the author demonstrated that Born's rule can be proved, rather than postulated, by requiring the probability to be invariant under certain such unitary transformations $W$. Its foundational importance in relation to Born's rule has also been demonstrated elsewhere, e.g., in [10].

The Koopman-von Neumann Representation of Classical Mechanics
Let $N \in \mathbb{N}$ be the number of considered particles. We consider a phase space $\mathcal{P} \cong \mathbb{R}^{6N}$ together with a set of fixed global canonical coordinates $(p, q)$. Typically, these canonical coordinates are chosen such that $q$ corresponds to the observable of position in Cartesian coordinates and $p$ to the conjugate momentum [11] associated with $q$, defined as

$$p = \frac{\partial L}{\partial \dot{q}},$$

where $L$ is the given Lagrangian function. We note that $p$ defined in this way does not always correspond to linear momentum, e.g., for a charged particle moving in a magnetic field [12]. However, for all Hamiltonian functions of the typical form

$$H(p, q) = \sum_{n=1}^{N} \frac{|p^n|^2}{2m_n} + V(q), \tag{31}$$

$p$ does correspond to linear momentum. Furthermore, solely in terms of the formalism of Hamiltonian mechanics, there is nothing special about this particular choice of canonical coordinates. This choice of a coordinate $q$ and its conjugate momentum does, however, play a distinguished role in going from the Lagrangian picture of classical mechanics to the Hamiltonian one via a Legendre transformation [11]. Note that it is not even necessary to interpret $q$ as a position coordinate. We will, however, stick to this conventional interpretation of $p$ and $q$ here and consider a Hamiltonian function $H$ which, in terms of them, has the typical form (31).
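To be explicit with the notation, we have defined

$$p = (p^1, \ldots, p^N), \quad q = (q^1, \ldots, q^N), \quad p^n = (p^n_1, p^n_2, p^n_3), \quad q^n = (q^n_1, q^n_2, q^n_3). \tag{33}$$

As we are assuming these to be global coordinates, we can without loss of generality assume that each $p^n_i$ corresponds to the canonical projection $\pi_{3(n-1)+i}$, where

$$\pi_k : \mathcal{P} \to \mathbb{R}, \quad (x_1, \ldots, x_{6N}) \mapsto x_k.$$

Similarly, each $q^n_i$ is defined as the projection $\pi_{3(N+n-1)+i}$. The time evolution of this Hamiltonian dynamical structure is given by a Hamiltonian flow

$$U : \mathbb{R} \times \mathcal{P} \to \mathcal{P}, \quad (t, x) \mapsto U_t(x).$$

KvN then corresponds to the unitary representation $\mathcal{U}$ of $U$ on $L^2(\mathcal{P})$ given by the action

$$(\mathcal{U}_t \psi)(x) = \psi\big(U_{-t}(x)\big), \quad x \in \mathcal{P}, \tag{37}$$

for every $\psi$. This representation is constructed by first defining it according to (37) on the space of Schwartz functions $\psi$ on $\mathcal{P}$. The action (37) acts bijectively on the Schwartz space, as all $U_{-t}$ are smooth diffeomorphisms. By Liouville's theorem [13], we have $dp_t\,dq_t = dp\,dq$, where $(p_t, q_t) = U_t(p, q)$. Hence,

$$\langle \mathcal{U}_t \psi, \mathcal{U}_t \varphi \rangle = \int_{\mathcal{P}} \overline{\psi(U_{-t}(x))}\,\varphi(U_{-t}(x))\, dx = \int_{\mathcal{P}} \overline{\psi(x)}\,\varphi(x)\, dx = \langle \psi, \varphi \rangle.$$

Thus, each $\mathcal{U}_t$ preserves the inner product. As the Schwartz space is dense in $L^2(\mathcal{P})$, the action (37) extends uniquely by continuity to a unitary operator on all of $L^2(\mathcal{P})$. $\mathcal{U}$ hence defines a unitary representation of the Hamiltonian flow $U$. Given this representation, we can make some natural identifications.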
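The two ingredients of this construction, Liouville's theorem and the resulting preservation of inner products under (37), can be checked numerically in the simplest case. The sketch below assumes a single degree of freedom (instead of $6N$ phase-space dimensions) with the harmonic-oscillator Hamiltonian $H = p^2/(2m) + m\omega^2 q^2/2$, $m = \omega = 1$, whose flow is known in closed form, together with real Gaussian test functions; all names are hypothetical.

```python
import math

MASS, FREQ = 1.0, 1.0   # hypothetical mass and angular frequency (one degree of freedom)

def flow(t, p, q):
    """Hamiltonian flow U_t(p, q) of H = p^2/(2 MASS) + MASS FREQ^2 q^2 / 2."""
    c, s = math.cos(FREQ * t), math.sin(FREQ * t)
    return p * c - MASS * FREQ * q * s, q * c + p * s / (MASS * FREQ)

def jacobian_det(t, p, q, h=1e-6):
    """Central-difference estimate of det(dU_t) at (p, q); Liouville's theorem says 1."""
    p_plus, q_plus = flow(t, p + h, q)
    p_minus, q_minus = flow(t, p - h, q)
    dpt_dp, dqt_dp = (p_plus - p_minus) / (2 * h), (q_plus - q_minus) / (2 * h)
    p_plus, q_plus = flow(t, p, q + h)
    p_minus, q_minus = flow(t, p, q - h)
    dpt_dq, dqt_dq = (p_plus - p_minus) / (2 * h), (q_plus - q_minus) / (2 * h)
    return dpt_dp * dqt_dq - dpt_dq * dqt_dp

def koopman(t, psi):
    """The KvN action (37): (U_t psi)(x) = psi(U_{-t}(x))."""
    return lambda p, q: psi(*flow(-t, p, q))

def inner(f, g, half_width=8.0, n=200):
    """Midpoint-rule approximation of <f, g> over [-L, L]^2 for real-valued f and g."""
    h = 2 * half_width / n
    total = 0.0
    for i in range(n):
        p = -half_width + (i + 0.5) * h
        for j in range(n):
            q = -half_width + (j + 0.5) * h
            total += f(p, q) * g(p, q)
    return total * h * h

psi = lambda p, q: math.exp(-((p - 1.0) ** 2 + q ** 2) / 2)
phi = lambda p, q: math.exp(-(p ** 2 + (q - 0.5) ** 2) / 2)
t = 0.7
print(jacobian_det(t, 0.3, -1.2))
print(inner(psi, phi), inner(koopman(t, psi), koopman(t, phi)))
```

Both inner products should be close to $\pi e^{-5/16}$, the exact overlap of the two Gaussians; the quadrature is a convenience choice, not part of the KvN construction.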
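Consider the multiplication operators

$$(\hat{p}^n_i \psi)(p, q) = p^n_i\, \psi(p, q), \qquad (\hat{q}^n_i \psi)(p, q) = q^n_i\, \psi(p, q),$$

where we, similarly as in (33), apply the notation $\hat{p} = (\hat{p}^1, \ldots, \hat{p}^N)$ with $\hat{p}^n = (\hat{p}^n_1, \hat{p}^n_2, \hat{p}^n_3)$, and likewise for $\hat{q}$. Since we have assumed that $U$ acts on points $(p, q)$ having the interpretation of kinetic momentum $p$ and Cartesian position $q$, we naturally interpret the operator $\hat{p}^n_i$ as corresponding to the observable of the $i$th component of the kinetic momentum of the $n$th particle, and $\hat{q}^n_i$ as corresponding to the observable of the $i$th component of the position of the $n$th particle. Based on this, the operator

$$\hat{H} = H(\hat{p}, \hat{q})$$

is interpreted as the observable of energy. We have here, for the sake of simplicity, dropped the sub- and superscripts, as will generally be done from now on unless their inclusion is necessary. As $\mathcal{U}$ defines a unitary representation of the Lie group $\mathbb{R}$ through $U$, we may apply Stone's theorem to obtain a generator of time evolution $T$, with $\mathcal{U}_t = e^{-iTt}$. From (37), we see that $T$ acts as

$$T\psi = i\frac{d}{dt}\psi\big(U_{-t}(\cdot)\big)\Big|_{t=0} = -i\sum_{n=1}^{N}\sum_{i=1}^{3}\left(\frac{\partial H}{\partial p^n_i}\frac{\partial \psi}{\partial q^n_i} - \frac{\partial H}{\partial q^n_i}\frac{\partial \psi}{\partial p^n_i}\right)$$

on all $\psi \in L^2(\mathcal{P})$ in its domain $D(T)$. Notice that we may also write $T$ more concisely as

$$T\psi = -i\{\psi, H\},$$

with $\{\cdot, \cdot\}$ denoting the Poisson bracket on $\mathcal{P}$, which in the coordinates $(p, q)$ reads

$$\{f, g\} = \sum_{n=1}^{N}\sum_{i=1}^{3}\left(\frac{\partial f}{\partial q^n_i}\frac{\partial g}{\partial p^n_i} - \frac{\partial f}{\partial p^n_i}\frac{\partial g}{\partial q^n_i}\right).$$

The corresponding Schrödinger equation is, hence,

$$i\frac{\partial \psi_t}{\partial t} = T\psi_t = -i\{\psi_t, H\}. \tag{50}$$

Since $T$ acts as a first-order differential operator (a derivation), and as $\hat{p}$ and $\hat{q}$ are multiplication operators, it follows that

$$[T, \hat{p}^n_i] = -i\{p^n_i, H\} = i\frac{\partial H}{\partial q^n_i}(\hat{p}, \hat{q}), \qquad [T, \hat{q}^n_i] = -i\{q^n_i, H\} = -i\frac{\partial H}{\partial p^n_i}(\hat{p}, \hat{q}).$$

That is, in the Heisenberg picture, $\hat{p}(t) = \mathcal{U}_t^\dagger\, \hat{p}\, \mathcal{U}_t$ and $\hat{q}(t) = \mathcal{U}_t^\dagger\, \hat{q}\, \mathcal{U}_t$ satisfy

$$\frac{d\hat{p}^n_i(t)}{dt} = i[T, \hat{p}^n_i(t)] = -\frac{\partial H}{\partial q^n_i}\big(\hat{p}(t), \hat{q}(t)\big), \tag{55}$$

and similarly

$$\frac{d\hat{q}^n_i(t)}{dt} = i[T, \hat{q}^n_i(t)] = \frac{\partial H}{\partial p^n_i}\big(\hat{p}(t), \hat{q}(t)\big). \tag{56}$$

Indeed, these are merely Hamilton's equations of motion in operator form. Hence, $\hat{p}$ and $\hat{q}$ satisfy the expected dynamical relations in line with their interpretation.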
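For a toy model, the expression for $T$ and the commutator with $\hat{p}$ can be checked numerically. The sketch assumes a single degree of freedom with $H = (p^2 + q^2)/2$ and a hypothetical real test function `psi`; derivatives are central differences, so agreement holds only up to discretization error. It compares $\frac{d}{dt}\psi(U_{-t}(x))|_{t=0}$, which equals $-iT\psi$ when $\mathcal{U}_t = e^{-iTt}$, with $-\{\psi, H\}$, and checks that $\{p\psi, H\} - p\{\psi, H\} = \{p, H\}\psi = -q\psi$, which is the bracket part of $[T, \hat{p}]\psi = -i(\{p\psi, H\} - p\{\psi, H\})$.

```python
import math

def H(p, q):
    """Hypothetical one-degree-of-freedom oscillator, H = (p^2 + q^2)/2."""
    return 0.5 * (p * p + q * q)

def flow(t, p, q):
    """Its Hamiltonian flow U_t (a rotation of phase space)."""
    c, s = math.cos(t), math.sin(t)
    return p * c - q * s, q * c + p * s

def poisson(f, g, p, q, h=1e-5):
    """{f, g} = df/dq dg/dp - df/dp dg/dq at (p, q), by central differences."""
    def d(fn, dp, dq):
        return (fn(p + dp, q + dq) - fn(p - dp, q - dq)) / (2 * h)
    return d(f, 0.0, h) * d(g, h, 0.0) - d(f, h, 0.0) * d(g, 0.0, h)

def psi(p, q):
    """A hypothetical smooth real test function."""
    return math.exp(-((p - 1.0) ** 2 + (q - 0.3) ** 2) / 2) * (1 + 0.2 * p * q)

def d_dt_koopman(p, q, h=1e-5):
    """d/dt psi(U_{-t}(p, q)) at t = 0, i.e. -i T psi."""
    return (psi(*flow(-h, p, q)) - psi(*flow(h, p, q))) / (2 * h)

p0, q0 = 0.4, -0.9
print(d_dt_koopman(p0, q0), -poisson(psi, H, p0, q0))   # T psi = -i{psi, H}
commutator = poisson(lambda p, q: p * psi(p, q), H, p0, q0) - p0 * poisson(psi, H, p0, q0)
print(commutator, -q0 * psi(p0, q0))                    # {p, H} = -dH/dq = -q
```

Each printed pair should agree to roughly six or more significant digits.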
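Next, we move on to the relation between the KvN Schrödinger equation and the Liouville equation. By Born's rule,

$$\rho_t := |\psi_t|^2 \tag{57}$$

is a probability distribution over phase space. Assuming that $\psi_t$ is a solution to (50), and since $\{\cdot, H\}$ is a real derivation, it follows that

$$\frac{\partial \rho_t}{\partial t} = \overline{\psi_t}\frac{\partial \psi_t}{\partial t} + \psi_t\frac{\partial \overline{\psi_t}}{\partial t} = -\overline{\psi_t}\{\psi_t, H\} - \psi_t\{\overline{\psi_t}, H\} = -\{\rho_t, H\}.$$

This means that $\rho_t$ solves the Liouville equation. We furthermore note that if $\psi_\lambda$ is an eigenvector of $T$, i.e.,

$$T\psi_\lambda = \lambda\psi_\lambda,$$

then $\psi_{\lambda,t} = e^{-i\lambda t}\psi_\lambda$, and the probability distribution

$$\rho_t = |e^{-i\lambda t}\psi_\lambda|^2 = |\psi_\lambda|^2$$

is independent of time. We hence see that eigenvectors of $T$ correspond to statistical equilibria, while non-trivial superpositions of them correspond to statistical non-equilibria, as their induced probability distributions, via Born's rule, transform non-trivially under time evolution.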
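The contrast between eigenvectors of $T$ and their superpositions can be seen in a sketch for $H = (p^2 + q^2)/2$, an assumption made only for illustration: its flow is periodic with period $2\pi$, and the polar angle in the $(p, q)$ plane grows by exactly $t$ along the flow. Under these assumptions, $f(H)\,e^{in\,\mathrm{angle}}$ is an eigenvector of $T$ with eigenvalue $n$; the profile $f$ and all function names are hypothetical choices.

```python
import cmath
import math

def H(p, q):
    return 0.5 * (p * p + q * q)   # hypothetical oscillator with period 2*pi

def flow(t, p, q):
    c, s = math.cos(t), math.sin(t)
    return p * c - q * s, q * c + p * s

def angle(p, q):
    """Angle variable: angle(U_t(x)) = angle(x) + t (mod 2*pi)."""
    return math.atan2(q, p)

def eigenstate(n, f=lambda E: math.exp(-E / 2)):
    """f(H) exp(i n angle): an eigenvector of T with eigenvalue n (f is hypothetical)."""
    return lambda p, q: f(H(p, q)) * cmath.exp(1j * n * angle(p, q))

def density(psi, t, p, q):
    """Born's rule (57) for the evolved state: |psi(U_{-t}(p, q))|^2."""
    return abs(psi(*flow(-t, p, q))) ** 2

psi0, psi1 = eigenstate(0), eigenstate(1)
superposition = lambda p, q: (psi0(p, q) + psi1(p, q)) / math.sqrt(2)  # not normalized in L^2

x = (1.0, 0.5)
for t in (0.0, math.pi / 2, math.pi):
    print(f"t={t:.3f}  eigenstate={density(psi1, t, *x):.4f}  "
          f"superposition={density(superposition, t, *x):.4f}")
```

At the fixed phase-space point $(p, q) = (1, 0.5)$, the eigenstate's density is the same for every $t$, while the superposition's density changes with $t$, in line with the identification of equilibria and non-equilibria above.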
To recap, abstractly speaking, we have unitarily represented the Hamiltonian flow $U$ as $\mathcal{U}$ such that there exist (non-trivial) operators $\hat{p}$ and $\hat{q}$ satisfying (55) and (56), with $T$ being the generator of time evolution associated with $\mathcal{U}$. Note that OQM is also such a representation of $U$. As such, KvN and OQM are, at this level, equivalent. This equivalence is, however, broken when considering a specific representation of $U$. This representation-theoretic view of the difference and similarity between KvN and OQM will not be formalized further here. However, as KvN itself contains subrepresentations of $U$ in which $\hat{p}$ and $\hat{q}$ still act invariantly and non-trivially, we will apply the essence of this representation-theoretic view to identify these subrepresentations as 'proper' quantum theories.
For instance, for every $E > 0$, we can construct such a subrepresentation on the Hilbert space

$$L^2(\mathcal{P}, \nu_E), \qquad d\nu_E = \delta\big(H(p, q) - E\big)\, dp\, dq,$$

which corresponds to a subspace of $L^2(\mathcal{P})$ in the sense of the direct integral [14], i.e.,

$$L^2(\mathcal{P}) \cong \int^{\oplus}_{E > 0} L^2(\mathcal{P}, \nu_E)\, dE. \tag{71}$$

As the measure $\nu_E$ is invariant under $U$, $U$ is unitarily represented on each $L^2(\mathcal{P}, \nu_E)$ in the same fashion as on $L^2(\mathcal{P})$. Of course, these representations on $L^2(\mathcal{P}, \nu_E)$ have the natural interpretation of corresponding to subrepresentations of KvN with fixed total energy $E$. The measure $\nu_E$ is the microcanonical measure known from CSM.
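For a single degree of freedom with $H = (p^2 + q^2)/2$ (an assumption made only for illustration), the total microcanonical mass is $\nu_E(\mathcal{P}) = \frac{d}{dE}\operatorname{area}\{H \le E\} = 2\pi$ for every $E$, and the decomposition (71) amounts to $\int |\psi|^2\,dp\,dq = \int \big(\int |\psi|^2\,d\nu_E\big)\,dE$. The following sketch checks both numerically for a state of the form $\psi = f(H)$; the profile $f$ and all function names are hypothetical.

```python
import math

def area_below(E, n=64):
    """Area of {H <= E} for H = (p^2 + q^2)/2.

    Substituting p = sqrt(2E) sin(u) gives the integrand 4E cos(u)^2 on [-pi/2, pi/2].
    """
    h = math.pi / n
    return sum(4 * E * math.cos(-math.pi / 2 + (k + 0.5) * h) ** 2 for k in range(n)) * h

def nu_mass(E, dE=1e-4):
    """nu_E(P) = d/dE area{H <= E}, the total microcanonical mass (2*pi here)."""
    return (area_below(E + dE) - area_below(E - dE)) / (2 * dE)

def norm_sq_phase_space(f, half_width=10.0, n=400):
    """||f(H)||^2 in L^2(P), via a 2-D midpoint rule."""
    h = 2 * half_width / n
    grid = [-half_width + (i + 0.5) * h for i in range(n)]
    return sum(abs(f(0.5 * (p * p + q * q))) ** 2 for p in grid for q in grid) * h * h

def norm_sq_fibres(f, e_max=40.0, n=4000):
    """The same norm computed fibre by fibre: int |f(E)|^2 nu_E(P) dE."""
    h = e_max / n
    return sum(abs(f((k + 0.5) * h)) ** 2 * nu_mass((k + 0.5) * h) for k in range(n)) * h

f = lambda E: math.exp(-E / 2)   # hypothetical profile, psi = f(H)
print(nu_mass(1.0), 2 * math.pi)
print(norm_sq_phase_space(f), norm_sq_fibres(f), 2 * math.pi)
```

Both norms should be close to $2\pi$; the small remaining gap comes from the one-dimensional midpoint rule in $E$.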
Note that $\nu_E$ has not yet been identified with a quantum state, and so it cannot at this point be interpreted as a microcanonical ensemble of CSM. This identification will, however, come next. To make it, we first need to find eigenvectors of $T$ in a 'generic enough' fashion. We take this to mean that $U$ can be assumed to be periodic with a period $\tau$ and that there exists a smooth function $\Omega$ on phase space (defined modulo $\tau$) such that

$$\Omega\big(U_t(x)\big) = \Omega(x) + t, \quad \text{i.e.,} \quad \{\Omega, H\} = 1.$$

Then

$$T e^{2\pi i n\Omega/\tau} = \frac{2\pi n}{\tau}\, e^{2\pi i n\Omega/\tau}.$$

Assuming, furthermore, that all conserved quantities are of the form $g \circ H$, it then follows that a generic eigenvector of $T$ is of the form

$$\psi_{n,f} = (f \circ H)\, e^{2\pi i n\Omega/\tau},$$

where $n \in \mathbb{Z}$. Now, in $L^2(\mathcal{P})$, the only restriction on $f$ is that it must be such that $\psi_{n,f}$ is in $L^2(\mathcal{P})$. Note, however, that in the subrepresentations $L^2(\mathcal{P}, \nu_E)$, $f$ simply corresponds to a constant, a normalization factor, and hence this degeneracy is removed there. The degeneracy nevertheless regains physical meaning in the formalism of direct integrals: the ambiguity in the choice of $f$ can be considered as a section in the sense of direct integrals. That is, $\psi_{n,f}$ is identified with the element

$$E \mapsto f(E)\, e^{2\pi i n\Omega/\tau}\big|_{H^{-1}(E)}$$

of the direct integral (71). The amplitude of $\psi_{n,f}$ as a section is then given by

$$\big\|\psi_{n,f}(E)\big\|^2_{L^2(\mathcal{P}, \nu_E)} = |f(E)|^2\, \nu_E(\mathcal{P}).$$

Therefore, Born's rule tells us that, given the equilibrium state $\psi_{n,f}$, the probability distribution over the energy $E$ is

$$P(E \,|\, \psi_{n,f}) = |f(E)|^2\, \nu_E(\mathcal{P}). \tag{78}$$

We can now simply apply the principles of statistical mechanics to find the desired statistical equilibrium state. For example, by maximizing the Gibbs entropy given a fixed mean energy, we conclude that (78) must correspond to the canonical measure, i.e.,

$$|f(E)|^2 = \frac{e^{-\beta E}}{Z}, \qquad Z = \int_0^\infty e^{-\beta E}\, \nu_E(\mathcal{P})\, dE.$$

We can also maximize the Gibbs entropy given a fixed energy $E$. As the normalized microcanonical measure

$$\frac{\nu_E}{\nu_E(\mathcal{P})}$$

is the only statistical equilibrium given a fixed energy, it is trivially the distribution that maximizes the Gibbs entropy. As such, we have applied the tools of statistical mechanics to give the eigenstates of $T$ a physical interpretation as statistical equilibria. Now, as has already been pointed out, the statistical non-equilibria correspond to non-trivial superpositions of these equilibria.
Hence, in accordance with Section 2, we have demonstrated that contextuality exists in CSM as well. Note that if $T$ commuted with both $\hat{p}$ and $\hat{q}$, then indeed no contextuality would be demonstrable by measuring their associated probability distributions or averages, as these would then be invariant under time evolution for all states. However, because of (55) and (56), this is not so in the relevant cases.
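To illustrate the maximum-entropy step for the canonical ensemble, the sketch below again assumes $H = (p^2 + q^2)/2$, so that $\nu_E(\mathcal{P}) = 2\pi$, and compares two phase-space densities of the form $\rho = |f(H)|^2$ with the same mean energy $1/\beta$: the canonical one, $|f(E)|^2 = e^{-\beta E}/Z$, and a hypothetical uniform density on $\{H \le 2/\beta\}$. This is a single comparison, not a proof of the maximum-entropy property. Norm, mean energy, and Gibbs entropy $-\int \rho \ln\rho\, dp\, dq$ are evaluated fibre by fibre using $dp\,dq = d\nu_E\,dE$; all names are illustrative.

```python
import math

NU = 2 * math.pi   # nu_E(P) for the hypothetical oscillator H = (p^2 + q^2)/2

def shell_statistics(rho_of_energy, e_max, n=200000):
    """Norm, mean energy and Gibbs entropy of the density rho = rho_of_energy(H).

    Uses dp dq = nu_E dE, i.e. integrates energy shell by energy shell.
    """
    h = e_max / n
    norm = mean = entropy = 0.0
    for k in range(n):
        energy = (k + 0.5) * h
        r = rho_of_energy(energy)
        if r > 0.0:
            weight = r * NU * h          # probability of the shell [E, E + h)
            norm += weight
            mean += energy * weight
            entropy -= math.log(r) * weight
    return norm, mean, entropy

beta = 2.0
canonical = lambda E: beta * math.exp(-beta * E) / NU          # exp(-beta E)/Z, Z = NU/beta
uniform = lambda E: beta / (2 * NU) if E <= 2 / beta else 0.0  # same mean energy 1/beta

for name, rho in (("canonical", canonical), ("uniform", uniform)):
    norm, mean, entropy = shell_statistics(rho, e_max=10.0)
    print(f"{name:9s} norm={norm:.6f} <E>={mean:.6f} S={entropy:.6f}")
```

For $\beta = 2$, the canonical entropy should come out close to $\ln(2\pi/\beta) + 1 \approx 2.145$, above the uniform density's $\ln(4\pi/\beta) \approx 1.838$, even though both have mean energy $1/\beta$.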

Conclusions
In Section 2, it was demonstrated that in FQM, the occurrence of contextuality is equivalent to being able to differentiate between pure and mixed states. In turn, it was shown that mixed states that are diagonal in the eigenbasis of the considered generator of time evolution transform trivially under time evolution while the corresponding pure states do not. Hence, one can, through time evolution, demonstrate contextuality.
In Section 3, KvN was developed as a unitary representation of the Hamiltonian flow $U$ on $L^2(\mathcal{P})$ through the action (37). Similarly to OQM, it was noted that in KvN, $U$ is represented such that there exist self-adjoint operators $\hat{p}$ and $\hat{q}$ satisfying the operator version of Hamilton's equations of motion, (55) and (56), which can be seen as a necessity for interpreting $\hat{p}$ and $\hat{q}$ as the quantum observables of momentum and position, respectively. As such, KvN and OQM can be seen as equivalent at this pre-representation level. This representation-theoretic view was not formalized further; however, its heuristics were used to motivate considering subrepresentations of KvN for which this still holds. In particular, the subrepresentations considered were those of constant energy, which are subrepresentations in the sense of the direct integral (71). It was also shown that solutions of the KvN Schrödinger equation are, via Born's rule, also solutions of the Liouville equation. In particular, the eigenstates of the generator of time evolution of KvN were shown to correspond, in the sense of statistical mechanics, to statistical equilibria, while statistical non-equilibria correspond to non-trivial superpositions of them. The principle of maximum entropy was applied to show that the equilibrium states in the subrepresentations of constant energy correspond to the microcanonical ensemble. It was also shown that maximizing the entropy given a fixed average energy yields the equilibrium states in the direct integral (71) that correspond to the canonical ensemble. Thus, the eigenstates of the KvN generator of time evolution permit a physical interpretation as equilibria in the sense of CSM, and hence the generator of time evolution itself corresponds to an observable.
As the difference between equilibrium and non-equilibrium is physically observable, it follows that when the principles of statistical mechanics are applied to KvN, KvN exhibits contextuality. A possible counterargument to this claim is that CSM does not need to be done in FQM, i.e., in the form of KvN. That is, however, beside the point. As explained in the Introduction and in Section 2, FQM is merely a computational framework for calculating probabilities of outcomes. In FQM, contextuality takes the form of being able to distinguish between pure and mixed states. Contextuality is, however, a more general probabilistic phenomenon, which is quantifiable via the generalized FTP. Therefore, it does not matter which formalism is used, as long as one is dealing with probability.
It is worth pointing out that it has not been claimed that KvN can replace OQM. KvN cannot, for instance, account for the DSE [7]. Hence, no such replacement can be made; however, none is needed either. What has been shown is that contextuality, which is often considered a hallmark phenomenon of OQM, exists in classical physics as well. As contextuality really is a probabilistic phenomenon, it is concluded in this article that the issues regarding the foundations of quantum mechanics are really about the foundations of probability. It is worth noting here that Kolmogorov himself pointed out that probabilities are inherently contextual [2].
One might argue against the conclusion of this article by claiming that all of classical physics is really reducible to OQM, and that OQM is the 'more fundamental theory'. Based on this, one would then conclude that any contextuality demonstrated in CSM is in reality only a result of, say, the contextuality induced by $\hat{p}$ and $\hat{q}$ satisfying the canonical commutation relations. However, classical physics does not so simply reduce to OQM through, say, some $\hbar \to 0$ limit [15]. Therefore, if one still insists on this reductionism, one would have to consider another form of reduction, of a different mathematical form but still carrying the essence of the physical interpretation of $\hbar \to 0$, the existence of which is far from obvious and quite possibly ruled out. Therefore, one cannot simply dismiss the contextuality in CSM demonstrated here as not being a 'real' quantum phenomenon. Moreover, this reductionist view of classical mechanics as reducible to OQM is common. It is often even applied to the larger landscape of physics, and to the natural sciences in general. It is, for instance, claimed that thermodynamics is just statistical mechanics and that chemistry is just physics. However, as shown in [15,16], and as also pointed out by Dyson [17], this type of reductionist relation does not hold in general. In [15,16], concrete demonstrations of the failure of reductionism are presented. In these references and in [18], it is furthermore discussed what this non-reductionism implies for the hierarchical ordering of theories, in terms of a contextified view of emergence versus reduction and ontic versus epistemic.
Lastly, we note that Bell violations may be seen as being induced by contextuality, since the validity of the FTP is crucial in the proof of Bell's theorem [19], something which is even more obvious when considering it in its Wigner form [20]. As such, Bell violations are also ways of demonstrating contextuality [21]. Contextuality is really the hallmark phenomenon of quantum mechanics and, contrary to popular belief, it is present in classical physics as well.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Acknowledgments: I would like to thank Andrei Khrennikov for stimulating discussions and comments. I would also like to thank my wife; this article would not have been possible without her support when writing got tough.

Conflicts of Interest:
The author declares no conflict of interest.

Abbreviations
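The following abbreviations are used in this manuscript:
CSM: Classical statistical mechanics
DSE: Double-slit experiment
DSTE: Double-slit-type experiment
FQM: Formalism of quantum mechanics
FTP: Formula of total probability
KPM: Kolmogorov's measure-theoretic formulation of probability
KvN: Koopman–von Neumann formulation
OQM: Ordinary quantum mechanics

Appendix A
Suppose that there exists a mixed state $\rho$, diagonal with respect to some basis other than $\{|a\rangle\}$, whose respective probability distribution with respect to $B$ transforms identically under time evolution to that of the pure state $\psi = \sum_a \sqrt{\rho_{a,a}}\, e^{i\theta_a}|a\rangle$.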
As $\rho$ is a state, its trace is equal to one and it is self-adjoint, i.e.,

$$\sum_a \rho_{a,a} = 1, \qquad \rho^*_{a,a'} = \rho_{a',a} \quad \text{for all } a, a'.$$
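As it, in addition, is mixed,

$$\operatorname{Tr}\{\rho^2\} < 1. \tag{A9}$$

for all functions $f$, it follows that this can only hold if, for all $a \neq a'$,

$$\operatorname{Re}\Big[\big(\rho_{a,a'} - \sqrt{\rho_{a,a}\rho_{a',a'}}\, e^{i(\theta_a - \theta_{a'})}\big)\, e^{i(a - a')t}\langle b|a\rangle\langle a'|b\rangle\Big] = 0. \tag{A15}$$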
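As this must be true for all $t$, we can pick a $t$ such that

$$e^{i(a - a')t}\langle b|a\rangle\langle a'|b\rangle \tag{A16}$$

is real valued (and non-zero, provided $\langle b|a\rangle\langle a'|b\rangle \neq 0$). Hence, (A15) implies that

$$\operatorname{Re}\big[\rho_{a,a'} - \sqrt{\rho_{a,a}\rho_{a',a'}}\, e^{i(\theta_a - \theta_{a'})}\big] = 0. \tag{A17}$$

Choosing instead a $t$ such that (A16) is purely imaginary shows that the imaginary part vanishes as well, so that $\rho_{a,a'} = \sqrt{\rho_{a,a}\rho_{a',a'}}\, e^{i(\theta_a - \theta_{a'})}$. In turn, this implies that $|\rho_{a,a'}|^2 = \rho_{a,a}\rho_{a',a'}$ for all $a, a'$, and hence $\operatorname{Tr}\{\rho^2\} = \sum_{a,a'} |\rho_{a,a'}|^2 = \big(\sum_a \rho_{a,a}\big)^2 = 1$, which contradicts (A9). Hence, $\rho$ and $\psi$ cannot transform similarly under time evolution.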