Assumption-Free Derivation of the Bell-Type Criteria of Contextuality/Nonlocality

Bell-type criteria of contextuality/nonlocality can be derived without any falsifiable assumptions, such as context-independent mapping (or local causality), free choice, or no-fine-tuning. This is achieved by deriving Bell-type criteria for inconsistently connected systems (i.e., those with disturbance/signaling), based on the generalized definition of contextuality in the contextuality-by-default approach, and then specializing these criteria to consistently connected systems.


Introduction
The criteria (necessary and sufficient conditions) of contextuality/nonlocality (usually, but not necessarily, in the form of inequalities) are at the heart of the foundations of quantum physics. Since John Bell's pioneering work, researchers have been interested in what assumptions about nature one needs to justify these criteria. Several such assumptions have been proposed. One of them is that measurements cannot be affected by spacelike remote events (local causality). Another assumption is that experimenters choose what they measure independent of the background events determining measurement outcomes (free choice). It has also been proposed that statistically identical measurement outcomes should have identical ontological models (no-fine-tuning). This article shows that the criteria of contextuality/nonlocality can be derived without any such assumptions.
Although our discussion is valid for essentially all possible systems of random variables, we will use a system describing Bohm's version of the Einstein-Podolsky-Rosen experiment (EPR/B) [1][2][3] as a running example. This allows us to avoid technicalities needed in a more general exposition. The system in question is
$$
\begin{array}{cccc|c}
R^1_1 & R^1_2 & & & c=1\\
& R^2_2 & R^2_3 & & c=2\\
& & R^3_3 & R^3_4 & c=3\\
R^4_1 & & & R^4_4 & c=4\\
\hline
q=1 & q=2 & q=3 & q=4 & \mathcal{R}_4
\end{array} \tag{1}
$$
The random variables in (1) represent outcomes of spin measurements of two entangled particles along direction $q$; in every trial Alice chooses between directions $q = 1$ and $3$, and Bob chooses between $q = 2$ and $4$ (indicated by the subscripts of the random variables). We call the subscripts $q$ in $R^c_q$ the contents of the corresponding random variables. The four combinations of the Alice-Bob choices form the contexts $c = 1, \ldots, 4$ (indicated by the superscripts of the random variables). If the spin along an axis $q$ (i.e., content $q$) is measured in context $c$, we write
$$
q \prec c. \tag{2}
$$
For instance, $q = 2 \prec c = 1$ but $q = 3 \not\prec c = 1$. The particles are spin-1/2, so all the random variables in the system are dichotomous, say $\pm 1$.
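In code, the content-context structure of system $R_4$ can be recorded in a few lines. This is our own toy encoding, not notation from the paper; the context labeling assumes the cyclic pairing of Alice's choices $\{1, 3\}$ with Bob's $\{2, 4\}$ shown in (1).

```python
# Toy encoding (ours) of system R_4: contents q = 1..4, contexts c = 1..4,
# assuming the cyclic pairing of Alice's choices {1, 3} with Bob's {2, 4}.
CONTEXTS = {
    1: {1, 2},  # context c = 1: Alice measures q = 1, Bob measures q = 2
    2: {2, 3},
    3: {3, 4},
    4: {4, 1},
}

def measured_in(q: int, c: int) -> bool:
    """True iff content q is measured in context c (written q ≺ c)."""
    return q in CONTEXTS[c]

print(measured_in(2, 1))  # q = 2 ≺ c = 1 -> True
print(measured_in(3, 1))  # q = 3 is not measured in context 1 -> False
```

Each content appears in exactly two contexts, which is what makes the later comparison of content-sharing random variables across contexts possible.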
The structure of this article is as follows. In Section 2, we do preliminary work: we present a rigorous version of the traditional account of the criteria for contextuality/nonlocality of systems of random variables with no disturbance (we call them consistently connected systems), and then we stipulate the traditional assumptions involved in the formulation of these criteria. The main point of this paper is presented in Section 3. There, we generalize the definition of contextuality/nonlocality to inconsistently connected systems (those with disturbance or signaling), and show that the Bell-type criteria in this generalized conceptual setting can be derived with no assumptions. Instead, they are derived from a classificatory definition of (non)contextuality based on the notion of the difference between random variables. The traditional Bell-type criteria then immediately follow by specializing the generalized criteria to consistently connected systems, again with no assumptions about nature.
The term "nonlocality" has been criticized as misleading, unnecessary, or even outright contradicting the tenets of quantum physics in several recent publications [4][5][6]. In this paper, however, the term is used to simply designate a special case of contextuality, for systems where contexts are formed by spacelike separated components. Barring experimental biases, such a system is consistently connected, and its mathematical analysis does not differ from that of other consistently connected systems [7,8]. Singling nonlocal systems out, however, is justified due to their special importance in quantum physics.

Contextuality for Consistently Connected Systems
Consider a random variable $\Lambda$ and a measurable function $F$ mapping $\Lambda$ and $q$ into $\{-1, 1\}$. In contextuality/nonlocality analysis of a system we are asking whether one can choose $(\Lambda, F)$ so that, for every context $c$,
$$
\left(F(q, \Lambda) : q \prec c\right) \overset{d}{=} \left(R^c_q : q \prec c\right), \tag{3}
$$
where $\overset{d}{=}$ stands for "has the same distribution as". In other words, the joint distribution of all $F(q, \Lambda)$ for a given context $c$ is the same as the joint distribution of all $R^c_q$ in this context. Thus, for system $\mathcal{R}_4$ in (1),
$$
\left(F(1, \Lambda), F(2, \Lambda)\right) \overset{d}{=} \left(R^1_1, R^1_2\right), \;\ldots,\; \left(F(4, \Lambda), F(1, \Lambda)\right) \overset{d}{=} \left(R^4_4, R^4_1\right). \tag{4}
$$
Note that the random variables $R^c_q$, $q \prec c$, in a given context $c$ are jointly distributed: e.g., the event $\left\{R^1_1 = x, R^1_2 = y\right\}$ is well-defined for any $x, y \in \{-1, 1\}$, and has a probability assigned to it. At the same time, random variables in different contexts, e.g., $R^1_1$ and $R^2_1$, or $R^1_1$ and $R^2_2$, do not have a joint distribution, as different contexts are mutually exclusive [9]. However, all random variables $F(q, \Lambda)$ are jointly distributed, because $\Lambda$ is one and the same for all $q$ and $c$. Therefore, the random variables $S^c_q$ below form a (probabilistic) coupling of system $\mathcal{R}_4$:
$$
S^c_q = F(q, \Lambda), \quad q \prec c,\; c = 1, \ldots, 4. \tag{5}
$$
Generally, a coupling of a system of random variables $\mathcal{R} = \left\{R^c_q : c \in C, q \in Q, q \prec c\right\}$ is a set of jointly distributed $S = \left\{S^c_q : c \in C, q \in Q, q \prec c\right\}$ such that, for any $c \in C$,
$$
S^c = \left(S^c_q : q \prec c\right) \overset{d}{=} \left(R^c_q : q \prec c\right) = R^c. \tag{6}
$$
Note that any set of jointly distributed random variables is a random variable. Therefore, $S$, $S^c$, and $R^c$ are random variables, but $\mathcal{R}$ is not (hence the difference in notation). In the coupling (5), the octuple of the $S^c_q$-variables is jointly distributed, and the within-context joint distributions of the $S^c_q$ variables are the same as the joint distributions of the corresponding $R^c_q$ variables. In addition, since $F(q, \Lambda)$ does not depend on $c$, we have (with $\Pr$ standing for probability)
$$
\Pr\left[S^c_q = S^{c'}_q\right] = 1 \quad \text{for any } q \prec c, c'. \tag{7}
$$
In other words, the probabilities
$$
\Pr\left[S^c_q = s^c_q : q \prec c,\; c = 1, \ldots, 4\right] \tag{8}
$$
are well-defined for all $2^8$ octuples of $s^c_q \in \{-1, 1\}$. These probabilities are subject to the following constraints:

1. For every fixed $(c, q, q')$ such that $q, q' \prec c$, and any fixed values $x, y \in \{-1, 1\}$,
$$
\sum_{\text{octuples}} \iota\!\left[s^c_q = x,\ s^c_{q'} = y\right] \Pr\left[S^{c''}_{q''} = s^{c''}_{q''} : q'' \prec c''\right] = \Pr\left[R^c_q = x,\ R^c_{q'} = y\right], \tag{9}
$$
where the coefficient $\iota\left[\text{expression}\right]$ is the Boolean indicator (1 or 0) of whether the event in brackets holds for the octuple being summed over.

2. For any fixed $(c, c', q)$ such that $q \prec c, c'$, and any fixed value $s \in \{-1, 1\}$,
$$
\sum_{\text{octuples}} \iota\!\left[s^c_q = s^{c'}_q = s\right] \Pr\left[S^{c''}_{q''} = s^{c''}_{q''} : q'' \prec c''\right] = \Pr\left[R^c_q = s\right]. \tag{10}
$$

This can be compactly presented as a linear programming (LP) task [7,10]:
$$
M X = P, \quad X \geq 0, \tag{11}
$$
where $X$ is the vector of the $2^8$ probabilities (8), $P$ is the vector of the probabilities in the right-hand sides of (9) and (10), and $M$ is a Boolean incidence matrix whose entries are the $\iota$-coefficients in (9) and (10). The condition $X \geq 0$ (componentwise) ensures that the solution $X$, if it exists, consists of numbers interpretable as probabilities (the summation of these values to 1 is ensured in (11)). Denoting by $\mathrm{LP}[M, P]$ a Boolean function that equals 1 if and only if a solution $X$ exists, it is well known that $\mathrm{LP}[M, P]$ is computable in polynomial time [11]. Consequently,
$$
\mathrm{LP}[M, P] = 1 \tag{12}
$$
can be taken as a criterion of noncontextuality/locality. As an optional step, by a facet enumeration algorithm, this criterion can be presented in the form of inequalities and equations involving moments of the distributions within contexts. For system $\mathcal{R}_4$, this yields [3,12,13]
$$
\max_{\substack{\text{odd number of}\\ \text{minus signs}}} \left(\pm\left\langle R^1_1 R^1_2\right\rangle \pm \left\langle R^2_2 R^2_3\right\rangle \pm \left\langle R^3_3 R^3_4\right\rangle \pm \left\langle R^4_4 R^4_1\right\rangle\right) \leq 2, \tag{13}
$$
$$
\left\langle R^1_1\right\rangle = \left\langle R^4_1\right\rangle, \quad \left\langle R^1_2\right\rangle = \left\langle R^2_2\right\rangle, \quad \left\langle R^2_3\right\rangle = \left\langle R^3_3\right\rangle, \quad \left\langle R^3_4\right\rangle = \left\langle R^4_4\right\rangle. \tag{14}
$$
Equalities (14) present the condition of consistent connectedness.

What ontological assumptions have we made in the foregoing? Let us note first that we need no assumptions to derive the criterion $\mathrm{LP}[M, P] = 1$ or its equivalents from (3). The derivation of the criterion is by straightforward linear programming, optionally complemented by facet enumeration. However, we need certain assumptions about nature to justify the plausibility of using the function $F(q, \Lambda)$, which is the starting point of the derivation. Namely, we must have assumed that the mapping $(q, \lambda) \mapsto F(q, \lambda)$ does not contain context $c$ among its arguments. For the EPR/Bohm experiment, following Bell [14], this assumption can be called local causality, because dependence of the mapping on $c$ can be interpreted as dependence of measurement outcomes on spacelike-remote settings.
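The criterion $\mathrm{LP}[M, P] = 1$ is mechanical to check. The sketch below is our own illustration, not code from the paper: it enumerates the $2^8$ octuples, builds the incidence matrix $M$ and the vector $P$ from the constraints (9) and (10), and tests feasibility with SciPy's `linprog`. For simplicity, each context is specified only by its product expectation $E_c$, with all marginals equal to $1/2$ (so the system is consistently connected); the context labeling follows the cyclic arrangement of (1), which is our reading of the system.

```python
import itertools

import numpy as np
from scipy.optimize import linprog

# The eight (context, content) cells of R_4, assuming the cyclic
# arrangement c=1:{1,2}, c=2:{2,3}, c=3:{3,4}, c=4:{4,1}.
CELLS = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4), (4, 4), (4, 1)]
IDX = {cq: i for i, cq in enumerate(CELLS)}
OCT = list(itertools.product([-1, 1], repeat=8))    # the 2^8 octuples in (8)
CTX = {1: (1, 2), 2: (2, 3), 3: (3, 4), 4: (4, 1)}  # context -> its contents
CON = {1: (1, 4), 2: (1, 2), 3: (2, 3), 4: (3, 4)}  # content -> its contexts

def noncontextual(E):
    """LP[M, P] for correlations E_c = <R^c_q R^c_q'>, all marginals 1/2."""
    rows, rhs = [], []
    for c, (q, q2) in CTX.items():  # constraints (9): within-context pairs
        for x, y in itertools.product([-1, 1], repeat=2):
            rows.append([float(s[IDX[c, q]] == x and s[IDX[c, q2]] == y)
                         for s in OCT])
            rhs.append((1 + x * y * E[c - 1]) / 4)
    for q, (c, c2) in CON.items():  # constraints (10): S^c_q = S^c'_q a.s.
        for v in (-1, 1):
            rows.append([float(s[IDX[c, q]] == v and s[IDX[c2, q]] == v)
                         for s in OCT])
            rhs.append(0.5)
    res = linprog(np.zeros(len(OCT)), A_eq=np.array(rows), b_eq=np.array(rhs),
                  bounds=[(0, None)] * len(OCT), method="highs")
    return res.status == 0  # 0 means a feasible (hence optimal) X was found

print(noncontextual([0.0, 0.0, 0.0, 0.0]))              # True: CHSH value 0
print(noncontextual([2 ** -0.5] * 3 + [-(2 ** -0.5)]))  # False: 2*sqrt(2) > 2
```

The second call uses the Tsirelson-bound correlations, for which the equality system is infeasible; any $E$ whose maximal CHSH combination does not exceed 2 yields a feasible $X$.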
More generally, however, this can be called context-independent mapping, in order to also include the Kochen-Specker-type contextuality [15], when measurements in the same context are not spacelike separated. We must have also assumed that $\Lambda$ in $F(q, \Lambda)$ is one and the same for all $q$ and for all $c$. This is called the assumption of free choice, or statistical independence (of measurements and settings). In relation to system $\mathcal{R}_4$, the necessity of adding the free choice assumption to the local causality assumption was pointed out to John Bell by Shimony, Horne, and Clauser in their 1985 interchange [16,17]. The relationship between free choice and local causality (more generally, context-independent mapping) is an interesting issue, but it is discussed elsewhere [18].
One can replace both the assumption of context-independent mapping and the free choice assumption with the assumption proposed by Cavalcanti, called no-fine-tuning [19,20]. For our purposes it can be formulated thus: if two random variables sharing a content in different contexts have the same distribution, their representations in the form
$$
F(\text{some parameters},\ \text{some random variables}) \tag{15}
$$
should be identical. This is an attractive alternative to context-independent mapping, because the latter is not especially compelling in cases of Kochen-Specker-type contextuality, when contexts are not defined by spacelike-remote settings. Note that no-fine-tuning can also be considered a principle of theory construction, essentially a conceptual parsimony principle, rather than an ontological assumption.
As it turns out, however, by generalizing the notion of (non)contextuality to include inconsistently connected systems, one can avoid the necessity of making any of these, or other, falsifiable assumptions. The no-fine-tuning assumption (or parsimony principle) is a consequence of specializing this general definition to consistently connected systems.

Contextuality in Inconsistently Connected Systems
The generalization follows the broadening of the class of systems of random variables amenable to contextuality analysis. As one can see in (14), the criterion LP[M, P] = 1 can be satisfied only if the system of random variables is consistently connected: random variables measuring the same property in different contexts have the same distribution. We will now drop this constraint, and allow for inconsistent connectedness. In particular, we allow for signaling between Alice and Bob if their measurements are timelike separated.
We begin with the maximally lax representation
$$
R^c_q = \gamma\!\left(q, c, \lambda^c_q\right), \tag{16}
$$
in which both the outcome of the measurement and the background random variable are allowed to depend on both $q$ and $c$. This representation obviously holds for any system of random variables. Now, because all $R^c_q$ in a given context $c$ are jointly distributed, it follows that all $\lambda^c_q$ in this context are jointly distributed. Therefore, there is a random variable $\lambda^c$ of which the $\lambda^c_q$, for all $q \prec c$, are functions. As a result, we get a seemingly more restrictive but still universally applicable representation
$$
R^c_q = \gamma'\!\left(q, c, \lambda^c\right). \tag{17}
$$
The remaining dependence of $\lambda^c$ on context $c$ can also be eliminated, by the following reasoning. Let us form an arbitrary coupling
$$
\Lambda = \left(l^c : c \in C\right), \quad l^c \overset{d}{=} \lambda^c, \tag{18}
$$
e.g., couple all $\lambda^c$ independently. Then
$$
R^c_q \overset{d}{=} G(q, c, \Lambda) = \gamma'\!\left(q, c, \mathrm{Proj}_c(\Lambda)\right), \tag{19}
$$
where $\mathrm{Proj}_c$ is the $c$th projection function. This representation (with $\Lambda$ one and the same for all $q$ and $c$) is also universally applicable. Note that in (19) we only have $R^c_q \overset{d}{=} G(q, c, \Lambda)$, rather than $R^c_q = G(q, c, \Lambda)$, because the latter would make the $R^c_q$ jointly distributed across mutually exclusive contexts $c$ (which is nonsensical).
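The coupling-and-projection step ending in (19) can be checked on a toy case. The sketch below is ours; the choice of $\gamma'$ and of the range of the $\lambda^c$ is arbitrary and purely illustrative.

```python
# Sketch (ours) of the coupling-and-projection step: the context-specific
# variables λ^c are coupled independently into one Λ, and G recovers each
# context's distribution via the projection Proj_c.
import itertools
from fractions import Fraction

def gamma_prime(q, c, lam):
    """An arbitrary illustrative γ'(q, c, λ^c); λ^c takes values in {0, 1}."""
    return 1 if (q + c + lam) % 2 == 0 else -1

# Independent coupling: Λ ranges uniformly over quadruples (l^1, ..., l^4).
LAMBDAS = list(itertools.product([0, 1], repeat=4))
P = Fraction(1, len(LAMBDAS))

def G(q, c, Lam):
    return gamma_prime(q, c, Lam[c - 1])  # Proj_c picks the c-th component

# The distribution of G(1, 1, Λ) coincides with that of γ'(1, 1, λ^1):
via_coupling = sum(P for L in LAMBDAS if G(1, 1, L) == 1)
direct = Fraction(sum(gamma_prime(1, 1, lam) == 1 for lam in (0, 1)), 2)
print(via_coupling == direct)  # the coupling preserves each context's law
```

Because $G(q, c, \Lambda)$ reads only the $c$th component of $\Lambda$, each context's distribution is untouched, while all the $G(q, c, \Lambda)$ are now jointly distributed, exactly as the derivation requires.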
It remains to define (non)contextuality in this generalized conceptual setting. In the contextuality-by-default approach (CbD) [8][9][10][21][22], the system is considered noncontextual if $(\Lambda, G)$ in (19) can be chosen so that the probability
$$
\Pr\left[G(q, c, \Lambda) = G(q, c', \Lambda)\right] \tag{20}
$$
is the maximal possible, for all $(q, c, c')$ such that $q \prec c, c'$. For dichotomous random variables, this means
$$
\Pr\left[G(q, c, \Lambda) = G(q, c', \Lambda)\right] = 1 - \left|\Pr\left[R^c_q = 1\right] - \Pr\left[R^{c'}_q = 1\right]\right|. \tag{21}
$$
The rationale for this definition is as follows. The maximal probability with which $G(q, c, \Lambda)$ and $G(q, c', \Lambda)$ can be made to coincide shows how similar the random variables $R^c_q$ and $R^{c'}_q$ are if taken as an isolated pair, "out of their contexts". Denoting by $p^c_q$ the value of $\Pr\left[G(q, c, \Lambda) = 1\right] = \Pr\left[R^c_q = 1\right]$, the maximal probability of $G(q, c, \Lambda) = G(q, c', \Lambda)$ is $1 - \left|p^{c'}_q - p^c_q\right|$, and it is achieved if and only if the joint distribution of $S^c_q = G(q, c, \Lambda)$ and $S^{c'}_q = G(q, c', \Lambda)$ is (assuming $p^c_q \leq p^{c'}_q$)
$$
\begin{array}{c|cc}
 & S^{c'}_q = 1 & S^{c'}_q = -1 \\
\hline
S^c_q = 1 & p^c_q & 0 \\
S^c_q = -1 & p^{c'}_q - p^c_q & 1 - p^{c'}_q
\end{array} \tag{22}
$$
The existence of a coupling of the system in which all pairs $\left(S^c_q, S^{c'}_q\right)$ have this joint distribution indicates that the way $R^c_q$ and $R^{c'}_q$ are related to the other random variables in the corresponding contexts leaves their dissimilarity intact. Otherwise, the contexts of the system "force" some of the pairs $R^c_q, R^{c'}_q$ to be more dissimilar than they are when taken in isolation.
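As a quick numeric check of the maximal coupling in (22) (our own sketch, not from the paper): for two $\pm 1$ variables with $\Pr[\cdot = 1]$ equal to $p$ and $p'$, $p \leq p'$, the table has the correct marginals and attains the agreement probability $1 - (p' - p)$.

```python
def maximal_coupling(p, p2):
    """Joint distribution of (S, S') over {-1, 1}^2 maximizing Pr[S = S'],
    given Pr[S = 1] = p and Pr[S' = 1] = p2, assuming p <= p2 (as in (22))."""
    assert 0 <= p <= p2 <= 1
    return {
        (1, 1): p,         # agreement on +1, as much as the smaller marginal allows
        (1, -1): 0.0,
        (-1, 1): p2 - p,   # all disagreement is forced by the unequal marginals
        (-1, -1): 1.0 - p2,
    }

j = maximal_coupling(0.3, 0.5)
print(j[(1, 1)] + j[(-1, -1)])  # agreement probability: 1 - (0.5 - 0.3)
```

Any coupling with these marginals must disagree with probability at least $p' - p$, since the events $\{S = 1\}$ and $\{S' = 1\}$ differ in probability by that amount; the table above achieves this lower bound exactly.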
The Bell-type criterion for this generalized definition is also determined by the LP problem (11). The only difference is that the part of vector $P$ determined by (10) is replaced with the probabilities given in (22):
$$
\Pr\left[S^c_q = 1,\, S^{c'}_q = 1\right] = \min\left(p^c_q, p^{c'}_q\right), \quad \Pr\left[S^c_q = -1,\, S^{c'}_q = -1\right] = 1 - \max\left(p^c_q, p^{c'}_q\right). \tag{23}
$$
Note that this definition of (non)contextuality is purely classificatory: it does not involve any assumptions about nature.
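For illustration, the generalized LP test with the replaced portion of $P$ can be sketched as follows (our own code, under the same assumed cyclic context labeling as before): each context is now given by a full $2\times 2$ joint distribution, so inconsistent connectedness (unequal marginals for a shared content) is allowed, and the consistency constraints are replaced with the maximal-coupling targets of (22).

```python
import itertools

import numpy as np
from scipy.optimize import linprog

CELLS = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4), (4, 4), (4, 1)]
IDX = {cq: i for i, cq in enumerate(CELLS)}
OCT = list(itertools.product([-1, 1], repeat=8))
CTX = {1: (1, 2), 2: (2, 3), 3: (3, 4), 4: (4, 1)}  # context -> its contents
CON = {1: (1, 4), 2: (1, 2), 3: (2, 3), 4: (3, 4)}  # content -> its contexts

def marg(pairs, c, q):
    """Pr[R^c_q = 1], read off the context's joint distribution."""
    qa, _ = CTX[c]
    return sum(v for (x, y), v in pairs[c].items() if (x if q == qa else y) == 1)

def cbd_noncontextual(pairs):
    """pairs[c][(x, y)] = Pr[R^c_q = x, R^c_q' = y] with (q, q') = CTX[c]."""
    rows, rhs = [], []
    for c, (q, q2) in CTX.items():  # within-context constraints, unchanged
        for x, y in itertools.product([-1, 1], repeat=2):
            rows.append([float(s[IDX[c, q]] == x and s[IDX[c, q2]] == y)
                         for s in OCT])
            rhs.append(pairs[c][(x, y)])
    for q, (c, c2) in CON.items():  # maximal-coupling targets from (22)
        p1, p2 = marg(pairs, c, q), marg(pairs, c2, q)
        for v, target in ((1, min(p1, p2)), (-1, 1 - max(p1, p2))):
            rows.append([float(s[IDX[c, q]] == v and s[IDX[c2, q]] == v)
                         for s in OCT])
            rhs.append(target)
    res = linprog(np.zeros(len(OCT)), A_eq=np.array(rows), b_eq=np.array(rhs),
                  bounds=[(0, None)] * len(OCT), method="highs")
    return res.status == 0

# A signaling (inconsistently connected) system whose contexts are product
# distributions: the generalized definition classifies it as noncontextual.
ps = {1: (0.5, 0.6), 2: (0.7, 0.5), 3: (0.4, 0.5), 4: (0.5, 0.5)}
pairs = {c: {(x, y): (pa if x == 1 else 1 - pa) * (pb if y == 1 else 1 - pb)
             for x in (1, -1) for y in (1, -1)}
         for c, (pa, pb) in ps.items()}
print(cbd_noncontextual(pairs))  # True
```

The example numbers are ours: content 2 has marginals 0.6 and 0.7 in its two contexts, so the system signals, yet the LP is feasible because nothing beyond the marginal mismatch forces the content-sharing variables apart.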
Suppose now that the system is consistently connected. Then the maximal probability in (20) is 1, i.e., (22) holds with $p^c_q = p^{c'}_q$, and the system is noncontextual if and only if $(\Lambda, G)$ can be chosen so that
$$
\Pr\left[G(q, c, \Lambda) = G(q, c', \Lambda)\right] = 1 \tag{24}
$$
whenever $q \prec c, c'$. But the latter means that
$$
G(q, c, \Lambda) = F(q, \Lambda) \quad \text{with probability 1,} \tag{25}
$$
for some function $F$ that does not depend on $c$. We have thus arrived at the same representation as in the previous section, but without assuming context-independent mapping, free choice, or no-fine-tuning.

Conclusions
We have seen that if one does not make any constraining assumptions about how a measurement outcome in a system depends on settings, local or remote, and if one adopts the CbD definition of generalized (non)contextuality, the derivation of the traditional criteria of (non)contextuality follows with no ontological assumptions, by simply specializing the definition in question to consistently connected systems.
Let us examine possible doubts about the validity of our analysis.
(1) Is not the property of consistent connectedness, from which we derive (24) and (25), an assumption? If the distributions of the random variables are only known to us on a sample level, then it is indeed an assumption, like any other statistical hypothesis. However, the discussion in this article deals with systems known to us precisely, as random variables rather than samples. For example, a standard quantum mechanical computation can yield the precise joint probabilities for the pairs of variables in each context of system $\mathcal{R}_4$.
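For instance (our own sketch, with illustrative measurement angles): for two spin-1/2 particles in the singlet state and coplanar measurement directions, the textbook formula $\Pr[a, b] = \left(1 - ab \cos(\theta_a - \theta_b)\right)/4$ gives the full joint distribution in each context, with both marginals exactly $1/2$, hence consistent connectedness holds exactly rather than as a statistical hypothesis.

```python
import math

def singlet_joint(theta_a, theta_b):
    """Standard QM joint probabilities for singlet-state spin-1/2 measurements
    along coplanar directions at angles theta_a (Alice) and theta_b (Bob)."""
    return {(a, b): (1 - a * b * math.cos(theta_a - theta_b)) / 4
            for a in (1, -1) for b in (1, -1)}

# Illustrative angles for contents q = 1..4 (our choice, not from the paper):
ANGLES = {1: 0.0, 2: math.pi / 4, 3: math.pi / 2, 4: 3 * math.pi / 4}

joint = singlet_joint(ANGLES[1], ANGLES[2])  # context c = 1: contents 1 and 2
print(abs(sum(joint.values()) - 1.0) < 1e-12)             # a proper distribution
print(abs(joint[(1, 1)] + joint[(1, -1)] - 0.5) < 1e-12)  # Alice's marginal: 1/2
```

Since each marginal is $1/2$ regardless of the remote setting, the computed system is consistently connected by construction, not by statistical inference.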
(2) By writing $R^c_q = \gamma\!\left(q, c, \lambda^c_q\right)$, have we not made the assumption of "outcome determinism"? The latter means that the values of the relevant arguments uniquely determine the value of $R^c_q$. What if $q$, $c$, and $\lambda^c_q$ only determine the distribution of the random variable $R^c_q$ rather than its value? This possibility, however, amounts to introducing yet another random variable among the arguments of the function determining the value of $R^c_q$:
$$
R^c_q = \gamma^*\!\left(q, c, \lambda^c_q, \mu^c_q\right). \tag{26}
$$
This obviously reduces to the initial representation on renaming $\left(\lambda^c_q, \mu^c_q\right)$ into $\lambda^c_q$. This observation is, in fact, a short (and generalized) version of a theorem established as early as 1982 by Arthur Fine [12].
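A toy simulation (ours; both functions are arbitrary illustrations, not from the paper) makes the reduction just described concrete: if $(q, c, \lambda)$ fixes only the distribution of the outcome, an extra uniform variable $\mu$ restores a function that determines the outcome while reproducing the same distribution.

```python
# Toy illustration (ours) of the outcome-indeterminism reduction: adding a
# uniform µ turns a merely distribution-fixing triple (q, c, λ) into a
# deterministic mapping with the same outcome distribution.
import random

def p_plus(q, c, lam):
    """Hypothetical Pr[R^c_q = +1] given λ; any function into [0, 1] works."""
    return ((q + c + lam) % 3) / 3 + 1 / 6

def gamma_star(q, c, lam, mu):
    """Deterministic in (q, c, λ, µ): outcome +1 iff µ falls below p_plus."""
    return 1 if mu < p_plus(q, c, lam) else -1

random.seed(0)
n = 100_000
lam = 1  # freeze λ so we compare the conditional distributions
freq = sum(gamma_star(1, 1, lam, random.random()) == 1 for _ in range(n)) / n
print(abs(freq - p_plus(1, 1, lam)) < 0.01)  # empirical law matches the target
```

Renaming the pair $(\lambda, \mu)$ into a single hidden variable recovers the deterministic form, which is the content of the reduction in (26).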
(3) Is not the generalized CbD definition of contextuality a form of the no-fine-tuning assumption? This is clearly not true for the original no-fine-tuning assumption [19,20], as the latter is confined to consistently connected systems. However, it has been shown by M. Jones [23] that it is possible to generalize the no-fine-tuning assumption into a principle that forbids "hidden influences" in ontological models of a system. "Hidden influences" here means dependence of measurement outcomes on factors that do not influence the distributions of these outcomes. Maximizing the probability of $G(q, c, \Lambda) = G(q, c', \Lambda)$ ensures that the entire difference between the influences of $c$ and $c'$ on the measurement of $q$ is reflected in the difference of their distributions. However, in CbD, this is a consequence of defining (non)contextuality of a system in terms of differences between content-sharing random variables, rather than an assumption about nature or a principle for constructing plausible ontological theories.
(4) Finally, let us address the question about the distinction we make between ontological assumptions and a classificatory definition. Is it defensible? Is not any assumption convertible into a definition, and vice versa? Specifically, the traditional analysis of contextuality can be presented as a definition according to which a (consistently connected) system is noncontextual if a representation
$$
\left(R^c_q : q \prec c\right) \overset{d}{=} \left(F(q, \Lambda) : q \prec c\right) \tag{27}
$$
exists, and contextual otherwise. Not denying this obvious possibility, it is nevertheless reasonable to ask how such a definition is motivated. The assumptions of context-independent mapping, free choice, and no-fine-tuning serve to provide this motivation.
(Let us note in passing that most of the traditional analyses of contextuality confuse the distributional Equation (27) with the equality $R^c_q = F(q, \Lambda)$. The ensuing logical problems are discussed in [9,21].) In our analysis, we begin with a hidden variable model that cannot be empirically false, $R^c_q = \gamma\!\left(q, c, \lambda^c_q\right)$, and reduce it to $R^c_q \overset{d}{=} G(q, c, \Lambda)$, which cannot be false either. Then we pose the question about the difference between the random variables $R^c_q$ and $R^{c'}_q$, measuring the same content in different contexts. Asked about the motivation for this question, the obvious answer is that we are interested in the dependence of measurements on their contexts. The maximal probability of $S^c_q = S^{c'}_q$ in the coupling $S^c_q = G(q, c, \Lambda)$, $S^{c'}_q = G(q, c', \Lambda)$ provides the answer for $R^c_q$ and $R^{c'}_q$ taken in isolation from their contexts. Finally, we ask whether the differences between $R^c_q$ and $R^{c'}_q$ thus measured are compatible with their respective contexts, by finding out whether the maximal probability of $S^c_q = S^{c'}_q$ remains the same or decreases within the overall couplings of the system. If asked about the motivation for this question, the answer is that the situations when these maximal probabilities decrease (for some $R^c_q$ and $R^{c'}_q$) indicate a special form of dependence of measurements on their contexts, beyond the difference of their distributions. There seem to be no assumptions about nature that would pertain to this distinction (see point 3 above). To summarize, the point we make in this paper is not based on the admittedly blurry distinction between definitions and assumptions. Rather, it is based on dealing with a representation, $\gamma\!\left(q, c, \lambda^c_q\right)$ or $G(q, c, \Lambda)$, that is universally applicable and therefore does not involve any ontological assumptions.
Funding: This research received no external funding.