Contextuality with Disturbance and without: Neither Can Violate Substantive Requirements the Other Satisfies

Contextuality was originally defined only for consistently connected systems of random variables (those without disturbance/signaling). Contextuality-by-Default theory (CbD) offers an extension of the notion of contextuality to inconsistently connected systems (those with disturbance) by defining it in terms of the systems’ couplings subject to certain constraints. Such extensions are sometimes met with skepticism. We pose the question of whether it is possible to develop a set of substantive requirements (i.e., those addressing a notion itself rather than its presentation form) such that (1) for any consistently connected system, these requirements are satisfied, but (2) they are violated for some inconsistently connected systems. We show that no such set of requirements is possible, not only for CbD but for all possible CbD-like extensions of contextuality. This follows from the fact that any extended contextuality theory T is contextually equivalent to a theory T′ in which all systems are consistently connected. The contextual equivalence means the following: there is a bijective correspondence between the systems in T and T′ such that the corresponding systems in T and T′ are, in a well-defined sense, mere reformulations of each other, and they are contextual or noncontextual together.


Introduction
A formal theory T of contextuality is defined by a class R of possible systems of random variables and a rule by which these systems are divided into noncontextual and contextual ones. In the original theory of contextuality (a term in which we include both the Kochen-Specker contextuality and the contextuality in distributed systems, referred to as nonlocality [1-8]), the class R is confined to consistently connected systems, or a subclass thereof. These are the systems with no "disturbance" or "signaling," which means that the variables representing the same property (answering the same question) in different contexts are identically distributed. The Contextuality-by-Default theory (CbD) extends the notion of contextuality to all systems of random variables, including those with disturbance [9,10], and it has been applied to several experimental and theoretical situations [11-18]. A recent workshop on contextuality [19] exhibited a renewed interest in studying contextuality in inconsistently connected systems, including approaches that are distinctly non-CbD-like [20-22], and some work directly critical of CbD ([23], responded to in Ref. [24]).
The present paper is not about CbD specifically. Rather, it is about the broad class of all possible CbD-like theories, as defined below. The plan and the main message of the paper are as follows. In Section 2, we present the terminology and notation to be used and define the notion of a system of random variables modeling (representing or describing) another system. In Section 3, we define the traditional notion of contextuality in the language of probabilistic couplings [25], and we introduce the notion of C-contextuality as a very broad generalization of both traditional contextuality and CbD-contextuality. In Section 4, we introduce the notion of consistification of a system and show that any theory T, irrespective of its class R of systems and the specific version of C-contextuality it uses, can be redefined as a theory T′ whose systems are consistently connected and which uses the traditional notion of contextuality. Because of this, we conclude that there can be no set of substantive requirements X for the notion of contextuality that are satisfied by all consistently connected systems but contravened by some inconsistently connected ones. Indeed, if such a set of requirements existed, one could form a theory T whose class R includes some systems contravening X. However, X would then be satisfied by the theory T′ that is contextually equivalent to T and a mere reformulation thereof. Consequently, requirements X cannot be substantive: they address the form rather than the substance of the notion of contextuality. In Section 5, we discuss some issues related to the consistified systems (the term used for the consistently connected systems in T′), including the representability thereof by hidden variable models. We also briefly discuss there a still more general (in fact, maximally general) notion of C-contextuality, one that does not have the existence-and-uniqueness property postulated for C-contextuality. In the final analysis, this does not alter the main conclusion of the paper.
The idea that consistification precludes the possibility of rejecting extended contextuality while accepting the traditional one was previously mentioned in Ref. [24]. However, it was confined to CbD only and mentioned without elaborating. The consistification procedure was first described in Ref. [13] for an older version of the CbD approach, and it was elaborated and adapted to the current version of CbD in Ref. [26]. Finally, the C-contextuality in our paper generalizes a more limited version of C-contextuality that was used in Ref. [27] as a generalization of the CbD approach.

Basic Notions
A system of random variables is a set of double-indexed random variables

$$R = \{R^c_q : c \in C,\ q \in Q,\ q \prec c\}, \qquad (1)$$

where $q \in Q$ identifies what the random variable $R^c_q$ represents (measures, responds to, or describes); $c \in C$ identifies the circumstances under which $R^c_q$ is recorded (including what other random variables are recorded together with $R^c_q$); $q$ and $c$ are referred to as, respectively, the content and the context of the random variable $R^c_q$; and the relation $q \prec c$ indicates that a variable with content $q$ is recorded in context $c$. As an example, the system in (2) has $Q = \{1, 2, 3\}$ and $C = \{1, 2, 3, 4\}$; it is displayed as a matrix whose rows correspond to contexts and whose columns correspond to contents.

The subset $R^c = \{R^c_q : q \in Q,\ q \prec c\}$ of random variables recorded in the same context $c$ (a row in the matrix) is termed a bunch, and the subset $\mathcal{R}_q = \{R^c_q : c \in C,\ q \prec c\}$ of random variables sharing a content $q$ (a column in the matrix) is termed a connection. The difference in font ($R^c$ vs. $\mathcal{R}_q$) reflects the fact that $R^c$ is a random variable in its own right (i.e., all its components are jointly distributed), whereas the components of $\mathcal{R}_q$ are not jointly distributed. In fact, no two random variables $R^c_q$ and $R^{c'}_{q'}$ are jointly distributed unless they are in the same bunch, $c = c'$. The measurable space on which $R^c_q$ is distributed is assumed to be the same for all elements of a connection and can be denoted $(A_q, \Sigma_q)$.
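The bookkeeping behind bunches and connections can be illustrated with a minimal Python sketch. It is not part of the formalism above: the names `bunches`, `marginal`, and `connection`, the restriction to 0/1-valued variables, and the numbers are all hypothetical choices made for illustration. A bunch is stored with its joint distribution, whereas a connection is stored only as a collection of individual distributions, since its variables are not jointly distributed.

```python
from fractions import Fraction as F

# Hypothetical system with Q = {1, 2} and C = {1, 2}, every content recorded in
# every context. Each bunch R^c is stored as (contents, joint distribution),
# where the joint distribution maps value tuples to probabilities.
bunches = {
    1: ((1, 2), {(0, 0): F(1, 2), (1, 1): F(1, 2)}),
    2: ((1, 2), {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 4), (1, 1): F(1, 4)}),
}

def marginal(bunch, q):
    """Distribution of the single variable R^c_q inside the bunch R^c."""
    contents, joint = bunch
    i = contents.index(q)
    out = {}
    for values, p in joint.items():
        out[values[i]] = out.get(values[i], 0) + p
    return out

def connection(bunches, q):
    """Connection for content q: one marginal per context c in which q is recorded.

    Only individual distributions are returned, because the variables of a
    connection are not jointly distributed.
    """
    return {c: marginal(b, q) for c, b in bunches.items() if q in b[0]}
```

For instance, `connection(bunches, 1)` collects the distributions of the variable with content 1 in contexts 1 and 2.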
The triple (Q, C, ≺) is called the format of the system. It is essentially the mathematical depiction of "what the system is about," that is, of the kind of empirical or theoretical situation it represents. Thus, the format of the system in (2) can be presented as a matrix in which the marked cells indicate the elements of the relation ≺. To define a system of a given format, one has to specify the distributions of its bunches.

As should be clear from the Abstract and Introduction, in this paper we use the notion of one system of random variables, B, being a "mere reformulation" of another, A. Intuitively, this means that regardless of what empirical or theoretical situation is modeled (described, represented) by A, it is also modeled by B. The relation between a system and the situation it depicts is difficult to formalize directly, as one would then have to impose some formal structure on the situation being represented before it is represented (as in the representational theory of measurement [28,29]). However, it is sufficient for our purposes to formalize a simpler relationship: that between a system A and another system that models (describes, represents) the system A. Moreover, rather than presenting this relationship in the most general possible way, it will suffice to describe one special, universally applicable construction of the modeling systems B. We will refer to this construction as canonical modeling.
Consider two classes of systems, R and R † , in a bijective correspondence to each other, about which we say that any system in R is canonically modeled by the corresponding system in R † . The following definition gives a precise meaning to this relation.

Definition 1.
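We say that a system $R \in \mathcal{R}$ with format $(Q, C, \prec)$ is canonically modeled by a system $R^\dagger \in \mathcal{R}^\dagger$ with format $(Q^\dagger, C^\dagger, \prec^\dagger)$ if

$$Q^\dagger = \{(q,c) : q \in Q,\ c \in C,\ q \prec c\},$$

$$C^\dagger = \{(\cdot,c) : c \in C\} \cup \{(q,\cdot) : q \in Q\},$$

$$(q,c) \prec^\dagger (\cdot,c), \qquad (q,c) \prec^\dagger (q,\cdot),$$

and the bunches of $R^\dagger$ are of two kinds: the main bunches $R^{(\cdot,c)} = \{R^{(\cdot,c)}_{(q,c)} : q \prec c\}$, with $R^{(\cdot,c)} \overset{d}{=} R^c$, and the auxiliary bunches $R^{(q,\cdot)} = \{R^{(q,\cdot)}_{(q,c)} : q \prec c\}$, whose distributions are uniquely determined by the sets $\mathcal{R}_{(q,\cdot)} = \{R^{(\cdot,c)}_{(q,c)} : c \in C,\ q \prec c\}$.

Here, the symbol $\overset{d}{=}$ stands for "has the same distribution as". The dot symbol in $(\cdot, c)$ and $(q, \cdot)$ should be taken as part of the names of these contexts. We choose this notation to emphasize that every random variable $R^c_q$ of the system $R$ is placed in $R^\dagger$ within two contexts, $(\cdot, c)$ and $(q, \cdot)$, whose names are derived from the indices of the variable. Note that the variables in the set $\mathcal{R}_{(q,\cdot)}$ defined here have the same distributions as the corresponding variables in $\mathcal{R}_q = \{R^c_q : q \prec c\}$. We use the former set, however, to emphasize that the auxiliary bunches are uniquely determined by the corresponding variables in the main bunches. Note that the variables in $\mathcal{R}_{(q,\cdot)}$ are not jointly distributed, so $R^{(q,\cdot)}$ depends on their individual distributions only.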
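The index manipulation in Definition 1 can be sketched in a few lines of Python. This is an illustration only: the function name `canonical_format`, the use of the string `"."` for the dot in the context names, and the small example format are assumptions made here, not notation from the paper. The sketch places every content (q, c) in exactly two contexts, (·, c) and (q, ·).

```python
def canonical_format(Q, C, rel):
    """Build the format of a canonical model from the format (Q, C, rel).

    rel is the set of pairs (q, c) such that content q is recorded in context c.
    The dot in the context names is represented by the string ".".
    """
    Q_dag = sorted(rel)                                    # contextualized contents (q, c)
    C_dag = [(".", c) for c in C] + [(q, ".") for q in Q]  # main contexts, then auxiliary ones
    rel_dag = set()
    for (q, c) in rel:
        rel_dag.add(((q, c), (".", c)))  # the variable sits in the main context (., c)
        rel_dag.add(((q, c), (q, ".")))  # and in the auxiliary context (q, .)
    return Q_dag, C_dag, rel_dag

# Hypothetical format: two contents, two contexts, every content in every context.
Q, C = [1, 2], [1, 2]
rel = {(1, 1), (2, 1), (1, 2), (2, 2)}
Q_dag, C_dag, rel_dag = canonical_format(Q, C, rel)
```

Only the format is built here; the distributions of the main and auxiliary bunches have to be specified separately.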
To give an example, consider the systems A and B in (4). Observe that in system B the contents, contexts, and the relation between them are constructed in accordance with Definition 1. System B canonically models system A if and only if there is a rule by which the distribution of each auxiliary bunch of B is uniquely determined by the distributions of the corresponding variables in its main bunches. Observe the following properties of canonical modeling.

1. The formats of R and R† are reconstructible from each other, and so are the bunches of the two systems. Moreover, R† faithfully replicates the bunches of R. This allows one to say that R and R† describe the same empirical or theoretical situation.

2. One might wonder why we need the auxiliary contexts at all, and they are indeed unnecessary if all one wants is a system modeling another system, e.g., a system consisting of the main bunches only. However, we will see the utility of the auxiliary contexts when we introduce consistifications and contextual equivalence, in Section 4.

3. The contents in the modeling system are "contextualized". For instance, system A in (4) may be describing an experiment in which two questions, q = 1 and q = 2, are asked in two orders, c = 1 indicating "1 then 2" and c = 2 indicating "2 then 1" [30,31]. In this case, in the modeling system, the content (1, 2) should be interpreted as "question 1 asked second", and the content (1, 1) should be interpreted as "question 1 asked first". We return to the issue of interpretation in Section 5.1.

4. The indexation of the variables in a canonical model is clearly redundant, and it can be simplified. It is more important, however, to maintain the general logic of indexing the variables by their contents and contexts.

Traditional and Extended Contextuality
A system $R$ is consistently connected if in every connection $\mathcal{R}_q$ all its constituent variables have one and the same distribution. Otherwise, the system is inconsistently connected. (The latter term is also used to designate arbitrary systems, i.e., in the meaning of "not necessarily consistently connected".)

An overall coupling of a system $R$ in (1) is an identically labeled system

$$S = \{S^c_q : c \in C,\ q \in Q,\ q \prec c\}$$

of jointly distributed random variables such that its bunches $S^c$ are distributed as the corresponding bunches $R^c$,

$$S^c \overset{d}{=} R^c \quad \text{for all } c \in C.$$

Clearly, $S$ has the same format as $R$. A coupling $S_q$ of a connection $\mathcal{R}_q$ is a set of jointly distributed random variables $\{S^c_q : c \in C,\ q \prec c\}$ such that $S^c_q \overset{d}{=} R^c_q$ for all its elements. A connection coupling $S_q$ is said to be an identity coupling if $S^c_q = S^{c'}_q$ for any two of its elements. Obviously, such a coupling exists if and only if all of its elements (equivalently, all elements of the connection $\mathcal{R}_q$) have one and the same distribution. Moreover, the identity coupling is unique if it exists. (The uniqueness of a coupling should always be understood as the uniqueness of its distribution. In other words, it is irrelevant on what domain probability space the coupling is defined as a random variable.)

The traditional notion of contextuality is confined to consistently connected systems, and it can be rigorously defined in our terminology as follows.
Definition 2. A consistently connected system R ∈ R is noncontextual if it has a coupling S in which any connection S q is the identity coupling of the connection R q . Otherwise, the system is contextual.
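Definition 2 applies only to consistently connected systems, so it can be useful to test this precondition first. The following Python sketch does so under assumptions made purely for illustration: the function name `is_consistently_connected` is hypothetical, each connection is given as a dictionary from contexts to distributions, and each distribution is a dictionary from values to probabilities that lists the same set of values in every context.

```python
from fractions import Fraction as F

def is_consistently_connected(connections):
    """connections: {q: {c: distribution of R^c_q as a dict value -> probability}}.

    Returns True if, within every connection, all variables share one distribution.
    Distributions are compared as dictionaries, so they are assumed to list the
    same values (including zero-probability ones) in every context.
    """
    for by_context in connections.values():
        dists = list(by_context.values())
        if any(d != dists[0] for d in dists[1:]):
            return False
    return True

# Hypothetical connections for a single content q = 1 recorded in two contexts.
consistent = {1: {1: {0: F(1, 2), 1: F(1, 2)}, 2: {0: F(1, 2), 1: F(1, 2)}}}
inconsistent = {1: {1: {0: F(1, 2), 1: F(1, 2)}, 2: {0: F(1, 4), 1: F(3, 4)}}}
```

Here `is_consistently_connected(consistent)` holds, while the second example exhibits disturbance in the connection for q = 1.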
The class of all possible systems $R$ in a theory T is denoted $\mathcal{R}$. For instance, $\mathcal{R}$ may contain only the systems with finite sets $Q$ and $C$, or only the systems with dichotomous random variables. By constraining the class $\mathcal{R}$, one induces constraints on all possible random variables, $R^c_q \in \mathcal{R}^+_+$, on bunches of random variables, $R^c \in \mathcal{R}^+$, and on possible connections, $\mathcal{R}_q \in \mathcal{R}_+$.
In CbD, contextuality of a system R is defined by considering its couplings S and determining if, in some of them, the couplings S q of the system's connections R q satisfy a certain statement. To generalize this definition to all possible CbD-like theories, all one has to do is to replace this specific statement with one that is (almost) arbitrary. Let C be any statement of the form "the coupling of connection R q has the following properties: . . . ". The only constraints we impose on C are as follows.
Definition 3. C is considered well-fitting if (1) for any connection R q ∈ R + , there is one and only one coupling S q of R q that satisfies C, and (2) if R q consists of identically distributed random variables, then the coupling that satisfies C is the identity coupling. We denote such a coupling of R q as C R q .
To give an example of a well-fitting statement C: in CbD, if the class R of all possible systems is confined to the systems with dichotomous variables, the well-fitting statement is C = "for any two random variables S c 1 q and S c 2 q in the coupling of connection R q , the probability of S c 1 q = S c 2 q is maximal possible". Another example: if the class R of all possible systems is confined to the systems with real-valued (or more generally, linearly ordered) variables, then a well-fitting statement can be C = "for any two random variables S c 1 q and S c 2 q in the coupling of connection R q , S c 1 q and S c 2 q have the same quantile rank". In Section 5.3, we discuss the possibility of dropping the first of the two defining properties of a well-fitting statement C.
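For dichotomous variables, one standard way to realize a coupling in which every pair of variables coincides with maximal possible probability is the comonotone construction: all variables of the connection are driven by a single uniform variable U, with each variable equal to 1 exactly when U falls below its own probability of 1. The Python sketch below implements this construction for 0/1-valued variables; the function name `multimaximal_coupling` and the input format are hypothetical. For any pair it yields a coincidence probability of 1 - |p1 - p2|, which is the upper bound for two 0/1 variables with these marginals, and it reduces to the identity coupling when all probabilities are equal. In the dichotomous case this is also the coupling in which all variables have the same quantile rank.

```python
from fractions import Fraction as F

def multimaximal_coupling(p):
    """p: {c: P(R^c_q = 1)} for one connection of dichotomous (0/1) variables.

    Returns (contexts, joint), where joint maps value tuples (ordered as
    contexts) to probabilities of the coupling S^c_q = 1 if U < p[c] else 0,
    with U uniform on [0, 1).
    """
    contexts = sorted(p)
    cuts = sorted(set(p.values()) | {F(0), F(1)})
    joint = {}
    for lo, hi in zip(cuts, cuts[1:]):
        # For U in [lo, hi), the variable in context c equals 1 exactly when p[c] > lo.
        values = tuple(1 if p[c] > lo else 0 for c in contexts)
        joint[values] = hi - lo
    return contexts, joint

# Hypothetical connection with disturbance: P(=1) is 1/2 in context 1, 3/4 in context 2.
contexts, joint = multimaximal_coupling({1: F(1, 2), 2: F(3, 4)})
# The two variables coincide on (1, 1) and (0, 0), with total probability 3/4.
```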
Definition 4. Given a well-fitting C, a system R is C-noncontextual if it has a coupling S such that, for any connection R q of the system, the connection coupling S q coincides with C R q . Otherwise, the system is C-contextual.

Equivalence and Impossibility Theorems
It follows from the last two definitions that, for a well-fitting C, a consistently connected system is C-noncontextual if and only if it is noncontextual in the traditional sense (i.e., in the sense of Definition 2). In other words, any extension of the notion of contextuality using a well-fitting C properly reduces to the traditional notion when confined to consistently connected systems. This alone, obviously, is not sufficient for the extension of contextuality by means of C to be considered well-constructed. There may be other desiderata for a well-constructed notion of contextuality, and a specific choice of C may not satisfy them. The question we pose now is as follows.

Q*: Is it possible to formulate a set of desiderata/requirements X for the notion of contextuality such that, for some choice of C, (1) X is satisfied for any consistently connected system, but (2) X is not satisfied for some inconsistently connected systems?
Note that we impose no constraints on what X may entail, except for its being related to contextuality. It may, e.g., for some relation B between systems, have the form "if system R 1 is (non)contextual, then any system R 2 related to R 1 by B is (non)contextual" [24].
To answer the question Q*, we need the following result.

Theorem 1.
For any well-fitting C and system R, there is a consistently connected system R ‡ that canonically models it (Definition 1), such that R is C-contextual (Definition 4) if and only if R ‡ is contextual in the traditional sense (Definition 2).
Proof. Let $R^\ddagger$ be a canonically modeling system for $R$, with

$$R^{(q,\cdot)} \overset{d}{=} \mathsf{C}\mathcal{R}_q \quad \text{for every } q \in Q. \qquad (*)$$

One can check that $R^\ddagger$ is consistently connected: every connection $\mathcal{R}_{(q,c)}$ of $R^\ddagger$ consists of precisely two variables, $R^{(\cdot,c)}_{(q,c)}$ and $R^{(q,\cdot)}_{(q,c)}$, and both are distributed as $R^c_q$. Indeed, $R^{(\cdot,c)}_{(q,c)} \overset{d}{=} R^c_q$ because $R^{(\cdot,c)} \overset{d}{=} R^c$ in any canonically modeling system, and $R^{(q,\cdot)}_{(q,c)} \overset{d}{=} R^c_q$ because we know from (*) that $R^{(q,\cdot)}_{(q,c)} \overset{d}{=} S^c_q$, where $S^c_q \in \mathsf{C}\mathcal{R}_q$. The system $R^\ddagger$ thus constructed is referred to as a consistification of $R$. We can now define the consistification $S^\ddagger$ of a coupling $S$ of a system in precisely the same way as for the system itself, except that (*) is replaced with the straightforward $S^{(q,\cdot)} = S_q$, with the obvious correspondence between the different indexations within the two random vectors. Clearly, $S^\ddagger$ is a coupling of $R^\ddagger$.

Assume now that $R$ is C-noncontextual. This means that it has a coupling $S$ such that (a) $S^c \overset{d}{=} R^c$ for every $c \in C$ and (b) $S_q = \mathsf{C}\mathcal{R}_q$ for every $q \in Q$. Then, in the coupling $S^\ddagger$ of system $R^\ddagger$, we have (a') $S^{(\cdot,c)} \overset{d}{=} R^{(\cdot,c)}$ for every $(\cdot, c) \in C^\ddagger$, and (b') $S^{(q,\cdot)} = \mathsf{C}\mathcal{R}_q$ for every $(q, \cdot) \in C^\ddagger$. Moreover, since both $S^{(\cdot,c)}_{(q,c)}$ and $S^{(q,\cdot)}_{(q,c)}$ equal $S^c_q$, we have (c') $S^{(\cdot,c)}_{(q,c)} = S^{(q,\cdot)}_{(q,c)}$. However, (a')-(b')-(c') mean that $R^\ddagger$ is noncontextual in the traditional sense. The implication here is easily seen to be reversible, and we conclude that $R$ is C-noncontextual if and only if $R^\ddagger$ is noncontextual in the traditional sense.

In our example (4), B is a consistification of A if we specify the rule for the auxiliary bunches as follows: $R^{(q,\cdot)}_{(q,c)} \overset{d}{=} R^{(\cdot,c)}_{(q,c)}$, and the distribution of $R^{(q,\cdot)}$ is the same as that of $\mathsf{C}\mathcal{R}_q$.
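The consistification procedure used in the proof can be sketched in Python for the special case of dichotomous variables, with C chosen as the requirement that the variables of a connection coincide pairwise with maximal possible probability. Everything named here is an assumption made for illustration: the functions `marginal_p1`, `multimaximal`, and `consistify`, the data layout, and the example numbers. Main bunches copy the original bunches with contents renamed to (q, c); auxiliary bunches are filled with the comonotone coupling of the corresponding connection, which attains the maximal pairwise coincidence probabilities for 0/1 variables.

```python
from fractions import Fraction as F

def marginal_p1(bunch, q):
    """P(variable with content q equals 1), computed from (contents, joint)."""
    contents, joint = bunch
    i = contents.index(q)
    return sum((p for v, p in joint.items() if v[i] == 1), F(0))

def multimaximal(p):
    """Comonotone coupling of 0/1 variables with P(=1) given by p = {c: p_c}."""
    contexts = sorted(p)
    cuts = sorted(set(p.values()) | {F(0), F(1)})
    joint = {tuple(1 if p[c] > lo else 0 for c in contexts): hi - lo
             for lo, hi in zip(cuts, cuts[1:])}
    return contexts, joint

def consistify(bunches):
    """Return the bunches of the consistified system, keyed by (., c) and (q, .)."""
    out = {}
    # Main bunches (., c): same joint distribution, contents renamed to (q, c).
    for c, (contents, joint) in bunches.items():
        out[(".", c)] = (tuple((q, c) for q in contents), dict(joint))
    # Auxiliary bunches (q, .): the chosen coupling of the connection for q.
    all_q = sorted({q for contents, _ in bunches.values() for q in contents})
    for q in all_q:
        p = {c: marginal_p1(b, q) for c, b in bunches.items() if q in b[0]}
        contexts, joint = multimaximal(p)
        out[(q, ".")] = (tuple((q, c) for c in contexts), joint)
    return out

# Hypothetical inconsistently connected system with two contents and two contexts.
bunches = {
    1: ((1, 2), {(1, 1): F(1, 2), (0, 0): F(1, 2)}),
    2: ((1, 2), {(1, 1): F(1, 4), (1, 0): F(1, 2), (0, 0): F(1, 4)}),
}
R_dd = consistify(bunches)
```

In the result, every contextualized content (q, c) occurs in exactly two bunches, (·, c) and (q, ·), with the same marginal distribution in both, so the consistified system is consistently connected.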
If C is chosen as in CbD, the consistification of the system $R$ in (2) is the system in (11) (written, for simplicity, without the parentheses and commas in $R^{(\cdot,c)}_{(q,c)}$ and $R^{(q,\cdot)}_{(q,c)}$), where all variables are assumed to be dichotomous and, in each of the auxiliary bunches, the variables are pairwise equal with maximal possible probability.

For the purposes of contextuality analysis, $R^\ddagger$ can be viewed as a mere reformulation of $R$, a different form of the same substance. We express this fact by saying that $R$ and $R^\ddagger$ are contextually equivalent. (In Refs. [24,26], contextual equivalence is defined more narrowly, requiring also the numerical coincidence of certain measures of contextuality, such as contextual fraction [32]. In this paper, however, the level of abstraction is higher, and we only consider the notion of contextuality rather than its quantifications.)

Consider now a theory of (generally, extended) contextuality $T = T(\mathsf{C}, \mathcal{R})$. In accordance with Theorem 1, we can form the class $\mathcal{R}^\ddagger$ of the consistifications of the elements of $\mathcal{R}$, in a bijective correspondence with $\mathcal{R}$. By extension of the term, we can say that $T$ and $T' = T(\mathsf{C}_0, \mathcal{R}^\ddagger)$ are contextually equivalent. Here, $\mathsf{C}_0$ denotes the statement "the connection $\mathcal{R}_q$ has an identity coupling" that underlies the traditional notion of contextuality; by definition, it can be viewed as a special case of any well-fitting statement C.

We have now everything we need to demonstrate our main conclusion. Let there be a set of requirements X of the notion of contextuality that are satisfied by all consistently connected systems (using the traditional contextuality) and contravened by some inconsistently connected ones, using some version of C-contextuality. Let T include some of the inconsistently connected systems contravening X. Clearly then, requirements X contradict theory $T$, but they are satisfied by the contextually equivalent theory $T' = T(\mathsf{C}_0, \mathcal{R}^\ddagger)$. Therefore, X is not a set of substantive requirements. We can summarize this as a formal theorem.

Theorem 2.
For any well-fitting C, there can be no set of substantive requirements X of the notion of contextuality that are satisfied by all consistently connected systems (using the traditional contextuality) and contravened by some inconsistently connected ones, using C-contextuality.
Of course, a set of requirements X satisfied by T′ but not by T can be readily formulated. The theorem says, however, that all such a set can do is lead one to prefer one of two equivalent representations of contextuality, without affecting the substance of the notion.
Note also that in the theorem just formulated, we assume no relationship between the set of requirements X and the bijective correspondence relating $\mathcal{R}$ to $\mathcal{R}^\ddagger$. In particular, let X have the form "if system $R_1$ is contextual, then any system $R_2$ related to $R_1$ by relation B is contextual." It is not necessary then, although not excluded either, that $R_2^\ddagger$ is also related to $R_1^\ddagger$ by relation B. All that is stated in the theorem above is that if one wishes to use this X as a substantive principle in testing competing theories, then the failure of a theory to satisfy it cannot be selectively attributed to the fact that its $\mathcal{R}$ contains inconsistently connected systems.

Miscellaneous Remarks
Here, we consider a few issues related to the main point of this paper.

Interpretation of Contents and Contexts
Dealing with consistified systems $R^\ddagger$, one needs to get used to a new interpretation of the contents and contexts of the random variables: as mentioned previously, in $R^\ddagger$, contents are "contextualized," with $(q, c)$ in place of just $q$, and the contexts are simply marginalized contents, $(\cdot, c)$ and $(q, \cdot)$. Consider as an example the EPR/Bohm experiment, the most widely investigated paradigm in contextuality/nonlocality research [1,33,34]. In the usual CbD notation, the system representing it is the system A in (12), where $q = 1$ and $q = 3$ denote the two settings (axes) between which Alice chooses, $q = 2$ and $q = 4$ denote the two settings between which Bob chooses, $c$ indicates the combination of their choices, and the $R^c_q$ are dichotomous (spin-up/spin-down) variables. The consistified representation of the same experiment is the system A‡ in (13) (again written without the parentheses and commas in the indexation). The interpretation of, say, the content $(3, 2)$ here is as follows: it is the choice of axis 3 (that we know to be made by Alice) when Bob's choice of his axis forms combination 2 with Alice's choice (which we know to mean that Bob chooses axis 2). The interpretation of context $c = (\cdot, 2)$ is that it is simply the set of contents whose second component is 2. Similarly, $c = (3, \cdot)$ is the set of contents whose first component is 3. The random variables within context $c = (\cdot, 2)$ are jointly distributed by observation, whereas the random variables within context $c = (3, \cdot)$ are jointly distributed by computation (that, in turn, is uniquely determined by the observations). If C is defined in accordance with CbD,

Hidden Variable Models
One possible argument against contextuality in inconsistently connected systems is that it is not distinguishable from inconsistent connectedness itself in the language of hidden variable models (HVMs). If, the argument goes, a consistently connected system $R$ in (1) is noncontextual, it has a coupling $S$ in which all random variables can be presented as

$$S^c_q = F(q, \Lambda), \qquad (14)$$

where $\Lambda$ is a "hidden" random variable [35]. If $R$ is contextual, then all its couplings can only be presented as

$$S^c_q = F(q, c, \Lambda), \qquad (15)$$

with ineliminable $c$. However, the latter HVM representation is also required for all inconsistently connected systems, irrespective of whether they are C-contextual or C-noncontextual. We would argue in response that this only means that on this general level (merely showing the arguments of the functions), the language of HVMs is too crude to capture the subtler properties of the couplings, such as contextuality under inconsistent connectedness. However, even if one takes this issue as a matter of concern, it is eliminated by the consistification procedure. The system $R^\ddagger$ corresponding to $R$ is noncontextual if and only if it has a coupling $S^\ddagger$ such that, for all $(q, c) \in Q^\ddagger$,

$$S^{(\cdot,c)}_{(q,c)} = S^{(q,\cdot)}_{(q,c)} = F((q,c), \Lambda), \qquad (16)$$

for some random variable $\Lambda$. If $R^\ddagger$ is contextual, then in all its couplings, for some $(q, c) \in Q^\ddagger$, $S^{(\cdot,c)}_{(q,c)} \neq S^{(q,\cdot)}_{(q,c)}$, which means that their HVM representations can only be different functions,

$$S^{(\cdot,c)}_{(q,c)} = F((q,c), \Lambda), \qquad S^{(q,\cdot)}_{(q,c)} = G((q,c), \Lambda), \qquad (17)$$

or, equivalently, the same function but with differently distributed hidden variables.

It is instructive to apply this to the EPR/Bohm systems A and A‡ in (12) and (13). Here, contextuality is traditionally referred to as nonlocality because for the contextual system A, all its couplings are represented in the form of (15): the ineliminable dependence on $c$ here is interpreted as the dependence of a measurement on a remote setting. However, if one models the EPR/Bohm experiment by system A‡ instead, the HVM representations (16) and (17) both contain the contextualized content $(q, c)$ as an argument. Following the logic above, they should both be considered nonlocal, even though one of them represents a noncontextual system and is equivalent to (14), while the other represents a contextual system and is equivalent to (15). It seems to us, in agreement with other authors [36], that this demonstration speaks against a naturalistic interpretation of the HVMs in terms of physical dependences.

The Existence and Uniqueness Constraint
In the definition of C-couplings, their reducibility to identity couplings when applied to identically distributed variables is indispensable because without it, C-contextuality would not be an extension of traditional contextuality. How critical, however, is the other constraint imposed on a well-fitting C, namely that the coupling $\mathsf{C}\mathcal{R}_q$ always exists and is unique? What if one considers a statement C for which $\mathsf{C}\mathcal{R}_q$ is a set that may be empty or contain more than one coupling? This complicates matters conceptually because then, in the consistification procedure, the $(q, \cdot)$-type bunches, those filled with the $\mathsf{C}\mathcal{R}_q$-couplings, cannot be formed uniquely or cannot be formed at all. However, the main point of this paper can still be made, with some qualifications.
We can agree that the consistification of an inconsistently connected system $R$ is not a single system $R^\ddagger$ but a cluster of systems $\{R^\ddagger_i : i \in I\}$, the elements of which are obtained by filling the $(q, \cdot)$-type bunches in the consistification of $R$ by all possible couplings of $R$'s connections that satisfy C. We can further agree that the cluster $\{R^\ddagger_i : i \in I\}$ is considered noncontextual if it contains a noncontextual system $R^\ddagger_i$. In particular, if $\{R^\ddagger_i : i \in I\}$ is empty (which means that $\mathsf{C}\mathcal{R}_q$ does not exist for at least one of the connections of $R$), the latter definition is not satisfied, and the cluster should be considered contextual. Once again, we have a theory dealing with consistently connected systems only, except that the empirical or theoretical situations they depict are represented by clusters of systems sharing a format and the $(\cdot, c)$-bunches.
It might seem that dealing with an infinity of possible couplings $\mathsf{C}\mathcal{R}_q$, or proving that $\mathsf{C}\mathcal{R}_q$ is empty, is a significantly more difficult mathematical task than the one arising when C is well-fitting. This is not necessarily the case, however. Mathematically, the problem of finding whether a system $R$ is contextual consists in determining whether the variable $S$ having the same format as $R$ can be assigned a probability measure subject to certain constraints on its marginals. The constraints are imposed by the distributions of the bunches $R^c$ (which the $S^c$ have to match) and by the statement C, which has to be satisfied by the couplings $S_q$ of the connections $\mathcal{R}_q$. For discrete random variables and finite sets $Q$ and $C$, this is a linear programming task, provided that compliance with C can be presented in terms of linear inequalities on the probabilities in the distribution of $S_q$. For the consistification $R^\ddagger$ the problem is precisely the same, except that in place of connection couplings, one deals with $(q, \cdot)$-type bunches.
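The linear programming task just described can be written out explicitly. The following is a sketch for the simplest case of a well-fitting C, finite $Q$ and $C$, and finitely many values per variable. The symbols $x$, $s$, $r$, and $t$ are introduced here for illustration only: $s$ ranges over assignments of values to all variables of the system, $s^c$ and $s_q$ denote the restrictions of $s$ to the variables in context $c$ and to the variables with content $q$, and $\mathsf{C}\mathcal{R}_q$ is the coupling of the connection singled out by C in Definition 3.

```latex
% Unknowns: x(s) = \Pr[S = s], one for each assignment s of values
% to all variables S^c_q of the system (q \prec c).
\begin{align*}
  & x(s) \ge 0
      && \text{for every assignment } s, \\
  & \sum_{s \,:\, s^c = r} x(s) = \Pr[R^c = r]
      && \text{for every context } c \text{ and every value } r \text{ of the bunch } R^c, \\
  & \sum_{s \,:\, s_q = t} x(s) = \Pr[\mathsf{C}\mathcal{R}_q = t]
      && \text{for every content } q \text{ and every value } t \text{ of the connection coupling.}
\end{align*}
% The system R is C-noncontextual if and only if these constraints are feasible.
```

Normalization of $x$ follows from the bunch constraints. When C is not well-fitting, the third family of equalities would be replaced by whatever linear equalities or inequalities express compliance with C on the marginals $\sum_{s : s_q = t} x(s)$.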
Author Contributions: The authors have contributed equally to all aspects of this work. All authors have read and agreed to the published version of the manuscript.