The Violation of Bell-CHSH Inequalities Leads to Different Conclusions Depending on the Description Used

Since the experimental observation of the violation of the Bell-CHSH inequalities, much has been said about the non-local and contextual character of the underlying system. However, the hypothesis from which Bell’s inequalities are derived differ according to the probability space used to write them. The violation of Bell’s inequalities can, alternatively, be explained by assuming that the hidden variables do not exist at all, that they exist but their values cannot be simultaneously assigned, that the values can be assigned but joint probabilities cannot be properly defined, or that averages taken in different contexts cannot be combined. All of the above are valid options, selected by different communities to provide support to their particular research program.


Introduction
Quantum mechanics is a probabilistic theory, due to the central role played by the Born rule to relate the calculations with the observations. These and other characteristics motivated Einstein and others to postulate that quantum mechanics must also be incomplete, statistical in nature, and that some unknown, hidden variables must exist which could provide a deeper understanding and a more detailed description of the observations at the microscopic level [1,2]. Bell inequalities offer a way to contrast predictions of certain hidden variable theories with experimental observations [3]. The straightforward argument, the simple mathematics involved, and the strong conclusions drawn from it put Bell inequalities in a special place for the physicists, philosophers, and people interested in quantum phenomena.
Since the first experimental observation of the violation of a Bell's inequality [4], there have been a variety of interpretations with high impact in both the description of the world from a quantum physics perspective and in technological developments. For a large number of authors, models describing the quantum realm are not consistent with local realism [5,6], local causality [7,8], and non-contextuality [9]. A large variety of experiments have been performed to test the validity of the main assumptions needed to derive Bell's inequality [10][11][12][13], and in 2015, the three most relevant loopholes were closed at the same time [14][15][16], which was interpreted as providing strong support to the impossibility to formulate local realistic theories that are compatible with the observed violation of Bell's inequalities.
In the area of quantum information, violations of Bell's inequality are used to characterize properties associated exclusively with quantum systems [17]. It has been postulated that the outputs of quantum measurements, when the system is prepared and measured on a different basis, are "intrinsically random" [18][19][20], and guarantee the randomness of random number generators [21][22][23].
Randomness and probability are closely related in the analysis of quantum phenomena [24]. It has been argued that the Hilbert space formalism of quantum mechanics is a new theory of probability [25]. While some authors have pointed out that non-locality, as well as rejection of realism, are only sufficient (but not necessary) conditions for violation of Bell's inequality [26,27] and that the Bell inequalities only need to be satisfied if all observables can be measured jointly [28], it seems that the analysis of the violation of Bell's inequality from the probability theory point of view is not fully understood in the physics community. In this contribution, our aim is to add more elements, in the common language of probability, to include a variety of interpretations of the violation of Bell's inequality beyond non-locality.
Our starting point is the probability space, which is presented in detail in the next section. We adhere to the view that different probability spaces lead to different Bell's inequalities [29,30]. While the inequalities look similar, they are not equivalent. They are different mathematical objects with different sets of hypothesis, which, when questioned, generate completely different conclusions: non-locality, contextuality (setting dependence), impossibility to assign values to unmeasured quantities, impossibility to define a joint probability distribution, etc. Our intention is to expose that there is no unique way to decide which inequality is employed in the analysis of a Bell-type experiment, and the interpretation depends on this choice, which in turn elicits the ontological and epistemological possibilities according to each decision.
Bell's inequalities refer to the probabilities of finding correlations between experimental outputs of a relatively simple experimental setup. Although the quantum mechanical description perfectly matches the experimental results, Bell's inequalities have consequences far beyond quantum physics, and they apply to a variety of generalized probabilistic theories [31]. This adds to its relevance, but at the same time, there is no common basis to which all communities adhere for its application. We hope this contribution will serve as an invitation to recognize the diversity of interpretations, all of them equally valid in their own terms and with different consequences.
The remainder of the article is organized as follows: Section 2 presents a short review of Kolmogorov axioms, and Section 3 describes a Bell-type experiment and the different probability spaces which can be employed in its description. Section 4 contains the derivation of the different Bell's inequalities, describing the hypothesis involved in each of them. The interpretations of the violation of the inequalities are presented in Section 5, and the conclusions in Section 6.

A Short Review of Kolmogorov Axioms
In the measure theoretic view, axioms of probability assume the existence of a sample space, an event space, and a probability measure [32,33]. These three elements, commonly referred to as the probability space, must be properly defined when probability is invoked.
One important motivation for this contribution is that, in many cases, operative rules are provided without making explicit the probability space in which they are being employed. In the case of physics, many textbooks, basic or advanced, particularly those devoted to quantum mechanics, tend to introduce probability as if it were another physical quantity, such as charge, mass, etc. [34][35][36][37]. In the following sections, we show that the implications of the violation of the Bell inequalities are radically affected when different probability spaces are employed.
The sections below present a modern presentation of the probability space [32,38,39]. The first element in the list is the sample space, commonly represented by Ω. This is a set whose elements represent the possible outputs of a trial or experiment. In a coin flip, the set {heads, tails} is the common election for a sample space. On a dice, the sample space is the numbers one through six: {1, 2, 3, 4, 5, 6}.
The sample space has to fully characterize the outputs of the experiment. That is its main feature. Every output not taken into account in the sample space will be ignored. For instance, in the coin flip case, the case where the coin lands on its edge is excluded.
Probability is commonly discussed in relation to games such as dice, coins, roulette, etc. In those scenarios, the sample space appears almost trivial, where it is just the list of possible outcomes in the experiment. However, the situations where probability is applied can be far wider than the former scenarios, making it more challenging to define the sample space.

The Event Space
The second element in the list is the event space. This space is meant to represent the situations in which a probability is associated. It is composed of subsets of sample space. Each of these subsets is referred as an event. While the sample space represents all the possible outputs, the event space can include the representation of more complex situations.
For instance, in the dice example, the probability of the output being even is represented by the event {2, 4, 6}, i.e., the set that contains all the possible outcomes that fulfil the criterion of "being even". Another example is: "the output is six", the event is {6}, this is a set whose only element is 6. Important events are {1, 2, 3, 4, 5, 6}, which can be read as "any of the possible outputs", and the "null event", that is the empty set {}, which can be interpreted as "none of the possible outputs".
The event space does not necessarily include all possible subsets of the sample space, it only needs to fulfil three conditions, as follows: 1.
It must contain the sample space. Therefore, there is an event, the "total" event, that contains all the possible outputs of the experiment. Just as we explained above, for a dice, this event is {1, 2, 3, 4, 5, 6}.

2.
If an event belongs to the space, then its "complement" also belongs to the space. For instance, if the total event is in the space, then the null event must also be in the space. For the dice, if the event containing all the even outputs is in the event space, then the event {1, 3, 5}, the one containing all odd outputs, must also be in the space.

3.
The space must be closed under countable unions and intersections. That is, if the space has the event {2, 4, 6} and also has the event {1, 2, 3, 4}, then the event {1, 2, 3, 4, 6} and the event {2, 4} must also be in the space. This can be interpreted as "the event where the output is even and is less than 5". In this example, the word "and" refers to the intersection of subsets. In a similar way, the union of subsets is referred to with the word "or", as in "the event where the output is even or less than 5".
It is relevant to mention two extreme cases. An event space can be formed using only the "total event" and the "null" event. These two events are enough to have a proper event space. This is called the "smallest" event space possible.
On the other side of the spectrum, there is the "biggest" event space. This is the set of all the possible subsets, the power set of the sample space. In some cases, it is a natural way to represent all the possible events, but in many situations, there are restrictions which exclude some elements of the biggest event space.

The Probability Measure
The third element in the list is the probability measure, P. It is a function that assigns a probability to each event, i.e., a real number. It is important to note that such function must be total, i.e., the probability measure must be defined in every element of the event space.
In the smallest event space, only the total event and the null event will have a probability. In the biggest event space, all the possible events have a probability.
Having defined the probability space, the Kolmogorov axioms can be enunciated.

Kolmogorov Axioms
The Kolmogorov axioms are considered the foundations of modern probability theory [33]. Given a sample space, Ω, an event space, E, and a probability measure, P, such measures must fulfil: if E i ∩ E j = ∅, then where E, E i , E j are elements of the event space. These axioms constrain probabilities to have values between zero and one, and other common properties attributed to probabilities. Notice that the third axiom refers to mutually exclusive events, those which cannot occur at the same time or in the same run [40]. It can be extended to include the case In the following sections, we employ the notation P(E 1 ,

Conditional Probabilities
One of the most important concepts in probability theory is the conditional probability, P(E 1 |E 2 ) [41], the probability of event E 1 given that the event E 2 has occurred. In Kolmogorov's framework, it is introduced as a derived concept: In the case that two events are statistically independent, i.e., P(E 1 , E 2 ) = P(E 1 )P(E 2 ), the conditional probability is unaffected by the introduction of the conditional, since P(E 1 |E 2 ) = P(E 1 ).
For the present analysis, it is important to highlight the fact that a conditional probability is not associated with an event in Kolmogorov's framework.

Probabilities and Relative Frequencies
Relative frequencies are the quantities actually observed in an experiment. When an experiment is repeated M times, and an event takes place N E times, the relative frequency of the event, E, is defined as N E M . These quantities are expected to be approximately equal to probability values as M grows: How tight this equality should hold is not specified in Kolmogorov's framework.

Expected Values
In many situations, it is useful to associate probability with variables instead of with events. To do so, it is a common practice to use random variables as an abbreviation for events. For instance, the probability P(X = x) is the probability of the event "the quantity X assumed the value x".
For any random variable in a probability space, with probability P(X), its expectation value is defined as f (X) . This quantity is expected to have a value close to the average value observed in experiments: where x i is the result of the i-th run of the experiment. Please note that the above definition does not specify the probability space to which these probabilities belong. For instance, X can be in a probability space that also includes Y. In this case, the definition of average is: where P(X = x, Y = y) is the joint probability, which allows to obtain P(X = x) via marginalization: An important case for our purposes is that of binary variables, which can only assume values of 0 and 1. Their expected values have the following property: If there are other binary variables in the probability space, it is possible to extend the previous identity. The product XY is 1 if and only if X = Y = 1, and 0 otherwise. In this case, each expected value is equivalent to the probability of having the value 1:

Bell-Type Experiment
A Bell-type experiment [4,42,43] is schematically represented in Figure 1. In each run of the experiment, a pair of entangled photons, particles, or other quantum systems is generated, with one photon traveling to the left and the other to the right. While the photons are in flight, a random selection is made on each side, selecting an angle, which sets the polarization basis to be employed in the measurement, between two previously defined options. For simplicity of notation, here, we employ the pair 0 • and 45 • on both sides. Changing the angles to, for example, 22.5 • and −22.5 • on one side has no effect on the following discussion. Once the polarization basis is selected, each photon travels through a polarized beam splitter with a single photon detector in each arm. The detector activated on each side is univocally associated with a polarization projection in the selected basis, allowing to assign a value to the polarization, in that basis, of each photon detected. To take into account the detector efficiency and presence of noise, the valid measurements are selected as those where one photon is detected on each side. In this way, all events include, on each side, one and only one angle selected and one and only one detector which registered a photon.
After each run of the experiment, there are four quantities registered: one angle selected and one detector activated on each side.
There are different probability spaces which can be employed to describe the previous experimental situation, and the Bell inequalities can be built on each one of them. Their details are provided below.
3.2. The Probability Space 1 3.2.1. The Sample Space 1 In this space, the possible results of the experiment are characterized by four variables, one angle selected and one detector activated on each side. A hidden variable, λ, is introduced, which cannot be observed. The sample space can be represented with quintuples of the following form: Here, A, B represent which detector is activated on each side. These variables can only have one of two values, denoted 0 and 1. The other variables, α, β, represent the chosen orientation values. They also have only two options, represented by 0 • and 45 • . A hidden variable, λ, is also considered. The sample space contains sixteen different quintuplets for a given value of λ: They represent all the possible combinations for the values of the variables. In a trial of the experiment, it is possible to know the value of the first four variables, however, the last one remains unknown. All possible values of λ describe the same output of the experiment.

The Event Space 1
Implicit in most uses of probability, it is common to choose the biggest event space as the event space, which includes all possible subsets of the sample space. For instance, if side A sets an angle α = 0 • and gets a result A = 1, the corresponding event is: i.e., all the quintuples that meet the requirements A = 1 and α = 0 • . This set contains these quintuples for each value of λ.

The Probability Measure 1
For the third element of the probability space, the probability measure, it is enough to assume that it fulfils the previous requirements and Kolmogorov axioms. It assigns a real number between zero and one to each event in the event space 1.
For instance, the probability of the above-mentioned event, where side A sets an angle α = 0 • and gets a result A = 1, is This probability is different to the probability that side A gets a result A = 1, given that the polarization is measured setting the angle at α = 0: This is a conditional probability, obtained by evaluating: Which one of them must be employed in the formulation of Bell inequalities will be explained in the next section. This sample space also has four quantities of interest: for side A, the results when the apparatus is set at 0 • and set at 45 • . For side B, there are also results for 0 • and for 45 • . A hidden variable, λ, is also added.
The sample space 2 can be written as: where all these quintuplets repeat for each value of λ.

The Event Space 2 and the Probability Measure 2
Again, the chosen space for event space 2 is the biggest event space associated with sample space 2. Every event in it has an associated probability. These three elements form probability space 2. The probability measure is assumed to satisfy Kolmogorov axioms, i.e., the probability measure assigns values between 0 and 1 to all events.

Events and Probabilities in the Analysis of the Bell Experiments
Sample spaces 1 and 2 have similarities and differences which are worth analyzing in detail.

The Same Event in the Two Probability Spaces
The same experimental outcome is associated with different events in each space. As an example, if the result is 0 on side A of the experiment with the angle 0 • , while in side B the result is 1 with the experiment setup at an angle of 45 • , this event in event space 1 is: where there is one quintuple for each value of λ.
In the event space 2, the set of quintuples compatible with these same conditions is: where these four quintuplets repeat for each value of λ. This example shows that the same experimental situation has different events associated to the different event spaces. This can be seen in the following simple example, where only events associated with the left arm of the experiment are considered.

The Same Probability in the Two Probability Spaces
In space 2, the probability that A 0 • = 1 is P(A 0 • = 1). What is the probability of this event in space 1? It refers to those cases where side A obtains the result A = 1, but only for α = 0 • . Note that the election P(A = 1, α = 0 • ) is not the right one, because, if α = 0 • is a very unlikely event, the previous probability will be very low due to the fact that the measurement apparatus on side A is almost always set at α = 45 • . Therefore, it is important to take into account only those cases in which α = 0 • . This is represented by the conditional probability: It is worth extending the previous argument a little further using an example. On the left hand side of Table 1, the events in space 1 representing the output of each run are listed. In the probability space 2, there is a value for A 0 • and A 45 • in every experiment, but it is only possible to measure one of them in each run of the experiment. On the right hand side of Table 1, both the observed and the unobserved outputs associated to each event in event space 2 are shown, with a circle enclosing the observed quantities. The value which was not measured could have been 0 or 1, but in each run, it is assumed to have a value.
In space 1, the number of events in which A = 1 and α = 0 • is 2, and the total number of events is 6. It follows that: In space 2, the values of A for α = 0 • are assumed to exist, even when the angle selected was α = 45 • . In Table 1

Space 1 Space 2 Run
Returning to space 1, P(A 0 • ) = 2/3 can be expressed as the number of events in which A = 1, given α = 0 • , which is the conditional probability: It is worth highlighting that the conditional probability is not associated with an event in space 1. It is a mere convention meant to represent the quotient of probabilities that do have an associated event.
The above discussion confirms the relation: where the probability on the left hand side is in probability space 2, while the one on the right hand side is in probability space 1. This equivalence is relevant in the analysis of Bell inequalities.

Some Events Exist Only in One Probability Space
There are situations in which an event in one event space cannot be represented in the other. For instance, the event {α = 0} in space 1 cannot be expressed in space 2, since there is no variable related to the chosen angle.
On the other hand, the event {A 0 • = 0, A 45 • = 0} exists in space 2, but in space 1 it has no meaning, because it would imply to select both polarization angles at the same time. This is not possible since it is not included in sample space 1.

Two Bell-CHSH Inequalities
Bell-CHSH inequalities refers to a family of inequalities. One very common and useful inequality is the CHSH inequality [42]: where the random variables X, X , Y, Y can take only the values +1 or −1. Please note that the previous expression leaves the probability space in which the averages take place unspoken. It is useful to rewrite the above expression in terms of random variables A, A , B, B , which take the values +1 or 0 instead. In this form, it reads: This will be the one used in this paper.
As it was explained in Section 2.2.3, employing variables which can only take the values 1 or 0 allows to write their expected values as probabilities. For instance, AB is equal to the probability of having a value of one on side A for α and also on side B for β. The relevant point here is that there are two probability spaces in which such probability can be written. As explained in Section 3.4.2, in probability space 1, the equality: should hold. In contrast, in probability space 2, the equality: holds. Since every term in Equation (16) can be expressed in either of the spaces, we end up with two possible inequalities.
As the two probabilities have been proven equal, the matter of choosing a probability space to write the inequality appears irrelevant at first glance. However, both options will have completely different consequences when the violation of the Bell-CHSH inequalities is analyzed.
To continue, a derivation of the inequality in the different spaces is exposed. Here, we reproduce a derivation that simplifies the mathematical expressions [30] and enables to clearly state the mathematical hypothesis used. It is based on the following mathematical inequality. Given four real numbers, a, a , b, b ∈ [0, 1], as shown in the Appendix A, the following inequality holds:

Bell Inequality in Probability Space 1
The version of the inequality (16) expressed in probability space 1 is the most commonly found in the literature, and the closest with the original intention of the inequality written by Bell [3,42,44].
Since λ is assumed to be unobserved in the experiments, its relative frequencies cannot be measured. If the inequalities are to be compared with the experiment, the involved probabilities should not include λ. This is accomplished by averaging over the λ variable.
Based on Kolmogorov's framework, the averaging is given by: for the four possible values of α and β. Two hypothesis are employed to manipulate the above conditional probabilities. One is the locality hypothesis [7]: Based on the concept of statistical independence, it is commonly interpreted as the requirement that the polarization angle, α, selected on the left hand side does not influence the value of the output, B, on the right hand side, and vice versa.
As there are four different settings, one for each pair of angles, α, β, it is necessary to add the hypothesis of λ-independence, sometimes referred to as settings independence [45]: They allow to write: The desired inequality is deduced in the following way. As all probabilities are assumed to satisfy Kolmogorov's axioms, probability takes values between 0 and 1. That allows to use inequality (19) with the association: to obtain, after multiplying by P(λ) and integrating: This allows to derive the desired inequality, from now on referred to as inequality 1: In brief, this inequality required the following assumptions in its proof: probability space 1, Kolmogorov's axioms, locality, and λ-independence.

Bell Inequality in Probability Space 2
Assuming the existence of the probability space 2, the demonstration of its Bell inequality is straightforward [46][47][48] .
Since A α , A α , B β , B β only assume the values 0 and 1, it is again possible to use inequality (19), making the assignation: Multiplying by P (A α , A α , B β , B β , λ), i.e., the joint probability, and averaging over all the variables, the following is obtained: In each realization of the Bell experiment, the outputs A α , A α , B β , B β can only have values of 0 or 1. Therefore, according to Section 2.2.3, it is possible to use identity (7). Each term can be rewritten as: Then, inequality (24) takes the desired form: From now on, this will be called inequality 2.
To summarize, to demonstrate the validity of the inequality 2, Equation (26), probability space 2 and the validity of the Kolmogorov axioms is required. On the other hand, there is no need to invoke locality nor λ-independence.

Bell Inequality in a Third Probability Space
It is possible to derive a third form of the inequalities, one that has the angles and at the same time associates values to all observables. It must be based on a sample space that takes into account all the quantities: Using the biggest event space, a new probability space is formed. The inequality, referred to as inequality 3, in this space has the form: The proof can be found using the same argument as inequality 1. It turns out that the analogous inequality in this space also needs the hypothesis of inequality 1 [30], so the introduction of simultaneous values has no impact on the hypothesis when compared to inequality 1.
These possibilities are summarized in Table 2.

Bell Inequalities without Probability
It is possible to derive inequality 2 without making any explicit reference to probabilities, only to relative frequencies [46,47].
In an experiment repeated N times, A i α is meant to represent the value of the variable A α in the i-th repetition of the experiment, and the same notation for the other variables. Since each variable only assumes the values 0 or 1, it is possible to again use (19) with the association: Summing up all the repetitions of the experiment, and dividing by N, we have: The hypothesis are: • The outputs A α , A α , B β , B β have assigned values in all the runs, although only two of them are measured in each run. • The observed product averages, ∑ i A i α B i β , can be experimentally determined only for the subset of the results in which the angles α, β were selected, while the theoretical average refers to the whole set of values. A version of the fair sampling assumption is required, to associate the observed and predicted values [26].

Interpretations of Inequality Violation
Violations of the Bell inequalities have been observed since the first experiments [4,42]. The original setting raised a list of objections regarding the adequacy of the experimental setup, for instance, efficiency of detectors, space-time separation of the detectors on both sides of the experiment, etc. [12,13,49,50]. Most of the original objections have been answered with more refined settings, at the same time closing many of the loopholes discussed in the literature [14,[51][52][53][54]. Despite the remarkable experimental creativity and amazing technological developments, the debate about the validity and the interpretations of the observed violation of Bell inequalities remains vivid. In the following, we analyze the consequences, assuming that the experiments actually show a violation of Bell's inequalities.
In the above discussion, it has been established that there are different inequalities with different hypothesis, as listed in Table 2. For each one of them, accepting the experimental violation implies that some of the hypothesis must be questioned.
The inequalities 1 and 3 are based on the same set of hypothesis. Most of the literature around this subject concentrates on locality and λ-independence. Locality, as its name suggests, is interpreted trough its relation with space, and refers to the impossibility of mutual influence between events in space-time whose interval is space-like. The probability independence, or statistical independence, is interpreted as independence of the physical situation, and is sometimes related to contextuality [55,56]. The consequences have been widely discussed in many excellent texts [7,8,27,29,47,[57][58][59]. We will not abound on them further.
The violation of any of the three inequalities can be interpreted, alternatively, questioning any element of the probability space, i.e., the sample space, the event space, or the probability measure.
First, consider that the failure is in the sample space. That is, the considered sample space does not correctly represent the experimental situation. Since all sample spaces have hidden variables included, it could be tempting to conclude that the mere existence of hidden variables must be rejected, invalidating the use of any of the sample spaces. This position is also close to the orthodox quantum mechanics. In particular, it is related to the debate of completeness of quantum mechanics. However, it is difficult to hold this position, since there are theories such as Bohmian mechanics that use non-local hidden variables and successfully reproduce the results of quantum experiments [60,61].
An argument can be stated against the use of the sample spaces 2 and 3. From the old days of quantum mechanics, the simultaneous assignment of values for non-commuting observables has been rejected [62], and these sample spaces perform such value assignment. In the same line, an instrumentalist position will argue against these spaces, since there are values that cannot be simultaneously observed, condensed in the famous dictum: unperformed experiments have no results [30,63].
To continue, consider that the failure is in the event space. It must be closed under unions and intersections.
It is possible to accept the sample spaces 2 and 3, but reject their event space, arguing that only events that can be observed are to be considered. From this point of view, the event {A 0 • = 1, A 45 • = 1} must be rejected since it cannot be observed. This is a problem since {A 0 • = 1} and {A 45 • = 1} can be observed, so their intersection should be an event. This is enough to invalidate the event space.
Continuing with the elements to be questioned, the probability measure is the next on the list. It is required to assign probability to all events and to fulfil Kolmogorov's axioms. Therefore, it can be argued that the sample and event spaces do apply for the situation, but not all events have an assigned probability. The existence of the joint probability for the four variables of interest, i.e., P(A α , A α , B β , B β ), is a necessary and sufficient condition to deduce Bell inequality 2. Therefore, the inequality violation can be interpreted as a non-existent joint probability. This argument is due to Fine [64], and has been explored in [28,48,[65][66][67].
The interpretation of violations of Bell's inequalities as invalidating the sample space, the event space, and/or the probability measure, can be be summarized as "Bell inequality violation just proves that there cannot be a reduction to one common probability space." [9,56,[67][68][69].
Finally, the last hypothesis to be questioned is the applicability of the Kolmogorov axioms. Another way to understand the violation of Bell's inequalities is to accept the va-lidity of the probability space, and question the applicability of the Kolmogorov axioms (3) to this situation. The axioms guarantee that probabilities have values between zero and one, sum up to one, etc.
In the context of quantum mechanics, a phase space including non-commuting observables such as position and momentum is widely employed as a probability space, but the "probabilities" defined there do not fulfil the non-negative Kolmogorov's postulate, such as the Wigner quasi-probability distribution. There are also distributions which are non-negative, but their marginals do not coincide with the quantum mechanical probabilities in position or momentum, such as the Husimi distribution [70,71]. There have also been works relating contextuality and negative probabilities in areas outside of physics [72].
To sumarize, when the Bell's inequalities are formulated in the probability spaces 2 and 3, their violation can be justified, denying the pertinence of these sample spaces, or the event spaces, as being not closed under unions and intersections. It is also possible to point out some events that do not have an associated probability, or to exhibit probabilities that do not fulfil the property of being between one and zero. Inequality 1 is not affected by many of these concerns, which are based on the counterfactual character of elements in the sample spaces 2 and 3. Sample space 1 does not assume the existence of simultaneous values of incompatible outputs in each experimental run. For this reason, if the probability space 1 is employed to deduce the inequality 1, and the Kolmogorov framework is assumed, it is somehow natural to conclude that either locality or λ-independence must be questioned to explain the inequality violation.
A final remark refers to the Kolmogorov axioms as the best theoretical representation of probability theory. It is worth remembering that the origin of probability can be traced back to the XVII century. Extensive and valuable studies about the nature of probability were written [40,73,74] before the Kolmogorov axioms appeared in 1932 [33]. Each school of thought has important differences regarding event spaces and probability measures. Although all schools coincide in Kolmogorov axioms for textbook cases, the Bell scenario appears as the perfect situation to raise the differences [30,75,76].

Conclusions
Bell-type experiments can be analyzed by employing different probability spaces under Kolmogorov's axioms. Each of the spaces are based on a set of hypothesis, as listed in Table 2. Under some simple mathematical manipulations, they lead to different versions of the Bell-CHSH inequalities, all of them mathematically sound but not equivalent. Their observed violation implies that at least one of the hypothesis must be refuted. Employing the probability space 1 and the validity of the Kolmogorov axioms has led many authors to conclude that any hidden variable model must be non-local and/or contextual. On the other hand, probability space 2 offers other interpretations: that values cannot be simultaneously assigned to all variables, that the values can be assigned but joint probabilities cannot be properly defined, or that such situations pertain to different probability spaces.
The observed violations of the Bell-CHSH inequalities impose conditions on any physical theory aimed to describe these observations. The theoretical or philosophical frame where the description is performed depends on the moment and the interest of each community. Those elements are the basis of each interpretation. Particular probability theories, such as frequentism, of philosophical views such as realism, would add elements that may dismiss one or more probability spaces, narrowing the set of hypothesis and changing the epistemological interpretation.
The intention of this contribution was to emphasize the richness of the possibilities and show that, in the present use of probability, there is no evidence that one option should be selected over the others. The same experimental results lead to different conclusions depending on the description used.
As different probability theories have different interpretations of what a probability refers to, they have important implications in the analysis of the violation of the Bell-CHSH inequalities, which we plan to analyze in future work.