Sensitivity to Context in Human Interactions

Abstract: Considering two agents responding to two (binary) questions each, we define sensitivity to context as a state of affairs such that responses to a question depend on the other agent’s questions, with the implication that it is not possible to represent the corresponding probabilities with a four-way probability distribution. We report two experiments with a variant of a prisoner’s dilemma task (but without a Nash equilibrium), which examine the sensitivity of participants to context. The empirical results indicate sensitivity to context and add to the body of evidence that prisoner’s dilemma tasks can be constructed so that behavior appears inconsistent with baseline classical probability theory (and the assumption that decisions are described by random variables revealing pre-existing values). We fitted two closely matched models to the results, a classical one and a quantum one, and observed superior fits for the latter. Thus, in this case, sensitivity to context goes hand in hand with (epiphenomenal) entanglement, the key characteristic of the quantum model.


Introduction and Basic Definitions
Prisoner's dilemma (PD) games involve two players with a binary action each, typically denoted as cooperate (C) vs. defect (D). A usually symmetrical payoff matrix determines the reward of each player, depending on their combined action. Typically, payoffs are set so that it is most advantageous to defect if the other player cooperates, but the mutual gain is highest if they both cooperate (defection is then the Nash equilibrium). PD games have been extensively studied in psychology, partly because they can lead to apparent discrepancies with classical probability theory [1][2][3][4]. In the pioneering study by [4], participants were put in the shoes of one of the players in a PD game and were presented with three kinds of trials: first, trials for which participants were told the other player would defect; second, trials for which participants were told the other player would cooperate; third, trials for which participants were not given information about the other player. Results indicated that $\mathrm{Prob}(D_{\mathrm{Participant}}, \mathrm{unknown})$ was outside the bounds of $\mathrm{Prob}(D_{\mathrm{Participant}} \mid \mathrm{known}\ C)$ and $\mathrm{Prob}(D_{\mathrm{Participant}} \mid \mathrm{known}\ D)$, thus violating the law of total probability. Such results are not insurmountably inconsistent with classical probability theory, but they do challenge the ubiquity of classical probability theory in cognitive theory [5][6][7][8].
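The violation of the law of total probability can be made concrete with a short check. Under classical probability, the defection rate in the unknown condition is a weighted average of the two known-condition rates, so it must lie between them. The sketch below assumes this reading; the function name and the numerical rates are hypothetical and are not taken from [4].

```python
def within_total_probability_bounds(p_unknown, p_given_c, p_given_d):
    """Return True if p_unknown could be a weighted average of the two
    conditional defection probabilities (law of total probability)."""
    low, high = sorted((p_given_c, p_given_d))
    return low <= p_unknown <= high


# Hypothetical defection rates, chosen only to illustrate the two cases.
violating = within_total_probability_bounds(0.75, p_given_c=0.80, p_given_d=0.90)
consistent = within_total_probability_bounds(0.85, p_given_c=0.80, p_given_d=0.90)
print(violating, consistent)
```

A result of False for the first call corresponds to the pattern described above, where the unknown-condition rate falls outside the interval spanned by the two known-condition rates.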
In standard PD paradigms, there is a Nash equilibrium for each participant to D, that is, neither participant can improve her position by unilaterally changing a D action. In this work, we do not consider such PD paradigms, but rather just the two-player interactions, based on a payoff matrix without a Nash equilibrium. We refer to such paradigms as PD variants. The surprising hypothesis we are interested in is whether there are PD variants for which choice statistics cannot be modelled with a four-way probability distribution (this statement will be qualified shortly). So, our paradigm reflects a minimal set up of interaction between two agents. While there is a vast literature on game theory, we avoid These distinctions are particularly relevant in psychology, since the only systems known to break Bell's bound are physical systems of microscopic particles, obeying the laws of quantum mechanics. By contrast, for macroscopic systems, it is generally (see shortly) accepted that violations of Bell's bound can be accounted for only by communication, disturbance or some other equivalent mechanism, between the two systems [9]. For example, demonstrably classical systems, such as containers with fluids at different levels, connected by tubes, allow the construction of variables which violate Bell's bound. But of course there is nothing peculiar going on and this is just a result of communication or influence between the systems (such examples have been known for a while, e.g., [17,18]). We can say that such systems demonstrate sensitivity to context. Note, there are subtleties to this discussion, for example see [17,19], who described possible systems for which a measurement (decision) itself can bring about the dependence to context needed for S > 2. An additional subtlety is whether communication is assumed to lead to signaling or not. 
In [18,19] there is no signaling, but in [17] there is signaling (as [18] note, in general, communication can be taken to be some influence of some sort, but it does not always have to lead to signaling). These ideas are interesting, though we think they do not apply to the present results (this issue is briefly considered in the General Discussion).

Psychological Implications and Outline
Bell's bound has an almost magical quality. Sensitivity to context means impossibility of describing the system in the usual way via a four-way probability distribution, with the marginal distributions representing the observed (conditional) statistics. But what exactly does this mean? Consider Table 1, wherein we assume that all marginal probabilities are 0.5. For the right-hand side, S = 4 and it can be shown that the corresponding probability information is not self-consistent (the same conjunction can be 'shown' to be both zero and non-zero, Appendix A). We think that, amongst experimental psychologists at least, it is a baseline expectation that probabilities can be organized in a table of this kind.

Table 1. NB. Each table comprises four separate probability subtables, corresponding to different measurements for the two systems. For the left table, S = 2, and for the right, S = 4 > 2. It can be shown that the right table is inconsistent (Appendix A). The right table is a famous one, corresponding to the Popescu-Rohrlich box (PR-box; [13]).
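As a rough illustration of how S is obtained from four probability subtables, the sketch below computes S for a PR-box and for one fully correlated table. It assumes the common CHSH sign convention, with the minus sign on the a2b2 term; this may differ from the exact form used in this paper. The function names are hypothetical, and the fully correlated table is simply one example with S = 2, not necessarily the left table of Table 1.

```python
def correlation(table):
    """E = P(++) + P(--) - P(+-) - P(-+) for one pair of questions."""
    return table["++"] + table["--"] - table["+-"] - table["-+"]


def chsh_s(tables):
    """S under an assumed convention: the a2b2 term enters with a minus sign."""
    return (correlation(tables["a1b1"]) + correlation(tables["a1b2"])
            + correlation(tables["a2b1"]) - correlation(tables["a2b2"]))


# Both subtables have all marginal probabilities equal to 0.5.
correlated = {"++": 0.5, "+-": 0.0, "-+": 0.0, "--": 0.5}
anticorrelated = {"++": 0.0, "+-": 0.5, "-+": 0.5, "--": 0.0}

# PR-box: perfect correlation everywhere except a2b2, which is anticorrelated.
pr_box = {"a1b1": correlated, "a1b2": correlated,
          "a2b1": correlated, "a2b2": anticorrelated}
# One example table that saturates, but does not exceed, the bound.
all_correlated = {pair: correlated for pair in ("a1b1", "a1b2", "a2b1", "a2b2")}

print(chsh_s(pr_box))          # 4.0
print(chsh_s(all_correlated))  # 2.0
```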
We are interested in how these ideas translate to two individuals playing a game, corresponding to a Bell scenario (i.e., each individual has two binary questions). Of course, an interaction between two individuals is an extremely common decision situation. With the locality and free choice assumptions, in general it is impossible to break Bell's bound [19][20][21]. For two agents, the only way Bell's bound can be exceeded is if at least one of the free choice or locality assumptions is violated. For example, suppose we retain free choice and allow violations of locality. Then, Bob needs to adjust his answers depending on knowledge of which question Alice receives. So, the decision to stay local or not is 'outsourced' to Bob: in the experimental paradigm we employ, it is up to the participants (on a trial by trial basis) to decide whether to stay local or not. This is the essence of the paradigm we will shortly present.
So far, while there have been several studies concerning Bell's bound in psychology, these studies have focused on the thought processes of individual participants. Specifically, there have been several examinations of sensitivity to context, for the same participant answering all four questions, a1, a2, b1, b2 (for an early example see [22]). The issue of compositionality in conceptual combination concerns whether the constituent concepts combine in a way that their meaning independently determines the meaning of the composite concept. For example, in considering the novel conceptual combination 'spring plant', under a compositionality assumption we would look for some meaning from 'spring' and some from 'plant', independently combined together. A contrasting hypothesis is that a constituent in a conceptual combination acquires meaning contextually, depending on the other constituent. For example, in the case of boxer-bat, whether we consider a sporting or animal sense for 'bat' will affect how we interpret 'boxer' [23]. A number of theorists have employed the CHSH inequality or variants to conclude in favor of noncompositionality in conceptual combination [23,24], an issue of considerable significance concerning conceptual representation [25][26][27]. Similar ideas have been pursued in memory associations [28,29] and in decision making [24,30]. There has been no research exploring Bell's ideas for interacting agents. Our purpose is to develop a paradigm based on a PD variant involving the interaction of a participant with a hypothetical counterpart. The payoff matrices can be set up in a way that optimal performance (relative to overall payoff) requires sensitivity to the counterpart's choices in some cases, but not others. Allowing participants to choose whether to communicate or not with their counterpart on every trial, we can examine participants' sensitivity to context and the capacity of different modeling approaches to capture behavior.
We propose two models for modeling choice behavior, based on the models widely employed in physics for Bell paradigms. The classical model (specifically, a local hidden variables one) is based on an assumption of perfect coordination between the interacting agents, but without communication of the questions each agent receives on any trial. It allows for no sensitivity to context. The quantum model is also based on an assumption of perfect coordination between the agents, but, additionally, it allows sensitivity to context up to a certain degree (quantified by Tsirelson's bound [31]). In physics, such quantum models are interesting, because they allow sensitivity to context, even though there is no obvious physical mechanism violating locality and free choice (and there is no signaling). In psychology, such a quantum model offers a particular hypothesis of the extent to which any communication between participants can translate to sensitivity to context.
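To give a sense of what Tsirelson's bound means numerically, the sketch below evaluates S for the textbook case of a maximally entangled pair, for which the correlation between two measurement directions at angles θ_a and θ_b is cos(θ_a − θ_b). This is the standard physics result, not the model fitted in this paper; the function names, the angle settings, and the placement of the minus sign on the a2b2 term are assumptions made for illustration.

```python
import math


def quantum_correlation(angle_a, angle_b):
    """Textbook correlation for a maximally entangled pair measured along
    directions angle_a and angle_b (in radians)."""
    return math.cos(angle_a - angle_b)


def chsh_s(a1, a2, b1, b2):
    """S with the minus sign on the a2b2 term (assumed convention)."""
    return (quantum_correlation(a1, b1) + quantum_correlation(a1, b2)
            + quantum_correlation(a2, b1) - quantum_correlation(a2, b2))


# Standard settings that reach the quantum maximum.
s_max = chsh_s(a1=0.0, a2=math.pi / 2, b1=math.pi / 4, b2=-math.pi / 4)
print(s_max, 2 * math.sqrt(2))
```

With these settings S equals 2√2 ≈ 2.83, which lies above the classical bound of 2 but below the value of 4 reached by a PR-box.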
Note that we could construct more elaborate classical models, in which the causal role of communication on the observed statistics is included, and such models could (in principle) be reconciled with the sort of paradigm we have outlined above. However, we think it is more interesting to explore a baseline classical model (perfect coordination, but no sensitivity to context) vs. the standard quantum model (perfect coordination and some sensitivity to context), to inform our understanding of the extent to which participants could employ their information resource. We think it is surprising and interesting that, when S > 2, as we shall see, a superficially reasonable classical model cannot offer a good description of behavior. Examining violations of Bell's bound while allowing for interacting participants to break locality mimics attempts in physics to describe experimental statistics in Bell paradigms, by allowing violations of free choice and locality [32].
More generally, the use of quantum probability theory in cognitive modeling follows an assumption that, in some cases, quantum principles offer better descriptions of human behavior [33][34][35]. Quantum cognitive models have been explored for many kinds of cognitive processes, including decision making, categorization, similarity, perception, and memory. What is common amongst such diverse applications is a handful of characteristics which researchers have taken to be indicative of quantum-like processes. For example, sometimes behavior appears to be subject to interference effects, so that the law of total probability is violated; the PD games and analogous situations in [4] are good examples. In other cases, when participants are asked to make a decision, it appears that the underlying mental state changes. Social psychologists have been aware of such processes for a long time [36]. The added value from quantum models is that in quantum theory there is a specific requirement for how the state ought to change as a result of measurements (in behavior, decisions) and various researchers have taken advantage of these processes to build cognitive models (e.g., [37,38]). Of course, as outlined above, there have also been behavioral results indicative of sensitivity to context, for which the Bell framework and corresponding quantum models have been invoked to construct relevant theory (e.g., [23,24]). Quantum cognitive models have had good generative value, for example, in terms of anticipating biases from prior decisions [38] or a surprising constraint for question order effects [39].
As per our comments for Bell inequality violations above, in quantum cognitive models any quantum processes are epiphenomenal and are underwritten by an assumption of classical neurophysiology [40]. Moreover, there have been some compelling proposals of heuristic models mimicking quantum models [41]. So, why invoke the (unfamiliar) concepts of quantum theory at all? There are two reasons. First, it appears that in some behavioural cases quantum models can offer particularly simple explanations. Such cases tend to be ones for which behaviour is sensitive to context (as in the present case) or there are conflicting biases for behaviour, which appear to interfere with each other. Second, different quantum models generally employ the same set of principles and so have been used to identify commonalities between findings which, up to that point, had been considered separate [42]. So, even assuming that there is no 'real' quantum structure in the brain, and even if there are compelling mimicries between a specific quantum model and models based on other principles (as in [41]), we think there is explanatory value in considering such models.

Experiment 1

Participants
Participants were recruited using Prolific Academic and we restricted sampling to UK nationals. They were paid £2.25 for their involvement. Sample size was set a priori to 100 participants (50 males, 49 females and 1 participant who self-identified as 'other'). Participants were between 18 and 62 years old (mean age = 31.08 years, SD = 11.70). Participants also reported their English fluency on a scale from 1 (extremely uncomfortable) to 5 (extremely comfortable), with the majority of participants reporting 5 (n = 97) and only a few others (n = 3) reporting 4 or lower.

Materials and Procedure
We employed a one-shot PD variant, such that there were two possible questions for each player. Participants were told to imagine they were arrested with an associate and were both under suspicion for a minor crime in the Old Wild West. The sheriff of the town would interrogate them and their associate separately, asking one question to each. The sheriff would ask either: (1) "Did you know the victim?", or (2) "Were you at the scene of the crime?" The first question corresponds to a1 or b1 and the second to a2 or b2 (the participant's and his/her associate's questions are denoted by 'a' and 'b' respectively). Participants had two possible actions: to confess (equivalent to D in the standard PD paradigm; coded with a minus sign) or deny (equivalent to C; coded with a plus sign). Depending on the combination of questions, a different sentencing policy would apply. Participants were told that their sentencing policy would depend on their question and their response, as well as their associate's question and response. Participants were expected to favor decisions leading to lower sentences (fewer days spent in prison) for just themselves or for both themselves and their associate [2,43]. We created 'Good' and 'Bad' payoff matrices, such that the sentencing would bias participants to deny or confess respectively. Note, participants were shown the payoff matrix just for themselves, but were told that their associate would receive the same payoff matrix.
There were eight unique trials which can be denoted as a1b1 good, a1b1 bad, etc. Each participant received all eight trials and was told to respond independently (e.g., each trial contained a different payoff matrix and each associate had a different name across the trials). Table 2 shows an example of a good and bad matrix in a2b1. For a2b1 bad, the payoff bias has been created with a bias towards confessing the crime. For a2b1 good matrix, this bias is towards denying the crime. Note, the assumption that participants are responding independently may appear unrealistic. However, it is only marginally relevant to the present purpose, which was to collect data on choice behavior ostensibly inconsistent with a simple classical model.

Table 2. Example sentencing-policy screens for a good and a bad matrix, shown for the participant's question "Were you at the scene of the crime?" (payoff values not reproduced here).

Participant Does Not Check: "Isabel will be asked whether she knew the victim of the crime or whether she was at the scene of the crime. Since you don't know what Isabel's question will be, the following sentencing policy will apply. Please note, the numbers in the sentencing policies refer to the number of days you will serve in prison." The second example screen is identical, with the associate named Rick.

Participant Checks: "You checked on Isabel and found that she will be asked about whether she was at the scene of the crime. So, you know that the following policy for sentencing will apply. Please note, the numbers in the sentencing policies refer to the number of days you will serve in prison." The second example screen reads: "You checked on Rick and found that he will be asked about whether he knew the victim. So, you know that the following policy for sentencing will apply. Please note, the numbers in the sentencing policies refer to the number of days you will serve in prison."
The participant's associate was hypothetical and he/she was always assumed to behave as expected, e.g., in the case of trial a1b1 good, the associate would deny the crime in the b1 question. How would a participant know what the associate is likely to be doing? In most cases, there would be a choice associated with a lower sentence and so the participant would (or should) guess that her associate would be selecting this option. This would be applicable for a1b1, a1b2, and a2b1 trials. For a2b2 trials, the payoffs would not uniquely identify an action as optimal. For these trials, the participant would receive a hint of what the associate is likely to be doing: participants were told that the sheriff does not know much about the crime, but he does know that exactly one of the participant and his/her associate was at the scene of the crime. Participants were therefore cautioned that if the participant and his/her associate were to both confess or both deny for these trials, the sheriff would punish them with a high penalty. For example, for the a2b2 good trial, the sentencing matrix would be biasing towards anticorrelation between the participant and his/her associate, and the participant would receive an additional hint that the associate is 'likely' to deny the crime. So, sensitivity to context is built into the structure of the problem, in the simple sense that the participant's action needs to be informed by the associate's action when his/her question is a2, but not when it is a1.
To clarify, given each payoff matrix, there is an 'obvious' response for what the hypothetical participant should be doing: we just assume that the hypothetical participant follows this action. The exception is the a2b2 case, where we offered an additional hint of what the hypothetical participant is doing.
On each trial, participants were allowed to choose whether to communicate (i.e., violate locality) or not. They had the option to try to check, so as to discover the question that their counterpart was going to be asked. We discouraged participants from checking frequently by telling them that a check involved a risk of being caught and automatically receiving a high sentence. The first four trials in the experiment always attracted a penalty if a participant checked on his/her counterpart (these trials were fixed and different from the main experimental trials). Following these first four trials, without a noticeable break in the procedure, participants went through the eight trials corresponding to each of the four combinations of questions in each of their Good/Bad instantiations. The recorded data concerned only these eight trials and participants never experienced the penalty for checking during these trials.
On each trial, a participant was shown a 2 × 1 matrix for just their payoffs. If he/she decided to check, then he/she would be told which question was assigned to the associate (b1 or b2) and the matrix would expand to show the payoffs for all combinations of answers for the participant and associate (Table 2). If the participant decided not to check, then he/she would just be shown again the initial 2 × 1 matrix for just their payoffs. Either way, on each trial they had to decide whether to deny or to confess.
In this experiment, for a particular trial (e.g., a1b2) the payoffs in the 2 × 1 matrices were the approximate average of the payoffs in the 2 × 2 one. For example, looking at the top left of Table 2, (29 + 21) ÷ 2 = 25. Note, averaged decimal payoffs were rounded up to the nearest whole number. But we did not create true averages across different trials. That is, for trial a1b2, there would be a 2 × 1 matrix which would be the average of payoffs in a corresponding 2 × 2 one. However, the a1 payoff would not be an average from the a1b1 and a1b2 payoff matrices. These considerations are somewhat unimportant (in any case, they are addressed in Experiment 2); what matters is the bias for action, which was to Deny in all good matrices and Confess in the bad ones.
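The construction of a reduced 2 × 1 matrix from a 2 × 2 one can be sketched as follows, assuming that each row holds the participant's payoffs for one of his/her actions across the associate's two possible responses. Only the pair (29, 21) comes from the example in the text; the second row and the function name are hypothetical.

```python
import math


def reduce_payoffs(full_matrix):
    """Collapse each row of a 2 x 2 payoff matrix to its mean,
    rounding decimal averages up to the nearest whole number."""
    return [math.ceil(sum(row) / len(row)) for row in full_matrix]


# Row 1 uses the (29, 21) example from the text; row 2 is hypothetical.
full = [[29, 21],
        [40, 35]]
reduced = reduce_payoffs(full)
print(reduced)  # [25, 38]
```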
Initial instructions explained the format of the PD game. Participants then responded to a few practice trials, but with detailed additional instructions for each step of a trial. After these trials, participants were told that the main experiment would start. They first received the four consequence-checking rounds, and then the eight PD trials, after which the experiment concluded.

Results
We observed a significant difference in the overall proportion of trials on which participants checked vs. did not check, χ²(1, n = 800) = 121.69, p < 0.001 (Tables 3 and 4). Additionally, participants were more likely to check on a2b2 trials than on other ones. Note, we carried out these comparisons so that Good question combinations were compared only with other Good question combinations and analogously for the Bad ones. Minimally, these results show that participants were sensitive to the context of their associate's questions, which is necessary to achieve higher performance.
We further show the choice probabilities to deny for all question combinations and separately for checking vs. non-checking trials (Table 5). Consider the Deny/Good/Checking column. In this case, because the matrices are good, by design the participant's associate is meant to be denying; the participant should also recognize that it is better to deny. As expected, choice proportions reveal high probability for the participant to deny in pairs a1b1, a1b2, a2b1. For the last pair, a2b2, however, the participant and his/her associate are biased to anticorrelate and, given the associate will be denying, we observe a low proportion for deny choices, again as expected (0.07). We observe the reverse pattern in the Deny/Bad/Checking column.

Note, when the participant is not checking, he/she ostensibly does not know which question the associate will be asked, and therefore there is no basis for the participant to distinguish between cases when he/she should correlate with the associate (a1b1, a1b2, a2b1) vs. anticorrelate (a2b2). If this assumption were entirely correct, we should be observing identical deny proportions across all four question combinations, when not checking, but this is not the case (e.g., for the bad matrices, 0.65 is higher than the choice proportions for the other question combinations). As noted, the issue is that the reduced payoff matrix when not checking should be identical for a1b1 and a1b2 (and likewise for a2b1 and a2b2), but this was not the case (because reduced payoff matrices were constructed separately for each question combination). We address this issue in Experiment 2.
Despite this point, the results of Experiment 1 are still useful for modeling and for exploring the question of whether the particular classical vs. quantum models we will propose are adequate. Relatedly, the empirical result is perhaps unsurprising: participants seek more information when existing information is inadequate for a decision. On one level, this is certainly true, since the task was designed to incorporate sensitivity to context in a particular way. On another level, our objective is less to offer a surprising empirical finding than to show that choice probabilities from this seemingly innocuous situation cannot be modeled by a classical model incorporating the (assumed) perfect coordination between the participant and her associate.

Experiment 2
Experiment 1 showed that participants recognized that there would be different biases for action depending on whether their associate's question was b1 or b2. In this experiment, we constructed payoff matrices so that the reduced matrix for e.g., a1 would be the collapsed matrix across the a1b1 and a1b2 possibilities (Table 6).
Whereas previously there were only eight main trials (four question combinations in good and bad versions), for which participants were free to decide whether to check or not check, in this experiment we added eight trials when participants were forced to check and another eight trials in which participants were forced to not check (e.g., on some trials they were told that they had to check on their associate). Recall that to use Equation (1), we need probabilities for e.g., Prob(++|a1b1), which is computed by considering the number of times the participant denies when given question a1 together with his/her counterpart denying when given question b1. Trivially,

$$\mathrm{Prob}(++_{\mathrm{good}} \mid a1b1\ \mathrm{checking}) = \frac{\text{counts of deny, } a1b1 \text{ good checking}}{\text{all counts of } a1b1 \text{ good checking}}.$$

With this approach, we can compute S values for the entire sample, but it is difficult to do so for individual participants, because e.g., a participant may not have checked in the case of the a1b1 good trial. With the additional trials in this experiment, all relevant probabilities can be computed within participants, e.g.,

$$\mathrm{Prob}(++_{\mathrm{good}} \mid a1b1\ \mathrm{checking}) = \frac{\text{counts of deny, } a1b1 \text{ good checking, for the participant}}{\text{all counts of } a1b1 \text{ good checking, for the participant}},$$

and so S values can be computed within participants (which enables us to conduct some statistical tests). For the example of this probability, Prob(++ good | a1b1 checking), for a particular participant there would be a maximum of two relevant trials and a minimum of one trial, depending on whether the participant decided to check when he/she had the option to do so.
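The within-participant proportions described above can be sketched as a simple count over trial records. The record layout (keys `participant`, `questions`, `matrix`, `checked`, `response`) and the example data are hypothetical; they only illustrate how one participant can contribute two checking trials for a1b1 good while another contributes one.

```python
from collections import defaultdict


def deny_proportions(trials):
    """Map (participant, questions, matrix, checked) to the proportion
    of deny responses among the matching trials."""
    counts = defaultdict(lambda: [0, 0])  # [deny count, trial count]
    for t in trials:
        key = (t["participant"], t["questions"], t["matrix"], t["checked"])
        counts[key][0] += int(t["response"] == "deny")
        counts[key][1] += 1
    return {key: deny / total for key, (deny, total) in counts.items()}


# Hypothetical records: participant 1 checked on the free-choice trial,
# participant 2 did not, so only the forced-check trial counts as checking.
trials = [
    {"participant": 1, "questions": "a1b1", "matrix": "good", "checked": True, "response": "deny"},
    {"participant": 1, "questions": "a1b1", "matrix": "good", "checked": True, "response": "confess"},
    {"participant": 2, "questions": "a1b1", "matrix": "good", "checked": True, "response": "deny"},
    {"participant": 2, "questions": "a1b1", "matrix": "good", "checked": False, "response": "deny"},
]
props = deny_proportions(trials)
print(props[(1, "a1b1", "good", True)])  # 0.5
print(props[(2, "a1b1", "good", True)])  # 1.0
```

With so few trials per cell, individual proportions can only take a handful of values, which is worth keeping in mind when interpreting per-participant S values.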
We also included three questionnaires. First, we included the Toronto Empathy Questionnaire (TEQ, [44]), since the present task is one of guessing what a (hypothetical) associate is planning to do. The questionnaire asks participants to rate 16 questions on a five-point scale, ranging from Never (1), Rarely (2), Sometimes (3), Often (4), to Always (5). Items include, "Other people's misfortunes do not disturb me a great deal" and "It upsets me to see someone being treated disrespectfully". Second, we included the 17-item Cognitive Uncertainty (CU) subscale from the Uncertainty Response Scale [45]. The CU asks participants to state how well a series of statements describe them, including, "I like to plan ahead in detail rather than leaving things to chance" and "I like to know exactly what I'm going to do next" on a four-point scale of Never (1), Sometimes (2), Often (3) and Always (4). This questionnaire assesses the possibility that checking behavior is driven by uncertainty aversion. Finally, we employed the Cognitive Reflection Test (CRT) to test for engagement and reflection with our PD tasks. However, the original CRT has been massively overused [46,47]. To reduce the likelihood that participants had encountered the original CRT in the past, we used three of the word problems presented in the appendices of [47]. Participants read each of the questions and were asked to provide an answer in the text box.

Table 6. Example sentencing-policy screens for Experiment 2, shown for the participant's question "Were you at the scene of the crime?" (payoff values not reproduced here).

Participant Does Not Check: "Isabel will be asked whether she knew the victim of the crime or whether she was at the scene of the crime. Since you don't know what Isabel's question will be, the following sentencing policy will apply. Please note, the numbers in the sentencing policies refer to the number of days you will serve in prison." The second example screen is identical, with the associate named Rick.

Participant Checks: "You checked on Isabel and found that she will be asked about whether she was at the scene of the crime. So, you know that the following policy for sentencing will apply. Please note, the numbers in the sentencing policies refer to the number of days you will serve in prison." The second example screen reads: "You checked on Rick and found that he will be asked about whether he knew the victim. So, you know that the following policy for sentencing will apply. Please note, the numbers in the sentencing policies refer to the number of days you will serve in prison."

Participants
Participants were recruited using Prolific Academic and we restricted sampling to UK nationals only. They were paid £4.50 for their involvement. Sample size was set a priori to 100 participants, and we recruited 101 participants, 50 males, 50 females and 1 participant who self-identified as 'other'. Participants were between 18 and 78 years old (mean age = 32.13 years, SD = 12.54). Participants also reported their English fluency on a scale from 1 (extremely uncomfortable) to 5 (extremely comfortable), with the majority of participants reporting 5 (n = 95) and only a few others (n = 6) reporting 4 or lower. None of the participants for this experiment had taken part in Experiment 1.

Materials and Procedure
In Experiment 2, the payoff matrices were set up so that if the participant did not check, the reduced 2 × 1 payoff matrix would be identical across the two possible question combinations, e.g., a1b1, a1b2 (Table 6). Additionally, there were 24 trials in total: eight choice trials, where the participant could choose whether or not to check on their counterpart (as in Experiment 1), eight trials for which the participant was forced to check, and eight trials for which the participant was forced to not check.

Results
As expected, when participants did not check, choice proportions were nearly identical across matched pairs of question combinations (e.g., a1b1 good and a1b2 good, Table 7). Once again, we were interested in the extent to which participants check on their counterpart when they were meant to, notably in the case of a2b1 and a2b2 trials. For this experiment, this analysis examines only the trials when participants could decide whether to check or not. We first confirmed that there was a difference in the overall proportion of trials when participants checked vs. did not check, χ²(1, n = 808) = 148.31, p < 0.001 (Table 3). Moreover, participants were more likely to check on a2b1 and a2b2 trials than on other ones (Table 4). We next consider the individual differences measures. We computed d', empathy (TEQ), aversion (CU), engagement/reflection (CRT) scores and S for each participant, using Equation (1) (focused on the trials when participants could choose whether to check or not). The d' coefficient was calculated as $d' = \Phi^{-1}(H) - \Phi^{-1}(F)$, where H and F are the hit and false-alarm rates, respectively, and $\Phi^{-1}$ is the inverse cumulative distribution function of the standard normal distribution (mean 0, standard deviation 1), which converts each participant's rates to z scores [48]. Hits are considered instances of checking when the participants are meant to be checking (on a2b1 and a2b2 trials) and false alarms instances of checking when there would be no need for participants to check (on a1b1 and a1b2 trials). Note, due to the small number of trials per participant, we had a large number of probabilities of 0 or 1, which we corrected by adding 1 to the number of trials and 0.5 to the counts of hits and false alarms ([48], p. 144). Indeed, participants checked more on a2b1 and a2b2 trials (hits) than they did on the other trials (false alarms). This is evident from the mean d' (M = 0.995, SD = 1.25) being above zero.
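The d' computation with the correction for rates of 0 or 1 can be sketched as below. The function name is hypothetical, and the example assumes four free-choice trials on which checking was needed (a2b1 and a2b2, good and bad) and four on which it was not, which is an inference from the design rather than a figure stated in the text.

```python
from statistics import NormalDist


def d_prime(hits, n_signal, false_alarms, n_noise):
    """d' = z(H) - z(F), adding 0.5 to each count and 1 to each number
    of trials so that rates of exactly 0 or 1 cannot occur."""
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    return z(hit_rate) - z(fa_rate)


# Hypothetical participants: one checks only when needed, one checks at
# the same rate regardless of the question.
selective = d_prime(hits=4, n_signal=4, false_alarms=0, n_noise=4)
indiscriminate = d_prime(hits=2, n_signal=4, false_alarms=2, n_noise=4)
print(round(selective, 3), indiscriminate)
```

A participant who checks on every trial where checking is needed and on no other trial obtains the largest possible d' under this correction, while a participant whose checking is unrelated to the question obtains a d' of zero.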
All measures were then correlated with each other, without a multiple comparisons correction, as the intention is exploratory. There are two notable results. First, there was no relationship between individual participant S scores and d', r = −0.135, p = 0.18. Second, there was a negative relationship between S and empathy, r = −0.23, p < 0.05. Higher values of S imply higher sensitivity to context, which in this case means that a participant is better at recognizing when he/she should reverse decisions, based on what his/her counterpart is doing. One possible explanation for this result is that participants higher in empathy try to over-guess their counterpart's action, at the expense of considering the statistical properties of the game. There were no other significant results.

Modeling
It appears that participants are sensitive to the context of their associate's decisions in the PD variants we employed, but does this sensitivity to context push choice statistics beyond the descriptive adequacy of classical models (of a certain kind) and, if yes, in what way? This is the key research question in the present work. The aim of the two models we will shortly present is to describe as closely as possible average choice statistics across trials.
The data produced by the experiments has the form of eight probabilities, corresponding to the decision of the participants to deny (plus) or confess (minus), when encountering the different PD payoff matrices (sentencing policies). The recorded probabilities always correspond to the participant deciding to plus. Therefore, for the a1b1 good matrix, the observed probability is recorded as Prob(++) and in the case of the a1b1 bad matrix, the observed probability is recorded as Prob(+−). Of course, we further inferred Prob(−+), Prob(−−), etc.
We will present two models for the observed data, referred to as the classical hidden variables model (or just classical model) and the quantum model. It is more standard to formulate these models assuming a stochastic, rather than deterministic, associate. Accordingly, we combined choice statistics from the good and bad trials using e.g., Prob(++|a1b1) = Prob(++|a1b1 Good) · Prob(Good|a1b1) + Prob(++|a1b1 Bad) · Prob(Bad|a1b1) = Prob(++|a1b1 Good) · Prob(Good|a1b1), where Prob(Good|a1b1), Prob(Bad|a1b1) refer to the probability of a good, bad game for a given choice of questions, respectively, and Prob(++|a1b1 Bad) = 0. Since in all cases we employed equal proportions of good, bad trials, for each choice of questions, then Prob(Good|a1b1) = Prob(Bad|a1b1) = 0.5.

Hidden Variables Classical Model
According to this model, for each of the two agents, there is a hidden variable λ describing each sub-system, such that $\lambda_A = -\lambda_B$, with $\lambda_A$ uniformly distributed over a 3D sphere. Note, this is an expression of perfect anti-correlation of the hidden variables corresponding to the agents, as opposed to perfect correlation, but this difference is immaterial (this is illustrated for the quantum model in Appendix B, but the case is analogous for the classical model). So, the main assumptions of the model are as follows. First, if the same questions are asked, the participant will always perfectly coordinate in the same way with the counterpart, that is, either always correlate or always anticorrelate; assuming always-correlation, if the participant denies, it is assumed the counterpart will deny as well, etc. Second, there is a specific value for all question outcomes at all times. The implication of this more subtle assumption is that the participant should produce an outcome to her question, independently of which question is asked of her counterpart. In physics, this is the key realism assumption. Third, this model assumes locality and free choice. In the present experiments, we endow participants with a means of violating locality, so if they do this in a certain way, we expect the model to perform poorly. A final, minor assumption is that the participant will generally recognize the optimal action in each trial (corresponding to a lower sentence), and that she will always assume that her associate will also take the optimal action. This assumption is minor because of the way the payoff matrices were constructed, but if it is wrong, the model will just fail (both models will fail).
In what follows, instead of a participant and her counterpart, we sometimes talk about two interacting agents, Alice and Bob.
The first agent is measured in two directions, a1, a2, and the second agent is measured in two different directions, b1, b2. In the present psychological context, 'directions' just correspond to the steer for action from each question, which is a function of the information in the payoff matrix and the agent's interpretation of this information (which will depend on his/her personality etc.). Non-trivial algebra shows that (e.g., [10]; note, the assumption concerning the existence of the hidden variable λ impacts on how these probabilities are derived):
The key parameter in Equation (3) is the angle θ, in radians, corresponding to the correlation between a measurement direction a for Alice and b for Bob. So, the joint probability for Alice and Bob to deny for question combination ab depends on the relation between how Alice perceives question a and how Bob perceives question b. Note that when θ = 0, there is an equal chance for Alice and Bob to anticorrelate in one way (plus, minus) vs. the opposite way (minus, plus), which is just an expression of the assumption $\lambda_A = -\lambda_B$ in the considered hidden variable model.
Since we have four pairs of measurement directions, a1b1, a1b2, a2b1, a2b2, there are four angles as the parameters of this model. But these parameters are not independent. In the original physics set up they are actual measurement directions; psychologically, there is a corresponding assumption regarding the extent to which the two agents align or not in their consideration of questions. Suppose we have co-planar measurement directions, without much loss of generality. Then, the Figure 1 arrangement is a plausible representation of the four directions. Without loss of generality, we set $\theta_{a1} = 0$ and $\theta_{b1}$, $\theta_{b2}$ and $\theta_{a2}$ as shown in Figure 1. Then, the four angles needed for the classical model are given as $\theta_{a1b1} = \theta_{b1} \bmod \pi$, $\theta_{a1b2} = \theta_{b2} \bmod \pi$, $\theta_{a2b1} = (\theta_{a2} - \theta_{b1}) \bmod \pi$, and $\theta_{a2b2} = (\theta_{a2} - \theta_{b2}) \bmod \pi$. The mod π function simply ensures that the angles for the four question pairs stay within the 0 < angle < π limit. It is defined as:
We next consider the S value given this classical model. Prob(++|a1, b1) is the probability for both agents to +, when the questions are a1, b1, Prob(+−|a1, b1) the probability for Alice to + and Bob to −, etc. Each expectation value is given by $\langle a \& b \rangle = \frac{2\theta}{\pi} - 1$, where θ is the angle between the measurement directions a, b. The overall result for the classical model is then:
$S = \frac{2}{\pi}\left(\theta_{a1b1} + \theta_{a1b2} + \theta_{a2b1} - \theta_{a2b2}\right) - 2$
Note, we have mentioned that for this classical model S is bounded by 2. It can be shown that for $\theta_{a1b1} + \theta_{a1b2}$ the max is 2π − θ and the min is θ, where θ is the angle between b1, b2, and for $\theta_{a2b1} - \theta_{a2b2}$ the max, min are θ and −θ. Together these results deliver the classical limits for S.
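The classical expectation value 2θ/π − 1 can be checked with a small Monte Carlo simulation. The sketch below makes assumptions that go beyond the text: each agent's outcome is taken to be the sign of the projection of their hidden variable onto their measurement direction (the usual deterministic rule in physics hidden variable models), and the function names, sample size, and seed are illustrative. Under these assumptions Prob(++) should be close to θ/(2π).

```python
import math
import random

def random_unit_vector(rng):
    """Uniform direction on the sphere, from a normalised Gaussian triple."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]

def simulate(theta, n=100_000, seed=0):
    """Estimate Prob(++) and the expectation <a&b> for an angle theta between a and b."""
    rng = random.Random(seed)
    a = (0.0, 0.0, 1.0)
    b = (math.sin(theta), 0.0, math.cos(theta))
    plus_plus = 0
    product_sum = 0
    for _ in range(n):
        lam = random_unit_vector(rng)
        # Assumed response rule: outcome = sign of projection onto the direction.
        alice = 1 if sum(x * y for x, y in zip(a, lam)) > 0 else -1
        # Bob carries the opposite hidden variable, lambda_B = -lambda_A.
        bob = 1 if sum(x * (-y) for x, y in zip(b, lam)) > 0 else -1
        plus_plus += (alice == 1 and bob == 1)
        product_sum += alice * bob
    return plus_plus / n, product_sum / n

theta = math.pi / 4
p_pp, expectation = simulate(theta)
print(p_pp, theta / (2 * math.pi))           # simulated vs. theta / (2 pi)
print(expectation, 2 * theta / math.pi - 1)  # simulated vs. 2 theta / pi - 1
```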

Quantum Model
One of the most significant discoveries in the history of quantum theory has been the capacity of the theory to break the classical S ≤ 2 bound, seemingly without violating either locality or free choice. In the present paradigm, the situation is less philosophically challenging, since we endow the two agents with a communication capacity to break locality. Since the statistics produced by the quantum model are equivalent to classical ones, but with a degree of violation of locality (or free choice; [32]), the quantum model is a reasonable option for the present paradigm. The assumptions of the quantum model are equivalent to those of the classical one, but for two differences. First, instead of the Bayesian probability rules, we employ the probability rules from quantum theory. Second, instead of a hidden variable capturing perfect coordination between the two agents, we have the quantum property of entanglement (see just below). However, this is not true (physical) quantum entanglement, but rather one of a more epiphenomenal flavor [40].
A column vector is denoted as $|x\rangle$, its conjugate transpose as $\langle x|$, and an inner product between two vectors as $\langle x|y\rangle$. Since we are concerned with two systems (agents), we need to employ tensor products to construct the joint state from the individual states, for example, $|x\rangle \otimes |y\rangle$, which can be written for brevity as $|xy\rangle$. We employ a qubit representation such that 0 means an intention for a '−' (minus) action (Confess) and 1 a '+' action (Deny). States are represented as $|\psi\rangle = a|x\rangle + b|y\rangle$. Measurements can change the state, so if on measuring $\psi$ we obtain x, the new state becomes $|\psi\rangle = |x\rangle$.
We start with the state $|\psi_+\rangle = \frac{|00\rangle - |11\rangle}{\sqrt{2}}$, where the tensor structure is such that the first index corresponds to Alice and the second to Bob (the subscript '+' in $|\psi_+\rangle$ simply indicates a 'correlation' state). So, $|00\rangle$ means that Alice is intending to minus and Bob to minus, etc. Note, in physics, the state used is typically the singlet state, which is an anticorrelation state. However, the predictions from $|\psi_+\rangle$ are essentially identical but for a fixed rotation of the measurement directions; so, for the purposes of model fitting, this issue is irrelevant (in a way analogous to that for the classical model). The state $|\psi_+\rangle$ is called entangled and is one of perfect coordination between the two agents, but now using the rules of quantum theory. The predictions from the quantum model are then (Appendix C).
As before, the crucial parameter is the angle θ for each measurement direction. The four angles are constrained as for the classical model (Figure 1), so that the quantum model also has three parameters.
We can consider the computation for the Bell bound from the quantum model. We have that the expectation values are given by $\langle a \& b \rangle = -\cos\theta$, where θ is the angle between the two measurement directions. Then, $S = -\cos\theta_{a1b1} - \cos\theta_{a1b2} - \cos\theta_{a2b1} + \cos\theta_{a2b2}$. It can be immediately seen that if we set the angle for a1b1, a2b1, a1b2 to $\frac{\pi}{4}$, with the arrangement as in Figure 1, a2b2 is $\frac{3\pi}{4}$. Then $S = -\cos\frac{\pi}{4} - \cos\frac{\pi}{4} - \cos\frac{\pi}{4} - \left(-\cos\frac{3\pi}{4}\right) = -2\sqrt{2}$, so that $|S| = 2\sqrt{2} > 2$. In fact, though not obvious from the present discussion, a quantum model cannot produce S values greater than $2\sqrt{2}$ in magnitude, and $2\sqrt{2}$ is called Tsirelson's bound [31].
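As a quick numerical check of this arrangement, the sketch below evaluates S for both models at the same four angles, using the expectation values −cos θ (quantum) and 2θ/π − 1 (classical) and the combination S = ⟨a1&b1⟩ + ⟨a1&b2⟩ + ⟨a2&b1⟩ − ⟨a2&b2⟩. The function and variable names are illustrative only.

```python
import math

def s_value(expectation, angles):
    """S = <a1&b1> + <a1&b2> + <a2&b1> - <a2&b2> for a dict of four angles."""
    return (expectation(angles["a1b1"]) + expectation(angles["a1b2"])
            + expectation(angles["a2b1"]) - expectation(angles["a2b2"]))

def quantum_expectation(theta):
    return -math.cos(theta)

def classical_expectation(theta):
    return 2 * theta / math.pi - 1

# Arrangement discussed in the text: three angles of pi/4 and one of 3*pi/4.
angles = {"a1b1": math.pi / 4, "a1b2": math.pi / 4,
          "a2b1": math.pi / 4, "a2b2": 3 * math.pi / 4}

s_quantum = s_value(quantum_expectation, angles)
s_classical = s_value(classical_expectation, angles)
print(abs(s_quantum), 2 * math.sqrt(2))  # quantum magnitude vs. Tsirelson's bound
print(abs(s_classical))                  # classical magnitude at the same angles
```

At these angles the classical model sits exactly on the bound of 2, while the quantum model reaches Tsirelson's bound.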

Overview of the Two Models
The question we are interested in is whether a model satisfying realism, locality, and free choice can model these data; this is the hidden variables classical model. The answer is not automatically no because, even though locality is violated, it is an empirical question whether participants recognize the need to employ non-local resources and use the available information efficiently. If participants do not employ the non-local information, then the results could still be described by a local model and S < 2. That is, in this situation, the possibility of communication (checking) is clearly a necessary condition for participant data to violate Bell's bound, but it is not a sufficient one.
A related question is whether any use of non-local information can be modelled by a quantum model (which is constrained by Tsirelson's bound) or not. If not, then participants' checking behavior and use of the corresponding information would be greater than what is allowed by quantum theory.
Because there is communication in this case, it is likely that there is signaling as well. If there is signaling, the bound of S = 2 is clearly not a fundamental limitation on how a system behaves. However, there is still an empirical question on how people behave, and we can ask the question (as above) of whether human behavior can be characterized by a local model (S < 2), a nonlocal model constrained by Tsirelson's bound (the quantum model), or something else. Table 8 shows the predictions from both models, where probabilities correspond to averaged data across multiple trials. This is easier to show by retaining the reference to the good, bad matrices, bearing in mind that in the fitted data we average probabilities across these two experimental situations to better match the actual models.
Table 8. Correspondence between observed probabilities and predictions from the classical and quantum models.

Model fit was assessed with the G² statistic, where N is the number of observations and $o_i$, $e_i$ are the observed and expected probabilities for each trial type. Best fit for the models was identified through directed grid search with a step size for angle differences of 0.1; all parameters were taken to be uniformly distributed in a [0, 2π] range. For simplicity, since N was nearly identical for the two experiments, we ignored it in computing G². Table 9 shows observed, classical predicted, and quantum predicted probabilities. Observe that for the a1b1, a1b2, and a2b1 pairs we recorded higher probabilities along the diagonals of the corresponding cells, but for the a2b2 pair, the opposite is true. This is the essential impression of supercorrelation and sensitivity to context: participants respond differently to question a2 depending on whether their counterpart received question b1 (correlation) vs. b2 (anticorrelation).
We computed three S values, one for the observed choice probabilities, one for the predicted probabilities based on the classical model, and one for the predicted probabilities based on the quantum model. Note that, for Experiment 2, empirical S was computed on the basis of the trials for which participants could freely choose whether to check on their associate or not. For Experiment 1, the empirical S, best fit classical S, and best fit quantum S were, respectively, 3, 2 (G² = 0.46), and 2.76 (G² = 0.08). For Experiment 2, the corresponding values were 2.46, 2 (G² = 0.17), and 2.65 (G² = 0.09). Bootstrapped 95% confidence intervals for the empirical S values were [2.73, 3.23] for Experiment 1 and [2.23, 2.71] for Experiment 2. The confidence intervals were computed by first calculating individual S values for each participant (only choice trials were used in this computation).
Means were then calculated from each of the 1000 bootstrap samples created (each bootstrapped sample was a random choice of N values from the original sample, with replacement, where N = number of values in the sample, i.e., the number of participants). Finally, the bootstrapped means were sorted and quantiles of 0.025 and 0.975 were utilized to indicate the 95% confidence intervals for each experiment. In all cases, the empirical data show S > 2, which demonstrates sensitivity to context and the impossibility of a four-way classical probability distribution to explain the data. The classical model resulted in worse fits than the quantum one, with the latter producing S values closer to the observed ones. Note that while the quantum model is able to capture a certain kind of sensitivity to context, it cannot describe just any behavior [31].
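The bootstrap procedure for the confidence intervals can be sketched as below. This is an illustration under stated assumptions: the per-participant S values are made up, the function name is hypothetical, and the quantiles are read off by a simple rank rule because the text does not specify an interpolation method.

```python
import random
from statistics import mean

def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Each bootstrap sample draws n values with replacement from the original sample.
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    # Assumed rank rule for the 0.025 and 0.975 quantiles of the sorted means.
    lower = boot_means[int((alpha / 2) * (n_boot - 1))]
    upper = boot_means[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper

# Hypothetical per-participant S values (not the experimental data).
s_values = [2.0 + 0.05 * i for i in range(40)]
print(bootstrap_ci(s_values))
```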

Fit Results
Using the forced checking and non-checking trials in Experiment 2, we computed S values for checking and non-checking trials for each participant. Note, in this case, it is only checking trials that should allow a violation of the S ≤ 2 bound; therefore, for non-checking trials, it must be the case that S ≤ 2. When participants were not checking on their associate, the S values for the good and bad trials were 1.78 and 1.82, respectively; when checking, we observed 2.91 and 2.59, respectively. The difference in S between checking trials (2.75, averaged across good, bad matrices) and non-checking trials (1.80, averaged across good, bad matrices) was reliable, Z = −6.44, p < 0.001 (using the Wilcoxon Signed Rank Test, as the normality assumption would be suspect here).

Signaling
Finally, for completeness, we briefly consider the issue of signaling. We can define a signaling quantity as:
where the expectation values are defined as expected, for example, $\langle a1 \rangle_{b1}$ = (+1)·(Prob(++|a1b1) + Prob(+−|a1b1)) + (−1)·(Prob(−+|a1b1) + Prob(−−|a1b1)). Note, the max value for $I_S$ is 8, when communication in both directions is considered (this is relevant in evaluating the size of the observed $I_S$ values). We review a point which may lead to confusion: the probabilities in Tables 5 and 7 are not exactly the ones appearing in these expectation values. This is because, in Tables 5 and 7, we counted probabilities separately for the Good and Bad matrices, i.e., the probabilities in Tables 5 and 7 are e.g., Prob(++|a1b1, Good). Therefore, as seen above too, we need to compute Prob(++|a1b1) = Prob(++, Good|a1b1) + Prob(++, Bad|a1b1), but recall Prob(++, Bad|a1b1) = 0. So, Prob(++|a1b1) = Prob(++, Good|a1b1) = Prob(++|a1b1, Good)·Prob(Good|a1b1) = Prob(++|a1b1, Good)·1/2, because in the present design Prob(Good|a1b1) = Prob(Bad|a1b1) = 1/2 (meaning the probability of having a 'good' payoff matrix etc.; the same applies for all question combinations). The probabilities Prob(++|a1b1, Good) etc. are the ones in Tables 5 and 7 and so in computing the expectation values for $I_S$, all probabilities from Tables 5 and 7 need to be multiplied by a factor of 1/2 (the same applies to the calculations for the S values presented in Table 10).
We computed $I_S$ separately for each experiment and for the checking vs. no checking trials. For Experiment 1, for the checking and no checking trials we observed, respectively, that $I_S$ = 0.08 and $I_S$ = 0.33. The corresponding values in Experiment 2 were $I_S$ = 0.08 and $I_S$ = 0.04. In Experiment 2, the results are as expected, since there is more signaling in the checking trials (ostensibly as a result of communication). In Experiment 1, even though for the no checking trials there was no communication, we still observed sizeable signaling. Signaling in Experiment 1 would be the result of the lack of balancing between the payoff matrices (as discussed in detail above). A consideration of signaling is clearly useful as a way to establish whether there might be unintended causal influences in the experimental statistics (as in Experiment 1). However, the non-zero $I_S$ in Experiment 2 in the no checking trials indicates that signaling may be apparent even when there is no plausible corresponding mechanism, perhaps as a result of noise [16]. This does recommend caution when employing signaling in such experiments, especially when N is small (as would be the case in behavioral experiments).
The calculation of the signaling quantifiers $I_S$ allows us to test for contextuality in the sense of [15], which we do here for completeness. According to this work, contextuality is present whenever $|S| - I_S > 2$ (the S here refers to the maximum one between the four possible ways to compute it; here, we focused only on $S = |\langle a1 \& b1 \rangle + \langle a1 \& b2 \rangle + \langle a2 \& b1 \rangle - \langle a2 \& b2 \rangle|$, which is most relevant to our experimental design). In Table 10, we offer a complete record of relevant S values for the checking/no checking quantifiers separately, for both experiments, as well as the quantities $|S| - I_S$, which are, as it happens, indicative of contextuality.
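The contextuality test can be sketched in code as follows. The form of the signaling quantity used here is an assumption, since its defining equation is not reproduced in this section: it is taken to be the sum of the absolute differences between each question's marginal expectation across the two contexts in which it appears, which is consistent with the stated maximum of 8 (four terms, each at most 2). The function names and the example probabilities are hypothetical.

```python
def marginal_expectations(p):
    """Return (<a>, <b>) for one context from its four joint probabilities."""
    exp_a = (p["++"] + p["+-"]) - (p["-+"] + p["--"])
    exp_b = (p["++"] + p["-+"]) - (p["+-"] + p["--"])
    return exp_a, exp_b

def joint_expectation(p):
    """<a&b>: probability of equal outcomes minus probability of unequal ones."""
    return (p["++"] + p["--"]) - (p["+-"] + p["-+"])

def signaling(probs):
    """Assumed I_S: summed absolute marginal differences across contexts."""
    a, b = {}, {}
    for ctx, p in probs.items():
        a[ctx], b[ctx] = marginal_expectations(p)
    return (abs(a["a1b1"] - a["a1b2"]) + abs(a["a2b1"] - a["a2b2"])
            + abs(b["a1b1"] - b["a2b1"]) + abs(b["a1b2"] - b["a2b2"]))

def s_value(probs):
    e = {ctx: joint_expectation(p) for ctx, p in probs.items()}
    return abs(e["a1b1"] + e["a1b2"] + e["a2b1"] - e["a2b2"])

def is_contextual(probs):
    """Criterion from the text: |S| - I_S > 2."""
    return s_value(probs) - signaling(probs) > 2

# Hypothetical, idealised statistics: three correlating pairs, one anticorrelating.
correlated = {"++": 0.5, "+-": 0.0, "-+": 0.0, "--": 0.5}
anticorrelated = {"++": 0.0, "+-": 0.5, "-+": 0.5, "--": 0.0}
probs = {"a1b1": correlated, "a1b2": correlated,
         "a2b1": correlated, "a2b2": anticorrelated}
print(s_value(probs), signaling(probs), is_contextual(probs))
```

When applied to tabulated data such as Tables 5 and 7, the probabilities would first need the factor of 1/2 described above.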

General Discussion
Sensitivity to context is an important insight concerning the representation of information, whether in physics, data science, or psychology. Outside the physics of microscopic particles, it is assumed that there are no true quantum processes, and the study of sensitivity to context leads one to question the mechanism that supports it. In psychology, some pioneering work has been carried out so that both sets of questions, {a1, a2} and {b1, b2}, would be answered by the same participant, or in any case concern mental processes focused on the individual (e.g., [24,28]). Such approaches cannot be adapted to the interaction between separate agents because, in general, without communication there is no possibility of breaking Bell's bound (or without rigging the choice of the questions asked to each agent).
For the first time, in this study we developed an approach enabling the application of the Bell framework in the interaction of two cognitive (and so macroscopic) agents. We considered putative locality violations as an information resource that two interacting agents can employ at will (cf. [32]). We developed a simple empirical paradigm which embodied sensitivity to context in its structure, as a variant of a PD task [2]. Empirical results showed that participants were sensitive to this context and the empirical S values exceeded Bell's bound. As noted, this is not surprising, given the structure of the payoff matrices we employed. The more surprising implication is that this sensitivity prevented fits by a simple classical model and therefore shows another way in which PD tasks and variants can produce results problematic for baseline expectation from classical probability theory. 'Baseline' is a key qualification here since, as noted above, a classical model incorporating communication could be developed to account for the present results. Therefore, the present situation is not unlike most so-called paradoxes in probabilistic inference, for which a baseline classical probability approach appears erroneous, but it is always possible to offer accommodating elaborations (e.g., faced with a result such as Prob(X&Y) > Prob(X), one could write Prob(X&Y|A) > Prob(X|B)).
Theoretically, we fitted two closely matched models, a classical and a quantum one. The latter produced superior fits. This conclusion adds to the body of evidence that quantum theory sometimes offers a good descriptive framework for behavior [33,34]. Elsewhere, we have suggested that this is because quantum theory looks like Bayesian inference, but in a local way [49]. That is, a set of questions for which it is impossible to have a complete joint probability distribution (e.g., because of resource limitations) is divided into subsets, such that within each subset (locally) we have Bayesian inference, but across subsets apparent classical errors arise. The idea that behavior is 'locally rational' has a precedent in psychology [50,51].
Note that the immediate availability of locality violations to the participants makes it unlikely that any results showing S > 2 would be due to 'correlations of the second kind', as discussed by S. Aerts and D. Aerts [19,21,52]. In Experiment 2, when participants would check on the hypothetical counterpart we observed S > 2 and when they would not, S < 2, showing that any apparent sensitivity to context was not brought about just by the measurements (decisions) themselves.
From the point of view of a physicist, the present results are interpreted as sensitivity to context, due to communication, regardless of whether this sensitivity to context is due to signaling or not. As noted, rather than considering signaling a nuisance influence, in this case we are interested in it, as a possible way in which Alice makes use of the information she has about Bob's questions.
There have been several challenges in realizing this project. First, the notion of applying the Bell framework to the interaction of cognitive agents superficially goes against the grain of Bell's work in physics. To address this problem, we had to formalize a notion of violations of locality or free choice, as information resources, which can be adopted vs. not at will (our formal work on this topic is reported in [32]), as well as consider the distinction between context sensitivity and contextuality (for the latter see [15]). Second, adapting the classical and quantum models developed for systems of microscopic particles in physics to behavioral data required careful consideration of the underlying assumptions of the models and how they could be matched to behavioral situations. Third, the difference between contextuality and sensitivity to context and the restrictive (or not) role of signaling in Bell-type paradigms are highly contentious issues. We think the approach we chose is justified, but equally we have offered additional analyses which we hope will allow researchers of differing opinions to still appreciate the results. Finally, reporting the research was challenging: the primary audience for this work is cognitive scientists, but we also hope to interest physicists and mathematicians familiar with Bell who might be intrigued by applications outside physics. But the mathematics is likely to be unfamiliar and challenging to cognitive scientists, while the details of the behavioral paradigm are likely to be unfamiliar to physicists and mathematicians. Overall, interdisciplinary work of this kind, while conceptually exciting and potentially rewarding, is fraught with challenges; we can only hope that we have been at least partly successful in overcoming them.
The present analysis has practical potential. Consider two agents, Alice and Bob, for whom it is in their interest to supercorrelate, but such that they are not meant to break locality and free choice, e.g., they are not meant to communicate. Alice and Bob might be an employee in a tech firm and a stockbroker considering investment opportunities in that firm, respectively. The present framework could be employed to determine whether Alice and Bob benefit from supercorrelation, either on the basis of violations of locality (which may reveal illegal insider trading) or free choice (which could correspond to Alice and Bob independently being sensitive to market conditions which determine the 'questions' each one of them has to respond to, at a given time). Clearly, the applicability of such an analysis depends largely on how the questions for each agent are specified and whether there is advantage in supercorrelation, which may not be very often.
In closing, we hope that the present work will further encourage researchers to employ the notion of contextuality and the corresponding technical tools in the study of the interaction between multiple agents.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
An example of how S > 2 means that the counts or proportions for four binary questions are inconsistent, that is, they do not obey the expected sum rules (if a 4 × 4 table is employed for representation).
Below we show a table of probabilities, which could be computed from frequencies in a behavioral experiment (Table A1). We try to fill it in assuming maximum correlation for the three first pairs and then maximum anticorrelation for the last pair. For a pair of questions, e.g., a and b, there are four cells, and the sum of the corresponding probabilities must be one. That is, Prob(a+ ∧ b+) + Prob(a+ ∧ b−) + Prob(a− ∧ b+) + Prob(a− ∧ b−) = 1 (the notation a+ means a plus outcome for question a). This constraint is a simple implication of the fact that these probabilities span all the space of possibilities for the outcomes of any pair of two questions. Let us first consider the highlighted cells, corresponding to the a, b pair of questions.
Table A1. A sequence of tables to illustrate the implications of S > 2: a and b.
Because we are assuming maximum correlation, we can set Prob(a+ ∧ b+) = p and Prob(a− ∧ b−) = 1 − p = q, with the other probabilities equal to 0 (we leave blank the cells corresponding to 0 probabilities). Table A1 then becomes:
Table A2. A sequence of tables to illustrate the implications of S > 2: computing a and b.
We then consider the cells corresponding to the a, b′ pair of questions, and the cells corresponding to the a′, b pair of questions.
Table A3. A sequence of tables to illustrate the implications of S > 2: a, b′ and a′, b.
Note the highlighted parts below are constrained from the white part. For example, consider the highlighted a+ row. If we set one of the cells, e.g., Prob(a+ ∧ b′+), then the other probability follows, from the law of total probability: Prob(a+) = Prob(a+ ∧ b′+) + Prob(a+ ∧ b′−) = Prob(a+ ∧ b+) + Prob(a+ ∧ b−). The second part of the equation is known (from the white part of the table), Prob(a+ ∧ b+) + Prob(a+ ∧ b−) = p. In physics this condition is called non-signaling, i.e., the marginal probability Prob(a+) does not depend on the other question b or b′. So, in filling in the highlighted parts of the table we only need to worry about one probability in each row. This one probability can be set on the basis of the logic above, p and q = 1 − p for the terms which are meant to be correlating, so ending up with:
Table A4. A sequence of tables to illustrate the implications of S > 2: computing a, b′ and a′, b.
We note that the locality and free choice assumptions are equivalent to an assumption of lack of contextuality for all questions. That is, lack of contextuality must mean that e.g., Prob(a′) is the same regardless of context, that is, regardless of whether it is measured with b or b′. Therefore, we would have that
Prob(a′+) = Prob(a′+ ∧ b+) + Prob(a′+ ∧ b−) = Prob(a′+ ∧ b′+) + Prob(a′+ ∧ b′−) = p
Prob(a′−) = 1 − Prob(a′+) = 1 − p = q
Prob(b′+) = Prob(a+ ∧ b′+) + Prob(a− ∧ b′+) = Prob(a′+ ∧ b′+) + Prob(a′− ∧ b′+) = p
We highlight the cells relevant to the first constraint, for Prob(a′+), in Table A5 below.
Table A5. A sequence of tables to illustrate the implications of S > 2: illustrating constraints.
Note that the above marginal conditions lead to:
What remains is to fill the bottom right part of the table in a way that reflects the anticorrelation pattern for a′b′, i.e., so that the diagonal elements vanish: Prob(a′+ ∧ b′+) = Prob(a′− ∧ b′−) = 0. Now, by the same logic as above, we arrive at Prob(a′+) = Prob(a′+ ∧ b′−) = Prob(b′−) and Prob(a′−) = Prob(a′− ∧ b′+) = Prob(b′+), which entails that p = q = 0.5. Thus, we obtain Table A6b. The reader may compare this table with the case when all questions maximally correlate. The derivation follows the same pattern except the last step, which in this case does not fix probabilities p and q. See Table A6a for comparison.
Note that in Table A6b, with p = q = 0.5, individually each answer is a coin toss, but looking at Alice and Bob together they perfectly coordinate for questions ab, ab′ and a′b and perfectly anticorrelate for questions a′b′. This is an instance of the famous PR-box (Popescu-Rohrlich box) considered in the physics literature [13]. In the present work, we aim to provide an analogue for the interaction of two agents. However, contextuality, or the lack of it, does not help us see how violation of the Bell bound is inconsistent with the existence of a four-way probability distribution. This can be readily seen by comparing the two tables below, Table A6a,b.
Table A6. A sequence of tables to illustrate the implications of S > 2. Note, (a) is 'classical', while (b) is 'contextual' (the response to a′ depends on whether Bob answers b or b′).
In order to show that the right table is contextual and therefore inconsistent with classical probability theory, we need to make use of the overarching assumption that there exists a complete (four-way in this case) classical probability distribution.
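The role of the four-way distribution can be illustrated by brute force. A complete joint distribution over the four questions is a mixture of the 16 deterministic assignments of ±1 answers, so its S value is a weighted average of the S values of those assignments. The sketch below (with illustrative names) enumerates them and compares the result with the pattern of perfect correlation for three pairs and perfect anticorrelation for the fourth.

```python
from itertools import product

def chsh(a, a_prime, b, b_prime):
    """S for one deterministic assignment of +1/-1 answers to all four questions."""
    return a * b + a * b_prime + a_prime * b - a_prime * b_prime

# Any four-way joint distribution is a mixture of these 16 assignments,
# so its S value is a weighted average of the values computed here.
values = [chsh(*answers) for answers in product((+1, -1), repeat=4)]
print(max(values), min(values))  # prints: 2 -2

# Contextual pattern: perfect correlation for ab, ab', a'b and perfect
# anticorrelation for a'b' (pairwise expectations +1, +1, +1, -1).
pr_box_s = 1 + 1 + 1 - (-1)
print(pr_box_s)  # prints: 4
```

Since no mixture of values in [−2, 2] can reach 4, the contextual table cannot arise from a single four-way distribution.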

Appendix B
We show that quantum predictions are equivalent for all maximally entangled states (note that for the classical model it is straightforward to see that the model can easily adjust itself depending on whether perfect correlation or anti-correlation between the agents is assumed). This equivalence is up to a simple transformation of the angles employed. Therefore, it has little impact in model fits whether we employ a perfect correlation state (which fits our empirical situation well) or a perfect anticorrelation one (which is the standard state for this kind of analysis).
We know that for the Bell singlet state $|\psi_-\rangle = \frac{|01\rangle - |10\rangle}{\sqrt{2}}$ the statistics of measurement outcomes in directions a for Alice and b for Bob are given by the formula
Suppose we want to get statistics for the same measurement on a different maximally entangled state, such as $|\psi\rangle = \frac{|\varphi_0\rangle|\xi_0\rangle + |\varphi_1\rangle|\xi_1\rangle}{\sqrt{2}}$ with some orthogonal basis states $|\varphi_0\rangle$, $|\varphi_1\rangle$ for Alice and $|\xi_0\rangle$, $|\xi_1\rangle$ for Bob. The trick is to observe that there exist two unitaries U and V such that
Note that these unitaries are realized as rotations of the Bloch sphere for the respective qubits of Alice and Bob, i.e., we have (cf. [53], Chapter 4.2)
where $a' = R_{n_A}(-\theta_A)\,a$ and $b' = R_{n_B}(-\theta_B)\,b$. The bottom line is that we treat $\langle ab \rangle_{\psi}$ as if it was $\langle a'b' \rangle_{\psi_-}$ (i.e., just measuring in a different measurement basis). That is, we are able to conclude that $\langle ab \rangle_{\psi} = \langle a'b' \rangle_{\psi_-}$. We either look for angles which produce the best fit to our data, assuming $\psi$, or different angles which produce the best fit to our data, assuming $\psi_-$. The two pictures are equivalent and the two sets of angles are linearly related to each other.

Appendix C
We review the derivation of choice probabilities for the quantum model (which is fairly standard in physics).
In the main text, we employed $|\psi^+\rangle = \frac{|00\rangle - |11\rangle}{\sqrt{2}}$, but recall that, except for a fixed rotation of the angles, joint probabilities are unaffected by which maximally entangled state we employ, whether it is an anti-correlation one, such as $|\psi^-\rangle = \frac{|01\rangle - |10\rangle}{\sqrt{2}}$, or a correlation one, such as $|\psi^+\rangle$ (this can be shown fairly easily). Specifically, below we proceed with $|\psi^-\rangle$, as is standard in physics discussions, but if a reader wishes to know the exact predictions for $|\psi^+\rangle$, all that is needed is to transform the angles as $\theta' = \theta + \pi$. First, we need to show that the state $|\psi^-\rangle$ can be written in any alternative, equivalent basis.
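Starting with the Bell state $|\psi^-\rangle = \frac{|01\rangle - |10\rangle}{\sqrt{2}}$, we seek to express it in the alternative basis $\{|n_+\rangle, |n_-\rangle\}$, which means that we can write

$$|\psi^-\rangle = \frac{1}{\sqrt{2}}\left(|n_- n_+\rangle - |n_+ n_-\rangle\right).$$

Second, we derive the expression for the joint probabilities for two measurement directions, e.g., $\mathbf{a}\cdot\sigma_A$ on particle A, where $\mathbf{a}\cdot\sigma_A = a_x\sigma_x + a_y\sigma_y + a_z\sigma_z$. To find $\mathrm{Prob}(+|a;\psi^-)$, meaning the probability of the $+$ outcome for the $\mathbf{a}\cdot\sigma_A$ observable, we rewrite $\psi^-$ in the $|a_\pm\rangle$ basis,

$$|\psi^-\rangle = \frac{1}{\sqrt{2}}\left(|a_- a_+\rangle - |a_+ a_-\rangle\right),$$

so that $\mathrm{Prob}(+|a;\psi^-) = \frac{1}{2}$, and clearly $\mathrm{Prob}(-|a;\psi^-) = \frac{1}{2}$ too. By analogy, we have $\mathrm{Prob}(+|b;\psi^-) = \mathrm{Prob}(-|b;\psi^-) = \frac{1}{2}$. In order to compute probabilities for simultaneous measurements on both particles, with $\mathbf{a}\cdot\sigma_A$ on the first particle and $\mathbf{b}\cdot\sigma_B$ on the second particle, we must express $|\psi^-\rangle$ in the basis for $\mathbf{a}\cdot\sigma_A$ and then apply the projector $P_{b+} = |b_+\rangle\langle b_+|$ to $|a_+\rangle$ and $|a_-\rangle$, i.e., compute the inner products $\langle b_+|a_+\rangle$ and $\langle b_+|a_-\rangle$.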
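In order to compute $|\langle b_+|a_+\rangle|^2$ we need to identify the eigenvectors of the operator

$$\mathbf{a}\cdot\sigma_A = \begin{pmatrix} a_z & a_x - i a_y \\ a_x + i a_y & -a_z \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta\, e^{-i\phi} \\ \sin\theta\, e^{i\phi} & -\cos\theta \end{pmatrix},$$

where $\theta$ is the polar and $\phi$ the azimuthal angle of the direction $\mathbf{a}$.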

This then gives us

$$\mathrm{Prob}(--|a,b;\psi^-) = \frac{1}{2}\sin^2\frac{\theta}{2} = \mathrm{Prob}(++|a,b;\psi^-).$$

As a quick sanity check on the above calculations, note that if $\theta = 0$ then we have the same direction of measurement for both particles. This angle gives $\mathrm{Prob}(++|a,b;\psi^-) = 0$, which is consistent with the assumption that the two sub-systems are anti-correlated.
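The joint probabilities can also be verified numerically. The sketch below assumes that Bob measures along the $z$ axis, so that $\theta$ is the polar angle of Alice's direction, and uses the projectors $P_{n\pm} = (I \pm n\cdot\sigma)/2$. The helper names (`spin`, `projector`, `joint_prob`, `direction`) and the azimuthal angle 0.7 are hypothetical choices made for illustration only.

```python
import numpy as np

# Pauli matrices
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def spin(n):
    """Spin observable n.sigma for a real unit 3-vector n."""
    return n[0] * SX + n[1] * SY + n[2] * SZ

def projector(n, outcome):
    """Projector onto the eigenspace of n.sigma with eigenvalue outcome (+1 or -1)."""
    return 0.5 * (np.eye(2) + outcome * spin(n))

def joint_prob(state, a, s, b, t):
    """Probability of outcome s along a (first qubit) and t along b (second qubit)."""
    P = np.kron(projector(a, s), projector(b, t))
    return np.vdot(state, P @ state).real

def direction(theta, phi=0.0):
    """Unit vector with polar angle theta and azimuthal angle phi."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

# Singlet (|01> - |10>)/sqrt(2) in the ordering |00>, |01>, |10>, |11>
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

b = direction(0.0)  # Bob measures along z (assumption of this sketch)
results = []
for theta in (0.0, np.pi / 3, np.pi / 2, np.pi):
    a = direction(theta, phi=0.7)  # the azimuthal angle should not matter here
    p_pp = joint_prob(singlet, a, +1, b, +1)
    p_mm = joint_prob(singlet, a, -1, b, -1)
    predicted = 0.5 * np.sin(theta / 2) ** 2
    results.append((theta, p_pp, p_mm, predicted))
    print(f"theta={theta:.3f}  P(++)={p_pp:.4f}  P(--)={p_mm:.4f}  formula={predicted:.4f}")
```

At $\theta = 0$ both same-outcome probabilities vanish, matching the sanity check in the text.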
Finally, we can use the above to compute the correlator for the $S$ quantity for the quantum model. Note, first of all, the definition of the expected value of an observable in quantum theory: if a physical quantity $A$ and a state of the system are represented, respectively, by the self-adjoint operator $A$ and the normalized vector $\psi \in \mathcal{H}$, then the expected value $\langle A \rangle_\psi$ of $A$ is $\langle A \rangle_\psi = \langle \psi, A\psi \rangle$.
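The observable $\mathbf{a}\cdot\sigma_A \otimes \mathbf{b}\cdot\sigma_B$ has a spectral decomposition

$$\mathbf{a}\cdot\sigma_A \otimes \mathbf{b}\cdot\sigma_B = P_{a+}\otimes P_{b+} - P_{a+}\otimes P_{b-} - P_{a-}\otimes P_{b+} + P_{a-}\otimes P_{b-}.$$

The expectation value of the observable $\mathbf{a}\cdot\sigma_A \otimes \mathbf{b}\cdot\sigma_B$ is then given by

$$\langle \mathbf{a}\cdot\sigma_A \otimes \mathbf{b}\cdot\sigma_B \rangle_{\psi^-} = \mathrm{Prob}(++|a,b;\psi^-) - \mathrm{Prob}(+-|a,b;\psi^-) - \mathrm{Prob}(-+|a,b;\psi^-) + \mathrm{Prob}(--|a,b;\psi^-) = \sin^2\frac{\theta}{2} - \cos^2\frac{\theta}{2} = -\cos\theta.$$

The interpretation of this expectation value is that it is the average value of the product of the outcome value for the first sub-system and the outcome value for the second sub-system, where the outcome for the first sub-system is measured along $\mathbf{a}$ and the outcome for the second sub-system is measured along $\mathbf{b}$.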
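As an illustration of how this expectation value feeds into $S$, the sketch below computes the correlator for the singlet in two ways: directly as $\langle\psi, A\psi\rangle$, and as the outcome-weighted sum of joint probabilities. It then evaluates one CHSH combination. The measurement angles are a textbook choice in the $x$-$z$ plane, not the fitted angles of this work, and the combination $S = E(ab) + E(ab') + E(a'b) - E(a'b')$ is an assumed sign convention. All helper names are hypothetical.

```python
import numpy as np

# Pauli matrices
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def spin(n):
    """Spin observable n.sigma for a real unit 3-vector n."""
    return n[0] * SX + n[1] * SY + n[2] * SZ

def direction(angle):
    """Unit vector in the x-z plane at the given angle from the z axis."""
    return np.array([np.sin(angle), 0.0, np.cos(angle)])

def correlator(state, a, b):
    """Expectation value <state, (a.sigma (x) b.sigma) state>."""
    return np.vdot(state, np.kron(spin(a), spin(b)) @ state).real

def correlator_from_probs(state, a, b):
    """Same quantity as the average product of the +1/-1 outcomes."""
    total = 0.0
    for s in (+1, -1):
        for t in (+1, -1):
            P = np.kron(0.5 * (np.eye(2) + s * spin(a)),
                        0.5 * (np.eye(2) + t * spin(b)))
            total += s * t * np.vdot(state, P @ state).real
    return total

# Singlet (|01> - |10>)/sqrt(2) in the ordering |00>, |01>, |10>, |11>
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

# Illustrative angles only (assumption): a = 0, a' = pi/2, b = pi/4, b' = -pi/4
a, a2 = direction(0.0), direction(np.pi / 2)
b, b2 = direction(np.pi / 4), direction(-np.pi / 4)

E = lambda x, y: correlator(singlet, x, y)
S = E(a, b) + E(a, b2) + E(a2, b) - E(a2, b2)

print("E(a, b) =", E(a, b), " -a.b =", -a @ b)
print("|S| =", abs(S), " 2*sqrt(2) =", 2 * np.sqrt(2))
```

With these angles the singlet attains $|S| = 2\sqrt{2}$, above the bound of 2 that applies when a four-way probability distribution exists.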