Tsirelson’s Bound Prohibits Communication through a Disconnected Channel

Why does nature only allow nonlocal correlations up to Tsirelson’s bound and not beyond? We construct a channel whose input is statistically independent of its output, but through which communication is nevertheless possible if and only if Tsirelson’s bound is violated. This provides a statistical justification for Tsirelson’s bound on nonlocal correlations in a bipartite setting.

ABSTRACT. A central theme in quantum mechanics is nonlocality, which states that a pair of quantum systems which are shown not to be physically interacting may nevertheless be impossible to describe as independent entities. Quantum theory, however, forbids nonlocal correlations beyond a certain limit. Why is nature only so nonlocal and not more? Approaching the question from the direction of statistics and statistical inference, we identify a statistical no-signaling principle which states that no information may pass through a disconnected channel. We show this principle to be equivalent to the Tsirelson bound on nonlocality for the Bell-CHSH inequality.
Some of the predictions made by quantum mechanics appear to be at odds with common sense. Yet quantum mechanics remains the most precisely tested and successful quantitative theory of nature. It is therefore believed that even if quantum mechanics is someday replaced, any successor will have to inherit at least some of its "preposterous" but highly predictive principles. Perhaps the most counter-intuitive quantum mechanical principle is nonlocality [1]: Nonlocality: A pair of quantum systems which are shown not to be physically interacting may nevertheless be impossible to describe as independent entities.
The mystery of nonlocality is not only to understand why nature is as nonlocal as it is, but also to understand why nature is not more nonlocal than it is. There are alternative Non-Signaling theories which permit nonlocality beyond the quantum limit [2,3]; why doesn't nature choose these theories over quantum mechanics? Several explanations have been proposed, but none is tight, i.e. none provides a necessary and sufficient condition for the quantum limit [4,5]. We exhibit a protocol (an infinite oblivious transfer) which uses 'superquantum NS-boxes' to send messages through a disconnected channel, and we propose a principle, which we call statistical no-signaling, stating that such communication is physically impossible. We show that statistical no-signaling in a bipartite setting is equivalent to Tsirelson's bound for the CHSH inequality, which we henceforth call the quantum bound on nonlocality. We thus provide a conceptual explanation for this bound. Our approach differs from others in that we use statistical techniques as opposed to probabilistic techniques; in particular, we use Fisher information.
A famous application of nonlocality is to construct a 1-2 oblivious transfer protocol between two distant agents (A)lice and (B)ob. Alice and Bob each possess a mysterious box representing one half of the quantum system to be explained. Alice's box might, for example, contain one half of a singlet state of spin-1/2 particles, with Bob's box containing the other half [1,6]. In addition, Alice possesses a pair of bits $x_0$ and $x_1$, each of which is a zero or a one. Using boolean algebra and her box (the protocol will be described later), Alice encodes her pair of bits into a single bit $x^{(1)}$ which she sends across a classical channel to Bob. Bob wishes to recover either $x_0$ or $x_1$, but Alice doesn't know in advance which one. Bob uses the received bit $x^{(1)}$, his box, and some boolean algebra to construct an estimate $y_i$ for his desired bit $x_i$. See Figure 1 later on.
What is the probability that Bob correctly estimates the bit he wished to know? He has two possible sources of knowledge: the bit $x^{(1)}$ he received from Alice, and some mysterious 'nonlocal' correlation between his box and Alice's. The strength of such a nonlocal coordination between two systems is encapsulated by a number $c \in [-1, 1]$ called the Bell-CHSH correlation, such that Bob's probability of guessing correctly is $(1 + |c|)/2$ (see Supplementary A). The Bell-CHSH inequality tells us that $|c| \le 1/2$ classically [1,6]. Mathematically, the statement of nonlocality is that c may violate the Bell-CHSH inequality. This has been increasingly supported by experiments, culminating in a recent loophole-free verification [7]. We think of the Bell-CHSH correlation c as a measure of the strength of the nonlocality manifest in our boxes.
How large can c be? Tsirelson's bound tells us that $|c|$ cannot exceed $1/\sqrt{2}$ in a world described by quantum mechanics [8]. This quantum bound on nonlocality,

(1) $|c| \le 1/\sqrt{2}$,

has been tested experimentally, with the current state of the art being an experiment by Kurtsiefer's group which has achieved a value of c which is only 0.0008 ± 0.00082 distant from Tsirelson's bound [5]. Such experimental evidence supports the view that Tsirelson's bound indeed holds in the real world.
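To make the three regimes concrete, the following minimal sketch evaluates Bob's guessing probability $(1 + |c|)/2$ at the classical, quantum, and PR-box values of c quoted in the text. The function and constant names are illustrative choices, not notation from the paper.

```python
import math

# Values of the Bell-CHSH correlation c quoted in the text.
CLASSICAL_C = 0.5                   # Bell-CHSH inequality: |c| <= 1/2
TSIRELSON_C = 1.0 / math.sqrt(2.0)  # quantum (Tsirelson) bound: |c| <= 1/sqrt(2)
PR_BOX_C = 1.0                      # maximal no-signaling value (PR-box)


def guess_probability(c):
    """Bob's probability of guessing his desired bit: (1 + |c|) / 2."""
    if not -1.0 <= c <= 1.0:
        raise ValueError("c must lie in [-1, 1]")
    return (1.0 + abs(c)) / 2.0


if __name__ == "__main__":
    for name, c in [("classical", CLASSICAL_C),
                    ("quantum", TSIRELSON_C),
                    ("PR-box", PR_BOX_C)]:
        print(f"{name:10s} c = {c:.4f}  P(correct) = {guess_probability(c):.4f}")
```

At the quantum bound this reproduces the familiar value $\cos^2(\pi/8)$ for the optimal CHSH success probability.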
Tsirelson's result is a specifically quantum mechanical fact for which there has been no good conceptual explanation. How fundamental is (1)? Must this inequality also hold for any future theory which might someday supersede quantum mechanics [9]? We are led to the following question: Question: Can we find a plausible physical principle, independent of quantum mechanics, which is necessary and sufficient to guarantee that $|c| \le 1/\sqrt{2}$?
The search for such a principle has a history of about 20 years. It was initially expected that the physical principle of relativistic causality (no-signaling) itself restricts the strength of nonlocality [10,11,12]. But then it was discovered that no-signaling theories may exist for which $|c| > 1/\sqrt{2}$. This led to the device-independent formalism of No-Signaling (NS)-boxes [2,13] (see also [3]). In particular, maximum violation of the Bell-CHSH inequality is achieved by Popescu-Rohrlich (PR)-boxes, which are consistent with relativistic causality. Why then, after all, does nature not permit (1) to be violated (as far as we know)? Several suggestions have been made. Superquantum correlations lead to violations of the Heisenberg uncertainty principle [14,15], which is another seemingly purely quantum result. PR-boxes would allow distributed computation to be performed with only one bit of communication [16], which looks unlikely but doesn't violate any known physical law. Similarly, in stronger-than-quantum nonlocal theories some computations exceed reasonable performance limits [17]. The principle of information causality [18] shows that no sensible measure of mutual information exists between pairs of systems in superquantum nonlocal theories. Finally, it was shown that superquantum nonlocality does not permit classical physics to emerge in the limit of infinitely many microscopic systems [19,20]. Of these, only information causality and macroscopic locality give necessary conditions for the quantum bound, and neither is known to be sufficient [4]. Thus these conditions do not single out quantum mechanics from amongst all possible nonlocal theories, as pointed out in [21].
We propose the following statement as a physical principle: Statistical no-signaling: No information can pass through a channel whose output is independent of its input. In this report we formulate a consequence of statistical no-signaling that is equivalent to (1), providing a sought-for conceptual explanation for the quantum bound on nonlocality. The novelty of our approach is our use of statistical methods.
Let x = Bernoulli(θ) be a Bernoulli random variable held by Alice, which serves as our information source. We imagine $\theta \in [-1, 1]$ as encoding a message, perhaps in the digits of its binary expansion. Alice independently samples m values $A \overset{\mathrm{def}}{=} \{x_0, x_1, \ldots, x_{m-1}\}$ from x (the interesting case is $m \to \infty$), which she sends through a channel to Bob. Bob receives a set of values $B \overset{\mathrm{def}}{=} \{y_0, y_1, \ldots, y_{m-1}\}$ which are also independent and identically distributed (iid) and which we may consider as realizations of a Bernoulli random variable y whose mean is their sample average. We have thus described a noisy channel with input x and with output y. In Supplementaries A and B we construct such channels and show that c may be viewed as the correlation between their inputs and outputs.
The Fisher information $I_B(\theta)$ represents the maximum information about θ that Bob may have received by way of the above protocol. If $I_B(\theta) = \infty$ then Bob knows θ 'on the nose', while if $I_B(\theta) = 0$ then Bob has received no information about θ at all.
Consequence of statistical no-signaling: Given the above setting, if x and y are independent random variables then $I_B(\theta) < \infty$. When the number of samples is finite, the Fisher information for a disconnected channel obviously vanishes. But when there are infinitely many samples it may happen that $I_B(\theta) = \infty$. Indeed, we will construct a disconnected channel for which $I_B(\theta) = \infty$ using superquantum NS-boxes. It is in this way that statistical no-signaling will imply that superquantum NS-boxes are non-physical and therefore that the quantum bound on nonlocality is indeed fundamental.
As we shall see, the only three possible values of $I_B(\theta)$ in the $m \to \infty$ limit are 0, 1, and ∞. When the Fisher information equals zero or one, no information is transmitted about θ. The distinction between these cases is discussed in Supplementary D.

To derive the quantum limit on nonlocality $|c| \le 1/\sqrt{2}$ from statistical no-signaling, we realize a disconnected channel in a specific way as a limiting case of the van Dam protocol [16]. This is the same protocol that was used to test information causality [18].
Alice samples $2^n$ values $A \overset{\mathrm{def}}{=} \{\hat{x}_0, \hat{x}_1, \ldots, \hat{x}_{2^n-1}\}$ from her ±1-valued Bernoulli(θ) random variable x, which she converts into 0/1-valued bits $\{x_0, x_1, \ldots, x_{2^n-1}\}$ such that $\hat{x}_i = (-1)^{x_i+1}$. She then combines these using her NS-boxes, a pair at a time, into one 'very special' bit $x^{(n)}$ which she transmits to Bob through what we will for now assume is a perfect channel. Bob randomly chooses an index $0 \le i \le 2^n - 1$ (Alice does not know in advance which i he will choose), and makes his best guess $y_i$ (respectively, $\hat{y}_i \overset{\mathrm{def}}{=} (-1)^{y_i+1}$) for Alice's bit $x_i$ (respectively, $\hat{x}_i$) using $x^{(n)}$ and his NS-boxes. The correlations between Alice's boxes and Bob's boxes are governed by the Bell-CHSH correlation $c \in [-1, 1]$. The process described above is called random access coding or oblivious transfer, and it defines a channel from x to y (see Supplementary B and Figure 1). Assume first that $|c| < 1$. A short calculation in Supplementaries B and D will reveal the following properties in the $n \to \infty$ limit:
• Random variables x and y are independent.
• As for the Fisher information:

(2) $\lim_{n \to \infty} I_B(\theta) = \begin{cases} \infty, & 2c^2 > 1, \\ 1, & 2c^2 = 1, \\ 0, & 2c^2 < 1. \end{cases}$

Statistical no-signaling rules out the first case, from which we deduce that $2c^2 \le 1$, that is, the quantum limit on nonlocality (1).
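The three limiting regimes of the Fisher information can be checked numerically. The sketch below assumes the closed form $I_B(\theta) = 2^n c^{2n} / (1 - c^{2n}\theta^2)$, which is the standard Fisher information of $2^n$ iid ±1-valued samples sent through a symmetric channel of correlation $c^n$; it reduces to $(2c^2)^n$ at $\theta = 0$, matching the value $mc^2$ stated in Supplementary D. The function name and the sample values of c are illustrative assumptions.

```python
import math


def fisher_information(c, n, theta=0.0):
    """Fisher information about theta carried by Bob's 2**n output samples.

    Assumes a symmetric binary channel of correlation rho = c**n, so that
    each output is +/-1 with mean rho * theta. Requires |rho * theta| < 1.
    """
    rho = c ** n          # channel correlation after n levels of boxes
    m = 2 ** n            # number of samples
    return m * rho ** 2 / (1.0 - (rho * theta) ** 2)


if __name__ == "__main__":
    # One value below, at, and above the quantum bound 1/sqrt(2).
    for c in (0.6, 1.0 / math.sqrt(2.0), 0.8):
        row = ", ".join(f"n={n}: {fisher_information(c, n, theta=0.3):.3g}"
                        for n in (1, 5, 10, 20, 40))
        print(f"c = {c:.4f} (2c^2 = {2 * c * c:.3f})  {row}")
```

For $2c^2 < 1$ the values decay to zero, at $2c^2 = 1$ they settle at one, and for $2c^2 > 1$ they grow without bound, even though the channel correlation $c^n$ itself tends to zero in all three cases.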
The conceptual explanation for why the channel becomes disconnected as $n \to \infty$ is that the only information which passes from Alice to Bob in the van Dam protocol is $x^{(n)}$, which is a communication bottleneck. Alice's information about θ is contained in her samples $x_0, x_1, \ldots, x_{2^n-1}$, which are combined with one another and with random noise from the boxes to become $x^{(n)}$. Conversely, Bob's estimates $y_0, y_1, \ldots, y_{2^n-1}$ are also all recovered from $x^{(n)}$ together with noise introduced by his boxes. But $x^{(n)}$ contains less and less information about θ as n grows to infinity and as more boxes are used, and $x^{(n)}$ contains no information at all about θ in the $n \to \infty$ limit. This disconnects the channel from x to y. See Figure 2.
The $|c| = 1$ case (PR-boxes) requires special consideration. The nonlocal correlation c is independent of the characteristics of the classical channel, so we choose the correlation of the classical channel from Alice to Bob in the case of $2^n$ samples to be $(c')^n$ for some $1/\sqrt{2} < c' < 1$. This disconnects the classical channel between x and y, as we show in Supplementary C, while maintaining $I_B(\theta) = \infty$. This contradicts statistical no-signaling, as required.
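As a worked illustration of the PR-box case, the following sketch assumes that with $c = 1$ the only degradation comes from the classical channel, so that the correlation between x and y is $(c')^n$, and that the Fisher information at $\theta = 0$ equals the number of samples times the squared channel correlation, as stated for symmetric channels in Supplementary D. The sample value $c' = 0.8$ is an arbitrary choice.

```latex
% Requires amsmath. Assumption: channel correlation (c')^n, with 2^n samples.
\begin{align*}
  \text{correlation between } x \text{ and } y:\quad
    & (c')^n \longrightarrow 0 \quad (n \to \infty), \\
  \text{Fisher information at } \theta = 0:\quad
    & I_B(0) = 2^n \left[(c')^n\right]^2 = \left(2c'^2\right)^n
      \longrightarrow \infty \quad \text{since } 2c'^2 > 1, \\
  \text{example } c' = 0.8,\ n = 20:\quad
    & (c')^{20} \approx 0.0115, \qquad \left(2c'^2\right)^{20} = 1.28^{20} \approx 139.
\end{align*}
```

The correlation and the Fisher information thus move in opposite directions, which is the tension that statistical no-signaling forbids.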
Our approach differs from others in that we use Fisher information as opposed to Shannon information. As a result, Alice's bits were interpreted as samples of a random variable whose mean encodes a message. The utility of Fisher information as a measure of the quantity of Bob's information about θ stems from the Cramér-Rao Lower Bound, which asserts that $\sigma_B^2 \overset{\mathrm{def}}{=} 1/I_B(\theta)$ is the lowest uncertainty about the value of θ, in terms of error variance, that Bob could hope to achieve with an unbiased estimator.

FIGURE 1. Distributed oblivious transfer (van Dam) protocol [16]. Its basic building block is on the left (panel labels: $x^{(1)} = x_0 \oplus A$ and $y = x^{(1)} \oplus B$), where Alice inserts $x_0 \oplus x_1$ into her box, receives A, and sends $x_0 \oplus A$ to Bob. Bob decides that he wants to know the value of $x_j$, and he feeds j into his box, which outputs B. Bob's estimate of $x_j$ is then $x^{(1)} \oplus B$. When there are multiple boxes, Alice concatenates them (the process is called wiring). For example, with seven boxes, Alice begins with a collection of bits $x_0, x_1, \ldots, x_7$, and she inputs $x_{2j} \oplus x_{2j+1}$ into box j, where j = 0, 1, 2, 3, receiving $A_0, A_1, A_2, A_3$ correspondingly. The bits fed into the next level of boxes become $x^{(1)}_j \overset{\mathrm{def}}{=} x_{2j} \oplus A_j$ with j = 0, 1, 2, 3. The final output $x^{(3)}$ is sent to Bob. Bob encodes the address of the bit he wants as the binary number $i_3 i_2 i_1$; for example, if he wants $x_2$, then he sets $i_3 = 0$, $i_2 = 1$, and $i_1 = 0$ because 10 is 2 in binary. This binary encoding describes a path in his binary tree from a root to a branch, where 0 means 'go left' and 1 means 'go right'. Bob inserts $i_3$ into the lowermost box to obtain $B_6$.
The Central Limit Theorem (CLT) provides a further interpretation of statistical no-signaling. Let $\hat{\theta}$ be Bob's best decoder of θ based on B, that is, the maximum likelihood estimator. In Supplementary E it is shown that as n approaches infinity:

(3) $\sqrt{I_B(\theta)}\,(\hat{\theta} - \theta) \xrightarrow{d} \mathcal{N}(0, 1)$,

where d stands for convergence in distribution. The rightmost term denotes a random variable whose distribution is Gaussian centered at 0 with variance 1. We may think of (3) as a form of the CLT in which the number of samples has been replaced by the Fisher information. Explicitly, for c = 1 and for θ = 0 we recover the usual CLT. Thinking of the Fisher information as the effective number of samples that Bob receives, if $2c^2 \le 1$ then (3) appears as a retarded or degenerate CLT in which the effective number of samples does not increase as $n \to \infty$. Thus Bob's ability to estimate θ using $\hat{\theta}$ decreases in the $n \to \infty$ limit, which is what we would expect because less information about θ is passing through the channel. Despite the number of samples growing, the effective number of samples does not increase.

FIGURE 2. The statistical no-signaling condition. The van Dam protocol defines an underlying channel which becomes disconnected in the $n \to \infty$ limit. The upper illustration shows this channel and the amount of Fisher information about θ at its input and at its output. When the number of nonlocal resources increases unboundedly, the two ends of the channel become disconnected, as illustrated by a vanishing bottleneck in the lower figure. Statistical no-signaling dictates that in this case no information can pass through, which occurs if and only if $2c^2 \le 1$. The case of $2c^2 > 1$ leads to a physically unreasonable limit where Bob can fully read off the value of Alice's θ through a disconnected channel.
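The reading of $1/I_B(\theta)$ as Bob's best achievable error variance can be checked exactly for a finite number of samples. The sketch below is an illustration under stated assumptions: Bob's m outputs are iid ±1-valued with mean $\rho\theta$ (where $\rho$ stands for the channel correlation), and Bob uses the unbiased estimator $\bar{y}/\rho$, which coincides with the maximum likelihood estimator whenever it falls inside $[-1, 1]$. All names are hypothetical.

```python
from math import comb


def fisher_information(rho, theta, m):
    """Fisher information of m iid +/-1 samples with mean rho * theta."""
    return m * rho ** 2 / (1.0 - (rho * theta) ** 2)


def estimator_variance(rho, theta, m):
    """Exact variance of theta_hat = mean(y) / rho, by summing over the
    binomial distribution of the number k of +1 outcomes among m samples."""
    p = (1.0 + rho * theta) / 2.0           # P(y = +1)
    first = 0.0
    second = 0.0
    for k in range(m + 1):
        prob = comb(m, k) * p ** k * (1.0 - p) ** (m - k)
        estimate = ((2 * k - m) / m) / rho   # sample mean divided by rho
        first += prob * estimate
        second += prob * estimate ** 2
    return second - first ** 2


if __name__ == "__main__":
    rho, theta, m = 0.5, 0.3, 64
    var = estimator_variance(rho, theta, m)
    info = fisher_information(rho, theta, m)
    print(f"variance = {var:.6f}, 1 / Fisher information = {1.0 / info:.6f}")
```

In this simple model the estimator attains the Cramér-Rao bound for every m, so the product of its variance and the Fisher information equals one; a small Fisher information therefore translates directly into a large estimation error.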

Conclusions
We have formulated a statistical no-signaling principle which dictates that no information can pass through a disconnected channel. Applied to an infinite limit of the van Dam protocol, this principle is equivalent to the quantum bound on nonlocality. We may view this fact as an example of asymptotic theory in statistics, in which an asymptotic limit allows us to discern statistical properties that are unavailable for a finite number of samples.
Statistical no-signaling is different from the notion of no-signaling in the sense of non-signaling theories (NS-boxes). No-signaling pertains to a single pair of boxes, whereas statistical no-signaling is used as a condition on the limit of an iterative construction involving infinitely many boxes. Taking statistical no-signaling instead of what is traditionally called 'no-signaling' as our no-signaling condition, we recover the idea that quantum mechanics is indeed the most general nonlocal non-signaling theory.
Shimony [10,11] and Aharonov [12] independently suggested that a quantum theory may perhaps be based on two axioms, nonlocality and relativistic causality (no-signaling). Aharonov (unpublished) also observed that these two axioms, which seem to contradict one another, can be reconciled using uncertainty. This idea was virtually abandoned for many years following the discovery of superquantum theories which satisfy both axioms. But statistical no-signaling reveals a sense in which this original idea holds true. When the number of nonlocal resources increases to infinity, stronger-than-quantum nonlocal theories fail to satisfy statistical no-signaling. These superquantum theories approach a signaling limit where Bob can recover Alice's message with complete certainty even though the channel between Alice and Bob is disconnected. Quantum nonlocality obeys statistical no-signaling and thus permits only bounded uncertainty (pure randomness), $\sigma_B^2 \to 1$, or complete uncertainty, $\sigma_B^2 \to \infty$, in the limit. The statistical no-signaling condition is stronger than the previously identified principle of information causality [18]. Violation of statistical no-signaling implies violation of information causality, whereas the converse implication is false. This is evident in the derivation of information causality in that paper, where the expression of the Fisher information in (2) in the θ = 0 case appears as Equation (23) therein.

Supplementary A. The bipartite Bell experiment as a noisy symmetric channel
In this section we recall the definition of the Bell-CHSH correlation c and we formulate the Bell-CHSH inequality, establishing notation. We then exhibit c as the correlation of a symmetric binary channel.

A.1. The Bell-CHSH inequality. Let us recall the classical bipartite Bell experiment.
Alice and Bob each hold one half of an EPR pair such as a singlet state of spin-1/2 particles. They each possess two different measuring instruments which we unimaginatively call 'instrument zero' and 'instrument one'. Alice measures her particle using one of the instruments, and Bob does the same. Let a be the index of the instrument used by Alice and let $\hat{A}$ be its reading. Similarly, let b and $\hat{B}$ denote the index of an instrument chosen by Bob and its reading. In the language of probability, $\hat{A}$ and $\hat{B}$ are ±1-valued Bernoulli random variables. The choices of measuring instrument, a and b, may be either parameters or 0/1-valued Bernoulli random variables.
Repeating the experiment for many different EPR pairs, Alice and Bob may compute the correlations of their readings $\hat{A}$ and $\hat{B}$ for any given pair of indices a and b. Formally, they compute $E[\hat{A}\hat{B} \mid a, b]$, the expectation of $\hat{A}\hat{B}$ conditioned on their choice of a particular pair of measuring instruments a and b. We now define the Bell-CHSH correlation c by the formula:

(4) $c \overset{\mathrm{def}}{=} \frac{1}{4} \sum_{a,b \in \{0,1\}} (-1)^{ab}\, E[\hat{A}\hat{B} \mid a, b]$.

In any theory in which both Alice and Bob's choices, and the readings of their measuring devices, are local, the Bell-CHSH inequality [6] holds:

(5) $|c| \le 1/2$.

Locality means that Alice's readings may only be affected by her own choices (or perhaps by any other hidden variables locally at her site), and similarly for Bob's readings. Quantum mechanically, Alice and Bob may violate (5), and hence quantum mechanics is nonlocal.
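The classical bound can be verified by brute force. The sketch below assumes the normalization in which c is one quarter of the usual CHSH combination, with a minus sign on the term a = b = 1, so that the classical limit is 1/2 as stated in the main text. It enumerates all 16 local deterministic strategies, whose convex mixtures make up every local theory; the function names are illustrative.

```python
from itertools import product


def chsh_correlation(E):
    """c = (1/4) * sum over a, b of (-1)**(a*b) * E[a][b],
    where E[a][b] is the expectation of A_hat * B_hat given settings a, b."""
    return sum((-1) ** (a * b) * E[a][b] for a in (0, 1) for b in (0, 1)) / 4


def local_deterministic_values():
    """Bell-CHSH correlation of every local deterministic strategy, in which
    Alice's reading depends only on a and Bob's reading only on b."""
    values = []
    for A0, A1, B0, B1 in product((-1, 1), repeat=4):
        A, B = (A0, A1), (B0, B1)
        E = [[A[a] * B[b] for b in (0, 1)] for a in (0, 1)]
        values.append(chsh_correlation(E))
    return values


max_local = max(abs(v) for v in local_deterministic_values())
pr_box = chsh_correlation([[1, 1], [1, -1]])  # PR-box: E = (-1)**(a*b)

if __name__ == "__main__":
    print("largest |c| over local deterministic strategies:", max_local)
    print("c for a PR-box:", pr_box)
```

Because c is linear in the expectations, no probabilistic mixture of these strategies can exceed the largest deterministic value.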
A.2. The Bell-CHSH correlation c as a channel correlation. Non-signaling (NS)-boxes provide an abstraction and an extension of the Bell-CHSH experiment. This time, Alice and Bob each own a box. Each such box may be thought of as a complete laboratory containing two measuring devices. Each participant inserts their choice of measuring device into their box. The box output is the respective reading of the chosen measuring device. Alice and Bob share a pair of NS-boxes whose inputs are a and b and whose outputs are Bernoulli random variables $A_a$ and $B_b$. Assume now that a, b, $A_a$, and $B_b$ are all 0/1-valued.
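We will show that the Bell-CHSH parameter (4) represents the correlation of a symmetric binary channel whose input is the Bernoulli random variable $x \overset{\mathrm{def}}{=} ab$ and whose output is the Bernoulli random variable $y \overset{\mathrm{def}}{=} A_a \oplus B_b$. Writing $\hat{x} = (-1)^{x}$ and $\hat{y} = (-1)^{y}$ for their ±1-valued counterparts, the channel correlations are:

(6) $c_{\hat{x}} \overset{\mathrm{def}}{=} E[\hat{x}\hat{y} \mid \hat{x}]$.

With respect to a particular choice of measuring devices a and b, (6) becomes:

(7) $c_{ab}(a, b) = E[\hat{x}\hat{y} \mid \hat{x} = (-1)^{ab}, a, b]$.

Pulling the condition $\hat{x} = \widehat{ab} = (-1)^{ab}$ out of (7) and using $\widehat{A_a \oplus B_b} = \hat{A}_a \hat{B}_b$, we obtain:

(8) $c_{ab}(a, b) = (-1)^{ab}\, E[\hat{A}_a \hat{B}_b \mid a, b]$.

Assume that the channel is symmetric, i.e. that $c = c_{ab}(a, b)$ for all a, b. From (7) and (8) we may rewrite the Bell-CHSH correlation (4) as:

(9) $c = \frac{1}{4} \sum_{a,b} (-1)^{ab}\, E[\hat{A}_a \hat{B}_b \mid a, b] = \frac{1}{4} \sum_{a,b} c_{ab}(a, b) = E[\hat{x}\hat{y}]$.

The last equality above follows from the channel symmetry:

(10) $E[\hat{x}\hat{y}] = \sum_{\hat{x}} P(\hat{x})\, c_{\hat{x}} = c$.

Equation (9) is our promised interpretation of the Bell-CHSH correlation as a correlation of a noisy symmetric binary channel.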
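The channel interpretation can be checked on an explicit family of boxes. The sketch below assumes an 'isotropic' NS-box, in which Alice's output is uniformly random and the relation $A \oplus B = ab$ holds with probability $(1 + c)/2$; this particular family, the uniform choice of a and b, and all names are assumptions made for illustration. Exact rational arithmetic is used so that the no-signaling property and the channel correlation can be compared without rounding.

```python
from fractions import Fraction


def ns_box(c):
    """Joint distribution P(A, B | a, b) of an isotropic box of correlation c.

    Returns a dict mapping (a, b) to a dict mapping (A, B) to a probability.
    Outcomes satisfying A xor B == a and b share total probability (1 + c)/2.
    """
    c = Fraction(c)
    table = {}
    for a in (0, 1):
        for b in (0, 1):
            table[(a, b)] = {
                (A, B): (1 + c) / 4 if (A ^ B) == (a & b) else (1 - c) / 4
                for A in (0, 1) for B in (0, 1)
            }
    return table


def alice_marginal(table, a, b, A):
    """P(A | a, b); no-signaling requires this not to depend on b."""
    return sum(p for (A_, _), p in table[(a, b)].items() if A_ == A)


def channel_correlation(table):
    """E[x_hat * y_hat] for input x = a and b, output y = A xor B, with a, b
    uniform and each bit t mapped to the sign (-1)**t."""
    total = Fraction(0)
    for (a, b), dist in table.items():
        for (A, B), p in dist.items():
            total += Fraction(1, 4) * p * (-1) ** (a & b) * (-1) ** (A ^ B)
    return total


if __name__ == "__main__":
    box = ns_box(Fraction(7, 10))
    print("P(A=0 | a=0, b=0) =", alice_marginal(box, 0, 0, 0))
    print("P(A=0 | a=0, b=1) =", alice_marginal(box, 0, 1, 0))
    print("channel correlation =", channel_correlation(box))
```

Alice's marginal is the same for both of Bob's settings, so the box cannot be used to signal, while the correlation between x and y equals the parameter c of the box.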

Supplementary B. The van Dam protocol as a noisy symmetric channel
In this section we recall the construction of the van Dam protocol [18,16]. We then reinterpret this protocol as underlying a noisy symmetric binary channel, as a special case of the construction of Section A. We compute its correlations, and establish the effect of noise on its classical component.
B.1. The van Dam protocol. The van Dam protocol realizes an oblivious transfer protocol by means of a classical channel and a number of NS-boxes. Each of Alice's boxes has a corresponding box on Bob's side, and different pairs of boxes are statistically independent. Suppose that Alice has in her possession the bits $x_0, \ldots, x_{m-1}$ where $m = 2^n$, $n \ge 1$. Bob wishes to know the value of one of her bits. He specifies the bit whose value he wishes to know via its binary address $i = i_{n-1} i_{n-2} \cdots i_0$. For example, if n = 2 then Bob may specify which of the bits $x_0$ to $x_3$ he wants by specifying a binary address, 00, 01, 10, or 11. Alice's bits and Bob's addresses are encoded into the inputs of $2^n - 1$ NS-boxes following a particular protocol which is described next.
Alice uses outputs of boxes and choices of measuring devices to determine choices of measuring devices for other boxes. Such a procedure is called wiring. The wiring of boxes on Alice's side admits a recursive description which we now give. Let $A^{k,j}_a$ denote the output of the jth box on the kth level on Alice's side when its input is a. Let also:

(11) $f_{k,j}(q_1, q_2) \overset{\mathrm{def}}{=} q_1 \oplus A^{k,j}_{q_1 \oplus q_2}$.

Suppose that Alice wishes to encode m = 4 bits with her boxes. To do so, she first picks two boxes and computes:

(12) $x^{(1)}_0 \overset{\mathrm{def}}{=} f_{1,0}(x_0, x_1), \qquad x^{(1)}_1 \overset{\mathrm{def}}{=} f_{1,1}(x_2, x_3)$.

This forms the first level in her construction. The second level then follows:

(13) $x^{(2)} \overset{\mathrm{def}}{=} f_{2,0}(x^{(1)}_0, x^{(1)}_1)$.

In this example there are only two levels and so $x^{(2)}$ is the bit which Alice transmits to Bob through the classical channel. In the case where $m = 2^n$ there will be n levels and thus $x^{(n)}$ is the bit Bob will receive from Alice.
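A minimal simulation sketch of the wiring just described, together with the top-down decoding by binary address that Bob performs with the received bit. Several modeling choices here are assumptions rather than part of the protocol's specification: each box is simulated as a noisy PR-box whose relation $A \oplus B = ab$ holds with probability $(1 + c)/2$, boxes on each level are indexed from zero, and only the boxes Bob actually uses are sampled. The function names are hypothetical.

```python
import random


def van_dam_encode(bits, rng):
    """Alice's wiring. `bits` must have length 2**n with n >= 1.

    Returns the transmitted bit x^(n) and, for each level, the list of
    (input a, output A) pairs of Alice's boxes on that level.
    """
    levels = []
    current = list(bits)
    while len(current) > 1:
        boxes, nxt = [], []
        for j in range(len(current) // 2):
            q1, q2 = current[2 * j], current[2 * j + 1]
            a = q1 ^ q2                  # Alice's box input
            A = rng.randrange(2)         # Alice's box output, uniformly random
            boxes.append((a, A))
            nxt.append(q1 ^ A)           # f(q1, q2) = q1 xor A
        levels.append(boxes)
        current = nxt
    return current[0], levels


def van_dam_decode(x_n, levels, i, c, rng):
    """Bob's estimate of bit i, using one box per level from the top down."""
    y = x_n
    for k in range(len(levels), 0, -1):
        b = (i >> (k - 1)) & 1           # address bit fed into level k
        j = i >> k                       # which box on level k is used
        a, A = levels[k - 1][j]
        error = 0 if rng.random() < (1 + c) / 2 else 1
        B = A ^ (a & b) ^ error          # Bob's output of the paired box
        y ^= B
    return y


if __name__ == "__main__":
    rng = random.Random(1)
    bits = [rng.randrange(2) for _ in range(8)]
    x_n, levels = van_dam_encode(bits, rng)
    decoded = [van_dam_decode(x_n, levels, i, 1.0, rng) for i in range(8)]
    print("bits   :", bits)
    print("decoded:", decoded)           # identical for perfect PR-boxes
```

With c = 1 every requested bit is recovered exactly from the single transmitted bit; with $|c| < 1$ each of the n levels may flip the estimate independently, which is the source of the $c^n$ correlation of the resulting channel.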
Unbeknownst to Alice, Bob now decides which bit $x_i$ he would like to know the value of. He takes its binary address $i = i_{n-1} i_{n-2} \cdots i_0$, and inserts $i_{k-1}$ into all of his boxes whose counterparts are on the kth level on Alice's side. He then uses the values $B^{k,j}_{i_{k-1}}$ that he obtains, together with the bit $x^{(n)}$ he received from Alice, to construct the decoding function:

(14) $y_i \overset{\mathrm{def}}{=} x^{(n)} \oplus B^{1,j_1}_{i_0} \oplus \cdots \oplus B^{n,j_n}_{i_{n-1}}$.

The values $j_1, \ldots, j_n$ (which boxes Bob uses) are determined by the binary address i.

The probability that Bob will decode the correct value of the bit he desires is governed by the NS-box correlation c. For the simplest case of m = 2, where Alice and Bob share a single pair of boxes, note that

(15) $P(y_i = x_i \mid x_i) = \frac{1 + c}{2}$,

which follows from (9). In general, decoding any bit out of $2^n$ possible bits involves using n pairs of NS-boxes. Noting that an even number of errors, $A \oplus B \ne ab$, will always cancel out in such a construction, leads to [18]:

(16) $P(y_i = x_i \mid x_i) = \sum_{k\ \mathrm{even}} \binom{n}{k} \left(\frac{1 - c}{2}\right)^{k} \left(\frac{1 + c}{2}\right)^{n-k}$

(17) $\phantom{P(y_i = x_i \mid x_i)} = \frac{1 + c^n}{2}$.

We illustrate in the case that n = 2:

(18) $P(y_i = x_i \mid x_i) = \left(\frac{1 + c}{2}\right)^2 + \left(\frac{1 - c}{2}\right)^2 = \frac{1 + c^2}{2}$.

B.2. van Dam protocol as a symmetric channel. Assume now that instead of a string of bits, Alice has in her possession an information source that is a ±1-valued Bernoulli random variable x whose mean is θ. Alice generates m iid samples $\hat{x}_0, \ldots, \hat{x}_{m-1}$ from x and converts them into her 0/1-valued bits $x_0, x_1, \ldots, x_{m-1}$ by mapping −1 to 0 and 1 to 1. As in (18), the van Dam protocol has a memoryless property:

(19) $P(y_i \mid x_0, \ldots, x_{m-1}) = P(y_i \mid x_i)$.

From this it follows that if Alice's inputs $x_0, x_1, \ldots, x_{m-1}$ are iid then Bob's outputs $y_0, y_1, \ldots, y_{m-1}$ are also iid. Therefore the values $\hat{y}_i \overset{\mathrm{def}}{=} (-1)^{y_i+1}$ determine a Bernoulli random variable y. In this way, the van Dam protocol may be viewed as a symmetric binary channel whose input is x and whose output is y. By (17) the channel correlation is

(20) $E[xy \mid x = \hat{x}_i] = 2P(y = \hat{x}_i \mid x = \hat{x}_i) - 1 = 2P(y_i = x_i \mid x_i) - 1 = c^n$.

B.3. Noisy classical channel in the van Dam protocol. The preceding discussion of the van Dam protocol assumed a perfect classical channel between Alice and Bob. We now relax this assumption. Let $(c')^n$ be the correlation underlying the classical channel, where $|c'| \le 1$. Such a channel can be realized by concatenating n copies of a noisy symmetric channel whose correlation is $c'$. This correlation depends on n, and Alice may construct it as part of the protocol based on her knowledge of n.

D.1. Fisher information for a binary channel. Consider a binary channel whose input is a ±1-valued Bernoulli random variable x and whose output is another ±1-valued Bernoulli random variable y. The channel correlations are defined by means of (6). If the channel is symmetric then

(29) $P(y = x) = P(y = -1 \mid x = -1) = P(y = 1 \mid x = 1)$,

from which it follows that $c_{-1} = c_1 = E[xy]$.

We shall assume a prior distribution for x given by:

(36) $P(x \mid \theta) = \frac{1 + x\theta}{2}$.

Using this, (35) reads:

(37) $I_Y(\theta) = \frac{m\,(c_1 + c_{-1})^2}{4 - \left[(c_1 - c_{-1}) + (c_1 + c_{-1})\,\theta\right]^2}$.

For a symmetric binary channel, with $c = c_{-1} = c_1$, Equation (37) simplifies to:

(38) $I_Y(\theta) = \frac{m c^2}{1 - c^2 \theta^2}$.

Note that the minimum of $I_Y(\theta)$ is obtained for θ = 0, in which case $P(x \mid \theta) = 1/2$ and $I_Y(0) = mc^2$.