Article

Asymptotically Optimal Adversarial Strategies for the Probability Estimation Framework

Department of Mathematics, University of New Orleans, New Orleans, LA 70148, USA
* Author to whom correspondence should be addressed.
Entropy 2023, 25(9), 1291; https://doi.org/10.3390/e25091291
Submission received: 4 July 2023 / Revised: 16 August 2023 / Accepted: 28 August 2023 / Published: 2 September 2023
(This article belongs to the Section Quantum Information)

Abstract:
The probability estimation framework involves direct estimation of the probability of occurrences of outcomes conditioned on measurement settings and side information. It is a powerful tool for certifying randomness in quantum nonlocality experiments. In this paper, we present a self-contained proof of the asymptotic optimality of the method. Our approach refines earlier results to allow a better characterisation of optimal adversarial attacks on the protocol. We apply these results to the (2,2,2) Bell scenario, obtaining an analytic characterisation of the optimal adversarial attacks bound by no-signalling principles, while also demonstrating the asymptotic robustness of the probability estimation factor (PEF) method to deviations from expected experimental behaviour. We also study extensions of the analysis to quantum-limited adversaries in the (2,2,2) Bell scenario and no-signalling adversaries in higher (n, m, k) Bell scenarios.

1. Introduction

Randomness has proven to be a valuable resource for a multitude of tasks, be it computation or communication. In cryptography, access to reliable random bits is essential, since the security of various cryptographic primitives is known to be compromised if the incorporated randomness is of poor quality [1,2,3]. In the study of random network modelling, being able to sample graphs uniformly at random, and to do so reliably, is crucial [4]. And, for some problems, randomised algorithms are known to vastly outperform their deterministic counterparts [5].
A distinction between two notions of randomness, those of process and product, is discussed in [6] (chapter 8). Although both notions are tightly connected, randomness of a process refers to its unpredictability, while that of a product refers to a lack of pattern in it. An unpredictable process will, with high probability, produce a sequence (a string of bits, say) that is patternless; on the other hand, a seemingly irregular string of bits might not be unpredictable and instead be a probabilistic mixture of pre-recorded information. While product randomness suffices for tasks like Monte Carlo simulations, sampling and those involving randomised algorithms, cryptographic applications involving an adversary necessitate process randomness.
Process randomness, while being non-existent in the strictest interpretation of any classical theory, is permissible in quantum mechanics; an important example of this is quantum nonlocality as manifested in a Bell experiment. In its essential form, a Bell experiment consists of an entangled quantum system shared between two spatially separated stations A and B, which receive inputs x and y and record outcomes a and b, respectively. If, after n successive trials, the observed correlations between the outcomes conditioned on the settings violate a Bell inequality, then it can be ruled out that the outcomes were pre-assigned by some probabilistic mixture of deterministic processes. Moreover, the outcomes are (unpredictably) random, not only to the respective users of the devices at the two stations but also to an adversary, even one having a complete understanding of the Bell experiment. This relationship between nonlocality in quantum mechanics and its random nature is at the foundation of various device-independent random number generation protocols.
Device independence is considered a gold standard in cryptographic tasks such as quantum random number generation and quantum key distribution, in which the respective users are not required to know or trust the inner machinery of their devices, thus treating them as mere black boxes to which they can provide inputs and record outcomes. The only assumption that the experimental setup must satisfy is that the measurement choices of the devices must be uncorrelated with their inner workings. This is the measurement independence assumption, which is ultimately untestable but is tacitly assumed, arguably, in almost all scientific experiments. The no-signalling condition (that the outcome recorded at each station is not influenced by the choice of measurement at the other station) holds throughout the experiment because of the space-like separation between the stations and the impossibility of superluminal signalling in accordance with the special theory of relativity. Furthermore, the adversary trying to simulate the observed statistics may be considered computationally unbounded, a standard that falls under the paradigm of information-theoretic security. Over the years, technological advancement has facilitated loophole-free Bell nonlocality experiments, which have not only provided experimental validation to rule out a classical description of nature [7,8,9,10], but have also found practical applications in device-independent quantum randomness generation and device-independent quantum key distribution [11,12,13].
The probability estimation framework is a broadly applicable framework for performing device-independent quantum random number generation (DIQRNG) upon a finite sequence of loophole-free Bell experiment data; it involves direct estimation of the amount of certifiable randomness by obtaining high-confidence bounds on the conditional probability of the observed measurement outcomes given the measurement settings in the presence of classical side information [14,15,16]. Its primary advantages are its demonstrated applicability to Bell tests with small Bell violations and its high efficiency for a finite number of trials; it can also accommodate changing experimental conditions and allows early stoppage upon meeting certain criteria. Moreover, it can be extended to randomness generation with quantum devices beyond the device-independent scenario.
The probability estimation framework for DIQRNG is provably secure against adversaries who do not possess entanglement with the sources. Security against more general adversaries, with quantum entanglement with the sources, is possible with the quantum estimation framework [17], for which the constructions of the probability estimation framework can often be translated to the quantum estimation framework (as was carried out in [18]), so that progress with the former framework can often be used for the more general latter framework.
The asymptotic optimality of the probability estimation framework was discussed in [15]. The specific result of asymptotic optimality is as follows: given a sufficiently large number of trials sampling from a fixed behaviour (i.e., a set of quantum statistics), the amount of certified randomness per trial is arbitrarily close to a certain upper limit. Then [15] argues, appealing to convex geometry and the asymptotic equipartition property (AEP), that an adversary can always implement a probabilistic mixture of conditional probability distributions, independent and identically distributed across successive experimental trials, that generates observed statistics consistent with the fixed behaviour while not needing to generate more than that same upper limit of randomness per trial that is certified by the probability estimation framework. This is important in the sense that the framework certifies all the randomness conceded by the adversary in that particular attack, while also showing that there is no advantage to be gained for the adversary by resorting to (more sophisticated) memory attacks.
In this paper, we provide a full derivation of the asymptotic optimality of the probability estimation framework, filling in some steps omitted by [15] and, along the way, obtaining a better characterisation of the adversary's optimal probabilistic mixture for generating the observed statistics. Making the arguments from convex geometry precise, we explicitly describe the optimal attack that an adversary can employ with the minimum required number of different conditional distributions in convex mixture to simulate the observed statistics. Our improvement, with a more self-contained approach, upon the result in [15] is to reduce by one the cardinality of the adversary's (finite-cardinality) set from which the auxiliary random variable takes values. This random variable serves as her side information and records which conditional distribution occurs in which trial. Specifically, we prove that the number of possible conditional distributions in her optimal probabilistic mixture attack need not be more than one plus the dimension of the set of admissible distributions of a trial (Theorem 4). (We assume the set of admissible probability distributions of a given trial to be closed and convex, taking the convex closure when this assumption is not met; the dimension $\dim(C)$ of a non-empty convex subset C of $\mathbb{R}^N$ is then the dimension of the smallest affine set containing C.) An earlier result (Theorem 43 in [15], under the same assumptions) proved only that the cardinality of the value space of the adversary's side information need not be more than two plus the dimension of the set of admissible distributions of a trial. Besides contributing a methodological improvement, we have thus improved the result itself: a better understanding of the optimal attack in the asymptotic regime establishes a benchmark that enables the implementer of the protocol to defend against these attack modes.
The central results on the asymptotic optimality of the method of probability estimation comprise establishing an upper bound on the per-trial randomness that the adversary need concede (Theorems 3 and 4), and showing that this amount is certified by the method of probability estimation (Theorems 5 and 6). Our derivation in Theorem 3 elucidates how only the classical form of the asymptotic equipartition property is needed for the probability estimation framework, allowing a simplified treatment. In addition to strengthening the result in Theorem 4, we present proofs for Theorems 5 and 6 (which have appeared previously in [15]), including more details and specifications where we deemed fit. For instance, in the proof of Theorem 6, by invoking the extreme value theorem we avoid an explicit analytic construction as presented in [15] (see Theorem 41 therein). We also consider the question of robustness of the probability estimation framework, not considered in [14,15]; we derive a sufficient condition (Theorem 7) for a probability estimation factor (optimised at a particular distribution) to certify randomness at a positive rate at a statistically different distribution.
We apply our results to the (2,2,2) Bell scenario (the scenario of two parties, two measurement settings and two outcomes), obtaining an analytic characterisation of the optimal attack of an adversary (restricted only by the no-signalling condition) holding classical side information. We show that the optimal adversarial attack involves a decomposition of the observed statistics in terms of a single extremal no-signalling (super-quantum) correlation and eight local deterministic correlations. The proof of optimality relies upon the fact that equal mixtures of two extremal no-signalling nonlocal super-quantum correlations are expressible as an equal mixture of four local deterministic correlations. We show that this result does not generalise to higher scenarios such as the (3,2,2), (2,3,2) and (2,2,3) Bell scenarios, thereby indicating that the possibility of an optimal attack involving only a single extremal strategy is ensured only in the minimal (2,2,2) Bell scenario. Furthermore, we consider the possibility of an adversary holding classical side information (and, hence, restricted to probabilistic attack strategies) but trying to simulate the observed statistics using quantum-achievable probability distributions, while conceding as little randomness as possible. Assuming a uniform settings distribution, numerical studies restricted to a two-dimensional slice of the set of quantum-achievable distributions provide some initial evidence that the optimal quantum-achievable attack strategy involves only one extremal quantum correlation, but we were not able to settle this and have phrased it as a conjecture.
The rest of the article is organised as follows: In Section 2, we review the probability estimation framework, where Theorem 1 formalises the central idea and Theorem 2 establishes a lower bound on the smooth conditional min-entropy of the sequence of outcomes conditioned on the settings and side information. We also present a proof of Lemma 1 (an important result that enables the algorithmic execution of the PEF method) that is simpler than the proofs in [14,15]. In Section 3, we present our complete proof of asymptotic optimality, study the implications for finding an optimal adversarial attack strategy and derive a robustness result. In Section 4, we apply our results to the (2,2,2) Bell scenario, obtaining an analytic characterisation of the optimal attack strategy for an adversary restricted only by the no-signalling condition. The optimal attack comprises a decomposition of the observed statistics in terms of a single Popescu–Rohrlich (PR) correlation and (up to) eight local deterministic correlations. We show that, for a higher number of parties, settings and/or outcomes, a crucial result from the (2,2,2) Bell scenario concerning equal mixtures of extremal nonlocal no-signalling correlations does not hold, and infer that the optimal attack may require more than one nonlocal distribution in general. Returning to the (2,2,2) scenario, we discuss a conjecture that the optimal strategy to mimic the observed statistics by means of a probabilistic mixture of quantum-achievable correlations comprises only a single extremal quantum correlation and (up to) eight local deterministic correlations.

2. The Probability Estimation Framework

The probability estimation method relies on the probability estimation factor (PEF), which is a function that assigns a score to the results of a single trial of a quantum experiment, with higher scores corresponding to more randomness. The paradigmatic application is to a Bell nonlocality experiment comprising multiple spatially separated parties providing inputs (measurement settings) to measuring devices and recording outputs (observed outcomes); an experimental trial’s results then consist of both the choice of inputs and the recorded outputs for that trial. Figure 1 below shows a schematic two-party representation of such an experimental setting. After many repeated trials the product of the PEFs from all the trials is used to estimate the probability of outcomes conditioned on the settings.
For the examples considered in Section 4, we will consider the canonical scenario of two measuring parties Alice and Bob each selecting respective binary measurement settings X and Y and recording respective binary outcomes A and B, which we refer to as the (2,2,2) Bell scenario. For now, we treat things in a general manner as is carried out in [14,15], modelling the trial settings for all parties and outcomes for all parties with single random variables Z and C, respectively, taking values from respective finite-cardinality sets  Z  and  C . When applied to the (2,2,2) Bell scenario, C comprises the ordered pair  ( A , B )  and Z comprises the ordered pair  ( X , Y ) .
The results of a sequence of n time-ordered trials are represented by the sequences $\mathbf{C} = \{C_i\}_{i=1}^n$, $\mathbf{Z} = \{Z_i\}_{i=1}^n$; and so $(\mathbf{C}, \mathbf{Z})$ realises values $(\mathbf{c}, \mathbf{z}) \in \mathcal{C}^n \times \mathcal{Z}^n$, where $\mathcal{C}^n, \mathcal{Z}^n$ are the n-fold Cartesian products of $\mathcal{C}, \mathcal{Z}$. A PEF is then a real-valued function of C and Z satisfying certain conditions, while the product of PEFs from all trials will be a function of $\mathbf{C}$ and $\mathbf{Z}$. High values of the PEF product will correlate with low values of $P(\mathbf{C}|\mathbf{Z})$, the conditional probability of the outcomes given the settings.
To define PEFs, we introduce the notion of a trial model: a set $\Pi$ encompassing all joint probability distributions of settings and outcomes which are compatible with basic assumptions about the experiment. One important trial model that we consider is $\Pi_{\mathrm{Q}}$, consisting of joint distributions of $(C, Z)$ for which the conditional distribution of C conditioned on Z can be realised by a measurement on a quantum system. Here, we introduce the convention, used throughout, of using lower case Greek letters with random variables as arguments to denote distributions, i.e., $\mu(C, Z)$ and $\mu(C|Z)$ denote the joint distribution of $(C, Z)$ and the conditional distribution of C given Z, respectively. Another important trial model is $\Pi_{\mathrm{NS}}$ (NS stands for “no-signalling”), consisting of distributions for which probabilities of measurement outcomes at one location are independent of measurement settings at the other distant locations. (This is more clearly understood in the Alice–Bob example, where one of the no-signalling conditions is that $\sum_b \mu(A = a, B = b | X = x, Y = y) = \sum_b \mu(A = a, B = b | X = x, Y = y')$ for all $a$, $x$ and $y \neq y'$.) A third important trial model is the set $\Pi_{\mathrm{L}}$ of distributions for which the conditional distributions of outcomes conditioned on settings are local, which means they can be expressed as convex mixtures of local deterministic behaviours. In the bipartite setting, the conditional distribution $\mu_{\mathrm{LD},\lambda}(A, B|X, Y)$, also referred to as a behaviour, is local deterministic if $\mu_{\mathrm{LD},\lambda}(A = a, B = b|X = x, Y = y) = [[a = f(x, \lambda)]]\,[[b = g(y, \lambda)]]$ (where the notation $[[\cdot]]$ represents the function that evaluates to 1 if the condition within holds and 0 otherwise). In words, the outcomes are functions of the local settings and the local hidden variable $\lambda$, which can be understood to be a list of outcomes for all possible settings. A formal definition involving more parties and an arbitrary (albeit the same) number of outcomes and settings for each party can be found in (48). The sets $\Pi_{\mathrm{L}}$, $\Pi_{\mathrm{Q}}$ and $\Pi_{\mathrm{NS}}$ satisfy the following strict inclusions:
$$\Pi_{\mathrm{L}} \subsetneq \Pi_{\mathrm{Q}} \subsetneq \Pi_{\mathrm{NS}}.$$
Certain distributions in $\Pi_{\mathrm{Q}}$ and $\Pi_{\mathrm{NS}}$ violate a Bell inequality and are known to contain randomness; they are contained in $\Pi_{\mathrm{Q}} \setminus \Pi_{\mathrm{L}}$ and $\Pi_{\mathrm{NS}} \setminus \Pi_{\mathrm{L}}$, respectively. It is precisely the inability to decompose such distributions into deterministic ones, as in $\Pi_{\mathrm{L}}$, that implies the presence of randomness. The objective of the PEF approach is to quantify the randomness contained in such distributions. As trial models specify the joint distribution $\mu(C, Z)$, and for the above examples of trial models we gave only the conditional distributions $\mu(C|Z)$, one must also specify the marginal distribution of the settings $\mu(Z)$. For the discussions of $\Pi_{\mathrm{Q}}$ and $\Pi_{\mathrm{NS}}$ in subsequent sections, any fixed distribution satisfying $\mu(Z = z) > 0$ for all $z \in \mathcal{Z}$ is permitted. An example of a fixed settings distribution is the equiprobable distribution $\mathrm{Unif}(\mathcal{Z})$ defined as $\mathrm{Unif}(z) = 1/|\mathcal{Z}|$ for all $z \in \mathcal{Z}$.
As a discrete probability distribution is effectively an ordered list of numbers in $[0, 1]$ (the probabilities), trial models are always subsets of $\mathbb{R}^N$, where N is fixed by the cardinalities of $\mathcal{C}$ and $\mathcal{Z}$. This enables us to use a geometric approach to study these sets, which proves to be invaluable for some arguments.
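To make the geometric picture concrete, the following minimal Python sketch (ours, not from the paper; all names are our own illustrative choices) represents (2,2,2) behaviours as vectors in $\mathbb{R}^{16}$, enumerates the sixteen local deterministic behaviours generating $\Pi_{\mathrm{L}}$, and verifies that each satisfies the no-signalling conditions.

```python
import itertools
import numpy as np

# Illustrative sketch (not the paper's code): a (2,2,2) behaviour mu(a,b|x,y)
# stored as an array with axes (a, b, x, y), i.e., a vector in R^16.

def local_deterministic(fa, fb):
    """Behaviour with a = fa[x] and b = fb[y] for deterministic response functions."""
    mu = np.zeros((2, 2, 2, 2))
    for x, y in itertools.product(range(2), repeat=2):
        mu[fa[x], fb[y], x, y] = 1.0
    return mu

# All 16 local deterministic behaviours: 4 response functions per party.
ld_behaviours = [local_deterministic(fa, fb)
                 for fa in itertools.product(range(2), repeat=2)
                 for fb in itertools.product(range(2), repeat=2)]

def is_no_signalling(mu, tol=1e-9):
    """Alice's marginal must be independent of y, and Bob's of x."""
    pa = mu.sum(axis=1)   # p(a|x,y), summed over b -> axes (a, x, y)
    pb = mu.sum(axis=0)   # p(b|x,y), summed over a -> axes (b, x, y)
    return (np.allclose(pa[:, :, 0], pa[:, :, 1], atol=tol)
            and np.allclose(pb[:, 0, :], pb[:, 1, :], atol=tol))

assert all(is_no_signalling(mu) for mu in ld_behaviours)
print(f"{len(ld_behaviours)} local deterministic behaviours, all no-signalling")
```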
We can now define PEFs. We use the notation $\mathrm{E}_\mu[\cdot]$ and $P_\mu(\cdot)$ to denote expectation and probability, respectively, with respect to a distribution $\mu$; and, for the sake of notational concision, we omit commas in distributions or functions of more than one random variable; for instance, $\mu(CZ)$ and $f(CZ)$ must be understood to mean $\mu(C, Z)$ and $f(C, Z)$.
Definition 1 (Probability Estimation Factor).
A probability estimation factor (PEF) with power $\beta > 0$ for the model of distributions Π is a function $F : \mathcal{C} \times \mathcal{Z} \to \mathbb{R}^{+}$ of the random variables $(C, Z)$ such that $\mathrm{E}_\sigma[F(CZ)\,\sigma(C|Z)^\beta] \leq 1$ holds for all $\sigma(CZ) \in \Pi$.
In the expression above, $\sigma(C|Z)$ denotes a random variable that is a function of the random variables C and Z: $\sigma(C|Z)$ is the random variable that assumes the standard conditional probability (according to σ) of C taking the value c conditioned on Z taking the value z; it is assigned the value zero if the probability $\sigma(Z = z)$ is zero. The parameter β can be any positive real value. We then note that the constant PEF $F(cz) = 1$ for all $(c, z) \in \mathcal{C} \times \mathcal{Z}$ is a valid PEF for any choice of $\beta > 0$. We will see in the subsequent sections, however, that the parameter does have an effect on the method employed for choosing useful PEFs for the purpose of randomness certification; in practice we choose the value of β that corresponds to the maximum randomness certification.
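As a concrete illustration of Definition 1, the following sketch (our own toy code, with made-up names and numbers) checks the defining inequality $\mathrm{E}_\sigma[F(CZ)\,\sigma(C|Z)^\beta] \leq 1$ for one candidate function F against one distribution σ; in practice the check must hold for every distribution in the model Π.

```python
# Illustrative check of the PEF condition for a single candidate F and
# distribution sigma; the dictionaries and numbers below are ours, not the paper's.

def pef_condition_holds(F, sigma_joint, beta, tol=1e-9):
    """F and sigma_joint: dicts mapping (c, z) to F(cz) and sigma(cz)."""
    # Marginal of the settings: sigma(z) = sum_c sigma(cz).
    sigma_z = {}
    for (c, z), p in sigma_joint.items():
        sigma_z[z] = sigma_z.get(z, 0.0) + p
    expectation = 0.0
    for (c, z), p in sigma_joint.items():
        if p > 0:
            cond = p / sigma_z[z]               # sigma(c|z)
            expectation += F[(c, z)] * cond**beta * p
    return expectation <= 1 + tol

# The trivial PEF F = 1 passes for any beta and any distribution:
sigma = {(c, z): 1/8 for c in range(4) for z in range(2)}  # uniform toy example
F_trivial = {k: 1.0 for k in sigma}
print(pef_condition_holds(F_trivial, sigma, beta=0.5))     # True
```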
Prior to defining a PEF we introduced the notion of a trial model. For the application of probability estimation to the outcomes of an experiment, which is a sequence of n time-ordered trials, we introduce the notion of an experiment model: a set Θ constraining the joint distribution of $\mathbf{C}, \mathbf{Z}$ and E, constructed as a chain of individual trial models Π; it consists of joint distributions $\mu(\mathbf{CZ}|E = e)$ conditioned on the event $\{E = e\}$, where E is the random variable denoting the adversary's side information, realising values e from the finite set $\mathcal{E}$. It satisfies the following two assumptions:
$$\begin{aligned} &\mu(C_{i+1}Z_{i+1} \mid \mathbf{C}_i = \mathbf{c}_i, \mathbf{Z}_i = \mathbf{z}_i, E = e) \in \Pi, \quad \forall\, \mathbf{c}_i \in \mathcal{C}^i,\ \mathbf{z}_i \in \mathcal{Z}^i,\ e \in \mathcal{E}, \\ &\mu(Z_{i+1}, \mathbf{C}_i\mathbf{Z}_i \mid E = e) = \mu(Z_{i+1} \mid E = e)\,\mu(\mathbf{C}_i\mathbf{Z}_i \mid E = e), \quad \forall\, e \in \mathcal{E}. \end{aligned} \tag{1}$$
In (1), $\mathbf{C}_i, \mathbf{Z}_i$ denote the outcomes and measurement settings for the first $i \in [n]$ trials, where $[n] := \{1, 2, \ldots, n\}$, with $\mathbf{c}_i, \mathbf{z}_i$ denoting their respective realisations. The random variables $C_{i+1}, Z_{i+1}$ are the outcomes and settings for the $(i+1)$'th trial. The first condition in (1) formalises the assumption that the (joint) probability of the $(i+1)$'th outcome and setting, conditioned on the outcomes and settings for the first i trials and each realised value $E = e$ of the adversary's side information, belongs to the $(i+1)$'th trial model, i.e., it is compatible with the conditions dictated by the trial model. The second condition states that, for each $E = e$, the setting for the next trial is independent of the outcomes and settings of the past and present trials. Our second condition is a stronger assumption than the corresponding assumption given in [14], which is as follows: the joint distribution $\mu$ of $\mathbf{CZ}E$ is such that $Z_{i+1}$ is independent of $\mathbf{C}_i$ conditionally on both $\mathbf{Z}_i$ and E. It is a straightforward exercise to check that our stronger assumption implies the one stated in [14]. While the weaker assumption is sufficient for the following result, we find the stronger assumption operationally clearer as an assumption that the future settings are independent of “everything in the past” for each realisation of e.
For the rest of the paper we adopt the abbreviated notation $\mu_y(X)$ for $\mu(X|Y = y)$. The following theorem, appearing as Theorem 9 in Appendix C in [14], formalises the central idea behind the framework of probability estimation. We include a proof for this theorem in Appendix A.1.1 for completeness.
Theorem 1.
Suppose $\mu : \mathcal{C}^n \times \mathcal{Z}^n \times \mathcal{E} \to [0, 1]$ is a distribution of $\mathbf{CZ}E$ such that $\mu_e(\mathbf{CZ}) \in \Theta$ for each $e \in \mathcal{E}$. Then, for fixed $\beta, \epsilon > 0$,
$$P_{\mu_e}\left[\mu_e(\mathbf{C} \mid \mathbf{Z}) \geq \Big(\epsilon \prod_{i=1}^{n} F_i(C_i Z_i)\Big)^{-1/\beta}\right] \leq \epsilon \tag{2}$$
holds for each $e \in \mathcal{E}$, where $F_i(C_i Z_i)$ is the probability estimation factor for the i'th trial.
The distinguishing feature of the framework of probability estimation is the direct estimation of $\mu_e(\mathbf{C}|\mathbf{Z})$ for each $e \in \mathcal{E}$ by constructing PEFs $F_i(C_i Z_i)$ and accumulating them trial-wise in a multiplicative fashion. For a fixed error bound $\epsilon > 0$ and power parameter $\beta > 0$, the term $(\epsilon \prod_{i=1}^n F_i(C_i Z_i))^{-1/\beta}$ serves as an estimate for $\mu_e(\mathbf{C}|\mathbf{Z})$. It is important to note that PEFs are functions of only the measurement outcomes and settings, and not of the side information held by the adversary, to which we do not have access. For a large value of n (the number of trials), the trial-wise product $\prod_{i=1}^n F_i(C_i Z_i)$ will be large if the experiment is well-calibrated and run properly. For the purpose of randomness generation, the inequality (2) in Theorem 1 can then be understood, intuitively, as follows: since the trial-wise product $\prod_{i=1}^n F_i(C_i Z_i)$ of the PEFs is large, and so, for fixed $\epsilon, \beta > 0$, the quantity $(\epsilon \prod_{i=1}^n F_i(C_i Z_i))^{-1/\beta}$ is small, for each $e \in \mathcal{E}$ there is a very small probability (denoted by the outer probability $P_{\mu_e}(\cdot)$) that the conditional probability of the sequence of outcomes $\mathbf{C}$ conditioned on the sequence of settings $\mathbf{Z}$ (denoted by $\mu_e(\mathbf{C}|\mathbf{Z})$) is more than a small value. This translates to the measurement outcomes $\mathbf{C}$ being unpredictably random for a given $\mathbf{z}$ and e. Since this string of experimental outcomes is unpredictable even given the adversary's side information, it can be used as a source of certifiable randomness. We stress that in the method of probability estimation the estimates on the conditional probability of measurement outcomes given the settings choices and side information depend solely on the experimental data.
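The trial-wise accumulation can be made concrete with a short sketch (ours; the PEF values and parameters below are hypothetical). Working in the log domain avoids numerical overflow when the number of trials runs into the millions:

```python
import math

# Sketch of the accumulation in Theorem 1: given per-trial PEF values f_i
# observed in an experiment, (eps * prod_i f_i)^(-1/beta) upper-bounds
# mu_e(C|Z) except with probability eps. All numbers here are made up.

def log2_prob_bound(pef_values, eps, beta):
    """Return log2 of the bound (eps * prod_i F_i)^(-1/beta) on mu_e(C|Z)."""
    log2_T = sum(math.log2(f) for f in pef_values)   # log2 of the PEF product
    return -(math.log2(eps) + log2_T) / beta

log2_bound = log2_prob_bound([1.01] * 100_000, eps=1e-6, beta=0.1)
print(f"mu_e(C|Z) <= 2^({log2_bound:.1f}) except with probability 1e-6")
print(f"min-entropy witness: {-log2_bound:.1f} bits")
```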
Conventional methods of randomness extraction, however, involve obtaining a lower bound on the smooth conditional min-entropy, which quantifies the amount of raw randomness available from a source. The lower bound then serves as one of the parameters of the extractor functions used to extract near-uniform random bits. It is therefore useful to translate the bound in (2) into a statement about the smooth conditional min-entropy with respect to an adversary.
We motivate and introduce conditional min-entropy as follows. An adversary's goal is to predict $\mathbf{C}$. Conditioned on a particular realisation of the settings sequence $\mathbf{z} \in \mathcal{Z}^n$ and side information $e \in \mathcal{E}$, one can measure the “predictability” of the sequence of outcomes $\mathbf{C}$ with the following maximum probability:
$$\max_{\mathbf{c} \in \mathcal{C}^n} \mu(\mathbf{c} \mid \mathbf{z}e).$$
It quantifies the best guess of the adversary. The $\mathbf{z}e$-conditional min-entropy of $\mathbf{C}$, corresponding to that particular realisation $\mathbf{z}e \in \mathcal{Z}^n \times \mathcal{E}$, is the following negative logarithm:
$$H_{\infty,\mu}(\mathbf{C} \mid \mathbf{z}e) := -\log_2 \max_{\mathbf{c} \in \mathcal{C}^n} \mu(\mathbf{c} \mid \mathbf{z}e).$$
The subscript $\mu$ in the notation $H_{\infty,\mu}(\cdot)$ refers to the distribution $\mu(\mathbf{CZ}E)$. The average $\mathbf{Z}E$-conditional min-entropy is then defined as follows:
$$H^{\mathrm{avg}}_{\infty,\mu}(\mathbf{C} \mid \mathbf{Z}E) := -\log_2 \sum_{\mathbf{z}e \in \mathcal{Z}^n \times \mathcal{E}} \Big( \max_{\mathbf{c} \in \mathcal{C}^n} \mu(\mathbf{c} \mid \mathbf{z}e) \Big)\, \mu(\mathbf{z}e).$$
However, information-theoretic security of cryptographic protocols takes into account a more realistic measure of average $\mathbf{Z}E$-conditional min-entropy, which involves a smoothing parameter ϵ (a type of error bound) and is known as the ϵ-smooth average $\mathbf{Z}E$-conditional min-entropy. This quantity is useful for our scenario, in which the probability distribution is not known exactly and its characteristics can only be inferred from observed data, which introduces the possibility of error. It is defined as follows.
Definition 2 (Smooth Average Conditional Min-Entropy).
For a distribution $\mu : \mathcal{C}^n \times \mathcal{Z}^n \times \mathcal{E} \to [0, 1]$ of $\mathbf{C}, \mathbf{Z}, E$, the set $B_\epsilon(\mu)$ of distributions of $\mathbf{C}, \mathbf{Z}, E$ is defined as
$$B_\epsilon(\mu) := \{\sigma : \mathcal{C}^n \times \mathcal{Z}^n \times \mathcal{E} \to [0, 1] \mid d_{\mathrm{TV}}(\sigma, \mu) \leq \epsilon\},$$
where $\epsilon \in (0, 1)$ and $d_{\mathrm{TV}}(\sigma, \mu)$ is the total variation distance between σ and μ, defined as
$$d_{\mathrm{TV}}(\sigma, \mu) := \frac{1}{2} \sum_{\mathbf{cz}e \in \mathcal{C}^n \times \mathcal{Z}^n \times \mathcal{E}} |\mu(\mathbf{cz}e) - \sigma(\mathbf{cz}e)|.$$
The ϵ-smooth average $\mathbf{Z}E$-conditional min-entropy is then defined as follows.
$$H^{\mathrm{avg},\epsilon}_{\infty,\mu}(\mathbf{C} \mid \mathbf{Z}E) := \max_{\sigma \in B_\epsilon(\mu)} \left[ -\log_2 \sum_{\mathbf{z}e \in \mathcal{Z}^n \times \mathcal{E}} \Big( \max_{\mathbf{c} \in \mathcal{C}^n} \sigma(\mathbf{c} \mid \mathbf{z}e) \Big)\, \sigma(\mathbf{z}e) \right].$$
The lower bound obtained on this quantity serves as one of the inputs to extractor functions in randomness extraction, whose purpose is to convert weakly random bit strings with uneven distributions into shorter, close-to-uniformly distributed bit strings. We note that alternative definitions of ϵ-smooth conditional min-entropy can be used, for instance, the ϵ-smooth worst-case conditional min-entropy of [19]. A known result from the literature, proven in Proposition A1 in Appendix E, justifies our usage of the ϵ-smooth average conditional min-entropy without having to be concerned with the stricter ϵ-smooth worst-case conditional min-entropy (defined in (A30)): specifically, the two quantities converge to one another in the asymptotic limit.
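For intuition, the ϵ = 0 case of Definition 2 reduces to the (non-smooth) average conditional min-entropy, which is straightforward to evaluate numerically. The following sketch (ours, with an arbitrary toy distribution) computes it from a joint array over $(\mathbf{c}, \mathbf{z}, e)$:

```python
import numpy as np

# Sketch: the epsilon = 0 case gives the non-smooth average conditional
# min-entropy H = -log2( sum_{ze} max_c mu(c|ze) mu(ze) ). The toy joint
# array mu below (axes c, z, e) is illustrative, not from the paper.

def avg_conditional_min_entropy(mu):
    mu_ze = mu.sum(axis=0)                      # mu(ze)
    guess = 0.0                                 # adversary's average guessing probability
    for z in range(mu.shape[1]):
        for e in range(mu.shape[2]):
            if mu_ze[z, e] > 0:
                # max_c mu(c|ze) * mu(ze) = max_c mu(c,z,e)
                guess += mu[:, z, e].max()
    return -np.log2(guess)

rng = np.random.default_rng(0)
mu = rng.random((4, 4, 3))
mu /= mu.sum()                                  # normalise to a distribution
print(f"H_min_avg = {avg_conditional_min_entropy(mu):.3f} bits")
```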
The result obtained from Theorem 1 can be translated into a result on smooth average conditional min-entropy, formalised in Theorem 2 below. This theorem appears as Theorem 1 in [14]. We include a proof for this theorem in Appendix A.1.2 for completeness. In the notation of the ϵ-smooth average $\mathbf{Z}E$-conditional min-entropy in (7), the semicolon followed by $\mathcal{S}$ denotes that this information quantity is assessed with respect to the distribution μ after conditioning on the occurrence of the event $\mathcal{S}$ defined in the statement of Theorem 2. It pertains to an abort criterion: the protocol succeeds only if the product of the trial-wise PEFs exceeds some threshold value, and otherwise it is aborted. So we want to establish the lower bound for smooth conditional min-entropy conditioned on the event that the protocol succeeds, because it is precisely this scenario in which we extract randomness. Since a completely predictable local distribution can always have a chance of passing the protocol, however minuscule (in the order of $(3/4)^n$, where the number of trials n often goes up to millions), and $\mu(\mathbf{c}|\mathbf{z})$ will equal 1 in this case, it is necessary to assume a small but positive lower bound on the probability of not aborting to derive a useful min-entropy bound. This can be thought of as another type of error parameter. The assumed lower bound for the probability of success of the protocol is κ.
Theorem 2.
Let μ be a distribution $\mu : \mathcal{C}^n \times \mathcal{Z}^n \times \mathcal{E} \to [0, 1]$ of $\mathbf{C}, \mathbf{Z}, E$ such that, for each $e \in \mathcal{E}$, the following holds for every $\epsilon \in (0, 1)$:
$$P_{\mu_e}\left[\mu_e(\mathbf{C} \mid \mathbf{Z}) \leq \Big(\epsilon \prod_{i=1}^{n} F_i\Big)^{-1/\beta}\right] \geq 1 - \epsilon, \tag{6}$$
where $F_i$ is a PEF with power β for the i'th trial. For a fixed choice of $\epsilon \in (0, 1)$ and $p \geq |\mathcal{C}|^{-n}$, define the event $\mathcal{S} := \{(\epsilon \prod_{i=1}^{n} F_i)^{-1/\beta} \leq p\}$. Then, if κ satisfies $0 < \kappa \leq P_\mu(\mathcal{S})$, the following holds:
$$H^{\mathrm{avg},\epsilon/\kappa}_{\infty,\mu}(\mathbf{C} \mid \mathbf{Z}E; \mathcal{S}) \geq \log_2(\kappa) - \log_2(p). \tag{7}$$
Under the same conditions as Theorem 2, the main result (7) admits a minor reformulation, as follows. This is the formulation that aligns with the statement of Theorem 1 in [14].
Corollary 1.
Let $\mu : \mathcal{C}^n \times \mathcal{Z}^n \times \mathcal{E} \to [0, 1]$ be a distribution of $\mathbf{CZ}E$ and F be a PEF with power β such that (6) holds for each $e \in \mathcal{E}$. For a fixed choice of $\epsilon \in (0, 1)$, $p \geq |\mathcal{C}|^{-n}$ and positive $\kappa \leq P_\mu(\mathcal{S})$, where $\mathcal{S} = \{(\epsilon \prod_{i=1}^{n} F_i)^{-1/\beta} \leq p\}$, we have
$$H^{\mathrm{avg},\epsilon}_{\infty,\mu}(\mathbf{C} \mid \mathbf{Z}E; \mathcal{S}) \geq \Big(1 + \frac{1}{\beta}\Big)\log_2(\kappa) - \log_2(p). \tag{8}$$
Proof. 
Use Theorem 2 with $\epsilon' = \kappa\epsilon$, $p' = p/\kappa^{1/\beta}$ and $\kappa' = \kappa$, noting that, since $0 < \kappa \leq 1$ and $\beta > 0$ hold, we have $\epsilon' \in (0, 1)$ and $p' \geq |\mathcal{C}|^{-n}$ as required for invoking the theorem. Then, notice that the corresponding event $\mathcal{S}' = \{(\epsilon' \prod_{i=1}^{n} F_i)^{-1/\beta} \leq p'\}$ coincides with the event $\mathcal{S}$. □
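For concreteness, the substitution arithmetic behind the corollary works out as follows (our expansion of the step above):
$$\log_2(\kappa') - \log_2(p') = \log_2(\kappa) - \log_2\!\big(p\,\kappa^{-1/\beta}\big) = \Big(1 + \frac{1}{\beta}\Big)\log_2(\kappa) - \log_2(p), \qquad \frac{\epsilon'}{\kappa'} = \frac{\kappa\epsilon}{\kappa} = \epsilon.$$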
The above results hold when we consider distributions $\mu : \mathcal{C}^n \times \mathcal{Z}^n \times \mathcal{E}^n \to [0, 1]$ of $\mathbf{CZE}$, i.e., where the side information is structured as a sequence of random variables. The proof remains the same with the exception that we condition on an arbitrary realisation $\mathbf{e} \in \mathcal{E}^n$ of $\mathbf{E}$. We consider this scenario in Section 3, where we define an IID attack by the adversary.
Theorem 1 does not indicate how to find PEFs. One way to find useful PEFs is to first notice that the success criterion of the protocol is the event $\mathcal{S}$ that the inequality $(\epsilon \prod_{i=1}^{n} F_i)^{-1/\beta} \leq p$ holds, which can be equivalently expressed as
$$\sum_{i=1}^{n} \log_2(F_i)/\beta + \log_2(\epsilon)/\beta \geq -\log_2(p), \tag{9}$$
where $\epsilon, \beta$ and p are pre-determined quantities to be chosen in advance of running the protocol. Then, considering an anticipated trial distribution $\rho(CZ)$ based on observed results and calibrations from previous trials, in the limit of sufficiently large n the difference between the term on the left hand side of (9) (which consists of the trial-wise sum of base-2 logarithms of PEFs) and $n\,\mathrm{E}_\rho[\log_2(F(CZ))/\beta]$ will be either greater or less than zero with roughly equal probability. This follows from the Central Limit Theorem if the distribution remains roughly stable from trial to trial. Since it is desirable to have the largest value of $-\log_2(p)$ possible, one can then perform the following constrained maximisation using any convex programming software, owing to the concavity of the objective function and the linearity of the constraints.
$$\begin{aligned} \text{Maximise:} \quad & \mathrm{E}_\rho[(n \log_2(F(CZ)) + \log_2(\epsilon))/\beta] \\ \text{Subject to:} \quad & \mathrm{E}_\nu[F(CZ)\,\nu(C|Z)^\beta] \leq 1, \quad \text{for all } \nu(CZ) \in \Pi, \\ & F(cz) \geq 0, \quad \text{for all } (c, z) \in \mathcal{C} \times \mathcal{Z} \end{aligned} \tag{10}$$
Since $n, \epsilon$ and β are fixed, it is sufficient to maximise $\mathrm{E}_\rho[\log_2(F(CZ))]$ subject to the same constraints. In practice, one can consider a range of values of β and perform the constrained maximisation with the objective $\mathrm{E}_\rho[\log_2(F(CZ))]$, then plug the maximum value into the expression $\mathrm{E}_\rho[(n \log_2(F(CZ)) + \log_2(\epsilon))/\beta]$ and obtain a plot with respect to the considered range of β values (see, for example, Figure 2 in [16]; a similar pattern is observed in Figure 2 in Section 2).
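A sketch of how this programme might be set up with an off-the-shelf convex solver is given below (our illustration, assuming the cvxpy package; the inputs rho_joint, rho_cond, extremals_joint and extremals_cond are placeholder names the user must supply). It implements the simplified objective $\mathrm{E}_\rho[\log_2 F]$, and the restriction of the constraints to a finite list of extremal distributions is justified by Lemma 1 below.

```python
import cvxpy as cp
import numpy as np

# Sketch (ours, not the paper's code) of the convex programme above, assuming
# the constraint set has been reduced to finitely many extremal distributions.

def optimise_pef(rho_joint, extremals_joint, extremals_cond, beta):
    """rho_joint: rho(cz); extremals_joint[k]: nu(cz); extremals_cond[k]: nu(c|z);
    all flattened numpy arrays over the joint outcome-settings alphabet."""
    m = rho_joint.size
    F = cp.Variable(m, nonneg=True)
    # Objective: E_rho[log2 F], concave in F.
    objective = cp.Maximize(rho_joint @ cp.log(F) / np.log(2))
    # One linear constraint per extremal distribution nu:
    # E_nu[F * nu(C|Z)^beta] = sum_cz nu(cz) nu(c|z)^beta F(cz) <= 1.
    constraints = [(nu_j * nu_c**beta) @ F <= 1
                   for nu_j, nu_c in zip(extremals_joint, extremals_cond)]
    cp.Problem(objective, constraints).solve()
    return F.value
```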
The following lemma (from [14], see Lemma 15), for which we provide a more direct proof, enables us to restrict the satisfiability constraints of the optimisation routine in (10) to the extremal distributions of the model Π under the condition that the model is convex and closed. So, the first line of constraints in (10) can be replaced with $\mathrm{E}_\nu[F(CZ)\,\nu(C|Z)^\beta] \leq 1$ for all $\nu(CZ) \in \Pi_{\mathrm{extr}}$, where $\Pi_{\mathrm{extr}}$ is the set of extremal distributions of Π. If the model Π is not convex and closed, we take its convex closure. In words, the lemma states that, if $F(CZ)$ is a PEF with power $\beta > 0$ for the distributions $\sigma_1(CZ)$ and $\sigma_2(CZ)$, then it is a PEF with the same power for all distributions that can be expressed as a convex combination of $\sigma_1$ and $\sigma_2$.
Lemma 1.
For distributions $\sigma_i(CZ) \in \Pi$ satisfying $\mathrm{E}_{\sigma_i}[F(CZ)\,\sigma_i(C|Z)^\beta] \leq 1$ for $i = 1, 2$, if $\sigma(CZ) \in \Pi$ is expressible as $\sigma(CZ) = \lambda\sigma_1(CZ) + (1-\lambda)\sigma_2(CZ)$ for $\lambda \in [0, 1]$, then it satisfies $\mathrm{E}_\sigma[F(CZ)\,\sigma(C|Z)^\beta] \leq 1$.
Proof. 
For z such that $\sigma_1(z), \sigma_2(z) > 0$, we have $\sigma(z) > 0$ as well and, from $\sigma(CZ) = \lambda\sigma_1(CZ) + (1-\lambda)\sigma_2(CZ)$, straightforward algebra shows that $\sigma(c|z) = \delta\sigma_1(c|z) + (1-\delta)\sigma_2(c|z)$ for any $(c, z) \in \mathcal{C} \times \mathcal{Z}$, where $\delta = \lambda\sigma_1(z)/\sigma(z) \in [0, 1]$. Since, for $\alpha > 1$, $x^\alpha$ is convex for $x \geq 0$, we can write
$$\sigma(c|z)^{1+\beta} \leq \delta\,\sigma_1(c|z)^{1+\beta} + (1-\delta)\,\sigma_2(c|z)^{1+\beta} \;\Longrightarrow\; \sigma(c|z)^{1+\beta}\sigma(z) \leq \lambda\,\sigma_1(c|z)^{1+\beta}\sigma_1(z) + (1-\lambda)\,\sigma_2(c|z)^{1+\beta}\sigma_2(z). \tag{11}$$
Turning to cases where $\sigma_1(z)$ and/or $\sigma_2(z)$ may equal zero, we can also demonstrate (11) under the convention of taking $\sigma_i(c|z)$ to be zero when $\sigma_i(z) = 0$. Then, the inequality holds as an equality when $\sigma_1(z) = \sigma_2(z) = 0$ (which implies $\sigma(z) = 0$ as well); for $0 = \sigma_2(z) < \sigma_1(z)$ one can verify (11) after noting $\sigma(cz) = \lambda\sigma_1(cz)$ and $\sigma(z) = \lambda\sigma_1(z)$, and the $0 = \sigma_1(z) < \sigma_2(z)$ case follows symmetrically. Now, multiplying both sides of (11) by $F(cz)$ and summing over $(c, z) \in \mathcal{C} \times \mathcal{Z}$ gives
$$\sum_{c,z} F(cz)\,\sigma(c|z)^{1+\beta}\sigma(z) \leq \lambda \sum_{c,z} F(cz)\,\sigma_1(c|z)^{1+\beta}\sigma_1(z) + (1-\lambda) \sum_{c,z} F(cz)\,\sigma_2(c|z)^{1+\beta}\sigma_2(z),$$
that is,
$$\mathrm{E}_\sigma[F(CZ)\,\sigma(C|Z)^\beta] \leq \lambda\,\mathrm{E}_{\sigma_1}[F(CZ)\,\sigma_1(C|Z)^\beta] + (1-\lambda)\,\mathrm{E}_{\sigma_2}[F(CZ)\,\sigma_2(C|Z)^\beta] \leq \lambda + (1-\lambda) = 1. \qquad \square$$
We remark that the result of Lemma 1 can also be obtained through specialisation of known quantum results to classical distributions; however, this requires a more technical argument with additional machinery. To elaborate, the proof of Lemma 1 involves showing the joint convexity of $\sigma(C|Z)^{1+\beta}\sigma(Z)$, which can be seen as a special case of the joint convexity of sandwiched Rényi powers. To be more specific, it arises as a special case of the joint convexity of $e^{\beta D_{1+\beta}(\sigma||\omega)}$ for $\beta > 0$ when the distribution $\omega(CZ)$ is taken to be $\omega(cz) = \sigma(z)/|\mathcal{C}|$ for all $(c, z) \in \mathcal{C} \times \mathcal{Z}$. Notice that $D_{1+\beta}(\sigma||\omega)$ is the (classical) Rényi divergence of order $(1+\beta) \in (1, \infty)$ of $\sigma(CZ)$ with respect to $\omega(CZ)$. The functional $e^{\beta D_{1+\beta}(\sigma||\omega)}$ can also be seen as a specialisation (to classical states) of the same functional, defined in terms of (quantum) density states σ and ω, whose joint convexity was proven in Proposition 3 of [20] with an extended technical argument.

3. Asymptotic Performance

The results of the previous section give us a method for certifying randomness. In this section, we assess the asymptotic performance of the method. Our figure of merit is the amount of randomness certified per trial, as measured by the average conditional min-entropy divided by the number of trials n. We will see in this section that the PEF method is asymptotically optimal, in the following sense: given a fixed observed distribution, the PEF method can asymptotically certify an amount of per-trial conditional min-entropy that is equal to the actual per-trial conditional min-entropy generated by an adversary replicating the observed distribution with as little randomness as possible.
To elaborate on this, consider that the adversary’s goal is to minimise the following quantity:
$$\frac{1}{n} H^{\mathrm{avg}}_{\infty,\mu}(\mathbf{C} \mid \mathbf{Z}E).$$
We assume that the adversary has complete knowledge of the distribution  μ , and can have access to not just the realised value of E but also the realised value of  Z  in guessing  C . This access to  Z  aligns with the paradigm, as discussed in [11], of “using public (settings) randomness to generate private (outcome) randomness”. The adversary is constrained, however, in that the statistics when marginalised over E must appear to be consistent with an expected observed trial distribution  ρ ( C Z )  for the protocol to not abort. Technically, all that is necessary for the protocol to pass is that the observed product of the PEFs must exceed some threshold value chosen by the experimenter—which could be possible with high probability with many different distributions  μ —but, as the experimenter’s threshold value will likely be chosen based on a full behaviour that they expect to observe, we study attacks that match the expected observed trial distribution exactly. We will find attacks meeting this criterion that are asymptotically optimal for minimising the conditional min-entropy.
Given an expected observed distribution, how can the adversary generate observed statistics consistent with it while yielding as little randomness as possible? She can employ a strategy of preparing multiple different states to be measured that will yield different distributions, each one consistent with the trial model Π, whose convex mixture is equal to the observed distribution. If she has an auxiliary random variable E realising values from the finite-cardinality set $\mathcal{E}$ and recording which state was prepared on which trial, she can better predict the outcome conditioned on her side information $E = e$, in conjunction with the settings Z. Indeed, some of her e-conditional distributions could be deterministic (specifically, the product of a fixed settings distribution and a deterministic behaviour, i.e., a conditional distribution of the outcomes conditioned on settings), in which case she does not yield any randomness to Alice and Bob on a trial where E takes that value. But, if the overall observed statistics are nonlocal, then she is forced to prepare at least some states that contain randomness even conditioned on e; this, in essence, is because the information that she possesses with E is a local hidden variable.
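This decomposition strategy can be illustrated numerically. The sketch below (ours, assuming SciPy; all numbers are illustrative) takes a noisy Popescu–Rohrlich behaviour and uses a linear programme to push as much weight as possible onto local deterministic components, leaving the minimum possible weight on the nonlocal PR component; this anticipates the analytic (2,2,2) results of Section 4.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Illustrative sketch (not the paper's code): decompose an observed (2,2,2)
# behaviour as a convex mixture of one PR box and the 16 local deterministic
# behaviours, minimising the weight on the randomness-carrying PR component.

def pr_box():
    mu = np.zeros((2, 2, 2, 2))                 # axes a, b, x, y
    for a, b, x, y in itertools.product(range(2), repeat=4):
        if (a + b) % 2 == (x * y) % 2:          # a XOR b = x AND y
            mu[a, b, x, y] = 0.5
    return mu

def local_deterministic(fa, fb):
    mu = np.zeros((2, 2, 2, 2))
    for x, y in itertools.product(range(2), repeat=2):
        mu[fa[x], fb[y], x, y] = 1.0
    return mu

lds = [local_deterministic(fa, fb).ravel()
       for fa in itertools.product(range(2), repeat=2)
       for fb in itertools.product(range(2), repeat=2)]
components = np.column_stack([pr_box().ravel()] + lds)   # 16 x 17 matrix

# Observed behaviour: PR box mixed with white noise at visibility v.
v = 0.8
observed = v * pr_box() + (1 - v) * np.full((2, 2, 2, 2), 0.25)

# Minimise the PR weight w[0] subject to reproducing the observed behaviour.
res = linprog(c=np.eye(17)[0], A_eq=components, b_eq=observed.ravel(),
              bounds=[(0, 1)] * 17)
print(f"minimal PR weight: {res.x[0]:.3f} (equals 2v - 1 = {2*v - 1:.3f})")
```

For this noisy-PR family, the minimum equals the well-known nonlocal content 2v − 1: any remaining weight can be absorbed into local deterministic points, and a smaller PR weight would violate the CHSH bound of the local components.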

3.1. I.I.D. Attacks

Given a convex decomposition of the observed distribution, the adversary's simplest form of attack is to select e from some finite-cardinality set $\mathcal{E}$ in an i.i.d. manner on each trial, according to the distribution that recovers the observed distribution $\rho(CZ)$. A more general attack would allow her to use memory of earlier trials, but we will see later that, asymptotically, this does not yield meaningful improvement.
Operationally, we do not like to think of the adversary accessing the devices in between trials to provide a choice of  e i  for each trial. Instead, one can imagine her randomly sampling from the distribution of  E  for all trials, coming up with a choice  e  that encodes all the choices of  e i  for each trial and then supplying this choice to the measured system, in advance, to determine its behaviour in each trial. She keeps a record of  e  to help her predict C later. Through this sampling process there is a small chance that she will sample an atypical “bad”  e  that results in statistics deviating from the observed distribution but the probability that her  e  is typical is asymptotically high. Our figure of merit for the adversary now is:
$$\frac{1}{n} H^{\mathrm{avg}}_{\infty,\mu}(\mathbf{C} \mid \mathbf{Z}\mathbf{E}),$$
which she wants to minimise with a distribution that, marginalised over $\mathbf{E}$, is consistent with i.i.d. sampling from an expected observed distribution ρ. We formally define the set $\Sigma_{\mathcal{E}}^{\rho}$ of distributions $\omega : \mathcal{C} \times \mathcal{Z} \times \mathcal{E} \to [0, 1]$ of $C, Z, E$ mimicking ρ through such a convex decomposition as follows, where e is shorthand for the event $\{E = e\}$:
$$\Sigma_{\mathcal{E}}^{\rho} := \Big\{ \omega(CZE) : \omega(CZ|e) \in \Pi \ \ \forall e \in \mathcal{E}, \ \ \sum_{e \in \mathcal{E}} \omega(CZ|e)\,\omega(e) = \rho(CZ) \Big\}. \tag{12}$$
Then, an IID attack can be defined as follows.
Definition 3 (IID Attack).
Given a distribution $\omega(CZE) \in \Sigma_{\mathcal{E}}^{\rho}$, we define an IID attack (with ω) to be the distribution ϕ consisting of n independent and identical realisations of random variables $C_i, Z_i, E_i$ distributed according to ω; i.e., the joint distribution of the sequence of random variables $\mathbf{C}, \mathbf{Z}, \mathbf{E}$ is $\phi : \mathcal{C}^n \times \mathcal{Z}^n \times \mathcal{E}^n \to [0, 1]$ such that $\phi(\mathbf{CZE}) = \prod_{i=1}^{n} \omega(C_i Z_i E_i)$.
As mentioned earlier, the adversary randomly samples from the distribution of $\mathbf{E}$, which represents her knowledge of all trials; $\mathbf{e} \equiv (e_1, e_2, \ldots, e_n) \in \mathcal{E}^n$ encodes the individual choices $e_i$ for trials $i \in \{1, 2, \ldots, n\}$. The IID attack satisfies the two assumptions of the experiment model discussed earlier (see (1) and the discussion immediately following it). Namely, the (joint) probability of the $(i+1)$'th trial outcome and input setting, conditioned on each realisation of the outcomes and settings for the first i trials and each realisation $\mathbf{e} \in \mathcal{E}^n$ of the side information, satisfies the conditions of the trial model; and, conditioned on each $\mathbf{e} \in \mathcal{E}^n$, the settings for the $(i+1)$'th trial are independent of the outcomes and settings of the past and present trials (i.e., the first i trials). This is formally stated and proved in Lemma 2 below.
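A small simulation (ours; the two-component decomposition and all probabilities are made up) illustrates Definition 3: sampling e i.i.d. across trials and then sampling each trial's results from the corresponding e-conditional distribution reproduces, marginally, the expected observed statistics.

```python
import numpy as np

# Sketch of an IID attack: the adversary mixes two conditional behaviours
# omega(CZ|e) with weights omega(e); marginalised over e, the statistics
# reproduce rho(CZ). Toy alphabet of 4 joint (c, z) symbols; illustrative numbers.

rng = np.random.default_rng(1)
n = 50_000

omega_e = np.array([0.3, 0.7])                       # omega(e)
omega_cz_given_e = np.array([[0.5, 0.5, 0.0, 0.0],   # omega(cz | e=0)
                             [0.1, 0.2, 0.3, 0.4]])  # omega(cz | e=1)
rho = omega_e @ omega_cz_given_e                     # marginal rho(cz) seen by the user

e_seq = rng.choice(2, size=n, p=omega_e)             # adversary's side information
cz_seq = np.array([rng.choice(4, p=omega_cz_given_e[e]) for e in e_seq])

empirical = np.bincount(cz_seq, minlength=4) / n
print("expected rho(cz):", rho)
print("empirical       :", empirical.round(3))
```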
Lemma 2.
The IID attack as defined in Definition 3 satisfies the following conditions.
$$\phi(C_{i+1}Z_{i+1} \mid \mathbf{c}_i\mathbf{z}_i\mathbf{e}) \in \Pi, \quad \forall\, \mathbf{c}_i \in \mathcal{C}^i,\ \mathbf{z}_i \in \mathcal{Z}^i,\ \mathbf{e} \in \mathcal{E}^n \tag{13}$$
$$\phi(Z_{i+1}\mathbf{C}_i\mathbf{Z}_i \mid \mathbf{e}) = \phi(Z_{i+1} \mid \mathbf{e})\,\phi(\mathbf{C}_i\mathbf{Z}_i \mid \mathbf{e}), \quad \forall\, \mathbf{e} \in \mathcal{E}^n \tag{14}$$
Proof. 
Consider the distribution $\phi(\mathbf{CZ}|\mathbf{e})$ conditioned on a realisation $\mathbf{E} = \mathbf{e}$, where $\phi(\mathbf{CZE}) = \prod_{i=1}^{n} \omega(C_i Z_i E_i)$. Notice that $\phi(\mathbf{CZ}|\mathbf{e}) = \prod_{i=1}^{n} \omega(C_i Z_i | e_i)$. Marginalising over the random variables $C_{i+2}, C_{i+3}, \ldots, C_n, Z_{i+2}, Z_{i+3}, \ldots, Z_n$ we obtain:
$$\phi(C_{i+1}Z_{i+1}\mathbf{C}_i\mathbf{Z}_i \mid \mathbf{e}) = \prod_{j=1}^{i+1} \omega(C_j Z_j \mid e_j). \tag{15}$$
Corresponding to a particular realisation $\mathbf{c}_i \in \mathcal{C}^i, \mathbf{z}_i \in \mathcal{Z}^i$, we then have $\phi(C_{i+1}Z_{i+1}\mathbf{c}_i\mathbf{z}_i|\mathbf{e}) = \omega(C_{i+1}Z_{i+1}|e_{i+1}) \prod_{j=1}^{i} \omega(c_j z_j|e_j)$; and, since $\phi(\mathbf{c}_i\mathbf{z}_i|\mathbf{e}) = \prod_{j=1}^{i} \omega(c_j z_j|e_j)$, we have
$$\frac{\phi(C_{i+1}Z_{i+1}\mathbf{c}_i\mathbf{z}_i \mid \mathbf{e})}{\phi(\mathbf{c}_i\mathbf{z}_i \mid \mathbf{e})} = \phi(C_{i+1}Z_{i+1} \mid \mathbf{c}_i\mathbf{z}_i\mathbf{e}) = \omega(C_{i+1}Z_{i+1} \mid e_{i+1}). \tag{16}$$
$\omega(C_{i+1}Z_{i+1}|e_{i+1})$ belongs to the set Π for all values of $e_{i+1} \in \mathcal{E}$ (by construction of the set $\Sigma_{\mathcal{E}}^{\rho}$, see (12)). Since (16) is true for all realisations $\mathbf{c}_i \in \mathcal{C}^i, \mathbf{z}_i \in \mathcal{Z}^i, \mathbf{e} \in \mathcal{E}^n$, we conclude that (13) holds. Next, marginalising (15) over $C_{i+1}$ we have:
$$\phi(Z_{i+1}\mathbf{C}_i\mathbf{Z}_i \mid \mathbf{e}) = \omega(Z_{i+1} \mid e_{i+1}) \prod_{j=1}^{i} \omega(C_j Z_j \mid e_j) = \phi(Z_{i+1} \mid \mathbf{e})\,\phi(\mathbf{C}_i\mathbf{Z}_i \mid \mathbf{e}). \tag{17}$$
In (17), $\omega(Z_{i+1}|e_{i+1}) = \phi(Z_{i+1}|\mathbf{e})$ can be observed by marginalising (15) over the random variables $C_1, \ldots, C_i, C_{i+1}, Z_1, \ldots, Z_i$, and $\phi(\mathbf{C}_i\mathbf{Z}_i|\mathbf{e}) = \prod_{j=1}^{i} \omega(C_j Z_j|e_j)$ (from marginalising (15) over $C_{i+1}, Z_{i+1}$); (17) is true for all $\mathbf{e} \in \mathcal{E}^n$; hence, we conclude (14). □
Next, the adversary would like to implement an attack that “generates as little randomness as possible”. One measure of the randomness is the conditional Shannon entropy of the outcomes C conditioned on the inputs Z and the side information E.
Definition 4 (Conditional Shannon Entropy).
For a distribution $\mu : \mathcal{C} \times \mathcal{Z} \times \mathcal{E} \to [0, 1]$ of $C, Z, E$, the conditional Shannon entropy of the outcomes C conditioned on the settings Z and the side information E is defined as
$$H_\mu(C \mid ZE) = -\sum_{cze} \log_2\big(\mu(c \mid ze)\big)\,\mu(cze) = \mathrm{E}_\mu[-\log_2 \mu(C \mid ZE)].$$
The Greek letter  μ  in the subscript of  H μ ( · | · )  refers to the distribution  μ ( C Z E )  with respect to which the conditional Shannon entropy is defined.
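Definition 4 is straightforward to evaluate; the following sketch (ours, with a toy distribution) computes $H_\mu(C|ZE)$ from a joint array with axes (c, z, e):

```python
import numpy as np

# Sketch of Definition 4: conditional Shannon entropy H(C|ZE) from a joint
# distribution array; the toy numbers are illustrative, not from the paper.

def conditional_shannon_entropy(mu):
    mu_ze = mu.sum(axis=0, keepdims=True)            # mu(ze), broadcastable
    with np.errstate(divide="ignore", invalid="ignore"):
        cond = np.where(mu_ze > 0, mu / mu_ze, 0.0)  # mu(c|ze)
        terms = np.where(mu > 0, -mu * np.log2(cond), 0.0)
    return terms.sum()

rng = np.random.default_rng(2)
mu = rng.random((4, 2, 3))
mu /= mu.sum()
print(f"H(C|ZE) = {conditional_shannon_entropy(mu):.3f} bits")
```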
Theorem 3 below shows that  H ω ( C | Z E )  is an asymptotic upper bound on the per-trial conditional min-entropy that the adversary generates with an IID attack employing a trial distribution  ω  that is consistent with the observed distribution  ρ . This result was discussed but not demonstrated explicitly in [15]. The proof of Theorem 3 involves one of the fundamental technical tools from information theory, the (classical) asymptotic equipartition property (AEP), or equivalently the notion of typical sequences which has the weak law of large numbers at its core.
Suppose μ, the distribution of all trials, is obtained as n i.i.d. copies of a single-trial distribution ω. Then, for $\epsilon_a \in (0, 1)$, $\delta > 0$ there exists $N(\epsilon_a, \delta)$ such that $n \geq N(\epsilon_a, \delta)$ ensures $\mathrm{E}_{\mu(\mathbf{ZE})}[P_{\mu(\mathbf{C}|\mathbf{ZE})}(\mu(\mathbf{C}|\mathbf{ZE}) \geq \gamma)] \geq 1 - \epsilon_a$, where $\gamma = 2^{-n H_\omega(C|ZE) - n\delta}$ and $H_\omega(C|ZE)$ is the conditional Shannon entropy. We refer to this as the AEP condition; it holds by a conditional form of the classical AEP (see, for instance, Section 14.6 in [21]). The set $B_{\epsilon_s}(\mu)$ of distributions of $\mathbf{C}, \mathbf{Z}, \mathbf{E}$ that are within a TV distance of $\epsilon_s$ from μ and the sets $A_{\mathbf{ze}}$ are as defined below:
$$B_{\epsilon_s}(\mu) := \{\sigma : \mathcal{C}^n \times \mathcal{Z}^n \times \mathcal{E}^n \to [0, 1] \mid d_{\mathrm{TV}}(\mu, \sigma) \leq \epsilon_s\},$$
$$A_{\mathbf{ze}} := \{\mathbf{c} \in \mathcal{C}^n \mid \mu(\mathbf{c} \mid \mathbf{ze}) \geq \gamma\},$$
where $A_{\mathbf{ze}}$ is defined for any $\mathbf{ze}$ for which $\mu(\mathbf{ze}) > 0$. Note that the case $\epsilon_s = 0$ reduces to a bound on the standard (non-smooth) average conditional min-entropy. We now state the result as follows.
Theorem 3.
Let μ be an IID attack with ω. For $\epsilon_s \geq 0$, $\epsilon_a, \delta > 0$ and $\epsilon_a + 2\epsilon_s < 1$, there exists $N(\epsilon_a, \epsilon_s, \delta)$ such that for $n \geq N(\epsilon_a, \epsilon_s, \delta)$:
$$\frac{1}{n} H^{\mathrm{avg},\epsilon_s}_{\infty,\mu}(\mathbf{C} \mid \mathbf{ZE}) \leq H_\omega(C \mid ZE) + \frac{1}{n}\log_2\Big(\frac{1}{1 - \epsilon_a - 2\epsilon_s}\Big) + \delta. \tag{21}$$
Proof. 
Throughout, we follow the convention that $\sigma(\mathbf{c}|\mathbf{ze}) = 0$ for all $\mathbf{c} \in \mathcal{C}^n$ for any $\mathbf{ze} \in \mathcal{Z}^n \times \mathcal{E}^n$ with $\sigma(\mathbf{ze}) = 0$. We begin with the inequality $d_{\mathrm{TV}}(\sigma, \mu) \leq \epsilon_s$ that any $\sigma \in B_{\epsilon_s}(\mu)$ must satisfy and proceed as follows:
$$2\epsilon_s \geq \|\mu - \sigma\|_1 = \sum_{\mathbf{cze} \in \mathcal{C}^n \times \mathcal{Z}^n \times \mathcal{E}^n} |\mu(\mathbf{cze}) - \sigma(\mathbf{cze})| \geq \sum_{\mathbf{ze} : \mu(\mathbf{ze}) > 0} \sum_{\mathbf{c} \in A_{\mathbf{ze}}} |\mu(\mathbf{cze}) - \sigma(\mathbf{cze})| \tag{22}$$
$$\geq \Big| \sum_{\mathbf{ze} : \mu(\mathbf{ze}) > 0} \sum_{\mathbf{c} \in A_{\mathbf{ze}}} \mu(\mathbf{cze}) - \sigma(\mathbf{cze}) \Big| = \Big| \mathrm{E}_{\mu(\mathbf{ZE})}[P_{\mu(\mathbf{C}|\mathbf{ZE})}(\mu(\mathbf{C}|\mathbf{ZE}) \geq \gamma)] - \sum_{\mathbf{ze} : \mu(\mathbf{ze}) > 0} \sum_{\mathbf{c} \in A_{\mathbf{ze}}} \sigma(\mathbf{cze}) \Big| \geq \mathrm{E}_{\mu(\mathbf{ZE})}[P_{\mu(\mathbf{C}|\mathbf{ZE})}(\mu(\mathbf{C}|\mathbf{ZE}) \geq \gamma)] - \sum_{\mathbf{ze} : \mu(\mathbf{ze}) > 0} \sum_{\mathbf{c} \in A_{\mathbf{ze}}} \sigma(\mathbf{cze}). \tag{23}$$
The inequality in (22) follows as a result of the sum containing fewer terms; the inequality in (23) follows from the triangle inequality. Now, from the AEP condition mentioned above, we have the following:
$$\sum_{\mathbf{ze} : \mu(\mathbf{ze}) > 0} \sum_{\mathbf{c} \in A_{\mathbf{ze}}} \sigma(\mathbf{cze}) \geq \mathrm{E}_{\mu(\mathbf{ZE})}[P_{\mu(\mathbf{C}|\mathbf{ZE})}(\mu(\mathbf{C}|\mathbf{ZE}) \geq \gamma)] - 2\epsilon_s \geq 1 - \epsilon_a - 2\epsilon_s. \tag{24}$$
For any $\sigma \in B_{\epsilon_s}(\mu)$, we define $M^\sigma_{\mathbf{ze}}$ for any $\mathbf{ze} \in \mathcal{Z}^n \times \mathcal{E}^n$ as $M^\sigma_{\mathbf{ze}} := \max_{\mathbf{c} \in \mathcal{C}^n} \sigma(\mathbf{c}|\mathbf{ze})$. The average conditional maximum probability is then expressed as $\bar{M}^\sigma := \sum_{\mathbf{ze}} M^\sigma_{\mathbf{ze}}\,\sigma(\mathbf{ze})$. Because $1 \geq \sum_{\mathbf{c} \in A_{\mathbf{ze}}} \mu(\mathbf{c}|\mathbf{ze}) \geq \gamma\,|A_{\mathbf{ze}}|$, we have $|A_{\mathbf{ze}}| \leq 1/\gamma$ for each $\mathbf{ze}$ and we can write:
$$\sum_{\mathbf{ze} : \mu(\mathbf{ze}) > 0} \sum_{\mathbf{c} \in A_{\mathbf{ze}}} \sigma(\mathbf{cze}) = \sum_{\mathbf{ze} : \mu(\mathbf{ze}) > 0} \sum_{\mathbf{c} \in A_{\mathbf{ze}}} \sigma(\mathbf{c}|\mathbf{ze})\,\sigma(\mathbf{ze}) \leq \sum_{\mathbf{ze} : \mu(\mathbf{ze}) > 0} \sum_{\mathbf{c} \in A_{\mathbf{ze}}} M^\sigma_{\mathbf{ze}}\,\sigma(\mathbf{ze}) = \sum_{\mathbf{ze} : \mu(\mathbf{ze}) > 0} |A_{\mathbf{ze}}|\,M^\sigma_{\mathbf{ze}}\,\sigma(\mathbf{ze}) \leq \frac{1}{\gamma} \sum_{\mathbf{ze}} M^\sigma_{\mathbf{ze}}\,\sigma(\mathbf{ze}) = \frac{\bar{M}^\sigma}{\gamma}. \tag{25}$$
Using (24) and (25) we obtain $\bar{M}^\sigma \geq \gamma(1 - \epsilon_a - 2\epsilon_s)$, from which (21) follows using the definition of smooth average conditional min-entropy. □
Having shown that the per-trial min-entropy generated by an IID attack is asymptotically bounded by the conditional Shannon entropy, we give the following definition of an optimal attack.
Definition 5 (Optimal IID Attack).
The distribution $\mu(\mathbf{CZE})$ of the sequence of random variables $\mathbf{C}, \mathbf{Z}, \mathbf{E}$ is an optimal IID attack if μ is obtained through an IID attack based on a single-trial distribution ω whose conditional Shannon entropy achieves the infimum defined below:
$$h_{\min}(\rho) := \inf_{\omega(CZE) \in \Sigma_{\mathcal{E}}^{\rho}} H_\omega(C \mid ZE). \tag{26}$$
Additional motivation for naming the attack of Definition 5 optimal is provided by later results in this section, which show that the adversary must generate at least $h_{\min}(\rho)$ bits of per-trial conditional min-entropy asymptotically with any attack that replicates the observed distribution ρ.
In the theorem that follows, we formalise the claim that the infimum in (26) is achieved. This theorem corresponds to Theorem 43 in [15]; in comparison, the comprehensive proof provided here explicitly works out more of the steps. Crucially, this explicit approach also allowed us to provide an improvement upon the result of Theorem 43 in [15], decreasing the required value of  | E |  by one, thereby better characterising the adversary’s optimal attack. Results in Section 4.2 will illustrate that no further improvement, i.e., a decrease in  | E | , is possible.
Theorem 4.
Suppose Π is closed and equal to the convex hull of its extreme points. Then, there is a distribution $\mu(CZE) \in \Sigma_{\mathcal{E}}^{\rho}$ with $|\mathcal{E}| = 1 + \dim \Pi$ such that $H_\mu(C \mid ZE) = h_{\min}(\rho)$.
Theorem 4, in conjunction with the bound in Theorem 3, sets a benchmark for how well the adversary can perform with an IID attack that replicates the observed distribution $\rho(CZ)$. Specifically, the adversary's goal is to minimise the amount of per-trial conditional min-entropy, and this shows there exists a strategy to replicate the observed statistics while conceding no more min-entropy per trial than $h_{\min}(\rho)$, asymptotically.

3.2. Optimal PEFs

We now show that PEFs can asymptotically certify a min-entropy of $h_{\min}(\rho)$ per trial from an observed distribution ρ. This is notable since it shows that an IID attack can be asymptotically optimal: since the PEF method certifies the presence of $h_{\min}(\rho)$ min-entropy per trial against any attack, this means no attack can generate observed statistics consistent with ρ while conceding a smaller amount of randomness. This furthermore demonstrates that there is nothing to be gained (asymptotically) by the adversary employing a more sophisticated memory-based attack, since the PEF method allows for the possibility of memory attacks. Conversely, the results below show that the PEF method is asymptotically optimal: no (correct) method can certify more min-entropy per trial from ρ than the amount that is present in an explicit attack.
To formalise and prove these claims, we use the following technical tool, called an “entropy estimator” as in [15].
Definition 6 (Entropy Estimator).
An entropy estimator of the model Π is a function $K(CZ)$ of the random variables C, Z such that $\mathrm{E}_\sigma[K(CZ)] \leq \mathrm{E}_\sigma[-\log_2(\sigma(C|Z))]$ holds for all $\sigma(CZ) \in \Pi$.
Given an entropy estimator $K(CZ)$, we say that its entropy estimate at a distribution $\sigma(CZ)$ is $\mathrm{E}_\sigma[K(CZ)]$. We will see below that an entropy estimator can be used to construct PEFs certifying per-trial min-entropy arbitrarily close to its entropy estimate, underscoring the significance of the following result:
Theorem 5.
Suppose Π satisfies the conditions of Theorem 4 and ρ is in the interior of Π. Then, there exists an entropy estimator whose entropy estimate at ρ is equal to $h_{\min}(\rho)$.
The assumption that  ρ  is in the interior of  Π  will generally hold if  ρ  is estimated from real data, as the boundary of  Π  is a measure zero set. If the assumption is removed, a weaker version of the theorem can still be obtained, which is discussed in the proof in Appendix B.1.
The entropy estimator $K(CZ)$ whose existence is guaranteed by the above theorem can be used to show the existence of a family of PEFs that come arbitrarily close to certifying $h_{\min}(\rho)$ per-trial min-entropy. However, for a precise formulation of this claim we need a way to measure the asymptotic rate of min-entropy certified using PEFs. Recall from (8) that we can lower-bound the per-trial min-entropy certified by a PEF as:
$$\frac{1}{n} H^{\mathrm{avg},\epsilon}_{\infty,\mu}(\mathbf{C} \mid \mathbf{ZE}; \mathcal{S}) \geq \frac{1}{n}\Big(1 + \frac{1}{\beta}\Big)\log_2(\kappa) - \frac{1}{n}\log_2(p). \tag{27}$$
As in [15], we ignore the $\log_2(\kappa)$ term in the asymptotic regime, as the completeness parameter κ can be thought of as a “reasonable” lower bound on the probability that the protocol does not abort, a type of error parameter that one might try to decrease somewhat for longer experiments but not at the exponential decay rate required to make this term asymptotically significant. Focusing then on the $-(1/n)\log_2(p)$ term, recall that success of the protocol is determined by the occurrence of the event $\mathcal{S} := \{(\epsilon \prod_{i=1}^{n} F_i)^{-1/\beta} \leq p\}$, the inequality in which can be expressed equivalently as:
$$\frac{1}{n\beta} \sum_{i=1}^{n} \log_2(F_i) + \frac{1}{n\beta}\log_2(\epsilon) \geq -\frac{1}{n}\log_2(p).$$
The expression on the left hand side of the above inequality is the negative base-2 logarithm (divided by n) of the upper bound on $\mu_e(\mathbf{C}|\mathbf{Z})$ for each $\mathbf{e} \in \mathcal{E}^n$ (refer to (2) and the comments following Corollary 1), and so is a rough measure of the per-trial amount of randomness, up to an error probability of ϵ, present in the outcome data. More concretely, since p will be chosen to make $-(1/n)\log_2(p)$ as large as reasonably possible to optimise the min-entropy certified by (27), the anticipated value of the left hand side quantity can be used as a measure of certifiable randomness. For a stable experiment (i.e., one with each trial having the same distribution σ belonging to the same model Π), the quantity $(1/n)\sum_{i=1}^{n} \log_2(F_i)/\beta$ approaches $\mathrm{E}_\sigma[\log_2(F(CZ))]/\beta$ in the limit $n \to \infty$, while the term $(1/(n\beta))\log_2(\epsilon)$ goes to zero for any fixed value of β and ϵ. Hence, we introduce the following quantity as a measure of the per-trial min-entropy certified by a PEF.
Definition 7 (Log-Prob Rate).
The log-prob rate of a PEF $F(CZ)$ with power β at a distribution $\rho(CZ)$ is defined as $O_\rho(F; \beta) = \mathrm{E}_\rho[\log_2(F(CZ))]/\beta$.
We say that a PEF certifies randomness at a distribution ρ if the quantity $O_\rho(F; \beta)$ is positive. We note that this definition is consistent with our expectation that only nonlocal distributions allow the certification of randomness, as the log-prob rate for a local distribution is non-positive, i.e., $O_{\sigma_{\mathrm{L}}}(F; \beta) \leq 0$: a local behaviour is a convex mixture of (finitely many) local deterministic behaviours $\sigma_{\mathrm{LD}}(C|Z)$. Hence, with a fixed settings distribution $\pi(z) > 0$, the defining condition $\mathrm{E}_\sigma[F(CZ)\,\sigma(C|Z)^\beta] \leq 1$ of a PEF for a distribution defined as $\sigma(cz) = \sigma_{\mathrm{LD}}(c|z)\,\pi(z)$ for all c, z is equivalently expressed as $\mathrm{E}_\sigma[F(CZ)] \leq 1$, since $\sigma_{\mathrm{LD}}(c|z)$ is either 0 or 1 for all c, z. Due to the concavity of the log function, we then have $\mathrm{E}_\sigma[\log_2(F(CZ))] \leq \log_2(\mathrm{E}_\sigma[F(CZ)]) \leq 0$ using Jensen's inequality, and averaging over the deterministic components extends this to any local mixture. Hence, no device-independent randomness can be certified at a local-realistic distribution.
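The non-positivity of the log-prob rate at local distributions can be checked directly; below is a toy sketch (ours, with made-up values for F) for a local deterministic behaviour with uniform settings, where, as noted above, the PEF condition reduces to $\mathrm{E}_\sigma[F] \leq 1$:

```python
import math

# Sketch of Definition 7: log-prob rate O_rho(F; beta) = E_rho[log2 F] / beta,
# evaluated at a toy local deterministic distribution. F is an arbitrary
# made-up candidate satisfying E_sigma[F] <= 1; all names are illustrative.

def log_prob_rate(F, rho, beta):
    return sum(p * math.log2(F[k]) for k, p in rho.items() if p > 0) / beta

# Local deterministic behaviour with uniform settings: the outcome c is fixed
# by the setting z, so sigma(c|z) is either 0 or 1.
sigma = {("c0", "z0"): 0.5, ("c1", "z1"): 0.5}
F = {("c0", "z0"): 1.2, ("c1", "z1"): 0.8}       # E_sigma[F] = 1.0 <= 1
print(f"log-prob rate: {log_prob_rate(F, sigma, beta=0.5):.4f}")  # <= 0 by Jensen
```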
Theorem 6.
Given an entropy estimator $K(CZ)$ and an observed distribution $\rho(CZ)$, for any $\epsilon\in(0,1/2)$ there is a PEF whose log-prob rate at ρ is greater than $E_\rho[K(CZ)] - \epsilon$.
Our proof follows the general approach of Theorem 41 in [15], though we are able to shorten the argument.
Proof. 
Given an entropy estimator  K ( C Z )  and  ϵ ( 0 , 1 / 2 )  from the statement of the theorem, for any  γ > 0  we can define a function
$$F(CZ) = 2^{(K(CZ)-\epsilon)\gamma}. \qquad (28)$$
We will show that there exists a (small) positive value of γ for which $F(CZ)$ is a PEF with power $\beta = \gamma$; the asymptotic log-prob rate of this PEF at ρ will then be $E_\rho[\log_2(F(CZ))]/\beta = E_\rho[K(CZ)] - \epsilon$, as desired. So, our task is to find a value of γ such that the following inequality holds for all $\sigma\in\Pi$:
$$E_\sigma\!\left[F(CZ)\,\sigma(C|Z)^{\gamma}\right] \le 1.$$
We study the left side of the above expression as a function of  γ ; specifically, define a function
$$f_\sigma(\gamma) = E_\sigma\!\left[F(CZ)\,\sigma(C|Z)^{\gamma}\right] = \sum_{c,z:\,\sigma(cz)>0}\left[2^{K(cz)-\epsilon}\,\sigma(c|z)\right]^{\gamma}\sigma(cz),$$
which is, for any fixed choice of σ and $K(CZ)$, a convex combination of positive constants raised to the power of γ and so is infinitely differentiable at all $\gamma\in\mathbb{R}$. (Note that we never encounter the problematic form $0^0$ because the argument of $[\,\cdot\,]^{\gamma}$ will always be strictly positive, as the sum defining $f_\sigma$ extends only over values of $c,z$ for which $\sigma(cz)$ is positive, and hence $\sigma(c|z) > 0$.) We can thus Taylor-expand $f_\sigma$ about $\gamma = 0$, obtaining via Taylor's theorem with the Lagrange form of the remainder that, for any positive γ, there exists a $k\in(0,\gamma)$ making the following equality hold:
$$f_\sigma(\gamma) = f_\sigma(0) + f'_\sigma(0)\,\gamma + \frac{f''_\sigma(k)}{2}\,\gamma^{2}. \qquad (29)$$
The first term in the expansion satisfies $f_\sigma(0) = \sum_{cz} 1\cdot\sigma(cz) = 1$. The coefficient of γ in (29) satisfies:
$$f'_\sigma(0) = \sum_{c,z:\,\sigma(cz)>0}\left[2^{K(cz)-\epsilon}\,\sigma(c|z)\right]^{0}\sigma(cz)\,\ln\!\left(2^{K(cz)-\epsilon}\,\sigma(c|z)\right) = \sum_{c,z:\,\sigma(cz)>0}\sigma(cz)\left[K(cz)-\epsilon+\log_2(\sigma(c|z))\right]\ln(2) = \ln(2)\left(E_\sigma[K(CZ)] + E_\sigma[\log_2(\sigma(C|Z))] - \epsilon\right) \le -\epsilon\ln(2),$$
where the inequality follows from the condition $E_\sigma[K(CZ)] \le -E_\sigma[\log_2(\sigma(C|Z))]$ in the definition of an entropy estimator. Hence, (29) yields
$$f_\sigma(\gamma) \le 1 - \epsilon\,\gamma\ln(2) + \frac{f''_\sigma(k)}{2}\,\gamma^{2} \qquad (30)$$
for some $k\in(0,\gamma)$. Now, for a fixed γ, the value of k in (30) may differ for different choices of σ; however, it must always lie in the interval $(0,\gamma)$, so if we can show that there is a choice of γ such that, for any σ, the following inequality holds for all $k\in(0,\gamma)$
$$\frac{f''_\sigma(k)}{2}\,\gamma^{2} \le \epsilon\,\gamma\ln(2), \qquad (31)$$
then, for that value of  γ , we will know that  F ( C Z )  as defined in (28) is a valid PEF satisfying the conditions of the theorem. To find the needed value of  γ  making (31) hold and complete the proof, we calculate
$$f''_\sigma(k) = \ln^2(2)\sum_{c,z:\,\sigma(cz)>0}\left[2^{K(cz)-\epsilon}\,\sigma(c|z)\right]^{k}\left[\log_2\!\left(2^{K(cz)-\epsilon}\,\sigma(c|z)\right)\right]^{2}\sigma(cz) \;\le\; \ln^2(2)\,M^{k}\sum_{c,z:\,\sigma(cz)>0}\sigma(c|z)^{k+1}\left[K(cz)-\epsilon+\log_2(\sigma(c|z))\right]^{2}\sigma(z),$$
where $M = \max_{cz} 2^{K(cz)}$. We now assert that each quantity $\sigma(c|z)^{k+1}\left[K(cz)-\epsilon+\log_2(\sigma(c|z))\right]^2$ is bounded above by a constant $N_{cz}$ for all $k > 0$, where $N_{cz}$ is independent of σ. This follows because, for any fixed choice of c and z, this quantity is bounded above by the expression $g_{cz}(x) = x\left[K(cz)-\epsilon+\log_2(x)\right]^2$ evaluated at $x = \sigma(c|z)\in(0,1]$ (note that since $\sigma(c|z)\in(0,1]$, $\sigma(c|z)^{k+1} \le \sigma(c|z)$ holds for any $k > 0$). Then, two applications of l'Hôpital's rule demonstrate that $\lim_{x\to 0^{+}} g_{cz}(x)$ exists and so $g_{cz}$ can be extended to a continuous function on $[0,1]$, where it has a maximum by the extreme value theorem. Invocation of the extreme value theorem, rather than computing an explicit bound, is what primarily allows us to shorten the proof compared to the argument proving Theorem 41 in [15]. Referring to this maximum as $N_{cz}$ and letting $N = \max_{cz} N_{cz}$, we obtain the desired bound as shown below.
$$f''_\sigma(k) \;\le\; \ln^2(2)\,M^{k}\sum_{z:\,\sigma(z)>0}\sigma(z)\sum_{c:\,\sigma(cz)>0}N \;\le\; \ln(2)\,M^{k}\sum_{z:\,\sigma(z)>0}\sigma(z)\,|\mathcal{C}|\,N \;=\; \ln(2)\,M^{k}\,|\mathcal{C}|\,N,$$
where the second inequality uses $\ln^2(2) \le \ln(2)$.
This shows that, if $M^{k}\gamma \le 2\epsilon/(|\mathcal{C}|N)$ holds, then (31) holds, from which it follows that a sufficiently small choice of $\gamma > 0$ makes (31) hold for all $k\in(0,\gamma)$. □
The combination of Theorem 5, which shows the existence of an entropy estimator with entropy estimate  h min ( ρ ) , and Theorem 6, which enables the construction of a family of PEFs with log-prob rate arbitrarily close to this entropy estimate, demonstrates the asymptotic optimality of the PEF method.
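To make the recipe of the proof concrete, the following minimal Python sketch applies the construction of Theorem 6 to a hypothetical toy model: Π consists of the distributions of a single bit (with trivial settings) whose probability of outcome 0 lies in [0.2, 0.8]; the constant function equal to the minimum conditional Shannon entropy over the extremal points serves as an entropy estimator, and a valid power γ for $F = 2^{(K-\epsilon)\gamma}$ is located by bisection. Checking the PEF condition only at the two extremal points suffices here because the left side of the condition is convex in σ. All numbers and names are illustrative assumptions, not part of the framework.

```python
import numpy as np

# Hypothetical toy model: one bit C, trivial settings.
# Pi = {sigma : 0.2 <= sigma(0) <= 0.8}, with extremal points below.
extremals = [np.array([0.2, 0.8]), np.array([0.8, 0.2])]

def shannon(sigma):
    return -np.sum(sigma * np.log2(sigma))

# Constant entropy estimator: K = min H over the extremal points; the
# condition E_sigma[K] <= H_sigma(C) then holds on all of Pi since H is concave.
K = min(shannon(s) for s in extremals)   # ~0.7219
eps = 0.01

def is_pef(gamma):
    # F(c) = 2^{(K - eps) gamma}; check E_sigma[F * sigma(c)^gamma] <= 1 at the
    # extremal points (the left side is convex in sigma, so this suffices).
    F = 2.0 ** ((K - eps) * gamma)
    return all(np.sum(F * s ** gamma * s) <= 1.0 for s in extremals)

# Bisect for (an approximation of) the largest valid power gamma in (0, 1).
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if is_pef(mid) else (lo, mid)

print(f"valid power gamma ~ {lo:.6f}; log-prob rate = K - eps = {K - eps:.4f}")
```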

3.3. Robustness of PEFs

We want to consider a question not considered in the previous PEF papers: can a PEF optimised for  ρ ( C Z )  certify randomness for a distribution different from  ρ , where the difference is measured in terms of the total variation distance between them; in other words, how robust is the PEF? We will see in the next section that, in the (2,2,2) Bell scenario, for any behaviour corresponding to  ρ  violating the CHSH–Bell inequality, PEFs can be (up to any desired  ϵ -tolerance) asymptotically optimal in terms of log-prob rate at  ρ  while also generating randomness at a positive rate for any behaviour (corresponding to a distribution of outcomes and settings) that violates the CHSH–Bell inequality by a fixed positive amount, which can be chosen to be as small as desired.
The following theorem gives a useful sufficient condition for a distribution different from  ρ  to have a positive log-prob rate and demonstrates that any nontrivial (i.e., non-constant) PEF will have at least some degree of robustness.
Theorem 7.
Let  F ( C Z ) = G ( C Z ) β  be a non-constant positive PEF with power  β > 0  for Π. The log-prob rate  O σ ( F ; β )  at a distribution  σ ( C Z ) Π  is related to the log-prob rate  O ρ ( F ; β )  at  ρ ( C Z ) Π  and the total variation distance between ρ and σ as
$$\left|O_\rho(F;\beta) - O_\sigma(F;\beta)\right| \le (L-l)\, d_{\mathrm{TV}}(\rho,\sigma), \qquad (33)$$
where $L = \max_{cz}\log_2(G(cz))$ and $l = \min_{cz}\log_2(G(cz))$. Consequently, assuming that $O_\rho(F;\beta)$ is positive, the following upper bound on the total variation distance between $\rho(CZ)$ and $\sigma(CZ)$ is a sufficient condition for F to have a positive log-prob rate at $\sigma(CZ)$:
$$d_{\mathrm{TV}}(\rho,\sigma) < E_\rho[\log_2(G)]/(L-l). \qquad (34)$$
Proof. 
Using the definition of log-prob rate at a given distribution we have
$$\left|O_\rho(F;\beta) - O_\sigma(F;\beta)\right| = \left|\sum_{cz}\frac{1}{\beta}\log_2\!\left(G(cz)^{\beta}\right)\left(\rho(cz)-\sigma(cz)\right)\right| = \left|\sum_{cz}\left(\log_2(G(cz)) - \frac{L+l}{2} + \frac{L+l}{2}\right)\left(\rho(cz)-\sigma(cz)\right)\right|$$
$$= \left|\sum_{cz}\left(\log_2(G(cz)) - \frac{L+l}{2}\right)\left(\rho(cz)-\sigma(cz)\right) + \frac{L+l}{2}\sum_{cz}\left(\rho(cz)-\sigma(cz)\right)\right| \le \sum_{cz}\left|\log_2(G(cz)) - \frac{L+l}{2}\right|\,\left|\rho(cz)-\sigma(cz)\right| \le (L-l)\,\frac{1}{2}\sum_{cz}\left|\rho(cz)-\sigma(cz)\right| = (L-l)\, d_{\mathrm{TV}}(\rho,\sigma).$$
Hence, we have
$$O_\rho(F;\beta) - (L-l)\,d_{\mathrm{TV}}(\rho,\sigma) \;\le\; O_\sigma(F;\beta) \;\le\; O_\rho(F;\beta) + (L-l)\,d_{\mathrm{TV}}(\rho,\sigma).$$
Assuming that $O_\rho(F;\beta)$ is positive, a sufficient condition for $O_\sigma(F;\beta)$ to be positive is $O_\rho(F;\beta) > (L-l)\,d_{\mathrm{TV}}(\rho,\sigma)$ or, equivalently, the following bound on $d_{\mathrm{TV}}(\rho,\sigma)$:
$$d_{\mathrm{TV}}(\rho,\sigma) < O_\rho(F;\beta)/(L-l) = E_\rho[\log_2(G)]/(L-l).$$
We will see in Section 4.2 that the bound (33) can be saturated and so is tight.

4. Application to the (2,2,2) Bell Scenario

Here, we explore the application of the results of the previous section to the (2,2,2) Bell scenario (that of two parties, two measurement settings and two outcomes). First, working within the trial model of no-signalling distributions  Π NS , we show that PEFs can be simultaneously asymptotically optimal and robust by means of an explicit construction of a sequence of PEFs that approaches the optimal log-prob rate for the target distribution while simultaneously generating randomness at a positive rate for any other distribution violating the CHSH inequality.
In the course of this exercise, we will observe that the optimal adversarial attack—one generating the observed statistics (consistent with an expected trial distribution ρ) while asymptotically yielding $h_{\min}(\rho)$ amount of per-trial randomness—is always achieved through a single-trial distribution that marginalises to ρ through a convex combination of a single extremal no-signalling nonlocal distribution and a local realistic distribution (which itself consists of a convex mixture of up to eight extremal local deterministic distributions). This is a notable feature, revealing that the adversary never needs to prepare more than one nonlocal distribution to simulate the observed distribution with as little min-entropy as possible. Later in this section, we explore the potential for generalisation of this feature to the (2,2,2) scenario restricted to quantum distributions ($\Pi_{\mathrm{Q}}$); if true, this would be an important finding, outlining the optimal approach of a (more realistic) quantum-limited adversary attacking the PEF protocol. The significance of this question rests on the general observation that preparing a single nonlocal state is preferable to preparing multiple different ones. We find some evidence that the feature—only requiring one extremal nonlocal distribution in the convex combination attack—may hold for $\Pi_{\mathrm{Q}}$ in the (2,2,2) Bell scenario, but this may be a difficult question to resolve due to the complicated geometry of the quantum set. We also explore possible generalisations of this feature to no-signalling trial models for $(n,m,k)$ Bell scenarios where n, m or k exceed 2, and find that it does not hold in any of these cases—so the question of whether this holds in a given Bell scenario and trial model is non-trivial in general.
We begin with a brief review of the (2,2,2) Bell scenario and some features of the set  Π NS  of no-signalling distributions in this scenario.

4.1. A Brief Review of the (2,2,2) Bell Scenario

The (2,2,2) Bell scenario is the minimal Bell scenario, comprising two spatially separated parties, Alice and Bob, each having two measurement settings and two possible outcomes corresponding to each setting. The measurement settings for Alice and Bob are represented by the RVs X, Y realising values $x,y\in\{0,1\}$ and the measurement outcomes are represented by the RVs A, B realising values $a,b\in\{0,1\}$. With $\sigma_s(XY)$ representing a fixed settings distribution, we refer to the sets $\Pi_{\mathrm{NS}}$, $\Pi_{\mathrm{Q}}$ and $\Pi_{\mathrm{L}}$ as the no-signalling, quantum and local models, respectively, when they comprise distributions $\mu(ABXY) := \mu(AB|XY)\,\sigma_s(XY)$, where the conditional probabilities $\mu(AB|XY)$, referred to as behaviours, are constrained by the no-signalling, quantum and local realism principles, respectively. Henceforth, all distributions $\mu(ABXY)$ belonging to a model are defined as $\mu(ABXY) := \mu(AB|XY)\,\sigma_s(XY)$, and we associate a model with its constituent behaviours $\mu(AB|XY)$ or distributions $\mu(ABXY)$ interchangeably, since the settings distribution is fixed. Recall that the model $\Pi_{\mathrm{NS}}$ is a polytope whose extremal points are the behaviours $\mu_{\mathrm{extr}}(AB|XY)$, with entries $\mu_{\mathrm{extr}}(ab|xy)$ for $a,b,x,y\in\{0,1\}$, defined below.
$$\mu_{\mathrm{PR}}^{\alpha\beta\gamma}(ab|xy) := \begin{cases} \tfrac{1}{2} & : a\oplus b = xy \oplus \alpha x \oplus \beta y \oplus \gamma \\ 0 & : \text{otherwise} \end{cases} \qquad (35)$$
$$\mu_{\mathrm{LD}}^{\alpha\beta\gamma\delta}(ab|xy) := \begin{cases} 1 & : a = \alpha x \oplus \beta,\; b = \gamma y \oplus \delta \\ 0 & : \text{otherwise} \end{cases} \qquad (36)$$
where  α , β , γ , δ { 0 , 1 }  and ⊕ denotes addition modulo 2; (35) and (36) are known as the Popescu–Rohrlich (PR) behaviours [22] and the local deterministic (LD) behaviours, respectively. The CHSH–Bell inequalities shown below are known to be the only non-trivial facet inequalities delimiting the local polytope which is the convex hull of the LD behaviours [23]. Corresponding to each choice of  α , β , γ { 0 , 1 } , the inequalities represent a version of the canonical CHSH–Bell inequality.
$$\mathcal{B}_{\alpha\beta\gamma} := (-1)^{\gamma}E_{00} + (-1)^{\beta+\gamma}E_{01} + (-1)^{\alpha+\gamma}E_{10} + (-1)^{\alpha+\beta+\gamma+1}E_{11} \le 2, \qquad (37)$$
where $E_{xy} := \sum_{a,b=0}^{1}(-1)^{a+b}\mu(ab|xy)$ for $x,y\in\{0,1\}$. The nonlocal algebraic maximum for the expression $\mathcal{B}_{\alpha\beta\gamma}$ is 4. The local maximum is attained by eight $\mu_{\mathrm{LD}}^{\alpha\beta\gamma\delta}(AB|XY)$ behaviours for each $\mathcal{B}_{\alpha\beta\gamma}$. The sets $\mathrm{LD}_i$, $i\in\{1,2,\ldots,8\}$, each comprise the eight LD behaviours that saturate—i.e., achieve a value of 2 for—exactly one $\mathcal{B}_{\alpha\beta\gamma}$. A result proven in [24] (see Theorems 2.1 and 2.2 therein) states that any behaviour violating (37) can be represented as a convex combination of the one PR box achieving the nonlocal maximum for $\mathcal{B}_{\alpha\beta\gamma}$ and (up to) eight LD behaviours of the corresponding $\mathrm{LD}_i$ set saturating it. In fact, the geometry of the no-signalling polytope in this Bell scenario is such that there is a one-to-one correspondence between the nonlocal no-signalling extremal points—the PR boxes in (35)—and the non-trivial facets of the local polytope described by (37), with exactly one extremal point violating each facet inequality up to the algebraic maximum of four for each choice of $(\alpha,\beta,\gamma)\in\{0,1\}^3$. Hence, any nonlocal behaviour—one that violates a given version of the CHSH–Bell inequality—is contained in a nonlocal 8-simplex whose vertices are the one PR box that maximally violates that particular version and the eight LD behaviours that saturate it. Recall that a p-simplex is a p-dimensional polytope which is the convex hull of its $p+1$ vertices. More formally, if the set $C := \{a_0, a_1, \ldots, a_p\}\subset\mathbb{R}^n$ of $p+1$ points is affinely independent, then the p-simplex determined by them is the following set of points:
$$\Delta^{p} := \left\{ \sum_{k=0}^{p}\theta_k\, a_k \;\middle|\; \sum_{k=0}^{p}\theta_k = 1,\;\; \theta_k \ge 0 \text{ for } k = 0,1,\ldots,p \right\}.$$
The affine independence condition means that the only admissible choice of $\theta_k\in\mathbb{R}$ such that $\sum_{k=0}^{p}\theta_k a_k = 0$ and $\sum_{k=0}^{p}\theta_k = 0$ are simultaneously satisfied is $\theta_k = 0$ for all k; this holds if and only if the vectors $a_k - a_0$ are linearly independent for $k = 1,2,\ldots,p$.
One can check that the PR box that achieves the nonlocal maximum for a given version of the CHSH–Bell expression $\mathcal{B}_{\alpha\beta\gamma}$ and the eight LD behaviours that achieve the local maximum for it are affinely independent. Since $a,b,x,y\in\{0,1\}$ and $|\{0,1\}^4| = 16$, we can represent the behaviours $\mu(ab|xy)$ in this Bell scenario as vectors $\mu\in\mathbb{R}^{16}$, as shown in Table 1. Then, the affine independence is apparent: letting the PR box behaviour be $a_0$ and the LD behaviours be the other $a_k$, each $a_k - a_0$ term has a unique column where it contains a "1" while all of the other terms contain "0", ensuring linear independence.
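The affine independence claim can also be checked mechanically. The following Python sketch (our own illustrative code, not part of the cited results) constructs the sixteen-entry probability vectors of the PR box maximising $\mathcal{B}_{000}$ and of the eight LD behaviours saturating it, and verifies that the eight difference vectors $a_k - a_0$ have rank 8.

```python
import numpy as np
from itertools import product

def pr_box(alpha, beta, gamma):
    # mu(ab|xy) = 1/2 if a XOR b == xy XOR alpha*x XOR beta*y XOR gamma, else 0
    v = np.zeros(16)
    for i, (a, b, x, y) in enumerate(product(range(2), repeat=4)):
        if (a ^ b) == ((x * y) ^ (alpha * x) ^ (beta * y) ^ gamma):
            v[i] = 0.5
    return v

def ld_box(alpha, beta, gamma, delta):
    v = np.zeros(16)
    for i, (a, b, x, y) in enumerate(product(range(2), repeat=4)):
        if a == ((alpha * x) ^ beta) and b == ((gamma * y) ^ delta):
            v[i] = 1.0
    return v

def chsh(v):  # standard CHSH value B_000 = E00 + E01 + E10 - E11
    E = {}
    for i, (a, b, x, y) in enumerate(product(range(2), repeat=4)):
        E[(x, y)] = E.get((x, y), 0.0) + (-1) ** (a + b) * v[i]
    return E[(0, 0)] + E[(0, 1)] + E[(1, 0)] - E[(1, 1)]

pr1 = pr_box(0, 0, 0)                    # maximally violates B_000 (value 4)
lds = [ld_box(*t) for t in product(range(2), repeat=4)]
saturating = [v for v in lds if abs(chsh(v) - 2) < 1e-9]   # the eight with B_000 = 2

# Affine independence of the nine vertices <=> the 8 differences have rank 8.
diffs = np.array([v - pr1 for v in saturating])
print(len(saturating), np.linalg.matrix_rank(diffs))       # expect: 8 8
```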
It is known that a behaviour belonging to $\Pi_{\mathrm{NS}}\setminus\Pi_{\mathrm{L}}$ violates exactly one of the eight CHSH–Bell inequalities. The impossibility of simultaneously violating a specific pair of CHSH–Bell inequalities can be seen as presented in [25]: suppose a behaviour in $\Pi_{\mathrm{NS}}\setminus\Pi_{\mathrm{L}}$ violates both inequalities corresponding to $(\alpha,\beta,\gamma) = (0,0,0)$ and $(\alpha,\beta,\gamma) = (1,0,0)$; then $E_{00}+E_{01}+E_{10}-E_{11} > 2$ and $E_{00}+E_{01}-E_{10}+E_{11} > 2$ hold for the same behaviour. Adding these two inequalities, we have $2(E_{00}+E_{01}) > 4$, i.e., $E_{00}+E_{01} > 2$, which is impossible to satisfy since the correlators $E_{xy}$ satisfy $|E_{xy}| \le 1$.
Table 2 lists the eight versions of the Bell expression $\mathcal{B}_{\alpha\beta\gamma}$ and the eight nonlocal 8-simplices $\Delta^{8}_{\mathrm{PR},i}$ containing points that violate the corresponding CHSH–Bell inequality. Any nonlocal no-signalling behaviour ultimately belongs to exactly one such simplex.

4.2. Robust PEFs and Optimal Adversarial Attacks in the (2,2,2) Bell Scenario

We now examine the robustness of PEFs that are optimal for an anticipated distribution  ρ  and a fixed number of planned trials n. We first review how we find optimal PEFs in this scenario. The constrained maximisation routine in (10) provides a method to find useful PEFs with respect to an anticipated trial distribution, with Lemma 1 showing that the feasibility constraints in (10) can be restricted to only the distributions corresponding to the eight PR and sixteen LD behaviours (with a fixed settings distribution  σ s ( X Y ) > 0 ).
In practice, the number of trials n will affect the choice of β and the PEF that optimises the quantity $E_\rho[(n\log_2(F(CZ)) + \log_2(\epsilon))/\beta]$, a quantity which (per the discussion surrounding (10)) can be thought of as the anticipated amount of raw randomness from running the experiment whose trial distribution is expected to be ρ. If we divide this quantity by n, we arrive at a measure of expected randomness per trial for the optimal PEF at a given value of β, called the net log-prob rate: the function $(\max_F O_\rho(F;\beta)) + \log_2(\epsilon)/(n\beta)$. Figure 2 shows a plot of the net log-prob rates corresponding to two different values of n, as well as the supremum of the log-prob rate, for β varying from 0.001 to 0.1 and ϵ fixed at the value $10^{-4}$. The value of β, and the corresponding PEF, that maximises the curve is then the best choice for the given planned number of trials n.
The plot illustrates some notable features of PEFs. First, it was proved in Appendix D of [14] that, assuming a stable experiment (with each trial distribution ρ), the function $\sup_F O_\rho(F;\beta)$ is monotonically non-increasing in $\beta > 0$, which implies that the global supremum of the log-prob rates $\sup_{\beta>0}\sup_F O_\rho(F;\beta)$, over all PEFs with positive powers, is achieved in the limit $\beta\to 0$. We observe this with the top curve. For a fixed ϵ, the net log-prob rate converges upwards to $\sup_F O_\rho(F;\beta)$ for each β as $n\to\infty$ but, for any fixed value of n, $\log_2(\epsilon)/(n\beta)$ diverges to $-\infty$ as $\beta\to 0$. Hence, in a finite trial regime the supremum of the log-prob rates (attainable by PEFs with positive powers) is not achieved—the maximum value of the net log-prob rate is achieved at a β away from 0. The general trend is that, for a larger value of n, the net log-prob rate achieves a higher value at a lower value of β; the net log-prob rate is improved by a reduction in power and an increase in the number of trials. This is observed in Figure 2 for the two choices of $n = 1.5\times 10^{5}$ and $n = 2.4\times 10^{5}$. As a side note, the proof that $\beta' < \beta$ implies $\sup_F O_\rho(F;\beta') \ge \sup_F O_\rho(F;\beta)$ is straightforward: write $\beta' = \gamma\beta$ with $0 < \gamma < 1$; then, for any F in the scope of $\sup_F O_\rho(F;\beta)$, it turns out $F^{\gamma}$ is a PEF with power $\beta'$, for which the equality $O_\rho(F^{\gamma};\beta') = O_\rho(F;\beta)$ follows immediately from the definition of log-prob rate—hence, the supremum of log-prob rates cannot be smaller at $\beta'$. That $F^{\gamma}$ is a PEF with power $\beta'$ follows from $E_\sigma[F^{\gamma}\,\sigma(C|Z)^{\beta\gamma}] \le \left(E_\sigma[F\,\sigma(C|Z)^{\beta}]\right)^{\gamma} \le 1^{\gamma} = 1$, with the first inequality holding by Jensen's inequality ($f(x) = x^{\gamma}$ is concave) and the second because F is a PEF with power β.
The arguments above illustrate how it is necessary to consider a range of β values to find the optimal choice. We remark there is an upper limit to the range of β values that must be considered: it was noted in [14] (see Appendix F therein) that there exists a certain threshold value $\beta_{\mathrm{th}}^{\mathrm{NS}}$ such that, for all $\beta \ge \beta_{\mathrm{th}}^{\mathrm{NS}}$, the optimisation problem in (10) will return the same PEF independent of the choice of β, and [14] cites numerical evidence that this bound is $\beta_{\mathrm{th}}^{\mathrm{NS}} \approx 0.4151$. The following result, whose proof we give in the appendix, derives this threshold analytically, finding it to have the exact value $\log_2(4/3)$.
Proposition 1.
For the set of behaviours $\Pi_{\mathrm{NS}}$, the PEF optimisation in (10) is independent of the power β for $\beta \ge \log_2(4/3)$.
Proof. 
See Appendix F. □
We now ask how optimal PEFs for lower and lower values of  β  (and correspondingly higher values of n) compare on the question of robustness, in the following sense: can a PEF optimised with respect to a distribution  ρ  violating the standard CHSH–Bell inequality be used to certify randomness of distributions that are different from  ρ , provided they violate the same CHSH–Bell inequality? This question is relevant because, in practice, the observed experimental distribution will never be exactly the same as the anticipated one and may be somewhat different depending on many potential factors. Figure 3 gives an illustration of the matter of robustness. Comparing the two plots of the log-prob rate for quantum-realisable distributions on the two-dimensional slice (shown in Figure 4b) above the standard CHSH–Bell facet, we observe that the level set denoting a zero amount of certified randomness in the right hand plot (which corresponds to a lower value of  β  than that on the left) is pushed further down to (almost touching) the standard CHSH–Bell facet.
This suggests that the asymptotic optimality of a PEF need not entail a trade-off with its robustness; indeed, we observed that, in many cases, as  β > 0  assumes smaller and smaller values, the PEF optimised for a fixed  ρ  violating the standard CHSH–Bell inequality becomes more and more robust in the sense that it certifies randomness at a positive rate (asymptotically) for increasingly statistically different  σ .
We show that this is a general feature. To this end, we define a sequence of PEFs that is both asymptotically optimal with respect to the log-prob rate and is asymptotically robust in the sense that, given any distribution violating the standard CHSH–Bell inequality, all the PEFs beyond a point in the sequence certify randomness at a positive rate. To construct this PEF sequence, we first define the function  K * ( A B X Y )  as shown below:
$$K^{*}(abxy) := 4\,[[a\oplus b = xy]] - 3, \qquad (38)$$
where $a,b,x,y\in\{0,1\}$ and the function $[[\cdot]]$ evaluates to 1 if the condition within holds and 0 otherwise. The function defined in (38) is an entropy estimator for the distributions in the no-signalling polytope when the settings are equiprobable, i.e., $\sigma_s(xy) = 1/4$ for all choices of x and y. To see this, recalling Definition 6, we can check—by direct evaluation—whether $K^{*}$ satisfies the inequality $E_\sigma[K(CZ)] \le -E_\sigma[\log_2(\sigma(C|Z))]$ when σ is each of the extremal points of the no-signalling polytope. It is sufficient to check this condition for the extremal points of the no-signalling set, i.e., the PR behaviours and the LD behaviours. This is because if σ is expressible as $\sigma = \lambda\sigma_1 + (1-\lambda)\sigma_2$ then, for any function K satisfying $E_{\sigma_i}[K(ABXY)] \le H_{\sigma_i}(AB|XY)$, we have $E_\sigma[K] = \lambda E_{\sigma_1}[K] + (1-\lambda)E_{\sigma_2}[K] \le \lambda H_{\sigma_1}(AB|XY) + (1-\lambda)H_{\sigma_2}(AB|XY) \le H_{\sigma}(AB|XY)$, the last step by the concavity of conditional Shannon entropy. Hence, if the condition holds for the extremal points, it will hold for all points in the set. To see that it does, we confirm by inspection that $E_\sigma[K^{*}]$ attains the value 1 for the PR behaviour achieving the no-signalling maximum of 4 for the standard CHSH function, the value $-3$ for the PR behaviour achieving $-4$ and the value $-1$ for each of the six PR behaviours that achieve the value 0, which are all less than or equal to the conditional Shannon entropy of the respective PR behaviours, which is 1. Likewise, we can check that $K^{*}$ is a valid entropy estimator for all the LD behaviours: it takes the value zero for the eight local deterministic distributions appearing in Table 1 and $-2$ for the other eight, while $H(AB|XY) = 0$ for these distributions. Hence, we have verified that $K^{*}$ satisfies the entropy estimator condition for all the extremal behaviours and, by extension, all behaviours in the no-signalling polytope.
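This verification is mechanical and can be scripted. The following Python sketch (illustrative code; the tolerance and names are our own) evaluates $E_\sigma[K^{*}]$ and the conditional Shannon entropy $H_\sigma(AB|XY)$ at each of the 24 extremal behaviours with uniform settings and confirms the entropy estimator condition at every one of them.

```python
import numpy as np
from itertools import product

settings = 0.25  # uniform sigma_s(xy), as assumed for (38)

def behaviours():
    """Yield the 24 extremal no-signalling behaviours mu(ab|xy) as dicts."""
    for alpha, beta, gamma in product(range(2), repeat=3):          # 8 PR boxes
        yield {(a, b, x, y): 0.5 * ((a ^ b) == ((x * y) ^ (alpha * x) ^ (beta * y) ^ gamma))
               for a, b, x, y in product(range(2), repeat=4)}
    for alpha, beta, gamma, delta in product(range(2), repeat=4):   # 16 LD boxes
        yield {(a, b, x, y): float(a == ((alpha * x) ^ beta) and b == ((gamma * y) ^ delta))
               for a, b, x, y in product(range(2), repeat=4)}

def K_star(a, b, x, y):
    return 4.0 * ((a ^ b) == x * y) - 3.0

for mu in behaviours():
    EK = sum(K_star(*k) * p * settings for k, p in mu.items())
    H = -sum(p * settings * np.log2(p) for p in mu.values() if p > 0)  # H(AB|XY)
    assert EK <= H + 1e-9, (EK, H)
print("K* satisfies the entropy estimator condition at all 24 extremal behaviours")
```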
Having shown that $K^{*}$ is an entropy estimator, we next consider a sequence of functions $\{F_k\}_{k=1}^{\infty}$, where $F_k$ is defined according to the construction in Theorem 6:
$$F_k(ABXY) = 2^{(K^{*}(ABXY) - e_k)\beta_k},$$
where $\{e_k\}_{k=1}^{\infty}$ is a sequence of positive tolerances decreasing to zero and, for each k, we choose a positive $\beta_k$ making $F_k$ a PEF, whose existence is guaranteed by the theorem. By construction, for each k the function $F_k$ is a valid PEF with power $\beta_k > 0$ for the set of no-signalling distributions. The log-prob rate of $F_k$ at σ is:
$$O_\sigma(F_k;\beta_k) = \frac{1}{\beta_k}\, E_\sigma\!\left[\log_2\!\left(2^{(K^{*}-e_k)\beta_k}\right)\right] = E_\sigma[K^{*}] - e_k.$$
We show robustness of the sequence in the following sense: for any $\sigma\in\Pi_{\mathrm{NS}}$ violating the standard CHSH–Bell inequality, the log-prob rate of the sequence of PEFs $\{F_k\}_{k=1}^{\infty}$ is eventually positive. To see this, recall that, as discussed in our brief review of the (2,2,2) Bell scenario, behaviours violating the standard CHSH–Bell inequality are contained in the nonlocal 8-simplex $\Delta^{8}_{\mathrm{PR},1}$ (see Table 2). Hence, σ is expressible as a convex combination of the vertices of $\Delta^{8}_{\mathrm{PR},1}$:
$$\sigma(ab|xy)\,\sigma_s(xy) = \lambda_{\mathrm{PR},1}\,\mu_{\mathrm{PR},1}(ab|xy)\,\sigma_s(xy) + \sum_{i=1}^{8}\alpha_i\,\mu_{\mathrm{LD},i}(ab|xy)\,\sigma_s(xy), \qquad (41)$$
where $\lambda_{\mathrm{PR},1} + \sum_{i=1}^{8}\alpha_i = 1$. This decomposition allows us to express the log-prob rate in terms of the standard CHSH–Bell function, which we define as
$$S(ABXY) := (-1)^{XY}(-1)^{A+B}/\sigma_s(XY),$$
where $\sigma_s(XY)$ is the fixed settings distribution. We see that $\lambda_{\mathrm{PR},1} = (S_\sigma - 2)/2$ in (41), where $S_\sigma$ is the expected standard CHSH–Bell value according to the distribution $\sigma(ABXY) = \sigma(ab|xy)\,\sigma_s(xy)$. This follows by computing the expectation of S according to the PR box distribution $\mu_{\mathrm{PR},1}(abxy) = \mu_{\mathrm{PR},1}(ab|xy)\,\sigma_s(xy)$, which is 4, and the expectation of S according to the local distributions $\mu_{\mathrm{LD},i}(abxy) = \mu_{\mathrm{LD},i}(ab|xy)\,\sigma_s(xy)$, which is 2. The log-prob rate $O_\sigma(F_k;\beta_k)$ for $F_k$ at σ is then expressed as:
$$O_\sigma(F_k;\beta_k) = \frac{S_\sigma - 2}{2}\, E_{\mu_{\mathrm{PR},1}}[K^{*}] + \sum_{i=1}^{8}\alpha_i\, E_{\mu_{\mathrm{LD},i}}[K^{*}] - e_k.$$
Since $E_{\mu_{\mathrm{LD},i}}[K^{*}]$ evaluates to zero for each $\mu_{\mathrm{LD},i}$ and $E_{\mu_{\mathrm{PR},1}}[K^{*}]$ evaluates to 1, the expression for $O_\sigma(F_k;\beta_k)$ reduces to $O_\sigma(F_k;\beta_k) = \frac{S_\sigma - 2}{2} - e_k$. As $k\to\infty$, $O_\sigma(F_k;\beta_k) \to (S_\sigma - 2)/2$, and so the quantity is eventually strictly positive provided $S_\sigma > 2$, i.e., provided σ violates the standard CHSH–Bell inequality.
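In practice, a valid power $\beta_k$ for a given tolerance $e_k$ can be located numerically. The sketch below (illustrative code; the bisection depth and names are our own choices) checks the PEF condition $E_\sigma[F_k\,\sigma(AB|XY)^{\beta_k}] \le 1$ at the 24 extremal behaviours, which suffices by Lemma 1 for a fixed uniform settings distribution, and bisects for an approximately maximal valid $\beta_k$.

```python
import numpy as np
from itertools import product

quad = list(product(range(2), repeat=4))             # all (a, b, x, y)

def pr(al, be, ga):
    return {k: 0.5 * ((k[0] ^ k[1]) == ((k[2] * k[3]) ^ (al * k[2]) ^ (be * k[3]) ^ ga))
            for k in quad}

def ld(al, be, ga, de):
    return {k: float(k[0] == ((al * k[2]) ^ be) and k[1] == ((ga * k[3]) ^ de))
            for k in quad}

extremal = [pr(*t) for t in product(range(2), repeat=3)] + \
           [ld(*t) for t in product(range(2), repeat=4)]

K = {k: 4.0 * ((k[0] ^ k[1]) == k[2] * k[3]) - 3.0 for k in quad}

def is_pef(beta, e):
    # E_sigma[F * sigma(ab|xy)^beta] <= 1 at each extremal behaviour with
    # uniform settings; by Lemma 1 this implies the PEF condition on Pi_NS.
    F = {k: 2.0 ** ((K[k] - e) * beta) for k in quad}
    return all(sum(F[k] * mu[k] ** beta * mu[k] * 0.25 for k in quad if mu[k] > 0) <= 1.0
               for mu in extremal)

def beta_for(e, iters=60):
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if is_pef(mid, e) else (lo, mid)
    return lo

for e in (0.5, 0.1, 0.01):
    print(f"e_k = {e}: beta_k ~ {beta_for(e):.4f}")
```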
Continuing our discussion on robustness, a different perspective would be to ask: given a PEF F with power β > 0 optimised with respect to the distribution ρ, how far, in terms of total variation distance, can another distribution σ be such that the same PEF (with the same power) can still be used to certify randomness? Theorem 7 provides a sufficient condition for the robustness of a positive, non-constant PEF $F = G^{\beta}$ with power β in the following sense: assuming the log-prob rate of F at ρ is positive, the log-prob rate of F at a different distribution σ is positive if $d_{\mathrm{TV}}(\rho,\sigma)$ is within a certain bound (as given in (34)). For the sequence $\{F_k\}_{k=1}^{\infty}$ of PEFs, the upper bound on $d_{\mathrm{TV}}(\rho,\sigma)$ is computed as follows: notice that $F_k$ is of the form $F_k = G_k^{\beta_k}$, where $G_k = 2^{K^{*}-e_k}$; since $K^{*}$ takes values in $\{1,-3\}$, we have $L = 1 - e_k$, $l = -3 - e_k$ and $L - l = 4$. The upper bound on $d_{\mathrm{TV}}(\rho,\sigma)$ (as given in (34)) is then $E_\rho[\log_2(G_k)]/(L-l) = \frac{1}{4}\left(\frac{S_\rho - 2}{2} - e_k\right)$. It is worthwhile to observe that, given a standard CHSH–Bell inequality violating distribution ρ, this upper bound approaches the strength of nonlocality for ρ, which is expressed as $(S_\rho - 2)/8$. The strength of nonlocality is defined in terms of how far the nonlocal no-signalling distribution ρ is from the local set $\Pi_{\mathrm{L}}$ [26]. It is defined as follows:
$$d_{\mathrm{NL}}(\rho) := \frac{1}{|X||Y|}\cdot\frac{1}{2}\min_{\tau\in\Pi_{\mathrm{L}}}\sum_{abxy}\left|\rho(ab|xy) - \tau(ab|xy)\right|, \qquad (43)$$
where the minimum is over all distributions τ belonging to the local set $\Pi_{\mathrm{L}}$. In the definition of $d_{\mathrm{NL}}(\rho)$ in (43) we have assumed a uniform settings distribution, as is evident from the factor $1/(|X||Y|)$, where $|X|$ and $|Y|$ denote the number of measurement settings choices for Alice and Bob, respectively (both equal to 2 in the (2,2,2) Bell scenario). A theorem in [24] (see Theorem 3.1) provides a condition on the local distribution τ such that the minimum $(1/2)\min_{\tau\in\Pi_{\mathrm{L}}}\sum_{abxy}|\rho(ab|xy) - \tau(ab|xy)|$ in (43) is achieved, and shows that the minimum comes out to be the weight $(S_\rho - 2)/2$ on the PR box in the expression of ρ as a convex combination of the vertices of $\Delta^{8}_{\mathrm{PR},1}$; so, per the definition in (43), $d_{\mathrm{NL}}(\rho) = (S_\rho - 2)/8$. Thus, the bound $\frac{1}{4}\left(\frac{S_\rho - 2}{2} - e_k\right)$ from Theorem 7 approaches $\frac{S_\rho - 2}{8}$, which is the strength of nonlocality $d_{\mathrm{NL}}(\rho)$ for ρ. This illustrates that a bound of this form cannot be improved, in the sense that increasing the total variation distance from ρ by any positive amount will encompass local distributions, which cannot certify randomness.
Thus, $\{F_k\}_{k=1}^{\infty}$ is fully robust as $k\to\infty$. Next, we confirm that $\{F_k\}_{k=1}^{\infty}$ is asymptotically optimal in terms of min-entropy per trial (i.e., log-prob rate) for any distribution σ violating the standard CHSH inequality. Since $\Pi_{\mathrm{NS}}$ is closed and equal to the convex hull of its extremal points, Theorem 4 implies that, given such a σ, the adversary has a strategy obtained through an IID attack based on a single-trial distribution whose conditional Shannon entropy is equal to the infimum defined in (26). We can identify this attack. The optimisation in (26) can be expressed as follows:
$$h_{\min}(\sigma) = \min\left\{ H_\nu(AB|XYE) \;:\; \nu_e \in \Pi_{\mathrm{NS}},\;\; \sum_{e}\nu(e)\,\nu_e = \sigma \right\}, \qquad (44)$$
where $\nu_e = \nu(ABXY|e)$. We compute $H_\nu(AB|XYE)$ for the decomposition of σ given in (41), where we have noted $\lambda_{\mathrm{PR},1} = (S_\sigma - 2)/2$. Since the conditional Shannon entropy is one for PR boxes and zero for LD behaviours, we obtain $H_\nu(AB|XYE) = (S_\sigma - 2)/2$ and, hence, $h_{\min}(\sigma)$ is no larger than this value. But since this expression is the same as that of the asymptotic log-prob rate of the sequence $\{F_k\}_{k=1}^{\infty}$ of valid PEFs, we can say $h_{\min}(\sigma)$ is also no smaller than this value, and so $h_{\min}(\sigma) = (S_\sigma - 2)/2$. This demonstrates the asymptotic optimality of the sequence $\{F_k\}_{k=1}^{\infty}$ in the sense that the PEFs in the sequence become arbitrarily close to certifying an asymptotic randomness rate of $h_{\min}(\sigma)$.
In our proof of the asymptotic optimality of the sequence $\{F_k\}_{k=1}^{\infty}$, we identified the optimal attack by an adversary: it is to prepare the decomposition in (41), with each e corresponding to one of the (up to) nine extremal behaviours, with respective $\nu(e)$ weights of $\lambda_{\mathrm{PR},1}$ and $\alpha_i$. This can be seen to be the unique attack achieving $h_{\min}(\sigma)$, through an argument we sketch as follows: (1) any ν-decomposition of σ can be improved upon (i.e., $H_\nu(AB|XYE)$ reduced) by considering only extremal $\nu_e$, by the concavity of conditional Shannon entropy; (2) any decomposition including positive weights on more than one PR box can be strictly improved upon by one with weight on a single PR box, by Theorem 2.1 of [24], which shows how to replace equal mixtures of two PR boxes with mixtures of a single PR box and local deterministic distributions; (3) this decomposition can be further strictly improved via Theorem 2.2 of [24] by replacing any local deterministic distributions not saturating the CHSH–Bell inequality with ones that do (the improvement being obtained by decreasing the weight on the sole remaining PR box). The resulting decomposition—that of (41)—is thus the unique optimiser of (44). It witnesses the bound of $1 + \dim\Pi_{\mathrm{NS}} = 1 + 8 = 9$ on the cardinality of the set $\mathcal{E}$ (as shown in Theorem 4). In general, positive weight on all nine extremal boxes may be necessary, owing to their affine independence, which was noted in Section 4.1. One can confirm this visually from Table 1: weight on the (only) nonlocal distribution, the PR box, is necessary to violate the CHSH–Bell inequality, and any distribution with non-zero probabilities for each possible outcome (a property possessed by, for example, the quantum distribution saturating Tsirelson's bound) will require positive weight on all the local deterministic behaviours, as each LD behaviour corresponds to a distinct sole appearance of the number "1" in a column otherwise populated by zeroes in Table 1. This witnesses that further reduction of the $1 + \dim\Pi_{\mathrm{NS}}$ bound on $|\mathcal{E}|$ in Theorem 4 is impossible, and so this bound is optimal.
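As a concrete check, the following sketch recovers the optimal attack numerically for the Tsirelson-bound-saturating quantum behaviour (a hypothetical but standard choice of target): it solves for the unique weights expressing σ in terms of the nine affinely independent vertices of $\Delta^{8}_{\mathrm{PR},1}$ and confirms that the PR-box weight, and hence $h_{\min}(\sigma)$, equals $(S_\sigma - 2)/2 = \sqrt{2} - 1 \approx 0.4142$.

```python
import numpy as np
from itertools import product

quad = list(product(range(2), repeat=4))     # (a, b, x, y)

def pr1(k):
    a, b, x, y = k
    return 0.5 * ((a ^ b) == x * y)

def ld(al, be, ga, de):
    return [float(k[0] == ((al * k[2]) ^ be) and k[1] == ((ga * k[3]) ^ de)) for k in quad]

def chsh(v):
    return sum((-1) ** (k[2] * k[3]) * (-1) ** (k[0] + k[1]) * p for k, p in zip(quad, v))

# Tsirelson-achieving quantum behaviour: mu(ab|xy) = (1 + (-1)^(a+b) E_xy)/4
E = {(x, y): (-1) ** (x * y) / np.sqrt(2) for x, y in product(range(2), repeat=2)}
sigma = [0.25 * (1 + (-1) ** (a + b) * E[(x, y)]) for a, b, x, y in quad]

vertices = [[pr1(k) for k in quad]] + \
           [v for v in (ld(*t) for t in product(range(2), repeat=4))
            if abs(chsh(v) - 2) < 1e-9]      # the PR box + 8 saturating LDs

# Solve sigma = sum_i w_i * vertex_i with sum_i w_i = 1; the solution is exact
# and unique by the affine independence of the nine vertices.
A = np.vstack([np.array(vertices).T, np.ones(9)])
b = np.append(sigma, 1.0)
w, *_ = np.linalg.lstsq(A, b, rcond=None)

S = chsh(sigma)
print("weights:", np.round(w, 4))                       # w[0] = (S - 2)/2
print("h_min =", w[0], "= (S-2)/2 =", (S - 2) / 2)      # ~0.4142 bits per trial
```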
It is an important observation that the adversary needs to prepare only one non-classical state in her realisation of the optimal attack, since the preparation of a non-classical state is likely the most difficult aspect of the attack. We now explore possible generalisations of this feature to other trial models.

4.3. Characterising the Optimal Attack in Different Scenarios

We start by exploring the possibility of arriving at a similar analytic characterisation of the optimal adversarial attack when the adversary is limited to only quantum-realisable distributions. Suppose now that our trial model is the set  Π Q  of quantum-achievable distributions for the (2,2,2) scenario. The adversary is still constrained to performing probabilistic attacks to simulate the trial statistics, while generating the least amount of randomness possible; however, she now tries to mimic the trial statistics using quantum-achievable distributions. The optimisation routine depicting this goal is:
$$\tilde{h}_{\min}(\sigma) = \min\left\{ H_\omega(AB|XYE) \;:\; \omega_e \in \Pi_{\mathrm{Q}},\;\; \sum_{e}\omega(e)\,\omega_e = \sigma \right\}, \qquad (45)$$
where $\omega_e = \omega(ABXY|e)$. The set $\Pi_{\mathrm{Q}}$ is compact and convex but, unlike $\Pi_{\mathrm{NS}}$, is not a polytope, and so it has a continuum of extremal points.
We conjecture that the minimum in (45) is achieved at a distribution that marginalises to the observed trial distribution through a convex combination of (only) one quantum extremal distribution violating the standard CHSH–Bell inequality and no more than eight local deterministic distributions that saturate the same inequality.
An attempt to prove this will require an understanding of the geometry of the quantum set and, in particular, its extremal points. We do not yet have a complete characterisation of the set of behaviours $\Pi_{\mathrm{Q}}$ (in the true $\mathbb{R}^8$ space), although a recent work has conjectured an analytic criterion for extremality in the CHSH scenario [27]. However, a characterisation does exist when we make the assumption of unbiased marginals: $\mu(A{=}0|x) = \mu(A{=}1|x) = 1/2$ for all $x\in\{0,1\}$ and $\mu(B{=}0|y) = \mu(B{=}1|y) = 1/2$ for all $y\in\{0,1\}$, in which case the set of behaviours is four dimensional. The unbiased-marginal case has been completely characterised, a detailed exposition of which can be found in [25] (see Theorem 1 therein).
A key enabling step in the direction of characterising the optimal attack in the unbiased-marginals case would be to see if the following two conditions hold simultaneously: first, a convex combination of any two extremal quantum behaviours can be expressed equivalently as a different convex combination of one extremal quantum behaviour (different from the previous two) and classical noise (a mixture of the local deterministic behaviours); i.e., for extremal quantum behaviours $\mu_1, \mu_2$, the convex combination $\lambda\mu_1 + (1-\lambda)\mu_2$ can be re-expressed as the convex combination $\delta\mu_3 + (1-\delta)\mu_0$, where $\lambda,\delta\in(0,1)$, $\mu_3$ is a third extremal quantum behaviour and $\mu_0$ is a mixture of the local deterministic behaviours; and, second, $\lambda H_{\mu_1}(AB|XY) + (1-\lambda)H_{\mu_2}(AB|XY) \ge \delta H_{\mu_3}(AB|XY)$, where the term $(1-\delta)H_{\mu_0}(AB|XY)$ that might be expected to appear on the right vanishes because the conditional Shannon entropy is zero for the local deterministic behaviours into which $\mu_0$ can be decomposed.
A numerical inspection to check—by means of an exhaustive search—whether these two conditions hold simultaneously (in the uniform-marginals case) introduces many free variables. If we add more symmetry to the behaviours with uniform marginals and constrain ourselves to the two-dimensional slice shown in Figure 4a, one can perform a numerical search to see whether the two conditions mentioned above hold simultaneously, and we did observe them to hold in some initial numerical investigations comparing the θ decompositions against the ν decompositions as depicted in Figure 4b. The behaviours in the two-dimensional slice are represented by the formula:
$$\mu = \frac{S}{4}\,\mu_{\mathrm{PR},1} + \frac{S'}{4}\,\mu_{\mathrm{PR},2} + \left(1 - \frac{S+S'}{4}\right)\mu_0, \qquad S, S' \in [-4,4], \qquad (46)$$
where $\mu_0$ is the maximally random behaviour obtained as the equal mixture of all 16 local deterministic behaviours. The disk $S^2 + (S')^2 \le 8$ represents the set of quantum behaviours. Table 3 depicts a tabular representation of the behaviours expressible as (46). (As a side note, one way to add more symmetry to the behaviours with uniform marginals is as follows: a behaviour with uniform marginals can be completely specified by the correlators $(E_{00}, E_{01}, E_{10}, E_{11})$, where $-1 \le E_{xy} \le 1$ for all $x,y$; see the line following (37) for the definition of $E_{xy}$. To obtain behaviours in the two-dimensional slice shown in Figure 4a, one can restrict attention to distributions of the form $\mu(ab|xy) = \frac{1}{4}\left(1 + (-1)^{a+b}C_{xy}\right)$, where $C_{00} = C_{11} = \frac{E_{00}-E_{11}}{2}$ and $C_{01} = C_{10} = \frac{E_{01}+E_{10}}{2}$.)
Going beyond the minimal Bell scenario, we considered the possibility of a similar characterisation of the optimal no-signalling adversarial attack in higher $(n,m,k)$ Bell scenarios. In the (2,2,2) Bell scenario the analytical characterisation of the optimal adversarial attack crucially relied upon the geometric features of the no-signalling polytope, namely Theorems 2.1 and 2.2 in [24]: that equal mixtures of two PR behaviours are expressible as equal mixtures of four distinct LD behaviours and, consequently, a behaviour violating any of the eight versions (up to local relabelling of the outcomes and settings) of the CHSH–Bell inequality is expressible as a convex combination of the one PR behaviour achieving the nonlocal maximum and (up to) eight LD behaviours achieving the local maximum of the corresponding CHSH–Bell expression. These geometric features, however, do not extend to the no-signalling polytopes of higher $(n,m,k)$ Bell scenarios. Membership of equal mixtures of extremal no-signalling nonlocal behaviours in the local polytope holds solely in the (2,2,2) Bell scenario.
Below, we provide examples of equal mixtures of no-signalling nonlocal extremal behaviours in the $(2,2,3)$, $(2,3,2)$ and $(3,2,2)$ Bell scenarios that do not belong to the local polytope. One can use linear programming to check the nonlocality of such examples; a numerical sketch of this feasibility test is given after the (2,2,3) example below. Assessment of the locality of a behaviour is an instance of the membership problem of the local polytope. Since the local deterministic (LD) behaviours are the extremal points of the local polytope, we can formulate our problem as a feasibility linear program. Suppose $\mu_{\mathrm{LD},1}, \mu_{\mathrm{LD},2}, \ldots, \mu_{\mathrm{LD},\#\mathrm{LD}}$ is the set of LD behaviours for some Bell scenario. The vector $\mu_{\mathrm{LD},i}\in\mathbb{R}^d$ denotes the joint probability of outcomes conditioned on the input choices, and d is the dimension of the ambient space in which the vector lies. The feasibility linear program has the variable $x\in\mathbb{R}^{\#\mathrm{LD}}$. The inequality constraints comprise $x_i \ge 0$, $i\in[\#\mathrm{LD}]$, and the equality constraints are $\sum_{i=1}^{\#\mathrm{LD}} x_i = 1$ and the following:
$$\frac{1}{2}\,\mathrm{NS}_{\mathrm{extr}} + \frac{1}{2}\,\mathrm{NS}'_{\mathrm{extr}} = \sum_{k=1}^{\#\mathrm{LD}} x_k\,\mu_{\mathrm{LD},k}, \qquad (47)$$
where $\mathrm{NS}_{\mathrm{extr}}$ and $\mathrm{NS}'_{\mathrm{extr}}$ are (distinct) nonlocal no-signalling extremal behaviours. The details on formulating the dual of this linear program can be found in Section E.2.1 of the Appendix of [6].
Before presenting the counter-examples, we briefly review the $(n,m,k)$ Bell scenario: this scenario consists of n spatially separated parties, where each party $i\in[n]$ has a choice of m different k-outcome measurements. With settings alphabet $X = \{0,1,\ldots,m-1\}$ and outcome alphabet $A = \{0,1,\ldots,k-1\}$, the joint probabilities $\mu(a_1a_2\cdots a_n|x_1x_2\cdots x_n)$ of obtaining the outcomes $(a_1,a_2,\ldots,a_n)\in A^n$ conditioned on the inputs $(x_1,x_2,\ldots,x_n)\in X^n$ can be viewed as a probability vector $\mu\in\mathbb{R}^d$, where $d = (|A||X|)^n$.
The extremal points of the no-signalling polytope comprise the local deterministic (LD) behaviours and the nonlocal extremal behaviours. The LD behaviours consist of all possible assignments $\lambda = \left(\{\lambda_1^{x_1}\}_{x_1\in X};\; \{\lambda_2^{x_2}\}_{x_2\in X};\; \ldots;\; \{\lambda_n^{x_n}\}_{x_n\in X}\right) \in \Lambda_{\mathrm{LD}}$, where $\lambda_i^{x_i}\in A$ for $i\in[n]$. The number of such assignments is $\#\mathrm{LD} = |A|^{n|X|}$. Corresponding to each assignment $\lambda\in\Lambda_{\mathrm{LD}}$, the LD probabilities are expressed as
$$\mu_{\mathrm{LD},k}(a_1a_2\cdots a_n|x_1x_2\cdots x_n) = [[a_1 = \lambda_1^{x_1}]]\,[[a_2 = \lambda_2^{x_2}]]\cdots[[a_n = \lambda_n^{x_n}]],$$
where $[[\cdot]]$ is the function that evaluates to 1 if the condition within holds and 0 otherwise. A behaviour $\mu_L$ is local if it can be expressed as $\mu_L = \sum_{k=1}^{\#\mathrm{LD}} q_k\,\mu_{\mathrm{LD},k}$, where $q_k \ge 0$ and $\sum_{k=1}^{\#\mathrm{LD}} q_k = 1$.
$(2,2,3)$ Bell scenario: This scenario is an instance of the more general $(2,2,k)$ scenario, also known in the literature as the CGLMP scenario [28], for $k = 3$. In this bipartite scenario each party has a choice of two settings, each with three outcomes. The extremal behaviours of the no-signalling polytope for the CGLMP scenario have been fully described in [29]. The nonlocal no-signalling extremal behaviours for the $(2,2,3)$ scenario, up to relabelling of inputs and outcomes, are given by the following formula:
$$\mathrm{NL}_{\mathrm{ext}}(ab|xy) := \begin{cases} \tfrac{1}{3} & : b - a \equiv xy \pmod{3} \\ 0 & : \text{otherwise,} \end{cases}$$
where $a,b\in\{0,1,2\}$ and $x,y\in\{0,1\}$ are the outputs and inputs of the parties, respectively. We found that (47) does not necessarily hold for equal mixtures of a pair of distinct nonlocal extremal behaviours. Among the several examples we found that violate (47), Table 4 shows one such example.
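The membership test formulated in (47) is straightforward to run with an off-the-shelf LP solver. The following Python sketch (illustrative code using scipy; all function names are ours) enumerates the 81 LD behaviours of the (2,2,3) scenario and tests whether the equal mixture of two relabelled copies of the extremal box above admits a local decomposition. Which pairs of extremal boxes yield nonlocal mixtures is the content of Table 4; the script simply reports feasibility for the candidate pair it is given and can be looped over relabellings to search for violating examples.

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

A_out, X_in = range(3), range(2)                   # (2,2,3): 2 settings, 3 outcomes
cells = list(product(A_out, A_out, X_in, X_in))    # (a, b, x, y), 36 cells

def nl_box(shift):
    # Extremal nonlocal box of the family above, with Bob's outcomes
    # relabelled by `shift`: mu(ab|xy) = 1/3 iff b - a = xy + shift (mod 3).
    return np.array([(b - a) % 3 == (x * y + shift) % 3 for a, b, x, y in cells]) / 3.0

def ld_box(fa, fb):
    # Deterministic strategies fa, fb : {0,1} -> {0,1,2} for Alice and Bob.
    return np.array([float(a == fa[x] and b == fb[y]) for a, b, x, y in cells])

lds = [ld_box(fa, fb) for fa in product(A_out, repeat=2) for fb in product(A_out, repeat=2)]

def is_local(target):
    # Feasibility LP of (47): target = sum_k q_k * LD_k, q_k >= 0, sum q_k = 1.
    A_eq = np.vstack([np.array(lds).T, np.ones(len(lds))])
    b_eq = np.append(target, 1.0)
    res = linprog(c=np.zeros(len(lds)), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * len(lds), method="highs")
    return res.success

mix = 0.5 * nl_box(0) + 0.5 * nl_box(1)
print("equal mixture is local:", is_local(mix))
```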
$(2,3,2)$ Bell scenario: More generally, the extremal behaviours of the $(2,k,2)$ no-signalling polytope, with $k > 2$, have been completely characterised in [30,31], of which $(2,3,2)$ is an instance. Following Table II of [31], we can obtain Table 5 and Table 6, which are two representative examples of nonlocal no-signalling extremal behaviours, equal mixtures of which lie outside the local polytope. In Table 5, all input choices $x,y\in\{0,1,2\}$ for Alice and Bob have uniform probabilities of outcomes; in Table 6, all inputs for Alice and inputs $y\in\{0,1\}$ for Bob have uniform probabilities of outcomes, with the exception that Bob's outcome for $y = 2$ is deterministic.
In these tables, each cell lists the outcome probabilities $\left(p(00|xy),\, p(01|xy),\, p(10|xy),\, p(11|xy)\right)$, and each '?' entry stands for either $(1/2,\, 0,\, 0,\, 1/2)$ (a perfect correlation) or $(0,\, 1/2,\, 1/2,\, 0)$ (a perfect anti-correlation).
There are 16 possible mixtures of the two behaviours in Table 5 and Table 6, corresponding to each '?' in each table being a perfect correlation or a perfect anti-correlation; all of these represent mixtures of extremal nonlocal boxes [31] and all lie outside the local polytope. The nonlocality of the mixtures is confirmed by noting that the four cells in the upper left corner, corresponding to restricting the settings choices to $x,y\in\{0,1\}$, form the PR box distribution, which is of course nonlocal.
$(3,2,2)$ Bell scenario: This is a tripartite scenario with each party having binary input choices and outcomes. The no-signalling polytope consists of 46 inequivalent classes of extremal behaviours, of which one is the class comprising the 64 LD behaviours. A complete characterisation can be found in [32]. As an example violating (47), we can refer to the observation made in Section 2.3 of [32] that the equal mixture of two behaviours in Class 46 (see Table 1 of [32]) is a GHZ correlation, which is expressed (entirely in terms of correlators $\langle A_xB_yC_z\rangle$) as $P_{\mathrm{GHZ}}(abc|xyz) = \frac{1}{8}\left(1 + (-1)^{a+b+c}\langle A_xB_yC_z\rangle\right)$; $P_{\mathrm{GHZ}}$ is a nonlocal behaviour which is obtained by measuring $\frac{1}{\sqrt{2}}\left(|000\rangle + |111\rangle\right)$ in suitable local bases [33].

5. Conclusions

In this work, we revisited the probability estimation framework with the goal of presenting a complete and self-contained proof of its optimality in the asymptotic regime and obtaining a better characterisation of optimal adversarial attack strategies on the protocol. We obtained in Theorem 4 an improved and tight upper bound on the cardinality of the set of states needed in the optimal attack, and studied the implications of this result for specific scenarios in Section 4. We also considered the question of robustness for the PEF method, finding that asymptotic optimality of PEFs (in terms of randomness generation rate) need not entail a trade-off with robustness to small deviations from expected experimental behaviour.
In proving the optimality of the framework, our results show that there remains nothing to be gained, asymptotically, for an adversary implementing memory attacks—an i.i.d. attack is asymptotically optimal. However, in real world applications this may not hold. The number of trials in a Bell experiment is finite, albeit large, and there are unavoidable correlations between the successive trials (referred to as memory effects). We leave to future work considerations of side-channel attacks in the non-asymptotic (finite trials) regime for the probability estimation framework.

Author Contributions

Conceptualisation, S.P. and P.B.; Formal analysis, S.P. and P.B.; Funding acquisition, P.B.; Investigation, S.P. and P.B.; Methodology, S.P. and P.B.; Software, S.P.; Supervision, P.B.; Writing—original draft, S.P. and P.B.; Writing—review and editing, S.P. and P.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by AFOSR Grant FA9550-20-1-0067, NSF Award 1839223 and Louisiana Board of Regents Award LEQSF (2019-22)-RD-A-27. The APC was funded by NSF Grant 2210399.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We acknowledge helpful discussions with Jitendra Prakash and Mark Wilde.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Proofs for Theorems 1 and 2

Appendix A.1.1. Proof for Theorem 1

Theorem. 
Suppose $\mu : \mathcal{C}^n\times\mathcal{Z}^n\times\mathcal{E} \to [0,1]$ is a distribution of $\mathbf{CZ}E$ such that $\mu_e(\mathbf{CZ})\in\Theta$ for each $e\in\mathcal{E}$. Then, for fixed $\beta,\epsilon > 0$,
$$P_{\mu_e}\!\left[\mu_e(\mathbf{C}|\mathbf{Z}) \ge \left(\epsilon\prod_{i=1}^{n}F_i(C_iZ_i)\right)^{-1/\beta}\right] \le \epsilon$$
holds for each $e\in\mathcal{E}$, where $F_i(C_iZ_i)$ is the probability estimation factor for the i-th trial.
Proof. 
The sequences of random variables $\mathbf{C}, \mathbf{Z}$ represent the time-ordered sequence of n trial results. For the remainder of the proof we omit conditioning on $E = e$, since the result holds for each realisation. Hence, $\mu(\cdot)$, $P_\mu(\cdot)$ and $E_\mu[\cdot]$ must be understood to mean $\mu_e(\cdot)$, $P_{\mu_e}(\cdot)$ and $E_{\mu_e}[\cdot]$.
Observe that for any $i\in\{1,\ldots,n-1\}$ we have
$$\mu(\mathbf{C}^{i+1}|\mathbf{Z}^{i+1}) = \mu(C_{i+1}|\mathbf{C}^{i}\mathbf{Z}^{i+1})\,\mu(\mathbf{C}^{i}|Z_{i+1}\mathbf{Z}^{i}) = \mu(C_{i+1}|\mathbf{C}^{i}\mathbf{Z}^{i+1})\,\mu(\mathbf{C}^{i}|\mathbf{Z}^{i}), \qquad (A2)$$
where the first equality is an elementary manipulation of conditional probabilities and the second equality follows from
$$\mu(\mathbf{C}^{i}|Z_{i+1}\mathbf{Z}^{i}) = \frac{\mu(\mathbf{C}^{i}Z_{i+1}\mathbf{Z}^{i})}{\mu(Z_{i+1}\mathbf{Z}^{i})} = \frac{\mu(\mathbf{C}^{i}\mathbf{Z}^{i})\,\mu(Z_{i+1})}{\mu(Z_{i+1})\,\mu(\mathbf{Z}^{i})} = \mu(\mathbf{C}^{i}|\mathbf{Z}^{i}),$$
with the second step above following from the second condition in (1), applied directly in the numerator and in the denominator via
$$\mu(z_{j+1}\mathbf{z}^{j}) = \sum_{\mathbf{c}^{j}}\mu(z_{j+1}\mathbf{c}^{j}\mathbf{z}^{j}) = \mu(z_{j+1})\sum_{\mathbf{c}^{j}}\mu(\mathbf{c}^{j}\mathbf{z}^{j}) = \mu(z_{j+1})\,\mu(\mathbf{z}^{j}).$$
Now, consider the sequence $Q_i = \mu(\mathbf{C}^{i}|\mathbf{Z}^{i})^{\beta}\prod_{j=1}^{i}F_j$, for $i \ge 1$, where we note that $Q_i$ is a random variable determined by $\mathbf{C}^{i},\mathbf{Z}^{i}$. We begin by showing that, conditioned on $\mathbf{C}^{i},\mathbf{Z}^{i}$, the expectation of $Q_{i+1}$ is at most $Q_i$ for all $i\in\{1,\ldots,n-1\}$. Applying (A2), we can write
$$Q_{i+1} = F_{i+1}\,\mu(C_{i+1}|Z_{i+1}\mathbf{C}^{i}\mathbf{Z}^{i})^{\beta}\,\mu(\mathbf{C}^{i}|\mathbf{Z}^{i})^{\beta}\prod_{j=1}^{i}F_j = F_{i+1}\,\mu(C_{i+1}|Z_{i+1}\mathbf{C}^{i}\mathbf{Z}^{i})^{\beta}\,Q_i,$$
$$E_\mu[Q_{i+1}|\mathbf{C}^{i}\mathbf{Z}^{i}] = Q_i\, E_\mu\!\left[F_{i+1}\,\mu(C_{i+1}|Z_{i+1}\mathbf{C}^{i}\mathbf{Z}^{i})^{\beta}\,\middle|\,\mathbf{C}^{i}\mathbf{Z}^{i}\right] \le Q_i, \qquad (A3)$$
where the fact that $Q_i$ is determined by $\mathbf{C}^{i},\mathbf{Z}^{i}$ allows us to pull it out of the conditional expectation, and the inequality follows from the fact that $E_\mu[F_{i+1}\,\mu(C_{i+1}|Z_{i+1}\mathbf{c}^{i}\mathbf{z}^{i})^{\beta}] \le 1$ for all realisations $\mathbf{c}^{i},\mathbf{z}^{i}$ of $\mathbf{C}^{i},\mathbf{Z}^{i}$, as ensured by Definition 1. We remark that $(Q_i)$ is a supermartingale, as indicated by the inequality in (A3): the term $F_{i}\,\mu(C_{i}|Z_{i}\mathbf{Z}^{i-1}\mathbf{C}^{i-1})^{\beta}$ is non-negative, is determined by $\mathbf{C}^{i},\mathbf{Z}^{i}$ and satisfies $E_\mu[F_{i}\,\mu(C_{i}|Z_{i}\mathbf{C}^{i-1}\mathbf{Z}^{i-1})^{\beta}\,|\,\mathbf{C}^{i-1}\mathbf{Z}^{i-1}] \le 1$. Now, using the law of iterated expectation, we obtain:
$$E_\mu[Q_{i+1}] = E_\mu\!\left[E_\mu[Q_{i+1}|\mathbf{C}^{i}\mathbf{Z}^{i}]\right] \le E_\mu[Q_i]. \qquad (A4)$$
Since $Q_1$ equals $\mu(C_1|Z_1)^{\beta}F(C_1Z_1)$, it satisfies $E_\mu[Q_1] \le 1$ directly from Definition 1, and so repeated applications of (A4) yield $E_\mu[Q_n] \le E_\mu[Q_{n-1}] \le \cdots \le E_\mu[Q_1] \le 1$. Since $Q_n = \mu(\mathbf{C}|\mathbf{Z})^{\beta}\prod_{i=1}^{n}F_i$ is non-negative, we can use Markov's inequality and obtain the required result as shown below.
$$P_\mu\!\left[\mu(\mathbf{C}|\mathbf{Z})^{\beta}\prod_{i=1}^{n}F_i \ge 1/\epsilon\right] \le \epsilon\, E_\mu\!\left[\mu(\mathbf{C}|\mathbf{Z})^{\beta}\prod_{i=1}^{n}F_i\right] \le \epsilon \;\;\Longrightarrow\;\; P_\mu\!\left[\mu(\mathbf{C}|\mathbf{Z}) \ge \left(\epsilon\prod_{i=1}^{n}F_i\right)^{-1/\beta}\right] \le \epsilon.$$
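The supermartingale mechanism of this proof can be observed in a small Monte Carlo experiment. The sketch below is illustrative only: the honest IID device, the pair $(\beta, e_k) = (0.05, 0.1)$ (which we checked numerically satisfies the PEF condition for the exponential-form PEF built from the entropy estimator $K^{*}$ of Section 4.2 at all 24 extremal behaviours) and all variable names are our own assumptions.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(7)
quad = list(product(range(2), repeat=4))            # (a, b, x, y)

# Honest IID device: Tsirelson-achieving behaviour with uniform settings.
E = {(x, y): (-1) ** (x * y) / np.sqrt(2) for x in range(2) for y in range(2)}
cond = np.array([0.25 * (1 + (-1) ** (a + b) * E[(x, y)]) for a, b, x, y in quad])
p = cond * 0.25                                     # joint mu(abxy); sums to 1

# PEF F = 2^{(K* - e_k) beta}; (beta, e_k) = (0.05, 0.1) is a valid pair.
beta, e_k, eps = 0.05, 0.1, 1e-4
K = np.array([4.0 * ((a ^ b) == x * y) - 3.0 for a, b, x, y in quad])
F = 2.0 ** ((K - e_k) * beta)

n, runs = 2000, 200
Qn, rate = np.empty(runs), np.empty(runs)
for r in range(runs):
    idx = rng.choice(len(quad), size=n, p=p)
    # Q_n = mu(C|Z)^beta * prod_i F_i, computed in the log domain
    Qn[r] = np.exp((beta * np.log(cond[idx]) + np.log(F[idx])).sum())
    # certified min-entropy per trial: (sum_i log2 F_i + log2 eps)/(n beta)
    rate[r] = (np.log2(F[idx]).sum() + np.log2(eps)) / (n * beta)

print("max Q_n over runs:", Qn.max())   # supermartingale: E[Q_n] <= 1 (far below 1 here)
print("mean certified rate:", rate.mean())
# ~ (S-2)/2 - e_k + log2(eps)/(n*beta) ~ 0.18 bits per trial for these parameters
```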

Appendix A.1.2. Proof for Theorem 2

Theorem. 
Let μ be a distribution $\mu : \mathcal{C}^n\times\mathcal{Z}^n\times\mathcal{E} \to [0,1]$ of $\mathbf{CZ}E$ such that, for each $e\in\mathcal{E}$, the following holds for every $\epsilon\in(0,1)$:
$$P_{\mu_e}\!\left[\mu_e(\mathbf{C}|\mathbf{Z}) \le \left(\epsilon\prod_{i=1}^{n}F_i\right)^{-1/\beta}\right] \ge 1-\epsilon, \qquad (A5)$$
where $F_i$ is a PEF with power β for the i-th trial. For a fixed choice of $\epsilon\in(0,1)$ and $p \ge |\mathcal{C}|^{-n}$, define the event $\mathcal{S} := \left\{\left(\epsilon\prod_{i=1}^{n}F_i\right)^{-1/\beta} \le p\right\}$. Then, if κ is a positive number for which $P_\mu(\mathcal{S}) \ge \kappa$, the following holds:
$$H_{\infty,\mu}^{\mathrm{avg},\epsilon/\kappa}(\mathbf{C}|\mathbf{Z}E;\mathcal{S}) \ge \log_2(\kappa) - \log_2(p).$$
Proof. 
The goal is to construct a distribution ω of $\mathbf{CZ}E$ that is within $\epsilon/\kappa$ total variation distance of $\mu(\mathbf{CZ}E|\mathcal{S})$ and whose average conditional maximum probability of $\mathbf{C}$, conditioned on (and averaged over) $\mathbf{Z}E$, is bounded above by $p/\kappa$. We will construct ω to satisfy $\omega(\mathbf{Cz}e|\mathcal{S}) = 0$ for all values of $\mathbf{z}$ and e for which $\mu(\mathbf{z}e|\mathcal{S}) = 0$. Hence, for the rest of the construction, we will restrict attention to cases where $\mu(\mathbf{z}e|\mathcal{S}) > 0$. We will use expressions such as $P_{\mu_e}(\mathcal{S})$ and $\mu_e(\mathcal{S})$ interchangeably.
We start by defining the event
$$\mathcal{R} := \left\{\mu_e(\mathbf{C}|\mathbf{Z}) \le \left(\epsilon\prod_{i=1}^{n}F_i\right)^{-1/\beta}\right\},$$
whose occurrence or non-occurrence is determined by the particular realisation of e, $\mathbf{c}$ and $\mathbf{z}$. The event $\mathcal{R}$ corresponds to the desired probability bound holding; (A5) ensures that this event occurs with high probability, and we will construct our distribution ω to, in an intuitive sense, extend this desirable behaviour from $\mathcal{S}\cap\mathcal{R}$ to all of $\mathcal{S}$.
We begin the construction by defining, for each fixed e satisfying $\mu_e(\mathcal{S}) > 0$, a non-negative function $f : \mathcal{C}^n\times\mathcal{Z}^n \to \mathbb{R}_+$ as shown below.
$$f(\mathbf{cz}) = \begin{cases} \mu_e(\mathbf{cz})/P_{\mu_e}(\mathcal{S}) & \text{if } \mathcal{S}\cap\mathcal{R} \text{ holds,} \\ 0 & \text{otherwise.} \end{cases}$$
The weight w of f, defined as $w(f) = \sum_{\mathbf{c},\mathbf{z}} f(\mathbf{cz})$, satisfies $w(f) \le 1$, as shown below:
$$w(f) = \sum_{\mathbf{c},\mathbf{z}} f(\mathbf{cz}) = \sum_{\mathbf{c},\mathbf{z}} \mu_e(\mathbf{cz})\,[[\mathcal{S}\cap\mathcal{R}]]/P_{\mu_e}(\mathcal{S}) \le \sum_{\mathbf{c},\mathbf{z}} \mu_e(\mathbf{cz})\,[[\mathcal{S}]]/P_{\mu_e}(\mathcal{S}) = P_{\mu_e}(\mathcal{S})/P_{\mu_e}(\mathcal{S}) = 1,$$
where $[[\cdot]]$ is equal to 1 if the condition or event within holds and 0 otherwise. (Note that f is a sub-probability distribution on $\mathbf{cz}$: a set of non-negative numbers whose sum is less than or equal to 1. Defining a sub-probability distribution is a standard trick for constructing a distribution by invoking certain lemmas.) Below, we show that w also satisfies $w(f) \ge 1 - \epsilon/P_{\mu_e}(\mathcal{S})$:
$$w(f) = 1 - 1 + \sum_{\mathbf{c},\mathbf{z}}\mu_e(\mathbf{cz})\,[[\mathcal{S}]]\,[[\mathcal{R}]]/P_{\mu_e}(\mathcal{S}) = 1 - \sum_{\mathbf{c},\mathbf{z}}\mu_e(\mathbf{cz})\,[[\mathcal{S}]]/P_{\mu_e}(\mathcal{S}) + \sum_{\mathbf{c},\mathbf{z}}\mu_e(\mathbf{cz})\,[[\mathcal{S}]]\,[[\mathcal{R}]]/P_{\mu_e}(\mathcal{S})$$
$$= 1 - \sum_{\mathbf{c},\mathbf{z}}\left(\mu_e(\mathbf{cz}) - \mu_e(\mathbf{cz})\,[[\mathcal{R}]]\right)[[\mathcal{S}]]/P_{\mu_e}(\mathcal{S}) \ge 1 - \sum_{\mathbf{c},\mathbf{z}}\left(\mu_e(\mathbf{cz}) - \mu_e(\mathbf{cz})\,[[\mathcal{R}]]\right)/P_{\mu_e}(\mathcal{S})$$
$$= 1 - \left(1 - P_{\mu_e}(\mathcal{R})\right)/P_{\mu_e}(\mathcal{S}) \ge 1 - \epsilon/P_{\mu_e}(\mathcal{S}), \qquad (A8)$$
where in (A8) we have used the fact that $P_{\mu_e}(\mathcal{R}) \ge 1-\epsilon$ holds for each $e\in\mathcal{E}$, as follows from (A5). Next, we define a non-negative function $\tilde{f}_{\mathbf{z}} : \mathcal{C}^n \to \mathbb{R}_+$ for each $\mathbf{z}\in\mathcal{Z}^n$ for which $\mu_e(\mathbf{z}|\mathcal{S}) > 0$:
$$\tilde{f}_{\mathbf{z}}(\mathbf{c}) = f(\mathbf{cz})/\mu_e(\mathbf{z}|\mathcal{S}).$$
We show below that, for each such $\mathbf{z}$, $\tilde{f}_{\mathbf{z}}(\mathbf{c})$ is bounded by $\mu_e(\mathbf{c}|\mathbf{z},\mathcal{S})$ for all $\mathbf{c}\in\mathcal{C}^n$. We have:
$$\tilde{f}_{\mathbf{z}}(\mathbf{c}) = \mu_e(\mathbf{cz})\,[[\mathcal{S}]]\,[[\mathcal{R}]]/\left(P_{\mu_e}(\mathcal{S})\,\mu_e(\mathbf{z}|\mathcal{S})\right) = \mu_e(\mathbf{cz},\mathcal{S})\,[[\mathcal{R}]]/\mu_e(\mathbf{z},\mathcal{S}) \le \mu_e(\mathbf{cz},\mathcal{S})/\mu_e(\mathbf{z},\mathcal{S}) = \mu_e(\mathbf{c}|\mathbf{z},\mathcal{S}), \qquad (A10)$$
where the equality $\mu_e(\mathbf{cz})\,[[\mathcal{S}]] = \mu_e(\mathbf{cz},\mathcal{S})$ makes sense because whether or not $\mathcal{S}$ holds is determined by $\mathbf{cz}$. Since (A10) holds for all $\mathbf{c}$, we conclude $\tilde{f}_{\mathbf{z}}(\mathbf{C}) \le \mu_e(\mathbf{C}|\mathbf{z},\mathcal{S})$. This proves that $\tilde{f}_{\mathbf{z}}(\mathbf{C})$ is dominated by $\mu_e(\mathbf{C}|\mathbf{z},\mathcal{S})$. From the definition of $\tilde{f}_{\mathbf{z}}$, we also have another upper bound for all $\mathbf{c}$:
$$\tilde{f}_{\mathbf{z}}(\mathbf{c}) = \mu_e(\mathbf{c}|\mathbf{z})\,\mu_e(\mathbf{z})\,[[\mathcal{S}\cap\mathcal{R}]]/\mu_e(\mathbf{z},\mathcal{S}) \le p\,\mu_e(\mathbf{z})/\mu_e(\mathbf{z},\mathcal{S}).$$
Above, we have used the fact that the event $\mathcal{S}\cap\mathcal{R}$ implies $\mu_e(\mathbf{C}|\mathbf{Z}) \le \left(\epsilon\prod_{i=1}^{n}F_i\right)^{-1/\beta} \le p$. The bound $p\,\mu_e(\mathbf{z})/\mu_e(\mathbf{z},\mathcal{S}) \ge p \ge |\mathcal{C}|^{-n}$ also holds, since $\mu_e(\mathbf{z})/\mu_e(\mathbf{z},\mathcal{S}) \ge 1$. Hence, using the lemmas in Appendix D, we can construct, for each $\mathbf{z}$ under consideration, a distribution $\mu_{\mathbf{z}}(\mathbf{C})$ such that $\mu_{\mathbf{z}}(\mathbf{C}) \ge \tilde{f}_{\mathbf{z}}(\mathbf{C})$, $\mu_{\mathbf{z}}(\mathbf{C}) \le p\,\mu_e(\mathbf{z})/\mu_e(\mathbf{z},\mathcal{S})$ and $\mathrm{TV}(\mu_{\mathbf{z}}(\mathbf{C}), \mu_e(\mathbf{C}|\mathbf{z},\mathcal{S})) \le 1 - w(\tilde{f}_{\mathbf{z}})$, where $w(\tilde{f}_{\mathbf{z}}) \le 1$ is the weight of $\tilde{f}_{\mathbf{z}}(\mathbf{C})$. Now we are ready to define the distribution $\omega(\mathbf{CZ}E)$ as
$$\omega(\mathbf{cz}e) = \begin{cases} \mu_{\mathbf{z}}(\mathbf{c})\,\mu_e(\mathbf{z}|\mathcal{S})\,\mu(e|\mathcal{S}) & \text{if } \mu(\mathbf{z}e|\mathcal{S}) > 0, \\ 0 & \text{if } \mu(\mathbf{z}e|\mathcal{S}) = 0. \end{cases}$$
We now show that the total variation distance between ω and $\mu(\mathbf{CZ}E|\mathcal{S})$ is bounded by $\epsilon/\kappa$ and that the average $\mathbf{z}e$-conditional maximum probability of $\mathbf{C}$ is bounded by $p/\kappa$. First,
$$d_{\mathrm{TV}}(\omega(\mathbf{CZ}E), \mu(\mathbf{CZ}E|\mathcal{S})) = \frac{1}{2}\sum_{\mathbf{cz}e}\left|\omega(\mathbf{cz}e) - \mu(\mathbf{cz}e|\mathcal{S})\right|$$
$$= \frac{1}{2}\sum_{e:\,\mu(e|\mathcal{S})>0}\;\sum_{\mathbf{z}:\,\mu_e(\mathbf{z}|\mathcal{S})>0}\;\sum_{\mathbf{c}}\left|\mu_{\mathbf{z}}(\mathbf{c}) - \mu(\mathbf{c}|\mathbf{z}e,\mathcal{S})\right|\,\mu_e(\mathbf{z}|\mathcal{S})\,\mu(e|\mathcal{S}) \qquad (A11)$$
$$\le \frac{1}{2}\sum_{e:\,\mu(e|\mathcal{S})>0}\;\sum_{\mathbf{z}:\,\mu_e(\mathbf{z}|\mathcal{S})>0}\;\sum_{\mathbf{c}}\left[\left(\mu_{\mathbf{z}}(\mathbf{c}) - \tilde{f}_{\mathbf{z}}(\mathbf{c})\right) + \left(\mu(\mathbf{c}|\mathbf{z}e,\mathcal{S}) - \tilde{f}_{\mathbf{z}}(\mathbf{c})\right)\right]\mu_e(\mathbf{z}|\mathcal{S})\,\mu(e|\mathcal{S}) \qquad (A12)$$
$$= \frac{1}{2}\sum_{e:\,\mu(e|\mathcal{S})>0}\;\sum_{\mathbf{z}:\,\mu_e(\mathbf{z}|\mathcal{S})>0}\mu_e(\mathbf{z}|\mathcal{S})\,\mu(e|\mathcal{S})\left[\left(1 - \sum_{\mathbf{c}}\tilde{f}_{\mathbf{z}}(\mathbf{c})\right) + \left(1 - \sum_{\mathbf{c}}\tilde{f}_{\mathbf{z}}(\mathbf{c})\right)\right] \qquad (A13)$$
$$= \frac{1}{2}\sum_{e:\,\mu(e|\mathcal{S})>0}2\left(1 - \sum_{\mathbf{cz}}f(\mathbf{cz})\right)\mu(e|\mathcal{S}) \qquad (A14)$$
$$= \sum_{e:\,\mu(e|\mathcal{S})>0}\left(1 - w(f)\right)\mu(e|\mathcal{S}) \le \sum_{e:\,\mu(e|\mathcal{S})>0}\frac{\epsilon}{P_{\mu_e}(\mathcal{S})}\,\mu(e|\mathcal{S}) = \sum_{e:\,\mu(e|\mathcal{S})>0}\frac{\epsilon\,\mu(e)}{\mu(\mathcal{S})} \le \epsilon/P_\mu(\mathcal{S}) \le \epsilon/\kappa. \qquad (A15)$$
The equality in (A11) follows because $\omega(\mathbf{cz}e) = \mu(\mathbf{cz}e|\mathcal{S}) = 0$ for the values of $e,\mathbf{z}$ removed from the sums, while $\mu_{\mathbf{z}}(\mathbf{c})$ is defined for the remaining values of $e,\mathbf{z}$. In (A12), we add and subtract $\tilde{f}_{\mathbf{z}}(\mathbf{c})$ inside the absolute value expression of the previous step and use the triangle inequality, following which we use the facts established above that both $\mu_{\mathbf{z}}(\mathbf{C})$ and $\mu(\mathbf{C}|\mathbf{z}e,\mathcal{S}) = \mu_e(\mathbf{C}|\mathbf{z},\mathcal{S})$ dominate $\tilde{f}_{\mathbf{z}}(\mathbf{C})$; (A13) follows from the fact that $\mu_{\mathbf{z}}(\mathbf{c})$ and $\mu(\mathbf{c}|\mathbf{z}e,\mathcal{S})$ each sum to 1 over $\mathbf{c}$ (being distributions), and (A14) follows from $\tilde{f}_{\mathbf{z}}(\mathbf{c}) = f(\mathbf{cz})/\mu_e(\mathbf{z}|\mathcal{S})$ and the fact that $f(\mathbf{cz}) = 0$ in cases where $\mu_e(\mathbf{z}|\mathcal{S}) = 0$. Finally, the first inequality in (A15) follows from (A8) and the last inequality follows from $P_\mu(\mathcal{S}) \ge \kappa$. Next, we show the upper bound on the average conditional maximum probability.
$$\sum_{\mathbf{z}e}\max_{\mathbf{c}}\,\omega(\mathbf{c}|\mathbf{z}e)\,\omega(\mathbf{z}e) = \sum_{\mathbf{z}e}\max_{\mathbf{c}}\,\mu_{\mathbf{z}}(\mathbf{c})\,\mu_e(\mathbf{z}|\mathcal{S})\,\mu(e|\mathcal{S}) \le \sum_{\mathbf{z}e}\frac{p\,\mu_e(\mathbf{z})}{\mu_e(\mathbf{z},\mathcal{S})}\,\mu_e(\mathbf{z}|\mathcal{S})\,\mu(e|\mathcal{S}) = \sum_{\mathbf{z}e}\frac{p\,\mu_e(\mathbf{z})}{\mu_e(\mathcal{S})}\,\mu(e|\mathcal{S}) = \sum_{e}\frac{p}{P_{\mu_e}(\mathcal{S})}\,\mu(e|\mathcal{S}) = \frac{p}{P_\mu(\mathcal{S})} \le \frac{p}{\kappa}, \qquad (A16)$$
$$-\log_2\!\left(\sum_{\mathbf{z}e}\max_{\mathbf{c}}\,\omega(\mathbf{c}|\mathbf{z}e)\,\omega(\mathbf{z}e)\right) \ge \log_2(\kappa) - \log_2(p). \qquad (A17)$$
Hence, we have obtained an upper bound on the average conditional maximum probability in (A16). Since by definition the  ϵ / κ -smooth average conditional min-entropy involves a maximum (over the set  B ϵ / κ ( μ ) ) of the quantity on the left hand side of (A17), the final result follows. □

Appendix B

Appendix B.1. Proofs Using Convex Geometry

Here we prove Theorems 4 and 5 using arguments from convex geometry.

Appendix B.1.1. Proof for Theorem 4

Theorem. 
Suppose Π is closed and equal to the convex hull of its extreme points. Then, there is a distribution $\omega(CZE)\in\Sigma_{\mathcal{E}}$ with $|\mathcal{E}| = 1 + \dim\Pi$ such that $H_\omega(C|ZE) = h_{\min}(\rho(CZ))$.
Proof. 
We will be analysing  h min ( · )  as a function with domain  Π . It is useful to re-write  h min ( · )  in the form
$$h_{\min}(\rho) = \inf_{\{\sigma_i\}_{i\in I}}\;\sum_{i} p_i\, H_{\sigma_i}(C|Z),$$
where the infimum is taken over all finite subsets $\{\sigma_i\}_{i\in I}\subseteq\Pi$ for which $\sum_{i\in I} p_i\,\sigma_i = \rho$ for some collection of non-negative $p_i$ summing to 1. This is equivalent to the earlier definition if we set $\omega(CZ|e_i) = \sigma_i(CZ)$ and $\omega(e_i) = p_i$, yielding $H_\omega(C|ZE) = \sum_{i,j} H_{\omega(C|z_j,e_i)}(C)\,\omega(z_j,e_i) = \sum_{i,j} H_{\omega(C|z_j,e_i)}(C)\,\omega(z_j|e_i)\,\omega(e_i) = \sum_{i,j} H_{\sigma_i(C|z_j)}(C)\,\sigma_i(z_j)\,p_i = \sum_i p_i\,H_{\sigma_i}(C|Z)$. We first observe that the scope of the infimum can be reduced to consider only sets of $\sigma_i$ belonging to $\Pi_{\mathrm{extr}}$, the set of extreme points of Π. This follows from the fact that conditional Shannon entropy is concave. (The proof of Theorem 43 in [15] correctly notes that the concavity of conditional Shannon entropy can be obtained as a specialisation of the concavity of the quantum conditional entropy. It is worth noting, however, that the classical-only result can be obtained much more quickly and directly, as shown in Appendix C.) Hence, any expression in the scope of the infimum defining $h_{\min}$ can always be decreased (or at least left unchanged) by replacing each $\sigma_i$ in the expression with a convex combination of extremal behaviours replicating $\sigma_i$.
Π is a subset of R^N, where N = |Z| × |C| is the number of conditional probabilities appearing in the behaviour. In general, N is strictly larger than dim Π: the constraint that certain elements of Π need to form valid probability distributions reduces the dimension, and no-signalling equalities can reduce the dimension further. So, we seek to re-parameterise the elements of Π using only the number of coordinates necessary based on its dimension. The (affine) dimension of Π is by definition the dimension of the smallest affine space containing it; that is, the intersection of all affine subspaces of R^N containing Π, which is itself an affine space. Let us call this smallest affine space 𝒜. If dim 𝒜 = m, then there is a set of m linearly independent vectors v_i and a displacement/base vector b such that any σ ∈ 𝒜 has a unique representation as
$$
\sigma = \mathbf{b} + \sum_{i=1}^{m} c_i \mathbf{v}_i. \tag{A18}
$$
For any σ ∈ Π ⊆ 𝒜, then, we can uniquely represent σ as a vector of these coefficients, (c_1, c_2, ..., c_m).
Our approach now makes explicit the arguments only alluded to in the proof of Theorem 43 in [15] through general referral to existence and extension theorems in convex analysis, and takes full advantage of the fact that we are always working in a large ambient space R^N, allowing us to harness the strength of linear algebra. We would like to construct an affine-linear map g : R^N → R^m whose restriction to 𝒜 maps the N-coordinate vector σ to its m-coordinate representation (c_1, c_2, ..., c_m). Our affine-linear map will be represented by a matrix M and a vector k such that g(σ) = Mσ + k = (c_1, ..., c_m). To construct M and k, let A be the N × m matrix whose m columns are the vectors v_i appearing in (A18). Since the columns of A are linearly independent, A^T A is invertible, as its kernel consists only of the zero vector:
$$
A^{\mathsf T}A\mathbf{v} = 0 \;\Rightarrow\; \mathbf{v}^{\mathsf T}A^{\mathsf T}A\mathbf{v} = 0 \;\Rightarrow\; (A\mathbf{v})^{\mathsf T}A\mathbf{v} = 0 \;\Rightarrow\; \|A\mathbf{v}\| = 0 \;\Rightarrow\; A\mathbf{v} = 0 \;\Rightarrow\; \mathbf{v} = 0.
$$
We can thus define M = (A^T A)^{-1} A^T, which satisfies MA = I (M is a left pseudo-inverse of A), and so M maps the vectors v_i to the standard basis vectors in R^m. Setting k = −Mb yields the desired g(·).
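As a quick side check (with an assumed, randomly generated basis chosen purely for illustration), the construction above is easy to verify numerically:

```python
# Confirm that M A = I and that g recovers the coefficient vector of a point.
import numpy as np

rng = np.random.default_rng(0)
N, m = 7, 3
A = rng.standard_normal((N, m))          # columns play the role of the basis vectors v_i
b = rng.standard_normal(N)               # the displacement/base vector

M = np.linalg.inv(A.T @ A) @ A.T         # the left pseudo-inverse of A
k = -M @ b
g = lambda sigma: M @ sigma + k          # the affine-linear coordinate map

c = rng.standard_normal(m)
sigma = b + A @ c                        # a point of the affine space with coefficients c
print(np.allclose(M @ A, np.eye(m)))     # True
print(np.allclose(g(sigma), c))          # True: g returns the coefficient vector
```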
We point out a couple of properties of g that we will use in our arguments. First, it commutes with convex combinations: for a set of non-negative p_i satisfying Σ_i p_i = 1 and a collection of elements σ_i of 𝒜,
$$
\sum_i p_i\, g(\sigma_i) = g\Bigl(\sum_i p_i \sigma_i\Bigr), \tag{A19}
$$
which follows directly from expressing g as M(·) + k and noticing that Σ_i p_i k = k. Second, M (and hence g) is injective when restricted to 𝒜, so g is a bijection between 𝒜 and R^m, and in particular
$$
g\Bigl(\sum_i p_i \sigma_i\Bigr) = g(\sigma) \;\Rightarrow\; \sum_i p_i \sigma_i = \sigma. \tag{A20}
$$
The following development is inspired by the arguments in the appendix of [34], though the assumptions and conclusions differ somewhat. Let us consider the following subset of  R m + 1 ,
Ξ extr = { ( g ( σ ) , H σ ( C | Z ) ) : σ Π extr } ,
where the first m coordinates of an element of  Ξ extr  are the coordinates of  g ( σ )  and the  m + 1  coordinate is  H σ ( C | Z ) . Define
Ξ = conv Ξ extr ,
where ‘conv’ denotes the convex hull.  Ξ extr  and  Ξ  are artificial constructions but by studying their geometry we can prove the existence of a convex combination achieving the infimum defining  h min ( ρ ) .
We first confirm that Ξ_extr is indeed the set of extreme points of Ξ (as suggested by our choice of names), i.e., we confirm that Ξ_extr contains only trivial convex combinations of its elements. To see this, note that if Σ_i p_i (g(σ_i), H_{σ_i}(C|Z)) = (g(σ), H_σ(C|Z)) holds for some σ_i, σ ∈ Π_extr and non-negative p_i satisfying Σ_i p_i = 1, then we must have Σ_i p_i g(σ_i) = g(σ) and so Σ_i p_i σ_i = σ by (A19) and (A20). This can only be a trivial convex combination (i.e., all σ_i with nonzero p_i coefficient must equal σ) as the σ_i and σ are assumed to be in Π_extr.
Second, we show that the point (g(ρ), h_min(ρ)) is on the boundary of Ξ, i.e., that (g(ρ), h_min(ρ)) is a limit point of Ξ and also a limit point of Ξ^C. To see that we can converge to this point from within the set, note that, for any set of σ_i ∈ Π_extr satisfying Σ_i p_i σ_i = ρ, we have by definition Σ_i p_i (g(σ_i), H_{σ_i}(C|Z)) ∈ Ξ, which can be re-expressed as (g(ρ), Σ_i p_i H_{σ_i}(C|Z)) ∈ Ξ by invoking (A19). By the nature of the infimum defining h_min(ρ), there must be a sequence of such elements of Ξ whose last component forms a non-increasing sequence converging to h_min(ρ); since the first m components are identically g(ρ), this sequence converges to (g(ρ), h_min(ρ)) as desired. Similarly, one can also converge to (g(ρ), h_min(ρ)) from outside the set Ξ, since (g(ρ), h_min(ρ) − ϵ) ∉ Ξ for all ϵ > 0; this is because all elements of Ξ take the form
$$
\sum_i p_i\,\bigl(g(\sigma_i),\, H_{\sigma_i}(C|Z)\bigr) = \Bigl(\sum_i p_i\, g(\sigma_i),\ \sum_i p_i\, H_{\sigma_i}(C|Z)\Bigr),
$$
for some collection σ_i ∈ Π_extr, and if the first m coordinates are equal to g(ρ), then by (A19) and (A20) we must have Σ_i p_i σ_i = ρ, and so the (m+1)-th coordinate is a term contributing to the infimum defining h_min(ρ); it cannot be less than h_min(ρ).
We now would like to demonstrate that  ( g ( ρ ) , h min ( ρ ) )  is contained in  Ξ . As a first step, we show that
$$
(g(\rho),\, h_{\min}(\rho)) \in \mathrm{conv}\bigl(\overline{\Xi_{\mathrm{extr}}}\bigr), \tag{A21}
$$
where conv(Ξ̄_extr) denotes the convex hull of the closure of Ξ_extr. To see this, first note that Ξ_extr is bounded: for the (m+1)-th coordinate, Shannon entropy is non-negative with a maximum value set by the cardinality of the value space of C; for the first m coordinates, these are contained in the image of the set Π_extr through the continuous map g, and since Π_extr is contained in the compact set P = [0,1]^N (P contains all probability distributions), its image must be contained in the compact (and thus bounded) set g(P). As Ξ_extr is bounded, its closure, denoted Ξ̄_extr, must be bounded as well and so is compact. It is a known fact that the convex hull of a compact set in R^n is compact, so conv(Ξ̄_extr) is compact and so, in particular, closed. Finally, conv(Ξ̄_extr) clearly contains Ξ = conv(Ξ_extr), the convex hull of a smaller set; as a closed set containing Ξ, it will contain the Ξ-boundary point (g(ρ), h_min(ρ)).
Now we show that this implies containment in  Ξ  proper. Since the map
h ( ρ ) : = ( g ( ρ ) , H ρ ( C | Z ) )
with image in R^{m+1} is continuous on the domain of N-dimensional probability distributions and Π_extr is bounded, the closure of h(Π_extr) is contained in h(Π̄_extr). (For any bounded subset S of R^n and continuous h, the closure of h(S) is contained in h(S̄), the proof of which is as follows: any x in the closure of h(S) must be the limit of a sequence in h(S); let {s_i}_{i=1}^∞ ⊆ S satisfy h(s_i) → x. Since S̄ is compact, {s_i}_{i=1}^∞ ⊆ S ⊆ S̄ has a convergent sub-sequence {s_j}_{j=1}^∞ with limit in S̄; let s ∈ S̄ be this limit. By continuity, h(s_j) → h(s); considered as a sub-sequence of {h(s_i)}_{i=1}^∞, we also have h(s_j) → x, and so uniqueness of limits implies x = h(s) ∈ h(S̄).)
Now, since by definition  Ξ extr = h ( Π extr ) , we write
$$
\overline{\Xi_{\mathrm{extr}}} \subseteq h\bigl(\overline{\Pi_{\mathrm{extr}}}\bigr). \tag{A22}
$$
Next, using (A21), (A22), the definition of  h ( · )  and finally (A19), we can write
$$
\begin{aligned}
(g(\rho),\, h_{\min}(\rho)) &= \sum_i p_i \mathbf{w}_i && \text{for some } \{\mathbf{w}_i\}_{i\in I} \subseteq \overline{\Xi_{\mathrm{extr}}} \\
&= \sum_i p_i\, h(\tau_i) && \text{for some } \{\tau_i\}_{i\in I} \subseteq \overline{\Pi_{\mathrm{extr}}} \\
&= \sum_i p_i\,\bigl(g(\tau_i),\, H_{\tau_i}(C|Z)\bigr) && \text{for some } \{\tau_i\}_{i\in I} \subseteq \overline{\Pi_{\mathrm{extr}}} \\
&= \Bigl(g\Bigl(\sum_i p_i \tau_i\Bigr),\ \sum_i p_i\, H_{\tau_i}(C|Z)\Bigr) && \text{for some } \{\tau_i\}_{i\in I} \subseteq \overline{\Pi_{\mathrm{extr}}}.
\end{aligned} \tag{A23}
$$
Comparing the first expression in the above sequence to the last and applying (A20) implies that Σ_i p_i τ_i = ρ. Now, since by assumption Π is closed, Π_extr ⊆ Π implies Π̄_extr ⊆ Π, so Π = conv(Π_extr) implies that elements of Π̄_extr can be expressed as convex combinations of elements of Π_extr. Thus, in the expression Σ_i p_i τ_i, if there are any non-extremal τ_i elements, they can be replaced with convex combinations of elements of Π_extr to yield a convex combination Σ_j q_j σ_j equalling ρ, where the concavity of conditional Shannon entropy implies that Σ_j q_j H_{σ_j}(C|Z) is not larger than Σ_i p_i H_{τ_i}(C|Z). However, by (A23), Σ_i p_i H_{τ_i}(C|Z) = h_min(ρ) and, since Σ_j q_j H_{σ_j}(C|Z) cannot be smaller than h_min(ρ), it must equal h_min(ρ). As Σ_j q_j σ_j = ρ and Σ_j q_j H_{σ_j}(C|Z) = h_min(ρ), one more application of (A19) yields
$$
\begin{aligned}
(g(\rho),\, h_{\min}(\rho)) &= \Bigl(g\Bigl(\sum_j q_j \sigma_j\Bigr),\ \sum_j q_j\, H_{\sigma_j}(C|Z)\Bigr) && \text{for some } \{\sigma_j\}_{j\in J} \subseteq \Pi_{\mathrm{extr}} \\
&= \sum_j q_j\,\bigl(g(\sigma_j),\, H_{\sigma_j}(C|Z)\bigr) && \text{for some } \{\sigma_j\}_{j\in J} \subseteq \Pi_{\mathrm{extr}},
\end{aligned}
$$
which is in  Ξ .
The argument thus far demonstrates the existence of a convex combination of  Π extr  elements explicitly achieving the infimum in the definition of  h min ( ρ ) . We continue with our argument to further demonstrate that the number of required  Π extr  elements in such an optimal decomposition is not greater than  m + 1 .
We first note that, since  ( g ( ρ ) , h min ( ρ ) )  is on the boundary of the convex set  Ξ , the supporting hyperplane theorem says there is a supporting hyperplane  H ρ  with  ( g ( ρ ) , h min ( ρ ) ) H ρ  and  Ξ  entirely on one side of  H ρ . Now, notice that if we decompose  ( g ( ρ ) , h min ( ρ ) )  as a convex combination of  Ξ extr  elements, these elements must all lie in the hyperplane  H ρ : this is because any elements strictly on one side of  H ρ  would have to be counterbalanced by elements strictly on the other side of  H ρ —but, since one side of  H ρ  is disjoint from  Ξ , this is not possible. Applying the same observation to any other element of  H ρ Ξ , it follows that  H ρ Ξ  is contained in the convex hull of  H ρ Ξ extr . As the reverse inclusion follows from the convexity of  H ρ  and the fact that  Ξ = conv ( Ξ extr ) , we can write  conv ( H ρ Ξ extr ) = H ρ Ξ . Now, since  H ρ Ξ  is at most m dimensional ( H ρ , as a hyperplane, has one fewer dimension than the ambient  ( m + 1 ) -dimensional space), we can invoke Carathéodory’s theorem to see that at most  m + 1  points of  H ρ Ξ extr  are required to replicate  ( g ( ρ ) , h min ( ρ ) )  as a convex combination. Thus, we have
$$
(g(\rho),\, h_{\min}(\rho)) = \sum_i p_i \mathbf{w}_i \quad \text{for some } \{\mathbf{w}_i\}_{i\in I} \subseteq \Xi_{\mathrm{extr}},\ |I| \le m+1,
$$
and so, recalling the definition of Ξ_extr and invoking (A19) one last time, we can write that, for some integer m* satisfying 1 ≤ m* ≤ m + 1,
$$
\begin{aligned}
(g(\rho),\, h_{\min}(\rho)) &= \sum_{i=1}^{m^*} p_i\,\bigl(g(\sigma_i),\, H_{\sigma_i}(C|Z)\bigr) && \text{for some } \{\sigma_i\}_{i=1}^{m^*} \subseteq \Pi_{\mathrm{extr}} \\
&= \Bigl(g\Bigl(\sum_{i=1}^{m^*} p_i \sigma_i\Bigr),\ \sum_{i=1}^{m^*} p_i\, H_{\sigma_i}(C|Z)\Bigr) && \text{for some } \{\sigma_i\}_{i=1}^{m^*} \subseteq \Pi_{\mathrm{extr}}.
\end{aligned}
$$
By (A20),  i = 1 m * p i σ i = ρ  and so  { σ i } i = 1 m *  induces the desired distribution  ω ( C Z E )  by setting  ω ( C Z | e i ) = σ i ( C Z )  and  ω ( e i ) = p i . □
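As an illustrative aside, the Carathéodory reduction invoked in the proof can be carried out constructively: repeatedly find an affine dependence among the support points and shift weight along it until some weight vanishes. The following Python sketch does this under assumed random data (purely hypothetical, for demonstration):

```python
# Prune a convex combination of points in R^d to at most d + 1 points with the
# same barycentre (Caratheodory's theorem, constructive form).
import numpy as np

def caratheodory(points, weights, tol=1e-12):
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    while len(weights) > points.shape[1] + 1:
        # Null vector of the lifted points gives sum_i alpha_i = 0, sum_i alpha_i x_i = 0.
        lifted = np.vstack([points.T, np.ones(len(weights))])
        alpha = np.linalg.svd(lifted)[2][-1]
        if not (alpha > tol).any():
            alpha = -alpha                       # ensure some strictly positive entries
        pos = alpha > tol
        t = np.min(weights[pos] / alpha[pos])    # largest step keeping weights >= 0
        weights = weights - t * alpha            # barycentre and total weight unchanged
        keep = weights > tol
        points, weights = points[keep], weights[keep]
    return points, weights

rng = np.random.default_rng(1)
pts, w = rng.standard_normal((10, 3)), rng.dirichlet(np.ones(10))
pts2, w2 = caratheodory(pts, w)
print(len(w2) <= 4, np.allclose(w @ pts, w2 @ pts2))   # <= d + 1 points, same barycentre
```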

Appendix B.1.2. Proof for Theorem 5

Theorem. 
Suppose Π satisfies the conditions of Theorem 4 and ρ is in the interior of Π. Then, there exists an entropy estimator whose entropy estimate at ρ is equal to  h min ( ρ ) .
Proof. 
We continue from where we left off in the proof of Theorem 4 and show that the supporting hyperplane H_ρ discussed in that proof can be used to construct an affine function that is the desired entropy estimator. Recall that the dimension of Π, which is embedded in a higher dimensional vector space R^N, is defined as the affine dimension of 𝒜, the smallest affine subspace containing Π. Given this context, the assumption that ρ is in the interior of Π means that there exists an open ϵ-ball U in R^N such that U ∩ 𝒜, which is open in the subspace topology, is contained in Π. (If this assumption is removed, a weaker form of the theorem demonstrating the existence of entropy estimators with estimate ϵ-close to h_min(ρ) can be proved with a similar argument to that of the current proof by invoking Exercise 3.28 of [35].)
First, we note that g(ρ) is in the interior of g(Π). To see this, consider the restriction g|_𝒜 of g to 𝒜, which is a bijection with affine-linear inverse map (g|_𝒜)^{-1} : R^m → 𝒜 given by A(·) + b (recalling the construction following (A18) in the proof of Theorem 4). This ensures that the set g|_𝒜(U ∩ 𝒜) must be open, as it is equal to the inverse image of U ∩ 𝒜 under the map (g|_𝒜)^{-1}, which is equal to the inverse image of the open set U under the (continuous) map A(·) + b : R^m → R^N. Hence, g(ρ) is contained in the open set g(U ∩ 𝒜), which is a subset of g(Π) as U ∩ 𝒜 ⊆ Π.
Now, we take a closer look at  H ρ , the supporting hyperplane touching  Ξ  at  ( g ( ρ ) , h min ( ρ ) ) . As a hyperplane,  H ρ  will be equal to the set of  x  satisfying an equation of the form  a · x = b  for some fixed  a R m + 1  and  b R , where · denotes the dot product, and the condition
$$
\xi \in \Xi \;\Rightarrow\; \mathbf{a}\cdot\xi \ge b
$$
expresses algebraically the notion that Ξ is on one side of H_ρ. We argue that the fact that g(ρ) is in the interior of g(Π) implies that the (m+1)-th component of a, denoted a_{m+1}, must be nonzero. Assume a_{m+1} = 0 for a proof by contradiction: since (g(ρ), h_min(ρ)) is the point of contact of the supporting hyperplane H_ρ, we have a · (g(ρ), h_min(ρ)) = b, which implies a_{[m]} · g(ρ) = b, where a_{[m]} ∈ R^m denotes the vector consisting of the first m coordinates of a. Since the previous paragraph demonstrated there is an open subset of g(Π) containing g(ρ), the point g(ρ) − c a_{[m]} for a sufficiently small positive c is equal to g(ϕ) for some ϕ in Π. By construction, ϕ satisfies a_{[m]} · g(ϕ) < b, but since a_{m+1} = 0 this requires a · (g(ϕ), h_min(ϕ)) < b as well. This would imply (g(ϕ), h_min(ϕ)) ∉ Ξ; however, this is a contradiction, as the arguments of Theorem 4 show that, for any ϕ ∈ Π, (g(ϕ), h_min(ϕ)) belongs to Ξ (the arguments of Theorem 4 demonstrated this for ρ but they apply to any element of Π).
Having demonstrated  a m + 1 0 , we can define a function  f ρ : R m R  as follows:
$$
f_\rho(\mathbf{x}) = \frac{b - \mathbf{a}_{[m]}\cdot \mathbf{x}}{a_{m+1}}.
$$
Composing this function with g, we find that f_ρ ∘ g(ρ) = h_min(ρ). Furthermore, for general ϕ ∈ Π the fact that (g(ϕ), h_min(ϕ)) ∈ Ξ ensures f_ρ ∘ g(ϕ) ≤ h_min(ϕ), and h_min(ϕ) ≤ H_ϕ(C|Z) by concavity of conditional Shannon entropy, so the map f_ρ ∘ g, applied to any ϕ ∈ Π, satisfies
$$
f_\rho \circ g(\phi) \;\le\; H_\phi(C|Z) = \mathbb{E}_\phi\bigl(-\log_2[\phi(C|Z)]\bigr).
$$
We now use  f ρ g  to construct the desired entropy estimator as follows. We have
$$
f_\rho \circ g(\phi) = \frac{b - \mathbf{a}_{[m]}\cdot(M\phi + \mathbf{k})}{a_{m+1}} = \mathbf{n}\cdot\phi + d,
$$
where d = (b − a_{[m]} · k)/a_{m+1} is a constant and n = −(1/a_{m+1}) M^T a_{[m]} is an N-dimensional vector; that is, it has one component for each possible distinct outcome pair c, z for the random variable pair C, Z. Now we can define K(c, z) := n_{cz} + d to obtain a function of C, Z satisfying
$$
\mathbb{E}_\phi[K(CZ)] = \mathbf{n}\cdot\phi + d = f_\rho\circ g(\phi) \;\le\; \mathbb{E}_\phi\bigl(-\log_2[\phi(C|Z)]\bigr),
$$
and thus K is an entropy estimator satisfying the conditions of the theorem. □

Appendix C

Concavity of Conditional Shannon Entropy

It is known that conditional Shannon entropy is concave. For completeness, we provide a brief proof of how this follows from the concavity of (unconditional) Shannon entropy. Let  ν  be a convex combination of  ν 1  and  ν 2 , so that for all  ( c , z ) C × Z  we have  ν ( c , z ) = λ ν 1 ( c , z ) + ( 1 λ ) ν 2 ( c , z )  for some  λ [ 0 , 1 ] . Then, it follows that  ν ( C | z )  is a convex mixture of  ν 1 ( C | z )  and  ν 2 ( C | z )  for each fixed z for which  ν ( z ) > 0 :
$$
\nu(C|z) = \lambda\,\frac{\nu_1(z)}{\nu(z)}\,\nu_1(C|z) + (1-\lambda)\,\frac{\nu_2(z)}{\nu(z)}\,\nu_2(C|z), \tag{A26}
$$
where it is straightforward to check that the coefficients of  ν 1 ( C | z )  and  ν 2 ( C | z )  are non-negative numbers summing to one. Then, using (A26) and the concavity of (unconditional) Shannon entropy we have the following.
$$
\begin{aligned}
H_\nu(C|Z) &= \sum_z H_{\nu(C|z)}(C)\,\nu(z) \\
&\ge \sum_z \Bigl[\lambda\,\frac{\nu_1(z)}{\nu(z)}\,H_{\nu_1(C|z)}(C) + (1-\lambda)\,\frac{\nu_2(z)}{\nu(z)}\,H_{\nu_2(C|z)}(C)\Bigr]\nu(z) \\
&= \lambda \sum_z H_{\nu_1(C|z)}(C)\,\nu_1(z) + (1-\lambda)\sum_z H_{\nu_2(C|z)}(C)\,\nu_2(z) \\
&= \lambda\, H_{\nu_1}(C|Z) + (1-\lambda)\, H_{\nu_2}(C|Z).
\end{aligned}
$$
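For illustration only, the inequality just derived is easy to spot-check numerically on randomly drawn joint distributions (the example data below are assumptions):

```python
# Spot-check concavity of conditional Shannon entropy on random mixtures.
import numpy as np

def H_cond(nu):
    """H(C|Z) of a joint distribution nu[c, z]."""
    pz = nu.sum(axis=0)
    cond = np.divide(nu, pz, out=np.zeros_like(nu), where=pz > 0)
    logs = np.log2(np.where(cond > 0, cond, 1.0))   # log2(1) = 0 pads zero entries
    return float(-(nu * logs).sum())

rng = np.random.default_rng(2)
for _ in range(1000):
    nu1 = rng.dirichlet(np.ones(6)).reshape(3, 2)
    nu2 = rng.dirichlet(np.ones(6)).reshape(3, 2)
    lam = rng.uniform()
    mix = lam * nu1 + (1 - lam) * nu2
    assert H_cond(mix) + 1e-9 >= lam * H_cond(nu1) + (1 - lam) * H_cond(nu2)
print("concavity verified on 1000 random mixtures")
```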

Appendix D

Useful Lemmas

The following lemmas are used for arguments in Appendix A.1.1 and Appendix E.
Lemma A1.
If the distributions μ(X) and μ′(X) dominate the non-negative function f : X → R_+ with weight w(f) = Σ_{x∈X} f(x) = 1 − ϵ for ϵ ∈ [0, 1], i.e., μ(x) ≥ f(x) and μ′(x) ≥ f(x) for all x ∈ X, then d_TV(μ, μ′) ≤ ϵ.
Proof. 
Using the definition of TV distance we have the required result as shown below.
$$
d_{\mathrm{TV}}(\mu, \mu') = \frac{1}{2}\sum_{x\in\mathcal X} \bigl|\mu(x) - f(x) + f(x) - \mu'(x)\bigr| \;\le\; \frac{1}{2}\sum_{x\in\mathcal X} \Bigl[\bigl|\mu(x) - f(x)\bigr| + \bigl|\mu'(x) - f(x)\bigr|\Bigr] = 1 - w(f) = \epsilon.
$$
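A small numerical illustration of the lemma (the random data below are assumptions chosen for demonstration):

```python
# Two distributions dominating a common sub-probability function of weight
# 0.9 stay within eps = 0.1 of each other in total variation distance.
import numpy as np

rng = np.random.default_rng(3)
f = 0.9 * rng.dirichlet(np.ones(5))           # w(f) = 0.9, i.e. eps = 0.1
mu1 = f + 0.1 * rng.dirichlet(np.ones(5))     # dominates f and sums to one
mu2 = f + 0.1 * rng.dirichlet(np.ones(5))
tv = 0.5 * np.abs(mu1 - mu2).sum()
print(tv, "<=", 1 - f.sum())                  # TV distance bounded by eps
```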
Lemma A2.
Suppose the function f : X → R_+ has weight w(f) = Σ_{x∈X} f(x) = 1 − ϵ, where ϵ ∈ [0, 1], and satisfies f(x) ≤ p for all x ∈ X for some fixed p ≥ 1/|X|. Then, there exists a distribution μ(X) such that f(x) ≤ μ(x) ≤ p holds for all x ∈ X.
Proof. 
If  ϵ = 0 , it suffices to take  μ ( x ) = f ( x ) . If  ϵ > 0 , we construct a distribution satisfying the two properties as follows. Define a function  μ λ  with domain  X  as
$$
\mu_\lambda(x) = (1-\lambda)\, f(x) + \lambda\, p.
$$
Then, for any fixed λ ∈ [0, 1], μ_λ(x) is a convex combination of non-negative numbers and thus non-negative for any choice of x. We show that there exists a λ ∈ [0, 1] for which Σ_x μ_λ(x) = 1, making μ_λ a distribution. It is easy to verify that for λ = ϵ/(p|X| + ϵ − 1) the above function adds up to unity when summed over x ∈ X. We just need to ensure that ϵ/(p|X| + ϵ − 1) ∈ [0, 1] holds. To see this, note that p|X| ≥ 1, so we have p|X| + ϵ − 1 ≥ ϵ, and since ϵ > 0 the quotient must indeed lie in [0, 1]. Finally, μ_λ(X) satisfies the bounds in the lemma: since f(x) ≤ p for all x ∈ X, for any λ ∈ [0, 1] we have
$$
p \;\ge\; p + (1-\lambda)\bigl(f(x) - p\bigr) \;=\; (1-\lambda)\, f(x) + \lambda\, p \;=\; f(x) + \lambda\,\bigl(p - f(x)\bigr) \;\ge\; f(x), \quad \forall\, x \in \mathcal X,
$$
and the middle expression above is μ_λ(x) for the particular choice of λ identified above. □
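A sketch of the construction with assumed toy data (all numbers below are illustrative assumptions):

```python
# Interpolating between f and the constant p with the stated lambda yields a
# distribution pinched between f and p.
import numpy as np

rng = np.random.default_rng(4)
X, p = 8, 0.25                                        # |X| = 8, p >= 1/|X|
f = (0.95 / X) * (1 + 0.1 * (rng.random(X) - 0.5))    # f(x) <= p with weight just under 1
eps = 1 - f.sum()

lam = eps / (p * X + eps - 1)                         # the lambda chosen in the proof
mu = (1 - lam) * f + lam * p
print(np.isclose(mu.sum(), 1.0))                      # mu_lambda is a distribution
print(np.all(f <= mu + 1e-12) and np.all(mu <= p + 1e-12))
```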

Appendix E

Inequalities Relating Smooth Average Conditional Min-Entropy and Smooth Worst-Case Conditional Min-Entropy

Here we state and prove a known inequality that relates two notions of smooth conditional min-entropy. We present this result without structuring random variables as stochastic sequences, i.e., instead of considering distributions of C, Z, E we consider distributions of X, Y. The result and its proof can be adapted to the more general case involving sequences of random variables.
For a distribution μ : X × Y → [0, 1] of X, Y and the set B_ϵ(μ) of distributions of X, Y defined as B_ϵ(μ) := {σ : X × Y → [0, 1] | d_TV(μ, σ) ≤ ϵ}, the ϵ-smooth average conditional min-entropy is:
$$
H_{\infty,\mu}^{\mathrm{avg},\epsilon}(X|Y) \;:=\; \max_{\sigma \in B_\epsilon(\mu)} \left[ -\log_2 \sum_{y\in\mathcal Y} \max_{x\in\mathcal X} \sigma(x|y)\,\sigma(y) \right].
$$
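For concreteness, the bracketed (unsmoothed) quantity is straightforward to compute from an explicit joint distribution; the example matrix below is an assumption for illustration:

```python
# -log2 sum_y max_x sigma(x|y) sigma(y), which equals -log2 sum_y max_x sigma(x, y).
import numpy as np

def H_min_avg(sigma):
    """Average conditional min-entropy (no smoothing) of a joint distribution sigma[x, y]."""
    return float(-np.log2(sigma.max(axis=0).sum()))

sigma = np.array([[0.30, 0.10],
                  [0.25, 0.35]])       # rows indexed by x, columns by y
print(H_min_avg(sigma))                # -log2(0.30 + 0.35), about 0.62 bits
```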
A stricter definition of smooth conditional min-entropy than the one stated above is the  ϵ -smooth “worst-case” conditional min-entropy, introduced in [19]. It reads as follows:
$$
H_{\infty,\mu}^{\mathrm{wst},\epsilon}(X|Y) = \max_{\sigma \in B_\epsilon(\mu)} \left[ -\log_2 \max_{x\in\mathcal X,\, y\in\mathcal Y} \sigma(x|y) \right]. \tag{A30}
$$
For purposes of randomness extraction or scenarios involving predictability of an adversary, the smooth average conditional min-entropy suffices. One can show that the notions of average-case and worst-case are equivalent up to an additive factor [36]. This is formalised in Proposition A1.
Proposition A1.
For a distribution μ : X × Y → [0, 1] of X, Y and 1 ≥ ϵ ≥ 0, 1 > ϵ′ > 0, the smooth worst-case conditional min-entropy and smooth average-case conditional min-entropy are related by the following inequalities:
$$
H_{\infty,\mu}^{\mathrm{wst},\epsilon}(X|Y) \;\le\; H_{\infty,\mu}^{\mathrm{avg},\epsilon}(X|Y) \;\le\; H_{\infty,\mu}^{\mathrm{wst},\epsilon+\epsilon'}(X|Y) + \log_2(1/\epsilon'). \tag{A31}
$$
Proof. 
The first inequality holds immediately, since for every σ(X, Y) ∈ B_ϵ(μ) we have
$$
\max_{x\in\mathcal X,\, y\in\mathcal Y} \sigma(x|y) \;\ge\; \sum_{y\in\mathcal Y} \max_{x\in\mathcal X} \sigma(x|y)\,\sigma(y).
$$
Taking −log₂ of both sides (which reverses the inequality) we obtain H^{wst}_{∞,σ}(X|Y) ≤ H^{avg}_{∞,σ}(X|Y), where H^{wst}_{∞,σ}(X|Y) is the bracketed quantity in (A30) and, since this inequality holds for every σ ∈ B_ϵ(μ), we have H^{wst,ϵ}_{∞,μ}(X|Y) ≤ H^{avg,ϵ}_{∞,μ}(X|Y). For the second inequality of (A31), we want to show that H^{wst,ϵ+ϵ′}_{∞,μ}(X|Y) ≥ H^{avg,ϵ}_{∞,μ}(X|Y) − log₂(1/ϵ′) holds. Suppose the distribution ν(X, Y) ∈ B_ϵ(μ) witnesses H^{avg,ϵ}_{∞,μ}(X|Y), i.e., H^{avg,ϵ}_{∞,μ}(X|Y) = H^{avg}_{∞,ν}(X|Y). The existence of such a witness follows from the compactness of B_ϵ(μ) and the continuity of H^{avg}_{∞,σ}(X|Y) as a function of σ. It suffices to construct a distribution σ(X, Y) ∈ B_{ϵ′}(ν) such that max_{x,y} σ(x|y) ≤ p/ϵ′ holds, where p = Σ_y max_x ν(x|y) ν(y) = E_{ν(Y)}[max_{x∈X} ν(x|Y)]. We begin by defining the sub-probability distribution ν̃(X, Y) as shown below.
$$
\tilde\nu(x, y) = \nu(x, y)\,\bigl[\!\bigl[\, \max_{x'\in\mathcal X} \nu(x'|y) \le p/\epsilon' \,\bigr]\!\bigr], \tag{A33}
$$
where the notation [[⋅]] represents the function that evaluates to 1 if the enclosed condition holds and zero if it does not. Basically, the definition of ν̃ in (A33) involves discarding ν corresponding to those y ∈ Y for which max_x ν(x|y) > p/ϵ′ holds. An application of Markov's inequality then shows that the weight w(ν̃) of ν̃ is at least 1 − ϵ′:
$$
w(\tilde\nu) = \sum_{x,y} \tilde\nu(x, y) = \sum_{y\,:\,\max_x \nu(x|y) \le p/\epsilon'}\ \sum_x \nu(x, y) = P_{\nu(Y)}\Bigl[\max_{x\in\mathcal X} \nu(x|Y) \le p/\epsilon'\Bigr] \;\ge\; 1 - \epsilon'. \tag{A34}
$$
Since p = E_{ν(Y)}[max_x ν(x|Y)], (A34) follows. One way to now construct a distribution σ ∈ B_{ϵ′}(ν) satisfying max_{x,y} σ(x|y) ≤ p/ϵ′ is to scale ν̃, i.e., we define σ(X, Y) as σ(x, y) = ν̃(x, y)/w(ν̃). Note that, since w(ν̃) ≥ 1 − ϵ′ and ϵ′ ∈ (0, 1), w(ν̃) is positive; also, σ ≥ ν̃ holds since w(ν̃) ≤ 1. Together with the fact that ν ≥ ν̃, we can use Lemma A1 to show that d_TV(σ, ν) ≤ 1 − w(ν̃) ≤ ϵ′. By definition of σ we have σ(x, y) ≤ p σ(y)/ϵ′ for all choices of x, y, where σ(y) = [ν(y)/w(ν̃)] [[max_x ν(x|y) ≤ p/ϵ′]] for each y. With the convention that σ(x|y) is assigned the value 0 when σ(y) = 0, we then have σ(x|y) ≤ p/ϵ′ for all choices of x, y. Membership of σ in the set B_{ϵ+ϵ′}(μ) follows from the triangle inequality d_TV(μ, σ) ≤ d_TV(μ, ν) + d_TV(ν, σ) ≤ ϵ + ϵ′. And so we have constructed a distribution in B_{ϵ+ϵ′}(μ) such that max_{x,y} σ(x|y) ≤ p/ϵ′. Taking −log₂ on both sides, we obtain H^{wst}_{∞,σ}(X|Y) ≥ H^{avg}_{∞,ν}(X|Y) − log₂(1/ϵ′). As mentioned earlier, ν ∈ B_ϵ(μ) witnesses H^{avg,ϵ}_{∞,μ}(X|Y); hence, we have shown:
$$
H_{\infty,\sigma}^{\mathrm{wst}}(X|Y) \;\ge\; H_{\infty,\mu}^{\mathrm{avg},\epsilon}(X|Y) - \log_2(1/\epsilon'). \tag{A35}
$$
Since, by definition, the smooth worst-case conditional min-entropy involves a maximum of the left hand side of (A35) over the set B_{ϵ+ϵ′}(μ), this shows that H^{wst,ϵ+ϵ′}_{∞,μ}(X|Y) ≥ H^{avg,ϵ}_{∞,μ}(X|Y) − log₂(1/ϵ′) holds, from which the second inequality in (A31) follows. □
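The smoothing construction in the proof is easy to exercise numerically; a sketch under assumed random data (all choices below are illustrative assumptions):

```python
# Truncate the columns whose conditional maximum exceeds p/eps' (Markov),
# renormalise, and check the claimed bounds.
import numpy as np

rng = np.random.default_rng(5)
nu = rng.dirichlet(np.ones(12)).reshape(3, 4)        # nu[x, y]
eps_p = 0.1                                          # the eps' of the proposition

p = nu.max(axis=0).sum()                             # E_{nu(Y)}[max_x nu(x|Y)]
ny = nu.sum(axis=0)
cond_max = nu.max(axis=0) / ny                       # max_x nu(x|y) for each y
keep = cond_max <= p / eps_p                         # columns surviving the truncation
nu_tilde = nu * keep                                 # the sub-probability nu~
sigma = nu_tilde / nu_tilde.sum()                    # the renormalised distribution

print(nu_tilde.sum() >= 1 - eps_p)                   # Markov: weight at least 1 - eps'
print(0.5 * np.abs(sigma - nu).sum() <= eps_p)       # d_TV(sigma, nu) <= eps'
sy = sigma.sum(axis=0)
print((sigma / np.where(sy > 0, sy, 1.0)).max() <= p / eps_p + 1e-12)
```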
In the asymptotic limit of a large number n of trials, constant factors vanish in measuring per-trial min-entropy and, since ϵ′ can be made arbitrarily small, (A31) enables us to consider either definition when considering asymptotic performance.

Appendix F

Proof of Proposition 1

Proposition. 
For the set of behaviours Π_NS, the PEF optimisation in (10) is independent of the power β for β ≥ log₂(4/3).
Proof. 
For a fixed value of n and  ϵ  the optimisation problem in (10) is equivalent to the following:
$$
\begin{aligned}
\text{Maximise:}\quad & \mathbb{E}_\rho[\log_2(F(ABXY))] \\
\text{Subject to:}\quad & \mathbb{E}_{\mu_{\mathrm{PR}}^{i}}\bigl[F(ABXY)\,\mu_{\mathrm{PR}}^{i}(AB|XY)^\beta\bigr] \le 1 \quad \text{for all } i \in \{0,1\}^3, \\
& \mathbb{E}_{\mu_{\mathrm{LD}}^{j}}\bigl[F(ABXY)\,\mu_{\mathrm{LD}}^{j}(AB|XY)^\beta\bigr] \le 1 \quad \text{for all } j \in \{0,1\}^4, \\
& F(abxy) \ge 0 \quad \text{for all } a, b, x, y \in \{0,1\},
\end{aligned} \tag{A36}
$$
where the constraints range over the extremal points of Π_NS as given in (35) and (36). We show that, for β ≥ log₂(4/3), the above constraints are equivalent to
$$
\begin{aligned}
& \mathbb{E}_{\mu_{\mathrm{LD}}^{j}}\bigl[F(ABXY)\,\mu_{\mathrm{LD}}^{j}(AB|XY)\bigr] \le 1 \quad \text{for all } j \in \{0,1\}^4, \\
& F(abxy) \ge 0 \quad \text{for all } a, b, x, y \in \{0,1\},
\end{aligned} \tag{A37}
$$
noticing that  β  does not appear in (A37).
It is immediate to see that the constraints of (A36) imply (A37): since μ(AB|XY) is always zero or one for local deterministic distributions, in this case we have μ(AB|XY)^β = μ(AB|XY), and thus for each choice of j the constraint E_{μ_LD^j}[F(ABXY) μ_LD^j(AB|XY)^β] ≤ 1 implies its non-β counterpart E_{μ_LD^j}[F(ABXY) μ_LD^j(AB|XY)] ≤ 1 in (A37). Now, we demonstrate the reverse implication. First, the argument just given also works in the opposite direction to show that the non-β constraints of (A37) imply the corresponding constraints (with β) in (A36). We thus need only to show that the constraints E_{μ_PR^i}[⋯] ≤ 1 in (A36) are implied as well. We give a specific argument for the PR box given in Table 1; symmetric arguments apply for the other PR boxes. Since any distribution μ(ABXY) is the behaviour μ(AB|XY) times a fixed settings distribution σ_s(XY), we can express the product F(abxy)σ_s(xy) as F′(abxy) for all choices of (a, b, x, y) when the expectation functional E[·] is written out in full. The constraints (A37) then imply, by summing the eight of them corresponding to the eight local deterministic distributions appearing in Table 1 (a set we denote LD₁), that
$$
\sum_{a,b,x,y} F'(abxy) \sum_{\mu_{\mathrm{LD}} \in \mathrm{LD}_1} \mu_{\mathrm{LD}}(ab|xy)^2 \;\le\; 8. \tag{A38}
$$
Noticing that the inner sum above is always 3 or 1 (this is the number of behaviours in LD₁ assigning probability one to each outcome–setting pair, with the result given in Table A2), we can now rewrite (A38) as 3M + N ≤ 8, where M = F′(0000) + F′(0001) + F′(0010) + F′(0111) + F′(1011) + F′(1100) + F′(1101) + F′(1110) and N = F′(0011) + F′(1111) + F′(1001) + F′(0101) + F′(0110) + F′(1010) + F′(0100) + F′(1000).
Table A1. Tabular representation of μ_PR,1(AB|XY)^{1+β}.

                          ab
             00            01            10            11
xy  00   (1/2)^{1+β}       0             0         (1/2)^{1+β}
    01   (1/2)^{1+β}       0             0         (1/2)^{1+β}
    10   (1/2)^{1+β}       0             0         (1/2)^{1+β}
    11        0       (1/2)^{1+β}   (1/2)^{1+β}         0
Table A2. Tabular representation of the values of the sum Σ_{μ_LD ∈ LD₁} μ_LD(ab|xy)².

            ab
         00   01   10   11
xy  00    3    1    1    3
    01    3    1    1    3
    10    3    1    1    3
    11    1    3    3    1
Since M, N are both non-negative, we can drop N to find that 3M + N ≤ 8 implies M ≤ 8/3 = 2^{1+log₂(4/3)}, which in turn implies M ≤ 2^{1+β} whenever β ≥ log₂(4/3). Since E_{μ_PR,1}[F(ABXY) μ_PR,1(AB|XY)^β] is equal to M(1/2)^{1+β} (see Table A1), the constraint E_{μ_PR,1}[⋯] ≤ 1 follows. □
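As a concrete illustration of the optimisation (A36), the problem can be posed as a convex program once the extreme points are enumerated. The sketch below is an assumption-laden demonstration only: the anticipated behaviour ρ is chosen arbitrarily, a uniform settings distribution is assumed, and cvxpy with an exponential-cone-capable solver is assumed available.

```python
# (A36) as a convex program over the 16 LD and 8 PR extreme points of the
# (2,2,2) no-signalling polytope.
import itertools
import numpy as np
import cvxpy as cp

idx = list(itertools.product(range(2), repeat=4))            # (a, b, x, y)

def ld_box(fa, fb):
    # Local deterministic behaviour with a = fa[x] and b = fb[y].
    return np.array([1.0 if (a, b) == (fa[x], fb[y]) else 0.0 for a, b, x, y in idx])

def pr_box(i, j, k):
    # PR-type behaviour: a + b = x y + i x + j y + k (mod 2).
    return np.array([0.5 if (a + b) % 2 == (x * y + i * x + j * y + k) % 2 else 0.0
                     for a, b, x, y in idx])

lds = [ld_box(fa, fb) for fa in itertools.product(range(2), repeat=2)
                      for fb in itertools.product(range(2), repeat=2)]
prs = [pr_box(*t) for t in itertools.product(range(2), repeat=3)]

beta = np.log2(4 / 3)
sigma_s = 0.25                                               # uniform sigma_s(xy)
rho = 0.85 * prs[0] + 0.15 * lds[0]                          # assumed anticipated behaviour

F = cp.Variable(16, nonneg=True)
constraints = [(sigma_s * mu ** (1 + beta)) @ F <= 1 for mu in lds + prs]
objective = cp.Maximize(sigma_s * cp.sum(cp.multiply(rho, cp.log(F))) / np.log(2))
print(cp.Problem(objective, constraints).solve())            # optimal E_rho[log2 F]
```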
We remark that this inequality condition β ≥ log₂(4/3) is tight in the following sense: there exists a non-negative function F(abxy) violating the PR box constraint of μ_PR,1 appearing in (A36) for any β < log₂(4/3), while satisfying (A37) for any positive β (and consequently satisfying all the constraints of (A36) for β ≥ log₂(4/3), per the argument in the above proof). Thus, the feasible set of (A36) always excludes this particular choice of F for β < log₂(4/3) and includes it for β ≥ log₂(4/3). This function is F(abxy) = (1/3)[[a ⊕ b = xy]] σ_s(xy)^{−1}; fixing β = log₂(4/3) − ϵ for some choice of ϵ in the interval (0, log₂(4/3)], we can check that all the LD boxes satisfy the inequality Σ_{a,b,x,y} F′(abxy) μ_LD(ab|xy)^{1+β} ≤ 1, the value of the expression always being either 1/3 or 1. However, for the PR box μ_PR,1 in Table 2 we obtain Σ_{a,b,x,y} F′(abxy) μ_PR,1(ab|xy)^{1+β} = 2^ϵ > 1, which is a violation.
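This tightness claim is also simple to check numerically; a self-contained sketch (same box conventions as the sketch above, with all example choices being assumptions):

```python
# The stated F satisfies every LD constraint for any positive beta but violates
# the mu_PR,1 constraint exactly when beta < log2(4/3).
import itertools
import numpy as np

idx = list(itertools.product(range(2), repeat=4))                  # (a, b, x, y)
F = np.array([(1 / 3) / 0.25 if (a + b) % 2 == x * y else 0.0      # (1/3)[[a xor b = xy]]/sigma_s
              for a, b, x, y in idx])
pr1 = np.array([0.5 if (a + b) % 2 == x * y else 0.0 for a, b, x, y in idx])
lds = [np.array([1.0 if (a, b) == (fa[x], fb[y]) else 0.0 for a, b, x, y in idx])
       for fa in itertools.product(range(2), repeat=2)
       for fb in itertools.product(range(2), repeat=2)]

for beta in [np.log2(4 / 3) - 0.05, np.log2(4 / 3), np.log2(4 / 3) + 0.05]:
    pr_val = 0.25 * (pr1 ** (1 + beta)) @ F                        # E_{PR,1}[F mu^beta]
    ld_max = max(0.25 * (mu ** (1 + beta)) @ F for mu in lds)      # worst LD constraint value
    print(round(float(beta), 3), round(float(pr_val), 4), round(float(ld_max), 4))
# The PR value is 2**eps > 1 for beta = log2(4/3) - eps, while LD values are 1/3 or 1.
```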

References

1. Dodis, Y.; Ong, S.J.; Prabhakaran, M.; Sahai, A. On the (Im)Possibility of Cryptography with Imperfect Randomness. In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, Rome, Italy, 17–19 October 2004; pp. 196–205.
2. Austrin, P.; Chung, K.M.; Mahmoody, M.; Pass, R.; Seth, K. On the Impossibility of Cryptography with Tamperable Randomness. In Proceedings of the Advances in Cryptology—CRYPTO, Santa Barbara, CA, USA, 17–21 August 2014; Garay, J.A., Gennaro, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2014; pp. 462–479.
3. Dodis, Y.; Yao, Y. Privacy with Imperfect Randomness. In Advances in Cryptology—CRYPTO 2015; Gennaro, R., Robshaw, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2015; pp. 463–482.
4. Orsini, C.; Dankulov, M.M.; Colomer-de Simón, P.; Jamakovic, A.; Mahadevan, P.; Vahdat, A.; Bassler, K.E.; Toroczkai, Z.; Boguñá, M.; Caldarelli, G.; et al. Quantifying randomness in real networks. Nat. Commun. 2015, 6, 8627.
5. Motwani, R.; Raghavan, P. Randomized Algorithms; Cambridge University Press: Cambridge, NY, USA, 1995.
6. Scarani, V. Bell Nonlocality; Oxford University Press: New York, NY, USA, 2019.
7. Giustina, M.; Versteegh, M.A.M.; Wengerowsky, S.; Handsteiner, J.; Hochrainer, A.; Phelan, K.; Steinlechner, F.; Kofler, J.; Larsson, J.A.; Abellán, C.; et al. Significant-Loophole-Free Test of Bell's Theorem with Entangled Photons. Phys. Rev. Lett. 2015, 115, 250401.
8. Shalm, L.K.; Meyer-Scott, E.; Christensen, B.G.; Bierhorst, P.; Wayne, M.A.; Stevens, M.J.; Gerrits, T.; Glancy, S.; Hamel, D.R.; Allman, M.S.; et al. Strong Loophole-Free Test of Local Realism. Phys. Rev. Lett. 2015, 115, 250402.
9. Hensen, B.; Bernien, H.; Dréau, A.E.; Reiserer, A.; Kalb, N.; Blok, M.S.; Ruitenberg, J.; Vermeulen, R.F.L.; Schouten, R.N.; Abellán, C.; et al. Loophole-free Bell inequality violation using electron spins separated by 1.3 kilometres. Nature 2015, 526, 682–686.
10. Rosenfeld, W.; Burchardt, D.; Garthoff, R.; Redeker, K.; Ortegel, N.; Rau, M.; Weinfurter, H. Event-Ready Bell Test Using Entangled Atoms Simultaneously Closing Detection and Locality Loopholes. Phys. Rev. Lett. 2017, 119, 010402.
11. Bierhorst, P.; Knill, E.; Glancy, S.; Zhang, Y.; Mink, A.; Jordan, S.; Rommal, A.; Liu, Y.K.; Christensen, B.; Nam, S.W.; et al. Experimentally generated randomness certified by the impossibility of superluminal signals. Nature 2018, 556, 223–226.
12. Shalm, L.K.; Zhang, Y.; Bienfang, J.C.; Schlager, C.; Stevens, M.J.; Mazurek, M.D.; Abellán, C.; Amaya, W.; Mitchell, M.W.; Alhejji, M.A.; et al. Device-independent randomness expansion with entangled photons. Nat. Phys. 2021, 17, 452–456.
13. Li, M.H.; Zhang, X.; Liu, W.Z.; Zhao, S.R.; Bai, B.; Liu, Y.; Zhao, Q.; Peng, Y.; Zhang, J.; Zhang, Y.; et al. Experimental Realization of Device-Independent Quantum Randomness Expansion. Phys. Rev. Lett. 2021, 126, 050503.
14. Zhang, Y.; Knill, E.; Bierhorst, P. Certifying quantum randomness by probability estimation. Phys. Rev. A 2018, 98, 040304.
15. Knill, E.; Zhang, Y.; Bierhorst, P. Generation of quantum randomness by probability estimation with classical side information. Phys. Rev. Res. 2020, 2, 033465.
16. Bierhorst, P.; Zhang, Y. Tsirelson Polytopes and randomness generation. New J. Phys. 2020, 22, 083036.
17. Zhang, Y.; Fu, H.; Knill, E. Efficient randomness certification by quantum probability estimation. Phys. Rev. Res. 2020, 2, 013016.
18. Zhang, Y.; Shalm, L.K.; Bienfang, J.C.; Stevens, M.J.; Mazurek, M.D.; Nam, S.W.; Abellán, C.; Amaya, W.; Mitchell, M.W.; Fu, H.; et al. Experimental Low-Latency Device-Independent Quantum Randomness. Phys. Rev. Lett. 2020, 124, 010505.
19. Renner, R.; Wolf, S. Simple and Tight Bounds for Information Reconciliation and Privacy Amplification. In Proceedings of the Advances in Cryptology—ASIACRYPT, Chennai, India, 4–8 December 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 199–216.
20. Frank, R.L.; Lieb, E.H. Monotonicity of a relative Rényi entropy. J. Math. Phys. 2013, 54, 122201.
21. Wilde, M.M. Quantum Information Theory; Cambridge University Press: Cambridge, NY, USA, 2013.
22. Popescu, S.; Rohrlich, D. Quantum nonlocality as an axiom. Found. Phys. 1994, 24, 379–385.
23. Fine, A. Hidden Variables, Joint Probability, and the Bell Inequalities. Phys. Rev. Lett. 1982, 48, 291–295.
24. Bierhorst, P. Geometric decompositions of Bell polytopes with practical applications. J. Phys. A Math. Theor. 2016, 49, 215301.
25. Le, T.P.; Meroni, C.; Sturmfels, B.; Werner, R.F.; Ziegler, T. Quantum Correlations in the Minimal Scenario. Quantum 2023, 7, 947.
26. Brito, S.G.A.; Amaral, B.; Chaves, R. Quantifying Bell nonlocality with the trace distance. Phys. Rev. A 2018, 97, 022111.
27. Mikos-Nuszkiewicz, A.; Kaniewski, J. Extremal points of the quantum set in the CHSH scenario: Conjectured analytical solution. arXiv 2023, arXiv:2302.10658.
28. Collins, D.; Gisin, N.; Linden, N.; Massar, S.; Popescu, S. Bell Inequalities for Arbitrarily High-Dimensional Systems. Phys. Rev. Lett. 2002, 88, 040404.
29. Barrett, J.; Linden, N.; Massar, S.; Pironio, S.; Popescu, S.; Roberts, D. Nonlocal correlations as an information-theoretic resource. Phys. Rev. A 2005, 71, 022101.
30. Barrett, J.; Pironio, S. Popescu-Rohrlich Correlations as a Unit of Nonlocality. Phys. Rev. Lett. 2005, 95, 140401.
31. Jones, N.S.; Masanes, L. Interconversion of nonlocal correlations. Phys. Rev. A 2005, 72, 052312.
32. Pironio, S.; Bancal, J.D.; Scarani, V. Extremal correlations of the tripartite no-signaling polytope. J. Phys. A Math. Theor. 2011, 44, 065303.
33. Greenberger, D.M.; Horne, M.A.; Zeilinger, A. Going Beyond Bell's Theorem. arXiv 2007, arXiv:0712.0921.
34. Uhlmann, A. Entropy and Optimal Decompositions of States Relative to a Maximal Commutative Subalgebra. Open Syst. Inf. Dyn. 1998, 5, 209–228.
35. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, NY, USA, 2004.
36. Dodis, Y.; Reyzin, L.; Smith, A. Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data. In Proceedings of the Advances in Cryptology—EUROCRYPT 2004: International Conference on the Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, 2–6 May 2004; pp. 523–540.
Figure 1. A schematic representation of the set-up for device-independent randomness generation in a two-party experiment. The outer rectangular box represents a secure location. The adversary E has perfect knowledge of the processes inside the secure location but cannot tamper with them. The state Ψ_{ABE} represents the resource shared between the two parties. X_k, Y_k are the trial inputs and A_k, B_k are the trial outcomes for the kth trial.
Figure 2. A plot showing the net log-prob rates for n = 1.5 × 10^5 (the dashed curve) and n = 2.4 × 10^5 (the dash–dotted curve) with ϵ = 10^{-4} and β varying in the interval (0.001, 0.1). The dotted curve is the log-prob rate sup_F O_ρ(F; β), an upper bound for the net log-prob rate in the limit as n → ∞. We selected 200 equally spaced points in the interval (0.001, 0.1) for β and performed the maximisation max_F E_ρ[log₂(F(ABXY))] constrained by: (1) the non-negativity of PEFs and (2) the defining condition E_μ[F(ABXY) μ(AB|XY)^β] ≤ 1 at all distributions μ corresponding to the eight PR and sixteen LD behaviours with a fixed uniform settings distribution μ(xy) = 1/4 for all x, y ∈ {0, 1}. The anticipated distribution ρ used here was the one corresponding to the behaviour given in Table I in [15]. We observe that the maximum value for the net log-prob rate, indicated by the solid vertical lines, is achieved at a lower value of β for a higher value of n.
Figure 3. A heat map illustrating the robustness of PEF with log-prob rate as the figure of merit, evaluated for behaviours σ(ab|xy) on the two-dimensional slice of the set of quantum behaviours (shown in Figure 4b) above the standard CHSH–Bell facet. The behaviours on the two-dimensional slice shown above are parameterised by S and S′ as shown in (46) with the added restrictions S² + (S′)² ≤ 8 and 2 ≤ S ≤ 2√2, −2 ≤ S′ ≤ 2 (see also Table 3). Assuming a uniform distribution for the settings, σ_s(xy) = 1/4 for all x, y, we plot the log-prob rate Σ_{abxy} [log₂ F*(abxy)] σ(ab|xy) σ_s(xy)/β for all distributions in the slice. The black dot corresponds to the behaviour (and hence the distribution) with respect to which we perform the PEF optimisation for a fixed n and ϵ to obtain F*. The coordinates for the black dot are (S′, S) ≈ (0, 2.6). (a) Top figure: Heat map with F* obtained from the PEF optimisation in (10) with respect to the fixed distribution (corresponding to the black dot in the figures), fixed n, ϵ and β = 0.1. Below S ≈ 2.22145 no device-independent randomness can be certified. (b) Bottom figure: Heat map with F* obtained from the PEF optimisation in (10) with respect to the fixed distribution (corresponding to the black dot in the figures), fixed n, ϵ and β = 0.01. Below S ≈ 2.02072 no device-independent randomness can be certified.
Figure 4. (a) A two-dimensional slice of the set of no-signalling behaviours (containing the quantum and the local set). The behaviours can be parameterised by the CHSH–Bell values S and S′ obtained from two different versions of the CHSH–Bell expression in (37). Any behaviour on the slice can be represented as in (46). (b) The portion of the two-dimensional slice containing the no-signalling (including quantum-achievable) behaviours above the standard CHSH–Bell facet. For a fixed behaviour μ_Q in the interior of the quantum region, the darker shaded region corresponds to possible ways of expressing μ_Q as a convex combination of a behaviour on the quantum boundary and a behaviour on the local boundary (for example, μ_Q = λν_Q + (1 − λ)ν_L, λ ∈ (0, 1)). For the same behaviour μ_Q, the lighter shaded region represents possible ways of expressing it as a convex combination of two behaviours on the quantum boundary (for example, μ_Q = δθ_{Q,1} + (1 − δ)θ_{Q,2}, δ ∈ (0, 1)).
Table 1. These probability vectors in R^16 are the PR box μ_PR,1 ≡ μ_PR^000 that achieves the nonlocal maximum of 4 and the eight LD behaviours μ_LD,1, ..., μ_LD,8 that achieve the local maximum of 2 for the standard CHSH–Bell expression B_000, with the LD behaviours corresponding to the eight probability tables numbered 1, 4, 5, 8, 9, 12, 14 and 15 in Table A2 of [24], and also given in the first row of Table 2. One can verify the affine independence of the nine vectors above by verifying that the eight vectors obtained by subtracting the first vector from the remaining eight are linearly independent.
                xy = 00              xy = 01              xy = 10              xy = 11
   ab:      00   01   10   11    00   01   10   11    00   01   10   11    00   01   10   11
μ_PR,1     1/2    0    0  1/2   1/2    0    0  1/2   1/2    0    0  1/2     0  1/2  1/2    0
μ_LD,1       1    0    0    0     1    0    0    0     1    0    0    0     1    0    0    0
μ_LD,2       0    0    0    1     0    0    0    1     0    0    0    1     0    0    0    1
μ_LD,3       1    0    0    0     0    1    0    0     1    0    0    0     0    1    0    0
μ_LD,4       0    0    0    1     0    0    1    0     0    0    0    1     0    0    1    0
μ_LD,5       1    0    0    0     1    0    0    0     0    0    1    0     0    0    1    0
μ_LD,6       0    0    0    1     0    0    0    1     0    1    0    0     0    1    0    0
μ_LD,7       0    1    0    0     1    0    0    0     0    0    0    1     0    0    1    0
μ_LD,8       0    0    1    0     0    0    0    1     1    0    0    0     0    1    0    0
Table 2. The eight nonlocal 8-simplices containing behaviours that violate the corresponding version of the CHSH–Bell inequality. We identify each 8-simplex Δ^8_{PR,i} with a PR box which solely contributes to the nonlocality of the behaviour violating the CHSH–Bell inequality.
B_αβγ      Δ^8_{PR,i}
B_000      Δ^8_{PR,1} := conv⟨μ_PR^000, μ_LD^0000, μ_LD^0101, μ_LD^0010, μ_LD^0111, μ_LD^1000, μ_LD^1101, μ_LD^1011, μ_LD^1110⟩
B_001      Δ^8_{PR,2} := conv⟨μ_PR^001, μ_LD^0001, μ_LD^0011, μ_LD^0100, μ_LD^0110, μ_LD^1001, μ_LD^1010, μ_LD^1100, μ_LD^1111⟩
B_010      Δ^8_{PR,3} := conv⟨μ_PR^010, μ_LD^0000, μ_LD^0010, μ_LD^0101, μ_LD^0111, μ_LD^1001, μ_LD^1010, μ_LD^1100, μ_LD^1111⟩
B_011      Δ^8_{PR,4} := conv⟨μ_PR^011, μ_LD^0001, μ_LD^0011, μ_LD^0100, μ_LD^0110, μ_LD^1000, μ_LD^1011, μ_LD^1101, μ_LD^1110⟩
B_100      Δ^8_{PR,5} := conv⟨μ_PR^100, μ_LD^0000, μ_LD^0011, μ_LD^0101, μ_LD^0110, μ_LD^1000, μ_LD^1010, μ_LD^1101, μ_LD^1111⟩
B_101      Δ^8_{PR,6} := conv⟨μ_PR^101, μ_LD^0001, μ_LD^0010, μ_LD^0100, μ_LD^0111, μ_LD^1001, μ_LD^1011, μ_LD^1100, μ_LD^1110⟩
B_110      Δ^8_{PR,7} := conv⟨μ_PR^110, μ_LD^0001, μ_LD^0010, μ_LD^0100, μ_LD^0111, μ_LD^1000, μ_LD^1010, μ_LD^1101, μ_LD^1111⟩
B_111      Δ^8_{PR,8} := conv⟨μ_PR^111, μ_LD^0000, μ_LD^0011, μ_LD^0101, μ_LD^0110, μ_LD^1001, μ_LD^1011, μ_LD^1100, μ_LD^1110⟩
Table 3. Tabular representation of the no-signalling behaviours on the two-dimensional slice shown in Figure 4a. The behaviours have uniform marginals, i.e., the probability of observing an outcome conditioned on a measurement setting is 1/2 for each party for all outcomes and settings. The behaviours are further constrained in having the third and fourth row completely determined by the first and second, which need not hold in general for uniform marginal distributions, and brings the dimensionality down from four to two. Any behaviour represented as above is parameterised by the values S and S′ of the two versions of the CHSH–Bell expression E_00 + E_01 + E_10 − E_11 and −E_00 + E_01 + E_10 + E_11, respectively: s_1 = (4 + S − S′)/16, s_2 = (4 − S + S′)/16, s_3 = (4 + S + S′)/16, s_4 = (4 − S − S′)/16, where for the no-signalling set −4 ≤ S + S′ ≤ 4, −4 ≤ S − S′ ≤ 4 and for the quantum set S² + (S′)² ≤ 8.
            ab
         00    01    10    11
xy  00   s_1   s_2   s_2   s_1
    01   s_3   s_4   s_4   s_3
    10   s_3   s_4   s_4   s_3
    11   s_2   s_1   s_1   s_2
Table 4. Two nonlocal extremal behaviours for the CGLMP scenario with 3 outcomes whose equal mixture is nonlocal. The inputs x, y ∈ {0, 1} and the outcomes a, b ∈ {0, 1, 2}, with x′ = x ⊕ 1, y′ = y ⊕ 1 and a′ = a ⊕₃ 1, a″ = a ⊕₃ 2, b′ = b ⊕₃ 1, b″ = b ⊕₃ 2. The symbol ⊕ denotes addition modulo 2 and ⊕₃ denotes addition modulo 3. The missing entries correspond to 0. The top behaviour comes directly from (49) while the bottom behaviour is obtained through the relabelling x ↔ x′ and y ↔ y′. An equal mixture of these two boxes lies outside the local polytope.
ab ab ab a b a b a b a b a b a b
x y 1 / 3 1 / 3 1 / 3
x y 1 / 3 1 / 3 1 / 3
x y 1 / 3 1 / 3 1 / 3
x y 1 / 3 1 / 3 1 / 3
x y 1 / 3 1 / 3 1 / 3
x y 1 / 3 1 / 3 1 / 3
x y 1 / 3 1 / 3 1 / 3
x y 1 / 3 1 / 3 1 / 3
Table 5. Nonlocal no-signalling extremal behaviour with all input choices x, y ∈ {0, 1, 2} for Alice and Bob having uniform probabilities of outcomes.
                  y = 0          y = 1          y = 2
    x = 0      1/2    0       1/2    0       1/2    0
                0    1/2       0    1/2       0    1/2
    x = 1      1/2    0        0    1/2          ?
                0    1/2      1/2    0
    x = 2      1/2    0          ?               ?
                0    1/2

(Each 2 × 2 block lists P(ab|xy), with rows a = 0, 1 and columns b = 0, 1; the entries marked ? are blocks left unspecified.)
Table 6. All inputs for Alice and inputs y ∈ {0, 1} for Bob have uniform probabilities of outcomes, while Bob's outcome for y = 2 is deterministic.
                  y = 0          y = 1          y = 2
    x = 0      1/2    0       1/2    0       1/2    0
                0    1/2       0    1/2      1/2    0
    x = 1      1/2    0        0    1/2      1/2    0
                0    1/2      1/2    0       1/2    0
    x = 2      1/2    0          ?           1/2    0
                0    1/2                     1/2    0

(Each 2 × 2 block lists P(ab|xy), with rows a = 0, 1 and columns b = 0, 1; the entry marked ? is a block left unspecified.)
