Article

Guessing with a Bit of Help †

1 Institute for Data, Systems, and Society and Laboratory for Information & Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2 Department of Electrical Engineering-Systems, Tel Aviv University, Tel Aviv 69978, Israel
* Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in proceedings of IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018.
Entropy 2020, 22(1), 39; https://doi.org/10.3390/e22010039
Submission received: 29 August 2019 / Revised: 22 December 2019 / Accepted: 23 December 2019 / Published: 26 December 2019

Abstract

What is the value of just a few bits to a guesser? We study this problem in a setup where Alice wishes to guess an independent and identically distributed (i.i.d.) random vector and can procure a fixed number of k information bits from Bob, who has observed this vector through a memoryless channel. We are interested in the guessing ratio, which we define as the ratio of Alice's guessing-moments with and without observing Bob's bits. For the case of a uniform binary vector observed through a binary symmetric channel, we provide two upper bounds on the guessing ratio by analyzing the performance of the dictator (for general k ≥ 1) and majority functions (for k = 1). We further provide a lower bound via maximum entropy (for general k ≥ 1) and a lower bound based on Fourier-analytic/hypercontractivity arguments (for k = 1). We then extend our maximum entropy argument to give a lower bound on the guessing ratio for a general channel with a binary uniform input that is expressed using the strong data-processing inequality constant of the reverse channel. We compute this bound for the binary erasure channel and conjecture that greedy dictator functions achieve the optimal guessing ratio.

1. Introduction

In the classical guessing problem, Alice wishes to learn the value of a discrete random variable (r.v.) X as quickly as possible by sequentially asking yes/no questions of the form “Is X = x ?”, until she makes a correct guess. A guessing strategy corresponds to an ordering of the alphabet of X according to which the guesses are made and induces a random guessing time. It is well known and simple to verify that the guessing strategy which simultaneously minimizes all the positive moments of the guessing time is to order the alphabet according to a decreasing order of probability. Formally, for any s > 0 , the minimal sth-order guessing-time moment of X is
$$ G_s(X) := \mathbb{E}\left[ \mathrm{ORD}_X^{s}(X) \right], $$
where ORD_X(x) returns the index of the symbol x relative to the order induced by sorting the probabilities in a descending order, with ties broken arbitrarily. For brevity, we refer to G_s(X) as the guessing-moment of X.
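As a concrete illustration of this definition, the following minimal sketch (an illustration only, not part of the paper) computes G_s(X) for a finite-alphabet distribution by sorting the probabilities in decreasing order, so that the rank of a symbol is exactly ORD_X(x).

```python
import numpy as np

def guessing_moment(p, s):
    """G_s(X) = sum_i p_(i) * i**s with the probabilities p_(i) sorted in decreasing order."""
    p = np.sort(np.asarray(p, dtype=float))[::-1]    # optimal guessing order
    ranks = np.arange(1, len(p) + 1)                 # guessing times 1, 2, ...
    return float(np.sum(p * ranks ** s))

# Example: a uniform 8-ary source, for which G_1 = (8 + 1) / 2 = 4.5.
print(guessing_moment([1 / 8] * 8, s=1.0))
```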
Several motivating problems for studying guesswork are fairness in betting games, computational complexity of sequential decoding [1], computational complexity of lossy source coding and database search algorithms (see the introduction of Reference [2] for a discussion), secrecy systems [3,4,5], and crypt-analysis (password cracking) [6,7]. The guessing problem was first introduced and studied in an information-theoretic framework by Massey [8], who related the average guessing time of an r.v. to its entropy. It was later explored more systematically by Arikan [1], who also introduced the problem of guessing with side information. In this problem, Alice is in possession of another r.v. Y that is jointly distributed with X, and then, the optimal conditional guessing strategy is to guess by decreasing order of conditional probabilities. Hence, the associated minimal conditional sth-order guessing-time moment of X given Y is
$$ G_s(X|Y) := \mathbb{E}\left[ \mathrm{ORD}_{X|Y}^{s}(X \mid Y) \right], $$
where ORD_{X|Y}(x|y) returns the index of x relative to the order induced by sorting the conditional probabilities of X given that Y = y in a descending order. Arikan showed that, as intuition suggests, side information reduces the guessing-moments ([1], Corollary 1)
$$ G_s(X|Y) \le G_s(X). $$
Furthermore, he showed that, if {(X_i, Y_i)}_{i=1}^n is an i.i.d. sequence, then ([1], Proposition 5)
$$ \lim_{n \to \infty} \frac{1}{n} \log G_s^{1/s}(X^n \mid Y^n) = H_{\frac{1}{1+s}}(X_1 \mid Y_1), $$
where H_α(X|Y) is the Arimoto–Rényi conditional entropy of order α. As was noted by Arikan a few years later [9], the guessing moments are related to the large deviations behavior of the random variable (1/n) log ORD_{X^n|Y^n}(X^n|Y^n). However, in Reference [9], he was only able to obtain right-tail large deviation bounds since asymptotically tight bounds on G_s(X^n|Y^n) were only known for positive moments (s > 0). A large deviation principle for the normalized logarithm of the guessing time was later established in Reference [10] using substantial results from References [11,12]. Throughout the years, information-theoretic analysis of the guessing problem was extended in multiple directions, such as guessing until the distortion between the guess and the true value is below a certain threshold [2], guessing under source uncertainty [13], and improved bounds at finite blocklength [14,15,16], to name a few.
In the conditional setting described above, one may think of Y^n as side information observed by a “helper”, say Bob, who sends his observations to Alice. Nonetheless, as in other problems employing a helper (e.g., source coding [17,18]), it is more realistic to impose communication constraints and to assume that Bob can only send a compressed description of Y^n to Alice. This setting was recently addressed by Graczyk and Lapidoth [19,20], who considered the case where Bob encodes Y^n at a positive rate using nR bits before sending this description to Alice. They then characterized the best possible guessing-moments attained by Alice for general distributions as a function of the rate R. In this paper, we take this setting to its extreme and attempt to quantify the value of k bits in terms of reducing the guessing-moments by allowing Bob to use only a k-bit description of Y^n. The major difference from previous work is that, here, k is finite and does not increase with n, and for some of our results, we further concentrate on the extreme case of k = 1, a single bit of help. To that end, we define (Section 2) the guessing ratio, which is the (asymptotically) best possible ratio of the guessing-moments of X^n obtained with and without observing a function f(Y^n) ∈ {0,1}^k, i.e., the minimal possible ratio G_s(X^n | f(Y^n)) / G_s(X^n) as a function of s > 0, in the limit of large n.
Sharply characterizing the guessing ratio appears to be a difficult problem in general. Here, we mostly focus on the special case where X^n is uniformly distributed over the Boolean cube {0,1}^n and Y^n is obtained by passing X^n through a memoryless binary symmetric channel (BSC) with crossover probability δ (Section 3). We derive two upper bounds and two lower bounds on the guessing ratio in this case. The upper bounds are derived by analyzing the ratio attained by two specific functions, k-Dictator, to wit f(Y^n) = Y^k, and Majority, to wit f(Y^n) = 𝟙(∑_{i=1}^n Y_i > n/2), where 𝟙(·) is the indicator function, and for simplicity, we henceforth assume that n is odd when discussing majority functions. For k = 1, we demonstrate that neither of these functions is better than the other for all values of the moment order s. The first lower bound is based on relating the guessing-moment to entropy using maximum-entropy arguments (generalizing a result of Reference [8]), and the second one is based on Fourier-analytic techniques combined with a hypercontractivity argument [21]. Furthermore, for the restricted class of functions whose constituent one-bit functions operate on disjoint sets of bits, a general method is proposed for transforming a lower bound valid for k = 1 into a lower bound valid for any k ≥ 1. Nonetheless, we remark that our bounds are valid for s > 0, and obtaining similar bounds for s < 0 in order to obtain a large deviation principle for the normalized logarithm of the guessing time remains an open problem. In Section 4, we briefly discuss the more general case where X^n is still uniform over the Boolean cube, but Y^n is obtained from X^n via a general binary-input, arbitrary-output channel. We generalize our entropy lower bound to this case using the strong data-processing inequality (SDPI) applied to the reverse channel (from Y to X). We then discuss the case of the binary erasure channel (BEC), for which we also provide an upper bound by analyzing the greedy dictator function, namely where Bob sends the first bit that has not been erased. We conjecture that this function minimizes the guessing-moments simultaneously at all erasure parameters and all moments s.
Related Work. As mentioned above, Graczyk and Lapidoth [19,20] considered the same guessing question when Bob can communicate with Alice at some positive rate R, i.e., can use k = nR bits to describe Y^n. This setup facilitates the use of large-deviation-based information-theoretic techniques, which allowed the authors to characterize the optimal reduction in the guessing-moments as a function of R to the first order in the exponent. This type of argument cannot be applied in our setup of a finite number of bits. Furthermore, as we shall see, in our setup, the exponential order of the guessing moment with help is equal to the one without it, and the performance is therefore more finely characterized by bounding the ratio of the guessing-moments. For a single bit of help, k = 1, characterizing the guessing ratio in the case of the BSC with a uniform input can also be thought of as a guessing variant of the most informative Boolean function problem introduced by Kumar and Courtade [22]. There, the maximal reduction in the entropy of X^n obtainable by observing a Boolean function f(Y^n) is sought after. It was conjectured in Reference [22] that a dictator function, e.g., f(y^n) = y_1, is optimal simultaneously at all noise levels; see References [23,24,25,26] for some recent progress. As in the guessing case, allowing Bob to describe Y^n using nR bits renders the problem amenable to an exact information-theoretic characterization [27]. In another related work [28], we have asked which Boolean function of Y^n maximizes the reduction in the sequential mean-squared prediction error of X^n and showed that the majority function is optimal in the noiseless case. There is, however, no single function that is simultaneously optimal at all noise levels. Finally, in a recent line of works [29,30], the average guessing time using the help of a noisy version of f(X^n) has been considered. The model in this paper is different since the noise is applied to the inputs of the function rather than to its output.

2. Problem Statement

Let X^n be an i.i.d. vector from a distribution P_X, which is transmitted over a memoryless channel of conditional distribution P_{Y|X}. A helper observes Y^n ∈ 𝒴^n at the output of the channel and can send k bits f(Y^n), f: 𝒴^n → {0,1}^k, to a guesser of X^n. Our goal is to characterize the best possible multiplicative reduction in guessing-moments offered by a function f, in the limit of large n. Precisely, we wish to characterize the guessing ratio, defined as
$$ \gamma_{s,k}(P_X, P_{Y|X}) := \limsup_{n \to \infty}\; \min_{f:\, \mathcal{Y}^n \to \{0,1\}^k} \frac{G_s\big(X^n \mid f(Y^n)\big)}{G_s(X^n)} $$
for an arbitrary s > 0. In this paper, we are mostly interested in the case where P_X = (1/2, 1/2), i.e., X^n is uniformly distributed over {0,1}^n, and where the channel is a BSC with crossover probability δ ∈ [0, 1/2]. With a slight abuse of notation, we denote the guessing ratio in this case by γ_{s,k}(δ). Furthermore, some of the results will be restricted to the case of a single bit of help (k = 1), and in this case, we will further abbreviate the notation from γ_{s,1}(δ) to γ_s(δ). We note the following basic facts.
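To make the quantity being minimized concrete, here is a small brute-force sketch (an illustration only, assuming a BSC, k = 1, and a very small n) that computes the finite-n analogue of the ratio in Equation (5) by enumerating every Boolean function f: {0,1}^n → {0,1}; the limit in Equation (5) is of course not computable this way.

```python
import itertools
import numpy as np

def bsc_channel_matrix(n, delta):
    xs = np.array(list(itertools.product([0, 1], repeat=n)))
    dist = (xs[:, None, :] != xs[None, :, :]).sum(axis=2)       # pairwise Hamming distances
    return (1 - delta) ** (n - dist) * delta ** dist              # P(y^n | x^n)

def conditional_guessing_moment(joint, s):
    """G_s(X^n | B) from a joint table P(x^n, b): sort each column in decreasing order."""
    ranks = np.arange(1, joint.shape[0] + 1) ** s
    return sum(np.sum(np.sort(joint[:, b])[::-1] * ranks) for b in range(joint.shape[1]))

def guessing_ratio_finite_n(n, delta, s):
    p_y_given_x = bsc_channel_matrix(n, delta)
    p_x = np.full(2 ** n, 2.0 ** (-n))
    g_uncond = np.sum(p_x * np.arange(1, 2 ** n + 1) ** s)       # uniform source: any order is optimal
    best = np.inf
    for bits in itertools.product([0, 1], repeat=2 ** n):        # every f: {0,1}^n -> {0,1}
        f = np.array(bits)
        joint = np.stack([p_x * (p_y_given_x @ (f == b)) for b in (0, 1)], axis=1)
        best = min(best, conditional_guessing_moment(joint, s))
    return best / g_uncond

print(guessing_ratio_finite_n(n=3, delta=0.1, s=1.0))
```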
Proposition 1.
The following properties hold:
1. 
The minimum in Equation (5) is achieved by a sequence of deterministic functions.
2. 
γ_{s,k}(δ) is a non-decreasing function of δ ∈ [0, 1/2] which satisfies γ_{s,k}(0) = 2^{−sk} and γ_{s,k}(1/2) = 1. In addition, γ_{s,k}(0) is attained by any sequence of functions f_n such that f_n(Y^n) is a uniform Bernoulli vector, i.e., Pr(f_n(Y^n) = b^k) = 2^{−k} for all b^k ∈ {0,1}^k.
3. 
For a BSC P Y | X , the limit-supremum in Equation (5) defining γ s , k ( δ ) is a regular limit.
4. 
If k = 1 and X n is a uniformly distributed vector, then the optimal guessing order given that f ( Y n ) = 0 is reversed to the optimal guessing order when f ( Y n ) = 1 .
Proof. 
See Appendix A. □

3. Guessing Ratio for a Binary Symmetric Channel

3.1. Main Results

We begin by presenting the bound on the guessing ratio γ s , k ( δ ) obtained by k-dictator functions and then proceed to the bound obtained by majority functions for a single bit of help, k = 1 . The proofs are given in the next two subsections.
Theorem 1.
Let
$$ L_{k,w} := \sum_{v=0}^{w} \binom{k}{v}, \qquad w \in \{0, 1, \ldots, k\}. $$
The guessing ratio is upper bounded as
$$ \gamma_{s,k}(\delta) \le (1-2\delta) \cdot 2^{-sk} \cdot \sum_{w=0}^{k-1} (1-\delta)^{k-1-w}\, \delta^{w} \cdot L_{k,w}^{s+1} + (2\delta)^{k}, $$
and this upper bound is achieved by k-dictator functions, f(y^n) = y^k.
Specifically, for k = 1, Theorem 1 implies
$$ \gamma_s(\delta) \le (1-2\delta) \cdot 2^{-s} + 2\delta. $$
Theorem 2.
Let β := (1−2δ)/√(4δ(1−δ)) and Z ∼ N(0,1), and denote by Q(·) the tail distribution function of the standard normal distribution. Then, the guessing ratio is upper bounded as
$$ \gamma_s(\delta) \le 2(s+1) \cdot \mathbb{E}\left[ Q(\beta Z) \cdot \big(1 - Q(Z)\big)^{s} \right], $$
and this upper bound is achieved by majority functions, f(y^n) = 𝟙(∑_{i=1}^n y_i > n/2).
We remark that, if k = 1, the guessing ratio of functions similar to the dictator and majority functions, such as a single-bit dictator on j > 1 inputs (f(y^n) = 1 if and only if y^j = 1^j) or an unbalanced majority (f(y^n) = 𝟙(∑_{i=1}^n y_i > t) for some t), may also be analyzed in a similar way. However, numerical computations indicate that they do not improve the bounds of Theorems 1 and 2, and thus, their analysis is omitted.
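For reference, the two upper bounds of Theorems 1 and 2 can be evaluated numerically as follows. This is only a sketch of such an evaluation (not the code used to produce the figures of this paper), where Q is computed via scipy's normal survival function and the expectation over Z ∼ N(0,1) by numerical integration.

```python
from math import comb, sqrt
from scipy.integrate import quad
from scipy.stats import norm

def dictator_bound(delta, s, k=1):
    L = lambda w: sum(comb(k, v) for v in range(w + 1))            # L_{k,w}
    head = sum((1 - delta) ** (k - 1 - w) * delta ** w * L(w) ** (s + 1) for w in range(k))
    return (1 - 2 * delta) * 2.0 ** (-s * k) * head + (2 * delta) ** k

def majority_bound(delta, s):
    beta = (1 - 2 * delta) / sqrt(4 * delta * (1 - delta))
    Q = norm.sf                                                     # standard normal tail
    integrand = lambda z: Q(beta * z) * (1 - Q(z)) ** s * norm.pdf(z)
    expectation, _ = quad(integrand, -10, 10)
    return 2 * (s + 1) * expectation

for delta in (0.05, 0.2, 0.4):
    print(delta, dictator_bound(delta, s=1), majority_bound(delta, s=1))
```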
We next present two lower bounds on the guessing ratio γ s , k ( δ ) . The first is based on maximum-entropy arguments, and the second is based on Fourier-analytic arguments.
Theorem 3.
The guessing ratio satisfies the following lower bound:
$$ \gamma_{s,k}(\delta) \ge \frac{e^{-1} \cdot s^{s-1} \cdot (s+1)}{\Gamma^{s}(1/s)} \cdot 2^{-s k (1-2\delta)^2}, $$
where Γ(z) := ∫_0^∞ t^{z−1} e^{−t} dt is Euler's Gamma function (defined for ℜ{z} > 0).
Remark 1.
When restricted to k = 1, the proof of Theorem 3 utilizes the bound H(X^n | f(Y^n)) ≥ n − (1−2δ)^2 (see Equation (63)). For balanced functions, this bound was improved in Reference [23] for (1/2)(1 − 1/√3) ≤ δ ≤ 1/2. Using this improved bound here leads to an immediate improvement in the bound of Theorem 3. Furthermore, it is known [24] that there exists δ_0 such that the most informative Boolean function conjecture holds for all δ_0 ≤ δ ≤ 1/2. For such crossover probabilities,
$$ H\big(X^n \mid f(Y^n)\big) \ge n - 1 + h(\delta) $$
holds, and then, Theorem 3 may be improved to
$$ \gamma_s(\delta) \ge \frac{e^{-1} \cdot s^{s-1} \cdot (s+1)}{\Gamma^{s}(1/s)} \cdot 2^{-s (1 - h(\delta))}. $$
Our Fourier-based bound for k = 1 is as follows:
Theorem 4.
Let τ := 1 + (1−2δ)^{2(1−λ)}. The guessing ratio is lower bounded as
$$ \gamma_s(\delta) \ge \max_{0 \le \lambda \le 1} \left[ 1 - (s+1) \cdot (1-2\delta)^{\lambda} \cdot \big( \tau s + 1 \big)^{-1/\tau} \right]. $$
This bound can be weakened by the possibly suboptimal choice λ = 1 , which leads to a simpler yet explicit bound:
Corollary 1.
$$ \gamma_s(\delta) \ge 1 - \frac{(s+1) \cdot (1-2\delta)}{\sqrt{1 + 2s}}. $$
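The two lower bounds can be evaluated just as easily; the following sketch (again, only an illustration) computes the maximum-entropy bound of Theorem 3 through the constant Ψ_s = e^{−1} s^{s−1} / Γ^s(1/s) appearing in its proof, and the Fourier-analytic bound of Corollary 1, clipping the latter at zero (a trivially valid step that is not part of the statement).

```python
from math import e, gamma, sqrt

def max_entropy_lower_bound(delta, s, k=1):
    psi_s = (s ** (s - 1) / e) / gamma(1 / s) ** s                 # Psi_s from the proof of Theorem 3
    return psi_s * (s + 1) * 2.0 ** (-s * k * (1 - 2 * delta) ** 2)

def fourier_lower_bound(delta, s):
    # Corollary 1; the clipping at zero is added here (the guessing ratio is nonnegative anyway).
    return max(0.0, 1 - (s + 1) * (1 - 2 * delta) / sqrt(1 + 2 * s))

for delta in (0.05, 0.2, 0.4):
    print(delta, max_entropy_lower_bound(delta, s=1), fourier_lower_bound(delta, s=1))
```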
The bound in Theorem 4 is only valid for the case k = 1. An interesting problem is to find a general way of “transforming” a lower bound which assumes k = 1 into a bound useful for k > 1. In principle, such a result could stem from the observation that a k-bit function provides k different conditional optimal guessing orders, one for each of its output bits. For a general function, however, distilling a useful bound from this observation seems challenging since the relation between the optimal guessing order induced by each of the bits and the optimal guessing order induced by all k bits might be involved. Nonetheless, such a result is possible to obtain if each of the k single-bit functions operates on a different set of input bits. For this restricted set of functions, there is a simple bound which relates the optimal ordering given each of the bits and given all the k bits together. It is reasonable to conjecture that this restricted sub-class is optimal or at least close to optimal, since it seems that more information is transferred to the guesser when the k functions operate on different sets of bits, which makes the k functions statistically independent.
Specifically, let us specify a k-bit function f: 𝒴^n → {0,1}^k by its k constituent one-bit functions f_j: 𝒴^n → {0,1}, j ∈ [k]. Let F_k be the set of sequences of functions {f^{(n)}}, f^{(n)}: 𝒴^n → {0,1}^k, such that each specific sequence of functions {f^{(n)}} satisfies the following property: There exists a sequence of partitions {{I_j^{(n)}}_{j∈[k]}}_{n=1}^∞ of [n] such that, for all n ≥ 1 and j ∈ [k], f_j^{(n)}(Y^n) only depends on {Y_i}_{i ∈ I_j^{(n)}}, and lim_{n→∞} |I_j^{(n)}| = ∞ for all j ∈ [k]. In particular, this implies that {f_j^{(n)}(Y^n)}_{j∈[k]} are mutually independent for all n ≥ 1. For example, when k = 2, f_1(x^n) = x_1, and f_2(x^n) = x_2, we can choose I_1^{(n)} and I_2^{(n)} to be the odd/even indices. For f_1 = Maj(y_1^{n/2}) and f_2 = Maj(y_{n/2+1}^n), the sets are the first and second halves of [n]. As in Equation (5), we may define the guessing ratio of this constrained set of functions as
$$ \tilde{\gamma}_{s,k}(\delta) := \min_{\{f^{(n)}\}_{n=1}^{\infty} \in F_k}\; \limsup_{n \to \infty} \frac{G_s\big(X^n \mid f^{(n)}(Y^n)\big)}{G_s(X^n)}, $$
where, in general, γ̃_{s,k}(δ) ≥ γ_{s,k}(δ).
Proposition 2.
$$ \tilde{\gamma}_{s,k}(\delta) \;\ge\; \frac{\tilde{\gamma}_{s,1}^{\,k}(\delta)}{(s+1)^{k-1}}. $$
We demonstrate our results for k = 1 in Figure 1 (resp. Figure 2), which display the bounds on γ_s(δ) for fixed values of s (resp. δ). The numerical results show that, for the upper bounds, when s ≲ 3.5, dictator dominates majority (for all values of δ), whereas for s ≳ 4.25, majority dominates dictator. For 3.5 ≲ s ≲ 4.25, there exists δ_s such that majority is better for δ ∈ (0, δ_s) and dictator is better for δ ∈ (δ_s, 1/2). Figure 2 demonstrates the switch from dictator to majority as s increases (depending on δ). As for the lower bounds, we first remark that the conjectured maximum-entropy bound (Equation (11)) is also plotted (see Remark 1). The numerical results show that the maximum-entropy bound is better for low values of δ, whereas the Fourier-analysis bound is better for high values of δ. As a function of s, the maximum-entropy bound (resp. Fourier-analysis bound) is better for high (resp. low) values of s. We also mention that, in these figures, the maximizing parameter in the Fourier-based bound (Theorem 4) is λ = 1 and the resulting bound is as in Equation (13). However, for values of s as low as 10, the maximizing λ may be far from 1, and in fact, it continuously and monotonically increases from 0 to 1 as δ increases from 0 to 1/2. Finally, Figure 3 demonstrates the behavior of the k-dictator and maximum-entropy bounds on γ_{s,k}(δ) as a function of k.

3.2. Proofs of the Upper Bounds on γ s , k ( δ )

Let a, b ∈ ℕ, a < b, be given. The following sum will be useful for the proofs in the rest of the paper:
$$ K_s(a, b) := \frac{1}{b-a} \sum_{i=a+1}^{b} i^{s}, $$
where we will abbreviate K_s(b) := K_s(0, b). For a pair of sequences {a_n}_{n=1}^∞, {b_n}_{n=1}^∞, we will write a_n ∼ b_n to mean that lim_{n→∞} a_n/b_n = 1.
Lemma 1.
Let {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ be non-decreasing integer sequences such that a_n < b_n for all n and lim_{n→∞} (a_n + 1)/b_n = 0. Then,
$$ K_s(a_n, b_n) \sim \frac{1}{s+1} \cdot \frac{b_n^{s+1} - a_n^{s+1}}{b_n - a_n}. $$
Specifically, G_s(X^n) = K_s(2^n) ∼ 2^{sn}/(s+1).
Proof. 
See Appendix A. □
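A quick numerical check of the special case in Lemma 1 (a sanity check only, with arbitrarily chosen s and n) is given below; the printed ratios approach 1 as n grows.

```python
import numpy as np

def K(s, b, a=0):
    i = np.arange(a + 1, b + 1, dtype=float)
    return np.sum(i ** s) / (b - a)

s = 2.0
for n in (6, 10, 14):
    print(n, K(s, 2 ** n) / (2.0 ** (s * n) / (s + 1)))    # ratio tends to 1
```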
We next prove Theorem 1.
Proof of Theorem 1.
Consider a k-dictator function which directly outputs k of the bits of y^n, say, without loss of generality (w.l.o.g.), f(y^n) = y^k. Let d_H(x^n, y^n) be the Hamming distance of x^n and y^n, and recall the assumption 0 < δ < 1/2. It is easily verified that the optimal guessing order of X^n given y^k has k + 1 parts, such that the wth part, w ∈ {0, 1, …, k}, is comprised of an arbitrary ordering of the \binom{k}{w} · 2^{n−k} vectors for which d_H(x^k, y^k) = w. From symmetry, G_s(X^n | f(Y^n)) = G_s(X^n | f(Y^n) = b^k) for any b^k ∈ {0,1}^k. Then, from Lemma 1,
$$
\begin{aligned}
G_s\big(X^n \mid f(Y^n) = b^k\big) &= \sum_{w=0}^{k} \binom{k}{w} (1-\delta)^{k-w} \delta^{w} \cdot K_s\!\left( 2^{n-k} L_{k,w-1},\, 2^{n-k} L_{k,w} \right) \\
&\sim \sum_{w=0}^{k} \binom{k}{w} (1-\delta)^{k-w} \delta^{w} \cdot \frac{2^{s(n-k)}}{s+1} \cdot \frac{L_{k,w}^{s+1} - L_{k,w-1}^{s+1}}{\binom{k}{w}} \\
&= \frac{2^{s(n-k)}}{s+1} \sum_{w=0}^{k} (1-\delta)^{k-w} \delta^{w} \left( L_{k,w}^{s+1} - L_{k,w-1}^{s+1} \right) \\
&= \frac{2^{s(n-k)}}{s+1} \left[ (1-2\delta) \sum_{w=0}^{k-1} (1-\delta)^{k-1-w} \delta^{w} L_{k,w}^{s+1} + \delta^{k}\, 2^{k(s+1)} \right],
\end{aligned}
$$
where, in the first equality, L_{k,−1} := 0, and the last equality is obtained by telescoping the sum. The result then follows from Equation (5) and Lemma 1. □
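The telescoped expression can be checked numerically against the exact block formula from the first step of the proof; the following sketch (an informal verification, with arbitrary n, k, δ, s) prints a ratio close to 1.

```python
import numpy as np
from math import comb

def K(s, a, b):
    i = np.arange(a + 1, b + 1, dtype=float)
    return np.sum(i ** s) / (b - a)

def exact_block_formula(n, k, delta, s):
    L = [0] + [sum(comb(k, v) for v in range(w + 1)) for w in range(k + 1)]   # shifted so that L[w] = L_{k,w-1}
    return sum(comb(k, w) * (1 - delta) ** (k - w) * delta ** w
               * K(s, 2 ** (n - k) * L[w], 2 ** (n - k) * L[w + 1])
               for w in range(k + 1))

def telescoped_asymptotic(n, k, delta, s):
    L = [sum(comb(k, v) for v in range(w + 1)) for w in range(k + 1)]
    head = (1 - 2 * delta) * sum((1 - delta) ** (k - 1 - w) * delta ** w * L[w] ** (s + 1)
                                 for w in range(k))
    return 2.0 ** (s * (n - k)) / (s + 1) * (head + delta ** k * 2 ** (k * (s + 1)))

print(exact_block_formula(14, 2, 0.1, 1.0) / telescoped_asymptotic(14, 2, 0.1, 1.0))   # close to 1
```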
We next prove Theorem 2.
Proof of Theorem 2.
Recall that we assume for simplicity that n is odd. The analysis for an even n is not fundamentally different. To evaluate the guessing-moment, we first need to find the optimal guessing strategy. To this end, we let W_H(x^n) be the Hamming weight of x^n and note that the posterior probability is given by
$$
\begin{aligned}
\Pr\big(X^n = x^n \mid \mathrm{Maj}(Y^n) = 1\big) &= \frac{\Pr\big(\mathrm{Maj}(Y^n) = 1 \mid X^n = x^n\big) \cdot \Pr(X^n = x^n)}{\Pr\big(\mathrm{Maj}(Y^n) = 1\big)} \\
&= 2^{1-n} \cdot \Pr\left( \sum_{i=1}^{n} Y_i > n/2 \;\middle|\; X^n = x^n \right) \\
&= 2^{1-n} \cdot \Pr\left( \sum_{i=1}^{n} Y_i > n/2 \;\middle|\; W_H(X^n) = W_H(x^n) \right) \\
&=: 2^{1-n} \cdot r_n\big(W_H(x^n)\big),
\end{aligned}
$$
where Equation (25) follows from symmetry. Evidently, r_n(w) is an increasing function of w ∈ {0, 1, …, n}. Indeed, let Bin(n, δ) be a binomial r.v. of n trials and success probability δ. Then, for any w ≤ n − 1, as δ ≤ 1/2,
$$
\begin{aligned}
r_n(w+1) &= \Pr\big( \mathrm{Bin}(w+1, 1-\delta) + \mathrm{Bin}(n-w-1, \delta) > n/2 \big) \\
&= \Pr\big( \mathrm{Bin}(w, 1-\delta) + \mathrm{Bin}(1, 1-\delta) + \mathrm{Bin}(n-w-1, \delta) > n/2 \big) \\
&\ge \Pr\big( \mathrm{Bin}(w, 1-\delta) + \mathrm{Bin}(1, \delta) + \mathrm{Bin}(n-w-1, \delta) > n/2 \big) \\
&= \Pr\big( \mathrm{Bin}(w, 1-\delta) + \mathrm{Bin}(n-w, \delta) > n/2 \big) \\
&= r_n(w),
\end{aligned}
$$
where, in each of the above probabilities, the summation is of independent binomial r.v.'s. Hence, we deduce that, whenever Maj(Y^n) = 1 (resp. Maj(Y^n) = 0), the optimal guessing strategy is by decreasing (resp. increasing) Hamming weight (with arbitrary order for inputs of equal Hamming weight).
We can now turn to evaluate the guessing-moment for the optimal strategy given the majority of Y^n. Let M_{n,w} := ∑_{v=0}^{w} \binom{n}{v} for w ∈ {0, 1, …, n}. From symmetry,
$$ G_s\big(X^n \mid \mathrm{Maj}(Y^n)\big) = G_s\big(X^n \mid \mathrm{Maj}(Y^n) = 1\big) = \sum_{w=0}^{n} 2^{1-n}\, r_n(n-w) \sum_{i = M_{n,w-1}+1}^{M_{n,w}} i^{s}, $$
where M_{n,−1} := 0 and the wth block of the (decreasing Hamming weight) guessing order collects the vectors of weight n − w. Thus,
$$
\begin{aligned}
G_s\big(X^n \mid \mathrm{Maj}(Y^n)\big) &\ge \sum_{w=0}^{n} \binom{n}{w}\, 2^{1-n}\, r_n(n-w)\, M_{n,w-1}^{s} \\
&= 2^{sn+1} \cdot \mathbb{E}\left[ r_n(n-W) \left( \frac{M_{n,W-1}}{2^{n}} \right)^{\!s} \right] \\
&= 2^{sn+1} \cdot \mathbb{E}\Big[ r_n(n-W)\, \Pr\big( W' \le W - 1 \big)^{s} \Big],
\end{aligned}
$$
where W, W′ ∼ Bin(n, 1/2) are independent. For evaluating the asymptotic behavior (for large n) of this expression, we note that the Berry–Esseen central-limit theorem ([31], Chapter XVI.5, Theorem 2) leads to (see, e.g., Reference [28], proof of Lemma 15)
$$ r_n(w) = Q\!\left( \beta \cdot \frac{2}{\sqrt{n}} \left( \frac{n}{2} - w \right) \right) + O\!\left( \frac{a_\delta}{\sqrt{n}} \right), $$
for some universal constant a_δ. Using the Berry–Esseen central-limit theorem again, we have that (2/√n)(n/2 − W) →_d Z, where Z ∼ N(0,1) and →_d denotes convergence in distribution. Thus, for a given w,
$$ \Pr\big(W' \le w - 1\big) = \Pr\!\left( \frac{2}{\sqrt{n}}\Big(\frac{n}{2} - W'\Big) \ge \frac{2}{\sqrt{n}}\Big(\frac{n}{2} - w + 1\Big) \right) = Q\!\left( \frac{2}{\sqrt{n}}\Big(\frac{n}{2} - w + 1\Big) \right) + O\!\left( \frac{a_{1/2}}{\sqrt{n}} \right) = Q\!\left( \frac{2}{\sqrt{n}}\Big(\frac{n}{2} - w\Big) \right) + O\!\left( \frac{1}{\sqrt{n}} \right), $$
where the last equality follows from the fact that |Q′(t)| ≤ 1/√(2π) for all t ∈ ℝ. Using the Berry–Esseen theorem once again, we have that (2/√n)(n/2 − W) →_d Z. Hence, Portmanteau's lemma (e.g., Reference [31], Chapter VIII.1, Theorem 1) and the fact that Q(t) is continuous and bounded result in the following:
$$ G_s\big(X^n \mid \mathrm{Maj}(Y^n)\big) \ge 2^{sn+1} \cdot \left( \mathbb{E}\Big[ Q(\beta Z) \cdot \big(1 - Q(Z)\big)^{s} \Big] + O\!\left( n^{-s/2} \right) \right).
$$
Similarly to Equation (34), the upper bound
$$ G_s\big(X^n \mid \mathrm{Maj}(Y^n)\big) \le \sum_{w=0}^{n} \binom{n}{w}\, 2^{1-n}\, r_n(n-w)\, M_{n,w}^{s} $$
holds, and a similar analysis leads to an expression which asymptotically coincides with the right-hand side (r.h.s.) of Equation (41). The result then follows from Equation (5) and Lemma 1. □

3.3. Proofs of the Lower Bounds on γ s , k ( δ )

To prove Theorem 3, we first prove the following maximum entropy result. With a standard abuse of notation, we will write the guessing-moment and the entropy of a random variable as functions of its distribution.
Lemma 2.
The maximal entropy under a guessing-moment constraint satisfies
$$ \max_{P:\; G_s(P) = g} H(P) = \log\!\left( e^{1/s} \cdot s^{(1-s)/s} \cdot G_s^{1/s}(P) \cdot \Gamma\!\left(\frac{1}{s}\right) \right) + o(1), $$
where the o(1) term vanishes as g → ∞.
Proof. 
To solve the maximum entropy problem ([32], Chapter 12) in Equation (43) (note that the support of P is only restricted to be countable), we first relax the constraint G_s(P) = g to
$$ \sum_{i=1}^{\infty} P(i) \cdot i^{s} = g, $$
i.e., we omit the requirement that {P(i)} is a decreasing sequence. Assuming momentarily that the entropy is measured in nats, it is easily verified (e.g., using the theory of exponential families ([33], Chapter 3) or by Lagrange duality ([34], Chapter 5)) that the entropy-maximizing distribution is
$$ P_\lambda(i) := \frac{\exp(-\lambda i^{s})}{Z(\lambda)} $$
for i ∈ ℕ_+, where Z(λ) := ∑_{i=1}^∞ exp(−λ i^s) is the partition function and λ > 0 is chosen such that ∑_{i=1}^∞ P_λ(i) · i^s = g. Evidently, P_λ(i) is in decreasing order (and so G_s(P_λ) = g) and is therefore the solution to Equation (43). The resulting maximum entropy is then given in a parametric form as
$$ H(P_\lambda) = \lambda\, G_s(P_\lambda) + \ln Z(\lambda). $$
Evidently, if g = G_s(P_λ) → ∞, then λ → 0. In this case, we may approximate the limit of the partition function as λ → 0 by a Riemann integral. Specifically, by the monotonicity of e^{−λ i^s} in i ∈ ℕ,
$$
Z(\lambda) = \sum_{i=1}^{\infty} e^{-\lambda i^{s}}
= \frac{1}{2}\left[ \sum_{i=-\infty}^{\infty} \exp\!\left( -\big(|i|\,\lambda^{1/s}\big)^{s} \right) - 1 \right]
\ge \frac{1}{2} \int_{-\infty}^{\infty} \exp\!\left( -\big(|t|\,\lambda^{1/s}\big)^{s} \right) dt - \frac{1}{2}
= \frac{1}{s\,\lambda^{1/s}} \cdot \Gamma\!\left(\frac{1}{s}\right) - \frac{1}{2},
$$
where the last equality follows from the definition of the Gamma function (see Theorem 3) or from the identification of the integral as an unnormalized generalized Gaussian density of zero mean, scale parameter λ^{−1/s}, and shape parameter s [35]. Further, by the convexity of e^{−λ t^s} in t ∈ ℝ_+, Jensen's inequality implies that
$$ e^{-\lambda i^{s}} \le \int_{i-1/2}^{i+1/2} \exp\!\left( -\lambda |t|^{s} \right) dt $$
for every i ≥ 1 (the r.h.s. can be considered as averaging over a uniform random variable on [i − 1/2, i + 1/2]), and so, similarly to Equation (50),
$$ Z(\lambda) \le \frac{1}{2} \int_{-\infty}^{\infty} \exp\!\left( -\big(|t|\,\lambda^{1/s}\big)^{s} \right) dt. $$
Therefore,
$$ Z(\lambda) = (1 + a_\lambda) \cdot \frac{1}{s\,\lambda^{1/s}} \cdot \Gamma\!\left(\frac{1}{s}\right), $$
where a_λ → 0 as λ → 0. In the same spirit,
$$
\begin{aligned}
G_s(P_\lambda) &= \sum_{i=1}^{\infty} i^{s} \cdot \frac{\exp(-\lambda i^{s})}{Z(\lambda)}
= \frac{ \int_{0}^{\infty} t^{s} \exp\!\left( -\big(|t|\,\lambda^{1/s}\big)^{s} \right) dt + b_\lambda }{ (1 + a_\lambda) \cdot \frac{1}{s\,\lambda^{1/s}} \cdot \Gamma\!\left(\frac{1}{s}\right) } \\
&= \frac{ \frac{1}{s}\, \lambda^{-\frac{s+1}{s}} \cdot \Gamma\!\left(\frac{s+1}{s}\right) + b_\lambda }{ (1 + a_\lambda) \cdot \frac{1}{s\,\lambda^{1/s}} \cdot \Gamma\!\left(\frac{1}{s}\right) }
= \frac{ \frac{1}{s^{2}}\, \lambda^{-\frac{s+1}{s}} \cdot \Gamma\!\left(\frac{1}{s}\right) + b_\lambda }{ (1 + a_\lambda) \cdot \frac{1}{s\,\lambda^{1/s}} \cdot \Gamma\!\left(\frac{1}{s}\right) }
= \frac{1}{s\,\lambda} \cdot (1 + c_\lambda),
\end{aligned}
$$
where in Equation (56), b_λ → 0 as λ → 0; in Equation (57), the identity Γ(t + 1) = tΓ(t) for t ∈ ℝ_+ was used; and in Equation (58), c_λ → 0 as λ → 0.
Returning to measuring entropy in bits, we thus obtain that, for any distribution P,
$$ H(P) \le \log\!\left( e^{1/s} \cdot s^{(1-s)/s} \cdot G_s^{1/s}(P) \cdot \Gamma\!\left(\frac{1}{s}\right) \right) + o(1), $$
or, equivalently,
$$ G_s(P) \ge \Psi_s \cdot 2^{\,s H(P)} \cdot (1 + o(1)), $$
where Ψ_s := e^{−1} · s^{s−1} / Γ^s(1/s) and o(1) is a vanishing term as G_s(P) → ∞. In the same spirit, Equation (60) holds whenever H(P) → ∞. □
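The near-tightness of Equation (60) for the maximum-entropy family P_λ can be illustrated numerically; in the sketch below (an illustration only, with a finite support truncation chosen large enough for the listed values of λ), the printed ratio G_s(P_λ)/(Ψ_s 2^{sH(P_λ)}) approaches 1 as λ → 0.

```python
import numpy as np
from math import e, gamma

def check(lam, s, support=10**6):
    i = np.arange(1, support + 1, dtype=float)
    w = np.exp(-lam * i ** s)
    p = w / w.sum()                                  # truncated P_lambda, already decreasing in i
    mask = p > 0
    G = np.sum(p * i ** s)                           # guessing moment G_s(P_lambda)
    H = -np.sum(p[mask] * np.log2(p[mask]))          # entropy in bits
    psi = (s ** (s - 1) / e) / gamma(1 / s) ** s     # Psi_s
    return G / (psi * 2.0 ** (s * H))                # should approach 1 as lam -> 0

for lam in (1e-2, 1e-3, 1e-4):
    print(lam, check(lam, s=1.0))
```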
Remark 2.
In Reference [8], the maximum-entropy problem was studied for s = 1 . In this case, the maximum-entropy distribution is readily identified as the geometric distribution. The proof above generalizes that result to any s > 0 .
Proof of Theorem 3.
Assume that f is taken from a sequence of functions which achieves the minimum in Equation (5). Using Lemma 2 when conditioning on f(Y^n) = b^k for each possible b^k, we get (see a rigorous justification of Equation (61) in Appendix A)
$$
\begin{aligned}
G_s\big(X^n \mid f(Y^n)\big) &\ge \ell_n \cdot \Psi_s \cdot \sum_{b^k \in \{0,1\}^k} \Pr\big(f(Y^n) = b^k\big) \cdot 2^{\,s H(X^n \mid f(Y^n) = b^k)} \\
&\ge \ell_n \cdot \Psi_s \cdot 2^{\,s H(X^n \mid f(Y^n))} \\
&\ge \ell_n \cdot \Psi_s \cdot 2^{\,s\left[ n - k(1-2\delta)^2 \right]},
\end{aligned}
$$
where, in Equation (61), ℓ_n → 1 as n → ∞, and Equation (62) follows from Jensen's inequality. For k = 1, the bound in Equation (63) is directly related to the Boolean function conjecture [22] and may be proved in several ways, e.g., using Mrs. Gerber's Lemma ([36], Theorem 1); see ([23], Section IV) and References [27,37]. For general k ≥ 1, the bound H(X^n | f(Y^n)) ≥ n − k(1−2δ)^2 was established in Reference ([27], Corollary 1). □
Before presenting the proof of the Fourier-based bound, we briefly remind the reader of the basic definitions and results of Fourier analysis of Boolean functions [21], and to that end, it is convenient to replace the binary alphabet {0,1} by {−1,1}. An inner product between two real-valued functions on the Boolean cube f, g: {−1,1}^n → ℝ is defined as
$$ \langle f, g \rangle := \mathbb{E}\left[ f(X^n)\, g(X^n) \right], $$
where X^n ∈ {−1,1}^n is a uniform Bernoulli vector. A character associated with a set of coordinates S ⊆ [n] := {1, 2, …, n} is the Boolean function x^S := ∏_{i∈S} x_i, where by convention, x^∅ := 1. It can be shown ([21], Chapter 1) that the set of all characters forms an orthonormal basis with respect to the inner product (Equation (64)). Furthermore,
$$ f(x^n) = \sum_{S \subseteq [n]} \hat{f}_S \cdot x^S, $$
where {f̂_S}_{S⊆[n]} are the Fourier coefficients of f, given by f̂_S = ⟨x^S, f⟩ = E[X^S · f(X^n)]. Plancherel's identity then states that ⟨f, g⟩ = E[f(X^n)g(X^n)] = ∑_{S⊆[n]} f̂_S ĝ_S. The p-norm of a function f is defined as ‖f‖_p := [E|f(X^n)|^p]^{1/p}.
The noise operator operating on a Boolean function f is defined as
$$ T_\rho f(x^n) = \mathbb{E}\left[ f(Y^n) \mid X^n = x^n \right], $$
where ρ := 1 − 2δ is the correlation parameter. The noise operator has a smoothing effect on the function, which is captured by the so-called hypercontractivity theorems. Specifically, we shall use the following version.
Theorem 5
([21], p. 248). Let f: {−1,1}^n → ℝ and 0 ≤ ρ ≤ 1. Then, ‖T_ρ f‖_2 ≤ ‖f‖_{1+ρ^2}.
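The Fourier machinery above is easy to reproduce by brute force for small n; the following sketch (not from the paper; the random test function and the parameters n, ρ are arbitrary choices) expands a real-valued function on {−1,1}^n in the character basis, applies T_ρ on the Fourier side, and checks the hypercontractive inequality of Theorem 5 on a random instance.

```python
import itertools
import numpy as np

n, rho = 4, 0.6
cube = np.array(list(itertools.product([-1, 1], repeat=n)))              # all 2^n points of {-1,1}^n
rng = np.random.default_rng(0)
f = rng.normal(size=2 ** n)                                              # an arbitrary real-valued test function

subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
chars = np.array([np.prod(cube[:, list(S)], axis=1) for S in subsets])   # character table x^S
fhat = chars @ f / 2 ** n                                                # Fourier coefficients f_hat_S
T_rho_f = (rho ** np.array([len(S) for S in subsets]) * fhat) @ chars    # noise operator on the Fourier side

lhs = np.sqrt(np.mean(T_rho_f ** 2))                                     # ||T_rho f||_2
rhs = np.mean(np.abs(f) ** (1 + rho ** 2)) ** (1 / (1 + rho ** 2))       # ||f||_{1+rho^2}
print(lhs, "<=", rhs)                                                    # Theorem 5 guarantees lhs <= rhs
```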
With the above, we can prove Theorem 4.
Proof of Theorem 4.
From Bayes law (recall that f(x^n) ∈ {−1, 1}),
$$ \Pr\big(X^n = x^n \mid f(Y^n) = b\big) = 2^{-(n+1)} \cdot \frac{1 + b\, T_\rho f(x^n)}{\Pr\big(f(Y^n) = b\big)}, $$
and from the law of total expectation,
$$ G_s\big(X^n \mid f(Y^n)\big) = \Pr\big(f(Y^n) = 1\big) \cdot G_s\big(X^n \mid f(Y^n) = 1\big) + \Pr\big(f(Y^n) = -1\big) \cdot G_s\big(X^n \mid f(Y^n) = -1\big). $$
Let us denote f̂_∅ = E[f(X^n)] and g := f − f̂_∅, and abbreviate ORD_{T_ρ g}(x^n) := ORD_{X^n | f(Y^n)}(x^n | 1). Then, the first addend on the r.h.s. of Equation (68) is given by
$$
\begin{aligned}
\Pr\big(f(Y^n) = 1\big) \cdot G_s\big(X^n \mid f(Y^n) = 1\big) &= 2^{-(n+1)} \sum_{x^n} \left[ 1 + \hat{f}_\emptyset + T_\rho g(x^n) \right] \cdot \mathrm{ORD}_{T_\rho g}^{s}(x^n) \\
&= \frac{1 + \hat{f}_\emptyset}{2} \cdot \mathbb{E}\left[ \mathrm{ORD}_{T_\rho g}^{s}(X^n) \right] + \frac{1}{2} \left\langle T_\rho g,\, \mathrm{ORD}_{T_\rho g}^{s} \right\rangle \\
&= \frac{1 + \hat{f}_\emptyset}{2} \cdot K_s(2^n) + \frac{1}{2} \left\langle T_\rho g,\, \mathrm{ORD}_{T_\rho g}^{s} \right\rangle \\
&= \frac{1 + \hat{f}_\emptyset}{2} \cdot \ell_n \cdot \frac{2^{sn}}{s+1} + \frac{1}{2} \left\langle T_\rho g,\, \mathrm{ORD}_{T_\rho g}^{s} \right\rangle,
\end{aligned}
$$
where, in the last equality, ℓ_n → 1 (Lemma 1). Let λ ∈ [0, 1], and denote ρ_1 := ρ^λ and ρ_2 := ρ^{1−λ}. Then, the inner-product term in Equation (72) is upper bounded as
$$
\begin{aligned}
\left\langle T_\rho g,\, \mathrm{ORD}_{T_\rho g}^{s} \right\rangle &= \left\langle T_{\rho_1} g,\, T_{\rho_2} \mathrm{ORD}_{T_\rho g}^{s} \right\rangle \\
&\le \left\| T_{\rho_1} g \right\|_2 \cdot \left\| T_{\rho_2} \mathrm{ORD}_{T_\rho g}^{s} \right\|_2 \\
&\le \rho_1 \cdot \sqrt{1 - \hat{f}_\emptyset^{2}} \cdot \left\| T_{\rho_2} \mathrm{ORD}_{T_\rho g}^{s} \right\|_2 \\
&\le \rho_1 \cdot \sqrt{1 - \hat{f}_\emptyset^{2}} \cdot \left\| \mathrm{ORD}_{T_\rho g}^{s} \right\|_{1+\rho_2^{2}} \\
&= \rho_1 \cdot \sqrt{1 - \hat{f}_\emptyset^{2}} \cdot \left[ K_{(1+\rho_2^{2})s}(2^n) \right]^{1/(1+\rho_2^{2})} \\
&= \rho_1 \cdot \sqrt{1 - \hat{f}_\emptyset^{2}} \cdot k_n \cdot \left( \frac{1}{(1+\rho_2^{2})s + 1} \right)^{\!1/(1+\rho_2^{2})} \cdot 2^{sn},
\end{aligned}
$$
where Equation (73) holds since T_ρ is a self-adjoint operator and Equation (74) follows from the Cauchy–Schwarz inequality. To justify Equation (75), we note that
$$ \left\| T_\rho g \right\|_2^{2} = \left\langle T_\rho g,\, T_\rho g \right\rangle
= \sum_{S \subseteq [n]} \rho^{2|S|}\, \hat{g}_S^{2}
= \sum_{S \subseteq [n],\, S \ne \emptyset} \rho^{2|S|}\, \hat{f}_S^{2}
\le \rho^{2} \cdot \big(1 - \hat{f}_\emptyset^{2}\big), $$
where Equation (80) follows from Plancherel's identity, Equation (81) holds since ĝ_S = f̂_S for all S ≠ ∅ and ĝ_∅ = 0, and Equation (82) follows from ∑_{S⊆[n]} f̂_S^2 = ‖f‖_2^2 = E[f^2] = 1. Equation (76) follows from Theorem 5, and in Equation (78), k_n → 1. The second addend on the r.h.s. of Equation (68) can be bounded in the same manner. Hence,
$$
\begin{aligned}
G_s\big(X^n \mid f(Y^n)\big) &\ge \max_{0 \le \lambda \le 1}\; 2^{sn} \left[ \ell_n \cdot \frac{1}{s+1} - \rho^{\lambda} \cdot \sqrt{1 - \hat{f}_\emptyset^{2}} \cdot k_n \cdot \left( \frac{1}{\big(1+\rho^{2(1-\lambda)}\big)s + 1} \right)^{\!1/(1+\rho^{2(1-\lambda)})} \right] \\
&\ge \max_{0 \le \lambda \le 1}\; 2^{sn} \left[ \ell_n \cdot \frac{1}{s+1} - \rho^{\lambda} \cdot k_n \cdot \left( \frac{1}{\big(1+\rho^{2(1-\lambda)}\big)s + 1} \right)^{\!1/(1+\rho^{2(1-\lambda)})} \right] \\
&\ge \big(1 - o(1)\big) \cdot 2^{sn} \cdot \max_{0 \le \lambda \le 1} \left[ \frac{1}{s+1} - \rho^{\lambda} \left( \frac{1}{\big(1+\rho^{2(1-\lambda)}\big)s + 1} \right)^{\!1/(1+\rho^{2(1-\lambda)})} \right]
\end{aligned}
$$
as n → ∞. □
We close this section with the following proof of Proposition 2:
Proof of Proposition 2.
Let I = (i_1, …, i_L) be a vector of indices in [n] such that 1 ≤ i_1 < i_2 < ⋯ < i_L ≤ n, and let x^n(I) = (x_{i_1}, …, x_{i_L}) be the components of x^n in those indices. Further, let {f^{(n)}}_{n=1}^∞ ∈ F_k. Then, it holds that
$$ \Pr\big( X^n = x^n,\; f^{(n)}(Y^n) = b^k \big) = \prod_{j=1}^{k} \Pr\big( X^n(I_j) = x^n(I_j),\; f_j^{(n)}(Y^n) = b_j \big), $$
as well as
$$ \mathrm{ORD}_{X^n \mid f^{(n)}(Y^n)}\big( x^n \mid b^k \big) \ge \prod_{j=1}^{k} \left[ \mathrm{ORD}_{X^n(I_j) \mid f_j^{(n)}(Y^n)}\big( x^n(I_j) \mid b_j \big) - 1 \right]. $$
Hence,
$$ G_s\big(X^n \mid f^{(n)}(Y^n)\big) \ge \prod_{j=1}^{k} \left[ G_s\big(X^n(I_j) \mid f_j^{(n)}(Y^n)\big) - 1 \right], $$
and the stated bound is deduced after taking limits and normalizing by G_s(X^n) ∼ 2^{sn}/(s+1). □

4. Guessing Ratio for a General Binary Input Channel

In this section, we consider the guessing ratio for general channels with a uniform binary input. The lower bound of Theorem 3 can be easily generalized to this case. To that end, consider the SDPI constant [38,39] of the reverse channel (P_Y, P_{X|Y}), given by
$$ \eta\big(P_Y, P_{X|Y}\big) := \sup_{Q_Y:\; Q_Y \ne P_Y} \frac{D(Q_X \,\|\, P_X)}{D(Q_Y \,\|\, P_Y)}, $$
where Q_X is the X-marginal of Q_Y P_{X|Y}. As was shown in Reference ([40], Theorem 2), the SDPI constant of (P_Y, P_{X|Y}) is also given by
$$ \eta\big(P_Y, P_{X|Y}\big) = \sup_{P_{W|Y}:\; W - Y - X,\; I(W;Y) > 0} \frac{I(W; X)}{I(W; Y)}. $$
Theorem 6.
We have
$$ \gamma_{s,k}\big(P_X, P_{Y|X}\big) \ge \frac{e^{-1} \cdot s^{s-1} \cdot (s+1)}{\Gamma^{s}(1/s)} \cdot 2^{-s \cdot k \cdot \eta(P_Y, P_{X|Y})}. $$
Proof. 
See Appendix A. □
Remark 3.
The bound for the BSC case (Theorem 3) is indeed a special case of Theorem 6, as the reverse BSC channel is also a BSC with uniform input and the same crossover probability. For BSCs, it is well known that the SDPI constant is (1 − 2δ)^2 ([38], Theorem 9).
Next, we consider in more detail the case where the observation channel is a BEC. We restrict the discussion to the case of a single bit of help, k = 1 .

4.1. Binary Erasure Channel

Suppose that Y^n ∈ {0, 1, e}^n is obtained from X^n by erasing each bit independently with probability ϵ ∈ [0, 1]. As before, Bob observes the channel output Y^n and can send one bit f: {0, 1, e}^n → {0, 1} to Alice, who wishes to guess X^n. With a slight abuse of notation, the guessing ratio in Equation (5) will be denoted by γ_s(ϵ).
To compute the lower bound of Theorem 6, we need to find the SDPI constant associated with the reverse channel, which is easily verified to be
$$ P_{X \mid Y = y}(x) = \begin{cases} \mathbb{1}(x = y), & y = 0 \text{ or } y = 1, \\ \mathrm{Ber}(1/2), & y = e, \end{cases} $$
with an input distribution P_Y = ((1−ϵ)/2, ϵ, (1−ϵ)/2). Letting Q_Y(y) = q_y for y ∈ {0, 1, e} yields Q_X(x) = q_x + q_e/2 for x ∈ {0, 1}. The computation of η(P_Y, P_{X|Y}) is now a simple three-dimensional constrained optimization problem. We plotted the resulting lower bound for s = 1 in Figure 4.
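As an illustration of this optimization, the sketch below (a grid search added here for illustration, not the computation used for Figure 4) approximates η(P_Y, P_{X|Y}) for the BEC by scanning candidate distributions Q_Y; since the supremum may be approached near P_Y or at the boundary, a finite grid only yields an approximation from below.

```python
import numpy as np

def kl(q, p):
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log2(q[mask] / p[mask])))

def eta_bec(eps, grid=200):
    p_y = np.array([(1 - eps) / 2, (1 - eps) / 2, eps])       # Y in {0, 1, e}
    p_x = np.array([0.5, 0.5])
    best = 0.0
    ts = np.linspace(0, 1, grid + 1)
    for q0 in ts:
        for qe in ts[ts <= 1 - q0]:
            q_y = np.array([q0, 1 - q0 - qe, qe])
            d_y = kl(q_y, p_y)
            if d_y < 1e-9:                                     # skip Q_Y too close to P_Y
                continue
            q_x = np.array([q0 + qe / 2, 1 - q0 - qe / 2])     # push Q_Y through P_{X|Y}
            best = max(best, kl(q_x, p_x) / d_y)
    return best

for eps in (0.1, 0.5, 0.9):
    print(eps, eta_bec(eps))
```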
Let us now turn to upper bounds and focus for simplicity on the average guessing time, i.e., the guessing-moment for s = 1. To begin, let S represent the set of indices of the symbols that were not erased, i.e., i ∈ S if and only if Y_i ≠ e. Any function f: {0, 1, e}^n → {0, 1} is then uniquely associated with a set of Boolean functions {f_S}_{S⊆[n]}, where f_S: {0,1}^{|S|} → {0,1} designates the operation of the function when S is the set of non-erased symbols. We also let Pr(S) = (1−ϵ)^{|S|} · ϵ^{|S^c|} be the probability that the non-erased symbols have index set S. Then, the joint probability distribution is given by
$$
\begin{aligned}
\Pr\big(X^n = x^n,\, f(Y^n) = 1\big) &= \Pr(X^n = x^n) \cdot \Pr\big(f(Y^n) = 1 \mid X^n = x^n\big) \\
&= 2^{-n} \cdot \sum_{S \subseteq [n]} \Pr(S) \cdot \Pr\big(f(Y^n) = 1 \mid X^n = x^n, S\big) \\
&= 2^{-n} \cdot \sum_{S \subseteq [n]} \Pr(S) \cdot f_S(x^n),
\end{aligned}
$$
and, similarly,
$$ \Pr\big(X^n = x^n,\, f(Y^n) = 0\big) = 2^{-n} \cdot \sum_{S \subseteq [n]} \Pr(S) \cdot \big(1 - f_S(x^n)\big) = 2^{-n} - 2^{-n} \cdot \sum_{S \subseteq [n]} \Pr(S) \cdot f_S(x^n). $$
In accordance with Proposition 1, the optimal guessing order given that f ( Y n ) = 0 is reversed to the optimal guessing order when f ( Y n ) = 1 . It is also apparent that the posterior probability is determined by a mixture of 2 n different Boolean functions { f S } S [ n ] . This may be contrasted with the BSC case, in which the posterior is determined by a single Boolean function (though with noisy input).
A seemingly natural choice is a greedy dictator function, for which f ( Y n ) sends the first non-erased bit. Concretely, letting
$$ k(y^n) := \begin{cases} n+1, & y^n = e^n, \\ \min\{ i :\, y_i \ne e \}, & \text{otherwise}, \end{cases} $$
the greedy dictator function is defined by
$$ \text{G-Dict}(y^n) := \begin{cases} \mathrm{Ber}(1/2), & y^n = e^n, \\ y_{k(y^n)}, & \text{otherwise}, \end{cases} $$
where Ber(α) is a Bernoulli r.v. of success probability α. From an analysis of the posterior probability, it is evident that, conditioned on f(Y^n) = 0, an optimal guessing order must satisfy that x^n is guessed before z^n whenever
$$ \sum_{i=1}^{n} \epsilon^{i-1} \cdot x_i \;\le\; \sum_{i=1}^{n} \epsilon^{i-1} \cdot z_i $$
(see Appendix A for a proof of Equation (100)). This rule can be loosely thought of as comparing the “base 1/ϵ expansions” of x^n and z^n. Furthermore, when ϵ is close to 1, the optimal guessing order tends toward a minimum Hamming weight rule (or a maximum Hamming weight rule in case f = 1).
The greedy dictator function is “locally optimal” when ϵ [ 0 , 1 / 2 ] , in the following sense:
Proposition 3.
If ϵ ∈ [0, 1/2], then an optimal guessing order conditioned on G-Dict(Y^n) = 0 (resp. G-Dict(Y^n) = 1) is lexicographic (resp. reverse lexicographic). Also, given a lexicographic (resp. reverse lexicographic) order when the received bit is 0 (resp. 1), the optimal function f is a greedy dictator.
Proof. 
See Appendix A. □
The guessing ratio of the greedy dictator function can be evaluated for s = 1 , and the analysis leads to the following upper bound:
Theorem 7.
For s = 1 , the guessing ratio is upper bounded as
$$ \gamma_1(\epsilon) \le 1 - \frac{\epsilon}{2}, $$
and the r.h.s. is achieved with equality by the greedy dictator function in Equation (99) for ϵ ∈ [0, 1/2].
Proof. 
See Appendix A. □
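The finite-n behavior of the greedy dictator is easy to compute exactly from the posterior expression derived in Appendix A; the sketch below (an illustration only) sorts the posterior given G-Dict(Y^n) = 0 and compares the resulting average-guessing-time ratio with the limit 1 − ϵ/2 of Theorem 7.

```python
import itertools
import numpy as np

def greedy_dictator_ratio(n, eps):
    cube = np.array(list(itertools.product([0, 1], repeat=n)))
    weights = (1 - eps) * eps ** np.arange(n)
    # Posterior Pr(x^n | G-Dict(Y^n) = 0); conditioning on 1 is symmetric.
    post = 2.0 ** (-(n - 1)) * ((cube == 0) @ weights + eps ** n / 2)
    order = np.argsort(-post, kind="stable")            # guess higher-posterior vectors first
    ranks = np.empty(2 ** n)
    ranks[order] = np.arange(1, 2 ** n + 1)
    g_cond = np.sum(post * ranks)                        # G_1(X^n | G-Dict(Y^n))
    g_uncond = (2 ** n + 1) / 2                          # K_1(2^n)
    return g_cond / g_uncond

for eps in (0.1, 0.3, 0.5):
    print(eps, greedy_dictator_ratio(12, eps), 1 - eps / 2)
```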
The upper bound of Theorem 7 is plotted in Figure 4. Based on Proposition 3 and numerical computations for moderate values of n, we conjecture:
Conjecture 1.
Greedy dictator functions attain γ_s(ϵ) for the BEC.
Supporting evidence for this conjecture includes the local optimality property stated in Proposition 3 (although there are other locally optimal choices) as well as the following heuristic argument: Intuitively, Bob should reveal as much as possible regarding the bits he has seen and as little as possible regarding the erasure pattern. So, it seems reasonable to find a smallest possible set of balanced functions from which to choose all the functions f_S, so that they coincide as much as possible. Greedy dictator is a greedy solution to this problem: it uses the function x_1 for half of the erasure patterns, which is the maximum possible. Then, it uses the function x_2 for half of the remaining patterns, and so on. Indeed, we were not able to find a better function than G-Dict for small values of n.
However, applying standard techniques in an attempt to prove Conjecture 1 has not been fruitful. One possible technique is induction. For example, assume that the optimal functions for dimension n − 1 are f_S^{(n−1)}. Then, it might be perceived that there exists a bit, say x_1, such that the optimal functions for dimension n satisfy f_S^{(n)} = f_S^{(n−1)} if x_1 is erased; in that case, it remains only to determine f_S^{(n)} when x_1 is not erased. However, observing Equation (95), it is apparent that the optimal choice of f_S^{(n)} should satisfy two contradicting goals: on the one hand, to match the order induced by
$$ \sum_{S \subseteq [n]:\; 1 \notin S} \Pr(S) \cdot f_S(x^n) $$
and, on the other hand, to minimize the average guessing time induced by
$$ \sum_{S \subseteq [n]:\; 1 \in S} \Pr(S) \cdot f_S(x^n). $$
It is easy to see that taking a greedy approach toward satisfying the second goal would result in f_S^{(n)}(x^n) = x_1 if 1 ∈ S, and performing the recursion steps would indeed lead to a greedy dictator function. Interestingly, taking a greedy approach toward satisfying the first goal would also lead to a greedy dictator function, but one which operates on a cyclic permutation of the inputs (specifically, Equation (99) applied to (y_2^n, y_1)). Nonetheless, it is not clear that choosing {f_S^{(n)}}_{S: 1 ∈ S} with some loss in the average guessing time induced by Equation (103) could not lead to a gain in the first goal (matching the order of Equation (102)) which outweighs that loss.
Another possible technique is majorization. It is known that, if one probability distribution majorizes another, then all the nonnegative guessing-moments of the first are no greater than the corresponding moments of the second ([29], Proposition 1). (The proof in Reference [29] is only for s = 1, but it is easily extended to the general s > 0 case.) Hence, one approach toward identifying the optimal function could be to try and find a function whose induced posterior distributions majorize the corresponding posteriors induced by any other function with the same bias (it is of course not clear that such a function even exists). This approach unfortunately fails for the greedy dictator. For example, the posterior distributions induced by setting the f_S to be majority functions are not always majorized by those induced by the greedy dictator (although they seem to be “almost” majorized), even though the average guessing time of the greedy dictator is lower (this happens, e.g., for n = 5 and ϵ = 0.4). In fact, the guessing moments of the greedy dictator seem to be better than those of majority irrespective of the value of s.

Author Contributions

Conceptualization, N.W. and O.S.; Investigation, N.W. and O.S.; Methodology, N.W. and O.S.; Writing—original draft, N.W. and O.S.; Writing—review and editing, N.W. and O.S. Both authors contributed equally to the research work and to the writing process of the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by an ERC grant no. 639573. The research of N. Weinberger was partially supported by the MIT-Technion fellowship and the Viterbi scholarship, Technion.

Acknowledgments

We are very grateful to Amos Lapidoth and Robert Graczyk for discussing their recent work on guessing with a helper [19,20] during the second author’s visit to ETH, which provided the impetus for this work. We also thank the anonymous reviewer for helping us clarify the connection between the guessing moments and large deviation principle of the normalized logarithm of the guessing time.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BEC    binary erasure channel
BSC    binary symmetric channel
i.i.d.    independent and identically distributed
r.h.s.    right-hand side
r.v.    random variable
SDPI    strong data-processing inequality
w.l.o.g.    without loss of generality

Appendix A. Miscellaneous Proofs

Proof of Proposition 1.
The claim that random functions do not improve beyond deterministic ones follows directly from the property that conditioning reduces the guessing-moment ([1], Corollary 1). Monotonicity follows from the fact that Bob can always simulate a noisier channel. Now, if δ = 1/2, then X^n and Y^n are independent and G_s(X^n | f(Y^n)) = G_s(X^n) ∼ 2^{sn}/(s+1) for any f (Lemma 1). For δ = 0, let
$$ \gamma_{s,k}^{(n)}(\delta) := \min_{f:\, \{0,1\}^n \to \{0,1\}^k} \frac{G_s\big(X^n \mid f(Y^n)\big)}{G_s(X^n)}, $$
and let { f n , k * } n = 1 be a sequence of functions such that f n , k * achieves γ s , k ( n ) ( δ ) . We show that f n , k * must satisfy Pr [ f ( Y n ) = b k ] = 2 k for all b k { 0 , 1 } k . If we denote B k = f n , k * ( Y n ) , then this is equivalent to showing that Pr [ B l = 1 B 1 = b 1 , , B l 1 = b l 1 , B l = b l , B k = b k ] = 1 / 2 for all l [ k ] and ( b 1 , , b l 1 , b l , , b k ) { 0 , 1 } k 1 . Assume towards contradiction that the optimal function does not satisfy this property for, say, l = k . Let us denote Pr ( f n , k * ( Y n ) = b k ) : = q ( b k ) and assume w.l.o.g. that q ( b k 1 , 0 ) > q ( b k 1 , 1 ) for all b k 1 { 0 , 1 } k 1 (for notational simplicity). Further, let q ¯ ( b k 1 ) : = 1 2 [ q ( b k 1 , 0 ) + q ( b k 1 , 1 ) ] . Then,
G s ( X n f n , k * ( Y n ) ) = b k 1 { 0 , 1 } k 1 q ( b k 1 , 0 ) · G s ( X n f n , k * ( Y n ) = ( b k 1 , 0 ) ) + q ( b k 1 , 1 ) · G s ( X n f n , k * ( Y n ) = ( b k 1 , 1 ) ) = b k 1 { 0 , 1 } k 1 q ( b k 1 , 0 ) · K s q ( b k 1 , 0 ) · 2 n + q ( b k 1 , 1 ) · K s q ( b k 1 , 1 ) · 2 n = 2 n b k 1 { 0 , 1 } k 1 i = 1 q ( b k 1 , 0 ) · 2 n i s + i = 1 q ( b k 1 , 1 ) · 2 n i s = 2 n b k 1 { 0 , 1 } k 1 i = 1 q ¯ ( b k 1 ) · 2 n i s + i = q ¯ ( b k 1 ) + 1 q ( b k 1 , 0 ) · 2 n i s + i = 1 q ¯ ( b k 1 ) · 2 n i s i = q ( b k 1 , 1 ) q ¯ ( b k 1 ) · 2 n i s > 2 ( n 1 ) b k 1 { 0 , 1 } k 1 i = 1 q ¯ ( b k 1 ) · 2 n i s .
As equality can be achieved if we modify f n , k * to satisfy q ( b k 1 , 0 ) = q ( b k 1 , 1 ) for all b k 1 { 0 , 1 } k 1 , this contradicts the assumed optimality of f n , k * . The minimal G s ( X n f ( Y n ) ) is thus obtained by any function for which f ( Y n ) { 0 , 1 } k is a uniform Bernoulli vector and equals to K s ( 2 n k ) 2 s ( n k ) s + 1 (Lemma 1).
To prove that the limit in Equation (5) exists, we note that
G s ( X n + 1 ) = 2 ( n + 1 ) i = 1 2 n + 1 i s
= 2 ( n + 1 ) i = 1 2 n ( 2 i 1 ) s + ( 2 i ) s
2 s · 2 n i = 1 2 n ( i 1 ) s
= n · 2 s · 2 n i = 1 2 n i s
= n · 2 s · G s ( X n ) ,
where
n : = i = 1 2 n ( i 1 ) s i = 1 2 n i s .
As before, let { f n , k * } n = 1 be a sequence of functions such that f n , k * achieves γ s , k ( n ) ( δ ) . Denote the order induced by the posterior Pr ( X n = x n f n , k * ( Y n ) = b k ) as ORD b k , n , n , b k { 0 , 1 } k and the order induced by Pr ( X n + 1 = x n + 1 f n * ( Y n ) = b k ) as ORD b k , n , n + 1 . As before (when breaking ties arbitrarily)
ORD b k , n , n + 1 ( x n , 0 ) = 2 ORD b k , n , n ( x n )
and
ORD b k , n , n + 1 ( x n , 1 ) = 2 ORD b k , n , n ( x n ) 1 2 ORD b k , n , n ( x n ) .
Thus,
G s ( X n + 1 f n + 1 , k * ( Y n + 1 ) )
G s ( X n + 1 f n , k * ( Y n ) )
= b k { 0 , 1 } k Pr ( f n , k * ( Y n + 1 ) = b k ) · G s ( X n + 1 f n , k * ( Y n ) = b k ) b k { 0 , 1 } k x n + 1 Pr ( X n + 1 = x n + 1 , f n , k * ( Y n ) = b k ) · ORD b k , n , n + 1 s ( x n + 1 )
2 s · b k { 0 , 1 } k x n Pr ( X n = x n , f n , k * ( Y n ) = b k ) · ORD b k , n , n s ( x n )
= 2 s · G s ( X n f n , k * ( Y n ) ) .
Hence,
γ s , k ( n + 1 ) ( δ ) n 1 · γ s , k ( n ) ( δ ) .
To continue, we further analyze n . The summation in the numerator of Equation (A7) may be started from from i = 2 , and so Equations (A31) and (A33) (proof of Lemma 1 below) imply that
1 n
1 s + 1 · 2 n ( s + 1 ) 1 2 n 1 1 s + 1 · ( 2 n + 1 ) s + 1 1 2 n
2 n ( s + 1 ) 1 ( 2 n + 1 ) s + 1
= 2 n 2 n + 1 s + 1 1 2 n ( s + 1 )
= 1 + 1 2 n ( s + 1 ) 1 2 n ( s + 1 )
= 1 ( s + 1 ) 2 n + O 1 2 2 n 1 2 n ( s + 1 )
= 1 ( s + 1 ) 2 n + O 1 2 n · min { 1 + s , 2 } .
Thus, there exists c , C > 0 such that
log n = 1 n 1 = n = 1 log n 1
n = 1 log 1 c 2 n
C + n = 1 c 2 n + O 1 2 2 n
< ,
and consequently,
d n : = j = n j 1 1
as n . Hence, Equation (A14) implies that
e n : = d n · γ s ( n ) ( δ )
is a non-increasing sequence which is bounded below by 0 and, thus, has a limit. Since d n 1 as n , γ s ( n ) ( δ ) also has a limit.
We finally show the reverse ordering property for k = 1 . The guessing order given that f ( Y n ) = 1 is determined by ordering
Pr ( X n = x n f ( Y n ) = 1 ) = Pr ( X n = x n ) · Pr ( f ( Y n ) = 1 X n = x n ) Pr ( f ( Y n ) = 1 ) ,
or equivalently, by ordering Pr ( f ( Y n ) = 1 X n = x n ) . It then follows that the order, given that f ( Y n ) = 0 , is reversed compared to the order given that f ( Y n ) = 1 since
Pr ( f ( Y n ) = 0 X n = x n ) + Pr ( f ( Y n ) = 1 X n = x n ) = 1 .
Proof of Lemma 1.
The monotonicity of i^s and standard bounds on sums using integrals lead to the bounds
$$ K_s(a, b) \le \int_{a+1}^{b+1} \frac{t^{s}}{b-a}\, dt = \frac{1}{s+1} \cdot \frac{(b+1)^{s+1} - (a+1)^{s+1}}{b-a} $$
and
$$ K_s(a, b) \ge \int_{a}^{b} \frac{t^{s}}{b-a}\, dt = \frac{1}{s+1} \cdot \frac{b^{s+1} - a^{s+1}}{b-a}. $$
The ratio between the upper and lower bounds is
$$ \kappa_s(a, b) := \frac{(b+1)^{s+1} - (a+1)^{s+1}}{b^{s+1} - a^{s+1}}, $$
which satisfies κ_s(a_n, b_n) → 1 under the premise of the lemma. □
Proof of Equation (61).
Denote by f n * a function which achieves the minimal guessing ratio in Equation (5). Then, it holds that G s ( X n f * ( Y n ) = b k ) is a monotonic non-increasing function of n. To see this, suppose that f n + 1 * is an optimal function for n + 1 . This function f n + 1 * can be used for guessing X n on the basis of k bit of help computed from Y n as follows: Given Y n , the helper randomly generates Y n + 1 P Y | X ( · | 0 ) , computes b k = f n + 1 * ( Y n + 1 ) , and send these bits to the guesser. The guesser of X n then uses the bits b k to guess X n , and the resulting conditional guessing moment is G s ( X n + 1 f n + 1 * ( Y n + 1 ) = b k , X n + 1 = 0 ) , which is less than G s ( X n + 1 f n + 1 * ( Y n + 1 ) = b k ) since conditioning reduces guessing moments. Thus, the optimal function f n * can only achieve lower guessing moments, which implies the desired monotonicity property. For brevity, we henceforth simply write the optimal function as f (with dimension and optimality being implicit).
Define the set
B k : = { b k { 0 , 1 } k : sup n G s ( X n f ( Y n ) = b k ) = } ,
to wit, the set of k-tuples such that the conditional guessing moment grows without bound when conditioned on that k-tuple. By the law of total expectation
G s ( X n f ( Y n ) ) = b k B k Pr ( f ( Y n ) = b k ) · G s ( X n f ( Y n ) = b k )
= + b k { 0 , 1 } k \ B k Pr ( f ( Y n ) = b k ) · G s ( X n f ( Y n ) = b k )
= : G n ( 1 ) + G n ( 2 ) .
So, since G s ( X n f ( Y n ) ) grows without bound as a function of n, it must hold that B k is not empty and that there exists n such that G s ( X n f ( Y n ) ) = n G n ( 1 ) , where n 1 as n . Let η > 0 be given. The monotonicity property previously established and Equation (60) imply that there exists n 0 ( η ) such that for all n n 0 ( η ) both
G s ( X n f ( Y n ) = b k ) ( 1 η ) · Ψ s · 2 s H ( X n f ( Y n ) = b k )
and
Ψ s · 2 s H ( X n f ( Y n ) = b k ) ( 1 η ) · G s ( X n f ( Y n ) = b k )
hold for any b k B k . Thus, also
G s ( X n f ( Y n ) ) n ( 1 η ) b k B k Pr ( f ( Y n ) = b k ) · Ψ s · 2 s H ( X n f ( Y n ) = b k )
and
b k B k Pr ( f ( Y n ) = b k ) · Ψ s · 2 s H ( X n f ( Y n ) = b k ) b k B k Pr ( f ( Y n ) = b k ) ( 1 η ) · G s ( X n f ( Y n ) = b k )
hold, and the last equation implies that the term on its left-hand side is unbounded. Moreover, Equation (60) and the sentence that follows it both imply that, if G s ( X n f ( Y n ) = b k ) is bounded, then H ( X n f ( Y n ) = b k ) is bounded too. Thus, there exists k n which satisfies k n 1 as n such that
b k B k Pr ( f ( Y n ) = b k ) · Ψ s · 2 s H ( X n f ( Y n ) = b k ) = k n · b k { 0 , 1 } n Pr ( f ( Y n ) = b k ) · Ψ s · 2 s H ( X n f ( Y n ) = b k ) .
Combining Equation (A40) with the last equation and noting that η > 0 is arbitrary completes the proof. □
Proof of Theorem 6.
The proof follows the same lines as the proof of Theorem 3 up to Equation (62), yielding
$$ G_s\big(X^n \mid f(Y^n)\big) \ge k_n \cdot \Psi_s \cdot 2^{\,s\left[ n - I(X^n; f(Y^n)) \right]}. $$
Now, let W^{(n)} be such that X^n − Y^n − W^{(n)} forms a Markov chain. Then,
$$ \sup_{f:\, \mathcal{Y}^n \to \{0,1\}^k} \frac{I\big(X^n; f(Y^n)\big)}{I\big(Y^n; f(Y^n)\big)} \le \sup_{P_{W^{(n)} \mid Y^n}} \frac{I\big(X^n; W^{(n)}\big)}{I\big(Y^n; W^{(n)}\big)} = \eta\big(P_{Y^n}, P_{X^n \mid Y^n}\big) = \eta\big(P_Y, P_{X \mid Y}\big), $$
where Equation (A46) follows since the SDPI constant tensorizes (see Reference [40] for an argument obtained by relating the SDPI constant to the hypercontractivity parameter or its extended version, Reference ([40], p. 5), for a direct proof). Thus, for all f,
$$ I\big(X^n; f(Y^n)\big) \le \eta\big(P_Y, P_{X|Y}\big) \cdot I\big(Y^n; f(Y^n)\big) \le \eta\big(P_Y, P_{X|Y}\big) \cdot H\big(f(Y^n)\big) \le \eta\big(P_Y, P_{X|Y}\big) \cdot k. $$
Inserting Equation (A49) into Equation (A43) yields
$$ G_s\big(X^n \mid f(Y^n)\big) \ge k_n \cdot \Psi_s \cdot 2^{\,s\left[ n - k \cdot \eta(P_Y, P_{X|Y}) \right]}, $$
and substituting this in the definition of the guessing ratio in Equation (5) completes the proof. □
Proof of Equation (100).
Let us evaluate the posterior probability conditioned on G-Dict(Y^n) = 0. Since G-Dict is balanced, Bayes law implies that
$$
\begin{aligned}
\Pr\big(X^n = x^n \mid \text{G-Dict}(Y^n) = 0\big) &= 2^{-(n-1)} \cdot \Pr\big(\text{G-Dict}(Y^n) = 0 \mid X^n = x^n\big) \\
&= 2^{-(n-1)} \cdot \sum_{i=1}^{n+1} \Pr\big( k(Y^n) = i \mid X^n = x^n \big) \cdot \Pr\big(\text{G-Dict}(Y^n) = 0 \mid X^n = x^n,\, k(Y^n) = i\big) \\
&= 2^{-(n-1)} \cdot \left[ \sum_{i=1}^{n} (1-\epsilon)\, \epsilon^{i-1} \cdot \mathbb{1}\{x_i = 0\} + \frac{1}{2}\,\epsilon^{n} \right].
\end{aligned}
$$
This immediately leads to the guessing rule in Equation (100). From Proposition 1, the guessing rule for G-Dict(Y^n) = 1 is the reverse order. □
Proof of Proposition 3.
We denote the lexicographic order by ORD_lex. Assume that G-Dict(Y^n) = 0 and that ORD_lex(x^n) ≤ ORD_lex(z^n). Then, there exists j ∈ [n] such that x^{j−1} = z^{j−1} (where x^0 is the empty string) and x_j = 0 < z_j = 1. Then,
$$
\begin{aligned}
\Pr\big(X^n = x^n \mid \text{G-Dict}(Y^n) = 0\big) - \Pr\big(X^n = z^n \mid \text{G-Dict}(Y^n) = 0\big)
&\;\propto\; \epsilon^{j-1} + \sum_{i=j+1}^{n} \epsilon^{i-1} \cdot (z_i - x_i) \\
&\;\ge\; \epsilon^{j-1} - \sum_{i=j+1}^{n} \epsilon^{i-1} \\
&\;=\; \epsilon^{j-1} \cdot \frac{1 - 2\epsilon + \epsilon^{\,n-j+1}}{1 - \epsilon} \\
&\;\ge\; 0.
\end{aligned}
$$
This proves the first statement of the proposition. Now, let ORD 0 ( ORD 1 ) be the guessing order given that the received bit is 0 (resp. 1), and let { f S } be the Boolean functions (which are not necessarily optimal). Then, from Equations (97) and (95)
G 1 ( X n f ( Y n ) )
= x n Pr ( X n = x n , f ( Y n ) = 0 ) · ORD 0 ( x n ) + Pr ( X n = x n , f ( Y n ) = 1 ) · ORD 1 ( x n )
= 2 n · S [ n ] Pr ( S ) x n 1 f S ( x n ) · ORD 0 ( x n ) + f S ( x n ) · ORD 1 ( x n )
= 2 n · S [ n ] Pr ( S ) x S 1 f S ( x n ) · PORD 0 ( x S | | S ) + f S ( x n ) · PORD 1 ( x S | | S )
2 n · S [ n ] Pr ( S ) x n min PORD 0 ( x S | | S ) , PORD 1 ( x S | | S ) ,
where for b { 0 , 1 } , the projected orders are defined as
PORD b ( x S | | S ) : = x ( S c ) ORD b ( x n ) .
It is easy to verify that, if ORD 0 ( ORD 1 ) is the lexicographic (resp. revered lexicographic) order, then the greedy dictator achieves Equation (A61) with equality due to the following simple property: If ORD lex ( x n ) < ORD lex ( z n ) , then
x ( S c ) ORD lex ( x n ) x ( S c ) ORD lex ( z n )
for all S [ n ] . This can be proved by induction over n. For n = 1 , the claim is easily asserted. Suppose it holds for n 1 , let us verify it for n. If 1 S , then whenever ORD lex ( x n ) < ORD lex ( z n )
x ( S c ) ORD lex ( x n ) = x ( S c ) ORD lex ( x 1 , x 2 n )
= x 1 · 2 n 1 + x ( S c ) ORD lex ( x 2 n )
z 1 · 2 n 1 + z ( S c ) ORD lex ( z 2 n )
= z ( S c ) ORD lex ( z n )
where the inequality follows from the induction assumption and since x 1 z 1 . If 1 S then, similarly,
x ( S c ) ORD lex ( x n ) = x ( S c \ { 1 } ) 2 n 1 + 2 · ORD lex ( x 2 n )
z ( S c \ { 1 } ) 2 n 1 + 2 · ORD lex ( z 2 n )
= z ( S c ) ORD lex ( z n ) .
Proof of Theorem 7.
We denote the lexicographic order by ORD lex . Then,
G 1 ( X n G-Dict ( Y n ) ) = G 1 ( X n G-Dict ( Y n ) = 0 )
x n Pr ( X n = x n G-Dict ( Y n ) = 0 ) · ORD lex ( x n )
= 2 ( n 1 ) · x n i = 1 n ( 1 ϵ ) ϵ i 1 · 𝟙 x i = 0 · ORD lex ( x n ) + ϵ n K 1 ( 2 n )
= 2 ( n 1 ) · ( 1 ϵ ) i = 1 n ϵ i 1 · x n 𝟙 x i = 0 · ORD lex ( x n ) + ϵ n K 1 ( 2 n )
= ( 1 ϵ ) J n + ϵ n K 1 ( 2 n ) ,
where J 1 : = 1 2 and for n 2
J n : = 2 ( n 1 ) · i = 1 n ϵ i 1 · x n 𝟙 x i = 0 · ORD lex ( x n )
= 2 ( n 1 ) x n 𝟙 x i = 0 · ORD lex ( x n ) + 2 ( n 1 ) i = 2 n ϵ i 1 · x n 𝟙 x i = 0 · ORD lex ( x n ) = K 1 ( 2 n 1 )
= + 2 ( n 1 ) i = 2 n ϵ i 1 · x 2 n 𝟙 x 1 = 0 , x i = 0 · ORD lex ( 0 , x 2 n ) + 𝟙 x 1 = 1 , x i = 0 · ORD lex ( 1 , x 2 n ) = K 1 ( 2 n 1 ) + 2 ( n 1 ) ϵ i = 1 n 1 ϵ i 1 · x n 1 𝟙 x i = 0 ORD lex ( x n 1 )
= + 2 ( n 1 ) ϵ i = 1 n 1 ϵ i 1 · x n 1 𝟙 x i = 0 2 n 1 + ORD lex ( x n 1 )
= K 1 ( 2 n 1 ) + ϵ J n 1 + i = 1 n 1 ϵ i · x n 1 𝟙 x i = 0
= K 1 ( 2 n 1 ) + ϵ J n 1 + 2 n 2 · ϵ ϵ n 1 ϵ .
So,
J n = K 1 ( 2 n 1 ) + ϵ K 1 ( 2 n 2 ) + ϵ J n 2 + 2 n 3 · ϵ ϵ n 1 1 ϵ + 2 n 2 · ϵ ϵ n 1 ϵ
= K 1 ( 2 n 1 ) + ϵ K 1 ( 2 n 2 ) + ϵ 2 J n 2 + 2 n 3 · ϵ 2 ϵ n 1 ϵ + 2 n 2 · ϵ ϵ n 1 ϵ
= i = 1 n ϵ i 1 K 1 ( 2 n i ) + 1 1 ϵ i = 1 n 2 i 2 · ( ϵ n i + 1 ϵ n ) .
Hence,
$$G_1(X^n \mid \text{G-Dict}(Y^n)) \le (1-\epsilon) \sum_{i=1}^{n} \epsilon^{i-1} K_1(2^{n-i}) + \sum_{i=1}^{n} 2^{i-2} \left( \epsilon^{n-i+1} - \epsilon^{n} \right) + \epsilon^{n} K_1(2^{n}).$$
Noting that $K_1(M) = \frac{M+1}{2}$, we get
$$G_1(X^n \mid \text{G-Dict}(Y^n)) \le \frac{2^{n-1}(1-\epsilon)}{\epsilon} \sum_{i=1}^{n} \left( \frac{\epsilon}{2} \right)^{i} + \frac{1 - \epsilon^{n}}{2} + \frac{\epsilon^{n+1}}{4} \sum_{i=1}^{n} \left( \frac{2}{\epsilon} \right)^{i} - \frac{2^{n} - 1}{2}\, \epsilon^{n} + 2^{n-1} \epsilon^{n} + \frac{\epsilon^{n}}{2}$$
$$= \frac{2^{n-1} - \frac{1}{2}\epsilon^{n}}{2 - \epsilon} + \frac{1 + \epsilon^{n}}{2}$$
$$\le \frac{2^{n-1}}{2 - \epsilon} + 1. \quad □$$
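As a final numerical check (ours, not part of the original proof), the exact conditional guessing moment of the greedy dictator can be computed by enumeration for small $n$, using the conditional law from the beginning of the proof (first non-erased bit, fair coin flip when all bits are erased is our reading), and compared against the bound just derived; the slack term $+1$ and the function names are our own.

```python
import itertools

def conditional_guesswork(n, eps):
    """Exact G_1(X^n | G-Dict(Y^n)) for uniform X^n observed through a BEC(eps),
    using the conditional law from the proof: the helper bit is the first
    non-erased bit, with a fair coin flip when every bit is erased."""
    joint = {0: [], 1: []}
    for x in itertools.product((0, 1), repeat=n):
        w = sum((1 - eps) * eps ** i for i in range(n) if x[i] == 0) + 0.5 * eps ** n
        p0 = 2.0 ** (-n) * w               # Pr(X^n = x, helper bit = 0)
        joint[0].append(p0)
        joint[1].append(2.0 ** (-n) - p0)  # Pr(X^n = x, helper bit = 1)
    g = 0.0
    for b in (0, 1):
        # optimal guessing given the bit: guess in decreasing order of probability
        for rank, p in enumerate(sorted(joint[b], reverse=True), start=1):
            g += rank * p
    return g

for n in (2, 4, 6, 8):
    for eps in (0.1, 0.4, 0.7):
        exact = conditional_guesswork(n, eps)
        bound = 2 ** (n - 1) / (2 - eps) + 1
        assert exact <= bound + 1e-9
print("exact conditional guesswork is within the derived upper bound")
```

Dividing by $G_1(X^n) = K_1(2^n) \approx 2^{n-1}$ shows that, for large $n$, the greedy dictator achieves a guessing ratio of at most roughly $1/(2-\epsilon)$.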

References

1. Arikan, E. An inequality on guessing and its application to sequential decoding. IEEE Trans. Inf. Theory 1996, 42, 99–105.
2. Arikan, E.; Merhav, N. Guessing subject to distortion. IEEE Trans. Inf. Theory 1998, 44, 1041–1056.
3. Merhav, N.; Arikan, E. The Shannon cipher system with a guessing wiretapper. IEEE Trans. Inf. Theory 1999, 45, 1860–1866.
4. Hayashi, Y.; Yamamoto, H. Coding theorems for the Shannon cipher system with a guessing wiretapper and correlated source outputs. IEEE Trans. Inf. Theory 2008, 54, 2808–2817.
5. Hanawal, M.K.; Sundaresan, R. The Shannon cipher system with a guessing wiretapper: General sources. IEEE Trans. Inf. Theory 2011, 57, 2503–2516.
6. Christiansen, M.M.; Duffy, K.R.; du Pin Calmon, F.; Médard, M. Multi-user guesswork and brute force security. IEEE Trans. Inf. Theory 2015, 61, 6876–6886.
7. Yona, Y.; Diggavi, S. The effect of bias on the guesswork of hash functions. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 2248–2252.
8. Massey, J.L. Guessing and entropy. In Proceedings of the 1994 IEEE International Symposium on Information Theory, Trondheim, Norway, 27 June–1 July 1994; p. 204.
9. Arikan, E. Large deviations of probability rank. In Proceedings of the 2000 IEEE International Symposium on Information Theory, Washington, DC, USA, 25–30 June 2000; p. 27.
10. Christiansen, M.M.; Duffy, K.R. Guesswork, large deviations, and Shannon entropy. IEEE Trans. Inf. Theory 2012, 59, 796–802.
11. Pfister, C.E.; Sullivan, W.G. Rényi entropy, guesswork moments, and large deviations. IEEE Trans. Inf. Theory 2004, 50, 2794–2800.
12. Hanawal, M.K.; Sundaresan, R. Guessing revisited: A large deviations approach. IEEE Trans. Inf. Theory 2011, 57, 70–78.
13. Sundaresan, R. Guessing under source uncertainty. IEEE Trans. Inf. Theory 2007, 53, 269–287.
14. Boztaş, S. Comments on "An inequality on guessing and its application to sequential decoding". IEEE Trans. Inf. Theory 1997, 43, 2062–2063.
15. Sason, I.; Verdú, S. Improved bounds on lossless source coding and guessing moments via Rényi measures. IEEE Trans. Inf. Theory 2018, 64, 4323–4346.
16. Sason, I. Tight bounds on the Rényi entropy via majorization with applications to guessing and compression. Entropy 2018, 20, 896.
17. Wyner, A. A theorem on the entropy of certain binary sequences and applications—II. IEEE Trans. Inf. Theory 1973, 19, 772–777.
18. Ahlswede, R.; Körner, J. Source coding with side information and a converse for degraded broadcast channels. IEEE Trans. Inf. Theory 1975, 21, 629–637.
19. Graczyk, R.; Lapidoth, A. Variations on the guessing problem. In Proceedings of the 2018 IEEE International Symposium on Information Theory, Vail, CO, USA, 17–22 June 2018; pp. 231–235.
20. Graczyk, R. Guessing with a Helper. Master's Thesis, ETH Zurich, Zürich, Switzerland, 2017.
21. O'Donnell, R. Analysis of Boolean Functions; Cambridge University Press: Cambridge, UK, 2014.
22. Courtade, T.A.; Kumar, G.R. Which Boolean functions maximize mutual information on noisy inputs? IEEE Trans. Inf. Theory 2014, 60, 4515–4525.
23. Ordentlich, O.; Shayevitz, O.; Weinstein, O. An improved upper bound for the most informative Boolean function conjecture. In Proceedings of the 2016 IEEE International Symposium on Information Theory, Barcelona, Spain, 10–15 July 2016; pp. 500–504.
24. Samorodnitsky, A. On the entropy of a noisy function. IEEE Trans. Inf. Theory 2016, 62, 5446–5464.
25. Kindler, G.; O'Donnell, R.; Witmer, D. Continuous Analogues of the Most Informative Function Problem. Available online: http://arxiv.org/pdf/1506.03167.pdf (accessed on 26 December 2015).
26. Li, J.; Médard, M. Boolean functions: Noise stability, non-interactive correlation, and mutual information. In Proceedings of the 2018 IEEE International Symposium on Information Theory, Vail, CO, USA, 17–22 June 2018; pp. 266–270.
27. Chandar, V.; Tchamkerten, A. Most informative quantization functions. Presented at the 2014 Information Theory and Applications Workshop, San Diego, CA, USA, 9–14 February 2014.
28. Weinberger, N.; Shayevitz, O. On the optimal Boolean function for prediction under quadratic loss. IEEE Trans. Inf. Theory 2017, 63, 4202–4217.
29. Burin, A.; Shayevitz, O. Reducing guesswork via an unreliable oracle. IEEE Trans. Inf. Theory 2018, 64, 6941–6953.
30. Ardimanov, N.; Shayevitz, O.; Tamo, I. Minimum guesswork with an unreliable oracle. In Proceedings of the 2018 IEEE International Symposium on Information Theory, Vail, CO, USA, 17–22 June 2018; pp. 986–990. Extended version available online: http://arxiv.org/pdf/1811.08528.pdf (accessed on 26 December 2018).
31. Feller, W. An Introduction to Probability Theory and Its Applications; John Wiley & Sons: New York, NY, USA, 1971; Volume 2.
32. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley-Interscience: Hoboken, NJ, USA, 2006.
33. Wainwright, M.J.; Jordan, M.I. Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 2008, 1, 1–305.
34. Boyd, S.P.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
35. Nadarajah, S. A generalized normal distribution. J. Appl. Stat. 2005, 32, 685–694.
36. Wyner, A.; Ziv, J. A theorem on the entropy of certain binary sequences and applications—I. IEEE Trans. Inf. Theory 1973, 19, 769–772.
37. Erkip, E.; Cover, T.M. The efficiency of investment information. IEEE Trans. Inf. Theory 1998, 44, 1026–1040.
38. Ahlswede, R.; Gács, P. Spreading of sets in product spaces and hypercontraction of the Markov operator. Ann. Probab. 1976, 925–939.
39. Raginsky, M. Strong data processing inequalities and Φ-Sobolev inequalities for discrete channels. IEEE Trans. Inf. Theory 2016, 62, 3355–3389.
40. Anantharam, V.; Gohari, A.; Kamath, S.; Nair, C. On hypercontractivity and a data processing inequality. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 3022–3026.
Figure 1. Bounds on $\gamma_s(\delta)$ for $s = 1$ (left) and $s = 5$ (right) as a function of $\delta \in [0, 1/2]$.
Figure 2. Bounds on $\gamma_s(\delta)$ for $\delta = 0.1$ (left) and $\delta = 0.4$ (right) as a function of $s \in [1, 10]$.
Figure 3. Bounds on $\gamma_{s,k}(\delta)$ for $\delta = 0.1$ and $s = 1$ as a function of $k$.
Figure 4. Bounds on $\gamma_s(\delta)$ for $s = 1$ as a function of $\epsilon \in [0, 1]$.
