Abstract
Two correlated sources emit a pair of sequences, each of which is observed by a different encoder. Each encoder produces a rate-limited description of the sequence it observes, and the two descriptions are presented to a guessing device that repeatedly produces sequence pairs until correct. The number of guesses until correct is random, and it is required that it have a moment (of some prespecified order) that tends to one as the length of the sequences tends to infinity. The description rate pairs that allow this are characterized in terms of the Rényi entropy and the Arimoto–Rényi conditional entropy of the joint law of the sources. This solves the guessing analog of the Slepian–Wolf distributed source-coding problem. The achievability is based on random binning, which is analyzed using a technique by Rosenthal.
1. Introduction
In the Massey–Arıkan guessing problem [1,2], a random variable X is drawn from a finite set $\mathcal{X}$ according to some probability mass function (PMF) $P_X$, and it has to be determined by making guesses of the form “Is X equal to x?” until the guess is correct. The guessing order is determined by a guessing function G, which is a bijective function from $\mathcal{X}$ to $\{1, \ldots, |\mathcal{X}|\}$. Guessing according to G proceeds as follows: the first guess is the element $x \in \mathcal{X}$ satisfying $G(x) = 1$; the second guess is the element satisfying $G(x) = 2$, and so on. Consequently, $G(X)$ is the number of guesses needed to guess X. Arıkan [2] showed that for any $\rho > 0$, the $\rho$th moment of the number of guesses required by an optimal guesser to guess X is bounded by:
where $\ln(\cdot)$ denotes the natural logarithm, and $H_{\frac{1}{1+\rho}}(X)$ denotes the Rényi entropy of order $\frac{1}{1+\rho}$, which is defined in Section 3 ahead (refinements of (1) were recently derived in [3]).
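To make the setup concrete, the following sketch builds the optimal guessing order for a small PMF (guess in decreasing order of probability) and compares the resulting $\rho$th moment of the number of guesses with the exponential benchmark $2^{\rho H_{1/(1+\rho)}(X)}$ that appears in Arıkan-type bounds. The PMF, the value of ρ, and all variable names are illustrative choices and not taken from the paper, and the comparison only illustrates the role of the Rényi entropy of order $\frac{1}{1+\rho}$ in (1), not its exact constants.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (base 2), for alpha > 0, alpha != 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def guessing_moment(p, rho):
    """rho-th moment of the number of guesses when guessing in
    decreasing order of probability (the optimal guessing order)."""
    p_sorted = np.sort(np.asarray(p, dtype=float))[::-1]
    guesses = np.arange(1, len(p_sorted) + 1, dtype=float)
    return np.sum(p_sorted * guesses ** rho)

# Illustrative PMF and moment order (not from the paper).
p = np.array([0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125])
rho = 1.0
alpha = 1.0 / (1.0 + rho)

moment = guessing_moment(p, rho)
benchmark = 2.0 ** (rho * renyi_entropy(p, alpha))
print(f"E[G*(X)^rho]         = {moment:.4f}")
print(f"2^(rho * H_alpha(X)) = {benchmark:.4f}   (alpha = 1/(1+rho))")
```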
Guessing with an encoder is depicted in Figure 1. Here, prior to guessing X, the guesser is provided some side information about X in the form of $f(X)$, where $f$ is a function taking on at most $M$ different values (“labels”). Accordingly, a guessing function is a function $G$ from $\mathcal{X} \times \{1, \ldots, M\}$ to $\{1, \ldots, |\mathcal{X}|\}$ such that for every label $m$, $G(\cdot, m)$ is bijective. If, among all encoders, $f^\star$ minimizes the $\rho$th moment of the number of guesses required by an optimal guesser to guess X after observing $f^\star(X)$, then [] (Corollary 7):
Figure 1.
Guessing with an encoder f.
Thus, in guessing a sequence of independent and identically distributed (IID) random variables, a description rate of approximately $H_{\frac{1}{1+\rho}}(X)$ bits per symbol is needed to drive the $\rho$th moment of the number of guesses to one as the sequence length tends to infinity [,] (see Section 2 for more related work).
In this paper, we generalize the single-encoder setting from Figure 1 to the setting with distributed encoders depicted in Figure 2, which is the analog of Slepian–Wolf coding [] for guessing: A source generates a sequence $(X_1, Y_1), \ldots, (X_n, Y_n)$ of pairs over a finite alphabet $\mathcal{X} \times \mathcal{Y}$. The sequence $X^n$ is described by one of $M_X$ labels and the sequence $Y^n$ by one of $M_Y$ labels using functions:
where and . Based on $f(X^n)$ and $g(Y^n)$, a guesser repeatedly produces guesses of the form “Is $(X^n, Y^n)$ equal to $(x^n, y^n)$?” until the guess is correct.
Figure 2.
Guessing with distributed encoders and .
For a fixed $\rho > 0$, a rate pair $(R_X, R_Y)$ is called achievable if there exists a sequence of encoders and guessing functions such that the $\rho$th moment of the number of guesses tends to one as n tends to infinity, i.e.,
Our main contribution is Theorem 1, which characterizes the achievable rate pairs. For a fixed $\rho > 0$, let the region $\mathcal{R}_\rho$ comprise all rate pairs $(R_X, R_Y)$ satisfying the following inequalities simultaneously:
where the Rényi entropy $H_\alpha(\cdot)$ and the Arimoto–Rényi conditional entropy $H_\alpha(\cdot \mid \cdot)$ of order $\alpha$ are both defined in Section 3 ahead, and throughout the paper, $\alpha \triangleq \frac{1}{1+\rho}$.
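For a memoryless source, the three thresholds that delimit $\mathcal{R}_\rho$ can be computed directly from the per-letter joint PMF. The sketch below does so under the reading, suggested by the comparison with the Slepian–Wolf region further on, that the relevant quantities are $H_\alpha(X|Y)$, $H_\alpha(Y|X)$, and $H_\alpha(X,Y)$ with $\alpha = 1/(1+\rho)$; the joint PMF and ρ are illustrative, and the general (non-memoryless) statement involves the joint law of the whole sequences rather than a single letter.

```python
import numpy as np

def renyi_joint(p_xy, alpha):
    """Joint Rényi entropy H_alpha(X,Y) in bits."""
    p = p_xy[p_xy > 0]
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def arimoto_conditional(p_xy, alpha):
    """Arimoto–Rényi conditional entropy H_alpha(X|Y) in bits:
    alpha/(1-alpha) * log2 sum_y (sum_x P(x,y)^alpha)^(1/alpha)."""
    inner = np.sum(p_xy ** alpha, axis=0) ** (1.0 / alpha)  # one term per y
    return (alpha / (1.0 - alpha)) * np.log2(np.sum(inner))

# Illustrative per-letter joint PMF P_{XY} (rows: x, columns: y) and moment order.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
rho = 2.0
alpha = 1.0 / (1.0 + rho)

r_x_min = arimoto_conditional(p_xy, alpha)      # constraint on R_X
r_y_min = arimoto_conditional(p_xy.T, alpha)    # constraint on R_Y
r_sum_min = renyi_joint(p_xy, alpha)            # constraint on R_X + R_Y

print(f"R_X       >= {r_x_min:.4f} bits/symbol")
print(f"R_Y       >= {r_y_min:.4f} bits/symbol")
print(f"R_X + R_Y >= {r_sum_min:.4f} bits/symbol")
```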
Theorem 1.
For any $\rho > 0$, rate pairs in the interior of $\mathcal{R}_\rho$ are achievable, and rate pairs outside $\mathcal{R}_\rho$ are not achievable.
Proof.
The converse part follows from Corollary 1 in Section 4, and the achievability part from Corollary 2 in Section 5. ☐
The rate region defined by (10)–(12) resembles the rate region of Slepian–Wolf coding [6] (Theorem 15.4.1); the difference is that the Shannon entropy and conditional entropy are replaced by their Rényi counterparts. The two rate regions are related as follows:
Remark 1.
For memoryless sources and $\rho > 0$, the region $\mathcal{R}_\rho$ is contained in the Slepian–Wolf region. Typically, the containment is strict.
Proof.
The containment follows from the monotonicity of the Arimoto–Rényi conditional entropy in its order: (9) implies that $\alpha < 1$, so, by [] (Proposition 5), $H_\alpha(X|Y) \geq H(X|Y)$, $H_\alpha(Y|X) \geq H(Y|X)$, and $H_\alpha(X,Y) \geq H(X,Y)$. As for the strict containment, first note that the Slepian–Wolf region contains at least one rate pair $(R_X, R_Y)$ satisfying $R_X + R_Y = H(X,Y)$. Consequently, if $H_\alpha(X,Y) > H(X,Y)$, then the containment is strict. Because $H_\alpha(X,Y) > H(X,Y)$ unless $(X,Y)$ is distributed uniformly over its support [], the containment is typically strict.
The claim can also be shown operationally: The probability of error is equal to the probability that more than one guess is needed, and for every ,
where (14) follows from Markov’s inequality. Thus, the probability of error tends to zero if the $\rho$th moment of the number of guesses tends to one. ☐
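The chain (13)–(14) is not reproduced above; one way to write the Markov step, reconstructed here from the surrounding text rather than copied from the original, is the following, where $G$ denotes the number of guesses:

```latex
\Pr[\text{error}]
  = \Pr[G \geq 2]
  = \Pr\bigl[G^{\rho} - 1 \geq 2^{\rho} - 1\bigr]
  \leq \frac{\mathbb{E}\bigl[G^{\rho}\bigr] - 1}{2^{\rho} - 1},
```

where the inequality is Markov's inequality applied to the nonnegative random variable $G^\rho - 1$. Hence the probability of error indeed vanishes whenever $\mathbb{E}[G^\rho]$ tends to one.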
Despite the resemblance between (10)–(12) and the Slepian–Wolf region, there is an important difference: while Slepian–Wolf coding allows separate encoding with the same sum rate as with joint encoding, this is not necessarily true in our setting:
Remark 2.
Although the sum rate constraint (12) is the same as in single-source guessing [], separate encoding of $X^n$ and $Y^n$ may require a larger sum rate than joint encoding of $X^n$ and $Y^n$.
Proof.
If $H_\alpha(X|Y) + H_\alpha(Y|X) > H_\alpha(X,Y)$, then (10) and (11) together impose a stronger constraint on the sum rate than (12). For example, if:
and , then bits, so separate (distributed) encoding requires a sum rate exceeding bits as opposed to joint encoding, which is possible with bits (in Slepian–Wolf coding, this cannot happen because $H(X|Y) + H(Y|X) \leq H(X,Y)$). ☐
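The specific PMF and numbers of the example above are not reproduced here. As an illustration of the same phenomenon, the sketch below uses a hypothetical joint PMF (uniform over three of the four pairs in $\{0,1\} \times \{0,1\}$) and ρ = 3, for which the two single-rate constraints together exceed the joint-entropy sum-rate constraint; the choice of PMF and ρ is ours, and $\alpha = 1/(1+\rho)$ is used as defined above.

```python
import numpy as np

def renyi_joint(p_xy, alpha):
    p = p_xy[p_xy > 0]
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def arimoto_conditional(p_xy, alpha):
    inner = np.sum(p_xy ** alpha, axis=0) ** (1.0 / alpha)
    return (alpha / (1.0 - alpha)) * np.log2(np.sum(inner))

# Hypothetical example (not the one from the paper): P_{XY} uniform over
# {(0,0), (0,1), (1,0)}; the pair (1,1) has probability zero.
p_xy = np.array([[1/3, 1/3],
                 [1/3, 0.0]])
rho = 3.0
alpha = 1.0 / (1.0 + rho)

h_x_given_y = arimoto_conditional(p_xy, alpha)    # H_alpha(X|Y), about 0.834 bits
h_y_given_x = arimoto_conditional(p_xy.T, alpha)  # H_alpha(Y|X), about 0.834 bits
h_xy = renyi_joint(p_xy, alpha)                   # H_alpha(X,Y) = log2(3), about 1.585 bits

# The sum of the conditional entropies (about 1.67 bits) exceeds the joint
# Rényi entropy (about 1.58 bits), so the two single-rate constraints together
# force a larger sum rate than the sum-rate constraint alone.
print(h_x_given_y + h_y_given_x, ">", h_xy)
```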
The guessing problem is related to the task-encoding problem, where based on $f(X^n)$ and $g(Y^n)$, the decoder outputs a list that is guaranteed to contain $(X^n, Y^n)$, and the $\rho$th moment of the list size is required to tend to one as n tends to infinity. While, in the single-source setting, the guessing problem and the task-encoding problem have the same asymptotics [], this is not the case in the distributed setting:
Remark 3.
For memoryless sources, the task-encoding region from [9] is strictly smaller than the guessing region $\mathcal{R}_\rho$ unless X and Y are independent.
Proof.
In the IID case, the task-encoding region is the set of all rate pairs satisfying the following inequalities [] (Theorem 1):
where is a Rényi measure of dependence studied in [] (when is one, is the mutual information). The claim now follows from the following observations: By [] (Theorem 2), with equality if and only if X and Y are independent; similarly, with equality if and only if X and Y are independent; and by [] (Theorem 2), with equality if and only if X and Y are independent. ☐
The rest of this paper is structured as follows: in Section 2, we review other guessing settings; in Section 3, we recall the Rényi information measures and prove some auxiliary lemmas; in Section 4, we prove the converse theorem; and in Section 5, we prove the achievability theorem, whose proof is based on random binning and, in the case $\rho > 1$, is analyzed using a technique by Rosenthal [11].
2. Related Work
Tighter versions of (1) can be found in [,]. The large deviation behavior of guessing was studied in [,]. The relation between guessing and variable-length lossless source coding was explored in [,,].
Mismatched guessing, where the assumed distribution of X does not match its actual distribution, was studied in [], along with guessing under source uncertainty, where the PMF of X belongs to some known set, and a guesser was sought with good worst-case performance over that set. Guessing subject to distortion, where instead of guessing X, it suffices to guess an that is close to X according to some distortion measure, was treated in [].
If the guesser observes some side information Y, then the $\rho$th moment of the number of guesses required by an optimal guesser is bounded by []:
where $H_{\frac{1}{1+\rho}}(X|Y)$ denotes the Arimoto–Rényi conditional entropy of order $\frac{1}{1+\rho}$, which is defined in Section 3 ahead (refinements of (18) were recently derived in []). Guessing is related to the cutoff rate of a discrete memoryless channel, which is the supremum over all rates for which the $\rho$th moment of the number of guesses needed by the decoder to guess the message can be driven to one as the block length tends to infinity. In [,], the cutoff rate was expressed in terms of Gallager’s $E_0$ function [20]. Joint source-channel guessing was considered in [].
Guessing with an encoder, i.e., the situation where the side information can be chosen, was studied in [], where it was also shown that guessing and task encoding [] have the same asymptotics. With distributed encoders, however, task encoding [] and guessing no longer have the same asymptotics; see Remark 3. Lower and upper bounds for guessing with a helper, i.e., an encoder that does not observe X, but has access to a random variable that is correlated with X, can be found in [].
3. Preliminaries
Throughout the paper, $\log(\cdot)$ denotes the base-two logarithm. When clear from the context, we often omit sets and subscripts; for example, we write $\sum_x$ for $\sum_{x \in \mathcal{X}}$ and $P(x)$ for $P_X(x)$. The Rényi entropy [23] of order $\alpha$ is defined for positive $\alpha$ other than one as:
$$H_\alpha(X) \triangleq \frac{1}{1-\alpha} \log \sum_x P(x)^\alpha.$$
In the limit as $\alpha$ tends to one, the Shannon entropy is recovered, i.e., $\lim_{\alpha \to 1} H_\alpha(X) = H(X)$. The Arimoto–Rényi conditional entropy [24] of order $\alpha$ is defined for positive $\alpha$ other than one as:
$$H_\alpha(X|Y) \triangleq \frac{\alpha}{1-\alpha} \log \sum_y \Bigl[\sum_x P(x,y)^\alpha\Bigr]^{\frac{1}{\alpha}}.$$
In the limit as $\alpha$ tends to one, the Shannon conditional entropy is recovered, i.e., $\lim_{\alpha \to 1} H_\alpha(X|Y) = H(X|Y)$. The properties of the Arimoto–Rényi conditional entropy were studied in [,,].
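As a numerical sanity check on these definitions, the sketch below evaluates the (standard) Rényi and Arimoto–Rényi formulas for a small joint PMF and verifies that they approach the Shannon entropy and conditional entropy as $\alpha \to 1$; the PMF is an arbitrary illustrative choice.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """H_alpha = log2(sum p^alpha) / (1 - alpha), base 2."""
    p = p[p > 0]
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def arimoto_conditional(p_xy, alpha):
    """H_alpha(X|Y) = alpha/(1-alpha) * log2 sum_y (sum_x P(x,y)^alpha)^(1/alpha)."""
    inner = np.sum(p_xy ** alpha, axis=0) ** (1.0 / alpha)
    return (alpha / (1.0 - alpha)) * np.log2(np.sum(inner))

def shannon_entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Illustrative joint PMF (rows: x, columns: y).
p_xy = np.array([[0.3, 0.2],
                 [0.1, 0.4]])
p_y = p_xy.sum(axis=0)
h_x_given_y_shannon = shannon_entropy(p_xy.flatten()) - shannon_entropy(p_y)

for alpha in (0.9, 0.99, 0.999):
    print(alpha,
          renyi_entropy(p_xy.flatten(), alpha),  # approaches H(X,Y) as alpha -> 1
          arimoto_conditional(p_xy, alpha))      # approaches H(X|Y) as alpha -> 1

print("Shannon limits:", shannon_entropy(p_xy.flatten()), h_x_given_y_shannon)
```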
In the rest of this section, we recall some properties of the Arimoto–Rényi conditional entropy that will be used in Section 4 (Lemmas 1–3), and we prove auxiliary results for Section 5 (Lemmas 4–7).
Lemma 1
([], Theorem 2). Let , and let be a PMF over the finite set . Then,
with equality if and only if form a Markov chain.
Lemma 2
([], Proposition 4). Let , and let be a PMF over the finite set . Then,
with equality if and only if Y is uniquely determined by X and Z.
Lemma 3
([], Theorem 3). Let , and let be a PMF over the finite set . Then,
Lemma 4
([], Problem 4.15(f)). Let be a finite set, and let . Then, for all ,
Proof.
If , then (24) holds because the left-hand side (LHS) and the right-hand side (RHS) are both zero. If , then:
where (26) holds because and for every . ☐
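The statement of Lemma 4 is not fully reproduced above. Reading it as the standard subadditivity $\bigl(\sum_i a_i\bigr)^\rho \leq \sum_i a_i^\rho$ for nonnegative $a_i$ and $\rho \in [0,1]$, which is consistent with the proof steps just given, a quick numerical check looks as follows; the randomly drawn $a_i$ and the grid of exponents are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Check (sum_i a_i)^rho <= sum_i a_i^rho for nonnegative a_i and rho in [0, 1].
for _ in range(1000):
    a = rng.exponential(size=rng.integers(1, 10))  # nonnegative reals
    for rho in np.linspace(0.0, 1.0, 11):
        assert np.sum(a) ** rho <= np.sum(a ** rho) + 1e-12
print("subadditivity check passed")
```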
Lemma 5.
Let a, b, and c be nonnegative integers. Then, for all ,
(the restriction to integers cannot be omitted; for example, (28) does not hold if and ).
Proof.
Lemma 6.
Let a, b, c, and d be nonnegative real numbers. Then, for all ,
Proof.
If , then (33) follows from Lemma 4 because . If , then:
where (35) follows from Jensen’s inequality because is convex on since . ☐
Lemma 7
(Rosenthal). Let , and let be independent random variables that are either zero or one. Then, satisfies:
Proof.
This is a special case of [11] (Lemma 1). For convenience, we also provide a self-contained proof:
where (39) holds because each is either zero or one; (41) holds because are independent; (42) holds because is increasing on for ; (44) holds because for real numbers , , and , we have ; and (46) follows from Jensen’s inequality because is concave on for .
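The exact form of the bound (36) is not reproduced above. One natural reading of the proof steps just listed, for $\rho \in (1, 2]$, is the bound $\mathbb{E}[S^\rho] \leq \mathbb{E}[S] + (\mathbb{E}[S])^\rho$ for a sum $S$ of independent $\{0,1\}$-valued random variables. The Monte Carlo sketch below checks this particular reading; the Bernoulli parameters, ρ, and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 1.7            # any rho in (1, 2]; arbitrary choice
n_trials = 200_000

for _ in range(20):
    p = rng.uniform(0.0, 1.0, size=rng.integers(1, 8))      # Bernoulli parameters
    samples = (rng.uniform(size=(n_trials, len(p))) < p).sum(axis=1)
    vals = samples.astype(float) ** rho
    lhs = vals.mean()                                        # Monte Carlo E[S^rho]
    se = vals.std() / np.sqrt(n_trials)                      # Monte Carlo standard error
    mu = p.sum()                                             # E[S]
    rhs = mu + mu ** rho
    assert lhs <= rhs + 4 * se, (lhs, rhs)
print("Rosenthal-type bound check passed")
```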
4. Converse
In this section, we prove a nonasymptotic and an asymptotic converse result (Theorem 2 and Corollary 1, respectively).
Theorem 2.
Let form a Markov chain over the finite set , and let . Then, for every and for every guesser, the ρth moment of the number of guesses it takes to guess the pair based on the side information satisfies:
Proof.
We view (50) as three lower bounds corresponding to the three terms in the maximization on its RHS. The lower bound involving holds because:
where (51) follows from (18) and (52) follows from Lemma 3. The lower bound involving holds because:
where (53) follows from (18); (54) follows from Lemma 1; (55) follows from Lemma 2; (56) follows from Lemma 1 because form a Markov chain; and (57) follows from Lemma 3. The lower bound involving is analogous to the one with . ☐
Corollary 1.
For any $\rho > 0$, rate pairs outside $\mathcal{R}_\rho$ are not achievable.
5. Achievability
In this section, we prove a nonasymptotic and an asymptotic achievability result (Theorem 3 and Corollary 2, respectively).
Theorem 3.
Let , , , and be finite nonempty sets; let be a PMF; let ; and let be such that:
Then, there exist functions and and a guesser such that the ρth moment of the number of guesses needed to guess the pair based on the side information satisfies:
Proof.
Our achievability result relies on random binning: we map each $x \in \mathcal{X}$ uniformly at random to some label of f and each $y \in \mathcal{Y}$ uniformly at random to some label of g. We then show that the $\rho$th moment of the number of guesses averaged over all such mappings f and g is upper bounded by the RHS of (64). From this, we conclude that there exist f and g that satisfy (64).
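A small simulation helps to visualize the binning scheme. The sketch below draws the bin assignments f and g uniformly at random, lets the guesser go through the source pairs whose bins match the observed labels in decreasing order of probability, and estimates the resulting $\rho$th moment of the number of guesses; the joint PMF, ρ, and the numbers of labels are illustrative choices, and the code works on single letters rather than length-n blocks.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative joint PMF over a small alphabet (rows: x, columns: y).
p_xy = np.array([[0.30, 0.05, 0.05],
                 [0.05, 0.30, 0.05],
                 [0.05, 0.05, 0.10]])
rho, m_x, m_y = 1.0, 2, 2          # moment order and numbers of labels

def simulate(n_trials=20_000):
    nx, ny = p_xy.shape
    flat = p_xy.flatten()
    flat = flat / flat.sum()
    order = np.argsort(-flat)                   # guess in decreasing probability
    moments = []
    for _ in range(n_trials):
        f = rng.integers(m_x, size=nx)          # random binning of the x-alphabet
        g = rng.integers(m_y, size=ny)          # random binning of the y-alphabet
        idx = rng.choice(len(flat), p=flat)     # draw the source pair (X, Y)
        x, y = divmod(idx, ny)
        guesses = 0
        for k in order:                         # guesser sees only f(x) and g(y)
            gx, gy = divmod(k, ny)
            if f[gx] == f[x] and g[gy] == g[y]:  # only pairs matching both labels
                guesses += 1
                if (gx, gy) == (x, y):
                    break
        moments.append(guesses ** rho)
    # f and g are redrawn in every trial, so the estimate also averages over
    # the random binning, as in the proof.
    return np.mean(moments)

print("estimated E[G^rho] =", simulate())
```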
Let the guessing function G correspond to guessing in decreasing order of probability [] (ties can be resolved arbitrarily). Let f and g be distributed as described above, and denote by $\mathbb{E}_{f,g}[\cdot]$ the expectation with respect to f and g. Then,
with:
where is the indicator function that is one if the condition comprising its argument is true and zero otherwise; (65) holds because and are independent; (66) holds because the number of guesses is upper bounded by the number of that are at least as likely as and that are mapped to the same labels as ; (67) follows from splitting the sum depending on whether or not and whether or not and from the fact that ; and (68) follows from Lemma 5 because , , and are nonnegative integers. As indicated in (69)–(74), the dependence of , , , , , and on x, y, f, and g is implicit in our notation.
We first treat the case $\rho \leq 1$. We bound the terms on the RHS of (68) as follows:
where (75) follows from Jensen’s inequality because is concave on since ; (76) holds because the expectation operator is linear and because since ; in (77), we extended the inner summation and used that ; and (82) follows from (61). In the same way, we obtain:
Similarly,
From (68), (82), (83), and (90), we obtain:
and hence infer the existence of and satisfying (64).
We now consider (68) when $\rho > 1$. Unlike in the case $\rho \leq 1$, we cannot use Jensen’s inequality as we did in (75). Instead, for fixed x and y, we upper-bound the first expectation on the RHS of (68) by:
where (93) follows from Lemma 7 because and because is a sum of independent random variables taking values in . By the same steps as in (76)–(82),
As to the expectation of the other term on the RHS of (94),
where (96) follows from Jensen’s inequality because is concave on since , and (97) follows from (95). From (94), (95), and (97), we obtain:
where (99) holds because since and . In the same way, we obtain for the second expectation on the RHS of (68):
Bounding , i.e., the third expectation on the RHS of (68), is more involved because is not a sum of independent random variables. Our approach builds on the ideas used by Rosenthal [] (Proof of Lemma 1); compare (47) and (48) with (108) and (123) ahead. For fixed and ,
with:
where (102) follows from splitting the sum in braces depending on whether or not and whether or not and from assuming within the braces, which does not change the value of the expression because it is multiplied by ; (104) holds because and are independent since and ; (105) holds because , , , and ; (106) follows from Lemma 6; and (107) follows from identifying , , and because and are independent, , and . As indicated in (109)–(113), the dependence of , , , , and on x, y, , , f, and g is implicit in our notation.
To bound further, we study some of the terms on the RHS of (108) separately, starting with the second, which involves the sum over . For fixed , , and ,
where (114) follows from Jensen’s inequality because and are both concave on since , and (116) follows from Lemma 7 because and because is a sum of independent random variables taking values in . This implies that for fixed and ,
where (119) follows from the definitions of and . Similarly, for the third term on the RHS of (108),
With the help of (119) and (120), we now go back to (108) and argue that it implies that for fixed and ,
To prove this, we consider four cases depending on which term on the RHS of (108) achieves the maximum: If achieves the maximum, then (121) holds because . If the LHS of (118) achieves the maximum, then (121) follows from (119) because . If the LHS of (120) achieves the maximum, then (121) follows similarly. Finally, if achieves the maximum, then:
where (123) follows from Jensen’s inequality because is concave on for . Rearranging (123), we obtain:
so (121) holds also in this case.
Having established (121), we now take the expectation of its sides to obtain:
We now study the terms on the RHS of (125) separately, starting with the fourth (last). By (85)–(90), which hold also if ,
As for the first term on the RHS of (125),
which follows from (126) in the same way as (97) followed from (95). As for the second term on the RHS of (125),
where in (129), we extended the inner summations and used that ; (131) follows from Hölder’s inequality; and (133) follows from (89)–(90) and (81)–(82). In the same way, we obtain for the third term on the RHS of (125):
Corollary 2.
For any $\rho > 0$, rate pairs in the interior of $\mathcal{R}_\rho$ are achievable.
Proof.
Let be in the interior of . Then, (6)–(8) hold with strict inequalities, and there exists a such that for all sufficiently large n,
Using Theorem 3 with , , , , , and shows that, for all sufficiently large n, there exist encoders and and a guessing function satisfying:
Because tends to infinity as n tends to infinity, the RHS of (143) tends to one as n tends to infinity, which implies that the rate pair is achievable. ☐
Author Contributions
Writing—original draft preparation, A.B., A.L. and C.P.; writing—review and editing, A.B., A.L. and C.P.
Funding
This research received no external funding.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Massey, J.L. Guessing and entropy. In Proceedings of the 1994 IEEE International Symposium on Information Theory (ISIT), Trondheim, Norway, 27 June–1 July 1994; p. 204. [Google Scholar] [CrossRef]
- Arıkan, E. An inequality on guessing and its application to sequential decoding. IEEE Trans. Inf. Theory 1996, 42, 99–105. [Google Scholar] [CrossRef]
- Sason, I.; Verdú, S. Improved bounds on lossless source coding and guessing moments via Rényi measures. IEEE Trans. Inf. Theory 2018, 64, 4323–4346. [Google Scholar] [CrossRef]
- Bracher, A.; Hof, E.; Lapidoth, A. Guessing attacks on distributed-storage systems. arXiv, 2017; arXiv:1701.01981v1. [Google Scholar]
- Graczyk, R.; Lapidoth, A. Variations on the guessing problem. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; pp. 231–235. [Google Scholar] [CrossRef]
- Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006; ISBN 978-0-471-24195-9. [Google Scholar]
- Fehr, S.; Berens, S. On the conditional Rényi entropy. IEEE Trans. Inf. Theory 2014, 60, 6801–6810. [Google Scholar] [CrossRef]
- Csiszár, I. Generalized cutoff rates and Rényi’s information measures. IEEE Trans. Inf. Theory 1995, 41, 26–34. [Google Scholar] [CrossRef]
- Bracher, A.; Lapidoth, A.; Pfister, C. Distributed task encoding. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 1993–1997. [Google Scholar] [CrossRef]
- Lapidoth, A.; Pfister, C. Two measures of dependence. In Proceedings of the 2016 IEEE International Conference on the Science of Electrical Engineering (ICSEE), Eilat, Israel, 16–18 November 2016; pp. 1–5. [Google Scholar] [CrossRef]
- Rosenthal, H.P. On the subspaces of Lp (p > 2) spanned by sequences of independent random variables. Isr. J. Math. 1970, 8, 273–303. [Google Scholar] [CrossRef]
- Boztaş, S. Comments on “An inequality on guessing and its application to sequential decoding”. IEEE Trans. Inf. Theory 1997, 43, 2062–2063. [Google Scholar] [CrossRef]
- Hanawal, M.K.; Sundaresan, R. Guessing revisited: A large deviations approach. IEEE Trans. Inf. Theory 2011, 57, 70–78. [Google Scholar] [CrossRef]
- Christiansen, M.M.; Duffy, K.R. Guesswork, large deviations, and Shannon entropy. IEEE Trans. Inf. Theory 2013, 59, 796–802. [Google Scholar] [CrossRef]
- Sundaresan, R. Guessing based on length functions. In Proceedings of the 2007 IEEE International Symposium on Information Theory (ISIT), Nice, France, 24–29 June 2007; pp. 716–719. [Google Scholar] [CrossRef]
- Sason, I. Tight bounds on the Rényi entropy via majorization with applications to guessing and compression. Entropy 2018, 20, 896. [Google Scholar] [CrossRef]
- Sundaresan, R. Guessing under source uncertainty. IEEE Trans. Inf. Theory 2007, 53, 269–287. [Google Scholar] [CrossRef]
- Arıkan, E.; Merhav, N. Guessing subject to distortion. IEEE Trans. Inf. Theory 1998, 44, 1041–1056. [Google Scholar] [CrossRef]
- Bunte, C.; Lapidoth, A. On the listsize capacity with feedback. IEEE Trans. Inf. Theory 2014, 60, 6733–6748. [Google Scholar] [CrossRef]
- Gallager, R.G. Information Theory and Reliable Communication; John Wiley & Sons: Hoboken, NJ, USA, 1968; ISBN 0-471-29048-3. [Google Scholar]
- Arıkan, E.; Merhav, N. Joint source-channel coding and guessing with application to sequential decoding. IEEE Trans. Inf. Theory 1998, 44, 1756–1769. [Google Scholar] [CrossRef]
- Bunte, C.; Lapidoth, A. Encoding tasks and Rényi entropy. IEEE Trans. Inf. Theory 2014, 60, 5065–5076. [Google Scholar] [CrossRef]
- Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1960; Volume 1, pp. 547–561. [Google Scholar]
- Arimoto, S. Information measures and capacity of order α for discrete memoryless channels. In Topics in Information Theory; Csiszár, I., Elias, P., Eds.; North-Holland Publishing Company: Amsterdam, The Netherlands, 1977; pp. 41–52. ISBN 0-7204-0699-4. [Google Scholar]
- Sason, I.; Verdú, S. Arimoto–Rényi conditional entropy and Bayesian M-Ary hypothesis testing. IEEE Trans. Inf. Theory 2018, 64, 4–25. [Google Scholar] [CrossRef]
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).