Guessing with Distributed Encoders

Two correlated sources emit a pair of sequences, each of which is observed by a different encoder. Each encoder produces a rate-limited description of the sequence it observes, and the two descriptions are presented to a guessing device that repeatedly produces sequence pairs until correct. The number of guesses until correct is random, and it is required that it have a moment (of some prespecified order) that tends to one as the length of the sequences tends to infinity. The description rate pairs that allow this are characterized in terms of the Rényi entropy and the Arimoto–Rényi conditional entropy of the joint law of the sources. This solves the guessing analog of the Slepian–Wolf distributed source-coding problem. The achievability is based on random binning, which is analyzed using a technique by Rosenthal.


Introduction
In the Massey-Arıkan guessing problem [1,2], a random variable $X$ is drawn from a finite set $\mathcal{X}$ according to some probability mass function (PMF) $P_X$, and it has to be determined by making guesses of the form "Is $X$ equal to $x$?" until the guess is correct. The guessing order is determined by a guessing function $G$, which is a bijective function from $\mathcal{X}$ to $\{1, \ldots, |\mathcal{X}|\}$. Guessing according to $G$ proceeds as follows: the first guess is the element $\hat{x}_1 \in \mathcal{X}$ satisfying $G(\hat{x}_1) = 1$; the second guess is the element $\hat{x}_2 \in \mathcal{X}$ satisfying $G(\hat{x}_2) = 2$, and so on. Consequently, $G(X)$ is the number of guesses needed to guess $X$. Arıkan [2] showed that for any $\rho > 0$, the $\rho$th moment of the number of guesses required by an optimal guesser $G^*$ to guess $X$ is bounded by:

$$\frac{1}{(1 + \ln|\mathcal{X}|)^{\rho}} \, 2^{\rho H_{1/(1+\rho)}(X)} \le E[G^*(X)^{\rho}] \le 2^{\rho H_{1/(1+\rho)}(X)}, \tag{1}$$

where $\ln(\cdot)$ denotes the natural logarithm, and $H_{1/(1+\rho)}(X)$ denotes the Rényi entropy of order $\frac{1}{1+\rho}$, which is defined in Section 3 ahead (refinements of (1) were recently derived in [3]).
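The guessing setup and Arıkan's bound can be checked numerically on a small alphabet. The following is a minimal sketch, not part of the paper: the PMF `example_pmf` and all function names are illustrative assumptions, and the optimal guesser is taken to be the one that guesses in decreasing order of probability.

```python
import math


def renyi_entropy(pmf, alpha):
    """Rényi entropy of order alpha (alpha > 0, alpha != 1), in bits."""
    return math.log2(sum(p ** alpha for p in pmf if p > 0)) / (1.0 - alpha)


def optimal_guessing_moment(pmf, rho):
    """rho-th moment of G*(X) when guessing in decreasing order of probability."""
    ordered = sorted(pmf, reverse=True)
    return sum(p * k ** rho for k, p in enumerate(ordered, start=1))


def arikan_bounds(pmf, rho):
    """Lower and upper bounds on E[G*(X)^rho] in terms of the Rényi entropy."""
    h = renyi_entropy(pmf, 1.0 / (1.0 + rho))
    upper = 2.0 ** (rho * h)
    lower = upper / (1.0 + math.log(len(pmf))) ** rho
    return lower, upper


# Hypothetical PMF, chosen only for illustration.
example_pmf = [0.5, 0.25, 0.125, 0.125]
for rho in (0.5, 1.0, 2.0):
    lower, upper = arikan_bounds(example_pmf, rho)
    moment = optimal_guessing_moment(example_pmf, rho)
    print(f"rho={rho}: {lower:.4f} <= {moment:.4f} <= {upper:.4f}")
```

For each value of `rho`, the printed moment should lie between the two bounds.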
Guessing with an encoder is depicted in Figure 1. Here, prior to guessing $X$, the guesser is provided with some side information about $X$ in the form of $f(X)$, where $f \colon \mathcal{X} \to \{1, \ldots, M\}$ is a function taking on at most $M$ different values ("labels"). Accordingly, a guessing function $G(\cdot|\cdot)$ is a function from $\mathcal{X} \times \{1, \ldots, M\}$ to $\{1, \ldots, |\mathcal{X}|\}$ such that for every label $m \in \{1, \ldots, M\}$, $G(\cdot|m) \colon \mathcal{X} \to \{1, \ldots, |\mathcal{X}|\}$ is bijective. If, among all encoders, $f^*$ minimizes the $\rho$th moment of the number of guesses required by an optimal guesser to guess $X$ after observing $f(X)$, then this minimal moment is bounded from below and from above in terms of the difference $H_{1/(1+\rho)}(X) - \log M$ [4] (Corollary 7). Thus, in guessing a sequence of independent and identically distributed (IID) random variables, a description rate of approximately $H_{1/(1+\rho)}(X)$ bits per symbol is needed to drive the $\rho$th moment of the number of guesses to one as the sequence length tends to infinity [4,5] (see Section 2 for more related work).

In this paper, we generalize the single-encoder setting from Figure 1 to the setting with distributed encoders depicted in Figure 2, which is the analog of Slepian-Wolf coding [6] for guessing: A source generates a sequence of pairs $\{(X_i, Y_i)\}_{i=1}^{n}$ over a finite alphabet $\mathcal{X} \times \mathcal{Y}$. The sequence $X^n$ is described by one of $2^{nR_X}$ labels and the sequence $Y^n$ by one of $2^{nR_Y}$ labels using functions:

$$f_n \colon \mathcal{X}^n \to \{1, \ldots, 2^{nR_X}\}, \qquad g_n \colon \mathcal{Y}^n \to \{1, \ldots, 2^{nR_Y}\},$$

where $R_X \ge 0$ and $R_Y \ge 0$. Based on $f_n(X^n)$ and $g_n(Y^n)$, a guesser repeatedly produces guesses of the form $(\hat{x}^n, \hat{y}^n)$ until $(\hat{x}^n, \hat{y}^n) = (X^n, Y^n)$. For a fixed $\rho > 0$, a rate pair $(R_X, R_Y) \in \mathbb{R}_{\ge 0}^2$ is called achievable if there exists a sequence of encoders and guessing functions $\{(f_n, g_n, G_n)\}_{n=1}^{\infty}$ such that the $\rho$th moment of the number of guesses tends to one as $n$ tends to infinity, i.e.,

$$\lim_{n \to \infty} E\bigl[G_n(X^n, Y^n \mid f_n(X^n), g_n(Y^n))^{\rho}\bigr] = 1.$$

Our main contribution is Theorem 1, which characterizes the achievable rate pairs.
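For a fixed $\rho > 0$, let the region $\mathcal{R}(\rho)$ comprise all rate pairs $(R_X, R_Y) \in \mathbb{R}_{\ge 0}^2$ satisfying the following inequalities simultaneously:

$$R_X \ge \limsup_{n \to \infty} \frac{H_{\tilde{\rho}}(X^n | Y^n)}{n}, \tag{6}$$

$$R_Y \ge \limsup_{n \to \infty} \frac{H_{\tilde{\rho}}(Y^n | X^n)}{n}, \tag{7}$$

$$R_X + R_Y \ge \limsup_{n \to \infty} \frac{H_{\tilde{\rho}}(X^n, Y^n)}{n}, \tag{8}$$

where the Rényi entropy $H_{\alpha}(\cdot)$ and the Arimoto-Rényi conditional entropy $H_{\alpha}(\cdot|\cdot)$ of order $\alpha$ are both defined in Section 3 ahead, and throughout the paper,

$$\tilde{\rho} \triangleq \frac{1}{1+\rho}. \tag{9}$$

Theorem 1. For any $\rho > 0$, all rate pairs in the interior of $\mathcal{R}(\rho)$ are achievable, while those outside $\mathcal{R}(\rho)$ are not.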
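For memoryless sources, i.e., when the pairs $(X_i, Y_i)$ are IID according to $P_{XY}$, the Rényi information measures of the sequences are $n$ times their single-letter counterparts, and $\mathcal{R}(\rho)$ is the set of rate pairs satisfying:

$$R_X \ge H_{\tilde{\rho}}(X|Y), \tag{10}$$

$$R_Y \ge H_{\tilde{\rho}}(Y|X), \tag{11}$$

$$R_X + R_Y \ge H_{\tilde{\rho}}(X, Y). \tag{12}$$

The rate region defined by (10)-(12) resembles the rate region of Slepian-Wolf coding [6] (Theorem 15.4.1); the difference is that the Shannon entropy and conditional entropy are replaced by their Rényi counterparts. The rate regions are related as follows:

Remark 1. For memoryless sources and $\rho > 0$, the region $\mathcal{R}(\rho)$ is contained in the Slepian-Wolf region. Typically, the containment is strict.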
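The claim can also be shown operationally: The probability of error is equal to the probability that more than one guess is needed, and for every $\rho > 0$,

$$P[G \ge 2] = P[G^{\rho} - 1 \ge 2^{\rho} - 1] \tag{13}$$

$$\le \frac{E[G^{\rho}] - 1}{2^{\rho} - 1}, \tag{14}$$

where $G$ is shorthand for the number of guesses $G_n(X^n, Y^n \mid f_n(X^n), g_n(Y^n))$, and where (14) follows from Markov's inequality applied to the nonnegative random variable $G^{\rho} - 1$. Thus, the probability of error tends to zero if the $\rho$th moment of the number of guesses tends to one.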
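The operational argument rests on the fact that Markov's inequality, applied to the nonnegative quantity $G^{\rho} - 1$, gives $P[G \ge 2] \le (E[G^{\rho}] - 1)/(2^{\rho} - 1)$. As a rough sanity check, the sketch below evaluates both sides for a single random variable guessed without any description, which is a simplifying assumption made here for illustration; the PMF and function names are hypothetical.

```python
def error_probability(pmf):
    """Probability that the first (most likely) guess is wrong."""
    return 1.0 - max(pmf)


def markov_bound(pmf, rho):
    """(E[G^rho] - 1) / (2^rho - 1) for guessing in decreasing order of probability."""
    ordered = sorted(pmf, reverse=True)
    moment = sum(p * k ** rho for k, p in enumerate(ordered, start=1))
    return (moment - 1.0) / (2.0 ** rho - 1.0)


# Hypothetical PMF, chosen only for illustration.
example_pmf = [0.5, 0.25, 0.125, 0.125]
for rho in (0.5, 1.0, 2.0):
    print(rho, error_probability(example_pmf), markov_bound(example_pmf, rho))
```

When the moment is close to one, the bound forces the error probability close to zero, which is the direction used in the text.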
Despite the resemblance between (10)-(12) and the Slepian-Wolf region, there is an important difference: while Slepian-Wolf coding allows separate encoding with the same sum rate as with joint encoding, this is not necessarily true in our setting:

Remark 2. Although the sum rate constraint (12) is the same as in single-source guessing [5], separate encoding of $X^n$ and $Y^n$ may require a larger sum rate than joint encoding of $X^n$ and $Y^n$.
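Remark 2 can be illustrated numerically. The sketch below assumes a memoryless source and the single-letter constraints $R_X \ge H_{\tilde{\rho}}(X|Y)$, $R_Y \ge H_{\tilde{\rho}}(Y|X)$, $R_X + R_Y \ge H_{\tilde{\rho}}(X,Y)$, and it uses the standard definitions of the Rényi entropy and the Arimoto-Rényi conditional entropy. The PMF `example_pxy` is a hypothetical choice made here, not an example from the paper, and all function names are illustrative.

```python
import math


def renyi_joint(pxy, alpha):
    """Rényi entropy H_alpha(X, Y) in bits; pxy[x][y] is the joint PMF."""
    s = sum(p ** alpha for row in pxy for p in row if p > 0)
    return math.log2(s) / (1.0 - alpha)


def arimoto_renyi(pxy, alpha):
    """Arimoto-Rényi conditional entropy H_alpha(X|Y) in bits."""
    total = 0.0
    for y in range(len(pxy[0])):
        inner = sum(row[y] ** alpha for row in pxy if row[y] > 0)
        total += inner ** (1.0 / alpha)
    return alpha / (1.0 - alpha) * math.log2(total)


def transpose(pxy):
    return [list(col) for col in zip(*pxy)]


def in_iid_region(pxy, rho, rate_x, rate_y):
    """Check the three single-letter rate constraints for order 1/(1+rho)."""
    alpha = 1.0 / (1.0 + rho)
    return (rate_x >= arimoto_renyi(pxy, alpha)
            and rate_y >= arimoto_renyi(transpose(pxy), alpha)
            and rate_x + rate_y >= renyi_joint(pxy, alpha))


# Hypothetical joint PMF, chosen only for illustration.
example_pxy = [[0.8, 0.1], [0.1, 0.0]]
rho = 1.0
alpha = 1.0 / (1.0 + rho)
h_x_given_y = arimoto_renyi(example_pxy, alpha)
h_y_given_x = arimoto_renyi(transpose(example_pxy), alpha)
h_xy = renyi_joint(example_pxy, alpha)
print(h_x_given_y + h_y_given_x, h_xy)
print(in_iid_region(example_pxy, rho, 0.62, 0.62))
print(in_iid_region(example_pxy, rho, 0.65, 0.65))
```

For this PMF, the two individual constraints add up to more than the sum-rate constraint, so a rate pair such as (0.62, 0.62) meets the sum-rate constraint yet lies outside the region.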
Proof. If $H_{\tilde{\rho}}(X|Y) + H_{\tilde{\rho}}(Y|X) > H_{\tilde{\rho}}(X, Y)$, then (10) and (11) together impose a stronger constraint on the sum rate than (12). For example, if:

The guessing problem is related to the task-encoding problem, where based on $f_n(X^n)$ and $g_n(Y^n)$, the decoder outputs a list that is guaranteed to contain $(X^n, Y^n)$, and the $\rho$th moment of the list size is required to tend to one as $n$ tends to infinity. While, in the single-source setting, the guessing problem and the task-encoding problem have the same asymptotics [4], this is not the case in the distributed setting:

Remark 3. For memoryless sources, the task-encoding region from [9] is strictly smaller than the guessing region $\mathcal{R}(\rho)$ unless $X$ and $Y$ are independent.
Proof. In the IID case, the task-encoding region is the set of all rate pairs $(R_X, R_Y) \in \mathbb{R}_{\ge 0}^2$ satisfying the following inequalities [9] (Theorem 1): where $K_{\alpha}(X; Y)$ is a Rényi measure of dependence studied in [10] (when $\alpha$ is one, $K_{\alpha}(X; Y)$ is the mutual information). The claim now follows from the following observations:

The rest of this paper is structured as follows: in Section 2, we review other guessing settings; in Section 3, we recall the Rényi information measures and prove some auxiliary lemmas; in Section 4, we prove the converse theorem; and in Section 5, we prove the achievability theorem, which is based on random binning and, in the case $\rho > 1$, is analyzed using a technique by Rosenthal [11].

Related Work
Tighter versions of (1) can be found in [3,12]. The large deviation behavior of guessing was studied in [13,14]. The relation between guessing and variable-length lossless source coding was explored in [3,15,16].
Mismatched guessing, where the assumed distribution of $X$ does not match its actual distribution, was studied in [17], along with guessing under source uncertainty, where the PMF of $X$ belongs to some known set, and a guesser was sought with good worst-case performance over that set. Guessing subject to distortion, where instead of guessing $X$, it suffices to guess an $\hat{X}$ that is close to $X$ according to some distortion measure, was treated in [18].
If the guesser observes some side information $Y$, then the $\rho$th moment of the number of guesses required by an optimal guesser is bounded by [2]:

$$\frac{1}{(1 + \ln|\mathcal{X}|)^{\rho}} \, 2^{\rho H_{\tilde{\rho}}(X|Y)} \le E[G^*(X|Y)^{\rho}] \le 2^{\rho H_{\tilde{\rho}}(X|Y)}, \tag{18}$$

where $H_{\tilde{\rho}}(X|Y)$ denotes the Arimoto-Rényi conditional entropy of order $\tilde{\rho} = \frac{1}{1+\rho}$, which is defined in Section 3 ahead (refinements of (18) were recently derived in [3]). Guessing is related to the cutoff rate of a discrete memoryless channel, which is the supremum over all rates for which the $\rho$th moment of the number of guesses needed by the decoder to guess the message can be driven to one as the block length tends to infinity. In [2,19], the cutoff rate was expressed in terms of Gallager's $E_0$ function [20]. Joint source-channel guessing was considered in [21].
Guessing with an encoder, i.e., the situation where the side information can be chosen, was studied in [4], where it was also shown that guessing and task encoding [22] have the same asymptotics. With distributed encoders, however, task encoding [9] and guessing no longer have the same asymptotics; see Remark 3. Lower and upper bounds for guessing with a helper, i.e., an encoder that does not observe X, but has access to a random variable that is correlated with X, can be found in [5].

Preliminaries
Throughout the paper, $\log(\cdot)$ denotes the base-two logarithm. When clear from the context, we often omit sets and subscripts; for example, we write $\sum_x$ for $\sum_{x \in \mathcal{X}}$ and $P(x)$ for $P_X(x)$. The Rényi entropy [23] of order $\alpha$ is defined for positive $\alpha$ other than one as:

$$H_{\alpha}(X) \triangleq \frac{1}{1-\alpha} \log \sum_x P(x)^{\alpha}.$$

In the limit as $\alpha$ tends to one, the Shannon entropy is recovered, i.e., $\lim_{\alpha \to 1} H_{\alpha}(X) = H(X)$. The Arimoto-Rényi conditional entropy [24] of order $\alpha$ is defined for positive $\alpha$ other than one as:

$$H_{\alpha}(X|Y) \triangleq \frac{\alpha}{1-\alpha} \log \sum_y \Bigl[\sum_x P(x, y)^{\alpha}\Bigr]^{1/\alpha}.$$

In the limit as $\alpha$ tends to one, the Shannon conditional entropy is recovered, i.e., $\lim_{\alpha \to 1} H_{\alpha}(X|Y) = H(X|Y)$. The properties of the Arimoto-Rényi conditional entropy were studied in [7,24,25].
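The two definitions and their limits as $\alpha$ tends to one can be explored numerically. The following is a minimal sketch using the standard formulas for the Rényi entropy and the Arimoto-Rényi conditional entropy; the joint PMF `example_pxy` and the function names are illustrative assumptions.

```python
import math


def shannon_entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def renyi_entropy(pmf, alpha):
    """Rényi entropy of order alpha (alpha > 0, alpha != 1), in bits."""
    return math.log2(sum(p ** alpha for p in pmf if p > 0)) / (1.0 - alpha)


def shannon_conditional(pxy):
    """H(X|Y) = H(X, Y) - H(Y); pxy[x][y] is the joint PMF."""
    joint = [p for row in pxy for p in row]
    marginal_y = [sum(col) for col in zip(*pxy)]
    return shannon_entropy(joint) - shannon_entropy(marginal_y)


def arimoto_renyi(pxy, alpha):
    """Arimoto-Rényi conditional entropy H_alpha(X|Y), in bits."""
    total = 0.0
    for y in range(len(pxy[0])):
        inner = sum(row[y] ** alpha for row in pxy if row[y] > 0)
        total += inner ** (1.0 / alpha)
    return alpha / (1.0 - alpha) * math.log2(total)


# Hypothetical joint PMF, chosen only for illustration.
example_pxy = [[0.4, 0.1], [0.2, 0.3]]
marginal_x = [sum(row) for row in example_pxy]
for alpha in (0.5, 0.999999, 1.000001, 2.0):
    print(alpha, renyi_entropy(marginal_x, alpha), arimoto_renyi(example_pxy, alpha))
print("Shannon:", shannon_entropy(marginal_x), shannon_conditional(example_pxy))
```

Orders just below and just above one should give values close to the Shannon quantities, and both measures should decrease as the order grows.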
In the rest of this section, we recall some properties of the Arimoto-Rényi conditional entropy that will be used in Section 4 (Lemmas 1-3), and we prove auxiliary results for Section 5 (Lemmas 4-7).

Lemma 1 ([7], Theorem 2). Let $\alpha > 0$, and let $P_{XYZ}$ be a PMF over the finite set $\mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$. Then,

$$H_{\alpha}(X | Y, Z) \le H_{\alpha}(X | Z),$$

with equality if and only if $X - Z - Y$ form a Markov chain.

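Lemma 2 ([7], Proposition 4). Let $\alpha > 0$, and let $P_{XYZ}$ be a PMF over the finite set $\mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$. Then,

$$H_{\alpha}(X, Y | Z) \ge H_{\alpha}(X | Z),$$

with equality if and only if $Y$ is uniquely determined by $X$ and $Z$.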
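The two properties recalled above, namely that additional conditioning cannot increase the Arimoto-Rényi conditional entropy ($H_{\alpha}(X|Y,Z) \le H_{\alpha}(X|Z)$) and that adding a variable to the guessed tuple cannot decrease it ($H_{\alpha}(X,Y|Z) \ge H_{\alpha}(X|Z)$), can be spot-checked on a randomly drawn PMF. This is only a numerical sketch under the standard definition of the Arimoto-Rényi conditional entropy; the helper `arimoto_renyi`, the alphabet sizes, and the random seed are assumptions made here.

```python
import itertools
import math
import random


def arimoto_renyi(pmf, target, given, alpha):
    """H_alpha(target | given) in bits.

    pmf maps outcome tuples to probabilities; target and given are tuples
    of coordinate indices into each outcome tuple.
    """
    # Marginalize onto the coordinates that matter.
    joint = {}
    for outcome, p in pmf.items():
        key = (tuple(outcome[i] for i in target), tuple(outcome[i] for i in given))
        joint[key] = joint.get(key, 0.0) + p
    # For each value of the conditioning variables, sum P^alpha over the target.
    inner = {}
    for (_, g), p in joint.items():
        inner[g] = inner.get(g, 0.0) + p ** alpha
    total = sum(s ** (1.0 / alpha) for s in inner.values())
    return alpha / (1.0 - alpha) * math.log2(total)


rng = random.Random(0)
outcomes = list(itertools.product(range(2), range(3), range(2)))
weights = [rng.random() for _ in outcomes]
pmf = {o: w / sum(weights) for o, w in zip(outcomes, weights)}

X, Y, Z = 0, 1, 2  # coordinate indices of the three variables
for alpha in (0.5, 2.0):
    h_x_given_yz = arimoto_renyi(pmf, (X,), (Y, Z), alpha)
    h_x_given_z = arimoto_renyi(pmf, (X,), (Z,), alpha)
    h_xy_given_z = arimoto_renyi(pmf, (X, Y), (Z,), alpha)
    print(alpha, h_x_given_yz, h_x_given_z, h_xy_given_z)
```

In each printed row, the first entropy should not exceed the second, and the second should not exceed the third.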
Lemma 5. Let $a$, $b$, and $c$ be nonnegative integers. Then, for all $p > 0$,

$$(1 + a + b + c)^{p} \le 1 + 4^{p} (a^{p} + b^{p} + c^{p}) \tag{28}$$

(the restriction to integers cannot be omitted; for example, (28) does not hold if $a = b = c = 0.1$ and $p = 2$).
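The role of the integer restriction can be seen numerically. The sketch below assumes that the inequality in Lemma 5 has the form $(1 + a + b + c)^p \le 1 + 4^p (a^p + b^p + c^p)$; this form is an assumption made for the illustration, chosen because it is consistent with the counterexample stated in the lemma. The function name is hypothetical.

```python
import itertools


def lemma5_holds(a, b, c, p):
    """Check the assumed form (1+a+b+c)^p <= 1 + 4^p (a^p + b^p + c^p)."""
    return (1 + a + b + c) ** p <= 1 + 4 ** p * (a ** p + b ** p + c ** p)


# Nonnegative integers: the assumed inequality holds on the whole grid.
all_hold = all(
    lemma5_holds(a, b, c, p)
    for a, b, c in itertools.product(range(6), repeat=3)
    for p in (0.25, 0.5, 1.0, 2.0, 3.5)
)
print(all_hold)

# Non-integer values from the lemma's remark: the inequality fails.
print(lemma5_holds(0.1, 0.1, 0.1, 2))
```

The integer case works because, unless all three numbers are zero, the largest of them is at least one, so $1 + a + b + c$ is at most four times that largest number.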
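Lemma 7 (Rosenthal). Let $p > 1$, and let $X_1, \ldots, X_n$ be independent random variables that are either zero or one. Then, $X \triangleq \sum_{i=1}^{n} X_i$ satisfies:

$$E[X^{p}] \le 2^{p^2} \max\{E[X], E[X]^{p}\}. \tag{37}$$

Proof. This is a special case of [11] (Lemma 1). For convenience, we also provide a self-contained proof:

$$E[X^{p}] = E\Bigl[\sum_{i=1}^{n} X_i X^{p-1}\Bigr] \tag{38}$$

$$= E\Bigl[\sum_{i=1}^{n} X_i \Bigl(1 + \sum_{j \ne i} X_j\Bigr)^{p-1}\Bigr] \tag{39}$$

$$= \sum_{i=1}^{n} E\Bigl[X_i \Bigl(1 + \sum_{j \ne i} X_j\Bigr)^{p-1}\Bigr] \tag{40}$$

$$= \sum_{i=1}^{n} E[X_i] \, E\Bigl[\Bigl(1 + \sum_{j \ne i} X_j\Bigr)^{p-1}\Bigr] \tag{41}$$

$$\le \sum_{i=1}^{n} E[X_i] \, E\bigl[(1 + X)^{p-1}\bigr] \tag{42}$$

$$= E[X] \, E\bigl[(1 + X)^{p-1}\bigr] \tag{43}$$

$$\le 2^{p-1} E[X] \bigl(1 + E[X^{p-1}]\bigr) \tag{44}$$

$$\le 2^{p} E[X] \max\bigl\{1, E[X^{p-1}]\bigr\} \tag{45}$$

$$\le 2^{p} E[X] \max\bigl\{1, E[X^{p}]^{(p-1)/p}\bigr\} \tag{46}$$

$$= 2^{p} \max\bigl\{E[X], E[X] E[X^{p}]^{(p-1)/p}\bigr\}, \tag{47}$$

where (39) holds because each $X_i$ is either zero or one; (41) holds because $X_1, \ldots, X_n$ are independent; (42) holds because $z \mapsto z^{p-1}$ is increasing on $\mathbb{R}_{\ge 0}$ for $p > 1$; (44) holds because for real numbers $a \ge 0$, $b \ge 0$, and $r > 0$, we have $(a + b)^{r} \le (2 \max\{a, b\})^{r} = 2^{r} \max\{a^{r}, b^{r}\} \le 2^{r} (a^{r} + b^{r})$; and (46) follows from Jensen's inequality because $z \mapsto z^{(p-1)/p}$ is concave on $\mathbb{R}_{\ge 0}$ for $p > 1$.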
We now consider two cases depending on which term on the RHS of (47) achieves the maximum: If the maximum is achieved by $E[X]$, then $E[X^{p}] \le 2^{p} E[X]$, which implies (37) because $2^{p} \le 2^{p^2}$ since $p > 1$. If the maximum is achieved by $E[X] E[X^{p}]^{(p-1)/p}$, then:

$$E[X^{p}] \le 2^{p} E[X] \, E[X^{p}]^{(p-1)/p}. \tag{48}$$

Rearranging (48), we obtain:

$$E[X^{p}] \le 2^{p^2} E[X]^{p}, \tag{49}$$

so (37) holds also in this case.
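The bound of Lemma 7 can be compared with exact moments for small examples. The sketch below takes the bound to be $E[X^p] \le 2^{p^2} \max\{E[X], E[X]^p\}$, which is the form implied by the two cases of the proof; the success probabilities and function names are illustrative assumptions. The distribution of the sum is computed exactly by convolving the individual zero-one variables.

```python
def sum_distribution(success_probs):
    """Exact PMF of a sum of independent zero-one random variables."""
    dist = [1.0]  # the empty sum equals zero with probability one
    for q in success_probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - q)   # current variable equals zero
            new[k + 1] += mass * q       # current variable equals one
        dist = new
    return dist


def moment(dist, p):
    """E[X^p] for a PMF on {0, 1, ..., len(dist) - 1}."""
    return sum(mass * k ** p for k, mass in enumerate(dist))


def rosenthal_bound(mean, p):
    """2^(p^2) * max(E[X], E[X]^p)."""
    return 2.0 ** (p * p) * max(mean, mean ** p)


# Hypothetical success probabilities, chosen only for illustration.
examples = [[0.01] * 20, [0.5] * 10, [0.9, 0.1, 0.3, 0.7]]
for probs in examples:
    dist = sum_distribution(probs)
    mean = moment(dist, 1)
    for p in (1.5, 2.0, 3.0):
        print(len(probs), p, moment(dist, p), rosenthal_bound(mean, p))
```

The first example has a mean below one, where the term $E[X]$ dominates the maximum; the second has a mean above one, where $E[X]^p$ dominates.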

Converse
In this section, we prove a nonasymptotic and an asymptotic converse result (Theorem 2 and Corollary 1, respectively).

Theorem 2.
Let $U - X - Y - V$ form a Markov chain over the finite set $\mathcal{U} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{V}$, and let $\tau \triangleq 1 + \ln|\mathcal{X} \times \mathcal{Y}|$. Then, for every $\rho > 0$ and for every guesser, the $\rho$th moment of the number of guesses it takes to guess the pair $(X, Y)$ based on the side information $(U, V)$ satisfies:

Proof. We view (50) as three lower bounds corresponding to the three terms in the maximization on its RHS. The lower bound involving $H_{\tilde{\rho}}(X, Y)$ holds because: where (51) follows from (18)

where (53) follows from (18)

Corollary 1. For any $\rho > 0$, rate pairs outside $\mathcal{R}(\rho)$ are not achievable.

Proof. We first show that (8) is necessary for a rate pair $(R_X, R_Y) \in \mathbb{R}_{\ge 0}^2$ to be achievable. Indeed, if (8) does not hold, then there exists an $\epsilon > 0$ such that for infinitely many $n$, Using Theorem 2 with $\mathcal{X} \leftarrow \mathcal{X}^n$, $\mathcal{Y} \leftarrow \mathcal{Y}^n$, $\mathcal{U} \leftarrow \{1, \ldots, 2^{nR_X}\}$, $\mathcal{V} \leftarrow \{1, \ldots, 2^{nR_Y}\}$, $P_{XY} \leftarrow P_{X^n Y^n}$, $U \leftarrow f_n(X^n)$, $V \leftarrow g_n(Y^n)$, and $\tau_n = 1 + n \ln|\mathcal{X} \times \mathcal{Y}|$ leads to: It follows from (60), (58), and the fact that $\frac{1}{n} \log \tau_n$ tends to zero as $n$ tends to infinity that the LHS of (59) cannot tend to one as $n$ tends to infinity, so $(R_X, R_Y)$ is not achievable if (8) does not hold. The necessity of (6) and (7) can be shown in the same way.

Achievability
In this section, we prove a nonasymptotic and an asymptotic achievability result (Theorem 3 and Corollary 2, respectively).

Theorem 3. Let $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{U}$, and $\mathcal{V}$ be finite nonempty sets; let $P_{XY}$ be a PMF; let $\rho > 0$; and let $\epsilon > 0$ be such that: Then, there exist functions $f \colon \mathcal{X} \to \mathcal{U}$ and $g \colon \mathcal{Y} \to \mathcal{V}$ and a guesser such that the $\rho$th moment of the number of guesses needed to guess the pair $(X, Y)$ based on the side information $(f(X), g(Y))$ satisfies:

Proof. Our achievability result relies on random binning: we map each $x \in \mathcal{X}$ uniformly at random to some $u \in \mathcal{U}$ and each $y \in \mathcal{Y}$ uniformly at random to some $v \in \mathcal{V}$. We then show that the $\rho$th moment of the number of guesses averaged over all such mappings $f \colon \mathcal{X} \to \mathcal{U}$ and $g \colon \mathcal{Y} \to \mathcal{V}$ is upper bounded by the RHS of (64). From this, we conclude that there exist $f$ and $g$ that satisfy (64).
Let the guessing function $G$ correspond to guessing in decreasing order of probability [2] (ties can be resolved arbitrarily). Let $f$ and $g$ be distributed as described above, and denote by $E_{f,g}[\cdot]$ the expectation with respect to $f$ and $g$. Then, with: where $1\{\cdot\}$ is the indicator function that is one if the condition comprising its argument is true and zero otherwise; (65) holds because $(f, g)$ and $(X, Y)$ are independent; (66) holds because the number of guesses is upper bounded by the number of pairs $(x', y')$ that are at least as likely as $(x, y)$ and that are mapped to the same labels $(u, v)$ as $(x, y)$; (67) follows from splitting the sum depending on whether $x' = x$ or not and whether $y' = y$ or not and from the fact that $\psi(x, y) = \phi_f(x) = \phi_g(y) = 1$; and (68) follows from Lemma 5 because $\beta_1$, $\beta_2$, and $\beta_3$ are nonnegative integers. As indicated in (69)-(74), the dependence of $\psi$, $\phi_f$, $\phi_g$, $\beta_1$, $\beta_2$, and $\beta_3$ on $x$, $y$, $f$, and $g$ is implicit in our notation.

We first treat the case $\rho \in (0, 1]$. We bound the terms on the RHS of (68) as follows: where (75) follows from Jensen's inequality because $z \mapsto z^{\rho}$ is concave on $\mathbb{R}_{\ge 0}$ since $\rho \in (0, 1]$; (76) holds because the expectation operator is linear and because $E_{f,g}[\phi_f(x')] = 1/|\mathcal{U}|$ since $x' \ne x$; in (77), we extended the inner summation and used that $\psi(x', y) \le [P(x', y)/P(x, y)]^{\tilde{\rho}}$; and (82) follows from (61).
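The random-binning argument can be mimicked by a small simulation. The sketch below draws the two labelings uniformly at random, lets the guesser go through the pairs that are consistent with the observed labels in decreasing order of probability, and averages the resulting moment over many draws. The joint PMF, the alphabet and label-set sizes, the number of trials, and the function names are all assumptions made here for illustration.

```python
import random


def guessing_moment_with_binning(pxy, f, g, rho):
    """rho-th moment of the number of guesses for fixed labelings f and g.

    pxy maps pairs (x, y) to probabilities; f and g map symbols to labels.
    Within each label pair, guesses proceed in decreasing order of probability.
    """
    bins = {}
    for (x, y), p in pxy.items():
        bins.setdefault((f[x], g[y]), []).append(p)
    total = 0.0
    for probs in bins.values():
        probs.sort(reverse=True)
        total += sum(p * k ** rho for k, p in enumerate(probs, start=1))
    return total


def average_over_random_binning(pxy, xs, ys, num_u, num_v, rho, trials, seed=0):
    """Average the moment over independently and uniformly drawn labelings."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        f = {x: rng.randrange(num_u) for x in xs}
        g = {y: rng.randrange(num_v) for y in ys}
        total += guessing_moment_with_binning(pxy, f, g, rho)
    return total / trials


# Hypothetical correlated source on a 4 x 4 alphabet: equal symbols are likelier.
xs = ys = range(4)
weights = {(x, y): (4.0 if x == y else 1.0) for x in xs for y in ys}
norm = sum(weights.values())
pxy = {pair: w / norm for pair, w in weights.items()}

rho = 1.0
no_labels = average_over_random_binning(pxy, xs, ys, 1, 1, rho, trials=1)
binned = average_over_random_binning(pxy, xs, ys, 4, 4, rho, trials=200)
print(no_labels, binned)
```

With a single label per encoder the guesser gets no help, so `no_labels` is the moment of unaided guessing; any labeling can only reduce the number of guesses, so `binned` cannot exceed it.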

Corollary 2.
For any ρ > 0, rate pairs in the interior of R(ρ) are achievable.
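Proof. Let $(R_X, R_Y)$ be in the interior of $\mathcal{R}(\rho)$. Then, (6)-(8) hold with strict inequalities, and there exists a $\delta > 0$ such that for all sufficiently large $n$,

$$\log 2^{nR_X} \ge H_{\tilde{\rho}}(X^n | Y^n) + n\delta,$$

$$\log 2^{nR_Y} \ge H_{\tilde{\rho}}(Y^n | X^n) + n\delta,$$

$$\log 2^{nR_X} + \log 2^{nR_Y} \ge H_{\tilde{\rho}}(X^n, Y^n) + n\delta.$$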
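Using Theorem 3 with $\mathcal{X} \leftarrow \mathcal{X}^n$, $\mathcal{Y} \leftarrow \mathcal{Y}^n$, $\mathcal{U} \leftarrow \{1, \ldots, 2^{nR_X}\}$, $\mathcal{V} \leftarrow \{1, \ldots, 2^{nR_Y}\}$, $P_{XY} \leftarrow P_{X^n Y^n}$, and $\epsilon_n \leftarrow n\delta$ shows that, for all sufficiently large $n$, there exist encoders $f_n \colon \mathcal{X}^n \to \mathcal{U}$ and $g_n \colon \mathcal{Y}^n \to \mathcal{V}$ and a guessing function $G_n$ satisfying: Because $\epsilon_n$ tends to infinity as $n$ tends to infinity, the RHS of (143) tends to one as $n$ tends to infinity, which implies that the rate pair $(R_X, R_Y)$ is achievable.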