The Listsize Capacity of the Gaussian Channel with Decoder Assistance

The listsize capacity is computed for the Gaussian channel with a helper that—cognizant of the channel-noise sequence but not of the transmitted message—provides the decoder with a rate-limited description of said sequence. This capacity is shown to equal the sum of the cutoff rate of the Gaussian channel without help and the rate of help. In particular, zero-rate help raises the listsize capacity from zero to the cutoff rate. This is achieved by having the helper provide the decoder with a sufficiently fine quantization of the normalized squared Euclidean norm of the noise sequence.


Introduction
The order-ρ listsize capacity C_list^(ρ) of a channel is the supremum of the coding rates for which there exist codes guaranteeing the large-blocklength convergence to one of the ρ-th moment of the cardinality of the list of messages that, given the received output sequence, have positive a posteriori probability. It is zero for the Gaussian channel because, on this channel, no codeword is ruled out by any received sequence, so said list contains all the messages. Here we derive this capacity for the Gaussian channel with a helper that observes the noise sequence and describes it to the decoder using a rate-limited noise-free bit pipe; see Figure 1. We show that the listsize capacity C_list^(ρ)(R_h) is then the sum of the bit pipe's rate R_h and the order-ρ cutoff rate R_cutoff(ρ) of the Gaussian channel without a helper. The latter's definition is similar to that of the listsize capacity, but with the list now comprising only those messages that are a posteriori at least as likely as the transmitted one.
As we shall see, for the Gaussian channel with average power P, noise variance N, and corresponding signal-to-noise ratio (SNR) A ≜ P/N,

C_list^(ρ)(R_h) = R_0(ρ) + R_h, (1)

where R_0(ρ) (in nats) is a function of the SNR that plays a prominent role in the analysis of the reliability function of said channel (Section 7.4 in [1]), [2]. That analysis does not, however, carry over directly to our setting because it deals with error exponents and not lists. It is interesting to note that (1) also holds when the help rate R_h is zero: the number of help bits required to increase the listsize capacity from zero to R_cutoff(ρ) is sublinear in the blocklength. In fact, as we shall see, all it takes is a sufficiently fine quantization of the normalized squared Euclidean norm of the noise sequence.
The relation (1) is reminiscent of the analogous result on the erasures-only capacity C_e-o(R_h) of the Gaussian channel with a rate-R_h helper (Remark 10 in [3]), namely, that

C_e-o(R_h) = C + R_h, (4)

where C denotes the Shannon capacity of the Gaussian channel (without help) (Theorem 9.1.1 in [4]), and C_e-o(R_h) is the erasures-only capacity, which is defined like C_list^(ρ)(R_h) but with the requirement on the ρ-th moment of the list replaced by the requirement that the list be of size 1 with probability tending to one. (The Gaussian erasures-only capacity with a helper is given by the RHS of (4) irrespective of whether the assistance is provided to the encoder or the decoder.) The latter result is in turn reminiscent of the analogous result on the Shannon capacity with a helper C(R_h) [5-8].

In proving (1), we shall focus on the "direct part," i.e., that the right-hand side (RHS) of (1) is achievable. The "converse," that no rate exceeding the RHS of (1) is achievable, is omitted because it follows directly from (Remark 4 in [3]): there it is shown that this is true even if, given the received sequence and the provided help, the list contains only a subset of the messages that are of positive a posteriori probability, namely, those that are a posteriori at least as likely as the transmitted message.
The listsize capacity is relevant, for example, when the message set corresponds to tasks [9] and the transmitted message corresponds to one that must be performed by the decoder with absolute certainty. To ensure this, the decoder must perform all the tasks in the list of tasks that are not ruled out by the received sequence. (Tasks other than the transmitted one may, but need not, be performed.) The ρ-th moment of the list's size then measures the receiver's average effort.
Results on the listsize capacity and the erasures-only capacity of general discrete memoryless channels (DMCs) in the absence of help are scarce. Noteworthy exceptions are the results of Pinsker and Sheverdjaev [10], Csiszár and Narayan [11], and Telatar [12], that provide sufficient conditions for the erasures-only capacity to equal the Shannon capacity and for the listsize capacity to equal the cutoff rate. Asymptotic results on the erasures-only capacity in the low-noise regime can be found in [13,14]. Once noiseless feedback is introduced, the problems become more tractable [15][16][17].
The rest of the paper is organized as follows. Section 2 describes our setup and presents the main result. Section 3 contains some classical and some new observations regarding Gallager's E_0 function and its modification. Section 4 derives the cutoff rate of the Gaussian channel without help and proves (2). Section 5 describes and analyzes a coding scheme that proves the direct part of (1).

The Main Result
A power-P blocklength-n encoder f^(n) for a message set M is a mapping that maps each message m ∈ M to an n-tuple f^(n)(m) whose Euclidean norm ‖f^(n)(m)‖ satisfies

‖f^(n)(m)‖² ≤ nP.

We sometimes use x_m to denote f^(n)(m), and x_{m,k} to denote the k-th component of f^(n)(m). The encoder is said to be of rate R if the cardinality of M is e^{nR}, in which case we often assume that M = {1, …, e^{nR}}. (We ignore the fact that e^{nR} need not be an integer; this issue washes out in the large-n asymptotics we study.) When a message m ∈ M is sent over the discrete-time additive Gaussian noise channel using the encoder f^(n), the channel produces the random vector Y ∈ R^n whose k-th component is

Y_k = x_{m,k} + Z_k, k ∈ {1, …, n}, (10)

where {Z_k} are independent and identically distributed (IID) zero-mean Gaussians of variance N. We assume that N is positive and use w(y|x) to denote the density of the channel's output when its input is x, i.e., the mean-x variance-N Gaussian density

w(y|x) = (2πN)^{-1/2} e^{-(y-x)²/(2N)}, (11)

which we extend to n-tuples in a memoryless fashion:

w(y|x) = ∏_{k=1}^{n} w(y_k|x_k).

For convenience, we define the SNR

A ≜ P/N. (12)

Given an output sequence y and a message m, we define the "at-least-as-likely list"

L(m, y) ≜ {m′ ∈ M : w(y|x_{m′}) ≥ w(y|x_m)}.

Assuming, as we do, that the messages are a priori equally likely, this list comprises the messages that, given the output sequence y, are a posteriori at least as likely as m.
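To make the definitions concrete, the following is a minimal numerical sketch (not from the paper; all parameter values are illustrative) of the channel model and of the at-least-as-likely list. It uses the fact that, for Gaussian noise, w(y|x_{m′}) ≥ w(y|x_m) is equivalent to ‖y − x_{m′}‖² ≤ ‖y − x_m‖².

```python
import numpy as np

rng = np.random.default_rng(0)

n, P, N = 8, 1.0, 0.25           # blocklength, power, noise variance
M = 16                           # number of messages

# Random codebook, scaled so every codeword meets ||x_m||^2 <= nP with equality.
codebook = rng.standard_normal((M, n))
codebook *= np.sqrt(n * P) / np.linalg.norm(codebook, axis=1, keepdims=True)

def at_least_as_likely_list(m, y, codebook):
    """Indices m' with w(y|x_{m'}) >= w(y|x_m); for Gaussian noise this is
    equivalent to ||y - x_{m'}||^2 <= ||y - x_m||^2."""
    d2 = np.sum((y - codebook) ** 2, axis=1)
    return np.flatnonzero(d2 <= d2[m])

m = 3
y = codebook[m] + np.sqrt(N) * rng.standard_normal(n)
L = at_least_as_likely_list(m, y, codebook)
assert m in L                    # the transmitted message is always on its own list
print(len(L))
```

Note that, as the text observes, every message has positive a posteriori probability here; the list L is smaller only because it keeps messages at least as likely as the transmitted one.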
If a message M, drawn equiprobably from M, is transmitted over the channel with a resulting received sequence Y, then the cardinality of the at-least-as-likely list is a random positive integer, and we denote its ρ-th moment E[|L(M, Y)|^ρ]:

E[|L(M, Y)|^ρ] = (1/|M|) ∑_{m∈M} ∫_{R^n} w(y|x_m) |L(m, y)|^ρ dν(y),

where ν(·) denotes the Lebesgue measure on R^n.
For a given ρ > 0, we define the order-ρ cutoff rate R_cutoff(ρ) as the supremum of the rates R for which there exists a sequence of rate-R power-P blocklength-n encoders {f^(n)} such that

lim_{n→∞} E[|L(M, Y)|^ρ] = 1.

Theorem 1. The order-ρ cutoff rate R_cutoff(ρ) of the additive Gaussian noise channel equals R_0(ρ) of (3).

Proof. See Section 4.
A T_n-valued description of the noise sequence Z = (Z_1, …, Z_n) is a mapping

φ^(n): R^n → T_n,

with the understanding that φ^(n)(Z), which we denote T, is the description of Z. We say that the descriptions {φ^(n)} are of rate R_h if (1/n) log|T_n| tends to R_h as n tends to infinity. Suppose now that, in addition to the received sequence Y, the receiver is also presented with the description T = φ^(n)(Z) of the noise, and that, based on the two, it forms the "remotely-plausible list" L(Y, T) comprising the messages that have positive a posteriori probability given the two:

L(y, t) ≜ {m ∈ M : φ^(n)(y − x_m) = t}.

Given ρ > 0, the listsize capacity C_list^(ρ)(R_h) with rate-R_h decoder assistance is the supremum of the rates R for which there exists a sequence of rate-R power-P blocklength-n encoders {f^(n)} and a sequence {φ^(n)} of descriptions of rate R_h such that

lim_{n→∞} E[|L(Y, T)|^ρ] = 1. (19)

Theorem 2. On the Gaussian channel, the listsize capacity with rate-R_h decoder assistance is

C_list^(ρ)(R_h) = R_cutoff(ρ) + R_h, (20)

where R_cutoff(ρ) is the order-ρ cutoff rate of the channel (without assistance) as given in (2) and (3).
Proof. The "converse," that (19) cannot be achieved when the rate exceeds the RHS of (20), follows from (Remark 4 in [3]). The "direct part," describing a coding scheme that achieves (19) with rates approaching the RHS of (20), is proved in Section 5.

Preliminaries
Given ρ ≥ 0 and any probability measure Q on R, Gallager's E_0 function for our channel is defined as [1]

E_0(ρ, Q) ≜ −log ∫_R ( ∫_R w(y|x)^{1/(1+ρ)} dQ(x) )^{1+ρ} dν(y),

where ν(·) is now the Lebesgue measure on R. The result of maximizing E_0(ρ, Q) over all Q under which E[X²] ≤ P is denoted E_0^*(ρ):

E_0^*(ρ) ≜ sup_{Q : E[X²] ≤ P} E_0(ρ, Q).

The multi-letter extension of E_0 is

E_0^(n)(ρ, Q^(n)) ≜ −(1/n) log ∫_{R^n} ( ∫_{R^n} w(y|x)^{1/(1+ρ)} dQ^(n)(x) )^{1+ρ} dν(y),

where Q^(n) is a probability measure on R^n; the integrals are over R^n; and the channel w(y|x) is defined in (11). Similarly,

E_0^{(n),*}(ρ) ≜ sup_{Q^(n) : E[‖X‖²] ≤ nP} E_0^(n)(ρ, Q^(n)). (24)

Given probability measures Q^(m) on R^m and Q^(n) on R^n that satisfy the power constraint, their product measure on R^{m+n} also satisfies it, and consequently

(m + n) E_0^{(m+n),*}(ρ) ≥ m E_0^{(m),*}(ρ) + n E_0^{(n),*}(ρ).

The sequence {n E_0^{(n),*}(ρ)} is thus superadditive, and Fekete's Subadditive Lemma implies that E_0^{(n),*}(ρ) converges to its supremum:

lim_{n→∞} E_0^{(n),*}(ρ) = sup_{n∈N} E_0^{(n),*}(ρ).

We shall later see (cf. (55) ahead) that

lim_{n→∞} E_0^{(n),*}(ρ) = ρ R_0(ρ),

where R_0(ρ) is defined in (3). We shall also need Gallager's modified E_0 function. To highlight its relation to the unmodified function, our treatment is quite general: we shall use g(x) in lieu of x², and replace P with Γ.
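As a numerical sanity check of the E_0 definition, the sketch below evaluates E_0(ρ, Q) by quadrature for the unit-variance Gaussian channel with a zero-mean variance-A Gaussian input, and compares it with the closed form (ρ/2) log(1 + A/(1+ρ)) nats. That closed form is a standard identity for this particular input distribution; it is an assumption of this check, not a formula quoted from the paper.

```python
import numpy as np

A, rho = 2.0, 0.7          # SNR and list order (noise variance normalized to 1)
s = 1.0 + rho

y = np.linspace(-30.0, 30.0, 2001)
x = np.linspace(-30.0, 30.0, 2001)
dy, dx = y[1] - y[0], x[1] - x[0]

q = np.exp(-x**2 / (2 * A)) / np.sqrt(2 * np.pi * A)               # dQ_G/dx
w = np.exp(-(y[:, None] - x[None, :])**2 / 2) / np.sqrt(2 * np.pi)  # w(y|x)

inner = (w ** (1.0 / s)) @ q * dx        # inner integral over x, for each y
E0 = -np.log(np.sum(inner ** s) * dy)    # Gallager's E_0(rho, Q_G), in nats

closed_form = (rho / 2) * np.log(1 + A / (1 + rho))
print(E0, closed_form)
```

The two values agree to within the quadrature error of the Riemann sums, which is far below 1e-3 for these grids.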
Given some ρ ≥ 0, some probability distribution Q on R under which E[g(X)] ≤ Γ, and some r ≥ 0, the modified Gallager E_0 function E_{0,m}(ρ, Q, r) is defined as

E_{0,m}(ρ, Q, r) ≜ −log ∫_R ( ∫_R e^{r(Γ − g(x))} w(y|x)^{1/(1+ρ)} dQ(x) )^{1+ρ} dν(y).

We shall also be interested in the maximum of E_{0,m}(ρ, Q, r) over both Q and r. We distinguish between two cases depending on whether E[g(X)] ≤ Γ holds strictly or not: in the former case we only allow r to be zero, whereas in the latter case r can be any nonnegative number. We thus define the resulting suprema over r and over (Q, r) accordingly. The next proposition provides a lower bound on lim E_0^{(n),*}(ρ).

Proposition 1.
Any probability distribution Q on R under which g(X) is of finite second moment and of expectation Γ provides the lower bound

lim_{n→∞} E_0^{(n),*}(ρ) ≥ sup_{r≥0} E_{0,m}(ρ, Q, r). (33)

Proof. Let Q be any input distribution under which g(X) is of finite second moment and E[g(X)] = Γ. For each n ∈ N, let Q^(n) be the conditional distribution of the n-fold product distribution Q^{×n} given the event {X ∈ A_n}, where

A_n ≜ { x ∈ R^n : n(Γ − δ) ≤ ∑_{i=1}^{n} g(x_i) ≤ nΓ }, (35)

and δ > 0 is some constant. Thus, for every Borel measurable subset B of R^n,

Q^(n)(B) = Q^{×n}(B ∩ A_n) / Q^{×n}(A_n). (36)

For any r ≥ 0, we can upper-bound the Radon-Nikodym derivative of Q^(n) with respect to the product distribution Q^{×n} as follows:

(dQ^(n)/dQ^{×n})(x) = I{x ∈ A_n} / Q^{×n}(A_n) ≤ μ^{-1} e^{r(nΓ − ∑_i g(x_i))},

where μ ≜ Q^{×n}(A_n), and I{statement} equals 1 if the statement is true and else 0. Using this bound on the Radon-Nikodym derivative we obtain:

E_0^(n)(ρ, Q^(n)) ≥ E_{0,m}(ρ, Q, r) + (1 + ρ) n^{-1} log μ. (43)

By the Central Limit Theorem, μ tends to 1/2 as n tends to infinity, so (43) implies that

lim inf_{n→∞} E_0^(n)(ρ, Q^(n)) ≥ E_{0,m}(ρ, Q, r).

Taking the supremum of the RHS over all r ≥ 0 establishes that

lim inf_{n→∞} E_0^(n)(ρ, Q^(n)) ≥ sup_{r≥0} E_{0,m}(ρ, Q, r),

and hence, by (24), proves (33).
Proposition 2. For every n ∈ N, E_0^{(n),*}(ρ) admits the single-letter upper bound (46), and, consequently, so does its limit, as stated in (47).

Proof. The proof is based on Proposition 2 in [18], which implies that the corresponding n-letter bound holds for every density f^(n)_R on R^n and any probability measure Q^(n) on R^n. Applying this inequality to the product density

f^(n)_R(y) = ∏_{i=1}^{n} f_R(y_i),

where f_R is a density on R, and using the product form of the channel (11), we obtain that for any density f_R on R the bound (51) holds, where Q^(n)_i is the i-th marginal of Q^(n), and Q̄ is the probability measure on R defined by

Q̄ ≜ (1/n) ∑_{i=1}^{n} Q^(n)_i.

Observe that if E[∑_i g(X_i)] ≤ nΓ under Q^(n), then E[g(X)] ≤ Γ under Q̄. This observation and (51) establish (46). Since (46) holds for all n, (47) must also hold.

The Cutoff Rate of the Gaussian Channel
In this section, we prove Theorem 1. Since scaling the output does not change the cutoff rate, we will assume WLOG that the noise variance is 1 and the transmit power is A; see (12). Thus, w(y|x) is the mean-x unit-variance Gaussian density, and each codeword x_m satisfies

‖x_m‖² ≤ nA.

Computing lim E_0^{(n),*}(ρ)

Here we shall establish that on the Gaussian channel

lim_{n→∞} E_0^{(n),*}(ρ) = ρ R_0(ρ), (53)

where R_0(ρ) is defined in (3); here and throughout, Q_G denotes the zero-mean variance-A Gaussian distribution.
To this end, we shall derive matching upper and lower bounds on the limit. We begin with the former.

Upper-Bounding lim E_0^{(n),*}(ρ)

We show that on the channel (10),

lim_{n→∞} E_0^{(n),*}(ρ) ≤ ρ R_0(ρ).

The proof is based on Proposition 2, with the density f_R corresponding to a centered Gaussian of a suitably chosen variance σ². Evaluating the RHS of (47) for this density, we obtain the bound (64), where in (61) we defined β. To conclude the proof, it remains to show that the RHS of (64) coincides with ρR_0(ρ). To this end, observe that some basic algebra reveals identities relating σ² and β that allow the first term in (64), and then the remaining terms, to be rewritten so that their sum equals ρR_0(ρ).

The Mapping ρ → R_0(ρ) Is Monotonically Decreasing
For the purpose of proving the achievability of R_0(ρ), we will need the fact that it is monotonically decreasing in ρ. In view of (55), it suffices to show that, for every n ∈ N, the mapping ρ → ρ^{-1} E_0^{(n),*}(ρ) is monotonically decreasing. In view of (24), the latter will follow once we establish the monotonicity of ρ → ρ^{-1} E_0^(n)(ρ, Q^(n)) for any fixed Q^(n). Since E_0^(n)(ρ, Q^(n)) evaluates to zero at ρ = 0, this monotonicity can be established by showing that the mapping ρ → E_0^(n)(ρ, Q^(n)) is concave. This is established in (Appendix 5.B in [1]). (That appendix deals with finite alphabets, but the proof carries over to our case.)
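A quick numerical spot-check of this monotonicity is possible if one grants the standard Gaussian-input evaluation E_0(ρ, Q_G) = (ρ/2) log(1 + A/(1+ρ)) nats (an assumed identity for a Gaussian input, not the paper's definition (3)): then ρ^{-1} E_0(ρ, Q_G) = (1/2) log(1 + A/(1+ρ)), which is visibly decreasing in ρ.

```python
import numpy as np

# Spot-check that rho -> E_0(rho, Q_G)/rho is decreasing in rho, using the
# assumed Gaussian-input closed form (1/2)*log(1 + A/(1 + rho)) in nats.
A = 2.0
rho = np.linspace(0.01, 10.0, 1000)
rate = 0.5 * np.log(1 + A / (1 + rho))
assert np.all(np.diff(rate) < 0)   # strictly decreasing on the grid
print(rate[0], rate[-1])
```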

Achievability of R_0(ρ)
The achievability of R_0(ρ) will be proved using a random-coding argument. Let Q be the zero-mean variance-A Gaussian distribution, let δ > 0 be a constant, and let Q^(n) be the distribution on R^n defined in (35) and (36). Draw the codewords {X_m}_{m=1,…,e^{nR}} of a blocklength-n random codebook independently, each according to Q^(n), so ‖X_m‖² ≤ nA with probability 1 for every m ∈ M. By symmetry, E[|L(m, Y)|^ρ] (where the expectation is over the random choice of codebook and over the channel behavior) does not depend on m. Consequently,

E[|L(M, Y)|^ρ] = E[|L(1, Y)|^ρ], (77)

and if we establish that E[|L(1, Y)|^ρ] tends to 1, it will follow by the random-coding argument that there exists a codebook for which the LHS of (77) (with the expectation now over the channel behavior only) tends to 1. Defining

B_m ≜ I{w(Y|X_m) ≥ w(Y|X_1)}, m ∈ {2, …, e^{nR}},

we can express the RHS of (77) as

E[(1 + ∑_{m≠1} B_m)^ρ],

and we seek to show that

lim_{n→∞} E[(1 + ∑_{m≠1} B_m)^ρ] = 1. (80)

To this end, we shall need the following lemma.

Lemma 1.
Let {Z_n} be a sequence of random variables taking values in the nonnegative integers, and let ρ > 0 be fixed. The following two conditions are then equivalent:

(i) E[(1 + Z_n)^ρ] = 1 + o(1);

(ii) E[Z_n^ρ] = o(1),

where o(1) tends to zero as n tends to infinity.

Proof. The implication (ii) ⟹ (i) follows by noting that, for any nonnegative integer z and ρ > 0,

1 ≤ (1 + z)^ρ ≤ 1 + 2^ρ z^ρ.

As for the implication (i) ⟹ (ii), note that for any nonnegative integer z and ρ > 0,

z^ρ ≤ (1 + z)^ρ I{z ≥ 1},

and hence E[Z_n^ρ] ≤ E[(1 + Z_n)^ρ] − Pr[Z_n = 0]. The implication is now established by noting that (i) implies that Pr[Z_n = 0] → 1 because, by Markov's inequality (and the strict positivity of ρ),

Pr[Z_n ≥ 1] ≤ ( E[(1 + Z_n)^ρ] − 1 ) / (2^ρ − 1).

In light of the above lemma, to establish (80) it suffices to show that

lim_{n→∞} E[(∑_{m≠1} B_m)^ρ] = 0,

i.e., that

lim_{n→∞} E[ E[ (∑_{m≠1} B_m)^ρ | X_1, Y ] ] = 0, (88)

where the outer expectation is over X_1 and Y.
A related expectation, but one where it is the conditional expectation that is raised to the ρ-th power, is studied in the following lemma.

Lemma 2. Whenever R < R_0(ρ),

lim_{n→∞} E[ ( E[ ∑_{m≠1} B_m | X_1, Y ] )^ρ ] = 0. (90)

Proof. See Appendix A.
To establish (88) using this lemma, we distinguish between two cases depending on whether 0 < ρ ≤ 1 or ρ > 1. In the former case x → x^ρ is concave, so Jensen's inequality implies that

E[ (∑_{m≠1} B_m)^ρ | X_1, Y ] ≤ ( E[ ∑_{m≠1} B_m | X_1, Y ] )^ρ,

which, together with Lemma 2, implies (88) whenever R < R_0(ρ). Suppose now that ρ > 1. Conditional on the transmitted codeword x_1 and the output y, the random variables {B_m}_{m≠1} are IID Bernoulli, with B_m determined by X_m. We can thus use Rosenthal's technique (Lemma 5.10 in [19]), [20] to obtain

E[ (∑_{m≠1} B_m)^ρ | x_1, y ] ≤ 2^{ρ²} ( ( E[ ∑_{m≠1} B_m | x_1, y ] )^ρ + E[ ∑_{m≠1} B_m | x_1, y ] ).

Taking the expectation over X_1 and Y yields

E[ (∑_{m≠1} B_m)^ρ ] ≤ 2^{ρ²} ( E[ ( E[ ∑_{m≠1} B_m | X_1, Y ] )^ρ ] + E[ ∑_{m≠1} B_m ] ).

The first term on the RHS can be treated using the lemma. The second, but for the 2^{ρ²} constant, is the one encountered when ρ is 1. Since, by Section 4.2, R_0(ρ) ≤ R_0(1) (because ρ > 1 for the case at hand), it too tends to zero when R < R_0(ρ).
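The Rosenthal-type moment bound for a sum S of IID Bernoullis, E[S^ρ] ≤ 2^{ρ²}((E S)^ρ + E S) for ρ > 1 (the 2^{ρ²} constant is the one quoted in the text), can be illustrated by a Monte Carlo sketch; the parameter values below are illustrative.

```python
import numpy as np

# Monte Carlo illustration of the Rosenthal-type bound for a Bernoulli sum:
#   E[S^rho] <= 2**(rho**2) * ((E S)**rho + E S),   rho > 1,
# where S is a Binomial(n, p) sum of IID Bernoulli(p) variables.
rng = np.random.default_rng(1)
n, p, rho = 1000, 0.003, 2.5
S = rng.binomial(n, p, size=200_000)

lhs = np.mean(S.astype(float) ** rho)            # empirical E[S^rho]
rhs = 2 ** (rho ** 2) * ((n * p) ** rho + n * p)  # Rosenthal-type upper bound
assert lhs <= rhs
print(lhs, rhs)
```

The bound is quite loose here; its role in the proof is only to separate the ρ-th power of the conditional mean (handled by Lemma 2) from the conditional mean itself (the ρ = 1 case).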

No Rate Exceeding R_0(ρ) Is Achievable
To show the converse, we need Arıkan's lower bound on guessing [21]. Fix any sequence of rate-R blocklength-n codebooks {C_n} satisfying the cost constraint. For any n ∈ N, let Q^(n) be the induced probability distribution on R^n, i.e., the uniform distribution over the codewords of C_n. Since the codebook satisfies the cost constraint, E[‖X‖²] ≤ nA under Q^(n). Given y, list the messages m ∈ M in decreasing order of likelihood w(y|x_m) (resolving ties arbitrarily, e.g., ranking low numerical values of m higher), and let G(m|y) denote the ranking of the message m in this list. Note that

G(m|y) ≤ |L(m, y)|,

where the inequality can be strict because there may be messages that are in L(m, y) because they have the same likelihood as m, and that are yet ranked lower than m by G(·|y) because of the way ties are resolved. It follows from this inequality that the ρ-th moment of |L(M, Y)| cannot tend to one unless the ρ-th moment of G(M|Y) does. By Arıkan's guessing inequality [21],

E[G(M|Y)^ρ] ≥ (1 + nR)^{-ρ} e^{nρR} e^{-n E_0^(n)(ρ, Q^(n))},

so the ρ-th moment of G(M|Y) can tend to one only if

R ≤ ρ^{-1} sup_{n∈N} E_0^{(n),*}(ρ).

From this, the converse now follows using (24) and (55) because

sup_{n∈N} E_0^{(n),*}(ρ) = lim_{n→∞} E_0^{(n),*}(ρ) = ρ R_0(ρ).
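Arıkan's inequality can be illustrated in its unconditional form (the paper applies the version with side information Y): for any pmf P on M messages and the optimal decreasing-probability guessing order, E[G(X)^ρ] ≥ (1 + ln M)^{-ρ} (∑_x P(x)^{1/(1+ρ)})^{1+ρ}. The sketch below checks this on a random pmf; the distribution is illustrative.

```python
import numpy as np

# Toy check of Arikan's guessing inequality (unconditional form):
#   E[G(X)^rho] >= (1 + ln M)^(-rho) * (sum_x P(x)^(1/(1+rho)))^(1+rho),
# where G ranks messages in decreasing order of probability.
rng = np.random.default_rng(2)
M, rho = 64, 1.5
P = rng.dirichlet(np.ones(M))             # an arbitrary pmf on M messages

P_sorted = np.sort(P)[::-1]               # optimal guessing order
ranks = np.arange(1, M + 1, dtype=float)
lhs = np.sum(P_sorted * ranks ** rho)     # E[G(X)^rho] for the best guesser
rhs = (1 + np.log(M)) ** (-rho) * np.sum(P ** (1 / (1 + rho))) ** (1 + rho)
assert lhs >= rhs
print(lhs, rhs)
```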

The Direct Part of Theorem 2
In this section we prove the direct part of Theorem 2: when the decoder can be provided with a rate-R h description of the noise, the convergence (19) can be achieved at all transmission rates below R 0 (ρ) + R h . As noted earlier, the converse follows directly from (Remark 4 in [3]).
Our proof treats the cases R h = 0 and R h > 0 separately. As in Section 4, we assume that the channel is normalized to having noise variance 1 and transmit power A.

Case 1: R h = 0
The analogous result for the modulo-additive channel was proved in [3] by having the helper provide the decoder with a lossless description of the type of the noise sequence. Since this type fully specifies the a posteriori probability of the transmitted message, the decoder's remotely-plausible-with-this-help list L(Y, T) contains only messages whose a posteriori probability is equal to that of the correct message. It is therefore a subset of the at-least-as-likely list L(M, Y) (without help) and hence of smaller-or-equal ρ-th moment. Consequently, any rate that allows the latter to tend to one, also allows the former to tend to one.
On the Gaussian channel the likelihood w(y|x m ) is specified by the normalized squared Euclidean norm of the noise sequence z 2 /n. The latter, however, cannot be described at zero rate with infinite precision. This motivates us to quantize it and have the quantized version be the zero-rate help. The result will then follow by considering the high-resolution limit of the achievable rates. For this purpose, a uniform quantizer will do.
Given some large M > 0 (which determines the overload region) and some large K (corresponding to the number of quantization cells), we partition the interval [0, M] into K subintervals, each of length ∆ = M/K. The helper, upon observing the noise sequence Z, produces

T = min{ ⌊ ‖Z‖² / (n∆) ⌋, K }.

The constant M, which does not depend on the blocklength n, is chosen large enough to guarantee that the large-deviation probability of overload Pr[‖Z‖²/n ≥ M] decays sufficiently fast in n so that the contribution of the overload to the ρ-th moment of the list be negligible, even if an overload results in the list containing all e^{nR} codewords:

lim_{n→∞} e^{nρR} Pr[ ‖Z‖²/n ≥ M ] = 0. (104)

(Upper bounds on the tail of the χ² distribution show, for example, that for R < R_0(ρ), the choice M = max{2, 20ρR_0(ρ)} will do.) Since the help takes values in the finite set T_n = {0, 1, …, K}, where K does not depend on the blocklength, it is of zero rate.
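The zero-rate helper and the resulting remotely-plausible list can be sketched directly (a simulation under our reading of the scheme, with illustrative parameter values; the overload threshold is called Mq below to avoid clashing with the message count):

```python
import numpy as np

rng = np.random.default_rng(3)
n, A = 200, 2.0                  # blocklength, power (noise variance 1)
num_msgs = 256
Mq, K = 20.0, 50                 # overload threshold and number of cells
Delta = Mq / K

codebook = rng.standard_normal((num_msgs, n))
codebook *= np.sqrt(n * A) / np.linalg.norm(codebook, axis=1, keepdims=True)

def helper(z):
    """Zero-rate help: uniform quantization of ||z||^2/n with overload symbol K."""
    return min(int(np.sum(z ** 2) / (n * Delta)), K)

m = 7
z = rng.standard_normal(n)
y = codebook[m] + z
T = helper(z)

# Remotely-plausible list: messages whose implied noise y - x_m' is
# consistent with the description T.
L = [mm for mm in range(num_msgs) if helper(y - codebook[mm]) == T]
assert m in L
print(T, len(L))
```

The help alphabet {0, …, K} is fixed, so the description costs a constant number of bits regardless of n, i.e., zero rate.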
As in Section 4.3, we consider a random codebook {X_m}_{m=1,…,e^{nR}} whose codewords are drawn independently from the conditional Gaussian distribution, i.e., from Q^(n) defined in (35) and (36) with Q being Q_G, the centered variance-A Gaussian distribution. Using the same symmetry arguments, we also assume that the transmitted message is m = 1 and study the ρ-th moment of the list under this assumption. Defining, for each m ≠ 1, the binary random variable V_m to be 1 if message m is on the remotely-plausible list and 0 otherwise, we can express the ρ-th moment of the remotely-plausible list when m = 1 as

E[|L(Y, T)|^ρ] = E[(1 + ∑_{m≠1} V_m)^ρ]. (106)

In view of Lemma 1, we need to prove that

lim_{n→∞} E[(∑_{m≠1} V_m)^ρ] = 0, (107)

where the expectation is over both the random choice of the codebook and the channel behavior.
To analyze the LHS of (107), we define for every x_1, y ∈ R^n and every message m ≠ 1 the binary random variable

V_m(x_1, y) ≜ I{ φ^(n)(y − X_m) = φ^(n)(y − x_1) }.

Our analysis of V_m(x_1, y) depends on whether φ^(n)(y − x_1) differs from K (no overload) or equals K (corresponding to quantizer overload). In the former case, the random variable V_m(x_1, y) can be upper-bounded by B_m(x_1, y; ∆), where

B_m(x_1, y; ∆) ≜ I{ w(y|X_m) ≥ w(y|x_1) e^{-n∆/2} },

because

V_m(x_1, y) = 1 ⟹ ‖y − X_m‖² ≤ ‖y − x_1‖² + n∆, (110)

where (110) holds because, for the case at hand, the equality of the helper's descriptions implies that ‖y − X_m‖² and ‖y − x_1‖² lie in the same interval of length n∆. In the latter case (which is exponentially rare when M exceeds the noise variance) we simply upper-bound V_m(x_1, y) by 1.
The ρ-th moment of the list can now be bounded using the law of total expectation as

E[(∑_{m≠1} V_m)^ρ] ≤ E[(∑_{m≠1} B_m(X_1, Y; ∆))^ρ] + e^{nρR} Pr[ ‖Z‖²/n ≥ M ]. (116)

The second term on the RHS of (116) tends to zero by (104). The first term is studied in the following lemma:

Lemma 3. Whenever R < R_0(ρ) − ∆,

lim_{n→∞} E[(∑_{m≠1} B_m(X_1, Y; ∆))^ρ] = 0. (117)

Proof. See Appendix B.

For a given R < R_0(ρ), achievability is thus established using this lemma and (116) by picking M sufficiently large for (104) to hold, and then picking K large enough to guarantee that R < R_0(ρ) − M/K so that, by Lemma 3, the first term on the RHS of (116) will also tend to zero.

Case 2: R h > 0
The key to proving the achievability of R_cutoff(ρ) + R_h is in showing that rate-R_h help can be utilized to increase the data rate by R_h, and that this can be done losslessly, with arbitrarily small (positive) power, and in one channel use. To show how this can be done, we show that, by using the channel once to send a single input that is bounded by √A (with A any prespecified positive number) and using help taking values in the set T = {0, …, κ − 1}, we can send error-free a message taking values in said set. To transmit m ∈ {0, …, κ − 1}, the encoder sends

x = (m/κ) √A,

which is upper-bounded by √A. Upon observing the noise Z, the helper produces the description T by quantizing the normalized noise and taking modulo, i.e.,

T = ⌊ κZ/√A ⌋ mod κ,

which is an element of {0, …, κ − 1}. Based on Y and T, the decoder can calculate

m̂ = ( ⌊ κY/√A ⌋ − T ) mod κ,

which equals m, because

⌊ κY/√A ⌋ = ⌊ m + κZ/√A ⌋ = m + ⌊ κZ/√A ⌋, (123)

where (123) holds because m and T are both integers; the claim m̂ = m then follows because ⌊κZ/√A⌋ − T is an integer multiple of κ and m ∈ {0, …, κ − 1}.

Using this building block, we can now prove the achievability of R_cutoff(ρ) + R_h by employing two-phase time sharing. Specifically, we propose the following blocklength-(n + 1) scheme. In the first n channel uses, the helper operates at rate zero as in Section 5.1. By the achievability result proved in Section 5.1, for any R < R_0(ρ), there exists a sequence of blocklength-n rate-R codebooks {x_m}_{m=1,…,e^{nR}}, with ‖x_m‖² ≤ (n − 1)A for every m, and zero-rate helpers φ(Z^n), such that the remotely-plausible list L(Y^n, φ(Z^n)) satisfies

lim_{n→∞} E[ |L(Y^n, φ(Z^n))|^ρ ] = 1. (125)

In the (n + 1)-th channel use we use the aforementioned coding scheme with κ being e^{nR_h}. Since that scheme is error-free, the overall remotely-plausible list for the two phases has the same cardinality as that of the first phase, namely |L(Y^n, φ(Z^n))|, and hence its ρ-th moment tends to 1 by (125).
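The one-shot building block can be simulated directly. The sketch below, under our reading of the scheme (x = (m/κ)√A, T = ⌊κZ/√A⌋ mod κ, m̂ = (⌊κY/√A⌋ − T) mod κ, with illustrative A and κ), checks that the decoder recovers m exactly even at very small power:

```python
import math
import random

# One-shot building block: one channel use, input bounded by sqrt(A), and a
# kappa-valued help symbol convey m in {0, ..., kappa-1} error-free.
A, kappa = 0.01, 8                 # arbitrarily small power works
rng = random.Random(4)

def encode(m):                     # |x| <= sqrt(A)
    return (m / kappa) * math.sqrt(A)

def help_symbol(z):                # helper sees the noise only, not the message
    return math.floor(kappa * z / math.sqrt(A)) % kappa

def decode(y, t):
    return (math.floor(kappa * y / math.sqrt(A)) - t) % kappa

for _ in range(1000):
    m = rng.randrange(kappa)
    z = rng.gauss(0.0, 1.0)        # unit-variance Gaussian noise
    y = encode(m) + z
    assert decode(y, help_symbol(z)) == m
print("error-free over 1000 trials")
```

With κ = e^{nR_h}, this single channel use carries nR_h nats, which is how the help rate is converted one-for-one into data rate.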
The achievability now follows by verifying that the power of the transmitted input sequence x satisfies

‖x‖² ≤ (n − 1)A + A ≤ (n + 1)A,

that the rate of the helper is

(1/(n + 1)) log( (K + 1) e^{nR_h} ),

and that the rate achieved by the scheme is

(nR + nR_h) / (n + 1),

which tend, respectively, to R_h and to R + R_h as n tends to infinity; since R may be chosen arbitrarily close to R_0(ρ), this establishes the achievability of R_0(ρ) + R_h.
Author Contributions: Writing-original draft preparation, A.L. and Y.Y.; writing-review and editing, A.L. and Y.Y. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of Lemma 2
We shall establish that the expectation

E[ ( E[ ∑_{m≠1} B_m | X_1, Y ] )^ρ ]

tends to zero as n tends to infinity whenever R < R_0(ρ). First notice that, conditional on the transmitted codeword x_1 and the channel output y, the random variables {B_m}_{m≠1} are IID Bernoulli, with B_m determined by X_m and being of probability of success

p(x_1, y) = Pr[ w(y|X_m) ≥ w(y|x_1) ] (A3)

≤ E[ w(y|X_m)^{1/(1+ρ)} ] / w(y|x_1)^{1/(1+ρ)},

where the last inequality follows from Markov's inequality. Thus

E[ ( E[ ∑_{m≠1} B_m | X_1, Y ] )^ρ ] ≤ e^{nρR} ∫_{R^n} ∫_{R^n} p(x_1, y)^ρ w(y|x_1) dQ^(n)(x_1) dν(y) (A7)

≤ e^{nρR} ∫_{R^n} ( ∫_{R^n} w(y|x)^{1/(1+ρ)} dQ^(n)(x) )^{1+ρ} dν(y). (A8)

Bounding the Radon-Nikodym derivative of Q^(n) with respect to the product distribution Q_G^{×n} as in the proof of Proposition 1 then yields, for every r ≥ 0,

E[ ( E[ ∑_{m≠1} B_m | X_1, Y ] )^ρ ] ≤ μ^{-(1+ρ)} e^{nρ(R − R_0(ρ))}, (A14)

where the last equality follows from (74). The Central Limit Theorem guarantees that, as n tends to infinity, μ approaches 1/2. Consequently, the RHS of (A14) tends to zero whenever R < R_0(ρ).

Appendix B. Proof of Lemma 3
To prove the lemma, we shall establish that, whenever R < R_0(ρ) − ∆,

lim_{n→∞} E[ ( E[ ∑_{m≠1} B_m(X_1, Y; ∆) | X_1, Y ] )^ρ ] = 0, (A15)

where the outer expectation is over X_1 and Y. From this, (117) will follow in much the same way that (88) followed from (90) in Section 4.3.

To establish (A15), first note that, conditional on the transmitted codeword x_1 and the channel output y, the random variables {B_m(x_1, y; ∆)}_{m≠1} are IID Bernoulli, with B_m determined by X_m and being of probability of success

p(x_1, y; ∆) = Pr[ w(y|X_m) ≥ w(y|x_1) e^{-n∆/2} ]

≤ e^{n∆/(2(1+ρ))} E[ w(y|X_m)^{1/(1+ρ)} ] / w(y|x_1)^{1/(1+ρ)},

where the inequality follows from Markov's inequality. Proceeding as in Appendix A, we obtain

E[ ( E[ ∑_{m≠1} B_m(X_1, Y; ∆) | X_1, Y ] )^ρ ] ≤ e^{nρ∆} e^{nρR} ∫_{R^n} ( ∫_{R^n} w(y|x)^{1/(1+ρ)} dQ^(n)(x) )^{1+ρ} dν(y), (A22)

where (A22) holds because ρ, ∆ > 0, so nρ∆/(2(1+ρ)) < nρ∆. Except for the e^{nρ∆} factor, the RHS of (A22) is identical to the RHS of (A8), which was shown to decay at least as fast as e^{nρ(R−R_0(ρ))}; see (A14). It follows that the RHS of (A22) tends to zero whenever R + ∆ < R_0(ρ).