Assisted Identification over Modulo-Additive Noise Channels

The gain in the identification capacity afforded by a rate-limited description of the noise sequence corrupting a modulo-additive noise channel is studied. Both the classical Ahlswede–Dueck version and the Ahlswede–Cai–Zhang version, which does not allow for missed identifications, are studied. Irrespective of whether the description is provided to the receiver, to the transmitter, or to both, the two capacities coincide and both equal the helper-assisted Shannon capacity.


Introduction
If a helper can observe the additive noise corrupting a channel and can describe it to the decoder, then the latter can subtract it and thus render the channel noiseless. However, for this to succeed, the description must be nearly lossless and hence possibly of formidable rate. It is thus of interest to study scenarios where the description rate is limited, and to understand how the rate of the help affects performance.
When performance is measured in terms of the Shannon capacity, the problem was solved for a number of channel models [1][2][3], where the former two address assistance to the decoder and the latter to the encoder. When performance is measured in terms of the erasures-only capacity or the list-size capacity, the problem was solved in [4,5]. Error exponents with assistance were studied in [6]. Here we study how rate-limited help affects the identification capacity [7].
We focus on the memoryless modulo-additive noise channel (MMANC), whose time-k output Y_k corresponding to the time-k input x_k is

Y_k = x_k ⊕ Z_k, (1)

where Z_k is the time-k noise sample; the channel input x_k, the channel output Y_k, and the noise Z_k all take values in the set A (also denoted X, Y, or Z) comprising the |A| elements {0, . . ., |A| − 1}; and ⊕ and ⊖ denote mod-|A| addition and subtraction, respectively. The noise sequence {Z_k} is IID ∼ P_Z, where P_Z is some PMF on A.
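As a concrete illustration, the channel law Y_k = x_k ⊕ Z_k, and the fact that exact knowledge of the noise renders the channel noiseless, can be sketched in a few lines of Python. This is a minimal sketch; the alphabet size and sequence length below are arbitrary choices for illustration, not taken from the paper.

```python
import random

def mmanc_output(x, z, A):
    """Channel law: Y_k = x_k (+) Z_k, with (+) denoting mod-|A| addition."""
    return [(xk + zk) % A for xk, zk in zip(x, z)]

def mmanc_invert(y, z, A):
    """If the exact noise is known, mod-|A| subtraction recovers the input."""
    return [(yk - zk) % A for yk, zk in zip(y, z)]

A = 4                                        # alphabet {0, 1, 2, 3}
random.seed(0)
x = [random.randrange(A) for _ in range(8)]  # channel inputs
z = [random.randrange(A) for _ in range(8)]  # IID noise samples
y = mmanc_output(x, z, A)
# a lossless description of the noise renders the channel noiseless:
assert mmanc_invert(y, z, A) == x
```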
Irrespective of whether the help is provided to the encoder, to the decoder, or to both, the Shannon capacity of this channel coincides with its erasures-only capacity, and both are given by [3] (Section V) and [4] (Theorems 2 and 6) as

C_Sh(R_h) = log |A| − {H(P_Z) − R_h}^+, (2)

where R_h denotes the rate of the help, {ξ}^+ denotes max{0, ξ}, and H(P_Z) is the Shannon entropy of P_Z.
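The capacity expression C_Sh(R_h) = log |A| − {H(P_Z) − R_h}^+ is easy to evaluate numerically. The following sketch assumes base-2 logarithms (bits per channel use); the noise PMF is an illustrative choice.

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(P) in bits of a PMF given as a list of probabilities."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def c_sh(p_z, A, r_h):
    """Helper-assisted Shannon capacity C_Sh(R_h) = log|A| - {H(P_Z) - R_h}^+."""
    return math.log2(A) - max(0.0, shannon_entropy(p_z) - r_h)

# Uniform noise on |A| = 4: H(P_Z) = 2 bits, so without help the capacity is 0,
# and every bit of description rate buys one bit of capacity until saturation.
p_z = [0.25] * 4
assert c_sh(p_z, 4, 0.0) == 0.0
assert c_sh(p_z, 4, 1.0) == 1.0
assert c_sh(p_z, 4, 3.0) == 2.0   # saturates at log|A| once R_h >= H(P_Z)
```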
Here we study two versions of the identification capacity of this channel: Ahlswede and Dueck's original identification capacity C_ID [7], and the identification capacity subject to no missed identifications C_ID,0 [8]. Our main result is that, irrespective of whether the help is provided to the encoder, to the decoder, or to both, the two identification capacities coincide and both equal the right-hand side (RHS) of (2).

Problem Formulation
The identification-over-a-channel problem is parameterized by the blocklength n, which tends to infinity in the definition of the identification capacity. The n-length noise sequence Z^n ∈ A^n is presented to a helper, which produces its nR_h-bit description t(Z^n), where

t : A^n → T,   T = {1, . . ., 2^{nR_h}}.

We refer to the set N = {1, . . ., N} as the set of identification messages and to its cardinality N as the number of identification messages. The identification rate is defined (for N sufficiently large) as

R = (1/n) log log N.

A generic element of N, namely a generic identification message, is denoted i.
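With the identification rate defined as R = (1/n) log log N (base-2 logarithms assumed throughout this sketch), the number of identification messages grows doubly exponentially in the blocklength. A quick numeric check, with illustrative values of n and N:

```python
import math

def identification_rate(N, n):
    """R = (1/n) * log log N: N grows doubly exponentially in the blocklength n."""
    return math.log2(math.log2(N)) / n

# With n = 10 and N = 2^(2^5) = 2^32 identification messages, R = 5/10 = 0.5.
N = 2 ** (2 ** 5)
assert identification_rate(N, 10) == 0.5
```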
If no help is provided to the encoder, then the latter is specified by a family {P^i_{X^n}}_{i∈N} of PMFs on A^n that are indexed by the identification messages, with the understanding that, to convey the identification message (IM) i, the encoder transmits a random sequence in A^n that it draws according to the PMF P^i_{X^n}. If help T = t(Z^n) ∈ T is provided to the encoder, then the encoder's operation is specified by a family of PMFs {P^i_{X^n|T=t}}_{(i,t)∈N×T} that is now indexed by pairs of identification messages and noise descriptions, with the understanding that, to convey IM i given the description T = t(Z^n), the encoder produces a random n-length sequence of channel inputs that is distributed according to P^i_{X^n|T}(·|T). In either case, the channel output sequence Y^n is given componentwise by Y_k = X_k ⊕ Z_k.
If help is provided to the encoder, and if IM i is to be conveyed, then the joint distribution of (X^n, Z^n, Y^n, T) has the form

P(x^n, z^n, y^n, t) = P_{Z^n}(z^n) 1{t = t(z^n)} P^i_{X^n|T}(x^n|t) 1{y^n = x^n ⊕ z^n},

where P_{Z^n}(z^n) = ∏_{k=1}^n P_Z(z_k), and where T = t(Z^n) with probability one because we are assuming that the noise description is a deterministic function of the noise sequence. (The results also hold if we allow randomized descriptions: our coding schemes employ deterministic descriptions and the converse allows for randomization.) Here 1{statement} equals 1 if the statement holds and equals 0 otherwise. In the absence of help, the joint distribution has the form

P(x^n, z^n, y^n, t) = P_{Z^n}(z^n) 1{t = t(z^n)} P^i_{X^n}(x^n) 1{y^n = x^n ⊕ z^n}.

Based on the data available to it, namely Y^n in the absence of help to the decoder and (Y^n, t(Z^n)) in its presence, the receiver performs N binary tests indexed by i ∈ N, where the i-th test is whether or not the IM is i. It accepts the hypothesis that the IM is i if Y^n is in its acceptance region, which we denote D_i(t) ⊆ A^n in the presence of decoder assistance t ∈ T and D_i ⊆ A^n in its absence.
When the help t ∈ T is provided to the receiver, the probability of missed detection associated with IM i ∈ N is thus

p^i_MD(t) = Pr[ Y^n ∉ D_i(t) | IM = i, T = t ],

and the worst-case false alarm associated with it is

p^i_FA(t) = max_{j∈N, j≠i} Pr[ Y^n ∈ D_i(t) | IM = j, T = t ].

Note that, given t ∈ T, the acceptance regions {D_i(t)}_{i∈N} of the different tests need not be disjoint. We define

p_MD,max = max_{i∈N} ∑_{t∈T} P_T(t) p^i_MD(t) (13)

and

p_FA,max = max_{i∈N} ∑_{t∈T} P_T(t) p^i_FA(t). (14)

In the absence of help to the receiver, the probability of missed detection associated with IM i is

p^i_MD = Pr[ Y^n ∉ D_i | IM = i ],

and the worst-case probability of false alarm associated with it is

p^i_FA = max_{j∈N, j≠i} Pr[ Y^n ∈ D_i | IM = j ].

In this case, we define p_MD,max = max_{i∈N} p^i_MD and p_FA,max = max_{i∈N} p^i_FA. In both cases we say that a scheme is of zero missed detections if p_MD,max is zero. A rate R is an achievable identification rate if, for every γ > 0 and every ε > 0, there exists some positive integer n_0 such that, for all blocklengths n exceeding n_0, there exists a scheme with

N ≥ 2^{2^{n(R−γ)}}

identification messages for which

p_MD,max ≤ ε,   p_FA,max ≤ ε. (20)

The supremum of achievable rates is the identification capacity with a helper C_ID(R_h).
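The missed-detection and worst-case false-alarm probabilities can be illustrated on a toy example. The output PMFs and acceptance regions below are hypothetical numbers invented for illustration; the example shows that overlapping acceptance regions cost false alarms but need not cost missed detections.

```python
# Toy setting: two IMs, binary alphabet, n = 2, and a single fixed description t.
P = {  # P[i][y]: probability of output y when IM i is conveyed (given t)
    1: {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.0, (1, 1): 0.0},
    2: {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0},
}
D = {  # acceptance regions; note they need not be disjoint
    1: {(0, 0), (0, 1)},
    2: {(0, 1), (1, 0)},
}

def p_md(i):
    """Missed detection: IM i is conveyed but Y^n falls outside D_i."""
    return sum(p for y, p in P[i].items() if y not in D[i])

def p_fa(i):
    """Worst-case false alarm: max over j != i of the probability that Y^n,
    generated under IM j, lands in the acceptance region D_i."""
    return max(sum(p for y, p in P[j].items() if y in D[i])
               for j in P if j != i)

assert p_md(1) == 0.0 and p_md(2) == 0.0   # zero missed detections
assert p_fa(1) == 0.5 and p_fa(2) == 0.5   # the overlap (0,1) causes false alarms
```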
Replacing requirement (20) with

p_MD,max = 0,   p_FA,max ≤ ε

leads to the definition of the zero missed-identification capacity C_ID,0(R_h).
Remark 1. Writing out p_FA,max of (14) as

p_FA,max = max_{i∈N} ∑_{t∈T} P_T(t) max_{j≠i} Pr[ Y^n ∈ D_i(t) | IM = j, T = t ]

highlights that (prior to maximizing over i) we first maximize over j and then average the result over t. In this sense, the help, even if provided to both encoder and decoder, cannot be viewed as "common randomness" in the sense of [9][10][11], where the averaging over the common randomness is performed before taking the maximum. Our criterion is more demanding of the direct part (code construction) and less so of the converse. Both criteria are interesting. Ours allows for the notion of "outage", namely, descriptions that indicate that identification might fail and that therefore call for retransmission. The other criterion highlights the interplay between the noise description and the generation of common randomness (particularly when the help is provided to both transmitter and receiver).
The following theorem is the main result of this paper.

Theorem 1. On the modulo-additive noise channel, irrespective of whether the help is provided to the transmitter, to the receiver, or to both, the identification capacity with a helper C_ID(R_h) and the zero missed-identification capacity with a helper C_ID,0(R_h) are equal and coincide with the Shannon capacity:

C_ID(R_h) = C_ID,0(R_h) = C_Sh(R_h),

where the latter is given in (2).
We prove this result by establishing in Section 3 that C ID,0 (R h ) ≥ C Sh (R h ) using a slight strengthening of recent results in [4] in combination with the code construction proposed in [8].The converse is proved in Section 4, where we use a variation on a theme by Watanabe [12] to analyze the case where the assistance is provided to both transmitter and receiver.

Direct Part: Zero Missed Detection
In this section we prove that

C_ID,0(R_h) ≥ C_Sh(R_h)

by proposing identification schemes of no missed detections and of rates approaching C_Sh(R_h). To this end, we extend to the helper setting the connection, due to Ahlswede, Cai, and Zhang [8], between the zero missed-detection identification capacity C_ID,0 and the erasures-only capacity C_e-o. We then call on recent results [4] to infer that, on the modulo-additive noise channel with a helper, the erasures-only capacity is equal to the Shannon capacity. We treat encoder-only assistance and decoder-only assistance separately.
Either case also proves achievability when the assistance is provided to both encoder and decoder.
Recall that an erasures-only decoder produces a list L comprising the messages under which the observation is of positive likelihood and then acts as follows: if the list contains only one message, it produces that message; otherwise, it declares an erasure. Since the list always contains the transmitted message, this decoder never errs. The erasures-only capacity is defined like the Shannon capacity, but with the additional requirement that the decoder be the erasures-only decoder. This notion extends in a natural way to settings with a helper [4].
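The erasures-only decoder admits a very short sketch. The single-letter toy channel below (alphabet size, noise support, and codewords) is an illustrative choice of ours, not from the paper; note that the decoder either outputs the correct message or erases, never an incorrect message.

```python
def erasures_only_decode(y, codebook, positive_likelihood):
    """List every message whose codeword could have produced y; output the
    message iff the list is a singleton, otherwise declare an erasure."""
    L = [m for m, x in codebook.items() if positive_likelihood(x, y)]
    return L[0] if len(L) == 1 else "erasure"

# Toy single-letter example on A = {0, 1, 2, 3} with noise support {0, 1}:
A, noise_support = 4, {0, 1}

def positive_likelihood(x, y):
    return (y - x) % A in noise_support

codebook = {1: 0, 2: 1}   # messages 1, 2 mapped to codewords 0, 1
assert erasures_only_decode(0, codebook, positive_likelihood) == 1
assert erasures_only_decode(2, codebook, positive_likelihood) == 2
# y = 1 is explained by both codewords, so erasing is the only safe output:
assert erasures_only_decode(1, codebook, positive_likelihood) == "erasure"
```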

Encoder Assistance
A rate-R, blocklength-n, encoder-assisted, erasures-only transmission code comprises a message set M = {1, . . ., M} with M = 2^{nR} messages and a collection of M mappings {f_m}_{m∈M} from T to X^n, indexed by M, with the understanding that, to transmit Message m after being presented with the help t(Z^n) ∈ T, the encoder produces the n-tuple of channel inputs f_m(t(Z^n)) ∈ X^n. Since the decoder observes only the channel outputs (and not the help), it forms the list

L(y^n) = { m ∈ M : ∑_{t∈T} P_T(t) P_{Y^n|X^n,T}(y^n | f_m(t), t) > 0 }.

The collection of output sequences that cause the erasures-only decoder to produce an erasure is

Y_er = { y^n ∈ A^n : |L(y^n)| ≥ 2 }.

The probability of erasure associated with the transmission of Message m with encoder help t is P_{Y^n|X^n,T}(Y_er | f_m(t), t). On the modulo-additive noise channel with rate-R_h encoder assistance, the erasures-only capacity and the Shannon capacity coincide and [4]:

C_e-o(R_h) = C_Sh(R_h) = log |A| − {H(P_Z) − R_h}^+. (27)

We shall need the following slightly stronger version of the achievability part of this result, in which we swap the maximization over the messages with the expectation over the help.

Proposition 1. Consider the modulo-additive noise channel with rate-R_h encoder assistance.
For any transmission rate R smaller than C_e-o(R_h) of (27), there exists a sequence of rate-R transmission codes for which

lim_{n→∞} ∑_{t∈T} P_T(t) max_{m∈M} P_{Y^n|X^n,T}(Y_er | f_m(t), t) = 0. (28)

A similar result holds for decoder assistance.
Proof. The proof is presented in Appendix A. It is based on the construction in [4], but with a slightly finer analysis.
The coding scheme we propose is essentially that of [8]; we just need to account for the help. For each blocklength n, we start out with a transmission code of roughly 2^{nC_e-o(R_h)} codewords for which (28) holds, and use Lemma 1 ahead to construct approximately 2^{2^{nC_e-o(R_h)}} lightly-intersecting subsets of its message set. We then associate an IM with each of the subsets, with the understanding that, to transmit an IM, we pick uniformly at random one of the messages in the subset associated with it and transmit this message with the helper's assistance.

Lemma 1 ([7] Proposition 14). Let Z be a finite set, and let λ ∈ (0, 1/2) be given. If ε > 0 is sufficiently small (relative to λ; see [7]), then there exist subsets A_1, . . ., A_N of Z, each of cardinality ⌈ε|Z|⌉, such that for all distinct i, j ∈ {1, . . ., N}:

|A_i ∩ A_j| ≤ λ|A_i|,   N ≥ |Z|^{−1} 2^{⌈ε|Z|⌉}.

With the aid of this lemma, we can now prove the achievability of C_e-o(R_h).
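Lemma 1 itself is proved by random selection, but the flavor of a large family of lightly-intersecting subsets can be seen in a small deterministic toy of our own choosing (not the lemma's construction): the affine lines over Z_p form p^2 subsets of a p^2-element ground set, each of size p, with pairwise intersections of at most one element, i.e., at most the fraction 1/p of each subset.

```python
# Affine lines over Z_p: a deterministic toy family of lightly-intersecting subsets.
p = 5
points = [(x, y) for x in range(p) for y in range(p)]        # ground set, |Z| = 25
lines = [frozenset((x, (a * x + b) % p) for x in range(p))   # p^2 lines of size p
         for a in range(p) for b in range(p)]

# Parallel lines (same slope a) are disjoint; non-parallel lines meet in
# exactly one point, so |A_i & A_j| <= (1/p) |A_i| for all distinct i, j.
worst = max(len(li & lj) for i, li in enumerate(lines) for lj in lines[i + 1:])
assert len(lines) == p * p and all(len(l) == p for l in lines)
assert worst == 1
```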
Proof. Given an erasures-only encoder-assisted transmission code {f_m}_{m∈M}, where f_m : T → X^n, we apply Lemma 1 to the transmission message set M (playing the role of Z) with appropriately chosen ε and λ, to infer, for large enough n, the existence of subsets F_1, . . ., F_N ⊆ M such that for all distinct i, j ∈ {1, . . ., N} with j ≠ i:

|F_i| = ⌈εM⌉, (34)
|F_i ∩ F_j| ≤ λ|F_i|, (35)
N ≥ M^{−1} 2^{⌈εM⌉}. (36)

Note that (36) implies that

lim inf_{n→∞} (1/n) log log N ≥ C_e-o(R_h). (37)

To send IM i after obtaining the assistance t(z^n), the encoder picks a random element M from F_i equiprobably and transmits X^n = f_M(t(Z^n)), so

P^i_{X^n|T}(x^n|t) = (1/|F_i|) ∑_{m∈F_i} 1{x^n = f_m(t)}.

To guarantee no missed detections, we set the acceptance region of the i-th IM to be

D_i = { y^n ∈ A^n : ∃ m ∈ F_i, ∃ t ∈ T with P_{Y^n|X^n,T}(y^n | f_m(t), t) > 0 }.

It now remains to analyze the scheme's maximal false-alarm probability.
The false-alarm probability is bounded by a chain of inequalities (41)–(48) as follows: in (41) we expressed the distribution of the output sequence given IM j and help t in terms of the encoder's equiprobable randomization over F_j; in (42) we expressed F_j as the disjoint union of F_j ∩ F_i and F_j \ F_i; in (43) we used the trivial bound

P_{Y^n|X^n,T}(D_i | f_m(t), t) ≤ 1; (49)

in (44) we used the fact that, whenever m ∉ F_i,

P_{Y^n|X^n,T}(D_i | f_m(t), t) ≤ P_{Y^n|X^n,T}(Y_er | f_m(t), t), (50)

which holds because, by the definition of the set Y_er, any output sequence y^n that contributes to the LHS of (50), i.e., that is in D_i with P_{Y^n|X^n,T}(y^n | f_m(t), t) > 0, must also be in Y_er; in (45) we used (35); in (46) we replaced each term in the sum with the global maximum (over m ∈ M) and used (34); in (47) we used the trivial bound |F_j \ F_i| ≤ |F_j|; and in (48) we could simplify the expression because the dependence on i and j is no longer present. The above construction demonstrates that every transmission scheme that drives ∑_{t∈T} P_T(t) max_{m∈M} P_{Y^n|X^n,T}(Y_er | f_m(t), t) to zero induces a zero missed-identification scheme that drives the false-alarm probability to zero. Since the former exists for all rates up to C_e-o(R_h), we conclude, by (37), that C_ID,0(R_h) ≥ C_e-o(R_h). This, in turn, implies that C_ID,0(R_h) ≥ C_Sh(R_h) and hence concludes the achievability proof for encoder assistance because, on the modulo-additive noise channel, C_e-o(R_h) = C_Sh(R_h).

Decoder Assistance
When, rather than to the encoder, the assistance is provided to the decoder, the transmission codewords are n-tuples in A^n, and we denote the transmission codebook C = {x^n(m)}_{m∈M}. For the induced identification scheme we use the same message subsets as before, with IM i being transmitted by choosing uniformly at random a message M from the subset F_i and transmitting the codeword x^n(M). To avoid any missed detections, we set the acceptance region corresponding to IM i and decoder assistance t to be

D_i(t) = { y^n ∈ A^n : ∃ m ∈ F_i with P_{Y^n|X^n,T}(y^n | x^n(m), t) > 0 }.

The analysis of the false-alarm probability is nearly identical to that with encoder assistance and is omitted.
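The decoder-assisted acceptance regions can be sketched concretely. The toy channel below (single-letter codewords, a helper that localizes the noise to one of two values, and the message subsets F_i) is entirely hypothetical; it illustrates that the regions contain every output of positive likelihood under the corresponding subset, so missed detections are impossible by construction.

```python
A = 5                          # alphabet {0, ..., 4}
codebook = {1: 0, 2: 2, 3: 4}  # single-letter codewords (n = 1 for brevity)
F = {1: {1, 2}, 2: {2, 3}}     # IM -> subset of transmission messages

def noise_support(t):
    """Toy helper: the description t in {0, 1} localizes the noise to two values."""
    return {0, 1} if t == 0 else {2, 3}

def D(i, t):
    """Acceptance region for IM i given decoder help t: every output of
    positive likelihood under some message m in F_i."""
    return {(codebook[m] + z) % A for m in F[i] for z in noise_support(t)}

# Zero missed detections by construction: whichever m in F_i is transmitted and
# whichever t-consistent noise occurs, the output lands in D_i(t).
for i in F:
    for t in (0, 1):
        assert all((codebook[m] + z) % A in D(i, t)
                   for m in F[i] for z in noise_support(t))
assert D(1, 0) == {0, 1, 2, 3}  # not all of A: outputs outside it are rejected
```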

Converse Part: Help Provided to Both Transmitter and Receiver
In this section we establish the converse for all the cases of interest by proving that the inequality

C_ID(R_h) ≤ log |A| − {H(P_Z) − R_h}^+ (52)

holds even when the help is provided to both encoder and decoder. The RHS of (52) is the helper-assisted Shannon capacity, irrespective of whether the help is provided to the encoder, to the decoder, or to both [3] (Section V).
There are two main steps to the proof. The first addresses the conditional probabilities of the two types of testing errors conditional on a given description T = t. It relates the two to the conditional entropy of the noise given the description, namely H(Z^n|T = t). Very roughly, this corresponds to proving the converse part of the ID-capacity theorem for the channel whose noise is distributed according to the conditional distribution of Z^n given T = t. The difficulty in this step is that, given T = t, the noise is not memoryless, and the channel need not even be information stable. Classical type-based techniques for proving the converse part of the ID-capacity theorem, such as those employed in [7] (Theorem 12), [13] (Section III), or [14] (Section III), are therefore not applicable. Instead, we extend to the helper setting Watanabe's technique [12], which is inspired by the partial channel resolvability method introduced by Steinberg [15].
The second step in the proof addresses the unconditional error probabilities. This step is needed because, in the definition of achievability (see (13) and (14)), the error probabilities are averaged over the noise description t. We will show that, when the identification rate exceeds the Shannon capacity, there exists an IM i* for which the sum of the two types of errors is large whenever the description t is in a subset T* of T whose probability is bounded away from zero. This will imply that, for this IM i*, the sum of the averaged probabilities of error is bounded away from zero, thus contradicting achievability.

Additional Notation
Given a PMF P X and a conditional PMF P Y|X , we write P X • P Y|X for the joint PMF that assigns the pair (x, y) the probability P X (x) P Y|X (y|x).We use I P•P Y|X (X; Y) to denote the mutual information between X and Y under the joint distribution P • P Y|X .The product PMF of marginals P X and P Y is denoted P X × P Y ; it assigns (x, y) the probability P X (x) P Y (y).
For the hypothesis testing problem of guessing whether some observation X was drawn ∼ P_X (the "null hypothesis") or ∼ Q_X (the "alternative hypothesis"), we use K(•|X) to denote a generic randomized test that, after observing X = x, guesses the null hypothesis (X ∼ P_X) with probability K(0|X = x) and the alternative (X ∼ Q_X) with probability K(1|X = x). (Here K(0|X = x) + K(1|X = x) = 1 for every x ∈ X.) The type-I error probability associated with K(•|X) is

∑_{x∈X} P_X(x) K(1|X = x), (53)

and the type-II error probability is

∑_{x∈X} Q_X(x) K(0|X = x). (54)

For a given 0 < ε < 1 we define

β_ε(P_X, Q_X) = min_{K : ∑_x P_X(x) K(1|x) ≤ ε} ∑_{x∈X} Q_X(x) K(0|x) (55)

to be the least type-II error probability that can be achieved under the constraint that the type-I error probability does not exceed ε.
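For finite alphabets, β_ε(P, Q) can be computed by the classical (fractional) Neyman–Pearson rule: place accept-the-null mass on the outcomes with the smallest likelihood ratio Q(x)/P(x) first, randomizing on the boundary outcome. The function below is a sketch with an interface of our own choosing.

```python
def beta(eps, P, Q):
    """Least type-II error with type-I error <= eps, for PMFs P, Q given as
    dicts over a finite alphabet.  Greedy fractional Neyman-Pearson rule."""
    need = 1.0 - eps          # required P-mass of the "accept null" decision
    b = 0.0                   # accumulated type-II error (Q-mass accepted)
    # scan outcomes in increasing order of the likelihood ratio Q(x)/P(x)
    for x in sorted((x for x in P if P[x] > 0),
                    key=lambda x: Q.get(x, 0.0) / P[x]):
        take = min(P[x], need)               # possibly fractional (randomized test)
        b += Q.get(x, 0.0) * take / P[x]
        need -= take
        if need <= 0:
            break
    return b

# P uniform on {a, b}, Q concentrated on a: allowing type-I error 1/2 lets the
# test accept the null only on b, where Q has no mass, so the type-II error is 0.
assert beta(0.5, {'a': 0.5, 'b': 0.5}, {'a': 1.0, 'b': 0.0}) == 0.0
assert beta(0.25, {'a': 0.5, 'b': 0.5}, {'a': 1.0, 'b': 0.0}) == 0.5
```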

Conditional Missed-Detection and False-Alarm Probabilities
The following lemma follows directly from Watanabe's work [12].

Lemma 2 ([12] Theorem 1 and Corollary 2). Let P_{Y^n|X^n,T=t} be the n-letter conditional distribution of the channel output sequence given that the noise description is T = t and the input is X^n. For any λ_1, λ_2 > 0 with λ_1 + λ_2 < 1, any 0 < η < 1 − λ_1 − λ_2, and any fixed t ∈ T, the condition that every IM satisfy p^i_MD(t) + p^i_FA(t) ≤ λ_1 + λ_2 implies inequality (57), and hence inequality (58), where the error term ψ_n(η), for any fixed η > 0, tends to 0 as n tends to ∞.
Substituting P_{Y^n|X^n,T=t} for P_{Y|X} in the following theorem will allow us to link the RHS of (57) with the conditional mutual information between X^n and Y^n given t ∈ T. The theorem's proof was inspired by the proof of [16] (Theorem 8). See also [17] (Lemma 1).

Theorem 2. Given any 0 < ε < 1 and any conditional PMF P_{Y|X},

sup_{P∈P(X)} inf_{Q∈P(Y)} − log β_ε(P • P_{Y|X}, P × Q) ≤ (1/(1 − ε)) ( sup_{P∈P(X)} I_{P•P_{Y|X}}(X; Y) + h(ε) ),

where h(ε) = −ε log ε − (1 − ε) log(1 − ε) is the binary entropy function.
Proof. Applying the data-processing inequality for relative entropy to the binary hypothesis testing setting (see, e.g., [18] (Theorem 30.12.5)), we conclude that, for any randomized test K(•|X) of type-I error probability at most ε and type-II error probability β,

d(1 − ε ‖ β) ≤ D(P • P_{Y|X} ‖ P × Q), (61)

where d(p ‖ q) = p log(p/q) + (1 − p) log((1 − p)/(1 − q)) denotes the binary divergence function. Since there exists a randomized test K*(•|X) of type-I error probability at most ε whose type-II error probability equals β_ε(P • P_{Y|X}, P × Q) [18] (Lemma 30.5.4 and Proposition 30.8.1), we can apply (61) to K*(•|X) to conclude that

d(1 − ε ‖ β_ε(P • P_{Y|X}, P × Q)) ≤ D(P • P_{Y|X} ‖ P × Q). (63)

(The above also holds when β_ε(P • P_{Y|X}, P × Q) is zero, but for this case we can verify (63) directly by noting that, since ε < 1, the RHS of (63) is then +∞.) The LHS of (63) can be lower-bounded by lower-bounding the binary divergence function as

d(1 − ε ‖ β) ≥ (1 − ε) log(1/β) − h(ε). (64)

It follows from (63) and (64) that

− log β_ε(P • P_{Y|X}, P × Q) ≤ ( D(P • P_{Y|X} ‖ P × Q) + h(ε) ) / (1 − ε), (65)

so the infimum over Q of the LHS is upper bounded by the infimum over Q of the RHS. The latter (for fixed P ∈ P(X)) is achieved when Q is the Y-marginal of P • P_{Y|X}, a marginal that we denote P_Y:

inf_{Q∈P(Y)} D(P • P_{Y|X} ‖ P × Q) = I_{P•P_{Y|X}}(X; Y).

This is a special case of a more general result on Rényi divergence [19] (Theorem II.2); for the Kullback–Leibler divergence it follows from the decomposition D(P • P_{Y|X} ‖ P × Q) = I_{P•P_{Y|X}}(X; Y) + D(P_Y ‖ Q), whose second term is nonnegative and vanishes for Q = P_Y. Taking the supremum over P ∈ P(X) concludes the proof.

Applying Lemma 2 and Theorem 2 to our channel when its law is conditioned on T = t yields the following corollary.
Proof. Substituting X^n for X, Y^n for Y, and P_{Y^n|X^n,T=t} for P_{Y|X} in Theorem 2, we obtain

sup_{P∈P(X^n)} inf_{Q∈P(Y^n)} − log β_ε(P • P_{Y^n|X^n,T=t}, P × Q) ≤ (1/(1 − ε)) ( sup_{P∈P(X^n)} I_{P•P_{Y^n|X^n,T=t}}(X^n; Y^n) + h(ε) ). (76)

Given P ∈ P(X^n) and P_{Y^n|X^n,T=t}, the mutual information term in (76) can be upper-bounded as follows:

I(X^n; Y^n | T = t) = H(Y^n | T = t) − H(Y^n | X^n, T = t) ≤ n log |A| − H(Z^n | T = t), (80)

where the inequality holds because H(Y^n | T = t) ≤ n log |A| and because, given X^n, the output Y^n = X^n ⊕ Z^n determines the noise, so H(Y^n | X^n, T = t) = H(Z^n | T = t). Applying (76) and (80) to (58) in Lemma 2 establishes Corollary 1.

Averaging over T
Corollary 1 deals with identification for a given fixed T = t, but our definition of achievability in (13) and (14) entails averaging over t, which we must thus study. We begin by lower-bounding the conditional entropy of the noise sequence Z^n given the assistance T:

H(Z^n | T) ≥ H(Z^n) − H(T) ≥ nH(P_Z) − nR_h. (83)

We next define, for every δ > 0, the subset of descriptions

T*(δ) = { t ∈ T : H(Z^n | T = t) ≥ n(H(P_Z) − R_h − δ) }. (84)

These are poor noise descriptions in the sense that, after they are revealed, the remaining uncertainty about the noise is still large. Key is that their probability is bounded away from zero. In fact, as we next argue,

P_T(T*(δ)) ≥ δ / (log |A| − H(P_Z) + R_h + δ) if R_h < H(P_Z) − δ, and P_T(T*(δ)) = 1 otherwise, (85)

where in the second case the probability is 1 because, when R_h ≥ H(P_Z) − δ, the condition appearing in the definition of T*(δ) in (84) is implied by H(Z^n | T = t) ≥ 0. As to the first case, we begin with (83) to obtain

n(H(P_Z) − R_h) ≤ H(Z^n | T)
= ∑_{t∈T*(δ)} P_T(t) H(Z^n | T = t) + ∑_{t∈T\T*(δ)} P_T(t) H(Z^n | T = t) (87)
≤ P_T(T*(δ)) · n log |A| + (1 − P_T(T*(δ))) · n(H(P_Z) − R_h − δ), (88)

from which the first case of the bound in (85) follows. Here (87) follows from expressing T as the disjoint union of T*(δ) and T \ T*(δ), and (88) follows from the definition of T*(δ) and the bound H(Z^n | T = t) ≤ n log |A|. Inequality (85) establishes that the probability of a poor description is lower bounded by a positive constant that does not depend on n. Using Corollary 1 for such t's will be the key to the converse.
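The lower bound on the probability of a poor description can be checked numerically on a toy helper. In the illustrative setup below (our choice, not the paper's): |A| = 4, n = 2, Z^n IID uniform (so H(P_Z) = 2 bits), and the helper reveals the first noise symbol, i.e., R_h = 1 bit per channel use.

```python
import itertools, math

A, n, Rh, delta = 4, 2, 1.0, 0.5
H_PZ = math.log2(A)                     # uniform noise: H(P_Z) = log|A| = 2 bits

def cond_entropy(t):
    """H(Z^n | T = t): with t = z_1 revealed, the second symbol stays uniform."""
    zs = [z for z in itertools.product(range(A), repeat=n) if z[0] == t]
    return math.log2(len(zs))           # entropy of a uniform distribution

# T*(delta): descriptions after which the residual noise uncertainty is large.
T_star = [t for t in range(A) if cond_entropy(t) >= n * (H_PZ - Rh - delta)]
P_T_star = len(T_star) / A              # T is uniform over the A descriptions
bound = delta / (math.log2(A) - (H_PZ - Rh - delta))
assert P_T_star >= bound                # here: 1.0 >= 1/3
```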
Henceforth, we fix some sequence of identification codes of rate R exceeding C_Sh(R_h), i.e., satisfying R > log |A| − {H(P_Z) − R_h}^+, and show that p_MD,max + p_FA,max cannot tend to 0 as n tends to ∞. For such a rate R, there exist R′ and δ > 0; a pair λ_1, λ_2 > 0 with λ_1 + λ_2 < 1; and 0 < η < 1 − λ_1 − λ_2 such that

R > R′ > log |A| − {H(P_Z) − R_h − δ}^+. (89)

Since the inequality on R′ in (89) is strict, and since ψ_n(η) tends to zero with n, it follows that the inequality continues to hold also when we add ψ_n(η) to the RHS, provided that n is sufficiently large, i.e., that there exists some n_0(η) such that

R′ > log |A| − {H(P_Z) − R_h − δ}^+ + ψ_n(η),   n ≥ n_0(η). (90)

It then follows from (90) and the definition of T*(δ) in (84) that, whenever n ≥ n_0(η), R′ exceeds the RHS of (75):

R′ > (1/n) ( n log |A| − H(Z^n | T = t) ) + ψ_n(η),   t ∈ T*(δ). (91)

Corollary 1 thus implies that, for n > n_0(η) and every t ∈ T*(δ), there exists some IM i (possibly depending on t) for which

p^i_MD(t) + p^i_FA(t) ≥ λ_1 + λ_2. (92)

However, we need a stronger statement because, in the above, the IM i for which p^i_MD(t) + p^i_FA(t) ≥ λ_1 + λ_2 depends on t, whereas in our definition of achievability we are averaging over T for a fixed IM. The stronger result we will establish is that, for all sufficiently large n, there exists some IM i* (that does not depend on t) which performs poorly for every t in T*(δ), i.e., for which

p^{i*}_MD(t) + p^{i*}_FA(t) ≥ λ_1 + λ_2,   for all t ∈ T*(δ). (93)

To this end, define for each t ∈ T*(δ)

N(t) = { i ∈ N : p^i_MD(t) + p^i_FA(t) < λ_1 + λ_2 }, (95)

and consider the identification code that results when we restrict our code to the IMs in N(t) (while keeping the same acceptance regions). Applying Corollary 1 to this restricted code using (91), we obtain that

|N(t)| ≤ 2^{2^{nR′}},   t ∈ T*(δ). (96)

Consequently,

| ∪_{t∈T*(δ)} N(t) | ≤ 2^{nR_h} · 2^{2^{nR′}}, (97)

where the inequality holds by (96) and the fact that T*(δ) is contained in T, whose cardinality is 2^{nR_h}. Since R′ < R (see (89)), there exists some n_1(R, R′, R_h) such that

2^{nR_h} · 2^{2^{nR′}} < N,   n ≥ n_1(R, R′, R_h). (98)

We can use this to upper-bound the RHS of (97) and obtain that, for n ≥ max{n_0(η), n_1(R, R′, R_h)},

| ∪_{t∈T*(δ)} N(t) | < N. (99)

The complement (in N) of the union on the LHS of (99) is thus not empty, which proves the existence of some i* ∈ N for which (93) holds. With i* in hand, the converse follows from the fact that the
probability that T is in T*(δ) is bounded away from zero (85), because, for every n ≥ max{n_0(η), n_1(R, R′, R_h)},

p_MD,max + p_FA,max ≥ ∑_{t∈T*(δ)} P_T(t) ( p^{i*}_MD(t) + p^{i*}_FA(t) ) ≥ P_T(T*(δ)) (λ_1 + λ_2),

where the first inequality follows from the definitions in (13) and (14) upon replacing the maximum over the IMs with the specific IM i*, and the second inequality follows from (93). Thus, any code of rate R > log |A| − {H(P_Z) − R_h}^+ with large enough n must have p_MD,max + p_FA,max ≥ P_T(T*(δ)) · (λ_1 + λ_2), and the latter is bounded away from zero. This concludes the proof of the converse part.