
Assisted Identification over Modulo-Additive Noise Channels

Amos Lapidoth and Baohua Ni
Signal and Information Processing Laboratory, ETH Zurich, 8092 Zurich, Switzerland
* Author to whom correspondence should be addressed.
Entropy 2023, 25(9), 1314; https://doi.org/10.3390/e25091314
Submission received: 15 June 2023 / Revised: 28 August 2023 / Accepted: 1 September 2023 / Published: 8 September 2023
(This article belongs to the Special Issue Extremal and Additive Combinatorial Aspects in Information Theory)

Abstract

The gain in the identification capacity afforded by a rate-limited description of the noise sequence corrupting a modulo-additive noise channel is studied. Both the classical Ahlswede–Dueck version and the Ahlswede–Cai–Zhang version, which does not allow for missed identifications, are studied. Irrespective of whether the description is provided to the receiver, to the transmitter, or to both, the two capacities coincide and both equal the helper-assisted Shannon capacity.

1. Introduction

If a helper can observe the additive noise corrupting a channel and can describe it to the decoder, then the latter can subtract it and thus render the channel noiseless. However, for this to succeed, the description must be nearly lossless and hence possibly of formidable rate. It is thus of interest to study scenarios where the description rate is limited, and to understand how the rate of the help affects performance.
When performance is measured in terms of the Shannon capacity, the problem was solved for a number of channel models [1,2,3], of which the first two address assistance to the decoder and the third assistance to the encoder. When performance is measured in terms of the erasures-only capacity or the list-size capacity, the problem was solved in [4,5]. Error exponents with assistance were studied in [6]. Here we study how rate-limited help affects the identification capacity [7].
We focus on the memoryless modulo-additive noise channel (MMANC), whose time-$k$ output $Y_k$ corresponding to the time-$k$ input $x_k$ is:
$$Y_k = x_k \oplus Z_k \tag{1}$$
where $Z_k$ is the time-$k$ noise sample; the channel input $x_k$, the channel output $Y_k$, and the noise $Z_k$ all take values in the set $\mathcal{A}$—also denoted $\mathcal{X}$, $\mathcal{Y}$, or $\mathcal{Z}$—comprising the $|\mathcal{A}|$ elements $\{0, \ldots, |\mathcal{A}|-1\}$; and $\oplus$ and $\ominus$ denote mod-$|\mathcal{A}|$ addition and subtraction, respectively. The noise sequence $\{Z_k\}$ is IID $\sim P_Z$, where $P_Z$ is some PMF on $\mathcal{A}$.
Irrespective of whether the help is provided to the encoder, to the decoder, or to both, the Shannon capacity of this channel coincides with its erasures-only capacity, and both are given by [3] (Section V) and [4] (Theorems 2 and 6):
$$C_{\text{e-o}}(R_h) = C_{\text{Sh}}(R_h) = \log|\mathcal{A}| - \bigl\{H(P_Z) - R_h\bigr\}^+ \tag{2}$$
where $\{\xi\}^+$ denotes $\max\{0, \xi\}$, and $H(P_Z)$ is the Shannon entropy of $P_Z$.
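As a quick numerical illustration (our own, not part of the formal development), the following Python sketch simulates one block of the channel (1) and evaluates the assisted capacity formula (2); the alphabet size, the noise PMF, and the helper rate below are arbitrary choices made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices only: a quaternary alphabet and an arbitrary noise PMF.
A = 4                                  # |A|
P_Z = np.array([0.7, 0.1, 0.1, 0.1])   # noise PMF on {0, 1, 2, 3}

def entropy_bits(p):
    """Shannon entropy H(P) in bits."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def assisted_capacity(alphabet_size, p_noise, R_h):
    """log|A| - {H(P_Z) - R_h}^+ of (2), in bits per channel use."""
    return np.log2(alphabet_size) - max(entropy_bits(p_noise) - R_h, 0.0)

def channel(x):
    """One block of the memoryless modulo-additive noise channel (1)."""
    z = rng.choice(A, size=x.shape, p=P_Z)
    return (x + z) % A                 # Y_k = x_k (+) Z_k

x = rng.integers(0, A, size=10)        # an arbitrary input block
y = channel(x)
print("C_Sh(R_h=0.5) =", assisted_capacity(A, P_Z, 0.5), "bits/use")
```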
Here we study two versions of the identification capacity of this channel: Ahlswede and Dueck’s original identification capacity $C_{\text{ID}}$ [7], and the identification capacity subject to no missed-identifications $C_{\text{ID},0}$ [8]. Our main result is that—irrespective of whether the help is provided to the encoder, to the decoder, or to both—the two identification capacities coincide and both equal the right-hand side (RHS) of (2).

2. Problem Formulation

The identification-over-a-channel problem is parameterized by the blocklength $n$, which tends to infinity in the definition of the identification capacity. The $n$-length noise sequence $Z^n \in \mathcal{A}^n$ is presented to a helper, which produces its $nR_h$-bit description $t(Z^n)$:
$$t(z^n) \in \mathcal{T} \tag{3}$$
where:
$$\mathcal{T} = \{0,1\}^{nR_h}. \tag{4}$$
We refer to the set $\mathcal{N} = \{1, \ldots, N\}$ as the set of identification messages and to its cardinality $N$ as the number of identification messages. The identification rate is defined (for $N$ sufficiently large) as:
$$\frac{1}{n}\log\log N. \tag{5}$$
A generic element of $\mathcal{N}$—namely, a generic identification message—is denoted $i$.
If no help is provided to the encoder, then the latter is specified by a family $\{P_{X^n}^{i}\}_{i \in \mathcal{N}}$ of PMFs on $\mathcal{A}^n$ that are indexed by the identification messages, with the understanding that, to convey the identification message (IM) $i$, the encoder transmits a random sequence in $\mathcal{A}^n$ that it draws according to the PMF $P_{X^n}^{i}$. If help $T = t(Z^n) \in \mathcal{T}$ is provided to the encoder, then the encoder’s operation is specified by a family of PMFs $\{P_{X^n|t}^{i}\}_{(i,t) \in \mathcal{N} \times \mathcal{T}}$ that is now indexed by pairs of identification messages and noise descriptions, with the understanding that, to convey IM $i$ given the description $T = t(Z^n)$, the encoder produces a random $n$-length sequence of channel inputs that is distributed according to $P_{X^n|T}^{i}$. In either case, the channel output sequence $Y^n$ is:
$$Y^n = X^n \oplus Z^n \tag{6}$$
componentwise.
If help is provided to the encoder, and if IM $i$ is to be conveyed, then the joint distribution of $(X^n, Z^n, Y^n, T)$ has the form:
$$P_{Z^n}(z^n)\, P_{T|Z^n}(t|z^n)\, P_{X^n|T}^{i}(x^n|t)\, P_{Y^n|X^n,Z^n}(y^n|x^n,z^n) \tag{7}$$
where
$$P_{Y^n|X^n,Z^n}(y^n|x^n,z^n) = \mathbb{1}\bigl\{y^n = x^n \oplus z^n\bigr\} \tag{8}$$
and where
$$P_{T|Z^n}(t|z^n) = \mathbb{1}\bigl\{t = t(z^n)\bigr\} \tag{9}$$
because we are assuming that the noise description is a deterministic function of the noise sequence. (The results also hold if we allow randomized descriptions: our coding schemes employ deterministic descriptions and the converse allows for randomization.) Here $\mathbb{1}\{\text{statement}\}$ equals 1 if the statement holds and equals 0 otherwise. In the absence of help, the joint distribution has the form:
$$P_{Z^n}(z^n)\, P_{T|Z^n}(t|z^n)\, P_{X^n}^{i}(x^n)\, P_{Y^n|X^n,Z^n}(y^n|x^n,z^n). \tag{10}$$
Based on the data available to it—$Y^n$ in the absence of help to the decoder and $(Y^n, t(Z^n))$ in its presence—the receiver performs $N$ binary tests indexed by $i \in \mathcal{N}$, where the $i$-th test is whether or not the IM is $i$. It accepts the hypothesis that the IM is $i$ if $Y^n$ is in its acceptance region, which we denote $\mathcal{D}_i(t) \subseteq \mathcal{A}^n$ in the presence of decoder assistance $t \in \mathcal{T}$ and $\mathcal{D}_i \subseteq \mathcal{A}^n$ in its absence.
When the help $t \in \mathcal{T}$ is provided to the receiver, the probability of missed detection associated with IM $i \in \mathcal{N}$ is thus:
$$p_{\text{MD}}^{i}(t) = 1 - P_{Y^n|T=t}^{i}\bigl(\mathcal{D}_i(t)\bigr) \tag{11}$$
and the worst-case false alarm associated with it is:
$$p_{\text{FA}}^{i}(t) = \max_{j \in \mathcal{N}\setminus\{i\}} P_{Y^n|T=t}^{j}\bigl(\mathcal{D}_i(t)\bigr). \tag{12}$$
Note that, given $t \in \mathcal{T}$, the acceptance regions $\{\mathcal{D}_i(t)\}_{i \in \mathcal{N}}$ of the different tests need not be disjoint. We define:
$$p_{\text{MD},\max} = \max_{i \in \mathcal{N}} \sum_{t \in \mathcal{T}} P_T(t)\, p_{\text{MD}}^{i}(t) \tag{13}$$
and:
$$p_{\text{FA},\max} = \max_{i \in \mathcal{N}} \sum_{t \in \mathcal{T}} P_T(t)\, p_{\text{FA}}^{i}(t). \tag{14}$$
In the absence of help to the receiver, the probability of missed detection associated with IM $i$ is:
$$p_{\text{MD}}^{i} = 1 - \sum_{t \in \mathcal{T}} P_T(t)\, P_{Y^n|T=t}^{i}(\mathcal{D}_i) = 1 - P_{Y^n}^{i}(\mathcal{D}_i) \tag{15}$$
and the worst-case probability of false alarm associated with it is:
$$p_{\text{FA}}^{i} = \sum_{t \in \mathcal{T}} P_T(t) \max_{j \in \mathcal{N}\setminus\{i\}} P_{Y^n|T=t}^{j}(\mathcal{D}_i). \tag{16}$$
In this case, we define:
$$p_{\text{MD},\max} = \max_{i \in \mathcal{N}} p_{\text{MD}}^{i} \tag{17}$$
and:
$$p_{\text{FA},\max} = \max_{i \in \mathcal{N}} p_{\text{FA}}^{i}. \tag{18}$$
In both cases we say that a scheme is of zero missed detections if $p_{\text{MD},\max}$ is zero.
A rate $R$ is an achievable identification rate if, for every $\gamma > 0$ and every $\epsilon > 0$, there exists some positive integer $n_0$ such that, for all blocklengths $n$ exceeding $n_0$, there exists a scheme with:
$$N = 2^{2^{n(R-\gamma)}} \tag{19}$$
identification messages for which:
$$\max\bigl\{p_{\text{MD},\max},\, p_{\text{FA},\max}\bigr\} < \epsilon. \tag{20}$$
The supremum of achievable rates is the identification capacity with a helper $C_{\text{ID}}(R_h)$. Replacing requirement (20) with:
$$p_{\text{MD},\max} = 0, \qquad p_{\text{FA},\max} < \epsilon \tag{21}$$
leads to the definition of the zero missed-identification capacity $C_{\text{ID},0}(R_h)$.
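For a sense of the double-exponential scaling in (19): with illustrative values $n = 20$ and $R - \gamma = 1$ bit, the number of identification messages is $N = 2^{2^{20}} = 2^{1048576}$, i.e., more than $10^{315000}$ messages.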
Remark 1.
Writing out $p_{\text{FA},\max}$ of (14) as:
$$p_{\text{FA},\max} = \max_{i \in \mathcal{N}} \sum_{t \in \mathcal{T}} P_T(t) \max_{j \in \mathcal{N}\setminus\{i\}} P_{Y^n|T=t}^{j}\bigl(\mathcal{D}_i(t)\bigr) \tag{22}$$
highlights that (prior to maximizing over i) we first maximize over j and then average the result over t. In this sense, the help—even if provided to both encoder and decoder—cannot be viewed as “common randomness” in the sense of [9,10,11] where the averaging over the common randomness is performed before taking the maximum. Our criterion is more demanding of the direct part (code construction) and less so of the converse.
Both criteria are interesting. Ours allows for the notion of “outage”, namely, descriptions that indicate that identification might fail and that therefore call for retransmission. The other criterion highlights the interplay between the noise description and the generation of common randomness (particularly when the help is provided to both transmitter and receiver).
The following theorem is the main result of this paper.
Theorem 1.
On the modulo-additive noise channel—irrespective of whether the help is provided to the transmitter, to the receiver, or to both—the identification capacity with a helper $C_{\text{ID}}(R_h)$ and the zero missed-identification capacity with a helper $C_{\text{ID},0}(R_h)$ are equal and coincide with the Shannon capacity:
$$C_{\text{ID}}(R_h) = C_{\text{ID},0}(R_h) = C_{\text{Sh}}(R_h) \tag{23}$$
where the latter is given in (2).
We prove this result by establishing in Section 3 that $C_{\text{ID},0}(R_h) \geq C_{\text{Sh}}(R_h)$ using a slight strengthening of recent results in [4] in combination with the code construction proposed in [8]. The converse is proved in Section 4, where we use a variation on a theme by Watanabe [12] to analyze the case where the assistance is provided to both transmitter and receiver.

3. Direct Part: Zero Missed Detection

In this section we prove that:
$$C_{\text{ID},0}(R_h) \geq C_{\text{Sh}}(R_h) \tag{24}$$
by proposing identification schemes of no missed detections and of rates approaching $C_{\text{Sh}}(R_h)$. To this end, we extend to the helper setting the connection—due to Ahlswede, Cai, and Zhang [8]—between the zero-missed-detection identification capacity $C_{\text{ID},0}$ and the erasures-only capacity $C_{\text{e-o}}$. We then call on recent results [4] to infer that, on the modulo-additive noise channel with a helper, the erasures-only capacity is equal to the Shannon capacity. We treat encoder-only assistance and decoder-only assistance separately. Either case also proves achievability when the assistance is provided to both encoder and decoder.
Recall that an erasures-only decoder produces a list $\mathcal{L}$ comprising the messages under which the observation is of positive likelihood and then acts as follows: if the list contains only one message, it produces that message; otherwise, it declares an erasure. Since the list always contains the transmitted message, this decoder never errs. The erasures-only capacity is defined like the Shannon capacity, but with the additional requirement that the decoder be the erasures-only decoder. This notion extends in a natural way to settings with a helper [4].

3.1. Encoder Assistance

A rate-$R$, blocklength-$n$, encoder-assisted, erasures-only transmission code comprises a message set $\mathcal{M} = \{1, \ldots, M\}$ with $M = 2^{nR}$ messages and a collection of $M$ mappings $\{f_m\}_{m \in \mathcal{M}}$ from $\mathcal{T}$ to $\mathcal{X}^n$, indexed by $\mathcal{M}$, with the understanding that to transmit Message $m$ after being presented with the help $t(Z^n) \in \mathcal{T}$, the encoder produces the $n$-tuple of channel inputs $f_m\bigl(t(Z^n)\bigr) \in \mathcal{X}^n$. Since the decoder observes only the channel outputs (and not the help), it forms the list:
$$\mathcal{L}(y^n) = \Bigl\{ m \in \mathcal{M} : \exists\, t \in \mathcal{T} \text{ s.t. } P_{Y^n|X^n,T}\bigl(y^n \big| f_m(t), t\bigr) > 0 \Bigr\}. \tag{25}$$
The collection of output sequences that cause the erasures-only decoder to produce an erasure is:
$$\mathcal{Y}_{\text{er}} = \bigl\{ y^n \in \mathcal{A}^n : |\mathcal{L}(y^n)| > 1 \bigr\}. \tag{26}$$
The probability of erasure associated with the transmission of Message $m$ with encoder help $t$ is $P_{Y^n|X^n,T}\bigl(\mathcal{Y}_{\text{er}} \big| f_m(t), t\bigr)$. On the modulo-additive noise channel with rate-$R_h$ encoder assistance, the erasures-only capacity and the Shannon capacity coincide and [4]:
$$C_{\text{e-o}}(R_h) = C_{\text{Sh}}(R_h) = \log|\mathcal{A}| - \bigl\{H(P_Z) - R_h\bigr\}^+. \tag{27}$$
We shall need the following slightly-stronger version of the achievability part of this result, where we swap the maximization over the messages with the expectation over the help:
Proposition 1.
Consider the modulo-additive noise channel with rate-$R_h$ encoder assistance. For any transmission rate $R$ smaller than $C_{\text{e-o}}(R_h)$ of (27), there exists a sequence of rate-$R$ transmission codes for which:
$$\lim_{n \to \infty} \sum_{t \in \mathcal{T}} P_T(t) \max_{m \in \mathcal{M}} P_{Y^n|X^n,T}\bigl(\mathcal{Y}_{\text{er}} \big| f_m(t), t\bigr) = 0. \tag{28}$$
A similar result holds for decoder assistance.
Proof. 
The proof is presented in Appendix A. It is based on the construction in [4], but with a slightly finer analysis. □
The coding scheme we propose is essentially that of [8]. We just need to account for the help. For each blocklength $n$, we start out with a transmission code of roughly $2^{nC_{\text{e-o}}(R_h)}$ codewords for which (28) holds, and use Lemma 1 ahead to construct approximately $2^{2^{nC_{\text{e-o}}(R_h)}}$ lightly-intersecting subsets of its message set. We then associate an IM with each of the subsets, with the understanding that to transmit an IM we pick uniformly at random one of the messages in the subset associated with it and transmit this message with the helper’s assistance.
Lemma 1
([7] Proposition 14). Let $\mathcal{Z}$ be a finite set, and let $\lambda \in \bigl(0, \tfrac{1}{2}\bigr)$ be given. If $\epsilon > 0$ is sufficiently small so that:
$$\lambda \log\Bigl(\frac{1}{\epsilon} - 1\Bigr) > 2 \quad \text{and} \quad \frac{1}{\epsilon} > 6 \tag{29}$$
then there exist subsets $\mathcal{A}_1, \ldots, \mathcal{A}_N$ of $\mathcal{Z}$ such that for all distinct $i, j \in \{1, \ldots, N\}$ the following hold:
$$\text{(a)}\quad |\mathcal{A}_i| = \epsilon|\mathcal{Z}|, \tag{30}$$
$$\text{(b)}\quad |\mathcal{A}_i \cap \mathcal{A}_j| < \lambda\,\epsilon|\mathcal{Z}|, \tag{31}$$
$$\text{(c)}\quad N \geq |\mathcal{Z}|^{-1} \cdot 2^{\epsilon|\mathcal{Z}| - 1}. \tag{32}$$
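As an aside, the following small Python experiment (ours, with purely illustrative parameters that are far milder than those used in the proof) gives a feel for why families as in Lemma 1 are plentiful: subsets of size $\epsilon|\mathcal{Z}|$ drawn uniformly at random typically intersect in about $\epsilon^2|\mathcal{Z}|$ elements, which is well below the threshold $\lambda\epsilon|\mathcal{Z}|$ whenever $\epsilon \ll \lambda$.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Illustrative parameters only; the proof of Section 3 uses
# eps = 1/(n^2 + 2) and lam = 1/log(n), which are far more extreme.
Z_size = 2000                      # |Z|
eps, lam = 0.01, 0.2
subset_size = int(eps * Z_size)    # eps*|Z| = 20 elements per subset
N = 50                             # far fewer subsets than bound (c) allows

subsets = [set(rng.choice(Z_size, size=subset_size, replace=False))
           for _ in range(N)]

# Random eps*|Z|-subsets typically intersect in about eps^2*|Z| elements,
# well below the lemma's threshold lam*eps*|Z|.
max_overlap = max(len(a & b) for a, b in combinations(subsets, 2))
print("largest pairwise intersection:", max_overlap)
print("threshold lam*eps*|Z|:", lam * eps * Z_size)   # = 4.0
```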
With the aid of this lemma, we can now prove the achievability of $C_{\text{e-o}}(R_h)$.
Proof. 
Given an erasures-only encoder-assisted transmission code $\{f_m\}_{m \in \mathcal{M}}$, where $f_m : \mathcal{T} \to \mathcal{X}^n$, we apply Lemma 1 to the transmission message set $\mathcal{M}$ with:
$$\epsilon = \frac{1}{n^2+2} \quad \text{and} \quad \lambda = \frac{1}{\log n} \tag{33}$$
to infer, for large enough $n$, the existence of subsets $\mathcal{F}_1, \ldots, \mathcal{F}_N \subseteq \mathcal{M}$ such that for all $i, j \in \{1, \ldots, N\}$ with $j \neq i$:
$$|\mathcal{F}_i| = \frac{M}{n^2+2} \tag{34}$$
$$|\mathcal{F}_i \cap \mathcal{F}_j| < \frac{1}{\log n}\cdot\frac{M}{n^2+2} \tag{35}$$
$$N \geq M^{-1} \cdot 2^{\frac{M}{n^2+2} - 1}. \tag{36}$$
Note that (36) implies that:
$$\varliminf_{n \to \infty}\Bigl(\frac{1}{n}\log\log N - \frac{1}{n}\log M\Bigr) \geq 0. \tag{37}$$
To send IM $i$ after obtaining the assistance $t(z^n)$, the encoder picks a random element $M$ from $\mathcal{F}_i$ equiprobably and transmits $X^n = f_M\bigl(t(Z^n)\bigr)$, so:
$$P_{X^n|T}^{i}(x^n|t) = \frac{1}{|\mathcal{F}_i|}\sum_{m \in \mathcal{F}_i} \mathbb{1}\bigl\{x^n = f_m(t)\bigr\}. \tag{38}$$
To guarantee no missed detections, we set the acceptance region of the $i$-th IM to be:
$$\mathcal{D}_i = \Bigl\{ y^n \in \mathcal{Y}^n : \exists\,(m,t) \in \mathcal{F}_i \times \mathcal{T} \text{ s.t. } P_{Y^n|X^n,T}\bigl(y^n \big| f_m(t), t\bigr) > 0 \Bigr\}. \tag{39}$$
It now remains to analyze the scheme’s maximal false-alarm probability.
$$\begin{aligned}
p_{\text{FA},\max} &= \max_{i \in \mathcal{N}} \sum_{t \in \mathcal{T}} P_T(t) \max_{j \in \mathcal{N}\setminus\{i\}} P_{Y^n|T=t}^{j}(\mathcal{D}_i) &\text{(40)}\\
&= \max_{i \in \mathcal{N}} \sum_{t \in \mathcal{T}} P_T(t) \max_{j \in \mathcal{N}\setminus\{i\}} \frac{1}{|\mathcal{F}_j|}\sum_{m \in \mathcal{F}_j} P_{Y^n|X^n,T}\bigl(\mathcal{D}_i\big|f_m(t),t\bigr) &\text{(41)}\\
&= \max_{i \in \mathcal{N}} \sum_{t \in \mathcal{T}} P_T(t) \max_{j \in \mathcal{N}\setminus\{i\}} \frac{1}{|\mathcal{F}_j|}\Bigl[\sum_{m \in \mathcal{F}_j\setminus\mathcal{F}_i} P_{Y^n|X^n,T}\bigl(\mathcal{D}_i\big|f_m(t),t\bigr) + \sum_{m \in \mathcal{F}_j\cap\mathcal{F}_i} P_{Y^n|X^n,T}\bigl(\mathcal{D}_i\big|f_m(t),t\bigr)\Bigr] &\text{(42)}\\
&\leq \max_{i \in \mathcal{N}} \sum_{t \in \mathcal{T}} P_T(t) \max_{j \in \mathcal{N}\setminus\{i\}} \frac{1}{|\mathcal{F}_j|}\Bigl[\sum_{m \in \mathcal{F}_j\setminus\mathcal{F}_i} P_{Y^n|X^n,T}\bigl(\mathcal{D}_i\big|f_m(t),t\bigr) + |\mathcal{F}_j\cap\mathcal{F}_i|\Bigr] &\text{(43)}\\
&\leq \max_{i \in \mathcal{N}} \sum_{t \in \mathcal{T}} P_T(t) \max_{j \in \mathcal{N}\setminus\{i\}} \frac{1}{|\mathcal{F}_j|}\Bigl[\sum_{m \in \mathcal{F}_j\setminus\mathcal{F}_i} P_{Y^n|X^n,T}\bigl(\mathcal{Y}_{\text{er}}\big|f_m(t),t\bigr) + |\mathcal{F}_j\cap\mathcal{F}_i|\Bigr] &\text{(44)}\\
&< \max_{i \in \mathcal{N}} \sum_{t \in \mathcal{T}} P_T(t) \max_{j \in \mathcal{N}\setminus\{i\}} \Bigl[\frac{1}{|\mathcal{F}_j|}\sum_{m \in \mathcal{F}_j\setminus\mathcal{F}_i} P_{Y^n|X^n,T}\bigl(\mathcal{Y}_{\text{er}}\big|f_m(t),t\bigr) + \frac{M}{(n^2+2)\,|\mathcal{F}_j|\log n}\Bigr] &\text{(45)}\\
&\leq \max_{i \in \mathcal{N}} \sum_{t \in \mathcal{T}} P_T(t) \max_{j \in \mathcal{N}\setminus\{i\}} \Bigl[\frac{|\mathcal{F}_j\setminus\mathcal{F}_i|}{|\mathcal{F}_j|}\max_{m \in \mathcal{M}} P_{Y^n|X^n,T}\bigl(\mathcal{Y}_{\text{er}}\big|f_m(t),t\bigr) + \frac{1}{\log n}\Bigr] &\text{(46)}\\
&\leq \max_{i \in \mathcal{N}} \sum_{t \in \mathcal{T}} P_T(t) \max_{j \in \mathcal{N}\setminus\{i\}} \Bigl[\max_{m \in \mathcal{M}} P_{Y^n|X^n,T}\bigl(\mathcal{Y}_{\text{er}}\big|f_m(t),t\bigr) + \frac{1}{\log n}\Bigr] &\text{(47)}\\
&= \sum_{t \in \mathcal{T}} P_T(t) \max_{m \in \mathcal{M}} P_{Y^n|X^n,T}\bigl(\mathcal{Y}_{\text{er}}\big|f_m(t),t\bigr) + \frac{1}{\log n} &\text{(48)}
\end{aligned}$$
where in (41) we expressed $P_{Y^n|T=t}^{j}(\mathcal{D}_i)$ in terms of $P_{Y^n|X^n,T}\bigl(\mathcal{D}_i \big| f_m(t), t\bigr)$ using (7); in (42) we expressed $\mathcal{F}_j$ as the disjoint union of $\mathcal{F}_j \setminus \mathcal{F}_i$ and $\mathcal{F}_j \cap \mathcal{F}_i$; in (43) we used the trivial bound:
$$P_{Y^n|X^n,T}\bigl(\mathcal{D}_i \big| f_m(t), t\bigr) \leq 1; \tag{49}$$
in (44) we used the fact that whenever $m \notin \mathcal{F}_i$:
$$P_{Y^n|X^n,T}\bigl(\mathcal{D}_i \big| f_m(t), t\bigr) \leq P_{Y^n|X^n,T}\bigl(\mathcal{Y}_{\text{er}} \big| f_m(t), t\bigr) \tag{50}$$
which holds because, by the definition of the set $\mathcal{Y}_{\text{er}}$, any output sequence $y^n$ that contributes to the LHS of (50), i.e., that is in $\mathcal{D}_i$ with $P_{Y^n|X^n,T}\bigl(y^n \big| f_m(t), t\bigr) > 0$, must also be in $\mathcal{Y}_{\text{er}}$; in (45) we used (35); in (46) we replaced each term in the sum with the global maximum (over $m \in \mathcal{M}$) and used (34); in (47) we used the trivial bound $|\mathcal{F}_j \setminus \mathcal{F}_i| \leq |\mathcal{F}_j|$; and in (48) we could simplify the expression because it no longer depends on $i$ and $j$.
The above construction demonstrates that every transmission scheme that drives $\sum_{t \in \mathcal{T}} P_T(t) \max_{m \in \mathcal{M}} P_{Y^n|X^n,T}\bigl(\mathcal{Y}_{\text{er}} \big| f_m(t), t\bigr)$ to zero induces a zero missed-identification scheme that drives the false-alarm probability to zero. Since the former exists for all rates up to $C_{\text{e-o}}(R_h)$, we conclude, by (37), that $C_{\text{ID},0}(R_h) \geq C_{\text{e-o}}(R_h)$. This, in turn, implies that $C_{\text{ID},0}(R_h) \geq C_{\text{Sh}}(R_h)$ and hence concludes the achievability proof for encoder assistance because, on the modulo-additive noise channel, $C_{\text{e-o}}(R_h) = C_{\text{Sh}}(R_h)$. □

3.2. Decoder Assistance

When, rather than to the encoder, the assistance is to the decoder, the transmission codewords are $n$-tuples in $\mathcal{A}^n$, and we denote the transmission codebook $\mathcal{C} = \{x^n(m)\}_{m \in \mathcal{M}}$. For the induced identification scheme we use the same message subsets as before, with IM $i$ being transmitted by choosing uniformly at random a message $M$ from the subset $\mathcal{F}_i$ and transmitting the codeword $x^n(M)$. To avoid any missed detections, we set the acceptance region corresponding to IM $i$ and decoder assistance $t$ to be:
$$\mathcal{D}_i(t) = \bigl\{ y^n \in \mathcal{A}^n : \exists\, m \in \mathcal{F}_i \text{ s.t. } P_{Y^n,T|X^n}\bigl(y^n, t \big| x^n(m)\bigr) > 0 \bigr\}. \tag{51}$$
The analysis of the false-alarm probability is nearly identical to that with encoder assistance and is omitted.

4. Converse Part: Help Provided to Both Transmitter and Receiver

In this section we establish the converse for all the cases of interest by proving that the inequality:
$$C_{\text{ID}}(R_h) \leq \log|\mathcal{A}| - \bigl\{H(P_Z) - R_h\bigr\}^+ \tag{52}$$
holds even when the help is provided to both encoder and decoder. The RHS of (52) is the helper-assisted Shannon capacity, irrespective of whether the help is provided to the encoder, to the decoder, or to both [3] (Section V).
There are two main steps to the proof. The first addresses the conditional probabilities of the two types of testing errors conditional on a given description $T = t$. It relates the two to the conditional entropy of the noise given the description, namely, $H(Z^n|T=t)$. Very roughly, this corresponds to proving the converse part of the ID-capacity theorem for the channel whose noise is distributed according to the conditional distribution of $Z^n$ given $T = t$. The difficulty in this step is that, given $T = t$, the noise is not memoryless, and the channel may not even be stationary. Classical type-based techniques for proving the converse part of the ID-capacity theorem—such as those employed in [7] (Theorem 12), [13] (Section III), or [14] (Section III)—are therefore not applicable. Instead, we extend to the helper setting Watanabe’s technique [12], which is inspired by the partial channel resolvability method introduced by Steinberg [15].
The second step in the proof addresses the unconditional error probabilities. This step is needed because, in the definition of achievability (see (13) and (14)), the error probabilities are averaged over the noise description $t$. We will show that, when the identification rate exceeds the Shannon capacity, there exists an IM $i^*$ for which the sum of the two types of errors is large whenever the description $t$ is in a subset $\mathcal{T}^*$ of $\mathcal{T}$ whose probability is bounded away from zero. This will imply that, for this IM $i^*$, the sum of the averaged probabilities of error is bounded away from zero, thus contradicting the achievability.

4.1. Additional Notation

Given a PMF $P_X$ and a conditional PMF $P_{Y|X}$, we write $P_X P_{Y|X}$ for the joint PMF that assigns the pair $(x,y)$ the probability $P_X(x)\, P_{Y|X}(y|x)$. We use $I_{P P_{Y|X}}(X;Y)$ to denote the mutual information between $X$ and $Y$ under the joint distribution $P P_{Y|X}$. The product PMF of marginals $P_X$ and $P_Y$ is denoted $P_X \times P_Y$; it assigns $(x,y)$ the probability $P_X(x)\, P_Y(y)$.
For the hypothesis testing problem of guessing whether some observation $X$ was drawn $\sim P_X$ (the “null hypothesis”) or $\sim Q_X$ (the “alternative hypothesis”), we use $K(\cdot|X)$ to denote a generic randomized test that, after observing $X = x$, guesses the null hypothesis ($X \sim P_X$) with probability $K(0|X=x)$ and the alternative ($X \sim Q_X$) with probability $K(1|X=x)$. (Here $K(0|X=x) + K(1|X=x) = 1$ for every $x \in \mathcal{X}$.) The type I error probability associated with $K(\cdot|X)$ is:
$$\lambda_1[K] = \sum_{x \in \mathcal{X}} P_X(x)\, K(1|x) \tag{53}$$
and the type II:
$$\lambda_2[K] = \sum_{x \in \mathcal{X}} Q_X(x)\, K(0|x). \tag{54}$$
For a given $0 < \epsilon < 1$ we define:
$$\beta_\epsilon(P_X, Q_X) = \inf_{K : \lambda_1[K] \leq \epsilon} \lambda_2[K] \tag{55}$$
to be the least type-II error probability that can be achieved under the constraint that the type-I error probability does not exceed ϵ .
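For concreteness, the following Python routine (our own illustration, not taken from the references) computes $\beta_\epsilon(P_X, Q_X)$ on a finite alphabet via the Neyman–Pearson construction: the null is rejected on the outcomes with the smallest likelihood ratio $P_X/Q_X$ until the type-I budget $\epsilon$ is exhausted, randomizing on the boundary outcome.

```python
import numpy as np

def beta_eps(P, Q, eps):
    """
    Smallest type-II error probability over randomized tests whose
    type-I error probability is at most eps, for PMFs P (null) and
    Q (alternative) on a common finite alphabet -- the quantity (55).
    Neyman-Pearson: reject the null on the smallest-likelihood-ratio
    outcomes until the rejected P-mass reaches eps (randomizing at the
    boundary outcome), and accept the null everywhere else.
    """
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    # Likelihood ratio P/Q; outcomes with Q = 0 never cost type-II error,
    # so putting their ratio at +inf makes them the last to be rejected.
    ratio = np.where(Q > 0, P / np.maximum(Q, 1e-300), np.inf)
    order = np.argsort(ratio)          # ascending: reject these first
    budget = eps                       # remaining type-I error budget
    beta = Q.sum()                     # start by accepting every outcome
    for x in order:
        if budget <= 0:
            break
        frac = min(1.0, budget / P[x]) if P[x] > 0 else 1.0
        beta -= frac * Q[x]            # rejecting x removes its Q-mass
        budget -= frac * P[x]
    return max(beta, 0.0)

# Tiny sanity check on a binary alphabet (numbers are illustrative only):
# rejecting the null on the second outcome has type-I error 0.1 and
# leaves type-II error 0.5.
print(beta_eps([0.9, 0.1], [0.5, 0.5], eps=0.1))   # -> 0.5
```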

4.2. Conditional Missed-Detection and False-Alarm Probabilities

The following lemma follows directly from Watanabe’s work [12].
Lemma 2
([12] Theorem 1 and Corollary 2). Let $P_{Y^n|X^n,T=t}$ be the $n$-letter conditional distribution of the channel output sequence given that the noise description is $T = t$ and the input is $X^n$. For any $\lambda_1, \lambda_2 > 0$ with $\lambda_1 + \lambda_2 < 1$, any $0 < \eta < 1 - \lambda_1 - \lambda_2$, and any fixed $t \in \mathcal{T}$, the condition:
$$p_{\text{MD}}^{i}(t) + p_{\text{FA}}^{i}(t) < \lambda_1 + \lambda_2, \quad i \in \mathcal{N} \tag{56}$$
implies:
$$\log\log N \leq \sup_{P \in \mathcal{P}(\mathcal{X}^n)}\, \inf_{Q \in \mathcal{P}(\mathcal{Y}^n)} \Bigl(-\log \beta_{\lambda_1+\lambda_2+\eta}\bigl(P P_{Y^n|X^n,T=t},\, P \times Q\bigr)\Bigr) + \log\log|\mathcal{A}|^n + 2\log\frac{1}{\eta} + 2 \tag{57}$$
and hence:
$$\frac{1}{n}\log\log N \leq \frac{1}{n}\sup_{P \in \mathcal{P}(\mathcal{X}^n)}\, \inf_{Q \in \mathcal{P}(\mathcal{Y}^n)} \Bigl(-\log \beta_{\lambda_1+\lambda_2+\eta}\bigl(P P_{Y^n|X^n,T=t},\, P \times Q\bigr)\Bigr) + \psi_n(\eta), \tag{58}$$
where:
$$\psi_n(\eta) = \frac{\log n}{n} + \frac{\log\log|\mathcal{A}|}{n} - \frac{2}{n}\log\eta + \frac{2}{n} \tag{59}$$
which—for any fixed η > 0 —tends to 0 as n tends to ∞.
Substituting $P_{Y^n|X^n,T=t}$ for $P_{Y|X}$ in the following theorem will allow us to link the RHS of (57) with the conditional mutual information between $X^n$ and $Y^n$ given $t \in \mathcal{T}$. The theorem’s proof was inspired by the proof of [16] (Theorem 8). See also [17] (Lemma 1).
Theorem 2.
Given any $0 < \epsilon < 1$ and any conditional PMF $P_{Y|X}$,
$$\sup_{P \in \mathcal{P}(\mathcal{X})}\, \inf_{Q \in \mathcal{P}(\mathcal{Y})} \Bigl(-\log \beta_\epsilon\bigl(P P_{Y|X},\, P \times Q\bigr)\Bigr) \leq \frac{\sup_{P \in \mathcal{P}(\mathcal{X})} I_{P P_{Y|X}}(X;Y) + h(\epsilon)}{1-\epsilon} \tag{60}$$
where $h(\epsilon) \triangleq -\epsilon\log(\epsilon) - (1-\epsilon)\log(1-\epsilon)$ is the binary entropy function.
Proof. 
Applying the data-processing inequality for relative entropy to the binary hypothesis testing setting (see, e.g., [18] (Thm. 30.12.5)), we conclude that for any randomized test $K(\cdot|X)$,
$$D_{\text{bin}}\bigl(1-\lambda_1[K] \,\big\|\, \lambda_2[K]\bigr) \leq D\bigl(P P_{Y|X} \,\big\|\, P \times Q\bigr) \tag{61}$$
where:
$$D_{\text{bin}}(\alpha \,\|\, \beta) \triangleq \alpha\log\frac{\alpha}{\beta} + (1-\alpha)\log\frac{1-\alpha}{1-\beta} \tag{62}$$
denotes the binary divergence function. Since there exists a randomized test $K^*(\cdot|X)$ for which $\bigl(\lambda_1[K^*],\, \lambda_2[K^*]\bigr) = \bigl(\epsilon,\, \beta_\epsilon(P P_{Y|X},\, P \times Q)\bigr)$ (see, e.g., [18] (Lemma 30.5.4 and Proposition 30.8.1)), we can apply (61) to $K^*(\cdot|X)$ to conclude that:
$$D_{\text{bin}}\bigl(1-\epsilon \,\big\|\, \beta_\epsilon(P P_{Y|X},\, P \times Q)\bigr) \leq D\bigl(P P_{Y|X} \,\big\|\, P \times Q\bigr). \tag{63}$$
(The above existence also holds when $\beta_\epsilon(P P_{Y|X},\, P \times Q)$ is zero, but in that case we can verify (63) directly by noting that, since $\epsilon < 1$, the RHS of (63) is $+\infty$.) The LHS of (63) can be lower-bounded by bounding the binary divergence function as:
$$D_{\text{bin}}\bigl(1-\epsilon \,\big\|\, \beta_\epsilon(P P_{Y|X},\, P \times Q)\bigr) \geq -h(\epsilon) - (1-\epsilon)\log\beta_\epsilon\bigl(P P_{Y|X},\, P \times Q\bigr). \tag{64}$$
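Indeed, (64) follows by expanding the binary divergence (62), with $\beta$ standing for $\beta_\epsilon(P P_{Y|X}, P \times Q)$, and dropping the nonnegative term $-\epsilon\log(1-\beta)$:
$$D_{\text{bin}}(1-\epsilon \,\|\, \beta) = (1-\epsilon)\log\frac{1-\epsilon}{\beta} + \epsilon\log\frac{\epsilon}{1-\beta} = -h(\epsilon) - (1-\epsilon)\log\beta - \epsilon\log(1-\beta) \geq -h(\epsilon) - (1-\epsilon)\log\beta.$$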
It follows from (63) and (64) that:
$$-\log\beta_\epsilon\bigl(P P_{Y|X},\, P \times Q\bigr) \leq \frac{D\bigl(P P_{Y|X} \,\big\|\, P \times Q\bigr) + h(\epsilon)}{1-\epsilon} \tag{65}$$
so the infimum over $Q$ of the LHS is upper bounded by the infimum over $Q$ on the RHS. The latter (for fixed $P \in \mathcal{P}(\mathcal{X})$) is achieved when $Q$ is the $Y$-marginal of $P P_{Y|X}$, a marginal that we denote $P_Y$:
$$\inf_{Q} D\bigl(P P_{Y|X} \,\big\|\, P \times Q\bigr) = I_{P P_{Y|X}}(X;Y). \tag{66}$$
This is a special case of a more general result on Rényi divergence [19] (Theorem II.2). Here we give a simple proof for K-L divergence:
$$\begin{aligned}
D\bigl(P P_{Y|X} \,\big\|\, P \times Q\bigr) &= \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P P_{Y|X}(x,y)\log\frac{P P_{Y|X}(x,y)}{P(x)\,Q(y)} &\text{(67)}\\
&= \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P P_{Y|X}(x,y)\log\Bigl(\frac{P P_{Y|X}(x,y)}{P(x)\,P_Y(y)}\cdot\frac{P_Y(y)}{Q(y)}\Bigr) &\text{(68)}\\
&= \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P P_{Y|X}(x,y)\log\frac{P P_{Y|X}(x,y)}{P(x)\,P_Y(y)} + \sum_{y} P_Y(y)\log\frac{P_Y(y)}{Q(y)} &\text{(69)}\\
&\geq I_{P P_{Y|X}}(X;Y) + 0 &\text{(70)}
\end{aligned}$$
with equality if and only if $Q$ equals $P_Y$.
From (63), (64), and (66) we obtain:
$$\begin{aligned}
\sup_{P \in \mathcal{P}(\mathcal{X})}\, \inf_{Q \in \mathcal{P}(\mathcal{Y})} \Bigl(-\log\beta_\epsilon\bigl(P P_{Y|X},\, P \times Q\bigr)\Bigr) &\leq \sup_{P \in \mathcal{P}(\mathcal{X})}\, \inf_{Q \in \mathcal{P}(\mathcal{Y})} \frac{D_{\text{bin}}\bigl(1-\epsilon \,\big\|\, \beta_\epsilon(P P_{Y|X},\, P \times Q)\bigr) + h(\epsilon)}{1-\epsilon} &\text{(71)}\\
&\leq \sup_{P \in \mathcal{P}(\mathcal{X})}\, \inf_{Q \in \mathcal{P}(\mathcal{Y})} \frac{D\bigl(P P_{Y|X} \,\big\|\, P \times Q\bigr) + h(\epsilon)}{1-\epsilon} &\text{(72)}\\
&= \frac{\sup_{P \in \mathcal{P}(\mathcal{X})} I_{P P_{Y|X}}(X;Y) + h(\epsilon)}{1-\epsilon}. &\text{(73)}
\end{aligned}$$
Applying Lemma 2 and Theorem 2 to our channel when its law is conditioned on T = t yields the following corollary.
Corollary 1.
On the MMANC, for any $\lambda_1, \lambda_2 > 0$ with $\lambda_1 + \lambda_2 < 1$, any $0 < \eta < 1 - \lambda_1 - \lambda_2$, and any fixed $t \in \mathcal{T}$, the condition:
$$p_{\text{MD}}^{i}(t) + p_{\text{FA}}^{i}(t) < \lambda_1 + \lambda_2, \quad i \in \mathcal{N} \tag{74}$$
implies:
$$\frac{1}{n}\log\log N \leq \frac{\log|\mathcal{A}| - H(Z^n|T=t)/n}{1-\epsilon} + \psi_n(\eta), \tag{75}$$
where $\epsilon = \lambda_1 + \lambda_2 + \eta$.
Proof. 
Substituting $\mathcal{X}^n$ for $\mathcal{X}$, $\mathcal{Y}^n$ for $\mathcal{Y}$, $P_{Y^n|X^n,T=t}$ for $P_{Y|X}$, and $\epsilon = \lambda_1 + \lambda_2 + \eta$ for the $\epsilon$ in Theorem 2, we obtain:
$$\sup_{P \in \mathcal{P}(\mathcal{X}^n)}\, \inf_{Q \in \mathcal{P}(\mathcal{Y}^n)} \Bigl(-\log\beta_\epsilon\bigl(P P_{Y^n|X^n,T=t},\, P \times Q\bigr)\Bigr) \leq \frac{\sup_{P \in \mathcal{P}(\mathcal{X}^n)} I_{P P_{Y^n|X^n,T=t}}(X^n;Y^n) + h(\epsilon)}{1-\epsilon}. \tag{76}$$
Given $P \in \mathcal{P}(\mathcal{X}^n)$ and $P_{Y^n|X^n,T=t}$, the mutual information term in (76) can be upper-bounded as follows:
$$\begin{aligned}
I_{P P_{Y^n|X^n,T=t}}(X^n;Y^n) &\leq n\log|\mathcal{A}| - H_{P P_{Y^n|X^n,T=t}}(Y^n|X^n,T=t) &\text{(77)}\\
&= n\log|\mathcal{A}| - \sum_{x^n} P(x^n|T=t)\, H(Y^n|X^n=x^n,T=t) &\text{(78)}\\
&= n\log|\mathcal{A}| - \sum_{x^n} P(x^n|T=t)\, H(Z^n|X^n=x^n,T=t) &\text{(79)}\\
&= n\log|\mathcal{A}| - H(Z^n|T=t). &\text{(80)}
\end{aligned}$$
Applying (76) and (80) to (58) in Lemma 2 establishes Corollary 1. □

4.3. Averaging over T

Corollary 1 deals with identification for a given fixed $T = t$, but our definition of achievability in (13) and (14) entails averaging over $t$, which we must thus study. We begin by lower-bounding the conditional entropy of the noise sequence $Z^n$ given the assistance $T$:
$$\begin{aligned}
H(Z^n|T) &= H(Z^n,T) - H(T) &\text{(81)}\\
&\geq \bigl\{H(Z^n) - nR_h\bigr\}^+ &\text{(82)}\\
&= n\bigl\{H(P_Z) - R_h\bigr\}^+ &\text{(83)}
\end{aligned}$$
where (82) holds because $T$ is a deterministic function of $Z^n$ (so $H(Z^n,T) = H(Z^n)$), because $H(T) \leq nR_h$, and because conditional entropy is nonnegative; and (83) holds because the noise is IID $\sim P_Z$.
We next define, for every $\delta > 0$, the subset of descriptions:
$$\mathcal{T}^*(\delta) = \Bigl\{ t \in \mathcal{T} : H(Z^n|T=t) \geq n\bigl\{H(P_Z) - R_h - \delta\bigr\}^+ \Bigr\}. \tag{84}$$
These are poor noise descriptions in the sense that, after they are revealed, the remaining uncertainty about the noise is still large. The key point is that their probability is bounded away from zero. In fact, as we next argue:
$$P_T\bigl(\mathcal{T}^*(\delta)\bigr) \geq \begin{cases} \dfrac{\delta}{\log|\mathcal{A}| - H(P_Z) + R_h + \delta} & \text{if } R_h < H(P_Z) - \delta\\[1ex] 1 & \text{if } R_h \geq H(P_Z) - \delta \end{cases} \tag{85}$$
where in the second case the probability is 1 because, when $R_h \geq H(P_Z) - \delta$, the condition appearing in the definition of $\mathcal{T}^*(\delta)$ in (84) translates to $H(Z^n|T=t) \geq 0$. As to the first case, we begin with (83) to obtain:
$$\begin{aligned}
n\bigl(H(P_Z) - R_h\bigr) &\leq H(Z^n|T) &\text{(86)}\\
&= \sum_{t \notin \mathcal{T}^*(\delta)} P_T(t)\, H(Z^n|T=t) + \sum_{t \in \mathcal{T}^*(\delta)} P_T(t)\, H(Z^n|T=t) &\text{(87)}\\
&\leq \bigl(1 - P_T(\mathcal{T}^*(\delta))\bigr)\cdot n\bigl(H(P_Z) - R_h - \delta\bigr) + P_T\bigl(\mathcal{T}^*(\delta)\bigr)\cdot n\log|\mathcal{A}| &\text{(88)}
\end{aligned}$$
from which the first case of the bound in (85) follows. Here (87) follows from expressing $\mathcal{T}$ as the disjoint union of $\mathcal{T}^*(\delta)$ and $\mathcal{T} \setminus \mathcal{T}^*(\delta)$, and (88) follows from the definition of $\mathcal{T}^*(\delta)$ and the bound $H(Z^n|T=t) \leq n\log|\mathcal{A}|$.
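Indeed, subtracting $n\bigl(H(P_Z) - R_h - \delta\bigr)$ from both sides of (88) gives
$$n\delta \leq P_T\bigl(\mathcal{T}^*(\delta)\bigr)\cdot n\bigl(\log|\mathcal{A}| - H(P_Z) + R_h + \delta\bigr),$$
and dividing by $n\bigl(\log|\mathcal{A}| - H(P_Z) + R_h + \delta\bigr)$, which is positive because $H(P_Z) \leq \log|\mathcal{A}|$, yields the first case of (85).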
Inequality (85) establishes that the probability of a poor description is lower bounded by a positive constant that does not depend on n. Using Corollary 1 for such t’s will be the key to the converse.
Henceforth, we fix some sequence of identification codes of rate $R$ exceeding $C_{\text{Sh}}(R_h)$, i.e., satisfying $R > \log|\mathcal{A}| - \{H(P_Z) - R_h\}^+$, and show that $p_{\text{MD},\max} + p_{\text{FA},\max}$ cannot tend to 0 as $n$ tends to $\infty$. For such a rate $R$, there exist $R', \delta > 0$; a pair $\lambda_1, \lambda_2 > 0$ with $\lambda_1 + \lambda_2 < 1$; and some $\eta \in (0, 1 - \lambda_1 - \lambda_2)$ such that:
$$R > R' > \frac{\log|\mathcal{A}| - \bigl\{H(P_Z) - R_h - \delta\bigr\}^+}{1-\epsilon} \tag{89}$$
where $\epsilon \triangleq \lambda_1 + \lambda_2 + \eta < 1$. Fix such $R', \delta, \lambda_1, \lambda_2, \eta$, and $\epsilon$.
Since the inequality on $R'$ in (89) is strict, and since $\psi_n(\eta)$ tends to zero with $n$, it follows that the inequality continues to hold also when we add $\psi_n(\eta)$ to the RHS, provided that $n$ is sufficiently large, i.e., that there exists some $n_0(\eta)$ such that:
$$R' > \frac{\log|\mathcal{A}| - \bigl\{H(P_Z) - R_h - \delta\bigr\}^+}{1-\epsilon} + \psi_n(\eta), \quad \forall\, n \geq n_0(\eta). \tag{90}$$
It then follows from (90) and the definition of $\mathcal{T}^*(\delta)$ in (84) that, whenever $n \geq n_0(\eta)$, $R'$ exceeds the RHS of (75):
$$R' > \frac{\log|\mathcal{A}| - n^{-1}H(Z^n|T=t)}{1-\epsilon} + \psi_n(\eta), \quad \forall\, t \in \mathcal{T}^*(\delta). \tag{91}$$
Corollary 1 thus implies that, for $n > n_0(\eta)$:
$$\frac{1}{n}\log\log N > R \;\Longrightarrow\; \forall\, t \in \mathcal{T}^*(\delta)\ \exists\, i(t) \in \mathcal{N} \text{ s.t. } p_{\text{MD}}^{i(t)}(t) + p_{\text{FA}}^{i(t)}(t) \geq \lambda_1 + \lambda_2. \tag{92}$$
However, we need a stronger statement because, in the above, the IM $i$ for which $p_{\text{MD}}^{i}(t) + p_{\text{FA}}^{i}(t) \geq \lambda_1 + \lambda_2$ depends on $t$, whereas in our definition of achievability we are averaging over $T$ for a fixed IM. The stronger result we will establish is that the condition on the LHS of (92) implies that, for all sufficiently large $n$, there exists some IM $i^*$ (that does not depend on $t$) which performs poorly for every $t$ in $\mathcal{T}^*(\delta)$, i.e., for which:
$$p_{\text{MD}}^{i^*}(t) + p_{\text{FA}}^{i^*}(t) \geq \lambda_1 + \lambda_2, \quad \forall\, t \in \mathcal{T}^*(\delta). \tag{93}$$
That is, we will show that for sufficiently large $n$:
$$\frac{1}{n}\log\log N > R \;\Longrightarrow\; \exists\, i \in \mathcal{N} \text{ s.t. } \min_{t \in \mathcal{T}^*(\delta)} \bigl(p_{\text{MD}}^{i}(t) + p_{\text{FA}}^{i}(t)\bigr) \geq \lambda_1 + \lambda_2. \tag{94}$$
To this end, define for each $t \in \mathcal{T}^*(\delta)$:
$$\mathcal{N}(t) = \bigl\{ i \in \mathcal{N} : p_{\text{MD}}^{i}(t) + p_{\text{FA}}^{i}(t) < \lambda_1 + \lambda_2 \bigr\} \tag{95}$$
and consider the identification code that results when we restrict our code to the IMs in $\mathcal{N}(t)$ (while keeping the same acceptance regions). Applying Corollary 1 to this restricted code using (91), we obtain that:
$$\frac{1}{n}\log\log|\mathcal{N}(t)| < R', \quad \forall\, t \in \mathcal{T}^*(\delta). \tag{96}$$
Consequently,
$$\Bigl|\bigcup_{t \in \mathcal{T}^*(\delta)} \mathcal{N}(t)\Bigr| \leq \sum_{t \in \mathcal{T}^*(\delta)} |\mathcal{N}(t)| \leq 2^{nR_h}\, 2^{2^{nR'}} \tag{97}$$
where the second inequality holds by (96) and the fact that $\mathcal{T}^*(\delta)$ is contained in $\mathcal{T}$, whose cardinality is $2^{nR_h}$.
Since $R' < R$ (see (89)), there exists some $n_1(R, R', R_h)$ such that:
$$2^{nR_h}\, 2^{2^{nR'}} < 2^{2^{nR}}, \quad \forall\, n \geq n_1(R, R', R_h). \tag{98}$$
We can use this to upper-bound the RHS of (97) to obtain that, for $n \geq \max\bigl\{n_0(\eta),\, n_1(R, R', R_h)\bigr\}$:
$$\Bigl|\bigcup_{t \in \mathcal{T}^*(\delta)} \mathcal{N}(t)\Bigr| < N. \tag{99}$$
The complement (in $\mathcal{N}$) of the union on the LHS of (99) is thus not empty, which proves the existence of some $i^* \in \mathcal{N}$ for which (93) holds.
With $i^*$ in hand, the converse follows from the fact that the probability that $T$ is in $\mathcal{T}^*(\delta)$ is bounded away from zero (85), because for every $n \geq \max\bigl\{n_0(\eta),\, n_1(R, R', R_h)\bigr\}$:
$$\begin{aligned}
p_{\text{MD},\max} + p_{\text{FA},\max} &= \max_{i \in \mathcal{N}} \sum_{t \in \mathcal{T}} P_T(t)\, p_{\text{MD}}^{i}(t) + \max_{i \in \mathcal{N}} \sum_{t \in \mathcal{T}} P_T(t)\, p_{\text{FA}}^{i}(t) &\text{(100)}\\
&\geq \sum_{t \in \mathcal{T}} P_T(t)\, \bigl(p_{\text{MD}}^{i^*}(t) + p_{\text{FA}}^{i^*}(t)\bigr) &\text{(101)}\\
&\geq \sum_{t \in \mathcal{T}^*(\delta)} P_T(t)\, \bigl(p_{\text{MD}}^{i^*}(t) + p_{\text{FA}}^{i^*}(t)\bigr) &\text{(102)}\\
&\geq \sum_{t \in \mathcal{T}^*(\delta)} P_T(t) \cdot (\lambda_1 + \lambda_2) &\text{(103)}\\
&= P_T\bigl(\mathcal{T}^*(\delta)\bigr)\cdot(\lambda_1 + \lambda_2) &\text{(104)}
\end{aligned}$$
where (100) follows from the definitions in (13) and (14); in (101) we lower-bounded each maximum by its value at the IM $i^*$; and (103) follows from (93). Thus, any code of rate $R > \log|\mathcal{A}| - \{H(P_Z) - R_h\}^+$ with large enough $n$ must have $p_{\text{MD},\max} + p_{\text{FA},\max} \geq P_T\bigl(\mathcal{T}^*(\delta)\bigr)\cdot(\lambda_1 + \lambda_2)$, and the latter is bounded away from zero. This concludes the proof of the converse part.

Author Contributions

Writing—original draft preparation, A.L. and B.N.; writing—review and editing, A.L. and B.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors thank Christian Deppe and Johannes Rosenberger who read a preprint of this paper and provided them with helpful comments. They also thank the guest editor and the anonymous reviewers for their insightful comments.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MMANC  Memoryless modulo-additive noise channel
IID  Independent and identically distributed
IM  Identification message
LHS  Left-hand side
RHS  Right-hand side

Appendix A. Proof of Proposition 1

Proof. 
As in [4], the code construction entails time-sharing between two schemes: a “zero-rate help scheme” corresponding to help of zero rate, and a “high-rate help scheme” corresponding to help of a rate exceeding the noise entropy. In the former, the help comprises one bit, indicating whether or not the noise is typical. We denote this help $T^{(z)}$ and assume that it takes values in the set $\mathcal{H} = \{\tau, \alpha\}$, with $T^{(z)} = \tau$ indicating that the noise is typical and $T^{(z)} = \alpha$ that it is atypical (when the help is to the encoder, the helper additionally provides the encoder with the description of one noise sample in order to enable the encoder to convey $T^{(z)}$ to the decoder error free).
When the help is of high rate, we denote it $T^{(h)}$. It has two parts, which we denote $T^{(h)}_{\text{t/a}}$ and $T^{(h)}_{\text{d}}$, so $T^{(h)} = \bigl(T^{(h)}_{\text{t/a}}, T^{(h)}_{\text{d}}\bigr)$. The first part, $T^{(h)}_{\text{t/a}}$, indicates whether or not the noise is typical and hence takes values in $\mathcal{H}$. The second part, $T^{(h)}_{\text{d}}$, describes the noise (perfectly) when the latter is typical, and is null otherwise (as above, when the help is to the encoder, the helper additionally provides the encoder with the description of one noise sample in order to enable the encoder to convey $T^{(h)}_{\text{t/a}}$ to the decoder error free). The help in the time-sharing scheme, which we denote $T$, comprises the help in the zero-rate part and the help in the high-rate part:
$$T = \bigl(T^{(z)}, T^{(h)}\bigr). \tag{A1}$$
The duty cycle is chosen so that the rate of $T$ be $R_h$ (or the entropy of the noise, if the latter is smaller than $R_h$). We assume throughout that $R < \log|\mathcal{A}|$.
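As a rough sanity check on the duty cycle (our own back-of-the-envelope accounting, which ignores the one-bit typicality flags and edge effects): if a fraction $\theta$ of the channel uses are run with high-rate help, the overall description rate is about $\theta H(P_Z)$, so matching it to $R_h$ and adding up the rates of the two phases recovers (27):
$$\theta = \frac{R_h}{H(P_Z)}, \qquad \theta\log|\mathcal{A}| + (1-\theta)\bigl(\log|\mathcal{A}| - H(P_Z)\bigr) = \log|\mathcal{A}| - H(P_Z) + R_h.$$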
The transmission code derived in [4] has two salient properties:
  • In the high-rate scheme, conditional on $T^{(h)}_{\text{t/a}} = \tau$ (i.e., on the noise being typical, so that it can be perfectly described by $T^{(h)}_{\text{d}}$), no erasures are declared.
  • In the zero-rate scheme, conditional on $T^{(z)} = \tau$ (i.e., on the noise being typical), the maximal (over the messages) probability of erasure is upper bounded by some $\epsilon_n$ tending to zero.
(To guarantee the second property, the code is constructed—as in [4]—using random coding, and we then expurgate half the codewords to obtain a code whose maximal probability of erasure is smaller than $\epsilon_n$. The asserted property then follows by bounding, for each message, the conditional probability of erasure given that the noise is typical by the ratio of the unconditional probability of erasure to the probability that the noise is typical.)
We next analyze the time-sharing scheme. We focus on the case where $0 < R_h \leq H(P_Z)$. The remaining cases, where $R_h = 0$ or $R_h > H(P_Z)$, are very similar, except that they require no time sharing. We use the superscript $(h)$ for quantities occurring in the high-rate help phase, and the superscript $(z)$ for those in the zero-rate phase. For example, $m^{(h)}, X^{(h)}, Y^{(h)}, T^{(h)}$ are the message, input sequence, output sequence, and help in the high-rate help phase; and the set of output sequences causing an erasure in this phase is denoted $\mathcal{Y}_{\text{er}}^{(h)}$. The set of outputs causing an erasure in the time-sharing scheme is:
$$\mathcal{Y}_{\text{er}} = \bigl\{ y^n : y^{(h)} \in \mathcal{Y}_{\text{er}}^{(h)} \ \text{or}\ y^{(z)} \in \mathcal{Y}_{\text{er}}^{(z)} \bigr\}. \tag{A2}$$
For the time-sharing scheme we now have:
$$\begin{aligned}
\sum_{t \in \mathcal{T}} P_T(t) \max_{m \in \mathcal{M}} P_{Y^n|X^n,T}\bigl(\mathcal{Y}_{\text{er}}\big|f_m(t),t\bigr) &\leq \sum_{t \in \mathcal{T}} P_T(t) \max_{m \in \mathcal{M}} \Bigl[ P_{Y^{(h)}|X^{(h)},T^{(h)}}\bigl(\mathcal{Y}_{\text{er}}^{(h)}\big|f^{(h)}_{m^{(h)}}(t^{(h)}),t^{(h)}\bigr) + P_{Y^{(z)}|X^{(z)},T^{(z)}}\bigl(\mathcal{Y}_{\text{er}}^{(z)}\big|f^{(z)}_{m^{(z)}}(t^{(z)}),t^{(z)}\bigr) \Bigr] &\text{(A3)}\\
&\leq \sum_{t \in \mathcal{T}} P_T(t) \max_{m^{(h)} \in \mathcal{M}^{(h)}} P_{Y^{(h)}|X^{(h)},T^{(h)}}\bigl(\mathcal{Y}_{\text{er}}^{(h)}\big|f^{(h)}_{m^{(h)}}(t^{(h)}),t^{(h)}\bigr) + \sum_{t \in \mathcal{T}} P_T(t) \max_{m^{(z)} \in \mathcal{M}^{(z)}} P_{Y^{(z)}|X^{(z)},T^{(z)}}\bigl(\mathcal{Y}_{\text{er}}^{(z)}\big|f^{(z)}_{m^{(z)}}(t^{(z)}),t^{(z)}\bigr) &\text{(A4)}\\
&= \sum_{t^{(h)}} P_{T^{(h)}}(t^{(h)}) \max_{m^{(h)} \in \mathcal{M}^{(h)}} P_{Y^{(h)}|X^{(h)},T^{(h)}}\bigl(\mathcal{Y}_{\text{er}}^{(h)}\big|f^{(h)}_{m^{(h)}}(t^{(h)}),t^{(h)}\bigr) + \sum_{t^{(z)}} P_{T^{(z)}}(t^{(z)}) \max_{m^{(z)} \in \mathcal{M}^{(z)}} P_{Y^{(z)}|X^{(z)},T^{(z)}}\bigl(\mathcal{Y}_{\text{er}}^{(z)}\big|f^{(z)}_{m^{(z)}}(t^{(z)}),t^{(z)}\bigr) &\text{(A5)}\\
&\leq P\bigl(T^{(h)}_{\text{t/a}} = \alpha\bigr) + \Bigl[P\bigl(T^{(z)} = \alpha\bigr)\cdot 1 + P\bigl(T^{(z)} = \tau\bigr)\cdot \epsilon_n\Bigr] &\text{(A6)}\\
&\leq P\bigl(T^{(h)}_{\text{t/a}} = \alpha\bigr) + P\bigl(T^{(z)} = \alpha\bigr) + \epsilon_n &\text{(A7)}
\end{aligned}$$
which establishes the proposition, because the RHS tends to zero. Here (A3) follows from (A2) and the union-of-events bound; (A4) holds (in this case with equality) because the maximum of a sum is upper bounded by the sum of the maxima; and (A6) holds by the aforementioned salient properties of the code construction. □

References

  1. Kim, Y.H. Capacity of a class of deterministic relay channels. IEEE Trans. Inf. Theory 2008, 54, 1328–1329.
  2. Bross, S.I.; Lapidoth, A.; Marti, G. Decoder-assisted communications over additive noise channels. IEEE Trans. Commun. 2020, 68, 4150–4161.
  3. Lapidoth, A.; Marti, G. Encoder-assisted communications over additive noise channels. IEEE Trans. Inf. Theory 2020, 66, 6607–6616.
  4. Lapidoth, A.; Marti, G.; Yan, Y. Other helper capacities. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Virtual, 12–20 July 2021; pp. 1272–1277.
  5. Lapidoth, A.; Yan, Y. The listsize capacity of the Gaussian channel with decoder assistance. Entropy 2022, 24, 29.
  6. Merhav, N. On error exponents of encoder-assisted communication systems. IEEE Trans. Inf. Theory 2021, 67, 7019–7029.
  7. Ahlswede, R.; Dueck, G. Identification via channels. IEEE Trans. Inf. Theory 1989, 35, 15–29.
  8. Ahlswede, R.; Cai, N.; Zhang, Z. Erasure, list, and detection zero-error capacities for low noise and a relation to identification. IEEE Trans. Inf. Theory 1996, 42, 55–62.
  9. Steinberg, Y.; Merhav, N. Identification in the presence of side information with application to watermarking. IEEE Trans. Inf. Theory 2001, 47, 1410–1422.
  10. Ahlswede, R.; Dueck, G. Identification in the presence of feedback—A discovery of new capacity formulas. IEEE Trans. Inf. Theory 1989, 35, 30–36.
  11. Wiese, M.; Labidi, W.; Deppe, C.; Boche, H. Identification over additive noise channels in the presence of feedback. IEEE Trans. Inf. Theory 2022, 1.
  12. Watanabe, S. Minimax converse for identification via channels. IEEE Trans. Inf. Theory 2022, 68, 25–34.
  13. Han, T.; Verdú, S. New results in the theory of identification via channels. IEEE Trans. Inf. Theory 1992, 38, 14–25.
  14. Bracher, A.; Lapidoth, A. Identification via the broadcast channel. IEEE Trans. Inf. Theory 2017, 63, 3480–3501.
  15. Steinberg, Y. New converses in the theory of identification via channels. IEEE Trans. Inf. Theory 1998, 44, 984–998.
  16. Polyanskiy, Y.; Poor, H.V.; Verdú, S. Channel coding rate in the finite blocklength regime. IEEE Trans. Inf. Theory 2010, 56, 2307–2359.
  17. Rosenberger, J.; Ibrahim, A.; Bash, B.A.; Deppe, C.; Ferrara, R.; Pereg, U. Capacity bounds for identification with effective secrecy. arXiv 2023, arXiv:2306.14792.
  18. Lapidoth, A. A Foundation in Digital Communication, 2nd ed.; Cambridge University Press: Cambridge, UK, 2017.
  19. Aishwarya, G.; Madiman, M. Remarks on Rényi versions of conditional entropy and mutual information. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 1117–1121.