Article

Joint Identification and Sensing for Discrete Memoryless Channels

1 Chair of Theoretical Information Technology, Technical University of Munich, 80333 Munich, Germany
2 Institute for Communications Technology, Technische Universität Braunschweig, 38106 Brunswick, Germany
* Author to whom correspondence should be addressed.
Current address: BMBF Research Hub 6G-Life, 80333 Munich, Germany.
Current address: Cyber Security in the Age of Large-Scale Adversaries–Exzellenzcluster, Ruhr-Universität Bochum, 44780 Bochum, Germany.
§ Current address: Munich Center for Quantum Science and Technology (MCQST), 80799 Munich, Germany.
Current address: Munich Quantum Valley (MQV), 80799 München, Germany.
Entropy 2025, 27(1), 12; https://doi.org/10.3390/e27010012
Submission received: 19 November 2024 / Revised: 23 December 2024 / Accepted: 24 December 2024 / Published: 27 December 2024
(This article belongs to the Special Issue Integrated Sensing and Communications)

Abstract

In the identification (ID) scheme proposed by Ahlswede and Dueck, the receiver’s goal is simply to verify whether a specific message of interest was sent. Unlike Shannon’s transmission codes, which aim for message decoding, ID codes for a discrete memoryless channel (DMC) are far more efficient; their size grows doubly exponentially with the blocklength when randomized encoding is used. This indicates that when the receiver’s objective does not require decoding, the ID paradigm is significantly more efficient than traditional Shannon transmission in terms of both energy consumption and hardware complexity. Further benefits of ID schemes can be realized by leveraging additional resources such as feedback. In this work, we address the problem of joint ID and channel state estimation over a DMC with independent and identically distributed (i.i.d.) state sequences. State estimation functions as the sensing mechanism of the model. Specifically, the sender transmits an ID message over the DMC while simultaneously estimating the channel state through strictly causal observations of the channel output. Importantly, the random channel state is unknown to both the sender and the receiver. For this system model, we present a complete characterization of the ID capacity–distortion function.

1. Introduction

The identification (ID) scheme suggested by Ahlswede and Dueck [1] in 1989 is conceptually different from the classical message transmission scheme proposed by Shannon [2]. In classical message transmission, the encoder transmits a message over a noisy channel, and the decoder at the receiver side aims to output an estimate of this message based on the channel observation. In the ID paradigm, however, the encoder sends an ID message (also called the identity) over a noisy channel, and the decoder aims to check whether or not a specific ID message of special interest to the receiver has been sent. Obviously, the sender has no prior knowledge of the specific ID message in which the receiver is interested. Ahlswede and Dueck demonstrated in [1] that if randomized encoding is used, the size of ID codes for discrete memoryless channels (DMCs) grows doubly exponentially fast with the blocklength. If only deterministic encoding is allowed, the number of identities that can be identified over a DMC scales only exponentially with the blocklength. Nevertheless, the resulting rate is still larger than the transmission rate on the exponential scale, as shown in [3,4].
New applications in modern communications, including machine-to-machine and human-to-machine systems, digital watermarking [5,6,7], Industry 4.0 [8], and 6G communication systems [9,10], impose demanding reliability and latency requirements. These requirements are crucial for achieving trustworthiness [11]. For this purpose, the necessary latency, resilience, and data security requirements must be embedded in the physical domain. In this situation, classical Shannon message transmission is limited, and an ID scheme can achieve better scaling behavior in terms of the necessary energy and hardware components. It has been proved that information-theoretic security can be integrated into the ID scheme without paying an extra price for secrecy [12,13]. Further gains within the ID paradigm can be achieved by taking advantage of additional resources such as quantum entanglement, common randomness (CR), and feedback. In contrast to classical Shannon message transmission, feedback can increase the ID capacity of a DMC [14]. Furthermore, it has been shown in [15] that the ID capacity of Gaussian channels with noiseless feedback is infinite. This holds for both rate definitions, $\frac{1}{n}\log M$ (as defined by Shannon for classical transmission) and $\frac{1}{n}\log\log M$ (as defined by Ahlswede and Dueck for ID over DMCs). Interestingly, the authors of [15] showed that the ID capacity with noiseless feedback remains infinite regardless of the scaling used for the rate, e.g., double exponential, triple exponential, etc. In addition, the resource CR allows for a considerable increase in the ID capacity of channels [6,16,17]. The aforementioned communication scenarios emphasize that the ID capacity exhibits completely different behavior from Shannon's capacity.
A key technology within 6G communication systems is the joint design of radio communication and sensor technology [11], which enables revolutionary end-user applications [18]. Joint communication and radar/radio sensing (JCAS) means that sensing and communication are jointly designed while sharing the same bandwidth. Sensing and communication systems are usually designed separately, with resources dedicated to either sensing or data communication. The joint sensing and communication approach can overcome the limitations of such separation-based designs. Recent works [19,20,21,22] have explored JCAS and shown that this approach can improve spectrum efficiency while minimizing hardware costs. For instance, the fundamental limits of joint sensing and communication for a point-to-point channel were studied in [23], where the transmitter wishes to simultaneously send a message to the receiver and sense the channel state through a strictly causal feedback link. Motivated by the drastic effect of feedback on the ID capacity [15], this work investigates joint ID and sensing. To the best of our knowledge, the problem of joint ID and sensing has not been treated in the literature yet. We study the problem of joint ID and channel state estimation over a DMC with i.i.d. state sequences. The sender simultaneously sends an ID message over the DMC with random state and estimates the channel state via strictly causal observations of the channel output. The random channel state is available to neither the sender nor the receiver. We consider the ID capacity–distortion tradeoff as a performance metric. This metric is analogous to the one studied in [24] and is defined as the supremum of all achievable ID rates such that a given distortion constraint on state sensing is fulfilled. This model is motivated by the problem of adaptive and sequential optimization of the beamforming vectors during the initial access phase of communication [25]. We establish a complete characterization of the ID capacity–distortion tradeoff. In addition, we show that in our communication setup, sensing can be viewed as an additional resource that increases the ID capacity.
Outline: The remainder of this paper is organized as follows. Section 2 introduces the system models, reviews key definitions related to identification (ID), and presents the main results, including a complete characterization of the ID capacity–distortion function. Section 3 provides detailed proofs of these main results. In Section 4, we explore an alternative, more flexible distortion constraint, namely the average distortion, and establish a lower bound on the corresponding ID capacity–distortion function. Finally, Section 5 concludes the paper with a discussion of the results and potential directions for future research.
Notation: The distribution of an RV $X$ is denoted by $P_X$; for a finite set $\mathcal{X}$, we denote the set of probability distributions on $\mathcal{X}$ by $\mathcal{P}(\mathcal{X})$ and the cardinality of $\mathcal{X}$ by $|\mathcal{X}|$. If $X$ is an RV with distribution $P_X$, we denote the Shannon entropy of $X$ by $H(P_X)$, the expectation of $X$ by $\mathbb{E}(X)$, and the variance of $X$ by $\mathrm{Var}[X]$. If $X$ and $Y$ are two RVs with probability distributions $P_X$ and $P_Y$, then the mutual information between $X$ and $Y$ is denoted by $I(X;Y)$. Finally, $\mathcal{X}^c$ denotes the complement of $\mathcal{X}$, $\mathcal{X} \setminus \mathcal{Y}$ denotes the set difference, and all logarithms and information quantities are taken to base 2.

2. System Models and Main Results

Consider a discrete memoryless channel with random state $(\mathcal{X} \times \mathcal{S}, W_S(y|x,s), \mathcal{Y})$ consisting of a finite input alphabet $\mathcal{X}$, a finite output alphabet $\mathcal{Y}$, a finite state set $\mathcal{S}$, and a pmf $W_S(y|x,s)$ on $\mathcal{Y}$. The channel is memoryless, i.e., if the input sequence $x^n \in \mathcal{X}^n$ is sent and the state sequence is $s^n \in \mathcal{S}^n$, then the probability of receiving the sequence $y^n \in \mathcal{Y}^n$ is given by
$$W_S^n(y^n|x^n,s^n) = \prod_{i=1}^{n} W_S(y_i|x_i,s_i).$$
The state sequence $(S_1, S_2, \ldots, S_n)$ is i.i.d. according to the distribution $P_S$. We assume that the input $X_i$ and the state $S_i$ are statistically independent for all $i \in \{1,2,\ldots,n\}$. In the setup depicted in Figure 1, we assume that the channel state is known to neither the sender nor the receiver.
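To make the channel model concrete, the following minimal Python sketch simulates $n$ uses of a state-dependent DMC with an i.i.d. state sequence that is hidden from both terminals. The binary alphabets and the particular transition probabilities are illustrative assumptions, not values taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative alphabets (assumption): X = Y = S = {0, 1}.
P_S = np.array([0.7, 0.3])                     # state distribution P_S
# W_S[s, x, y] = W_S(y | x, s): the crossover behavior depends on the hidden state s.
W_S = np.array([[[0.9, 0.1],                   # s = 0, x = 0
                 [0.2, 0.8]],                  # s = 0, x = 1
                [[0.6, 0.4],                   # s = 1, x = 0
                 [0.4, 0.6]]])                 # s = 1, x = 1

def transmit(x_seq):
    """Send x^n over the memoryless channel W_S^n with an i.i.d. state sequence S^n ~ P_S^n."""
    s_seq = rng.choice(len(P_S), size=len(x_seq), p=P_S)
    y_seq = np.array([rng.choice(W_S.shape[2], p=W_S[s, x]) for s, x in zip(s_seq, x_seq)])
    return y_seq, s_seq   # s_seq is returned only for inspection; it is unknown to both terminals

y_n, s_n = transmit(np.array([0, 1, 1, 0, 1]))
```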
In the sequel, we distinguish three scenarios:
  • Randomized ID over the state-dependent channel $W_S$, as depicted in Figure 1;
  • Deterministic or randomized ID over the state-dependent channel $W_S$ in the presence of noiseless feedback between the sender and the receiver, as depicted in Figure 2;
  • Joint deterministic or randomized ID and sensing, in which the sender wishes to simultaneously send an identity to the receiver and sense the channel state sequence based on the output of the noiseless feedback link, as depicted in Figure 3.
First, we define randomized ID codes for the state-dependent channel defined above.
Definition 1. 
An $(n, N, \lambda_1, \lambda_2)$ randomized ID code with $\lambda_1 + \lambda_2 < 1$ for the channel $W_S$ is a family of pairs $\{(Q(\cdot|i), \mathcal{D}_i(s^n))_{s^n \in \mathcal{S}^n},\ i=1,\ldots,N\}$ with
$$Q(\cdot|i) \in \mathcal{P}(\mathcal{X}^n), \qquad \mathcal{D}_i(s^n) \subset \mathcal{Y}^n, \qquad \forall s^n \in \mathcal{S}^n,\ i=1,\ldots,N,$$
such that the errors of the first kind and the second kind are bounded as follows:
$$\sum_{s^n \in \mathcal{S}^n} P_S^n(s^n) \sum_{x^n \in \mathcal{X}^n} Q(x^n|i)\, W_S^n\big(\mathcal{D}_i(s^n)^c \,\big|\, x^n, s^n\big) \le \lambda_1, \quad \forall i,$$
$$\sum_{s^n \in \mathcal{S}^n} P_S^n(s^n) \sum_{x^n \in \mathcal{X}^n} Q(x^n|i)\, W_S^n\big(\mathcal{D}_j(s^n) \,\big|\, x^n, s^n\big) \le \lambda_2, \quad \forall i \ne j.$$
In the following, we define the achievable ID rate and ID capacity for our system model.
Definition 2. 
1. 
The rate $R$ of a randomized $(n, N, \lambda_1, \lambda_2)$ ID code for the channel $W_S$ is $R = \frac{\log\log N}{n}$ bits.
2. 
The ID rate $R$ for $W_S$ is said to be achievable if for every $\lambda \in (0, \frac{1}{2})$ there exists an $n_0(\lambda)$ such that for all $n \ge n_0(\lambda)$ there exists an $(n, 2^{2^{nR}}, \lambda, \lambda)$ randomized ID code for $W_S$.
3. 
The randomized ID capacity $C_{ID}(W_S)$ of the channel $W_S$ is the supremum of all achievable rates.
The following theorem characterizes the randomized ID capacity of the state-dependent channel $W_S$ when the state information is known to neither the sender nor the receiver.
Theorem 1. 
The randomized ID capacity of the channel $W_S$ is given by
$$C_{ID}(W_S) = C(W_S) = \max_{P_X \in \mathcal{P}(\mathcal{X})} I(X;Y),$$
where $C(W_S)$ denotes the Shannon transmission capacity of $W_S$.
Proof. 
The proof of Theorem 1 follows from Theorem 6.6.4 of [26] and Equation (7.2) of [27]. Because the channel $W_S$ satisfies the strong converse property [26], the randomized ID capacity of $W_S$ coincides with its Shannon transmission capacity determined in [27]. □
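Since the state is unknown to both terminals, the mutual information in Theorem 1 is evaluated for the averaged channel introduced in Section 3.1, i.e., $\sum_s P_S(s) W_S(\cdot|\cdot,s)$. As a numerical illustration, the following sketch computes this quantity with the Blahut–Arimoto algorithm. The arrays `W_S` and `P_S` are assumed to be defined as in the simulation sketch above, and the helper names are our own.

```python
import numpy as np

def averaged_channel(W_S, P_S):
    """W_avg(y|x) = sum_s P_S(s) W_S(y|x,s); W_S is indexed as [s, x, y]."""
    return np.einsum('s,sxy->xy', P_S, W_S)

def blahut_arimoto(W, iters=1000):
    """Shannon capacity max_{P_X} I(X;Y) in bits for a DMC with matrix W[x, y]."""
    nx, _ = W.shape
    p = np.full(nx, 1.0 / nx)                        # start from the uniform input distribution
    logW = np.log2(np.where(W > 0, W, 1.0))          # log W, with 0 entries mapped to 0
    for _ in range(iters):
        q = p @ W                                    # induced output distribution
        logq = np.log2(np.where(q > 0, q, 1.0))
        d = np.sum(W * (logW - logq), axis=1)        # D(W(.|x) || q) for each input x
        p = p * np.exp2(d)                           # standard Blahut-Arimoto update
        p /= p.sum()
    q = p @ W
    logq = np.log2(np.where(q > 0, q, 1.0))
    return float(p @ np.sum(W * (logW - logq), axis=1))

# With W_S and P_S as in the earlier sketch (assumptions):
# C = blahut_arimoto(averaged_channel(W_S, P_S))   # = C_ID(W_S) by Theorem 1
```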
Now, we consider the second scenario depicted in Figure 2. Let $\bar{Y}^n = (\bar{Y}_1, \ldots, \bar{Y}_n) \in \mathcal{Y}^n$ denote the output of the noiseless backward (feedback) channel:
$$\bar{Y}_t = Y_t, \quad t \in \{1,\ldots,n\}.$$
In the following, we define deterministic and randomized ID feedback codes for the state-dependent channel $W_S$.
Definition 3. 
An $(n, N, \lambda_1, \lambda_2)$ deterministic ID feedback code $\{(f_i, \mathcal{D}_i(s^n))_{s^n \in \mathcal{S}^n},\ i=1,\ldots,N\}$ with $\lambda_1 + \lambda_2 < 1$ for the channel $W_S$ is characterized as follows. The sender wants to send an ID message $i \in \mathcal{N} := \{1,\ldots,N\}$ that is encoded by the vector-valued function
$$f_i = [f_i^1, f_i^2, \ldots, f_i^n],$$
where $f_i^1 \in \mathcal{X}$ and $f_i^t: \mathcal{Y}^{t-1} \to \mathcal{X}$ for $t \in \{2,\ldots,n\}$. At $t=1$, the sender sends $f_i^1$. At $t \in \{2,\ldots,n\}$, the sender sends $f_i^t(Y_1,\ldots,Y_{t-1})$. The decoding sets $\mathcal{D}_i(s^n) \subset \mathcal{Y}^n$, $i \in \{1,\ldots,N\}$, $s^n \in \mathcal{S}^n$, should satisfy the following inequalities:
$$\sum_{s^n \in \mathcal{S}^n} P_S^n(s^n)\, W_S^n\big(\mathcal{D}_i(s^n)^c \,\big|\, f_i, s^n\big) \le \lambda_1, \quad \forall i,$$
$$\sum_{s^n \in \mathcal{S}^n} P_S^n(s^n)\, W_S^n\big(\mathcal{D}_j(s^n) \,\big|\, f_i, s^n\big) \le \lambda_2, \quad \forall i \ne j.$$
Definition 4. 
An $(n, N, \lambda_1, \lambda_2)$ randomized ID feedback code
$$\{(Q_F(\cdot|i), \mathcal{D}_i(s^n))_{s^n \in \mathcal{S}^n},\ i=1,\ldots,N\}$$
with $\lambda_1 + \lambda_2 < 1$ for the channel $W_S$ is characterized as follows. The sender wants to send an ID message $i \in \mathcal{N} := \{1,\ldots,N\}$ that is encoded by the probability distribution
$$Q_F(\cdot|i) \in \mathcal{P}(\mathcal{F}^n),$$
where $\mathcal{F}^n$ denotes the set of all $n$-length vector-valued feedback encoding functions $f$ as in Definition 3. The decoding sets $\mathcal{D}_i(s^n) \subset \mathcal{Y}^n$, $i \in \{1,\ldots,N\}$, $s^n \in \mathcal{S}^n$, should satisfy the following inequalities:
$$\sum_{s^n \in \mathcal{S}^n} P_S^n(s^n) \sum_{f \in \mathcal{F}^n} Q_F(f|i)\, W_S^n\big(\mathcal{D}_i(s^n)^c \,\big|\, f, s^n\big) \le \lambda_1, \quad \forall i,$$
$$\sum_{s^n \in \mathcal{S}^n} P_S^n(s^n) \sum_{f \in \mathcal{F}^n} Q_F(f|i)\, W_S^n\big(\mathcal{D}_j(s^n) \,\big|\, f, s^n\big) \le \lambda_2, \quad \forall i \ne j.$$
Definition 5. 
1. 
The rate $R$ of a (deterministic/randomized) $(n, N, \lambda_1, \lambda_2)$ ID feedback code for the channel $W_S$ is $R = \frac{\log\log N}{n}$ bits.
2. 
The (deterministic/randomized) ID feedback rate $R$ for $W_S$ is said to be achievable if for every $\lambda \in (0,\frac{1}{2})$ there exists an $n_0(\lambda)$ such that for all $n \ge n_0(\lambda)$ there exists a (deterministic/randomized) $(n, 2^{2^{nR}}, \lambda, \lambda)$ ID feedback code for $W_S$.
3. 
The (deterministic/randomized) ID feedback capacity $C_{ID}^{fd}(W_S)$ / $C_{ID}^{fr}(W_S)$ of the channel $W_S$ is the supremum of all achievable rates.
It has been demonstrated in [26] that noise increases the ID capacity of a DMC in the case of feedback. Intuitively, noise can be considered a source of randomness, i.e., a random experiment whose outcome is provided to the sender and receiver via the feedback channel. Thus, adding a perfect feedback link enables the realization of a correlated random experiment between the sender and the receiver. The size of this random experiment determines the growth of the ID rate. This result has been further emphasized in [15,28], where it has been shown that the ID capacity of the Gaussian channel with noiseless feedback is infinite; the authors of [15,28] provided a coding scheme that generates infinite common randomness between the sender and the receiver. Here, we investigate the effect of feedback on the ID capacity of the system model depicted in Figure 2. Theorem 2 characterizes the deterministic ID feedback capacity of the state-dependent channel $W_S$ with noiseless feedback; the proofs of Theorems 2 and 3 are provided in Section 3.
Theorem 2. 
If $C(W_S) > 0$, then the deterministic ID feedback capacity of $W_S$ is given by
$$C_{ID}^{fd}(W_S) = \max_{x \in \mathcal{X}} H\big(\mathbb{E}\big[W_S(\cdot|x,S)\big]\big).$$
Theorem 3. 
If $C(W_S) > 0$, then the randomized ID feedback capacity of $W_S$ is given by
$$C_{ID}^{fr}(W_S) = \max_{P \in \mathcal{P}(\mathcal{X})} H\Big(\sum_{x \in \mathcal{X}} P(x)\, \mathbb{E}\big[W_S(\cdot|x,S)\big]\Big).$$
Remark 1. 
It can be shown that the same ID feedback capacity formula holds if the channel state is known to either the sender or the receiver. This is because we achieve the same amount of common randomness as in the scenario depicted in Figure 2. Intuitively, the channel state is an additional source of randomness that we can take advantage of.
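The formulas in Theorems 2 and 3 only involve the averaged channel $\mathbb{E}_S[W_S(\cdot|x,S)]$, so they are easy to evaluate numerically. The sketch below does so under the assumptions of the earlier examples: the deterministic capacity is a maximum of row entropies, and the randomized capacity is a concave maximization over input distributions, handled here by a generic solver. The function names are ours, and `W_avg` is assumed to come from `averaged_channel(W_S, P_S)` above.

```python
import numpy as np
from scipy.optimize import minimize

def entropy_bits(q):
    """Shannon entropy H(q) in bits."""
    q = q[q > 0]
    return float(-np.sum(q * np.log2(q)))

def det_id_feedback_capacity(W_avg):
    """Theorem 2: C_ID^fd = max_x H(E_S[W_S(.|x,S)]) = max_x H(W_avg(.|x))."""
    return max(entropy_bits(row) for row in W_avg)

def rand_id_feedback_capacity(W_avg):
    """Theorem 3: C_ID^fr = max_P H(sum_x P(x) W_avg(.|x)), a concave maximization over P."""
    nx = W_avg.shape[0]
    res = minimize(lambda p: -entropy_bits(p @ W_avg),
                   np.full(nx, 1.0 / nx),
                   bounds=[(0.0, 1.0)] * nx,
                   constraints=[{'type': 'eq', 'fun': lambda p: p.sum() - 1.0}])
    return float(-res.fun)

# With W_avg = averaged_channel(W_S, P_S) from the earlier sketches (assumptions):
# print(det_id_feedback_capacity(W_avg), rand_id_feedback_capacity(W_avg))
```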
Now, we consider the third scenario depicted in Figure 3, where we want to jointly identify and sense the channel state. The sender comprises an encoder that sends a symbol $x_t = f_i^t(y^{t-1})$ for each identity $i \in \{1,\ldots,N\}$ and delayed feedback output $y^{t-1} \in \mathcal{Y}^{t-1}$, along with a state estimator that outputs an estimated sequence $\hat{s}^n \in \mathcal{S}^n$ based on the feedback output and the input sequence. We define the per-symbol distortion as follows:
$$d_t = \mathbb{E}\big[d(S_t, \hat{S}_t)\big],$$
where $d: \mathcal{S} \times \mathcal{S} \to [0, +\infty)$ is a distortion function and the expectation is taken over the joint distribution of $(S_t, \hat{S}_t)$ conditioned on the ID message $i \in \{1,\ldots,N\}$.
Definition 6. 
1. 
An ID rate–distortion pair $(R, D)$ for $W_S$ is said to be achievable if for every $\lambda \in (0,\frac{1}{2})$ there exists an $n_0(\lambda)$ such that for all $n \ge n_0(\lambda)$ there exists an $(n, 2^{2^{nR}}, \lambda, \lambda)$ (deterministic/randomized) ID code for $W_S$ with $d_t \le D$ for all $t = 1,\ldots,n$.
2. 
The deterministic ID capacity–distortion function $C_{ID}^d(D)$ is defined as the supremum of all rates $R$ such that $(R, D)$ is achievable.
Without loss of generality, we choose the following deterministic estimation function $h^*$:
$$\hat{s} = h^*(x,y) = \arg\min_{h:\, \mathcal{X}\times\mathcal{Y} \to \mathcal{S}} \mathbb{E}\big[d(S, h(X,Y)) \,\big|\, X=x, Y=y\big],$$
where $h: \mathcal{X}\times\mathcal{Y} \to \mathcal{S}$ is an estimator that maps a channel input–feedback output pair to a channel state. If several functions $h(\cdot,\cdot)$ achieve the minimum, we choose one of them arbitrarily. We define the minimal distortion function for each input symbol $x \in \mathcal{X}$ as in [29]:
$$d^*(x) = \mathbb{E}_{SY}\big[d(S, h^*(X,Y)) \,\big|\, X=x\big],$$
and the minimal distortion function for each input distribution $P \in \mathcal{P}(\mathcal{X})$ as
$$d^*(P) = \sum_{x \in \mathcal{X}} P(x)\, d^*(x).$$
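For finite alphabets, the estimator $h^*$ and the minimal distortions $d^*(x)$ and $d^*(P)$ can be computed directly from $P_S$, $W_S$, and a distortion matrix. The following sketch does this; the Hamming distortion used in the commented example is an illustrative assumption, and the helper names are ours.

```python
import numpy as np

def optimal_estimator(W_S, P_S, d):
    """Compute h*(x,y) = argmin_{s_hat} E[d(S, s_hat) | X=x, Y=y] and d*(x).

    W_S[s, x, y] = W_S(y|x,s), P_S[s] = P_S(s), d[s, s_hat] = distortion values.
    """
    # Unnormalized posterior weight of s given (x, y): P_S(s) * W_S(y|x,s).
    joint = P_S[:, None, None] * W_S                      # shape (s, x, y)
    # Expected distortion of guessing s_hat at each (x, y): sum_s joint[s,x,y] * d[s, s_hat].
    exp_d = np.einsum('sxy,st->xyt', joint, d)            # shape (x, y, s_hat)
    h_star = exp_d.argmin(axis=2)                         # the estimator h*(x, y)
    # d*(x) = sum_y min_{s_hat} sum_s P_S(s) W_S(y|x,s) d(s, s_hat)
    d_star_x = exp_d.min(axis=2).sum(axis=1)
    return h_star, d_star_x

def d_star_of_P(P, d_star_x):
    """d*(P) = sum_x P(x) d*(x)."""
    return float(P @ d_star_x)

# Example with Hamming distortion (assumption): d(s, s_hat) = 1{s != s_hat}.
# h_star, d_star_x = optimal_estimator(W_S, P_S, 1.0 - np.eye(len(P_S)))
```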
In the following, we establish the ID capacity–distortion function defined above.
Theorem 4. 
The deterministic ID capacity–distortion function of the state-dependent channel $W_S$ depicted in Figure 3 is given by
$$C_{ID}^d(D) = \max_{x \in \mathcal{X}_D} H\big(\mathbb{E}\big[W_S(\cdot|x,S)\big]\big),$$
where the set $\mathcal{X}_D$ is given by
$$\mathcal{X}_D = \{x \in \mathcal{X} : d^*(x) \le D\}.$$
We now turn our attention to a randomized encoder. In the following, we derive the ID capacity–distortion function of the state-dependent channel $W_S$ under the assumption of randomized encoding.
Theorem 5. 
The randomized ID capacity–distortion function of the state-dependent channel $W_S$ is given by
$$C_{ID}^r(D) = \max_{P \in \mathcal{P}_D} H\Big(\sum_{x \in \mathcal{X}} P(x)\, \mathbb{E}\big[W_S(\cdot|x,S)\big]\Big),$$
where the set $\mathcal{P}_D$ is given by
$$\mathcal{P}_D = \{P \in \mathcal{P}(\mathcal{X}) : d^*(P) \le D\}.$$
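Given $d^*(x)$ from the previous sketch, Theorems 4 and 5 reduce to maximizing the output entropy of the averaged channel over the restricted sets $\mathcal{X}_D$ and $\mathcal{P}_D$. The sketch below evaluates both; it assumes `W_avg` and `d_star_x` have been computed as in the earlier examples and uses a generic constrained solver for the randomized case.

```python
import numpy as np
from scipy.optimize import minimize

def entropy_bits(q):
    q = q[q > 0]
    return float(-np.sum(q * np.log2(q)))

def det_capacity_distortion(W_avg, d_star_x, D):
    """Theorem 4: max_{x in X_D} H(W_avg(.|x)) with X_D = {x : d*(x) <= D}."""
    feasible = [x for x in range(W_avg.shape[0]) if d_star_x[x] <= D]
    if not feasible:
        return None                                  # X_D is empty: no achievable pair (R, D)
    return max(entropy_bits(W_avg[x]) for x in feasible)

def rand_capacity_distortion(W_avg, d_star_x, D):
    """Theorem 5: max_{P in P_D} H(sum_x P(x) W_avg(.|x)) with P_D = {P : d*(P) <= D}."""
    nx = W_avg.shape[0]
    if d_star_x.min() > D:
        return None                                  # P_D is empty
    p0 = np.eye(nx)[int(np.argmin(d_star_x))]        # feasible starting point
    res = minimize(lambda p: -entropy_bits(p @ W_avg), p0,
                   bounds=[(0.0, 1.0)] * nx,
                   constraints=[{'type': 'eq', 'fun': lambda p: p.sum() - 1.0},
                                {'type': 'ineq', 'fun': lambda p: D - p @ d_star_x}])
    return float(-res.fun)

# With W_avg and d_star_x from the earlier sketches (assumptions):
# print(det_capacity_distortion(W_avg, d_star_x, D=0.2),
#       rand_capacity_distortion(W_avg, d_star_x, D=0.2))
```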
Remark 2. 
Randomized encoding achieves higher rates than deterministic encoding. This is because we are combining two sources of randomness: local randomness used for the encoding, and shared randomness generated via the noiseless feedback link. The result is analogous to randomized ID over DMCs in the presence of noiseless feedback, as studied in [14].

3. Proof of the Main Results

In this section, we provide the proofs of Theorems 2–5.

3.1. Direct Proof of Theorem 2

Proof. 
We consider the averaged channel $W_{S,\mathrm{avg}}$ given by
$$W_{S,\mathrm{avg}}(y|x) = \sum_{s \in \mathcal{S}} P_S(s)\, W_S(y|x,s), \quad \forall x \in \mathcal{X},\ y \in \mathcal{Y}.$$
The DMC $W_{S,\mathrm{avg}}$ is obtained by averaging the channels $W_S(\cdot|\cdot,s)$ over the state. It now suffices to show that $R = \max_{x \in \mathcal{X}} H\big(\mathbb{E}[W_S(\cdot|x,S)]\big)$ is an achievable ID feedback rate for the averaged channel $W_{S,\mathrm{avg}}$. The deterministic ID feedback capacity $C_{ID}^{fd}(W_{S,\mathrm{avg}})$ of the averaged channel can be determined by applying Theorem 1 of [28] to $W_{S,\mathrm{avg}}$. If the transmission capacity $C(W_{S,\mathrm{avg}})$ of $W_{S,\mathrm{avg}}$ is positive, we have
$$C_{ID}^{fd}(W_{S,\mathrm{avg}}) \ge \max_{x \in \mathcal{X}} H\big(W_{S,\mathrm{avg}}(\cdot|x)\big) = \max_{x \in \mathcal{X}} H\big(\mathbb{E}[W_S(\cdot|x,S)]\big).$$
This completes the direct proof of Theorem 2. □

3.2. Converse Proof of Theorem 2

Proof. 
For the converse proof, we use the techniques of [14] for deterministic ID over DMCs with noiseless feedback. We first extend Lemma 3 of [14] (an image-size lemma for deterministic feedback strategies) to the deterministic ID feedback code for $W_S$ described in Definition 3. □
Lemma 1. 
For any feedback strategy $f = [f^1, f^2, \ldots, f^n]$ and any $\mu \in (0,1)$, we have
$$\min_{\substack{\mathcal{E}_1 \subset \mathcal{Y}^n:\\ \mathbb{E}_{S^n}[W_S^n(\mathcal{E}_1|f,S^n)] \ge 1-\mu}} |\mathcal{E}_1| \le K_1,$$
where $K_1$ is given by
$$K_1 = 2^{\,n \max_{x \in \mathcal{X}} H\left(\mathbb{E}\left[W_S(\cdot|x,S)\right]\right) + \alpha n} = 2^{\,n H\left(\mathbb{E}\left[W_S(\cdot|x^*,S)\right]\right) + \alpha n},$$
with $x^*$ a maximizing input symbol and
$$\alpha = \sqrt{\frac{\beta}{\mu n}}, \qquad \beta = \max\big\{\log^2 3,\ \log^2 |\mathcal{Y}|\big\}.$$
Proof. 
We use a similar idea as in the proof of Lemma 3 of [14]. Let $\mathcal{E}_1 \subset \mathcal{Y}^n$ be defined as follows:
$$\mathcal{E}_1 = \big\{ y^n \in \mathcal{Y}^n :\ \log \mathbb{E}_{S^n}\big[W_S^n(y^n|f,S^n)\big] \ge -\log K_1 \big\}.$$
It then follows from the definition of $\mathcal{E}_1$ that $|\mathcal{E}_1| \le K_1$. It remains to show that
$$\mathbb{E}_{S^n}\big[W_S^n(\mathcal{E}_1|f,S^n)\big] \ge 1-\mu.$$
For $t = 1,2,\ldots,n$ and a fixed feedback strategy $f = [f^1, \ldots, f^n]$, we have
$$\Pr\{Y^t = y^t\} = \sum_{s^t \in \mathcal{S}^t} P_S^t(s^t)\, W_S^t(y^t|f, s^t) = \mathbb{E}_{S^t}\big[W_S^t(y^t|f,S^t)\big], \quad y^t \in \mathcal{Y}^t.$$
Let the RV $Z_t$ be defined as follows:
$$Z_t = \log \mathbb{E}_{S_t}\big[W_S(Y_t \,|\, f^t(Y^{t-1}), S_t)\big].$$
We have
$$\begin{aligned}
\mathbb{E}_{S^n}\big[W_S^n(\mathcal{E}_1|f,S^n)\big]
&= \Pr\Big\{ \log \mathbb{E}_{S^n}\big[W_S^n(Y^n|f,S^n)\big] \ge -\log K_1 \Big\}\\
&= \Pr\Big\{ \log \sum_{s^n \in \mathcal{S}^n} P_S^n(s^n) \prod_{t=1}^n W_S(Y_t|f^t, s_t) \ge -\log K_1 \Big\}\\
&= \Pr\Big\{ \log \sum_{s_1 \in \mathcal{S}} \sum_{s_2 \in \mathcal{S}} \cdots \sum_{s_n \in \mathcal{S}} \prod_{t=1}^n P_S(s_t)\, W_S(Y_t|f^t, s_t) \ge -\log K_1 \Big\}\\
&= \Pr\Big\{ \log \Big( \sum_{s_1 \in \mathcal{S}} P_S(s_1) W_S(Y_1|f^1, s_1) \cdot \sum_{s_2 \in \mathcal{S}} P_S(s_2) W_S(Y_2|f^2, s_2) \cdots \sum_{s_n \in \mathcal{S}} P_S(s_n) W_S(Y_n|f^n, s_n) \Big) \ge -\log K_1 \Big\}\\
&= \Pr\Big\{ \sum_{t=1}^n \log \mathbb{E}_{S_t}\big[W_S(Y_t|f^t(Y^{t-1}), S_t)\big] \ge -\log K_1 \Big\}\\
&= \Pr\Big\{ \sum_{t=1}^n Z_t \ge -\log K_1 \Big\}.
\end{aligned}$$
Now, we want to establish a lower bound on $\mathbb{E}_{S^n}\big[W_S^n(\mathcal{E}_1|f,S^n)\big]$; it suffices to find a lower bound on the probability in (40). Let the RV $U_t$ be defined as follows:
$$U_t = Z_t - \mathbb{E}[Z_t|Y^{t-1}], \quad t \in \{1,2,\ldots,n\}.$$
It can be shown that
$$\mathbb{E}[U_t|Y^{t-1}] = \mathbb{E}\big[Z_t - \mathbb{E}[Z_t|Y^{t-1}] \,\big|\, Y^{t-1}\big] = \mathbb{E}[Z_t|Y^{t-1}] - \mathbb{E}[Z_t|Y^{t-1}] = 0.$$
Furthermore, we have
$$\mathbb{E}[U_t] = \mathbb{E}\big[Z_t - \mathbb{E}[Z_t|Y^{t-1}]\big] = \mathbb{E}[Z_t] - \mathbb{E}\big[\mathbb{E}[Z_t|Y^{t-1}]\big] = \mathbb{E}[Z_t] - \mathbb{E}[Z_t] = 0.$$
It can be shown that for every $y^{t-1} \in \mathcal{Y}^{t-1}$,
$$\begin{aligned}
\mathbb{E}[Z_t|y^{t-1}] &= \sum_{y_t \in \mathcal{Y}} \mathbb{E}_{S_t}\big[W_S(y_t|f^t(y^{t-1}),S_t)\big]\, \log \mathbb{E}_{S_t}\big[W_S(y_t|f^t(y^{t-1}),S_t)\big]\\
&\ge -\max_{x \in \mathcal{X}} H\big(\mathbb{E}_S[W_S(\cdot|x,S)]\big) = -H\big(\mathbb{E}_S[W_S(\cdot|x^*,S)]\big).
\end{aligned}$$
It follows from the definition of the RV $U_t$ in (41) that
$$\begin{aligned}
\mathbb{E}_{S^n}\big[W_S^n(\mathcal{E}_1|f,S^n)\big]
&= \Pr\Big\{ \sum_{t=1}^n \big(U_t + \mathbb{E}[Z_t|Y^{t-1}]\big) \ge -\log K_1 \Big\}\\
&= \Pr\Big\{ \sum_{t=1}^n U_t \ge -\log K_1 - \sum_{t=1}^n \mathbb{E}[Z_t|Y^{t-1}] \Big\}\\
&\overset{(a)}{=} \Pr\Big\{ \sum_{t=1}^n U_t \ge -n H\big(\mathbb{E}_S[W_S(\cdot|x^*,S)]\big) - \alpha n - \sum_{t=1}^n \mathbb{E}[Z_t|Y^{t-1}] \Big\}\\
&\overset{(b)}{\ge} \Pr\Big\{ \sum_{t=1}^n U_t \ge -\alpha n \Big\},
\end{aligned}$$
where (a) follows from the definition of $K_1$ in (27) and (b) follows from (52).
It can be verified that
$$\mathrm{Var}[U_t] \le \beta, \quad t = 1,2,\ldots,n.$$
Therefore, we can apply Chebyshev's inequality and obtain
$$\mathbb{E}_{S^n}\big[W_S^n(\mathcal{E}_1|f,S^n)\big] \ge \Pr\Big\{ \Big|\sum_{t=1}^n U_t\Big| \le \alpha n \Big\} \ge 1 - \frac{n\beta}{(\alpha n)^2} \overset{(a)}{=} 1-\mu,$$
where (a) follows from the definition of $\alpha$ and $\beta$ in (28). This completes the proof of Lemma 1. □
We now use Lemma 1 to establish an upper bound on the deterministic ID feedback rate for the channel $W_S$. Let $\{(f_i, \mathcal{D}_i(s^n))_{s^n \in \mathcal{S}^n},\ i=1,\ldots,N\}$ be an $(n,N,\lambda,\lambda)$ deterministic ID feedback code for the channel $W_S$ with $\lambda \in (0,\frac{1}{2})$, and let $\mu \in (0,1)$ be chosen such that
$$\lambda + \mu < \frac{1}{2}.$$
For each feedback strategy $f_i$, we define the set $\mathcal{E}_i$ that satisfies (26). For $i \in \{1,2,\ldots,N\}$, we have
$$\begin{aligned}
\mathbb{E}_{S^n}\big[W_S^n(\mathcal{D}_i(s^n) \cap \mathcal{E}_i \,|\, f_i, S^n)\big]
&= 1 - \mathbb{E}_{S^n}\big[W_S^n\big((\mathcal{D}_i(s^n) \cap \mathcal{E}_i)^c \,|\, f_i, S^n\big)\big]\\
&= 1 - \mathbb{E}_{S^n}\big[W_S^n\big(\mathcal{D}_i(s^n)^c \cup \mathcal{E}_i^c \,|\, f_i, S^n\big)\big]\\
&\overset{(a)}{\ge} 1 - \mathbb{E}_{S^n}\big[W_S^n(\mathcal{D}_i(s^n)^c \,|\, f_i, S^n)\big] - \mathbb{E}_{S^n}\big[W_S^n(\mathcal{E}_i^c \,|\, f_i, S^n)\big]\\
&\overset{(b)}{\ge} 1 - \lambda - \mu\\
&\overset{(c)}{>} \frac{1}{2},
\end{aligned}$$
where (a) follows from the union bound, (b) follows from the definition of the $(n,N,\lambda,\lambda)$ ID feedback code and from (26), and (c) follows from (61). Similarly, from the definition of the ID feedback code
$$\{(f_i, \mathcal{D}_i(s^n))_{s^n \in \mathcal{S}^n},\ i=1,\ldots,N\},$$
from (26) and (61), for $i, j \in \{1,2,\ldots,N\}$ with $i \ne j$ it follows that
$$\mathbb{E}_{S^n}\big[W_S^n(\mathcal{D}_j(s^n) \cap \mathcal{E}_j \,|\, f_i, S^n)\big] < \frac{1}{2}.$$
As the error of the second kind $\lambda$ is smaller than $\frac{1}{2}$, all the sets $\mathcal{D}_i(s^n) \cap \mathcal{E}_i$ are distinct. Therefore, any $(n,N,\lambda,\lambda)$ deterministic ID feedback code $\{(f_i, \mathcal{D}_i(s^n))_{s^n \in \mathcal{S}^n},\ i=1,\ldots,N\}$ for the channel $W_S$ with $\lambda \in (0,\frac{1}{2})$ has an associated $(n,N,\lambda',\lambda')$ deterministic ID feedback code $\{(f_i, \mathcal{D}_i(s^n) \cap \mathcal{E}_i)_{s^n \in \mathcal{S}^n},\ i=1,\ldots,N\}$, where $\lambda' \in (0,\frac{1}{2})$ and the set $\mathcal{E}_i$ satisfies (26) for every $i=1,\ldots,N$. Thus, by Lemma 1, the cardinality $N$ of the deterministic ID feedback code is upper-bounded as follows:
$$N \le \sum_{k=0}^{K_1} \binom{|\mathcal{Y}|^n}{k} \le |\mathcal{Y}|^{\,n K_1} = 2^{\,n \log|\mathcal{Y}| \cdot 2^{\,n H\left(\mathbb{E}[W_S(\cdot|x^*,S)]\right) + \alpha n}}.$$
This completes the converse proof of Theorem 2.

3.3. Direct Proof of Theorem 3

Proof. 
Similarly, we can consider the averaged channel $W_{S,\mathrm{avg}}$ defined in (23). It suffices to show that $R = \max_{P \in \mathcal{P}(\mathcal{X})} H\big(\sum_{x \in \mathcal{X}} P(x)\, W_{S,\mathrm{avg}}(\cdot|x)\big)$ is an achievable randomized ID feedback rate for the averaged channel $W_{S,\mathrm{avg}}$. The randomized ID feedback capacity $C_{ID}^{fr}(W_{S,\mathrm{avg}})$ of the averaged channel can be obtained by applying Theorem 2 of [28] to $W_{S,\mathrm{avg}}$. If the transmission capacity $C(W_{S,\mathrm{avg}})$ of $W_{S,\mathrm{avg}}$ is positive, then we have
$$C_{ID}^{fr}(W_{S,\mathrm{avg}}) \ge \max_{P \in \mathcal{P}(\mathcal{X})} H\Big(\sum_{x \in \mathcal{X}} P(x)\, W_{S,\mathrm{avg}}(\cdot|x)\Big).$$
This completes the direct proof of Theorem 3. □

3.4. Converse Proof of Theorem 3

Proof. 
We first extend Lemma 4 of [14] (an image-size lemma for randomized feedback strategies) to the randomized ID feedback code for $W_S$. □
Lemma 2. 
For any randomized feedback strategy $Q_F(\cdot)$ on the set $\mathcal{F}^n$ of all $n$-length feedback encoding functions and any $\mu \in (0,1)$,
$$\min_{\substack{\mathcal{E}_2 \subset \mathcal{Y}^n:\\ \sum_{f \in \mathcal{F}^n} Q_F(f)\, \mathbb{E}_{S^n}[W_S^n(\mathcal{E}_2|f,S^n)] \ge 1-\mu}} |\mathcal{E}_2| \le K_2,$$
where $K_2$ is given by
$$K_2 = 2^{\,n \max_{P \in \mathcal{P}(\mathcal{X})} H\left(\sum_{x \in \mathcal{X}} P(x)\,\mathbb{E}\left[W_S(\cdot|x,S)\right]\right) + \alpha n} = 2^{\,n H\left(\sum_{x \in \mathcal{X}} P^*(x)\,\mathbb{E}\left[W_S(\cdot|x,S)\right]\right) + \alpha n},$$
with $P^*$ a maximizing input distribution, $\alpha = \sqrt{\beta/(\mu n)}$, and $\beta = \max\{\log^2 3, \log^2|\mathcal{Y}|\}$.
Proof. 
We use a similar idea as in the proof of Lemma 4 of [14]. We define the set $\mathcal{E}_2 \subset \mathcal{Y}^n$ as follows:
$$\mathcal{E}_2 = \Big\{ y^n \in \mathcal{Y}^n :\ \log \sum_{f \in \mathcal{F}^n} Q_F(f)\, \mathbb{E}_{S^n}\big[W_S^n(y^n|f,S^n)\big] \ge -\log K_2 \Big\}.$$
From the definition of $\mathcal{E}_2$, we have
$$|\mathcal{E}_2| \le K_2.$$
It then suffices to show that $\sum_{f \in \mathcal{F}^n} Q_F(f)\, \mathbb{E}_{S^n}\big[W_S^n(\mathcal{E}_2|f,S^n)\big] \ge 1-\mu$. Let the RV $Z_t$ be defined as follows:
$$Z_t = \log \sum_{f^t \in \mathcal{F}_t} Q_t(f^t)\, \mathbb{E}_{S_t}\big[W_S(Y_t|f^t(Y^{t-1}),S_t)\big],$$
where $Q_F(f) = \prod_{t=1}^n Q_t(f^t)$ and $\mathcal{F}_t$ is the set of all mappings $\mathcal{Y}^{t-1} \to \mathcal{X}$. We have
$$\begin{aligned}
\sum_{f \in \mathcal{F}^n} Q_F(f)\, \mathbb{E}_{S^n}\big[W_S^n(\mathcal{E}_2|f,S^n)\big]
&= \Pr\Big\{ \log \sum_{f \in \mathcal{F}^n} Q_F(f)\, \mathbb{E}_{S^n}\big[W_S^n(Y^n|f,S^n)\big] \ge -\log K_2 \Big\}\\
&= \Pr\Big\{ \log \sum_{f \in \mathcal{F}^n} Q_F(f) \prod_{t=1}^n \mathbb{E}_{S_t}\big[W_S(Y_t|f^t(Y^{t-1}),S_t)\big] \ge -\log K_2 \Big\}\\
&= \Pr\Big\{ \log \sum_{f \in \mathcal{F}^n} \prod_{t=1}^n Q_t(f^t) \prod_{t=1}^n \mathbb{E}_{S_t}\big[W_S(Y_t|f^t(Y^{t-1}),S_t)\big] \ge -\log K_2 \Big\}\\
&= \Pr\Big\{ \log \prod_{t=1}^n \sum_{f^t \in \mathcal{F}_t} Q_t(f^t)\, \mathbb{E}_{S_t}\big[W_S(Y_t|f^t(Y^{t-1}),S_t)\big] \ge -\log K_2 \Big\}\\
&= \Pr\Big\{ \sum_{t=1}^n Z_t \ge -\log K_2 \Big\}.
\end{aligned}$$
Now, for any $y^{t-1} \in \mathcal{Y}^{t-1}$, we consider
$$\begin{aligned}
\mathbb{E}[Z_t|y^{t-1}] &= \sum_{y_t \in \mathcal{Y}} \Big( \sum_{f^t \in \mathcal{F}_t} Q_t(f^t)\, \mathbb{E}_{S_t}\big[W_S(y_t|f^t(y^{t-1}),S_t)\big] \Big) \log \Big( \sum_{f^t \in \mathcal{F}_t} Q_t(f^t)\, \mathbb{E}_{S_t}\big[W_S(y_t|f^t(y^{t-1}),S_t)\big] \Big)\\
&\ge -\max_{P \in \mathcal{P}(\mathcal{X})} H\Big(\sum_{x \in \mathcal{X}} P(x)\, \mathbb{E}\big[W_S(\cdot|x,S)\big]\Big).
\end{aligned}$$
Similarly, for all $t \in \{1,2,\ldots,n\}$, we define the RV $U_t = Z_t - \mathbb{E}[Z_t|Y^{t-1}]$. It has been shown in (44) and (48) that $\mathbb{E}[U_t|Y^{t-1}] = 0$ and $\mathbb{E}[U_t] = 0$. Combining (80) and (83), we have
$$\begin{aligned}
\sum_{f \in \mathcal{F}^n} Q_F(f)\, \mathbb{E}_{S^n}\big[W_S^n(\mathcal{E}_2|f,S^n)\big]
&= \Pr\Big\{ \sum_{t=1}^n \big(U_t + \mathbb{E}[Z_t|Y^{t-1}]\big) \ge -\log K_2 \Big\}\\
&= \Pr\Big\{ \sum_{t=1}^n U_t \ge -n \max_{P \in \mathcal{P}(\mathcal{X})} H\Big(\sum_{x \in \mathcal{X}} P(x)\,\mathbb{E}[W_S(\cdot|x,S)]\Big) - \alpha n - \sum_{t=1}^n \mathbb{E}[Z_t|Y^{t-1}] \Big\}\\
&\ge \Pr\Big\{ \sum_{t=1}^n U_t \ge -\alpha n \Big\}.
\end{aligned}$$
Noting that $\mathrm{Var}[U_t] \le \beta$ for all $t = 1,2,\ldots,n$, we can apply Chebyshev's inequality to obtain
$$\sum_{f \in \mathcal{F}^n} Q_F(f)\, \mathbb{E}_{S^n}\big[W_S^n(\mathcal{E}_2|f,S^n)\big] \ge 1 - \frac{n\beta}{(\alpha n)^2} = 1-\mu.$$
Replacing $K_1$ in the converse proof of Theorem 2 with the corresponding $K_2$ from Lemma 2 completes the converse proof of Theorem 3.

3.5. Direct Proof of Theorem 4

3.5.1. Coding Scheme

Proof. 
To some extent, we use the same coding scheme as elaborated in [14]. We choose the blocklength as $m = n + \sqrt{n}$. Let $x$ be some symbol in $\mathcal{X}_D$. Regardless of which identity $i \in \{1,\ldots,N\}$ we want to identify, the sender first sends the sequence $x^n = (x, x, \ldots, x) \in \mathcal{X}_D^n$ over the state-dependent channel $W_S$. The received sequence $y^n \in \mathcal{Y}^n$ becomes known to the sender (estimator and encoder) via the noiseless feedback link. The feedback thus provides the sender and receiver with knowledge of the outcome of the correlated random experiment $\big(\mathcal{Y}^n, \mathbb{E}_{S^n}[W_S^n(\cdot|x^n,S^n)]\big)$. □

3.5.2. Common Randomness Generation

We want to generate uniform common randomness, as it is the most convenient form of common randomness [17]; therefore, we convert our correlated random experiment $\big(\mathcal{Y}^n, \mathbb{E}_{S^n}[W_S^n(\cdot|x^n,S^n)]\big)$ into a nearly uniform one $\big(\mathcal{T}^n, \mathbb{E}_{S^n}[W_S^n(\cdot|x^n,S^n)]\big)$. For $\epsilon > 0$, the set $\mathcal{T}^n$ is given by
$$\mathcal{T}^n = \bigcup_{V_S^a:\ \|V_S^a - W_S^a\| \le \epsilon} \mathcal{T}_{V_S^a}^n(x^n),$$
where the distance $\|V_S^a - W_S^a\|$ is defined as follows.
Definition 7. 
Let $\mathcal{W}$ be the set of stochastic matrices $W: \mathcal{X} \to \mathcal{Y}$, and let $W \in \mathcal{W}$ be such that
$$P_{XY}(x,y) = P_X(x)\, W(y|x), \quad \forall x \in \mathcal{X},\ y \in \mathcal{Y}.$$
For $V, V' \in \mathcal{W}$, the distance $\|V - V'\|$ is defined as
$$\|V - V'\| = \max_{x \in \mathcal{X},\, y \in \mathcal{Y}} |V(y|x) - V'(y|x)|.$$
Here, $V_S^a$ denotes the averaged channel defined by
$$V_S^a = \sum_{s \in \mathcal{S}} P_S(s)\, V_S(\cdot|\cdot,s).$$
We introduce the following lemmas.
Lemma 3 
([30]).  Let $(x^n, y^n)$ be emitted by the DMS $P_{XY}(\cdot,\cdot) = W(\cdot|\cdot)\, P_X(\cdot)$ and let $V \in \mathcal{W}$ be such that $\|V - W\| \le \epsilon$. Then, for every $\epsilon > 0$ there exist $\delta > 0$ and $n_0(\epsilon)$ such that for $n \ge n_0(\epsilon)$ we have
$$\sum_{y^n \in \mathcal{T}_V^n(x^n)} W^n(y^n|x^n) \ge 1 - 2^{-n\delta}.$$
Lemma 4 
([30]).  Let $(x^n, y^n)$ be emitted by the DMS $P_{XY}(\cdot,\cdot) = W(\cdot|\cdot)\, P_X(\cdot)$ and let $V \in \mathcal{W}$. For every $\epsilon > 0$, there exist $c(\epsilon) > 0$ and $n_0(\epsilon)$ such that for $n \ge n_0(\epsilon)$ we have:
1. 
$\Big|\bigcup_{V:\ \|V-W\| \le \epsilon} \mathcal{T}_V^n(x^n)\Big| \ge 2^{n(H(W|P_X) - c(\epsilon))}$,
2. 
$\Big|\bigcup_{V:\ \|V-W\| \le \epsilon} \mathcal{T}_V^n(x^n)\Big| \le 2^{n(H(W|P_X) + c(\epsilon))}$,
3. 
If $\|V-W\| \le \epsilon$ and $\mathcal{T}_V^n(x^n) \ne \emptyset$, with $c(\epsilon) \to 0$ as $\epsilon \to 0$, then
$$|\mathcal{T}_V^n(x^n)| \ge 2^{n(H(W|P_X) - c(\epsilon))}.$$
Lemma 5 
([31]).  Let $\{X_i\}$ be i.i.d. RVs taking values in $[0,1]$ with mean $\mu$. Then, for every $c > 0$ with $p = \mu + c \le 1$, we have
$$\Pr\{\bar{X}_n - \mu \ge c\} \le \exp\big(-n D(p\|\mu)\big) \le \exp\big(-2nc^2\big),$$
where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$.
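As a quick sanity check of Lemma 5, the following Python sketch compares the empirical upper-tail probability of a Bernoulli sample mean with the Chernoff and Hoeffding bounds; the Bernoulli model and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def kl_bern(p, q):
    """Binary KL divergence D(p || q) in nats."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

# Illustrative parameters (assumptions): Bernoulli(mu) samples, threshold mu + c.
n, mu, c, trials = 200, 0.3, 0.1, 100_000
xbar = rng.binomial(n, mu, size=trials) / n          # sample means of n i.i.d. Bernoulli(mu)
empirical = np.mean(xbar - mu >= c)                  # Pr{X_bar_n - mu >= c}, estimated
chernoff = np.exp(-n * kl_bern(mu + c, mu))          # exp(-n D(p || mu)) with p = mu + c
hoeffding = np.exp(-2 * n * c ** 2)                  # exp(-2 n c^2)
# One should observe empirical <= chernoff <= hoeffding, as Lemma 5 states.
```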
For arbitrary $D_{\min} \le D_1 \le D_2$, we define $\mathcal{X}_{D_1}$ and $\mathcal{X}_{D_2}$ by
$$\mathcal{X}_{D_1} = \{x \in \mathcal{X} : d^*(x) \le D_1\},$$
$$\mathcal{X}_{D_2} = \{x \in \mathcal{X} : d^*(x) \le D_2\}.$$
Let $g(D) = \max_{x \in \mathcal{X}_D} H\big(\mathbb{E}[W_S(\cdot|x,S)]\big)$ denote the right-hand side of Theorem 4. It is clear that $g(D)$ is a non-decreasing function, because $\mathcal{X}_{D_1} \subseteq \mathcal{X}_{D_2}$ for arbitrary $D_1 \le D_2$. Letting $\mu \in (0,1)$, we have
$$g\big(\mu D_1 + (1-\mu) D_2\big) = \max_{x \in \mathcal{X}_{\mu D_1 + (1-\mu) D_2}} H\big(\mathbb{E}[W_S(\cdot|x,S)]\big).$$
It follows from Lemma 4 that $\mathbb{E}_{S^n}[W_S^n(\cdot|x^n,S^n)]$ is essentially uniform on $\mathcal{T}^n$. Let the set $\mathcal{Y}^n \setminus \mathcal{T}^n$ be denoted by $\mathcal{E}$. By Lemma 3, we have
$$\Pr\{Y^n \in \mathcal{E} \,|\, X^n = x^n\} = 1 - \Pr\{Y^n \in \mathcal{T}^n \,|\, X^n = x^n\} \le 2^{-n\delta}.$$
As mentioned earlier, we have $|\mathcal{T}^n| \doteq 2^{\,n H\left(\mathbb{E}_S[W_S(\cdot|x,S)]\right)}$. This quantity is the size of the correlated random experiment $\big(\mathcal{T}^n, \mathbb{E}_{S^n}[W_S^n(\cdot|x^n,S^n)]\big)$, which determines the growth of the ID rate. Let $\mathcal{C} = \{(u_j, \mathcal{D}_j),\ j=1,\ldots,M\}$ be a $(\sqrt{n}, M, 2^{-\sqrt{n}\delta})$ transmission code, where $u_j \in \mathcal{X}_D^{\sqrt{n}}$ for each $j=1,\ldots,M$. We concatenate the sequence $x^n = (x,x,\ldots,x) \in \mathcal{X}_D^n$ and the transmission code $\mathcal{C}$ to build an $(m, N, \lambda_1, \lambda_2)$ ID feedback code $\mathcal{C}' = \{(f_i, \mathcal{D}'_i),\ i=1,\ldots,N\}$ for $W_S$ with $\lambda_1, \lambda_2 < \lambda$ and $\lambda \in (0,\frac{1}{2})$. The concatenation is performed using coloring functions $\{F_i,\ i=1,\ldots,N\}$, which we choose at random. Every coloring function $F_i: \mathcal{T}^n \to \{1,\ldots,M\}$ corresponds to an ID message $i$ and maps each element $y^n \in \mathcal{T}^n$ to an element $F_i(y^n)$ of the smaller set $\{1,\ldots,M\}$. After $y^n \in \mathcal{T}^n$ has been received by the sender (encoder and estimator) via the noiseless feedback channel, if identity $i \in \{1,\ldots,N\}$ is to be sent, then the encoder transmits $u_{F_i(y^n)}$. Note that we define an encoding strategy $f_i = [f_i^1, \ldots, f_i^m] \in \mathcal{F}^m$ for each coloring function $F_i$, as presented in Definition 4. If $y^n \notin \mathcal{T}^n$, then an error is declared; this error probability goes to zero as $n$ goes to infinity, as computed in (102). For a fixed family of maps $\{F_i,\ i=1,\ldots,N\}$ and for each $i \in \{1,\ldots,N\}$, we define the decoding sets $\mathcal{D}(F_i) = \bigcup_{y^n \in \mathcal{T}^n} \{y^n\} \times \mathcal{D}_{F_i(y^n)}$.
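The role of the coloring functions can be illustrated with a small simulation: once the common-randomness outcome $y^n$ is shared via feedback, identity $i$ is conveyed by the color $F_i(y^n)$, and a receiver interested in identity $j$ accepts if the received color matches $F_j(y^n)$. The toy Python sketch below assumes an idealized, error-free transmission of the color and a perfectly uniform $\mathcal{T}^n$; all sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy illustration (assumptions): the common-randomness outcome y^n is indexed by an
# integer in {0, ..., |T^n|-1}, is uniformly distributed, and the short transmission
# code carrying the color is error-free.
T_size = 2 ** 12          # |T^n|, the (near-uniform) common randomness shared via feedback
M = 64                    # number of colors, i.e., size of the second-stage transmission code
N = 2 ** 10               # number of identities (doubly exponential in n for a real code)

# One random coloring function F_i : T^n -> {1, ..., M} per identity.
colorings = rng.integers(M, size=(N, T_size))

def accepts(i_sent, j_checked, y_index):
    """Receiver interested in identity j accepts iff F_j(y^n) equals the received color F_i(y^n)."""
    return colorings[j_checked, y_index] == colorings[i_sent, y_index]

# Error of the second kind for the pair (i, j) = (0, 1), averaged over y^n:
y_samples = rng.integers(T_size, size=20_000)
false_accept = np.mean([accepts(0, 1, y) for y in y_samples])
# false_accept concentrates around 1/M, the collision probability controlled by Lemma 6 below.
```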

3.5.3. Error Analysis

Next, we analyze the maximal error performance of the deterministic ID feedback code. For the analysis of the error of the first kind, we fix a set $\{F_i,\ i=1,\ldots,N\}$. The error of the first kind is upper-bounded by
$$\begin{aligned}
\mathbb{E}_{S^m}\big[W_S^m(\mathcal{D}_i^c \,|\, f_i, S^m)\big]
&= \mathbb{E}_{S^m}\big[W_S^m(\mathcal{D}(F_i)^c \,|\, f_i, S^m)\big]\\
&\overset{(a)}{\le} \sum_{s^m \in \mathcal{S}^m} P_S^m(s^m) \Big( W_S^n\big((\mathcal{T}^n)^c \,\big|\, x^n, s^n\big) + W_S^{\sqrt{n}}\big(\mathcal{D}_{F_i(y^n)}^c \,\big|\, u_{F_i(y^n)}, \tilde{s}^{\sqrt{n}}\big) \Big)\\
&\overset{(b)}{\le} 2^{-n\delta} + 2^{-\sqrt{n}\delta},
\end{aligned}$$
where (a) follows from the memorylessness of the channel and the union bound, while (b) follows from Lemma 3 and the definition of the transmission code $\mathcal{C}$.
In order to achieve a small error of the second kind, we choose the maps $\{F_i,\ i=1,\ldots,N\}$ at random. For $i \in \{1,\ldots,N\}$ and $y^n \in \mathcal{T}^n$, let $\bar{F}_i(y^n)$ be independent RVs such that
$$\Pr\{\bar{F}_i(y^n) = j\} = \frac{1}{M}, \quad \forall j \in \{1,\ldots,M\}.$$
Let $F_1$ be a realization of $\bar{F}_1$. For each $y^n \in \mathcal{T}^n$, we define the RVs $\psi_{y^n} = \psi_{y^n}(\bar{F}_2)$ analogously to Section IV of [14]:
$$\psi_{y^n} = \psi_{y^n}(\bar{F}_2) = \begin{cases} 1, & \text{if } F_1(y^n) = \bar{F}_2(y^n),\\ 0, & \text{otherwise.} \end{cases}$$
The $\psi_{y^n}$ are also independent for every $y^n \in \mathcal{T}^n$. The expectation of $\psi_{y^n}$ is computed as follows:
$$\mathbb{E}[\psi_{y^n}] = \Pr\{F_1(y^n) = \bar{F}_2(y^n)\} = \frac{1}{M}.$$
Because the $\psi_{y^n}$ are i.i.d. for all $y^n \in \mathcal{T}^n$, we can apply Hoeffding's inequality (Lemma 5) to obtain the following lemma.
Lemma 6. 
For $\lambda \in (0,1)$ with $\frac{1}{M} < \lambda$, for each channel $V_S^a$ with $\|V_S^a - W_S^a\| \le \epsilon$ and each $n \ge n_0(\epsilon)$, we have
$$\Pr\Big\{ \sum_{y^n \in \mathcal{T}_{V_S^a}^n(x^n)} \psi_{y^n} > |\mathcal{T}_{V_S^a}^n(x^n)| \cdot \lambda \Big\} \le 2^{-|\mathcal{T}^n| \cdot \lambda\, n\, \epsilon}.$$
We can derive an upper bound on the error of the second kind for those realizations of $\bar{F}_2$ satisfying Lemma 6:
$$\begin{aligned}
\mathbb{E}_{S^m}\big[W_S^m(\mathcal{D}(\bar{F}_2) \,|\, f_1, S^m)\big]
&= \sum_{s^m \in \mathcal{S}^m} P_S^m(s^m)\, W_S^m\big(\mathcal{D}(\bar{F}_2) \,\big|\, f_1, s^m\big)\\
&= \sum_{s^m \in \mathcal{S}^m} P_S^m(s^m)\, W_S^m\Big(\mathcal{D}(\bar{F}_2) \cap \big((\mathcal{T}^n \times \mathcal{Y}^{\sqrt{n}}) \cup (\mathcal{T}^n \times \mathcal{Y}^{\sqrt{n}})^c\big) \,\Big|\, f_1, s^m\Big)\\
&\overset{(a)}{\le} \sum_{s^m \in \mathcal{S}^m} P_S^m(s^m)\, W_S^m\big((\mathcal{T}^n \times \mathcal{Y}^{\sqrt{n}})^c \,\big|\, f_1, s^m\big) + \sum_{s^m \in \mathcal{S}^m} P_S^m(s^m)\, W_S^m\big(\mathcal{D}(\bar{F}_2) \cap (\mathcal{T}^n \times \mathcal{Y}^{\sqrt{n}}) \,\big|\, f_1, s^m\big)\\
&\overset{(b)}{\le} \sum_{s^n \in \mathcal{S}^n} P_S^n(s^n)\, W_S^n\big((\mathcal{T}^n)^c \,\big|\, x^n, s^n\big)\\
&\quad + \sum_{s^m \in \mathcal{S}^m} P_S^m(s^m) \Bigg( \sum_{\substack{y^n \in \mathcal{T}^n \\ F_1(y^n) \ne \bar{F}_2(y^n)}} W_S^n(y^n|x^n,s^n)\, W_S^{\sqrt{n}}\big(\mathcal{D}_{\bar{F}_2(y^n)} \,\big|\, u_{F_1(y^n)}, \tilde{s}^{\sqrt{n}}\big) + \sum_{\substack{y^n \in \mathcal{T}^n \\ F_1(y^n) = \bar{F}_2(y^n)}} W_S^n(y^n|x^n,s^n) \Bigg)\\
&\le 2^{-n\delta} + 2^{-\sqrt{n}\delta} + \sum_{s^n \in \mathcal{S}^n} P_S^n(s^n) \sum_{\substack{y^n \in \mathcal{T}^n \\ F_1(y^n) = \bar{F}_2(y^n)}} W_S^n(y^n|x^n,s^n)\\
&\overset{(c)}{\le} 2^{-n\delta} + 2^{-\sqrt{n}\delta} + \sum_{V_S^a:\, \|V_S^a - W_S^a\| \le \epsilon} (W_S^a)^n\big(\mathcal{T}_{V_S^a}^n(x^n) \,\big|\, x^n\big) \cdot \sum_{y^n \in \mathcal{T}_{V_S^a}^n(x^n)} \psi_{y^n} \cdot |\mathcal{T}_{V_S^a}^n(x^n)|^{-1}\\
&\overset{(d)}{\le} 2^{-n\delta} + 2^{-\sqrt{n}\delta} + \lambda,
\end{aligned}$$
where (a) follows from the union bound, (b) follows from the memorylessness of the channel and the union bound, (c) follows from Lemma 3 along with the definition of the transmission code $\mathcal{C}$ and the definition of the set $\mathcal{T}^n$ in (90), and (d) follows from Lemma 6.
We repeat the same analysis of the error of the second kind for all pairs $(i_1, i_2) \in \{1,\ldots,N\}^2$, $i_1 \ne i_2$. For simplicity of notation, we denote the error of the second kind for the pair $(i_1, i_2)$ by $\mu_2(i_1, i_2)$. We have
$$\begin{aligned}
\Pr\{\mathcal{C}' \text{ is not an } (m, N, \lambda_1, \lambda_2) \text{ code}\}
&= \Pr\big\{ \exists\, i_1, i_2 \in \{1,\ldots,N\},\ i_1 \ne i_2 :\ \mu_2(i_1,i_2) \ge \lambda_2 \big\}\\
&= \Pr\big\{ \exists\, i_1, i_2 \in \{1,\ldots,N\},\ i_1 \ne i_2 :\ \mu_2(i_1,i_2) \ge \lambda + 2^{-n\delta} + 2^{-\sqrt{n}\delta} \big\}\\
&\overset{(a)}{\le} N \cdot (N-1) \cdot 2^{-|\mathcal{T}^n| \cdot \lambda\, n\, \epsilon},
\end{aligned}$$
where (a) follows from the union bound, Equation (122), and Lemma 6. It can be verified that we can construct an ID feedback code for $W_S$ with cardinality $N$ satisfying
$$N \ge (n+1)^{-2|\mathcal{X}||\mathcal{Y}|} \cdot 2^{\,|\mathcal{T}^n| \cdot \lambda\, n\, \epsilon}$$
and with an error of the second kind upper-bounded as in (122).
The next step in the proof concerns the state estimator. The per-symbol distortion defined in (15) can be rewritten as follows:
$$\begin{aligned}
d_t &= \mathbb{E}\big[d(S_t, \hat{S}_t)\big]\\
&= \mathbb{E}\Big[\mathbb{E}\big[d(S_t, \hat{S}_t) \,\big|\, X_t, Y_t\big]\Big]\\
&= \sum_{(x,y) \in \mathcal{X}\times\mathcal{Y}} P_{X_t Y_t}(x,y) \sum_{s \in \mathcal{S}} P_{S_t|X_t Y_t}(s|x,y) \sum_{\hat{s} \in \mathcal{S}} P_{\hat{S}_t|X_t Y_t}(\hat{s}|x,y)\, d(s,\hat{s})\\
&= \sum_{(x,y) \in \mathcal{X}\times\mathcal{Y}} P_{X_t Y_t}(x,y) \sum_{s \in \mathcal{S}} P_{S_t|X_t Y_t}(s|x,y)\, d(s, h^*(x,y))\\
&= \sum_{x \in \mathcal{X}} P_{X_t}(x)\, \mathbb{E}_{S_t Y_t}\big[d(S_t, h^*(X_t,Y_t)) \,\big|\, X_t = x\big]\\
&= \sum_{x \in \mathcal{X}} P_{X_t}(x)\, d^*(x)\\
&\le D.
\end{aligned}$$
This completes the direct proof of Theorem 4.

3.6. Converse Proof of Theorem 4

Proof. 
For the converse proof, we use the techniques for deterministic ID over $W_S$ described in Section 3.2. We first extend Lemma 1 to the joint deterministic ID and sensing problem. □
Lemma 7. 
For any feedback strategy $f_D = [f_D^1, f_D^2, \ldots, f_D^n]$ that satisfies the per-symbol distortion constraint described in (17), i.e., $d^*(f_D^t) \le D$ for all $t \in \{1,2,\ldots,n\}$, and for any $\mu \in (0,1)$, we have
$$\min_{\substack{\mathcal{E}_3 \subset \mathcal{Y}^n:\\ \mathbb{E}_{S^n}[W_S^n(\mathcal{E}_3|f_D,S^n)] \ge 1-\mu}} |\mathcal{E}_3| \le K_3,$$
where $K_3$ is given by
$$K_3 = 2^{\,n \max_{x \in \mathcal{X}_D} H\left(\mathbb{E}\left[W_S(\cdot|x,S)\right]\right) + \alpha n} = 2^{\,n H\left(\mathbb{E}\left[W_S(\cdot|x^*,S)\right]\right) + \alpha n},$$
with $x^*$ a maximizer over $\mathcal{X}_D$ and
$$\alpha = \sqrt{\frac{\beta}{\mu n}}, \qquad \beta = \max\big\{\log^2 3,\ \log^2 |\mathcal{Y}|\big\}.$$
Proof. 
Let $\mathcal{E}_3 \subset \mathcal{Y}^n$ be defined as follows:
$$\mathcal{E}_3 = \big\{ y^n \in \mathcal{Y}^n :\ \log \mathbb{E}_{S^n}\big[W_S^n(y^n|f_D,S^n)\big] \ge -\log K_3 \big\}.$$
Define the RV $Z_t = \log \mathbb{E}_{S_t}\big[W_S(Y_t|f_D^t(Y^{t-1}),S_t)\big]$. As in (40), we have
$$\mathbb{E}_{S^n}\big[W_S^n(\mathcal{E}_3|f_D,S^n)\big] = \Pr\Big\{ \sum_{t=1}^n Z_t \ge -\log K_3 \Big\}.$$
Similarly, for all $t \in \{1,2,\ldots,n\}$, we define the RV $U_t = Z_t - \mathbb{E}[Z_t|Y^{t-1}]$. It has been shown in (44) and (48) that $\mathbb{E}[U_t|Y^{t-1}] = 0$ and $\mathbb{E}[U_t] = 0$.
Moreover, for all $t \in \{1,2,\ldots,n\}$ and all $y^{t-1} \in \mathcal{Y}^{t-1}$, we have
$$\begin{aligned}
\mathbb{E}[Z_t|y^{t-1}] &= \sum_{y_t \in \mathcal{Y}} \mathbb{E}_{S_t}\big[W_S(y_t|f_D^t(y^{t-1}),S_t)\big]\, \log \mathbb{E}_{S_t}\big[W_S(y_t|f_D^t(y^{t-1}),S_t)\big]\\
&\ge -\max_{x \in \mathcal{X}_D} H\big(\mathbb{E}_S[W_S(\cdot|x,S)]\big) = -H\big(\mathbb{E}_S[W_S(\cdot|x^*,S)]\big),
\end{aligned}$$
where the inequality holds because the per-symbol distortion constraint forces $f_D^t(y^{t-1}) \in \mathcal{X}_D$. By combining (139) and (142), we have
$$\mathbb{E}_{S^n}\big[W_S^n(\mathcal{E}_3|f_D,S^n)\big] \ge \Pr\Big\{ \sum_{t=1}^n U_t \ge -\alpha n \Big\} \ge 1-\mu.$$
This completes the proof of Lemma 7. □
The subsequent steps in the proof are identical to those in Section 3.2.

3.7. Direct Proof of Theorem 5

Proof. 
For the direct proof of this theorem, we follow a code construction similar to that presented in [14], with one key difference, namely, that we optimize only over input distributions that satisfy the per-symbol constraint $d^*(P) \le D$. □

3.7.1. Coding Scheme

We construct a randomized ID code with blocklength $m = n + \sqrt{n}$ by concatenating two transmission codes, as described in detail below. The first $n$ symbols are allocated to the generation of common randomness. We employ a distribution
$$P^* = \arg\max_{P \in \mathcal{P}_D} H\Big(\sum_{x \in \mathcal{X}} P(x)\, \mathbb{E}\big[W_S(\cdot|x,S)\big]\Big),$$
where the maximization is performed over the constrained set $\mathcal{P}_D = \{P \in \mathcal{P}(\mathcal{X}) : d^*(P) \le D\}$. Regardless of which identity $i \in \{1,\ldots,N\}$ we want to identify, the sender first sends $n$ symbols drawn according to the product distribution $P^{*n}$ over the state-dependent channel $W_S$. The received sequence $y^n \in \mathcal{Y}^n$ becomes known to the sender (estimator and encoder) via the noiseless feedback link. The feedback provides the sender and the receiver with knowledge of the outcome of the correlated random experiment $\big(\mathcal{Y}^n, \sum_{x^n \in \mathcal{X}^n} P^{*n}(x^n)\, \mathbb{E}_{S^n}[W_S^n(\cdot|x^n,S^n)]\big)$.

3.7.2. Common Randomness Generation

Similar to the deterministic coding scheme, we want to generate uniform common randomness. Therefore, we convert our correlated random experiment
$$\Big(\mathcal{Y}^n,\ \sum_{x^n \in \mathcal{X}^n} P^{*n}(x^n)\, \mathbb{E}_{S^n}\big[W_S^n(\cdot|x^n,S^n)\big]\Big)$$
into a nearly uniform one $\big(\mathcal{T}^n, \sum_{x^n \in \mathcal{X}^n} P^{*n}(x^n)\, \mathbb{E}_{S^n}[W_S^n(\cdot|x^n,S^n)]\big)$. For $\epsilon > 0$, the set $\mathcal{T}^n$ is given by
$$\mathcal{T}^n = \bigcup_{V_S^a:\ \|V_S^a - W_S^a\| \le \epsilon}\ \bigcup_{x^n \in \mathcal{T}_{P^*}^n} \mathcal{T}_{V_S^a}^n(x^n),$$
where $\mathcal{T}_{P^*}^n$ denotes the set of $P^*$-typical input sequences.
Per Lemma 4, we can obtain the following corollary.
Corollary 1. 
Let $(x^n, y^n)$ be emitted by the DMS $P_{XY}(\cdot,\cdot) = W(\cdot|\cdot)\, P_X(\cdot)$ and let $V \in \mathcal{W}$. For every $\epsilon > 0$, there exist $c(\epsilon) > 0$ and $n_0(\epsilon)$ such that for $n \ge n_0(\epsilon)$ we have:
1. 
$\Big|\bigcup_{V:\ \|V-W\| \le \epsilon} \mathcal{T}_V^n(x^n)\Big| \ge 2^{n\left(H\left(\sum_{x \in \mathcal{X}} P_X(x) W(\cdot|x)\right) - c(\epsilon)\right)}$,
2. 
$\Big|\bigcup_{V:\ \|V-W\| \le \epsilon} \mathcal{T}_V^n(x^n)\Big| \le 2^{n\left(H\left(\sum_{x \in \mathcal{X}} P_X(x) W(\cdot|x)\right) + c(\epsilon)\right)}$,
3. 
If $\|V-W\| \le \epsilon$ and $\mathcal{T}_V^n(x^n) \ne \emptyset$, with $c(\epsilon) \to 0$ as $\epsilon \to 0$, then
$$|\mathcal{T}_V^n(x^n)| \ge 2^{n\left(H\left(\sum_{x \in \mathcal{X}} P_X(x) W(\cdot|x)\right) - c(\epsilon)\right)}.$$
Therefore, we have $|\mathcal{T}^n| \doteq 2^{\,n H\left(\sum_{x \in \mathcal{X}} P^*(x)\, \mathbb{E}_S[W_S(\cdot|x,S)]\right)}$. The sender generates randomness according to the random experiment $\big(\mathcal{T}^n, \sum_{x^n \in \mathcal{X}^n} P^{*n}(x^n)\, \mathbb{E}_{S^n}[W_S^n(\cdot|x^n,S^n)]\big)$. Asymptotically, the probability of the event $y^n \notin \mathcal{T}^n$ goes to zero. Similar to the deterministic scheme, we prepare coloring functions $F_i,\ i=1,\ldots,N$. The last $\sqrt{n}$ symbols are used to transmit $F_i(y^n)$ using a standard $(\sqrt{n}, M, 2^{-\sqrt{n}\delta})$ transmission code $\mathcal{C} = \{(u_j, \mathcal{D}_j),\ j=1,\ldots,M\}$, where $u_j \in \mathcal{X}_D^{\sqrt{n}}$ for each $j=1,\ldots,M$. The probability distribution used for encoding identity $i$ is defined as $Q_F(\cdot|i) = P^{*n} \times \mathbb{1}\{x_{n+1}^m = u_{F_i(y^n)}\}$, and the decoding region is given by $\mathcal{D}(F_i) = \bigcup_{y^n \in \mathcal{T}^n} \{y^n\} \times \mathcal{D}_{F_i(y^n)}$.

3.7.3. Error Analysis

Subsequently, for all $i = 1,\ldots,N$, the error of the first kind $P_{e,1}(i)$ is upper-bounded by
$$\begin{aligned}
P_{e,1}(i) &= \sum_{f \in \mathcal{F}^m} Q_F(f|i)\, \mathbb{E}_{S^m}\big[W_S^m(\mathcal{D}(F_i)^c \,|\, f, S^m)\big]\\
&= \sum_{s^m \in \mathcal{S}^m} P_S^m(s^m) \sum_{f \in \mathcal{F}^m} Q_F(f|i)\, W_S^m\Big(\Big(\bigcup_{y^n \in \mathcal{T}^n} \{y^n\} \times \mathcal{D}_{F_i(y^n)}\Big)^c \,\Big|\, f, s^m\Big)\\
&\overset{(a)}{\le} \sum_{s^m \in \mathcal{S}^m} P_S^m(s^m) \sum_{x^n \in \mathcal{X}^n} P^{*n}(x^n) \Big( W_S^n\big((\mathcal{T}^n)^c \,\big|\, x^n, s^n\big) + W_S^{\sqrt{n}}\big(\mathcal{D}_{F_i(y^n)}^c \,\big|\, u_{F_i(y^n)}, \tilde{s}^{\sqrt{n}}\big) \Big)\\
&\overset{(b)}{\le} 2^{-n\delta} + 2^{-\sqrt{n}\delta},
\end{aligned}$$
where (a) follows from the union bound and (b) follows from Corollary 1. Furthermore, for all $i, j = 1,\ldots,N$ with $i \ne j$, the probability of an error of the second kind $P_{e,2}(i,j)$ should be asymptotically upper-bounded by $\lambda$. Without loss of generality, we fix $i=1$, $j=2$ and examine the error probability $P_{e,2}(1,2)$:
$$\begin{aligned}
P_{e,2}(1,2) &= \sum_{f \in \mathcal{F}^m} Q_F(f|i=1)\, \mathbb{E}_{S^m}\big[W_S^m(\mathcal{D}(\bar{F}_2) \,|\, f, S^m)\big]\\
&= \sum_{s^m \in \mathcal{S}^m} P_S^m(s^m) \sum_{f \in \mathcal{F}^m} Q_F(f|i=1)\, W_S^m\big(\mathcal{D}(\bar{F}_2) \,\big|\, f, s^m\big)\\
&= \sum_{s^m \in \mathcal{S}^m} P_S^m(s^m) \sum_{f \in \mathcal{F}^m} Q_F(f|i=1)\, W_S^m\Big(\mathcal{D}(\bar{F}_2) \cap \big((\mathcal{T}^n \times \mathcal{Y}^{\sqrt{n}}) \cup (\mathcal{T}^n \times \mathcal{Y}^{\sqrt{n}})^c\big) \,\Big|\, f, s^m\Big)\\
&\overset{(a)}{\le} \sum_{s^m \in \mathcal{S}^m} P_S^m(s^m) \sum_{f \in \mathcal{F}^m} Q_F(f|i=1)\, W_S^m\big((\mathcal{T}^n \times \mathcal{Y}^{\sqrt{n}})^c \,\big|\, f, s^m\big)\\
&\quad + \sum_{s^m \in \mathcal{S}^m} P_S^m(s^m) \sum_{f \in \mathcal{F}^m} Q_F(f|i=1)\, W_S^m\big(\mathcal{D}(\bar{F}_2) \cap (\mathcal{T}^n \times \mathcal{Y}^{\sqrt{n}}) \,\big|\, f, s^m\big)\\
&\overset{(b)}{\le} \sum_{s^n \in \mathcal{S}^n} P_S^n(s^n) \sum_{x^n \in \mathcal{X}^n} P^{*n}(x^n)\, W_S^n\big((\mathcal{T}^n)^c \,\big|\, x^n, s^n\big)\\
&\quad + \sum_{s^m \in \mathcal{S}^m} P_S^m(s^m) \Bigg( \sum_{\substack{y^n \in \mathcal{T}^n \\ F_1(y^n) \ne \bar{F}_2(y^n)}} \sum_{x^n \in \mathcal{X}^n} P^{*n}(x^n)\, W_S^n(y^n|x^n,s^n)\, W_S^{\sqrt{n}}\big(\mathcal{D}_{\bar{F}_2(y^n)} \,\big|\, u_{F_1(y^n)}, \tilde{s}^{\sqrt{n}}\big)\\
&\qquad\qquad\qquad\qquad + \sum_{\substack{y^n \in \mathcal{T}^n \\ F_1(y^n) = \bar{F}_2(y^n)}} \sum_{x^n \in \mathcal{X}^n} P^{*n}(x^n)\, W_S^n(y^n|x^n,s^n) \Bigg)\\
&\le 2^{-n\delta} + 2^{-\sqrt{n}\delta} + \sum_{s^n \in \mathcal{S}^n} P_S^n(s^n) \sum_{\substack{y^n \in \mathcal{T}^n \\ F_1(y^n) = \bar{F}_2(y^n)}} \sum_{x^n \in \mathcal{X}^n} P^{*n}(x^n)\, W_S^n(y^n|x^n,s^n)\\
&\overset{(c)}{\le} 2^{-n\delta} + 2^{-\sqrt{n}\delta} + \sum_{x^n \in \mathcal{X}^n} P^{*n}(x^n) \sum_{V_S^a:\, \|V_S^a - W_S^a\| \le \epsilon} (W_S^a)^n\big(\mathcal{T}_{V_S^a}^n(x^n) \,\big|\, x^n\big) \cdot \sum_{y^n \in \mathcal{T}_{V_S^a}^n(x^n)} \psi_{y^n} \cdot |\mathcal{T}_{V_S^a}^n(x^n)|^{-1}\\
&\overset{(d)}{\le} 2^{-n\delta} + 2^{-\sqrt{n}\delta} + \lambda,
\end{aligned}$$
where (a) follows from the union bound, (b) follows from the memorylessness of the channel and the union bound, (c) follows from Lemma 3 along with the definition of the transmission code $\mathcal{C}$ and the definition of the set $\mathcal{T}^n$ in (90), and (d) follows from Corollary 1.
We repeat the same analysis of the error of the second kind for all pairs $(i_1, i_2) \in \{1,\ldots,N\}^2$, $i_1 \ne i_2$. It can be verified that we can construct a randomized ID feedback code for $W_S$ with cardinality $N$ satisfying
$$N \ge (n+1)^{-2|\mathcal{X}||\mathcal{Y}|} \cdot 2^{\,|\mathcal{T}^n| \cdot \lambda\, n\, \epsilon}$$
and with errors of the first and second kind upper-bounded as in (11) and (12), respectively.
Finally, we check the state estimator. The per-symbol distortion defined in (15) can be rewritten as follows:
$$\begin{aligned}
d_t &= \mathbb{E}\big[d(S_t, \hat{S}_t)\big]\\
&= \mathbb{E}\Big[\mathbb{E}\big[d(S_t, \hat{S}_t) \,\big|\, X_t, Y_t\big]\Big]\\
&= \sum_{(x,y) \in \mathcal{X}\times\mathcal{Y}} P_{X_t Y_t}(x,y) \sum_{s \in \mathcal{S}} P_{S_t|X_t Y_t}(s|x,y) \sum_{\hat{s} \in \mathcal{S}} P_{\hat{S}_t|X_t Y_t}(\hat{s}|x,y)\, d(s,\hat{s})\\
&= \sum_{(x,y) \in \mathcal{X}\times\mathcal{Y}} P_{X_t Y_t}(x,y) \sum_{s \in \mathcal{S}} P_{S_t|X_t Y_t}(s|x,y)\, d(s, h^*(x,y))\\
&= \sum_{x \in \mathcal{X}} P_{X_t}(x)\, \mathbb{E}_{S_t Y_t}\big[d(S_t, h^*(X_t,Y_t)) \,\big|\, X_t = x\big]\\
&= d^*(P^*)\\
&\le D.
\end{aligned}$$
This completes the direct proof of Theorem 5.

3.8. Converse Proof of Theorem 5

Proof. 
We first extend Lemma 2 to the joint randomized ID and sensing problem. □
Lemma 8. 
For any randomized feedback strategy $Q_D(f) = \prod_{t=1}^n Q_D^t(f^t)$ on the set $\mathcal{F}^n$ of all $n$-length feedback encoding functions that satisfies the per-symbol distortion constraint described in (142), i.e., $d^*(Q_D^t) \le D$ for all $t \in \{1,2,\ldots,n\}$, and for any $\mu \in (0,1)$, we have
$$\min_{\substack{\mathcal{E}_4 \subset \mathcal{Y}^n:\\ \sum_{f \in \mathcal{F}^n} Q_D(f)\, \mathbb{E}_{S^n}[W_S^n(\mathcal{E}_4|f,S^n)] \ge 1-\mu}} |\mathcal{E}_4| \le K_4,$$
where $K_4$ is given by
$$K_4 = 2^{\,n \max_{P \in \mathcal{P}_D} H\left(\sum_{x \in \mathcal{X}} P(x)\,\mathbb{E}\left[W_S(\cdot|x,S)\right]\right) + \alpha n} = 2^{\,n H\left(\sum_{x \in \mathcal{X}} P^*(x)\,\mathbb{E}\left[W_S(\cdot|x,S)\right]\right) + \alpha n},$$
with $P^*$ a maximizer over $\mathcal{P}_D$, $\alpha = \sqrt{\beta/(\mu n)}$, and $\beta = \max\{\log^2 3, \log^2|\mathcal{Y}|\}$.
Proof. 
Define the set $\mathcal{E}_4$ as follows:
$$\mathcal{E}_4 = \Big\{ y^n \in \mathcal{Y}^n :\ \log \sum_{f \in \mathcal{F}^n} Q_D(f)\, \mathbb{E}_{S^n}\big[W_S^n(y^n|f,S^n)\big] \ge -\log K_4 \Big\}.$$
Define an auxiliary RV $Z_t$ as follows:
$$Z_t = \log \sum_{f^t \in \mathcal{F}_t} Q_D^t(f^t)\, \mathbb{E}_{S_t}\big[W_S(Y_t|f^t(Y^{t-1}),S_t)\big].$$
Then, as in (80), we have
$$\sum_{f \in \mathcal{F}^n} Q_D(f)\, \mathbb{E}_{S^n}\big[W_S^n(\mathcal{E}_4|f,S^n)\big] = \Pr\Big\{ \sum_{t=1}^n Z_t \ge -\log K_4 \Big\}.$$
Similarly, for any $y^{t-1} \in \mathcal{Y}^{t-1}$, we can examine
$$\begin{aligned}
\mathbb{E}[Z_t|y^{t-1}] &= \sum_{y_t \in \mathcal{Y}} \Big(\sum_{f^t \in \mathcal{F}_t} Q_D^t(f^t)\, \mathbb{E}_{S_t}\big[W_S(y_t|f^t(y^{t-1}),S_t)\big]\Big) \log \Big(\sum_{f^t \in \mathcal{F}_t} Q_D^t(f^t)\, \mathbb{E}_{S_t}\big[W_S(y_t|f^t(y^{t-1}),S_t)\big]\Big)\\
&\ge -\max_{P \in \mathcal{P}_D} H\Big(\sum_{x \in \mathcal{X}} P(x)\, \mathbb{E}\big[W_S(\cdot|x,S)\big]\Big) = -H\Big(\sum_{x \in \mathcal{X}} P^*(x)\, \mathbb{E}\big[W_S(\cdot|x,S)\big]\Big).
\end{aligned}$$
By applying Chebyshev's inequality as before, we complete the proof of Lemma 8. □
The subsequent steps in the proof are the same as in Section 3.6.

4. Average Distortion

In addition to the per-symbol distortion constraint, an alternative and more flexible distortion constraint is the average distortion. This approach is valuable because it relaxes the per-symbol fidelity requirement, allowing for minor variations in individual symbol quality as long as the overall average distortion remains below a specified threshold. As defined in [32], the average distortion for a sequence of symbols is provided by
$$\bar{d}_n = \mathbb{E}_{S^n \hat{S}^n}\big[d(S^n, \hat{S}^n)\big] = \frac{1}{n}\sum_{t=1}^n \mathbb{E}_{S_t \hat{S}_t}\big[d(S_t, \hat{S}_t)\big].$$
This metric captures the average quality of the reconstructed sequence, making it suitable for applications where consistent strict fidelity for each symbol is not essential but where the overall fidelity of the transmission needs to remain within acceptable limits.
In the case of a deterministic ID code, the average distortion can be expressed in a more detailed form as
$$\bar{d}_n = \frac{1}{N}\sum_{i=1}^N \mathbb{E}_{S^n Y^n}\Big[\frac{1}{n}\sum_{t=1}^n d(S_t, \hat{S}_t) \,\Big|\, X^n = f_i\Big].$$
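The relaxation from the per-symbol constraint to the average constraint is easy to see numerically: a distortion profile may exceed the threshold $D$ at individual times while its time average stays below $D$. A minimal Python sketch with made-up numbers:

```python
import numpy as np

def average_distortion(per_symbol_d):
    """Average distortion d_bar_n = (1/n) * sum_t d_t from a profile of per-symbol distortions."""
    return float(np.mean(per_symbol_d))

# Illustrative profile (made-up numbers): the per-symbol constraint D = 0.2 is violated at t = 2,
# since max(d_t) = 0.35 > 0.2, yet the average constraint holds, since mean(d_t) = 0.1625 <= 0.2.
d_profile = np.array([0.05, 0.35, 0.10, 0.15])
assert average_distortion(d_profile) <= 0.2
```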
Using the code construction method from Section 3.5 along with the minimal distortion condition defined in (17), we propose the following theorem, which provides a lower bound on the deterministic ID capacity–distortion function of a state-dependent channel $W_S$ under an average distortion constraint.
Theorem 6. 
The deterministic ID capacity–distortion function under the average distortion constraint $\bar{d}_n \le D$ for the state-dependent channel $W_S$ is lower-bounded as
$$C_{ID,\mathrm{avg}}^d(D) \ge \max_{x \in \mathcal{X}_D} H\big(\mathbb{E}\big[W_S(\cdot|x,S)\big]\big),$$
where the set $\mathcal{X}_D$ is given by
$$\mathcal{X}_D = \{x \in \mathcal{X} : d^*(x) \le D\}.$$
Despite the practical implications of this result, proving a converse theorem for this bound remains an open problem.

5. Conclusions and Discussion

In this work, we have studied the problem of joint ID and channel state estimation over a DMC with i.i.d. state sequences, where the sender simultaneously sends an identity and senses the state via strictly causal observations of the channel output. After establishing the corresponding ID capacity–distortion function, we observe that sensing can increase the ID capacity. In the proofs of our theorems, we noticed that the generation of common randomness is a key tool for achieving a high ID rate; this common randomness generation is enabled by the feedback link. The ID rate can be further increased by adding local randomness at the sender.
Our framework closely mirrors the one described in [23], with the key distinction being that we utilize an identification scheme instead of the classical transmission scheme: we want to simultaneously identify the sent message and estimate the channel state. As noted in the results of [23], the capacity–distortion function is consistently smaller than the transmission capacity of the state-dependent DMC except when the distortion is infinite. This observation aligns with expectations for the message transmission scheme, as the optimization is performed over a constrained input set defined by the distortion function. However, this does not directly carry over to the ID scheme. An interesting aspect is that, for deterministic encoding, the number of identities that can be identified under the distortion constraint scales double-exponentially with the blocklength, as highlighted in Theorem 4, whereas the ID capacity of the state-dependent DMC with deterministic encoding and without feedback scales only exponentially with the blocklength. This is because feedback significantly enhances the ID capacity, enabling double-exponential growth for the state-dependent DMC, as established in Theorem 2. This contrasts sharply with the message transmission scheme, where feedback does not increase the capacity of a DMC. Introducing an estimator into our framework naturally reduces the ID capacity compared to the scenario with feedback but without an estimator. This reduction occurs because the optimization is performed over a constrained input set defined by the distortion function. Nevertheless, the capacity–distortion function remains higher than in the case without feedback and without an estimator. This difference underscores a unique characteristic of the ID scheme, highlighting its distinct scaling behavior and potential advantages in certain scenarios.
We consider two cases, namely, deterministic and randomized identification. For deterministic identification without sensing and without feedback, it was shown in [1,3] that the number of messages grows only exponentially, i.e., $N = 2^{n C_{ID}^d}$.
Remarkably, Theorem 4 demonstrates that by incorporating sensing via the strictly causal feedback link, the growth of the number of messages becomes double exponential, $N = 2^{2^{n C_{ID}^d(D)}}$; this result is notable and closely parallels the findings on identification with feedback in [14].
In the case of randomized identification, Theorem 5 shows that the capacity is also improved by incorporating sensing. However, in both the deterministic and randomized settings, the scaling remains double exponential.
One application of message identification is in control and alarm systems [10,33]. For instance, it has been shown that identification codes can be used for status monitoring in digital twins [34]. In this context, our results demonstrate that incorporating a sensing process can significantly enhance the capacity.
Another potential application of our framework is molecular communication, where nanomachines use identification codes to determine when to perform specific actions such as drug delivery [35]. In this context, sensing the position of the nanomachines can enhance the communication rate. For such scenarios, it is also essential to explore alternative channel models such as the Poisson channel.
Furthermore, it is clear that in other applications it would be necessary to consider different distortion functions.
In the future, it would be interesting to apply the method proposed in this paper to other distortion functions. Furthermore, in practical scenarios, there are models where the sensing is performed either additionally or exclusively by the receiver. This suggests the need to study a wider variety of system models. For wireless communications, the Gaussian channel is more practical and widely applicable. Therefore, it would be valuable to extend our results to the Gaussian case (a JIDAS scheme with a Gaussian channel as the forward channel). It has been shown in [15,28] that the ID capacity of a Gaussian channel with noiseless feedback is infinite. Interestingly, this remains true regardless of the scaling used for the rate, e.g., double exponential, triple exponential, etc. By introducing an estimator, we conjecture that the same results would hold, leading to an infinite capacity–distortion function. Thus, considering scenarios with noisy feedback is more practical for future research.

Author Contributions

Writing—original draft preparation, W.L. and Y.Z.; supervision, C.D. and H.B. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge financial support by the Federal Ministry of Education and Research of Germany (BMBF) through the “Souverän. Digital. Vernetzt.” programme, joint project “6G-life”, project identification number 16KISK002. H. Boche and W. Labidi were further supported in part by the BMBF within the national initiative on Post-Shannon Communication (NewCom) under Grant 16KIS1003K and within the national initiative on molecular communication (IoBNT) under grant 16KIS1988. C. Deppe was further supported in part by the BMBF within the national initiative on Post-Shannon Communication (NewCom) under Grant 16KIS1005. C. Deppe, W. Labidi and Y. Zhao were supported by the DFG within the projects DE1915/2-1 and BO 1734/38-1.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

This work has been presented in part at the IEEE International Symposium on Information Theory 2023 (ISIT 2023) [32].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ahlswede, R.; Dueck, G. Identification via channels. IEEE Trans. Inf. Theory 1989, 35, 15–29. [Google Scholar] [CrossRef]
  2. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  3. Salariseddigh, M.J.; Pereg, U.; Boche, H.; Deppe, C. Deterministic Identification over Channels with Power Constraints. IEEE Trans. Inf. Theory 2022, 68, 1–24. [Google Scholar] [CrossRef]
  4. Ahlswede, R.; Cai, N. Identification without randomization. IEEE Trans. Inf. Theory 1999, 45, 2636–2642. [Google Scholar] [CrossRef]
  5. Moulin, P. The role of information theory in watermarking and its application to image watermarking. Signal Process. 2001, 81, 1121–1139. [Google Scholar] [CrossRef]
  6. Ahlswede, R. Watermarking Identification Codes with Related Topics on Common Randomness. In Identification and Other Probabilistic Models: Rudolf Ahlswede’s Lectures on Information Theory 6; Ahlswede, A., Althöfer, I., Deppe, C., Tamm, U., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 271–325. [Google Scholar] [CrossRef]
  7. Steinberg, Y.; Merhav, N. Identification in the presence of side information with application to watermarking. IEEE Trans. Inf. Theory 2001, 47, 1410–1422. [Google Scholar] [CrossRef]
  8. Lu, Y. Industry 4.0: A survey on technologies, applications and open research issues. J. Ind. Inf. Integr. 2017, 6, 1–10. [Google Scholar] [CrossRef]
  9. Fettweis, G.; Boche, H. 6G: The Personal Tactile Internet—And Open Questions for Information Theory. IEEE BITS Inf. Theory Mag. 2021, 1, 71–82. [Google Scholar] [CrossRef]
  10. Cabrera, J.; Boche, H.; Deppe, C.; Schaefer, R.F.; Scheunert, C.; Fitzek, F. 6G and the Post-Shannon-Theory. In Shaping Future 6G Networks: Needs, Impacts and Technologies; Bertin, E., Crespi, N., Magedanz, T., Eds.; Wiley-Blackwell: Oxford, UK, 2021. [Google Scholar] [CrossRef]
  11. Fettweis, G.; Boche, H. On 6G and trustworthiness. Commun. ACM 2022, 65, 48–49. [Google Scholar] [CrossRef]
  12. Ahlswede, R.; Zhang, Z. New directions in the theory of identification via channels. IEEE Trans. Inf. Theory 1995, 41, 1040–1050. [Google Scholar] [CrossRef]
  13. Labidi, W.; Deppe, C.; Boche, H. Secure Identification for Gaussian Channels. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 2872–2876. [Google Scholar] [CrossRef]
  14. Ahlswede, R.; Dueck, G. Identification in the presence of feedback-a discovery of new capacity formulas. IEEE Trans. Inf. Theory 1989, 35, 30–36. [Google Scholar] [CrossRef]
  15. Labidi, W.; Boche, H.; Deppe, C.; Wiese, M. Identification over the Gaussian Channel in the presence of feedback. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 12–20 July 2021; pp. 278–283. [Google Scholar] [CrossRef]
  16. Ahlswede, R. General theory of information transfer: Updated. Discret. Appl. Math. 2008, 156, 1348–1388. [Google Scholar] [CrossRef]
  17. Ahlswede, R.; Csiszar, I. Common randomness in information theory and cryptography. II. CR capacity. IEEE Trans. Inf. Theory 1998, 44, 225–240. [Google Scholar] [CrossRef]
  18. Proceedings of the 1st IEEE International Online Symposium on Joint Communications and Sensing; IEEE: Piscataway, NJ, USA, 2021; Available online: https://www.proceedings.com/58212.html (accessed on 18 November 2024).
  19. Sturm, C.; Wiesbeck, W. Waveform Design and Signal Processing Aspects for Fusion of Wireless Communications and Radar Sensing. Proc. IEEE 2011, 99, 1236–1259. [Google Scholar] [CrossRef]
  20. Bliss, D.W. Cooperative radar and communications signaling: The estimation and information theory odd couple. In Proceedings of the 2014 IEEE Radar Conference, Cincinnati, OH, USA, 9–23 May 2014; pp. 50–55. [Google Scholar] [CrossRef]
  21. Bica, M.; Huang, K.W.; Mitra, U.; Koivunen, V. Opportunistic Radar Waveform Design in Joint Radar and Cellular Communication Systems. In Proceedings of the 2015 IEEE Global Communications Conference (GLOBECOM), San Diego, CA, USA, 6–10 December 2015; pp. 1–7. [Google Scholar] [CrossRef]
  22. Huang, K.W.; Bică, M.; Mitra, U.; Koivunen, V. Radar waveform design in spectrum sharing environment: Coexistence and cognition. In Proceedings of the 2015 IEEE Radar Conference (RadarCon), Arlington, VA, USA, 10–15 May 2015; pp. 1698–1703. [Google Scholar] [CrossRef]
  23. Kobayashi, M.; Caire, G.; Kramer, G. Joint State Sensing and Communication: Optimal Tradeoff for a Memoryless Case. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; pp. 111–115. [Google Scholar] [CrossRef]
  24. Choudhuri, C.; Kim, Y.H.; Mitra, U. Causal State Communication. IEEE Trans. Inf. Theory 2013, 59, 3709–3719. [Google Scholar] [CrossRef]
  25. Chiu, S.E.; Ronquillo, N.; Javidi, T. Active Learning and CSI Acquisition for mmWave Initial Alignment. IEEE J. Sel. Areas Commun. 2019, 37, 2474–2489. [Google Scholar] [CrossRef]
  26. Han, T.S. Information-Spectrum Methods in Information Theory; Stochastic Modelling and Applied Probability; Springer: Berlin/Heidelberg, Germany, 2014. [Google Scholar] [CrossRef]
  27. El Gamal, A.; Kim, Y.H. Network Information Theory; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar] [CrossRef]
  28. Wiese, M.; Labidi, W.; Deppe, C.; Boche, H. Identification over Additive Noise Channels in the Presence of Feedback. IEEE Trans. Inf. Theory 2022, 69, 6811–6821. [Google Scholar] [CrossRef]
  29. Zhang, W.; Vedantam, S.; Mitra, U. Joint Transmission and State Estimation: A Constrained Channel Coding Approach. IEEE Trans. Inf. Theory 2011, 57, 7084–7095. [Google Scholar] [CrossRef]
  30. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar] [CrossRef]
  31. Hoeffding, W. On Probabilities of Large Deviations. In The Collected Works of Wassily Hoeffding; Springer: New York, NY, USA, 1994; pp. 473–490. [Google Scholar] [CrossRef]
  32. Labidi, W.; Deppe, C.; Boche, H. Joint identification and sensing for discrete memoryless channels. In Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, 25–30 June 2023; pp. 442–447. [Google Scholar] [CrossRef]
  33. Bringer, J.; Chabanne, H. Code Reverse Engineering Problem for Identification Codes. IEEE Trans. Inf. Theory 2012, 58, 2406–2412. [Google Scholar] [CrossRef]
  34. von Lengerke, C.; Cabrera, J.A.; Fitzek, F.H.P. Identification Codes for Increased Reliability in Digital Twin Applications over Noisy Channels. In Proceedings of the 2023 IEEE International Conference on Metaverse Computing, Networking and Applications (MetaCom), Kyoto, Japan, 26–28 June 2023; pp. 550–557. [Google Scholar] [CrossRef]
  35. Labidi, W.; Deppe, C.; Boche, H. Information-Theoretical Analysis of Event-Triggered Molecular Communication. In Proceedings of the European Wireless 2024, Brno, Czech Republic, 9–11 September 2024. [Google Scholar]
Figure 1. Discrete memoryless channel with random state.
Figure 2. Discrete memoryless channel with random state and noiseless feedback.
Figure 3. State-dependent channel with noiseless feedback.
