Distributed Hypothesis Testing over a Noisy Channel: Error-Exponents Trade-Off

Sreekumar, Sreejith; Gündüz, Deniz

doi:10.3390/e25020304

Open AccessEditor’s ChoiceArticle

Distributed Hypothesis Testing over a Noisy Channel: Error-Exponents Trade-Off

by

Sreejith Sreekumar

^1,*

and

Deniz Gündüz

²

¹

Department of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14850, USA

²

Department of Electrical and Electronic Engineering, Imperial College London, London SW72AZ, UK

^*

Author to whom correspondence should be addressed.

Entropy 2023, 25(2), 304; https://doi.org/10.3390/e25020304

Submission received: 28 December 2022 / Revised: 25 January 2023 / Accepted: 31 January 2023 / Published: 6 February 2023

(This article belongs to the Special Issue Information Theory for Distributed Systems)

Download

Browse Figures

Versions Notes

Abstract

:

A two-terminal distributed binary hypothesis testing problem over a noisy channel is studied. The two terminals, called the observer and the decision maker, each has access to n independent and identically distributed samples, denoted by

U

and

V

, respectively. The observer communicates to the decision maker over a discrete memoryless channel, and the decision maker performs a binary hypothesis test on the joint probability distribution of

(U, V)

based on

V

and the noisy information received from the observer. The trade-off between the exponents of the type I and type II error probabilities is investigated. Two inner bounds are obtained, one using a separation-based scheme that involves type-based compression and unequal error-protection channel coding, and the other using a joint scheme that incorporates type-based hybrid coding. The separation-based scheme is shown to recover the inner bound obtained by Han and Kobayashi for the special case of a rate-limited noiseless channel, and also the one obtained by the authors previously for a corner point of the trade-off. Finally, we show via an example that the joint scheme achieves a strictly tighter bound than the separation-based scheme for some points of the error-exponents trade-off.

Keywords:

distributed hypothesis testing; noisy channel; error-exponents; source-channel separation; joint source-channel coding; hybrid coding

1. Introduction

Hypothesis testing (HT), which refers to the problem of choosing between one or more alternatives based on available data, plays a central role in statistics and information theory. Distributed HT (DHT) problems arise in situations where the test data are scattered across multiple terminals, and need to be communicated to a central terminal, called the decision maker, which performs the hypothesis test. The need to jointly optimize the communication scheme and the hypothesis test makes DHT problems much more challenging than their centralized counterparts. Indeed, while an efficient characterization of the optimal hypothesis test and its asymptotic performance is well known in the centralized setting, thanks to [1,2,3,4,5], the same problem in even the simplest distributed setting remains open, except for some special cases (see [6,7,8,9,10,11]).

In this work, we consider a DHT problem with two parties, an observer and a decision maker, such that the former communicates to the latter over a noisy channel. The observer and the decision maker each has access to independent and identically distributed samples, denoted by

U

and

V

, respectively. Based on the information received from the observer and its own observations

V

, the decision maker performs a binary hypothesis test on the joint distribution of

(U, V)

. Our goal is to characterize the trade-off between the best achievable rate of decay (or exponent) of the type I and type II error probabilities with respect to the sample size. We will refer to this problem as DHT over a noisy channel, and its special instance with the noisy channel replaced by a rate-limited noiseless channel as DHT over a noiseless channel.

1.1. Background

Distributed statistical inference problems were first conceived in [12] and the information-theoretic study of DHT over a noiseless channel was first investigated in [6], where the objective is to characterize Stein’s exponent

κ_{se} (ϵ)

, i.e., the optimal type II error-exponent subject to the type I error probability constrained to be at most

ϵ \in (0, 1)

. The authors therein established a multi-letter characterization of this quantity including a strong converse, which shows that

κ_{se} (ϵ)

is independent of

ϵ

. Furthermore, a single-letter characterization of

κ_{se} (ϵ)

is obtained for a special case of HT known as testing against independence (TAI), in which the joint distribution factors as a product of the marginal distributions under the alternative hypothesis. Improved lower bounds on

κ_{se} (ϵ)

were subsequently obtained in [7,8], respectively, and the strong converse was extended to zero-rate settings [13]. While all the aforementioned works focus on

κ_{se} (ϵ)

, the trade-off between the exponents of both the type I and type II error probabilities in the same setting was first explored in [14].

In the recent years, there has been a renewed interest in distributed statistical inference problems motivated by emerging machine learning applications to be served at the wireless edge, particularly in the context of semantic communications in 5G/6G communication systems [15,16]. Several extensions of the DHT over a noiseless channel problem have been studied, such as generalizations to multi-terminal settings [9,17,18,19,20,21], DHT under security or privacy constraints [22,23,24,25], DHT with lossy compression [26], interactive settings [27,28], successive refinement models [29], and more. Improved bounds have been obtained on the type I and type II error-exponents region [30,31], and on

κ_{se} (ϵ)

for testing correlation between bivariate standard normal distributions [32]. In the simpler zero-rate communication setting, there has been some progress in terms of second-order optimal schemes [33], geometric interpretation of type I and type II error-exponent region [34], and characterization of

κ_{se} (ϵ)

for sequential HT [35]. DHT over noisy communication channels with the goal of characterizing

κ_{se} (ϵ)

has been considered in [10,11,36,37].

1.2. Contributions

In this work, our objective is to explore the trade-off between the type I and type II error-exponents for DHT over a noisy channel. This problem is a generalization of [14] from noiseless rate-limited channels to noisy channels, and also of [10,11] from a type I error probability constraint to a positive type I error-exponent constraint.

Our main contributions can be summarized as follows:

(i): We obtain an inner bound (Theorem 1) on the error-exponents trade-off by using a separate HT and channel coding scheme (SHTCC) that is a combination of a type-based (type here refers to the empirical probability distribution of a sequence, see [38]) quantize-bin strategy and unequal error-protection scheme of [39]. This result is shown to recover the bounds established in [10,14]. Furthermore, we evaluate Theorem 1 for two important instances of DHT, namely TAI and its opposite, i.e., testing against dependence (TAD) in which the joint distribution under the null hypothesis factors as a product of marginal distributions.
(ii): We also obtain a second inner bound (Theorem 2) on the error-exponents trade-off by using a joint HT and channel coding scheme (JHTCC) based on hybrid coding [40]. Subsequently, we show via an example that the JHTCC scheme strictly outperforms the SHTCC scheme for some points on the error-exponent trade-off.

While the above schemes are inspired from those in [10], which have been proposed with the goal of maximizing the type II error-exponent, novel modifications in its design and analysis are required when considering both of the error-exponents. More specifically, the schemes presented here perform separate quantization-binning or hybrid coding on each individual source sequence type at the observer/encoder (as opposed to a typical ball in [10]) with the corresponding reverse operation implemented at the decision-maker/decoder. This necessitates a different analysis to compute the probabilities of the various error events contributing to the overall error-exponents. We finally mention that the DHT problem considered here was recently investigated in [41], where an inner bound on the error-exponents trade-off (Theorem 2 in [41]) is obtained using a combination of a type-based quantization scheme and unequal error protection scheme of [42] with two special messages. A qualitative comparison between Theorem 2 and Theorem 2 in [41] seems to suggest that the JHTCC scheme here uses a stronger decoding rule depending jointly on the source-channel statistics. In comparison, the metric used at the decoder for the scheme in [41] factors as the sum of two metrics, one which depends only on the source statistics, and the other which depends only on the channel statistics. Importantly, this hints that the inner bound achieved by JHTCC scheme is not subsumed by that in [41]. That said, a direct computational comparison appears difficult, as evaluating the latter requires optimization over several parameters as mentioned in the last paragraph of [41].

The remainder of the paper is organized as follows. Section 2 formulates the operational problem along with the required definitions. The main results are presented in Section 3. The proofs are furnished in Section 4. Finally, concluding remarks are given in Section 5.

2. Preliminaries

2.1. Notation

We use the following notation. All logarithms are with respect to the natural base e.

N

,

R

,

R_{\geq 0}

, and

\bar{R}

denotes the set of natural, real, non-negative real and extended real numbers, respectively. For

a, b \in R_{\geq 0}

,

[a : b] : = {n \in N : a \leq n \leq b}

and

[b] : = [1 : b]

. Calligraphic letters, e.g.,

X

, denote sets, while

X^{c}

and

| X |

stands for its complement and cardinality, respectively. For

n \in N

,

X^{n}

denotes the n-fold Cartesian product of

X

, and

x^{n} = (x_{1}, \dots, x_{n})

denotes an element of

X^{n}

. Bold-face letters denote vectors or sequences, e.g.,

x

for

x^{n}

; its length n will be clear from the context. For

i, j \in N

such that

i \leq j

,

x_{i}^{j} : = (x_{i}, x_{i + 1}, \dots, x_{j})

, the subscript is omitted when

i = 1

.

𝟙_{A}

denotes the indicator of set

A

. For a real sequence

{a_{n}}_{n \in N}

,

a_{n} \overset{(n)}{\to} b

stands for

{lim}_{n \to \infty} a_{n} = b

, while

a_{n} ≳ b

denotes

{lim}_{n \to \infty} a_{n} \geq b

. Similar notations apply for other inequalities.

O (\cdot)

,

Ω (\cdot)

and

o (\cdot)

denote standard asymptotic notations.

Random variables and their realizations are denoted by uppercase and lowercase letters, respectively, e.g., X and x. Similar conventions apply for random vectors and their realizations. The set of all probability mass functions (PMFs) on a finite set

X

is denoted by

P (X)

. The joint PMF of two discrete random variables X and Y is denoted by

P_{X Y}

; the corresponding marginals are

P_{X}

and

P_{Y}

. The conditional PMF of X given Y is represented by

P_{X | Y}

. Expressions such as

P_{X Y} = P_{X} P_{Y | X}

are to be understood as pointwise equality, i.e.,

P_{X Y} (x, y) = P_{X} (x) P_{Y | X} (y | x)

, for all

(x, y) \in X \times Y

. When the joint distribution of a triple

(X, Y, Z)

factors as

P_{X Y Z} = P_{X Y} P_{Z | X}

, these variables form a Markov chain

X - Y - Z

. When X and Y are statistically independent, we write

X ⫫ Y

. If the entries of

X^{n}

are drawn in an independent and identically distributed manner, i.e., if

P_{X^{n}} (x) = \prod_{i = 1}^{n} P_{X} (x_{i})

,

\forall x \in X^{n}

, then the PMF

P_{X^{n}}

is denoted by

P_{X}^{\otimes n}

. Similarly, if

P_{Y^{n} | X^{n}} (y | x) = \prod_{i = 1}^{n} P_{Y | X} (y_{i} | x_{i})

for all

(x, y) \in X^{n} \times Y^{n}

, then we write

P_{Y | X}^{\otimes n}

for

P_{Y^{n} | X^{n}}

. The conditional product PMF given a fixed

x \in X^{n}

is designated by

P_{Y | X}^{\otimes n} (\cdot | x)

. The probability measure induced by a PMF P is denoted by

P_{P}

. The corresponding expectation is designated by

E_{P}

.

The type or empirical PMF of a sequence

x \in X^{n}

is designated by

P_{x}

, i.e.,

P_{x} (x) : = \frac{1}{n} \sum_{i = 1}^{n} 𝟙_{{x_{i} = x}}

. The set of n-length sequences

x \in X^{n}

of type

P_{X}

is

T_{n} (P_{X}, X^{n}) : = {x \in X^{n} : P_{x} = P_{X}}

. Whenever the underlying alphabet

X^{n}

is clear from the context,

T_{n} (P_{X}, X^{n})

is simplified to

T_{n} (P_{X})

. The set of all possible types of n-length sequences

x \in X^{n}

is

T (X^{n}) : = \{P_{X} \in P (X) : | T_{n} (P_{X}, X^{n}) | \geq 1\}

. Similar notations are used for larger combinations, e.g.,

P_{x y}

,

T_{n} (P_{X Y}, X \times Y)

and

T (X^{n} \times Y^{n})

. For a given

x \in T_{n} (P_{X}, X^{n})

and a conditional PMF

P_{Y | X}

,

T_{n} (P_{Y | X}, x) : = {y \in Y^{n} : (x, y) \in T_{n} (P_{X Y}, X^{n} \times Y^{n})}

stands for the

P_{Y | X}

-conditional type class of

x

.

For PMFs

P, Q \in P (X)

, the Kullback–Leibler (KL) divergence between P and Q is

D (P | | Q) : = \sum_{x \in X} P (x) log (P (x) / Q (x))

. The conditional KL divergence between

P_{Y | X}

and

Q_{Y | X}

given

P_{X}

is

D (P_{Y | X} | | Q_{Y | X} | P_{X}) : = \sum_{x \in X} P_{X} (x) D (P_{Y | X} (\cdot | x) | | Q_{Y | X} (\cdot | x))

. The mutual information and entropy terms are denoted by

I_{P} (\cdot)

and

H_{P} (\cdot)

, respectively, where P denotes the PMF of the relevant random variables. When the PMF is clear from the context, the subscript is omitted. For

(x, y) \in X^{n} \times Y^{n}

, the empirical conditional entropy of

y

given

x

is

H_{e} (y | x) : = H_{P} (\tilde{Y} | \tilde{X})

, where

P_{\tilde{X} \tilde{Y}} = P_{x y}

. For a given function

f : Z \to R

and a random variable

Z \sim P_{Z}

, the log-moment generating function of Z with respect to f is

ψ_{P_{Z}, f} (λ) : = log E_{P_{Z}} [e^{λ f (Z)}]

whenever the expectation exists. Finally, let

\begin{matrix} ψ_{P_{Z}, f}^{*} (θ) : = sup_{λ \in R} θ λ - ψ_{P_{Z}, f} (λ), \end{matrix}

(1)

denote the rate function (see, e.g., Definition 15.5 in [43]).

2.2. Problem Formulation

Let

U

,

V

,

X

and

Y

be finite sets, and

n \in N

. The DHT over a noisy channel setting is depicted in Figure 1. Herein, the observer and the decision maker observe n independent and identically distributed samples, denoted by

u

and

v

, respectively. Based on its observations

u

, the observer outputs a sequence

x \in X^{n}

as the channel input sequence (note that the ratio of the number of channel uses to the number of data samples, termed the bandwidth ratio, is taken to be 1 for simplicity; however, our results easily generalize to arbitrary bandwidth ratios). The discrete memoryless channel (DMC) with transition kernel

P_{Y | X}

produces a sequence

y \in Y^{n}

according to the probability law

P_{Y | X}^{\otimes n} (\cdot | x)

as its output. We will assume that

P_{Y | X} (\cdot | x) ≪ P_{Y | X} (\cdot | x^{'})

,

\forall (x, x^{'}) \in X^{2}

, where

P ≪ Q

indicates the absolute continuity of P with respect to Q. Based on its observations,

y

and

v

, the decision maker performs binary HT on the joint probability distribution of

(U, V)

with the null (

H_{0}

) and alternative (

H_{1}

) hypotheses given by

\begin{matrix} H_{0} : (U, V) \sim P_{U V}^{\otimes n}, \end{matrix}

(2a)

\begin{matrix} H_{1} : (U, V) \sim Q_{U V}^{\otimes n} . \end{matrix}

(2b)

The decision maker outputs

\hat{h} \in \hat{H} : = {0, 1}

as the decision of the hypothesis test, where 0 and 1 denote

H_{0}

and

H_{1}

, respectively.

A length-n DHT code

c_{n}

is a pair of functions

(f_{n}, g_{n})

, where

(i): $f_{n} : U^{n} \to P (X^{n})$ denotes the encoding function;
(ii): $g_{n} : V^{n} \times Y^{n} \to \hat{H}$ denotes a deterministic decision function specified by an acceptance region (for null hypothesis $H_{0}$ ) $A_{n} \subseteq V^{n} \times Y^{n}$ as $g_{n} (v, y) = 1 - 𝟙_{{(v, y) \in A_{n}}}, \forall (v, y) \in V^{n} \times Y^{n}$ .

We emphasize at this point that there is no loss in generality in restricting our attention to a deterministic decision function for the objective of characterizing the error-exponents trade-off in HT (for e.g., see Lemma 3 in [24])).

A code

c_{n} = (f_{n}, g_{n})

induces the joint PMFs

P_{U V X Y \hat{H}}^{(c_{n})}

and

Q_{U V X Y \hat{H}}^{(c_{n})}

under the null and alternative hypotheses, respectively, where

\begin{matrix} P_{U V X Y \hat{H}}^{(c_{n})} (u, v, x, y, \hat{h}) : = P_{U V}^{\otimes n} (u, v) f_{n} (x | u) P_{Y | X}^{\otimes n} (y | x) 𝟙_{\{g_{n} (v, y) = \hat{h}\}}, \end{matrix}

(3)

and

\begin{matrix} Q_{U V X Y \hat{H}}^{(c_{n})} (u, v, x, y, \hat{h}) : = Q_{U V}^{\otimes n} (u, v) f_{n} (x | u) P_{Y | X}^{\otimes n} (y | x) 𝟙_{\{g_{n} (v, y) = \hat{h}\}}, \end{matrix}

(4)

respectively. For a given code

c_{n}

, the type I and type II error probabilities are

α_{n} (c_{n}) : = P_{P^{(c_{n})}} (\hat{H} = 1)

and

β_{n} (c_{n}) : = P_{Q^{(c_{n})}} (\hat{H} = 0)

respectively. The following definition formally states the error-exponents trade-off we aim to characterize.

Definition 1

(Error-exponent region).An error-exponent pair

(κ_{α}, κ_{β}) \in R_{\geq 0}^{2}

is said to be achievable if there exists a sequence of codes

{c_{n}}_{n \in N}

such that

\underset{n \to \infty}{lim inf} - \frac{1}{n} log α_{n} (c_{n}) \geq κ_{α},

(5a)

\underset{n \to \infty}{lim inf} - \frac{1}{n} log β_{n} (c_{n}) \geq κ_{β} .

(5b)

The error-exponent region

\bar{R}

is the closure of the set of all achievable error-exponent pairs

(κ_{α}, κ_{β})

. Set

R : = {(κ_{α}, κ (κ_{α})) : κ_{α} \in (0, κ_{α}^{★})}

, where

κ_{α}^{★} = inf {κ_{α} : κ (κ_{α}) = 0}

and

κ (κ_{α}) : = sup {κ_{β} : (κ_{α}, κ_{β}) \in \bar{R}}

.

We are interested in a computable characterization of

R

, which pertains to the region of positive error-exponents (i.e., excluding the boundary points corresponding to Stein’s exponent). To this end, we present two inner bounds on

R

in the next section.

3. Main Results

In this section, we obtain two inner bounds on

R

, first using a separation-based scheme which performs independent HT and channel coding, termed the SHTCC scheme, and the second via a joint HT and channel coding scheme that uses hybrid coding for communication between the observer and the decision maker.

3.1. Inner Bound on $R$ via SHTCC Scheme

Let

S = X

and

P_{S X Y} = P_{S X} P_{Y | X}

be a PMF under which

S - X - Y

forms a Markov chain. For

x \in X

, let

Λ_{x, P_{S X Y}} (y) : = log (P_{Y | X = x} (y) / P_{Y | S = x} (y))

and define

\begin{matrix} E_{sp} (P_{S X}, θ) : = \sum_{s \in S} P_{S} (s) ψ_{P_{Y | S = s}, Λ_{s, P_{S X Y}}}^{*} (θ), \end{matrix}

where the rate function

ψ^{*}

is defined in (1). For a fixed

P_{S X}

and

R \geq 0

, let

\begin{matrix} E_{ex} (R, P_{S X}) : = max_{ρ \geq 1} - ρ R - ρ log ( & \sum_{s, x, \tilde{x}} P_{S} (s) P_{X | S} (x | s) P_{X | S} (\tilde{x} | s) \\ {(\sum_{y} {(P_{Y | X} (y | x) P_{Y | X} (y | \tilde{x}))}^{\frac{1}{2}})}^{\frac{1}{ρ}}), \end{matrix}

denote the expurgated exponent [38,44]. Let

W

be a finite set and

F

denote the set of all continuous mappings from

P (U)

to

P (W | U)

, where

P (W | U)

is the set of all conditional distributions

P_{W | U}

. Set

θ_{l} (P_{S X}) : = \sum_{s \in S} P_{S} (s) D (P_{Y | S = s} | | P_{Y | X = s})

,

θ_{u} (P_{S X}) : = \sum_{s \in S} P_{S} (s) D (P_{Y | X = s} | | P_{Y | S = s})

,

Θ (P_{S X}) : = (- θ_{l} (P_{S X}), θ_{u} (P_{S X}))

. Denote an arbitrary element of

F \times R_{\geq 0} \times P (S \times X) \times Θ (P_{S X})

by

(ω, R, P_{S X}, θ)

, and set

\begin{matrix} L (κ_{α}) : = \{\begin{matrix} (ω, R, P_{S X}, θ) : & ζ (κ_{α}, ω) - ρ (κ_{α}, ω) \leq R < I_{P} (X; Y | S), P_{S X Y} = P_{S X} P_{Y | X} \\ \min \{E_{sp} (P_{S X}, θ), E_{ex} (R, P_{S X}), E_{b} (κ_{α}, ω, R)\} \geq κ_{α} \end{matrix}\}, \end{matrix}

\begin{matrix} \hat{L} (κ_{α}, ω) : = \{P_{\hat{U} \hat{V} \hat{W}} : D (P_{\hat{U} \hat{V} \hat{W}} | | P_{U V \hat{W}}) \leq κ_{α}, P_{\hat{W} | \hat{U}} = ω (P_{\hat{U}}), P_{U V \hat{W}} = P_{U V} P_{\hat{W} | \hat{U}}\}, \\ E_{b} (κ_{α}, ω, R) : = \{\begin{matrix} R - ζ (κ_{α}, ω) + ρ (κ_{α}, ω), & if 0 \leq R < ζ (κ_{α}, ω), \\ \infty, & otherwise, \end{matrix} \end{matrix}

(6a)

\begin{matrix} ζ (κ_{α}, ω) : = max_{\begin{matrix} P_{\hat{U} \hat{W}} : \exists P_{\hat{V}}, P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω) \end{matrix}} I_{P} (\hat{U}; \hat{W}), \end{matrix}

(6b)

\begin{matrix} ρ (κ_{α}, ω) : = \min_{\begin{matrix} P_{\hat{V} \hat{W}} : \exists P_{\hat{U}}, P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω) \end{matrix}} I_{P} (\hat{V}; \hat{W}), \\ E_{1} (κ_{α}, ω) : = \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{1} (κ_{α}, ω) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W}} | | Q_{\tilde{U} \tilde{V} \tilde{W}}), \end{matrix}

(6c)

\begin{matrix} E_{2} (κ_{α}, ω, R) : = \{\begin{matrix} \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{2} (κ_{α}, ω) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W}} | | Q_{\tilde{U} \tilde{V} \tilde{W}}) + E_{b} (κ_{α}, ω, R), & if R < ζ (κ_{α}, ω), \\ \infty, & otherwise, \end{matrix} \end{matrix}

\begin{matrix} E_{3} (κ_{α}, ω, R, P_{S X}) : = \{\begin{matrix} \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{3} (κ_{α}, ω) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W}} | | Q_{\tilde{U} \tilde{V} \tilde{W}}) + E_{b} (κ_{α}, ω, R) + E_{ex} (R, P_{S X}), \\ if R < ζ (κ_{α}, ω), \\ \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{3} (κ_{α}, ω) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W}} | | Q_{\tilde{U} \tilde{V} \tilde{W}}) + ρ (κ_{α}, ω) + E_{ex} (R, P_{S X}), \\ otherwise, \end{matrix} \end{matrix}

\begin{matrix} E_{4} (κ_{α}, ω, R, P_{S X}, θ) : = \{\begin{matrix} \min_{P_{\hat{V}} : P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω)} D (P_{\hat{V}} | | Q_{V}) + E_{b} (κ_{α}, ω, R) + E_{m} (P_{S X}, θ) - θ, \\ if R < ζ (κ_{α}, ω), \\ \min_{P_{\hat{V}} : P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω)} D (P_{\hat{V}} | | Q_{V}) + ρ (κ_{α}, ω) + E_{m} (P_{S X}, θ) - θ, \\ otherwise, \end{matrix} \\ where, \end{matrix}

\begin{matrix} T_{1} (κ_{α}, ω) : = \{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) : & P_{\tilde{U} \tilde{W}} = P_{\hat{U} \hat{W}}, P_{\tilde{V} \tilde{W}} = P_{\hat{V} \hat{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}} \\ : = Q_{U V} P_{\tilde{W} | \tilde{U}} for some P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω) \end{matrix}\}, \\ T_{2} (κ_{α}, ω) : = \{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) : & P_{\tilde{U} \tilde{W}} = P_{\hat{U} \hat{W}}, P_{\tilde{V}} = P_{\hat{V}}, H_{P} (\tilde{W} | \tilde{V}) \geq H_{P} (\hat{W} | \hat{V}), \\ Q_{\tilde{U} \tilde{V} \tilde{W}} : = Q_{U V} P_{\tilde{W} | \tilde{U}} for some P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω) \end{matrix}\}, \\ T_{3} (κ_{α}, ω) : = \{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) : & P_{\tilde{U} \tilde{W}} = P_{\hat{U} \hat{W}}, P_{\tilde{V}} = P_{\hat{V}}, Q_{\tilde{U} \tilde{V} \tilde{W}} : = Q_{U V} P_{\tilde{W} | \tilde{U}} \\ for some P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω) \end{matrix}\} . \end{matrix}

(6d)

We have the following lower bound for

κ (κ_{α})

, which translates to an inner bound for

R

.

Theorem 1

(Inner bound via SHTCC scheme).

κ (κ_{α}) \geq κ_{s}^{★} (κ_{α}),

where

\begin{matrix} κ_{s}^{★} (κ_{α}) & : = max_{\begin{matrix} (ω, R, P_{S X}, θ) \in L (κ_{α}) \end{matrix}} \min {E_{1} (κ_{α}, ω), E_{2} (κ_{α}, ω, R), E_{3} (κ_{α}, ω, R, P_{S X}), \\ E_{4} (κ_{α}, ω, R, P_{S X}, θ)} . \end{matrix}

(7)

The proof of Theorem 1 is presented in Section 4.1. The SHTCC scheme, which achieves the error-exponent pair

(κ_{α}, κ_{s}^{★} (κ_{α}))

, is a coding scheme analogous to separate source and channel coding for the lossy transmission of a source over a communication channel with correlated side-information at the receiver [45], however, with the objective of reliable HT. In this scheme, the source samples are first compressed to an index, which acts as the message to be transmitted over the channel. But, in contrast to standard communication problems, there is a need to protect certain messages more reliably than others; hence, an unequal error-protection scheme [39,42] is used. To describe briefly, the SHTCC scheme involves

(i)

the quantization and binning of

u

sequences, whose type

P_{u}

is within a

κ_{α}

-neighborhood (in terms of KL divergence) of

P_{U}

, using

V

as side information at the decision maker for decoding, and

(i i)

unequal error-protection channel coding scheme in [39] for protecting a special message which informs the decision maker that

P_{u}

lies outside the

κ_{α}

-neighborhood of

P_{U}

. The output of the channel decoder is processed by an empirical conditional entropy decoder which recovers the quantization codeword with the least conditional entropy with

V

. Since this decoder depends only on the empirical distributions of the observations, it is universal and useful in the hypothesis testing context, where multiple distributions are involved (as was first noted in [8]). The various factors

E_{1}

to

E_{4}

in (7) have natural interpretations in terms of events that could possibly result in a hypothesis testing error. Specifically,

E_{1}

and

E_{2}

correspond to the error events arising due to quantization and binning, respectively, while

E_{3}

and

E_{4}

correspond to the error events of wrongly decoding an ordinary channel codeword and special message codeword, respectively.

Remark 1

(Generalization of Han–Kobayashi inner bound). In Theorem 1 in [14], Han and Kobayashi obtained an inner bound on

R

for DHT over a noiseless channel. At a high level, their coding scheme involves type-based quantization of

u \in U^{n}

sequences, whose type

P_{u}

lies within a

κ_{α}

-neighborhood of

P_{U}

, where

κ_{α}

is the desired type I error-exponent. As a corollary, Theorem 1 recovers the lower bound for

κ (κ_{α})

obtained in [14] by

(i)

setting

E_{ex} (R, P_{S X})

,

E_{m} (P_{S X}, θ)

and

E_{m} (P_{S X}, θ) - θ

to ∞, which hold when the channel is noiseless, and

(i i)

maximizing over the set

\{(ω, R, P_{S X}, θ) \in F \times R_{\geq 0} \times P (S \times X) \times Θ (P_{S X}) : ζ (κ_{α}, ω) \leq R < I_{P} (X; Y | S), P_{S X Y} : = P_{S X} P_{Y | X}\} \subseteq L (κ_{α})

in (7). Then, note that the terms

E_{2} (κ_{α}, ω, R)

,

E_{3} (κ_{α}, ω, R, P_{S X})

and

E_{4} (κ_{α}, ω, R, P_{S X}, θ)

all equal ∞, and thus the inner bound in Theorem 1 reduces to that given in Theorem 1 in [14].

Remark 2

(Improvement via time-sharing). Since the lower bound on

κ (κ_{α})

in Theorem 1 is not necessarily concave, a tighter bound can be obtained using the technique of time-sharing similar to Theorem 3 in [14]. We omit its description, as it is cumbersome, although straightforward.

Theorem 1 also recovers the lower bound for the optimal type II error-exponent for a fixed type I error probability constraint established in Theorem 2 in [10] by letting

κ_{α} \to 0

. The details are provided in Appendix A. Further, specializing the lower bound in Theorem 1 to the case of TAI, i.e., when

Q_{U V} = P_{U} P_{V}

, we obtain the following corollary which characterizes the optimal type II error-exponent for TAI established in Proposition 7 in [10] as a special case.

Corollary 1

(Inner bound for TAI). Let

P_{U V} \in P (U \times V)

be an arbitrary distribution and

Q_{U V} = P_{U} P_{V}

. Then,

\begin{matrix} κ (κ_{α}) \geq κ_{s}^{★} (κ_{α}) \geq κ_{i}^{★} (κ_{α}), \end{matrix}

(8)

where

\begin{matrix} κ_{i}^{★} (κ_{α}) : = max_{\begin{matrix} (ω, P_{S X}, θ) \in L^{★} (κ_{α}) \end{matrix}} \min \{E_{1}^{i} (κ_{α}, ω), E_{2}^{i} (κ_{α}, ω, P_{S X}), E_{3}^{i} (κ_{α}, ω, P_{S X}, θ)\}, \\ L^{★} (κ_{α}) : = \{\begin{matrix} (ω, P_{S X}, θ) \in F \times P (S \times X) \times Θ (P_{S X}) : ζ (κ_{α}, ω) < I_{P} (X; Y | S), \\ P_{S X Y} : = P_{S X} P_{Y | X}, \min \{E_{sp} (P_{S X}, θ), E_{ex} (ζ (κ_{α}, ω), P_{S X})\} \geq κ_{α} \end{matrix}\}, \\ E_{1}^{i} (κ_{α}, ω) : = \min_{\begin{matrix} P_{\hat{V} \hat{W}} : \exists P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω) \end{matrix}} I_{P} (\hat{V}; \hat{W}) + D (P_{\hat{V}} | | P_{V}), \\ E_{2}^{i} (κ_{α}, ω, P_{S X}) : = ρ (κ_{α}, ω) + E_{ex} (ζ (κ_{α}, ω), P_{S X}), \\ E_{3}^{i} (κ_{α}, ω, P_{S X}, θ) : = ρ (κ_{α}, ω) + E_{sp} (P_{S X}, θ) - θ, \end{matrix}

(9)

and

\hat{L} (κ_{α}, ω)

,

ζ (κ_{α}, ω)

and

ρ (κ_{α}, ω)

are defined in (6a), (6b) and (6c), respectively. In particular,

\begin{matrix} lim_{κ_{α} \to 0} κ (κ_{α}) = κ_{s}^{★} (0) = κ_{i}^{★} (0) = max_{\begin{matrix} P_{W | U} : I_{P} (U; W) \leq C (P_{Y | X}), \\ P_{U V W} = P_{U V} P_{W | U} \end{matrix}} I_{P} (V; W), \end{matrix}

(10)

where

| W | \leq | U | + 1

and

C (P_{Y | X})

denotes the capacity of the channel

P_{Y | X}

.

The proof of Corollary 1 is given in Section 4.2. Its achievability follows from a special case of the SHTCC scheme without binning at the encoder.

Next, we consider testing against dependence (TAD) for which

Q_{U V}

is an arbitrary joint distribution and

P_{U V} = Q_{U} Q_{V}

. Theorem 1 specialized to TAD gives the following corollary.

Corollary 2

(Inner bound for TAD). Let

Q_{U V} \in P (U \times V)

be an arbitrary distribution and

P_{U V} = Q_{U} Q_{V}

. Then,

\begin{matrix} κ (κ_{α}) \geq κ_{s}^{★} (κ_{α}) = κ_{d}^{★} (κ_{α}) : = max_{\begin{matrix} (ω, P_{S X}, θ) \\ \in L^{★} (κ_{α}) \end{matrix}} \min \{E_{1}^{d} (κ_{α}, ω), E_{2}^{d} (κ_{α}, ω, P_{S X}), E_{3}^{d} (P_{S X}, θ)\}, \end{matrix}

(11)

where

\begin{matrix} E_{1}^{d} (κ_{α}, ω) : = \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \\ \in T_{1} (κ_{α}, ω) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W}} | | Q_{\tilde{U} \tilde{V} \tilde{W}}) \geq \min_{\begin{matrix} (P_{\hat{V} \hat{W}}, Q_{V \hat{W}}) : P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω), \\ Q_{U V \hat{W}} = Q_{U V} P_{\hat{W} | \hat{U}} \end{matrix}} D (P_{\hat{V} \hat{W}} | | Q_{V \hat{W}}), \\ E_{2}^{d} (κ_{α}, ω, P_{S X}) : = E_{ex} (ζ (κ_{α}, ω), P_{S X}), \\ E_{3}^{d} (P_{S X}, θ) : = E_{sp} (P_{S X}, θ) - θ, \end{matrix}

and

\hat{L} (κ_{α}, ω)

,

T_{1} (κ_{α}, ω)

and

L^{★} (κ_{α})

are given in (6a), (6d) and (9), respectively. In particular,

\begin{matrix} lim_{κ_{α} \to 0} κ (κ_{α}) \geq κ_{s}^{★} (0) = κ_{d}^{★} (0) \geq κ_{TAD}^{★}, \end{matrix}

(12)

where

\begin{matrix} κ_{TAD}^{★} = max_{\begin{matrix} (P_{W | U}, P_{S X}) : \\ I_{Q} (W; U) \leq I_{P} (X; Y | S), \\ Q_{U V W} = Q_{U V} P_{W | U}, \\ P_{S X Y} = P_{S X} P_{Y | X} \end{matrix}} \min \{D (Q_{V} Q_{W} | | Q_{V W}), E_{ex} (I_{Q} (U; W), P_{S X}), θ_{l} (P_{S X})\}, \end{matrix}

and

| W | \leq | U | + 1

.

The proof of Corollary 2 is given in Section 4.3. Note that the expression for

κ_{s}^{★} (κ_{α})

given in (11) is relatively simpler to compute compared to that in Theorem 1. This will be handy in showing that the JHTCC scheme strictly outperforms the SHTCC scheme, which we highlight via an example in Section 3.3 below.

3.2. Inner Bound via JHTCC Scheme

It is well known that joint source-channel coding schemes offer advantages over separation-based coding schemes in several information theoretic problems, such as the transmission of correlated sources over a multiple-access channel [40,46] and the error-exponent in the lossless or lossy transmission of a source over a noisy channel [42,47]. Recently, it was shown via an example in [10] that joint schemes also achieve a strictly larger type II error-exponent in DHT problems compared to a separation-based scheme in some scenarios. Motivated by this, we present an inner bound on

R

using a generalization of the JHTCC scheme in [10].

Let

W

and

S

be arbitrary finite sets, and

F^{'}

denote the set of all continuous mappings from

P (U \times S)

to

P (W | U \times S)

, where

P (W | U \times S)

is the set of all conditional distributions

P_{W | U S}

. Let

(P_{S}, ω^{'} (\cdot, P_{S}), P_{X | U S W}, P_{X^{'} | U S})

denote an arbitrary element of

P (S) \times F^{'} \times P (X | U \times S \times W) \times P (X | U \times S)

, and define

\begin{matrix} L_{h} (κ_{α}) : = \{(P_{S}, ω^{'} (\cdot, P_{S}), P_{X | U S W}, P_{X^{'} | U S}) : E_{b}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) \geq κ_{α}\}, \\ {\hat{L}}_{h} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) \\ : = \{\begin{matrix} (P_{\hat{U} \hat{V} \hat{W} \hat{Y} S} : & D (P_{\hat{U} \hat{V} \hat{W} \hat{Y} | S} | | P_{U V \hat{W} Y | S} | P_{S}) \leq κ_{α}, P_{S U V \hat{W} X Y} : = P_{S} P_{U V} P_{\hat{W} | \hat{U} S} P_{X | U S W} P_{Y | X}, \\ P_{\hat{W} | \hat{U} S} = ω^{'} (P_{\hat{U}}, P_{S}) \end{matrix}\}, \\ E_{b}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) : = ρ^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) - ζ_{q}^{'} (κ_{α}, ω^{'}, P_{S}), \\ ζ^{'} (κ_{α}, ω^{'}, P_{S}) : = max_{\begin{matrix} P_{\hat{U} \hat{W} S} : \exists P_{\hat{V} \hat{Y}} s . t . \\ P_{\hat{U} \hat{V} \hat{W} \hat{Y} S} \in {\hat{L}}_{h} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) \end{matrix}} I_{P} (\hat{U}; \hat{W} | S), \\ ρ^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) : = \min_{\begin{matrix} P_{\hat{V} \hat{W} \hat{Y} S} : \exists P_{\hat{U}} s . t . \\ P_{\hat{U} \hat{V} \hat{W} \hat{Y} S} \in {\hat{L}}_{h} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) \end{matrix}} I_{P} (\hat{Y}, \hat{V}; \hat{W} | S), \\ E_{1}^{'} (κ_{α}, ω^{'}) : = \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} S}, Q_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} S}) \in T_{1}^{'} (κ_{α}, ω^{'}) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} | S} | | Q_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} | S} | P_{S}), \\ E_{2}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) : = \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} S}, Q_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} S}) \in T_{2}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} | S} | | Q_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} | S} | P_{S}) \\ + E_{b}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}), \\ E_{3}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}, P_{X^{'} | U S}) : = \min_{\begin{matrix} P_{\hat{V} \hat{Y} S} : P_{\hat{U} \hat{V} \hat{W} \hat{Y} S} \in {\hat{L}}_{h} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) \end{matrix}} D (P_{\hat{V} \hat{Y} | S} | | Q_{V Y^{'} | S} | P_{S}) \\ + E_{b}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}), \\ Q_{S U V X^{'} Y^{'}} : = P_{S} Q_{U V} P_{X^{'} | U S} P_{Y^{'} | X^{'}}, P_{Y^{'} | X^{'}} : = P_{Y | X}, \\ T_{1}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) : = \{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} S}, & P_{\tilde{U} \tilde{W} S} = P_{\hat{U} \hat{W} S}, P_{\tilde{V} \tilde{W} \tilde{Y} S} = P_{\hat{V} \hat{W} \hat{Y} S}, \\ Q_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} S}) : & Q_{S \tilde{U} \tilde{V} \tilde{W} \tilde{X} \tilde{Y}} : = P_{S} Q_{U V} P_{\tilde{W} | \tilde{U} S} P_{X | U S W} P_{Y | X} \\ for some P_{\hat{U} \hat{V} \hat{W} \hat{Y} S} \in {\hat{L}}_{h} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) \end{matrix}\}, \\ T_{2}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) : = \{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} S}, & P_{\tilde{U} \tilde{W} S} = P_{\hat{U} \hat{W} S}, P_{\tilde{V} \tilde{Y} S} = P_{\hat{V} \hat{Y} S}, \\ Q_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} S}) : & H_{P} (\tilde{W} | \tilde{V}, \tilde{Y}, S) \geq H_{P} (\hat{W} | \hat{V}, \hat{Y}, S), \\ Q_{S \tilde{U} \tilde{V} \tilde{W} \tilde{X} \tilde{Y}} : = P_{S} Q_{U V} P_{\tilde{W} | \tilde{U} S} P_{X | U S W} P_{Y | X} \\ for some P_{\hat{U} \hat{V} \hat{W} \hat{Y} S} \in {\hat{L}}_{h} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) \end{matrix}\} . \end{matrix}

Then, we have the following result.

Theorem 2

(Inner bound via JHTCC scheme).

\begin{matrix} κ (κ_{α}) \geq max \{κ_{h}^{★} (κ_{α}), κ_{u}^{★} (κ_{α})\}, \end{matrix}

(13)

where

\begin{matrix} κ_{h}^{★} (κ_{α}) : = max_{\begin{matrix} (P_{S}, ω^{'}, P_{X | U S W}, P_{X^{'} | U S}) \in L_{h} (κ_{α}) \end{matrix}} \min {E_{1}^{'} (κ_{α}, ω^{'}), E_{2}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}), \end{matrix}

\begin{matrix} E_{3}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}, P_{X^{'} | U S})}, \\ κ_{u}^{★} (κ_{α}) : = max_{\begin{matrix} (P_{S}, P_{X | U S}) \in P (S) \times P (X | S \times U) \end{matrix}} κ_{u} (κ_{α}, P_{S}, P_{X | U S}), \\ κ_{u} (κ_{α}, P_{S}, P_{X | U S}) : = \min_{\begin{matrix} P_{S} P_{\hat{V} \hat{Y}} : D (P_{\hat{V} \hat{Y} | S} | | P_{V Y | S} | P_{S}) \leq κ_{α} \end{matrix}} D (P_{\hat{V} \hat{Y} | S} | | Q_{V Y | S} | P_{S}), \\ P_{S U V X Y} = P_{S} P_{U V} P_{X | U S} P_{Y | X} a n d Q_{S U V X Y} = P_{S} Q_{U V} P_{X | U S} P_{Y | X} . \end{matrix}

The proof of Theorem 2 is given in Section 4.4, and utilizes a generalization of hybrid coding scheme [40] to achieve the stated inner bound. Specifically, the error-exponent pair

(κ_{α}, κ_{h}^{★} (κ_{α}))

is achieved using type-based hybrid coding, while

(κ_{α}, κ_{u}^{★} (κ_{α}))

is realized by uncoded transmission, in which the channel input

X

is generated as the output of a DMC

P_{X | U}

with input

U

(along with time sharing). In standard hybrid coding, the source sequence is first quantized via joint typicality and the channel input is then chosen as a function of both the original source sequence and its quantization. At the decoder, the quantized codeword is first recovered using the channel output and side information via joint typicality decoding, and an estimate of the source sequence is output as a function of the channel output and recovered codeword. The quantization part forms the digital part of the scheme, while the use of the source sequence for encoding and channel output for decoding comprises the analog part. The scheme derives its name from these joint hybrid digital-analog operations. In the HT context considered here, the aforementioned source quantization is replaced by a type-based quantization at the encoder, and the joint typicality decoder is replaced by a universal empirical conditional entropy decoder. We note that Theorem 2 recovers the lower bound on the optimal type II error-exponent proved in Theorem 5 in [10]. The details are provided in Appendix B.

Next, we provide a comparison between the SHTCC and JHTCC bounds via an example as mentioned earlier.

3.3. Comparison of Inner Bounds

We compare the inner bounds established in Theorem 1 and Theorem 2 for a simple setting of TAD over a BSC. For this purpose, we will use the inner bound

κ_{d}^{★} (κ_{α})

stated in Corollary 2 and

κ_{u}^{★} (κ_{α})

that is achieved by uncoded transmission. Our objective is to illustrate that the JHTCC scheme achieves a strictly tighter bound on

R

compared to the SHTCC scheme, at least for some points of the trade-off.

Example 1.

Let

p, q \in [0, 0.5]

,

U = V = X = Y = S = {0, 1}

,

Q_{U V} = [\begin{matrix} q & 0.5 - q \\ 0.5 - q & q \end{matrix}], P_{Y | X} = [\begin{matrix} 1 - p & p \\ p & 1 - p \end{matrix}], a n d P_{U V} = Q_{U} Q_{V} .

A comparison of the inner bounds achieved by the SHTCC and JHTCC schemes for the above example are shown in Figure 2 and Figure 3, where we plot the error-exponents trade-off achieved by uncoded transmission (a lower bound for the JHTCC scheme), and the expurgated exponent at a zero rate:

\begin{matrix} E_{ex} (0) : = max_{P_{S X} \in P (S \times X)} E_{ex} (P_{S X}, 0) = - 0.25 log (4 p (1 - p)), \end{matrix}

which is an upper bound on

κ_{d}^{★} (κ_{α})

for any

κ_{α} \geq 0

. To compute

E_{ex} (0)

, we used the closed-form expression for

E_{ex} (\cdot)

given in Problem 10.26(c) in [38]. Clearly, it can be seen that the JHTCC scheme outperforms SHTCC scheme for

κ_{α}

below a threshold, which depends on the source and channel distributions. In particular, the threshold below which improvement is seen is reduced when the channel or the source becomes more uniform. The former behavior can be seen directly by comparing the subplots in Figure 2 and Figure 3, while the latter can be noted by comparing Figure 2a with Figure 3a, or Figure 2b with Figure 3b.

4. Proofs

4.1. Proof of Theorem 1

We will show the achievability of the error-exponent pair

(κ_{α}, κ_{s}^{★} (κ_{α}))

by constructing a suitable ensemble of HT codes, and showing that the expected type I and type II error probabilities (over this ensemble) satisfy (5) for the pair

(κ_{α}, κ_{s}^{★} (κ_{α}))

. Then, an expurgation argument [44] will be used to show the existence of a HT code that satisfies (5) for the same error-exponent pair, thus showing that

(κ_{α}, κ_{s}^{★} (κ_{α})) \in R

as desired.

Let

n \in N

,

| W | < \infty

,

κ_{α} > 0

,

(ω, R, P_{S X}, θ) \in L (κ_{α})

,

R^{'} : = ζ (κ_{α}, ω)

, and

η > 0

be a small number. Additionally, suppose that

R \geq 0

satisfies

\begin{matrix} ζ (κ_{α}, ω) - ρ (κ_{α}, ω) \leq R < I_{P} (X; Y | S), \end{matrix}

(14)

where

ζ (κ_{α}, ω)

and

ρ (κ_{α}, ω)

are defined in (6b) and (6c), respectively. The SHTCC scheme is as follows:

Encoding: The observer’s encoder is composed of two stages, a source encoder followed by a channel encoder.
Source encoder: The source encoding comprises a quantization scheme followed by binning to reduce the rate if necessary.
Quantization codebook: Let

\begin{matrix} D_{n} (P_{U}, η) : = \{P_{\hat{U}} \in T (U^{n}) : D (P_{\hat{U}} | | P_{U}) \leq κ_{α} + η\} . \end{matrix}

(15)

Consider some ordering on the types in

D_{n} (P_{U}, η)

and denote the elements as

P_{{\hat{U}}_{i}}

for

i \in [| D_{n} (P_{U}, η) |]

. For each type

P_{{\hat{U}}_{i}} \in D_{n} (P_{U}, η)

,

i \in [| D_{n} (P_{U}, η) |]

, choose a joint type variable

P_{{\hat{U}}_{i} {\hat{W}}_{i}} \in T (U^{n} \times W^{n})

such that

\begin{matrix} D (P_{{\hat{W}}_{i} | {\hat{U}}_{i}} | | P_{W_{i} | U} | P_{{\hat{U}}_{i}}) \leq \frac{η}{3}, \end{matrix}

(16a)

\begin{matrix} I_{P} ({\hat{U}}_{i}; {\hat{W}}_{i}) \leq R^{'} + \frac{η}{3}, \end{matrix}

(16b)

where

P_{W_{i} | U} = ω (P_{{\hat{U}}_{i}})

. Note that this is always possible for n sufficiently large by the definition of

R^{'}

.

Let

\begin{matrix} D_{n} (P_{U W}, η) & : = \{P_{{\hat{U}}_{i} {\hat{W}}_{i}} : i \in [| D_{n} (P_{U}, η) |]\}, \end{matrix}

(17a)

\begin{matrix} R_{i}^{'} & : = I_{P} ({\hat{U}}_{i}; {\hat{W}}_{i}) + (η / 3), i \in [| D_{n} (P_{U}, η) |], \end{matrix}

(17b)

\begin{matrix} M_{i}^{'} & : = [1 + \sum_{k = 1}^{i - 1} e^{n R_{k}^{'}} : \sum_{k = 1}^{i} e^{n R_{k}^{'}}], \end{matrix}

(17c)

and

B_{W, n} = \{W (j), 1 \leq j \leq \sum_{i = 1}^{| D_{n} (P_{U}, η) |} | M_{i}^{'} |\}

denote a random quantization codebook such that the codeword

W (j) \sim Unif [T_{n} (P_{{\hat{W}}_{i}})]

, if

j \in M_{i}^{'}

for some

i \in [| D_{n} (P_{U}, η) |]

. Denote a realization of

B_{W, n}

by

B_{W, n} = \{w (j) \in W^{n}, 1 \leq j \leq \sum_{i = 1}^{| D_{n} (P_{U}, η) |} | M_{i}^{'} |\}

.

Quantization scheme: For a given codebook

B_{W, n}

and

u \in T_{n} (P_{{\hat{U}}_{i}})

such that

P_{{\hat{U}}_{i}} \in D_{n} (P_{U}, η)

for some

i \in [| D_{n} (P_{U}, η) |]

, let

\tilde{M} (u, B_{W, n}) : = \{j \in M_{i}^{'} : w (j) \in B_{W, n}, (u, w (j)) \in T_{n} (P_{{\hat{U}}_{i} {\hat{W}}_{i}}), P_{{\hat{U}}_{i} {\hat{W}}_{i}} \in D_{n} (P_{U W}, η)\} .

If

| \tilde{M} (u, B_{W, n}) | \geq 1

, let

M^{'} (u, B_{W, n})

denote an index selected uniformly at random from the set

\tilde{M} (u, B_{W, n})

, otherwise, set

M^{'} (u, B_{W, n}) = 0

. Denoting the support of

M^{'} (u, B_{W, n})

by

M^{'}

, we have for sufficiently large n that

\begin{matrix} | M^{'} | \leq 1 + \sum_{i = 1}^{| D_{n} (P_{U}, η) |} e^{n R_{i}^{'}} & \leq 1 + | D_{n} (P_{U}, η) | e^{{\overset{max}{P_{\hat{U} \hat{W}} \in D_{n} (P_{U W}, η)}}^{n I (\hat{U}; \hat{W}) + (n η / 3)}} \leq e^{n (R^{'} + η)}, \end{matrix}

(18)

where the last inequality uses (16b) and

| D_{n} (P_{U}, η) | \leq {(n + 1)}^{| U |}

. Binning: If

| M^{'} | > | M |

, then the source encoder performs binning as described below. Let

R_{n} : = log (e^{n R} / | D_{n} (P_{U}, η) |)

,

M_{i} : = [1 + (i - 1) R_{n} : i R_{n}], i \in [| D_{n} (P_{U}, η) |]

, and

M : = {0} ⋃ \{\cup_{i \in [| D_{n} (P_{U}, η) |]} M_{i}\} .

Note that

\begin{matrix} e^{n R_{n}} \geq e^{n R - | U | log (n + 1)} . \end{matrix}

(19)

Let

f_{B}

denote the random binning function such that for each

j \in M_{i}^{'}

,

f_{B} (j) \sim Unif [| M_{i} |]

for

i \in [| D_{n} (P_{U}, η) |]

, and

f_{B} (0) = 0

with probability one. Denote a realization of

f_{B} (j)

by

f_{b}

, where

f_{b} : M^{'} \to M

. Given a codebook

B_{W, n}

and binning function

f_{b}

, the source encoder outputs

M = f_{b} (M^{'} (u, B_{W, n}))

for

u \in U^{n}

. If

| M^{'} | \leq | M |

, then

f_{b}

is taken to be the identity map (no binning), and in this case,

M = M^{'} (u, B_{W, n})

.

Channel codebook: Let

B_{X, n} : = {X (m) \in X^{n}, m \in M}

denote a random channel codebook generated as follows. Without loss of generality, denote the elements of the set

S = X

as

1, \dots, | X |

. The codeword length n is divided into

| S | = | X |

blocks, where the length of the first block is

⌈ P_{S} (1) n ⌉

, the second block is

⌈ P_{S} (2) n ⌉

, so on and so forth, and the length of the last block is chosen such that the total length is n. For

i \in [| X |]

, let

k_{i} : = \sum_{l = 1}^{i - 1} ⌈ P_{S} (l) n ⌉ + 1

and

{\bar{k}}_{i} : = \sum_{l = 1}^{i} ⌈ P_{S} (l) n ⌉

, where the empty sum is defined to be zero. Let

s \in X^{n}

be such that

s_{k_{i}}^{{\bar{k}}_{i}} = i

, i.e., the elements of

s

equal i in the

i^{t h}

block for

i \in [| X |]

. Let

X (0) = s

with probability one, and the remaining codewords

X (m), m \in M ∖ {0}

be constant composition codewords [38] selected such that

X_{k_{i}}^{{\bar{k}}_{i}} (m) \sim Unif [T_{⌈ P_{S} (i) n ⌉} ({\hat{P}}_{X | S} (\cdot | i))]

, where

{\hat{P}}_{X | S}

is such that

T_{⌈ P_{S} (i) n ⌉} ({\hat{P}}_{X | S} (\cdot | i))

is non-empty and

D ({\hat{P}}_{X | S} | | P_{X | S} | P_{S}) \leq \frac{η}{3}

. Denote a realization of

B_{X, n}

by

B_{X, n} : = {x (m) \in X^{n}, m \in M}

. Note that for

m \in M ∖ {0}

and large n, the codeword pair

(x (0), x (m))

has joint type (approx)

P_{x (0) x (j)} = {\hat{P}}_{S X} : = P_{S} {\hat{P}}_{X | S}

.

Channel encoder: For a given

B_{X, n}

, the channel encoder outputs

x = x (m)

for output m from the source encoder. Denote this map by

f_{B_{X, n}} : M \to X^{n}

.

Encoder: Denote by

f_{n} : U^{n} \to P (X^{n})

the encoder induced by all the above operations, i.e.,

f_{n} (\cdot | u) = f_{B_{X, n}} \circ f_{b} (M^{'} (u, B_{W, n}))

.

Decision function: The decision function consists of three parts, a channel decoder, a source decoder and a tester.

Channel decoder: The channel decoder first performs a Neyman–Pearson test on the channel output

y

according to

{\tilde{Π}}_{θ} : Y^{n} \to {0, 1}

, where

\begin{matrix} {\tilde{Π}}_{θ} (y) & : = 𝟙 (\sum_{i = 1}^{n} log (\frac{P_{Y | X} (y_{i} | s_{i})}{P_{Y | S} (y_{i} | s_{i})}) \geq n θ) . \end{matrix}

(20)

If

{\tilde{Π}}_{θ} (y) = 1

, then

\hat{M} = 0

. Else, for a given

B_{X, n}

, maximum likelihood (ML) decoding is done on the remaining set of codewords

{x (m), m \in M ∖ {0}}

, and

\hat{M}

is set equal to the ML estimate. Denote the channel decoder induced by the above operations by

g_{B_{X, n}}

, where

g_{B_{X, n}} : Y^{n} \to M

.

For a given codebook

B_{X, n}

, the channel encoder–decoder pair described above induces a distribution

\begin{matrix} P_{X Y \hat{M} | M}^{(B_{X, n})} (m, x, y, \hat{m} | m) : = 𝟙_{\{f_{B_{X, n}} (m) = x\}} P_{Y | X}^{\otimes n} (y | x) 𝟙_{\{\hat{m} = g_{B_{X, n}}\}} . \end{matrix}

Note that

P_{x (0) x (m)} = {\hat{P}}_{S X}

,

Y \sim \prod_{i = 1}^{| X |} P_{Y | X}^{\otimes ⌈ P_{S} (i) n ⌉} (\cdot | i)

for

M = 0

and

Y \sim \prod_{i = 1}^{| X |} P_{Y | S}^{\otimes ⌈ P_{S} (i) n ⌉} (\cdot | i)

for

M = m \neq 0

. Then, it follows by an application of Proposition A1 proved in Appendix C that for any

B_{X, n}

and n sufficiently large, the Neyman–Pearson test in (20) yields

\begin{matrix} P_{P^{(B_{X, n})}} (\hat{M} = 0 | M = m) & \leq e^{- n (E_{sp} (P_{S X}, θ) - η)}, m \in M ∖ {0}, \end{matrix}

(21a)

\begin{matrix} P_{P^{(B_{X, n})}} (\hat{M} \neq 0 | M = 0) & \leq e^{- n (E_{sp} (P_{S X}, θ) - θ - η)} . \end{matrix}

(21b)

Moreover, given

\hat{M} \neq 0

, a random coding argument over the ensemble of

B_{X}^{n}

(see Exercise 10.18, 10.24 in [38,44]) shows that there exists a deterministic codebook

B_{X, n}

such that (21a) and (21b) holds, and the ML decoding described above asymptotically achieves

\begin{matrix} P_{P^{(B_{X, n})}} (\hat{M} \neq m | M = m \neq 0, \hat{M} \neq 0) & \leq e^{- n (E_{ex} (R, P_{S X}) - η)} . \end{matrix}

(22)

This deterministic codebook

B_{X, n}

is used for channel coding.

Source decoder: For a given codebook

B_{W, n}

and inputs

\hat{M} = \hat{m}

and

V = v

, the source decoder first decodes for the quantization codeword

w ({\hat{m}}^{'})

(if required) using the empirical conditional entropy decoder, and then declares the output

\hat{H}

of the hypothesis test based on

w ({\hat{m}}^{'})

and

v

. More specifically, if binning is not performed, i.e., if

| M | \geq | M^{'} |

,

{\hat{M}}^{'} = \hat{m}

. Otherwise,

{\hat{M}}^{'} = {\hat{m}}^{'}

, where

{\hat{m}}^{'} = 0

if

\hat{m} = 0

and

{\hat{m}}^{'} = {arg min}_{j : f_{b} (j) = \hat{m}} H_{e} (w (j) | v)

otherwise. Denote the source decoder induced by the above operations by

g_{B_{W, n}} : M \times V^{n} \to M^{'}

.

Testing and Acceptance region: If

{\hat{m}}^{'} = 0

,

\hat{H} = 1

is declared. Otherwise,

\hat{H} = 0

or

\hat{H} = 1

is declared depending on whether

({\hat{m}}^{'}, v) \in A_{n}

or

({\hat{m}}^{'}, v) \notin A_{n}

, respectively, where

A_{n}

denotes the acceptance region for

H_{0}

as specified next. For a given codebook

B_{W, n}

, let

O_{m^{'}}

denote the set of

u

such that the source encoder outputs

m^{'}

,

m^{'} \in M^{'} ∖ {0}

. For each

m^{'} \in M^{'} ∖ {0}

and

u \in O_{m^{'}}

, let

\begin{matrix} Z_{m^{'}} (u) = {v \in V^{n} : (w (m^{'}), u, v) \in J_{n} (κ_{α} + η, P_{W_{m^{'}} U V})}, \end{matrix}

where

J_{n} (r, P_{X}) : = {x \in X^{n} : D (P_{x} | | P_{X}) \leq r}

,

\begin{matrix} P_{U V W_{m^{'}}} : = P_{U V} P_{W_{m^{'}} | U} and P_{W_{m^{'}} | U} = ω (P_{u}) . \end{matrix}

(23)

For

m^{'} \in M^{'} ∖ {0}

, set

Z_{m^{'}} : = {v : v \in Z_{m^{'}} (u) for some u \in O_{m^{'}}}

, and define the acceptance region for

H_{0}

at the decision maker as

A_{n} : = \cup_{m^{'} \in M^{'} ∖ 0} m^{'} \times Z_{m^{'}}

or equivalently as

A_{n}^{e} : = \cup_{m^{'} \in M^{'} ∖ 0} O_{m^{'}} \times Z_{m^{'}}

. Note that

A_{n}

is the same as the acceptance region for

H_{0}

in Theorem 1 in [14]. Denote the decision function induced by

g_{B_{X, n}}

,

g_{B_{W, n}}

and

A_{n}

by

g_{n} : Y^{n} \times V^{n} \to \hat{H}

.

Induced probability distribution: The PMFs induced by a code

c_{n} = (f_{n}, g_{n})

with respect to codebook

B_{n} : = (B_{W, n}, f_{b}, B_{X, n})

under

H_{0}

and

H_{1}

are

\begin{matrix} P_{UV M^{'} M XY \hat{M} {\hat{M}}^{'} \hat{H}}^{(B_{n}, c_{n})} (u, v, m^{'}, m, x, y, \hat{m}, {\hat{m}}^{'}, \hat{h}) \\ : = P_{U V}^{\otimes n} (u, v) 𝟙_{\{M^{'} (u, B_{W, n}) = m^{'}, f_{b} (m^{'}) = m\}} P_{X Y \hat{M} | M}^{(B_{X, n})} (x, y, \hat{m} | m) 𝟙_{\{g_{B_{W, n}} (m, v) = {\hat{m}}^{'}, \hat{h} = 𝟙_{\{({\hat{m}}^{'}, v) \in A_{n}^{c}\}}\}}, \\ Q_{UV M^{'} M XY \hat{M} {\hat{M}}^{'} \hat{H}}^{(B_{n}, c_{n})} (u, v, m^{'}, m, x, y, \hat{m}, {\hat{m}}^{'}, \hat{h}) \\ : = Q_{U V}^{\otimes n} (u, v) 𝟙_{\{M^{'} (u, B_{W, n}) = m^{'}, f_{b} (m^{'}) = m\}} P_{X Y \hat{M} | M}^{(B_{X, n})} (x, y, \hat{m} | m) 𝟙_{\{g_{B_{W, n}} (m, v) = {\hat{m}}^{'}, \hat{h} = 𝟙_{\{({\hat{m}}^{'}, v) \in A_{n}^{c}\}}\}}, \end{matrix}

respectively. For simplicity, we will denote the above distributions by

P^{(B_{n})}

and

Q^{(B_{n})}

. Let

B_{n} : = (B_{W, n}, f_{B}, B_{X, n})

,

B_{n}

, and

μ_{n}

denote the random codebook, its support, and the probability measure induced by its random construction, respectively. Additionally, define

{\bar{P}}_{P^{(B_{n})}} : = E_{μ_{n}} [P_{P^{(B_{n})}}]

and

{\bar{P}}_{Q^{(B_{n})}} : = E_{μ_{n}} [P_{Q^{(B_{n})}}]

.

Analysis of the type I and type II error probabilities: We analyze the type I and type II error probabilities averaged over the random ensemble of quantization and binning codebooks

(B_{W}, f_{B})

. Then, an expurgation technique [44] guarantees the existence of a sequence of deterministic codebooks

{B_{n}}_{n \in N}

and a code

{c_{n} = (f_{n}, g_{n})}_{n \in N}

that achieves the lower bound given in Theorem 1.

Type I error probability: In the following, random sets where the randomness is induced due to

B_{n}

will be written using blackboard bold letters, e.g.,

A_{n}

for the random acceptance region for

H_{0}

. Note that a type I error can occur only under the following events:

(i): $E_{EE} : = \overset{⋃}{P_{\hat{U}} \in D_{n} (P_{U}, η)} \overset{⋃}{u \in T_{n} (P_{\hat{U}})} E_{EE} (u), where$
$\begin{matrix} E_{EE} (u) & : = {∄ j \in M^{'} ∖ {0} s . t . (u, W (j)) \in T_{n} (P_{{\hat{U}}_{i} {\hat{W}}_{i}}), P_{{\hat{U}}_{i}} = P_{u}, \\ P_{{\hat{U}}_{i} {\hat{W}}_{i}} \in D_{n} (P_{U W}, η)}, \end{matrix}$
(ii): $E_{NE} : = {{\hat{M}}^{'} = M^{'} and ({\hat{M}}^{'}, V) \notin A_{n}},$
(iii): $E_{OCE} : = {M^{'} \neq 0, \hat{M} \neq M and ({\hat{M}}^{'}, V) \notin A_{n}},$
(iv): $E_{SCE} : = {M^{'} = M = 0, \hat{M} \neq M and ({\hat{M}}^{'}, V) \notin A_{n}},$
(v): $E_{BE} : = {M^{'} \neq 0, \hat{M} = M, {\hat{M}}^{'} \neq M^{'} and ({\hat{M}}^{'}, V) \notin A_{n}}$ .

Here,

E_{EE}

corresponds to the event that there does not exist a quantization codeword corresponding to atleast one sequence

u

of type

P_{u} \in D_{n} (P_{U}, η)

;

E_{NE}

corresponds to the event, in which, there is neither an error at the channel decoder nor at the empirical conditional entropy decoder;

E_{OCE}

and

E_{SCE}

corresponds to the case, in which there is an error at the channel decoder (hence also at the empirical conditional entropy decoder); and

E_{BE}

corresponds to the case that there is an error (due to binning) only at the empirical conditional entropy decoder. For the event

E_{EE}

, it follows from a slight generalization of the type-covering lemma (Lemma 9.1 in [38]) that

\begin{matrix} {\bar{P}}_{P^{(B_{n})}} (E_{EE}) \leq e^{- e^{n Ω (η)}} . \end{matrix}

(24)

Since

e^{n Ω (η)} / n \overset{(n)}{\to} \infty

for

η > 0

, the event

E_{EE}

may be safely ignored from the analysis of the error-exponents. Given that

E_{EE}^{c}

holds for some

B_{W, n}

, it follows from Equation 4.22 in [14] that

\begin{matrix} {\bar{P}}_{P^{(B_{n})}} (E_{NE} | E_{EE}^{c}) \leq e^{- n κ_{α}}, \end{matrix}

(25)

for sufficiently large n since the acceptance region is the same as that in Theorem 1 in [14].

Next, consider the event

E_{OCE}

. We have for sufficiently large n that

\begin{matrix} {\bar{P}}_{P^{(B_{n})}} (E_{OCE}) & \leq {\bar{P}}_{P^{(B_{n})}} (M^{'} \neq 0) {\bar{P}}_{P^{(B_{n})}} (\hat{M} \neq M | M^{'} \neq 0) \\ \overset{(a)}{\leq} {\bar{P}}_{P^{(B_{n})}} (\hat{M} \neq M | M \neq 0) \\ \leq {\bar{P}}_{P^{(B_{n})}} (\hat{M} = 0 | M \neq 0) + {\bar{P}}_{P^{(B_{n})}} (\hat{M} \neq M | M \neq 0, \hat{M} \neq 0) \\ \overset{(b)}{\leq} e^{- n (E_{m} (P_{S X}, θ) - η)} + e^{- n (E_{ex} (R, P_{S X}) - η)} \\ = e^{- n (\min \{E_{m} (P_{S X}, θ), E_{ex} (R, P_{S X})\} - η)}, \end{matrix}

(26)

where

(a): holds since the event ${M^{'} \neq 0}$ is equivalent to ${M \neq 0}$ ;
(b): holds due to (21a) and (22), which holds for $B_{X, n}$ .

Additionally, the probability of

E_{SCE}

can be upper bounded as

\begin{matrix} {\bar{P}}_{P^{(B_{n})}} (E_{SCE}) & \leq {\bar{P}}_{P^{(B_{n})}} (M^{'} = 0) \\ \leq {\bar{P}}_{P^{(B_{n})}} (M^{'} = 0 | U \in D_{n} (P_{U}, η)) + {\bar{P}}_{P^{(B_{n})}} (U \notin D_{n} (P_{U}, η)) \\ = {\bar{P}}_{P^{(B_{n})}} (E_{EE}) + {\bar{P}}_{P^{(B_{n})}} (U \notin D_{n} (P_{U}, η)) \\ \leq e^{- n κ_{α}}, \end{matrix}

(27)

where (27) is due to (24), the definition of

D_{n} (P_{U}, η)

in (15) and Lemma 2.2 and Lemma 2.6 in [38].

Finally, consider the event

E_{BE}

. Note that this event occurs only when

| M | \leq | M^{'} |

. Additionally,

M = 0

iff

M^{'} = 0

, and hence

M^{'} \neq 0

and

\hat{M} = M

implies that

\hat{M} \neq 0

. Let

\begin{matrix} D_{n} (P_{V W}, η) & : = \{P_{\hat{V} \hat{W}} : \begin{matrix} \exists (w, u, v) \in \underset{m^{'} \in M^{'} ∖ {0}}{\cup} J_{n} (κ_{α} + η, P_{W_{m^{'}} U V}), P_{W_{m^{'}} U V} satisfies \\ (23) and P_{w u v} = P_{\hat{W} \hat{U} \hat{V}} \end{matrix}\} . \end{matrix}

We have

\begin{matrix} {\bar{P}}_{P^{(B_{n})}} (E_{BE}) & = {\bar{P}}_{P^{(B_{n})}} (E_{BE}, (M^{'}, V) \in A_{n}) + {\bar{P}}_{P^{(B_{n})}} (E_{BE}, (M^{'}, V) \notin A_{n}) . \end{matrix}

(28)

The second term in (28) can be upper bounded as

\begin{matrix} {\bar{P}}_{P^{(B_{n})}} (E_{BE}, (M^{'}, V) \notin A_{n}) & \leq {\bar{P}}_{P^{(B_{n})}} ((M^{'}, V) \notin A_{n}, E_{EE}) + {\bar{P}}_{P^{(B_{n})}} ((M^{'}, V) \notin A_{n}, E_{EE}^{c}) \\ \leq e^{- e^{n Ω (η)}} + {\bar{P}}_{P^{(B_{n})}} ((M^{'}, V) \notin A_{n} | E_{EE}^{c}) \\ \leq e^{- e^{n Ω (η)}} + {\bar{P}}_{P^{(B_{n})}} ((U, V) \notin A_{n}^{e}) \\ \leq e^{- e^{n Ω (η)}} + e^{- n κ_{α}}, \end{matrix}

(29)

where the inequality in (29) follows from Equation (4.22) in [14] for sufficiently large n since the acceptance region

A_{n}^{e}

is the same as that in [14]. To bound the first term in (28), define

D_{n} (P_{V}, η) : = {P_{\hat{V}} : \exists P_{\hat{V} \hat{W}} \in D_{n} (P_{V W}, η)}

, and observe that since

(M^{'}, V) \in A_{n}

implies

M^{'} \neq 0

, we have

\begin{matrix} {\bar{P}}_{P^{(B_{n})}} (E_{BE}, (M^{'}, V) \in A_{n}) \\ = \sum_{(m^{'}, m) \in M^{'} \times M} {\bar{P}}_{P^{(B_{n})}} (E_{BE}, (M^{'}, V) \in A_{n}, M = m, M^{'} = m^{'}) \\ = \sum_{(m^{'}, m) \in M^{'} \times M} {\bar{P}}_{P^{(B_{n})}} (M = m, M^{'} = m^{'}, \hat{M} = M) \\ {\bar{P}}_{P^{(B_{n})}} ({\hat{M}}^{'} \neq M^{'}, ({\hat{M}}^{'}, V) \notin A_{n}, (M^{'}, V) \in A_{n} | M^{'} = m^{'}, M = m, \hat{M} = M) \\ \leq \sum_{(m^{'}, m) \in M^{'} \times M} {\bar{P}}_{P^{(B_{n})}} (M = m, M^{'} = m^{'}, \hat{M} = M) \\ {\bar{P}}_{P^{(B_{n})}} ({\hat{M}}^{'} \neq M^{'}, (M^{'}, V) \in A_{n} | M^{'} = m^{'}, M = m, \hat{M} = M) \\ \overset{(a)}{=} {\bar{P}}_{P^{(B_{n})}} ({\hat{M}}^{'} \neq M^{'}, (M^{'}, V) \in A_{n} | M^{'} = 1, M = 1, \hat{M} = M) \\ \overset{(b)}{\leq} \sum_{\begin{matrix} P_{v} \in D_{n} (P_{V}, η) \end{matrix}} \sum_{v \in P_{v}} {\bar{P}}_{P^{(B_{n})}} (V = v | M^{'} = 1) \end{matrix}

(30)

\begin{matrix} {\bar{P}}_{P^{(B_{n})}} (\exists j \in f_{B}^{- 1} (1), j \neq 1, H_{e} (W (j) | v) \leq H_{e} (W (1) | v) | M^{'} = 1, V = v), \end{matrix}

(31)

where

(a)

follows since by the symmetry of the source encoder, binning function and random codebook construction, the term in (30) is independent of

(m, m^{'})

;

(b)

holds since

(M^{'}, V) \in A_{n}

implies that

P_{v} \in D_{n} (P_{V}, η)

and

(V, B_{W}) - M^{'} - (M, \hat{M})

form a Markov chain. Defining

P_{\hat{V}} = P_{v}

, and the event

E_{1}^{'} : = {M^{'} = 1, V = v}

, we obtain

\begin{matrix} {\bar{P}}_{P^{(B_{n})}} (\exists j \in f_{B}^{- 1} (1), j \neq 1, H_{e} (W (j) | v) \leq H_{e} (W (1) | v) | E_{1}^{'}) \\ = \sum_{j \in M^{'} ∖ {0, 1}} {\bar{P}}_{P^{(B_{n})}} (f_{B} (j) = 1, H_{e} (W (j) | v) \leq H_{e} (W (1) | v) | E_{1}^{'}) \\ \overset{(a)}{\leq} \frac{1}{e^{n R_{n}}} \sum_{j \in M^{'} ∖ {0, 1}} {\bar{P}}_{P^{(B_{n})}} (H_{e} (W (j) | v) \leq H_{e} (W (1) | v) | E_{1}^{'}) \\ \overset{(b)}{\leq} \frac{1}{e^{n R_{n}}} \sum_{j \in M^{'} ∖ {0, 1}} \sum_{\begin{matrix} P_{\hat{W}} : P_{\hat{V} \hat{W}} \in D_{n} (P_{V W}, η) \end{matrix}} \sum_{\begin{matrix} w : (v, w) \in T_{n} (P_{\hat{V} \hat{W}}) \end{matrix}} {\bar{P}}_{P^{(B_{n})}} (W (1) = w | E_{1}^{'}) \\ \sum_{\begin{matrix} \tilde{w} \in T_{n} (P_{\hat{W}}) : H_{e} (\tilde{w} | v) \leq H (\hat{W} | \hat{V}) \end{matrix}} {\bar{P}}_{P^{(B_{n})}} (W (j) = \tilde{w} | E_{1}^{'} \cup {W (1) = w}) \\ \overset{(c)}{\leq} \frac{1}{e^{n R_{n}}} \sum_{j \in M^{'} ∖ {0, 1}} \sum_{\begin{matrix} P_{\hat{W}} : P_{\hat{V} \hat{W}} \in D_{n} (P_{V W}, η) \end{matrix}} \sum_{\begin{matrix} w : (v, w) \in T_{n} (P_{\hat{V} \hat{W}}) \end{matrix}} {\bar{P}}_{P^{(B_{n})}} (W (1) = w | E_{1}^{'}) \\ \sum_{\begin{matrix} \tilde{w} \in T_{n} (P_{\hat{W}}) : H_{e} (\tilde{w} | v) \leq H (\hat{W} | \hat{V}) \end{matrix}} 2 {\bar{P}}_{P^{(B_{n})}} (W (j) = \tilde{w}), \end{matrix}

(32)

where

(a): follows since $f_{B} (\cdot)$ is the uniform binning function independent of $B_{W, n}$ ;
(b): holds due to the fact that if $P_{v} \in D_{n} (P_{V}, η)$ , then $M^{'} = 1$ implies that $(W (1), v) \in T_{n} (P_{\hat{V} \hat{W}})$ with probability one for some $P_{\hat{V} \hat{W}} \in D_{n} (P_{V W}, η)$ ;
(c): holds since ${\bar{P}}_{P^{(B_{n})}} (W (j) = \tilde{w} | E_{1}^{'} \cup {W (1) = w}) \leq 2 {\bar{P}}_{P^{(B_{n})}} (W (j) = \tilde{w})$ , which follows similarly to Equation (101) in [10].

Continuing, we can write for sufficiently large n,

\begin{matrix} {\bar{P}}_{P^{(B_{n})}} (\exists j \in f_{B}^{- 1} (1), j \neq 1, H_{e} (W (j) | v) \leq H_{e} (W (1) | v) | E_{1}^{'}) \\ \overset{(a)}{\leq} \frac{1}{e^{n R_{n}}} \sum_{j \in M^{'} ∖ {0, 1}} \sum_{\begin{matrix} P_{\hat{W}} : P_{\hat{V} \hat{W}} \in D_{n} (P_{V W}, η) \end{matrix}} \sum_{\begin{matrix} w : (v, w) \in T_{n} (P_{\hat{V} \hat{W}}) \end{matrix}} {\bar{P}}_{P^{(B_{n})}} (W (1) = w | E_{1}^{'}) \\ \sum_{\begin{matrix} \tilde{w} \in T_{n} (P_{\hat{W}}) : H_{e} (\tilde{w} | v) \leq H (\hat{W} | \hat{V}) \end{matrix}} 2 e^{- n (H (\hat{W}) - η)} \\ \overset{(b)}{\leq} \frac{1}{e^{n R_{n}}} \sum_{j \in M^{'} ∖ {0, 1}} \sum_{\begin{matrix} P_{\hat{W}} : P_{\hat{V} \hat{W}} \in D_{n} (P_{V W}, η) \end{matrix}} \sum_{\begin{matrix} w : (v, w) \in T_{n} (P_{\hat{V} \hat{W}}) \end{matrix}} {\bar{P}}_{P^{(B_{n})}} (W (1) = w | E_{1}^{'}) \\ {(n + 1)}^{| V | | W |} e^{n H (\hat{W} | \hat{V})} 2 e^{- n (H (\hat{W}) - η)} \\ \leq \frac{1}{e^{n R_{n}}} \sum_{j \in M^{'} ∖ {0, 1}} \sum_{\begin{matrix} P_{\hat{W}} : P_{\hat{V} \hat{W}} \in D_{n} (P_{V W}, η) \end{matrix}} 2 {(n + 1)}^{| V | | W |} e^{- n (I (\hat{W}; \hat{V}) - η)} \\ \overset{(c)}{\leq} \frac{1}{e^{n R_{n}}} \sum_{j \in M^{'} ∖ {0, 1}} 2 {(n + 1)}^{| W |} {(n + 1)}^{| V | | W |} e^{- n (min_{P_{\hat{V} \hat{W}} \in D_{n} (P_{V W}, η)} I (\hat{W}; \hat{V}) - η)} \\ \overset{(d)}{\leq} e^{- n (R - R^{'} + ρ_{n} - η_{n}^{'})}, \end{matrix}

(33)

where

ρ_{n} : = \min_{P_{\hat{V} \hat{W}} \in D_{n} (P_{V W}, η)} I (\hat{V}; \hat{W})

and

η_{n}^{'} : = 3 η + o (1)

. In the above,

(a): used Lemma 2.3 in [38] and the fact that the codewords are chosen uniformly at random from $T_{n} (P_{\hat{W}})$ ;
(b): follows since the total number of sequences $\tilde{w} \in T_{n} (P_{\hat{W}})$ such that $P_{\tilde{w} v} = P_{\tilde{W} \tilde{V}}$ and $H (\tilde{W} | \tilde{V}) \leq H (\hat{W} | \hat{V})$ is upper bounded by $e^{n H (\hat{W} | \hat{V})}$ , and $| T (W^{n} \times V^{n}) | \leq {(n + 1)}^{| V | | W |}$ ;
(c): holds due to Lemma 2.2 in [38];
(d): follows from $R^{'} : = ζ (κ_{α}$ , (14), (18) and (19).

Thus, since

ρ_{n} \to ρ (κ_{α}, ω) + O (η)

, we have from (28), (29), (31), (33) for large enough n that

\begin{matrix} {\bar{P}}_{P^{(B_{n})}} (E_{BE}) \leq e^{- n (\min \{κ_{α}, R - ζ (κ_{α}, ω) + ρ (κ_{α}, ω) - O (η)\})} . \end{matrix}

(34)

By choice of

(ω, P_{S X}, θ) \in L (κ_{α})

, it follows from (24), (25), (26), (27) and (34) that the type I error probability is upper bounded by

e^{- n (κ_{α} - O (η))}

for large n.

Type II error probability: We analyze the type II error probability averaged over

B_{n}

. A type II error can occur only under the following events:

(i): $E_{a} : = \{\begin{matrix} \hat{M} = M, {\hat{M}}^{'} = M^{'} \neq 0, (U, V, W (M^{'})) \in T_{n} (P_{\hat{U} \hat{V} \hat{W}}) \\ s . t . P_{\hat{U} \hat{W}} \in D_{n} (P_{U W}, η) and P_{\hat{V} \hat{W}} \in D_{n} (P_{V W}, η) \end{matrix}\},$
(ii): $E_{b} : = \{\begin{matrix} M^{'} \neq 0, \hat{M} = M, {\hat{M}}^{'} \neq M^{'}, f_{B} ({\hat{M}}^{'}) = f_{B} (M^{'}), (U, V, W (M^{'}), \\ W ({\hat{M}}^{'})) \in T_{n} (P_{\hat{U} \hat{V} \hat{W} {\hat{W}}_{d}}) s . t . P_{\hat{U} \hat{W}} \in D_{n} (P_{U W}, η), \\ P_{\hat{V} {\hat{W}}_{d}} \in D_{n} (P_{V W}, η) and H_{e} (W ({\hat{M}}^{'}) | V) \leq H_{e} (W (M^{'}) | V) \end{matrix}\},$
(iii): $E_{c} : = \{\begin{matrix} M^{'} \neq 0, \hat{M} \neq M o r 0, (U, V, W (M^{'}), W ({\hat{M}}^{'})) \in T_{n} (P_{\hat{U} \hat{V} \hat{W} {\hat{W}}_{d}}) s . t . \\ P_{\hat{U} \hat{W}} \in D_{n} (P_{U W}, η) and P_{\hat{V} {\hat{W}}_{d}} \in D_{n} (P_{V W}, η) \end{matrix}\},$
(iv): $E_{d} : = \{M = M^{'} = 0, \hat{M} \neq M, (V, W ({\hat{M}}^{'})) \in T_{n} (P_{\hat{V} {\hat{W}}_{d}}) s . t . P_{\hat{V} {\hat{W}}_{d}} \in D_{n} (P_{V W}, η)\} .$

Similar to (24), it follows that

{\bar{P}}_{Q^{(B_{n})}} (E_{EE}) \leq e^{- e^{n Ω (η)}}

. Hence, we may assume that

E_{EE}^{c}

holds for the type II error-exponent analysis. It then follows from the analysis in Equations (4.23)–(4.27) in [14] that for sufficiently large n,

\begin{matrix} {\bar{P}}_{Q^{(B_{n})}} (E_{a} | E_{EE}^{c}) \leq e^{- n (E_{1} (κ_{α}, ω) - O (η))} . \end{matrix}

The analysis of the error events

E_{b}

,

E_{c}

and

E_{d}

follows similarly to that in the proof of Theorem 2 in [10], and results in

\begin{matrix} - \frac{1}{n} log ({\bar{P}}_{Q^{(B_{n})}} (E_{b})) \\ ≳ \{\begin{matrix} \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{2} (κ_{α}, ω) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W}} | | Q_{\tilde{U} \tilde{V} \tilde{W}}) + E_{b} (κ_{α}, ω, R) - O (η), & if R < ζ (κ_{α}, ω) + η, \\ \infty, & otherwise, \end{matrix} \\ = E_{2} (κ_{α}, ω, R) - O (η) . \\ \frac{- 1}{n} log ({\bar{P}}_{Q^{(B_{n})}} (E_{c})) \\ ≳ \{\begin{matrix} \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{3} (κ_{α}, ω) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W}} | | Q_{\tilde{U} \tilde{V} \tilde{W}}) + E_{b} (κ_{α}, ω, R) & if R < ζ (κ_{α}, ω) + η \\ + E_{ex} (R, P_{S X}) - O (η), \\ \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{3} (κ_{α}, ω) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W}} | | Q_{\tilde{U} \tilde{V} \tilde{W}}) + ρ (κ_{α}, ω) & otherwise, \\ + E_{ex} (R, P_{S X}) - O (η), \end{matrix} \\ = E_{3} (κ_{α}, ω, R, P_{S X}) - O (η) . \\ \frac{- 1}{n} log ({\bar{P}}_{Q^{(B_{n})}} (E_{d})) \\ ≳ \{\begin{matrix} \min_{P_{\tilde{V}} : P_{\tilde{V} \tilde{W}} \in D_{n} (P_{V W}, η)} D (P_{\tilde{V}} | | Q_{V}) + E_{b} (κ_{α}, ω, R) & if R < ζ (κ_{α}, ω) + η, \\ + E_{sp} (P_{S X}, θ) - θ - O (η), \\ \min_{\begin{matrix} P_{\tilde{V}} : P_{\tilde{V} \tilde{W}} \in D_{n} (P_{V W}, η) \end{matrix}} D (P_{\tilde{V}} | | Q_{V}) + ρ (κ_{α}, ω) & otherwise, \\ + E_{sp} (P_{S X}, θ) - θ - O (η), \end{matrix} \\ = E_{4} (κ_{α}, ω, R, P_{S X}, θ) - O (η) . \end{matrix}

Since the exponent of the type II error probability is lower bounded by the minimum of the exponent of the type II error-causing events, we have shown from the above that for a fixed

(ω, R, P_{S X}, θ) \in L (κ_{α})

and sufficiently large n,

\begin{matrix} {\bar{P}}_{P^{(B_{n})}} (\hat{H} = 1) & \leq e^{- n (κ_{α} - O (η))}, \end{matrix}

(35a)

\begin{matrix} {\bar{P}}_{Q^{(B_{n})}} (\hat{H} = 0) & \leq e^{- n ({\bar{κ}}_{s} (κ_{α}, ω, R, P_{S X}, θ) - O (η))}, \end{matrix}

(35b)

where

\begin{matrix} {\bar{κ}}_{s} (κ_{α}, ω, R, P_{S X}, θ) : = \min \{E_{1} (κ_{α}, ω), E_{2} (κ_{α}, ω, R), E_{3} (κ_{α}, ω, R, P_{S X}), E_{4} (κ_{α}, ω, R, P_{S X}, θ)\} . \end{matrix}

Expurgation: To complete the proof, we extract a deterministic codebook

B_{n}^{★}

that satisfies

\begin{matrix} P_{P^{(B_{n}^{★})}} (\hat{H} = 1) & \leq e^{- n (κ_{α} - O (η))}, \\ P_{Q^{(B_{n}^{★})}} (\hat{H} = 0) & \leq e^{- n ({\bar{κ}}_{s} (κ_{α}, ω, R, P_{S X}, θ) - O (η))} . \end{matrix}

For this purpose, remove a set

B_{n}^{'} \subset B_{n}

of highest type I error probability codebooks such that the remaining set

B_{n} ∖ B_{n}^{'}

has a probability of

τ \in (0.25, 0.5)

, i.e.,

μ_{n} (B_{n} ∖ B_{n}^{'}) = τ

. Then, it follows from (35a) and (35b) that for all

B_{n} \in B_{n} ∖ B_{n}^{'}

,

\begin{matrix} P_{P^{(B_{n})}} (\hat{H} = 1) & \leq 2 e^{- n (κ_{α} - O (η))}, \\ {\tilde{P}}_{Q^{(B_{n})}} (\hat{H} = 0) & \leq 4 e^{- n ({\bar{κ}}_{s} (κ_{α}, ω, R, P_{S X}, θ) - O (η))}, \end{matrix}

where

{\tilde{P}}_{Q^{(B_{n})}} = \frac{1}{τ} E_{μ_{n}} [P_{Q^{B_{n}}} 𝟙_{\{B_{n} \in B_{n} ∖ B_{n}^{'}\}}]

is a PMF. Perform one more similar expurgation step to obtain

B_{n}^{★} = (B_{W, n}^{★}, f_{b}^{★}, B_{X, n}^{★}) \in B_{n} ∖ B_{n}^{'}

such that for all sufficiently large n

\begin{matrix} P_{P^{(B_{n}^{★})}} (\hat{H} = 1) & \leq 2 e^{- n (κ_{α} - O (η))} \leq e^{- n (κ_{α} - O (η) - (log 2 / n))}, \\ P_{Q^{(B_{n}^{★})}} (\hat{H} = 0) & \leq 4 e^{- n ({\bar{κ}}_{s} (κ_{α}, ω, R, P_{S X}, θ) - O (η))} \leq e^{- n ({\bar{κ}}_{s} (κ_{α}, ω, R, P_{S X}, θ) - O (η) - (log 4 / n))} . \end{matrix}

Maximizing over

(ω, R, P_{S X}, θ) \in L (κ_{α})

and noting that

η > 0

is arbitrary completes the proof.

4.2. Proof of Corollary 1

Consider

(ω, P_{S X}, θ) \in L^{★} (κ_{α})

and

R = ζ (κ_{α}, ω)

. Then,

(ω, R, P_{S X}, θ) \in L (κ_{α})

. Additionally, for any

(P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{1} (κ_{α}, ω)

, we have

\begin{matrix} D (P_{\tilde{U} \tilde{V} \tilde{W}} | | Q_{\tilde{U} \tilde{V} \tilde{W}}) & = D (P_{\tilde{U} \tilde{W}} | | Q_{\tilde{U} \tilde{W}}) + D (P_{\tilde{V} | \tilde{U} \tilde{W}} | | Q_{\tilde{V} | \tilde{U} \tilde{W}} | P_{\tilde{U} \tilde{W}}) \\ \overset{(a)}{\geq} D (P_{\tilde{V} | \tilde{U} \tilde{W}} | | P_{V} | P_{\tilde{U} \tilde{W}}) \\ = D (P_{\tilde{V} \tilde{U} \tilde{W}} | | P_{V} P_{\tilde{U} \tilde{W}}) \\ \overset{(b)}{\geq} D (P_{\tilde{V} \tilde{W}} | | P_{V} P_{\tilde{W}}) \\ \overset{(c)}{=} D (P_{\hat{V} \hat{W}} | | P_{V} P_{\hat{W}}) \\ = I_{P} (\hat{V}; \hat{W}) + D (P_{\hat{V}} | | P_{V}), \end{matrix}

(36)

where

(a)

is due to the non-negativity of KL divergence and since

Q_{\tilde{V} | \tilde{U} \tilde{W}} = P_{V}

;

(b)

is because of the monotonicity of KL divergence Theorem 2.14 in [43];

(c)

follows since for

(P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{1} (κ_{α}, ω)

,

P_{\tilde{V} \tilde{W}} = P_{\hat{V} \hat{W}}

for some

P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω)

. Minimizing over all

P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω)

yields that

\begin{matrix} E_{1} (κ_{α}, ω) & = \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{1} (κ_{α}, ω) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W}} | | Q_{\tilde{U} \tilde{V} \tilde{W}}) \\ \geq \min_{P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω)} [I_{P} (\hat{V}; \hat{W}) + D (P_{\hat{V}} | | P_{V})] \\ = \min_{\begin{matrix} P_{\hat{V} \hat{W}} : P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω) \end{matrix}} [I_{P} (\hat{V}; \hat{W}) + D (P_{\hat{V}} | | P_{V})] : = E_{1}^{i} (κ_{α}, ω), \end{matrix}

where the inequality above follows from (36). Next, since

ζ (κ_{α}, ω) = R

, we have that

E_{2} (κ_{α}, ω, R) = \infty

. Additionally, by the non-negativity of KL divergence

\begin{matrix} E_{3} (κ_{α}, ω, R, P_{S X}) & = \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \\ \in T_{3} (κ_{α}, ω) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W}} | | Q_{\tilde{U} \tilde{V} \tilde{W}}) + ρ (κ_{α}, ω) + E_{ex} (R, P_{S X}) \\ \geq ρ (κ_{α}, ω) + E_{ex} (ζ (κ_{α}, ω), P_{S X}) : = E_{2}^{i} (κ_{α}, ω, P_{S X}), \\ E_{4} (κ_{α}, ω, P_{S X}, θ) & = \min_{P_{\hat{V}} : P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω)} D (P_{\hat{V}} | | P_{V}) + ρ (κ_{α}, ω) + E_{m} (P_{S X}, θ) - θ \\ = ρ (κ_{α}, ω) + E_{m} (P_{S X}, θ) - θ : = E_{3}^{i} (κ_{α}, ω, P_{S X}, θ), \end{matrix}

where the final equality is since

P_{U V} P_{W | U} \in \hat{L} (κ_{α}, ω)

for

P_{W | U} : = ω (P_{U})

. The claim in (8) now follows from Theorem 1.

Next, we prove (10). Note that

\hat{L} (0, ω) = {P_{U V W} = P_{U V} P_{W | U} : P_{W | U} = ω (P_{U})}

and

L^{★} (0) = {(ω, P_{S X}, θ) \in F \times P (S \times X) \times Θ (P_{S X}) : I_{P} (U; W) < I_{P} (X; Y | S), P_{W | U} = ω (P_{U}), P_{S X Y} : = P_{S X} P_{Y | X}}

since

E_{sp} (P_{S X}, θ) \geq 0

and

E_{ex} (I_{P} (U; W), P_{S X}) \geq 0

. Hence, we have

\begin{matrix} E_{1}^{i} (0, ω) & \geq \min_{P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (0, ω)} I_{P} (\hat{V}; \hat{W}) = I_{P} (V; W) . \end{matrix}

Additionally,

ρ (0, ω) = I_{P} (V; W)

,

E_{2}^{i} (0, ω, P_{S X}) \geq ρ (0, ω)

and

E_{3}^{i} (0, ω, P_{S X}, θ) \geq ρ (0, ω)

. By choosing

P_{X S} = P_{X}^{★} P_{S}

where

P_{X}^{★}

is the capacity achieving input distribution, we have

I_{P} (X; Y | S) = C

. Then, it follows from (8) and the continuity of

E_{1}^{i} (κ_{α}, ω)

,

E_{2}^{i} (κ_{α}, ω, P_{S X})

and

E_{3}^{i} (κ_{α}, ω, P_{S X}, θ)

in

κ_{α}

that

{lim}_{κ_{α} \to 0} κ (κ_{α}) \geq κ_{i}^{★} (0)

. On the other hand,

{lim}_{κ_{α} \to 0} κ (κ_{α}) \leq κ_{i}^{★} (0)

follows from the converse proof in Proposition 7 in [10]. The proof of the cardinality bound

| W | \leq | U | + 1

follows from a standard application of the Eggleston–Fenchel–Carathéodory theorem (Theorem18 in [48]), thus completing the proof.

4.3. Proof of Corollary 2

Specializing Theorem 1 to TAD, note that

ρ (κ_{α}, ω) = 0

since

P_{\hat{U} \hat{V} \hat{W}} = Q_{U} Q_{V} P_{\hat{W} | \hat{U}} \in \hat{L} (κ_{α}, ω)

and

I_{P} (\hat{V}; \hat{W}) = 0

. Additionally, for

R \geq ζ (κ_{α}, ω)

,

E_{b} (κ_{α}, ω, R) = \infty

. Hence,

\begin{matrix} L (κ_{α}) - = - \{(ω, R, P_{S X}, θ) : \begin{matrix} ζ (κ_{α}, ω) \leq R < I_{P} (X; Y | S), P_{S X Y} = P_{S X} P_{Y | X}, \\ \min \{E_{sp} (P_{S X}, θ), E_{ex} (R, P_{S X})\} \geq κ_{α} \end{matrix}\}, \\ \hat{L} (κ_{α}, ω) : = \{P_{\hat{U} \hat{V} \hat{W}} : D (P_{\hat{U} \hat{V} \hat{W}} | | P_{U V \hat{W}}) \leq κ_{α}, P_{\hat{W} | \hat{U}} = ω (P_{\hat{U}}), P_{U V \hat{W}} = Q_{U} Q_{V} P_{\hat{W} | \hat{U}}\} . \end{matrix}

Then, we have

\begin{matrix} E_{1} (κ_{α}, ω) : = E_{1}^{d} (κ_{α}, ω) & : = \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{1} (κ_{α}, ω) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W}} | | Q_{\tilde{U} \tilde{V} \tilde{W}}) \\ \overset{(a)}{\geq} \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{1} (κ_{α}, ω) \end{matrix}} D (P_{\tilde{V} \tilde{W}} | | Q_{\tilde{V} \tilde{W}}) \\ \overset{(b)}{=} \min_{\begin{matrix} (P_{\hat{V} \hat{W}}, Q_{V \hat{W}}) : P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω), \\ Q_{U V \hat{W}} = Q_{U V} P_{\hat{W} | \hat{U}} \end{matrix}} D (P_{\hat{V} \hat{W}} | | Q_{V \hat{W}}), \end{matrix}

(37)

where

(a)

follows due to the data-processing inequality for KL divergence Theorem 2.15 in [43];

(b)

is since

(P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{1} (κ_{α}, ω)

implies that

P_{\tilde{V} \tilde{W}} = P_{\hat{V} \hat{W}}

and

Q_{\tilde{U} \tilde{V} \tilde{W}} = Q_{U V} P_{\hat{W} | \hat{U}}

for some

P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω)

. Next, note that since

R \geq ζ (κ_{α}, ω)

,

E_{2} (κ_{α}, ω, R) = \infty

. Additionally,

\begin{matrix} E_{3} (κ_{α}, ω, R, P_{S X}) & = \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) \in T_{3} (κ_{α}, ω) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W}} | | Q_{\tilde{U} \tilde{V} \tilde{W}}) + E_{ex} (R, P_{S X}) \end{matrix}

(38a)

\begin{matrix} \overset{(a)}{=} E_{ex} (R, P_{S X}), \end{matrix}

(38b)

\begin{matrix} E_{4} (κ_{α}, ω, P_{S X}, θ) & = \min_{P_{\hat{V}} : P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (κ_{α}, ω)} D (P_{\hat{V}} | | Q_{V}) + E_{m} (P_{S X}, θ) - θ \end{matrix}

\begin{matrix} \overset{(b)}{=} E_{m} (P_{S X}, θ) - θ = : E_{3}^{d} (P_{S X}, θ), \end{matrix}

(38c)

where

(a): is obtained by taking $P_{\hat{U} \hat{V} \hat{W}} = Q_{U} Q_{V} P_{W | U} \in \hat{L} (κ_{α}, ω)$ and $P_{W | U} = ω (Q_{U})$ in the definition of $T_{3} (κ_{α}, ω)$ . This implies that $(P_{\tilde{U} \tilde{V} \tilde{W}}, Q_{\tilde{U} \tilde{V} \tilde{W}}) = (Q_{U V} P_{W | U}, Q_{U V} P_{W | U}) \in T_{3} (κ_{α}, ω)$ , and hence that the first term in the right hand side (RHS) of (38a) is zero;
(b): is due to $Q_{U} Q_{V} P_{W | U} \in \hat{L} (κ_{α}, ω)$ for $P_{W | U} = ω (Q_{U})$ .

Since

E_{ex} (R, P_{S X})

is a non-increasing function of R and

R \geq ζ (κ_{α}, ω)

, selecting

R = ζ (κ_{α}, ω)

maximizes

E_{3} (κ_{α}, ω, R, P_{S X})

. Then, (11) follows from (37), (38b) and (38c).

Next, we prove (12). Note that

ζ (0, ω) = I_{Q} (U; W)

, where

Q_{U W} = Q_{U} P_{W | U}

,

P_{W | U} = ω (Q_{U})

, and since

E_{sp} (P_{S X}, θ) \geq 0

and

E_{ex} (I_{Q} (U; W), P_{S X}) \geq 0

,

\begin{matrix} L^{★} (0) & = \{\begin{matrix} (ω, P_{S X}, θ) \in F \times P (S \times X) \times Θ (P_{S X}) : I_{Q} (U; W) < I_{P} (X; Y | S), \\ Q_{U V W} = Q_{U V} P_{W | U}, P_{W | U} = ω (Q_{U}), P_{S X Y} : = P_{S X} P_{Y | X} \end{matrix}\} . \end{matrix}

Additionally,

\hat{L} (0, ω) = \{Q_{U} Q_{V} P_{W | U} : P_{W | U} = ω (Q_{U})\}

. By choosing

θ = - θ_{l} (P_{S X})

(defined above (6a)) that maximizes

E_{3}^{d} (P_{S X}, θ)

, we have

\begin{matrix} E_{1}^{d} (0, ω) \geq \min_{\begin{matrix} (P_{\hat{V} \hat{W}}, Q_{V \hat{W}}) : P_{\hat{U} \hat{V} \hat{W}} \in \hat{L} (0, ω), \\ Q_{U V \hat{W}} = Q_{U V} P_{\hat{W} | \hat{U}} \end{matrix}} D (P_{\hat{V} \hat{W}} | | Q_{V \hat{W}}) \\ = \min_{\begin{matrix} (P_{W | U}, P_{S X}) : I_{Q} (U; W) \leq I_{P} (X; Y | S), \\ Q_{U V W} = Q_{U V} P_{W | U}, P_{S X Y} = P_{S X} P_{Y | X} \end{matrix}} D (Q_{V} Q_{W} | | Q_{V W}), \end{matrix}

(39a)

\begin{matrix} E_{2}^{d} (0, ω, P_{S X}) = E_{ex} (I_{Q} (U; W), P_{S X}), \end{matrix}

(39b)

\begin{matrix} E_{3}^{d} (P_{S X}, - θ_{l} (P_{S X})) = E_{m} (P_{S X}, - θ_{l} (P_{S X})) + θ_{l} (P_{S X}) = θ_{l} (P_{S X}), \end{matrix}

(39c)

where (39c) is due to

E_{m} (P_{S X}, - θ_{l} (P_{S X})) = 0

. The latter in turn follows similar to (A10) and (A11) from the definition of

E_{m} (\cdot, \cdot)

. From (11), (39a,39b,39c), and the continuity of

E_{1}^{d} (κ_{α}, ω)

,

E_{2}^{d} (κ_{α}, ω, P_{S X})

in

κ_{α}

, (12) follows. The proof of the cardinality bound

| W | \leq | U | + 1

in the RHS of (39a) follows via a standard application of the Eggleston–Fenchel–Carathéodory Theorem (Theorem 18 in [48]). To see this, note that it is sufficient to preserve

{Q_{U} (u), u \in U}

,

D (Q_{V} Q_{W} | | Q_{V W})

and

H_{Q} (U | W)

, all of which can be written as a linear combination of functionals of

Q_{U | W} (\cdot | w)

with weights

Q_{W} (w)

. Thus, it requires

| U | - 1

points to preserve

{Q_{U} (u), u \in U}

and one each for

D (Q_{V} Q_{W} | | Q_{V W})

and

H_{Q} (U | W)

. This completes the proof.

4.4. Proof of Theorem 2

We will show that the error-exponent pairs

(κ_{α}, κ_{h}^{★} (κ_{α}))

and

(κ_{α}, κ_{u}^{★} (κ_{α}))

are achieved by a hybrid coding scheme and uncoded transmission scheme, respectively. First, we describe the hybrid coding scheme.

Let

n \in N

,

| W | < \infty

,

κ_{α} > 0

, and

(P_{S}, ω^{'} (\cdot, P_{S}), P_{X | U S W},

P_{X^{'} | U S}) \in L_{h} (κ_{α})

. Further, let

η > 0

be a small number, and choose a sequence

s \in T_{n} (P_{\hat{S}})

, where

P_{\hat{S}}

satisfies

D (P_{\hat{S}} | | P_{S}) \leq η

. Set

R^{'} : = ζ^{'} (κ_{α}, ω^{'}, P_{\hat{S}})

.

Encoding: The encoder performs type-based quantization followed by hybrid coding [40]. The details are as follows:

Quantization codebook: Let

D_{n} (P_{U}, η)

be as defined in (15). Consider some ordering on the types in

D_{n} (P_{U}, η)

and denote the elements as

P_{{\hat{U}}_{i}}

,

i \in [| D_{n} (P_{U}, η) |]

. For each joint type

P_{\hat{S} {\hat{U}}_{i}}

such that

P_{{\hat{U}}_{i}} \in D_{n} (P_{U}, η)

and

\hat{S} ⫫ {\hat{U}}_{i}

, choose a joint type variable

P_{\hat{S} {\hat{U}}_{i} {\hat{W}}_{i}}

,

P_{{\hat{W}}_{i}} \in T (W^{n})

, such that

D (P_{{\hat{W}}_{i} | {\hat{U}}_{i} \hat{S}} | | P_{W_{i} | U \hat{S}} | P_{{\hat{U}}_{i} \hat{S}}) \leq η / 3

and

I (\hat{S}, {\hat{U}}_{i}; {\hat{W}}_{i}) \leq R^{'} + (η / 3)

, where

P_{W_{i} | U, S} = ω^{'} (P_{{\hat{U}}_{i}}, P_{\hat{S}})

. Define

D_{n} (P_{S U W}, η) : = \{P_{\hat{S} {\hat{U}}_{i} {\hat{W}}_{i}} : i \in [| D_{n} (P_{U}, η) |]\}

,

R_{i}^{'} : = I_{P} (\hat{S}, {\hat{U}}_{i}; {\hat{W}}_{i}) + (η / 3)

for

i \in [| D_{n} (P_{U}, η) |]

and

M_{i}^{'} : = [1 + \sum_{m = 1}^{i - 1} e^{n R_{m}^{'}} : \sum_{m = 1}^{i} e^{n R_{m}^{'}}], i \in [| D_{n} (P_{U}, η) |]

. Let

B_{W, n} = \{W (j) \in W^{n}, 1 \leq j \leq \sum_{i = 1}^{| D_{n} (P_{U}, η) |} e^{n R_{i}^{'}}\}

denote a random quantization codebook such that for

i \in [| D_{n} (P_{U}, η) |]

, each codeword

W (j)

,

j \in M_{i}^{'}

, is independently selected from

T_{n} (P_{{\hat{W}}_{i}})

according to uniform distribution, i.e.,

W (j) \sim Unif [T_{n} (P_{{\hat{W}}_{i}})]

. Let

B_{W, n}

denote a realization of

B_{W, n}

.

Type-based hybrid coding: For

u \in T_{n} (P_{{\hat{U}}_{i}})

such that

P_{{\hat{U}}_{i}} \in D_{n} (P_{U}, η)

for some

i \in [| D_{n} (P_{U}, η) |]

, let

\bar{M} (u, B_{W, n}) : = \{j \in M_{i}^{'} : w (j) \in B_{W, n}, (s, u, w (j)) \in T_{n} (P_{\hat{S} {\hat{U}}_{i} {\hat{W}}_{i}}), P_{\hat{S} {\hat{U}}_{i} {\hat{W}}_{i}} \in D_{n} (P_{S U W}, η)\} .

If

| \bar{M} (u, B_{W, n}) | \geq 1

, let

M^{'} (u, B_{W, n})

denote an index selected uniformly at random from the set

\bar{M} (u, B_{W, n})

; otherwise, set

M^{'} (u, B_{W, n}) = 0

. Given

B_{W, n}

and

u \in U^{n}

, the quantizer outputs

M^{'} = M^{'} (u, B_{W, n})

, where the support of

M^{'}

is

M^{'} : = {0} ⋃_{i = 1}^{| D_{n} (P_{U}, η) |} M_{i}^{'}

. Note that for sufficiently large n, it follows similarly to (18) that

| M^{'} | \leq e^{n (R^{'} + η)}

. For a given

B_{W, n}

and

u \in U^{n}

, the encoder transmits

X \sim P_{X | U S W}^{\otimes n} (\cdot | u, s, w (m^{'}))

if

M^{'} = m^{'} \neq 0

, and

X^{'} \sim P_{X^{'} | U S}^{\otimes n} (\cdot | u, s)

if

M^{'} = 0

.

Acceptance region: For a given codebook

B_{W, n}

and

m^{'} \in M^{'} ∖ {0}

, let

O_{m^{'}}

denote the set of

u

such that

M^{'} (u, B_{W, n}) = m^{'}

. For each

m^{'} \in M^{'} ∖ {0}

and

u \in O_{m^{'}}

, set

\begin{matrix} Z_{m^{'}}^{'} (u) = \{(v, y) \in V^{n} \times Y^{n} : (s, u, {\bar{w}}_{m^{'}}, v, y) \in J_{n} (κ_{α} + η, P_{\hat{S} U W_{m^{'}} V Y})\}, \end{matrix}

where recall that

J_{n} (r, P_{X}) : = {x \in X^{n} : D (P_{x} | | P_{X}) \leq r}

, and

P_{\hat{S} U W_{m^{'}} V X Y} = P_{\hat{S}} P_{U V} P_{W_{m^{'}} | U \hat{S}} P_{X | U \hat{S} W_{m^{'}}} P_{Y | X},

(40a)

P_{W_{m^{'}} | U \hat{S}} = ω^{'} (P_{u}, P_{\hat{S}}) and P_{X | U \hat{S} W_{m^{'}}} = P_{X | U S W} .

(40b)

For

m^{'} \in M^{'} ∖ {0}

, define

Z_{m^{'}}^{'} : = {(v, y) : (v, y) \in Z_{m^{'}}^{'} (u) for some u \in O_{m^{'}}} .

The acceptance region for

H_{0}

is given by

A_{n} : = \cup_{m^{'} \in M^{'} ∖ 0} s \times m^{'} \times Z_{m^{'}}^{'}

or equivalently as

A_{n}^{e} : = \cup_{m^{'} \in M^{'} ∖ 0} s \times O_{m^{'}} \times Z_{m^{'}}^{'} .

Decoding: Given codebook

B_{W, n}

,

Y = y

, and

V = v

, if

(v, y) \in ⋃_{m^{'} \in M^{'} ∖ {0}} Z_{m^{'}}^{'}

, then

{\hat{M}}^{'} = {\hat{m}}^{'}

, where

{\hat{m}}^{'} : = {arg min}_{j \in M^{'} ∖ 0} H_{e} (w (j) | v, y, s)

. Otherwise,

{\hat{M}}^{'} = 0

. Denote the decoder induced by the above operations by

g_{B_{W, n}} : S^{n} \times V^{n} \times Y^{n} \to M^{'}

.

Testing: If

{\hat{M}}^{'} = 0

,

\hat{H} = 1

is declared. Otherwise,

\hat{H} = 0

or

\hat{H} = 1

is declared depending on whether

(s, {\hat{m}}^{'}, v, y) \in A_{n}

or

(s, {\hat{m}}^{'}, v, y) \notin A_{n}

, respectively. Denote the decision function induced by

g_{B_{W, n}}

and

A_{n}

by

g_{n} : S^{n} \times V^{n} \times Y^{n} \to \hat{H}

.

Induced probability distribution: The PMFs induced by a code

c_{n} = (f_{n}, g_{n})

with respect to codebook

B_{W, n}

under

H_{0}

and

H_{1}

are

\begin{matrix} P_{UV M^{'} XY {\hat{M}}^{'} \hat{H}}^{(B_{W, n}, c_{n})} (u, v, m^{'}, x, y, {\hat{m}}^{'}, \hat{h}) \\ : = \{\begin{matrix} P_{U V}^{\otimes n} (u, v) 𝟙_{\{M^{'} (u, B_{W, n}) = m^{'}\}} P_{X | U S W}^{\otimes n} (x | s, u, w (m^{'})) P_{Y | X}^{\otimes n} (y | x) \\ 𝟙_{\{g_{B_{W, n}} (v, y, s) = {\hat{m}}^{'}\}} 𝟙_{\{\hat{h} = 𝟙_{\{(s, {\hat{m}}^{'}, v, y) \in A_{n}^{c}\}}\}}, & if m^{'} \neq 0, \\ P_{U V}^{\otimes n} (u, v) 𝟙_{\{M^{'} (u, B_{W, n}) = m^{'}\}} P_{X^{'} | U S}^{\otimes n} (x | s, u) P_{Y | X}^{\otimes n} (y | x) \\ 𝟙_{\{g_{B_{W, n}} (v, y, s) = {\hat{m}}^{'}\}} 𝟙_{\{\hat{h} = 𝟙_{\{(s, {\hat{m}}^{'}, v, y) \in A_{n}^{c}\}}\}}, & otherwise, \end{matrix} \end{matrix}

and

\begin{matrix} Q_{UV M^{'} XY {\hat{M}}^{'} \hat{H}}^{(B_{W, n}, c_{n})} (u, v, m^{'}, x, y, {\hat{m}}^{'}, \hat{h}) \\ : = \{\begin{matrix} Q_{U V}^{\otimes n} (u, v) 𝟙_{\{M^{'} (u, B_{W, n}) = m^{'}\}} P_{X | U S W}^{\otimes n} (x | s, u, w (m^{'})) P_{Y | X}^{\otimes n} (y | x) \\ 𝟙_{\{g_{B_{W, n}} (v, y, s) = {\hat{m}}^{'}\}} 𝟙_{\{\hat{h} = 𝟙_{\{(s, {\hat{m}}^{'}, v, y) \in A_{n}^{c}\}}\}}, & if m^{'} \neq 0, \\ Q_{U V}^{\otimes n} (u, v) 𝟙_{\{M^{'} (u, B_{W, n}) = m^{'}\}} P_{X^{'} | U S}^{\otimes n} (x | s, u) P_{Y | X}^{\otimes n} (y | x) \\ 𝟙_{\{g_{B_{W, n}} (v, y, s) = {\hat{m}}^{'}\}} 𝟙_{\{\hat{h} = 𝟙_{\{(s, {\hat{m}}^{'}, v, y) \in A_{n}^{c}\}}\}}, & otherwise, \end{matrix} \end{matrix}

respectively. For brevity, we will denote

B_{W, n}

by

B_{n}

,

B_{W, n}

by

B_{n}

, and the above probability distributions by

P^{(B_{n})}

and

Q^{(B_{n})}

. Let

B_{n}

and

μ_{n}

stand for the support and probability measure of

B_{n}

, respectively, and set

{\bar{P}}_{P^{(B_{n})}} : = E_{μ_{n}} [P_{P^{(B_{n})}}]

,

{\bar{P}}_{Q^{(B_{n})}} : = E_{μ_{n}} [P_{Q^{(B_{n})}}]

Analysis of the type I and type II error probabilities: We analyze the expected type I and type II error probabilities, where the expectation is with respect to the randomness of

B_{n}

, followed by the expurgation technique to extract a sequence of deterministic codebooks

{B_{n}}_{n \in N}

and a code

{c_{n} = (f_{n}, g_{n})}_{n \in N}

that achieves the lower bound in Theorem 2.

Type I error probability: Denoting by

A_{n}

the random acceptance region for

H_{0}

, note that a type I error can occur only under the following events:

(i): $E_{EE}^{'} : = ⋃_{P_{\hat{U}} \in D_{n} (P_{U}, η)} ⋃_{u \in T_{n} (P_{\hat{U}})} E_{EE}^{'} (u)$ , where

$\begin{matrix} E_{EE}^{'} (u) : = \{\begin{matrix} ∄ j \in M^{'} ∖ {0} s . t . (s, u, W (j)) \in T_{n} (P_{\hat{S} {\hat{U}}_{i} {\hat{W}}_{i}}), P_{\hat{S} {\hat{U}}_{i}} = P_{s u}, \\ P_{\hat{S} {\hat{U}}_{i} {\hat{W}}_{i}} \in D_{n} (P_{S U W}, η) \end{matrix}\}, \end{matrix}$
(ii): $E_{NE}^{'} : = {{\hat{M}}^{'} = M^{'}$ and $(s, {\hat{M}}^{'}, V, Y) \notin A_{n}}$ ,
(iii): $E_{ODE}^{'} : = {M^{'} \neq 0$ , ${\hat{M}}^{'} \neq M^{'}$ and $(s, {\hat{M}}^{'}, V, Y) \notin A_{n}}$ ,
(iv): $E_{SDE}^{'} : = {M^{'} = 0$ , ${\hat{M}}^{'} \neq M^{'}$ and $(s, {\hat{M}}^{'}, V, Y) \notin A_{n}}$ .

By definition of

R_{i}^{'}

, we have, similar to (24), the following:

\begin{matrix} {\bar{P}}_{P^{B_{n}}} (E_{EE}^{'}) \leq e^{- e^{n Ω (η)}} . \end{matrix}

(41)

Next, the event

E_{NE}^{'}

can be upper bounded as

\begin{matrix} {\bar{P}}_{P^{B_{n}}} (E_{NE}^{'} | E_{EE}^{' c}) & \leq {\bar{P}}_{P^{B_{n}}} ((s, {\hat{M}}^{'}, V, Y) \notin A_{n} | {\hat{M}}^{'} = M^{'}, E_{E E}^{' c}) \\ = 1 - {\bar{P}}_{P^{B_{n}}} ((s, U, V, Y) \in A_{n}^{e} | E_{E E}^{' c}) . \end{matrix}

(42)

For

u \in O_{m^{'}}

, note that, similar to Equation 4.17 in [14], we have

\begin{matrix} {\bar{P}}_{P^{B_{n}}} ((V, Y) \in Z_{m^{'}}^{'} (u) | U = u, W (m^{'}) = {\bar{w}}_{m^{'}}, E_{E E}^{' c}) \geq 1 - e^{- n (κ_{α} + \frac{η}{3} - D (P_{u} | | P_{U}))} . \end{matrix}

From this and (15), we obtain, similar to Equation (4.22) in [14] that

\begin{matrix} {\bar{P}}_{P^{B_{n}}} ((s, U, V, Y) \in A_{e}^{n} | E_{E E}^{' c}) \geq 1 - e^{- n κ_{α}} . \end{matrix}

(43)

Substituting (43) in (42) yields

\begin{matrix} {\bar{P}}_{P^{B_{n}}} (E_{NE}^{'} | E_{EE}^{' c}) \leq e^{- n κ_{α}} . \end{matrix}

(44)

Next, we bound the probability of the event

E_{ODE}^{'}

as follows:

\begin{matrix} {\bar{P}}_{P^{B_{n}}} (E_{ODE}^{'}) \\ = {\bar{P}}_{P^{B_{n}}} (M^{'} \neq 0, {\hat{M}}^{'} \neq M^{'}, (s, M^{'}, V, Y) \in A_{n}, (s, {\hat{M}}^{'}, V, Y) \notin A_{n}) \\ + {\bar{P}}_{P^{B_{n}}} (M^{'} \neq 0, {\hat{M}}^{'} \neq M^{'}, (s, M^{'}, V, Y) \notin A_{n}, (s, {\hat{M}}^{'}, V, Y) \notin A_{n}) \\ \leq {\bar{P}}_{P^{B_{n}}} (M^{'} \neq 0, {\hat{M}}^{'} \neq M^{'}, (s, M^{'}, V, Y) \in A_{n}, (s, {\hat{M}}^{'}, V, Y) \notin A_{n}) \\ + {\bar{P}}_{P^{B_{n}}} (M^{'} \neq 0, {\hat{M}}^{'} \neq M^{'}, (s, M^{'}, V, Y) \notin A_{n}) \end{matrix}

\begin{matrix} \overset{(a)}{\leq} {\bar{P}}_{P^{B_{n}}} (M^{'} \neq 0, {\hat{M}}^{'} \neq M^{'}, (s, M^{'}, V, Y) \in A_{n}, (s, {\hat{M}}^{'}, V, Y) \notin A_{n}) + e^{- e^{n Ω (η)}} + e^{- n κ_{α}} \end{matrix}

(45)

\begin{matrix} \leq {\bar{P}}_{P^{B_{n}}} ({\hat{M}}^{'} \neq M^{'} | M^{'} \neq 0, (s, M^{'}, V, Y) \in A_{n}) + e^{- e^{n Ω (η)}} + e^{- n κ_{α}}, \end{matrix}

\begin{matrix} \overset{(b)}{=} {\bar{P}}_{P^{B_{n}}} ({\hat{M}}^{'} \neq M^{'} | M^{'} \neq 0, {\hat{M}}^{'} \neq 0, (s, M^{'}, V, Y) \in A_{n}) \end{matrix}

(46)

\begin{matrix} \overset{(c)}{\leq} e^{- n (ρ^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) - ζ^{'} (κ_{α}, ω^{'}, P_{\hat{S}}) - O (η))}, \end{matrix}

(47)

where

(a)

follows similar to (29) using (41) and (43);

(b)

is since

(s, M^{'}, V, Y) \in A_{n}

implies that

{\hat{M}}^{'} \neq 0

; and

(c)

follows similar to (33). Further,

\begin{matrix} {\bar{P}}_{P^{B_{n}}} (E_{SDE}^{'}) & \leq {\bar{P}}_{P^{B_{n}}} (M^{'} = 0) \\ \leq {\bar{P}}_{P^{B_{n}}} (M^{'} = 0 | E_{E E}^{' c}) + {\bar{P}}_{P^{B_{n}}} (E_{E E}^{'}) \\ = \sum_{\begin{matrix} u : P_{u} \notin D_{n} (P_{U}, η) \end{matrix}} P_{U}^{\otimes n} (u) + {\bar{P}}_{P^{B_{n}}} (E_{E E}^{'}) \\ \leq e^{- n κ_{α}} + e^{- e^{n Ω (η)}}, \end{matrix}

(48)

where the penultimate equality is since given

E_{E E}^{' c}

,

M^{'} = 0

occurs only for

U = u

such that

P_{u} \notin D_{n} (P_{U}, η)

, and the final inequality follows from (41), the definition of

D_{n} (P_{U}, η)

and Lemma 1.6 in [38]. From (41), (44), (47) and (48), the expected type I error probability satisfies

e^{- n (κ_{α} - O (η))}

for sufficiently large n via the union bound.

Type II error probability: Next, we analyze the expected type II error probability over

B_{n}

. Let

\begin{matrix} D_{n} (P_{S V W Y}, η) : = \{\begin{matrix} P_{\hat{S} \hat{V} \hat{W} \hat{Y}} : & \exists (s, u, v, \bar{w}, y) \in \underset{m^{'} \in M^{'} ∖ {0}}{\cup} J_{n} (κ_{α} + η, P_{\hat{S} U V W_{m^{'}} Y}), \\ P_{\hat{S} U V W_{m^{'}} Y} satisfies (40) and P_{s u v \bar{w} y} = P_{\hat{S} \hat{U} \hat{V} \hat{W} \hat{Y}} \end{matrix}\}, \\ F_{1, n}^{'} (η) : = \{\begin{matrix} P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y}} \in T (S^{n} \times U^{n} \times V^{n} \times W^{n} \times Y^{n}) & : P_{\hat{S} \tilde{U} \tilde{W}} \in D_{n} (P_{S U W}, η), \\ P_{\hat{S} \tilde{V} \tilde{W} \tilde{Y}} \in D_{n} (P_{S V W Y}, η) \end{matrix}\} . \end{matrix}

A type II error can occur only under the following events:

(a): $E_{a}^{'} : = \{\begin{matrix} {\hat{M}}^{'} = M^{'} \neq 0, (s, U, V, W (M^{'}), Y) \in T_{n} (P_{\hat{S} \hat{U} \hat{V} \hat{W} \hat{Y}}) \\ s . t . P_{\hat{U} \hat{W}} \in D_{n} (P_{S U W}, η) and P_{\hat{S} \hat{V} \hat{W} \hat{Y}} \in D_{n} (P_{S V W Y}, η) \end{matrix}\}$ ,
(b): $E_{b}^{'} : = \{\begin{matrix} M^{'} \neq 0, {\hat{M}}^{'} \neq M^{'}, (s, U, V, W (M^{'}), Y, W ({\hat{M}}^{'})) \in T_{n} (P_{\hat{S} \hat{U} \hat{V} \hat{W} \hat{Y} {\hat{W}}_{d}}) \\ s . t . P_{\hat{S} \hat{U} \hat{W}} \in D_{n} (P_{S U W}, η), P_{\hat{S} \hat{V} {\hat{W}}_{d} \hat{Y}} \in D_{n} (P_{S V W Y}, η), \\ and H_{e} (W ({\hat{M}}^{'}) | s, V, Y) \leq H_{e} (W (M^{'}) | s, V, Y) \end{matrix}\}$ ,
(c): $E_{c}^{'} : = \{\begin{matrix} M^{'} = 0, {\hat{M}}^{'} \neq M^{'}, (s, V, Y, W ({\hat{M}}^{'})) \in T_{n} (P_{\hat{S} \hat{V} \hat{Y} {\hat{W}}_{d}}) \\ s . t . P_{\hat{S} \hat{V} {\hat{W}}_{d} \hat{Y}} \in D_{n} (P_{S V W Y}, η) \end{matrix}\} .$

Considering the event

E_{a}^{'}

, we have

\begin{matrix} {\bar{P}}_{Q^{B_{n}}} (E_{a}^{'}) \\ \leq \sum_{\begin{matrix} P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y}} \\ \in F_{1, n}^{'} (η) \end{matrix}} \sum_{\begin{matrix} (u, v, \bar{w}, y) : \\ (s, u, v, \bar{w}, y) \\ \in T_{n} (P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y}}) \end{matrix}} \sum_{\begin{matrix} m^{'} \in M^{'} ∖ {0} \end{matrix}} {\bar{P}}_{Q^{B_{n}}} (U = u, V = v, M^{'} = m^{'}, W (m^{'}) = \bar{w}, Y = y | S = s) \\ \leq \sum_{\begin{matrix} P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y}} \in F_{1, n}^{'} (η) \end{matrix}} \sum_{\begin{matrix} (u, v, \bar{w}, y) : \\ (s, u, v, \bar{w}, y) \in T_{n} (P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y}}) \end{matrix}} \sum_{m^{'} \in M^{'} ∖ {0}} {\bar{P}}_{Q^{B_{n}}} (U = u, V = v, M^{'} = m^{'} | S = s) \\ {\bar{P}}_{Q^{B_{n}}} (W (m^{'}) = \bar{w} | U = u, V = v, M^{'} = m^{'}, S = s) \\ {\bar{P}}_{Q^{B_{n}}} (Y = y | U = u, V = v, M^{'} = m^{'}, W (m^{'}) = \bar{w}, S = s) \\ \overset{(a)}{\leq} \sum_{\begin{matrix} P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y}} \in F_{1, n}^{'} (η) \end{matrix}} \sum_{\begin{matrix} (u, v, \bar{w}, y) : \\ (s, u, v, \bar{w}, y) \in T_{n} (P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y}}) \end{matrix}} e^{- n (H (\tilde{U}, \tilde{V}) + D (P_{\tilde{U} \tilde{V}} | | Q_{U V}))} e^{- n (H (\tilde{W} | \hat{S}, \tilde{U}) - η)} \\ e^{- n (H (\tilde{Y} | \tilde{U}, \hat{S}, \tilde{W}) + D (P_{\tilde{Y} | \tilde{U} \hat{S} \tilde{W}} | | P_{Y | U S W} | P_{\tilde{U} \hat{S} \tilde{W}}))} \\ \leq \sum_{\begin{matrix} P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y}} \in F_{1, n}^{'} (η) \end{matrix}} e^{n H (\tilde{U}, \tilde{V}, \tilde{W}, \tilde{Y} | \hat{S})} e^{- n (H (\tilde{U}, \tilde{V}) + D (P_{\tilde{U} \tilde{V}} | | Q_{U V}))} e^{- n (H (\tilde{W} | \hat{S}, \tilde{U}) - η)} \\ e^{- n (H (\tilde{Y} | \tilde{U}, \hat{S}, \tilde{W}) + D (P_{\tilde{Y} | \tilde{U} \hat{S} \tilde{W}} | | P_{Y | U S W} | P_{\tilde{U} \hat{S} \tilde{W}}))} \\ \leq e^{- n E_{1, n}^{'}}, \end{matrix}

(49)

where

\begin{matrix} E_{1, n}^{'} & : = \min_{P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y}} \in F_{1, n}^{'} (η)} H (\tilde{U}, \tilde{V}) + D (P_{\tilde{U} \tilde{V}} | | Q_{U V}) + H (\tilde{W} | \hat{S}, \tilde{U}) - η + H (\tilde{Y} | \tilde{U}, \hat{S}, \tilde{W}) \\ + D (P_{\tilde{Y} | \tilde{U} \hat{S} \tilde{W}} | | P_{Y | U S W} | P_{\tilde{U} \hat{S} \tilde{W}}) - H (\tilde{U}, \tilde{V}, \tilde{W}, \tilde{Y} | \hat{S}) - \frac{1}{n} | | U | | V | | W | | Y | log (n + 1) \\ ≳ \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} S}, Q_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} S}) \\ \in T_{1}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} | S} | | Q_{U V W Y | S} | P_{S}) - O (η) \\ = E_{1}^{'} (κ_{α}, ω^{'}) - O (η) . \end{matrix}

For the inequality in

(a)

above, we used

\sum_{} {\bar{P}}_{Q^{B_{n}}} (M^{'} = m^{'} | U = u, V = v, S = s) \leq 1

and

\begin{matrix} {\bar{P}}_{Q^{B_{n}}} (W (m^{'}) = \bar{w} | U = u, V = v, S = s, M^{'} = m^{'}) \leq \{\begin{matrix} e^{- n (H (\tilde{W} | \hat{S}, \tilde{U}) - η)}, & if \bar{w} \in T_{n} (\tilde{W}), \\ 0, & otherwise, \end{matrix} \end{matrix}

which in turn follows from the fact that given

M^{'} = m^{'}

and

U = u

,

W (m^{'})

is uniformly distributed in the set

T_{n} (P_{\tilde{W} | \hat{S} \tilde{U}}, s, u)

and that for sufficiently large n

| T_{n} (P_{\tilde{W} | \hat{S} \tilde{U}}, s, u) | \geq e^{n (H (\tilde{W} | \hat{S}, \tilde{U}) - η)}

.

Next, we analyze the probability of the event

E_{b}^{'}

. Let

\begin{matrix} F_{2, n}^{'} (η) : = \{\begin{matrix} P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y} {\tilde{W}}_{d}} : & P_{\hat{S} \tilde{U} \tilde{W}} \in D_{n} (P_{S U W}, η), P_{\hat{S} \tilde{V} {\tilde{W}}_{d} \tilde{Y}} \in D_{n} (P_{S V W Y}, η) \\ H ({\tilde{W}}_{d} | \hat{S}, \tilde{V}, \tilde{Y}) \leq H (\tilde{W} | \hat{S}, \tilde{V}, \tilde{Y}) \end{matrix}\} . \end{matrix}

Then,

\begin{matrix} {\bar{P}}_{Q^{B_{n}}} (E_{b}^{'}) \\ \leq \sum_{\begin{matrix} P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y} {\tilde{W}}_{d}} \\ \in F_{2, n}^{'} (η) \end{matrix}} \sum_{\begin{matrix} (u, v, \bar{w}, y, w^{'}) : \\ (s, u, v, \bar{w}, y, w^{'}) \\ \in T_{n} (P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y} {\tilde{W}}_{d}}) \end{matrix}} \sum_{\begin{matrix} m^{'} \in \\ M^{'} ∖ {0} \end{matrix}} {\bar{P}}_{Q^{B_{n}}} (U = u, V = v, M^{'} = m^{'}, W (m^{'}) = \bar{w}, Y = y | S = s) \\ \sum_{\begin{matrix} {\hat{m}}^{'} \in M^{'} ∖ {0, m^{'}} \end{matrix}} {\bar{P}}_{Q^{B_{n}}} (\bar{W} ({\hat{m}}^{'}) = w^{'} | U = u, M^{'} = m^{'}, W (m^{'}) = \bar{w}, S = s) \\ \leq \sum_{\begin{matrix} P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y} {\tilde{W}}_{d}} \\ \in F_{2, n}^{'} (η) \end{matrix}} \sum_{\begin{matrix} (u, v, \bar{w}, y) : \\ (s, u, v, \bar{w}, y) \in T_{n} (P_{\hat{S} \tilde{U} \tilde{V} \tilde{W} \tilde{Y}}) \end{matrix}} e^{- n (H (\tilde{U}, \tilde{V}) + D (P_{\tilde{U} \tilde{V}} | | Q_{U V}))} e^{- n (H (\tilde{W} | \hat{S}, \tilde{U}) - η)} \\ e^{- n (H (\tilde{Y} | \tilde{U}, \hat{S}, \tilde{W}) + D (P_{\tilde{Y} | \tilde{U} \hat{S} \tilde{W}} | | P_{Y | U S W} | P_{\tilde{U} \hat{S} \tilde{W}}))} e^{n (ζ^{'} (κ_{α}, ω^{'}, P_{\hat{S}}) + η)} \frac{2 e^{n H ({\tilde{W}}_{d} | \hat{S}, \tilde{V}, \tilde{Y})}}{e^{n (H ({\tilde{W}}_{d}) - η)}} \\ \leq e^{- n E_{2, n}^{'}}, \end{matrix}

(50)

where

\begin{matrix} E_{2, n}^{'} & ≳ \min_{\begin{matrix} (P_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} S}, Q_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} S}) \in T_{2}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) \end{matrix}} D (P_{\tilde{U} \tilde{V} \tilde{W} \tilde{Y} | S} | | Q_{U V W Y | S} | P_{S}) + ρ^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) \\ - ζ^{'} (κ_{α}, ω^{'}, P_{S}) - O (η) \\ = E_{2}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) - O (η) . \end{matrix}

Finally, considering the event

E_{c}^{'}

, we have

\begin{matrix} {\bar{P}}_{Q^{B_{n}}} (E_{c}^{'}) = \sum_{\begin{matrix} u \in T_{n} (P_{\tilde{U}}) : \\ P_{\tilde{U}} \in D_{n} (P_{U}, η) \end{matrix}} {\bar{P}}_{Q^{B_{n}}} (U = u, E_{EE}^{'}, E_{c}^{'} | S = s) + \sum_{\begin{matrix} u \in T_{n} (P_{\tilde{U}}) : \\ P_{\tilde{U}} \notin D_{n} (P_{U}, η) \end{matrix}} {\bar{P}}_{Q^{B_{n}}} (U = u, E_{c}^{'} | S = s) . \end{matrix}

The first term in the RHS decays double exponentially as

e^{- e^{n Ω (η)}}

, while the second term can be handled as follows:

\begin{matrix} \sum_{\begin{matrix} u \in T_{n} (P_{\tilde{U}}) : P_{\tilde{U}} \notin D_{n} (P_{U}, η) \end{matrix}} {\bar{P}}_{Q^{B_{n}}} (U = u, E_{c}^{'} | S = s) \\ \leq \sum_{\begin{matrix} u \in T_{n} (P_{\tilde{U}}) : \\ P_{\tilde{U}} \notin D_{n} (P_{U}, η) \end{matrix}} \sum_{\begin{matrix} (v, y, w^{'}) : \\ (s, v, y, w^{'}) \in T_{n} (P_{\hat{S} \tilde{V} \tilde{Y} {\tilde{W}}_{d}}), \\ P_{\hat{S} \tilde{V} {\tilde{W}}_{d} \tilde{Y}} \in D_{n} (P_{S V W Y}, η) \end{matrix}} \sum_{\hat{m} \in M ∖ {0}} {\bar{P}}_{Q^{B_{n}}} (U = u, V = v, M^{'} = 0, Y = y | S = s) \\ \sum_{{\hat{m}}^{'} \in M^{'} ∖ {0}} {\bar{P}}_{Q^{B_{n}}} (W ({\hat{m}}^{'}) = \bar{w}) \\ \leq \sum_{\begin{matrix} P_{\tilde{U} \hat{S} \tilde{V} {\tilde{W}}_{d} \tilde{Y}} \in D_{n} {(P_{U}, η)}^{c} \\ \times D_{n} (P_{S V W Y}, η) \end{matrix}} e^{n H (\tilde{U}, \tilde{V}, \tilde{Y} | \hat{S})} e^{- n (H (\tilde{U}, \tilde{V}, \tilde{Y} | \hat{S}) + D (P_{\tilde{U} \tilde{V} \tilde{Y} | \hat{S}} | | Q_{U V Y^{'} | \hat{S}} | P_{\hat{S}}))} \frac{e^{n H ({\tilde{W}}_{d} | \hat{S}, \tilde{V}, \tilde{Y})} e^{n (R^{'} + η)}}{e^{n (H ({\tilde{W}}_{d}) - η)}} \\ \leq e^{- n E_{3, n}^{'}}, \end{matrix}

(51)

where

\begin{matrix} E_{3, n}^{'} & ≳ \min_{\begin{matrix} P_{\hat{V} \hat{Y} S} : P_{\hat{U} \hat{V} \hat{W} \hat{Y} S} \in \\ {\hat{L}}_{h} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) \end{matrix}} - D (P_{\hat{V} \hat{Y} | S} | | Q_{V Y^{'} | S} | P_{S}) + ρ^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}) - ζ^{'} (κ_{α}, ω^{'}, P_{S}) - O (η) \\ = E_{3}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}, P_{X^{'} | U S}) - O (η) . \end{matrix}

Since the exponent of the type II error probability is lower bounded by the minimum of the exponent of the type II error-causing events, it follows from (49), (50) and (51) that for a fixed

(P_{S}, ω^{'} (\cdot, P_{S}), P_{X | U S W}, P_{X^{'} | U S}) \in L_{h} (κ_{α})

\begin{matrix} {\bar{P}}_{P^{(B_{n})}} (\hat{H} = 1) & \leq e^{- n (κ_{α} - O (η))}, \end{matrix}

(52a)

\begin{matrix} {\bar{P}}_{Q^{(B_{n})}} (\hat{H} = 0) & \leq e^{- n ({\bar{κ}}_{h} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}, P_{X^{'} | U S}) - O (η))}, \end{matrix}

(52b)

where

{\bar{κ}}_{h} = \min \{E_{1}^{'} (κ_{α}, ω^{'}), E_{2}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}), E_{3}^{'} (κ_{α}, ω^{'}, P_{S}, P_{X | U S W}, P_{X^{'} | U S})\}

. Performing expurgation as in the proof of Theorem 1 to obtain a deterministic codebook

B_{n}

satisfying (52a, 52b), maximizing over

(P_{S}, ω^{'} (\cdot, P_{S}), P_{X | U S W}, P_{X^{'} | U S}) \in L_{h} (κ_{α})

and noting that

η > 0

is arbitrary yields

κ (κ_{α}) \geq κ_{h}^{★} (κ_{α})

.

Finally, we show that

κ (κ_{α}) \geq κ_{u}^{★} (κ_{α})

, which will complete the proof. Fix

P_{X | U S}

and let

P_{U V X Y} : = P_{U V} P_{X | U S} P_{Y | X}

and

Q_{U V X Y} : = Q_{U V} P_{X | U S} P_{Y | X}

. Consider an uncoded transmission scheme in which the channel input

X \sim f_{n} (\cdot | u) = P_{X | U S}^{\otimes n} (\cdot | u, s)

. Let the decision rule

g_{n}

be specified by the acceptance region

A_{n} = \{(s, v, y) : D (P_{v y | s} | | P_{V Y | S} | P_{s}) \leq κ_{α} + η\}

for some small

η > 0

. Then, it follows from Lemma 2.6 in [42] that for sufficiently large n,

\begin{matrix} α_{n} (f_{n}, g_{n}) & = P_{V Y | S}^{\otimes n} (A_{n}^{c} | s) \leq e^{- n κ_{α}}, \\ β_{n} (f_{n}, g_{n}) & = Q_{V Y | S}^{\otimes n} (A_{n} | s) \leq e^{- n (κ_{u}^{★} (κ_{α}) - O (η))} . \end{matrix}

The proof is complete by noting that

η > 0

is arbitrary.

5. Conclusions

This work explored the trade-off between the type I and type II error-exponents for distributed hypothesis testing over a noisy channel. We proposed a separate hypothesis testing and channel coding scheme as well as a joint scheme utilizing hybrid coding, and analyzed their performance resulting in two inner bounds on the error-exponents trade-off. The separate scheme recovers some of the existing bounds in the literature as special cases. We also showed via an example of testing against dependence that the joint scheme strictly outperforms the separate scheme at some points of the error-exponents trade-off. An interesting avenue for future research is the exploration of novel outer bounds that could shed light on the scenarios where the separate or joint schemes are tight.

Author Contributions

Conceptualization, S.S. and D.G.; writing—original draft preparation, S.S. and D.G.; supervision, D.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the European Research Council Starting Grant project BEACON (grant agreement number 677854).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

HT	Hypothesis testing
DHT	Distributed hypothesis testing
TAD	Testing against dependence
TAI	Testing against independence
SHTCC	Separate hypothesis testing and channel coding
JHTCC	Joint hypothesis testing and channel coding

Appendix A. Proof that Theorem 1 Recovers Theorem 2 in [10]

We prove that

{lim}_{κ_{α} \to 0} κ_{s}^{★} (κ_{α}) = κ_{s}

, where

κ_{s}

is the lower bound on the type II error-exponent for a fixed type I error probability constraint and unit bandwidth ratio established in Theorem 2 in [10]. Note that

\hat{L} (0, ω) = {P_{U V W} = P_{U V} P_{W | U}, P_{W | U} = ω (P_{U})}

,

ζ (0, ω) = I_{P} (U; W)

, and

ρ (0, ω) = I_{P} (V; W)

. The result then follows from Theorem 1 by noting that

\hat{L} (κ_{α}, ω)

,

ζ (κ_{α}, ω)

and

ρ (κ_{α}, ω)

are continuous in

κ_{α}

and the fact that

E_{sp} (P_{S X}, θ), E_{ex} (R, P_{S X})

and

E_{b} (κ_{α}, ω, R)

are all greater than or equal to zero.

Appendix B. Proof that Theorem 2 Recovers Theorem 5 in [10]

We show that

{lim}_{κ_{α} \to 0} κ_{h}^{★} (κ_{α}) = κ_{h}

, where

κ_{h}

is as defined in Theorem 5 in [10]. Note that

{\hat{L}}_{h} (0, ω^{'}, P_{S}, P_{X | U S W}) : = \{P_{U V \hat{W} Y S} : P_{S U V W X Y} : = P_{S} P_{U V} P_{W | U S} P_{X | U S W} P_{Y | X}, P_{W | U S} = ω^{'} (P_{U}, P_{S})\}

,

ζ^{'} (0, ω^{'}, P_{S}) : = I_{P} (U; W | S)

,

ρ (0, ω^{'}, P_{S}, P_{X | U S W}) = I_{P} (Y, V; W | S)

,

\begin{matrix} L_{h} (0) : = \{(P_{S}, ω^{'} (\cdot, P_{S}), P_{X | U S W}, P_{X^{'} | U S}) : I_{P} (U; W | S) < I_{P} (Y, V; W | S)\}, \end{matrix}

and

E_{b}^{'} (0, ω^{'}, P_{S}, P_{X | U S W}) = I_{P} (Y, V; W | S) - I_{P} (U; W | S)

. The result then follows from Theorem 2 via the continuity of

{\hat{L}}_{h} (κ_{α}, \cdot, \cdot, \cdot)

,

ζ^{'} (κ_{α}, \cdot, \cdot)

,

ρ (κ_{α}, \cdot, \cdot, \cdot)

,

L_{h} (κ_{α})

and

E_{b}^{'} (κ_{α}, \cdot, \cdot, \cdot)

in

κ_{α}

.

Appendix C. An Auxiliary Result

Here, we prove a result which was used in the proof of Theorem 1, namely Proposition A1 given below. For this purpose, we require a few properties of log-moment generating function, which we briefly review next.

Lemma A1.

(Theorem 15.3, Theorem 15.6 in [43])

(i): $ψ_{P_{Z}, f} (0) = 0$ and $ψ_{P_{Z}, f}^{'} (0) = E_{P_{Z}} [f (Z)]$ , where $ψ_{P_{Z}, f}^{'} (λ)$ denotes the derivative of $ψ_{P_{Z}, f} (λ)$ with respect to λ.
(ii): $ψ_{P_{Z}, f} (λ)$ is a strictly convex function in λ.
(iii): $ψ_{P_{Z}, f}^{*} (θ)$ is strictly convex and strictly positive in θ except $ψ_{P_{Z}, f}^{*} (E_{P_{Z}} [Z]) = 0$ .

Proposition A1 is basically a characterization of the error-exponent region of a hypothesis testing problem, which we introduce next. Let

P_{X_{0} X_{1}} \in P (X^{2})

be an arbitrary joint PMF, and consider a sequence of pairs of n-length sequences

(\tilde{x}, x^{'})

such that

\begin{matrix} P_{\tilde{x} x^{'}} (\tilde{x}, x^{'}) \overset{(n)}{\to} P_{X_{0} X_{1}} (\tilde{x}, x^{'}), \forall (\tilde{x}, x^{'}) \in X^{2} . \end{matrix}

(A1)

Consider the following HT:

H_{0} : Y \sim P_{Y | X}^{\otimes n} (\cdot | \tilde{x}),

(A2a)

H_{1} : Y \sim P_{Y | X}^{\otimes n} (\cdot | x^{'}) .

(A2b)

With the achievability of an error-exponent pair

(κ_{α}, κ_{β})

defined similar to Definition 1, consider the error-exponent region of interest

\begin{matrix} R^{'} (P_{X_{0} X_{1}}) : = {(κ_{α}, κ^{'} (κ_{α}, P_{X_{0} X_{1}})) : κ_{α} \in (0, κ_{α}^{' ★})}, \end{matrix}

where

κ^{'} (κ_{α}, P_{X_{0} X_{1}}) : = sup {κ_{β} : (κ_{α}, κ_{β}) is achievable for HT in in (A2)}

and

κ_{α}^{' ★} : = inf {κ_{α} : κ^{'} (κ_{α}, P_{X_{0} X_{1}}) = 0}

. We mention at this point that the notation

R^{'} (P_{X_{0} X_{1}})

is justified as the error-exponent region for the above hypothesis test depends on

(\tilde{x}, x^{'})

only through its limiting joint type

P_{X_{0} X_{1}}

, as will be evident later. Given this, the following proposition provides a single-letter characterization of

R^{'} (P_{X_{0} X_{1}})

.

Proposition A1.

\begin{matrix} R^{'} (P_{X_{0} X_{1}}) = \underset{θ \in I (P_{X_{0} X_{1}})}{\cup} (E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)], E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)] - θ) \end{matrix}

where

Π_{\tilde{x}, x^{'}} (y) : = log (P_{Y | X} (y | x^{'}) / P_{Y | X} (y | \tilde{x}))

for

(\tilde{x}, x^{'}) \in X^{2}

,

\begin{matrix} I (P_{X_{0} X_{1}}) & : = (- d_{\min} (P_{X_{0} X_{1}}), d_{max} (P_{X_{0} X_{1}})), \\ d_{\min} (P_{X_{0} X_{1}}) & : = E_{P_{X_{0} X_{1}}} [D (P_{Y | X} (\cdot | X_{0}) | | P_{Y | X} (\cdot | X_{1}))] \\ d_{max} (P_{X_{0} X_{1}}) & : = E_{P_{X_{0} X_{1}}} [D (P_{Y | X} (\cdot | X_{1}) | | P_{Y | X} (\cdot | X_{0}))] . \end{matrix}

Proof.

Let

(\tilde{x}, x^{'}) \in X^{n} \times X^{n}

be sequences that satisfy (A1). For simplicity, we will denote

d_{max} (P_{X_{0} X_{1}})

and

d_{\min} (P_{X_{0} X_{1}})

by

d_{max}

and

d_{\min}

, respectively.

Achievability: We will show that for

- d_{\min} < θ < d_{max}

,

\begin{matrix} κ^{'} (E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)], P_{X_{0} X_{1}}) \geq E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)] - θ . \end{matrix}

Consider the Neyman–Pearson test given by

g_{n} (y) = 𝟙_{\{Π_{\tilde{x}, x^{'}}^{(n)} (y) \geq n θ\}}

, where

Π_{\tilde{x}, x^{'}}^{(n)} (y) : = \sum_{i = 1}^{n} Π_{{\tilde{x}}_{i}, x_{i}^{'}, P_{Y | X}} (y_{i})

. Observe that the type I error probability can be upper bounded for

θ > - d_{\min}

and sufficiently large n as follows:

\begin{matrix} α_{n} (g_{n}) & = P_{P_{Y | X}^{\otimes n} (\cdot | \tilde{x})} (Π_{\tilde{x}, x^{'}}^{(n)} (Y) \geq n θ) \\ \overset{(a)}{\leq} e^{- sup_{λ \geq 0} n θ λ - ψ_{P_{Y | X}^{\otimes n} (\cdot | \tilde{x}), Π_{\tilde{x}, x^{'}}^{(n)}} (λ)} \\ \overset{(b)}{=} e^{- sup_{λ \in R} n (θ λ - \frac{1}{n} ψ_{P_{Y | X}^{\otimes n} (\cdot | \tilde{x}), Π_{\tilde{x}, x^{'}}^{(n)}} (λ))}, \end{matrix}

(A3)

where

(a)

follows from the Chernoff bound, and

(b)

holds because for

θ > - d_{\min}

and sufficiently large n, the supremum in (A3) always occurs at

λ \geq 0

. To see this, note that the term

l_{n} (λ) : = θ λ - n^{- 1} ψ_{P_{Y | X}^{\otimes n} (\cdot | \tilde{x}), Π_{\tilde{x}, x^{'}}^{(n)}} (λ)

is a concave function of

λ

by Lemma A1 (i). Additionally, denoting its derivative with respect to

λ

by

l_{n}^{'} (λ)

, we have

\begin{matrix} l_{n}^{'} (0) & = θ - \frac{1}{n} E_{P_{Y | X}^{\otimes n} (\cdot | \tilde{x})} [Π_{\tilde{x}, x^{'}}^{(n)}] \\ = θ - \frac{1}{n} \sum_{i = 1}^{n} E_{P_{Y | X} (\cdot | {\tilde{x}}_{i})} [log (P_{Y | X} (Y_{i} | x_{i}^{'}) / P_{Y | X} (Y_{i} | {\tilde{x}}_{i}))] \end{matrix}

(A4)

\begin{matrix} \overset{(n)}{\to} θ + d_{\min} > 0, \end{matrix}

(A5)

where (A4) follows from Lemma A1 (iii), and (A5) is due to the absolute continuity assumption,

P_{Y | X} (\cdot | x) ≪ P_{Y | X} (\cdot | x^{'})

,

\forall (x, x^{'}) \in X^{2}

on the channel, and (A1). Thus, by the concavity of

l_{n} (λ)

, its supremum has to occur at

λ \geq 0

. Simplifying the term within the exponent in (A3), we obtain

\begin{matrix} \frac{1}{n} ψ_{P_{Y | X}^{\otimes n} (\cdot | x), Π_{\tilde{x}, x^{'}}^{(n)}} (λ) & = \sum_{\tilde{x}, x^{'}} P_{\tilde{x} x^{'}} (\tilde{x}, x^{'}) log (E_{P_{Y | X} (\cdot | \tilde{x})} [{(P_{Y | X} (Y | x^{'}) / P_{Y | X} (Y | \tilde{x}))}^{λ}]) \end{matrix}

(A6)

\begin{matrix} \overset{(n)}{\to} E_{P_{X_{0} X_{1}}} [log (E_{P_{Y | X} (\cdot | X_{0})} [e^{λ Π_{X_{0}, X_{1}} (Y)}])], \end{matrix}

(A7)

where (A7) follows from (A1) and the absolute continuity assumption on

P_{Y | X}

. Substituting (A7) in (A3) and from (1), we obtain for arbitrarily small (but fixed)

δ > 0

and sufficiently large n, that

\begin{matrix} α_{n} (g_{n}) & \leq e^{- sup_{λ \in R} (n (θ λ - E_{P_{X_{0} X_{1}}} [log (E_{P_{Y | X} (\cdot | X_{0})} [e^{λ Π_{X_{0}, X_{1}} (Y)}])] - δ))} \\ = e^{- n (E_{P_{X_{0} X_{1}}} [sup_{λ \in R} (θ λ - E_{P_{Y | X} (\cdot | X_{0})} (e^{λ Π_{X_{0}, X_{1}} (Y)}))] - δ)} \\ = e^{- n (E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)] - δ)} . \end{matrix}

(A8)

Similarly, it can be shown that for

θ < d_{max}

,

\begin{matrix} β_{n} (g_{n}) & \leq e^{- n (E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{1}), Π_{X_{0}, X_{1}}}^{*} (θ)] - δ)} . \end{matrix}

(A9)

Moreover, for

(\tilde{x}, x^{'}) \in X^{2}

, we have

\begin{matrix} e^{ψ_{P_{Y | X} (\cdot | x^{'}), Π_{\tilde{x}, x^{'}}} (λ)} & = \sum_{y \in Y} P_{Y | X}^{λ + 1} (\cdot | x^{'}) / P_{Y | X}^{λ} (\cdot | \tilde{x}) = e^{ψ_{P_{Y | X} (\cdot | \tilde{x}), Π_{\tilde{x}, x^{'}}} (λ + 1)} . \end{matrix}

It follows that

\begin{matrix} ψ_{P_{Y | X} (\cdot | x^{'}), Π_{\tilde{x}, x^{'}}}^{*} (θ) & : = sup_{λ \in R} λ θ - ψ_{P_{Y | X} (\cdot | x^{'}), Π_{\tilde{x}, x^{'}}} (λ) \\ = sup_{λ \in R} λ θ - ψ_{P_{Y | X} (\cdot | \tilde{x}), Π_{\tilde{x}, x^{'}}} (λ + 1) \\ = ψ_{P_{Y | X} (\cdot | \tilde{x}), Π_{\tilde{x}, x^{'}}}^{*} (θ) - θ . \end{matrix}

Hence,

\begin{matrix} E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{1}), Π_{X_{0}, X_{1}}}^{*} (θ)] = E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)] - θ . \end{matrix}

From this, (A8) and (A9), we obtain for

- d_{\min} < θ < d_{max}

that

\begin{matrix} κ^{'} (E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)] - δ, P_{X_{0} X_{1}}) \geq E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)] - θ - δ . \end{matrix}

Then, the proof of achievability is completed by noting that

δ > 0

is arbitrary and

κ^{'} (κ_{α}, P_{X_{0} X_{1}})

is a continuous function of

κ_{α}

for a fixed

P_{X_{0} X_{1}}

.

Converse: Let $I_{n} (\tilde{x}, x^{'}) : = {i \in [n] : {\tilde{x}}_{i} = \tilde{x} and x_{i}^{'} = x^{'}}$ . For any $θ \in R$ and decision function $g_{n}$ , we have from Theorem 14.9 in [43] that

$\begin{matrix} α_{n} (g_{n}) + e^{- n θ} β_{n} (g_{n}) & \geq P_{P_{Y | X}^{\otimes n} (\cdot | \tilde{x})} (log (P_{Y | X}^{\otimes n} (Y | x^{'}) / P_{Y | X}^{\otimes n} (Y | \tilde{x})) \geq n θ) . \end{matrix}$

Simplifying the RHS above, we obtain

\begin{matrix} P_{P_{Y | X}^{\otimes n} (\cdot | \tilde{x})} (log (P_{Y | X}^{\otimes n} (Y | x^{'}) / P_{Y | X}^{\otimes n} (Y | \tilde{x})) \geq n θ) \\ = P_{P_{Y | X}^{\otimes n} (\cdot | \tilde{x})} (\sum_{\tilde{x}, x^{'}} \sum_{i \in I_{n} (\tilde{x}, x^{'})} log (P_{Y | X} (Y_{i} | x_{i}^{'}) / P_{Y | X} (Y_{i} | {\tilde{x}}_{i})) \geq n θ) \\ = P_{P_{Y | X}^{\otimes n} (\cdot | \tilde{x})} (\sum_{\tilde{x}, x^{'}} \sum_{i \in I_{n} (\tilde{x}, x^{'})} log (P_{Y | X} (Y_{i} | x_{i}^{'}) / P_{Y | X} (Y_{i} | {\tilde{x}}_{i})) \geq \sum_{(\tilde{x}, x^{'}) \in X^{2}} n P_{\tilde{x} x^{'}} (\tilde{x}, x^{'}) θ) \\ \overset{(a)}{\geq} P_{P_{Y | X}^{\otimes n} (\cdot | \tilde{x})} (⋂_{\tilde{x}, x^{'}} (\sum_{i \in I_{n} (\tilde{x}, x^{'})} log (P_{Y | X} (Y_{i} | x_{i}^{'}) / P_{Y | X} (Y_{i} | {\tilde{x}}_{i})) \geq n P_{\tilde{x} x^{'}} (\tilde{x}, x^{'}) θ)) \\ \overset{(b)}{=} \prod_{(\tilde{x}, x^{'}) \in X^{2}} P_{P_{Y | X}^{\otimes n} (\cdot | \tilde{x})} (\sum_{i \in I_{n} (\tilde{x}, x^{'})} log (P_{Y | X} (Y_{i} | x_{i}^{'}) / P_{Y | X} (Y_{i} | {\tilde{x}}_{i})) \geq n P_{\tilde{x} x^{'}} (\tilde{x}, x^{'}) θ), \end{matrix}

where

(a): follows since

$\begin{matrix} ⋂_{\tilde{x}, x^{'}} \{\sum_{i \in I_{n} (\tilde{x}, x^{'})} log (P_{Y | X} (Y_{i} | x_{i}^{'}) / P_{Y | X} (Y_{i} | {\tilde{x}}_{i})) \geq n P_{\tilde{x} x^{'}} (\tilde{x}, x^{'}) θ\} \\ \subseteq \{\sum_{\tilde{x}, x^{'}} \sum_{i \in I_{n} (\tilde{x}, x^{'})} log (P_{Y | X} (Y_{i} | x_{i}^{'}) / P_{Y | X} (Y_{i} | {\tilde{x}}_{i})) \geq \sum_{(\tilde{x}, x^{'}) \in X^{2}} n P_{\tilde{x} x^{'}} (\tilde{x}, x^{'}) θ\}; \end{matrix}$
(b): is due to the independence of the events $\{\sum_{i \in I_{n} (\tilde{x}, x^{'})} log (P_{Y | X} (Y_{i} | x_{i}^{'}) / P_{Y | X} (Y_{i} | {\tilde{x}}_{i})) \geq n P_{\tilde{x} x^{'}} (\tilde{x}, x^{'}) θ\}$ for different $(\tilde{x}, x^{'}) \in X^{2}$ .

Define

b_{\tilde{x}, x^{'}} (θ) : = \min_{\begin{matrix} {\tilde{Q}}_{\tilde{x}} \in P (Y) : E_{{\tilde{Q}}_{\tilde{x}}} [log (P_{Y | X} (Y | x^{'}) / P_{Y | X} (Y | \tilde{x}))] \geq θ \end{matrix}} D ({\tilde{Q}}_{x} | | P_{Y | X} (\cdot | \tilde{x}))

. Then, for arbitrary

δ > 0

,

δ^{'} > δ

and sufficiently large n, we can write

\begin{matrix} α_{n} + e^{- n θ} β_{n} & \overset{(a)}{\geq} \prod_{(\tilde{x}, x^{'}) \in X^{2}} e^{- n P_{\tilde{x} x^{'}} (\tilde{x}, x^{'}) (b_{\tilde{x}, x^{'}} (θ) + δ)} \\ \overset{(b)}{\geq} \prod_{(\tilde{x}, x^{'}) \in X^{2}} e^{- n P_{\tilde{x} x^{'}} (\tilde{x}, x^{'}) (ψ_{P_{Y | X} (\cdot | \tilde{x}), Π_{\tilde{x}, x^{'}}}^{*} (θ) + δ)} \\ \overset{(c)}{=} e^{- n (E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)] + δ^{'})}, \end{matrix}

where

(a)

follows from Theorem 15.9 in [43];

(b)

follows since

b_{\tilde{x}, x^{'}} (θ) = ψ_{P_{Y | X} (\cdot | \tilde{x}), Π_{\tilde{x}, x^{'}}}^{*} (θ)

from Theorem 15.6 in [43] and Theorem 15.11 in [43]; and

(c)

is due to (A1). The equation above implies that

\begin{matrix} \underset{n \to \infty}{lim sup} \min \{- \frac{1}{n} log α_{n}, - \frac{1}{n} log β_{n} + θ\} \leq E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)] + δ^{'} . \end{matrix}

Hence, if

- log (α_{n}) / n > E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)] + δ^{'}

for all sufficiently large n, then

\begin{matrix} \underset{n \to \infty}{lim sup} - \frac{1}{n} log β_{n} \leq E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)] - θ + δ^{'} . \end{matrix}

Since

δ

(and

δ^{'}

) is arbitrary, this implies via the continuity of

κ^{'} (κ_{α}, P_{X_{0} X_{1}})

in

κ_{α}

that

\begin{matrix} κ^{'} (E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)], P_{X_{0} X_{1}}) \leq E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)] - θ . \end{matrix}

To complete the proof, we need to show that

θ

can be restricted to lie in

I (P_{X_{0} X_{1}})

. Toward this, it suffices to show the following:

(i): $E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (- d_{\min})] = 0$ ,
(ii): $E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (d_{max})] = d_{max}$ , and
(iii): $E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)]$ and $E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (θ)] - θ$ are convex functions of $θ$ .

We have

\begin{matrix} E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (- d_{\min})] \\ : = sup_{λ \in R} - λ E_{P_{X_{0} X_{1}}} [D (P_{Y | X} (\cdot | X_{0}) | | P_{Y | X} (\cdot | X_{1}))] - E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}} (λ)] \\ \leq \sum_{\tilde{x}, x^{'}} P_{X_{0} X_{1}} (\tilde{x}, x^{'}) [sup_{λ_{\tilde{x}, x^{'}} \in R} - λ_{\tilde{x}, x^{'}} D (P_{Y | X} (\cdot | \tilde{x}) | | P_{Y | X} (\cdot | x^{'})) - ψ_{P_{Y | X} (\cdot | \tilde{x}), Π_{\tilde{x}, x^{'}}} (λ_{\tilde{x}, x^{'}})] \\ = 0, \end{matrix}

(A10)

where (A10) follows since each term inside the square braces in the penultimate equation is zero, which in turn follows from Lemma A1 (iii). Additionally,

\begin{matrix} E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (- d_{\min})] & = \sum_{\tilde{x}, x^{'}} P_{X_{0} X_{1}} (\tilde{x}, x^{'}) ψ_{P_{Y | X} (\cdot | \tilde{x}), Π_{\tilde{x}, x^{'}}}^{*} (- d_{\min}) \geq 0, \end{matrix}

(A11)

where (A11) follows from the non-negativity of

ψ_{P_{Y | X} (\cdot | \tilde{x}), Π_{\tilde{x}, x^{'}}}^{*}

for every

(\tilde{x}, x^{'}) \in X^{2}

stated in Lemma A1 (iii). Combining (A10) and (A11) proves

(i)

. We also have

\begin{matrix} E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{0}), Π_{X_{0}, X_{1}}}^{*} (d_{max})] - d_{max} & = E_{P_{X_{0} X_{1}}} [ψ_{P_{Y | X} (\cdot | X_{1}), Π_{X_{0}, X_{1}}}^{*} (d_{max})] = 0, \end{matrix}

(A12)

where the final equality follows similarly to the proof of

(i)

. This proves

(i i)

. Finally, (iii) follows from Lemma A1 (iii) and the fact that a weighted sum of convex functions is convex provided the weights are non-negative, thus completing the proof. □

References

Neyman, J.; Pearson, E. On the Problem of the Most Efficient Tests of Statistical Hypotheses. Philos. Trans. R. Soc. Lond. 1933, 231, 289–337. [Google Scholar]
Chernoff, H. A measure of asymptotic efficiency for tests of a hypothesis based on a sum of observations. Ann. Math. Statist. 1952, 23, 493–507. [Google Scholar] [CrossRef]
Hoeffding, W. Asymptotically optimal tests for multinominal distributions. Ann. Math. Statist. 1965, 36, 369–400. [Google Scholar] [CrossRef]
Blahut, R.E. Hypothesis Testing and Information Theory. IEEE Trans. Inf. Theory 1974, 20, 405–417. [Google Scholar] [CrossRef]
Tuncel, E. On error-exponents in hypothesis testing. IEEE Trans. Inf. Theory 2005, 51, 2945–2950. [Google Scholar] [CrossRef]
Ahlswede, R.; Csiszár, I. Hypothesis Testing with Communication Constraints. IEEE Trans. Inf. Theory 1986, 32, 533–542. [Google Scholar] [CrossRef]
Han, T.S. Hypothesis Testing with Multiterminal Data Compression. IEEE Trans. Inf. Theory 1987, 33, 759–772. [Google Scholar] [CrossRef]
Shimokawa, H.; Han, T.S.; Amari, S. Error Bound of Hypothesis Testing with Data Compression. In Proceedings of the IEEE International Symposium on Information Theory, Trondheim, Norway, 27 June–1 July 1994; p. 114. [Google Scholar]
Rahman, M.S.; Wagner, A.B. On the Optimality of Binning for Distributed Hypothesis Testing. IEEE Trans. Inf. Theory 2012, 58, 6282–6303. [Google Scholar] [CrossRef]
Sreekumar, S.; Gündüz, D. Distributed Hypothesis Testing Over Discrete Memoryless Channels. IEEE Trans. Inf. Theory 2020, 66, 2044–2066. [Google Scholar] [CrossRef]
Salehkalaibar, S.; Wigger, M. Distributed Hypothesis Testing Based on Unequal-Error Protection Codes. IEEE Trans. Inf. Theory 2020, 66, 4150–4182. [Google Scholar] [CrossRef]
Berger, T. Decentralized estimation and decision theory. In Proceedings of the IEEE 7th Spring Workshop on IInformation Theory, Mt. Kisco, NY, USA, September 1979. [Google Scholar]
Shalaby, H.M.H.; Papamarcou, A. Multiterminal Detection with Zero-Rate Data Compression. IEEE Trans. Inf. Theory 1992, 38, 254–267. [Google Scholar] [CrossRef]
Han, T.S.; Kobayashi, K. Exponential-Type Error Probabilities for Multiterminal Hypothesis Testing. IEEE Trans. Inf. Theory 1989, 35, 2–14. [Google Scholar] [CrossRef]
Gündüz, D.; Kurka, D.B.; Jankowski, M.; Amiri, M.M.; Ozfatura, E.; Sreekumar, S. Communicate to Learn at the Edge. IEEE Commun. Mag. 2020, 58, 14–19. [Google Scholar] [CrossRef]
Gündüz, D.; Qin, Z.; Aguerri, I.E.; Dhillon, H.S.; Yang, Z.; Yener, A.; Wong, K.K.; Chae, C.B. Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications. IEEE J. Sel. Areas Commun. 2023, 41, 5–41. [Google Scholar] [CrossRef]
Zhao, W.; Lai, L. Distributed Testing Against Independence with Multiple Terminals. In Proceedings of the 52nd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 30 September–3 October 2014; pp. 1246–1251. [Google Scholar]
Wigger, M.; Timo, R. Testing Against Independence with Multiple Decision Centers. In Proceedings of the International Conference on Signal Processing and Communications, Bangalore, India, 12–15 June 2016; pp. 1–5. [Google Scholar]
Salehkalaibar, S.; Wigger, M.; Wang, L. Hypothesis Testing In Multi-Hop Networks. IEEE Trans. Inf. Theory 2019, 65, 4411–4433. [Google Scholar] [CrossRef]
Zaidi, A.; Aguerri, I.E. Optimal Rate-Exponent Region for a Class of Hypothesis Testing Against Conditional Independence Problems. In Proceedings of the 2019 IEEE Information Theory Workshop (ITW), Visby, Sweden, 25–28 August 2019; pp. 1–5. [Google Scholar]
Zaidi, A. Rate-Exponent Region for a Class of Distributed Hypothesis Testing Against Conditional Independence Problems. IEEE Trans. Inf. Theory 2023, 69, 703–718. [Google Scholar] [CrossRef]
Mhanna, M.; Piantanida, P. On Secure Distributed Hypothesis Testing. In Proceedings of the IEEE International Symposium on Information Theory, Hong Kong, China, 14–19 June 2015; pp. 1605–1609. [Google Scholar]
Sreekumar, S.; Gündüz, D. Testing Against Conditional Independence Under Security Constraints. In Proceedings of the IEEE Int. Symp. Inf. Theory (ISIT), Vail, CO, USA, 17–22 June 2018; pp. 181–185. [Google Scholar]
Sreekumar, S.; Cohen, A.; Gündüz, D. Privacy-Aware Distributed Hypothesis Testing. Entropy 2020, 22, 665. [Google Scholar] [CrossRef] [PubMed]
Gilani, A.; Amor, S.B.; Salehkalaibar, S.; Tan, V. Distributed Hypothesis Testing with Privacy Constraints. Entropy 2019, 21, 478. [Google Scholar] [CrossRef] [PubMed]
Katz, G.; Piantanida, P.; Debbah, M. Distributed Binary Detection with Lossy Data Compression. IEEE Trans. Inf. Theory 2017, 63, 5207–5227. [Google Scholar] [CrossRef]
Xiang, Y.; Kim, Y.H. Interactive hypothesis testing with communication constraints. In Proceedings of the 50th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 1–5 October 2012; pp. 1065–1072. [Google Scholar]
Xiang, Y.; Kim, Y.H. Interactive hypothesis testing against Independence. In Proceedings of the IEEE International Symposium on Information Theory, Istanbul, Turkey, 7–12 July 2013; pp. 2840–2844. [Google Scholar]
Tian, C.; Chen, J. Successive Refinement for Hypothesis Testing and Lossless One-Helper Problem. IEEE Trans. Inf. Theory 2008, 54, 4666–4681. [Google Scholar] [CrossRef]
Haim, E.; Kochman, Y. On Binary Distributed Hypothesis Testing. arXiv 2017, arXiv:1801.00310. [Google Scholar]
Weinberger, N.; Kochman, Y. On the Reliability Function of Distributed Hypothesis Testing Under Optimal Detection. IEEE Trans. Inf. Theory 2019, 65, 4940–4965. [Google Scholar] [CrossRef] [Green Version]
Hadar, U.; Liu, J.; Polyanskiy, Y.; Shayevitz, O. Error Exponents in Distributed Hypothesis Testing of Correlations. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 2674–2678. [Google Scholar]
Watanabe, S. Neyman-Pearson Test for Zero-Rate Multiterminal Hypothesis Testing. IEEE Trans. Inf. Theory 2018, 64, 4923–4939. [Google Scholar] [CrossRef]
Xu, X.; Huang, S.L. On Distributed Learning With Constant Communication Bits. IEEE J. Sel. Areas Inf. Theory 2022, 3, 125–134. [Google Scholar] [CrossRef]
Salehkalaibar, S.; Tan, V.Y.F. Distributed Sequential Hypothesis Testing With Zero-Rate Compression. In Proceedings of the 2021 IEEE Information Theory Workshop (ITW), Kanazawa, Japan, 17–21 October 2021; pp. 1–5. [Google Scholar]
Sreekumar, S.; Gündüz, D. Strong Converse for Testing Against Independence over a Noisy channel. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020; pp. 1283–1288. [Google Scholar]
Salehkalaibar, S.; Wigger, M. Distributed Hypothesis Testing with Variable-Length Coding. IEEE J. Sel. Areas Inf. Theory 2020, 1, 681–694. [Google Scholar] [CrossRef]
Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
Borade, S.; Nakiboğlu, B.; Zheng, L. Unequal Error Protection: An Information-Theoretic Perspective. IEEE Trans. Inf. Theory 2009, 55, 5511–5539. [Google Scholar] [CrossRef]
Minero, P.; Lim, S.H.; Kim, Y.H. A Unified Approach to Hybrid Coding. IEEE Trans. Inf. Theory 2015, 61, 1509–1523. [Google Scholar] [CrossRef]
Weinberger, N.; Kochman, Y.; Wigger, M. Exponent Trade-off for Hypothesis Testing Over Noisy Channels. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 1852–1856. [Google Scholar]
Csiszár, I. On the error-exponent of source-channel transmission with a distortion threshold. IEEE Trans. Inf. Theory 1982, 28, 823–828. [Google Scholar] [CrossRef]
Polyanskiy, Y.; Wu, Y. Information Theory: From Coding to Learning; Cambridge University Press: Cambridge, UK, 2012; Available online: https://people.lids.mit.edu/yp/homepage/data/itbook-export.pdf (accessed on 10 December 2022).
Gallager, R. A simple derivation of the coding theorem and some applications. IEEE Trans. Inf. Theory 1965, 11, 3–18. [Google Scholar] [CrossRef]
Merhav, N.; Shamai, S. On joint source-channel coding for the Wyner-Ziv source and the Gelfand-Pinsker channel. IEEE Trans. Inf. Theory 2003, 49, 2844–2855. [Google Scholar] [CrossRef]
Cover, T.; Gamal, A.E.; Salehi, M. Multiple Access Channels with Arbitrarily Correlated Sources. IEEE Trans. Inf. Theory 1980, 26, 648–657. [Google Scholar] [CrossRef] [Green Version]
Csiszár, I. Joint Source-Channel Error Exponent. Prob. Control Inf. Theory 1980, 9, 315–328. [Google Scholar]
Eggleston, H.G. Convexity, 6th ed.; Cambridge University Press: Cambridge, UK, 1958. [Google Scholar]

Figure 1. DHT over a noisy channel. The observer observes an n-length independent and identically distributed sequence

U

, and transmits

X

over the DMC

P_{Y | X}^{\otimes n}

. Based on the channel output

Y

and the n-length independent and identically distributed sequence

V

, the decision maker performs a binary HT to determine whether

(U, V) \sim P_{U V}^{\otimes n}

or

(U, V) \sim Q_{U V}^{\otimes n}

.

Figure 1. DHT over a noisy channel. The observer observes an n-length independent and identically distributed sequence

U

, and transmits

X

over the DMC

P_{Y | X}^{\otimes n}

. Based on the channel output

Y

and the n-length independent and identically distributed sequence

V

, the decision maker performs a binary HT to determine whether

(U, V) \sim P_{U V}^{\otimes n}

or

(U, V) \sim Q_{U V}^{\otimes n}

.

Figure 2. Comparison of the error-exponents trade-off achieved by the SHTCC and JHTCC schemes for TAD over a BSC in Example 1 with parameters

p = 0.25, q = 0

for (a) and

p = 0.35, q = 0

for (b). The red curve shows

(κ_{α}, κ_{u}^{★} (κ_{α}))

pairs achieved by uncoded transmission while the blue line plots

(κ_{α}, E_{ex} (0))

. The joint scheme clearly achieves a better error-exponent trade-off for values of

κ_{α}

below a threshold which depends on the transition kernel of the channel. In particular, a more uniform channel results in a lesser threshold.

Figure 2. Comparison of the error-exponents trade-off achieved by the SHTCC and JHTCC schemes for TAD over a BSC in Example 1 with parameters

p = 0.25, q = 0

for (a) and

p = 0.35, q = 0

for (b). The red curve shows

(κ_{α}, κ_{u}^{★} (κ_{α}))

pairs achieved by uncoded transmission while the blue line plots

(κ_{α}, E_{ex} (0))

. The joint scheme clearly achieves a better error-exponent trade-off for values of

κ_{α}

below a threshold which depends on the transition kernel of the channel. In particular, a more uniform channel results in a lesser threshold.

Figure 3. Comparison of the error-exponents trade-off achieved by the SHTCC and JHTCC schemes for Example 1 with parameters

p = 0.25, q = 0.05

for (a) and

p = 0.35, q = 0.05

for (b). The JHTCC scheme improves over the separation based scheme for small values of

κ_{α}

; however, the region of improvement is reduced compared to Figure 2 as the source is more uniformly distributed.

Figure 3. Comparison of the error-exponents trade-off achieved by the SHTCC and JHTCC schemes for Example 1 with parameters

p = 0.25, q = 0.05

for (a) and

p = 0.35, q = 0.05

for (b). The JHTCC scheme improves over the separation based scheme for small values of

κ_{α}

; however, the region of improvement is reduced compared to Figure 2 as the source is more uniformly distributed.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sreekumar, S.; Gündüz, D. Distributed Hypothesis Testing over a Noisy Channel: Error-Exponents Trade-Off. Entropy 2023, 25, 304. https://doi.org/10.3390/e25020304

AMA Style

Sreekumar S, Gündüz D. Distributed Hypothesis Testing over a Noisy Channel: Error-Exponents Trade-Off. Entropy. 2023; 25(2):304. https://doi.org/10.3390/e25020304

Chicago/Turabian Style

Sreekumar, Sreejith, and Deniz Gündüz. 2023. "Distributed Hypothesis Testing over a Noisy Channel: Error-Exponents Trade-Off" Entropy 25, no. 2: 304. https://doi.org/10.3390/e25020304

APA Style

Sreekumar, S., & Gündüz, D. (2023). Distributed Hypothesis Testing over a Noisy Channel: Error-Exponents Trade-Off. Entropy, 25(2), 304. https://doi.org/10.3390/e25020304

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Distributed Hypothesis Testing over a Noisy Channel: Error-Exponents Trade-Off

Abstract

1. Introduction

1.1. Background

1.2. Contributions

2. Preliminaries

2.1. Notation

2.2. Problem Formulation

3. Main Results

3.1. Inner Bound on $R$ via SHTCC Scheme

3.2. Inner Bound via JHTCC Scheme

3.3. Comparison of Inner Bounds

4. Proofs

4.1. Proof of Theorem 1

4.2. Proof of Corollary 1

4.3. Proof of Corollary 2

4.4. Proof of Theorem 2

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

Appendix A. Proof that Theorem 1 Recovers Theorem 2 in [10]

Appendix B. Proof that Theorem 2 Recovers Theorem 5 in [10]

Appendix C. An Auxiliary Result

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

Distributed Hypothesis Testing over a Noisy Channel: Error-Exponents Trade-Off

Abstract

1. Introduction

1.1. Background

1.2. Contributions

2. Preliminaries

2.1. Notation

2.2. Problem Formulation

3. Main Results

3.1. Inner Bound on R via SHTCC Scheme

3.2. Inner Bound via JHTCC Scheme

3.3. Comparison of Inner Bounds

4. Proofs

4.1. Proof of Theorem 1

4.2. Proof of Corollary 1

4.3. Proof of Corollary 2

4.4. Proof of Theorem 2

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

Appendix A. Proof that Theorem 1 Recovers Theorem 2 in [10]

Appendix B. Proof that Theorem 2 Recovers Theorem 5 in [10]

Appendix C. An Auxiliary Result

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3.1. Inner Bound on $R$ via SHTCC Scheme