Hypothesis Testing over Noisy Broadcast Channels

Abstract: This paper studies binary hypothesis testing with a single sensor that communicates with two decision centers over a memoryless broadcast channel. The main focus lies on the tradeoff between the two type-II error exponents achievable at the two decision centers. Our proposed scheme can partially mitigate this tradeoff when the transmitter can distinguish, with probability larger than 1/2, the alternative hypotheses at the decision centers, i.e., the hypotheses under which the decision centers wish to maximize their error exponents. When these hypotheses cannot be distinguished at the transmitter (because both decision centers have the same alternative hypothesis, or because the transmitter's observations have the same marginal distribution under both hypotheses), our scheme shows a significant tradeoff between the two exponents. The results in this paper thus reinforce the conclusions previously drawn for a setup where communication takes place over a common noiseless link. Compared to such a noiseless scenario, however, we observe that even when the transmitter can distinguish the two hypotheses, a small exponent tradeoff can persist, simply because the channel noise prevents the transmitter from perfectly describing its guess of the hypothesis to the two decision centers.


Introduction
In Internet of Things (IoT) networks, data are collected at sensors and transmitted over a wireless channel to remote decision centers, which decide on one or multiple hypotheses based on the collected information. In this paper, we study simple binary hypothesis testing with a single sensor but two decision centers. The results can be combined with previous studies focusing on multiple sensors and a single decision center to tackle the practically relevant case of multiple sensors and multiple decision centers. We consider a single sensor for simplicity and because our main focus is on studying the tradeoff between the performances at the two decision centers that can arise because the single sensor has to send information over the channel that can be used by both decision centers. A simple, but highly suboptimal, approach would be to time-share communication and serve each of the two decision centers only during a part of the transmission. As we will see, better schemes are possible, and, in some cases, it is even possible to serve each of the two decision centers as if the other center was not present in the system.
In this paper, we follow the information-theoretic framework introduced in [1,2]. That means each terminal observes a memoryless sequence, and depending on the underlying hypothesis H ∈ {0, 1}, all sequences follow one of two possible joint distributions, which are known to all involved terminals. A priori, however, the transmitter ignores the correct hypothesis and has to compute its transmit signal as a function of the observed source symbols only. The decision centers observe the outputs of the channel and, combined with their local observations, have to decide whether H = 0 or H = 1. Our main results are achievable pairs of type-II error exponents for the two decision centers. We propose two different schemes, depending on whether the sensor can distinguish, with error probability smaller than 1/2, the two alternative hypotheses at the two decision centers. If a distinction is possible (because the decision centers have different alternative hypotheses and the sensor's observations follow different marginal distributions under the two hypotheses), then we employ a scheme similar to the one proposed in [12,13] for a common noiseless link, but where the SHA scheme is replaced by the UEP-based scheme for DMCs in [15]. That means the sensor makes a tentative guess about the hypothesis and conveys this guess to both decision centers using a UEP mechanism. Moreover, the joint source-channel coding scheme in [15] with dedicated codebooks is used to communicate to the decision center that aims to maximize the error exponent under the hypothesis that does not correspond to the sensor's tentative guess. This scheme shows no tradeoff between the exponents achieved at the two decision centers in various interesting cases. Sometimes, however, a tradeoff arises because even under UEP the specially protected messages can be in error, and because the decision centers can confuse the codewords of the two different sets of codebooks.
For the case where the sensor cannot reasonably distinguish the alternative hypotheses at the two decision centers (because both decision centers have the same alternative hypothesis or because the sensor's observations have the same marginal distribution under both hypotheses), we present a scheme similar to [10] but again including UEP. In this scheme, a tradeoff between the exponents achieved at the two decision centers naturally arises; it mostly stems from the inherent tradeoff in distributed lossy compression systems with multiple decoders having different side information.

Notation
We mostly follow the notation in [18]. Random variables are denoted by capital letters, e.g., X, Y, and their realizations by lower-case letters, e.g., x, y. Script symbols such as X and Y stand for alphabets of random variables, and X^n and Y^n for the corresponding n-fold Cartesian products. Sequences of random variables (X_i, ..., X_j) and realizations (x_i, ..., x_j) are abbreviated by X_i^j and x_i^j. When i = 1, we also use X^j and x^j instead of X_1^j and x_1^j. We write the probability mass function (pmf) of a discrete random variable X as P_X; to indicate the pmf under hypothesis H = 1, we also use Q_X. The conditional pmf of X given Y is written as P_{X|Y}, or as Q_{X|Y} when H = 1. The term D(P‖Q) stands for the Kullback-Leibler (KL) divergence between two pmfs P and Q over the same alphabet. We use tp(a^n, b^n) to denote the joint type of the pair of sequences (a^n, b^n), and cond_tp(a^n|b^n) for the conditional type of a^n given b^n. For a joint type π_{ABC} over the alphabet A × B × C, we denote by I_{π_{ABC}}(A; B|C) the conditional mutual information assuming that the random triple (A, B, C) has pmf π_{ABC}; similarly for the entropy H_{π_{ABC}}(A) and the conditional entropy H_{π_{ABC}}(A|B). Sometimes we abbreviate π_{ABC} by π. In addition, when π_{ABC} has been defined and is clear from the context, we write π_A or π_{AB} for the corresponding subtypes. When the type π_{ABC} coincides with the actual pmf of a triple (A, B, C), we omit the subscript and simply write H(A), H(A|B), and I(A; B|C).
For a given P_X and a constant µ > 0, let T_µ^n(P_X) be the set of µ-typical sequences in X^n as defined in [8] (Section 2.4). Similarly, T_µ^n(P_{XY}) stands for the set of jointly µ-typical sequences. The expectation operator is written as E[·]. We abbreviate "independent and identically distributed" by i.i.d. The log function is taken with base 2. Finally, in our justifications, we use (DP) and (CR) for "data processing inequality" and "chain rule".
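The type and typicality notions above are all empirical-frequency computations. The following sketch (illustrative helper names; the robust-typicality condition |tp(x^n)(x) − P_X(x)| ≤ µ·P_X(x) is our reading of the definition in [8]) shows how the joint type, the conditional empirical entropy, the KL divergence, and a µ-typicality check can be computed:

```python
from collections import Counter
import math

def joint_type(a, b):
    """Joint type tp(a^n, b^n) of two equal-length sequences."""
    n = len(a)
    return {pair: c / n for pair, c in Counter(zip(a, b)).items()}

def cond_empirical_entropy(a, b):
    """Conditional empirical entropy H_{tp(a^n,b^n)}(A|B) in bits."""
    pi = joint_type(a, b)
    pi_b = Counter()
    for (x, y), p in pi.items():
        pi_b[y] += p
    return -sum(p * math.log2(p / pi_b[y]) for (x, y), p in pi.items())

def kl_divergence(p, q):
    """D(P||Q) in bits; assumes supp(P) is contained in supp(Q)."""
    return sum(p[x] * math.log2(p[x] / q[x]) for x in p if p[x] > 0)

def is_typical(seq, pmf, mu):
    """Robust mu-typicality: |tp(x^n)(x) - P(x)| <= mu * P(x) for all x."""
    n = len(seq)
    counts = Counter(seq)
    return all(abs(counts.get(x, 0) / n - pmf[x]) <= mu * pmf[x] for x in pmf)
```

For instance, two length-4 binary sequences whose joint type is uniform over the four symbol pairs give a conditional empirical entropy of exactly 1 bit.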

System Model
Consider the distributed hypothesis testing problem in Figure 1, where a transmitter observes the sequence X^n, Receiver 1 the sequence Y_1^n, and Receiver 2 the sequence Y_2^n. Under the null hypothesis

H = 0: (X^n, Y_1^n, Y_2^n) i.i.d. ~ P_{XY_1Y_2},

and under the alternative hypothesis

H = 1: (X^n, Y_1^n, Y_2^n) i.i.d. ~ Q_{XY_1Y_2},

for two given pmfs P_{XY_1Y_2} and Q_{XY_1Y_2}. The transmitter can communicate with the receivers over n uses of a discrete memoryless broadcast channel (W, V_1 × V_2, Γ_{V_1V_2|W}), where W denotes the finite channel input alphabet and V_1 and V_2 the finite channel output alphabets. Specifically, the transmitter feeds the inputs

W^n = f^(n)(X^n)

to the channel, where f^(n) denotes the chosen (possibly stochastic) encoding function. Each Receiver i ∈ {1, 2} observes the BC outputs V_i^n, where for a given input W_t = w_t the output pair (V_{1,t}, V_{2,t}) is distributed according to Γ_{V_1V_2|W}(·, ·|w_t). Based on the sequence of channel outputs V_i^n and the source sequence Y_i^n, Receiver i decides on the hypothesis H. That means it produces the guess

Ĥ_i = g_i^(n)(V_i^n, Y_i^n)

for a chosen decoding function g_i^(n).

Figure 1. Hypothesis testing over a noisy BC.
There are different possible scenarios regarding the requirements on error probabilities. We assume that each receiver is interested in only one of the two exponents. For each i ∈ {1, 2}, let h_i ∈ {0, 1} be the hypothesis whose error exponent Receiver i wishes to maximize, and h̄_i the other hypothesis, i.e., h̄_i ∈ {0, 1} and h_i ≠ h̄_i. (The values of h_1 and h_2 are fixed and part of the problem statement.) We then have:

Definition 1. An exponent pair (θ_1, θ_2) is said to be achievable over a BC if, for each ε_1, ε_2 ∈ (0, 1) and all sufficiently large blocklengths n, there exist encoding and decoding functions (f^(n), g_1^(n), g_2^(n)) such that, for i ∈ {1, 2}, the type-I error probabilities satisfy α_{i,n} ≤ ε_i and the type-II error probabilities satisfy

lim inf_{n→∞} −(1/n) log β_{i,n} ≥ θ_i.

Definition 2. The fundamental exponents region E is the set of all exponent pairs (θ_1, θ_2) that are achievable.
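Definition 1 ties achievability to the exponential decay of the type-II error probability in the blocklength n. The following toy computation (a single terminal, a binary i.i.d. source, no channel; the distributions and the slack value are illustrative assumptions, not from this paper) shows this decay for the acceptance region consisting of the P-typical sequences: by Sanov's theorem, the empirical exponent is governed by the KL divergence between the closest distribution in the acceptance region and Q, which approaches D(P‖Q) as the slack shrinks:

```python
from math import comb, log2

# Binary source: under P the symbol 0 has probability P0, under Q it has Q0.
P0, Q0 = 0.8, 0.5

def type2_error(n, slack=0.1):
    """Q-probability of the P-typical acceptance region
    {x^n : |fraction of zeros - P0| <= slack}, computed exactly."""
    total = 0.0
    for k in range(n + 1):          # k = number of zeros in x^n
        if abs(k / n - P0) <= slack:
            total += comb(n, k) * (Q0 ** k) * ((1 - Q0) ** (n - k))
    return total

D = P0 * log2(P0 / Q0) + (1 - P0) * log2((1 - P0) / (1 - Q0))   # D(P||Q)
# Sanov prediction: KL at the boundary point q = P0 - slack = 0.7 of the region.
sanov = 0.7 * log2(0.7 / 0.5) + 0.3 * log2(0.3 / 0.5)

for n in (20, 100, 500):
    beta = type2_error(n)
    print(n, -log2(beta) / n)       # empirical exponent, tends to `sanov`
```

As the slack shrinks to 0 (with n growing accordingly), the predicted exponent tends to D(P‖Q) ≈ 0.278 bits.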

Remark 1.
Notice that both α_{1,n} and β_{1,n} depend on the BC law Γ_{V_1V_2|W} only through the conditional marginal distribution Γ_{V_1|W}. Similarly, α_{2,n} and β_{2,n} depend only on Γ_{V_2|W}. As a consequence, the fundamental exponents region E also depends on the joint laws P_{XY_1Y_2} and Q_{XY_1Y_2} only through their marginals P_{XY_1}, P_{XY_2}, Q_{XY_1}, and Q_{XY_2}.

Remark 2.
As a consequence of the preceding Remark 1, when P_X = Q_X, one can restrict attention to a scenario where both receivers aim at maximizing the error exponent under hypothesis H = 1, i.e., h_1 = h_2 = 1. In fact, under P_X = Q_X, the fundamental exponents region E for arbitrary h_1 and h_2 coincides with the fundamental exponents region for h_1 = h_2 = 1 if one exchanges the pmfs P_{XY_1} and Q_{XY_1} in case h_1 = 0 and the pmfs P_{XY_2} and Q_{XY_2} in case h_2 = 0.
To simplify the notation in the sequel, we use the following shorthand notations for the pmfs P_{XY_1Y_2} and Q_{XY_1Y_2}. For each i ∈ {1, 2}, we write p^i_{XY_i} for the joint pmf of (X, Y_i) under hypothesis H = h̄_i, and q^i_{XY_i} for the joint pmf of (X, Y_i) under H = h_i. We propose two coding schemes yielding two different exponent regions, depending on whether

∀x ∈ X: p^1_X(x) = p^2_X(x),   (13)

or

∃x ∈ X: p^1_X(x) ≠ p^2_X(x).   (14)

Notice that (13) always holds when h_1 = h_2. In contrast, given (14), obviously h_1 ≠ h_2.

Results on Exponents Region
Before presenting our main results, we recall the achievable error exponent over a discrete memoryless channel reported in [15] (Theorem 1).

Achievable Exponent for Point-to-Point Channels
Consider a single-receiver setup with only Receiver 1 that wishes to maximize the error exponent under hypothesis h 1 = 1. For simplicity then, we drop the user index 1 and simply call the receiver's source observation Y n and its channel outputs V n .
Theorem 1 (Theorem 1 in [15]). Any exponent θ satisfying

θ ≤ max min{θ_standard, θ_dec, θ_miss}   (15)

is achievable, where the maximization is over pmfs P_{S|X}, P_T, and P_{W|T} such that the joint law P_{STWVXY} := P_{XY} P_{S|X} P_T P_{W|T} Γ_{V|W} satisfies I(S; X) ≤ I(W; V|T), and where the exponents θ_standard, θ_dec, and θ_miss in (15) are defined as in [15] (Theorem 1). Here, all mutual information terms are calculated with respect to the joint pmf P_{STWVXY} defined above.
The exponent in Theorem 1 is obtained by the following scheme, which is also depicted in Figure 2. The transmitter attempts to quantize the source sequence X^n using a random codebook consisting of codewords {S^n(m, ℓ)}. If the quantization fails because no codeword is jointly typical with the source sequence, then the transmitter applies the UEP mechanism in [17] by sending an i.i.d. P_T-sequence T^n over the channel. Otherwise, it sends the codeword W^n(m), for m indicating the first index such that a codeword S^n(m, ℓ) is jointly typical with its source observation X^n. The receiver jointly decodes the channel and source codewords by verifying the existence of indices (m′, ℓ′) such that W^n(m′) is jointly typical with its channel outputs V^n and there is no other codeword S^n(m′, ℓ̃) with smaller conditional empirical entropy given Y^n than S^n(m′, ℓ′). If the decoded codeword S^n(m′, ℓ′) is jointly typical with the receiver's observation Y^n, then it produces Ĥ = 0, and otherwise Ĥ = 1.
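The transmitter side of this scheme reduces to a simple search: quantize if possible, otherwise fall back to the UEP sequence. A minimal sketch (the codebook layout and the caller-supplied typicality test are illustrative assumptions, not the paper's exact construction):

```python
def transmit(x, S_codebook, W_codebook, T_seq, jointly_typical):
    """Sketch of the transmitter in the scheme of [15]: quantize x^n with the
    first jointly typical source codeword S^n(m, l); if one exists, send the
    channel codeword W^n(m); otherwise fall back to the UEP sequence T^n."""
    for m, bin_m in enumerate(S_codebook):   # S_codebook[m][l] plays S^n(m, l)
        for s in bin_m:
            if jointly_typical(s, x):
                return W_codebook[m]         # W^n(m) for the first typical bin
    return T_seq                             # quantization failed -> send T^n
```

For example, with a toy codebook where "typicality" is plain equality, a source sequence matching a codeword in bin m returns W_codebook[m], while an unmatched sequence returns the fallback sequence.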
The three competing type-II error exponents in Theorem 1 can be understood in view of this coding scheme as follows. Exponent θ_standard corresponds to the event that a random codeword S^n(m, ℓ) is jointly typical with the transmitter's observation X^n and with the receiver's observation Y^n although H = 1. This is also the error exponent of Han's scheme [2] over a noiseless communication link and does not depend on the channel law Γ_{V|W}. Exponent θ_dec is related to the joint decoding that checks the joint typicality of the source codeword as well as of the channel codeword and applies a conditional minimum-entropy decoder. A similar error exponent is observed in the SHA scheme [3,4] over a noiseless link if the mutual information I(W; V|T) is replaced by the rate of the link. The third exponent θ_miss finally corresponds to an event where the transmitter sends T^n (so as to instruct the receiver to decide Ĥ = 1) but the receiver detects a channel codeword W^n(m′) and a corresponding source codeword S^n(m′, ℓ′). This exponent is directly related to the channel transition law Γ_{V|W}, and not only to the mutual information of the channel, and does not occur when transmission is over a noiseless link. Interestingly, it is redundant in view of exponent θ_dec whenever Q_{XY} = P_X Q_Y, because in this case the minimization in (18) evaluates to D(P_Y‖Q_Y). In this special case, the exponent can also be shown to be optimal; see [15].
We now present our achievable exponents region, where we distinguish the two cases: (1) h_1 ≠ h_2 and P_X ≠ Q_X; and (2) h_1 = h_2 or P_X = Q_X.

Achievable Exponents Region
When h_1 ≠ h_2 and P_X ≠ Q_X

Theorem 2. If h_1 ≠ h_2 and P_X ≠ Q_X, i.e., (14) holds, then all error exponent pairs (θ_1, θ_2) satisfying

θ_i ≤ min{θ_standard,i, θ_dec,i, θ_miss,i, θ_cross,i},  i ∈ {1, 2},   (20)

are achievable, where the union is over pmfs p^i_{S|X}, p_T, p^i_{T_i|T}, and p^i_{W|T,T_i}, for i ∈ {1, 2}, so that the joint pmfs p^1, p^2 defined through (12) satisfy the constraints (22). The exponents θ_standard,i, θ_dec,i, and θ_miss,i in (20) are the per-receiver counterparts of the exponents in Theorem 1, and θ_cross,i is an additional cross-codebook exponent, where we set q^1 = p^2 and q^2 = p^1.

Proof. See Appendix A.
In Theorem 2, the exponent triple (θ_standard,1, θ_dec,1, θ_miss,1) can be optimized over the pmfs p^1_{S|X}, p^1_{T_1|T}, and p^1_{W|T,T_1}, and independently thereof the exponent triple (θ_standard,2, θ_dec,2, θ_miss,2) can be optimized over the pmfs p^2_{S|X}, p^2_{T_2|T}, and p^2_{W|T,T_2}. The pmf p_T is common to both optimizations. However, whenever the exponents θ_cross,1 and θ_cross,2 are not active, Theorem 2 depends only on p^i_{S|X}, p^i_{T_i}, and p^i_{W|T_i}, for i = 1, 2, and there is thus no tradeoff between the two exponents θ_1 and θ_2. In other words, the same exponents θ_1 and θ_2 can be attained as in a system where the transmitter communicates over two individual DMCs Γ_{V_1|W} and Γ_{V_2|W} to the two receivers; equivalently, each receiver achieves the same exponent as if the other receiver were not present in the system.
The scheme achieving the exponents region in Theorem 2 is described in detail in Section 4 and analyzed in Appendix A. The main feature is that the sensor makes a tentative decision on H and conveys this decision to both receivers through its choice of the codebooks and a special coded time-sharing sequence indicating this choice. The receiver that wishes to maximize the error exponent corresponding to the hypothesis guessed at the sensor directly decides on this hypothesis. The other receiver compares its own observation to a quantized version of the source sequence observed at the sensor. The sensor uses the quantization and binning scheme presented in [15], tailored to this latter receiver, using either coded time-sharing sequence T_1^n and codebooks {S^n(1; m, ℓ)} and {W^n(1; m)}, or coded time-sharing sequence T_2^n and codebooks {S^n(2; m, ℓ)} and {W^n(2; m)}, respectively. The overall scheme is illustrated in Figure 3. Exponents θ_standard,i, θ_dec,i, and θ_miss,i have similar explanations as in the single-user case. Exponent θ_cross,i corresponds to the event that the transmitter sends a codeword from {W^n(j; m)}, for j = 3 − i, but Receiver i decides that a codeword from {W^n(i; m)} was sent and a source codeword S^n(i; m, ℓ) satisfies the minimum conditional entropy condition and the typicality check with the observed source sequence Y_i^n. Notice that setting T_i to a constant decreases the error exponent θ_cross,i.
For the special case where the BC consists of a common noiseless link, Theorem 2 has been proved in [12,13]. (More precisely, [12] considers the more general case with K ≥ 2 receivers and M ≥ K hypotheses.) In this case, the exponents (θ miss,1 , θ cross,1 ) and (θ miss,2 , θ cross,2 ) are not active and there is no tradeoff between θ 1 and θ 2 .

Achievable Exponents Region When h_1 = h_2 or P_X = Q_X

Define, for any pmf P_T, conditional pmf P_{SU_1U_2|XT}, and function f as in (27), the joint pmfs p^1 and p^2 through (28) and (29), and, for each i ∈ {1, 2}, the four exponents θ_standard,i, θ^a_dec,i, θ^b_dec,i, and θ_miss,i through minimizations over pmfs P̃_{SU_iXY_iTV_i} as in (30).

Theorem 3. If h_1 = h_2 or P_X = Q_X, i.e., (13) holds, then the union of all nonnegative error exponent pairs (θ_1, θ_2) satisfying the following conditions is achievable:

θ_1 + θ_2 ≤ min{ θ_standard,1 + θ_standard,2,
                 θ_standard,1 + θ^a_dec,2,
                 θ_standard,1 + θ^b_dec,2,
                 θ_standard,2 + θ^a_dec,1,
                 θ_standard,2 + θ^b_dec,1,
                 θ_miss,1 + θ_miss,2 − I_{p^1}(U_1; U_2|S, T) },   (31b)

where the union is over pmfs P_T, P_{SU_1U_2|XT}, and functions f as in (27) so that the pmfs (28) and (29) satisfy, for i ∈ {1, 2}, constraints (32) involving I_{p^1}(S, U_1; X|T) + I_{p^1}(U_2; X|S, T) + I_{p^1}(U_1; U_2|S, T).

Proof. The coding and testing scheme achieving these exponents is described in Section 5. The analysis of the scheme is similar to the proof of [15] (Theorem 4) and is omitted for brevity. In particular, error exponent θ_standard,i corresponds to the event that Receiver i decodes the correct cloud and satellite codewords but wrongly decides on Ĥ_i = 0. In contrast, error exponents θ^a_dec,i and θ^b_dec,i correspond to the events that Receiver i wrongly decides on Ĥ_i = 0 after wrongly decoding both the cloud center and the satellite, or only the satellite. Error exponent θ_miss,i corresponds to the miss-detection event. Due to the implicit rate constraints in (46), the final constraints in (31) are obtained by eliminating the rates R_0, R_1, R_2 by means of Fourier-Motzkin elimination. Notice that in constraint (31c), the mutual information I_{p^1}(U_1; U_2|S, T) is multiplied by a factor 2, whereas in (31b), it appears without a factor.
The reason is that the error analysis includes union bounds over the codewords in a bin, and when wrongly decoding the satellite codewords (which is the case for exponents θ^a_dec,i and θ^b_dec,i) the union bound is over pairs of codewords, whereas under correct decoding it is over single codewords. In the former case, we have the factor 2^{2nR_i} in the error probability and, in the latter case, the factor 2^{nR_i}. The auxiliary rates R_1 and R_2 are then eliminated using the Fourier-Motzkin elimination algorithm.
For each i ∈ {1, 2}, exponents θ_standard,i, θ^a_dec,i, θ^b_dec,i, and θ_miss,i have the same form as the three exponents in [15] (Theorem 1) for the DMC. There is, however, a tradeoff between the two exponents θ_1 and θ_2 in the above theorem because they share the same choice of the auxiliary pmfs P_T and P_{SU_1U_2|XT} and of the function f. In [10], the above setup is studied in the special case of testing against conditional independence, and the mentioned tradeoff is illustrated through a Gaussian example.

Coding and Testing Scheme When h_1 ≠ h_2 and P_X ≠ Q_X
The mutual information terms in (33) are calculated according to the joint distribution in (34). For each i ∈ {1, 2}, if I_{p^i}(S; X) < I_{p^i}(W; V_i|T, T_i), choose the rates as in (35) and (36). If I_{p^i}(S; X) ≥ I_{p^i}(W; V_i|T, T_i), then choose the rates as in (37) and (38). Again, all mutual information terms in (35)-(38) are calculated with respect to the pmf in (34).
Code Construction: Generate a sequence T^n = (T_1, ..., T_n) by independently drawing each component T_k according to p_T. For each i ∈ {1, 2}, generate a sequence T_i^n = (T_{i,1}, ..., T_{i,n}) by independently drawing each T_{i,k} according to p^i_{T_i|T}(·|t) when T_k = t. In addition, construct a random codebook superposed on (T^n, T_i^n), where the k-th symbol W_k(i; m) of codeword W^n(i; m) is drawn independently of all other codeword symbols according to p^i_{W|TT_i}(·|t, t_i) when T_k = t and T_{i,k} = t_i. Finally, construct a random codebook by independently drawing the k-th component S_k(i; m, ℓ) of codeword S^n(i; m, ℓ) according to the marginal pmf p^i_S. Reveal all codebooks and the realizations t^n, t_1^n, t_2^n of the sequences T^n, T_1^n, T_2^n to all terminals.
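The random code construction boils down to i.i.d. draws and symbol-wise superposition on previously drawn sequences. A minimal sketch (binary alphabets and the specific pmfs below are illustrative assumptions, not the paper's):

```python
import random

def draw_iid(pmf, n, rng):
    """Draw an n-sequence i.i.d. according to pmf (a dict symbol -> prob)."""
    symbols, probs = zip(*pmf.items())
    return tuple(rng.choices(symbols, probs, k=n))

def draw_superposed(cond_pmf, base, rng):
    """Draw a codeword superposed on `base`: the k-th symbol is drawn from
    cond_pmf[base[k]], mirroring how T_i^n is drawn given T^n (and how
    W^n(i; m) is drawn given the pair (T^n, T_i^n))."""
    out = []
    for b in base:
        symbols, probs = zip(*cond_pmf[b].items())
        out.append(rng.choices(symbols, probs)[0])
    return tuple(out)

# Illustrative draw of t^n and a superposed t_i^n:
rng = random.Random(0)
p_T = {0: 0.5, 1: 0.5}
p_Ti_given_T = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
t_n = draw_iid(p_T, 8, rng)
ti_n = draw_superposed(p_Ti_given_T, t_n, rng)
```

Superposition on a pair such as (T^n, T_i^n) works the same way, with the dictionary keyed by symbol pairs.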
Transmitter: Given the source sequence X^n = x^n, the transmitter looks for a triple (i, m, ℓ) such that the codeword s^n(i; m, ℓ) satisfies (s^n(i; m, ℓ), x^n) ∈ T^n_{µ/2}(p^i_{SX}) and the corresponding codeword w^n(i; m) from codebook C^i_W satisfies (t^n, t_i^n, w^n(i; m)) ∈ T^n_{µ/2}(p^i_{TT_iW}).   (41)
(Notice that when µ is sufficiently small, Condition (41) can be satisfied for at most one value of i ∈ {1, 2}, because p^1_X ≠ p^2_X.) If successful, the transmitter picks uniformly at random one of the triples (i, m, ℓ) that satisfy (41), and it sends the sequence w^n(i; m) over the channel. If no triple satisfies Condition (41), then the transmitter sends the sequence t^n over the channel.
Receiver i ∈ {1, 2}: Receives v_i^n and checks whether there exist indices (m′, ℓ′) such that the following three conditions are satisfied:
1. (t^n, t_i^n, w^n(i; m′), v_i^n) ∈ T^n_µ(p^i_{TT_iWV_i});
2. (s^n(i; m′, ℓ′), y_i^n) ∈ T^n_µ(p^i_{SY_i});
3. H_{tp(s^n(i;m′,ℓ′), y_i^n)}(S|Y_i) = min_{ℓ̃} H_{tp(s^n(i;m′,ℓ̃), y_i^n)}(S|Y_i).
If successful, it declares Ĥ_i = h̄_i. Otherwise, it declares Ĥ_i = h_i. Analysis: See Appendix A.
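The source-codeword part of the receiver's rule combines a minimum-conditional-empirical-entropy selection with a typicality check. A minimal sketch (candidate lists and the typicality test are caller-supplied illustrative assumptions):

```python
from collections import Counter
from math import log2

def cond_emp_entropy(s, y):
    """H_{tp(s^n, y^n)}(S|Y): conditional empirical entropy in bits."""
    n = len(s)
    joint, marg = Counter(zip(s, y)), Counter(y)
    return -sum(c / n * log2(c / marg[b]) for (a, b), c in joint.items())

def min_entropy_decode(candidates, y, typical_with_y):
    """Sketch of the receiver's source decoding: among the candidate source
    codewords s^n(i; m', l), pick the one with minimum conditional empirical
    entropy given y^n, and accept it only if it also passes the joint
    typicality test with y^n (return None otherwise)."""
    best = min(candidates, key=lambda s: cond_emp_entropy(s, y))
    return best if typical_with_y(best, y) else None
```

With equality as a stand-in for joint typicality, a candidate identical to y^n has conditional empirical entropy 0 and is selected.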

Coding and Testing Scheme When h_1 = h_2 or P_X = Q_X
In this case, the scheme is based on hybrid source-channel coding. Choose a large positive integer n, auxiliary alphabets S, U 1 , and U 2 , and a function f as in (27).
Choose an auxiliary distribution P T over W, and a conditional distribution P SU 1 U 2 |XT over S × U 1 × U 2 so that for i ∈ {1, 2} inequalities (32) are satisfied with strict inequality.
Then, choose a positive µ and rates R_0, R_1, R_2 satisfying the rate constraints in (46). Generate a sequence T^n i.i.d. according to P_T, and construct random codebooks of cloud centers {S^n(m_0)} and of satellites {U_1^n(m_0, m_1)} and {U_2^n(m_0, m_2)} superposed on them.

Transmitter: Given that it observes the source sequence X^n = x^n, the transmitter looks for indices (m_0, m_1, m_2) such that (s^n(m_0), u_1^n(m_0, m_1), u_2^n(m_0, m_2), x^n, t^n) ∈ T^n_µ(P_{SU_1U_2XT}). If successful, it picks one of these index triples uniformly at random and sends the codeword w^n over the channel, where

w_k = f(s_k(m_0), u_{1,k}(m_0, m_1), u_{2,k}(m_0, m_2), x_k, t_k),  k ∈ {1, ..., n},

and where (s_k(m_0), u_{1,k}(m_0, m_1), u_{2,k}(m_0, m_2)) denote the k-th components of the codewords (s^n(m_0), u_1^n(m_0, m_1), u_2^n(m_0, m_2)). Otherwise, it sends the sequence of inputs t^n over the channel.
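The hybrid source-channel map applies the function f symbol by symbol. A minimal sketch (the argument list of f follows our reading of (27) and should be treated as an assumption):

```python
def hybrid_channel_inputs(f, s, u1, u2, x, t):
    """Symbol-by-symbol hybrid source-channel coding: the k-th channel input
    is computed from the k-th symbols of the selected cloud and satellite
    codewords, the source sequence, and the time-sharing sequence."""
    return tuple(f(sk, u1k, u2k, xk, tk)
                 for sk, u1k, u2k, xk, tk in zip(s, u1, u2, x, t))
```

For instance, over a binary channel-input alphabet, a toy f could output the mod-2 sum of its five arguments; any deterministic map into W fits the same template.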
Receiver i ∈ {1, 2}: Based on its channel outputs v_i^n and its source sequence y_i^n, Receiver i decodes the cloud center and its satellite codeword and checks their joint typicality with (y_i^n, t^n). If successful, Receiver i declares Ĥ_i = h̄_i. Otherwise, it declares Ĥ_i = h_i. Analysis: Similar to [15] (Appendix D) and omitted.

Summary and Conclusions
This paper proposed and analyzed general distributed hypothesis testing schemes both for the case where the sensor can distinguish the two alternative hypotheses and for the case where it cannot. Our general schemes recover all previously studied special cases. Moreover, our schemes illustrate a phenomenon similar to the one observed for setups with common noise-free communication links from the sensor to all decision centers: while a tradeoff arises when the transmitter cannot distinguish the alternative hypotheses at the two decision centers, this tradeoff can almost completely be mitigated when the transmitter can distinguish them. In contrast to the noise-free link scenario, under a noisy broadcast channel model a tradeoff can still arise even in the latter case, because the decision centers can misidentify the decision taken at the transmitter and thus misinterpret to whom the communication is dedicated.
Interesting directions for future research include information-theoretic converse results and extensions to multiple sensors or more than two decision centers.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of Theorem 2
The proof is based on the scheme of Section 4. Fix a blocklength n, a small positive µ, and the (conditional) pmfs p_T, p^1_{S|X}, and p^2_{S|X} so that (22) holds. Assume that I_{p^1}(S; X) ≥ I_{p^1}(W; V_1|T, T_1) and I_{p^2}(S; X) ≥ I_{p^2}(W; V_2|T, T_2), in which case the rates R_1, R_2, R'_1, R'_2 are given by (37) and (38). Additionally, set for convenience of notation the quantities in (A1) and (A2). The analysis of the type-I error probability is similar to that in [15] (Appendix A). The main novelty is that, because p^1_X(x) ≠ p^2_X(x) for some x ∈ X, for sufficiently small values of µ > 0 the source sequence cannot lie in both T^n_{µ/2}(p^1_X) and T^n_{µ/2}(p^2_X). Details are omitted. Consider the type-II error probability at Receiver 1, averaged over all random codebooks. Define the following events for i ∈ {1, 2}:

{(S^n(i; m, ℓ), X^n) ∈ T^n_{µ/2}(p^i_{SX}), (T^n, T_i^n, W^n(i; m)) ∈ T^n_{µ/2}(p^i_{TT_iW}), W^n(i; m) is sent}.   (A3)

The resulting type-II error probability can be upper bounded by the sum of the probabilities of the events B_1, ..., B_5. The probabilities of the events B_1, B_2, B_3, and B_5 can be bounded following steps similar to [15] (Appendix A). This yields, for some functions δ_1(µ), δ_2(µ), δ_3(µ), and δ_5(µ) that go to zero as n goes to infinity and µ → 0, bounds governed by the exponents θ_standard,i and θ_dec,i, defined through minimizations over pmfs P̃_{SXY_i}. Consider event B_4:

Pr[(S^n(2; m, ℓ), X^n) ∈ T^n_{µ/2}(p^2_{SX}), (T^n, W^n(2; m)) ∈ T^n_{µ/2}(p^2_{TW}), (S^n(1; m′, ℓ′), Y_1^n) ∈ T^n_µ(p^1_{SY_1}), H_{tp(S^n(1;m′,ℓ′), Y_1^n)}(S|Y_1) = min_{ℓ̃} H_{tp(S^n(1;m′,ℓ̃), Y_1^n)}(S|Y_1) | H = h_1]
· Pr[(T^n, T_1^n, W^n(1; m′), V_1^n) ∈ T^n_µ(p^1_{TT_1WV_1}) | (T^n, W^n(2; m)) ∈ T^n_{µ/2}(p^2_{TW})],

where (a) holds because the channel code is drawn independently of the source code, and (b) holds by Sanov's theorem.