Exponential Strong Converse for One Helper Source Coding Problem

We consider the one helper source coding problem posed and investigated by Ahlswede, Körner and Wyner. Two correlated sources are separately encoded and sent to a destination, where the decoder wishes to reproduce one of the two sources with an arbitrarily small error probability of decoding. In this system, when the pair of encoding rates is outside the achievable rate region, the error probability of decoding goes to one as the source block length n goes to infinity. This implies a strong converse theorem for the one helper source coding problem. In this paper, we provide a much stronger version of this strong converse theorem. We prove that the error probability of decoding tends to one exponentially and derive an explicit lower bound on this exponent function.


Introduction
For single- or multi-terminal source coding systems, the converse coding theorems state that, at any data compression rates below the fundamental theoretical limit of the system, the error probability of decoding cannot go to zero as the block length n of the codes tends to infinity.
In this paper, we study the one helper source coding problem posed and investigated by Ahlswede, Körner [1] and Wyner [2]. We call this source coding system the AKW system. The AKW system is shown in Figure 1, where it corresponds to the case where the switch is closed. In Figure 1, the sequence (X^n, Y^n) represents n independent copies of a pair of dependent random variables (X, Y) which take values in the finite sets X, Y, respectively. We assume that (X, Y) has a probability distribution denoted by p_XY. For each i = 1, 2, the encoder ϕ_i^{(n)} outputs a binary sequence which appears at a rate R_i bits per input symbol. The decoder function ψ^{(n)} observes ϕ_1^{(n)}(X^n) and ϕ_2^{(n)}(Y^n) to output a sequence Ŷ^n := ψ^{(n)}(ϕ_1^{(n)}(X^n), ϕ_2^{(n)}(Y^n)), which is an estimate of Y^n. When the switch is open, it is well known that the minimum transmission rate R_2 such that the error probability P_e^{(n)} := Pr{Ŷ^n ≠ Y^n} of decoding tends to zero as n tends to infinity is given by H(Y). Csiszár and Longo [3] proved that, if R_2 < H(Y), then the correct probability P_c^{(n)} := Pr{Ŷ^n = Y^n} of decoding decays exponentially, and derived the optimal exponent function. When the switch is closed and R_1 > H(X), Slepian and Wolf [4] proved that H(Y|X) is the minimum transmission rate R_2 such that the error probability of decoding tends to zero as n tends to infinity. Oohama and Han [5] proved that, if R_2 < H(Y|X), then the correct probability of decoding decays exponentially, and derived the optimal exponent function.
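As a concrete illustration of the two limits just mentioned (a numerical sketch of our own, not part of the paper's analysis; the joint distribution below is an arbitrary example), H(Y) and H(Y|X) can be computed directly from a joint pmf p_XY:

```python
import math

# Compute H(Y) and H(Y|X) in bits for a small joint pmf p_XY.
# The pmf below is an arbitrary example chosen for illustration.
p_XY = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.4,
}

p_Y, p_X = {}, {}
for (x, y), p in p_XY.items():
    p_Y[y] = p_Y.get(y, 0.0) + p
    p_X[x] = p_X.get(x, 0.0) + p

# H(Y): the minimum rate R_2 when the switch is open (no helper).
H_Y = -sum(p * math.log2(p) for p in p_Y.values())

# H(Y|X): the Slepian-Wolf limit for R_2 when X^n is fully available.
H_Y_given_X = -sum(p * math.log2(p / p_X[x]) for (x, y), p in p_XY.items())

print(H_Y, H_Y_given_X)  # H(Y|X) <= H(Y) always holds
```

The gap between the two values is exactly the rate savings that side information about X^n can provide.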
In this paper, we consider the strong converse theorem in the case where the switch is closed and 0 < R_1 < H(X). Let R_AKW(p_XY) be the rate region of the AKW system, that is, the set of rate pairs (R_1, R_2) such that the error probability of decoding goes to zero as n tends to infinity. The rate region was determined by Ahlswede, Körner [1] and Wyner [2]. On the converse coding theorem, Ahlswede et al. [6] proved that, if (R_1, R_2) is outside the rate region, then P_c^{(n)} must tend to zero as n tends to infinity. Gu and Effros [7] examined the speed of convergence of P_c^{(n)} to zero as n → ∞ by carefully checking the proof of Ahlswede et al. [6]. However, they could not obtain a result on an explicit form of the exponent function with respect to the code length n.
Our main results on the strong converse theorem for the AKW system are as follows. For the AKW system, we prove that, if (R_1, R_2) is outside the rate region R_AKW(p_XY), then P_c^{(n)} must go to zero exponentially, and we derive an explicit lower bound on this exponent. This result corresponds to Theorem 3. As a corollary of this theorem, we obtain the strong converse result stated in Corollary 2. This result states that we have an outer bound with an O(1/√n) gap from the rate region R_AKW(p_XY).
To derive our result, we use a new method, which we call the recursive method. This method, introduced by the author, includes a certain recursive algorithm for the single-letterization of exponent functions. In the standard argument for proving converse coding theorems, single-letterization methods based on the chain rule for entropy functions are used. In general, the functions representing multi-letter characterizations of exponent functions do not have the chain rule property. In such cases, the recursive method is quite useful for deriving single-letterized bounds. The recursive method is a powerful general tool for proving strong converse theorems for several coding problems in information theory. In fact, the recursive method plays an important role in deriving exponential strong converse results for the communication systems treated in [8][9][10][11][12].
On the strong converse theorem for the one helper source coding problem, there are two other recent works [13,14], which proved the strong converse theorem using methods different from ours. In [13], Watanabe found a relationship between the AKW system and the Gray-Wyner network. Using this relationship and the second-order rate region for the Gray-Wyner network obtained by him [15], Watanabe established the strong converse theorem for the AKW system. In [14], Liu et al. introduced a new method to derive sharp strong converse bounds via reverse hypercontractivity. Using this method, they obtained an outer bound of the rate region for the AKW system with an O(1/√n) gap from the rate region. Furthermore, in [14], an extension of the AKW system to the case of a Gaussian source and quadratic distortion is investigated, obtaining an outer bound with an O(1/√n) gap from the rate distortion region for the extended source coding system. In his recent paper [16], Liu showed a lower bound (converse) on the dispersion of the AKW system as the variance of a linear combination of information densities.
The strong converse theorems might seem to be merely a mathematical problem, investigated mainly out of theoretical interest. Recently, however, Watanabe and Oohama [17] found an interesting security problem which has a close connection with the strong converse theorem for the AKW system. Furthermore, Oohama and Santoso [18] and Santoso and Oohama [19] clarified that the exponential strong converse theorem obtained in this paper plays an essential role in deriving a strong sufficient condition for secure privacy amplification in their new theoretical model of side-channel attacks on Shannon cipher systems. From these two cases, we expect that exponential strong converse theorems for multiterminal source networks will serve as a strong tool for several information-theoretic security problems.

Problem Formulation
Let X and Y be finite sets and let {(X_t, Y_t)}_{t=1}^∞ be a stationary discrete memoryless source. For each t = 1, 2, ···, the random pair (X_t, Y_t) takes values in X × Y and has a probability distribution p_XY. We write n independent copies of {X_t}_{t=1}^∞ and {Y_t}_{t=1}^∞, respectively, as X^n = X_1, X_2, ···, X_n and Y^n = Y_1, Y_2, ···, Y_n.
We consider the communication system depicted in Figure 2. This communication system corresponds to the case where the switch is closed in Figure 1. Data sequences X^n and Y^n are separately encoded to ϕ_1^{(n)}(X^n) and ϕ_2^{(n)}(Y^n), and those are sent to the information processing center. At the center, the decoder function ψ^{(n)} observes (ϕ_1^{(n)}(X^n), ϕ_2^{(n)}(Y^n)) to output the estimate Ŷ^n of Y^n, where, for each i = 1, 2, ϕ_i^{(n)} is an encoder function operating at rate R_i. The error probability of decoding is P_e^{(n)} := Pr{Ŷ^n ≠ Y^n}. A rate pair (R_1, R_2) is ε-achievable if, for any δ > 0, there exist a positive integer n_0 = n_0(ε, δ) and a sequence of triples {(ϕ_1^{(n)}, ϕ_2^{(n)}, ψ^{(n)})}_{n≥n_0} such that, for n ≥ n_0, the encoding rates are within δ of (R_1, R_2) and P_e^{(n)} ≤ ε. For ε ∈ (0, 1), the rate region R_AKW(ε|p_XY) is defined as the set of all ε-achievable rate pairs. We can show that the two rate regions R_AKW(ε|p_XY), ε ∈ (0, 1), and R_AKW(p_XY) satisfy the following property.

Property 1.
(a) The regions R_AKW(ε|p_XY), ε ∈ (0, 1), and R_AKW(p_XY) are closed convex subsets of ℝ²₊. (b) R_AKW(ε|p_XY) has another form using the (n, ε)-rate region R_AKW(n, ε|p_XY), defined as the set of rate pairs supported by block-length-n codes whose error probability of decoding is at most ε. Using R_AKW(n, ε|p_XY), R_AKW(ε|p_XY) can be expressed as a limiting expression of these regions. Proof of this property is given in Appendix A. It is well known that R_AKW(p_XY) was determined by Ahlswede, Körner and Wyner. To describe their result, we introduce an auxiliary random variable U taking values in a finite set U. We assume that the joint distribution of (U, X, Y) is p_UXY(u, x, y) = p_U(u) p_{X|U}(x|u) p_{Y|X}(y|x). The above condition is equivalent to the Markov chain U ↔ X ↔ Y. Define P(p_XY) as the set of probability distributions p = p_UXY having this form, and set R(p_XY) := {(R_1, R_2) : R_1 ≥ I_p(X; U), R_2 ≥ H_p(Y|U) for some p_UXY ∈ P(p_XY)}. We can show that the region R(p_XY) satisfies the following property.

Property 2.
(a) The region R(p XY ) is a closed convex subset of R 2 + .
(b) For any p_XY, we have min_{(R_1, R_2) ∈ R(p_XY)} (R_1 + R_2) = H_p(Y). The minimum is attained by (R_1, R_2) = (0, H_p(Y)). This result implies that R(p_XY) ⊆ {(R_1, R_2) : R_1 + R_2 ≥ H_p(Y)}. Furthermore, the point (0, H_p(Y)) always belongs to R(p_XY).
Property 2 part a is well known and part b is easy to prove; we omit the proofs. A typical shape of the rate region R(p_XY) is shown in Figure 3. The rate region R_AKW(p_XY) was determined by Ahlswede and Körner [1] and Wyner [2]. Their result is the following.
On the converse coding theorem, Ahlswede et al. [6] obtained the following. Theorem 2 (Ahlswede et al. [6]). For each fixed ε ∈ (0, 1), we have R_AKW(ε|p_XY) = R(p_XY). Gu and Effros [7] examined the speed at which P_e^{(n)} tends to 1 as n → ∞ by carefully checking the proof of Ahlswede et al. [6]. However, they could not obtain a result on an explicit form of the exponent function with respect to the code length n.

Our aim is to find an explicit form of the exponent function for the error probability of decoding to tend to one as n → ∞ when (R_1, R_2) ∉ R_AKW(p_XY). To examine this quantity, we define the quantity G^{(n)}(R_1, R_2|p_XY). By time sharing, we obtain the inequality (5); specializing the rates there yields a subadditivity property of {G^{(n)}(R_1, R_2|p_XY)}_{n≥1}, from which, together with Fekete's subadditive lemma, we have that the limit G(R_1, R_2|p_XY) of G^{(n)}(R_1, R_2|p_XY) exists and satisfies the following. The exponent function G(R_1, R_2|p_XY) is a convex function of (R_1, R_2); in fact, from the inequality (5), we have the corresponding convex combination bound for any α ∈ [0, 1]. The region G(p_XY) is also a closed convex set. Our main aim is to find an explicit characterization of G(p_XY). In this paper, we derive an explicit outer bound of G(p_XY) whose section by the plane G = 0 coincides with R_AKW(p_XY).
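The role of Fekete's subadditive lemma here can be illustrated numerically (a standalone sketch with an example sequence of our own choosing, not the paper's G^{(n)}): if a_{n+m} ≤ a_n + a_m for all n, m, then a_n/n converges to inf_n a_n/n.

```python
import math

def a(n):
    # Example subadditive sequence: sqrt(n + m) <= sqrt(n) + sqrt(m).
    return math.sqrt(n)

# Verify subadditivity on a small range.
for n in range(1, 50):
    for m in range(1, 50):
        assert a(n + m) <= a(n) + a(m) + 1e-12

# Fekete's lemma: a_n / n converges to inf_n a_n / n (here 0).
ratios = [a(n) / n for n in range(1, 2001)]
print(ratios[0], ratios[-1])  # the ratio decreases toward the infimum
```

In the paper the same mechanism guarantees that the limit defining G(R_1, R_2|p_XY) exists.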

Main Results
In this section, we state our main result. We first explain that the region R(p_XY) can be expressed using a family of supporting hyperplanes. To describe this result, we define a set P_sh(p_XY) of probability distributions on U × X × Y and, for µ ≥ 0, the associated support function. Then, we have the following property.
(c) For any p XY , we have R sh (p XY ) = R(p XY ).
Property 3 part a is stated as Lemma A1 in Appendix B, where its proof is given. Proofs of Property 3 parts b and c are given in Appendix C. We next define a function serving as a lower bound of F(R_1, R_2|p_XY). For λ ≥ 0 and for p_UXY ∈ P_sh(p_XY), we define the function Ω̃ below; furthermore, we set its optimized version. We can show that the above functions satisfy the following property.

The second equality implies that Ω̃ admits the expression above,
and we set the quantity above, where g is the inverse function of ϑ(a) := a + (5/4)a², a ≥ 0.
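Since ϑ(a) = a + (5/4)a² is strictly increasing for a ≥ 0, its inverse g has a closed form via the quadratic formula; a small sketch of our own, for illustration only:

```python
import math

def theta(a):
    # theta(a) = a + (5/4) a^2, strictly increasing for a >= 0.
    return a + 1.25 * a * a

def g(b):
    # Inverse of theta on a >= 0: solve (5/4) a^2 + a - b = 0
    # and take the nonnegative root.
    return (math.sqrt(1.0 + 5.0 * b) - 1.0) * 2.0 / 5.0

# Round-trip check.
for b in [0.0, 0.1, 1.0, 7.5]:
    assert abs(theta(g(b)) - b) < 1e-12
print(g(1.0))
```

This explicit inverse is what makes the O(1/√n) deviation terms below concretely computable.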
Property 4 part a is stated as Lemma A2 in Appendix B, where its proof is given. Proof of Property 4 part b is given in Appendix D. Proofs of Property 4 parts c, d, e, and f are given in Appendix E.
Our main result is the following.

Theorem 3.
For any R_1, R_2 ≥ 0, any p_XY, and any (ϕ_1^{(n)}, ϕ_2^{(n)}, ψ^{(n)}) satisfying the rate constraints, the correct probability P_c^{(n)} of decoding decays exponentially with exponent at least F(R_1, R_2|p_XY). It can be seen from Property 4 parts b and f that F(R_1, R_2|p_XY) is strictly positive when (R_1, R_2) is outside the rate region R(p_XY). Hence, by Theorem 3, if (R_1, R_2) is outside the rate region, then the error probability of decoding goes to one exponentially and its exponent is not below F(R_1, R_2|p_XY). The following corollary immediately follows from Theorem 3. Corollary 1.
Proof of Theorem 3 will be given in the next section. The exponent function at rates outside the rate region was derived by Oohama and Han [5] for the separate source coding problem for correlated sources [4]. The technique they used is the method of types [21], which is not useful for proving Theorem 3. Some novel techniques based on the information spectrum method introduced by Han [22] are necessary to prove this theorem.
From Theorem 3 and Property 4 part e, we can obtain an explicit outer bound of R_AKW(ε|p_XY) with an asymptotically vanishing deviation from R_AKW(p_XY) = R(p_XY). The strong converse theorem established by Ahlswede et al. [6] immediately follows from this corollary. To describe this outer bound, for κ > 0, we define a region which serves as an outer bound of R(p_XY). For each fixed ε ∈ (0, 1), we define κ_n = κ_n(ε, ρ(p_XY)), where step (a) follows from ϑ(a) = a + (5/4)a². Since κ_n → 0 as n → ∞, there exists a smallest positive integer n_0 = n_0(ε, ρ(p_XY)) such that κ_n ≤ (1/2)ρ(p_XY) for n ≥ n_0. From Theorem 3 and Property 4 part e, we have the following corollary.
Proof of this corollary will be given in the next section.

Proof of the Main Result
Let (X^n, Y^n) be a pair of random variables from the information source. We set S = ϕ_1^{(n)}(X^n). The joint distribution p_{SX^nY^n} of (S, X^n, Y^n) is induced by the source distribution and the encoder. It is obvious that S ↔ X^n ↔ Y^n holds. Then, we have the following lemma, which is a well-known single-shot information spectrum bound. Lemma 1. For any η > 0 and for any (ϕ_1^{(n)}, ϕ_2^{(n)}, ψ^{(n)}), the bound (15) holds. The probability distributions appearing in the three inequalities (12), (13), and (14) in the right members of (15) can be selected arbitrarily. In (12), we can choose any probability distribution q̂_{SX^nY^n} on S × X^n × Y^n. In (13), we can choose any distribution Q_{X^n} on X^n. In (14), we can choose any stochastic matrix Q̂_{X^n|U^n} : U^n → X^n.
This lemma can be proved by a standard argument in the information spectrum method [22]. The details of the proof are given in Appendix F. Next, we single-letterize the four information spectrum quantities inside the first term in the right members of (15) in Lemma 1 to obtain the following lemma.

Lemma 2.
For any η > 0 and for any (ϕ_1^{(n)}, ϕ_2^{(n)}, ψ^{(n)}), we have the following bound, where, for each t = 1, 2, ···, n, the probability distribution Q_{X_t} on X appearing in (16) and the stochastic matrix Q_{X_t|SX^{t−1}} : M_1 × X^{t−1} → X appearing in (17) can be chosen arbitrarily.
Proof. In (12) in Lemma 1, we choose q̂_{SX^nY^n} having the product form above. In (13) in Lemma 1, we choose Q_{X^n} having the product form above. We further note the identity above. Then, the bound (15) in Lemma 1 becomes the claimed bound, completing the proof.
As in the standard converse coding argument, we identify auxiliary random variables based on the bound in Lemma 2. The following lemma is necessary for this identification.
Lemma 3. Suppose that, for each t = 1, 2, ···, n, the joint distribution p_{SX^tY^t} of the random vector (S, X^t, Y^t) is the marginal of p_{SX^nY^n}. Then, we have the Markov chain SX^{t−1} ↔ X_t ↔ Y_t, (18) or, equivalently, I(Y_t; SX^{t−1}|X_t) = 0. Furthermore, we have the Markov chain Y^{t−1} ↔ SX^{t−1} ↔ X_tY_t, (19) or, equivalently, I(X_tY_t; Y^{t−1}|SX^{t−1}) = 0. The above two Markov chains are equivalent to the single long Markov chain Y^{t−1} ↔ SX^{t−1} ↔ X_t ↔ Y_t. Proof of this lemma is given in Appendix G. For t = 1, 2, ···, n, set U_t := M_1 × X^{t−1}. Define a random variable U_t ∈ U_t by U_t := (S, X^{t−1}). From Lemmas 2 and 3, we identify auxiliary random variables to obtain the following lemma.
where, for each t = 1, 2, ···, n, the probability distribution Q_{X_t} on X appearing in (21) and the stochastic matrix Q̂_{X_t|U_t} : U_t → X appearing in (22) can be chosen arbitrarily.
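The first Markov chain in Lemma 3 can be checked numerically in a toy case (a sketch with an arbitrarily chosen source and helper function, not part of the proof): for a memoryless source and any S = f(X^n), the conditional mutual information I(Y_t; SX^{t−1} | X_t) vanishes.

```python
import itertools, math

# Toy check of the Markov chain (S, X^{t-1}) <-> X_t <-> Y_t from Lemma 3,
# with n = 2, t = 2, and the (arbitrary) helper function S = X_1 XOR X_2.
pX = {0: 0.6, 1: 0.4}                                # example source marginal
pY_X = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # example channel p(y|x)

joint = {}  # joint pmf of (S, X_1, X_2, Y_2)
for x1, x2, y2 in itertools.product([0, 1], repeat=3):
    key = (x1 ^ x2, x1, x2, y2)
    joint[key] = joint.get(key, 0.0) + pX[x1] * pX[x2] * pY_X[x2][y2]

# Marginals needed for I(Y_2; S, X_1 | X_2).
pX2, pSX1X2, pY2X2 = {}, {}, {}
for (s, x1, x2, y2), p in joint.items():
    pX2[x2] = pX2.get(x2, 0.0) + p
    pSX1X2[(s, x1, x2)] = pSX1X2.get((s, x1, x2), 0.0) + p
    pY2X2[(y2, x2)] = pY2X2.get((y2, x2), 0.0) + p

I = 0.0
for (s, x1, x2, y2), p in joint.items():
    if p > 0:
        num = p / pX2[x2]                                    # p(s,x1,y2 | x2)
        den = (pY2X2[(y2, x2)] / pX2[x2]) * (pSX1X2[(s, x1, x2)] / pX2[x2])
        I += p * math.log(num / den)
print(I)  # numerically zero: Y_2 is independent of (S, X_1) given X_2
```

The same conditional independence holds for any block length n and any encoder, since Y_t depends on the whole history only through X_t.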
Now, the challenge is that, although the quantities inside the first term in the right members of (23) in Lemma 4 consist of a sum of n information spectrum quantities, the measure p_{SX^nY^n} does not have an i.i.d. structure in general. To resolve this, we first use large deviation theory to upper bound the first quantity in the right members of (23). For each t = 1, 2, ···, n, set Q_t := (Q_{X_t}, Q̂_{X_t|U_t}), and let Q_t be the set of all Q_t. We define a quantity which serves as an exponential upper bound of the correct probability of decoding. Let P^{(n)}(p_XY) be the set of all probability distributions p_{SX^nY^n} on M_1 × X^n × Y^n having the stated form. For simplicity of notation, we use p^{(n)} for p_{SX^nY^n} ∈ P^{(n)}(p_XY). For each t = 1, 2, ···, n, the probability distribution Q_{X_t} and the conditional probability distribution Q̂_{X_t|U_t} appearing in the definition of Ω^{(µ,θ)}(p^{(n)}, Q^n) can be chosen arbitrarily.
The following is well known as Cramér's bound in the large deviation principle.

Lemma 5.
For any real-valued random variable Z and any α ≥ 0, we have Pr{Z ≥ 0} ≤ E[exp(αZ)]. By Lemmas 4 and 5, we have the following proposition.
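Lemma 5 is the standard Chernoff-type bound Pr{Z ≥ 0} ≤ E[exp(αZ)] for α ≥ 0. A quick numerical sanity check (our own sketch, with a Gaussian Z chosen arbitrarily):

```python
import random, math

random.seed(0)
# Empirical check of the Chernoff-type bound Pr{Z >= 0} <= E[exp(alpha * Z)]
# for a random variable Z with negative mean and several alpha >= 0.
samples = [random.gauss(-1.0, 1.0) for _ in range(100000)]

prob = sum(1 for z in samples if z >= 0) / len(samples)
for alpha in [0.5, 1.0, 2.0]:
    bound = sum(math.exp(alpha * z) for z in samples) / len(samples)
    assert prob <= bound  # the bound holds for every alpha >= 0
print(prob)
```

The bound follows from 1{Z ≥ 0} ≤ exp(αZ) pointwise; in the proof it converts an information spectrum probability into an exponential moment.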
By Proposition 1, we have the following corollary.

Corollary 3.
For any (µ, α) ∈ [0, 1]² and any (ϕ_1^{(n)}, ϕ_2^{(n)}, ψ^{(n)}), the stated bound holds. We shall call Ω^{(µ,α)}(p_XY) the communication potential. The above corollary implies that the analysis of Ω^{(µ,α)}(p_XY) leads to the establishment of a strong converse theorem for the one helper source coding problem. Note that Ω^{(µ,α)}(p_XY) is still a multi-letter quantity. However, we successfully single-letterize this quantity. This result, stated later as Proposition 2, is the mathematical core of our main result.
In the following argument, we derive an explicit lower bound of Ω^{(µ,α)}(p_XY). For each t = 1, 2, ···, n, set u_t = (s, x^{t−1}) ∈ U_t. For t = 1, 2, ···, n, define a function of (u_t, x_t, y_t) ∈ U_t × X × Y as above. By definition, we have the identity above. For each t = 1, 2, ···, n, we define the probability distributions above, where the C_t are constants for normalization. For t = 1, 2, ···, n, define the quantities above, where we define C_0 = 1. Then, we have the following lemma.

Lemma 6.
For each t = 1, 2, · · · , n, and for any (s, x t , y t ) ∈ M 1 ×X t ×Y t , we have Furthermore, we have Proof of this lemma is given in Appendix H. Define Then, we have the following lemma, which is a key result to derive a single letterized lower bound of Ω (µ,α) (p XY ).

Lemma 7.
For any p (n) ∈ P (n) (p XY ) and any Q n ∈ Q n , we have Proof. We first prove (29). From (26), we have Furthermore, by definition, we have From (31) and (32), (29) is obvious. We next prove (30). We first observe that for (s, x t , y t ) ∈ S × X t × Y t and for t = 1, 2, · · · , n, Step (a) follows from Lemma 3. Then, by Lemma 6, we have completing the proof.
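The distributions constructed before Lemma 6 follow a general exponential-tilting pattern: a base pmf is reweighted by an exponential factor and renormalized by a constant. A generic sketch of this pattern (names and values are illustrative, not the paper's exact construction):

```python
import math

def tilt(p, f, theta):
    # Exponential tilting of a finite pmf:
    # p_tilted(x) = p(x) * exp(theta * f(x)) / C,
    # where C is the normalization constant (cf. the constants C_t above).
    weights = {x: px * math.exp(theta * f(x)) for x, px in p.items()}
    C = sum(weights.values())
    return {x: w / C for x, w in weights.items()}, C

# An arbitrary base pmf and tilting function, for illustration.
p = {0: 0.5, 1: 0.3, 2: 0.2}
q, C = tilt(p, lambda x: float(x), 0.7)
assert abs(sum(q.values()) - 1.0) < 1e-12  # q is again a pmf
print(q, C)
```

Tilting shifts probability mass toward outcomes with large f while keeping a valid distribution, which is what allows the sum of n spectrum terms to be bounded factor by factor.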
The following proposition is a mathematical core to prove our main result.

Proposition 2.
For any µ ∈ [0, 1] and any α ≥ 0, we have the following. For each t = 1, 2, ···, n, we define q_t = q_{U_tX_tY_t} as above. Equation (33) implies that q_t = q_{U_tX_tY_t} ∈ Q_n(p_{Y|X}). Furthermore, for each t = 1, 2, ···, n, we choose (Q_{X_t}, Q̂_{X_t|U_t}) = (q_{X_t}, q_{X_t|U_t}). For this choice of Q_t, we have the following chain of inequalities: Step (a) follows from Lemma 7 and (33).
Step (b) follows from the choice (Q_{X_t}, Q̂_{X_t|U_t}) = (q_{X_t}, q_{X_t|U_t}) of (Q_{X_t}, Q̂_{X_t|U_t}) for t = 1, 2, ···, n.
Step (d) follows from q_t ∈ Q_n(p_{Y|X}) and the definition of Ω̃_n^{(µ,α)}(p_XY).
Step (e) follows from Property 4 part a. Hence, we have the following, where step (a) follows from Lemma 7.

One Helper Problem Studied by Wyner
We consider the communication system depicted in Figure 4. Data sequences X^n, Y^n, and Z^n are separately encoded to ϕ_1^{(n)}(X^n), ϕ_2^{(n)}(Y^n), and ϕ_3^{(n)}(Z^n). The encoded data ϕ_1^{(n)}(X^n) and ϕ_2^{(n)}(Y^n) are sent to information processing center 1. The encoded data ϕ_1^{(n)}(X^n) and ϕ_3^{(n)}(Z^n) are sent to information processing center 2. At center 1, the decoder function ψ^{(n)} observes (ϕ_1^{(n)}(X^n), ϕ_2^{(n)}(Y^n)) to output the estimate Ŷ^n of Y^n. At center 2, the decoder function φ^{(n)} observes (ϕ_1^{(n)}(X^n), ϕ_3^{(n)}(Z^n)) to output the estimate Ẑ^n of Z^n. The error probabilities of decoding are defined for Ŷ^n and Ẑ^n analogously to the AKW system. Figure 4. One helper source coding system investigated by Wyner.

Property 5.
(a) The regions R_W(ε|p_XYZ), ε ∈ (0, 1), and R_W(p_XYZ) are closed convex subsets of ℝ³₊. (b) R_W(ε|p_XYZ) has another form using the (n, ε)-rate region R_W(n, ε|p_XYZ). Using R_W(n, ε|p_XYZ), R_W(ε|p_XYZ) can be expressed as a limiting expression of these regions. It is well known that R_W(p_XYZ) was determined by Wyner. To describe his result, we introduce an auxiliary random variable U taking values in a finite set U. We assume that the joint distribution of (U, X, Y, Z) is p_UXYZ(u, x, y, z) = p_U(u) p_{X|U}(x|u) p_{YZ|X}(y, z|x).
The above condition is equivalent to the Markov chain U ↔ X ↔ YZ. Define the set of probability distributions on U × X × Y × Z satisfying this condition.

We can show that the region R(p XYZ ) satisfies the following property.

Property 6.
(a) The region R(p XYZ ) is a closed convex subset of R 3 + .
The rate region R W (p XYZ ) was determined by Wyner [2]. His result is the following.

Theorem 5 (Csiszár and Körner [21]).
To examine the rate of convergence of the error probability of decoding to one as n → ∞ for (R_1, R_2, R_3) ∉ R_W(p_XYZ), we define the quantity G^{(n)}(R_1, R_2, R_3|p_XYZ). By time sharing, we obtain the inequality (42); specializing the rates there yields a subadditivity property of {G^{(n)}(R_1, R_2, R_3|p_XYZ)}_{n≥1}, from which we have that the limit G(R_1, R_2, R_3|p_XYZ) exists and satisfies the following. The exponent function G(R_1, R_2, R_3|p_XYZ) is a convex function of (R_1, R_2, R_3). In fact, by time sharing, we have the corresponding convex combination bound for any α ∈ [0, 1]. The region G(p_XYZ) is also a closed convex set. Our main aim is to find an explicit characterization of G(p_XYZ). In this paper, we derive an explicit outer bound of G(p_XYZ) whose section by the plane G = 0 coincides with R_W(p_XYZ). We first explain that the region R(p_XYZ) has another expression using supporting hyperplanes. We define two sets of probability distributions on U × X × Y × Z. Then, we have the following property.
(c) For any p XYZ , we have R sh (p XYZ ) = R(p XYZ ).
We can show that the above functions and sets satisfy the following property.
Proof of Theorem 6 is quite parallel to that of Theorem 3; we omit the details. From Theorem 6 and Property 8 part e, we can obtain an explicit outer bound of R_W(ε|p_XYZ) with an asymptotically vanishing deviation from R_W(p_XYZ) = R(p_XYZ). The strong converse theorem established by Csiszár and Körner [21] immediately follows from this corollary. To describe this outer bound, for κ > 0, we define a region which serves as an outer bound of R(p_XYZ). For each fixed ε ∈ (0, 1), we define κ̃_n = κ̃_n(ε, ρ(p_XYZ)), where step (a) follows from ϑ(a) = a + (5/4)a². Since κ̃_n → 0 as n → ∞, there exists a smallest positive integer n_1 = n_1(ε, ρ(p_XYZ)) such that κ̃_n ≤ (1/2)ρ(p_XYZ) for n ≥ n_1. From Theorem 6 and Property 8 part e, we have the following corollary.
Proof of this corollary is quite parallel to that of Corollary 2; we omit the details.

Conclusions
For the AKW system, the one helper source coding system posed by Ahlswede, Körner [1] and Wyner [2], we have derived an explicit lower bound of the optimal exponent function G(R_1, R_2|p_XY) on the correct probability of decoding for (R_1, R_2) ∉ R_AKW(p_XY). We have described this result in Theorem 3. Furthermore, for the source coding system posed and investigated by Wyner [2], we have obtained an explicit lower bound of the optimal exponent function G(R_1, R_2, R_3|p_XYZ) on the correct probability of decoding for (R_1, R_2, R_3) ∉ R_W(p_XYZ). We have described this result in Theorem 6. The problems of determining G(R_1, R_2|p_XY) and G(R_1, R_2, R_3|p_XYZ) exactly still remain open; they are left for future work.

Acknowledgments:
The author is very grateful to Shun Watanabe and Shigeaki Kuzuoka for their helpful comments.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A. Properties of the Rate Regions
In this appendix, we prove Property 1. Property 1 part a can easily be proved from the definitions of the rate regions; we omit the proof of this part. In the following argument, we prove part b.

Proof of Property 1 part b: We set
By the definitions of R_AKW(m, ε|p_XY) and R_AKW(ε|p_XY), we have R_AKW(m, ε|p_XY) ⊆ R_AKW(ε|p_XY) for m ≥ 1. Hence, we have the first inclusion. We next assume that (R_1, R_2) belongs to the other set. Then, by the definitions of R_AKW(n, ε|p_XY) and R_AKW(ε|p_XY), for any δ > 0, there exists n_0(ε, δ) such that, for any n ≥ n_0(ε, δ), (R_1 + δ, R_2 + δ) ∈ R_AKW(n, ε|p_XY), which implies the second inclusion. Here, we assume that there exists a pair (R_1, R_2) belonging to R_AKW(ε|p_XY) for which the claimed inclusion fails. Since the set on the right-hand side of (A3) is a closed set, the strict separation holds for some small δ > 0. On the other hand, we have (R_1 + δ, R_2 + δ) ∈ R_AKW^{(δ)}(ε|p_XY), which contradicts (A2). Thus, we have (A5). Note here that R_AKW(ε|p_XY) is a closed set. Then, from (A5), we conclude the claim, completing the proof.

Appendix B. Cardinality Bound on Auxiliary Random Variables
We first prove the following lemma.
Proof. We bound the cardinality |U| of U to show that the bound |U| ≤ |X| is sufficient to describe R^{(µ)}(p_XY). Observe the decomposition above. For each u ∈ U, π(p_{X|U}(·|u)) is a continuous function of p_{X|U}(·|u). Then, by the support lemma, |U| ≤ |X| − 1 + 1 = |X| is sufficient to express the |X| − 1 values of (A6) and the one value of (A7).
Next, we prove the following lemma.

Appendix C. Supporting Hyperplane Expressions of R(p_XY)
In this appendix, we prove Property 3 parts b and c. We first prove part b.

Proof of Property 3 part b:
For any µ ≥ 0, we have the following chain of inequalities, where step (a) follows from Lemma A1, which states that the cardinality bound |U| ≤ |X| + 1 in P(p_XY) can be reduced to |U| ≤ |X| in P_sh(p_XY).
We next prove part c. We first prepare a lemma useful for proving this property. From the convexity of the region R(p_XY), we have the following lemma.
Lemma A3. Suppose that (R̂_1, R̂_2) does not belong to R(p_XY). Then, there exist a positive constant and µ_0 ≥ 0 such that, for any (R_1, R_2) ∈ R(p_XY), the stated separation inequality holds. Proof of this lemma is omitted here. Lemma A3 expresses the fact that, since the region R(p_XY) is a closed convex set, for any point (R̂_1, R̂_2) outside the region R(p_XY), there exists a line which separates the point (R̂_1, R̂_2) from the region R(p_XY).

Proof of Property 3 part c:
We first prove R_sh(p_XY) ⊆ R(p_XY). We assume that (R̂_1, R̂_2) ∉ R(p_XY). Then, by Lemma A3, there exist a positive constant and µ_0 ≥ 0 such that, for any (R_1, R_2) ∈ R(p_XY), the separation inequality holds. Then, we have the chain of inequalities above, where step (a) follows from the definition of R(p_XY). The inequality (A12) implies that (R̂_1, R̂_2) ∉ R_sh(p_XY). Thus, R_sh(p_XY) ⊆ R(p_XY) is concluded.

Appendix D. Proof of Property 4 Part b
In this appendix, we prove Property 4 part b. Fix q = q_UXY ∈ Q(p_{Y|X}) and p = p_UXY = (p_{U|X}, p_XY) ∈ P_sh(p_XY) arbitrarily. For β ≥ 0, p ∈ P_sh(p_XY), and the distribution q_{Y|U} induced by q, we define the function below. Then, we have the following two lemmas.
Proof of Lemma A4: We fix (µ, α) ∈ [0, 1]² arbitrarily. For each q = q_UXY ∈ Q(p_{Y|X}), we choose p = p_UXY ∈ P_sh(p_XY) so that p_{U|X} = q_{U|X}. Then, we have the chain of inequalities above, where we set the quantity A as indicated and step (a) follows from Hölder's inequality. From (A17), we can see that it suffices to show A ≤ 1 to complete the proof. When µ = 1, we have A = 1. When µ ∈ [0, 1), we apply Hölder's inequality to A to obtain A ≤ 1. Hence, we have (A13) in Lemma A4.

Proof of Lemma A5:
We fix µ ∈ [0, 1] and α ∈ [0, 1/2) arbitrarily. For any p = p_UXY ∈ P_sh(p_XY) and any q = q_UXY ∈ Q(p_{Y|X}), we have the chain of inequalities above, where we set the quantity B as indicated and step (a) follows from Hölder's inequality. From (A18), we can see that it suffices to show B ≤ 1 to complete the proof. In a manner quite similar to the proof of A ≤ 1 in the proof of (A13) in Lemma A4, we can show that B ≤ 1. Thus, we have (A14) in Lemma A5.
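Both proofs hinge on Hölder's inequality. A quick numerical check with respect to an empirical measure (our own sketch, with arbitrary data; the inequality holds deterministically for any samples):

```python
import random

random.seed(1)
# Check Hölder's inequality for the empirical measure of arbitrary samples:
# E[|X Y|] <= E[|X|^p]^(1/p) * E[|Y|^q]^(1/q), with 1/p + 1/q = 1.
xs = [random.random() for _ in range(50000)]
ys = [random.expovariate(1.0) for _ in range(50000)]
n = len(xs)

for p in [1.5, 2.0, 3.0]:
    q = p / (p - 1.0)  # conjugate exponent
    lhs = sum(x * y for x, y in zip(xs, ys)) / n
    rhs = (sum(x ** p for x in xs) / n) ** (1 / p) \
        * (sum(y ** q for y in ys) / n) ** (1 / q)
    assert lhs <= rhs + 1e-9
print("Hölder bound verified")
```

In the appendix the same inequality is applied with exponents chosen from (µ, α) to separate the mixed expectation into the factors A and B.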
Then, we have the following lemma.

Lemma A7.
p_{SX^nY^n}(A_n^c) ≤ e^{−nη}, p_{SX^nY^n}(B_n^c) ≤ e^{−nη}, p_{SX^nY^n}(C_n^c) ≤ e^{−nη}, p_{SX^nY^n}(D_n^c ∩ E_n) ≤ e^{−nη}.