1. Introduction
For single or multi-terminal source coding systems, the converse coding theorems state that, at any data compression rates below the fundamental theoretical limit of the system, the error probability of decoding cannot go to zero when the block length n of the codes tends to infinity. On the other hand, the strong converse theorems state that, at any rates below the fundamental theoretical limit, the error probability of decoding must go to one when n tends to infinity. The former converse theorems are sometimes called the weak converse theorems to distinguish them from the strong converse theorems.
In this paper, we study the strong converse theorem for the rate distortion problem with side information at the decoder posed and investigated by Wyner and Ziv [1]. We call this source coding system the Wyner–Ziv source coding system (the WZ system). The WZ system is shown in Figure 1; it corresponds to the case where the switch is closed. In
Figure 1, the sequence $\{(X_t, Y_t)\}_{t=1}^{\infty}$ represents independent copies of a pair of dependent random variables $(X, Y)$ which take values in the finite sets $\mathcal{X}$ and $\mathcal{Y}$, respectively. We assume that $(X, Y)$ has a probability distribution denoted by $p_{XY}$. The encoder $\varphi^{(n)}$ observes $X^n = X_1 X_2 \cdots X_n$ and outputs a binary sequence which appears at a rate of $R$ bits per input symbol. The decoder function $\psi^{(n)}$ observes this output and the side information $Y^n = Y_1 Y_2 \cdots Y_n$ to output a sequence $\hat{X}^n = \hat{X}_1 \hat{X}_2 \cdots \hat{X}_n$. The $t$-th component $\hat{X}_t$ of $\hat{X}^n$, for $t = 1, 2, \ldots, n$, takes values in the finite reproduction alphabet $\hat{\mathcal{X}}$. Let $d(\cdot, \cdot)$ be an arbitrary distortion measure on $\mathcal{X} \times \hat{\mathcal{X}}$. The distortion between $x^n \in \mathcal{X}^n$ and $\hat{x}^n \in \hat{\mathcal{X}}^n$ is defined by
$$ d(x^n, \hat{x}^n) := \sum_{t=1}^{n} d(x_t, \hat{x}_t). $$
In general, we have two criteria on the quality of the reproduction $\hat{X}^n$. One is the excess-distortion probability of decoding, defined for a prescribed distortion level $D \ge 0$ by
$$ \Delta^{(n)} := \Pr\left\{ \frac{1}{n} d(X^n, \hat{X}^n) > D \right\}. $$
The other is the average distortion, defined by
$$ \overline{d}^{(n)} := \frac{1}{n} \mathrm{E}\left[ d(X^n, \hat{X}^n) \right]. $$
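To make these two criteria concrete, the following sketch estimates both of them by Monte Carlo for a toy memoryless binary source with Hamming distortion and a trivial decoder that simply copies the side information. Everything in the snippet (the source parameters, the decoder, and the distortion level D) is an illustrative assumption and not part of the formulation above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_source(n, num_blocks, p_flip=0.1):
    """Draw blocks of (X^n, Y^n) from a toy doubly symmetric binary source:
    X ~ Bernoulli(1/2), Y = X xor noise with crossover probability p_flip."""
    x = rng.integers(0, 2, size=(num_blocks, n))
    noise = rng.random((num_blocks, n)) < p_flip
    return x, np.bitwise_xor(x, noise.astype(int))

def naive_decoder(y):
    """A trivial 'decoder' that ignores the encoder output and reproduces Y^n.
    It only serves to make the two distortion criteria computable."""
    return y

def hamming(a, b):
    return (a != b).astype(float)

n, num_blocks, D = 100, 10_000, 0.15
x, y = sample_source(n, num_blocks)
x_hat = naive_decoder(y)

per_letter = hamming(x, x_hat).mean(axis=1)   # (1/n) d(X^n, Xhat^n) per block
excess_prob = (per_letter > D).mean()         # empirical excess-distortion probability
avg_distortion = per_letter.mean()            # empirical average distortion

print(f"excess-distortion probability (D={D}): {excess_prob:.4f}")
print(f"average distortion: {avg_distortion:.4f}")
```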
A pair $(R, D)$ is $\varepsilon$-achievable for $p_{XY}$ if there exists a sequence of pairs $\{(\varphi^{(n)}, \psi^{(n)})\}_{n \ge 1}$ such that, for any $\delta > 0$ and any $n$ with $n \ge n_0 = n_0(\varepsilon, \delta)$,
$$ \frac{1}{n} \log \|\varphi^{(n)}\| \le R + \delta \quad \text{and} \quad \Delta^{(n)} \le \varepsilon, $$
where $\|\varphi^{(n)}\|$ stands for the cardinality of the range of $\varphi^{(n)}$. The rate distortion region $\mathcal{R}_{\mathrm{WZ}}(\varepsilon \,|\, p_{XY})$ is defined by
$$ \mathcal{R}_{\mathrm{WZ}}(\varepsilon \,|\, p_{XY}) := \{ (R, D) : (R, D) \text{ is } \varepsilon\text{-achievable for } p_{XY} \}. $$
On the other hand, we can define a rate distortion region based on the average distortion criterion; a formal definition is the following. A pair $(R, D)$ is achievable for $p_{XY}$ if there exists a sequence of pairs $\{(\varphi^{(n)}, \psi^{(n)})\}_{n \ge 1}$ such that, for any $\delta > 0$ and any $n$ with $n \ge n_0 = n_0(\delta)$,
$$ \frac{1}{n} \log \|\varphi^{(n)}\| \le R + \delta \quad \text{and} \quad \overline{d}^{(n)} \le D + \delta. $$
The rate distortion region $\mathcal{R}_{\mathrm{WZ}}(p_{XY})$ is defined by
$$ \mathcal{R}_{\mathrm{WZ}}(p_{XY}) := \{ (R, D) : (R, D) \text{ is achievable for } p_{XY} \}. $$
If the switch is open, then the side information is not available to the decoder. In this case, the communication system corresponds to source coding for the discrete memoryless source (DMS) specified by $p_X$. We define the rate distortion region of this system in a similar manner to the definition of $\mathcal{R}_{\mathrm{WZ}}(p_{XY})$, and we further define the corresponding regions under the excess-distortion and average distortion criteria in a similar manner to the definitions of $\mathcal{R}_{\mathrm{WZ}}(\varepsilon \,|\, p_{XY})$ and $\mathcal{R}_{\mathrm{WZ}}(p_{XY})$, respectively.
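For the switch-open case, the rate distortion function of a DMS can be computed numerically with the Blahut–Arimoto algorithm. The sketch below is a minimal implementation parametrized by the usual slope parameter s <= 0; the example source, the distortion matrix, and the iteration count are illustrative choices, not quantities taken from this paper.

```python
import numpy as np

def blahut_arimoto_rd(p_x, dist, s, n_iter=500):
    """Blahut-Arimoto iteration for one point on the rate-distortion curve.

    p_x  : source distribution, shape (|X|,)
    dist : distortion matrix d(x, xhat), shape (|X|, |Xhat|)
    s    : slope parameter (s <= 0 traces out the R(D) curve)
    Returns (D, R) with R in bits.
    """
    q = np.full(dist.shape[1], 1.0 / dist.shape[1])   # reproduction marginal
    for _ in range(n_iter):
        w = q[None, :] * np.exp(s * dist)             # unnormalized test channel
        cond = w / w.sum(axis=1, keepdims=True)       # p(xhat | x)
        q = p_x @ cond                                # update the marginal
    D = float(np.sum(p_x[:, None] * cond * dist))
    ratio = np.where(cond > 0, cond / q[None, :], 1.0)
    R = float(np.sum(p_x[:, None] * cond * np.log2(ratio)))
    return D, max(R, 0.0)

# Example: binary source with P(X=1)=0.4 and Hamming distortion.
p_x = np.array([0.6, 0.4])
dist = np.array([[0.0, 1.0], [1.0, 0.0]])
for s in [-8.0, -4.0, -2.0, -1.0]:
    D, R = blahut_arimoto_rd(p_x, dist, s)
    print(f"s={s:5.1f}  D={D:.3f}  R={R:.3f} bits")
```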
Previous works on the characterizations of these rate distortion regions are shown in Table 1. Shannon [2] determined the rate distortion region of the DMS. Subsequently, Wolfowitz [3] proved that the region under the excess-distortion criterion coincides with it. Furthermore, he proved the strong converse theorem; that is, if $(R, D)$ is outside the rate distortion region, then for any sequence $\{(\varphi^{(n)}, \psi^{(n)})\}_{n \ge 1}$ of encoder and decoder functions satisfying the rate condition, the excess-distortion probability tends to one as $n \to \infty$. The above strong converse theorem implies that, for any $\varepsilon \in (0, 1)$, the rate distortion region under the $\varepsilon$ excess-distortion criterion coincides with the region determined by Shannon.
Csiszár and Körner proved that the probability in Equation (3) converges to one exponentially and determined the optimal exponent as a function of $(R, D)$.
The previous works on the coding theorems for the WZ system are summarized in Table 1. The rate distortion region $\mathcal{R}_{\mathrm{WZ}}(p_{XY})$ was determined by Wyner and Ziv [1]. Csiszár and Körner [4] proved the direct part of the coding theorem under the $\varepsilon$ excess-distortion criterion. On the other hand, there has been no result on the strong converse theorem for the WZ system.
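To make the object of study concrete, the sketch below evaluates the Wyner–Ziv rate distortion function for the doubly symmetric binary source with Hamming distortion, using the closed-form description from Wyner and Ziv's original paper: the lower convex envelope of g(D) = h(p0 * D) - h(D) together with the point (p0, 0), where * denotes binary convolution and h is the binary entropy function. The crossover probability p0 and the grid resolution are illustrative.

```python
import numpy as np

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def lower_convex_envelope(xs, ys):
    """Lower convex hull of the points (xs, ys), returned as sorted vertex arrays."""
    pts = sorted(zip(xs, ys))
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop the middle point if it does not make a strictly convex turn
            if (y2 - y1) * (p[0] - x1) >= (p[1] - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(p)
    return np.array([v[0] for v in hull]), np.array([v[1] for v in hull])

p0 = 0.25                                            # crossover probability of the DSBS
D_grid = np.linspace(0.0, p0, 400, endpoint=False)
g = h(D_grid * (1 - p0) + (1 - D_grid) * p0) - h(D_grid)   # h(p0 * D) - h(D)
xs = np.append(D_grid, p0)                           # add the point (p0, 0)
ys = np.append(g, 0.0)
hx, hy = lower_convex_envelope(xs, ys)

for D in [0.0, 0.05, 0.1, 0.2, p0]:
    R = np.interp(D, hx, hy)                         # piecewise-linear value on the envelope
    print(f"D={D:.2f}  R_WZ(D) ~= {R:.3f} bits")
```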
The main results of this paper are summarized in Table 1. For the WZ system, we prove that if $(R, D)$ is outside the rate distortion region $\mathcal{R}_{\mathrm{WZ}}(p_{XY})$, then, for any sequence $\{(\varphi^{(n)}, \psi^{(n)})\}_{n \ge 1}$ of encoder and decoder functions satisfying the condition in Equation (2), the correct probability of decoding goes to zero exponentially, and we derive an explicit lower bound on this exponent. This result corresponds to Theorem 3 in Table 1. As a corollary of this theorem, we obtain the strong converse result, which is stated as Corollary 2 in Table 1. This result states that we have an outer bound with an asymptotically vanishing gap from the rate distortion region $\mathcal{R}_{\mathrm{WZ}}(p_{XY})$.
To derive our result, we use a new method called the recursive method. This method is a general and powerful tool for proving strong converse theorems for several coding problems in information theory. In fact, the recursive method plays an important role in deriving the exponential strong converse exponents for the communication systems treated in [5,6,7,8].
  2. Source Coding with Side Information at the Decoder
In the following argument, the operations $\mathrm{E}_p$ and $\mathrm{Var}_p$, respectively, stand for the expectation and the variance with respect to a probability distribution $p$. When the value of $p$ is obvious from the context, we omit the suffix $p$ in those operations and simply write $\mathrm{E}$ and $\mathrm{Var}$.
Let $\mathcal{X}$ and $\mathcal{Y}$ be finite sets and let $\{(X_t, Y_t)\}_{t=1}^{\infty}$ be a stationary discrete memoryless source. For each $t = 1, 2, \ldots$, the random pair $(X_t, Y_t)$ takes values in $\mathcal{X} \times \mathcal{Y}$ and has a probability distribution
$$ p_{XY} = \{ p_{XY}(x, y) \}_{(x, y) \in \mathcal{X} \times \mathcal{Y}}. $$
We write $n$ independent copies of $X$ and $Y$, respectively, as
$$ X^n = X_1 X_2 \cdots X_n \quad \text{and} \quad Y^n = Y_1 Y_2 \cdots Y_n. $$
We consider the communication system depicted in Figure 2. The data sequence $X^n$ is encoded to $\varphi^{(n)}(X^n)$ and sent to the information processing center. At the center, the decoder function $\psi^{(n)}$ observes $\varphi^{(n)}(X^n)$ and the side information $Y^n$ to output the estimation $\hat{X}^n$ of $X^n$. The encoder function $\varphi^{(n)}$ is defined by
Let $\hat{\mathcal{X}}$ be a reproduction alphabet. The decoder function $\psi^{(n)}$ is defined by
$$ \psi^{(n)} : \varphi^{(n)}(\mathcal{X}^n) \times \mathcal{Y}^n \to \hat{\mathcal{X}}^n. $$
Let $d(\cdot, \cdot)$ be an arbitrary distortion measure on $\mathcal{X} \times \hat{\mathcal{X}}$. The distortion between $x^n$ and $\hat{x}^n$ is defined by
$$ d(x^n, \hat{x}^n) := \sum_{t=1}^{n} d(x_t, \hat{x}_t). $$
The excess-distortion probability of decoding is
$$ \Delta^{(n)} := \Pr\left\{ \frac{1}{n} d(X^n, \hat{X}^n) > D \right\}, $$
where $\hat{X}^n = \psi^{(n)}(\varphi^{(n)}(X^n), Y^n)$. The average distortion between $X^n$ and $\hat{X}^n$ is defined by
$$ \frac{1}{n} \mathrm{E}\left[ d(X^n, \hat{X}^n) \right]. $$
In the previous section, we gave the formal definitions of the rate distortion regions for the WZ system and for the DMS without side information. We can show that these rate distortion regions satisfy the following property.
Property 1. - (a) 
- The regions , , , and  are closed convex sets of , where 
- (b) 
-  has another form using the -rate distortion region, the definition of which is as follows. We set
which is called the -rate distortion region. Using ,  can be expressed as
where  stands for the closure operation. 
 It is well known that 
 was determined by Wyner and Ziv [
1]. To describe their result we introduce auxiliary random variables 
U and 
Z, respectively, taking values in finite sets 
 and 
. We assume that the joint distribution of 
 is
The above condition is equivalent to
Define the set of probability distributions 
 by
By definitions, it is obvious that 
. Set
We can show that the above functions and sets satisfy the following property:
Property 2. - (a) 
- The region  is a closed convex set of . 
- (b) 
 Proof of Property 2 is given in 
Appendix C. In Property 2 Part (b), 
 is regarded as another expression of 
. This expression is useful for deriving our main result. The rate region 
 was determined by Wyner and Ziv [
1]. Their result is the following:
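Their result (Theorem 1) is the classical single-letter characterization: the rate distortion function is obtained by minimizing I(X;U|Y) = I(X;U) - I(Y;U) over auxiliary random variables U satisfying the Markov chain U - X - Y, together with a reproduction function on the product of the U and Y alphabets meeting the distortion constraint. As a sketch of how the quantities entering this characterization are evaluated, the snippet below computes I(X;U|Y) and the expected distortion for a toy joint distribution, test channel, and reproduction function; all numerical inputs are illustrative.

```python
import numpy as np

def mutual_info(p_ab):
    """I(A;B) in bits for a joint pmf given as a 2-D array."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))

# Toy joint source p_XY (binary X, binary Y), test channel p_{U|X},
# and a reproduction function f: (u, y) -> xhat. All values are illustrative.
p_xy = np.array([[0.35, 0.15],
                 [0.15, 0.35]])            # P(X=x, Y=y)
p_u_given_x = np.array([[0.9, 0.1],
                        [0.1, 0.9]])       # P(U=u | X=x), so U - X - Y holds
f = lambda u, y: u                         # reproduction that ignores y here
d = lambda x, xhat: float(x != xhat)       # Hamming distortion

# Joint p(x, y, u) = p(x, y) p(u | x) under the Markov chain U - X - Y.
p_xyu = p_xy[:, :, None] * p_u_given_x[:, None, :]

p_xu = p_xyu.sum(axis=1)                   # joint of (X, U)
p_yu = p_xyu.sum(axis=0)                   # joint of (Y, U)
rate = mutual_info(p_xu) - mutual_info(p_yu)   # I(X;U) - I(Y;U) = I(X;U|Y)

avg_dist = sum(p_xyu[x, y, u] * d(x, f(u, y))
               for x in range(2) for y in range(2) for u in range(2))

print(f"I(X;U|Y) = {rate:.4f} bits,  E[d(X, f(U,Y))] = {avg_dist:.4f}")
```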
On 
, Csiszár and Körner [
4] obtained the following result.
Theorem 2  (Csiszár and Körner [
4])
. We are interested in the asymptotic behavior of the error probability of decoding, which tends to one as $n \to \infty$ for $(R, D)$ outside the rate distortion region. To examine the rate of convergence, we define the following quantity. Set
By time sharing, we have that
Choosing the time-sharing parameters appropriately in Equation (7), we obtain the following subadditivity property:
which, together with Fekete's lemma, yields that the limit exists and satisfies the following:
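The subadditivity argument above is an application of Fekete's lemma: if a_{n+m} <= a_n + a_m for all n and m, then a_n / n converges to inf_n a_n / n. The sketch below checks this behavior numerically for an artificial subadditive sequence; the sequence itself is, of course, only an illustration.

```python
import math

# An artificial subadditive sequence: a_n = 2n + sqrt(n) satisfies
# a_{n+m} <= a_n + a_m because sqrt(n+m) <= sqrt(n) + sqrt(m).
a = lambda n: 2.0 * n + math.sqrt(n)

# Sanity-check subadditivity on a small range.
assert all(a(n + m) <= a(n) + a(m) + 1e-12
           for n in range(1, 50) for m in range(1, 50))

# Fekete's lemma: a_n / n converges to inf_n a_n / n (here the limit is 2).
for n in [1, 10, 100, 10_000, 1_000_000]:
    print(f"n={n:>9}  a_n/n = {a(n) / n:.6f}")
print("minimum over n <= 10^5:", min(a(n) / n for n in range(1, 100_001)))
```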
The exponent function 
 is a convex function of 
. In fact, from Equation (
7), we have that for any 
where 
. The region 
 is also a closed convex set. Our main aim is to find an explicit characterization of 
. In this paper, we derive an explicit outer bound of 
  whose section by the plane 
 coincides with 
.
  3. Main Results
In this section, we state our main results. We first explain that the rate distortion region 
 can be expressed with two families of supporting hyperplanes. To describe this result, we define two sets of probability distributions on 
 by
Then, we have the following property:
Proof of Property 3 is given in 
Appendix D. For 
 and 
, define
We next define a function serving as a lower bound of 
. For each 
, define
We can show that the above functions satisfy the following properties:
Property 4. - (a) 
- The cardinality bound  appearing in  is sufficient to describe the quantity . Furthermore, the cardinality bound  in  is sufficient to describe the quantity . 
- (b) 
- For any , we have 
- (c) 
- Fix any  and . For ,  exists and is nonnegative. For , define a probability distribution  by
Then, for ,  is twice differentiable. Furthermore, for , we have
The second equality implies that  is a concave function of . 
- (d) 
- For , define
and set
Then, we have . Furthermore, for any , 
- (e) 
- For every , the condition  implies
where g is the inverse function of . 
 Proof of Property 4 Part (a) is given in 
Appendix B. Proof of Property 4 Part (b) is given in 
Appendix E. Proofs of Property 4 Parts (c), (d), and (e) are given in 
Appendix F.
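Property 4 Part (c) defines a tilted probability distribution and asserts twice differentiability together with a concavity property of the associated exponent function. Such statements typically rest on the standard fact about cumulant generating functions illustrated below: for Λ(μ) = log E[exp(μA)], the first derivative equals the mean of A under the tilted law and the second derivative equals its variance, so Λ is convex (and -Λ is concave). The random variable, the value of μ, and the finite-difference check are all illustrative; the exact formulas used in the paper are those given in Appendix F.

```python
import numpy as np

# A finite-valued random variable A with pmf p (illustrative values).
a_vals = np.array([-1.0, 0.0, 0.5, 2.0])
p = np.array([0.2, 0.3, 0.3, 0.2])

def Lambda(mu):
    """Cumulant generating function Lambda(mu) = log E[exp(mu * A)]."""
    return np.log(np.sum(p * np.exp(mu * a_vals)))

def tilted(mu):
    """Exponentially tilted pmf p^(mu)(a), proportional to p(a) exp(mu * a)."""
    w = p * np.exp(mu * a_vals)
    return w / w.sum()

mu = 0.7
q = tilted(mu)
tilted_mean = np.sum(q * a_vals)
tilted_var = np.sum(q * a_vals**2) - tilted_mean**2

# Numerical derivatives of Lambda for comparison.
h = 1e-4
d1 = (Lambda(mu + h) - Lambda(mu - h)) / (2 * h)
d2 = (Lambda(mu + h) - 2 * Lambda(mu) + Lambda(mu - h)) / h**2

print(f"Lambda'(mu)  = {d1:.6f}   tilted mean     = {tilted_mean:.6f}")
print(f"Lambda''(mu) = {d2:.6f}   tilted variance = {tilted_var:.6f}  (>= 0)")
```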
Our main result is the following:
Theorem 3. For any , any , and for any  satisfying  we have  It follows from Theorem 3 and Property 4 Part (d) that if  is outside the rate distortion region, then the error probability of decoding goes to one exponentially and its exponent is not below .
It immediately follows from Theorem 3 that we have the following corollary.
Corollary 1. For any  and any , we have
Furthermore, for any , we have  Proof of Theorem 3 will be given in the next section. The exponent function in the case of 
 can be obtained as a corollary of the result of Oohama and Han [
9] for the separate source coding problem of correlated sources [
10]. The technique used by them is the method of types [4], which is not useful for proving Theorem 3. In fact, when we use this method, it is very hard to extract a condition related to the Markov chain condition 
, which the auxiliary random variable 
 must satisfy when 
 is on the boundary of the set 
. Some novel techniques based on the information spectrum method introduced by Han [
11] are necessary to prove this theorem.
From Theorem 3 and Property 4 Part (e), we can obtain an explicit outer bound of 
 with an asymptotically vanishing deviation from 
 . The strong converse theorem immediately follows from this corollary. To describe this outer bound, for 
, we set
which serves as an outer bound of 
. For each fixed 
, we define 
 by
Step (a) follows from . Since  as , there exists a smallest positive integer  such that  for . From Theorem 3 and Property 4 Part (e), we have the following corollary.
Corollary 2. For each fixed ε, we choose the above positive integer . Then, for any , we have  The above result, together with
yields that, for each fixed , we have  Proof of this corollary will be given in the next section.
The direct part of the coding theorem, i.e., the inclusion ⊆, was established by Csiszár and Körner [4]. They proved a weak converse theorem to obtain the inclusion . Until now, we have had no result on the strong converse theorem. The above corollary, which states the strong converse theorem for the Wyner–Ziv source coding problem, implies that a long-standing open problem since Csiszár and Körner [4] has been resolved.
  4. Proof of the Main Results
In this section, we prove Theorem 3 and Corollary 2. We first present a lemma which upper bounds the correct probability of decoding by the information spectrum quantities. We set
Then, we have the following:
Lemma 1. For any  and for any ,  satisfying  we have  The probability distribution and stochastic matrices appearing in the right members of Equation (18) have the property that we can select them arbitrarily. In Equation (14), we can choose any probability distribution  on . In Equation (15), we can choose any stochastic matrix . In Equation (16), we can choose any stochastic matrix . In Equation (17), we can choose any stochastic matrix .
Lemma 2. Suppose that, for each , the joint distribution  of the random vector  is a marginal distribution of . Then, for , we have the following Markov chain:
or equivalently that .  Proof of this lemma is given in Appendix H. For , set . Let  be a random vector taking values in  ×  × . From Lemmas 1 and 2, we have the following:
Lemma 3. For any  and for any ,  satisfying  we have the following:
where for each , the following probability distribution and stochastic matrices:
appearing in the first term in the right members of Equation (21) have the property that we can choose their values arbitrarily.  Proof.  On the probability distributions appearing in the right members of Equation (18), we take the following choices. In Equation (14), we choose 
 so that
In Equation (15), we choose 
 so that
In Equation (16), we choose 
 so that
In Equation (16), we note that
Step (a) follows from Lemma 2. In Equation (17), we choose 
 so that
From Lemma 1 and Equations (21)–(25), we have the bound of Equation (21) in Lemma 3. ☐
To evaluate an upper bound of Equation (21) in Lemma 3, we use the following lemma, which is well known as Cramér's bound in the large deviation principle.
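As a quick numerical sanity check of this bound (in its standard form, Pr{A >= a} <= exp(-λa) E[exp(λA)] for every λ >= 0), the sketch below compares both sides by Monte Carlo for a sum of i.i.d. variables; the distribution, the threshold a, and the λ values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# A = sum of n i.i.d. +/-1 steps with P(+1) = 0.4 (illustrative choice).
n = 50
samples = rng.choice([-1.0, 1.0], size=(100_000, n), p=[0.6, 0.4]).sum(axis=1)

a = 5.0                                     # threshold
lhs = float(np.mean(samples >= a))          # Monte Carlo estimate of Pr{A >= a}

def mgf(lam):
    """E[exp(lam * A)] computed exactly from the per-step distribution."""
    single = 0.6 * np.exp(-lam) + 0.4 * np.exp(lam)
    return single ** n

print(f"Pr{{A >= {a}}} ~ {lhs:.5f}")
for lam in [0.0, 0.1, 0.2, 0.4]:
    rhs = float(np.exp(-lam * a) * mgf(lam))
    print(f"lambda={lam:.1f}:  Chernoff bound = {rhs:.5f}  (>= {lhs:.5f})")
```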
Lemma 4. For any real valued random variable A and any , we have  Here, we define a quantity which serves as an exponential upper bound of 
 . For each 
, let 
 be a set of all
Let 
 be a set of all probability distributions 
 on 
 having the form:
For simplicity of notation, we use the notation 
 for 
 . We assume that 
 is a marginal distribution of 
. For 
, we simply write 
. For 
  and 
, we define
where, for each 
, the following probability distribution and stochastic matrices:
appearing in the definition of 
 are chosen so that they are induced by the joint distribution 
.
By Lemmas 3 and 4, we have the following proposition:
Proposition 1. For any  , any , and any  satisfying  we have  Proof.  When 
, the bound we wish to prove is obvious. In the following argument, we assume that 
  We define five random variables 
 by
By Lemma 3, for any 
 satisfying 
 we have
where we set
Applying Lemma 4 to the first term in the right member of Equation (26), we have
Solving Equation (28) with respect to 
, we have
For this choice of 
 and Equation (27), we have
completing the proof. ☐
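The step of solving Equation (28) with respect to λ is, in spirit, the usual optimization of the Chernoff parameter: the best exponent obtainable from Lemma 4 is the supremum over λ >= 0 of λa - log E[exp(λA)]. The sketch below carries out this optimization by a simple grid search for a toy random variable; it illustrates only the mechanics of choosing λ, not the specific choice made in the proof above.

```python
import numpy as np

# Toy random variable A (pmf over a few values); illustrative only.
a_vals = np.array([-2.0, -1.0, 0.0, 1.0, 3.0])
p = np.array([0.1, 0.3, 0.3, 0.2, 0.1])

def log_mgf(lam):
    """log E[exp(lam * A)]."""
    return np.log(np.sum(p * np.exp(lam * a_vals)))

a = 1.5                                    # threshold above E[A] = 0
lams = np.linspace(0.0, 10.0, 10_001)      # grid search over the Chernoff parameter
exponents = lams * a - np.array([log_mgf(l) for l in lams])
i = int(np.argmax(exponents))
lam_star, exponent = float(lams[i]), float(exponents[i])

print(f"optimal lambda ~= {lam_star:.4f}")
print(f"best exponent  sup_lam [lam*a - log E exp(lam*A)] ~= {exponent:.4f}")
print(f"resulting bound: Pr{{A >= {a}}} <= {np.exp(-exponent):.4f}")
```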
 By Proposition 1, we have the following corollary.
Corollary 3. For any , for any , and for any  satisfying  we have  We shall call  the communication potential. The above corollary implies that the analysis of  leads to the establishment of a strong converse theorem for the Wyner–Ziv source coding problem. In the following argument, we derive an explicit lower bound of . We use a new technique which we call the recursive method. The recursive method is a powerful tool to derive a single-letterized exponent function for rates below the rate distortion function. This method is also applicable to prove exponential strong converse theorems for other network information theory problems [5,6,7]. Set
For each 
, define a function of 
     by
For each 
, we define the conditional probability distribution
by
where
are constants for normalization. For 
, define
where we define 
 for 
 Then, we have the following lemma:
Lemma 5. For each , and for any    , we have  The equality in Equation (34) in Lemma 5 is obvious from Equations (29)–(31). Proofs of Equations (32) and (33) in this lemma are given in 
Appendix I. Next, we define a probability distribution of the random pair 
 taking values in 
 by
where 
 is a constant for normalization given by
For 
, define
where we define 
. Set
Then, we have the following:
Proof.  By the equality Equation (34) in Lemma 5, we have
Step (a) follows from the definition in Equation (36) of 
We next prove Equation (39) in Lemma 6. Multiplying both sides of Equation (35) by , we have
Taking summations of Equations (41) and (42) with respect to , we have
Step (a) follows from Equation (33) in Lemma 5. Step (b) follows from the definition in Equation (37) of . ☐
The following proposition is the mathematical core for proving our main result.
Proposition 2. For , we choose the parameter α such that
Then, for any  and for any , we have  Proof.  By Lemma 6, we have
For each 
, we recursively choose 
 so that 
 and choose 
, 
, 
, and 
 appearing in
such that they are the distributions induced by 
. Then, for each t = 1, ⋯, n, we have the following chain of inequalities:
Step (a) follows from Hölder’s inequality and the following identity:
Step (b) follows from Equation (43). Step (c) follows from the definition of . Step (d) follows from the fact that, by Property 4 Part (a), the bound  is sufficient to describe . Hence, we have the following:
Step (a) follows from Equation (38) in Lemma 6. Step (b) follows from Equation (45). Since Equation (46) holds for any 
 and any 
, we have
Thus, we have Equation (44) in Proposition 2. ☐
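Step (a) in the chain above relies on Hölder's inequality, which in its finite-sum form reads Σ|u_i v_i| <= (Σ|u_i|^p)^{1/p} (Σ|v_i|^q)^{1/q} with 1/p + 1/q = 1. The snippet below checks this inequality numerically for random vectors and several exponent pairs; it is a sanity check of the inequality itself, not of its specific application in the proof.

```python
import numpy as np

rng = np.random.default_rng(2)

def holder_gap(u, v, p):
    """Return (RHS - LHS) of Hölder's inequality for the exponent pair (p, q)."""
    q = p / (p - 1.0)                      # conjugate exponent, 1/p + 1/q = 1
    lhs = np.sum(np.abs(u * v))
    rhs = np.sum(np.abs(u) ** p) ** (1 / p) * np.sum(np.abs(v) ** q) ** (1 / q)
    return rhs - lhs

for p in [1.5, 2.0, 3.0, 10.0]:
    gaps = [holder_gap(rng.normal(size=100), rng.normal(size=100), p)
            for _ in range(1_000)]
    print(f"p={p:4.1f}: min gap over 1000 random trials = {min(gaps):.6f}  (>= 0)")
```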
Proof of Theorem 3: We have the following:
Step (a) follows from Corollary 3. Step (b) follows from Proposition 2 and Equation (47). Since the above bound holds for any positive 
, 
 and 
, we have
Thus, Equation (10) in Theorem 3 is proved. ☐
Proof of Corollary 2: Since g is the inverse function of 
, the definition in Equation (
13) of 
 is equivalent to
By the definition of 
, we have that 
 for 
. We assume that for 
, 
 Then, there exists a sequence 
  such that for 
, we have
Then, by Theorem 3, we have
for any 
. We claim that for 
, we have 
 ∈ 
. To prove this claim, we suppose that 
 does not belong to 
 for some 
. Then, we have the following chain of inequalities:
Step (a) follows from 
 and Property 4 Part (e). Step (b) follows from Equation (48). The bound of Equation (50) contradicts Equation (49). Hence, we have 
 ∈ 
or, equivalently,
for 
, which implies that for 
,
completing the proof. ☐