Article

Rate-Distortion Region of a Gray–Wyner Model with Side Information

1 Department of Electronics, Optronics and Signal Processing (DEOS), Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-Supaéro), 31400 Toulouse, France
2 Mathematics and Algorithmic Sciences Lab, Huawei Technologies France, 92100 Boulogne-Billancourt, France
* Author to whom correspondence should be addressed.
Current address: Institut d'Électronique et d'Informatique Gaspard-Monge, Université Paris-Est, 77454 Champs-sur-Marne, France.
Entropy 2018, 20(1), 2; https://doi.org/10.3390/e20010002
Received: 29 November 2017 / Revised: 13 December 2017 / Accepted: 15 December 2017 / Published: 22 December 2017
(This article belongs to the Special Issue Rate-Distortion Theory and Information Theory)

Abstract

In this work, we establish a complete single-letter characterization of the rate-distortion region of an instance of the Gray–Wyner model with side information at the decoders. Specifically, in this model, an encoder observes a pair of memoryless, arbitrarily correlated sources $(S_1^n, S_2^n)$ and communicates with two receivers over an error-free rate-limited common link of capacity $R_0$, as well as error-free rate-limited individual links of capacities $R_1$ to the first receiver and $R_2$ to the second receiver. Both receivers reproduce the source component $S_2^n$ losslessly, and Receiver 1 also reproduces the source component $S_1^n$ lossily, to within some prescribed fidelity level $D_1$. In addition, Receiver 1 and Receiver 2 are equipped, respectively, with memoryless side information sequences $Y_1^n$ and $Y_2^n$. Importantly, the side information sequences are arbitrarily correlated with each other and with the source pair $(S_1^n, S_2^n)$, and are not assumed to exhibit any particular ordering. Furthermore, by specializing the main result to two Heegard–Berger models, one with successive refinement and one with scalable coding, we shed light on the roles of the common and private descriptions that the encoder should produce and on the role of each of the common and private links. We develop intuitions by analyzing the developed single-letter rate-distortion regions of these models, and discuss some insightful binary examples.
Keywords: rate-distortion; Gray–Wyner; side-information; Heegard–Berger; successive refinement

1. Introduction

The Gray–Wyner source coding problem was originally formulated, and solved, by Gray and Wyner in [1]. In their original setting, a pair of arbitrarily correlated memoryless sources $(S_1^n, S_2^n)$ is to be encoded and transmitted to two receivers, each connected to the encoder through a common error-free rate-limited link as well as a private error-free rate-limited link. Because the links are rate-limited, the encoder produces a compressed bit string $W_0$ of rate $R_0$ that it transmits over the common link, and two compressed bit strings, $W_1$ of rate $R_1$ and $W_2$ of rate $R_2$, that it transmits over the private links, each to its respective receiver. The first receiver uses the bit strings $W_0$ and $W_1$ to reproduce an estimate $\hat{S}_1^n$ of the source component $S_1^n$ to within some prescribed distortion level $D_1$, for some distortion measure $d_1(\cdot,\cdot)$. Similarly, the second receiver uses the bit strings $W_0$ and $W_2$ to reproduce an estimate $\hat{S}_2^n$ of the source component $S_2^n$ to within some prescribed distortion level $D_2$, for some possibly different distortion measure $d_2(\cdot,\cdot)$. In [1], Gray and Wyner characterized the set of optimal achievable rate triples $(R_0, R_1, R_2)$ and distortion pairs $(D_1, D_2)$.
Figure 1 shows a generalization of the original Gray–Wyner model in which the receivers also observe correlated memoryless side information sequences, $Y_1^n$ at Receiver 1 and $Y_2^n$ at Receiver 2. Some special cases of the Gray–Wyner model with side information of Figure 1 have been solved (see Section 1.2 below). However, in its most general form, i.e., when the side information sequences are arbitrarily correlated with each other and with the sources, this problem has so far eluded a single-letter characterization of the optimal rate-distortion region. Indeed, the Gray–Wyner problem with side information subsumes the well-known Heegard–Berger problem [2], obtained by setting $R_1 = R_2 = 0$ in Figure 1, which remains, to date, an open problem.
In this paper, we study an instance of the Gray–Wyner model with side information of Figure 1 in which the reconstruction sets are degraded, meaning that both receivers reproduce the source component $S_2^n$ losslessly and Receiver 1 also wants to reproduce the source component $S_1^n$ lossily, to within some prescribed distortion level $D_1$. It is important to note that, while the reconstruction sets are nested, and hence degraded, no specific ordering is imposed on the side information sequences, which can thus be arbitrarily correlated with each other and with the sources $(S_1^n, S_2^n)$.
As in Gray and Wyner's original coding scheme, the encoder produces a common description of the source pair $(S_1^n, S_2^n)$ that is intended to be recovered by both receivers, as well as individual, or private, descriptions of $(S_1^n, S_2^n)$, each destined to be recovered by a distinct receiver. Because the side information sequences do not exhibit any specific ordering, the choice of the information that each description should carry, and of the links over which each is transmitted to its intended receiver, are challenging questions that we answer in this work.
To build an understanding of the role of each of the links and descriptions in the optimal coding scheme for the setting of Figure 2, we also investigate two important underlying problems, namely Heegard–Berger-type models with refinement links, as shown in Figure 3. In both models, only one of the two individual refinement links has non-zero rate.
In the model of Figure 3a, the receiver that accesses the additional rate-limited link (i.e., Receiver 1) is also required to reproduce a lossy estimate of the source component $S_1^n$, in addition to the source component $S_2^n$, which is to be reproduced losslessly by both receivers. We will refer to this model as a "Heegard–Berger problem with successive refinement". Reminiscent of successive refinement source coding, this model may be appropriate for applications in which a description of only some components (e.g., $S_2^n$) of the source suffices at the first use of the data, and descriptions of the remaining components (e.g., $S_1^n$) are needed only at a later stage.
The model of Figure 3b has the individual rate-limited link connected to the receiver that is required to reproduce only the source component $S_2^n$. We will refer to this model as a "Heegard–Berger problem with scalable coding", reusing a term that was introduced in [3] for a similar scenario. The name refers to the fact that User 1 may have side information of such good quality that only a minimal amount of information from the encoder suffices; thus, so as not to constrain the communication to User 2, which has the lower-quality side information, an additional rate-limited link of rate $R_2$ is added to balance the decoding capabilities of the two users.

1.1. Main Contributions

The main result of this paper is a single-letter characterization of the optimal rate-distortion region of the Gray–Wyner model with side information and degraded reconstruction sets of Figure 2. To this end, we derive a converse proof that is tailored specifically to the model with degraded reconstruction sets studied here. For the proof of the direct part, we develop a coding scheme that is very similar to one developed in the context of coding for broadcast channels with feedback in [4], but with an appropriate choice of the variables, which we specify here. Specializing the main result to the Heegard–Berger models with successive refinement and with scalable coding of Figure 3 sheds light on the roles of the common and private descriptions and on what they should optimally carry. We develop intuitions by analyzing the established single-letter optimal rate-distortion regions of these two models, and illustrate our discussion through some binary examples.

1.2. Related Works

In [4], Shayevitz and Wigger study a two-receiver discrete memoryless broadcast channel with feedback. They develop an efficient coding scheme that treats the feedback signal as a source to be conveyed lossily to the receivers, through a block-Markov coding scheme, in order to refine their estimates of the messages. In doing so, the users' channel outputs are regarded as side information sequences; thus, the scheme clearly connects with the Gray–Wyner model with side information of Figure 1, as is made explicit in [4]. The Gray–Wyner model with side information for which Shayevitz and Wigger develop a (source) coding scheme, as part of their study of the broadcast channel with feedback, assumes general, possibly distinct, distortion measures at the receivers (i.e., not necessarily nested reconstruction sets) and side information sequences that are arbitrarily correlated with each other and with the source. In this paper, we show that, when specialized to the model with degraded reconstruction sets of Figure 2 studied here, Shayevitz and Wigger's coding scheme for the Gray–Wyner model with side information of [4] yields a rate-distortion region that meets the converse result we establish, and is thus optimal.
The Gray–Wyner model with side information generalizes another long-standing open source coding problem, the famous Heegard–Berger problem [2]. A full single-letter characterization of the optimal rate-distortion function of the Heegard–Berger problem is known only in a few specific cases, the most important of which are: (i) stochastically degraded side information sequences [2] (see also [5]); (ii) Sgarro's result [6] on the corresponding lossless problem; (iii) Gaussian sources with quadratic distortion measure [3,7]; (iv) some instances of conditionally less-noisy side information sequences [8]; and (v) the recently solved Heegard–Berger model with general side information sequences and degraded reconstruction sets [9], i.e., the model of Figure 2 with $R_1 = R_2 = 0$. In the lossless case, a few other optimal results have been shown, such as for the so-called complementary delivery problem [10]. A lower bound for general instances of the rate-distortion problem with side information at multiple decoders, inspired by a linear-programming lower bound for index coding, has been developed recently by Unal and Wagner in [11].
Successive refinement of information was investigated by Equitz and Cover in [12], wherein the description of the source is successively refined for a collection of receivers that are required to reconstruct the source with increasing quality levels. The extension of successive refinement to cases in which the receivers observe side information sequences was first investigated by Steinberg and Merhav in [13], who establish the optimal rate-distortion region under the assumption that the receiver that accesses the refinement link, say Receiver 1, also observes a better side information sequence than the other user, i.e., the Markov chain $S \to Y_1 \to Y_2$ holds. Tian and Diggavi give in [7] an equivalent formulation of the result of [13] and extend it to the $N$-stage successive refinement setting. In [3], Tian and Diggavi investigate another setting, coined "side information scalable coding", in which it is instead the receiver that accesses the refinement link, say Receiver 2, that observes the lower-quality side information sequence, i.e., $S \to Y_1 \to Y_2$. Balancing refinement quality and side information asymmetry for such a side-information scalable source coding problem allows the authors of [3] to derive the rate-distortion region in the degraded side information case. The previous results on successive refinement in the presence of side information, which were generalized by Timo et al. in [14], all assume, however, a specific structure on the side information sequences.

1.3. Outline

An outline of the remainder of this paper is as follows. Section 2 formally describes the Gray–Wyner model with side information and degraded reconstruction sets of Figure 2 that we study in this paper. Section 3 contains the main result of this paper, a full single-letter characterization of the rate-distortion region of the model of Figure 2, together with some useful discussions and connections. A formal proof of the direct and converse parts of this result appears in Section 6. In Section 4 and Section 5, we specialize the result, respectively, to the Heegard–Berger model with successive refinement of Figure 3a and the Heegard–Berger model with scalable coding of Figure 3b. These sections also contain insightful discussions illustrated by some binary examples.

Notation

Throughout the paper, we use the following notation. The term pmf stands for probability mass function. Upper case letters denote random variables, e.g., $X$; lower case letters denote realizations of random variables, e.g., $x$; and calligraphic letters designate alphabets, e.g., $\mathcal{X}$. Vectors of length $n$ are denoted by $X^n = (X_1, \ldots, X_n)$, and $X_i^j$ denotes the sequence $(X_i, \ldots, X_j)$, whereas $X_{<i>} \triangleq (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$. The probability distribution of a random variable $X$ is denoted by $P_X(x) \triangleq P(X = x)$; sometimes, for convenience, we write it as $P_X$. We use the notation $\mathbb{E}(X)$ to denote the expectation of a random variable $X$. The conditional probability distribution of a random variable $Y$ given $X$ is denoted by $P_{Y|X}$. The set of probability distributions defined on an alphabet $\mathcal{X}$ is denoted by $\mathcal{P}(\mathcal{X})$. The cardinality of a set $\mathcal{X}$ is denoted by $|\mathcal{X}|$. For random variables $X$, $Y$ and $Z$, the notation $X \to Y \to Z$ indicates that $X$, $Y$ and $Z$, in this order, form a Markov chain, i.e., $P_{XYZ}(x,y,z) = P_Y(y)\,P_{X|Y}(x|y)\,P_{Z|Y}(z|y)$. The set $\mathcal{T}_{[X]}^{(n)}$ denotes the set of sequences strongly typical with respect to the probability distribution $P_X$, and the set $\mathcal{T}_{[X|y^n]}^{(n)}$ denotes the set of sequences $x^n$ jointly typical with $y^n$ with respect to the joint pmf $P_{XY}$. Throughout this paper, we use $h_2(\alpha)$ to denote the entropy of a Bernoulli$(\alpha)$ random variable, i.e., $h_2(\alpha) = -\alpha \log(\alpha) - (1-\alpha)\log(1-\alpha)$. In addition, the indicator function is denoted by $\mathbb{1}(\cdot)$. For real-valued scalars $a$ and $b$, with $a \le b$, the notation $[a,b]$ means the set of real numbers between $a$ and $b$. For integers $i \le j$, $[i:j]$ denotes the set of integers between $i$ and $j$, i.e., $[i:j] = \{i, i+1, \ldots, j\}$. Finally, throughout the paper, logarithms are taken to base 2.
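The binary entropy function $h_2(\alpha)$ defined above can be sketched as a small helper (ours, for illustration only; it follows the convention $0 \log 0 = 0$):

```python
from math import log2

def h2(a: float) -> float:
    """Binary entropy h2(a) = -a*log2(a) - (1-a)*log2(1-a), in bits."""
    if a <= 0.0 or a >= 1.0:
        return 0.0  # convention: 0*log(0) = 0
    return -a * log2(a) - (1 - a) * log2(1 - a)
```

For instance, $h_2(1/2) = 1$ bit, and $h_2$ is symmetric about $\alpha = 1/2$.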

2. Problem Setup and Formal Definitions

Consider the Gray–Wyner source coding model with side information and degraded reconstruction sets shown in Figure 2. Let $(\mathcal{S}_1 \times \mathcal{S}_2 \times \mathcal{Y}_1 \times \mathcal{Y}_2,\; P_{S_1 S_2 Y_1 Y_2})$ be a discrete memoryless vector source with generic variables $S_1$, $S_2$, $Y_1$ and $Y_2$. In addition, let $\hat{\mathcal{S}}_1$ be a reconstruction alphabet and $d_1$ a distortion measure defined as:
$$d_1:\; \mathcal{S}_1 \times \hat{\mathcal{S}}_1 \to \mathbb{R}_+,\qquad (s_1, \hat{s}_1) \mapsto d_1(s_1, \hat{s}_1).$$
Definition 1.
An $(n, M_{0,n}, M_{1,n}, M_{2,n}, D_1)$ code for the Gray–Wyner source coding model with side information and degraded reconstruction sets of Figure 2 consists of:
-
Three sets of messages $\mathcal{W}_0 \triangleq [1:M_{0,n}]$, $\mathcal{W}_1 \triangleq [1:M_{1,n}]$, and $\mathcal{W}_2 \triangleq [1:M_{2,n}]$.
-
Three encoding functions, $f_0$, $f_1$ and $f_2$, defined, for $j \in \{0,1,2\}$, as
$$f_j:\; \mathcal{S}_1^n \times \mathcal{S}_2^n \to \mathcal{W}_j,\qquad (S_1^n, S_2^n) \mapsto W_j = f_j(S_1^n, S_2^n).$$
-
Two decoding functions $g_1$ and $g_2$, one at each user:
$$g_1:\; \mathcal{W}_0 \times \mathcal{W}_1 \times \mathcal{Y}_1^n \to \hat{\mathcal{S}}_2^n \times \hat{\mathcal{S}}_1^n,\qquad (W_0, W_1, Y_1^n) \mapsto (\hat{S}_{2,1}^n, \hat{S}_1^n) = g_1(W_0, W_1, Y_1^n),$$
and
$$g_2:\; \mathcal{W}_0 \times \mathcal{W}_2 \times \mathcal{Y}_2^n \to \hat{\mathcal{S}}_2^n,\qquad (W_0, W_2, Y_2^n) \mapsto \hat{S}_{2,2}^n = g_2(W_0, W_2, Y_2^n).$$
The expected distortion of this code is given by
$$\mathbb{E}\, d_1^{(n)}(S_1^n, \hat{S}_1^n) \triangleq \mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^{n} d_1(S_{1,i}, \hat{S}_{1,i})\Big].$$
The probability of error is defined as
$$P_e^{(n)} \triangleq P\big(\hat{S}_{2,1}^n \ne S_2^n \;\text{ or }\; \hat{S}_{2,2}^n \ne S_2^n\big).$$
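The expected distortion above is an empirical per-letter average of the single-letter measure $d_1$. A minimal sketch (helper names are ours, not from the paper), using the Hamming measure as an example:

```python
def avg_distortion(s1, s1_hat, d1):
    """Per-letter average distortion (1/n) * sum_i d1(s1[i], s1_hat[i])."""
    assert len(s1) == len(s1_hat) and len(s1) > 0
    return sum(d1(a, b) for a, b in zip(s1, s1_hat)) / len(s1)

def hamming(a, b):
    """Hamming distortion: 0 if the symbols agree, 1 otherwise."""
    return 0 if a == b else 1
```

For example, a length-4 sequence reproduced with one symbol error has average Hamming distortion $1/4$.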
Definition 2.
A rate triple $(R_0, R_1, R_2)$ is said to be $D_1$-achievable for the Gray–Wyner source coding model with side information and degraded reconstruction sets of Figure 2 if there exists a sequence of $(n, M_{0,n}, M_{1,n}, M_{2,n}, D_1)$ codes such that:
$$\limsup_{n\to\infty} P_e^{(n)} = 0,$$
$$\limsup_{n\to\infty} \mathbb{E}\, d_1^{(n)}(S_1^n, \hat{S}_1^n) \le D_1,$$
$$\limsup_{n\to\infty} \frac{1}{n}\log_2(M_{j,n}) \le R_j \quad \text{for } j \in \{0,1,2\}.$$
The rate-distortion region $\mathcal{RD}$ of this problem is defined as the union of all rate-distortion quadruples $(R_0, R_1, R_2, D_1)$ such that $(R_0, R_1, R_2)$ is $D_1$-achievable, i.e.,
$$\mathcal{RD} \triangleq \big\{ (R_0, R_1, R_2, D_1) \,:\, (R_0, R_1, R_2) \text{ is } D_1\text{-achievable} \big\}.$$
As already mentioned, we shall also study the special-case Heegard–Berger-type models shown in Figure 3. The formal definitions for these models are similar to the above and are omitted here for brevity.

3. Gray–Wyner Model with Side Information and Degraded Reconstruction Sets

In the following, we establish the main result of this work, i.e., the single-letter characterization of the optimal rate-distortion region $\mathcal{RD}$ of the Gray–Wyner model with side information and degraded reconstruction sets shown in Figure 2. We then describe how the result subsumes and generalizes existing rate-distortion regions for this setting under different assumptions.
Theorem 1.
The rate-distortion region $\mathcal{RD}$ of the Gray–Wyner problem with side information and degraded reconstruction sets of Figure 2 is given by the set of all rate-distortion quadruples $(R_0, R_1, R_2, D_1)$ satisfying:
$$R_0 + R_1 \ge H(S_2|Y_1) + I(U_0 U_1; S_1 \,|\, S_2 Y_1)$$
$$R_0 + R_2 \ge H(S_2|Y_2) + I(U_0; S_1 \,|\, S_2 Y_2)$$
$$R_0 + R_1 + R_2 \ge H(S_2|Y_2) + I(U_0; S_1 \,|\, S_2 Y_2) + I(U_1; S_1 \,|\, U_0 S_2 Y_1)$$
for some joint pmf $P_{U_0 U_1 S_1 S_2 Y_1 Y_2}$ such that:
(1) 
The following Markov chain holds:
$$(Y_1, Y_2) \to (S_1, S_2) \to (U_0, U_1)$$
(2) 
There exists a function $\phi:\; \mathcal{Y}_1 \times \mathcal{U}_0 \times \mathcal{U}_1 \times \mathcal{S}_2 \to \hat{\mathcal{S}}_1$ such that:
$$\mathbb{E}\, d_1(S_1, \hat{S}_1) \le D_1.$$
Proof. 
The detailed proofs of the direct and converse parts of this theorem appear in Section 6.
The proof of the converse, which is the most challenging part, uses appropriate combinations of bounding techniques for the transmitted rates based on the system model assumptions and Fano's inequality, a series of analytic bounds based on the underlying Markov chains, and, most importantly, a proper use of the Csiszár–Körner sum identity in order to derive single-letter bounds.
As for the proof of achievability, it combines the optimal coding scheme of the Heegard–Berger problem with degraded reconstruction sets [9] and the double-binning-based scheme of Shayevitz and Wigger (Theorem 2, [4]) for the Gray–Wyner problem with side information; it is outlined in the following.
The encoder produces a common description of $(S_1^n, S_2^n)$ that is intended to be recovered by both receivers, and an individual description that is intended to be recovered only by Receiver 1. The common description is chosen as $V_0^n = (U_0^n, S_2^n)$ and is thus designed to describe all of $S_2^n$, which both receivers are required to reproduce losslessly, but also all or part of $S_1^n$, depending on the desired distortion level $D_1$. Since we make no assumptions on the side information sequences, this is meant to account for possibly unbalanced side information pairs $(Y_1^n, Y_2^n)$, in a manner similar to [9]. The message that carries the common description is obtained at the encoder through the double-binning technique of Tian and Diggavi [3], used also by Shayevitz and Wigger (Theorem 2, [4]) for a Gray–Wyner model with side information. In particular, similar to the coding scheme of (Theorem 2, [4]), the double-binning is performed in two ways, one tailored for Receiver 1 and one tailored for Receiver 2.
More specifically, the codebook of the common description is composed of codewords $v_0^n$ drawn randomly and independently according to the product law of $P_{V_0}$; it is partitioned uniformly into $2^{n\tilde{R}_{0,0}}$ superbins, indexed by $\tilde{w}_{0,0} \in [1:2^{n\tilde{R}_{0,0}}]$. The codewords of each superbin of this codebook are partitioned in two distinct ways. In the first partition, they are assigned randomly and independently to $2^{n\tilde{R}_{0,1}}$ subbins indexed by $\tilde{w}_{0,1} \in [1:2^{n\tilde{R}_{0,1}}]$, according to a uniform pmf over $[1:2^{n\tilde{R}_{0,1}}]$. Similarly, in the second partition, they are assigned randomly and independently to $2^{n\tilde{R}_{0,2}}$ subbins indexed by $\tilde{w}_{0,2} \in [1:2^{n\tilde{R}_{0,2}}]$, according to a uniform pmf over $[1:2^{n\tilde{R}_{0,2}}]$. The codebook of the private description is composed of codewords $u_1^n$ drawn randomly and independently according to the product law of $P_{U_1|V_0}$. This codebook is similarly partitioned uniformly into $2^{n\tilde{R}_{1,0}}$ superbins indexed by $\tilde{w}_{1,0} \in [1:2^{n\tilde{R}_{1,0}}]$, each containing $2^{n\tilde{R}_{1,1}}$ subbins of codewords $u_1^n$, indexed by $\tilde{w}_{1,1} \in [1:2^{n\tilde{R}_{1,1}}]$.
Upon observing a typical pair $(S_1^n, S_2^n) = (s_1^n, s_2^n)$, the encoder finds a pair of codewords $(v_0^n, u_1^n)$ that is jointly typical with $(s_1^n, s_2^n)$. Let $\tilde{w}_{0,0}$, $\tilde{w}_{0,1}$ and $\tilde{w}_{0,2}$ denote, respectively, the indices of the superbin, the subbin of the first partition, and the subbin of the second partition of the codebook of the common description in which the found $v_0^n$ lies. Similarly, let $\tilde{w}_{1,0}$ and $\tilde{w}_{1,1}$ denote, respectively, the indices of the superbin and subbin of the codebook of the individual description in which the found $u_1^n$ lies. The encoder sets the common message $W_0 = (\tilde{w}_{0,0}, \tilde{w}_{1,0})$ and sends it over the error-free rate-limited common link of capacity $R_0$. In addition, it sets the individual message $W_1 = (\tilde{w}_{0,1}, \tilde{w}_{1,1})$ and sends it over the error-free rate-limited link to Receiver 1 of capacity $R_1$; and it sets the individual message $W_2 = \tilde{w}_{0,2}$ and sends it over the error-free rate-limited link to Receiver 2 of capacity $R_2$. For the decoding, Receiver 2 utilizes the second partition of the codebook of the common description and looks in the subbin of index $\tilde{w}_{0,2}$ of the superbin of index $\tilde{w}_{0,0}$ for a unique $v_0^n$ that is jointly typical with its side information $y_2^n$. Receiver 1 decodes $v_0^n$ similarly, utilizing the first partition of the codebook of the common description and its side information $y_1^n$. It also utilizes the codebook of the individual description and looks in the subbin of index $\tilde{w}_{1,1}$ of the superbin of index $\tilde{w}_{1,0}$ for a unique $u_1^n$ that is jointly typical with the pair $(y_1^n, v_0^n)$. In the formal proof in Section 6, we argue that, with an appropriate choice of the communication rates $\tilde{R}_{0,0}$, $\tilde{R}_{0,1}$, $\tilde{R}_{0,2}$, $\tilde{R}_{1,0}$ and $\tilde{R}_{1,1}$, as well as of the sizes of the subbins, this scheme achieves the rate-distortion region of Theorem 1. ☐
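The double-binning of the common codebook described above can be illustrated by a toy sketch (ours, with hypothetical helper names; it only mimics the index bookkeeping, not the typicality-based encoding/decoding): each codeword index gets a superbin index and two independently drawn subbin indices, one per receiver's partition.

```python
import random

def double_bin(num_codewords, n_super, n_sub1, n_sub2, seed=0):
    """Partition codeword indices uniformly into superbins; within each
    superbin, independently assign every codeword a subbin index in two
    separate random partitions (one tailored to each receiver)."""
    rng = random.Random(seed)
    assignment = {}
    for v in range(num_codewords):
        w00 = v % n_super            # superbin index (uniform partition)
        w01 = rng.randrange(n_sub1)  # subbin index in Receiver 1's partition
        w02 = rng.randrange(n_sub2)  # subbin index in Receiver 2's partition
        assignment[v] = (w00, w01, w02)
    return assignment
```

In the scheme, the superbin index travels on the common link while each receiver's subbin index travels on that receiver's private link; each receiver then disambiguates within its (super)bin using its own side information.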
A few remarks that connect Theorem 1 to known results on related models are in order.
Remark 1.
The setting of Figure 1 generalizes two important settings: the Gray–Wyner problem, through the presence of the side information sequences $Y_1^n$ and $Y_2^n$; and the Heegard–Berger problem, through the presence of the private links of rates $R_1$ and $R_2$. As such, the coding scheme for the setting of Figure 2 differs from those of the Gray–Wyner problem and of the Heegard–Berger problem in many aspects, as shown in Figure 4.
First, the presence of side information sequences imposes the use of "binning" for each of the produced descriptions $V_0^n$, $V_1^n$ and $V_2^n$ in the Gray–Wyner code construction. However, unlike the binning performed in the Heegard–Berger coding scheme, the binning of the common codeword $V_0^n$ needs to be performed with two different indices, each tailored to the side information sequence at the respective receiver, i.e., "double binning". Another differing aspect is the role of the private and common links. Whereas in the original Gray–Wyner work each of these links carries one description, i.e., $V_0^n$ on the common link and $V_1^n$ (resp. $V_2^n$) on the private link of rate $R_1$ (resp. $R_2$), and in the Heegard–Berger problem the three descriptions $V_0^n$, $V_1^n$ and $V_2^n$ are all carried over the common link only, in the optimal coding scheme for the setting of Figure 2 the private and common links play different roles. Indeed, the common description $V_0^n$ and the private description $V_j^n$ are transmitted on both the common link and the private link of rates $R_0$ and $R_j$, for $j \in \{1,2\}$, through rate-splitting. These key differences imply an intricate interplay between the side information sequences and the roles of the common and private links, which we will emphasize later on in Section 4 and Section 5.
Remark 2.
In the special case in which $R_1 = R_2 = 0$, the Gray–Wyner model with side information and degraded reconstruction sets of Figure 2 reduces to a Heegard–Berger problem with arbitrary side information sequences and degraded reconstruction sets, a model that was recently studied, and solved, in the authors' own work [9]. Theorem 1 can then be seen as a generalization of (Theorem 1, [9]) to the case in which the encoder is also connected to the receivers through error-free rate-limited private links of capacities $R_1$ and $R_2$, respectively. The most important insight in the Heegard–Berger problem with degraded reconstruction sets is the role that the common description $V_0$ should play in such a setting. The authors show in (Theorem 1, [9]) that the optimal choice of this description is to contain, intuitively, the common source $S_2$ intended for both users, and, perhaps less intuitively, an additional description $U_0$, i.e., $V_0 = (U_0, S_2)$, which is used to piggyback part of the source $S_1$ in the common codeword even though it is not required by both receivers, in order to balance the asymmetry of the side information sequences. In Section 4 and Section 5, we show that the utility of this description depends on both the side information sequences and the rates of the private links.
Remark 3.
In [15], Timo et al. study the Gray–Wyner source coding model with side information of Figure 1. They establish the rate region of this model in the specific case in which the side information sequence $Y_2^n$ is a degraded version of $Y_1^n$, i.e., $(S_1, S_2) \to Y_1 \to Y_2$ is a Markov chain, and both receivers reproduce the component $S_2^n$ while Receiver 1 also reproduces the component $S_1^n$, all in a lossless manner. The result of Theorem 1 generalizes that of (Theorem 5, [15]) to the case of side information sequences that are arbitrarily correlated with each other and with the source pair $(S_1, S_2)$, and of lossy reconstruction of $S_1$. In [15], Timo et al. also investigate, and solve, a few other special cases of the model, such as those of a single source $S_1 = S_2$ (Theorem 4, [15]) and of complementary delivery $(Y_1, Y_2) = (S_2, S_1)$ (Theorem 6, [15]). The results of (Theorem 4, [15]) and (Theorem 6, [15]) can be recovered from Theorem 1 as special cases; Theorem 1 also generalizes (Theorem 6, [15]) to the case of lossy reproduction of the component $S_1^n$.

4. The Heegard–Berger Problem with Successive Refinement

An important special case of the Gray–Wyner source coding model with side information and degraded reconstruction sets of Figure 2 is the case in which $R_2 = 0$. The resulting model, a Heegard–Berger problem with successive refinement, is shown in Figure 3a.
In this section, we derive the optimal rate-distortion region for this setting and show how it compares to existing results in the literature. In addition, we discuss the utility of the common description $U_0$, which depends not only on the structure of the side information sequences, but also on the refinement link rate $R_1$. We illustrate through a binary example that the utility of $U_0$, namely the optimality of choosing a non-degenerate $U_0$, is governed by the rate $R_1$ of the refinement link and the side information structure.

4.1. Rate-Distortion Region

The following corollary states the optimal rate-distortion region of the Heegard–Berger problem with successive refinement of Figure 3a.
Corollary 1.
The rate-distortion region of the Heegard–Berger problem with successive refinement of Figure 3a is given by the set of rate-distortion triples $(R_0, R_1, D_1)$ satisfying:
$$R_0 \ge H(S_2|Y_2) + I(U_0; S_1 \,|\, S_2 Y_2)$$
$$R_0 + R_1 \ge H(S_2|Y_1) + I(U_0 U_1; S_1 \,|\, S_2 Y_1)$$
$$R_0 + R_1 \ge H(S_2|Y_2) + I(U_0; S_1 \,|\, S_2 Y_2) + I(U_1; S_1 \,|\, U_0 S_2 Y_1)$$
for some joint pmf $P_{U_0 U_1 S_1 S_2 Y_1 Y_2}$ such that:
(1) 
The following Markov chain holds:
$$(U_0, U_1) \to (S_1, S_2) \to (Y_1, Y_2)$$
(2) 
There exists a function $\phi:\; \mathcal{Y}_1 \times \mathcal{U}_0 \times \mathcal{U}_1 \times \mathcal{S}_2 \to \hat{\mathcal{S}}_1$ such that:
$$\mathbb{E}\, d_1(S_1, \hat{S}_1) \le D_1.$$
Proof. 
The proof of Corollary 1 follows from that of Theorem 1 by setting R 2 = 0 therein. ☐
Remark 4.
Recall the coding scheme of Theorem 1. If $R_2 = 0$, the second partition of the codebook of the common description, which is relevant for Receiver 2, becomes degenerate since, in this case, all the codewords $v_0^n$ of a superbin $\mathcal{B}_{00}(\tilde{w}_{0,0})$ are assigned to a single subbin. Correspondingly, the common message that the encoder sends over the common link carries only the index $\tilde{w}_{0,0}$ of the superbin $\mathcal{B}_{00}(\tilde{w}_{0,0})$ of the codebook of the common description in which the typical pair $v_0^n = (s_2^n, u_0^n)$ lies, in addition to the index $\tilde{w}_{1,0}$ of the superbin $\mathcal{B}_{10}(\tilde{w}_{1,0})$ of the codebook of the individual description in which the chosen typical $u_1^n$ lies. Constraint (14a) on the common rate $R_0$ reflects the fact that Receiver 2 utilizes only the index $\tilde{w}_{0,0}$ in the decoding. Furthermore, note that Constraints (14b) and (14c) on the sum-rate $(R_0 + R_1)$ can be combined as
$$R_0 + R_1 \ge \max\big\{ I(U_0 U_1 S_2;\, S_1 S_2 \,|\, Y_1),\;\; I(U_0 S_2;\, S_1 S_2 \,|\, Y_2) + I(U_1;\, S_1 \,|\, U_0 S_2 Y_1) \big\},$$
which resembles the Heegard–Berger result of (Theorem 2, p. 733, [2]).
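The conditional entropy and conditional mutual information terms appearing in such bounds can be evaluated numerically from a joint pmf. The following sketch (helper names are ours, for illustration) represents a joint pmf as a dictionary over outcome tuples and checks, e.g., that $H(S|Y) = h_2(p)$ when $Y$ is the output of a binary symmetric channel with crossover probability $p$ and uniform input $S$:

```python
from collections import defaultdict
from math import log2

def cond_entropy(pmf, X, Z):
    """H(X|Z) in bits. `pmf` maps outcome tuples to probabilities; X and Z
    are tuples of coordinate positions selecting the relevant variables."""
    pxz, pz = defaultdict(float), defaultdict(float)
    for outcome, p in pmf.items():
        x = tuple(outcome[i] for i in X)
        z = tuple(outcome[i] for i in Z)
        pxz[(x, z)] += p
        pz[z] += p
    return -sum(p * log2(p / pz[z]) for (x, z), p in pxz.items() if p > 0)

def cond_mutual_info(pmf, X, Y, Z):
    """I(X;Y|Z) = H(X|Z) - H(X|Y,Z)."""
    return cond_entropy(pmf, X, Z) - cond_entropy(pmf, X, Y + Z)

# Example: S uniform binary, Y = S observed through a BSC(p) with p = 0.1.
p = 0.1
pmf = {(0, 0): 0.5 * (1 - p), (0, 1): 0.5 * p,
       (1, 0): 0.5 * p, (1, 1): 0.5 * (1 - p)}
```

With these helpers, sum-rate expressions like the one displayed above reduce to sums of `cond_entropy` and `cond_mutual_info` evaluations on the chosen joint pmf of $(U_0, U_1, S_1, S_2, Y_1, Y_2)$.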
Remark 5.
As already mentioned, the result of Corollary 1 holds for side information sequences that are arbitrarily correlated with each other and with the sources. In the specific case in which the user who gets the refinement rate-limited link also has the "better-quality" side information, in the sense that $(S_1, S_2) \to Y_1 \to Y_2$ forms a Markov chain, the rate-distortion region of Corollary 1 reduces to the set of all rate-distortion triples $(R_0, R_1, D_1)$ that satisfy
$$R_0 \ge H(S_2|Y_2) + I(U_0; S_1 \,|\, S_2 Y_2)$$
$$R_0 + R_1 \ge H(S_2|Y_2) + I(U_0; S_1 \,|\, S_2 Y_2) + I(U_1; S_1 \,|\, U_0 S_2 Y_1)$$
for some joint pmf $P_{U_0 U_1 S_1 S_2 Y_1 Y_2}$ for which (15) and (16) hold. This result can also be obtained from previous works on successive refinement for the Wyner–Ziv source coding problem by Steinberg and Merhav (Theorem 1, [13]) and Tian and Diggavi (Theorem 1, [7]). The results of (Theorem 1, [13]) and (Theorem 1, [7]) hold for possibly distinct, i.e., not necessarily nested, distortion measures at the receivers, but they require the aforementioned Markov chain condition, which is pivotal for their proofs. Thus, for the considered degraded reconstruction sets setting, Corollary 1 can be seen as generalizing (Theorem 1, [13]) and (Theorem 1, [7]) to the case in which the side information sequences are arbitrarily correlated with each other and with the sources $(S_1, S_2)$, i.e., do not exhibit any ordering.
Remark 6.
In the case in which it is the user who gets only the common rate-limited link that has the “better-quality” side information, in the sense that ( S 1 , S 2 ) − Y 2 − Y 1 forms a Markov chain, the rate-distortion region of Corollary 1 reduces to the set of all rate-distortion triples ( R 0 , R 1 , D 1 ) that satisfy
R 0 ≥ H ( S 2 | Y 2 ) + I ( U 0 ; S 1 | S 2 Y 2 )
R 0 + R 1 ≥ H ( S 2 | Y 1 ) + I ( U 0 U 1 ; S 1 | S 2 Y 1 )
for some joint pmf P U 0 U 1 S 1 S 2 Y 1 Y 2 for which (15) and (16) hold. This result can also be deduced from [3]. Specifically, in [3] Tian and Diggavi study a setup, referred to therein as “side-information scalable” source coding, in which the side informations are degraded and the encoder produces two descriptions such that the receiver with the better-quality side information (Receiver 2 if ( S 1 , S 2 ) − Y 2 − Y 1 is a Markov chain) uses only the first description to reconstruct its source, while the receiver with the low-quality side information (Receiver 1 if ( S 1 , S 2 ) − Y 2 − Y 1 is a Markov chain) uses the two descriptions in order to reconstruct its source. They establish inner and outer bounds on the rate-distortion region of the model, which coincide when either one of the decoders requires a lossless reconstruction or when the distortion measures are degraded and deterministic. Similar to the previous remark, Corollary 1 can be seen as generalizing the aforementioned results of [3] to the case in which the side information sequences are arbitrarily correlated among themselves and with the sources ( S 1 , S 2 ) .
Remark 7.
A crucial remark that is in order for the Heegard–Berger problem with successive refinement of Figure 3a is that, depending on the rate R 1 of the refinement link, resorting to a common auxiliary variable U 0 might be unnecessary. Indeed, in the case in which S 1 needs to be recovered losslessly at the first receiver, for instance, parts of the rate region can be achieved without resorting to the common auxiliary variable U 0 , i.e., by setting U 0 = ∅ , while other parts of the rate region can only be achieved through a non-trivial choice of U 0 .
As such, if R 1 ≥ H ( S 1 | S 2 Y 1 ) , then letting U 0 = ∅ yields the optimal rate region. To see this, note that the rate constraints under lossless reconstruction of S 1 write as:
R 0 ≥ H ( S 1 S 2 | Y 2 ) − H ( S 1 | S 2 Y 2 U 0 )
R 0 + R 1 ≥ H ( S 1 S 2 | Y 1 )
R 0 + R 1 ≥ H ( S 1 S 2 | Y 2 ) − H ( S 1 | S 2 Y 2 U 0 ) + H ( S 1 | U 0 S 2 Y 1 )
which can be rewritten as follows
R 0 ≥ H ( S 1 S 2 | Y 2 ) + min P U 0 | S 1 S 2 [ ( H ( S 1 | S 2 Y 1 U 0 ) − R 1 ) + − H ( S 1 | S 2 Y 2 U 0 ) ]
R 0 + R 1 ≥ H ( S 1 S 2 | Y 1 )
where ( x ) + ≜ max { 0 , x } .
Under the constraint R 1 ≥ H ( S 1 | S 2 Y 1 ) , the constraints in (21a) reduce to the following
R 0 ≥ H ( S 1 S 2 | Y 2 ) − max P U 0 | S 1 S 2 H ( S 1 | S 2 Y 2 U 0 )
R 0 + R 1 ≥ H ( S 1 S 2 | Y 1 ) .
Next, by noting that max P U 0 | S 1 S 2 H ( S 1 | S 2 Y 2 U 0 ) = H ( S 1 | S 2 Y 2 ) is achieved by U 0 = ∅ , the claim follows.
However, when R 1 < H ( S 1 | S 2 Y 1 ) , the choice U 0 = ∅ might be strictly sub-optimal (as shown in the following binary example).

4.2. Binary Example

Let X 1 , X 2 , X 3 and X 4 be four independent Ber ( 1 / 2 ) random variables. Let the sources be S 1 ≜ ( X 1 , X 2 , X 3 ) and S 2 ≜ X 4 . Now, consider the Heegard–Berger model with successive refinement shown in Figure 5. The first user, which gets both the common and individual links, observes the side information Y 1 = ( X 1 , X 4 ) and wants to reproduce the pair ( S 1 , S 2 ) losslessly. The second user gets only the common link, has side information Y 2 = ( X 2 , X 3 ) and wants to reproduce only the component S 2 , losslessly.
The side information sequences at the decoders do not exhibit any degradedness ordering, in the sense that none of the Markov chain conditions of Remarks 5 and 6 holds. The following claim provides the rate region of this binary example.
Claim 1.
The rate region of the binary Heegard–Berger example with successive refinement of Figure 5 is given by the set of rate pairs ( R 0 , R 1 ) that satisfy
R 0 ≥ 1
R 0 + R 1 ≥ 2 .
Proof. 
The proof of Claim 1 follows easily by computing the rate region
R 0 ≥ H ( S 1 S 2 | Y 2 ) − H ( S 1 | S 2 Y 2 U 0 )
R 0 + R 1 ≥ H ( S 1 S 2 | Y 1 )
R 0 + R 1 ≥ H ( S 1 S 2 | Y 2 ) − H ( S 1 | S 2 Y 2 U 0 ) + H ( S 1 | U 0 S 2 Y 1 )
in the binary setting under study.
First, we note that
H ( S 1 S 2 | Y 2 ) = H ( X 1 X 4 | X 2 X 3 ) = 2
H ( S 1 S 2 | Y 1 ) = H ( X 2 X 3 | X 1 X 4 ) = 2
which allows then to rewrite the rate region as
R 0 ≥ 2 − H ( X 1 | X 4 U 0 ) ≥ 2 − H ( X 1 | X 4 ) = 1
R 0 + R 1 ≥ 2 + max { 0 , H ( X 2 X 3 | X 1 X 4 U 0 ) − H ( X 1 | X 2 X 3 X 4 U 0 ) } ≥ 2
The proof of the claim follows by noticing that the above inequalities hold with equality for the choices U 0 = ( X 2 , X 3 ) , U 0 = X 2 , or U 0 = X 3 . ☐
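The entropy evaluations used in this proof can be checked numerically by enumerating the 16 equally likely realizations of ( X 1 , X 2 , X 3 , X 4 ) . The following sketch (the helper names are our own, not from the paper) verifies that H ( S 1 S 2 | Y 2 ) = H ( S 1 S 2 | Y 1 ) = 2 and that the choice U 0 = X 2 makes the bound on R 0 evaluate to 1:

```python
from itertools import product
from math import log2

def H(pmf):
    """Shannon entropy (in bits) of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def joint(*fns):
    """Joint pmf of functions of four i.i.d. Ber(1/2) bits (X1, X2, X3, X4)."""
    pmf = {}
    for x in product((0, 1), repeat=4):
        key = tuple(f(x) for f in fns)
        pmf[key] = pmf.get(key, 0.0) + 1 / 16
    return pmf

def Hc(targets, conds):
    """Conditional entropy H(targets | conds) = H(targets, conds) - H(conds)."""
    return H(joint(*targets, *conds)) - H(joint(*conds))

# Sources and side information of the binary example of Figure 5
S1 = lambda x: (x[0], x[1], x[2])   # S1 = (X1, X2, X3)
S2 = lambda x: x[3]                 # S2 = X4
Y1 = lambda x: (x[0], x[3])         # Y1 = (X1, X4)
Y2 = lambda x: (x[1], x[2])         # Y2 = (X2, X3)
U0 = lambda x: x[1]                 # candidate common description U0 = X2

print(Hc([S1, S2], [Y2]))   # H(S1 S2 | Y2) -> 2.0
print(Hc([S1, S2], [Y1]))   # H(S1 S2 | Y1) -> 2.0
# Bound on R0: H(S1 S2 | Y2) - H(S1 | S2 Y2 U0) evaluates to 1 for U0 = X2
print(Hc([S1, S2], [Y2]) - Hc([S1], [S2, Y2, U0]))   # -> 1.0
```

Any other choice of U 0 can be tested by swapping the `U0` lambda; the sketch only enumerates a finite joint pmf, so it verifies the evaluations rather than the converse itself.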
The rate region of Claim 1 is depicted in Figure 6. It is insightful to notice that although the second user is only interested in reproducing the component S 2 = X 4 , the optimal coding scheme that achieves this region composes the common description, which is destined to be recovered by both users, of not only S 2 but also some part U 0 = ( X 2 , X 3 ) , U 0 = X 2 , or U 0 = X 3 of the source component S 1 (though the latter is not required by the second user). A possible intuition is that this choice of U 0 is useful for user 1, who wants to reproduce S 1 = ( X 1 , X 2 , X 3 ) , and that transmitting it to the second user as well incurs no rate loss, since that user already has the side information Y 2 = ( X 2 , X 3 ) available.

5. The Heegard–Berger Problem with Scalable Coding

In the following, we consider the model of Figure 3b. As we already mentioned, for motivation, the reader may find it helpful to think of the side information Y 2 n as being of lower quality than Y 1 n , in which case the refinement link that is given to the second user is intended to improve its decoding capability. In this section, we describe the optimal coding scheme for this setting, and show that it can be recovered, independently, from the work of Timo et al. [14] through a careful choice of the coding sets. Next, we illustrate through a binary example the interplay between the utility of the common description U 0 , the side information sequences, and the refinement rate R 2 .

5.1. Rate-Distortion Region

The following theorem states the rate-distortion region of the Heegard–Berger model with scalable coding of Figure 3b.
Corollary 2.
The rate-distortion region of the Heegard–Berger model with scalable coding of Figure 3b is given by the set of all rate-distortion triples ( R 0 , R 2 , D 1 ) that satisfy
R 0 ≥ H ( S 2 | Y 1 ) + I ( U 0 U 1 ; S 1 | S 2 Y 1 )
R 0 + R 2 ≥ H ( S 2 | Y 2 ) + I ( U 0 ; S 1 | S 2 Y 2 ) + I ( U 1 ; S 1 | U 0 S 2 Y 1 )
for some joint pmf P U 0 U 1 S 1 S 2 Y 1 Y 2 , such that:
(1) 
The following Markov chain is valid:
( U 0 , U 1 ) − ( S 1 , S 2 ) − ( Y 1 , Y 2 )
(2) 
There exists a function ϕ : Y 1 × U 0 × U 1 × S 2 → S ^ 1 such that:
E d 1 ( S 1 , S ^ 1 ) ≤ D 1 .
Proof. 
The proof of Corollary 2 follows from that of Theorem 1 by setting R 1 = 0 therein. ☐
Remark 8.
In the specific case in which Receiver 2 has better-quality side information, in the sense that ( S 1 , S 2 ) − Y 2 − Y 1 forms a Markov chain, the rate-distortion region of Corollary 2 reduces to one that is described by a single rate constraint, namely
R 0 ≥ H ( S 2 | Y 1 ) + I ( U ; S 1 | S 2 Y 1 )
for some conditional P U | S 1 S 2 that satisfies E [ d 1 ( S 1 , S ^ 1 ) ] ≤ D 1 . This is in accordance with the observation that, in this case, the transmission to Receiver 1 becomes the bottleneck, as Receiver 2 can recover the source component S 2 losslessly as long as Receiver 1 does.
Remark 9.
Consider the case in which S 1 needs to be recovered losslessly as well at Receiver 1. Then, the rate region can be expressed as follows
R 0 ≥ H ( S 1 S 2 | Y 1 )
R 0 + R 2 ≥ H ( S 1 S 2 | Y 2 ) + min P U 0 | S 1 S 2 [ H ( S 1 | U 0 S 2 Y 1 ) − H ( S 1 | U 0 S 2 Y 2 ) ] .
An important comment here is that the optimization over P U 0 | S 1 S 2 does not depend on the rate R 2 of the refinement link. Its optimal solution, i.e., the optimal choice of U 0 , coincides with the solution to the Heegard–Berger problem without refinement link ( R 2 = 0 ), and is therefore optimal for all values of R 2 . This is a main difference with the Heegard–Berger problem with refinement link of Figure 3a, in which the solution to the Heegard–Berger problem (with R 1 = 0 ) might not be optimal for all values of R 1 .
Remark 10.
In (Theorem 1, [14]), Timo et al. present an achievable rate-region for the multistage successive-refinement problem with side information. Timo et al. consider distortion measures of the form δ l : X × X ^ l → R + , where X is the source alphabet and X ^ l is the reconstruction alphabet at decoder l , l ∈ { 1 , … , t } ; for this reason, this result is not applicable as is to the setting of Figure 3b in the case of two decoders. However, the result of (Theorem 1, [14]) can be extended to accommodate a distortion measure at the first decoder that is vector-valued; and the direct part of Corollary 2 can then be obtained by applying this extension. Specifically, in the case of two decoders, i.e., t = 2 , and with X = ( S 1 , S 2 ) , and two distortion measures δ 1 : S 1 × S 2 × S ^ 1 , 1 × S ^ 2 , 1 → { 0 , 1 } × R + and δ 2 : S 1 × S 2 × S ^ 1 , 2 × S ^ 2 , 2 → { 0 , 1 } chosen such that
δ 1 ( s 1 , s 2 ) , ( s ^ 1 , 1 , s ^ 2 , 1 ) = d H ( s 2 , s ^ 2 , 1 ) , d 1 ( s 1 , s ^ 1 , 1 )
and
δ 2 ( s 1 , s 2 ) , ( s ^ 1 , 2 , s ^ 2 , 2 ) = d H ( s 2 , s ^ 2 , 2 )
where d H ( · , · ) is the Hamming distance, letting d 1 = ( 0 , D 1 ) and d 2 = 0 , a straightforward extension of (Theorem 1, [14]) to this setting yields a rate-region that is described by the following rate constraints (using the notation of (Theorem 1, [14]))
R 0 ≥ Φ ( T 0 , 1 ) + Φ ( T 1 , 1 )
R 0 + R 2 ≥ Φ ( T 0 , 2 ) + Φ ( T 1 , 2 ) + Φ ( T 2 , 2 )
where T 0 = { 1 , 2 } , T 1 = { 1 } , T 2 = { 2 } , and for j = 0 , 1 , 2 and l ∈ { 1 , 2 } such that T j ∩ { 1 , … , l } ≠ ∅ , the function Φ ( T j , l ) is defined as
Φ ( T j , l ) = I ( S 1 S 2 A T j ; U T j | A T j ) − min l ′ ∈ T j ∩ [ 1 : l ] I ( U T j ; A T j , l ′ Y l ′ | A T j )
where A = { U 12 , U 1 , U 2 } and the sets A T j , A T j , A T j + , A T j , A T j , 1 , A T j , 2 , evaluated in this case, are given in Table 1. It is easy to see that the region described by (35) can be written more explicitly in this case as
R 0 ≥ I ( U 12 ; S 1 S 2 | Y 1 )
R 0 + R 2 ≥ max { I ( U 12 ; S 1 S 2 | Y 1 ) , I ( U 12 ; S 1 S 2 | Y 2 ) } + I ( U 1 ; S 1 S 2 | Y 1 U 12 ) + I ( U 2 ; S 1 S 2 | Y 2 U 12 ) .
Also, setting U 12 = ( U 0 , S 2 ) and U 2 = S 2 in (37), one recovers the rate-region of Corollary 2. (Such a connection can also be stated for the result of Corollary 1.)

5.2. Binary Example

Consider the setting of Figure 7. Let X 1 , X 2 , X 3 and X 4 be four independent Ber ( 1 / 2 ) random variables, and let the sources be S 1 ≜ ( X 1 , X 2 , X 3 ) and S 2 ≜ X 4 . The first user, which gets only the common link, observes the side information Y 1 = ( X 1 , X 4 ) and wants to reproduce the pair ( S 1 , S 2 ) losslessly. The second user gets both the common and private links, has side information Y 2 = ( X 2 , X 3 ) and wants to reproduce only the component S 2 , losslessly.
Claim 2.
The rate region of the binary Heegard–Berger example with scalable coding of Figure 7 is given by the set of all rate pairs ( R 0 , R 2 ) that satisfy R 2 ≥ 0 and R 0 ≥ 2 .
Proof. 
The proof of Claim 2 follows easily by specializing, and computing, the result of Remark 9 for the example at hand. First note that
(38a) R 0 + R 2 ≥ H ( S 2 S 1 | Y 2 ) + min P U 0 | S 1 S 2 [ H ( S 1 | U 0 S 2 Y 1 ) − H ( S 1 | U 0 S 2 Y 2 ) ] (38b) = 2 + min P U 0 | S 1 S 2 [ H ( X 2 X 3 | X 1 X 4 U 0 ) − H ( X 1 | X 2 X 3 X 4 U 0 ) ] (38c) ≥ 2 − max P U 0 | S 1 S 2 [ H ( X 1 | X 2 X 3 X 4 U 0 ) ] (38d) ≥ 1
where equality in all previous inequalities is satisfied with U 0 = ( X 2 , X 3 ) or with U 0 = X 2 or U 0 = X 3 .
Note as well that the single rate constraint on R 0 writes as:
(39a) R 0 ≥ H ( S 1 S 2 | Y 1 ) (39b) = 2
which renders the sum-rate constraint redundant and ends the proof of the claim. ☐
The optimal rate region of Claim 2 is depicted in Figure 8, as the region delimited by the lines R 0 = 2 and R 2 = 0 . Note that, for this example, the components ( X 2 , X 3 ) that Receiver 1 requires must be transmitted entirely over the common link, since Receiver 1 has no private link; and the single bit of information about S 2 = X 4 that Receiver 2 requires can be embedded into this common message at no extra rate cost (e.g., by sending X 2 ⊕ X 4 together with X 3 , which each receiver can resolve using its own side information). For this reason, the refinement link is not constrained and appears to be useless for this example.
There is a sharp difference with the binary Heegard–Berger example with successive refinement of Figure 5, for which the refinement link may sometimes be instrumental in reducing the required rate on the common link. With scalable coding, the refinement link of rate R 2 does not reduce the rate required on the common link.
Also, it is insightful to notice that for this example, because of the side information configuration, the choice U 0 = ∅ in Corollary 2 is strictly suboptimal and results in the smaller region that is described by
R 0 ≥ 2
R 0 + R 2 ≥ 3 .
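The entropy evaluations of this section can be checked numerically as well. The sketch below (the helper names are our own, not from the paper) verifies the constraint R 0 ≥ H ( S 1 S 2 | Y 1 ) = 2 of Claim 2 and the looser sum-rate bound obtained with U 0 = ∅ ; it also tests one explicit zero-error scheme, of our own construction, operating at the corner point ( R 0 , R 2 ) = ( 2 , 0 ) :

```python
from itertools import product
from math import log2

def H(pmf):
    """Shannon entropy (in bits) of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def joint(*fns):
    """Joint pmf of functions of four i.i.d. Ber(1/2) bits (X1, X2, X3, X4)."""
    pmf = {}
    for x in product((0, 1), repeat=4):
        key = tuple(f(x) for f in fns)
        pmf[key] = pmf.get(key, 0.0) + 1 / 16
    return pmf

def Hc(targets, conds):
    """Conditional entropy H(targets | conds)."""
    return H(joint(*targets, *conds)) - H(joint(*conds))

S1 = lambda x: (x[0], x[1], x[2])   # S1 = (X1, X2, X3)
S2 = lambda x: x[3]                 # S2 = X4
Y1 = lambda x: (x[0], x[3])         # Y1 = (X1, X4)
Y2 = lambda x: (x[1], x[2])         # Y2 = (X2, X3)

# Claim 2: R0 >= H(S1 S2 | Y1)
print(Hc([S1, S2], [Y1]))   # -> 2.0
# With U0 empty: sum-rate bound H(S1 S2 | Y2) + H(S1 | S2 Y1) - H(S1 | S2 Y2)
print(Hc([S1, S2], [Y2]) + Hc([S1], [S2, Y1]) - Hc([S1], [S2, Y2]))   # -> 3.0

# An explicit (illustrative) scheme at (R0, R2) = (2, 0):
# the encoder sends the 2-bit common message W0 = (X2 XOR X4, X3).
for x1, x2, x3, x4 in product((0, 1), repeat=4):
    w0 = (x2 ^ x4, x3)
    assert (w0[0] ^ x4, w0[1]) == (x2, x3)   # Receiver 1 (knows X4) recovers (X2, X3)
    assert w0[0] ^ x2 == x4                  # Receiver 2 (knows X2) recovers X4
```

The exhaustive loop confirms that a 2-bit common message can simultaneously serve both receivers via their side informations, which is consistent with the refinement link being useless here.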

6. Proof of Theorem 1

In the following, we give the proof of the converse part and the direct part of Theorem 1.
The converse part depends strongly on the system model we investigate, and consists of a series of careful bounding steps resorting to Fano’s inequality, Markov chains, and the Csiszár–Körner sum identity.
The proof of achievability is two-fold: it consists in proving a general result that holds for a Gray–Wyner setting with side information, and in then deriving the optimal choice of the auxiliary codewords involved for the specific setting with degraded reconstruction sets.

6.1. Proof of Converse Part

Assume that a rate triple ( R 0 , R 1 , R 2 ) is D 1 -achievable. Then, let W j = f j ( S 1 n , S 2 n ) , where j ∈ { 0 , 1 , 2 } , be the encoded indices and let S ^ 1 n = g 1 ( W 0 , W 1 , Y 1 n ) be the reconstruction sequence at the first decoder such that E d 1 ( n ) ( S 1 n , S ^ 1 n ) ≤ D 1 .
Using Fano’s inequality, the lossless reconstruction of the source S 2 n at both decoders implies that there exists a sequence ϵ n → 0 as n → ∞ such that:
H ( S 2 n | W 0 W 1 Y 1 n ) ≤ n ϵ n ,
H ( S 2 n | W 0 W 2 Y 2 n ) ≤ n ϵ n .
We start by showing the following sum-rate constraint,
R 0 + R 1 + R 2 ≥ H ( S 2 | Y 2 ) + I ( U 0 ; S 1 | S 2 Y 2 ) + I ( U 1 ; S 1 | U 0 S 2 Y 1 ) .
We have that
n ( R 0 + R 1 + R 2 )
(44a) ≥ H ( W 0 ) + H ( W 2 ) + H ( W 1 )
(44b) ≥ H ( W 0 ) + H ( W 2 | W 0 ) + H ( W 1 )
(44c) = H ( W 0 W 2 ) + H ( W 1 )
(44d) ≥ H ( W 0 W 2 | Y 2 n ) + H ( W 1 | W 0 S 2 n Y 1 n )
(44e) ≥ I ( W 0 W 2 ; S 1 n S 2 n | Y 2 n ) + I ( W 1 ; S 1 n | W 0 S 2 n Y 1 n )
(44f) = H ( S 1 n S 2 n | Y 2 n ) − H ( S 1 n S 2 n | W 0 W 2 Y 2 n ) + H ( S 1 n | W 0 S 2 n Y 1 n ) − H ( S 1 n | W 0 W 1 S 2 n Y 1 n )
(44g) ≥ ( a ) H ( S 1 n S 2 n | Y 2 n ) − H ( S 1 n | W 0 W 2 S 2 n Y 2 n ) + H ( S 1 n | W 0 S 2 n Y 1 n ) − H ( S 1 n | W 0 W 1 S 2 n Y 1 n ) − n ϵ n
(44h) ≥ H ( S 1 n S 2 n | Y 2 n ) − H ( S 1 n | W 0 S 2 n Y 2 n ) + H ( S 1 n | W 0 S 2 n Y 1 n ) − H ( S 1 n | W 0 W 1 S 2 n Y 1 n ) − n ϵ n
where ( a ) in (44) stems from Fano’s inequality (42), which results from the lossless reconstruction of S 2 n at receiver 2.
Let us define then:
A ≜ H ( S 1 n | W 0 S 2 n Y 1 n ) − H ( S 1 n | W 0 S 2 n Y 2 n ) ,
B ≜ H ( S 1 n | W 0 W 1 S 2 n Y 1 n ) .
In the following, we aim for single-letter bounds on the two quantities A and B.
Since the side information sequences Y 1 n and Y 2 n , together with the sources ( S 1 n , S 2 n ) , are not degraded and do not exhibit any particular structure, single-letterizing the quantity A requires the judicious bounding steps reported below, in which some important Markov chains are shown to hold and quantities are manipulated appropriately, together with several invocations of the Csiszár–Körner sum identity.
Let us start by writing that
(47a) A ≜ H ( S 1 n | W 0 S 2 n Y 1 n ) − H ( S 1 n | W 0 S 2 n Y 2 n )
(47b) = I ( S 1 n ; Y 2 n | W 0 S 2 n ) − I ( S 1 n ; Y 1 n | W 0 S 2 n )
(47c) = ∑ i = 1 n I ( S 1 n ; Y 2 , i | W 0 Y 2 i − 1 S 2 n ) − I ( S 1 n ; Y 1 , i | W 0 Y 1 , i + 1 n S 2 n )
(47d) = ( a ) ∑ i = 1 n I ( S 1 n Y 1 , i + 1 n ; Y 2 , i | W 0 Y 2 i − 1 S 2 n ) − I ( S 1 n Y 2 i − 1 ; Y 1 , i | W 0 Y 1 , i + 1 n S 2 n )
(47e) = ( b ) ∑ i = 1 n I ( S 1 n ; Y 2 , i | W 0 Y 2 i − 1 Y 1 , i + 1 n S 2 n ) − I ( S 1 n ; Y 1 , i | W 0 Y 2 i − 1 Y 1 , i + 1 n S 2 n )
(47f) = ( c ) ∑ i = 1 n I ( S 1 , i ; Y 2 , i | W 0 Y 2 i − 1 Y 1 , i + 1 n S 2 n ) − I ( S 1 , i ; Y 1 , i | W 0 Y 2 i − 1 Y 1 , i + 1 n S 2 n )
(47g) = ∑ i = 1 n H ( S 1 , i | Y 1 , i W 0 Y 2 i − 1 Y 1 , i + 1 n S 2 n ) − H ( S 1 , i | Y 2 , i W 0 Y 2 i − 1 Y 1 , i + 1 n S 2 n )
(47h) = ∑ i = 1 n H ( S 1 , i | Y 1 , i S 2 , i U 0 , i ) − H ( S 1 , i | Y 2 , i S 2 , i U 0 , i )
where U 0 , i ≜ ( W 0 , Y 2 i − 1 , Y 1 , i + 1 n , S 2 , < i > ) (note that the lossless reconstruction of S 2 n at both receivers is instrumental to the definition of U 0 , which plays the role of the common auxiliary variable in the proof of the converse), and where ( a ) in (47) follows using the following Csiszár–Körner sum identity
∑ i = 1 n I ( Y 2 i − 1 ; Y 1 , i | S 1 n W 0 Y 1 , i + 1 n S 2 n ) = ∑ i = 1 n I ( Y 1 , i + 1 n ; Y 2 , i | S 1 n W 0 Y 2 i − 1 S 2 n ) ,
( b ) in (47) follows using the Csiszár–Körner sum identity given by
∑ i = 1 n I ( Y 2 i − 1 ; Y 1 , i | W 0 Y 1 , i + 1 n S 2 n ) = ∑ i = 1 n I ( Y 1 , i + 1 n ; Y 2 , i | W 0 Y 2 i − 1 S 2 n ) ,
while ( c ) in (47) is the consequence of the following sequence of Markov chains
(50a) ( S 1 i − 1 , S 1 , i + 1 n , S 2 i − 1 , S 2 , i + 1 n , Y 1 , i + 1 n , Y 2 i − 1 ) − ( S 1 , i , S 2 , i ) − Y j , i
(50b) ⇒ ( a ) ( S 1 i − 1 , S 1 , i + 1 n , S 2 i − 1 , S 2 , i + 1 n , Y 1 , i + 1 n , Y 2 i − 1 , W 0 ) − ( S 1 , i , S 2 , i ) − Y j , i
(50c) ⇒ ( S 1 i − 1 , S 1 , i + 1 n ) − ( S 2 i − 1 , S 2 , i + 1 n , Y 1 , i + 1 n , Y 2 i − 1 , W 0 , S 1 , i , S 2 , i ) − Y j , i
where (50a) results from the fact that the source sequences ( S 1 n , S 2 n , Y 1 n , Y 2 n ) are memoryless, while ( a ) in (50) is a consequence of the fact that W 0 is a function of the pair of sequences ( S 1 n , S 2 n ) .
To upper-bound the term B, note the following
(51a) B ≜ H ( S 1 n | W 0 W 1 S 2 n Y 1 n )
(51b) = ∑ i = 1 n H ( S 1 , i | W 0 W 1 S 2 n Y 1 n S 1 i − 1 )
(51c) = ∑ i = 1 n H ( S 1 , i | S 2 , i Y 1 , i W 0 S 2 , < i > Y 1 , i + 1 n S 1 i − 1 W 1 Y 1 i − 1 )
(51d) = ( a ) ∑ i = 1 n H ( S 1 , i | S 2 , i Y 1 , i W 0 S 2 , < i > Y 1 , i + 1 n S 1 i − 1 Y 2 i − 1 W 1 Y 1 i − 1 )
(51e) ≤ ∑ i = 1 n H ( S 1 , i | S 2 , i Y 1 , i W 0 S 2 , < i > Y 1 , i + 1 n Y 2 i − 1 W 1 Y 1 i − 1 )
where ( a ) in (51) is a consequence of the following sequence of Markov chains:
(52a) Y 2 i − 1 − ( S 1 i − 1 , S 2 i − 1 , Y 1 i − 1 ) − ( S 1 , i , S 1 , i + 1 n , S 2 , i , S 2 , i + 1 n , Y 1 , i + 1 n )
(52b) ⇒ ( a ) Y 2 i − 1 − ( S 1 i − 1 , S 2 i − 1 , Y 1 i − 1 ) − ( S 1 , i , S 1 , i + 1 n , S 2 , i , S 2 , i + 1 n , Y 1 , i + 1 n , W 0 , W 1 )
(52c) ⇒ Y 2 i − 1 − ( S 1 i − 1 , S 2 i − 1 , Y 1 i − 1 , S 2 , i , S 2 , i + 1 n , Y 1 , i + 1 n , W 0 , W 1 ) − S 1 , i
where (52a) results from the fact that the source sequences ( S 1 n , S 2 n , Y 1 n , Y 2 n ) are memoryless, while ( a ) in (52) is a consequence of the fact that W 0 and W 1 are each a function of the pair of sequences ( S 1 n , S 2 n ) .
Finally, letting U 1 , i ≜ ( W 1 , Y 1 i − 1 ) , so that the choice of ( U 0 , i , U 1 , i ) satisfies the condition S ^ 1 , i = g i ( Y 1 , i , U 0 , i , U 1 , i , S 2 , i ) , we write the resulting sum-rate constraint as
n ( R 0 + R 1 + R 2 ) ≥ n H ( S 1 S 2 | Y 2 ) + ∑ i = 1 n [ H ( S 1 , i | S 2 , i Y 1 , i U 0 , i ) − H ( S 1 , i | S 2 , i Y 2 , i U 0 , i ) ] − ∑ i = 1 n H ( S 1 , i | S 2 , i Y 1 , i U 0 , i U 1 , i ) − n ϵ n .
Let us now prove that the following bound holds
R 0 + R 1 ≥ H ( S 2 S 1 | Y 1 ) − H ( S 1 | U 0 U 1 Y 1 S 2 ) .
We have
(55a) n ( R 0 + R 1 ) ≥ H ( W 0 ) + H ( W 1 | W 0 )
(55b) = H ( W 0 , W 1 )
(55c) ≥ H ( W 0 W 1 | Y 1 n )
(55d) ≥ I ( W 0 W 1 ; S 1 n S 2 n | Y 1 n )
(55e) = H ( S 1 n S 2 n | Y 1 n ) − H ( S 1 n S 2 n | W 0 W 1 Y 1 n )
(55f) ≥ ( a ) H ( S 1 n S 2 n | Y 1 n ) − H ( S 1 n | W 0 W 1 S 2 n Y 1 n ) − n ϵ n
(55g) = n H ( S 1 S 2 | Y 1 ) − B − n ϵ n
(55h) ≥ ( b ) n H ( S 1 S 2 | Y 1 ) − ∑ i = 1 n H ( S 1 , i | S 2 , i Y 1 , i U 0 , i U 1 , i ) − n ϵ n ,
where ( a ) in (55) is a consequence of Fano’s inequality in (41), which results from the lossless reconstruction of S 2 n at receiver 1, and ( b ) in (55) results from the upper bound on B in (51e). As for the third rate constraint
R 0 + R 2 ≥ H ( S 1 S 2 | Y 2 ) − H ( S 1 | U 0 Y 2 S 2 ) ,
we write
(57a) n ( R 0 + R 2 ) ≥ H ( W 0 W 2 )
(57b) ≥ H ( W 0 W 2 | Y 2 n )
(57c) ≥ I ( W 0 W 2 ; S 1 n S 2 n | Y 2 n )
(57d) = H ( S 1 n S 2 n | Y 2 n ) − H ( S 1 n S 2 n | W 0 W 2 Y 2 n )
(57e) ≥ ( a ) H ( S 1 n S 2 n | Y 2 n ) − H ( S 1 n | W 0 W 2 S 2 n Y 2 n ) − n ϵ n
(57f) ≥ H ( S 1 n S 2 n | Y 2 n ) − H ( S 1 n | W 0 S 2 n Y 2 n ) − n ϵ n
(57g) = n H ( S 1 S 2 | Y 2 ) − ∑ i = 1 n H ( S 1 , i | S 2 , i Y 2 , i W 0 S 2 , < i > Y 2 , < i > S 1 , i + 1 n ) − n ϵ n
(57h) = ( b ) n H ( S 1 S 2 | Y 2 ) − ∑ i = 1 n H ( S 1 , i | S 2 , i Y 2 , i W 0 S 2 , < i > Y 2 , < i > S 1 , i + 1 n Y 1 , i + 1 n ) − n ϵ n
(57i) ≥ n H ( S 1 S 2 | Y 2 ) − ∑ i = 1 n H ( S 1 , i | S 2 , i Y 2 , i W 0 S 2 , < i > Y 2 i − 1 Y 1 , i + 1 n ) − n ϵ n
(57j) = n H ( S 1 S 2 | Y 2 ) − ∑ i = 1 n H ( S 1 , i | S 2 , i Y 2 , i U 0 , i ) − n ϵ n .
where ( a ) in (57) is a consequence of Fano’s inequality in (42) and ( b ) in (57) stems from the following sequence of Markov chains:
(58a) Y 1 , i + 1 n − ( S 2 , i + 1 n , S 1 , i + 1 n , Y 2 , i + 1 n ) − ( S 1 , i , S 1 i − 1 , S 2 , i , S 2 i − 1 , Y 2 i − 1 )
(58b) ⇒ ( a ) Y 1 , i + 1 n − ( S 2 , i + 1 n , S 1 , i + 1 n , Y 2 , i + 1 n ) − ( S 1 , i , S 1 i − 1 , S 2 , i , S 2 i − 1 , Y 2 i − 1 , W 0 , W 1 )
(58c) ⇒ Y 1 , i + 1 n − ( S 2 , i + 1 n , S 1 , i + 1 n , Y 2 , i + 1 n , S 2 , i , S 2 i − 1 , Y 2 i − 1 , W 0 , W 1 ) − S 1 , i
where (58a) results from the fact that the source sequences ( S 1 n , S 2 n , Y 1 n , Y 2 n ) are memoryless, while ( a ) in (58) is a consequence of the fact that W 0 and W 1 are each a function of the pair of sequences ( S 1 n , S 2 n ) .
Let Q be an integer-valued random variable, uniformly distributed over [ 1 : n ] and independent of all the other variables ( S 1 , S 2 , U 0 , U 1 , Y 1 , Y 2 ) . We have
(59a) R 0 + R 1 + R 2 ≥ H ( S 1 S 2 | Y 2 ) + ( 1 / n ) ∑ i = 1 n [ H ( S 1 , i | S 2 , i Y 1 , i U 0 , i ) − H ( S 1 , i | S 2 , i Y 2 , i U 0 , i ) ] − ( 1 / n ) ∑ i = 1 n H ( S 1 , i | S 2 , i Y 1 , i U 0 , i U 1 , i ) − ϵ n
(59b) = H ( S 1 S 2 | Y 2 ) + ∑ i = 1 n P ( Q = i ) [ H ( S 1 , Q | S 2 , Q Y 1 , Q U 0 , Q , Q = i ) − H ( S 1 , Q | S 2 , Q Y 2 , Q U 0 , Q , Q = i ) ] − ∑ i = 1 n P ( Q = i ) H ( S 1 , Q | S 2 , Q Y 1 , Q U 0 , Q U 1 , Q , Q = i ) − ϵ n
(59c) = H ( S 1 S 2 | Y 2 ) + H ( S 1 , Q | S 2 , Q Y 1 , Q U 0 , Q Q ) − H ( S 1 , Q | S 2 , Q Y 2 , Q U 0 , Q Q ) − H ( S 1 , Q | S 2 , Q Y 1 , Q U 0 , Q U 1 , Q Q ) − ϵ n
(59d) = ( a ) H ( S 1 S 2 | Y 2 ) + H ( S 1 | S 2 Y 1 U 0 , Q Q ) − H ( S 1 | S 2 Y 2 U 0 , Q Q ) − H ( S 1 | S 2 Y 1 U 0 , Q U 1 , Q Q ) − ϵ n
where ( a ) in (59) is a consequence of that all sources ( S 1 n , S 2 n , Y 1 n , Y 2 n ) are memoryless.
Defining U 1 ≜ ( Q , U 1 , Q ) and U 0 ≜ ( Q , U 0 , Q ) , we obtain
R 0 + R 1 + R 2 ≥ H ( S 1 S 2 | Y 2 ) + H ( S 1 | S 2 Y 1 U 0 ) − H ( S 1 | S 2 Y 2 U 0 ) − H ( S 1 | S 2 Y 1 U 0 U 1 ) .
The two other rate constraints can be written in a similar fashion,
R 0 + R 1 ≥ H ( S 2 S 1 | Y 1 ) − H ( S 1 | U 0 U 1 Y 1 S 2 )
R 0 + R 2 ≥ H ( S 1 S 2 | Y 2 ) − H ( S 1 | U 0 Y 2 S 2 ) ;
and this completes the proof of converse.  ☐

6.2. Proof of Direct Part

We first show that the rate-distortion region of the proposition that follows is achievable. The achievability of the rate-distortion region of Theorem 1 then follows by choosing the random variable V 0 of the proposition as V 0 = ( U 0 , S 2 ) .
Proposition 1.
An inner bound on the rate-distortion region of the Gray–Wyner model with side information and degraded reconstruction sets of Figure 2 is given by the set of all rate-distortion quadruples ( R 0 , R 1 , R 2 , D 1 ) that satisfy
R 0 + R 1 ≥ I ( V 0 U 1 ; S 1 S 2 | Y 1 )
R 0 + R 2 ≥ I ( V 0 ; S 1 S 2 | Y 2 )
R 0 + R 1 + R 2 ≥ max { I ( V 0 ; S 1 S 2 | Y 1 ) , I ( V 0 ; S 1 S 2 | Y 2 ) } + I ( U 1 ; S 1 S 2 | V 0 Y 1 )
for some choice of the random variables ( V 0 , U 1 ) such that ( V 0 , U 1 ) − ( S 1 , S 2 ) − ( Y 1 , Y 2 ) and there exist functions g 1 , g 2 , 1 , and g 2 , 2 such that:
S ^ 1 = g 1 ( V 0 , U 1 , Y 1 )
S 2 = g 2 , 1 ( V 0 , U 1 , Y 1 )
S 2 = g 2 , 2 ( V 0 , Y 2 ) ,
and
E d 1 ( S 1 , S ^ 1 ) ≤ D 1 .
Proof of Proposition 1.
We now describe a coding scheme that achieves the rate-distortion region of Proposition 1. The scheme is very similar to one that is developed by Shayevitz and Wigger (Theorem 2, [4]) for a Gray–Wyner model with side information. In particular, similar to (Theorem 2, [4]), it uses a double-binning technique for the common codebook, with one partition that is relevant for Receiver 1 and one that is relevant for Receiver 2. Note, however, that, formally, the result of Proposition 1 cannot be obtained by readily applying (Theorem 2, [4]) as is; one needs to extend the result of (Theorem 2, [4]) in a manner that accounts for the fact that the source component S 2 n is to be recovered losslessly by both decoders. This can be obtained by extending the distortion measure of (Theorem 2, [4]) to one that is vector-valued, i.e., d ( ( s 1 , s 2 ) , ( s ^ 1 , s ^ 2 ) ) = ( d 1 ( s 1 , s ^ 1 ) , d H ( s 2 , s ^ 2 ) ) , where d H ( · , · ) denotes the Hamming distance. For completeness, we provide here a proof of Proposition 1.
Our scheme has the following parameters: a conditional joint pmf P V 0 U 1 | S 1 S 2 that satisfies (63) and (64), and non-negative communication rates T 0 , T 1 , T 0 , 0 , T 0 , p , T 1 , 0 , T 1 , 1 , R ˜ 0 , 0 , R ˜ 0 , 1 , R ˜ 0 , 2 , R ˜ 1 , 0 and R ˜ 1 , 1 such that
T 0 = T 0 , 0 + T 0 , p , 0 ≤ R ˜ 0 , 0 ≤ T 0 , 0 , 0 ≤ R ˜ 0 , 1 ≤ T 0 , p , 0 ≤ R ˜ 0 , 2 ≤ T 0 , p
T 1 = T 1 , 0 + T 1 , 1 , 0 ≤ R ˜ 1 , 0 ≤ T 1 , 0 , 0 ≤ R ˜ 1 , 1 ≤ T 1 , 1 .

6.2.1. Codebook Generation

(1)
Randomly and independently generate 2 n T 0 length-n codewords v 0 n ( k 0 ) indexed with the pair of indices k 0 = ( k 0 , 0 , k 0 , p ) , where k 0 , 0 ∈ [ 1 : 2 n T 0 , 0 ] and k 0 , p ∈ [ 1 : 2 n T 0 , p ] . Each codeword v 0 n ( k 0 ) has i.i.d. entries drawn according to ∏ i = 1 n P V 0 ( v 0 , i ( k 0 ) ) . The codewords { v 0 n ( k 0 ) } are partitioned into superbins whose indices will be relevant for both receivers; and each superbin is partitioned in two different ways, each into subbins whose indices will be relevant for a distinct receiver (i.e., double-binning). This is obtained by partitioning the indices { ( k 0 , 0 , k 0 , p ) } as follows. We partition the 2 n T 0 , 0 indices { k 0 , 0 } into 2 n R ˜ 0 , 0 bins by randomly and independently assigning each index k 0 , 0 to an index w ˜ 0 , 0 ( k 0 , 0 ) according to a uniform pmf over [ 1 : 2 n R ˜ 0 , 0 ] . We refer to each subset of indices { k 0 , 0 } with the same index w ˜ 0 , 0 as a bin B 00 ( w ˜ 0 , 0 ) , w ˜ 0 , 0 ∈ [ 1 : 2 n R ˜ 0 , 0 ] . In addition, we make two distinct partitions of the 2 n T 0 , p indices { k 0 , p } , each relevant for a distinct receiver. In the first partition, which is relevant for Receiver 1, the indices { k 0 , p } are assigned randomly and independently each to an index w ˜ 0 , 1 ( k 0 , p ) according to a uniform pmf over [ 1 : 2 n R ˜ 0 , 1 ] . We refer to each subset of indices { k 0 , p } with the same index w ˜ 0 , 1 as a bin B 01 ( w ˜ 0 , 1 ) , w ˜ 0 , 1 ∈ [ 1 : 2 n R ˜ 0 , 1 ] . Similarly, in the second partition, which is relevant for Receiver 2, the indices { k 0 , p } are assigned randomly and independently each to an index w ˜ 0 , 2 ( k 0 , p ) according to a uniform pmf over [ 1 : 2 n R ˜ 0 , 2 ] ; and we refer to each subset of indices { k 0 , p } with the same index w ˜ 0 , 2 as a bin B 02 ( w ˜ 0 , 2 ) , w ˜ 0 , 2 ∈ [ 1 : 2 n R ˜ 0 , 2 ] .
(2)
For each k 0 ∈ [ 1 : 2 n T 0 ] , randomly and independently generate 2 n T 1 length-n codewords u 1 n ( k 1 , k 0 ) indexed with the pair of indices k 1 = ( k 1 , 0 , k 1 , 1 ) , where k 1 , 0 ∈ [ 1 : 2 n T 1 , 0 ] and k 1 , 1 ∈ [ 1 : 2 n T 1 , 1 ] . Each codeword u 1 n ( k 1 , k 0 ) has i.i.d. elements drawn according to ∏ i = 1 n P U 1 | V 0 ( u 1 , i ( k 1 , k 0 ) | v 0 , i ( k 0 ) ) . We partition the 2 n T 1 , 0 indices { k 1 , 0 } into 2 n R ˜ 1 , 0 bins by randomly and independently assigning each index k 1 , 0 to an index w ˜ 1 , 0 ( k 1 , 0 ) according to a uniform pmf over [ 1 : 2 n R ˜ 1 , 0 ] . We refer to each subset of indices { k 1 , 0 } with the same index w ˜ 1 , 0 as a bin B 10 ( w ˜ 1 , 0 ) , w ˜ 1 , 0 ∈ [ 1 : 2 n R ˜ 1 , 0 ] . Similarly, we partition the 2 n T 1 , 1 indices { k 1 , 1 } into 2 n R ˜ 1 , 1 bins by randomly and independently assigning each index k 1 , 1 to an index w ˜ 1 , 1 ( k 1 , 1 ) according to a uniform pmf over [ 1 : 2 n R ˜ 1 , 1 ] ; and we refer to each subset of indices { k 1 , 1 } with the same index w ˜ 1 , 1 as a bin B 11 ( w ˜ 1 , 1 ) , w ˜ 1 , 1 ∈ [ 1 : 2 n R ˜ 1 , 1 ] .
(3)
Reveal all codebooks and their partitions to the encoder, the codebook of { v 0 n ( k 0 ) } and its partitions to both receivers, and the codebook of { u 1 n ( k 1 , k 0 ) } and its partitions to only Receiver 1.

6.2.2 Encoding

Upon observing the source pair ( S 1 n , S 2 n ) = ( s 1 n , s 2 n ) , the encoder finds an index k 0 = ( k 0 , 0 , k 0 , p ) such that the codeword v 0 n ( k 0 ) is jointly typical with ( s 1 n , s 2 n ) , i.e.,
( s 1 n , s 2 n , v 0 n ( k 0 ) ) ∈ T [ S 1 S 2 V 0 ] ( n ) .
By the covering lemma (Chapter 3, [16]), the encoding in this step is successful as long as n is large and
T 0 ≥ I ( V 0 ; S 1 S 2 ) .
Next, it finds an index k 1 = ( k 1 , 0 , k 1 , 1 ) such that the codeword u 1 n ( k 1 , k 0 ) is jointly typical with the triple ( s 1 n , s 2 n , v 0 n ( k 0 ) ) , i.e.,
( s 1 n , s 2 n , v 0 n ( k 0 ) , u 1 n ( k 1 , k 0 ) ) ∈ T [ S 1 S 2 V 0 U 1 ] ( n ) .
Again, by the covering lemma (Chapter 3, [16]), the encoding in this step is successful as long as n is large and
T 1 ≥ I ( U 1 ; S 1 S 2 | V 0 ) .
Let w ˜ 0 , 0 , w ˜ 0 , 1 and w ˜ 0 , 2 be the bin indices such that k 0 , 0 ∈ B 00 ( w ˜ 0 , 0 ) , k 0 , p ∈ B 01 ( w ˜ 0 , 1 ) and k 0 , p ∈ B 02 ( w ˜ 0 , 2 ) . In addition, let w ˜ 1 , 0 and w ˜ 1 , 1 be the bin indices such that k 1 , 0 ∈ B 10 ( w ˜ 1 , 0 ) and k 1 , 1 ∈ B 11 ( w ˜ 1 , 1 ) . The encoder then sends the product message W 0 = ( w ˜ 0 , 0 , w ˜ 1 , 0 ) over the error-free rate-limited common link of capacity R 0 . In addition, it sends the product message W 1 = ( w ˜ 0 , 1 , w ˜ 1 , 1 ) over the error-free rate-limited individual link to Receiver 1 of capacity R 1 , and the message W 2 = w ˜ 0 , 2 over the error-free rate-limited individual link to Receiver 2 of capacity R 2 .

6.2.3 Decoding

Receiver 1 gets the messages ( W 0 , W 1 ) = ( w ˜ 0 , 0 , w ˜ 1 , 0 , w ˜ 0 , 1 , w ˜ 1 , 1 ) . It seeks a codeword v 0 n ( k 0 ) and a codeword u 1 n ( k 1 , k 0 ) , with the indices k 0 = ( k 0 , 0 , k 0 , p ) and k 1 = ( k 1 , 0 , k 1 , 1 ) satisfying k 0 , 0 ∈ B 00 ( w ˜ 0 , 0 ) , k 0 , p ∈ B 01 ( w ˜ 0 , 1 ) , k 1 , 0 ∈ B 10 ( w ˜ 1 , 0 ) and k 1 , 1 ∈ B 11 ( w ˜ 1 , 1 ) , and such that
( v 0 n ( k 0 ) , u 1 n ( k 1 , k 0 ) , y 1 n ) ∈ T [ V 0 U 1 Y 1 ] ( n ) .
By the multivariate packing lemma (Chapter 12, [16]), the error in this decoding step at Receiver 1 vanishes exponentially as long as n is large and
T 0 , 0 − R ˜ 0 , 0 + T 0 , p − R ˜ 0 , 1 ≤ I ( V 0 ; Y 1 )
T 1 , 0 − R ˜ 1 , 0 + T 1 , 1 − R ˜ 1 , 1 ≤ I ( U 1 ; Y 1 | V 0 ) .
Receiver 1 then sets its reproduced codewords s ^ 2 , 1 n and s ^ 1 n , respectively, as
s ^ 2 , 1 n = g 2 , 1 v 0 n ( k 0 ) , u 1 n ( k 1 , k 0 ) , y 1 n
s ^ 1 n = g 1 v 0 n ( k 0 ) , u 1 n ( k 1 , k 0 ) , y 1 n .
Similarly, Receiver 2 gets the message ( W 0 , W 2 ) = ( w ˜ 0 , 0 , w ˜ 1 , 0 , w ˜ 0 , 2 ) . It seeks a codeword v 0 n ( k 0 ) , with k 0 = ( k 0 , 0 , k 0 , p ) satisfying k 0 , 0 ∈ B 00 ( w ˜ 0 , 0 ) and k 0 , p ∈ B 02 ( w ˜ 0 , 2 ) , and such that
( v 0 n ( k 0 ) , y 2 n ) ∈ T [ V 0 Y 2 ] ( n ) .
Again, using the multivariate packing lemma (Chapter 12, [16]), the error in this decoding step at Receiver 2 vanishes exponentially as long as n is large and
T_{0,0} − R̃_{0,0} + T_{0,p} − R̃_{0,2} ≤ I(V_0; Y_2).
Receiver 2 then sets its reconstructed codeword ŝ_{2,2}^n as
ŝ_{2,2}^n = g_{2,2}(v_0^n(k_0), y_2^n).
Summarizing, combining Equations (67), (69), (71) and (74), the communication rates T_0, T_1, T_{0,0}, T_{0,p}, T_{1,0}, T_{1,1}, R̃_{0,0}, R̃_{0,1}, R̃_{0,2}, R̃_{1,0} and R̃_{1,1} satisfy the following inequalities:
T_0 ≥ I(V_0; S_1 S_2)
T_1 ≥ I(U_1; S_1 S_2 | V_0)
T_{0,0} − R̃_{0,0} + T_{0,p} − R̃_{0,1} ≤ I(V_0; Y_1)
T_{0,0} − R̃_{0,0} + T_{0,p} − R̃_{0,2} ≤ I(V_0; Y_2)
T_{1,0} − R̃_{1,0} + T_{1,1} − R̃_{1,1} ≤ I(U_1; Y_1 | V_0).
Choosing R̃_{0,0}, R̃_{0,1}, R̃_{0,2}, R̃_{1,0} and R̃_{1,1} to also satisfy the rate relations
R_0 = R̃_{0,0} + R̃_{1,0}
R_1 = R̃_{0,1} + R̃_{1,1}
R_2 = R̃_{0,2},
and, finally, using Fourier–Motzkin elimination (FME) to successively project out the nuisance variables T_{0,0}, T_{0,p}, T_{1,0}, T_{1,1}, T_0, T_1, and then R̃_{0,0}, R̃_{0,1}, R̃_{0,2}, R̃_{1,0} and R̃_{1,1}, from the set of relations formed by (65), (76) and (77), we obtain the region of Proposition 1.
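The FME projection invoked above is purely mechanical. As a side illustration, not the actual system eliminated in the proof, the following sketch implements one elimination step on a generic system of linear inequalities a·x ≤ b with integer coefficients (equalities, such as the rate relations above, would first be split into two opposite inequalities; the name `fme_eliminate` and the toy system are ours).

```python
def fme_eliminate(ineqs, j):
    """One Fourier-Motzkin step: eliminate variable j from a system of
    inequalities a . x <= b, each given as (coeffs, bound). Returns an
    equivalent system whose constraints no longer involve x_j."""
    lower, upper, rest = [], [], []
    for a, b in ineqs:
        if a[j] > 0:
            upper.append((a, b))      # gives an upper bound on x_j
        elif a[j] < 0:
            lower.append((a, b))      # gives a lower bound on x_j
        else:
            rest.append((a, b))       # does not involve x_j
    out = list(rest)
    # Pair every lower bound with every upper bound; scale so that the
    # coefficients of x_j cancel, then add the two inequalities.
    for al, bl in lower:
        for au, bu in upper:
            sl, su = au[j], -al[j]    # both strictly positive
            a_new = [sl * al[k] + su * au[k] for k in range(len(al))]
            assert a_new[j] == 0
            out.append((a_new, sl * bl + su * bu))
    return out

# Toy system in (x, y):  x + y <= 4,  -x <= 0,  -y <= 0.
system = [([1, 1], 4), ([-1, 0], 0), ([0, -1], 0)]
projected = fme_eliminate(system, 0)  # project out x
# The remaining constraints involve y alone: -y <= 0 and y <= 4.
```

Repeating such steps for each nuisance variable, and discarding redundant inequalities along the way, yields the projected rate region in the remaining variables.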
This completes the proof of the proposition, and hence that of the direct part of Theorem 1. ☐

Acknowledgments

Part of this work, as well as the full publication costs, was supported by ISAE-Supaéro.

Author Contributions

The authors contributed equally to this work. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gray, R.; Wyner, A. Source coding for a simple network. Bell Syst. Tech. J. 1974, 53, 1681–1721.
  2. Heegard, C.; Berger, T. Rate distortion when side information may be absent. IEEE Trans. Inf. Theory 1985, 31, 727–734.
  3. Tian, C.; Diggavi, S.N. Side-information scalable source coding. IEEE Trans. Inf. Theory 2008, 54, 5591–5608.
  4. Shayevitz, O.; Wigger, M. On the capacity of the discrete memoryless broadcast channel with feedback. IEEE Trans. Inf. Theory 2013, 59, 1329–1345.
  5. Kaspi, A.H. Rate distortion function when side information may be present at the decoder. IEEE Trans. Inf. Theory 1994, 40, 2031–2034.
  6. Sgarro, A. Source coding with side information at several decoders. IEEE Trans. Inf. Theory 1977, 23, 179–182.
  7. Tian, C.; Diggavi, S.N. On multistage successive refinement for Wyner–Ziv source coding with degraded side informations. IEEE Trans. Inf. Theory 2007, 53, 2946–2960.
  8. Timo, R.; Oechtering, T.; Wigger, M. Source coding problems with conditionally less noisy side information. IEEE Trans. Inf. Theory 2014, 60, 5516–5532.
  9. Benammar, M.; Zaidi, A. Rate-distortion function for a Heegard–Berger problem with two sources and degraded reconstruction sets. IEEE Trans. Inf. Theory 2016, 62, 5080–5092.
  10. Timo, R.; Grant, A.; Kramer, G. Rate-distortion functions for source coding with complementary side information. In Proceedings of the 2011 IEEE International Symposium on Information Theory (ISIT), St. Petersburg, Russia, 31 July–5 August 2011; pp. 2934–2938.
  11. Unal, S.; Wagner, A. An LP bound for rate distortion with variable side information. In Proceedings of the Data Compression Conference (DCC), Snowbird, UT, USA, 4–7 April 2017.
  12. Equitz, W.H.; Cover, T.M. Successive refinement of information. IEEE Trans. Inf. Theory 1991, 37, 269–275.
  13. Steinberg, Y.; Merhav, N. On successive refinement for the Wyner–Ziv problem. IEEE Trans. Inf. Theory 2004, 50, 1636–1654.
  14. Timo, R.; Chan, T.; Grant, A. Rate distortion with side-information at many decoders. IEEE Trans. Inf. Theory 2011, 57, 5240–5257.
  15. Timo, R.; Grant, A.; Chan, T.; Kramer, G. Source coding for a simple network with receiver side information. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Toronto, ON, Canada, 6–11 July 2008; pp. 2307–2311.
  16. El Gamal, A.; Kim, Y.-H. Network Information Theory; Cambridge University Press: Cambridge, UK, 2011.
Figure 1. Gray–Wyner network with side information at the receivers.
Figure 2. Gray–Wyner model with side information at both receivers and degraded reconstruction sets.
Figure 3. Two classes of Heegard–Berger models (HB models): (a) HB model with successive refinement; and (b) HB model with scalable coding.
Figure 4. Comparison of coding schemes for the Gray–Wyner network with side information, Gray–Wyner network and the Heegard–Berger problem: (a) coding scheme for the Gray–Wyner network; (b) coding scheme for the Heegard–Berger problem; and (c) coding scheme for the Gray–Wyner network with side information.
Figure 5. Binary Heegard–Berger example with successive refinement.
Figure 6. Rate region of the binary example of Figure 5. The choices U_0 = (X_2, X_3), U_0 = X_2 or U_0 = X_3 are optimal irrespective of the value of R_1, while the degenerate choice U_0 = ∅ is optimal only in some slices of the region.
Figure 7. Binary Heegard–Berger example with scalable coding.
Figure 8. The optimal rate region for the setting of Figure 7 given by ( R 0 2 , R 2 0 ). The choice U_0 = ∅ is optimal only in a slice of the region.
Table 1. Auxiliary random variables associated with the subsets that appear in (36).
             T_0           T_1      T_2
A_{T_j}^-    U_1
A_{T_j}^-    U_12          U_12
A_{T_j}^+    {U_1, U_2}
A_{T_j}
A_{T_j,1}
A_{T_j,2}