Rate-Distortion Region of a Gray–Wyner Model with Side Information

In this work, we establish a complete single-letter characterization of the rate-distortion region of an instance of the Gray–Wyner model with side information at the decoders. Specifically, in this model an encoder observes a pair of memoryless, arbitrarily correlated sources (S_1^n, S_2^n) and communicates with two receivers over an error-free rate-limited common link of capacity R_0, as well as error-free rate-limited individual links of capacities R_1 to the first receiver and R_2 to the second receiver. Both receivers reproduce the source component S_2^n losslessly, and Receiver 1 also reproduces the source component S_1^n lossily, to within some prescribed fidelity level D_1. In addition, Receiver 1 and Receiver 2 are equipped, respectively, with memoryless side information sequences Y_1^n and Y_2^n. Importantly, the side information sequences are arbitrarily correlated with each other and with the source pair (S_1^n, S_2^n), and are not assumed to exhibit any particular ordering. Furthermore, by specializing the main result to two Heegard–Berger models, one with successive refinement and one with scalable coding, we shed light on the common and private descriptions that the encoder should produce and on the role of each of the common and private links. We develop intuitions by analyzing the resulting single-letter rate-distortion regions of these models, and discuss some insightful binary examples.


The Gray–Wyner source coding problem was originally formulated, and solved, by Gray and Wyner in [1]. In their original setting, a pair of arbitrarily correlated memoryless sources (S_1^n, S_2^n) is to be communicated to two receivers, each of which reconstructs one source component. As in the original Gray–Wyner coding scheme, the encoder produces a common description of the source pair (S_1^n, S_2^n) that is intended to be recovered by both receivers, as well as individual or private descriptions of (S_1^n, S_2^n), each destined to be recovered by a distinct receiver. Because the side information sequences are arbitrarily correlated, the optimal design of these descriptions raises challenging questions that we answer in this work.

In order to build an understanding of the role of each of the links and of the descriptions in the optimal coding scheme for the setting of Figure 2, we also investigate two important underlying problems, which are Heegard–Berger type models with refinement links, as shown in Figure 3. In both models, only one of the two individual refinement links has non-zero rate.

In the model of Figure 3a, the receiver that accesses the additional rate-limited link (i.e., Receiver 1) is also required to reproduce a lossy estimate of the source component S_1^n, in addition to the source component S_2^n, which is to be reproduced losslessly by both receivers. We will refer to this model as the Heegard–Berger model with successive refinement; a related setting was studied in [14], which establishes the optimal rate-distortion region under the assumption that the receiver that observes the refinement link, say Receiver 1, also observes a better side information sequence than the other receiver.

An outline of the remainder of this paper is as follows. Section II formally describes the problem setup.

Throughout the paper we use the following notation. The term pmf stands for probability mass function. Upper case letters denote random variables, e.g., X; lower case letters denote realizations of random variables, e.g., x; and calligraphic letters designate alphabets, e.g., 𝒳. Vectors of length n are denoted by X^n = (X_1, . . . , X_n), and X_i^j denotes the sequence (X_i, . . . , X_j), whereas X^{<i>} ≜ (X_1, . . . , X_{i−1}, X_{i+1}, . . . , X_n). The probability distribution of a random variable X is denoted by P_X(x) ≜ P(X = x); sometimes, for convenience, we write it as P_X. We use the notation E_X[·] to denote the expectation of the random variable X. The probability distribution of a random variable Y given X is denoted by P_{Y|X}. The set of probability distributions defined on an alphabet 𝒳 is denoted by 𝒫(𝒳). The cardinality of a set 𝒳 is denoted by |𝒳|. For random variables X,

Y and Z, the notation X − Y − Z indicates that X, Y and Z, in this order, form a Markov chain, i.e., P_{XYZ}(x, y, z) = P_Y(y) P_{X|Y}(x|y) P_{Z|Y}(z|y). The set T_ε^{(n)} denotes the standard set of ε-typical n-sequences.

Consider the Gray–Wyner source coding model with side information and degraded reconstruction sets shown in Figure 2. Let (𝒮_1 × 𝒮_2 × 𝒴_1 × 𝒴_2, P_{S_1,S_2,Y_1,Y_2}) be a discrete memoryless vector source with generic variables S_1, S_2, Y_1 and Y_2. Also, let Ŝ_1 be a reconstruction alphabet, and d_1 a distortion measure defined as in (1).

Definition 1. An (n, M_{0,n}, M_{1,n}, M_{2,n}, D_1) code for the Gray–Wyner source coding model with side information and degraded reconstruction sets of Figure 2 consists of:
- Three sets of messages W_0 ≜ [1 : M_{0,n}], W_1 ≜ [1 : M_{1,n}], and W_2 ≜ [1 : M_{2,n}].
- Three encoding functions f_0, f_1 and f_2, defined for j ∈ {0, 1, 2} as in (2).
- Two decoding functions g_1 and g_2, one at each user.
The expected distortion of this code is given by E[d_1^{(n)}(S_1^n, Ŝ_1^n)]. The probability of error is defined as P_e^{(n)} ≜ P(Ŝ_{2,1}^n ≠ S_2^n or Ŝ_{2,2}^n ≠ S_2^n).
Definition 2. A rate triple (R_0, R_1, R_2) is said to be D_1-achievable for the Gray–Wyner source coding model of Figure 2 if there exists a sequence of codes such that

lim sup_{n→∞} (1/n) log_2(M_{j,n}) ≤ R_j for j ∈ {0, 1, 2}.   (9)

The rate-distortion region RD of this problem is defined as the union of all D_1-achievable rate-distortion quadruples.

As we already mentioned, we shall also study the special-case Heegard–Berger type models shown in Figure 3. The formal definitions for these models are similar to the above, and we omit them here for brevity.

Theorem 1. The rate-distortion region RD of the Gray–Wyner problem with side information and degraded reconstruction sets of Figure 2 is given by the set of all rate-distortion quadruples (R_0, R_1, R_2, D_1) satisfying the rate constraints (14), for some joint pmf P_{U_0 U_1 S_1 S_2 Y_1 Y_2} such that:
1) the following Markov chain is valid (15); and
2) there exists a function φ : 𝒴_1 × 𝒰_0 × 𝒰_1 × 𝒮_2 → Ŝ_1 such that (16) holds.
Proof: The detailed proofs of the direct part and the converse part of this theorem appear in Section VI.

The proof of the converse, which is the most challenging part, uses appropriate combinations of Fano's inequality, Markov chains and the Csiszár–Körner sum-identity.

The encoder produces a common description of (S_1^n, S_2^n) that is intended to be recovered by both receivers, and an individual description that is intended to be recovered only by Receiver 1. The common description is chosen as V_0^n = (U_0^n, S_2^n) and is thus designed so as to describe all of S_2^n, which both receivers are required to reproduce losslessly, but also all or part of S_1^n, depending on the desired distortion level D_1. Since we make no assumptions on the side information sequences, this is meant to account for possibly unbalanced side information pairs (Y_1^n, Y_2^n), in a manner similar to [10].

Upon observing a typical pair (S_1^n, S_2^n) = (s_1^n, s_2^n), the encoder finds a pair of codewords (v_0^n, u_1^n) that is jointly typical with (s_1^n, s_2^n). Let w̃_{0,0}, w̃_{0,1} and w̃_{0,2} denote, respectively, the indices of the superbin and of the subbins of the codebook of the common description in which v_0^n lies; Receiver 1 then recovers the codeword u_1^n that is jointly typical with the pair (y_1^n, v_0^n). In the formal proof in Section IV, we argue that, with an appropriate choice of the communication rates R̃_{0,0}, R̃_{0,1}, R̃_{0,2}, R̃_{1,0} and R̃_{1,1}, as well as the sizes of the subbins, this scheme achieves the rate-distortion region of Theorem 1.

A few remarks that connect Theorem 1 to known results on related models are in order.

Another distinct aspect is the role of the private and common links. Whereas in Gray–Wyner's original work these links each carried one description, i.e., V_0^n on the common link and V_1^n (resp. V_2^n) on the private link of rate R_1 (resp. R_2), and whereas in the Heegard–Berger problem the three descriptions V_0^n, V_1^n and V_2^n are all carried over the common link only, in the optimal coding scheme for the setting of Figure 2 the private and common links play different roles. Indeed, the common description V_0^n and the private description V_j^n are transmitted on both the common link and the private link of rates R_0 and R_j, for j ∈ {1, 2}, through rate-splitting. As such, these key differences imply an intricate interplay between the side information sequences and the roles of the common and private links, which we emphasize later on in Sections IV and V.

The optimal choice of the common description is, intuitively, to contain the common source S_2 intended for both users and, perhaps less intuitively, an additional description U_0, i.e., V_0 = (U_0, S_2), which is used to piggyback part of the source S_1 in the common codeword, though S_1 is not required by both receivers, in order to balance the asymmetry of the side information sequences. In Sections IV and V we show that the utility of this description depends on both the side information sequences and the rates of the private links.

The first model, the Heegard–Berger problem with successive refinement, is shown in Figure 3a.

In this section, we derive the optimal rate-distortion region for this setting, and show how it compares to existing results in the literature. Besides, we discuss the utility of the common description U_0 depending not only on the structure of the side information sequences, but also on the refinement link rate R_1. We illustrate through a binary example that the utility of U_0, namely the optimality of a non-degenerate choice U_0 ≠ ∅, is governed by the refinement link rate R_1 and the side information structure. The following corollary states the optimal rate-distortion region of the Heegard–Berger problem with successive refinement of Figure 3a.
Corollary 1. The rate-distortion region of the Heegard–Berger problem with successive refinement of Figure 3a is given by the set of rate-distortion triples (R_0, R_1, D_1) satisfying the corresponding rate constraints for some joint pmf such that:
1) the following Markov chain is valid; and
2) there exists a function φ : 𝒴_1 × 𝒰_0 × 𝒰_1 × 𝒮_2 → Ŝ_1 such that the distortion constraint is met.
Proof: The proof of Corollary 1 follows from that of Theorem 1 by setting R_2 = 0 therein.

Remark 4. Recall the coding scheme of Theorem 1. If R_2 = 0, the second partition of the codebook of the common description, which is relevant for Receiver 2, becomes degenerate since, in this case, all the codewords v_0^n of a superbin B_{00}(w̃_{0,0}) are assigned to a single subbin. Correspondingly, the common message that the encoder sends over the common link carries only the index w̃_{0,0} of the superbin B_{00}(w̃_{0,0}) of the codebook of the common description in which the typical pair v_0^n = (s_2^n, u_0^n) lies, in addition to the index w̃_{1,0} of the subbin B_{10}(w̃_{1,0}) of the codebook of the individual description in which the recovered typical u_1^n lies. The constraint (14a) on the common rate R_0 is consistent with the fact that Receiver 2 utilizes only the index w̃_{0,0} in the decoding. Furthermore, note that the constraints (14b) and (14c) on the sum-rate (R_0 + R_1) can be combined into a single constraint, which resembles the Heegard–Berger result of [2, Theorem 2, p. 733].

Remark 5. As we already mentioned, the result of Corollary 1 holds for side information sequences that are arbitrarily correlated with each other and with the sources. In the specific case in which the user who gets the refinement rate-limited link also has the "better-quality" side information, in the sense that (S_1, S_2) − Y_1 − Y_2 forms a Markov chain, the rate-distortion region of Corollary 1 reduces to the set of all rate-distortion triples (R_0, R_1, D_1) that satisfy the corresponding constraints for some joint pmf P_{U_0 U_1 S_1 S_2 Y_1 Y_2} for which (15) and (16) hold.

Remark 6. In the case in which it is the user who gets only the common rate-limited link that has the "better-quality" side information, in the sense that (S_1, S_2) − Y_2 − Y_1 forms a Markov chain, the rate-distortion region of Corollary 1 reduces to the set of all rate-distortion triples (R_0, R_1, D_1) that satisfy the corresponding constraints for some joint pmf for which (15) and (16) hold. This result can also be recovered from [3].

An important feature of the model of Figure 3a is that, depending on the rate of the refinement link R_1, resorting to a common auxiliary variable U_0 might be unnecessary. Indeed, in the case in which S_1 needs to be recovered losslessly at the first receiver, for instance, parts of the rate region can be achieved without resorting to the common auxiliary variable U_0, i.e., by setting U_0 = ∅, while other parts of the rate region can only be achieved through a non-trivial choice of U_0.

As such, if R_1 ≥ H(S_1|S_2 Y_1), then letting U_0 = ∅ yields the optimal rate region. To see this, note that the rate constraints under lossless reconstruction of S_1 can be written as in (21), where (x)^+ ≜ max{0, x}.

Under the constraint R_1 ≥ H(S_1|S_2 Y_1), the constraints in (21) reduce accordingly. Next, by noting that max_{P_{U_0|S_1 S_2}} H(S_1|S_2 Y_2 U_0) = H(S_1|S_2 Y_2) is achieved by U_0 = ∅, the claim follows.
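The maximization step uses nothing more than the fact that conditioning cannot increase entropy; as a one-line sketch:

```latex
H\!\left(S_1 \,\middle|\, S_2\, Y_2\, U_0\right) \;\le\; H\!\left(S_1 \,\middle|\, S_2\, Y_2\right),
```

with equality for the degenerate (constant) choice U_0 = ∅, so the maximum over all conditional pmfs P_{U_0|S_1 S_2} is indeed attained at U_0 = ∅.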

However, when R_1 < H(S_1|S_2 Y_1), the choice U_0 = ∅ might be strictly sub-optimal, as shown in the following binary example.

Let X_1, X_2, X_3 and X_4 be four independent Ber(1/2) random variables. Let the sources be S_1 ≜ (X_1, X_2, X_3) and S_2 ≜ X_4. Now, consider the Heegard–Berger model with successive refinement shown in Figure 5. The first user, which gets both the common and individual links, observes the side information Y_1 = (X_1, X_4) and wants to reproduce the pair (S_1, S_2) losslessly. The second user gets only the common link, has side information Y_2 = (X_2, X_3) and wants to reproduce only the component S_2, losslessly. The side information sequences at the decoders do not exhibit any degradedness ordering, in the sense that none of the Markov chain conditions of Remark 5 and Remark 6 holds. The following claim provides the rate region of this binary example.
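Since all of X_1, . . . , X_4 are independent unbiased bits, the conditional entropies that govern this example can be checked by brute-force enumeration. The following sketch (the helper names `H` and `cond_H` are ours, not from the paper) computes H(S_1|S_2 Y_1) and H(S_1|S_2 Y_2):

```python
from collections import Counter
from itertools import product
from math import log2

def H(idx, outcomes):
    """Joint entropy (in bits) of the coordinates in idx, outcomes equiprobable."""
    counts = Counter(tuple(o[i] for i in idx) for o in outcomes)
    n = len(outcomes)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def cond_H(a, b, outcomes):
    """Conditional entropy H(A | B) = H(A, B) - H(B)."""
    return H(a + b, outcomes) - H(b, outcomes)

# X1, X2, X3, X4 <-> coordinates 0..3; all 16 realizations equiprobable.
outcomes = list(product([0, 1], repeat=4))
S1, S2 = [0, 1, 2], [3]   # S1 = (X1, X2, X3), S2 = X4
Y1, Y2 = [0, 3], [1, 2]   # Y1 = (X1, X4),    Y2 = (X2, X3)

print(cond_H(S1, S2 + Y1, outcomes))  # H(S1 | S2, Y1) = 2.0 bits
print(cond_H(S1, S2 + Y2, outcomes))  # H(S1 | S2, Y2) = 1.0 bit
```

In particular, H(S_1|S_2 Y_1) = 2 bits, so the regime R_1 < H(S_1|S_2 Y_1) in which U_0 = ∅ may be strictly sub-optimal corresponds here to R_1 < 2.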

Claim 1. The rate region of the binary Heegard–Berger example with successive refinement of Figure 5 is given by the set of rate pairs (R_0, R_1) that satisfy the corresponding constraints.
Proof. The proof of Claim 1 follows easily by computing the rate region in the binary setting under study.

First, we note several simplifications that allow us to rewrite the rate region in this binary setting. The proof of the claim then follows by noticing that the relevant inequalities hold with equality for the choices U_0 = (X_2, X_3), U_0 = X_2, or U_0 = X_3.

The rate region of Claim 1 is depicted in Figure 6. It is insightful to notice that, although the second user is only interested in reproducing the component S_2 = X_4, the optimal coding scheme that achieves this region sets the common description that is destined to be recovered by both users as one that is composed of not only S_2 but also some part U_0 = (X_2, X_3), U_0 = X_2, or U_0 = X_3 of the source component S_1 (though the latter is not required by the second user). A possible intuition is that this choice of U_0 is useful for user 1, who wants to reproduce S_1 = (X_1, X_2, X_3), and its transmission also to the second user does not cost any rate since this user has available the side information Y_2 = (X_2, X_3).

Figure 6. Rate region of the binary example of Figure 5. The choices U_0 = (X_2, X_3), U_0 = X_2 or U_0 = X_3 are optimal irrespective of the value of R_1, while the degenerate choice U_0 = ∅ is optimal only in some slices of the region.

In the following, we consider the model of Figure 3b. As we already mentioned, the reader may find it appropriate, for motivation, to think of the side information Y_2^n as being of lower quality than Y_1^n, in which case the refinement link that is given to the second user is intended to improve its decoding capability. In this section, we describe the optimal coding scheme for this setting, and show that it can be recovered, independently, from the work of Timo et al. [15] through a careful choice of the coding sets. Next, we illustrate through a binary example the interplay between the utility of the common description U_0, the side information sequences, and the refinement rate R_2.

The following corollary states the rate-distortion region of the Heegard–Berger model with scalable coding of Figure 3b.

Corollary 2. The rate-distortion region of the Heegard–Berger model with scalable coding of Figure 3b is given by the set of all rate-distortion triples (R_0, R_2, D_1) that satisfy the corresponding rate constraints for some joint pmf P_{U_0 U_1 S_1 S_2 Y_1 Y_2} such that:
1) the following Markov chain is valid; and
2) there exists a function φ : 𝒴_1 × 𝒰_0 × 𝒰_1 × 𝒮_2 → Ŝ_1 satisfying the distortion constraint.
Proof. The proof of Corollary 2 follows from that of Theorem 1 by setting R_1 = 0 therein.

Remark 8. In the specific case in which Receiver 2 has better-quality side information, in the sense that (S_1, S_2) − Y_2 − Y_1 forms a Markov chain, the rate-distortion region of Corollary 2 reduces to one that is described by a single rate constraint, for some conditional pmf P_{U|S_1 S_2} that satisfies E[d_1(S_1, Ŝ_1)] ≤ D_1. This is in accordance with the observation that, in this case, the transmission to Receiver 1 becomes the bottleneck, as Receiver 2 can recover the source component S_2 losslessly as long as Receiver 1 does.

Applying [15, Theorem 1] to this setting yields a rate region described by rate constraints expressed, in the notation of [15, Theorem 1], through a function Φ(T_j, l) defined for j = 0, 1, 2 and l ∈ {1, 2} such that T_j ∩ {1, . . . , l} ≠ ∅; the terms evaluated in this case are given in Table 1. It is easy to see that the region described by (35) can be written more explicitly in this case as (37). Also, setting U_{12} = (U_0, S_2) and U_2 = S_2 in (37), one recovers the rate region of Corollary 2. (Such a connection can also be stated for the result of Corollary 1.)

Let the sources be S_1 ≜ (X_1, X_2, X_3) and S_2 ≜ X_4. Now, consider the Heegard–Berger model with scalable coding shown in Figure 7. The first user, which gets only the common link, observes the side information Y_1 = (X_1, X_4) and wants to reproduce the pair (S_1, S_2) losslessly. The second user gets both the common and private links, has side information Y_2 = (X_2, X_3) and wants to reproduce only the component S_2, losslessly.

Claim 2. The rate region of this binary example is given by the set of all rate pairs (R_0, R_2) that satisfy R_2 ≥ 0 and R_0 ≥ 2.

Proof. The proof of Claim 2 follows easily by specializing, and computing, the result of Remark 9 for the example at hand. First note that the relevant inequalities are satisfied with equality with U_0 = (X_2, X_3), U_0 = X_2 or U_0 = X_3.
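The common-rate bound of Claim 2 can also be checked numerically. With the common description V_0 = (U_0, S_2) = (X_2, X_3, X_4), both receivers must decode V_0 from the same common index, so R_0 ≥ max{H(V_0|Y_1), H(V_0|Y_2)}. A brute-force sketch (the helper name `cond_H` is ours):

```python
from collections import Counter
from itertools import product
from math import log2

def cond_H(a, b, outcomes):
    """Conditional entropy H(A | B) in bits over equiprobable outcomes."""
    def H(idx):
        counts = Counter(tuple(o[i] for i in idx) for o in outcomes)
        n = len(outcomes)
        return -sum((c / n) * log2(c / n) for c in counts.values())
    return H(a + b) - H(b)

# X1..X4 <-> coordinates 0..3, i.i.d. Ber(1/2).
outcomes = list(product([0, 1], repeat=4))
V0 = [1, 2, 3]            # V0 = (U0, S2) with U0 = (X2, X3) and S2 = X4
Y1, Y2 = [0, 3], [1, 2]   # Y1 = (X1, X4), Y2 = (X2, X3)

print(cond_H(V0, Y1, outcomes))  # H(V0 | Y1) = 2.0 -> binding constraint
print(cond_H(V0, Y2, outcomes))  # H(V0 | Y2) = 1.0 -> satisfied for free
```

So a common rate R_0 = 2 already serves Receiver 2, consistent with the claim that the refinement link of rate R_2 brings no gain in this example.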

Note as well that the single rate constraint on R_0 renders the sum-rate constraint redundant, which ends the proof of the claim.

The optimal rate region of Claim 2 is depicted in Figure 8, as the region delimited by the lines R_0 = 2 and R_2 = 0. Note that, for this example, the source components X_2 and X_3, which Receiver 2 observes as side information but Receiver 1 must still recover, need to be transmitted entirely on the common link, since Receiver 1 accesses only that link. For this reason, the refinement link is unconstrained and appears to be useless for this example.

This is in sharp contrast with the binary Heegard–Berger example with successive refinement of Figure 5, for which the refinement link may sometimes be instrumental in reducing the required rate on the common link. With scalable coding, the refinement link of rate R_2 does not reduce the rate that must be transmitted on the common link.

Also, it is insightful to notice that, for this example, because of the side information configuration, the choice U_0 = ∅ in Corollary 2 is strictly suboptimal and results in a smaller region.

Figure 8. The optimal rate region for the setting of Figure 7, given by (R_0 ≥ 2, R_2 ≥ 0). The choice U_0 = ∅ is optimal only in a slice of the region.

Proof of Theorem 1
In the following, we give the proof of the converse part and the direct part of Theorem 1.

The converse part strongly depends on the system model we investigate and consists of a series of careful bounding steps resorting to Fano's inequality, Markov chains, and the Csiszár–Körner sum-identity.

The proof of achievability is two-fold, and consists in proving a general result that holds for a Gray–Wyner setting with side information, and then deriving the optimal choice of the auxiliary codewords involved for the specific setting with degraded reconstruction sets.

Assume that a rate triple (R_0, R_1, R_2) is D_1-achievable. Let W_j = f_j(S_1^n, S_2^n), where j ∈ {0, 1, 2}, be the encoded indices, and let Ŝ_1^n = g_1(W_0, W_1, Y_1^n) be the reconstruction sequence at the first decoder, such that E[d_1^{(n)}(S_1^n, Ŝ_1^n)] ≤ D_1.

Using Fano's inequality, the lossless reconstruction of the source S_2^n at both decoders implies that there exists a sequence ε_n → 0 as n → ∞ such that (41) and (42) hold. We start by showing the sum-rate constraint. We have a chain of inequalities culminating in

≥ I(W_0 W_2; S_1^n S_2^n | Y_2^n) + I(W_1; S_1^n | W_0 S_2^n Y_1^n)   (44e)

where (a) stems from Fano's inequality (42), which results from the lossless reconstruction of S_2^n at Receiver 2.
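For reference, the form of Fano's inequality invoked here is the standard one; written in our notation (with ε_n absorbing the vanishing terms), the bound at Receiver 2 reads:

```latex
H\!\left(S_2^n \,\middle|\, W_0,\, W_2,\, Y_2^n\right)
\;\le\; 1 + P_e^{(n)}\, n \log_2 |\mathcal{S}_2|
\;\triangleq\; n\,\epsilon_n ,
```

which vanishes per symbol as n → ∞ since P_e^{(n)} → 0; the analogous bound at Receiver 1 conditions on (W_0, W_1, Y_1^n).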

Let us then define the quantities A ≜ I(W_0 W_2; S_1^n S_2^n | Y_2^n) and B ≜ I(W_1; S_1^n | W_0 S_2^n Y_1^n). In the following, we aim for single-letter bounds on the two quantities A and B.

Since the side information sequences Y_1^n and Y_2^n are not degraded and do not exhibit any particular structure together with the sources (S_1^n, S_2^n), single-letterizing the quantity A requires some care. Let us start with a chain of (in)equalities in which we set U_{0,i} ≜ (W_0, Y_2^{i−1}, Y_{1,i+1}^n, S_2^{<i>}) (note that the lossless reconstruction of S_2^n at both receivers is instrumental to this definition of U_0, which plays the role of the common auxiliary variable in the proof of the converse), where (a) and (b) follow using the Csiszár–Körner sum-identity, while (c) is the consequence of a sequence of Markov chains, in which (50a) results from the fact that the source sequences (S_1^n, S_2^n, Y_1^n, Y_2^n) are memoryless, while (a) is a consequence of the fact that W_0 is a function of the pair of sequences (S_1^n, S_2^n).
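For the reader's convenience, we recall the Csiszár–Körner sum-identity in a generic conditional form (the steps above instantiate it with particular choices of the sequences and of the conditioning variable W):

```latex
\sum_{i=1}^{n} I\!\left(B_{i+1}^{n};\, A_i \,\middle|\, A^{i-1},\, W\right)
\;=\;
\sum_{i=1}^{n} I\!\left(A^{i-1};\, B_i \,\middle|\, B_{i+1}^{n},\, W\right),
```

valid for any pair of sequences (A^n, B^n) and any random variable W; it follows by expanding the same quantity with the chain rule in two different orders.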

To upper-bound the term B, note the chain of inequalities in (51), where (a) is a consequence of a sequence of Markov chains in which (52a) results from the fact that the source sequences (S_1^n, S_2^n, Y_1^n, Y_2^n) are memoryless, while the remaining steps are a consequence of the fact that W_0 and W_1 are each a function of the pair of sequences (S_1^n, S_2^n).

Finally, letting U_{1,i} ≜ (W_1, Y_1^{i−1}) so that the choice of (U_{0,i}, U_{1,i}) satisfies the condition Ŝ_{1,i} = g_i(Y_{1,i}, U_{0,i}, U_{1,i}, S_{2,i}), we write the resulting sum-rate constraint in single-letter form. Let us now prove that the second rate constraint holds. We have a chain of inequalities in which (a) is a consequence of Fano's inequality in (41), which results from the lossless reconstruction of S_2^n at Receiver 1, and (b) results from the upper bound on B in (51e).

As for the third rate constraint, we write a chain of inequalities where (a) is a consequence of Fano's inequality in (42) and (b) stems from a sequence of Markov chains in which (58a) results from the fact that the source sequences (S_1^n, S_2^n, Y_1^n, Y_2^n) are memoryless, while the remaining steps are a consequence of the fact that W_0 and W_1 are each a function of the pair of sequences (S_1^n, S_2^n).

Let Q be an integer-valued random variable, uniformly distributed over [1 : n] and independent of all the other variables (S_1, S_2, U_0, U_1, Y_1, Y_2). We then have the standard single-letterization steps, where (a) is a consequence of the fact that all the sources (S_1^n, S_2^n, Y_1^n, Y_2^n) are memoryless.
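The time-sharing step is standard; as a sketch, with the usual single-letter identifications (our notation, for one representative term):

```latex
\frac{1}{n}\sum_{i=1}^{n} I\!\left(U_{0,i} U_{1,i};\, S_{1,i} \,\middle|\, S_{2,i}\, Y_{1,i}\right)
= I\!\left(U_{0,Q} U_{1,Q};\, S_{1,Q} \,\middle|\, S_{2,Q}\, Y_{1,Q},\, Q\right)
= I\!\left(U_0 U_1;\, S_1 \,\middle|\, S_2\, Y_1\right),
```

where U_0 ≜ (U_{0,Q}, Q), U_1 ≜ U_{1,Q}, S_1 ≜ S_{1,Q}, S_2 ≜ S_{2,Q} and Y_1 ≜ Y_{1,Q}; the last equality uses that (S_{1,Q}, S_{2,Q}, Y_{1,Q}) is distributed as the generic triple (S_1, S_2, Y_1) and is independent of Q, by memorylessness of the sources.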

Our scheme has the following parameters: a conditional joint pmf P_{V_0 U_1 | S_1 S_2} that satisfies (63) and

3) Reveal all the codebooks and their partitions to the encoder, the codebook {v_0^n(k_0)} and its partitions to both receivers, and the codebook {u_1^n(k_1, k_0)} and its partitions to Receiver 1 only.

This completes the proof of the proposition, and hence that of the direct part of Theorem 1.