Structural Properties of the Wyner–Ziv Rate Distortion Function: Applications for Multivariate Gaussian Sources

The main focus of this paper is the derivation of structural properties of the test channels of Wyner's operational information rate distortion function (RDF), R̄(∆_X), for arbitrary abstract sources and, subsequently, the derivation of additional properties for a tuple of multivariate, correlated, jointly independent and identically distributed Gaussian random variables, {X_t, Y_t}_{t=1}^∞, X_t : Ω → R^{n_x}, Y_t : Ω → R^{n_y}, with average mean-square error at the decoder and the side information, {Y_t}_{t=1}^∞, available only at the decoder. For the tuple of multivariate correlated Gaussian sources, we construct optimal test channel realizations which achieve the informational RDF, R̄(∆_X) ≜ inf_{M(∆_X)} I(X; Z|Y), where M(∆_X) is the set of auxiliary RVs Z such that P_{Z|X,Y} = P_{Z|X}, X̂ = f(Y, Z), and E{||X − X̂||²} ≤ ∆_X. We show the following fundamental structural properties: (1) optimal test channel realizations that achieve the RDF satisfy the conditional independence P_{X|X̂,Y,Z} = P_{X|X̂,Y} = P_{X|X̂} and E{X|X̂, Y, Z} = E{X|X̂} = X̂; (2) for the conditional RDF, R_{X|Y}(∆_X), when the side information is available to both the encoder and the decoder, the equality R̄(∆_X) = R_{X|Y}(∆_X) holds; (3) we derive the water-filling solution for R_{X|Y}(∆_X).


The Wyner and Ziv Lossy Compression Problem and Generalizations
Wyner and Ziv [1] derived an operational information definition for the lossy compression problem of Figure 1 with respect to a single-letter fidelity of reconstruction. The joint sequence of random variables (RVs) {(X_t, Y_t) : t = 1, 2, . . .} takes values in sets of finite cardinality, {X, Y}, and is generated independently according to the joint probability distribution P_{X,Y}. Wyner [2] generalized [1] to RVs {(X_t, Y_t) : t = 1, 2, . . .} that take values in abstract alphabet spaces {X, Y} and hence include continuous-valued RVs. (A) Switch "A" Closed: When the side information {Y_t : t = 1, 2, . . .} is available noncausally at both the encoder and the decoder, Wyner [2] (see also Berger [3]) characterized the infimum of all achievable operational rates (denoted by R_1(∆_X) in [2]), subject to a single-letter fidelity with average distortion less than or equal to ∆_X ∈ [0, ∞). The rate is given by the single-letter operational information theoretic conditional RDF

R_{X|Y}(∆_X) = inf_{M_0(∆_X)} I(X; X̂|Y),   (1)

where M_0(∆_X) is the set specified by

M_0(∆_X) ≜ { X̂ : Ω → X̂ : E{d_X(X, X̂)} ≤ ∆_X },   (3)

and X̂ is the reproduction of X, I(X; X̂|Y) is the conditional mutual information between X and X̂ conditioned on Y, and d_X(·, ·) is the fidelity criterion between x and x̂. The infimum in (1) is over all elements of M_0(∆_X) with induced joint distributions P_{X,Y,X̂} of the RVs (X, Y, X̂) such that the marginal distribution P_{X,Y} is the fixed joint distribution of the source (X, Y). This problem is equivalent to (2) [4].
(B) Switch "A" Open: When the side information is available noncausally only at the decoder, Wyner [2] characterized the infimum of all achievable operational rates (denoted by R*(∆_X) in [2]), subject to a single-letter fidelity with average distortion less than or equal to ∆_X. The rate is given by the single-letter operational information theoretic RDF as a function of an auxiliary RV Z : Ω → Z:

R(∆_X) = inf_{M(∆_X)} I(X; Z|Y),   (5)

where M(∆_X) is specified by the set of auxiliary RVs Z and defined as

M(∆_X) ≜ { Z : Ω → Z : P_{X,Y,Z,X̂} is the joint measure on X × Y × Z × X̂, P_{Z|X,Y} = P_{Z|X}, ∃ measurable function f : Y × Z → X̂, X̂ = f(Y, Z), E{d_X(X, X̂)} ≤ ∆_X }.   (6)
Wyner's realization of the joint measure P_{X,Y,Z,X̂} induced by the RVs (X, Y, Z, X̂) is illustrated in Figure 2, where Z is the output of the "test channel", P_{Z|X}. Clearly, R(∆_X) involves two strategies, i.e., f(·, ·) and P_{Z|X,Y} = P_{Z|X}. This makes it a much more complex problem compared to R_{X|Y}(∆_X) (which involves only P_{X̂|X,Y}). Throughout [2], the following assumption is imposed. Assumption 1: I(X; Y) < ∞.
Wyner [2] considered scalar-valued jointly Gaussian RVs (X, Y) with square-error distortion and constructed the optimal realizations X̂ and (Z, X̂) and the function f(Y, Z) from the sets M_0(∆_X) and M(∆_X), respectively. He also showed that these realizations achieve the characterizations of the RDFs R_{X|Y}(∆_X) and R(∆_X), respectively, and that the two rates are equal, i.e., R(∆_X) = R_{X|Y}(∆_X).
(C) Marginal RDF: If there is no side information, {Y_t : t = 1, 2, . . .}, or the side information is independent of the source, {X_t : t = 1, 2, . . .}, the RDFs R_{X|Y}(∆_X) and R(∆_X) degenerate to the marginal RDF R_X(∆_X), defined by

R_X(∆_X) = inf_{E{d_X(X,X̂)} ≤ ∆_X} I(X; X̂).

(D) Gray's Lower Bounds: A lower bound on R_{X|Y}(∆_X) is given by Gray in [4] (Theorem 3.1). This bound connects R_{X|Y}(∆_X) with the marginal RDF and the mutual information between X and Y as follows:

R_{X|Y}(∆_X) ≥ R_X(∆_X) − I(X; Y).   (8)

Clearly, the lower bound is trivial for values of ∆_X ∈ [0, ∞) such that R_X(∆_X) − I(X; Y) < 0.
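As a quick numerical illustration (the helper below and its parameter values are ours, not from [4]), for scalar jointly Gaussian (X, Y) with correlation coefficient ρ, Gray's lower bound (8) holds with equality whenever ∆_X ≤ σ²_{X|Y} = (1 − ρ²)σ²_X, since then R_{X|Y}(∆_X) = ½ log(σ²_{X|Y}/∆_X) and I(X; Y) = −½ log(1 − ρ²):

```python
import math

def gray_bound_scalar(sigma_x2, rho, delta):
    """Compare the conditional RDF R_{X|Y}(delta) with Gray's lower
    bound R_X(delta) - I(X;Y) for scalar jointly Gaussian (X, Y).
    Rates in nats; valid for 0 < delta <= (1 - rho^2) * sigma_x2."""
    sigma_cond2 = (1.0 - rho ** 2) * sigma_x2            # Var(X | Y)
    r_marginal = 0.5 * math.log(sigma_x2 / delta)        # R_X(delta)
    i_xy = -0.5 * math.log(1.0 - rho ** 2)               # I(X; Y)
    r_conditional = 0.5 * math.log(sigma_cond2 / delta)  # R_{X|Y}(delta)
    return r_conditional, r_marginal - i_xy

r_cond, gray_lb = gray_bound_scalar(sigma_x2=4.0, rho=0.6, delta=1.0)
# Here delta = 1.0 <= (1 - 0.36) * 4.0 = 2.56, so the bound is tight.
```

For ∆_X > σ²_{X|Y} the two expressions part ways, which is exactly the distortion-region issue analyzed in contribution (b.3) below.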

Main Contributions of the Paper
We first consider Wyner's [2] RDFs R_{X|Y}(∆_X) and R(∆_X) for arbitrary RVs (X, Y) defined on abstract alphabet spaces, and we derive structural properties of the realizations that achieve the two optimal test channels. Subsequently, we generalize Wyner's [2] results to multivariate-valued jointly Gaussian RVs (X, Y). In other words, we construct the optimal multivariate-valued realizations X̂ and (Z, X̂) and the function f(Y, Z) which achieve the RDFs R_{X|Y}(∆_X) and R(∆_X), respectively. In the literature, this is often called achievability of the converse coding theorem. In addition, we use the realizations to prove the equality R(∆_X) = R_{X|Y}(∆_X) and to derive the water-filling solution. Along the way, we verify that our results reproduce, for scalar-valued RVs (X, Y), Wyner's [2] RDFs and the optimal realizations. However, to our surprise, the existing results from the literature ([5], Theorem 4 and Abstract; [6], Theorem 3A), which deal with the more general multivariate-valued remote sensor problem (the RDF of the remote sensor problem is a generalization of Wyner's RDF R(∆_X), with the encoder observing a noisy version of the RVs generated by the source), do not degenerate to Wyner's [2] RDFs when specialized to scalar-valued RVs (we verify this in Remark 5, also checking the correction suggested in https://tiangroup.engr.tamu.edu/publications/ (accessed on 3 January 2024)). In Section 1.3, we give a detailed account of the main results of this paper. We emphasize that preliminary results of this paper appeared in [7], mostly without the details of the proofs. This paper extends [7] and contains complete proofs of the preliminary results of [7], which in some cases are lengthy (see, for example, Section 4, proofs of Theorems 3-5 and Corollaries 1 and 2).

(a)
We consider a tuple of jointly independent and identically distributed (i.i.d.) arbitrary RVs (X^n, Y^n) = {(X_t, Y_t) : t = 1, 2, . . ., n} defined on abstract alphabet spaces, and we derive the following results. (a.1) Lemma 1: An achievable lower bound on the conditional mutual information I(X; X̂|Y), which strengthens Gray's lower bound (8).
(a.2) Theorem 2: Structural properties of the optimal reconstruction X̂, which achieves a lower bound on R_{X|Y}(∆_X) for mean-square error distortion. Theorem 2 strengthens the conditions for the equality R_{X|Y}(∆_X) = R(∆_X) given by Wyner [2] (Remarks, p. 65) (see Remark 1). However, for finite-alphabet-valued sources with Hamming distance distortion, it might be the case that R_{X|Y}(∆_X) < R(∆_X), as pointed out by Wyner and Ziv [1] (Section 3) for the doubly symmetric binary source.
(b) We consider a tuple of jointly i.i.d. multivariate Gaussian RVs (X^n, Y^n) = {(X_t, Y_t) : t = 1, 2, . . ., n}, with respect to the square-error fidelity, as defined below,
where n_x, n_y are arbitrary positive integers, X ∈ N(0, Q_X) means X is a Gaussian RV with zero mean and covariance matrix Q_X, and ||·||_{R^{n_x}} is the Euclidean norm on R^{n_x}. To give additional insight, we often consider the following realization of the side information (the condition DD^T ≻ 0 ensures I(X; Y) < ∞, and hence Assumption 1 is respected):
Y_t = C X_t + D V_t, V_t ∈ N(0, I_{n_y}), t = 1, 2, . . ., n, V^n independent of X^n,   (15)

where I_{n_y} denotes the n_y × n_y identity matrix. For the above specification of the source and distortion criterion, we derive the following results. (b.1) Theorems 3 and 4: Structural properties of the optimal realization of X̂, which achieves R_{X|Y}(∆_X), and its closed form expression.
(b.2) Theorem 5: Structural properties of the optimal realization of X̂ with X̂ = f(Y, Z), which achieves R(∆_X), and the closed form expression of R(∆_X).
(b.3) A proof that R(∆_X) and R_{X|Y}(∆_X) coincide: calculation of the distortion region such that Gray's lower bound (8) holds with equality.
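The second-order statistics induced by the side-information model Y = CX + DV can be computed directly; the sketch below (matrix values are illustrative, ours, not from the paper) also shows why DD^T ≻ 0 keeps Q_Y invertible and hence I(X; Y) < ∞:

```python
import numpy as np

# Illustrative second-order statistics of the side-information model
# Y = C X + D V, with V ~ N(0, I_ny) independent of X (values are ours).
nx, ny = 3, 2
Q_X = np.diag([4.0, 2.0, 1.0])                  # source covariance
C = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5]])
D = np.diag([1.0, 0.5])                         # D @ D.T is positive definite

Q_Y = C @ Q_X @ C.T + D @ D.T                   # covariance of Y
Q_XY = Q_X @ C.T                                # cross covariance of (X, Y)
# Conditional covariance (Schur complement): Q_{X|Y} = Q_X - Q_XY Q_Y^{-1} Q_XY^T
Q_X_given_Y = Q_X - Q_XY @ np.linalg.solve(Q_Y, Q_XY.T)
```

Since DD^T ≻ 0, Q_Y ⪰ DD^T is invertible regardless of C, and Q_{X|Y} ⪯ Q_X with equality when C = 0, i.e., the no-side-information case discussed in (C) above.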
In Remark 4, we consider the tuple of scalar-valued, jointly Gaussian RVs (X, Y) with square-error distortion function and verify that our optimal realizations of X̂ and the closed form expressions for R_{X|Y}(∆_X) and R(∆_X) are identical to Wyner's [2] realizations and RDFs.
We emphasize that our methodology differs from past studies in that we focus on the structural properties of the realizations of the test channels that achieve the characterizations of the two RDFs (i.e., verification of the converse coding theorem). Our derivations are generic and bring new insight into the construction of realizations that induce the optimal test channels of other distributed source coding problems (i.e., establishing the achievability of the converse coding theorem).

Additional Generalizations of the Wyner-Ziv [1] and Wyner [2] RDFs
Below, we discuss additional generalizations of the Wyner and Ziv [1] and Wyner [2] RDFs. (A) Draper and Wornell [8] Distributed Remote Source Coding Problem: Draper and Wornell [8] generalized the RDF R(∆_X) to the case where the source to be estimated at the decoder is S : Ω → S and is not directly observed at the encoder. Rather, the encoder observes a RV X : Ω → X (which is correlated with S), while the decoder observes another RV, as side information, Y : Ω → Y, which provides information on (S, X). The aim is to reconstruct S at the decoder by Ŝ : Ω → Ŝ, subject to an average distortion E{d_S(S, Ŝ)} ≤ ∆_S, by a function Ŝ = f(Y, Z). The RDF for this problem, called the distributed remote source coding problem, is defined by

R^{PO}(∆_S) = inf_{M^{PO}(∆_S)} I(X; Z|Y),

where M^{PO}(∆_S) is specified by the set of auxiliary RVs Z and defined as

M^{PO}(∆_S) ≜ { Z : Ω → Z : P_{Z|S,X,Y} = P_{Z|X}, ∃ measurable function f^{PO} : Y × Z → Ŝ, Ŝ = f^{PO}(Y, Z), E{d_S(S, Ŝ)} ≤ ∆_S }.

Clearly, if S = X−a.s. (almost surely), then R^{PO}(∆_S) degenerates to R(∆_X) (this implies the optimal test channel that achieves the characterization of the RDF R^{PO}(∆_S) should degenerate to the optimal test channel that achieves the characterization of the RDF R(∆_X)).
For scalar-valued jointly Gaussian RVs (S, X, Y, Z, Ŝ) with square-error distortion, Draper and Wornell [8] (Equation (3) and Appendix A.1) derived the characterization of the RDF R^{PO}(∆_S) and constructed the optimal realization Ŝ = f^{PO}(Y, Z), which achieves this characterization.
However, it will become apparent in Remark 5 that, when S = X−almost surely (a.s.), and hence R^{PO}(∆_S) = R(∆_X), the RDFs given in [5] (Theorem 4) and [6] (Theorem 3A) do not produce Wyner's [2] value. We also show in Remark 5 that the same technical issues occur for the correction suggested in https://tiangroup.engr.tamu.edu/publications/ (accessed on 3 January 2024). Similarly, when S = X−a.s. and Y = X−a.s., [5] (Theorem 4) and [6] (Theorem 3A) do not produce the classical RDF R_X(∆_X) of the Gaussian source X. (B) Additional Literature Review: The formulation of Figure 1 has been generalized to other multiterminal or distributed lossy compression problems, such as relay networks, sensor networks, etc., under various code formulations and assumptions. Oohama [9] analyzed lossy compression problems for a tuple of scalar correlated Gaussian memoryless sources with square-error distortion criterion and determined the rate-distortion region in the special case when one source provides partial side information to the other. Furthermore, Oohama [10] analyzed separate lossy compression problems for L + 1 scalar correlated Gaussian memoryless sources, when L of the sources provide partial side information at the decoder for the reconstruction of the remaining source, and gave a partial answer to the rate distortion region. Additionally, ref. [10] proved that the problem of [10] includes, as a special case, the additive white Gaussian CEO problem analyzed by Viswanathan and Berger [11]. Extensions of [10] are derived by Ekrem and Ulukus [12] and Wang and Chen [13], where an outer bound on the rate region is derived for the vector Gaussian multiterminal source. Additional works are [14–16] and the references therein.
The vast literature on multiterminal or distributed lossy compression of jointly Gaussian sources with square-error distortion (including the references mentioned above) is often confined to scalar-valued correlated RVs. Moreover, as is easily verified, not much emphasis is given in the literature to the structural properties of the realizations of the RVs that induce the optimal test channels achieving the characterizations of the RDFs.
The rest of the paper is organized as follows. In Section 2, we review Wyner's [2] operational definition of lossy compression. We also state a fundamental theorem on mean-square estimation that we use throughout the paper in the analysis of (b). The main theorems are presented in Section 3; some of the proofs, including the structural properties, are given in Section 4. Connections between our results and the past literature are provided in Section 5. A simulation showing the gap between the two rates is given in the same section.

Preliminaries
In this section, we review the Wyner [2] source coding problems with fidelity of Figure 1. We begin with the notation, which follows [2] closely. For any matrix A ∈ R^{p×m}, (p, m) ∈ Z_+ × Z_+, we denote its kernel by ker(A), its transpose by A^T, and, for m = p, its trace by trace(A); diag{A} denotes the matrix with diagonal entries A_{ii}, i ∈ Z_p, and zeros elsewhere. The determinant of a square matrix A is denoted by det(A). The identity matrix with dimensions p × p is designated I_p. Denote an arbitrary set or space by U and the product space formed by n copies of it by U^n ≜ ×_{t=1}^n U; u^n ∈ U^n denotes the n-tuple u^n ≜ (u_1, u_2, . . ., u_n), where u_k ∈ U, k ∈ Z_n, are its coordinates. Denote a probability space by (Ω, F, P). For a sub-sigma-field G ⊆ F and A ∈ F, denote by P(A|G) the conditional probability of A given G; i.e., P(A|G) = P(A|G)(ω), ω ∈ Ω, is a measurable function on Ω.
On the above probability space, consider two real-valued random variables (RVs) X : Ω → X, Y : Ω → Y, where (X, B(X)), (Y, B(Y)) are arbitrary measurable spaces. The measure (or joint distribution if X, Y are Euclidean spaces) induced by (X, Y) on X × Y is denoted by P_{X,Y} or P(dx, dy), and their marginals on X and Y by P_X and P_Y, respectively. The conditional measure of the RV X conditioned on Y is denoted by P_{X|Y}, or by P(dx|y) when Y = y is fixed. On the above probability space, consider three real-valued RVs X : Ω → X, Y : Ω → Y, Z : Ω → Z. We say that the RVs (Y, Z) are conditionally independent given the RV X if P_{Y,Z|X} = P_{Y|X} P_{Z|X}−a.s. (almost surely), or equivalently P_{Z|X,Y} = P_{Z|X}−a.s.; the specification a.s. is often omitted. We often denote this conditional independence by the Markov chain (MC) Y ↔ X ↔ Z.
Finally, for RVs X, Y, etc., H(X) denotes the differential entropy of X, H(X|Y) the conditional differential entropy of X given Y, and I(X; Y) the mutual information between X and Y, as defined in standard books on information theory [17,18]. We use log(·) to denote the natural logarithm. The notation X ∈ N(0, Q_X) means X is a Gaussian distributed RV with zero mean and covariance Q_X ⪰ 0, where Q_X ⪰ 0 (resp. Q_X ≻ 0) means Q_X is positive semidefinite (respectively, positive definite). We denote the covariance of X and Y by

Q_{X,Y} ≜ cov(X, Y) = E{(X − E{X})(Y − E{Y})^T}.

We denote the covariance of X conditioned on Y by

Q_{X|Y} ≜ cov(X, X|Y) = E{(X − E{X|Y})(X − E{X|Y})^T | Y} = E{(X − E{X|Y})(X − E{X|Y})^T},

where the second equality is due to a property of jointly Gaussian RVs.

Mean-Square Estimation of Conditionally Gaussian RVs
Below, we state a well-known property of conditionally Gaussian RVs from [19], which we use in our derivations.
Proposition 1. Conditionally Gaussian RVs [19]. Consider a pair of multivariate RVs X = (X_1, . . ., X_{n_x})^T : Ω → R^{n_x} and Y = (Y_1, . . ., Y_{n_y})^T : Ω → R^{n_y}, (n_x, n_y) ∈ Z_+ × Z_+, defined on some probability space (Ω, F, P). Let G ⊆ F be a sub-σ-algebra. Assume the conditional distribution of (X, Y) conditioned on G, i.e., P(dx, dy|G), is P−a.s. (almost surely) Gaussian, with conditional means

µ_{X|G} ≜ E{X|G}, µ_{Y|G} ≜ E{Y|G},

and conditional covariances

Q_{X|G} ≜ cov(X, X|G), Q_{Y|G} ≜ cov(Y, Y|G), Q_{X,Y|G} ≜ cov(X, Y|G).

Then, the vectors of conditional expectations µ_{X|Y,G} ≜ E{X|Y, G} and the conditional covariances Q_{X|Y,G} ≜ cov(X, X|Y, G) are given, P−a.s. (assuming Q_{Y|G} is invertible; otherwise the inverse is replaced by the pseudoinverse), by

µ_{X|Y,G} = µ_{X|G} + Q_{X,Y|G} Q_{Y|G}^{-1} (Y − µ_{Y|G}),   (26)

Q_{X|Y,G} = Q_{X|G} − Q_{X,Y|G} Q_{Y|G}^{-1} Q_{X,Y|G}^T.   (27)

If G is the trivial information, i.e., G = {Ω, ∅}, then G is removed from the above expressions.
Note that, if G = {Ω, ∅}, then (26) and (27) reduce to the well-known conditional mean and conditional covariance of X conditioned on Y.
For Gaussian RVs, we make use of the following properties.
Let S be a matrix of appropriate dimensions, and denote by F^X and F^{SX} the σ-algebras generated by the RVs X and SX, respectively. The following hold.
Proof. This is well-known in measure theory; see [20].
Proof. This is well-known in probability theory; see [20].

Wyner's Coding Theorems with Side Information at the Decoder
For the sake of completeness, we introduce certain results from Wyner's work in [2], which we use in this paper. On a probability space (Ω, F, P), consider a tuple of jointly i.i.d. RVs (X^n, Y^n) = {(X_t, Y_t) : t = 1, 2, . . ., n} with induced distribution P_{X_t,Y_t} = P_{X,Y}, ∀t. Consider also the measurable distortion function d_X : X × X̂ → [0, ∞), and let I_M ≜ {0, 1, . . ., M − 1} be a finite set.
A code (n, M, D_X), when switch "A" is open (see Figure 1), is defined by two measurable functions, the encoder F_E and the decoder F_D, with average distortion, as follows.
where X̂^n is again a sequence of RVs. A pair (R, ∆_X) is said to be achievable if, for every ϵ > 0 and n sufficiently large, there exists a code (n, M, D_X) such that

M ≤ 2^{n(R+ϵ)}, D_X ≤ ∆_X + ϵ.

Let R denote the set of all achievable pairs (R, ∆_X), and define, for ∆_X ≥ 0, the infimum of all achievable rates by

R*(∆_X) ≜ inf_{(R,∆_X)∈R} R.

For arbitrary abstract spaces, Wyner [2] characterized the infimum of all achievable rates R*(∆_X) by the single-letter RDF R(∆_X) given by (5) and (6), in terms of an auxiliary RV Z : Ω → Z. Wyner's realization of the joint measure P_{X,Y,Z,X̂} induced by the RVs (X, Y, Z, X̂) is illustrated in Figure 2, where Z is the output of the "test channel", P_{Z|X}. Wyner proved the following coding theorems.
In Figure 1, when switch "A" is closed and the tuple of jointly independent and identically distributed RVs (X^n, Y^n) is defined as in Section 2.3, Wyner [2] generalized Berger's [3] characterization of all achievable pairs (R, ∆_X) from finite alphabet spaces to abstract alphabet spaces.
A code (n, M, D_X), when switch "A" is closed (see Figure 1), is defined as in Section 2.3, with the encoder F_E replaced by an encoder that also observes the side information Y^n. Let R_1 denote the set of all achievable pairs (R, ∆_X), again as defined in Section 2.3. For ∆_X ≥ 0, define the infimum of all achievable rates by R_1(∆_X) ≜ inf_{(R,∆_X)∈R_1} R. Wyner [2] characterized the infimum of all achievable rates R_1(∆_X) by the single-letter RDF R_{X|Y}(∆_X) given by (1) and (3). The coding theorems are given by Theorem 1, with R*(∆_X) and R(∆_X) replaced by R_1(∆_X) and R_{X|Y}(∆_X), respectively; that is, R_1(∆_X) = R_{X|Y}(∆_X) (using Wyner's notation [2] (Appendix A.1)). These coding theorems generalized earlier work of Berger [3] for finite alphabet spaces. Wyner also derived a fundamental lower bound on R*(∆_X) in terms of R_1(∆_X), as stated in the next remark.
Remark 1. Wyner [2] (Remarks, p. 65). (A) For Z ∈ M(∆_X), X̂ = f(Y, Z), and thus P_{Z|X,Y} = P_{Z|X}. Then, by a property of conditional mutual information and the data processing inequality,

I(X; Z|Y) ≥ I(X; X̂|Y) ≥ R_{X|Y}(∆_X),   (36)

where the last inequality holds since X̂ ∈ M_0(∆_X) (see [2], Remarks, p. 65). Moreover, minimizing (36) over M(∆_X) gives

R(∆_X) ≥ R_{X|Y}(∆_X).   (37)

(B) Inequality (37) holds with equality, i.e., R(∆_X) = R_{X|Y}(∆_X), if the optimal X̂ ∈ M_0(∆_X) can be generated as in Figure 2 with I(X; Z|Y) = I(X; X̂|Y). This occurs if and only if I(X; Z|X̂, Y) = 0, and follows from the identity and lower bound

I(X; Z|Y) = I(X; X̂|Y) + I(X; Z|X̂, Y) ≥ I(X; X̂|Y),

where the inequality holds with equality if and only if I(X; Z|X̂, Y) = 0.

Main Theorems and Discussion
In this section, we state the main results of this paper. These are the achievable lower bounds of Lemma 1 and Theorem 2, which hold for RVs defined on general abstract alphabet spaces, and Theorems 4 and 5, which hold for multivariate Gaussian RVs.

Side Information at Encoder and Decoder for an Arbitrary Source
We start with the following achievable lower bound on the conditional mutual information I(X; X̂|Y), which appears in the definition of R_{X|Y}(∆_X) in (1); this strengthens Gray's lower bound (8) ([4], Theorem 3.1).
Lemma 1. Achievable lower bound on conditional mutual information. Let (X, Y, X̂) be a triple of arbitrary RVs taking values in the abstract spaces X × Y × X̂, with distribution P_{X,Y,X̂} and joint marginal the fixed distribution P_{X,Y} of (X, Y). Then, the following hold. (a) The inequality holds:

I(X; X̂|Y) = I(X; X̂) + I(X; Y|X̂) − I(X; Y) ≥ R_X(∆_X) − I(X; Y), if E{d_X(X, X̂)} ≤ ∆_X.

Moreover, the equality holds if and only if I(X; Y|X̂) = 0, i.e., the MC Y ↔ X̂ ↔ X holds, and I(X; X̂) = R_X(∆_X), i.e., for all ∆_X that belong to a strictly positive set D_C(X|Y) ⊆ [0, ∞).
The next theorem, which holds for arbitrary RVs, is further used to derive the characterization of R_{X|Y}(∆_X) for multivariate Gaussian RVs.

Theorem 2. Achievable lower bound on conditional mutual information and mean-square error estimation. (a) Let (X, Y, X̂) be a triple of arbitrary RVs on the abstract spaces X × Y × X̂, with distribution P_{X,Y,X̂} and joint marginal the fixed distribution P_{X,Y} of (X, Y). Define the conditional mean of X conditioned on (X̂, Y) by

X̂^{cm} ≜ E{X | X̂, Y} = e(Y, X̂)

for some measurable function e : Y × X̂ → X̂.
(1) The inequality holds:

I(X; X̂|Y) ≥ I(X; X̂^{cm}|Y).

(2) The equality I(X; X̂|Y) = I(X; X̂^{cm}|Y) holds if any one of the conditions (i) or (ii) holds: (i) X̂ = X̂^{cm}−a.s.; (ii) for a fixed y ∈ Y, the function e(y, ·) : X̂ → X̂, e(y, x̂) = x̂^{cm}, uniquely defines x̂, i.e., e(y, ·) is an injective function on the support of x̂.   (47)

(b) For all measurable functions (y, x̂) → g(y, x̂) ∈ R^{n_x}, the mean-square error satisfies

E{||X − X̂^{cm}||²_{R^{n_x}}} ≤ E{||X − g(Y, X̂)||²_{R^{n_x}}}.

Proof. See Appendix A.2.
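Part (b) is the familiar optimality of the conditional-mean (MMSE) estimator; a Monte-Carlo sketch for a scalar jointly Gaussian pair (our own illustration, not the paper's proof) is:

```python
import numpy as np

# Monte-Carlo illustration of Theorem 2(b): the conditional mean E{X|Y}
# attains a smaller mean-square error than any other measurable
# function g(Y). Scalar jointly Gaussian example (values are ours).
rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(0.0, 2.0, n)            # X ~ N(0, 4)
y = x + rng.normal(0.0, 1.0, n)        # Y = X + N, N ~ N(0, 1)
x_cm = (4.0 / 5.0) * y                 # E{X|Y} = Q_{X,Y} Q_Y^{-1} Y = (4/5) Y
mse_cm = np.mean((x - x_cm) ** 2)      # theoretical value: 4/5
mse_g = np.mean((x - 0.9 * y) ** 2)    # some other g(y); theoretical: 0.85
```

Any alternative g, linear or not, can only increase the error, which is the content of the inequality in part (b).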

Side Information at Encoder and Decoder for Multivariate Gaussian Source
The characterizations of the RDFs R_{X|Y}(∆_X) and R(∆_X) for a multivariate Gaussian source are encapsulated in Theorems 3-5, which are proved in Section 4. These theorems include the structural properties of the optimal test channels or realizations of (X̂, Z), which induce the joint distributions and achieve the RDFs; the closed form expressions of the RDFs are based on a water-filling. The realization of the optimal test channel of R_{X|Y}(∆_X) is shown in Figure 3, where h_i, i = 1, . . ., n_x, are the diagonal elements of the spectral decomposition of the matrix H = U diag{h_1, . . ., h_{n_x}} U^T, and W_i ∈ N(0, h_i δ_i), i = 1, . . ., n_x, is the additive noise introduced due to compression.
The following theorem gives a parametric realization of the optimal test channel that achieves the characterization of the RDF R_{X|Y}(∆_X).
Theorem 3. Characterization of R_{X|Y}(∆_X) by test channel realization. Consider the RDF R_{X|Y}(∆_X) defined by (1), for the multivariate Gaussian source with mean-square error distortion defined by (9)-(18). The following hold.
(a) The optimal realization X̂ that achieves R_{X|Y}(∆_X) is parametrized by the matrices (H, Q_W) and represented by

X̂ = H (X − E{X|Y}) + E{X|Y} + W, W ∈ N(0, Q_W), W independent of (X, Y),

where

H ≜ I_{n_x} − Σ_∆ Q_{X|Y}^{-1}, Q_W ≜ Σ_∆ H^T, E{(X − X̂)(X − X̂)^T} = Σ_∆.

Moreover, the optimal parametric realization of X̂ satisfies the structural properties (i)-(iv).

Proof. The proof is given in Section 4.

The next theorem gives additional structural properties of the optimal test channel realization of Theorem 3 and uses these properties to characterize the RDF R_{X|Y}(∆_X) via a water-filling solution.
Theorem 4. Characterization of R_{X|Y}(∆_X) via water-filling solution.
Consider the RDF R_{X|Y}(∆_X) defined by (1), for the multivariate Gaussian source with mean-square error distortion defined by (9)-(18), and its characterization in Theorem 3. The following hold.
(a) The matrices {Σ_∆, Q_{X|Y}, H, Q_W} of the parametric realization of X̂ have spectral decompositions with respect to the same unitary matrix U, UU^T = I_{n_x}:

Q_{X|Y} = U diag{λ_1, . . ., λ_{n_x}} U^T, Σ_∆ = U diag{δ_1, . . ., δ_{n_x}} U^T,
H = U diag{h_1, . . ., h_{n_x}} U^T, Q_W = U diag{σ²_{W_1}, . . ., σ²_{W_{n_x}}} U^T,

where the realization coefficients and the eigenvalues σ²_{W_i} and h_i are given by

h_i = 1 − δ_i/λ_i, σ²_{W_i} = h_i δ_i, i = 1, . . ., n_x.

Moreover, if σ²_{W_i} = 0, then h_i = 0, and vice versa. (b) The RDF R_{X|Y}(∆_X) is given by the water-filling solution:

R_{X|Y}(∆_X) = (1/2) Σ_{i=1}^{n_x} log(λ_i/δ_i),

where

Σ_{i=1}^{n_x} δ_i = ∆_X, δ_i = min{µ, λ_i},

and µ ∈ (0, ∞) is a Lagrange multiplier (obtained from the Kuhn-Tucker conditions). (c) Figure 3 depicts the parallel channel scheme that realizes the optimal X̂ of parts (a), (b), which achieves R_{X|Y}(∆_X).
(d) If X and Y are independent, or Y is replaced by a RV that generates trivial information, i.e., the σ-algebra of Y is σ{Y} = {Ω, ∅} (or C = 0 in (15)), then (a)-(c) hold with Q_{X|Y} = Q_X, Q_{X,Y} = 0, and R_{X|Y}(∆_X) = R_X(∆_X), i.e., R_{X|Y}(∆_X) reduces to the marginal RDF of X.
Proof. The proof is given in Section 4.
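The water-filling of Theorem 4(b) can be computed by bisecting on the water level µ; the helper below is our own sketch (rates in nats), assuming the eigenvalues λ_i of Q_{X|Y} are given:

```python
import numpy as np

def reverse_waterfill(lams, delta_x):
    """Reverse water-filling in the form of Theorem 4(b) (our sketch):
    choose delta_i = min(mu, lam_i) with sum_i delta_i = delta_x, then
    R = (1/2) sum_i log(lam_i / delta_i) nats.
    Requires 0 < delta_x <= sum(lams)."""
    lams = np.asarray(lams, dtype=float)
    lo, hi = 0.0, float(lams.max())
    for _ in range(200):                     # bisection on the water level mu
        mu = 0.5 * (lo + hi)
        if np.minimum(mu, lams).sum() < delta_x:
            lo = mu
        else:
            hi = mu
    deltas = np.minimum(mu, lams)
    rate = 0.5 * float(np.sum(np.log(lams / deltas)))
    return rate, deltas

rate, deltas = reverse_waterfill([4.0, 2.0, 1.0], delta_x=1.5)
# Here the water level converges to mu = 0.5 and deltas = [0.5, 0.5, 0.5].
```

Eigen-components with λ_i ≤ µ receive δ_i = λ_i, i.e., h_i = 1 − δ_i/λ_i = 0 and nothing is transmitted on them, matching the statement in part (a) that σ²_{W_i} = 0 if and only if h_i = 0.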
The proof of Theorem 4 (see Section 4) is based on the identification of structural properties of the test channel distribution. Some of the implications are briefly described below.

Conclusion 1:
The construction and the structural properties of the optimal test channel P_{X̂|X,Y} that achieves the water-filling characterization of the RDF R_{X|Y}(∆_X) of Theorems 3 and 4 are not documented elsewhere in the literature.
By the realization of the optimal reproduction X̂, it follows that the subtraction of the equal quantities E{X|Y} at the encoder and the decoder does not affect the information measure, noting that E{X̂|Y} = E{X|Y}.
Theorem 4, parts (a) and (b), is obtained with the aid of Theorem 3 and Hadamard's inequality, which shows that Q_{X|Y} and Σ_∆ have the same eigenvectors.
(ii) Structural properties of the realizations of Theorems 3 and 4: The matrices {Σ_∆, Q_{X|Y}, H, Q_W} are nonnegative symmetric and have a spectral decomposition with respect to the same unitary matrix U, UU^T = I_{n_x} [21]. This implies that the test channel is equivalently represented by parallel additive Gaussian noise channels (subject to pre-processing and post-processing at the encoder and decoder).
(iii) In Remark 4, we show that the realization of the optimal X̂ in Figure 3, which achieves the RDF of Theorem 4, degenerates to Wyner's [2] optimal realization, which attains the RDF R_{X|Y}(∆_X), for the tuple of scalar-valued, jointly Gaussian RVs (X, Y) with square-error distortion function.

Side Information Only at Decoder for Multivariate Gaussian Source
Theorem 5 gives the optimal test channel that achieves the characterization of the RDF R(∆_X) and further states that there is no loss of compression rate if the side information is only available at the decoder. That is, although in general R(∆_X) ≥ R_{X|Y}(∆_X), an optimal reproduction X̂ = f(Y, Z) of X, where f(·, ·) is linear, is constructed such that the inequality holds with equality.

Theorem 5. Characterization and water-filling solution of R(∆_X). Consider the RDF R(∆_X) defined by (5) for the multivariate Gaussian source with mean-square error distortion, defined by (9)-(18). Then, the following hold.
Proof. It is given in Section 4.
The proof of Theorem 5 is based on the derivation of the structural properties and on Theorem 4. Some implications are discussed below.

Conclusion 2:
The optimal reproduction X̂ = f(Y, Z) and the test channel distribution P_{X|X̂,Y,Z}, which achieve R(∆_X) of Theorem 5, are not reported in the literature.
(i) From the structural property (1) of Theorem 5, i.e., (77), it follows that the lower bound R(∆_X) ≥ R_{X|Y}(∆_X) is achieved by the realization X̂ = f(Y, Z) of Theorem 5b; i.e., for a given Y = y, X̂ uniquely defines Z.
(ii) If X is independent of Y, or Y generates trivial information, then the RDFs R(∆_X) = R_{X|Y}(∆_X) degenerate to the classical RDF of the source X, i.e., R_X(∆_X), as expected. This is easily verified from (73) and (76), i.e., Q_{X,Y} = 0, which implies X̂ = Z. For scalar-valued RVs, X : Ω → R, Y : Ω → R, X ∈ N(0, σ²_X), and X independent of Y, the optimal realization reduces to

X̂ = Z = (1 − ∆_X/σ²_X) X + W, W ∈ N(0, (1 − ∆_X/σ²_X) ∆_X), W independent of X,

as expected.
(iii) In Remark 4, we show that the realization of the optimal X̂ = f(Y, Z), which achieves the RDF R(∆_X) of Theorem 5, degenerates to Wyner's [2] realization that attains the RDF R(∆_X) of the tuple of scalar-valued, jointly Gaussian RVs (X, Y), with square-error distortion function.
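The scalar degenerate case in (ii) can be sanity-checked numerically; reading the Theorem 4 coefficients in the scalar case as h = 1 − ∆_X/σ²_X and Var(W) = h∆_X (our reading), the realization meets the distortion budget exactly and spends R_X(∆_X) = ½ log(σ²_X/∆_X):

```python
import math

# Sanity check (ours) of the scalar degenerate realization
# X_hat = Z = h X + W, with h = 1 - Delta/sigma2 and W ~ N(0, h * Delta).
sigma2, delta = 4.0, 1.0
h = 1.0 - delta / sigma2                   # scalar version of H
var_w = h * delta                          # scalar version of Q_W
# End-to-end distortion E{(X - X_hat)^2} = (1 - h)^2 sigma2 + Var(W)
distortion = (1.0 - h) ** 2 * sigma2 + var_w
var_xhat = h ** 2 * sigma2 + var_w         # = h * sigma2
var_x_given_xhat = sigma2 - (h * sigma2) ** 2 / var_xhat   # MMSE = delta
rate = 0.5 * math.log(sigma2 / var_x_given_xhat)           # = R_X(delta)
```

Both identities hold exactly for any 0 < ∆_X ≤ σ²_X, which is consistent with the claim that the side-information-free case collapses to the classical Gaussian RDF.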

Proofs of Theorems 3-5
In this section, we derive the statements of Theorems 3-5 by making use of Theorem 2 (which holds for general abstract alphabet spaces), restricting attention to multivariate jointly Gaussian (X, Y).

Side Information at Encoder and Decoder
For jointly Gaussian RVs (X, Y, X̂), in the next theorem we identify simple sufficient conditions for the lower bound of Theorem 2 to be achievable.

Theorem 6. Sufficient conditions for the lower bounds of Theorem 2 to be achievable. Consider the statement of Theorem 2 for a triple of jointly Gaussian RVs (X, Y, X̂) on R^{n_x} × R^{n_y} × R^{n_x}, (n_x, n_y) ∈ Z_+ × Z_+, i.e., P_{X,Y,X̂} = P^G_{X,Y,X̂}, with joint marginal the fixed Gaussian distribution P^G_{X,Y}. Moreover, the following hold.
In addition, Conditions 1 and 2 below are sufficient for (84) to hold.
Proof. Note that identity (83) follows from Proposition 1, (26), by letting Y = X̂ and G be the information generated by Y. Consider Case (i): if (84) holds, then I(X; X̂|X̂^{cm}, Y) = 0.
By (83), Conditions 1 and 2 are sufficient for (84) to hold. Consider Case (ii): the sufficient condition (87) follows from Theorem 2 and implies I(X; X̂|X̂^{cm}, Y) = 0. The statement below (87) follows from Proposition 2. Now, we turn our attention to the optimization problem R_{X|Y}(∆_X) defined by (1) for the multivariate Gaussian source with mean-square error distortion defined by (9)-(18). In the next lemma, we derive a preliminary parametrization of the optimal reproduction distribution P_{X̂|X,Y} of the RDF R_{X|Y}(∆_X).

Lemma 2. Preliminary parametrization of the optimal reproduction distribution of R_{X|Y}(∆_X).
Consider the RDF R_{X|Y}(∆_X) defined by (1) for the multivariate Gaussian source, i.e., P_{X,Y} = P^G_{X,Y}, with mean-square error distortion defined by (9)-(18). (a) For every joint distribution P_{X,Y,X̂} there exists a jointly Gaussian distribution, denoted by P^G_{X,Y,X̂}, with marginal the fixed distribution P^G_{X,Y}, which minimizes I(X; X̂|Y) and satisfies the average distortion constraint. (b) The distribution P^G_{X,Y,X̂} is induced by a parametric realization of X̂ (in terms of H, G, Q_W), and X̂ is a Gaussian RV. (c) R_{X|Y}(∆_X) is characterized by the optimization problem over the parameters (H, G, Q_W),
and the corresponding characterization of the RDF follows. Proof. (a) This is omitted since it is similar to the classical unconditional RDF R_X(∆_X) of a Gaussian message X ∈ N(0, Q_X). In the next theorem, we identify the optimal triple (H, G, Q_W) such that (84) or (87) hold (i.e., we establish its existence), characterize the RDF R_{X|Y}(∆_X) = inf_{M^G_0(∆_X)} I(X; X̂|Y), and construct a realization X̂ that achieves it.

Theorem 7. Characterization of the RDF R_{X|Y}(∆_X). Consider the RDF R_{X|Y}(∆_X), defined by (1), for the multivariate Gaussian source with mean-square error distortion, defined by (9)-(18). The realization of the optimal reproduction X̂ ∈ M^{G,o}_0(∆_X), which achieves R_{X|Y}(∆_X), is given in Theorem 3a and also satisfies the properties stated under Theorem 3a.(i)-(iv).

Proof. See Appendix A.3.
Remark 2. Structural properties of the optimal realization of Theorem 4a. For the characterization of the RDF R_{X|Y}(∆_X) of Theorem 7, which is achieved by X̂ defined in Theorem 3a in terms of the matrices Σ_∆, Q_{X|Y}, H, Q_W, we show in Corollary 2 the statements of Theorem 4a, i.e., that Σ_∆, Q_{X|Y}, H, Q_W have spectral representations with respect to the same unitary matrix U, UU^T = I_{n_x}.
To prove the structural property of Remark 2, we use the next corollary, which is a degenerate case of [22] (Lemma 2) (i.e., the structural properties of the test channel of the Gorbunov and Pinsker [23] nonanticipatory RDF of Markov sources).
Corollary 1. Structural properties of the realization of the optimal X̂ of Theorem 4a. Consider the characterization of the RDF R_{X|Y}(∆_X) of Theorem 7. Suppose Q_{X|Y} ≻ 0 and Σ_∆ ⪰ 0 commute, that is, they have spectral decompositions with respect to the same unitary matrix; then the following hold.
In the next corollary, we re-express the realization of X̂ of Theorem 4a, which characterizes the RDF of Theorem 7, using a translation of X and X̂ by subtracting their conditional means with respect to Y, making use of the property E{X̂|Y} = E{X|Y} of (78). This is the realization shown in Figure 3.
Corollary 2. Equivalent characterization of R X|Y (∆ X ). Consider the characterization of the RDF R X|Y (∆ X ) of Theorem 7 and the realization of X̂ of Theorem 3a and Theorem 4a. Define the translated RVs obtained by subtracting the conditional means with respect to Y. Then the resulting characterization holds, where (H, Q W ) are given in Theorem 3a. Further, the characterization of the RDF R X|Y (∆ X ) (98) satisfies the stated equalities and inequality, and the inequality in (121) holds with equality. Proof. By Theorem 3a, the last equation establishes (115). By the properties of conditional mutual information and the properties of the optimal realization X̂, the following equalities hold.
Proof. By invoking Corollary 2, Theorem 7, and the convexity of R X|Y (∆ X ) given by (122), we arrive at the statements of Theorem 4, which completely characterize the RDF R X|Y (∆ X ) and construct a realization of the optimal X̂ that achieves it.
Next, we discuss the degenerate case, in which the statements of Theorems 3, 4 and 7 reduce to the RDF R X (∆ X ) of a Gaussian RV X with square-error distortion function. We illustrate that the identified structural property of the realization matrices Σ ∆ , Q X|Y , H, Q W leads to the well-known water-filling solution.
Remark 3. Degenerate case of Theorem 7 and realization X of Theorem 4a.
Consider the characterization of the RDF R X|Y (∆ X ) of Theorem 7 and the realization of X̂ of Theorem 3a and Theorem 3, and assume X and Y are independent or Y generates trivial information; i.e., the σ-algebra of Y is σ{Y} = {Ω, ∅}, or C = 0 in (15)-(18).
(a) By the definitions of Q X,Y and Q X|Y , substituting (142) into the expressions of Theorem 7, the RDF R X|Y (∆ X ) reduces to R X (∆ X ), and the optimal reproduction X̂ reduces accordingly. Thus, R X (∆ X ) is the well-known RDF of a multivariate memoryless Gaussian RV X with square-error distortion. (b) For the RDF R X (∆ X ) of part (a), it is known [24] that Σ ∆ and Q X have spectral decompositions with respect to the same unitary matrix, where the entries of (Λ X , ∆) are in decreasing order. Then, a parallel channel realization of the optimal reproduction X̂ p is obtained, and the RDF R X (∆ X ) is computed from the reverse water-filling equations, where µ ∈ [0, ∞) is a Lagrange multiplier (obtained from the Karush-Kuhn-Tucker conditions).
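The reverse water-filling computation of part (b) can be sketched as follows: given the eigenvalues λ i of Q X and a target distortion ∆ X , the water level µ is chosen so that Σ i min(µ, λ i ) = ∆ X , the per-component distortions are δ i = min(µ, λ i ), and R X (∆ X ) = ½ Σ i log(λ i /δ i ) (in nats). The function name below is hypothetical:

```python
import numpy as np

def reverse_waterfilling(eigenvalues, total_distortion):
    """Reverse water-filling for a Gaussian vector source with square-error
    distortion: choose per-component distortions delta_i = min(mu, lambda_i)
    with sum(delta_i) = total_distortion, then R = 0.5*sum(log(lambda_i/delta_i))."""
    lam = np.asarray(eigenvalues, dtype=float)
    assert 0 < total_distortion <= lam.sum()
    lo, hi = 0.0, lam.max()
    for _ in range(200):                     # bisect on the water level mu
        mu = 0.5 * (lo + hi)
        if np.minimum(mu, lam).sum() < total_distortion:
            lo = mu
        else:
            hi = mu
    delta = np.minimum(mu, lam)              # per-component distortions
    rate = 0.5 * np.sum(np.log(lam / delta))
    return rate, delta
```

For instance, with eigenvalues {0.7538, 0.2} (the example used later in the Simulations section), ∆ X = 0.4 puts the water level exactly at the smaller eigenvalue, so that component is not reproduced.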

Side Information Only at Decoder
In general, when the side information is available only at the decoder, the achievable operational rate R * (∆ X ) is greater than or equal to the achievable operational rate R 1 (∆ X ) when the side information is available to both the encoder and the decoder [2]. By Remark 1, R(∆ X ) ≥ R X|Y (∆ X ), and equality holds if I(X; Z|X̂, Y) = 0.
In view of the characterization of R X|Y (∆ X ) and the realization of the optimal reproduction X̂ of Theorem 3, which is presented in Figure 3, we observe that (49) can be rewritten as follows.
Proof. From the above realization of X̂ = f (Y, Z), we have the following. (a) By Wyner (see Remark 1), the inequalities (36) and (37) hold, and equalities hold if I(X; Z|X̂, Y) = 0. That is, for any X̂ = f (Y, Z), by the properties of conditional mutual information, I(X; Z|Y) (α)= I(X; X̂, Z|Y) (β)= I(X; Z|X̂, Y) + I(X; X̂|Y) (163) (γ)≥ I(X; X̂|Y), where (α) is due to X̂ = f (Y, Z), (β) is due to the chain rule of mutual information, and (γ) is due to I(X; Z|X̂, Y) ≥ 0. Hence, (72) is obtained (as in Wyner [2] for a tuple of scalar jointly Gaussian RVs). (b) Equality holds in (164) if there exists an X̂ = f (Y, Z) such that I(X; Z|X̂, Y) = 0 and the average distortion is satisfied. Taking Z = g(X, W) as specified by (156)-(160), then I(X; Z|X̂, Y) = 0 and the average distortion is satisfied. Since the realization (156)-(160) is identical to the realization (73)-(76), part (b) is also shown. (c) This follows directly from the optimal realization.
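Step (β) above is the chain rule I(X; X̂, Z|Y) = I(X; X̂|Y) + I(X; Z|X̂, Y), and step (γ) uses the nonnegativity of conditional mutual information. Both identities can be checked by brute force on a random discrete joint distribution (a sketch; the alphabets and the pmf are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
# Random joint pmf p(x, y, z, xhat) on binary alphabets; axes: 0=X, 1=Y, 2=Z, 3=Xhat.
p = rng.random((2, 2, 2, 2))
p /= p.sum()

def ent(axes):
    """Entropy (nats) of the marginal of p over the given axes."""
    drop = tuple(i for i in range(p.ndim) if i not in axes)
    m = p.sum(axis=drop).ravel()
    m = m[m > 0]
    return -np.sum(m * np.log(m))

def cmi(a, b, c):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); axes given as tuples."""
    return ent(a + c) + ent(b + c) - ent(a + b + c) - ent(c)

# Chain rule: I(X; Xhat, Z | Y) = I(X; Xhat | Y) + I(X; Z | Xhat, Y).
lhs = cmi((0,), (2, 3), (1,))
rhs = cmi((0,), (3,), (1,)) + cmi((0,), (2,), (1, 3))
assert abs(lhs - rhs) < 1e-12

# Nonnegativity: I(X; Z | Xhat, Y) >= 0.
assert cmi((0,), (2,), (1, 3)) >= -1e-12
```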

Connection with Other Works and Simulations
In this section, we illustrate that, for the special case of scalar-valued jointly Gaussian RVs (X, Y), our results reproduce Wyner's [2] results. In addition, we show that the characterizations of the RDFs of the more general problems considered in [5,6] (i.e., where a noisy version of the source is available at the encoder) do not reproduce Wyner's [2] results. Finally, we present simulations.

Connection with Other Works
Remark 4. The degenerate case of Wyner's [2] optimal test channel realizations. Now, we verify that, for the tuple of scalar-valued, jointly Gaussian RVs (X, Y) with the square-error distortion function specified below, our optimal realizations of X̂ and closed-form expressions for R X|Y (∆ X ) and R(∆ X ) are identical to Wyner's [2] realizations and RDFs (see Figure 4). Let us define the following. (a) RDF R X|Y (∆ X ): By Theorem 4a applied to (165)-(168), we obtain the rate; moreover, by Theorem 4b, we obtain the optimal reproduction X̂ ∈ M 0 (∆ X ). This shows that our realization of Figure 3, our value of R(∆ X ), and our optimal realization X̂ = f (Y, Z) reproduce Wyner's optimal realization and the value of R(∆ X ) given in [2] (i.e., Figure 4b, Wyner's [2] optimal realization X̂ = f (Y, Z) for the RDF R(∆ X ) of (165)-(168)).
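For the scalar jointly Gaussian case of Remark 4, Wyner's RDF takes the familiar closed form R X|Y (∆ X ) = R(∆ X ) = ½ log(σ 2 X|Y /∆ X ) for 0 < ∆ X ≤ σ 2 X|Y , with σ 2 X|Y = σ 2 X (1 − ρ 2 ). A small numerical sketch (the function name is hypothetical; the rate is in nats):

```python
import numpy as np

def scalar_conditional_rdf(var_x, rho, distortion):
    """Conditional RDF of a scalar Gaussian X with side information Y:
    R_{X|Y}(Delta) = max(0, 0.5*log(sigma2_{X|Y}/Delta)) nats, where
    sigma2_{X|Y} = var_x*(1 - rho^2); requires distortion > 0."""
    cond_var = var_x * (1.0 - rho ** 2)
    return max(0.0, 0.5 * np.log(cond_var / distortion))

# rho = 0 (no side information) recovers the classical 0.5*log(var/Delta);
# stronger correlation with Y lowers the rate, reaching 0 at Delta = sigma2_{X|Y}.
assert abs(scalar_conditional_rdf(1.0, 0.0, 0.5) - 0.5 * np.log(2.0)) < 1e-12
assert scalar_conditional_rdf(1.0, 0.8, 0.1) < scalar_conditional_rdf(1.0, 0.0, 0.1)
assert scalar_conditional_rdf(1.0, 0.9, 0.19) < 1e-12
```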
In the following remark, we show that, when S = X a.s., the realization of the auxiliary RV Z used in the proofs in [5,6] to show the converse coding theorem does not coincide with Wyner's realization [2]. Moreover, their realizations do not reproduce Wyner's RDF (this observation is verified for the modified realization given, without proof, in the correction note at https://tiangroup.engr.tamu.edu/publications/ (accessed on 3 January 2024)). The inadequacy of the realizations in [5,6] for showing the converse was first pointed out in [7], using an alternative proof.
Remark 5. Optimal test channel realization of [5,6]. (a) The derivation of [5], Theorem 4, uses the following representation of RVs (see [5], Equation (4), adapted to our notation using (19)), where N 1 and N 2 are independent Gaussian RVs with zero mean, N 1 is independent of Y, and N 2 is independent of (S, Y). To reduce [5,6] to the Wyner and Ziv RDF, we set X = S a.s., which implies K xs = I, N 2 = 0 a.s., and K xy = 0. According to the derivation of the converse of [5], Theorem 4 (see [5], 3 lines above Equation (32), in our notation), the optimal realization of the auxiliary RV Z T used to achieve the RDF is given by (179), where Q X|Y = U diag(λ 1 , . . . , λ n )U T , U is a unitary matrix, and N 3 ∈ N(0, Q N 3 ), with Q N 3 a diagonal covariance matrix whose elements σ 2 3,i are those given in the correction note at https://tiangroup.engr.tamu.edu/publications/ (accessed on 3 January 2024) (although no derivation is given there, it states that the σ 2 3,i that appeared in [5], proof of Theorem 4, should be multiplied by λ i ). Now, we examine whether the realization (179) corresponds to Wyner's realization and induces Wyner's RDF. Recall that Wyner's [2] RDF, denoted by R X;Z|Y (∆ X ) and corresponding to the auxiliary RV Z, is given by (181). Clearly, the two realizations (179) and (180) are different. Let R X;Z T |Y (∆ X ) denote the RDF corresponding to the realization Z T . Then R X;Z T |Y (∆ X ) can be computed using I(X; Z T |Y), where H(•|•) denotes the conditional differential entropy. However, we note that, (i) unlike Wyner's realization corresponding to (181), whose noise satisfies W ∈ N(0, H∆ X ), the test channel realization in (179) is different. In particular, if Q X|Y = ∆ X , then H = 0 ⇒ W ∈ N(0, 0) and Z = 0 a.s. On the other hand, for the test channel in (179), if Q X|Y = ∆ X , then N 3 ∈ N(0, +∞), and thus the variance of Z T in (179) is not zero. Further, in Proposition 6, we prove that, for the multi-dimensional source, the test channel realization in (179) does not achieve the RDF when water-filling is active, i.e., when at least one component of the source is not reproduced. (d) Special Case, Classical RDF: The classical RDF is obtained as a special case if we assume X and Y are independent or Y generates the trivial information {Ω, ∅}; i.e., Y is nonrandom. Clearly, in this case, the RDF R S;Z T |Y (∆ X ) should degenerate to the classical RDF of the source X, i.e., R X (∆ X ), and it should hold that X̂ = Z T . However, for this case, (179) does not degenerate accordingly. Proposition 6. When S = X a.s., Wyner's [2] auxiliary RV Z and the auxiliary RV Z T given in (177), i.e., the degenerate case of [5,6] (with the correction of footnote 6), are not related by an invertible function. As a result, the RDFs computed from the two realizations are different.
On the other hand, from (177) and (178), if water-filling is active, then σ 2 3,i is unbounded (consistent with the case Q X|Y = ∆ X discussed in Remark 5). Moreover, by comparing Equation (187) with (178) and Equation (188) with (177), it is straightforward to show that f (•) = HU. If HU is not an invertible matrix for all values of the distortion ∆ X , then I(X; Z T ) − I(Y; Z T ) ̸= I(X; Z) − I(Y; Z).
By (188), it is easy to show that if min(λ i , δ i ) = λ i for some i, then HU is not invertible. This implies I(X; Z T ) − I(Y; Z T ) ̸= I(X; Z) − I(Y; Z).

Simulations
In this section, we provide an example to show the gap between the classical RDF R X (∆ X ) defined in (154) and the conditional RDF R X|Y (∆ X ) of (69), and to verify the validity of Gray's lower bound (8). Note that Theorem 5 shows that R X|Y (∆ X ) = R(∆ X ); hence, the plot for R(∆ X ) is omitted. For the evaluation, we pick the joint covariance matrix (11). To compute the rates, we first have to find Q X , Q Y , Q XY , and Q X|Y . From the definition of Q (X,Y) given in (11), the covariances of X and Y and their joint covariance follow directly. Then, the conditional covariance Q X|Y , which appears in R X|Y (∆ X ), can be computed from (27). Using the singular value decomposition (SVD), we calculate the eigenvalues of Q X|Y ; for this case, the eigenvalues of the conditional covariance are {0.7538, 0.2}. Similarly, the eigenvalues of Q X can be determined. Finally, the eigenvalues of Q X and Q X|Y are passed to the water-filling algorithm to compute R X (∆ X ) and R X|Y (∆ X ), respectively. The classical RDF, the conditional RDF, and Gray's lower bound for the joint covariance above are illustrated in Figure 5. It is clear that R X|Y (∆ X ) is smaller and that, as the distortion ∆ X increases, the gap between the classical and conditional RDFs becomes larger. Gray's lower bound is achievable for some positive distortion values, as provided in (71), i.e., for ∆ X ∈ {∆ X ∈ [0, ∞) : ∆ X ≤ n x λ n x }. Recall that the set of eigenvalues of Q X|Y is {0.7538, 0.2}; hence, the lower bound is achievable for ∆ X ≤ 2 · 0.2 = 0.4, i.e., for these values R X|Y (∆ X ) = R X (∆ X ) − I(X; Y).
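The computation described above can be sketched end-to-end. The covariance below is a hypothetical 4 × 4 joint covariance (the paper's specific matrix (11) is not reproduced here), with X and Y each two-dimensional; Q X|Y is obtained from the Schur complement, and both rates come from reverse water-filling:

```python
import numpy as np

def waterfill_rate(lam, total_distortion):
    """Reverse water-filling rate (nats) for a Gaussian vector source whose
    covariance has eigenvalues lam, under total square-error distortion."""
    lam = np.asarray(lam, dtype=float)
    lo, hi = 0.0, lam.max()
    for _ in range(200):                      # bisect on the water level mu
        mu = 0.5 * (lo + hi)
        lo, hi = (mu, hi) if np.minimum(mu, lam).sum() < total_distortion else (lo, mu)
    delta = np.minimum(mu, lam)               # per-component distortions
    return 0.5 * np.sum(np.log(lam / delta))

# Hypothetical joint covariance Q_{(X,Y)} with blocks [Q_X, Q_XY; Q_YX, Q_Y].
Q = np.array([[1.0, 0.3, 0.5, 0.1],
              [0.3, 1.0, 0.2, 0.4],
              [0.5, 0.2, 1.0, 0.0],
              [0.1, 0.4, 0.0, 1.0]])
Q_X, Q_XY, Q_Y = Q[:2, :2], Q[:2, 2:], Q[2:, 2:]

# Conditional covariance via the Schur complement: Q_{X|Y} = Q_X - Q_XY Q_Y^{-1} Q_YX.
Q_X_given_Y = Q_X - Q_XY @ np.linalg.solve(Q_Y, Q_XY.T)

lam_x = np.linalg.eigvalsh(Q_X)               # eigenvalues of Q_X
lam_xy = np.linalg.eigvalsh(Q_X_given_Y)      # eigenvalues of Q_{X|Y}

# Side information can only help: R_{X|Y}(Delta) <= R_X(Delta) at every distortion.
for d in (0.2, 0.5, 1.0):
    assert waterfill_rate(lam_xy, d) <= waterfill_rate(lam_x, d) + 1e-9
```

This reproduces qualitatively the ordering shown in Figure 5: the conditional RDF lies below the classical RDF for every distortion level.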

Conclusions
We derived nontrivial structural properties of the optimal test channel realizations that achieve the optimal test channel distributions of the characterizations of the RDFs for a tuple of multivariate, jointly independent and identically distributed Gaussian random variables with mean-square error fidelity, for two cases: first, when the side information is available at both the encoder and the decoder, and second, when it is available only at the decoder. Using the realizations of the optimal test channels, we showed that knowing the side information at both the encoder and the decoder does not achieve better compression than knowing the side information only at the decoder.

Figure 1 .
Figure 1. The Wyner and Ziv [1] block diagram of lossy compression. If switch A is closed, the side information is available at both the encoder and the decoder; if switch A is open, the side information is available only at the decoder.

Figure 2 .
Figure 2. Test channel when side information is only available to the decoder.

Figure 3 .
Figure 3. R X|Y (∆ X ): A realization of the optimal reproduction X̂ over parallel additive Gaussian noise channels of Theorem 4, with channel gains h i .

(b) By (a), the conditional distribution P G X̂|X,Y is such that its conditional mean is linear in (X, Y), its conditional covariance is nonrandom, i.e., constant, and, for fixed (X, Y) = (x, y), P G X̂|X,Y is Gaussian. Such a distribution is induced by the parametric realization (88)-(91). (c) Follows from parts (a) and (b). (d) Follows from Theorem 6 and (48), due to the achievability of the lower bounds.

(b) It is easy to verify that the above realization of Z T , which uses the correction of footnote 6, is precisely the realization given in [6], Theorem 3A. (c) Special Case: For scalar-valued RVs X : Ω → R, Y : Ω → R, the auxiliary RV Z T reduces accordingly.
which is fundamentally different from Wyner's degenerate, and correct, values of Q X .