A Realization Approach to Lossy Network Compression of a Tuple of Correlated Multivariate Gaussian RVs

Examined in this paper is the Gray and Wyner source coding for a simple network of correlated multivariate Gaussian random variables, Y 1 : Ω → R p 1 and Y 2 : Ω → R p 2 . The network consists of an encoder that produces two private rates R 1 and R 2 , and a common rate R 0 , and two decoders, where decoder 1 receives rates (R 1 , R 0 ) and reproduces Y 1 by Ŷ 1 , and decoder 2 receives rates (R 2 , R 0 ) and reproduces Y 2 by Ŷ 2 , with mean-square error distortions E||Y i − Ŷ i || 2 ≤ ∆ i ∈ [0, ∞], i = 1, 2. Use is made of the weak stochastic realization and the geometric approach of such random variables to derive test channel distributions, which characterize the rates that lie on the Gray and Wyner rate region. Specific new results include: (1) A proof that, among all continuous or finite-valued random variables, W : Ω → W, Wyner's common information, C(Y 1 , Y 2 ) = inf { I(Y 1 , Y 2 ; W) : P Y 1 ,Y 2 |W = P Y 1 |W P Y 2 |W }, is achieved by a Gaussian random variable, W : Ω → R n , of minimum dimension n, which makes the two components of the tuple (Y 1 , Y 2 ) conditionally independent according to the weak stochastic realization of (Y 1 , Y 2 ); the formula C(Y 1 , Y 2 ) = (1/2) ∑_{j=1}^{n} ln((1 + d j )/(1 − d j )), where d j ∈ (0, 1), j = 1, . . . , n, are the canonical correlation coefficients of the correlated parts of Y 1 and Y 2 ; and a realization of (Y 1 , Y 2 , W) which achieves this. (2) The parameterization of rates that lie on the Gray and Wyner rate region, and several of its subsets. The discussion is largely self-contained and proceeds from first principles, while connections to prior literature are discussed.


Introduction
In their seminal paper, Source Coding for a Simple Network [1], Gray and Wyner characterized the lossless rate region for a tuple of finite-valued random variables, and the lossy rate region for a tuple of arbitrarily distributed random variables. Many extensions and generalizations followed Gray and Wyner's fundamental work. Wyner [2] introduced an operational definition of the common information between a tuple of sources that generate symbols with values in finite spaces. Wyner's operational definition of common information is the minimum achievable common message rate on the Gray and Wyner lossless rate region. Witsenhausen [3] investigated bounds for Wyner's common information, and sequences of pairs of random variables in this regard [4]. Gács and Körner [5] introduced another definition of common randomness between a tuple of jointly independent and identically distributed random variables. Benammar and Zaidi [6,7] characterized the Gray-Wyner rate region, when there is side information at the decoders, under various scenarios, including the one in which both receivers reproduce the source symbols without distortion.
Insightful application examples for binary sources are considered in [7] (Section 4.2). In their previous work, Benammar and Zaidi [8,9] characterized the rate distortion function of the Heegard and Berger [10] problem, with two sources and side information at the two decoders (under a degraded set-up). Connections between the Gray and Wyner lossy source coding network and the notions of empirical and strong coordination capacity for arbitrary networks were developed by Cuff, Permuter and Cover [11] and the references therein, where the authors elaborated on the usefulness of the common information between the different network nodes.
Viswanatha, Akyol and Rose [12], and Xu, Liu and Chen [13], explored the connection of Wyner's common information and the Gray and Wyner lossy rate region, to generalize Wyner's common information to its lossy counterpart, for random variables taking values in arbitrary spaces. They characterized Wyner's lossy common information as the minimum common message rate on the Gray and Wyner lossy rate region, when the sum rate is arbitrarily close to the rate distortion function with joint decoding for the Gray and Wyner lossy network. Applications to encryption and secret key generation are discussed by Viswanatha, Akyol and Rose in [12] (and references therein).
The current paper is focused on the calculations of rates that lie in the Gray and Wyner rate region [1], for two sources that generate symbols, according to the model of jointly independent and identically distributed multivariate correlated Gaussian random variables Y 1 : Ω → R p 1 , Y 2 : Ω → R p 2 , and square-error fidelity at the two decoders. The current literature on methods and algorithms to compute such rates is subject to a number of limitations, which often prevent their practical usefulness: (1) Rates that lie in the Gray and Wyner rate region are only known for the special case of a tuple of scalar-valued Gaussian random variables with square error distortion, i.e., p 1 = p 2 = 1 [1,12,13].
(2) Wyner's lossy common information is only computed in closed form, for the special cases of a tuple of scalar-valued Gaussian random variables, [12,13].
(3) Important generalizations to a tuple of sources that generate multivariate Gaussian symbols, require new derivations often of considerable difficulty.
(4) Realizations of the optimal test channel distributions, and the structural properties of the various rate distortion functions (RDFs) involved in the Gray and Wyner characterization of the rate region, are not developed. (5) A proof that the Gray and Wyner rate region for jointly Gaussian sources is characterized by a Gaussian auxiliary random variable W is still missing from past literature.
It is known from [1] that the Gray and Wyner rate region can be parameterized by an auxiliary random variable W : Ω → W, via several rate distortion functions. Moreover, subsets of the Gray and Wyner rate region are parameterized by W which satisfies conditional independence (1).
The current paper makes use of the canonical variable form and the weak stochastic realization of the tuple of random variables (Y 1 , Y 2 ), introduced in Section 2 to characterize subsets of the Gray and Wyner rate region, which are parameterized by jointly Gaussian random variables (Y 1 , Y 2 , W) with W : Ω → R n , where n is a finite number, while in some cases, the minimum dimension of W is clarified. The weak stochastic realization is developed to deal with the fundamental issue that, for the Gray and Wyner network, one is given the joint distribution P Y 1 ,Y 2 , while the characterization of the RDFs involves the specification of the test channel distributions, that achieve these RDFs, and the actual construction of realizations of all random variables involved, that induce the test channel distributions. Furthermore, Wyner's common information between Y 1 and Y 2 involves the construction of a joint distribution P Y 1 ,Y 2 ,W where W is the auxiliary random variable that makes Y 1 and Y 2 conditionally independent, i.e., (1) holds.
The rest of the section serves mainly to review the Gray and Wyner characterization of the lossy rate region and the characterization of Wyner's lossy common information.

Literature Review
(a) The Gray and Wyner source coding for a simple network [1].
Consider the Gray and Wyner source coding for a simple network, as shown in Figure 1, for a tuple of jointly independent and identically distributed multivariate Gaussian random variables (Y N 1 , Y N 2 ) = {(Y 1,i , Y 2,i ) : i = 1, 2, . . . , N}, with square error distortion functions at the two decoders, where || · || 2 R p i are Euclidean distances on R p i , i = 1, 2. The encoder takes as its input the data sequences (Y N 1 , Y N 2 ) and produces at its output three messages, (S 0 , S 1 , S 2 ), with binary bit representations (NR 0 , NR 1 , NR 2 ), respectively. There are three channels, channel 0, channel 1, channel 2, with capacities (C 0 , C 1 , C 2 ) (in bits per second), respectively, to transmit the messages to two decoders. Channel 0 is a common channel and channel 1 and channel 2 are the private channels which connect the encoder to each of the two decoders. Message S 0 is a common or public message that is transmitted through the common channel 0 with capacity C 0 to decoder 1 and decoder 2; S 1 is a private message which is transmitted through the private channel 1 with capacity C 1 to decoder 1; and S 2 is a private message, which is transmitted through the private channel 2 with capacity C 2 to decoder 2.
For the source and distortion function specified in (2) and (3), we apply the weak stochastic realization to construct the family of distributions P, which is parameterized by the auxiliary random variable W. Use is made of the characterization of R GW (∆ 1 , ∆ 2 ), which is described in terms of an auxiliary random variable, as follows.
Theorem 1 (Theorem 8 in [1]). Let R GW (∆ 1 , ∆ 2 ) denote the Gray and Wyner rate region of the simple network shown in Figure 1. Suppose there exists ŷ i ∈ Ŷ i such that E{d Y i (Y i , ŷ i )} < ∞, for i = 1, 2. For each P Y 1 ,Y 2 ,W ∈ P and ∆ 1 ≥ 0, ∆ 2 ≥ 0, define the subset of Euclidean 3-dimensional space

R(P Y 1 ,Y 2 ,W ; ∆ 1 , ∆ 2 ) = { (R 0 , R 1 , R 2 ) : R 0 ≥ I(Y 1 , Y 2 ; W), R 1 ≥ R Y 1 |W (∆ 1 ), R 2 ≥ R Y 2 |W (∆ 2 ) },

where R Y i |W (∆ i ) is the conditional rate distortion function of Y N i , conditioned on W N , at decoder i, for i = 1, 2, and R Y 1 ,Y 2 (∆ 1 , ∆ 2 ) is the joint rate distortion function of the joint decoding of (Y N 1 , Y N 2 ) (all single letters). Let

R * (∆ 1 , ∆ 2 ) = ( ∪ P Y 1 ,Y 2 ,W ∈ P R(P Y 1 ,Y 2 ,W ; ∆ 1 , ∆ 2 ) ) c ,

where (·) c denotes the closure of the indicated set. The achievable Gray-Wyner lossy rate region is given by R GW (∆ 1 , ∆ 2 ) = R * (∆ 1 , ∆ 2 ). Gray and Wyner [1] (Theorem 6) also showed that, if (R 0 , R 1 , R 2 ) ∈ R GW (∆ 1 , ∆ 2 ), then

R 0 + R 1 + R 2 ≥ R Y 1 ,Y 2 (∆ 1 , ∆ 2 ), (9)

where R Y i (∆ i ) is the rate distortion function of Y N i at decoder i, for i = 1, 2, and R Y 1 ,Y 2 (∆ 1 , ∆ 2 ) is the joint rate distortion function of (Y N 1 , Y N 2 ) at the two decoders. The inequality in (9) is called the Pangloss Bound of the Gray-Wyner lossy rate region R GW (∆ 1 , ∆ 2 ). The set of triples (R 0 , R 1 , R 2 ) ∈ R GW (∆ 1 , ∆ 2 ) that satisfy the equality R 0 + R 1 + R 2 = R Y 1 ,Y 2 (∆ 1 , ∆ 2 ) is called the Pangloss Plane of the Gray-Wyner lossy rate region R GW (∆ 1 , ∆ 2 ).
(b) Wyner's common information of finite-valued random variables. Wyner [2] introduced an operational definition of the common information between a tuple of random variables (Y N 1 , Y N 2 ) that takes values in finite spaces. Wyner's first operational definition of the common information between the sequences Y N 1 and Y N 2 is the minimum achievable common message rate R 0 on the Gray-Wyner network of Figure 1.
Wyner's single letter information theoretic characterization of the infimum of all achievable message rates R 0 , called Wyner's common information, is defined by

C(Y 1 , Y 2 ) = inf { I(Y 1 , Y 2 ; W) : P Y 1 ,Y 2 |W = P Y 1 |W P Y 2 |W }. (13)

Here, P Y 1 ,Y 2 ,W is any joint probability distribution on Y 1 × Y 2 × W with (Y 1 , Y 2 )-marginal P Y 1 ,Y 2 . (c) Minimum common message rate and Wyner's lossy common information for arbitrary random variables. Viswanatha, Akyol and Rose [12], and Xu, Liu and Chen [13] explored the connection of Wyner's common information and the Gray-Wyner lossy rate region, to extend Wyner's common information to its lossy counterpart.
The following characterization was derived by Xu, Liu and Chen [13] (an equivalent characterization was also derived by Viswanatha, Akyol and Rose [12]).
Let C GW (Y 1 , Y 2 ; ∆ 1 , ∆ 2 ) denote the minimum common message rate R 0 on the Gray-Wyner lossy rate region R GW (∆ 1 , ∆ 2 ), with a sum rate not exceeding the joint rate distortion function R Y 1 ,Y 2 (∆ 1 , ∆ 2 ), i.e., such that the identity R 0 + R 1 + R 2 = R Y 1 ,Y 2 (∆ 1 , ∆ 2 ) holds, where the infimum is over all random variables W taking values in W, which parameterize the source distribution via P Y 1 ,Y 2 ,W , having a Y 1 × Y 2 −marginal source distribution P Y 1 ,Y 2 , and induce joint distributions P W,Y 1 ,Y 2 ,Ŷ 1 ,Ŷ 2 which satisfy the constraints.
It is shown in [12,13] that there exists a distortion region such that, for (∆ 1 , ∆ 2 ) in this region, C GW (Y 1 , Y 2 ; ∆ 1 , ∆ 2 ) is equal to Wyner's information theoretic characterization of common information between Y 1 and Y 2 , defined by (13). However, the proofs in [12,13] that W is a finite-dimensional Gaussian random variable rely on the assumption that W is continuous-valued.
The next theorem is derived by Xu, Liu and Chen [13].

Main Theorems and Discussion
What follows is a brief summary of the main theorems derived in this paper, and relations to the literature.
Theorem 9 shows that, among all joint distributions P Y 1 ,Y 2 ,W induced by a tuple of multivariate correlated Gaussian random variables (Y 1 , Y 2 ), and an arbitrary random variable W : Ω → W, continuous or discrete-valued, Wyner's common information C(Y 1 , Y 2 ), defined by (13), is minimized by a triple (Y 1 , Y 2 , W) which induces a jointly Gaussian distribution P Y 1 ,Y 2 ,W , and W : Ω → W = R n is a finite-dimensional Gaussian random variable. In particular, Theorem 9 gives the weak stochastic realization of (Y 1 , Y 2 ), and the construction of the random variable W, which induce a joint distribution P Y 1 ,Y 2 ,W that achieves the minimum of I(Y 1 , Y 2 ; W) such that W makes Y 1 and Y 2 conditionally independent.
Then, use is made of Theorem 9 and of the material of Section 2.2, namely Definition 1 of the canonical variable form and the weak stochastic realization, to derive Wyner's common information C(Y 1 , Y 2 ) defined by (13), and the optimal realization of the triple (Y 1 , Y 2 , W * ) that achieves C(Y 1 , Y 2 ), as stated in the next theorem.

Theorem 4. Consider a tuple of Gaussian random variables
(Y 1 , Y 2 ) ∈ G(0, Q (Y 1 ,Y 2 ) ), Q (Y 1 ,Y 2 ) ≥ 0, and apply Algorithm A1 (and the notation therein) to decompose and transform the random variables into a canonical variable form, (Y 1 , Y 2 ) ∈ G(0, Q cvf ), using the material and notation of Section 2.2, i.e., Definition 1 (with abuse of notation, the transformed random variables are again denoted by (Y 1 , Y 2 ) ∈ G(0, Q cvf )). (a) Then,

C(Y 1 , Y 2 ) = C(Y 12 , Y 22 ) = (1/2) ∑_{j=1}^{n} ln((1 + d j )/(1 − d j )), n = p 12 = p 22 , (22)

where (p 11 , p 12 , p 13 ) and (p 21 , p 22 , p 23 ) are the dimensions of the canonical variable decomposition of the tuple (Y 1 , Y 2 ), and d j ∈ (0, 1), j = 1, . . . , n, are the canonical correlation coefficients of the correlated parts (Y 12 , Y 22 ). If p 11 = p 21 > 0, then C(Y 1 , Y 2 ) = +∞; thus, C(Y 12 , Y 22 ) is the most interesting value, when defined. (b) The random variable W * defined below is such that C(Y 1 , Y 2 ) of part (a) is attained.
W * : Ω → R n , n ∈ Z + , n 1 = p 11 = p 21 , n 2 = p 12 = p 22 , n 1 + n 2 = n; see Theorem 11.(b) for the formulas of L 1 , L 2 , L 3 in (24), where the following properties hold. (c) The following operations are defined, using (a): Z 13 = Y 13 , Z 23 = Y 23 (the components Z 11 and Z 21 do not exist), and these imply the stated representations.

The derivation of Theorem 4 is presented in Section 3.2, after several of the tools are presented, such as weak stochastic realizations and minimal realizations. (a) Ref. [15] gives an expression analogous to (22), which is expressed in terms of the correlation coefficients ρ i ∈ (−1, 1), and not the canonical correlation coefficients d i ∈ (0, 1).
Similarly, [16], under Lemma 1, reproduces Corollary 1 in [15], with the correlation coefficients ρ i replaced by their absolute values |ρ i |. (b) The derivation in [15,16] is based on the use of rate distortion functions of Gaussian random variables with square-error distortion functions, which presupposes that the auxiliary RV W : Ω → W takes continuous values. (c) Refs. [15,16] do not provide a realization of the triple (Y 1 , Y 2 , W * ), as given in Theorem 4 (which is based on applying the parameterization of Theorem 8).
On the other hand, the derivation of Theorem 4 is based on Theorem 9, which shows that, among all joint distributions P Y 1 ,Y 2 ,W induced by a tuple of multivariate correlated Gaussian random variables (Y 1 , Y 2 ), and an arbitrary random variable W : Ω → W, continuous or discrete-valued, Wyner's common information C(Y 1 , Y 2 ), defined by (13), is minimized by a triple (Y 1 , Y 2 , W) which induces a jointly Gaussian distribution P Y 1 ,Y 2 ,W , and W : Ω → W = R n is a finite-dimensional Gaussian random variable. (d) The derivation of Theorem 4 contains many intermediate results which are applicable to the problems considered in [15,16], such as the Relaxed Wyner's Common Information of [17]. These are discussed in Section 4.3.
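Formula (22) can be checked numerically. The sketch below is my own code (the function names are hypothetical, not from the paper): it computes the canonical correlation coefficients as the singular values of the normalized cross-covariance Q Y1^{-1/2} Q Y1,Y2 Q Y2^{-1/2}, and then evaluates C(Y 1 , Y 2 ).

```python
import numpy as np

# Numerical sketch (not the paper's code): canonical correlation coefficients
# d_j of a jointly Gaussian tuple are the singular values of
# Q_Y1^{-1/2} Q_{Y1,Y2} Q_Y2^{-1/2}, and Wyner's common information is
# C(Y1, Y2) = (1/2) * sum_j ln((1 + d_j) / (1 - d_j)).

def canonical_correlations(Q11, Q12, Q22):
    """Singular values of the normalized cross-covariance."""
    def inv_sqrt(Q):
        # Inverse symmetric square root via eigendecomposition (Q > 0).
        w, V = np.linalg.eigh(Q)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return np.linalg.svd(inv_sqrt(Q11) @ Q12 @ inv_sqrt(Q22), compute_uv=False)

def wyner_common_information(Q11, Q12, Q22, tol=1e-10):
    d = canonical_correlations(Q11, Q12, Q22)
    d = d[d > tol]                      # independent components (d_j = 0) contribute 0
    assert np.all(d < 1 - tol), "identical components give C = +infinity"
    return 0.5 * np.sum(np.log((1 + d) / (1 - d)))

# Scalar example, p1 = p2 = 1, correlation rho:
rho = 0.6
C = wyner_common_information(np.eye(1), np.array([[rho]]), np.eye(1))
print(C)  # = 0.5 * ln(1.6 / 0.4) = ln 2
```

In the scalar case the result reduces to the known expression (1/2) ln((1 + |ρ|)/(1 − |ρ|)) of [12,13].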
Theorem 5 gives a parametric characterization of the Gray and Wyner rate region R GW (∆ 1 , ∆ 2 ), with respect to the variance matrix of the triple of jointly Gaussian random variables (Y 1 , Y 2 , W).
The derivation of Theorem 5 is presented in Section 4.4, after the structural properties of the RDFs of Theorem 12, Theorem 13, and Theorem 14 are presented. From Theorem 5 follow simplified characterizations of subsets of the rate region R GW (∆ 1 , ∆ 2 ), such as rates that lie on the Pangloss Plane, and rates that correspond to a W that makes Y 1 and Y 2 conditionally independent, i.e., W is such that (1) holds. Utilizing the structural properties of the RDFs of Theorem 12, Theorem 13, and Theorem 14, and Theorem 4, the next theorem is obtained, which gives the formula of Wyner's lossy common information C GW (Y 1 , Y 2 ; ∆ 1 , ∆ 2 ) = C W (Y 1 , Y 2 ).

Theorem 6. Consider a tuple (Y 1 , Y 2 ) of Gaussian random variables in the canonical variable form of Definition 1. Restrict attention to the correlated parts of these random variables, as defined in Theorem 8, by (77)-(79), and let the subset of the distortion region be defined accordingly, with d j ∈ (0, 1), ∀ j ∈ Z n .
Then, Wyner's lossy common information (the calculation of the expression in Theorem 3) is given by

C GW (Y 1 , Y 2 ; ∆ 1 , ∆ 2 ) = C(Y 1 , Y 2 ) = (1/2) ∑_{j=1}^{n} ln((1 + d j )/(1 − d j )).

The derivation of Theorem 6 is presented in Section 4.2 and makes use of a degenerate version of the realization of the triple (Y 1 , Y 2 , W * ) given in Theorem 4, and the RDFs. Remark 2. By Theorem 5, a subset of the Gray-Wyner rate region is obtained by replacing (Y 1 , Y 2 , W) ∈ G(0, Q (Y 1 ,Y 2 ,W) ) of (39) and (40) by a W that makes Y 1 and Y 2 conditionally independent, i.e., such that (Z 1 , Z 2 ) ∈ G(0, Q (Z 1 ,Z 2 ) ) and (Z 1 , Z 2 , W) are mutually independent (e.g., Q (Z 1 ,Z 2 ) is block-diagonal).

Structure of the Paper
Section 2 introduces the mathematical tools of the geometric approach to Gaussian random variables and the weak stochastic realization of conditional independence (Section 2.4).
Section 3 contains the problem statement, the solution procedure, and the weak realization of a tuple of multivariate random variables (Y 1 , Y 2 ) such that another multivariate Gaussian random variable W makes Y 1 and Y 2 conditionally independent (Section 2.5).
Section 4 is concerned with the characterization of the Gray-Wyner rate region R GW (∆ 1 , ∆ 2 ), the characterization of rates that lie on the Pangloss Plane, and Wyner's lossy common information. This section includes calculations of the rate distortion functions, and the weak stochastic realizations of the random variables (Y 1 , Y 2 , Ŷ 1 , Ŷ 2 , W) which achieve these rate distortion functions, for jointly multivariate Gaussian random variables with square-error distortion functions. Section 5 includes remarks on possible extensions. Appendix A.3 makes use of a matrix equality and a determinant inequality first obtained by Hua Loo-Keng in 1952, which are used to carry out the optimization problem of Wyner's lossy common information C W (Y 1 , Y 2 ) = C(Y 1 , Y 2 ).

Probabilistic Properties of Tuples of Random Variables
The reader finds in this section the basic properties associated with: (1) the transformation of a tuple of multivariate Gaussian random variables (Y 1 , Y 2 ) into their canonical variable form, and (2) the parameterization of all jointly Gaussian distributions P Y 1 ,Y 2 ,W (y 1 , y 2 , w) by a zero mean Gaussian random variable W : Ω → R k ≡ W such that (a) W makes the multivariate random variables (Y 1 , Y 2 ) conditionally independent, and (b) the marginal distribution P Y 1 ,Y 2 ,W (y 1 , y 2 , ∞) = P Y 1 ,Y 2 (y 1 , y 2 ) coincides with the joint distribution of the multivariate random variables (Y 1 , Y 2 ).
Denote the real numbers by R and the set of positive and of strictly positive real numbers, respectively, by R + = [0, ∞) and R ++ = (0, ∞) ⊂ R. The vector space of n-tuples of real numbers is denoted by R n . Denote the Borel σ-algebra on this vector space by B(R n ), hence, (R n , B(R n )) is a measurable space.
The expression R n×m denotes the set of n by m matrices with elements in the real numbers, for n, m ∈ Z + . For the symmetric matrix Q ∈ R n×n , the inequality Q ≥ 0 denotes that for all vectors u ∈ R n the inequality u T Qu ≥ 0 holds. Similarly, Q > 0 denotes that for all u ∈ R n \{0}, u T Qu > 0. Consider a probability space denoted by (Ω, F, P), consisting of a set Ω, a σ-algebra F of subsets of Ω, and a probability measure P : F → [0, 1].
A real-valued random variable is a function X : Ω → R such that the following set belongs to the indicated σ-algebra, {ω ∈ Ω|X(ω) ∈ (−∞, u]} ∈ F for all u ∈ R. A random variable taking values in an arbitrary measurable space (X, B(X)) is defined correspondingly by X : Ω → X and X −1 (A) = {ω ∈ Ω|X(ω) ∈ A} ∈ B(X), for all A ∈ B(X). The measure (or distribution, if X is a Euclidean space) induced by the random variable on (X, B(X)) is denoted by P X or P(dx). The σ-algebra generated by a random variable X : Ω → X is defined as the smallest σ-algebra containing the subsets X −1 (A) ∈ F for all A ∈ B(X). It is denoted by F X . The real-valued random variable X is called G-measurable for a σ-algebra G ⊆ F if the subset {ω ∈ Ω|X(ω) ∈ (−∞, u]} ∈ G for all u ∈ R.
Denote the set of positive random variables which are measurable on a sub-σ-algebra G ⊆ F correspondingly. A tuple of sub-σ-algebras F 1 , F 2 ⊆ F is called independent if the factorization P(A 1 ∩ A 2 ) = P(A 1 ) P(A 2 ) holds for all A 1 ∈ F 1 and A 2 ∈ F 2 . The definition can be extended to any finite set of independent sub-σ-algebras.

Geometric Approach of Gaussian Random Variables and Canonical Variable Form
The purpose of this section is to introduce the geometric approach of a tuple of finite-dimensional Gaussian random variables, using the canonical variable form of the tuple introduced by H. Hotelling [18]. The use of the geometric approach of two Gaussian random variables with respect to the computation of mutual information is elaborated by Gelfand and Yaglom in [19], making reference to an insight due to Kolmogorov. However, the canonical variable form is not given in [19].
An R n -valued Gaussian random variable with parameters the mean value m X ∈ R n and the variance Q X ∈ R n×n , Q X = Q T X ≥ 0, is a function X : Ω → R n which is a random variable and such that the measure of this random variable equals a Gaussian measure described by its characteristic function, E[exp(iu T X)] = exp(iu T m X − u T Q X u/2), u ∈ R n . Note that this definition includes the case in which the random variable is almost surely equal to a constant, in which case Q X = 0. A Gaussian random variable with these parameters is denoted by X ∈ G(m X , Q X ).
The effective dimension of the random variable is denoted by dim(X) = rank(Q X ). Any tuple of random variables X 1 , . . . , X k is called jointly Gaussian if the vector (X T 1 , X T 2 , . . . , X T k ) T is a Gaussian random variable. A tuple of Gaussian random variables will be denoted by (Y 1 , Y 2 ), to save space, rather than by (Y T 1 , Y T 2 ) T . Then, the variance matrix of this tuple is denoted by

Q (Y 1 ,Y 2 ) = [ Q Y 1 , Q Y 1 ,Y 2 ; Q T Y 1 ,Y 2 , Q Y 2 ] ∈ R (p 1 +p 2 )×(p 1 +p 2 ) .

The reader should distinguish the variance matrices Q (Y 1 ,Y 2 ) and Q Y 1 ,Y 2 ∈ R p 1 ×p 2 . Any such tuple of Gaussian random variables is independent if and only if Q Y 1 ,Y 2 = 0.
Definition 1. Canonical variable form. The tuple of random variables (Y 1 , Y 2 ) is said to be in the canonical variable form if a basis has been chosen and a transformation of the random variables to this basis has been carried out such that, with respect to the new basis, one has the representation

Y 1 = (Y 11 , Y 12 , Y 13 ), Y 2 = (Y 21 , Y 22 , Y 23 ), p, p 1 , p 2 , p 11 , p 12 , p 13 , p 21 , p 22 , p 23 ∈ N,

with p 11 = p 21 , p 12 = p 22 , all components of unit variance, Q Y 11 ,Y 21 = I p 11 , Q Y 12 ,Y 22 = D = Diag(d 1 , . . . , d p 12 ), 1 > d 1 ≥ . . . ≥ d p 12 > 0, and all other cross-covariances zero. Appendix A.1 gives Algorithm A1 to transform the variance matrix Q (Y 1 ,Y 2 ) ≥ 0, by two nonsingular transformations S i ∈ R p i ×p i , i = 1, 2, to its canonical variable form Q cvf of Definition 1. Note that the different components of Y 12 and of Y 22 are independent random variables; thus, Y 12,i and Y 12,j are independent, Y 22,i and Y 22,j are independent, and Y 12,i and Y 22,j are independent, for all i ≠ j; and Y 12,j and Y 22,j are correlated with correlation coefficient d j , for j = 1, . . . , p 12 . Proof. The results are immediately obvious from the fact that the random variables are all jointly Gaussian and from the variance formula (50) of the canonical variable form.
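The reduction to canonical variable form can be sketched numerically. The following is a hypothetical reimplementation in the spirit of Algorithm A1 (whiten each marginal variance, then rotate by the singular vectors of the normalized cross-covariance); it is my own sketch, not the paper's code.

```python
import numpy as np

# Sketch of the transformation to canonical variable form: nonsingular S1, S2
# such that Cov(S1 Y1) = I, Cov(S2 Y2) = I, and the cross-covariance becomes
# the diagonal matrix D = Diag(d_1, ..., d_n) of canonical correlations.

def canonical_variable_form(Q11, Q12, Q22):
    def inv_sqrt(Q):
        w, V = np.linalg.eigh(Q)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    U, d, Vt = np.linalg.svd(inv_sqrt(Q11) @ Q12 @ inv_sqrt(Q22))
    S1 = U.T @ inv_sqrt(Q11)    # transformation Y1 -> S1 Y1
    S2 = Vt @ inv_sqrt(Q22)     # transformation Y2 -> S2 Y2
    return S1, S2, d

# Arbitrary illustrative covariance blocks:
Q11 = np.array([[2.0, 0.5], [0.5, 1.0]])
Q22 = np.array([[1.0, 0.2], [0.2, 1.5]])
Q12 = np.array([[0.3, 0.1], [0.0, 0.4]])
S1, S2, d = canonical_variable_form(Q11, Q12, Q22)
print(np.round(S1 @ Q11 @ S1.T, 10))   # identity
print(np.round(S1 @ Q12 @ S2.T, 10))   # Diag(d)
```

The check S1 Q 11 S1^T = I and S1 Q 12 S2^T = Diag(d) mirrors the defining property of Q cvf above.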
Next, the interpretation of the various components of the canonical variable form is defined, as in [20].

Definition 2.
Interpretation of components of the canonical variable form. Consider a tuple of jointly Gaussian random variables (Y 1 , Y 2 ) ∈ G(0, Q cvf ) in the canonical variable form of Definition 1. Call the various components as defined in the next table.

Y 11 = Y 21 : identical information of Y 1 and Y 2
Y 12 , Y 22 : correlated information of Y 1 and Y 2
Y 13 , Y 23 : private (independent) information of Y 1 and Y 2 , respectively

Since Y 11 = Y 21 a.s., the term identical information is used. When p 11 = p 21 = 0, the mutual information of the tuple is I(Y 1 ; Y 2 ) = −(1/2) ∑_{i=1}^{p 12} ln(1 − d i ^2 ). This formula is the subject of much discussion in Gelfand and Yaglom [19] (see Equation (2.8') and Chapter II).

Theorem 7. Consider a tuple of finite-dimensional Gaussian random variables
Compute the canonical variable form of the tuple of Gaussian random variables according to Algorithm A1. This yields the indices p 11 = p 21 , p 12 = p 22 , p 13 , p 23 , and n = p 11 + p 12 = p 21 + p 22 , and the diagonal matrix D = Diag(d 1 , . . . , d p 12 ), 1 > d 1 ≥ . . . ≥ d p 12 > 0, where the d i are the canonical correlation coefficients or singular values. Then,

I(Y 1 ; Y 2 ) = −(1/2) ∑_{i=1}^{p 12} ln(1 − d i ^2 ), if p 11 = p 21 = 0; I(Y 1 ; Y 2 ) = +∞, if p 11 = p 21 > 0. (56)

Proof. The derivation is given in Appendix A.3 of [21] (since it is not given in [19]).
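The canonical-correlation expression for the mutual information can be cross-checked against the standard determinant formula for jointly Gaussian vectors, I(Y 1 ; Y 2 ) = (1/2) ln( det(Q Y 1 ) det(Q Y 2 ) / det(Q (Y 1 ,Y 2 ) ) ). The following is my own numerical sketch; the matrix values are arbitrary illustrative choices.

```python
import numpy as np

# Check that I(Y1; Y2) = -(1/2) sum_i ln(1 - d_i^2) agrees with the
# determinant formula I(Y1; Y2) = (1/2) ln(det(Q11) det(Q22) / det(Q)).

def inv_sqrt(Q):
    w, V = np.linalg.eigh(Q)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

Q11 = np.array([[1.0, 0.3], [0.3, 2.0]])
Q22 = np.array([[1.5, 0.0], [0.0, 1.0]])
Q12 = np.array([[0.4, 0.1], [0.2, 0.3]])
Q = np.block([[Q11, Q12], [Q12.T, Q22]])   # joint variance matrix

d = np.linalg.svd(inv_sqrt(Q11) @ Q12 @ inv_sqrt(Q22), compute_uv=False)
I_cc = -0.5 * np.sum(np.log(1.0 - d**2))
I_det = 0.5 * np.log(np.linalg.det(Q11) * np.linalg.det(Q22) / np.linalg.det(Q))
print(I_cc, I_det)  # the two values coincide
```

The agreement follows from the Schur-complement factorization det(Q) = det(Q 11 ) det(Q 22 ) ∏ i (1 − d i ^2 ).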
By the last entry of (56), it is appropriate to consider only tuples (Y 1 , Y 2 ) ∈ G(0, Q (Y 1 ,Y 2 ) ) such that p 11 = p 21 = 0, i.e., to remove the identical components prior to the analysis of mutual information problems.

Remark 3.
The material discussed in Section 1.2 makes use of the concepts of this section. The main point to be made is that, in lossy source coding problems, the source distribution is fixed, while the optimal reproduction distribution needs to be found and realized. Then, a pre-encoder can be used by invoking Algorithm A1.

Conditional Independence of a Triple of Gaussian Random Variables
The concept of conditional independence is basic to the entire paper. The definition is provided below. The characterization of a Gaussian measure on a triple of Gaussian random variables having the conditional independence property is stated.

Definition 3. Conditional independence.
Consider a probability space (Ω, F, P) and three sub-σ-algebras F 1 , F 2 , G ⊆ F. Call the sub-σ-algebras F 1 and F 2 conditionally independent given, or conditioned on, the sub-σ-algebra G if the factorization property E[X 1 X 2 |G] = E[X 1 |G] E[X 2 |G] holds for all positive random variables X 1 and X 2 which are measurable with respect to F 1 and F 2 , respectively. Denote this property by (F 1 , F 2 |G) ∈ CI.
For Gaussian random variables, the definition of minimality of a Gaussian random variable X that makes two Gaussian random variables (Y 1 , Y 2 ) conditionally independent is needed. The definition is introduced below.

Definition 4.
Minimality of conditional independence of Gaussian random variables. Consider three random variables, Y i : Ω → R p i for i = 1, 2, and X : Ω → R n . Call the random variables Y 1 and Y 2 Gaussian conditionally independent conditioned on, or given, F X if: (1) (F Y 1 , F Y 2 |F X ) ∈ CI; (2) (Y 1 , Y 2 , X) are jointly Gaussian random variables. The notation (Y 1 , Y 2 |X) ∈ CIG is used to denote this property. Call the random variables (Y 1 , Y 2 |X) minimally Gaussian conditionally independent if: (1) they are Gaussian conditionally independent; (2) there does not exist another tuple (Y 1 , Y 2 |X') ∈ CIG with dim(X') < dim(X).

There exists a simple equivalent condition for the conditional independence of a tuple of Gaussian random variables given a third Gaussian random variable. This condition is expressed in terms of parameterizing the variance matrix of the tuple, as presented in the next proposition.

Proposition 2. [22] (Proposition 3.4) Equivalent condition for the conditional independence of the tuple of Gaussian random variables.
Consider a triple of jointly Gaussian random variables denoted as (Y 1 , Y 2 , X) ∈ G(0, Q) with Q X > 0. This triple is Gaussian conditionally independent if and only if

Q Y 1 ,Y 2 = Q Y 1 ,X Q −1 X Q T Y 2 ,X .

It is minimally Gaussian conditionally independent if and only if, in addition, n = dim(X) = rank(Q Y 1 ,Y 2 ).
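Proposition 2's condition can be verified on a synthetic example. The construction below is my own sketch (the coefficient matrices A 1 , A 2 are hypothetical illustrative choices): it builds Y i = A i X + V i with independent noises V i , for which the condition holds by design.

```python
import numpy as np

# Numerical sketch of Proposition 2: for jointly Gaussian (Y1, Y2, X) with
# Q_X > 0, (Y1, Y2 | X) in CIG holds iff
#     Q_{Y1,Y2} = Q_{Y1,X} Q_X^{-1} Q_{X,Y2}.
# Build Y_i = A_i X + V_i with independent noises V_i.

rng = np.random.default_rng(0)
n, p1, p2 = 2, 3, 2
A1 = rng.standard_normal((p1, n))
A2 = rng.standard_normal((p2, n))
QX = np.eye(n)                         # X ~ G(0, I)

QY1X = A1 @ QX                         # E[Y1 X^T]; the noises drop out
QY2X = A2 @ QX                         # E[Y2 X^T]
QY1Y2 = A1 @ QX @ A2.T                 # E[Y1 Y2^T]; V1, V2 independent

# Conditional cross-covariance of (Y1, Y2) given X:
cond = QY1Y2 - QY1X @ np.linalg.inv(QX) @ QY2X.T
print(np.round(cond, 12))              # zero matrix
```

The vanishing of the conditional cross-covariance is exactly the parameterization stated in the proposition.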
It will become apparent in Section 4.4 that the Gray and Wyner lossy rate region R GW (∆ 1 , ∆ 2 ) is parameterized by a triple of jointly Gaussian random variables (Y 1 , Y 2 , W), but not necessarily such that W makes Y 1 and Y 2 conditionally independent. However, subsets of R GW (∆ 1 , ∆ 2 ) are characterized by a triple (Y 1 , Y 2 , W) such that W makes Y 1 and Y 2 conditionally independent.

Weak Realization of a Gaussian Probability Measure on a Tuple of Random Variables
This section is motivated by Theorem 9, which states that, among all joint distributions P Y 1 ,Y 2 ,W induced by a tuple of multivariate correlated Gaussian random variables (Y 1 , Y 2 ), and an arbitrary random variable W : Ω → W, continuous or discrete-valued, Wyner's common information C(Y 1 , Y 2 ), defined by (13), is minimized by a triple (Y 1 , Y 2 , W) which induces a jointly Gaussian distribution P Y 1 ,Y 2 ,W , and W : Ω → W = R n is a finite-dimensional Gaussian random variable.
To develop the above results, use is made of the solution of the problem of the weak Gaussian stochastic realization of a tuple of Gaussian random variables. Specifically, to determine a Gaussian probability measure on a triple of Gaussian random variables such that: (1) the measure restricted to the first two Gaussian random variables is equal to the considered probability measure; (2) the third Gaussian random variable makes the other two random variables conditionally independent. This problem does not have a unique solution; there is a set of Gaussian probability measures which meet those conditions. Needed is the parameterization of this set of solutions.
Below, the problem is stated in more detail. Its solution is provided in the next section.

Problem 1.
Weak stochastic realization of a tuple of conditionally independent Gaussian random variables.
Weak stochastic realization problem of a Gaussian random variable. Consider a Gaussian measure P 0 = G(0, Q 0 ) on the space (R p 1 +p 2 , B(R p 1 +p 2 )). Determine the integer n ∈ N and construct all Gaussian measures G(0, Q 1 ) on the space (R p 1 +p 2 +n , B(R p 1 +p 2 +n )) such that: (1) the marginal of G(0, Q 1 ) on (R p 1 +p 2 , B(R p 1 +p 2 )) equals P 0 ; and (2) the third component makes the first two conditionally independent. Here, the indicated random variables (Y 1 , Y 2 , X) are constructed having the measure G(0, Q 1 ), with the dimensions p 1 , p 2 , n ∈ Z + , respectively.
The next definition and proposition concern the weak Gaussian stochastic realization of a tuple of jointly Gaussian multivariate random variables.

Definition 5.
Minimality of weak stochastic realization of a tuple of conditionally independent Gaussian random variables. Consider a Gaussian measure G 0 = G(0, Q (Y 1 ,Y 2 ) ) on (R p 1 +p 2 , B(R p 1 +p 2 )). (a) A weak Gaussian stochastic realization of the Gaussian measure G 0 is defined to be a Gaussian measure P 1 = G 1 on (R p 1 +p 2 +n , B(R p 1 +p 2 +n )), for some integer n ∈ Z + , associated with random variables in the three spaces denoted, respectively, by Y 1 , Y 2 , and X, and such that: (1) the marginal of G 1 on (R p 1 +p 2 , B(R p 1 +p 2 )) equals G 0 ; (2) the conditional measures of Y 1 and of Y 2 given X factorize, where these are Gaussian measures, with means which are linear functions of the random variable X and deterministic variance matrices. (b) The weak Gaussian stochastic realization is called minimal if the dimension n of the random variable X is the smallest possible over all weak Gaussian stochastic realizations as defined in (a). (c) A Gaussian random variable representation of a weak Gaussian stochastic realization G 1 is defined as a triple of random variables (Y 1 , Y 2 , X) satisfying the relations Y i = E[Y i |F X ] + Z i , i = 1, 2, where (X, Z 1 , Z 2 ) are zero mean independent random variables. From the assumptions, it then follows that (Y 1 , Y 2 ) are Gaussian random variables; hence, the last equality makes sense.
(d) A minimal Gaussian random variable representation of a weak Gaussian stochastic realization is defined as a triple of random variables as in (c), except that, in addition, it is required that the dimension n of X be minimal, as in (b). The case Q X ≥ 0 in (a).(2) is similar.
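For the correlated parts, with joint variance Q = [[I, D], [D, I]], one admissible Gaussian random variable representation in the sense of part (c) can be written down explicitly. The sketch below is an illustrative choice of my own (not necessarily the paper's minimal realization W * ), and the checks use the condition of Proposition 2.

```python
import numpy as np

# Sketch of Definition 5(c): a representation Y_i = E[Y_i | X] + Z_i with
# (X, Z1, Z2) zero mean and independent. For Q = [[I, D], [D, I]], the
# illustrative choice (an assumption, not necessarily the paper's W*) is
#   Y_i = D^{1/2} X + (I - D)^{1/2} V_i,  i = 1, 2,
# with (X, V1, V2) independent standard Gaussians.

d = np.array([0.7, 0.4, 0.1])          # canonical correlation coefficients
D = np.diag(d)
n = len(d)
A = np.sqrt(D)                         # E[Y_i | X] = A X, since Q_X = I
B = np.sqrt(np.eye(n) - D)             # Z_i = B V_i, independent of X

QY1 = A @ A.T + B @ B.T                # marginal variance of each Y_i
QY1Y2 = A @ A.T                        # cross-covariance of (Y1, Y2)
cond = QY1Y2 - A @ A.T                 # conditional cross-covariance given X

print(np.round(QY1, 12))               # identity: the marginal is reproduced
print(np.round(QY1Y2, 12))             # D: the cross-covariance is reproduced
print(np.round(cond, 12))              # zero: (Y1, Y2 | X) in CIG
```

The realization reproduces the prescribed measure on (Y 1 , Y 2 ) and makes the pair conditionally independent given X, with dim(X) = n = rank(D), i.e., it is minimal in the sense of Proposition 2.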
The next proposition shows the equivalence of weak Gaussian stochastic realizations of Definition 5. Proof. The derivation is given in Appendix A.5 of [21].
Consider Figure 2. The two signals Y 1 , Y 2 are to be reproduced at the two decoders by Ŷ 1 , Ŷ 2 , subject to the square-error distortion functions. According to Gray and Wyner, the characterization of the lossy rate region is described by a single coding scheme that uses the auxiliary random variable W, which is common to both Y 1 and Y 2 . A subset of the rate triples on the Gray and Wyner rate region is achieved by a triple that satisfies the conditional independence (Y 1 , Y 2 |W) ∈ CIG. Below, this conditional independence is further detailed in terms of the mathematical framework of weak stochastic realization, for i = 1, . . . , N, at the encoder and decoder, with respect to the common and private random variables. Denote the parameterized joint measure with respect to W by P Y 1 ,Y 2 ,W . In the following subsections, it will be shown how such a random variable W can be constructed in a number of cases. Algorithm 1(a) gives the general case, while (b) gives the special case in which the joint measure of (Y 1 , Y 2 , W) satisfies (Y 1 , Y 2 |W) ∈ CIG.

1.
At the encoder, first compute the stated variables; then, the triple (Z 1 , Z 2 , W) of jointly Gaussian random variables are independent, and the tuple of random variables (Y 1 , Y 2 ) is represented accordingly. 2.
The tuple of random variables (Y 1 , Y 2 ) is represented accordingly. We emphasize that Y 1 and Y 2 are conditionally independent, conditioned on W, if and only if Z 1 and Z 2 are independent.
The validity of the statements of the algorithm follows from the next proposition. (a) At the encoder, the conditional expectations are correct, and the definitions of Z 1 and of Z 2 are well defined. (b) The three random variables (Z 1 , Z 2 , W) are independent. Consequently, so are the three sequences (W N , Z N 1 , Z N 2 ) and the messages generated by the Gray-Wyner encoder. Proof. Case (a). This follows from realization theory (since no constraints are imposed). Case (b). This is a specific application of Proposition 3.
For the definition of C(Y 1 , Y 2 ), use is made of the construction of the actual family of measures such that (Y 1 , Y 2 |W) ∈ CIG holds, and of the weak stochastic realization. These are presented in Theorem 8 and Corollary 1.

Characterization of Minimal Conditional Independence of a Triple of Gaussian Random Variables
Introduce the notation for the parameterization of the family of Gaussian probability distributions P CIG . A subset of the set P CIG is the set of distributions P CIG min , defined with the additional constraint that the dimension of the random variable W is minimal while all other conditions hold. The parameterization of the families of Gaussian probability distributions P CIG and P CIG min requires the solution of the weak stochastic realization problem of Gaussian random variables defined by Problem 1. This problem is solved in [22] (Theorem 4.2). For the reader's convenience, it is stated below.
Theorem 8. Ref. [22] (Theorem 4.2). Consider a tuple (Y 1 , Y 2 ) of Gaussian random variables in the canonical variable form of Definition 1. Restrict attention to the correlated parts of these random variables. Thus, the random variables Y 1 , Y 2 have the same dimension n = p 1 = p 2 , and their covariance matrix D ∈ R n×n is a nonsingular diagonal matrix with ordered diagonal entries that are real numbers in the interval (0, 1). That is, p 11 = p 21 = 0 and p 13 = p 23 = 0.
(a) There exists a probability measure P 1 , and a triple of Gaussian random variables Y 1 , Y 2 , W : Ω → R n defined on it, such that conditions (i) and (ii) hold. (b) There exists a family of Gaussian measures, denoted by P ci ⊆ P CIG min , that satisfy (i) and (ii) of (a); moreover, this family is parameterized by the matrices and sets that follow.
Furthermore, for any measure P 1 ∈ P CIG min , there exists a state transformation of the form (Y 1 , Y 2 , W) → (S 1 Y 1 , S 2 Y 2 , S W W), for nonsingular square matrices S 1 , S 2 , S W , such that the corresponding measure of the three transformed variables belongs to P ci .
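The canonical variable form underlying Theorem 8 can be computed numerically. The following sketch (an illustration only; the helper canonical_correlations and the 2-dimensional example are our own, not the paper's algorithm) obtains the canonical correlation coefficients d i as the singular values of the normalized cross-covariance Q 1 −1/2 Q 12 Q 2 −1/2 , assuming Q 1 > 0 and Q 2 > 0:

```python
import numpy as np

def canonical_correlations(Q1, Q12, Q2):
    """Canonical correlation coefficients d_i of a zero-mean Gaussian
    tuple (Y1, Y2) with Cov(Y1)=Q1, Cov(Y1,Y2)=Q12, Cov(Y2)=Q2:
    the singular values of Q1^{-1/2} Q12 Q2^{-1/2} (Q1, Q2 > 0 assumed)."""
    def inv_sqrt(Q):
        # inverse symmetric square root via eigendecomposition
        w, U = np.linalg.eigh(Q)
        return U @ np.diag(w ** -0.5) @ U.T
    return np.linalg.svd(inv_sqrt(Q1) @ Q12 @ inv_sqrt(Q2),
                         compute_uv=False)

# 2-dimensional example: unit variances, cross-covariance 0.6 * I
d = canonical_correlations(np.eye(2), 0.6 * np.eye(2), np.eye(2))
```

For this example, both canonical correlation coefficients equal 0.6, so the tuple consists only of correlated parts in the sense of Definition 1.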
The application of Theorem 8 is discussed in the next remark, in the context of parameterizing any rate triple (R 0 , R 1 , R 2 ) ∈ R GW (∆ 1 , ∆ 2 ) on the Gray-Wyner lossy rate region that lies on the Pangloss plane.

Remark 4.
Applications of Theorem 8. (a) Theorem 8 is a parameterization of the family of Gaussian measures P ci ⊆ P CIG min by the entries of the covariance matrix Q W . Hence, it is at most an n(n + 1)/2−dimensional parameterization. (b) It is shown in Section 4.4 that only a subset of the achievable rate region R GW (∆ 1 , ∆ 2 ) is parameterized in this way. The next corollary is useful for the calculation of C(Y 1 , Y 2 ), since, by Theorem 9, an achievable lower bound on I(Y 1 , Y 2 ; W) is incurred by a Gaussian random variable W such that the distribution P Y 1 ,Y 2 ,W ∈ P ci ⊆ P CIG min , corresponding to W ∈ G(0, Q W ). By Theorem 9, and since C(Y 1 , Y 2 ) is invariant with respect to nonsingular transformations applied to (Y 1 , Y 2 , W), the next corollary gives the realization of (Y 1 , Y 2 ), as defined in Theorem 8 by (77)-(79), expressed in terms of an arbitrary Gaussian random variable W ∈ G(0, Q W ).
Furthermore, the mutual information I(Y 1 , Y 2 ; W) is given by (90), and it is parameterized by Q W ∈ Q W , where Q W is defined by the set of Equation (82).
Proof. The correctness of the realization is due to Proposition 2 and Theorem 8. The calculation of mutual information follows from the realization.

Wyner's Common Information
This section is devoted to the calculation of Wyner's common information C(Y 1 , Y 2 ), defined by (13), for P Y 1 ,Y 2 = G(0, Q (Y 1 ,Y 2 ) ), and the construction of the weak stochastic realization of (Y 1 , Y 2 , W) that achieves this.

Reduction of the Calculation of Wyner's Common Information
First, we show Theorem 9, which states: given a tuple of multivariate correlated Gaussian random variables (Y 1 , Y 2 ) and an arbitrary random variable W (i.e., taking continuous or discrete values), Wyner's common information C(Y 1 , Y 2 ), defined by (13), is minimized by a triple (Y 1 , Y 2 , W) which induces a jointly Gaussian distribution P Y 1 ,Y 2 ,W , where W : Ω → W = R n is a finite-dimensional Gaussian random variable. Theorem 9. Consider a tuple of multivariate correlated Gaussian random variables Y 1 : Ω → R p 1 , Y 2 : Ω → R p 2 , p i ∈ Z + , i = 1, 2, with the variance matrix of this tuple denoted by Q (Y 1 ,Y 2 ) , and, without loss of generality, assume that Q (Y 1 ,Y 2 ) is a positive definite matrix. Let W : Ω → W be any auxiliary random variable, with W an arbitrary measurable space, and P Y 1 ,Y 2 ,W any joint probability distribution of the triple (Y 1 , Y 2 , W) on the product space. The following hold. (a) Define the random variables Z 1 , Z 2 by (92). Then the stated inequalities hold. (c) Among all joint distributions P Y 1 ,Y 2 ,W induced by the jointly Gaussian random variables (Y 1 , Y 2 ) ∈ G(0, Q (Y 1 ,Y 2 ) ) of (91) and an arbitrary random variable W : Ω → W, such that the (Y 1 , Y 2 )−marginal P Y 1 ,Y 2 is the Gaussian distribution P Y 1 ,Y 2 = G(0, Q (Y 1 ,Y 2 ) ) and P Y 1 ,Y 2 |W = P Y 1 |W P Y 2 |W , a jointly Gaussian distribution achieves the lower bounds on I(Y 1 , Y 2 ; W) of part (a), i.e., achieves Wyner's common information C(Y 1 , Y 2 ) defined by (13). This distribution P Y 1 ,Y 2 ,W is induced by an n−dimensional Gaussian random variable W, n ∈ Z + , and it is induced by a triple (Y 1 , Y 2 , W) represented in terms of mutually independent Gaussian random variables (W, Z 1 , Z 2 ).
Proof. (a) Inequality (93) is due to an identity of mutual information; (94) is due to the chain rule of entropy; (95) is due to the fact that conditioning reduces entropy; (96) is due to a property of conditional entropy; (97) is due to the fact that conditioning reduces entropy; (98) is due to definition (92); and (99) is due to the maximum entropy principle. Remark 5. Theorem 9 shows that, among all random variables W which induce a joint distribution P Y 1 ,Y 2 ,W , for the Wyner's common information C(Y 1 , Y 2 ) problem, it suffices to consider a jointly Gaussian triple (Y 1 , Y 2 , W) such that W makes Y 1 and Y 2 conditionally independent.

Wyner's Common Information of Correlated Random Variables
Assume that the tuple of multivariate correlated Gaussian random variables (Y 1 , Y 2 ) ∈ G(0, Q (Y 1 ,Y 2 ) ) of Theorem 9 is already transformed into the canonical variable representation (see Definition 1), using Algorithm A1, i.e., by the nonsingular transformation S = Block-diag(S 1 , S 2 ). Mutual information is invariant with respect to nonsingular transformations, and I(Y 1 , Y 2 ; W) = I(S 1 Y 1 , S 2 Y 2 ; W). By Theorem 9 (c), the joint probability distributions P Y 1 ,Y 2 ,W (y 1 , y 2 , w) are jointly Gaussian. This family of distributions is parameterized by the multidimensional random variable W, chosen such that (Y 1 , Y 2 ) are conditionally independent, conditioned on W, such that the marginal distribution P Y 1 ,Y 2 ,W (y 1 , y 2 , ∞) = P Y 1 ,Y 2 (y 1 , y 2 ) coincides with the distribution of (Y 1 , Y 2 ), and such that W is used to represent (Y 1 , Y 2 ).
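The invariance I(Y 1 , Y 2 ; W) = I(S 1 Y 1 , S 2 Y 2 ; W) used above is easy to check numerically from the log-determinant expression for Gaussian mutual information; a minimal sketch (the helper gaussian_mi and the random scalar example are illustrative assumptions, not the paper's notation):

```python
import numpy as np

def gaussian_mi(Q, ia, ib):
    """I(A; B) = 0.5*ln(det Q_A * det Q_B / det Q_{A,B}) (nats) for a
    zero-mean jointly Gaussian vector with covariance Q; the index
    lists ia, ib select the components of A and of B."""
    det = lambda i: np.linalg.det(Q[np.ix_(i, i)])
    return 0.5 * np.log(det(ia) * det(ib) / det(ia + ib))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
Q = A @ A.T + 3 * np.eye(3)          # covariance of (Y1, Y2, W), scalar parts
mi = gaussian_mi(Q, [0, 1], [2])     # I(Y1, Y2; W)

# invariance under the nonsingular transformation (Y1, Y2) -> (2*Y1, -0.5*Y2)
S = np.diag([2.0, -0.5, 1.0])
mi_t = gaussian_mi(S @ Q @ S.T, [0, 1], [2])
```

Here mi and mi_t agree, since the determinants of the transformed blocks all scale by det(S 1 ) 2 det(S 2 ) 2 , which cancels in the ratio.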
Using the above construction, one obtains the next theorem.
Theorem 10. (a) Theorem 8 holds; in particular, it gives the family of jointly Gaussian distributions P Y 1 ,Y 2 ,W induced by (Y 1 , Y 2 ) ∈ G(0, Q cvf ) and a Gaussian random variable W : Ω → R n with minimum dimension n, such that the (Y 1 , Y 2 )−marginal is preserved. (b) C(Y 1 , Y 2 ) is given by the mutual information of Corollary 1, (90), optimized over Q W ∈ Q W , where Q W is defined by the set of Equation (82).

Remark 6.
It is apparent that the proof of the formula for C(Y 1 , Y 2 ) in [15,16] is based on rate distortion functions; i.e., these works do not directly address Wyner's optimization problem (13), as in Theorem 9, which first shows that, among all continuous or discrete random variables, the optimal W is Gaussian. Moreover, no parameterization of the set of distributions P Y 1 ,Y 2 ,W achieving the conditional independence P Y 1 ,Y 2 |W = P Y 1 |W P Y 2 |W is given there; i.e., the optimization over the parameterized family of Gaussian measures of Theorem 8 is not given.
In the next theorem, the family of measures P ci ⊆ P CIG min , defined by (80)-(83), which leads to the realization of (Y 1 , Y 2 ) given in Corollary 1, is used to determine a single joint distribution P Y 1 ,Y 2 ,W * ∈ P ci ⊆ P CIG min which achieves C(Y 1 , Y 2 ). This leads to the realization of (Y 1 , Y 2 ) expressed in terms of W * and vectors of independent Gaussian random variables (Z 1 , Z 2 ), one for each realization, each having independent components. Theorem 11. Consider a tuple (Y 1 , Y 2 ) of Gaussian random variables in the canonical variable form of Definition 1. Restrict attention to the correlated parts of these random variables, as defined in Theorem 8 by (77)-(79). The following hold. (a) The information quantity C(Y 1 , Y 2 ) is given by C(Y 1 , Y 2 ) = (1/2) ∑ n j=1 ln((1 + d j )/(1 − d j )). (b) The realizations of the random variables (Y 1 , Y 2 , W * ) that achieve C(Y 1 , Y 2 ) are represented in terms of V : Ω → R n , V ∈ G(0, I), where the vector V has independent components. Then (Z 1 , Z 2 , W * ) are independent and (109) holds; hence, the variables (Y 1 , Y 2 , W * ) induce a distribution P Y 1 ,Y 2 ,W * ∈ P ci ⊆ P CIG min . Note that, in addition, each of the random variables Z 1 , Z 2 , and W * has independent components.
Proof. By Theorem 9, the random variables (Y 1 , Y 2 , W) are restricted to jointly Gaussian random variables. Since mutual information I(Y 1 , Y 2 ; W) is invariant with respect to nonsingular transformations S 1 , S 2 , i.e., I(Y 1 , Y 2 ; W) = I(S 1 Y 1 , S 2 Y 2 ; W), and (F Y 1 , F Y 2 |F W ) ∈ CIG is equivalent to (F S 1 Y 1 , F S 2 Y 2 |F W ) ∈ CIG, it suffices to consider the canonical variable form of Definition 1 and to construct a measure that carries a triple of jointly Gaussian random variables Y 1 , Y 2 , W. (a) (1) Take a probability measure P 1 such that there exists a triple of Gaussian random variables Y 1 , Y 2 , W : Ω → R n with P 1 | (y 1 ,y 2 ) = P 0 and (F Y 1 , F Y 2 |F W ) ∈ CIG. It will first be proven that attention can be restricted to those state random variables W whose dimension equals n = p 12 = p 22 . Suppose that there exists a state random variable W : Ω → R n 1 such that (F Y 1 , F Y 2 |F W ) ∈ CIG and n 1 > n. Hence, W does not make (Y 1 , Y 2 ) minimally conditionally independent. Construct a minimal vector which makes the tuple minimally conditionally independent according to the procedure of [22] (Proposition 3.5). Then, (F Y 1 , F Y 2 |F W 2 ) ∈ CIG min and the dimension of W 2 is n = p 12 = p 22 . Determine a linear transformation of W 2 by a matrix L 15 ∈ R n×n such that the stated relation holds. It is then possible to construct a matrix L 14 ∈ R (n 1 −n)×n 1 such that (W 3 , W 4 ) ∈ G(0, I n 1 ), rank [L 13 ; L 14 ] = n 1 , and, due to L 14 Q W L T 13 = 0, the variables W 3 , W 4 are independent random variables. See [23] (Theorem 4.9) for a theorem with which the existence of L 14 can be proven. Note further that F W = F W 3 ,W 4 .
By properties of mutual information, it now follows that, for the computation of C(Y 1 , Y 2 ), attention can be restricted to those state variables W which are of minimal dimension.
(2) Take a probability measure P 1 such that there exists a triple of Gaussian random variables as required. According to [22] (Theorem 4.2), there exist, in general, many such measures, which are parameterized by the matrices and the sets as stated in Theorem 8 (b) and defined by (80)-(83).
(3) Then, the mutual information of the triple of Gaussian random variables is calculated, using Theorem 8 (b), for any choice of Q W ∈ Q W , where Q W is given by (82). The calculations that follow are straightforward, and they verify the statements of Corollary 1.
(4) The computation of C(Y 1 , Y 2 ) requires the solution of an optimization problem.
Since the first term in (115), (1/2) ∑ n i=1 ln(1 − d 2 i ), does not depend on Q W , and the natural logarithm is a strictly increasing function, the optimization reduces to a supremization over Q W . Define L 1 (Q W ) as stated; note that the expression L 1 (Q W ) ∈ R n×n is, in general, a non-symmetric square matrix. It will be proven that (119) and (120) hold. From these two relations, it follows that Q * W = I ∈ R n×n is the unique solution of the supremization problem.
The inequality in (119) follows from Proposition A4. The equality in (120) is proven in two steps. If Q W = I, then the equality in (120) holds, as follows from direct substitution in (117). The converse is proven by contradiction. Suppose that Q W ≠ I. Then, it again follows from Proposition A4 that strict inequality holds in (119). Hence, the equality is proven.
(b) It follows from part (a) of the theorem that C(Y 1 , Y 2 ) is attained as the mutual information I(Y 1 , Y 2 ; W) for a random variable W with Q W = I. Consider now a triple of random variables (Y 1 , Y 2 , W) ∈ G(0, Q s (I)), as defined in (80)-(83); hence, Q W = I. Denote the random variable W from now on by W * , to indicate that it achieves the infimum in the definition of C(Y 1 , Y 2 ). Thus, Q W * = I and (Y 12 , Y 22 , W * ) ∈ G(0, Q s (I)). Let V : Ω → R n be a Gaussian random variable with V ∈ G(0, I) which is independent of (Y 1 , Y 2 , W). Define the new state variable W = L 1 Y 1 + L 2 Y 2 + L 3 V. Then, (Y 1 , Y 2 , V, W * ) are jointly Gaussian, and it has to be shown that Q W = I, Q Y 1 ,W = D 1/2 , and Q Y 2 ,W = D 1/2 . These equalities follow from simple calculations using the expressions of L 1 , L 2 , and L 3 , which are omitted. It then follows from those calculations and the definition of the Gaussian measure G(0, Q s (I)) that, almost surely, W = W * .
The signals are then represented by the formulas that follow, and it is proven that the triple of random variables (Z 1 , Z 2 , W * ) are independent.
Hence, the original signals are represented as shown by the formulas.
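The closed-form value C(Y 1 , Y 2 ) = (1/2) ∑ j ln((1 + d j )/(1 − d j )) can be checked against a realization of the form Y i = D 1/2 W * + (I − D) 1/2 Z i , with W * , Z 1 , Z 2 independent G(0, I); this explicit matrix form is our reading of the realization of Theorem 11 (b) and should be treated as an assumption:

```python
import numpy as np

d = np.array([0.9, 0.5, 0.2])      # canonical correlation coefficients
n = len(d)
I, D, Dh = np.eye(n), np.diag(d), np.diag(np.sqrt(d))

# covariance of (Y1, Y2, W*) under Y_i = D^{1/2} W* + (I - D)^{1/2} Z_i,
# with W*, Z1, Z2 independent G(0, I): Cov(Y_i) = I, Cov(Y1, Y2) = D,
# Cov(Y_i, W*) = D^{1/2}
Q = np.block([[I, D, Dh],
              [D, I, Dh],
              [Dh, Dh, I]])

h = lambda Qs: 0.5 * np.log(np.linalg.det(Qs))   # entropy up to a constant
y, w = list(range(2 * n)), list(range(2 * n, 3 * n))
mi = h(Q[np.ix_(y, y)]) + h(Q[np.ix_(w, w)]) - h(Q)   # I(Y1, Y2; W*)

C = 0.5 * np.sum(np.log((1 + d) / (1 - d)))      # closed form of Theorem 11
```

For each coordinate, the determinant ratio equals (1 − d j 2 )/(1 − d j ) 2 = (1 + d j )/(1 − d j ), so mi reproduces C.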

Wyner's Common Information of Arbitrary Gaussian Random Variables
First, the two special cases of (1) a tuple of independent Gaussian random variables and (2) a tuple of identical Gaussian random variables are analyzed. From those results and that of the previous subsection, one can then prove Wyner's common information for arbitrary Gaussian random variables.
The special case of the canonical variable form with only private parts is presented below.

Proposition 5.
Consider the case of a tuple of Gaussian vectors with only private parts. Hence, the Gaussian measure is (124). (a) The minimal σ-algebra F W which makes Y 13 , Y 23 conditionally independent is the trivial σ-algebra {Ω, ∅}. (c) The weak stochastic realization that achieves C(Y 13 , Y 23 ) = 0 is as stated. The special case of the canonical variable form with only identical parts is presented below.

Proposition 6.
Consider the case of a tuple of Gaussian vectors with only the identical part. Hence, the Gaussian measure is as stated. (a) The only minimal σ-algebra which makes Y 11 and Y 21 Gaussian conditionally independent is the one generated by the identical component. The weak stochastic realization is again simple: the variable W equals the identical component, and there is no need to use the signals Z 1 and Z 2 . Thus, the representations are as stated. Theorem 4 is proven next. Thus, the setting is that of a tuple of arbitrary Gaussian random variables, not necessarily restricted to the correlated parts of these random variables as in Theorem 8, (77)-(79). It is shown that C(Y 1 , Y 2 ) is computed by a decomposition and by use of the formulas previously obtained in Section 3.2.

Proof of Theorem 4. (a)
The latter equality follows from, respectively, Proposition 6, Theorem 11, and Proposition 5 (a and b). It will be shown that C(Y 1 , Y 2 ) is less than or equal to the right-hand side of Equation (129). From the latter inequality and the above inequality, the expression of Equation (129) then follows. To be specific, it will be proven that C(Y 1 , Y 2 ) is less than or equal to the expression I(Y 1 , Y 2 ; W * ), where W * is defined in statement (b) of the proposition. It then follows from the proof of Theorem 11 that (F Y 12 , F Y 22 |F W * 2 ) ∈ CIG min . The latter equality is proven as follows. In the first case, when p 13 > 0, p 23 > 0, and p 11 = p 12 = p 21 = p 22 = 0, Y 1 = Y 13 and Y 2 = Y 23 are independent random variables. It then follows from Proposition 5 that I(Y 1 , Y 2 ; 0) = I(Y 13 , Y 23 ; 0) = 0. In the second case, when p 12 = p 22 > 0, p 13 ≥ 0, p 23 ≥ 0, and p 11 = p 21 = 0, it follows from Proposition A1 and Theorem 11 that the stated expression holds. In the third case, when p 11 = p 21 > 0 and the other p ij indices are arbitrary, I(Y 1 , Y 2 ; W * ) = +∞. Hence, the inequality C(Y 1 , Y 2 ) ≤ right-hand side is proven, and hence equality holds.
(c) This directly follows from Proposition 4. See also Section 3.6 of [21].
A procedure for the numerical calculation of Wyner's common information is given in Section 3.7 of [21].
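The case analysis of the proof of Theorem 4 can be summarized in a small helper (a sketch under the assumption that the tuple is already in canonical variable form; the name p_identical stands for the dimension p 11 = p 21 of the identical part, and d collects the canonical correlation coefficients of the correlated part):

```python
import numpy as np

def wyner_common_information(p_identical, d):
    """C(Y1, Y2) for a tuple in canonical variable form: +inf when an
    identical part is present (p11 = p21 > 0), the sum over the
    correlated part otherwise; private parts contribute zero."""
    if p_identical > 0:
        return np.inf
    d = np.asarray(d, dtype=float)
    return 0.5 * float(np.sum(np.log((1.0 + d) / (1.0 - d))))

C = wyner_common_information(0, [0.9, 0.5])   # correlated parts only
```

When d is empty and no identical part is present, the function returns 0, matching the private-parts-only case of Proposition 5.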

Parametrization of Gray and Wyner Rate Region and Wyner's Lossy Common Information
This section is devoted to the characterizations of rates that lie in the Gray-Wyner rate region for a tuple of Gaussian random variables with square error distortion functions.
Use is made of Gray and Wyner [1] (Theorem 8), reproduced in Theorem 1, to characterize rate triples that lie in R GW (∆ 1 , ∆ 2 ), and to understand the structural properties of the realizations.

Theorem 12. Ref. [24] Consider a tuple of Gaussian random variables Y i : Ω → R p i , i = 1, 2. (a) The mutual information satisfies (130),
and the mean square error satisfies (131). Moreover, the inequalities in (130) and (131) hold with equality if there exists a jointly Gaussian realization of (Ŷ 1 ,Ŷ 2 ), or a Gaussian test channel distribution PŶ 1 ,Ŷ 2 |Y 1 ,Y 2 such that the joint distribution PŶ 1 ,Ŷ 2 ,Y 1 ,Y 2 is jointly Gaussian, and such that the identities (132) both hold. (b) A realization that achieves the lower bounds of part (a), i.e., satisfies (132), is the Gaussian realization of (Y 1 , Y 2 ,Ŷ 1 ,Ŷ 2 ) given below for Q (Y 1 ,Y 2 ) > 0, where (E 1 , E 2 ) is the error tuple that satisfies the structural property, with the stated variance matrix, and where the test channel distribution PŶ 1 ,Ŷ 2 |Y 1 ,Y 2 , or the joint distribution PŶ 1 ,Ŷ 2 ,Y 1 ,Y 2 , is induced by the realization of part (b).
The conditional rate distortion function, derived in [25,26], is also required.
Theorem 13. Ref. [25] (Theorem 1, Theorem 4), [26]. Consider a triple of random variables Y i : Ω → R p i , i = 1, 2, W : Ω → W, where W is continuous or finite-valued, with joint distribution P Y 1 ,Y 2 ,W , and marginal distributions P Y 1 ,Y 2 and P Y i , i = 1, 2, jointly Gaussian. Then, the following hold. (a) For an arbitrary random variable W : Ω → W, the mutual information I(Y i ;Ŷ i |W) satisfies (141), and the mean square error satisfies (142). Moreover, the inequalities in (141) and (142) hold with equality if there exists a realization of Ŷ i of the test channel distribution PŶ i |Y i ,W such that the joint distribution PŶ i ,Y i ,W satisfies the identity (143). (b) Suppose P Y 1 ,Y 2 ,W is a jointly Gaussian distribution and W : Ω → R n is Gaussian. A realization that achieves the lower bounds of part (a), i.e., satisfies (143), is the Gaussian realization of (Y i , W,Ŷ i ) given below, where † denotes the pseudoinverse of a matrix and E i is the error that satisfies the structural property, and where the test channel distribution PŶ i |Y i ,W , or the joint distribution PŶ i ,Y i ,W , is induced by the above realization.
The following is stated as a conjecture, because it is not shown in this paper; it can be shown using Theorem 13.

Conjecture 1.
Conditional RDF of Gaussian sources with an arbitrary conditioning RV. Consider a triple of random variables Y i : Ω → R p i , i = 1, 2, W : Ω → W, where W is continuous or finite-valued, with joint distribution P Y 1 ,Y 2 ,W , and marginal distributions P Y 1 ,Y 2 and P Y i , i = 1, 2, jointly Gaussian. Then, the following hold. (a) For an arbitrary random variable W : Ω → W, and X̂ cm i satisfying (143), the stated lower bounds hold. Moreover, the inequalities in (153) and (157) are achieved if (i) (141) holds, and (ii) the mutual information I(X i ; X̂ i |W = w) and ∆ i (w), for i = 1, 2, are independent of w ∈ W. (b) The rate distortion function R X i |W (∆ i ), for W : Ω → W continuous or finite-valued, achieves a minimum value if (i) W : Ω → R n , n ∈ Z + , is Gaussian and P Y i ,W is jointly Gaussian, and (ii) (X i , X̂ i , W) is given by the realization of Theorem 13 (b), for i = 1, 2.
The characterization of the marginal RDFs R Y i (∆ i ), i = 1, 2, which are well known and can be found in many books, is also needed; the weak realization of the test channel, which follows from Theorem 13 (see also [25]) as a degenerate case and is summarized in the next theorem, is important in this paper.

Theorem 14.
Ref. [25] (Theorem 1, Theorem 4) Consider a tuple of Gaussian random variables Y i : Ω → R p i ; the statements of Theorem 13 hold with W generating the trivial information, i.e., F W = {Ω, ∅}. That is, the marginal RDFs R Y i (∆ i ) are characterized by the stated expressions, where the test channel distribution PŶ i |Y i is induced by the realization, and where E i is the error that satisfies the structural property. Next, we express the characterization of the joint RDF of Theorem 12 using the canonical variable form and the canonical correlation coefficients. The special case when Q (E 1 ,E 2 ) is block-diagonal is given in [27].
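For reference, the marginal RDF of a Gaussian vector with covariance eigenvalues λ j is computed by reverse water-filling; the following sketch (bisection on the water level θ is an implementation choice of ours, not the paper's procedure) returns R Y i (∆ i ) in nats:

```python
import numpy as np

def gaussian_rdf(lmbda, delta):
    """Marginal RDF (nats) of a Gaussian vector whose covariance has
    eigenvalues `lmbda`, at total square-error distortion `delta`:
    reverse water-filling with level theta found by bisection."""
    lmbda = np.asarray(lmbda, dtype=float)
    lo, hi = 0.0, lmbda.max()
    for _ in range(200):
        theta = 0.5 * (lo + hi)
        if np.minimum(theta, lmbda).sum() < delta:
            lo = theta
        else:
            hi = theta
    dj = np.minimum(theta, lmbda)     # per-eigenvalue distortions
    return 0.5 * np.sum(np.log(lmbda / dj))

R = gaussian_rdf([2.0], 0.5)          # scalar case: 0.5 * ln(sigma^2/Delta)
```

The scalar case recovers the familiar formula R(∆) = (1/2) ln(σ 2 /∆).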

Corollary 2.
Consider the statement of Theorem 15 and, without loss of generality, assume the setting of Theorem 15 corresponding to p 11 = p 21 = 0. The joint RDF then satisfies the lower bound (183) (R Y 2 |Y 1 (∆ 2 ) is obtained from Theorem 13 by letting W = Y 1 ). Moreover, the inequalities (184) and (185) hold with equality on the strictly positive surface D C (Y 1 , Y 2 ). Proof. The lower bound (183) is due to Gray [28]. The equality in (184) follows by using the values of the rate distortion functions on the right-hand side of (183). Equality (185) follows from the singular value decomposition of the matrices given in Theorem 15. To establish the equalities, note that (177), with det(Q cvf ) = 1, equivalently p 12 = p 22 = 0, is precisely (185). Moreover, it can be easily verified that p 12 = p 22 = 0 for the distortion region D C (Y 1 , Y 2 ).

Wyner's Lossy Common Information of Correlated Gaussian Vectors
Derived in this section are the characterizations of C GW (Y 1 , Y 2 ; ∆ 1 , ∆ 2 ) via Theorem 2, for jointly Gaussian random variables with square-error distortion, as well as C W (Y 1 , Y 2 ) via Theorem 3.

Definition 7.
Wyner's lossy common information of a tuple of Gaussian multivariate random variables. Consider a tuple of jointly Gaussian random variables Y 1 : Ω → R p 1 , Y 2 : Ω → R p 2 , and square-error distortion functions between (y 1 , y 2 ) and its reproduction (ŷ 1 ,ŷ 2 ), given by the stated expressions, where || · || R p i denotes the Euclidean norm on R p i , i = 1, 2. (a) Wyner's common information (information definition) of the tuple of Gaussian random variables (Y 1 , Y 2 ) is defined by expression (13). Call any random variable W, as defined above, such that (Y 1 , Y 2 |W) ∈ CIG, an information state. If there exists a random variable W * : Ω → R n * , with n * ∈ Z + = {1, 2, . . .}, which attains the infimum, that is, if C(Y 1 , Y 2 ) = I(Y 1 , Y 2 ; W * ), then call that random variable a minimal information state of the tuple (Y 1 , Y 2 ).
(b) Wyner's common information (operational definition) is defined for a tuple of strictly positive real numbers γ = (γ 1 , γ 2 ), provided identity (15) holds, i.e., R Y 1 |W (∆ 1 ) + R Y 2 |W (∆ 2 ) + I(Y 1 , Y 2 ; W) = R Y 1 ,Y 2 (∆ 1 , ∆ 2 ). By the above definition, the problem of calculating Wyner's lossy common information via (18) is decomposed into the characterization of C(Y 1 , Y 2 ) such that identity (15) is satisfied. This follows from the fact that the only difference between C W (Y 1 , Y 2 ) and C(Y 1 , Y 2 ) is the specification of the region D W on which C GW (Y 1 , Y 2 ; ∆ 1 , ∆ 2 ) is considered. In the next theorem, we make use of the characterizations of the various rate distortion functions and of the test channel realizations to identify subsets of the rate region that lie on the Pangloss plane and are consistent with the characterization of Viswanatha, Akyol and Rose [12] (Theorem 1, Equations (19) and (20)).

Theorem 16.
Consider a tuple (Y 1 , Y 2 ) of Gaussian random variables in the canonical variable form of Definition 1. Restrict attention to the correlated parts of these random variables, as defined in Theorem 8 by (77)-(79). Furthermore, consider a realization of the random variables (Y 1 , Y 2 ) which induces the family of measures P ci ⊆ P CIG min , as defined in Corollary 1 by (84)-(88). Then, the following hold. (a) The joint rate distortion function R Y 1 ,Y 2 (∆ 1 , ∆ 2 ) of (Y 1 , Y 2 ) with square-error distortion satisfies the stated bound, where D C (Y 1 , Y 2 ) is a strictly positive surface, as defined below, and where Q (E 1 ,E 2 ) is the variance of the errors E i = Y i −Ŷ i , i = 1, 2, with parameters p 11 = p 21 = p 12 = p 22 = 0 and p 13 = p 23 = n. The conditional rate distortion functions R Y i |W (∆ i ) of Y i , conditioned on W, with square-error distortion, and the mutual information I(Y 1 , Y 2 ; W), satisfy the stated expressions, where trace(Q E i ), i = 1, 2, are defined as in (191).
(b) The representations of the reproductions (Ŷ 1 ,Ŷ 2 ) of (Y 1 , Y 2 ) at the outputs of decoder 1 and decoder 2, which achieve the joint rate distortion function (the reader may verify that the realization satisfies the conditions given in Viswanatha, Akyol and Rose [12] (Theorem 1, Equations (19) and (20))), are parameterized by Q W ∈ Q W , where Q W is defined by the set of Equation (82). Moreover, the joint distribution P Y 1 ,Y 2 ,Ŷ 1 ,Ŷ 2 ,W satisfies (210) (the reader may verify that conditions (210) are identical to Viswanatha, Akyol and Rose [12] (Theorem 1, Equations (19) and (20)) for rates that lie on the Pangloss plane). (c) Consider part (a) and the realization of part (b), for all i. Then, the conditional RDFs R Y i |W * (∆ i ) are given by the stated expressions, the optimal ∆ 1,j , ∆ 2,j are obtained from the water-filling equations, and the representations of part (b) hold, with Q E i , Q Y i |W * diagonal matrices.
Proof. (a) Since attention is restricted to the correlated parts of these random variables, as defined in Theorem 8 by (77)-(79), the statements on the joint RDF R Y 1 ,Y 2 (∆ 1 , ∆ 2 ) of part (a) are a special case of Theorem 12 (c), obtained from Corollary 2. Similarly, expressions (193)-(195) follow from (13). However, as demonstrated shortly, these also follow from the derivation of part (b). (b) Recall that the joint rate distortion function is achieved by a jointly Gaussian distribution P Y 1 ,Y 2 ,Ŷ 1 ,Ŷ 2 such that the average square-error distortions are satisfied. Consider the realization of the random variables (Y 1 , Y 2 ) which induces the family of measures P ci ⊆ P CIG min , as defined in Corollary 1 by (84)-(88). By properties of mutual information, the chain of inequalities up to (222) then holds, the last step by the maximum entropy property of the Gaussian distribution. The average distortion satisfies (225). Furthermore, if E[Y 2 |FŶ 2 ] =Ŷ 2 , then inequality (225) holds with equality.
It can be verified that the representations (196)-(209) satisfy these conditions and that all inequalities become equalities. The decomposition of the joint distribution according to (210) follows from the representations of (Ŷ 1 ,Ŷ 2 ), and similarly for (211). To determine the entire set that characterizes the Pangloss plane, we need to consider the rate distortion function (177) with (178), and the general realization (39). We do not pursue this further, because it requires the closed-form solution of R Y 1 ,Y 2 (∆ 1 , ∆ 2 ), which is currently an open and challenging problem, and beyond the scope of this paper. We should mention that the analysis of the scalar-valued Gaussian example in [12,13], i.e., when p 1 = p 2 = 1, made use of the closed-form expression of R Y 1 ,Y 2 (∆ 1 , ∆ 2 ) due to [13].
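Since Q Y i |W * = I − D is diagonal, the conditional RDFs R Y i |W * (∆ i ) of Theorem 16 (c) reduce to reverse water-filling over the conditional variances 1 − d j ; a sketch under that assumption, which also returns the per-component distortions ∆ i,j (the bisection is our implementation choice):

```python
import numpy as np

def conditional_rdf_given_w(d, delta):
    """R_{Y_i|W*}(delta) when Q_{Y_i|W*} = I - D is diagonal: reverse
    water-filling over the conditional variances 1 - d_j; also returns
    the optimal per-component distortions Delta_{i,j}."""
    var = 1.0 - np.asarray(d, dtype=float)
    lo, hi = 0.0, var.max()
    for _ in range(200):                  # bisection on the water level
        theta = 0.5 * (lo + hi)
        if np.minimum(theta, var).sum() < delta:
            lo = theta
        else:
            hi = theta
    dij = np.minimum(theta, var)
    return dij, 0.5 * np.sum(np.log(var / dij))

dij, R_cond = conditional_rdf_given_w([0.9, 0.5, 0.2], delta=0.3)
```

The returned dij sum to the total distortion, and components with small conditional variance receive the full water level.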
Proof of Theorem 6. One way to prove the statement is to compute the characterizations of the rate distortion functions, using the realization of the random variables (Y 1 , Y 2 ) which induces the family of measures P ci ⊆ P CIG min , as defined in Corollary 1 by (84)-(88). In view of Definition 7 (b), it suffices to verify that identity (15) holds, i.e., R Y 1 |W (∆ 1 ) + R Y 2 |W (∆ 2 ) + I(Y 1 , Y 2 ; W) = R Y 1 ,Y 2 (∆ 1 , ∆ 2 ) for (∆ 1 , ∆ 2 ) ∈ D W , for the choice W = W * ∈ G(0, I) which achieves the minimum in (188) (i.e., due to Theorem 11 (b)). Similarly to Theorem 16, it can be shown that the conditional RDFs R Y i |W * (∆ i ), i = 1, 2, are given by the stated expressions. The pay-off of the joint RDF R Y 1 ,Y 2 (∆ 1 , ∆ 2 ) in (190) is related to the pay-offs of the conditional RDFs. For (∆ 1 , ∆ 2 ) ∈ D W , defined by (47), the identity then follows from (232). This completes the proof.
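The identity R Y 1 |W * (∆ 1 ) + R Y 2 |W * (∆ 2 ) + I(Y 1 , Y 2 ; W * ) = R Y 1 ,Y 2 (∆ 1 , ∆ 2 ) can be spot-checked in the scalar case p 1 = p 2 = 1 with correlation coefficient ρ, using the low-distortion closed form R Y 1 ,Y 2 (∆ 1 , ∆ 2 ) = (1/2) ln((1 − ρ 2 )/(∆ 1 ∆ 2 )); this closed form is our reading of the scalar result of [13] and is used here as an assumption:

```python
import numpy as np

rho, D1, D2 = 0.6, 0.05, 0.1     # scalar sources; D1, D2 <= 1 - rho

R1_cond = 0.5 * np.log((1 - rho) / D1)    # R_{Y1|W*}: Var(Y1|W*) = 1 - rho
R2_cond = 0.5 * np.log((1 - rho) / D2)
C_wyner = 0.5 * np.log((1 + rho) / (1 - rho))      # I(Y1, Y2; W*)
R_joint = 0.5 * np.log((1 - rho**2) / (D1 * D2))   # assumed closed form

gap = R1_cond + R2_cond + C_wyner - R_joint        # should vanish
```

Algebraically, (1 − ρ) 2 (1 + ρ)/((1 − ρ) ∆ 1 ∆ 2 ) = (1 − ρ 2 )/(∆ 1 ∆ 2 ), so the three terms sum exactly to the joint RDF in this regime.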

Applications to Problems of the Literature [15-17]
The next two corollaries illustrate the application of the results developed in this paper to the optimization problems analyzed in [15-17].

Corollary 3.
Applications to problems in [15]. Consider the Gaussian secure source coding and Wyner's common information problem of [15], defined by the optimization problem of [15] (Equation (18), Section IV.B), stated in (235), where the tuple (Y 1 , Y 2 ) is zero-mean jointly Gaussian and W : Ω → W is a continuous or discrete-valued random variable. (The derivation of the formula for (235) in [15] makes use of rate distortion functions; see [15] (Equation (47)).) Then, the following hold. For any jointly distributed random variables (Y 1 , Y 2 , W) that minimize the expression in (235), there exists a jointly Gaussian triple (Y 1 , Y 2 , W), with W : Ω → R n a Gaussian random variable, which achieves the same minimum value. Moreover, the following characterization of (235) holds.
The arg min in (235) equals the arg min of the reduced problem (237), where Q W = D W = Diag(q 1 , q 2 , . . . , q n ) ∈ Q W , q 1 ≥ q 2 ≥ . . . ≥ q n > 0, and Q W is defined by (82). (238) Proof. By the use of Theorem 9 (c), it suffices to restrict attention to jointly Gaussian random variables (Y 1 , Y 2 , W). Transform the tuple (Y 1 , Y 2 ) into the canonical variable form of Definition 1. Restrict attention to the correlated parts of these random variables, as defined in Theorem 8 by (77)-(79), and consider the realization of the transformed random variables of Corollary 1. Then, the value of I(Y 1 ; W) + I(Y 2 ; W) is identical to the value of the same expression evaluated using the realization of Corollary 1. By simple evaluation using the realization of Corollary 1, one obtains (239), parameterized by Q W ∈ Q W , where Q W is defined by the set of Equation (82). By Hadamard's determinant inequality, an achievable lower bound on the first right-hand-side term of (239) holds if Q W ∈ Q W and (I − D 1/2 Q −1 W D 1/2 ) is a diagonal matrix, and this lower bound is achieved by a diagonal Q W ∈ Q W . Furthermore, by recalling the derivation of Theorem 11, an achievable lower bound on the second right-hand-side term of (239), i.e., on I(Y 1 , Y 2 ; W), holds when Q W ∈ Q W is diagonal. Hence, both lower bounds are achieved simultaneously by a diagonal Q W ∈ Q W . An achievable lower bound on (239) is then obtained if Q W is specified by (238).
The remaining optimization problem in (237) is easily carried out and is hence omitted. Corollary 4 illustrates the application of the results developed in this paper to the Gaussian relaxed Wyner's common information [16,17] (Definition 2 and Section III).

Corollary 4.
Applications to problems in [16,17]. Consider the Gaussian relaxed Wyner's common information of [16,17] (see Definition 2 and Section III of [17]), defined by the optimization problem (240), where the tuple (Y_1, Y_2) is zero mean jointly Gaussian, and W : Ω → W is a continuous or discrete-valued random variable (the value of (240) computed in [16,17], Theorem 4, is different from (241); moreover, the derivation in [16,17], Section III.A, is different from the derivation presented below). Then (241) holds. Proof. By Theorem 9 (c), it suffices to restrict attention to jointly Gaussian random variables (Y_1, Y_2, W). By Proposition 3 or Corollary 1, there exists a family of realizations of (Y_1, Y_2), parameterized by a Gaussian random variable W, which induces the conditional independence P_{Y_1,Y_2|W} = P_{Y_1|W} P_{Y_2|W}; hence the lower bound is achieved, i.e., the constraint in (240) is always satisfied, because the minimizer is such that I(Y_1; Y_2|W) = 0, i.e., the constraint is not active. Hence, the general solution of (240) is the one given in Theorem 4.
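For reference, the inequality-constrained problem of [16,17] (Definition 2 of [17]) takes, in the notation of this paper, the following form (a paraphrase, with γ the relaxation parameter):

```latex
C_\gamma(Y_1, Y_2) \;=\; \min_{P_{W|Y_1,Y_2}\,:\; I(Y_1;Y_2|W) \,\le\, \gamma} I(Y_1, Y_2; W), \qquad \gamma \in [0, \infty),
```

which reduces to Wyner's common information C(Y_1, Y_2) at γ = 0; Corollary 4 shows that, for jointly Gaussian (Y_1, Y_2), the minimizer already achieves I(Y_1; Y_2|W) = 0, so the inequality constraint is inactive for every γ > 0.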

Remark 9.
Corollary 4 implies that the definition of the relaxed Gaussian Wyner's common information considered in [16,17] (see Definition 2 and Section III of [17]) should be replaced by min_{P_{W|Y_1,Y_2} : I(Y_1;Y_2|W) = γ} I(Y_1, Y_2; W), i.e., the inequality constraint replaced by an equality, so that the constraint is active for all γ ∈ (0, ∞).
Theorem 17 gives the parameterization of a subset of the Pangloss Plane, as a degenerate case of Theorem 5.

Theorem 17.
Consider the statement of Theorem 5. (a) Rate triples (R_0, R_1, R_2) that lie on the Pangloss Plane are determined by the subset of the rate region R_GW(∆_1, ∆_2) of Theorem 5 (b), such that the joint distribution P_{W,Y_1,Y_2,Ŷ_1,Ŷ_2} satisfies the conditions,
From Theorem 16 and Theorem 17 (b), there follows a simpler parameterization of the rates that lie on the Pangloss Plane of the Gray-Wyner rate region R_GW(∆_1, ∆_2), when (Y_1, Y_2) is in canonical variable form.
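For orientation, recall the standard definition of the Pangloss Plane from Gray and Wyner [1]: it is the face of the rate region on which the sum rate incurs no penalty relative to joint encoding, i.e.,

```latex
R_0 + R_1 + R_2 \;=\; R_{Y_1,Y_2}(\Delta_1, \Delta_2),
```

where R_{Y_1,Y_2}(∆_1, ∆_2) denotes the joint rate distortion function of the tuple (Y_1, Y_2) subject to the two square-error distortion constraints.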

Conclusions
This paper formulates the classical Gray and Wyner source coding for a simple network with a tuple of multivariate, correlated Gaussian random variables, with square-error fidelity at the two decoders, using the geometric approach to Gaussian random variables and the weak stochastic realization of correlated Gaussian random variables. This approach leads to a parameterization of the Gray-Wyner rate region with respect to the variance matrix of the jointly Gaussian triple (Y_1, Y_2, W), where W is a Gaussian auxiliary random variable. However, much remains to be done for this problem from the computational point of view, and in exploiting the new approach in other multi-user problems of information theory.
Author Contributions: C.D.C. and J.H.v.S. contributed to the conceptualization, methodology, and writing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding:
The work of C.D. Charalambous was co-funded by the European Regional Development Fund and the Republic of Cyprus through the Research and Innovation Foundation (Project: EXCELLENCE/1216/0296).

Data Availability Statement:
Numerical evaluations of Wyner's common information, based on the implementation of the canonical variable form, and the calculation of the canonical correlation coefficients, are found in [21], Section 3.7.
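As a complement to the pointer above, the computation mentioned can be sketched in a few lines: given the joint covariance blocks Q_11, Q_12, Q_22 of (Y_1, Y_2), the canonical correlation coefficients d_j are the singular values of Q_11^{-1/2} Q_12 Q_22^{-1/2}, and Wyner's common information follows from the formula C(Y_1, Y_2) = (1/2) Σ_j ln((1 + d_j)/(1 − d_j)) stated earlier. A minimal sketch (not the implementation of [21]; function names are illustrative):

```python
import numpy as np

def canonical_correlations(Q11, Q12, Q22):
    """Singular values of Q11^{-1/2} Q12 Q22^{-1/2}, via Cholesky whitening."""
    L1 = np.linalg.cholesky(Q11)
    L2 = np.linalg.cholesky(Q22)
    # L1^{-1} Q12 L2^{-T} has the same singular values as Q11^{-1/2} Q12 Q22^{-1/2}
    M = np.linalg.solve(L1, Q12) @ np.linalg.inv(L2).T
    return np.linalg.svd(M, compute_uv=False)

def wyner_common_information(Q11, Q12, Q22):
    """C(Y1, Y2) = (1/2) * sum_j ln((1 + d_j) / (1 - d_j)), d_j in (0, 1)."""
    d = canonical_correlations(Q11, Q12, Q22)
    d = d[(d > 1e-12) & (d < 1 - 1e-12)]  # keep coefficients strictly inside (0, 1)
    return 0.5 * np.sum(np.log((1.0 + d) / (1.0 - d)))

# Scalar sanity check: unit variances and correlation rho give d_1 = rho,
# so C = 0.5 * ln((1 + rho) / (1 - rho))
rho = 0.5
C = wyner_common_information(np.eye(1), rho * np.eye(1), np.eye(1))
```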
First, it is proven that the assumptions of the lemma are satisfied. Note that 0 < Q_X implies that rank(Q_X) = n. This and the fact that rank(D) = n imply that rank(B) = rank(Q_X^{1/2} D^{1/2}) = n. Further note that