Article

A Realization Approach to Lossy Network Compression of a Tuple of Correlated Multivariate Gaussian RVs †

by Charalambos D. Charalambous 1,* and Jan H. van Schuppen 2
1 Department of Electrical and Computer Engineering, University of Cyprus, P.O. Box 20537, CY-1678 Nicosia, Cyprus
2 Van Schuppen Control Research, Gouden Leeuw 143, 1103 KB Amsterdam, The Netherlands
* Author to whom correspondence should be addressed.
† Preliminary results were presented in part at the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020.
Entropy 2022, 24(9), 1227; https://doi.org/10.3390/e24091227
Submission received: 21 June 2022 / Revised: 21 August 2022 / Accepted: 25 August 2022 / Published: 1 September 2022
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract:
Examined in this paper is the Gray and Wyner source coding for a simple network of correlated multivariate Gaussian random variables, $Y_1: \Omega \to \mathbb{R}^{p_1}$ and $Y_2: \Omega \to \mathbb{R}^{p_2}$. The network consists of an encoder that produces two private rates $R_1$ and $R_2$, and a common rate $R_0$, and two decoders, where decoder 1 receives rates $(R_1, R_0)$ and reproduces $Y_1$ by $\hat{Y}_1$, and decoder 2 receives rates $(R_2, R_0)$ and reproduces $Y_2$ by $\hat{Y}_2$, with mean-square error distortions $E||Y_i - \hat{Y}_i||_{\mathbb{R}^{p_i}}^2 \le \Delta_i \in [0, \infty]$, $i = 1, 2$. Use is made of the weak stochastic realization and the geometric approach of such random variables to derive test channel distributions, which characterize the rates that lie on the Gray and Wyner rate region. Specific new results include: (1) A proof that, among all continuous or finite-valued random variables $W: \Omega \to \mathbb{W}$, Wyner's common information, $C(Y_1, Y_2) = \inf_{P_{Y_1, Y_2, W}:\, P_{Y_1, Y_2 | W} = P_{Y_1 | W} P_{Y_2 | W}} I(Y_1, Y_2; W)$, is achieved by a Gaussian random variable $W: \Omega \to \mathbb{R}^n$ of minimum dimension $n$, which makes the two components of the tuple $(Y_1, Y_2)$ conditionally independent according to the weak stochastic realization of $(Y_1, Y_2)$, and the formula $C(Y_1, Y_2) = \frac{1}{2} \sum_{j=1}^{n} \ln \frac{1 + d_j}{1 - d_j}$, where $d_i \in (0, 1)$, $i = 1, \ldots, n$, are the canonical correlation coefficients of the correlated parts of $Y_1$ and $Y_2$, and a realization of $(Y_1, Y_2, W)$ which achieves this. (2) The parameterization of rates that lie on the Gray and Wyner rate region, and several of its subsets. The discussion is largely self-contained and proceeds from first principles, while connections to prior literature are discussed.

1. Introduction

In their seminal paper, Source Coding for a Simple Network [1], Gray and Wyner characterized the lossless rate region for a tuple of finite-valued random variables, and the lossy rate region for a tuple of arbitrarily distributed random variables. Many extensions and generalizations followed Gray and Wyner's fundamental work. Wyner [2] introduced an operational definition of the common information between a tuple of sources that generate symbols with values in finite spaces. Wyner's operational definition of common information is defined as the minimum achievable common message rate on the Gray and Wyner lossless rate region. Witsenhausen [3] investigated bounds for Wyner's common information, and sequences of pairs of random variables in this regard [4]. Gács and Körner [5] introduced another definition of common randomness between a tuple of jointly independent and identically distributed random variables. Benammar and Zaidi [6,7] characterized the Gray–Wyner rate region when there is side information at the decoders, under various scenarios that include the case where both receivers reproduce the source symbols without distortion. Insightful application examples for binary sources are considered in [7] (Section 4.2). In their earlier work, Benammar and Zaidi [8,9] characterized the rate distortion function of the Heegard and Berger [10] problem, with two sources and side information at the two decoders (under a degraded set-up). Connections between the Gray and Wyner lossy source coding network and the notions of empirical and strong coordination capacity for arbitrary networks were developed by Cuff, Permuter and Cover [11] and the references therein, where the authors elaborated on the usefulness of the common information between the different network nodes.
Viswanatha, Akyol and Rose [12], and Xu, Liu and Chen [13], explored the connection of Wyner’s common information and the Gray and Wyner lossy rate region, to generalize Wyner’s common information to its lossy counterpart, for random variables taking values in arbitrary spaces. They characterized Wyner’s lossy common information as the minimum common message rate on the Gray and Wyner lossy rate region, when the sum rate is arbitrarily close to the rate distortion function with joint decoding for the Gray and Wyner lossy network. Applications to encryption and secret key generation are discussed by Viswanatha, Akyol and Rose in [12] (and references therein).
The current paper is focused on the calculation of rates that lie in the Gray and Wyner rate region [1], for two sources that generate symbols according to the model of jointly independent and identically distributed multivariate correlated Gaussian random variables $Y_1: \Omega \to \mathbb{R}^{p_1}$, $Y_2: \Omega \to \mathbb{R}^{p_2}$, and square-error fidelity at the two decoders. The current literature on methods and algorithms to compute such rates is subject to a number of limitations which often prevent their practical usefulness:
(1) Rates that lie in the Gray and Wyner rate region are only known for the special case of a tuple of scalar-valued Gaussian random variables with square-error distortion, i.e., $p_1 = p_2 = 1$ [1,12,13].
(2) Wyner's lossy common information is only computed in closed form for the special case of a tuple of scalar-valued Gaussian random variables [12,13].
(3) Important generalizations to a tuple of sources that generate multivariate Gaussian symbols require new derivations, often of considerable difficulty.
(4) Realizations of the optimal test channel distributions of the various rate distortion functions (RDFs) involved in the Gray and Wyner characterization of the rate region, and their structural properties, are not developed.
(5) A proof that the Gray and Wyner rate region for jointly Gaussian sources is characterized by a Gaussian auxiliary random variable W is still missing from past literature.
It is known from [1] that the Gray and Wyner rate region can be parameterized by an auxiliary random variable $W: \Omega \to \mathbb{W}$, via several rate distortion functions. Moreover, subsets of the Gray and Wyner rate region are parameterized by $W$ which satisfies the conditional independence (1):
$P_{Y_1, Y_2 | W} = P_{Y_1 | W}\, P_{Y_2 | W}. \qquad (1)$
The current paper makes use of the canonical variable form and the weak stochastic realization of the tuple of random variables ( Y 1 , Y 2 ) , introduced in Section 2 to characterize subsets of the Gray and Wyner rate region, which are parameterized by jointly Gaussian random variables ( Y 1 , Y 2 , W ) with W : Ω R n , where n is a finite number, while in some cases, the minimum dimension of W is clarified. The weak stochastic realization is developed to deal with the fundamental issue that, for the Gray and Wyner network, one is given the joint distribution P Y 1 , Y 2 , while the characterization of the RDFs involves the specification of the test channel distributions, that achieve these RDFs, and the actual construction of realizations of all random variables involved, that induce the test channel distributions. Furthermore, Wyner’s common information between Y 1 and Y 2 involves the construction of a joint distribution P Y 1 , Y 2 , W where W is the auxiliary random variable that makes Y 1 and Y 2 conditionally independent, i.e., (1) holds.
The rest of the section serves mainly to review the Gray and Wyner characterization of the lossy rate region and the characterization of Wyner’s lossy common information.

1.1. Literature Review

(a) The Gray and Wyner source coding for a simple network [1].
Consider the Gray and Wyner source coding for a simple network, as shown in Figure 1, for a tuple of jointly independent and identically distributed multivariate Gaussian random variables $(Y_1^N, Y_2^N) = \{(Y_{1,i}, Y_{2,i}): i = 1, 2, \ldots, N\}$,
$Y_{1,i}: \Omega \to \mathbb{R}^{p_1} = \mathbb{Y}_1, \quad Y_{2,i}: \Omega \to \mathbb{R}^{p_2} = \mathbb{Y}_2, \quad i = 1, \ldots, N, \qquad (2)$
with square error distortion functions at the two decoders,
$D_{Y_1}(y_1^N, \hat{y}_1^N) = \frac{1}{N} \sum_{i=1}^{N} ||y_{1,i} - \hat{y}_{1,i}||_{\mathbb{R}^{p_1}}^2, \quad D_{Y_2}(y_2^N, \hat{y}_2^N) = \frac{1}{N} \sum_{i=1}^{N} ||y_{2,i} - \hat{y}_{2,i}||_{\mathbb{R}^{p_2}}^2 \qquad (3)$
where $||\cdot||_{\mathbb{R}^{p_i}}^2$ are Euclidean distances on $\mathbb{R}^{p_i}$, $i = 1, 2$.
The encoder takes as its input the data sequences ( Y 1 N , Y 2 N ) and produces at its output three messages, ( S 0 , S 1 , S 2 ) , with binary bit representations ( N R 0 , N R 1 , N R 2 ) , respectively. There are three channels, channel 0, channel 1, channel 2, with capacities ( C 0 , C 1 , C 2 ) (in bits per second), respectively, to transmit the messages to two decoders. Channel 0 is a common channel and channel 1 and channel 2 are the private channels which connect the encoder to each of the two decoders. Message S 0 is a common or public message that is transmitted through the common channel 0 with capacity C 0 to decoder 1 and decoder 2; S 1 is a private message which is transmitted through the private channel 1 with capacity C 1 to decoder 1; and S 2 is a private message, which is transmitted through the private channel 2 with capacity C 2 to decoder 2.
Decoder 1 aims to reproduce $Y_1^N$ by $\hat{Y}_1^N$ subject to an average distortion and decoder 2 aims to reproduce $Y_2^N$ by $\hat{Y}_2^N$, subject to an average distortion, where $(\hat{Y}_{1,i}, \hat{Y}_{2,i}) = (\hat{y}_{1,i}, \hat{y}_{2,i}) \in \hat{\mathbb{Y}}_1 \times \hat{\mathbb{Y}}_2 \subseteq \mathbb{Y}_1 \times \mathbb{Y}_2$, $i = 1, \ldots, N$, that is,
$E\big\{D_{Y_1}(Y_1^N, \hat{Y}_1^N)\big\} \le \Delta_1, \quad E\big\{D_{Y_2}(Y_2^N, \hat{Y}_2^N)\big\} \le \Delta_2, \quad (\Delta_1, \Delta_2) \in [0, \infty] \times [0, \infty].$
Gray and Wyner characterized the rate region, denoted by $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$, via a coding scheme that uses the auxiliary random variable $W$, and the family of probability distributions
$\mathcal{P} \triangleq \big\{ P_{Y_1, Y_2, W}(y_1, y_2, w),\ y_1 \in \mathbb{Y}_1,\ y_2 \in \mathbb{Y}_2,\ w \in \mathbb{W} \ \big|\ P_{Y_1, Y_2, W}(y_1, y_2, \mathbb{W}) = P_{Y_1, Y_2}(y_1, y_2) \big\}$
such that the joint probability distribution $P_{Y_1, Y_2, W}(y_1, y_2, w)$ on $\mathbb{Y}_1 \times \mathbb{Y}_2 \times \mathbb{W}$ has a $(Y_1, Y_2)$-marginal probability distribution $P_{Y_1, Y_2}(y_1, y_2)$ on $\mathbb{Y}_1 \times \mathbb{Y}_2$ that coincides with the probability distribution of $(Y_1, Y_2)$.
For the source and distortion functions specified in (2) and (3), we apply the weak stochastic realization to construct the family of distributions $\mathcal{P}$, which is parameterized by the auxiliary random variable $W$. The characterization of $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$ in terms of an auxiliary random variable is as follows.
Theorem 1
(Theorem 8 in [1]). Let $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$ denote the Gray and Wyner rate region of the simple network shown in Figure 1.
Suppose there exists $\hat{y}_i \in \hat{\mathbb{Y}}_i$ such that $E\{d_{Y_i}(Y_i, \hat{y}_i)\} < \infty$, for $i = 1, 2$.
For each $P_{Y_1, Y_2, W} \in \mathcal{P}$ and $\Delta_1 \ge 0$, $\Delta_2 \ge 0$, define the subset of Euclidean 3-dimensional space
$\mathcal{R}_{GW}^{P_{Y_1, Y_2, W}}(\Delta_1, \Delta_2) = \big\{ (R_0, R_1, R_2) \ \big|\ R_0 \ge I(Y_1, Y_2; W),\ R_1 \ge R_{Y_1 | W}(\Delta_1),\ R_2 \ge R_{Y_2 | W}(\Delta_2) \big\}$
where $R_{Y_i | W}(\Delta_i)$ is the conditional rate distortion function of $Y_i^N$, conditioned on $W^N$, at decoder $i$, for $i = 1, 2$, and $R_{Y_1, Y_2}(\Delta_1, \Delta_2)$ is the joint rate distortion function of the joint decoding of $(Y_1^N, Y_2^N)$ (all single letters). Let
$\mathcal{R}_{GW}^{*}(\Delta_1, \Delta_2) = \Big( \bigcup_{P_{Y_1, Y_2, W} \in \mathcal{P}} \mathcal{R}_{GW}^{P_{Y_1, Y_2, W}}(\Delta_1, \Delta_2) \Big)^{c}$
where $\{\cdot\}^{c}$ denotes the closure of the indicated set. The achievable Gray–Wyner lossy rate region is given by
$\mathcal{R}_{GW}(\Delta_1, \Delta_2) = \mathcal{R}_{GW}^{*}(\Delta_1, \Delta_2).$
Gray and Wyner [1] (Theorem 6) also showed that, if $(R_0, R_1, R_2) \in \mathcal{R}_{GW}(\Delta_1, \Delta_2)$, then
$R_0 + R_1 + R_2 \ge R_{Y_1, Y_2}(\Delta_1, \Delta_2), \qquad (9)$
$R_0 + R_1 \ge R_{Y_1}(\Delta_1), \qquad (10)$
$R_0 + R_2 \ge R_{Y_2}(\Delta_2) \qquad (11)$
where $R_{Y_i}(\Delta_i)$ is the rate distortion function of $Y_i^N$ at decoder $i$, for $i = 1, 2$, and $R_{Y_1, Y_2}(\Delta_1, \Delta_2)$ is the joint rate distortion function of $(Y_1^N, Y_2^N)$ at the two decoders. The inequality in (9) is called the Pangloss Bound of the Gray–Wyner lossy rate region $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$. The set of triples $(R_0, R_1, R_2) \in \mathcal{R}_{GW}(\Delta_1, \Delta_2)$ that satisfy the equality $R_0 + R_1 + R_2 = R_{Y_1, Y_2}(\Delta_1, \Delta_2)$ is called the Pangloss Plane of the Gray–Wyner lossy rate region $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$.
Gray and Wyner [1] ((4) of page 1703, Equation (42)) also proved that $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$ is determined from
$T(\alpha_1, \alpha_2) \triangleq \inf_{P_{Y_1, Y_2, W} \in \mathcal{P}} \Big\{ I(Y_1, Y_2; W) + \alpha_1 R_{Y_1 | W}(\Delta_1) + \alpha_2 R_{Y_2 | W}(\Delta_2) \Big\}$
where $0 \le \alpha_i \le 1$, $i = 1, 2$, $\alpha_1 + \alpha_2 \ge 1$, and where, for each $P_{Y_1, Y_2, W} \in \mathcal{P}$, the conditional distribution $P_{Y_1, Y_2 | W}$ is defined, from which follow the $Y_i$ marginals $P_{Y_i | W}$, $i = 1, 2$.
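To make the parameterization above concrete, the following numerical sketch (an illustration added here, not taken from [1]) evaluates a rate triple $(R_0, R_1, R_2)$ of the form appearing in Theorem 1 for the scalar case $p_1 = p_2 = 1$ with unit variances and correlation $\rho \in (0, 1)$, using the classical choice $Y_i = \sqrt{\rho}\, W + Z_i$ with $W, Z_1, Z_2$ independent Gaussian; it assumes the standard Gaussian (conditional) rate distortion formulas with square-error distortion, and the function name is ours.

```python
import numpy as np

def scalar_gray_wyner_triple(rho: float, delta1: float, delta2: float):
    """Rate triple (R0, R1, R2) in nats for unit-variance scalar Gaussian
    (Y1, Y2) with correlation rho, using W such that Y_i = sqrt(rho)*W + Z_i."""
    assert 0.0 < rho < 1.0
    # Common rate: I(Y1, Y2; W) = 0.5 * ln((1 + rho) / (1 - rho)).
    R0 = 0.5 * np.log((1.0 + rho) / (1.0 - rho))
    # For this choice of W, the conditional variance of Y_i given W is 1 - rho.
    var_cond = 1.0 - rho
    # Private rates: conditional Gaussian RDFs with square-error distortion.
    R1 = max(0.5 * np.log(var_cond / delta1), 0.0)
    R2 = max(0.5 * np.log(var_cond / delta2), 0.0)
    return R0, R1, R2

if __name__ == "__main__":
    print(scalar_gray_wyner_triple(rho=0.8, delta1=0.1, delta2=0.1))
```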
(b) Wyner’s common Information of finite-valued random variables.
Wyner [2] introduced an operational definition of the common information between a tuple of random variables ( Y 1 N , Y 2 N ) that takes values in finite spaces.
The first approach of Wyner’s operational definition of common information between sequences Y 1 N and Y 2 N is defined as the minimum achievable common message rate R 0 on the Gray–Wyner network of Figure 1.
Wyner’s single letter information theoretic characterization of the infimum of all achievable message rates R 0 , called Wyner’s common information, is defined by
$C(Y_1, Y_2) \triangleq \inf_{P_{Y_1, Y_2, W}:\ P_{Y_1, Y_2 | W} = P_{Y_1 | W} P_{Y_2 | W}} I(Y_1, Y_2; W). \qquad (13)$
Here, $P_{Y_1, Y_2, W}$ is any joint probability distribution on $\mathbb{Y}_1 \times \mathbb{Y}_2 \times \mathbb{W}$ with $(Y_1, Y_2)$-marginal $P_{Y_1, Y_2}$, such that $W$ makes $Y_1$ and $Y_2$ conditionally independent, that is, $P_{Y_1, Y_2, W} \in \mathcal{P}$.
(c) Minimum common message rate and Wyner’s lossy common information for arbitrary random variables.
Viswanatha, Akyol and Rose [12], and Xu, Liu and Chen [13] explored the connection of Wyner’s common information and the Gray–Wyner lossy rate region, to provide a new interpretation of Wyner’s common information to its lossy counterpart.
The following characterization was derived by Xu, Liu and Chen [13] (an equivalent characterization was also derived by Viswanatha, Akyol and Rose [12]).
Theorem 2
(Theorem 4 in [13]). Suppose there exists $\hat{y}_i \in \hat{\mathbb{Y}}_i$ such that $E\{d_{Y_i}(Y_i, \hat{y}_i)\} < \infty$, for $i = 1, 2$.
Let $C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2)$ denote the minimum common message rate $R_0$ on the Gray–Wyner lossy rate region $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$, with a sum rate not exceeding the joint rate distortion function, $\sum_{i=0}^{2} R_i \le R_{Y_1, Y_2}(\Delta_1, \Delta_2)$, while satisfying the average distortions.
Then, $C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2)$ is characterized by the optimization problem
$C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2) = \inf I(Y_1, Y_2; W)$
such that the following identity holds:
$R_{Y_1 | W}(\Delta_1) + R_{Y_2 | W}(\Delta_2) + I(Y_1, Y_2; W) = R_{Y_1, Y_2}(\Delta_1, \Delta_2)$
where the infimum is over all random variables W taking values in W , which parameterize the source distribution via P Y 1 , Y 2 , W , having a Y 1 × Y 2 marginal source distribution P Y 1 , Y 2 , and induce joint distributions P W , Y 1 , Y 2 , Y ^ 1 , Y ^ 2 which satisfy the constraints.
It is shown in [12,13] that there exists a distortion region such that $C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2) = C_W(Y_1, Y_2)$, i.e., it is independent of the distortions $(\Delta_1, \Delta_2)$, and $C_W(Y_1, Y_2) = C(Y_1, Y_2)$, i.e., it is equal to Wyner's information theoretic characterization of common information between $Y_1$ and $Y_2$, defined by (13). However, their proofs that $W$ is a finite-dimensional Gaussian random variable rely on the assumption that $W$ is continuous-valued.
The next theorem is derived by Xu, Liu and Chen [13].
Theorem 3
(Theorem 5 in [13]). Let ( Y 1 , Y 2 ) be a pair of random variables with distribution P Y 1 , Y 2 on the alphabet space Y 1 × Y 2 , where Y 1 and Y 2 are arbitrary measurable spaces that can be discrete or continuous.
Let W be any random variable achieving C ( Y 1 , Y 2 ) defined by (13).
Let the reproduction alphabets $\hat{\mathbb{Y}}_1 = \mathbb{Y}_1$, $\hat{\mathbb{Y}}_2 = \mathbb{Y}_2$ and two per-letter distortion measures $d_{Y_1}(y_1, \hat{y}_1)$, $d_{Y_2}(y_2, \hat{y}_2)$ satisfy
$d_{Y_i}(y_i, \hat{y}_i) > d_{Y_i}(y_i, y_i) = 0, \quad \forall y_i \ne \hat{y}_i, \quad i = 1, 2.$
If the following conditions are satisfied:
(1) For any $y_1 \in \mathbb{Y}_1$, $y_2 \in \mathbb{Y}_2$ and $w \in \mathbb{W}$, $P_{W | Y_1, Y_2} > 0$;
(2) There exists a $\hat{y}_i \in \hat{\mathbb{Y}}_i$ such that
$E\big\{ d_{Y_i}(Y_i, \hat{y}_i) \big\} < \infty, \quad i = 1, 2,$
then there exists a strictly positive vector $\gamma = (\gamma_1, \gamma_2) \in (0, \infty) \times (0, \infty)$ such that, for $0 \le (\Delta_1, \Delta_2) \le \gamma$,
$C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2) = C_W(Y_1, Y_2) = C(Y_1, Y_2).$
Moreover, $C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2)$ is constant on $\mathcal{D}_W \triangleq \big\{ (\Delta_1, \Delta_2) \in [0, \infty] \times [0, \infty] : 0 \le (\Delta_1, \Delta_2) \le \gamma \big\}$.
The analog of the above theorem is also derived by Viswanatha, Akyol and Rose in [12] (Lemma 1). A subset of the Pangloss plane is derived by Gray and Wyner [1] (Theorem 9).
For bivariate Gaussian random variables, i.e., $p_1 = p_2 = 1$, with square-error distortions, Viswanatha, Akyol and Rose [12], and Xu, Liu and Chen [13], computed $C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2)$ by using Xiao and Luo's closed-form expression of the joint rate distortion function $R_{Y_1, Y_2}(\Delta_1, \Delta_2)$ [14] (Theorem 6). In addition, for bivariate Gaussian random variables with symmetric square-error distortions, i.e., $\Delta_1 = \Delta_2 = \Delta$, Gray and Wyner [1] (Section 2.5, (B)) computed a rate-triple $(R_0, R_1, R_2) \in \mathcal{R}_{GW}(\Delta_1, \Delta_2)$ that lies on the Pangloss plane.

1.2. Main Theorems and Discussion

What follows is a brief summary of the main theorems derived in this paper, and relations to the literature.
Theorem 9 shows that, among all joint distributions P Y 1 , Y 2 , W induced by a tuple of multivariate correlated Gaussian random variables ( Y 1 , Y 2 ) , and an arbitrary random variable W : Ω W , continuous or discrete-valued, Wyner’s common information C ( Y 1 , Y 2 ) , defined by (13), is minimized by a triple ( Y 1 , Y 2 , W ) which induces a jointly Gaussian distribution P Y 1 , Y 2 , W , and W : Ω W = R n is a finite-dimensional Gaussian random variable. In particular, Theorem 9 gives the weak stochastic realization of ( Y 1 , Y 2 ) , and the construction of the random variable W, which induce a joint distribution P Y 1 , Y 2 , W that achieves the minimum of I ( Y 1 , Y 2 ; W ) such that W makes Y 1 and Y 2 conditionally independent.
Then, use is made of Theorem 9, Section 2.2, such as Definition 1 of the canonical variable form and the weak stochastic realization to derive Wyner’s common information C ( Y 1 , Y 2 ) defined by (13), and the optimal realization of the triple ( Y 1 , Y 2 , W ) = ( Y 1 , Y 2 , W ) that achieves C ( Y 1 , Y 2 ) , as stated in the next theorem.
Theorem 4.
Consider a tuple of Gaussian random variables $Y_i: \Omega \to \mathbb{R}^{p_i}$, with $Q_{Y_i} > 0$, for $i = 1, 2$, $(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$, $Q_{(Y_1, Y_2)} \ge 0$, and apply Algorithm A1 (and the notation therein) to decompose and transform the random variables into a canonical variable form (with abuse of notation, the transformed random variables are again denoted by $(Y_1, Y_2) \in G(0, Q_{\mathrm{cvf}})$), using the material and notation of Section 2.2, i.e., Definition 1.
(a) Then,
C ( Y 1 , Y 2 ) = C ( Y 11 , Y 21 ) + C ( Y 12 , Y 22 ) + C ( Y 13 , Y 23 ) = 0 , i f p 13 > 0 , p 23 > 0 , p 11 = p 12 = p 21 = p 22 = 0 , 1 2 i = 1 n ln 1 + d i 1 d i , i f p 12 = p 22 > 0 , p 11 = p 21 = 0 , p 13 0 , p 23 0 , + , i f p 11 = p 21 > 0
where ( p 11 , p 12 , p 13 ) and ( p 21 , p 22 , p 23 ) are the dimensions of the canonical variable decomposition of the tuple ( Y 1 , Y 2 ) , and
C ( Y 11 , Y 21 ) = + , i f p 11 = p 21 > 0 ;
C ( Y 13 , Y 23 ) = 0 , i f p 13 > 0 a n d p 23 > 0 ;
C ( Y 12 , Y 22 ) = 1 2 i = 1 n ln 1 + d i 1 d i , i f n = p 12 = p 22 > 0 .
Thus, C ( Y 12 , Y 22 ) is the most interesting value if defined.
(b) The random variable $W$ defined below is such that $C(Y_1, Y_2)$ of part (a) is attained:
$W: \Omega \to \mathbb{R}^n, \quad n \in \mathbb{Z}_+, \quad n_1 = p_{11} = p_{21}, \quad n_2 = p_{12} = p_{22}, \quad n_1 + n_2 = n, \quad W = \begin{pmatrix} W_1 \\ W_2 \end{pmatrix}, \quad W_1: \Omega \to \mathbb{R}^{n_1}, \quad W_2: \Omega \to \mathbb{R}^{n_2}, \quad W_1 = Y_{11} = Y_{21},$
$W_2 = L_1 Y_{12} + L_2 Y_{22} + L_3 V, \quad \text{see Theorem 11.(b) for the formulas of } L_1, L_2, L_3,$
where the following properties hold:
then $(Y_1, Y_2, W) \in G(0, Q_s(I))$, see (81) for $Q_s(I)$,
$(\mathcal{F}^{Y_{11}, Y_{12}, Y_{13}}, \mathcal{F}^{Y_{21}, Y_{22}, Y_{23}} \,|\, \mathcal{F}^{W_1, W_2}) \in \mathrm{CI},$
F W 1 ( F Y 11 F Y 21 ) , F W 2 ( F Y 12 F Y 22 ) ,
$C(Y_1, Y_2) = I(Y_1, Y_2; W).$
(c) The following operations are defined, using (a):
$W = \begin{pmatrix} W_1 \\ W_2 \end{pmatrix},$
$W_1 = Y_{11} = Y_{21},$
$W_2 = L_1 Y_{12} + L_2 Y_{22} + L_3 V, \quad \text{see (103), (104) for the formulas of } L_1, L_2, L_3;$
$Z_{12} = Y_{12} - E[Y_{12} | \mathcal{F}^{W_2}] = Y_{12} - Q_{Y_{12}, W_2} Q_{W_2}^{-1} W_2,$
$Z_{22} = Y_{22} - E[Y_{22} | \mathcal{F}^{W_2}] = Y_{22} - Q_{Y_{22}, W_2} Q_{W_2}^{-1} W_2,$
$Z_{13} = Y_{13}, \quad Z_{23} = Y_{23}, \quad \text{(the components } Z_{11} \text{ and } Z_{21} \text{ do not exist)},$
$Z_1 = \begin{pmatrix} Z_{12} \\ Z_{13} \end{pmatrix}, \quad Z_2 = \begin{pmatrix} Z_{22} \\ Z_{23} \end{pmatrix},$
and these imply
$Y_{11} = W_1 = Y_{21},$
$Y_{12} = Q_{Y_{12}, W_2} Q_{W_2}^{-1} W_2 + Z_{12}, \quad Y_{22} = Q_{Y_{22}, W_2} Q_{W_2}^{-1} W_2 + Z_{22},$
$Y_{13} = Z_{13}, \quad Y_{23} = Z_{23};$
equivalently,
$\begin{pmatrix} Y_{11} \\ Y_{12} \\ Y_{13} \end{pmatrix} = \begin{pmatrix} I_{n_1} & 0 \\ 0 & Q_{Y_{12}, W_2} Q_{W_2}^{-1} \\ 0 & 0 \end{pmatrix} \begin{pmatrix} W_1 \\ W_2 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ I_{n_2} & 0 \\ 0 & I_{n - n_1 - n_2} \end{pmatrix} \begin{pmatrix} Z_{12} \\ Z_{13} \end{pmatrix}, \quad \begin{pmatrix} Y_{21} \\ Y_{22} \\ Y_{23} \end{pmatrix} = \begin{pmatrix} I_{n_1} & 0 \\ 0 & Q_{Y_{22}, W_2} Q_{W_2}^{-1} \\ 0 & 0 \end{pmatrix} \begin{pmatrix} W_1 \\ W_2 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ I_{n_2} & 0 \\ 0 & I_{n - n_1 - n_2} \end{pmatrix} \begin{pmatrix} Z_{22} \\ Z_{23} \end{pmatrix}.$
The derivation of Theorem 4 is presented in Section 3.2, after several of the tools are presented, such as, weak stochastic realizations and minimal realizations.
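As a numerical illustration of the formula in Theorem 4 (a), the following sketch (ours, with a hypothetical function name) evaluates $C(Y_1, Y_2) = \frac{1}{2} \sum_{j} \ln \frac{1 + d_j}{1 - d_j}$ once the canonical correlation coefficients of the correlated parts are available, returns $0$ when only private parts are present, and returns $+\infty$ when an identical part is present, matching the three cases of the theorem.

```python
import numpy as np

def wyner_common_information(d) -> float:
    """C(Y1, Y2) in nats from the canonical correlation coefficients d of the
    correlated parts (Theorem 4 (a)).  An empty d corresponds to only private
    parts (C = 0); any d_j = 1 corresponds to an identical part (C = +inf)."""
    d = np.asarray(d, dtype=float)
    if d.size == 0:                 # p12 = p22 = 0: only private parts
        return 0.0
    if np.any(d >= 1.0):            # p11 = p21 > 0: identical parts present
        return np.inf
    return 0.5 * np.sum(np.log((1.0 + d) / (1.0 - d)))

# Example: two correlated components with d = (0.9, 0.5)
print(wyner_common_information([0.9, 0.5]))   # approx. 1.472 + 0.549
```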
Remark 1.
Relation of Theorem 4 to the literature.
(a) Corollary 1 in [15] gives an expression analogous to the case (20), which is expressed in terms of the correlation coefficients $\rho_i \in (-1, 1)$ and not the canonical correlation coefficients $d_i \in (0, 1)$. Similarly, [16], under Lemma 1, reproduces Corollary 1 in [15], with the correlation coefficients $\rho_i$ replaced by their absolute values $|\rho_i|$.
(b) The derivation in [15,16] is based on the use of rate distortion functions of Gaussian random variables with square-error distortion functions, which presupposes that the auxiliary RV $W \in \mathbb{W}$ takes continuous values.
(c) Refs. [15,16] do not provide a realization of the triple $(Y_1, Y_2, W)$, as given in Theorem 4 (which is based on applying the parametrization of Theorem 8).
On the other hand, the derivation of Theorem 4 is based on Theorem 9, which shows that, among all joint distributions $P_{Y_1, Y_2, W}$ induced by a tuple of multivariate correlated Gaussian random variables $(Y_1, Y_2)$ and an arbitrary random variable $W: \Omega \to \mathbb{W}$, continuous or discrete-valued, Wyner's common information $C(Y_1, Y_2)$, defined by (13), is minimized by a triple $(Y_1, Y_2, W)$ which induces a jointly Gaussian distribution $P_{Y_1, Y_2, W}$, with $W: \Omega \to \mathbb{W} = \mathbb{R}^n$ a finite-dimensional Gaussian random variable.
(d) The derivation of Theorem 4 contains many intermediate results which are applicable to the problems considered in [15,16], such as Relaxed Wyner’s Common Information in [17]. These are discussed in Section 4.3.
Theorem 5 gives a parametric characterization of the Gray and Wyner rate region R G W ( Δ 1 , Δ 2 ) , with respect to the variance matrix of the triple of jointly Gaussian random variables ( Y 1 , Y 2 , W ) .
Theorem 5.
Consider a tuple of Gaussian random variables $Y_i: \Omega \to \mathbb{R}^{p_i}$, with $Q_{Y_i} > 0$, for $i = 1, 2$, $(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$ (not necessarily in canonical variable form), with induced Gaussian measure $P_0 = G(0, Q_{(Y_1, Y_2)})$ on the space $(\mathbb{R}^{p_1} \times \mathbb{R}^{p_2}, \mathcal{B}(\mathbb{R}^{p_1}) \otimes \mathcal{B}(\mathbb{R}^{p_2}))$, and square-error distortion functions $D_{Y_1}(y_1, \hat{y}_1) = ||y_1 - \hat{y}_1||_{\mathbb{R}^{p_1}}^2$, $D_{Y_2}(y_2, \hat{y}_2) = ||y_2 - \hat{y}_2||_{\mathbb{R}^{p_2}}^2$.
The following hold.
(a) There exists a Gaussian measure $P_1 = G(0, Q_{(Y_1, Y_2, W)})$ defined on the space $(\mathbb{R}^{p_1} \times \mathbb{R}^{p_2} \times \mathbb{R}^n, \mathcal{B}(\mathbb{R}^{p_1}) \otimes \mathcal{B}(\mathbb{R}^{p_2}) \otimes \mathcal{B}(\mathbb{R}^n))$, $n \in \mathbb{Z}_+$, associated with the Gaussian random variables $(Y_1, Y_2)$, $W: \Omega \to \mathbb{R}^n$, $W \in G(0, Q_W)$, such that $P_1 |_{\mathbb{R}^{p_1} \times \mathbb{R}^{p_2}} = G(0, Q_{(Y_1, Y_2)})$. Moreover, a realization of the random variables $(Y_1, Y_2, W)$ with induced measure $P_1 = G(0, Q_{(Y_1, Y_2, W)})$ is
$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = Q_{(Y_1, Y_2), W} Q_W^{\dagger} W + \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix},$
$(Z_1, Z_2) \in G(0, Q_{(Z_1, Z_2)}), \quad (Z_1, Z_2) \ \text{independent of} \ W,$
$Q_{(Y_1, Y_2), W} = E\Big[ \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} W^T \Big] = \begin{pmatrix} Q_{Y_1, W} \\ Q_{Y_2, W} \end{pmatrix}$
where $Q_W^{\dagger}$ is the pseudoinverse of $Q_W$.
(b) For the Gaussian auxiliary random variables given in part (a), the Gray–Wyner rate region $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$ is determined from
$T^{G}(\alpha_1, \alpha_2) \triangleq \inf_{(Y_1, Y_2, W) \in G(0, Q_{(Y_1, Y_2, W)}) \ \text{of (68)}} \Big\{ I(Y_1, Y_2; W) + \alpha_1 R_{Y_1 | W}(\Delta_1) + \alpha_2 R_{Y_2 | W}(\Delta_2) \Big\}$
$= \inf_{Q_{Y_1, Y_2 | W},\ Q_{Y_i | W},\ i = 1, 2} \Big\{ \frac{1}{2} \ln \Big[ \frac{\det(Q_{(Y_1, Y_2)})}{\det(Q_{Y_1 | Y_2, W}) \det(Q_{Y_2 | W})} \Big]^{+} + \alpha_1 R_{Y_1 | W}(\Delta_1) + \alpha_2 R_{Y_2 | W}(\Delta_2) \Big\}$
$\text{subject to} \quad Q_{(Y_1, Y_2)} \ge Q_{(Y_1, Y_2) | W}, \quad Q_{Y_1 | Y_2, W} = Q_{Y_1 | W} - Q_{Y_1, Y_2 | W} Q_{Y_2 | W}^{-1} Q_{Y_1, Y_2 | W}^{T}$
where $0 \le \alpha_i \le 1$, $i = 1, 2$, $\alpha_1 + \alpha_2 \ge 1$, $I(Y_1, Y_2; W) = H(Y_1, Y_2) - H(Y_1 | Y_2, W) - H(Y_2 | W)$, $R_{Y_i | W}(\Delta_i)$, $i = 1, 2$, are given in Theorem 13.(b), and $\{\cdot\}^{+} = \max\{1, \cdot\}$.
The derivation of Theorem 5 is presented in Section 4.4, after the structural properties of the RDFs $R_{Y_1, Y_2}(\Delta_1, \Delta_2)$, $R_{Y_i | W}(\Delta_i)$, $R_{Y_i}(\Delta_i)$, $i = 1, 2$, of Theorem 12, Theorem 13 and Theorem 14 are presented. From Theorem 5 follow simplified characterizations of subsets of the rate region $\mathcal{R}_{GW}(\Delta_1, \Delta_2)$, such as rates that lie on the Pangloss Plane, and rates that correspond to a $W$ that makes $Y_1$ and $Y_2$ conditionally independent, i.e., $W$ is such that $P_{Y_1, Y_2 | W} = P_{Y_1 | W} P_{Y_2 | W}$.
Utilizing the structural properties of RDFs, R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , R Y i | W ( Δ i ) , R Y i ( Δ i ) , i = 1 , 2 , of Theorem 12, Theorem 13, Theorem 14, and Theorem 4, the next theorem is obtained, which gives the formula of Wyner’s lossy common information C G W ( Y 1 , Y 2 ; Δ 1 , Δ 2 ) = C W ( Y 1 , Y 2 ) .
Theorem 6.
Consider a tuple $(Y_1, Y_2)$ of Gaussian random variables in the canonical variable form of Definition 1. Restrict attention to the correlated parts of these random variables, as defined in Theorem 8, by (77)–(79), and to the subset of the distortion region defined by
$\mathcal{D}_W \triangleq \big\{ (\Delta_1, \Delta_2) \in [0, \infty] \times [0, \infty] \ \big|\ 0 \le \Delta_1 \le n(1 - d_1),\ 0 \le \Delta_2 \le n(1 - d_1) \big\},$
$\forall j \in \mathbb{Z}_n, \ d_j \in (0, 1).$
Then, Wyner's lossy common information (the calculation of the expression in Theorem 3) is given by
$C_{GW}(Y_1, Y_2; \Delta_1, \Delta_2) = C_W(Y_1, Y_2) = C(Y_1, Y_2) = \frac{1}{2} \sum_{j=1}^{n} \ln \frac{1 + d_j}{1 - d_j}, \quad \forall (\Delta_1, \Delta_2) \in \mathcal{D}_W.$
The derivation of Theorem 6 is presented in Section 4.2 and makes use of a degenerate version of the realization of the triple ( Y 1 , Y 2 , W ) given in Theorem 4, and the RDFs R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , R Y i | W ( Δ i ) , i = 1 , 2 .
Remark 2.
By Theorem 5, a subset of the Gray–Wyner rate region is obtained by replacing ( Y 1 , Y 2 , W ) G ( 0 , Q ( Y 1 , Y 2 , W ) ) of (39) and (40) by W that makes Y 1 and Y 2 conditionally independent, i.e., ( Z 1 , Z 2 ) G ( 0 , Q ( Z 1 , Z 2 ) ) and ( Z 1 , Z 2 , W ) mutually independent (e.g., Q ( Z 1 , Z 2 ) is block-diagonal).

1.3. Structure of the Paper

Section 2 introduces the mathematical tools of the geometric approach to Gaussian random variables and the weak stochastic realization of conditional independence (Section 2.4).
Section 3 contains the problem statement, the solution procedure, the weak realization of a tuple of multivariate random variables $(Y_1, Y_2)$ such that another multivariate Gaussian random variable $W$ makes $Y_1$ and $Y_2$ conditionally independent (Section 2.5), and the calculation of $C_W(Y_1, Y_2) = C(Y_1, Y_2)$.
Section 4 is concerned with the characterization of the Gray–Wyner rate region R G W ( Δ 1 , Δ 2 ) , the characterization of rates that lie on the Pangloss Plane, and Wyner’s lossy common information. This section includes calculations of the rate distortion functions R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , R Y i | W ( Δ i ) , R Y i ( Δ i ) , i = 1 , 2 , the weak stochastic realizations of the random variables ( Y 1 , Y 2 , Y ^ 1 , Y ^ 2 , W ) which achieve these rate distortion functions, for jointly multivariate Gaussian random variables with square-error distortion functions.
Section 5 includes remarks on possible extensions.
Appendix A.3 makes use of a matrix equality and a determinant inequality first obtained by Hua Loo-Keng in 1952, which are used to carry out the optimization problem of Wyner's lossy common information $C_W(Y_1, Y_2) = C(Y_1, Y_2)$.

2. Probabilistic Properties of Tuples of Random Variables

The reader finds in this section the basic properties associated with:
(1) the transformation of a tuple of Gaussian multivariate random variables $(Y_1, Y_2)$ into their canonical variable form, and
(2) the parameterization of all jointly Gaussian distributions $P_{Y_1, Y_2, W}(y_1, y_2, w)$ by a zero-mean Gaussian random variable $W: \Omega \to \mathbb{R}^{k_W}$ such that (a) $W$ makes the multivariate random variables $(Y_1, Y_2)$ conditionally independent, and (b) the marginal distribution $P_{Y_1, Y_2, W}(y_1, y_2, \mathbb{W}) = P_{Y_1, Y_2}(y_1, y_2)$ coincides with the joint distribution of the multivariate random variables $(Y_1, Y_2)$.

2.1. Notation of Elements of Probability Theory

The notation used in this paper is briefly specified. Denote by $\mathbb{Z}_+ = \{1, 2, \ldots\}$ the set of positive integers and by $\mathbb{N} = \{0, 1, 2, \ldots\}$ the set of natural integers. For $n \in \mathbb{Z}_+$, denote the following finite subsets of the above defined sets by $\mathbb{Z}_n = \{1, 2, \ldots, n\}$ and $\mathbb{N}_n = \{0, 1, 2, \ldots, n\}$.
Denote the real numbers by $\mathbb{R}$ and the sets of positive and of strictly positive real numbers, respectively, by $\mathbb{R}_+ = [0, \infty)$ and $\mathbb{R}_{++} = (0, \infty) \subset \mathbb{R}$. The vector space of $n$-tuples of real numbers is denoted by $\mathbb{R}^n$. Denote the Borel $\sigma$-algebra on this vector space by $\mathcal{B}(\mathbb{R}^n)$; hence, $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ is a measurable space.
The expression $\mathbb{R}^{n \times m}$ denotes the set of $n$ by $m$ matrices with elements in the real numbers, for $n, m \in \mathbb{Z}_+$. For a symmetric matrix $Q \in \mathbb{R}^{n \times n}$, the inequality $Q \ge 0$ denotes that for all vectors $u \in \mathbb{R}^n$ the inequality $u^T Q u \ge 0$ holds. Similarly, $Q > 0$ denotes that for all $u \in \mathbb{R}^n \setminus \{0\}$, $u^T Q u > 0$. The notation $Q_1 \le Q_2$ denotes that $Q_2 - Q_1 \ge 0$.
Consider a probability space denoted by ( Ω , F , P ) consisting of a set Ω , a σ -algebra F of subsets of Ω , and a probability measure P : F [ 0 , 1 ] .
A real-valued random variable is a function $X: \Omega \to \mathbb{R}$ such that the following set belongs to the indicated $\sigma$-algebra: $\{\omega \in \Omega \,|\, X(\omega) \in (-\infty, u]\} \in \mathcal{F}$ for all $u \in \mathbb{R}$. A random variable taking values in an arbitrary measurable space $(\mathbb{X}, \mathcal{B}(\mathbb{X}))$ is defined correspondingly by $X: \Omega \to \mathbb{X}$ and $X^{-1}(A) = \{\omega \in \Omega \,|\, X(\omega) \in A\} \in \mathcal{F}$, for all $A \in \mathcal{B}(\mathbb{X})$. The measure (or distribution if $\mathbb{X}$ is a Euclidean space) induced by the random variable on $(\mathbb{X}, \mathcal{B}(\mathbb{X}))$ is denoted by $P_X$ or $P(dx)$. The $\sigma$-algebra generated by a random variable $X: \Omega \to \mathbb{X}$ is defined as the smallest $\sigma$-algebra containing the subsets $X^{-1}(A) \in \mathcal{F}$ for all $A \in \mathcal{B}(\mathbb{X})$. It is denoted by $\mathcal{F}^X$. The real-valued random variable $X$ is called $\mathcal{G}$-measurable for a $\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$ if the subset $\{\omega \in \Omega \,|\, X(\omega) \in (-\infty, u]\} \in \mathcal{G}$ for all $u \in \mathbb{R}$. Denote the set of positive random variables which are measurable on a sub-$\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$ by
$L_+(\mathcal{G}) = \{X: \Omega \to \mathbb{R}_+ = [0, \infty) \,|\, X \ \text{is} \ \mathcal{G}\text{-measurable}\}.$
The tuple of sub- σ -algebras F 1 , F 2 F is called independent if E [ X 1 X 2 ] = E [ X 1 ] E [ X 2 ] for all X 1 L + ( F 1 ) and all X 2 L + ( F 2 ) . The definition can be extended to any finite set of independent sub- σ -algebras.

2.2. Geometric Approach of Gaussian Random Variables and Canonical Variable Form

The purpose of this section is to introduce the geometric approach of a tuple of finite-dimensional Gaussian random variables using the canonical variable form of the tuple introduced by H. Hotelling, [18]. The use of the geometric approach of two Gaussian random variables with respect to the computation of mutual information is elaborated by Gelfand and Yaglom in [19], making reference to an insight due to Kolmogorov. However, the canonical variable form is not given in [19].
An $\mathbb{R}^n$-valued Gaussian random variable with as parameters the mean value $m_X \in \mathbb{R}^n$ and the variance $Q_X \in \mathbb{R}^{n \times n}$, $Q_X = Q_X^T \ge 0$, is a function $X: \Omega \to \mathbb{R}^n$ which is a random variable and such that the measure of this random variable equals a Gaussian measure described by its characteristic function,
$E[\exp(i u^T X)] = \exp\big(i u^T m_X - \tfrac{1}{2} u^T Q_X u\big), \quad \forall u \in \mathbb{R}^n.$
Note that this definition includes the case in which the random variable is almost surely equal to a constant in which case Q X = 0 . A Gaussian random variable with these parameters is denoted by X G ( m X , Q X ) .
The effective dimension of the random variable is denoted by dim ( X ) = rank ( Q X ) .
Any tuple of random variables $X_1, \ldots, X_k$ is called jointly Gaussian if the vector $(X_1^T, X_2^T, \ldots, X_k^T)^T$ is a Gaussian random variable. A tuple of Gaussian random variables $(Y_1, Y_2)$ will be denoted this way to save space, rather than by the column vector $\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}$.
Then, the variance matrix of this tuple is denoted by
$(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)}), \quad Q_{(Y_1, Y_2)} = \begin{pmatrix} Q_{Y_1} & Q_{Y_1, Y_2} \\ Q_{Y_1, Y_2}^T & Q_{Y_2} \end{pmatrix} \in \mathbb{R}^{(p_1 + p_2) \times (p_1 + p_2)}.$
The reader should distinguish the variance matrices $Q_{(Y_1, Y_2)}$ and $Q_{Y_1, Y_2} \in \mathbb{R}^{p_1 \times p_2}$. Any such tuple of Gaussian random variables is independent if and only if $Q_{Y_1, Y_2} = 0$.
Definition 1.
The canonical variable form.
Consider a tuple of Gaussian random variables $Y_i: \Omega \to \mathbb{R}^{p_i}$, with $Q_{Y_i} > 0$, for $i = 1, 2$, $(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$, $Q_{(Y_1, Y_2)} \ge 0$. The tuple is said to be in the canonical variable form if a basis has been chosen and a transformation of the random variables to this basis has been carried out such that, with respect to the new basis, one has the representation
$(Y_1, Y_2) \in G(0, Q_{\mathrm{cvf}}), \quad \text{where}$
$Q_{\mathrm{cvf}} = \begin{pmatrix} I_{p_{11}} & 0 & 0 & I_{p_{21}} & 0 & 0 \\ 0 & I_{p_{12}} & 0 & 0 & D & 0 \\ 0 & 0 & I_{p_{13}} & 0 & 0 & 0 \\ I_{p_{21}} & 0 & 0 & I_{p_{21}} & 0 & 0 \\ 0 & D & 0 & 0 & I_{p_{22}} & 0 \\ 0 & 0 & 0 & 0 & 0 & I_{p_{23}} \end{pmatrix} \in \mathbb{R}^{p \times p}, \quad p, p_1, p_2, p_{11}, p_{12}, p_{13}, p_{21}, p_{22}, p_{23} \in \mathbb{N}, \quad p = p_1 + p_2, \quad p_1 = p_{11} + p_{12} + p_{13}, \quad p_2 = p_{21} + p_{22} + p_{23}, \quad p_{11} = p_{21}, \quad p_{12} = p_{22},$
$D = \mathrm{Diag}(d_1, \ldots, d_{p_{12}}), \quad 1 > d_1 \ge d_2 \ge \cdots \ge d_{p_{12}} > 0,$
$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} Y_{11} \\ Y_{12} \\ Y_{13} \\ Y_{21} \\ Y_{22} \\ Y_{23} \end{pmatrix}, \quad Y_{ij}: \Omega \to \mathbb{R}^{p_{ij}}, \quad i = 1, 2, \quad j = 1, 2, 3.$
One then says that $(Y_{11}, Y_{12}, Y_{13})$, $(Y_{21}, Y_{22}, Y_{23})$ are the canonical variables and $(d_1, \ldots, d_{p_{12}})$ the canonical correlation coefficients.
If Q ( Y 1 , Y 2 ) > 0 then necessarily p 11 = p 21 = 0 .
Appendix A.1 gives Algorithm A1 to transform the variance matrix $Q_{(Y_1, Y_2)} \ge 0$ by two nonsingular transformations $S_i \in \mathbb{R}^{p_i \times p_i}$, $i = 1, 2$, to its canonical variable form $Q_{\mathrm{cvf}}$ of Definition 1, such that
$S_1 Y_1 = (V_1, Y_{13}) = ((Y_{11}, Y_{12}), Y_{13}), \quad S_2 Y_2 = (V_2, Y_{23}) = ((Y_{21}, Y_{22}), Y_{23}),$
$Y_{11} = Y_{21} \ \text{a.s.}, \quad E[Y_{12} Y_{22}^T] = D.$
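Algorithm A1 itself is in Appendix A.1 and is not reproduced here; as a minimal sketch under the assumptions $Q_{Y_1} > 0$, $Q_{Y_2} > 0$, the canonical correlation coefficients it produces can be obtained by the standard computation below (the singular values of $Q_{Y_1}^{-1/2} Q_{Y_1, Y_2} Q_{Y_2}^{-1/2}$); the function name is ours.

```python
import numpy as np

def canonical_correlations(Q11, Q12, Q22):
    """Canonical correlation coefficients of (Y1, Y2) with variance blocks
    Q11 = Var(Y1) > 0, Q22 = Var(Y2) > 0 and cross-covariance Q12 = Cov(Y1, Y2).
    Standard computation: singular values of Q11^{-1/2} Q12 Q22^{-1/2}."""
    def inv_sqrt(Q):
        # Inverse symmetric square root via eigendecomposition (Q > 0 assumed).
        w, U = np.linalg.eigh(Q)
        return U @ np.diag(1.0 / np.sqrt(w)) @ U.T
    M = inv_sqrt(Q11) @ Q12 @ inv_sqrt(Q22)
    # Sorted decreasing: d_j = 1 identical part, d_j in (0,1) correlated part,
    # d_j = 0 private part.
    return np.linalg.svd(M, compute_uv=False)

# Example: scalar blocks with Var(Y1) = 2, Var(Y2) = 1, Cov(Y1, Y2) = 0.7.
print(canonical_correlations(np.array([[2.0]]), np.array([[0.7]]), np.array([[1.0]])))
```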
Proposition 1.
Properties of components of the canonical variable form.
Consider a tuple ( Y 1 , Y 2 ) G ( 0 , Q cvf ) of Gaussian random variables in the canonical variable form.
(a) The three components Y 11 , Y 12 , Y 13 of Y 1 are independent random variables. Similarly, the three components Y 21 , Y 22 , Y 23 of Y 2 are independent random variables.
(b) The equality Y 11 = Y 21 of these random variables holds almost surely.
(c) The tuple of random variables $(Y_{12}, Y_{22})$ is correlated as shown by the formula
$E[Y_{12} Y_{22}^T] = D = \mathrm{Diag}(d_1, \ldots, d_{p_{12}}).$
Note that the different components of $Y_{12}$ and of $Y_{22}$ are independent random variables; thus, $Y_{12,i}$ and $Y_{12,j}$ are independent, $Y_{22,i}$ and $Y_{22,j}$ are independent, and $Y_{12,i}$ and $Y_{22,j}$ are independent, for all $i \ne j$; while $Y_{12,j}$ and $Y_{22,j}$, for $j = 1, \ldots, p_{12} = p_{22}$, are correlated.
(d) The random variable Y 13 is independent of Y 2 . Similarly, the random variable Y 23 is independent of Y 1
Proof. 
The results are immediately obvious from the fact that the random variables are all jointly Gaussian and from the variance formula (51) of the canonical variable form. □
Next, the interpretation of the various components of the canonical variable form is defined, as in [20].
Definition 2.
Interpretation of components of the canonical variable form.
Consider a tuple of jointly Gaussian random variables $(Y_1, Y_2) \in G(0, Q_{\mathrm{cvf}})$ in the canonical variable form of Definition 1. Call the various components as defined in the following list:
$Y_{11} = Y_{21}$ a.s.: identical information of $Y_1$ and $Y_2$;
$Y_{12}$: correlated information of $Y_1$ with respect to $Y_2$;
$Y_{13}$: private information of $Y_1$ with respect to $Y_2$;
$Y_{21} = Y_{11}$ a.s.: identical information of $Y_1$ and $Y_2$;
$Y_{22}$: correlated information of $Y_2$ with respect to $Y_1$;
$Y_{23}$: private information of $Y_2$ with respect to $Y_1$.
For $Y_{11} = Y_{21}$ a.s., the term identical information is used.
Theorem 7 is a formula of the mutual information $I(Y_1; Y_2)$ for a general tuple of finite-dimensional Gaussian random variables $(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$. This formula is the subject of much discussion in Gelfand and Yaglom [19] (see Equation (2.8') and Chapter II).
Theorem 7.
Consider a tuple of finite-dimensional Gaussian random variables $(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$, $Q_{Y_i} > 0$, $i = 1, 2$.
Compute the canonical variable form of the tuple of Gaussian random variables according to Algorithm A1. This yields the indices $p_{11} = p_{21}$, $p_{12} = p_{22}$, $p_{13}$, $p_{23}$, and $n = p_{11} + p_{12} = p_{21} + p_{22}$, and the diagonal matrix $D$ with canonical correlation coefficients or singular values $d_i \in (0, 1)$ for $i = 1, \ldots, n$.
Then, the mutual information $I(Y_1; Y_2)$ is computed according to the formula
$I(Y_1; Y_2) = \begin{cases} 0, & \text{if } 0 = p_{11} = p_{12} = p_{21} = p_{22},\ p_{13} > 0,\ p_{23} > 0, \\ \frac{1}{2} \sum_{i=1}^{n} \ln \frac{1}{1 - d_i^2}, & \text{if } 0 = p_{11} = p_{21},\ p_{12} = p_{22} > 0,\ p_{13} \ge 0,\ p_{23} \ge 0, \\ \infty, & \text{if } p_{11} = p_{21} > 0,\ p_{12} = p_{22} \ge 0,\ p_{13} \ge 0,\ p_{23} \ge 0, \end{cases}$
where $d_i$ are the canonical correlation coefficients, i.e.,
$d_i = d_i(Y_{12,i}, Y_{22,i}) = \frac{E[Y_{12,i} Y_{22,i}]}{\sqrt{E[Y_{12,i}^2]\, E[Y_{22,i}^2]}} = E[Y_{12,i} Y_{22,i}], \quad i = 1, \ldots, n.$
Proof. 
The derivation is given in Appendix A.3 of [21] (since it is not given in [19]). □
By the last entry of (57), it is appropriate to consider only tuples $(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$ such that $p_{11} = p_{21} = 0$, i.e., to remove the identical components prior to the analysis of mutual information problems.
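For contrast with Wyner's common information in Theorem 4, the second case of Theorem 7 can be evaluated directly from the canonical correlation coefficients; the short sketch below (ours, hypothetical function name) does this in nats.

```python
import numpy as np

def mutual_information_gaussian(d) -> float:
    """I(Y1; Y2) in nats from the canonical correlation coefficients d
    (Theorem 7): 0 if there are no identical or correlated parts,
    +inf if an identical part is present, else 0.5 * sum(ln(1 / (1 - d_j^2)))."""
    d = np.asarray(d, dtype=float)
    if d.size == 0:
        return 0.0
    if np.any(d >= 1.0):
        return np.inf
    return 0.5 * np.sum(np.log(1.0 / (1.0 - d ** 2)))

print(mutual_information_gaussian([0.9, 0.5]))   # approx. 0.831 + 0.144
```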
Remark 3.
The material discussed in Section 1.2 makes use of the concepts of this section. The main point to be made is that, in lossy source coding problems, the source distribution is fixed, while the optimal reproduction distribution needs to be found and realized. Then, a pre-encoder can be used by invoking Algorithm A1.

2.3. Conditional Independence of a Triple of Gaussian Random Variables

The concept of conditional independence is basic to the entire paper. The definition is provided below. The characterization of a Gaussian measure on a triple of Gaussian random variables having the conditional independence property is stated.
Definition 3.
Conditional independence.
Consider a probability space ( Ω , F , P ) and three sub-σ-algebras F 1 , F 2 , G F . Call the sub-σ-algebras F 1 and F 2 conditionally independent given, or conditioned on, the sub-σ-algebra G if the following factorization property holds:
$E[Y_1 Y_2 | \mathcal{G}] = E[Y_1 | \mathcal{G}]\, E[Y_2 | \mathcal{G}], \quad \forall Y_1 \in L_+(\mathcal{F}_1), \ \forall Y_2 \in L_+(\mathcal{F}_2).$
Denote this property by $(\mathcal{F}_1, \mathcal{F}_2 | \mathcal{G}) \in \mathrm{CI}$.
For Gaussian random variables, the definition of minimality of a Gaussian random variable X that makes two Gaussian random variables ( Y 1 , Y 2 ) conditionally independent is needed. The definition is introduced below.
Definition 4.
Minimality of conditional independence of Gaussian random variables.
Consider three random variables, Y i : Ω R p i for i = 1 , 2 and X : Ω R n .
Call the random variables Y 1 and Y 2 Gaussian conditionally independent conditioned on or given F X if:
(1) ( F Y 1 , F Y 2 | F X ) CI ;
(2) ( Y 1 , Y 2 , X ) are jointly Gaussian random variables.
The notation ( Y 1 , Y 2 | X ) CIG is used to denote this property.
Call the random variables ( Y 1 , Y 2 | X ) minimally Gaussian conditionally independent if
(1) They are Gaussian conditionally independent;
(2) There does not exist another tuple ( Y 1 , Y 2 | X 1 ) with X 1 : Ω R n 1 such that ( Y 1 , Y 2 | X 1 ) CIG and n 1 < n .
This property is denoted by ( Y 1 , Y 2 | X 1 ) CIG m i n .
There exists a simple equivalent condition for the conditional independence of tuple of Gaussian random variables by a third Gaussian random variable. This condition is expressed in terms of parameterizing the variance matrix of the tuple as presented in the next proposition.
Proposition 2.
[22] (Proposition 3.4) Equivalent condition for the conditional independence of the tuple of Gaussian random variables.
Consider a triple of jointly Gaussian random variables denoted as $(Y_1, Y_2, X) \in G(0, Q)$ with $Q_X > 0$. This triple is Gaussian conditionally independent if and only if
$Q_{Y_1, Y_2} = Q_{Y_1, X} Q_X^{-1} Q_{X, Y_2}.$
It is minimally Gaussian conditionally independent if and only if, in addition, $n = \dim(X) = \mathrm{rank}(Q_{Y_1, Y_2})$.
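The condition of Proposition 2 is a pure second-moment test and can be checked numerically; the sketch below (ours, with hypothetical names and example matrices) verifies it for the simple realization $Y_1 = X + V_1$, $Y_2 = X + V_2$ with $X, V_1, V_2$ independent.

```python
import numpy as np

def is_gaussian_conditionally_independent(Q_y1y2, Q_y1x, Q_x, Q_xy2, tol=1e-9):
    """Proposition 2: (Y1, Y2 | X) are Gaussian conditionally independent
    iff Q_{Y1,Y2} = Q_{Y1,X} Q_X^{-1} Q_{X,Y2}."""
    return np.allclose(Q_y1y2, Q_y1x @ np.linalg.inv(Q_x) @ Q_xy2, atol=tol)

# Example: Y1 = X + V1, Y2 = X + V2 with X, V1, V2 independent, unit variances.
Q_x = np.array([[1.0]])
Q_y1x = Q_xy2 = np.array([[1.0]])   # Cov(Y1, X) = Cov(X, Y2) = Var(X)
Q_y1y2 = np.array([[1.0]])          # Cov(Y1, Y2) = Var(X)
print(is_gaussian_conditionally_independent(Q_y1y2, Q_y1x, Q_x, Q_xy2))  # True
```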
It will become apparent in Section 4.4 that the Gray and Wyner lossy rate region R G W ( Δ 1 , Δ 2 ) is parameterized by a triple of jointly Gaussian random variables ( Y 1 , Y 2 , W ) , but not necessarily such that W makes Y 1 and Y 2 conditionally independent. However, subsets of R G W ( Δ 1 , Δ 2 ) , are characterized by a triple ( Y 1 , Y 2 , W ) , such that W makes Y 1 and Y 2 conditionally independent.

2.4. Weak Realization of a Gaussian Probability Measure on a Tuple of Random Variables

This section is motivated by Theorem 9, which states that, among all joint distributions P Y 1 , Y 2 , W induced by a tuple of multivariate correlated Gaussian random variables ( Y 1 , Y 2 ) , and an arbitrary random variable W : Ω W , continuous or discrete-valued, Wyner’s common information C ( Y 1 , Y 2 ) , defined by (13), is minimized by a triple ( Y 1 , Y 2 , W ) which induces a jointly Gaussian distribution P Y 1 , Y 2 , W , and W : Ω W = R n is finite-dimensional Gaussian random variable.
To develop the above results, use is made of the solution of the problem of the weak Gaussian stochastic realization of a tuple of Gaussian random variables. Specifically, to determine a Gaussian probability measure on a triple of Gaussian random variables such that:
(1)
The measure restricted to the first two Gaussian random variables is equal to the considered probability measure;
(2)
The third Gaussian random variable makes the other two random variables conditionally independent. This problem does not have a unique solution; there is a set of Gaussian probability measures which meets these conditions. What is needed is a parameterization of this set of solutions.
Below, the problem is stated in more detail. Its solution is provided in the next section.
Problem 1.
Weak stochastic realization of a tuple of conditionally independent Gaussian random variables.
Weak stochastic realization problem of a Gaussian random variable. Consider a Gaussian measure $P_0 = G(0, Q_0)$ on the space $(\mathbb{R}^{p_1 + p_2}, \mathcal{B}(\mathbb{R}^{p_1 + p_2}))$. Determine the integer $n \in \mathbb{N}$ and construct all Gaussian measures on the space $(\mathbb{R}^{p_1 + p_2 + n}, \mathcal{B}(\mathbb{R}^{p_1 + p_2 + n}))$ such that, if $P_1 = G(0, Q_1)$ is such a measure with $(Y_1, Y_2, X) \in G(0, Q_1)$, then:
(1) $G(0, Q_1) |_{\mathbb{R}^{p_1 + p_2}} = G(0, Q_0)$;
(2) $(Y_1, Y_2 | X) \in \mathrm{CIG}_{min}$.
Here, the indicated random variables ( Y 1 , Y 2 , X ) are constructed having the measure G ( 0 , Q 1 ) with the dimensions p 1 , p 2 , n Z + , respectively.
The next definition and proposition are about the weak Gaussian stochastic realization of a tuple of jointly Gaussian multivariate random variables and its weak stochastic realization.
Definition 5.
Minimality of weak stochastic realization of a tuple of conditionally independent Gaussian random variables.
Consider a Gaussian measure $P_0 = G_0(0, Q_{(Y_1, Y_2)})$ with zero mean values for a tuple $(Y_1, Y_2)$ of random variables on the product space $(\mathbb{R}^{p_1} \times \mathbb{R}^{p_2}, \mathcal{B}(\mathbb{R}^{p_1}) \otimes \mathcal{B}(\mathbb{R}^{p_2}))$ for $p_1, p_2 \in \mathbb{Z}_+$ with
$Q_{(Y_1, Y_2)} = \begin{pmatrix} Q_{Y_1} & Q_{Y_1, Y_2} \\ Q_{Y_1, Y_2}^T & Q_{Y_2} \end{pmatrix}, \quad Q_{Y_1} > 0, \quad Q_{Y_2} > 0.$
(a) A weak Gaussian stochastic realization of the Gaussian measure $G_0(0, Q_{(Y_1, Y_2)})$ is defined to be a Gaussian measure $P_1 = G_1$ if there exists an integer $n \in \mathbb{Z}_+$ such that the Gaussian measure $G_1$ is defined on the space $(\mathbb{R}^{p_1} \times \mathbb{R}^{p_2} \times \mathbb{R}^n, \mathcal{B}(\mathbb{R}^{p_1}) \otimes \mathcal{B}(\mathbb{R}^{p_2}) \otimes \mathcal{B}(\mathbb{R}^n))$, associated with random variables in the three spaces denoted, respectively, by $Y_1$, $Y_2$, and $X$, and such that:
(1) G 1 | R p 1 × R p 2 = G 0 ( 0 , Q ( Y 1 , Y 2 ) ) ;
(2) Q X > 0 ;
(3) Conditional independence holds: P Y 1 , Y 2 | X = P Y 1 | X P Y 2 | X , where these are Gaussian measures, with means which are linear functions of the random variable X and deterministic variance matrices.
(b) The weak Gaussian stochastic realization is called minimal if the dimension n of the random variable X is the smallest possible over all weak Gaussian stochastic realizations as defined in (a).
(c) A Gaussian random variable representation of a weak Gaussian stochastic realization $G_1$ is defined as a triple of random variables satisfying the following relations:
$(Y_1, Y_2, X, V_1, V_2), \quad p_{V_1}, p_{V_2} \in \mathbb{Z}_+, \quad p_{V_1} \le p_1, \quad p_{V_2} \le p_2, \quad Y_1: \Omega \to \mathbb{R}^{p_1}, \quad Y_2: \Omega \to \mathbb{R}^{p_2}, \quad V_1: \Omega \to \mathbb{R}^{p_{V_1}}, \quad V_2: \Omega \to \mathbb{R}^{p_{V_2}}, \quad X: \Omega \to \mathbb{R}^n,$ $(V_1, V_2, X) \in G$, and these are zero-mean independent random variables, $Q_{V_1} > 0$, $Q_{V_2} > 0$, $Q_X > 0$;
$C_1 \in \mathbb{R}^{p_1 \times n}, \quad C_2 \in \mathbb{R}^{p_2 \times n}, \quad N_1 \in \mathbb{R}^{p_1 \times p_{V_1}}, \quad N_2 \in \mathbb{R}^{p_2 \times p_{V_2}},$
$Y_1 = C_1 X + N_1 V_1, \qquad (62)$
$Y_2 = C_2 X + N_2 V_2, \qquad (63)$
$Q_{Y_1} = C_1 Q_X C_1^T + N_1 Q_{V_1} N_1^T,$
$Q_{Y_2} = C_2 Q_X C_2^T + N_2 Q_{V_2} N_2^T,$
$Q_{Y_1, Y_2} = C_1 Q_X C_2^T, \quad G_0(0, Q_{(Y_1, Y_2)}) = G_1 |_{\mathbb{R}^{p_1} \times \mathbb{R}^{p_2}}.$
From the assumptions, it then follows that ( Y 1 , Y 2 ) are Gaussian random variables, hence the last equality makes sense.
(d) A minimal Gaussian random variable representation of a weak Gaussian stochastic realization is defined as a triple of random variables as in (c) except that, in addition, it is required that,
$\mathrm{rank}(C_1) = n = \mathrm{rank}(C_2).$
The case $Q_X \ge 0$ in (a).(2) is similar.
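A minimal sketch of the second-moment relations implied by the representation of Definition 5 (c) is given below (ours; the function name and the example matrices are hypothetical). It computes the covariance blocks from $(C_1, C_2, N_1, N_2, Q_X, Q_{V_1}, Q_{V_2})$, and in particular illustrates that the cross-covariance $Q_{Y_1, Y_2} = C_1 Q_X C_2^T$ has rank at most $n$.

```python
import numpy as np

def weak_realization_covariances(C1, C2, N1, N2, Q_x, Q_v1, Q_v2):
    """Covariance blocks implied by Y1 = C1 X + N1 V1, Y2 = C2 X + N2 V2,
    with (X, V1, V2) zero-mean and independent (Definition 5 (c))."""
    Q_y1 = C1 @ Q_x @ C1.T + N1 @ Q_v1 @ N1.T
    Q_y2 = C2 @ Q_x @ C2.T + N2 @ Q_v2 @ N2.T
    Q_y1y2 = C1 @ Q_x @ C2.T          # cross term: only X is shared
    return Q_y1, Q_y2, Q_y1y2

# A 2-dimensional example with a 1-dimensional X (rank(C1) = rank(C2) = 1).
C1 = np.array([[1.0], [0.5]]); C2 = np.array([[0.8], [0.2]])
N1 = N2 = np.eye(2)
Q = weak_realization_covariances(C1, C2, N1, N2, np.eye(1), 0.3 * np.eye(2), 0.3 * np.eye(2))
print(Q[2])   # Q_{Y1,Y2} = C1 Q_X C2^T, a rank-1 matrix
```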
The next proposition shows the equivalence of weak Gaussian stochastic realizations of Definition 5. (a), (b) to Definition 5. (c), (d), respectively.
Proposition 3.
Consider the setting of Definition 5 with ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) with the representation of (62) and (63).
(a) A weak Gaussian stochastic realization in terms of a measure P 1 = G 1 as defined in Definition 5. (a) is equivalent to a Gaussian random variable representation of Definition 5. (c).
(b) The minimal weak Gaussian stochastic realization of Definition 5. (b) is equivalent to a minimal weak Gaussian random variable representation of Definition 5. (d).
Proof. 
The derivation given in Appendix A.5 of [21]. □
Consider Figure 2. The two signals $Y_1, Y_2$ are to be reproduced at the two decoders by $\hat{Y}_1, \hat{Y}_2$ subject to the square-error distortion functions. According to Gray and Wyner, the characterization of the lossy rate region is described by a single coding scheme that uses the auxiliary random variable $W$, which is common to both $Y_1, Y_2$. A subset of the rate triples on the Gray and Wyner rate region is achieved by a triple that satisfies $(\mathcal{F}^{Y_1}, \mathcal{F}^{Y_2} | \mathcal{F}^W) \in \mathrm{CIG}$. Below, this conditional independence is further detailed in terms of the mathematical framework of weak stochastic realization such that $(\mathcal{F}^{Y_1}, \mathcal{F}^{Y_2} | \mathcal{F}^W) \in \mathrm{CIG}$.
Definition 6.
The model for a triple of Gaussian random variables.
Consider a tuple of Gaussian random variables specified by $Y = (Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)})$ with $Y_i: \Omega \to \mathbb{R}^{p_i}$ for $i = 1, 2$. Take a jointly Gaussian measure $G(0, Q_{(Y_1, Y_2, W)})$ for the triple $(Y_1, Y_2, W)$, $W: \Omega \to \mathbb{R}^n$, $W \in G(0, Q_W)$, such that the marginal measure on $(Y_1, Y_2)$ is equal to the considered measure, with
$Q_{(Y_1, Y_2, W)} = \begin{pmatrix} Q_{Y_1} & Q_{Y_1, Y_2} & Q_{Y_1, W} \\ Q_{Y_1, Y_2}^T & Q_{Y_2} & Q_{Y_2, W} \\ Q_{Y_1, W}^T & Q_{Y_2, W}^T & Q_W \end{pmatrix}. \qquad (68)$
Denote the parameterized joint measure with respect to W, by ( Y 1 , Y 2 , W ) G ( 0 , Q ( Y 1 , Y 2 , W ) ) . This parameterized joint measure ( Y 1 , Y 2 , W ) G ( 0 , Q ( Y 1 , Y 2 , W ) ) also includes the subset such that the conditional independence holds, ( F Y 1 , F Y 2 | F W ) CIG .
In the following subsections, it will be shown how such a random variable W can be constructed in a number of cases.
Algorithm 1 (a) gives the general case, while (b) gives the special case in which the joint measure $(Y_1, Y_2, W) \in G(0, Q_{(Y_1, Y_2, W)})$ is such that $(\mathcal{F}^{Y_1}, \mathcal{F}^{Y_2} | \mathcal{F}^W) \in \mathrm{CIG}$, via the weak stochastic realization.
Algorithm 1.
Consider the model of a tuple of Gaussian random variables of Definition 6.
(a) General case.
  • At the encoder, first compute the variables
    $\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} - E\Big[ \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \Big| \mathcal{F}^W \Big] = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} - Q_{(Y_1, Y_2), W} Q_W^{\dagger} W,$
    $Q_{(Y_1, Y_2), W} = E\Big[ \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} W^T \Big] = \begin{pmatrix} Q_{Y_1, W} \\ Q_{Y_2, W} \end{pmatrix};$
    then, the triple $(Z_1, Z_2, W)$ of jointly Gaussian random variables is such that $(Z_1, Z_2) \in G(0, Q_{(Z_1, Z_2)})$ and $(Z_1, Z_2)$ is independent of $W$.
  • The tuple of random variables $(Y_1, Y_2)$ is represented according to
    $\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = Q_{(Y_1, Y_2), W} Q_W^{\dagger} W + \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}.$
(b) Special case. Consider ( F Y 1 , F Y 2 | F W ) CIG , and assume Q W > 0 .
  • At the encoder, first compute the variables
    $Z_1 = Y_1 - E[Y_1 | \mathcal{F}^W] = Y_1 - Q_{Y_1, W} Q_W^{-1} W,$
    $Z_2 = Y_2 - E[Y_2 | \mathcal{F}^W] = Y_2 - Q_{Y_2, W} Q_W^{-1} W;$
    then the three random variables of the triple $(Z_1, Z_2, W)$ are jointly Gaussian and independent.
  • The tuple of random variables $(Y_1, Y_2)$ is represented according to
    $Y_1 = Q_{Y_1, W} Q_W^{-1} W + Z_1, \quad Y_2 = Q_{Y_2, W} Q_W^{-1} W + Z_2.$
We emphasize that $Y_1$ and $Y_2$ are conditionally independent conditioned on $W$ if and only if $Z_1$ and $Z_2$ are independent.
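As a sketch of the second-moment computations behind Algorithm 1 (ours; the function name is hypothetical and $Q_W > 0$ is assumed), the residual covariances below follow directly from $Z_i = Y_i - Q_{Y_i, W} Q_W^{-1} W$; in the Gaussian case, a vanishing cross-covariance $Q_{Z_1, Z_2}$ is equivalent to the conditional independence in part (b).

```python
import numpy as np

def algorithm1_residual_covariances(Q_y1, Q_y2, Q_y1y2, Q_y1w, Q_y2w, Q_w):
    """Covariances of the residuals Z_i = Y_i - Q_{Y_i,W} Q_W^{-1} W of
    Algorithm 1 (b), assuming Q_W > 0."""
    Qw_inv = np.linalg.inv(Q_w)
    Q_z1 = Q_y1 - Q_y1w @ Qw_inv @ Q_y1w.T
    Q_z2 = Q_y2 - Q_y2w @ Qw_inv @ Q_y2w.T
    Q_z1z2 = Q_y1y2 - Q_y1w @ Qw_inv @ Q_y2w.T   # zero iff (F^{Y1}, F^{Y2} | F^W) in CI
    Q_z1w = Q_y1w - Q_y1w @ Qw_inv @ Q_w          # always zero by construction
    return Q_z1, Q_z2, Q_z1z2, Q_z1w
```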
The validity of the statements of the algorithm follow from the next proposition.
Proposition 4.
Consider the model of a tuple of Gaussian random variables of Definition 6, for cases (a), (b).
(a)
At the encoder, the conditional expectations are correct and the definitions of Z 1 and of Z 2 are well defined.
(b)
The three random variables $(Z_1, Z_2, W)$ are independent. Consequently, the three sequences $(W^N, Z_1^N, Z_2^N)$, and the messages generated by the Gray–Wyner encoder, $f^{(E)}(Y_1^N, Y_2^N) = \bar{f}^{(E)}(W^N, Z_1^N, Z_2^N) = (S_0, S_1, S_2)$, are independent.
Proof. 
Case (a). This follows from realization theory (since no constraints are imposed). Case (b). This is a specific application of Proposition 3. □
For the definition of $C(Y_1, Y_2)$, use is made of the construction of the actual family of measures such that $(Y_1, Y_2 | W) \in \mathrm{CIG}$ holds, and of the weak stochastic realization. These are presented in Theorem 8 and Corollary 1.

2.5. Characterization of Minimal Conditional Independence of a Triple of Gaussian Random Variables

Introduce the notation of the parameterization of the family of Gaussian probability distributions
$\mathcal{P}^{CIG} = \big\{ P_{Y_1, Y_2, W}(y_1, y_2, w) \ \big|\ P_{Y_1, Y_2 | W}(y_1, y_2 | w) = P_{Y_1 | W}(y_1 | w)\, P_{Y_2 | W}(y_2 | w),\ P_{Y_1, Y_2, W}(y_1, y_2, \mathbb{W}) = P_{Y_1, Y_2}(y_1, y_2),\ (Y_1, Y_2, W) \ \text{is jointly Gaussian} \big\}.$
A subset of the set $\mathcal{P}^{CIG}$ is the set of distributions $\mathcal{P}^{CIG}_{min}$, with the additional constraint that the dimension of the random variable $W$ is minimal while all other conditions hold, defined by
$\mathcal{P}^{CIG}_{min} = \big\{ P_{Y_1, Y_2, W}(y_1, y_2, w) \in \mathcal{P}^{CIG} \ \big|\ (Y_1, Y_2 | W) \in \mathrm{CIG}_{min} \big\} \subseteq \mathcal{P}^{CIG}.$
The parameterization of the family of Gaussian probability distributions P C I G and P m i n C I G require the solution of the weak stochastic realization problem of Gaussian random variables defined by Problem 1. This problem is solved in [22] (Theorem 4.2). For the readers’ convenience, it is stated below.
Theorem 8.
Ref. [22] (Theorem 4.2). Consider a tuple $(Y_1, Y_2)$ of Gaussian random variables in the canonical variable form of Definition 1. Restrict attention to the correlated parts of these random variables. Thus, the random variables $Y_1, Y_2$ have the same dimension $n = p_1 = p_2$, and their covariance matrix $D \in \mathbb{R}^{n \times n}$ is a nonsingular diagonal matrix with the diagonal ordered real numbers in the interval $(0, 1)$. Hence,
$(Y_1, Y_2) \in G(0, Q_{(Y_1, Y_2)}) = P_0, \quad Y_1, Y_2: \Omega \to \mathbb{R}^n, \quad n \in \mathbb{Z}_+, \qquad (77)$
$Q_{(Y_1, Y_2)} = \begin{pmatrix} I & D \\ D & I \end{pmatrix}, \qquad (78)$
$D = \mathrm{Diag}(d_1, d_2, \ldots, d_n) \in \mathbb{R}^{n \times n}, \quad 1 > d_1 \ge d_2 \ge \cdots \ge d_n > 0. \qquad (79)$
That is, $p_{11} = p_{21} = 0$, $p_{13} = p_{23} = 0$.
(a) There exists a probability measure $P_1$, and a triple of Gaussian random variables $Y_1, Y_2, W: \Omega \to \mathbb{R}^n$ defined on it, such that (i) $P_1 |_{(Y_1, Y_2)} = P_0$ and (ii) $(\mathcal{F}^{Y_1}, \mathcal{F}^{Y_2} | \mathcal{F}^W) \in \mathrm{CIG}_{min}$;
(b) There exists a family of Gaussian measures, denoted by $\mathcal{P}_{ci} \subseteq \mathcal{P}^{CIG}_{min}$, that satisfy (i) and (ii) of (a); moreover, this family is parameterized by the following matrices and sets:
$G(0, Q_s(Q_W)), \quad Q_W \in \mathcal{Q}_W, \qquad (80)$
$Q_s = Q_s(Q_W) = \begin{pmatrix} I & D & D^{1/2} \\ D & I & D^{1/2} Q_W \\ D^{1/2} & Q_W D^{1/2} & Q_W \end{pmatrix}, \qquad (81)$
$\mathcal{Q}_W = \big\{ Q_W \in \mathbb{R}^{n \times n} \ \big|\ Q_W = Q_W^T, \ 0 < D \le Q_W \le D^{-1} \big\}, \qquad (82)$
$\mathcal{P}_{ci} = \big\{ G(0, Q_s(Q_W)) \ \text{on} \ (\mathbb{R}^{3n}, \mathcal{B}(\mathbb{R}^{3n})) \ \big|\ Q_W \in \mathcal{Q}_W \big\} \subseteq \mathcal{P}^{CIG}_{min}. \qquad (83)$
Furthermore, for any measure P 1 P m i n C I G , there exists a triple of state transformation of the form ( Y 1 , Y 2 , W ) ( S 1 Y 1 , S 2 Y 2 , S W W ) for nonsingular square matrices S 1 , S 2 , S W such that the corresponding measure of the three transformed variables belongs to P ci .
The application of Theorem 8 is discussed in the next remark, in the context of parameterizing any rate-triple on the Gray–Wyner lossy rate region ( R 0 , R 1 , R 2 ) R G W ( Δ 1 , Δ 2 ) that lies on the Pangloss plane.
Remark 4.
Applications of Theorem 8.
(a) Theorem 8 is a parameterization of the family of Gaussian measures P ci P m i n C I G by the entries of the covariance matrix Q W . Hence, it is at most an n ( n + 1 ) / 2 dimensional parameterization;
(b) It is shown in Section 4.4 that only a subset of the achievable rate region R G W ( Δ 1 , Δ 2 ) = R G W ( Δ 1 , Δ 2 ) is generated from distributions P ci P m i n C I G P .
The next corollary is useful for the calculation of $C(Y_1, Y_2)$, since, by Theorem 9, an achievable lower bound on $I(Y_1, Y_2; W)$ is attained by a Gaussian random variable $W$ such that the distribution $P_{Y_1, Y_2, W} \in \mathcal{P}_{ci} \subseteq \mathcal{P}^{CIG}_{min}$, corresponding to $W \in G(0, Q_W)$. By Theorem 9, and since $C(Y_1, Y_2)$ is invariant with respect to nonsingular transformations applied to $(Y_1, Y_2, W)$, the next corollary gives the realization of $(Y_1, Y_2)$ as defined in Theorem 8, by (77)–(79), expressed in terms of an arbitrary Gaussian random variable $W \in G(0, Q_W)$.
Corollary 1.
Consider a tuple ( Y 1 , Y 2 ) of Gaussian random variables in the canonical variable form of Definition 1. Restrict the attention to the correlated parts of these random variables, as defined in Theorem 8, by (77)–(79).
Then, a realization of the random variables $(Y_1, Y_2)$ which induces the family of measures $\mathcal{P}_{ci} \subseteq \mathcal{P}^{CIG}_{min}$, defined by (80)–(83), is
$Y_1 = Q_{Y_1, W} Q_W^{-1} W + Z_1,$
$Q_{Y_1, W} = D^{1/2}, \quad Z_1 \in G(0, I - D^{1/2} Q_W^{-1} D^{1/2}),$
$Y_2 = Q_{Y_2, W} Q_W^{-1} W + Z_2,$
$Q_{Y_2, W} = D^{1/2} Q_W, \quad Z_2 \in G(0, I - D^{1/2} Q_W D^{1/2}),$
$(Z_1, Z_2, W) \ \text{are independent}.$
Furthermore, the mutual information $I(Y_1, Y_2; W)$ is given by
$I(Y_1, Y_2; W) = H(Y_1, Y_2) - H(Y_1 | W) - H(Y_2 | W)$
$= \frac{1}{2} \sum_{i=1}^{n} \ln(1 - d_i^2) - \frac{1}{2} \ln\Big( \det\big( [I - D^{1/2} Q_W^{-1} D^{1/2}]\, [I - D^{1/2} Q_W D^{1/2}] \big) \Big)$
and it is parameterized by $Q_W \in \mathcal{Q}_W$, where $\mathcal{Q}_W$ is defined by the set of Equation (82).
Proof. 
The correctness of the realization is due to Proposition 2 and Theorem 8. The calculation of mutual information follows from the realization. □
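The mutual information of Corollary 1 is an explicit function of $Q_W$ and can be evaluated numerically; the sketch below (ours, hypothetical function name, $(Y_1, Y_2)$ in canonical variable form with $D = \mathrm{diag}(d)$ and $Q_W$ in the admissible set $D \le Q_W \le D^{-1}$) computes it and checks that the choice $Q_W = I$ recovers the value $\frac{1}{2} \sum_j \ln \frac{1 + d_j}{1 - d_j}$ of Theorem 4.

```python
import numpy as np

def mutual_info_Y1Y2_W(d, Q_w) -> float:
    """I(Y1, Y2; W) of Corollary 1, in nats, as a function of Q_W."""
    d = np.asarray(d, dtype=float)
    n = d.size
    D_half = np.diag(np.sqrt(d))
    Q_z1 = np.eye(n) - D_half @ np.linalg.inv(Q_w) @ D_half   # Var(Z1) = Var(Y1 | W)
    Q_z2 = np.eye(n) - D_half @ Q_w @ D_half                  # Var(Z2) = Var(Y2 | W)
    return 0.5 * np.sum(np.log(1.0 - d ** 2)) - 0.5 * np.log(np.linalg.det(Q_z1 @ Q_z2))

d = np.array([0.9, 0.5])
print(mutual_info_Y1Y2_W(d, np.eye(2)))          # Q_W = I recovers C(Y1, Y2)
print(0.5 * np.sum(np.log((1 + d) / (1 - d))))   # Wyner's common information, same value
```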

3. Wyner’s Common Information

This section is devoted to the calculation of Wyner’s common information C ( Y 1 , Y 2 ) , defined by (13), for P Y 1 , Y 2 = G ( 0 , Q ( Y 1 , Y 2 ) ) , and the construction of the weak stochastic realization of ( Y 1 , Y 2 , W ) that achieves this.

3.1. Reduction of the Calculation of Wyner’s Common Information

First, we show Theorem 9, which states: given a tuple of multivariate correlated Gaussian random variables ( Y 1 , Y 2 ) , and an arbitrary random variable W (i.e., taking continuous or discrete values), Wyner's common information C ( Y 1 , Y 2 ) , defined by (13), is minimized by a triple ( Y 1 , Y 2 , W ) which induces a jointly Gaussian distribution P Y 1 , Y 2 , W , where W : Ω W = R n is a finite-dimensional Gaussian random variable.
Theorem 9.
Consider a tuple of multivariate-correlated Gaussian random variables Y 1 : Ω R p 1 , Y 2 : Ω R p 2 , p i Z + , i = 1 , 2 with the variance matrix of this tuple denoted by
( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q ( Y 1 , Y 2 ) = Q Y 1 Q Y 1 , Y 2 Q Y 1 , Y 2 T Q Y 2 R ( p 1 + p 2 ) × ( p 1 + p 2 )
and, without loss of generality, assume that Q ( Y 1 , Y 2 ) is a positive definite matrix. Let W : Ω W be any auxiliary random variable, with W being an arbitrary measurable space, and P Y 1 , Y 2 , W any joint probability distribution of the triple ( Y 1 , Y 2 , W ) on the product space ( R p 1 × R p 2 × W ,   B ( R p 1 ) B ( R p 2 ) B ( W ) ) with ( Y 1 , Y 2 ) marginal P Y 1 , Y 2 the Gaussian distribution P Y 1 , Y 2 = G ( 0 , Q ( Y 1 , Y 2 ) ) .
The following hold.
(a) Define the random variables Z 1 , Z 2 by
Z i = Y i E [ Y i | W ] , Z i : Ω R p i , i = 1 , 2 .
The inequalities hold:
I ( Y 1 , Y 2 ; W ) = H ( Y 1 , Y 2 ) H ( Y 1 , Y 2 | W )
= H ( Y 1 , Y 2 ) H ( Y 1 | Y 2 , W ) H ( Y 2 | W )
H ( Y 1 , Y 2 ) H ( Y 1 | W ) H ( Y 2 | W )
= H ( Y 1 , Y 2 ) H ( Y 1 E [ Y 1 | W ] | W ) H ( Y 2 E [ Y 2 | W ] | W )
H ( Y 1 , Y 2 ) H ( Y 1 E [ Y 1 | W ] ) H ( Y 2 E [ Y 2 | W ] )
= H ( Y 1 , Y 2 ) H ( Z 1 ) H ( Z 2 )
H ( Y 1 , Y 2 ) H ( Z 1 ) H ( Z 2 ) if Z 1 , Z 2 have finite variances and are Gaussian.
(b) If:
(i) W : Ω W = R n is an n dimensional, n Z + , Gaussian random variable;
(ii) ( Z 1 , Z 2 , W ) are mutually independent jointly Gaussian random variables, then all inequalities in (93)–(99) hold with equality, and ( Y 1 , Y 2 , W ) induces a family of joint probability distributions P Y 1 , Y 2 , W with ( Y 1 , Y 2 ) marginal P Y 1 , Y 2 , such that W makes Y 1 and Y 2 conditionally independent, that is P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W ;
(c) Among all joint distributions P Y 1 , Y 2 , W induced by the jointly Gaussian random variables ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) of (91) and an arbitrary random variable W : Ω W , such that the ( Y 1 , Y 2 ) marginal P Y 1 , Y 2 is the Gaussian distribution P Y 1 , Y 2 = G ( 0 , Q ( Y 1 , Y 2 ) ) and P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W , a jointly Gaussian distribution achieves the lower bounds of I ( Y 1 , Y 2 ; W ) in part (a), i.e., achieves Wyner's common information C ( Y 1 , Y 2 ) defined by (13). Such a distribution P Y 1 , Y 2 , W is induced by an n dimensional, n Z + , Gaussian random variable W : Ω W = R n and ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , with the triple ( Y 1 , Y 2 , W ) represented by
Y 1 = E [ Y 1 | W ] + Z 1 , Y 2 = E [ Y 2 | W ] + Z 2 ,
( W , Z 1 , Z 2 ) are mutually independent Gaussian random variables.
Proof. 
(a) (93) is due to an identity of mutual information; (94) is due to the chain rule of entropy; (95) holds because conditioning reduces entropy; (96) is due to a property of conditional entropy; (97) holds because conditioning reduces entropy; (98) is due to definition (92); and (99) is due to the maximum entropy principle. (b) Since Y i = E [ Y i | W ] + Z i , i = 1 , 2 and ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , if (i) and (ii) hold, then all inequalities hold with equality, and the statements are easily verified. (c) Follows from part (b). □
Remark 5.
Theorem 9 shows that, among all random variables W which induce a joint distribution P Y 1 , Y 2 , W with ( Y 1 , Y 2 ) marginal P Y 1 , Y 2 the Gaussian distribution P Y 1 , Y 2 = G ( 0 , Q ( Y 1 , Y 2 ) ) , for the Wyner's common information problem C ( Y 1 , Y 2 ) it suffices to consider a jointly Gaussian triple ( Y 1 , Y 2 , W ) such that W makes Y 1 and Y 2 conditionally independent.

3.2. Wyner’s Common Information of Correlated Random Variables

Assume that the tuple of multivariate correlated Gaussian random variables ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) of Theorem 9 is already transformed to the canonical variable representation, see Definition 1, using Algorithm A1, i.e., by the nonsingular transformation, S = Block diag ( S 1 , S 2 ) . Mutual information is invariant with respect to nonsingular transformations, and I ( Y 1 , Y 2 ; W ) = I ( S 1 Y 1 , S 2 Y 2 ; W ) .
By Theorem 9. (c), the joint probability distributions P Y 1 , Y 2 , W ( y 1 , y 2 , w ) are jointly Gaussian. This family of distributions is parameterized by the multidimensional random variable W, which is such that ( Y 1 , Y 2 ) are conditionally independent, conditioned on W, the marginal distribution P Y 1 , Y 2 , W ( y 1 , y 2 , ) = P Y 1 , Y 2 ( y 1 , y 2 ) coincides with the distribution of ( Y 1 , Y 2 ) , and W represents ( Y 1 , Y 2 ) .
Using the above construction, one obtains the next theorem.
Theorem 10.
Consider a tuple of Gaussian random variables ( Y 1 , Y 2 ) G ( 0 , Q cvf ) as described and decomposed according to Algorithm A1. Restrict attention to the correlated parts of these random variables, as described in Theorem 8, (77)–(79) (i.e., only components ( Y 12 , Y 22 ) are present).
(a) Theorem 8 holds, and in particular, the family of jointly Gaussian distributions P Y 1 , Y 2 , W induced by ( Y 1 , Y 2 ) G ( 0 , Q cvf ) and a Gaussian random variable W : Ω R n , with minimum dimension n, such that the ( Y 1 , Y 2 ) marginal is P Y 1 , Y 2 = G ( 0 , Q cvf ) , and P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W , is parameterized by the family of Theorem 8. (b), i.e., (80)–(83);
(b) Corollary 1, (84)–(88) characterizes the family of realizations of ( Y 1 , Y 2 , W ) , parameterized by W, which induce jointly Gaussian distributions, such that W : Ω R n is a Gaussian random variable with minimum dimension n, P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W , and ( Y 1 , Y 2 ) G ( 0 , Q cvf ) . Moreover, Wyner's common information C ( Y 1 , Y 2 ) is computed from the expression I ( Y 1 , Y 2 ; W ) of Corollary 1, (90), optimized over Q W Q W , where Q W is defined by the set of Equation (82).
Remark 6.
It is apparent that the proof of the formula for C ( Y 1 , Y 2 ) in [15,16] is based on rate distortion functions; that is, these works do not directly address Wyner's optimization problem (13), as in Theorem 9, which first shows that, among all continuous or discrete random variables, the optimal W is Gaussian. Moreover, no parameterization of the set of distributions P Y 1 , Y 2 , W achieving conditional independence P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W is given there, i.e., the optimization over the parameterized family of Gaussian measures of Theorem 8 is not given.
In the next theorem, the family of measures P ci P m i n C I G , defined by (80)–(83), which leads to realization of ( Y 1 , Y 2 ) , given in Corollary 1, is ordered for the determination of a single joint distribution P Y 1 , Y 2 , W P ci P m i n C I G , which achieves C ( Y 1 , Y 2 ) . This leads to the realization of ( Y 1 , Y 2 ) expressed in terms of W and vectors of independent Gaussian random variables ( Z 1 , Z 2 ) , one for each realization, each having independent components.
Theorem 11.
Consider a tuple ( Y 1 , Y 2 ) of Gaussian random variables in the canonical variable form of Definition 1. Restrict attention to the correlated parts of these random variables, as defined in Theorem 8, by (77)–(79).
The following hold.
(a) The information quantity C ( Y 1 , Y 2 ) is given by
C ( Y 1 , Y 2 ) = 1 2 i = 1 n ln 1 + d i 1 d i = 1 2 i = 1 n ln 1 + 2 d i 1 d i ( 0 , ) .
(b) The realizations of the random variables ( Y 1 , Y 2 , W ) that achieve C ( Y 1 , Y 2 ) are represented by
V : Ω R n , V G ( 0 , I ) , the vector V has independent components, F V , F Y 1 F Y 2 , are independent σ-algebras,
L 1 = L 2 = D 1 / 2 ( I + D ) 1 R n × n ,
L 3 = ( I D ) 1 / 2 ( I + D ) 1 / 2 R n × n , L 1 , L 2 , L 3 , are diagonal matrices,
W = L 1 Y 1 + L 2 Y 2 + L 3 V , W : Ω R n ,
Z 1 = Y 1 D 1 / 2 W , Z 1 : Ω R n ,
Z 2 = Y 2 D 1 / 2 W , Z 2 : Ω R n .
Then:
Z 1 G ( 0 , ( I D ) ) , Z 2 G ( 0 , ( I D ) ) , W G ( 0 , I ) ;
( Z 1 , Z 2 , W ) , are independent and
Y 1 = D 1 / 2 W + Z 1 , Y 2 = D 1 / 2 W + Z 2
hence, the variables ( Y 1 , Y 2 , W ) induce a distribution P Y 1 , Y 2 , W P ci P m i n C I G . Note that, in addition, each of the random variables Z 1 , Z 2 , and W have independent components.
(c) The variables ( Y 1 , Y 2 , W ) defined in (b) induce a distribution P Y 1 , Y 2 , W P ci P m i n C I G which achieves C ( Y 1 , Y 2 ) ,
C ( Y 1 , Y 2 ) = I ( Y 1 , Y 2 ; W ) .
Proof. 
By Theorem 9, the random variables ( Y 1 , Y 2 , W ) are restricted to jointly Gaussian random variables. Since mutual information I ( Y 1 , Y 2 ; W ) is invariant with respect to nonsingular transformations S 1 , S 2 , i.e., I ( Y 1 , Y 2 ; W ) = I ( S 1 Y 1 , S 2 Y 2 ; W ) , and ( F Y 1 , F Y 2 | F W ) CIG is equivalent to ( F S 1 Y 1 , F S 2 Y 2 | F W ) CIG , it suffices to consider the canonical variable form of Definition 1, and to construct a measure that carries a triple of jointly Gaussian random variables Y 1 , Y 2 , W : Ω R n such that ( F Y 1 , F Y 2 | F W ) CIG .
(a) (1) Take a probability measure P 1 such that there exists a triple of Gaussian random variables Y 1 , Y 2 , W : Ω R n with P 1 | ( Y 1 , Y 2 ) = P 0 and ( F Y 1 , F Y 2 | F W ) CIG . It will first be proven that attention can be restricted to those state random variables W whose dimension equals n = p 12 = p 22 .
Suppose that there exists a state random variable W : Ω R n 1 such that ( F Y 1 , F Y 2 | F W ) CIG and n 1 > n . Hence, W does not make ( Y 1 , Y 2 ) minimally conditionally independent. Construct a minimal vector which makes the tuple minimally conditionally independent according to the procedure of [22] (Proposition 3.5). Thus,
W 1 = E [ Y 1 | F W ] = L 11 W , L 11 R n × n 1 , W 2 = E [ Y 2 | F W 1 ] = L 12 W 1 , L 12 R n × n .
Then, ( F Y 1 , F Y 2 | F W 2 ) CIG min and the dimension of W 2 is n = p 12 = p 22 . Determine a linear transformation of W 2 by a matrix L 15 R n × n such that
W 3 = L 15 W 2 = L 15 L 12 L 11 W = L 13 W , L 13 = L 15 L 12 L 11 , W 3 G ( 0 , Q 3 ) , Q 3 = I n = L 13 Q W L 13 T .
It is then possible to construct a matrix L 14 R ( n 1 n ) × n 1 such that
W 4 = L 14 W , W 4 G ( 0 , Q 4 ) , Q 4 = I , L 14 Q W L 13 T = 0 ; W 3 W 4 G ( 0 , I n 1 ) , rank L 13 L 14 = n 1 ,
and, due to L 14 Q W L 13 T = 0 , W 3 , W 4 are independent random variables. See [23] (Theorem 4.9) for a theorem with which the existence of L 14 can be proven. Note further that F W = F W 3 , W 4 .
Hence, the random variables W 3 , W 4 are independent, ( F Y 1 , F Y 2 | F W 3 ) CIG min , and I ( Y 1 , Y 2 ; W ) = I ( Y 1 , Y 2 ; W 3 , W 4 ) .
By properties of mutual information, it now follows that
I ( Y 1 , Y 2 ; W 3 , W 4 ) I ( Y 1 , Y 2 ; W 3 ) = H ( Y 1 , Y 2 ) + H ( W 3 , W 4 ) H ( Y 1 , Y 2 , W 3 , W 4 ) H ( Y 1 , Y 2 ) H ( W 3 ) + H ( Y 1 , Y 2 , W 3 ) = H ( Y 1 , Y 2 , W 3 ) + H ( W 4 ) H ( Y 1 , Y 2 , W 3 , W 4 ) , b y   i n d e p e n d e n c e   o f W 3 a n d W 4 ; = I ( Y 1 , Y 2 , W 3 ; W 4 ) 0 .
Thus, for the computation of C ( Y 1 , Y 2 ) , attention can be restricted to those state variables W which are of minimal dimension.
(2) Take a probability measure P 1 such that there exists a triple of Gaussian random variables Y 1 , Y 2 , W : Ω R n with P 1 | ( Y 1 , Y 2 ) = P 0 and ( F Y 1 , F Y 2 | F W ) CIG min .
According to [22] (Theorem 4.2), there exist in general many such measures which are parameterized by the matrices and the sets, as stated in Theorem 8, (b), and defined by (80)–(83).
(3) Then, the mutual information of the triple of Gaussian random variables is calculated, using Theorem 8. (b) for any choice of Q W Q W , where Q W is given by (82). Then
I ( Y 1 , Y 2 ; W ) = H ( Y 1 , Y 2 ) H ( Y 1 | W ) H ( Y 2 | W ) .
The following calculations are then obvious:
det ( Q ( Y 1 , Y 2 ) ) = det I D D I = det ( I D 2 ) = i = 1 n ( 1 d i 2 ) ; H ( Y 1 , Y 2 ) = 1 2 ln ( det ( Q ( Y 1 , Y 2 ) ) ) + 1 2 ( 2 n ) ln ( 2 π e ) = 1 2 i = 1 n ln ( 1 d i 2 ) + n ln ( 2 π e ) ; P Y 1 | W ( y 1 | w ) G ( E [ Y 1 | F W ] , Q Y 1 | W ) , E [ Y 1 | F W ] = Q Y 1 , W Q W 1 W = D 1 / 2 Q W 1 W ; by (81) Q Y 1 | W = I Q Y 1 , W Q W 1 Q W Q W 1 Q Y 1 , W T = I D 1 / 2 Q W 1 D 1 / 2 ; by (81) H ( Y 1 | W ) = 1 2 ln ( det ( I D 1 / 2 Q W 1 D 1 / 2 ) ) + 1 2 n ln ( 2 π e ) ; E [ Y 2 | F W ] = Q Y 2 , W Q W 1 W = D 1 / 2 Q W Q W 1 W = D 1 / 2 W ; Q Y 2 | W = I Q Y 2 , W Q W 1 Q W Q W 1 Q Y 2 , W T = I D 1 / 2 Q W D 1 / 2 ; H ( Y 2 | W ) = 1 2 ln ( det ( I D 1 / 2 Q W D 1 / 2 ) ) + 1 2 n ln ( 2 π e ) .
From the above calculations, it then follows
I ( Y 1 , Y 2 ; W ) = H ( Y 1 , Y 2 ) H ( Y 1 | W ) H ( Y 2 | W )
= 1 2 i = 1 n ln ( 1 d i 2 ) + n ln ( 2 π e )
1 2 ln ( det ( I D 1 / 2 Q W 1 D 1 / 2 ) ) 1 2 n ln ( 2 π e )
1 2 ln ( det ( I D 1 / 2 Q W D 1 / 2 ) ) 1 2 n ln ( 2 π e )
= 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 ln ( det ( [ I D 1 / 2 Q W 1 D 1 / 2 ] [ I D 1 / 2 Q W D 1 / 2 ] ) ) .
The above calculations verify the statements of Corollary 1.
(4) The computation of C ( Y 1 , Y 2 ) requires the solution of an optimization problem.
C ( Y 1 , Y 2 ) = inf P 1 P ci I ( Y 1 , Y 2 ; W ) = inf Q W Q W 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 ln ( det ( [ I D 1 / 2 Q W 1 D 1 / 2 ] [ I D 1 / 2 Q W D 1 / 2 ] ) ) .
Since the first term in (115), 1 2 i = 1 n ln ( 1 d i 2 ) , does not depend on Q W and the natural logarithm is a strictly increasing function, then
C ( Y 1 , Y 2 ) is equivalent to: sup Q W Q W det ( I D 1 / 2 Q W 1 D 1 / 2 ) ( I D 1 / 2 Q W D 1 / 2 ) .
Define:
L 1 ( Q W ) = ( I D 1 / 2 Q W 1 D 1 / 2 ) ( I D 1 / 2 Q W D 1 / 2 ) ,
f 1 ( Q W ) = det ( L 1 ( Q W ) ) .
Note that the expression L 1 ( Q W ) R n × n is a non-symmetric square matrix in general.
It will be proven that
f 1 ( Q W ) = det ( L 1 ( Q W ) ) det ( [ I D ] 2 ) , Q W Q W ,
det ( L 1 ( Q W ) ) = det ( [ I D ] 2 ) if and only if Q W = I .
From these two relations, it follows that Q W = I R n × n is the unique solution of the supremization problem.
The inequality in (119) follows from Proposition A4. The equality of (120) is proven in two steps. If Q W = I , then the equality of (120) holds as follows from direct substitution in (117). The converse is proven by contradiction. Suppose that Q W I . Then, it again follows from Proposition A4 that strict inequality holds in (119). Hence, the equality is proven.
(5) Finally, the value of C ( Y 1 , Y 2 ) is computed for Q W = I .
C ( Y 1 , Y 2 ) = 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 ln ( det ( I D 1 / 2 ( Q W ) 1 D 1 / 2 ) ) 1 2 ln ( det ( I D 1 / 2 ( Q W ) D 1 / 2 ) ) = 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 2 ln ( det ( I D ) ) = 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 i = 1 n ln ( ( 1 d i ) 2 ) = 1 2 i = 1 n ln 1 d i 2 ( 1 d i ) 2 = 1 2 i = 1 n ln ( 1 + d i 1 d i ) = 1 2 i = 1 n ln 1 + 2 d i 1 d i .
(b) It follows from part (a) of the theorem that C ( Y 1 , Y 2 ) is attained as the mutual information I ( Y 1 , Y 2 ; W ) for a random variable W with Q W = Q = I . Consider now a triple of random variables ( Y 1 , Y 2 , W ) G ( 0 , Q s ( I ) ) as defined in (80)–(83), hence, Q W = I . Denote the random variable W from now on by W to indicate that it achieves the infimum of the definition of C ( Y 1 , Y 2 ) . Thus, Q W = I and
( Y 12 , Y 22 , W ) G ( 0 , Q s ( I ) ) , Q s ( I ) = I D D 1 / 2 D I D 1 / 2 D 1 / 2 D 1 / 2 I > 0 .
Let V : Ω R n 12 be a Gaussian random variable with V G ( 0 , I ) which is independent of ( Y 1 , Y 2 , W ) .
Define the new state variable W ¯ = L 1 Y 1 + L 2 Y 2 + L 3 V . Then, ( Y 1 , Y 2 , V , W ) are jointly Gaussian and it has to be shown that Q W ¯ = I , Q Y 1 , W ¯ = D 1 / 2 , and Q Y 2 , W ¯ = D 1 / 2 . These equalities follow from simple calculations using the expressions of L 1 , L 2 , and L 3 ; the calculations are omitted. It then follows from those calculations and the definition of the Gaussian measure G ( 0 , Q s ( I ) ) that, almost surely, W ¯ = W .
The signals are then represented by
Z 1 = Y 1 E [ Y 1 | F W ] = Y 1 Q Y 1 , W ( Q W ) 1 W = Y 1 D 1 / 2 W ,
Z 2 = Y 2 E [ Y 2 | F W ] = Y 2 Q Y 2 , W ( Q W ) 1 W = Y 2 D 1 / 2 W .
It is proven that the triple of random variables ( Z 1 , Z 2 , W ) are independent.
E [ Z 1 ( W ) T ] = E [ Y 1 ( W ) T ] D 1 / 2 E [ W ( W ) T ] = D 1 / 2 D 1 / 2 = 0 , E [ Z 2 ( W ) T ] = 0 , E [ Z 1 Z 2 T ] = E [ ( Y 1 D 1 / 2 W ) ( Y 2 D 1 / 2 W ) T ] = 0 .
Hence, the original signals are represented as shown by the formulas,
Y 1 = Z 1 + D 1 / 2 W , by (121), Q Y 1 , W Q W 1 = D 1 / 2 , and by definition of Z 1 ; Y 2 = Z 2 + D 1 / 2 W , similarly. □
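The construction in part (b) of Theorem 11 can be verified at the level of second moments. The sketch below (assumed values of d i only, chosen for illustration) forms W = L 1 Y 1 + L 2 Y 2 + L 3 V through covariance algebra and confirms Q W = I , Q Y 1 , W = D 1 / 2 and Q Z 1 = I D , together with the value of C ( Y 1 , Y 2 ) in (102).

```python
import numpy as np

d = np.array([0.8, 0.5, 0.3])                      # assumed canonical correlations
n = len(d)
I, D, Dh = np.eye(n), np.diag(d), np.diag(np.sqrt(d))

L1 = Dh @ np.linalg.inv(I + D)                     # L1 = L2 = D^{1/2}(I+D)^{-1}
L2 = L1
L3 = np.diag(np.sqrt((1 - d) / (1 + d)))           # (I-D)^{1/2}(I+D)^{-1/2}

# Covariance of the stacked vector (Y1, Y2, V); V ~ G(0, I) independent of (Y1, Y2).
Q = np.block([[I, D, np.zeros((n, n))],
              [D, I, np.zeros((n, n))],
              [np.zeros((n, n)), np.zeros((n, n)), I]])
T = np.hstack([L1, L2, L3])                        # W = L1 Y1 + L2 Y2 + L3 V

Q_W = T @ Q @ T.T
Q_Y1W = np.hstack([I, np.zeros((n, n)), np.zeros((n, n))]) @ Q @ T.T
Q_Z1 = I - Dh @ Q_Y1W.T - Q_Y1W @ Dh + Dh @ Q_W @ Dh   # cov of Z1 = Y1 - D^{1/2} W

print(np.allclose(Q_W, I), np.allclose(Q_Y1W, Dh), np.allclose(Q_Z1, I - D))

# Wyner's common information (102) for this assumed d.
print(0.5 * np.sum(np.log((1 + d) / (1 - d))))
```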

3.3. Wyner’s Common Information of Arbitrary Gaussian Random Variables

First, the two special cases of (1) a tuple of independent Gaussian random variables and (2) a tuple of identical Gaussian random variables are analyzed. From those results and that of the previous subsection, one can then prove Wyner’s common information for arbitrary Gaussian random variables.
The special case of the canonical variable form with only private parts is presented below.
Proposition 5.
Consider the case of a tuple of Gaussian vectors with only private parts. Hence, the Gaussian measure is
( Y 13 , Y 23 ) G ( 0 , Q ( Y 13 , Y 23 ) ) , Q ( Y 13 , Y 23 ) = I 0 0 I , Y 13 : Ω R p 13 , Y 23 : Ω R p 23 .
(a) 
The minimal σ-algebra F W which makes Y 13 , Y 23 conditionally independent is the trivial σ-algebra denoted by F 0 = { ∅ , Ω } . Thus, ( F Y 13 , F Y 23 | F 0 ) CI . The random variable W, in this case, is the constant W 3 = 0 R , hence F W 3 = F 0 .
(b) 
Then, W = W 3 and
C ( Y 1 , Y 2 ) = C ( Y 13 , Y 23 ) = I ( Y 13 , Y 23 ; W 3 ) = 0 .
(c) 
The weak stochastic realization that achieves C ( Y 13 , Y 23 ) = 0 is
Z 1 = Y 13 , Z 2 = Y 23 , W 3 = 0 .
The special case of canonical variable form with only identical parts is presented below.
Proposition 6.
Consider the case of a tuple of Gaussian vectors with only the identical part. Hence the Gaussian measure is,
Y 11 : Ω R p 11 , Y 21 : Ω R p 21 , p 11 = p 21 , ( Y 11 , Y 21 ) G ( 0 , Q ( Y 11 , Y 21 ) ) , Q ( Y 11 , Y 21 ) = I I I I , Y 11 = Y 21 a.s.
(a) 
The only minimal σ-algebra which makes Y 11 and Y 21 Gaussian conditional-independent is F Y 11 = F Y 21 . The state variable is thus, W 1 = Y 11 = Y 21 and F W = F Y 11 = F Y 21 .
(b) 
Then C ( Y 1 , Y 2 ) = C ( Y 11 , Y 21 ) = + ∞ .
(c) 
The weak stochastic realization is again simple: the variable W equals the identical component and there is no need to use the signals Z 1 and Z 2 . Thus, the representations are,
Z 1 = 0 R , Z 2 = 0 R , W = Y 11 = Y 21 .
Theorem 4 is now proven. Thus, the setting is that of a tuple of arbitrary Gaussian random variables, not necessarily restricted to the correlated parts of these random variables of Theorem 8, by (77)–(79). It is shown that C ( Y 1 , Y 2 ) is computed by a decomposition and by the use of the formulas previously obtained in Section 3.2.
Proof of Theorem 4.
(a)
C ( Y 1 , Y 2 ) = inf ( Y 1 , Y 2 , W ) CIG I ( Y 1 , Y 2 ; W ) = inf I ( Y 11 , Y 21 ; W 1 ) + I ( Y 12 , Y 22 ; W 2 ) + I ( Y 13 , Y 23 ; 0 ) , by Proposition A1, inf ( Y 1 , Y 2 , W ) CIG I ( Y 11 , Y 21 ; W 1 ) + inf ( Y 1 , Y 2 , W ) CIG I ( Y 12 , Y 22 ; W 2 ) + inf ( Y 1 , Y 2 , W ) CIG I ( Y 13 , Y 23 ; 0 ) = inf ( Y 11 , Y 21 , W 1 ) CIG I ( Y 11 , Y 21 ; W 1 ) + inf ( Y 12 , Y 22 , W 2 ) CIG I ( Y 12 , Y 22 ; W 2 ) + I ( Y 13 , Y 23 ; 0 ) = C ( Y 11 , Y 21 ; W 1 ) + C ( Y 12 , Y 22 ; W 2 ) + C ( Y 13 , Y 23 ; 0 ) = 0 , if p 13 > 0 , p 23 > 0 , p 11 = p 12 = p 21 = p 22 = 0 , 1 2 i = 1 n ln 1 + d i 1 d i , if p 12 = p 22 > 0 , p 11 = p 21 = 0 , p 13 0 , p 23 0 , + ∞ , else.
The latter equality follows from, respectively, Proposition 6, Theorem 11 and Proposition 5 (a and b). It will be shown that C ( Y 1 , Y 2 ) is less than or equal to the right-hand side of Equation (129). From the latter inequality and the above inequality, the expression of Equation (129) then follows.
To be specific, it will be proven that C ( Y 1 , Y 2 ) is less than or equal to the expression I ( Y 1 , Y 2 ; W ) where W is defined in statement (b) of the proposition. It then follows from the proof of Theorem 11 that ( F Y 12 , F Y 22 | F W 2 ) CIG min .
Then:
C ( Y 1 , Y 2 ) = inf ( Y 1 , Y 2 | W ) CIG I ( Y 1 , Y 2 ; W ) I ( Y 1 , Y 2 ; W ) = I ( Y 11 , Y 21 ; W 1 ) + I ( Y 12 , Y 22 ; W 2 ) + I ( Y 13 , Y 23 ; 0 ) = 0 , if p 13 > 0 , p 23 > 0 , p 11 = p 12 = p 21 = p 22 = 0 , 1 2 i = 1 n ln 1 + d i 1 d i , if p 12 = p 22 > 0 , p 11 = p 21 = 0 , p 13 0 , p 23 0 , + ∞ , else.
The latter equality is proven as follows. In the first case, when p 13 > 0 , p 23 > 0 , and p 11 = p 12 = p 21 = p 22 = 0 , then Y 1 = Y 13 and Y 2 = Y 23 are independent random variables. It then follows from Proposition 5 that I ( Y 1 , Y 2 ; 0 ) = I ( Y 13 , Y 23 ; 0 ) = 0 . In the second case, when p 12 = p 22 > 0 , p 13 0 , p 23 0 , and p 11 = p 21 = 0 , it follows from Proposition A1 and from Theorem 11 that
I ( Y 1 , Y 2 ; W ) = I ( Y 12 , Y 22 ; W 2 ) + I ( Y 13 , Y 23 ; 0 ) = 1 2 i = 1 n ln 1 + d i 1 d i .
In the third case, when p 11 = p 21 > 0 and the other p i j indices are arbitrary, then I ( Y 1 , Y 2 ; W ) = + ∞ . Hence, the inequality C ( Y 1 , Y 2 ) ≤ right-hand side is proven and hence equality holds.
(c) This directly follows from Proposition 4. See also Section 3.6 of [21]. □
A procedure for the numerical calculation of Wyner's common information is given in Section 3.7 of [21].
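A direct implementation of the case formula (129) is immediate once the canonical variable form is available. The following sketch (a hypothetical helper; it assumes the indices p 11 , p 12 , p 13 , p 21 , p 22 , p 23 and the coefficients d i have already been produced by Algorithm A1) evaluates the three cases.

```python
import numpy as np

def wyner_common_information(p11, p12, p13, p21, p22, p23, d):
    """Evaluate C(Y1, Y2) following the case formula (129).

    p11..p23 are the canonical-variable-form dimensions (identical,
    correlated and private parts); d holds the canonical correlation
    coefficients d_i in (0, 1) of the correlated parts (len(d) == p12 == p22).
    """
    if p11 == p21 > 0:
        return np.inf                       # identical parts present
    if p12 == p22 > 0:
        d = np.asarray(d, dtype=float)
        return 0.5 * np.sum(np.log((1 + d) / (1 - d)))
    return 0.0                              # only private parts

# Examples with assumed dimensions and coefficients.
print(wyner_common_information(0, 3, 2, 0, 3, 1, [0.8, 0.5, 0.3]))
print(wyner_common_information(1, 0, 0, 1, 0, 0, []))   # identical parts -> inf
```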

4. Parametrization of Gray and Wyner Rate Region and Wyner’s Lossy Common Information

This section is devoted to the characterizations of rates that lie in the Gray–Wyner rate region for a tuple of Gaussian random variables with square error distortion functions.
By Gray–Wyner [1] (Theorem 8), reproduced in Theorem 1 to characterize rate triples ( R 0 , R 1 , R 2 ) R G W ( Δ 1 , Δ 2 ) , it is necessary to:
(i) Characterize the rate distortion functions R Y i ( Δ i ) , R Y i | W ( Δ i ) , i = 1 , 2 and R Y 1 , Y 2 ( Δ 1 , Δ 2 ) ;
(ii) Construct the realizations that induce the test channels of R Y i ( Δ i ) , R Y i | W ( Δ i ) , i = 1 , 2 and R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , and understand the structural properties of the realizations.

4.1. Characterizations of Joint, Conditional and Marginal RDFs

Theorem 12 is the characterization of the joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) from [24].
Theorem 12.
Ref. [24] Consider a tuple of Gaussian random variables Y i : Ω R p i , with ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q ( Y 1 , Y 2 ) 0 (which implies Q Y i > 0 , for i = 1 , 2 ). Consider the joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) with square error distortion functions D Y 1 ( y 1 , y ^ 1 ) = | | y 1 y ^ 1 | | R p 1 2 , D Y 2 ( y 2 , y ^ 2 ) = | | y 2 y ^ 2 | | R p 2 2 . Then, the following hold:
(a) The mutual information I ( Y 1 , Y 2 ; Y ^ 1 , Y ^ 2 ) satisfies
I ( Y 1 , Y 2 ; Y ^ 1 , Y ^ 2 ) I ( Y 1 , Y 2 ; E Y 1 | F Y ^ 1 , Y ^ 2 , E Y 2 | F Y ^ 1 , Y ^ 2 )
and the mean square error satisfies
E | | Y i Y ^ i | | R p 1 2 E | | Y i E Y i | F Y ^ 1 , Y ^ 2 | | R p i 2 , i = 1 , 2 .
Moreover, inequalities in (130) and (131) hold with equality if there exists a jointly Gaussian realization of ( Y ^ 1 , Y ^ 2 ) or a Gaussian test channel distribution P Y ^ 1 , Y ^ 2 | Y 1 , Y 2 such that the joint distribution P Y ^ 1 , Y ^ 2 , Y 1 , Y 2 is jointly Gaussian, and such that the following identities both hold;
E Y 1 | F Y ^ 1 , Y ^ 2 = E Y 1 | F Y ^ 1 = Y ^ 1 , E Y 2 | F Y ^ 1 , Y ^ 2 = E Y 2 | F Y ^ 2 = Y ^ 2 .
(b) A realization that achieves the lower bounds of part (a), i.e., satisfies (132), is the Gaussian realization of ( Y 1 , Y 2 , Y ^ 1 , Y ^ 2 ) given by
Y ^ 1 Y ^ 2 = H Y 1 Y 2 + V 1 V 2
( V 1 , V 2 ) G ( 0 , Q ( V 1 , V 2 ) ) , ( V 1 , V 2 ) independent of ( Y 1 , Y 2 ) ,
H Q ( Y 1 , Y 2 ) = Q ( Y 1 , Y 2 ) Q ( E 1 , E 2 ) 0 ,
Q ( V 1 , V 2 ) = Q ( E 1 , E 2 ) H T = H Q ( Y 1 , Y 2 ) H Q ( Y 1 , Y 2 ) H T 0 , if Q ( Y 1 , Y 2 ) > 0 then
H = I p 1 + p 2 Q ( E 1 , E 2 ) Q ( Y 1 , Y 2 ) 1 , Q ( V 1 , V 2 ) = Q ( E 1 , E 2 ) Q ( E 1 , E 2 ) Q ( Y 1 , Y 2 ) 1 Q ( E 1 , E 2 ) 0 ,
where ( E 1 , E 2 ) is the error tuple, that satisfies the structural property,
E i = Y i E Y i | F Y ^ 1 , Y ^ 2 = Y i E Y i | F Y ^ i = Y i Y ^ i , i = 1 , 2
and the variance matrix of this tuple is,
( E 1 , E 2 ) G ( 0 , Q ( E 1 , E 2 ) ) , Q ( E 1 , E 2 ) = Q E 1 Q E 1 , E 2 Q E 1 , E 2 T Q E 2 R ( p 1 + p 2 ) × ( p 1 + p 2 ) .
(c) The joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) is characterized by
R Y 1 , Y 2 ( Δ 1 , Δ 2 ) = inf E | | E i | | R p i 2 = trace ( Q E i ) Δ i , i = 1 , 2 1 2 ln det ( Q ( Y 1 , Y 2 ) ) det ( Q ( E 1 , E 2 ) ) + [ 0 , ] ,
such that Q ( Y ^ 1 , Y ^ 2 ) = Q ( Y 1 , Y 2 ) Q ( E 1 , E 2 ) 0
where the test channel distribution P Y ^ 1 , Y ^ 2 | Y 1 , Y 2 or the joint distribution P Y ^ 1 , Y ^ 2 , Y 1 , Y 2 is induced by the realization of part (b).
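The structural properties of the realization in Theorem 12. (b) are easy to confirm numerically. The sketch below (assumed covariances Q ( Y 1 , Y 2 ) and Q ( E 1 , E 2 ) with p 1 = p 2 = 1 , chosen only for illustration) builds H and Q ( V 1 , V 2 ) from (137) and checks that Cov ( Y , Y ^ ) = Cov ( Y ^ ) = Q ( Y 1 , Y 2 ) Q ( E 1 , E 2 ) , which for zero-mean jointly Gaussian variables is equivalent to the identities (132).

```python
import numpy as np

# Assumed source covariance Q_{(Y1,Y2)} > 0 (p1 = p2 = 1) and error covariance Q_{(E1,E2)}.
Q_Y = np.array([[1.0, 0.6],
                [0.6, 1.0]])
Q_E = np.array([[0.20, 0.05],
                [0.05, 0.30]])
assert np.all(np.linalg.eigvalsh(Q_Y - Q_E) > 0)   # Q_{(Yhat1,Yhat2)} > 0 required

# Realization Yhat = H Y + V with H and Q_V as in (137).
H = np.eye(2) - Q_E @ np.linalg.inv(Q_Y)
Q_V = Q_E - Q_E @ np.linalg.inv(Q_Y) @ Q_E

Q_Yhat = H @ Q_Y @ H.T + Q_V                       # Cov(Yhat)
Q_YYhat = Q_Y @ H.T                                # Cov(Y, Yhat)

print(np.allclose(Q_Yhat, Q_Y - Q_E))              # True
print(np.allclose(Q_YYhat, Q_Yhat))                # True  =>  E[Y | Yhat] = Yhat

# Value of the pay-off in (139) for this (Q_Y, Q_E).
print(max(0.5 * np.log(np.linalg.det(Q_Y) / np.linalg.det(Q_E)), 0.0))
```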
The conditional rate distortion function, derived in [25,26], is also required.
Theorem 13.
Ref. [25] (Theorem 1, Theorem 4), [26] Consider a triple of random variables Y i : Ω R p i , i = 1 , 2 , W : Ω W , where W is continuous or finite-valued, with joint distribution P Y 1 , Y 2 , W , and marginal distributions P Y 1 , Y 2 and P Y i , i = 1 , 2 , the jointly Gaussian distribution ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q ( Y 1 , Y 2 ) 0 and Y i G ( 0 , Q Y i ) , Q Y i > 0 , i = 1 , 2 , respectively. Consider the conditional RDFs R Y i | W ( Δ i ) , i = 1 , 2 with square error distortion functions D Y i ( y i , y ^ i ) = | | y i y ^ i | | R p i 2 , i = 1 , 2 . Then, the following hold.
(a) For an arbitrary random variable, W : Ω W , the mutual information I ( Y i ; Y ^ i | W ) satisfies
I ( Y i ; Y ^ i | W ) I ( Y i ; E Y i | F Y ^ i , W | W ) , i = 1 , 2
and the mean square error satisfies
E | | Y i Y ^ i | | R p i 2 E | | Y i E Y i | F Y ^ i , W | | R p i 2 , i = 1 , 2 .
Moreover, inequalities in (141) and (142) hold with equality, if there exists a realization of Y ^ i of the test channel distribution P Y ^ i | Y i , W , such that the joint distribution P Y ^ i , Y i , W satisfies the identity:
X ^ i cm E Y i | F Y ^ i , W = E Y i | F Y ^ i = Y ^ i , i = 1 , 2 .
(b) Suppose P Y 1 , Y 2 , W is a jointly Gaussian distribution and W : Ω R n is Gaussian. A realization that achieves the lower bounds of part (a), i.e., satisfies (143), is the Gaussian realization of ( Y i , W , Y ^ i ) given by
Y ^ i = H i Y i + I p i H i Q Y i , W Q W W + V i , i = 1 , 2 ,
V i G ( 0 , Q V i ) , V i independent of Y i ,
H i Q Y i | W = Q Y i | W Q E i 0 ,
Q V i = H i Q E i = H i Q Y i | W H i Q Y i | W H i T 0 ,
where † denotes the pseudoinverse of a matrix, and E i is the error that satisfies the structural property,
E i = Y i E Y i | F Y ^ i , W = Y i Y ^ i , E i G ( 0 , Q E i ) , i = 1 , 2 .
The RDF R Y i | W ( Δ i ) for jointly Gaussian ( Y 1 , Y 2 , W ) is characterized by
R Y i | W ( Δ i ) = inf E | | E i | | R p i 2 = trace ( Q E i ) Δ i 1 2 ln det ( Q Y i | W ) det ( Q E i ) + [ 0 , ] , i = 1 , 2 ,
such that Q Y ^ i = Q Y i | W Q E i 0 ,
where the test channel distribution P Y ^ i | Y i , W or the joint distribution P Y ^ i , Y i , W is induced by the above realization.
The following is stated as a conjecture, because it is not shown in this paper; it can be shown using Theorem 13.
Conjecture 1.
Conditional RDF of Gaussian sources with arbitrary conditioning RV
Consider a triple of random variables Y i : Ω R p i , i = 1 , 2 , W : Ω W , where W is continuous or finite-valued, with joint distribution P Y 1 , Y 2 , W , and marginal distributions P Y 1 , Y 2 and P Y i , i = 1 , 2 , the jointly Gaussian distribution ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q ( Y 1 , Y 2 ) 0 and Y i G ( 0 , Q Y i ) , Q Y i > 0 , i = 1 , 2 , respectively. Consider the conditional RDFs R Y i | W ( Δ i ) , i = 1 , 2 with square error distortion functions D Y i ( y i , y ^ i ) = | | y i y ^ i | | R p i 2 , i = 1 , 2 .
Then, the following hold.
(a) For an arbitrary random variable, W : Ω W , and X ^ i cm satisfying (143), the following lower bounds hold.
I ( X i ; X ^ i | W ) I ( X i ; X ^ i cm | W ) , i = 1 , 2
= W I ( X i ; X ^ i cm | W = w ) P W ( d w )
inf w W I ( X i ; X ^ i cm | W = w ) ,
E [ D X i ( X i , X ^ i ) ] = W R p 1 × R p 2 D X i ( x i , x ^ i ) P X i , X ^ i | W ( x i , x ^ i | w ) P W ( d w )
= W Δ i ( w ) P W ( d w ) ,
W Δ i cm ( w ) P W ( d w ) ,
inf w W Δ i cm ( w ) , i = 1 , 2 ,
where
Δ i ( w ) E [ D X i ( X i , X ^ i ) | W = w ] , Δ i cm ( w ) E [ D X i ( X i , X ^ i cm ) | W = w ] , i = 1 , 2 .
Moreover, the inequalities in (153), (157), are achieved if,
(i) (141) holds, and
(ii) the mutual information I ( X i ; X ^ i | W = w ) and Δ i ( w ) for i = 1 , 2 , are independent of w W .
(b) The rate distortion function R X i | W ( Δ i ) , for W : Ω W continuous or finite-valued, achieves a minimum value if,
(i) W : Ω R n , n Z + is Gaussian, and P Y i , W is jointly Gaussian,
(ii) ( X ^ i , X i , W ) is given by the realization of Theorem 13.(b) for i = 1 , 2 .
The characterization of the marginal RDFs R Y i ( Δ i ) , i = 1 , 2 —which are well-known, and can be found in many books—is also needed. The weak realization of the test channel, which follows from Theorem 13 (see also [25]) as a degenerate case, and is summarized in the next theorem, is important in this paper.
Theorem 14.
Ref. [25] (Theorem 1, Theorem 4) Consider a tuple of Gaussian random variables Y i : Ω R p i , i = 1 , 2 , with ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q ( Y 1 , Y 2 ) > 0 , Q Y i > 0 , for i = 1 , 2 .
For the marginal RDFs R Y i ( Δ i ) , i = 1 , 2 with square error distortion functions D Y i ( y i , y ^ i ) = | | y i y ^ i | | R p i 2 , the statements of Theorem 13 hold with W generating the trivial information, i.e., F W = { Ω , ∅ } . That is, the marginal RDFs R Y i ( Δ i ) are characterized by
R Y i ( Δ i ) = inf E | | E i | | R p i 2 = trace ( Q E i ) Δ i 1 2 ln det ( Q Y i ) det ( Q E i ) + [ 0 , ] , i = 1 , 2 ,
such that Q Y ^ i = Q Y i Q E i 0
where the test channel distribution P Y ^ i | Y i or the joint distribution P Y ^ i , Y i is induced by the realization
Y ^ i = H i Y i + V i , i = 1 , 2 ,
V i G ( 0 , Q V i ) , V i independent of Y i ,
H i Q Y i = Q Y i Q E i 0 ,
Q V i = H i Q E i = H i Q Y i H i Q Y i H i T 0 .
and where E i is the error that satisfies the structural property
E i = Y i E Y i | F Y ^ i = Y i Y ^ i , E i G ( 0 , Q E i ) , i = 1 , 2 .
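For a single Gaussian vector, the infimum in (159) reduces to the classical reverse water-filling over the eigenvalues of Q Y i . The sketch below (illustrative only; an assumed Q Y and a simple bisection on the water level, not taken from the paper) computes R Y ( Δ ) and the per-component distortions.

```python
import numpy as np

def marginal_rdf(Q_Y, Delta):
    """Reverse water-filling for R_Y(Delta) of a zero-mean Gaussian vector."""
    sigma2 = np.linalg.eigvalsh(Q_Y)               # eigenvalues of Q_Y
    lo, hi = 0.0, float(np.max(sigma2))
    for _ in range(200):                           # bisection on the water level
        lam = 0.5 * (lo + hi)
        if np.sum(np.minimum(lam, sigma2)) < Delta:
            lo = lam
        else:
            hi = lam
    d_j = np.minimum(lam, sigma2)                  # optimal per-component distortions
    return max(0.5 * np.sum(np.log(sigma2 / d_j)), 0.0), d_j

Q_Y = np.array([[1.0, 0.6],
                [0.6, 1.0]])                       # assumed source covariance
print(marginal_rdf(Q_Y, Delta=0.5))
```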
Next, we express the characterization of the joint RDF of Theorem 12, using the canonical variable form, and the canonical correlation coefficients. The special case when Q ( E 1 , E 2 ) is block-diagonal is given in [27].
Theorem 15.
Consider the statement of Theorem 12. Compute the canonical variable form of the tuple of Gaussian random variables ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q Y i > 0 , according to Algorithm A1. This yields the indices p 11 = p 21 , p 12 = p 22 , p 13 , p 23 , and n = p 11 + p 12 = p 21 + p 22 , the diagonal matrix D 4 with canonical correlation coefficients d 4 , i ( 0 , 1 ) for i = 1 , , p 12 , and decompositions (see Algorithm A1, 1–4)
Q Y 1 = U 1 D 1 U 1 T , Q Y 2 = U 2 D 2 U 2 T ,
with U i R p i × p i orthogonal ( U i U i T = I p i = U i T U i ), i = 1 , 2 , and the singular-value decomposition of
D 1 1 2 U 1 T Q Y 1 Y 2 U 2 D 2 1 2 = U 3 D 3 U 4 T ,
with U 3 R p 1 × p 1 , U 4 R p 2 × p 2 orthogonal,
D 3 = I p 11 0 0 0 D 4 0 0 0 0 R p 1 × p 2 ,
D 4 = Diag ( d 4 , 1 , . . . , d 4 , p 12 ) R p 12 × p 12 , 1 > d 4 , 1 d 4 , 2 d 4 , p 12 > 0 .
Define the new variance matrix of Q ( Y 1 , Y 2 ) according to
Q cvf = I p 1 D 3 D 3 T I p 2 .
Compute the canonical variable form of the tuple of Gaussian error random variables ( E 1 , E 2 ) G ( 0 , Q ( E 1 , E 2 ) ) of Theorem 12.(b), according to Algorithm A1. This yields the indices p ¯ 11 = p ¯ 21 , p ¯ 12 = p ¯ 22 , p ¯ 13 , p ¯ 23 , and n ¯ = p ¯ 11 + p ¯ 12 = p ¯ 21 + p ¯ 22 and the diagonal matrix D ¯ 4 with canonical correlation coefficients d ¯ 4 , i ( 0 , 1 ) for i = 1 , , p ¯ 12 , and decompositions (see Algorithm A1, 1–4),
Q E 1 = U ¯ 1 D ¯ 1 U ¯ 1 T , Q E 2 = U ¯ 2 D ¯ 2 U ¯ 2 T ,
D ¯ i = Diag ( d ¯ i , 1 , , d ¯ i , p i ) R p i × p i , d ¯ i , 1 d ¯ i , 2 d ¯ i , p i > 0 , i = 1 , 2 ,
with U ¯ i R p i × p i orthogonal ( U ¯ i U ¯ i T = I p i = U ¯ i T U ¯ i ), i = 1 , 2 , and the singular-value decomposition of
D ¯ 1 1 2 U ¯ 1 T Q E 1 E 2 U ¯ 2 D ¯ 2 1 2 = U ¯ 3 D ¯ 3 U ¯ 4 T ,
with U ¯ 3 R p 1 × p 1 , U ¯ 4 R p 2 × p 2 orthogonal,
D ¯ 3 = I p ¯ 11 0 0 0 D ¯ 4 0 0 0 0 R p 1 × p 2 ,
D ¯ 4 = Diag ( d ¯ 4 , 1 , . . . , d ¯ 4 , p ¯ 12 ) R p ¯ 12 × p ¯ 12 , 1 > d ¯ 4 , 1 d ¯ 4 , 2 d ¯ 4 , p ¯ 12 > 0 .
Define the new variance matrix of Q ( E 1 , E 2 ) according to,
Q ¯ cvf = I p 1 D ¯ 3 D ¯ 3 T I p 2 .
The joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) of Theorem 12. (c) is equivalently characterized by
R Y 1 , Y 2 ( Δ 1 , Δ 2 ) = inf Q ( E 1 , E 2 ) 0 : n ¯ Z + , i = 1 p 1 d ¯ 1 , i Δ 1 , i = 1 p 2 d ¯ 2 , i Δ 2 1 2 ln det ( D 1 ) det ( D 2 ) det ( Q cvf ) det ( D ¯ 1 ) det ( D ¯ 2 ) det ( Q ¯ cvf ) + [ 0 , ] ,
such that Q ( Y ^ 1 , Y ^ 2 ) = Q ( Y 1 , Y 2 ) Q ( E 1 , E 2 ) 0 ,
where
det ( Q cvf ) = det ( I p 1 D 3 D 3 T )
= 1 , if p 13 > 0 , p 23 > 0 , p 11 = p 12 = p 21 = p 22 = 0 , i = 1 n 1 d 4 , i 2 , if p 11 = p 21 = 0 , p 12 = p 22 = n , p 13 0 , p 23 0 , 0 , if p 11 = p 21 > 0 , p 12 = p 22 0 , p 13 0 , p 23 0 ,
det ( Q ¯ cvf ) = det ( I p 1 D ¯ 3 D ¯ 3 T )
= 1 , if p ¯ 13 > 0 , p ¯ 23 > 0 , p ¯ 11 = p ¯ 12 = p ¯ 21 = p ¯ 22 = 0 , i = 1 n ¯ 1 d ¯ 4 , i 2 , if p ¯ 11 = p ¯ 21 = 0 , p ¯ 12 = p ¯ 22 = n ¯ , p ¯ 13 0 , p ¯ 23 0 , 0 , if p ¯ 11 = p ¯ 21 > 0 , p ¯ 12 = p ¯ 22 0 , p ¯ 13 0 , p ¯ 23 0 ,
Moreover, a necessary condition for R Y 1 , Y 2 ( Δ 1 , Δ 2 ) < + ∞ is p ¯ 11 = p ¯ 21 = 0 .
Proof. 
First, apply Algorithm A1 to the tuple of Gaussian random variables ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , and then to the Gaussian random variables ( E 1 , E 2 ) G ( 0 , Q ( E 1 , E 2 ) ) of Theorem 12. (b). This gives (166)–(176). Then, (177) follows from (139) using (166)–(176), and the standard properties of the determinant of a matrix. The remaining equations are obtained from (166)–(176). The last statement follows from the values of det ( Q ¯ cvf ) , det ( Q cvf ) . □
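The canonical correlation coefficients entering (177) can be obtained with standard linear algebra. The following sketch is a generic canonical-correlation computation under assumed covariances (it is not a reproduction of Algorithm A1): it forms D 1 1 / 2 U 1 T Q Y 1 Y 2 U 2 D 2 1 / 2 as in (167) and reads off its singular values; singular values equal to 1 correspond to identical parts, values in ( 0 , 1 ) to correlated parts, and zero values to private parts.

```python
import numpy as np

def canonical_correlations(Q_Y1, Q_Y2, Q_Y1Y2):
    """Singular values of D1^{-1/2} U1^T Q_{Y1Y2} U2 D2^{-1/2}, cf. (166)-(168)."""
    D1, U1 = np.linalg.eigh(Q_Y1)                  # Q_Y1 = U1 Diag(D1) U1^T
    D2, U2 = np.linalg.eigh(Q_Y2)                  # Q_Y2 = U2 Diag(D2) U2^T
    M = np.diag(D1 ** -0.5) @ U1.T @ Q_Y1Y2 @ U2 @ np.diag(D2 ** -0.5)
    return np.linalg.svd(M, compute_uv=False)      # d_i, ordered decreasingly

# Assumed joint covariance of (Y1, Y2) with p1 = p2 = 2 (illustration only).
Q_Y1 = np.array([[1.0, 0.2], [0.2, 1.0]])
Q_Y2 = np.array([[1.0, 0.0], [0.0, 1.0]])
Q_Y1Y2 = np.array([[0.5, 0.1], [0.0, 0.3]])
print(canonical_correlations(Q_Y1, Q_Y2, Q_Y1Y2))
```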
Remark 7.
By Theorem 15, since R Y 1 , Y 2 ( Δ 1 , Δ 2 ) [ 0 , ] , by (180), it suffices to consider Q ( Y 1 , Y 2 ) > 0 , which implies p 11 = p 21 = 0 , Q Y i > 0 , i = 1 , 2 . Furthermore, to ensure R Y 1 , Y 2 ( Δ 1 , Δ 2 ) [ 0 , ) , it suffices to also consider Q ( E 1 , E 2 ) > 0 , which implies that p ¯ 11 = p ¯ 21 = 0 , Q E i > 0 , i = 1 , 2 .
From Theorem 15, the next corollary directly follows which identifies the subset of the distortion region such that Gray’s lower bound [28], R Y 1 , Y 2 ( Δ 1 , Δ 2 ) R Y 2 | Y 1 ( Δ 2 ) + R Y 1 ( Δ 1 ) holds with equality.
Corollary 2.
Consider the statement of Theorem 15, and without loss of generality, assume ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , with Q ( Y 1 , Y 2 ) > 0 (and hence Q Y i > 0 , i = 1 , 2 ).
The joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) with D 1 , D 2 , D ¯ 1 , D ¯ 2 , Q cvf , Q ¯ cvf , defined in Theorem 15, and corresponding to p 11 = p 21 = p ¯ 11 = p ¯ 21 = 0 satisfies the lower bound ( R Y 2 | Y 1 ( Δ 2 ) is obtained from Theorem 13 by letting W = Y 1 .),
R Y 1 , Y 2 ( Δ 1 , Δ 2 ) R Y 2 | Y 1 ( Δ 2 ) + R Y 1 ( Δ 1 ) = inf E | | E 2 | | R p 2 2 = trace ( Q E 2 ) Δ 2 1 2 ln ( det ( Q Y 2 | Y 1 ) det ( Q E 2 ) )
+ inf E | | E 1 | | R p 1 2 = trace ( Q E 1 ) Δ 1 1 2 ln ( det ( Q Y 1 ) det ( Q E 1 ) )
= inf i = 1 p 2 d ¯ 2 , i Δ 2 1 2 ln ( det ( D 2 ) det ( Q cvf ) det ( D ¯ 2 ) ) + inf i = 1 p 1 d ¯ 1 , i Δ 1 1 2 ln ( det ( D 1 ) det ( D ¯ 1 ) )
that is, p ¯ 12 = p ¯ 22 = 0 .
Moreover, the inequalities (184) and (185) hold with the equalities, on the strictly positive surface D C ( Y 1 , Y 2 ) , defined by
D C ( Y 1 , Y 2 ) = ( Δ 1 , Δ 2 ) [ 0 , ] × [ 0 , ] | Q ( Y 1 , Y 2 ) Q ( E 1 , E 2 ) > 0 .
Proof. 
The lower bound (183) is due to Gray [28]. The equality in (184) follows by using the values of the rate distortion functions in the right-hand side of (183). Equality (185) follows from the singular value decomposition of the matrices given in Theorem 15, using Q Y 2 | Y 1 = Q Y 2 Q Y 2 , Y 1 Q Y 1 1 Q Y 2 , Y 1 T . To establish the equalities, note that (177) with det ( Q ¯ cvf ) = 1 , equivalently p ¯ 12 = p ¯ 22 = 0 , is precisely (185). Moreover, it can be easily verified that p ¯ 12 = p ¯ 22 = 0 for the distortion region D C ( Y 1 , Y 2 ) . □

4.2. Wyner’s Lossy Common Information of Correlated Gaussian Vectors

Derived in this section are the characterizations of C G W ( Y 1 , Y 2 ; Δ 1 , Δ 2 ) via Theorem 2, for jointly Gaussian random variables with square-error distortion, as well as C W ( Y 1 , Y 2 ) via Theorem 3.
Definition 7.
Wyner’s lossy common information of a tuple of Gaussian multivariate random variables. Consider a tuple of jointly Gaussian random variables Y 1 : Ω R p 1 Y 1 , Y 2 : Ω R p 2 Y 2 , in terms of the notation ( Y 1 , Y 2 ) G ( 0 , Q ( Y 1 , Y 2 ) ) , Q Y i > 0 , i = 1 , 2 , and square error distortion functions between ( y 1 , y 2 ) , and its reproduction ( y ^ 1 , y ^ 2 ) , given by
D Y 1 ( y 1 , y ^ 1 ) = | | y 1 y ^ 1 | | R p 1 2 , D Y 2 ( y 2 , y ^ 2 ) = | | y 2 y ^ 2 | | R p 2 2
where | | · | | R p i 2 denotes Euclidean distances on R p i , i = 1 , 2 .
(a) Wyner’s common information (information definition) of the tuple of Gaussian random variables ( Y 1 , Y 2 ) is defined by the expression
C ( Y 1 , Y 2 ) = inf W : Ω R n , ( F Y 1 , F Y 2 | F W ) CIG I ( Y 1 , Y 2 ; W ) [ 0 , ] .
Call any random variable W as defined above such that ( Y 1 , Y 2 , W ) G ( 0 , Q ( Y 1 , Y 2 , W ) ) and ( F Y 1 , F Y 2 | F W ) CIG a state of the tuple ( Y 1 , Y 2 ) .
If there exists a random variable W : Ω R n with n Z + = { 1 , 2 , … } which attains the infimum, i.e., if C ( Y 1 , Y 2 ) = I ( Y 1 , Y 2 ; W ) , then call that random variable a minimal information state of the tuple ( Y 1 , Y 2 ) .
(b) Wyner’s common information (operational definition) is defined for a tuple of strictly positive real numbers γ = ( γ 1 , γ 2 ) R + + × R + + = ( 0 , ) × ( 0 , ) such that, for all 0 ( Δ 1 , Δ 2 ) γ ,
C G W ( Y 1 , Y 2 ; Δ 1 , Δ 2 ) = C W ( Y 1 , Y 2 ) = C ( Y 1 , Y 2 ) , for ( Δ 1 , Δ 2 ) D W = ( Δ 1 , Δ 2 ) [ 0 , ] × [ 0 , ] | 0 ( Δ 1 , Δ 2 ) γ
provided identity (15) holds, i.e., R Y 1 | W ( Δ 1 ) + R Y 2 | W ( Δ 2 ) + I ( Y 1 , Y 2 ; W ) = R Y 1 , Y 2 ( Δ 1 , Δ 2 ) .
By the above definition, the problem of calculating Wyner’s lossy common information via (18) is decomposed into the characterization of C ( Y 1 , Y 2 ) such that identity (15) is satisfied. This follows from the fact that the only difference between C W ( Y 1 , Y 2 ) and C ( Y 1 , Y 2 ) is the specification of the region D W such that C G W ( Y 1 , Y 2 ; Δ 1 , Δ 2 ) = C W ( Y 1 , Y 2 ) = C ( Y 1 , Y 2 ) is constant for ( Δ 1 , Δ 2 ) D W .
In the next theorem, we make use of the characterizations of the various rate distortion functions, and the test channel realizations to identify subsets of the rate region that lie on the Pangloss plane, and are consistent with the characterization of Viswanatha, Akyol and Rose [12] (Theorem 1, Equations (19) and (20)).
Theorem 16.
Consider a tuple ( Y 1 , Y 2 ) of Gaussian random variables in the canonical variable form of Definition 1. Restrict the attention to the correlated parts of these random variables, as defined in Theorem 8, by (77)–(79). Furthermore, consider a realization of the random variables ( Y 1 , Y 2 ) which induces the family of measures P ci P m i n C I G , as defined in Corollary 1, by (84)–(88).
Then, the following hold.
(a) The joint rate distortion function R Y 1 , Y 2 ( Δ 1 , Δ 2 ) of ( Y 1 , Y 2 ) with square error distortion satisfies
R Y 1 , Y 2 ( Δ 1 , Δ 2 ) = inf j = 1 n Δ 1 , j Δ 1 , j = 1 n Δ 2 , j Δ 2 1 2 j = 1 n ln ( 1 d j 2 ) Δ 1 , j Δ 2 , j , ( Δ 1 , Δ 2 ) D C ( Y 1 , Y 2 ) ,
trace ( Q E 1 ) = E | | Y 1 Y ^ 1 | | R n 2 = j = 1 n Δ 1 , j , trace ( Q E 2 ) = E | | Y 2 Y ^ 2 | | R n 2 = j = 1 n Δ 2 , j
where D C ( Y 1 , Y 2 ) is a strictly positive surface, defined by
D C ( Y 1 , Y 2 ) = ( Δ 1 , Δ 2 ) [ 0 , ] × [ 0 , ] | Q ( Y 1 , Y 2 ) Q ( E 1 , E 2 ) > 0
and where Q ( E 1 , E 2 ) is the variance of the errors E i = Y i Y ^ i , i = 1 , 2 , with parameters p ¯ 11 = p ¯ 21 = p ¯ 12 = p ¯ 22 = 0 , and p ¯ 13 = p ¯ 23 = n .
The conditional rate distortion functions R Y i | W ( Δ i ) of Y i conditioned on W with square error distortion, and mutual information I ( Y 1 , Y 2 ; W ) satisfy
R Y 1 | W ( Δ 1 ) = inf trace ( Q E 1 ) Δ 1 1 2 ln det ( I D 1 / 2 Q W 1 D 1 / 2 ) det ( Q E 1 ) + , Δ 1 [ 0 , )
R Y 2 | W ( Δ 2 ) = inf trace ( Q E 2 ) Δ 2 1 2 ln det ( I D 1 / 2 Q W D 1 / 2 ) det ( Q E 2 ) + , Δ 2 [ 0 , )
I ( Y 1 , Y 2 ; W ) = 1 2 ln det ( I D 2 ) det ( [ I D 1 / 2 D W 1 D 1 / 2 ] [ I D 1 / 2 D W D 1 / 2 ] ) + .
where trace ( Q E i ) , i = 1 , 2 are defined as in (191).
(b) The representations of reproductions (the reader may verify that the realization satisfies the conditions given in Viswanatha, Akyol and Rose [12], Theorem 1, Equations (19) and (20)), ( Y ^ 1 , Y ^ 2 ) of ( Y 1 , Y 2 ) at the output of decoder 1 and decoder 2, which achieve the joint rate distortion functions R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , R Y i | W ( Δ i ) , i = 1 , 2 of part (a), are
Y 1 = D 1 / 2 Q W 1 W + Z 1 ,
Y 2 = D 1 / 2 W + Z 2 ,
Y ^ 1 = Y 1 Q E 1 ( I D 1 / 2 Q W 1 D 1 / 2 ) 1 Z 1 + V 1 ,
= D 1 / 2 Q W 1 W + A 1 Z 1 + V 1 ,
Y ^ 2 = Y 2 Q E 2 ( I D 1 / 2 Q W D 1 / 2 ) 1 Z 2 + V 2 ,
= D 1 / 2 W + A 2 Z 2 + V 2 ,
Z 1 G ( 0 , ( I D 1 / 2 Q W 1 D 1 / 2 ) ) , Z 2 G ( 0 , ( I D 1 / 2 Q W D 1 / 2 ) ) , W G ( 0 , Q W ) ,
Q E 1 = E { ( Y 1 Y ^ 1 ) ( Y 1 Y ^ 1 ) T } , Q E 2 = E { ( Y 2 Y ^ 2 ) ( Y 2 Y ^ 2 ) T } ,
V 1 G ( 0 , Q E 1 A 1 T ) , V 2 G ( 0 , Q E 2 A 2 T ) ,
A 1 = I Q E 1 ( I D 1 / 2 Q W 1 D 1 / 2 ) 1 , A 2 = I Q E 2 ( I D 1 / 2 Q W D 1 / 2 ) 1 ,
Q E i = U i Λ i U i T , Λ i = Diag ( Δ i , 1 , , Δ i , n ) R n × n , U i U i T = U i T U i = I , i = 1 , 2 ,
Q Y 1 | W = I D 1 / 2 Q W 1 D 1 / 2 = U 1 Λ Y 1 | W U 1 T , Λ Y 1 | W = Diag ( Λ Y 1 | W , 1 , , Λ Y 1 | W , n ) ,
Q Y 2 | W = I D 1 / 2 Q W D 1 / 2 = U 2 Λ Y 2 | W U 2 T , Λ Y 2 | W = Diag ( Λ Y 2 | W , 1 , , Λ Y 2 | W , n ) ,
( V 1 , V 2 , Z 1 , Z 2 , W ) , are independent
and are parameterized by Q W Q W , where Q W is defined by the set of Equation (82).
Moreover, the joint distribution P Y 1 , Y 2 , Y ^ 1 , Y ^ 2 , W satisfies (the reader may verify that conditions (210) are identical to Viswanatha, Akyol and Rose [12], Theorem 1, Equations (19) and (20), for rates that lie on the Pangloss plane)
P Y ^ 1 , Y ^ 2 | W = P Y ^ 1 | W P Y ^ 2 | W , P Y 1 , Y 2 | Y ^ 1 , Y ^ 2 , W = P Y 1 , Y 2 | Y ^ 1 , Y ^ 2 ,
P Y 1 , Y 2 , Y ^ 1 , Y ^ 2 , W = P Y ^ 1 | Y 1 , W P Y ^ 2 | Y 2 , W P Y 1 | W P Y 2 | W P W ,
P Y 1 , Y 2 , Y ^ 1 , Y ^ 2 , W = P Y 1 | Y ^ 1 P Y 2 | Y ^ 2 P Y ^ 1 | W P Y ^ 2 | W P W .
(c) Consider part (a) and the realization of part (b). Then, R Y 1 | W ( Δ 1 ) + R Y 2 | W ( Δ 2 ) + I ( Y 1 , Y 2 ; W ) = R Y 1 , Y 2 ( Δ 1 , Δ 2 ) on the subset D C ( Y 1 , Y 2 ) , defined by (192) such that (210) holds.
(d) Suppose Q W = Q W Q W is diagonal, i.e., Q W = Diag ( Q W 1 , , Q W n ) , d i Q W i d i 1 , i . Then, the conditional RDFs R Y i | W ( Δ i ) are given by
R Y 1 | W ( Δ 1 ) = inf j = 1 n Δ 1 , j = Δ 1 1 2 j = 1 n ln ( 1 d j / Q W j ) Δ 1 , j ,
R Y 2 | W ( Δ 2 ) = inf j = 1 n Δ 2 , j = Δ 2 1 2 j = 1 n ln ( 1 d j Q W j ) Δ 2 , j ,
and the optimal Δ 1 , j , Δ 2 , j are obtained from the water-filling equations,
Δ 1 , j = λ , λ < 1 d j / Q W j 1 d j , λ 1 d j / Q W j , Δ 1 [ 0 , ) ,
Δ 2 , j = λ , λ < 1 d j Q W j 1 d j , λ 1 d j Q W j , Δ 2 [ 0 , ) .
and the representations of part (b) hold, with Q E i , Q Y i | W diagonal matrices.
Proof. 
(a) Since the attention is restricted to the correlated parts of these random variables, as defined in Theorem 8, by (77)–(79), then the statements of joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) of part (a) are a special case of Theorem 12. (c), and obtained from Corollary 2. Similarly, expressions (193)–(195) follow from (13). However, as demonstrated shortly, these also follow, from the derivation of part (b). (b) Recall that the joint rate distortion function is achieved by a jointly Gaussian distribution P Y 1 , Y 2 , Y ^ 1 , Y ^ 2 such that the average square-error distortions are satisfied. Consider the realization of the random variables ( Y 1 , Y 2 ) which induce the family of measures P ci P m i n C I G , as defined in Corollary 1, by (84)–(88). By properties of mutual information, then
I ( Y 1 , Y 2 ; Y ^ 1 , Y ^ 2 ) = H ( Y 1 , Y 2 ) I ( Y 1 , Y 2 | Y ^ 1 , Y ^ 2 )
= H ( Y 1 , Y 2 ) H ( Y 1 | Y ^ 1 , Y ^ 2 , Y 2 ) H ( Y 2 | Y ^ 1 , Y ^ 2 )
= H ( Y 1 , Y 2 ) H ( Y 2 | Y ^ 1 , Y ^ 2 , Y 1 ) H ( Y 1 | Y ^ 1 , Y ^ 2 )
H ( Y 1 , Y 2 ) H ( Y 1 | Y ^ 1 ) H ( Y 2 | Y ^ 2 ) , cond. reduces entropy,
= 1 2 i = 1 n ln ( 1 d i 2 ) + n ln ( 2 π e ) H ( Y 1 | Y ^ 1 ) H ( Y 2 | Y ^ 2 ) 1 2 i = 1 n ln ( 1 d i 2 ) + n ln ( 2 π e ) 1 2 i = 1 n ln ( Δ 1 , i ) 1 2 n ln ( 2 π e )
1 2 i = 1 n ln ( Δ 2 , i ) 1 2 n ln ( 2 π e ) , maximum entropy of Gaus. dist.
= 1 2 i = 1 n ln ( 1 d i 2 ) Δ 1 , i Δ 2 , i
where i = 1 n Δ 1 , i = E [ | | Y 1 Y ^ 1 | | R n 2 ] Δ 1 and i = 1 n Δ 2 , i = E [ | | Y 2 Y ^ 2 | | R n 2 ] Δ 2 . The average distortion satisfies
Δ 1 E [ | | Y 1 Y ^ 1 | | R n 2 ] E [ | | Y 1 E [ Y 1 | F Y ^ 1 ] | | R n 2 ] ,
Δ 2 E [ | | Y 2 Y ^ 2 | | R n 2 ] E [ | | Y 2 E [ Y 2 | F Y ^ 2 ] | | R n 2 ] ,
Furthermore,
i f P Y 1 , Y 2 | Y ^ 1 , Y ^ 2 = P Y 1 | Y ^ 1 P Y 2 | Y ^ 2 t h e n   i n e q u a l i t y   ( 220 )   h o l d s   w i t h   e q u a l i t y ,
i f E [ Y 1 | F Y ^ 1 ] = Y ^ 1 t h e n   i n e q u a l i t y   ( 224 )   h o l d s   w i t h   e q u a l i t y ,
i f E [ Y 2 | F Y ^ 2 ] = Y ^ 2 t h e n   i n e q u a l i t y   ( 225 )   h o l d s   w i t h   e q u a l i t y .
It can be verified that the representations (196)–(209) satisfy P Y 1 , Y 2 | Y ^ 1 , Y ^ 2 = P Y 1 | Y ^ 1 P Y 2 | Y ^ 2 , E [ Y 1 | F Y ^ 1 ] = Y ^ 1 , E [ Y 2 | F Y ^ 2 ] = Y ^ 2 , and that all inequalities become equalities. The decomposition of the joint distribution according to (210) follows from the representations of ( Y ^ 1 , Y ^ 2 ) , and similarly for (211) and (212). The conditional RDFs R Y i | W ( Δ i ) , i = 1 , 2 are shown as above. (c) This is easily verified, because (210) holds and hence rates lie on the Pangloss plane, for the strictly positive surface, D C ( Y 1 , Y 2 ) . (d) This follows directly from parts (a)–(c). □
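Part (d) of Theorem 16 is a per-component reverse water-filling on the conditional variances 1 d j / Q W j and 1 d j Q W j . The sketch below (assumed d and diagonal Q W , with a simple bisection on the water level; an illustration only) evaluates the two conditional RDFs of part (d).

```python
import numpy as np

def conditional_rdf(var, Delta):
    """Reverse water-filling over given per-component conditional variances."""
    var = np.asarray(var, dtype=float)
    lo, hi = 0.0, float(np.max(var))
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        if np.sum(np.minimum(lam, var)) < Delta:
            lo = lam
        else:
            hi = lam
    d_j = np.minimum(lam, var)
    return max(0.5 * np.sum(np.log(var / d_j)), 0.0)

d = np.array([0.8, 0.5, 0.3])                      # assumed canonical correlations
q = np.array([0.9, 1.0, 1.2])                      # assumed diagonal Q_W, d_i <= q_i <= 1/d_i

R1 = conditional_rdf(1.0 - d / q, Delta=0.3)       # R_{Y1|W}(Delta_1)
R2 = conditional_rdf(1.0 - d * q, Delta=0.3)       # R_{Y2|W}(Delta_2)
print(R1, R2)
```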
Remark 8.
We should emphasize that Theorem 16 does not fully characterize the Pangloss plane, i.e., the subset of distortion pairs such that R Y 1 | W ( Δ 1 ) + R Y 2 | W ( Δ 2 ) + I ( Y 1 , Y 2 ; W ) = R Y 1 , Y 2 ( Δ 1 , Δ 2 ) holds is larger than D C ( Y 1 , Y 2 ) . To determine the entire set that characterizes the Pangloss plane, we need to consider the rate distortion function (177) with (178), and the general realization (40). We do not pursue this further, because it requires the closed-form solution of R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , which is currently an open and challenging problem, and beyond the scope of this paper. We should mention that the analysis of the scalar-valued Gaussian example in [12,13], i.e., when p 1 = p 2 = 1 , made use of the closed-form expression of R Y 1 , Y 2 ( Δ 1 , Δ 2 ) due to [13].
Proof 
(Proof of Theorem 6). One way to prove the statement is to compute the characterizations of the rate distortion functions R Y i ( Δ i ) , R Y i | W ( Δ i ) , i = 1 , 2 and R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , using the realization of the random variables ( Y 1 , Y 2 ) which induce the family of measures P ci P m i n C I G , as defined in Corollary 1, by (84)–(88). In view of Definition 7. (b), it suffices to verify that identity (15) holds, i.e., R Y 1 | W ( Δ 1 ) + R Y 2 | W ( Δ 2 ) + I ( Y 1 , Y 2 ; W ) = R Y 1 , Y 2 ( Δ 1 , Δ 2 ) for ( Δ 1 , Δ 2 ) D W , for the choice W = W G ( 0 , I ) which achieves the minimum in (188) (i.e., due to Theorem 11. (b)).
Similar to Theorem 16, it can be shown that the conditional RDFs R Y i | W ( Δ i ) , i = 1 , 2 are given by
R Y 1 | W ( Δ 1 ) = inf j = 1 n Δ 1 , j Δ 1 1 2 j = 1 n ln ( 1 d j ) Δ 1 , j , W G ( 0 , I )
R Y 2 | W ( Δ 2 ) = inf j = 1 n Δ 2 , j Δ 2 1 2 j = 1 n ln ( 1 d j ) Δ 2 , j , W G ( 0 , I ) ,
E | | Y 1 Y ^ 1 | | R n 2 = j = 1 n Δ 1 , j , E | | Y 2 Y ^ 2 | | R n 2 = j = 1 n Δ 2 , j .
The pay-off of the joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) in (190) is related to the pay-offs of the conditional RDFs R Y 1 | W ( Δ 1 ) , R Y 2 | W ( Δ 2 ) , and C ( Y 1 , Y 2 ) = I ( Y 1 , Y 2 ; W ) in (102), via the identity
1 2 j = 1 n ln ( 1 d j 2 ) Δ 1 , j Δ 2 , j = 1 2 j = 1 n ln ( 1 d j ) Δ 1 , j + 1 2 j = 1 n ln ( 1 d j ) Δ 2 , j + 1 2 i = 1 n ln 1 + d i 1 d i .
For ( Δ 1 , Δ 2 ) D W defined by (47), it then follows from (232) that
R Y 1 , Y 2 ( Δ 1 , Δ 2 ) = inf j = 1 n Δ 1 , j Δ 1 , j = 1 n Δ 2 , j Δ 2 1 2 j = 1 n ln ( 1 d j 2 ) Δ 1 , j Δ 2 , j
= R Y 1 | W ( Δ 1 ) + R Y 2 | W ( Δ 2 ) + I ( Y 1 , Y 2 ; W ) , ( Δ 1 , Δ 2 ) D W defined by (47).
This completes the proof. □
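The identity (232), which drives the proof, can be checked numerically for any admissible per-component distortions (assumed values of d i , Δ 1 , j , Δ 2 , j below, for illustration only).

```python
import numpy as np

d = np.array([0.8, 0.5, 0.3])                          # assumed canonical correlations
D1 = np.array([0.10, 0.15, 0.20])                      # assumed per-component distortions
D2 = np.array([0.12, 0.10, 0.18])

lhs = 0.5 * np.sum(np.log((1 - d**2) / (D1 * D2)))
rhs = (0.5 * np.sum(np.log((1 - d) / D1))
       + 0.5 * np.sum(np.log((1 - d) / D2))
       + 0.5 * np.sum(np.log((1 + d) / (1 - d))))
print(np.isclose(lhs, rhs))                            # True
```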

4.3. Applications to Problems of the Literature [15,16,17]

The next two corollaries illustrate the application of the results developed in this paper to the optimization problems analyzed in [15,16,17].
Corollary 3.
Applications to problems in [15]
Consider the Gaussian secure source coding and Wyner’s common information [15], defined by the optimization problem [15] (see Equation (18), Section IV.B),
arg min P Y 1 , Y 2 , W : P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W λ I ( Y 1 ; W ) + I ( Y 1 , Y 2 ; W ) , λ [ 0 , )
where the tuple ( Y 1 , Y 2 ) is zero mean jointly Gaussian, and W : Ω W is a continuous or discrete-valued random variable (the derivation of the formula for (235) in [15] makes use of rate distortion functions, [15] (Equation (47))).
Then, the following hold.
For any jointly distributed random variables ( Y 1 , Y 2 , W ) that minimize the expression in (235), there exists a jointly Gaussian triple ( Y 1 , Y 2 , W ) such that W : Ω R n is a Gaussian random variable, which achieves the same minimum value.
Moreover, the following characterization of (235) holds.
arg min P Y 1 , Y 2 , W : P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W λ I ( Y 1 ; W ) + I ( Y 1 , Y 2 ; W ) = arg min Q W Q W : Q W defined by (82) { λ 2 ln 1 det ( I D 1 / 2 Q W D 1 / 2 ) + 1 2 i = 1 n ln ( 1 d i 2 )
1 2 ln ( det ( [ I D 1 / 2 D W 1 D 1 / 2 ] [ I D 1 / 2 D W D 1 / 2 ] ) ) } = arg min Q W = D W Q W : Q W defined by (82) { λ 2 i = 1 n ln ( [ 1 d i q i ] 1 ) + 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 ln ( [ 1 d i q i ] [ 1 q i d i ] ) }
where
Q W = D W = Diag ( q 1 , q 2 , … , q n ) Q W , q 1 q 2 q n > 0 , Q W defined by (82).
Proof. 
By the use of Theorem 9. (c), it suffices to restrict attention to jointly Gaussian random variables ( Y 1 , Y 2 , W ) . Transform the tuple ( Y 1 , Y 2 ) into the canonical variable form of Definition 1. Restrict attention to the correlated parts of these random variables, as defined in Theorem 8, by (77)–(79), and consider the realization of the transformed random variables of Corollary 1. Then, the value of λ I ( Y 1 ; W ) + I ( Y 1 , Y 2 ; W ) is identical to the value of the same expression, evaluated using the realization of Corollary 1. By simple evaluation, using the realization of Corollary 1,
λ I ( Y 1 ; W ) + I ( Y 1 , Y 2 ; W ) = λ 2 ln 1 det ( I D 1 / 2 Q W 1 D 1 / 2 ) + 1 2 i = 1 n ln ( 1 d i 2 ) 1 2 ln ( det ( [ I D 1 / 2 Q W 1 D 1 / 2 ] [ I D 1 / 2 Q W D 1 / 2 ] ) )
and it is parameterized by Q W Q W , where Q W is defined by the set of Equation (82). By Hadamard’s determinant inequality, an achievable lower bound on the first right-hand side term of (239), holds if Q W Q W and ( I D 1 / 2 Q W 1 D 1 / 2 ) is a diagonal matrix, and this lower bound is achieved by a diagonal Q W Q W . Furthermore, by recalling the derivation of Theorem 11, an achievable lower bound on the second right-hand side term of (239) holds, i.e., of I ( Y 1 , Y 2 ; W ) , when Q W Q W is diagonal. Hence, both lower bounds are achieved simultaneously, by Q W Q W and Q W a diagonal matrix. Then, an achievable lower bound on (239) is obtained, if Q W is specified by (238). □
The remaining optimization problem in (237) is easily carried out, and hence omitted.
Corollary 4 illustrates the application of the results developed in this paper to the Gaussian relaxed Wyner’s common information [16,17] (Definition 2 and Section III).
Corollary 4.
Applications to problems in [16,17]
Consider the Gaussian relaxed Wyner’s common information considered in [16,17] (see Definition 2 and Section III of [17])
C γ ( Y 1 , Y 2 ) = min P W | Y 1 , Y 2 : I ( Y 1 ; Y 2 | W ) γ I ( Y 1 , Y 2 ; W )
where the tuple ( Y 1 , Y 2 ) is zero mean jointly Gaussian, and W : Ω W is a continuous or discrete-valued random variable (the value of (240) computed in [16,17], Theorem 4, is different from (241); moreover, the derivation in [16,17], Section III.A, is different from the derivation presented below). Then
C γ ( Y 1 , Y 2 ) = C ( Y 1 , Y 2 ) = ( 19 ) , γ ( 0 , ) .
Proof. 
By the use of Theorem 9. (c), it suffices to restrict the attention to jointly Gaussian random variables ( Y 1 , Y 2 , W ) . By Proposition 3 or Corollary 1, there exists a family of realizations of ( Y 1 , Y 2 ) parameterized by a Gaussian random variable W, which induces conditional independence P Y 1 , Y 2 | W = P Y 1 | W P Y 2 | W , and hence the lower bound I ( Y 1 , Y 2 ; W ) H ( Y 1 , Y 2 ) H ( Y 1 | W ) H ( Y 2 | W ) is achieved, i.e., the constraint in (240) is always satisfied, because the minimizer is such that I ( Y 1 ; Y 2 | W ) = 0 , i.e., the constraint is not active. Hence, the general solution of (240) is the one given in Theorem 4. □
Remark 9.
Corollary 4 implies that the definition of the relaxed Gaussian Wyner’s common information considered in [16,17] (see Definition 2 and Section III of [17]) should be replaced by min P W | Y 1 , Y 2 : I ( Y 1 ; Y 2 | W ) = γ I ( Y 1 , Y 2 ; W ) , i.e., the inequality is replaced by an equality, so that the constraint is active for all γ ( 0 , ) .

4.4. Characterization and Parameterization of the Gray and Wyner Rate Region by Jointly Gaussian RVs

Derived in this section, for jointly Gaussian random variables with square-error distortion, using [1] ((4) of page 1703, Equation (42)), i.e., (12), and the RDFs, R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , R Y i | W ( Δ i ) , R Y i ( Δ i ) , i = 1 , 2 , of Theorems 12, 13, 14, and Theorem 9, are:
(1)
Theorem 5—the characterizations of the rate region R G W ( Δ 1 , Δ 2 ) , and
(2)
The characterization of rates that lie on Pangloss Plane.
Proof 
(Proof of Theorem 5). (a) This follows from the realization of random variables that induce Gaussian measures, by repeating Theorem 9, without requiring that W makes Y 1 and Y 2 conditionally independent, i.e., the stated realization, with random variables Z 1 and Z 2 correlated, achieves a lower bound on I ( Y 1 , Y 2 ; W ) , among all random variables W. (b) The stated characterization (42) follows from the discussion prior to the Theorem, i.e., by an application of [1] ((4) of page 1703, Equation (42)), i.e., T ( α 1 , α 2 ) , and Theorem 9, Theorem 13, which imply that the infimum in T ( α 1 , α 2 ) is over the parameterized set of jointly Gaussian random variables ( Y 1 , Y 2 , W ) G ( 0 , Q ( Y 1 , Y 2 , W ) ) with joint distribution (68). From part (a), (43) then follows. (44) is due to the identity I ( Y 1 , Y 2 ; W ) = H ( Y 1 , Y 2 ) H ( Y 1 | Y 2 , W ) H ( Y 2 | W ) and the fact that the values R Y 1 | W ( Δ 1 ) and R Y 2 | W ( Δ 2 ) depend only on Q Y 1 | W and Q Y 2 | W , and the errors (see Theorem 13). □
Theorem 17 gives the parameterization of a subset of the Pangloss Plane, as a degenerate case of Theorem 5.
Theorem 17.
Consider the statement of Theorem 5.
(a) Rate triples ( R 0 , R 1 , R 2 ) that lie on the Pangloss Plane are determined by the subset of the rate region R G W ( Δ 1 , Δ 2 ) of Theorem 5. (b), such that the joint distribution P W , Y 1 , Y 2 , Y ^ 1 , Y ^ 2 satisfies the conditions,
P Y ^ 1 , Y ^ 2 | W = P Y ^ 1 | W P Y ^ 2 | W , P Y 1 , Y 2 | Y ^ 1 , Y ^ 2 , W = P Y 1 , Y 2 | Y ^ 1 , Y ^ 2 .
Specifically, the Pangloss Plane is characterized by
T G ( α 1 , α 2 ) = inf ( Y 1 , Y 2 , W ) ∈ G ( 0 , Q ( Y 1 , Y 2 , W ) ) of ( 30 ) , ( 40 ) { I ( Y 1 , Y 2 ; W ) + α 1 R Y 1 | W ( Δ 1 ) + α 2 R Y 2 | W ( Δ 2 ) }
= inf Q Y 1 | Y 2 , W , Q Y 2 | W { 1 2 ln [ det ( Q ( Y 1 , Y 2 ) ) / ( det ( Q Y 1 | Y 2 , W ) det ( Q Y 2 | W ) ) ] + α 1 R Y 1 | W ( Δ 1 ) + α 2 R Y 2 | W ( Δ 2 ) }
such that
P W , Y 1 , Y 2 , Y ^ 1 , Y ^ 2 satisfies (242), and its marginals induce the test channels of the RDFs, R Y 1 , Y 2 ( Δ 1 , Δ 2 ) , R Y i | W ( Δ i ) , R Y i ( Δ i ) , i = 1 , 2 , of Theorems 12, 13, 14.
(b) A subset of the rate triples ( R 0 , R 1 , R 2 ) that lie on the Pangloss Plane is determined by the restriction of part (a) to ( Y 1 , Y 2 | W ) ∈ CIG , i.e.,
T G C I ( α 1 , α 2 ) = inf ( Y 1 , Y 2 , W ) ∈ G ( 0 , Q ( Y 1 , Y 2 , W ) ) , ( Y 1 , Y 2 | W ) ∈ CIG { I ( Y 1 , Y 2 ; W ) + α 1 R Y 1 | W ( Δ 1 ) + α 2 R Y 2 | W ( Δ 2 ) }
= inf Q Y 1 | W , Q Y 2 | W { 1 2 ln [ det ( Q ( Y 1 , Y 2 ) ) / ( det ( Q Y 1 | W ) det ( Q Y 2 | W ) ) ] + α 1 R Y 1 | W ( Δ 1 ) + α 2 R Y 2 | W ( Δ 2 ) }
such that (245) holds, where I ( Y 1 , Y 2 ; W ) = H ( Y 1 , Y 2 ) − H ( Y 1 | W ) − H ( Y 2 | W ) , and R Y i | W ( Δ i ) , i = 1 , 2 are given in Theorem 13. (c).
Proof. 
(a) Condition (242) characterizes the rates ( R 0 , R 1 , R 2 ) R G W ( Δ 1 , Δ 2 ) that lie on the Pangloss plane, and these are derived in [12] (Theorem 1, Equations (19) and (20)). Hence, the statement follows from Theorem 5.
(b) That (246) defines a subset of R G W ( Δ 1 , Δ 2 ) follows from the fact that the set of joint distributions with ( Y 1 , Y 2 , W ) ∈ G ( 0 , Q ( Y 1 , Y 2 , W ) ) and ( Y 1 , Y 2 | W ) ∈ CIG is a subset of the set of joint distributions with ( Y 1 , Y 2 , W ) ∈ G ( 0 , Q ( Y 1 , Y 2 , W ) ) . Moreover, by part (a), ( Y 1 , Y 2 | W ) ∈ CIG implies I ( Y 1 , Y 2 ; W ) = H ( Y 1 , Y 2 ) − H ( Y 1 | Y 2 , W ) − H ( Y 2 | W ) = H ( Y 1 , Y 2 ) − H ( Y 1 | W ) − H ( Y 2 | W ) , and the values R Y 1 | W ( Δ 1 ) and R Y 2 | W ( Δ 2 ) depend only on Q Y 1 | W and Q Y 2 | W and the errors, as shown in Theorem 13. (c). Hence, the statement holds. □
From Theorem 16 and Theorem 17. (b), a simpler parameterization of the rates that lie on the Pangloss Plane of the Gray–Wyner rate region R G W ( Δ 1 , Δ 2 ) follows, when ( Y 1 , Y 2 ) is in the canonical variable form.
Corollary 5.
Consider the statement of Theorem 16 with square-error distortion functions, i.e., a tuple ( Y 1 , Y 2 ) in the canonical variable form.
A subset of the rate triples ( R 0 , R 1 , R 2 ) that lie on the Pangloss Plane, corresponding to the restriction ( Y 1 , Y 2 | W ) ∈ CIG , is determined from
T cvf C I G ( α 1 , α 2 ) = inf Q W { I ( Y 1 , Y 2 ; W ) + α 1 R Y 1 | W ( Δ 1 ) + α 2 R Y 2 | W ( Δ 2 ) }
= inf Q W { 1 2 ∑ i = 1 n ln ( 1 − d i 2 ) − 1 2 ln ( det ( [ I − D 1 / 2 Q W − 1 D 1 / 2 ] [ I − D 1 / 2 Q W D 1 / 2 ] ) ) + α 1 R Y 1 | W ( Δ 1 ) + α 2 R Y 2 | W ( Δ 2 ) }
such that (245) holds, where 0 ≤ α i ≤ 1 , i = 1 , 2 , α 1 + α 2 ≥ 1 , where R Y i | W ( Δ i ) , i = 1 , 2 are given in Theorem 16, and where the infimum is taken over Q W ∈ Q W , the set defined by Equation (82).
Proof. 
The stated characterization (248) is the application of [1] ((4) of page 1703, Equation (42)) and the results of Theorem 16 and Theorem 17. (b). □
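As a numerical illustration of the objective in Corollary 5 (a sketch under stated assumptions, not part of the original text): for a diagonal Q W and canonical correlations d i , the mutual information term of (248) can be swept over scalar multiples of the identity. The admissible interval d i < q i < 1 / d i used below is an assumption standing in for the set Q W of Equation (82), which is not reproduced here; the sweep confirms that the minimum is attained at Q W = I , where the value equals Wyner's common information (19).

```python
import numpy as np

def wyner_common_information(d):
    """Wyner's common information from canonical correlations d_i in (0,1); formula (19)."""
    d = np.asarray(d, dtype=float)
    return 0.5 * np.sum(np.log((1.0 + d) / (1.0 - d)))

def mutual_info_term(d, q):
    """I(Y1,Y2;W) of Corollary 5 for diagonal Q_W = Diag(q), with d_i < q_i < 1/d_i assumed."""
    d, q = np.asarray(d, float), np.asarray(q, float)
    return 0.5 * (np.sum(np.log(1.0 - d ** 2))
                  - np.sum(np.log((1.0 - d / q) * (1.0 - d * q))))

d = np.array([0.6, 0.3])                      # illustrative canonical correlations
ts = np.linspace(0.61, 1.0 / 0.61, 200)       # scalar sweep inside the admissible interval
vals = [mutual_info_term(d, np.full_like(d, t)) for t in ts]
print(min(vals), wyner_common_information(d)) # both approximately 1.0027, up to grid resolution
```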
In view of Theorem 1 (i.e., Theorem 8 in [1]), additional parameterizations of the Gray–Wyner rate region R G W ( Δ 1 , Δ 2 ) directly follow from the expressions already derived, i.e., the joint RDF R Y 1 , Y 2 ( Δ 1 , Δ 2 ) of Theorem 12, the conditional RDFs R Y 1 | W ( Δ 1 ) , R Y 2 | W ( Δ 2 ) of Theorem 13, the marginal RDFs R Y 1 ( Δ 1 ) , R Y 2 ( Δ 2 ) of Theorem 14, and the values of I ( Y 1 , Y 2 ; W ) , based on the Gaussian realization of Theorem 5. (a).

5. Conclusions

This paper formulates the classical Gray and Wyner source coding for a simple network with a tuple of multivariate, correlated Gaussian random variables, with square-error fidelity at the two decoders, from the geometric approach to Gaussian random variables and the weak stochastic realization of correlated Gaussian random variables. This approach leads to a parameterization of the Gray–Wyner rate region with respect to the variance matrix of the jointly Gaussian triple ( Y 1 , Y 2 , W ) , where W is a Gaussian auxiliary random variable. However, much remains to be done for this problem from the computational point of view, and in exploiting the new approach in other multi-user problems of information theory.

Author Contributions

C.D.C. and J.H.v.S. contributed to the conceptualization, methodology, and writing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

The work of C.D. Charalambous was co-funded by the European Regional Development Fund and the Republic of Cyprus through the Research and Innovation Foundation (Project: EXCELLENCE/1216/0296).

Data Availability Statement

Numerical evaluations of Wyner’s common information, based on the implementation of the canonical variable form, and the calculation of the canonical variable coefficients are found in [21], Section 3.7.

Acknowledgments

The second author is grateful to H.S. Witsenhausen (formerly affiliated with Bell Laboratories) for contacts about the problem of common information in the early 1980s. This paper is an answer to their questions about the problem of Wyner’s common information. The authors are very grateful to the University of Cyprus for the partial financial support which made their cooperation possible. The authors are also grateful to Guo Lei (Chinese Academy of Sciences, Institute for Mathematics) and to Xi Kaihua (Shandong University, Jinan, Shandong Province, China; formerly of Delft University of Technology) for help with obtaining copies of the papers of Hua LooKeng.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A

Appendix A.1. Algorithm to Generate the Canonical Variable Form

The algorithm that generates Q cvf is presented below.
Transformation of a variance matrix into its canonical variable form.
Data: p 1 , p 2 ∈ Z + , Q ∈ R ( p 1 + p 2 ) × ( p 1 + p 2 ) , satisfying Q = Q T ≥ 0 (it is noted that Q > 0 implies Q 11 > 0 , Q 22 > 0 ), with decomposition
Q = Q 11 Q 12 Q 12 T Q 22 , Q 11 R p 1 × p 1 , Q 22 R p 2 × p 2 , Q 12 R p 1 × p 2 , Q 11 > 0 , Q 22 > 0 .
1
Perform singular-value decompositions:
Q 11 = U 1 D 1 U 1 T , Q 22 = U 2 D 2 U 2 T ,
with U 1 R p 1 × p 1 orthogonal ( U 1 U 1 T = I = U 1 T U 1 ) and
D 1 = Diag ( d 1 , 1 , … , d 1 , p 1 ) ∈ R p 1 × p 1 , d 1 , 1 ≥ d 1 , 2 ≥ ⋯ ≥ d 1 , p 1 > 0 ,
and U 2 , D 2 satisfying corresponding conditions.
2
Perform a singular-value decomposition of
D 1 − 1 / 2 U 1 T Q 12 U 2 D 2 − 1 / 2 = U 3 D 3 U 4 T ,
with U 3 ∈ R p 1 × p 1 , U 4 ∈ R p 2 × p 2 orthogonal and
D 3 = I p 11 0 0 0 D 4 0 0 0 0 ∈ R p 1 × p 2 (a block matrix with diagonal blocks I p 11 , D 4 , and 0 ), D 4 = Diag ( d 4 , 1 , … , d 4 , p 12 ) ∈ R p 12 × p 12 , 1 > d 4 , 1 ≥ d 4 , 2 ≥ ⋯ ≥ d 4 , p 12 > 0 .
3
Compute the new variance matrix according to
Q cvf = I p 1 D 3 D 3 T I p 2 .
4
The transformation to the canonical variable representation
( Y 1 ↦ S 1 Y 1 , Y 2 ↦ S 2 Y 2 ) is then
S 1 = U 3 T D 1 − 1 / 2 U 1 T , S 2 = U 4 T D 2 − 1 / 2 U 2 T .
If Q > 0 then Q 11 > 0 and Q 22 > 0 , and D 3 does not contain the block I p 11 , i.e., the corresponding first block row and block column are removed (no canonical correlation coefficient equals one).
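A minimal computational sketch of the above algorithm is given below (Python/NumPy, added for illustration); the function name, the return convention, and the numerical example are our own choices, and symmetric eigenvalue decompositions are used for the factorizations of Step 1.

```python
import numpy as np

def canonical_variable_form(Q, p1):
    """Sketch of the Appendix A.1 algorithm: transform a variance matrix Q,
    with Q11 > 0 and Q22 > 0, into its canonical variable form.
    Returns Q_cvf, the transformations S1, S2, and the canonical correlations d."""
    p2 = Q.shape[0] - p1
    Q11, Q12, Q22 = Q[:p1, :p1], Q[:p1, p1:], Q[p1:, p1:]

    # Step 1: Q11 = U1 D1 U1^T, Q22 = U2 D2 U2^T, eigenvalues ordered decreasingly.
    d1, U1 = np.linalg.eigh(Q11)
    d2, U2 = np.linalg.eigh(Q22)
    d1, U1 = d1[::-1], U1[:, ::-1]
    d2, U2 = d2[::-1], U2[:, ::-1]

    # Step 2: SVD of D1^{-1/2} U1^T Q12 U2 D2^{-1/2} = U3 D3 U4^T.
    M = np.diag(d1 ** -0.5) @ U1.T @ Q12 @ U2 @ np.diag(d2 ** -0.5)
    U3, d, U4T = np.linalg.svd(M)

    # Step 3: the canonical variable form of the variance matrix.
    D3 = np.zeros((p1, p2))
    D3[:d.size, :d.size] = np.diag(d)
    Q_cvf = np.block([[np.eye(p1), D3], [D3.T, np.eye(p2)]])

    # Step 4: the transformations Y1 -> S1 Y1, Y2 -> S2 Y2.
    S1 = U3.T @ np.diag(d1 ** -0.5) @ U1.T
    S2 = U4T @ np.diag(d2 ** -0.5) @ U2.T
    return Q_cvf, S1, S2, d

# Example (illustrative data): the transformed variance matrix equals Q_cvf.
Q = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.5]])
Q_cvf, S1, S2, d = canonical_variable_form(Q, p1=2)
S = np.block([[S1, np.zeros((2, 1))], [np.zeros((1, 2)), S2]])
print(np.allclose(S @ Q @ S.T, Q_cvf), d)  # True, canonical correlations in [0, 1)
```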

Appendix A.2. Information Theory

In this appendix, the reader finds two formulas of information theory which are used in the body of the paper. These are obtained from [29,30,31].
The first equality is proven in [29] (p. 19, Th. 2.4.1) and [30] (p. 31, (2.4.26), (2.4.28)).
Proposition A1.
Consider random variables Y 1 , 1 , Y 1 , 2 , Y 2 , 1 , Y 2 , 2 , X 1 , X 2 such that the following two triples are independent random variables, ( Y 1 , 1 , Y 2 , 1 , X 1 ) and ( Y 1 , 2 , Y 2 , 2 , X 2 ) . Then, the mutual information expression additively decomposes,
I ( Y 1 , 1 , Y 1 , 2 , Y 2 , 1 , Y 2 , 2 ; X 1 , X 2 ) = I ( Y 1 , 1 , Y 2 , 1 ; X 1 ) + I ( Y 1 , 2 , Y 2 , 2 ; X 2 ) .
Proof. 
For completeness, the proof is given in [21] (Proposition A.2). □
Consider a tuple of jointly Gaussian random variables ( X , Y ) ∈ G ( 0 , Q ( X , Y ) ) with X : Ω → R n , Y : Ω → R p , Q ( X , Y ) > 0 , and Q ( X , Y ) = Q X Q X , Y Q X , Y T Q Y . Then,
H ( X ) = 1 2 ln det ( Q X ) + 1 2 n ln ( 2 π e ) ,
H ( Y | X ) = H ( X , Y ) − H ( X ) = 1 2 ln det ( Q Y − Q X , Y T Q X − 1 Q X , Y ) + 1 2 p ln ( 2 π e ) ,
I ( Y ; X ) = − 1 2 ln [ det ( Q ( X , Y ) ) / ( det ( Q Y ) det ( Q X ) ) ] .
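The following short sketch (Python/NumPy, added for illustration) checks the mutual consistency of the three formulas above on a randomly generated positive-definite variance matrix; the dimensions and the random seed are arbitrary choices.

```python
import numpy as np

def gaussian_entropy(Q):
    """Differential entropy H = 0.5*ln det(Q) + 0.5*dim*ln(2*pi*e), in nats."""
    return 0.5 * np.log(np.linalg.det(Q)) + 0.5 * Q.shape[0] * np.log(2 * np.pi * np.e)

def gaussian_mutual_information(Q, n):
    """I(Y;X) for (X,Y) ~ G(0,Q), X of dimension n, via -0.5*ln[det(Q)/(det(QY)det(QX))]."""
    QX, QY = Q[:n, :n], Q[n:, n:]
    return -0.5 * np.log(np.linalg.det(Q) / (np.linalg.det(QX) * np.linalg.det(QY)))

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q = A @ A.T + np.eye(5)          # a generic positive-definite variance matrix
n = 2                            # X = first 2 components, Y = remaining 3

# Consistency: I(Y;X) = H(X) + H(Y) - H(X,Y) = H(Y) - H(Y|X).
HX, HY, HXY = gaussian_entropy(Q[:n, :n]), gaussian_entropy(Q[n:, n:]), gaussian_entropy(Q)
QY_given_X = Q[n:, n:] - Q[n:, :n] @ np.linalg.inv(Q[:n, :n]) @ Q[:n, n:]
print(np.isclose(gaussian_mutual_information(Q, n), HX + HY - HXY),
      np.isclose(gaussian_mutual_information(Q, n), HY - gaussian_entropy(QY_given_X)))
```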

Appendix A.3. An Inequality for Determinants

An inequality for matrices is derived in this appendix which is needed in the body of the paper.
Lemma A1.
Consider the real-valued matrices A , B R n × n . Assume that
0 ≤ I − A T A , 0 < I − B T B , and rank ( B ) = n .
Then,
det ( [ I − A T A ] [ I − B T B ] ) ≤ ( det ( I − A T B ) ) 2 .
A related result is mentioned in [32] (Theorem 9.E.6). In that book, the proof of the corresponding result refers to the paper [33]. That reference was received by the authors, but they could not read it because the paper is written in Chinese. However, they could read the formulas of the paper. Hua LooKeng developed these results to calculate an orthonormal basis for a function of one complex variable. A more recent reference for this inequality is [34] (Theorem 7.19).
The proof of Lemma A1 below is analogous to that of Hua LooKeng in [33]. The main differences are in the assumptions.
Lemma A2.
Ref. [33] (pp. 464, 470). Consider the matrices A , B ∈ R n × n . Assume that I − B T B is a nonsingular matrix and that rank ( B ) = n . Then
( I − A T A ) − ( I − A T B ) [ I − B T B ] − 1 ( I − A T B ) T = − ( A − B ) [ I − B T B ] − 1 ( A − B ) T .
Proof. 
For completeness, the proof is given in [21] (Lemma A.4). □
Proposition A2.
Ref. [33] (Equation (2)). Consider the symmetric positive-definite matrices Q 1 , Q 2 , Q R n × n such that Q 1 + Q 2 = Q . Then
det ( Q 1 ) + det ( Q 2 ) det ( Q ) .
Proof of Lemma A1. 
By the assumptions, 0 ≤ ( I − A T A ) , 0 < ( I − B T B ) , and rank ( B ) = n , it follows from Lemma A2 that
( A − B ) [ I − B T B ] − 1 ( A − B ) T + [ I − A T A ] = ( I − A T B ) [ I − B T B ] − 1 ( I − A T B ) T ;
0 ≤ det ( I − A T A ) , by the assumption on A ,
≤ det ( ( A − B ) [ I − B T B ] − 1 ( A − B ) T ) + det ( [ I − A T A ] ) , by the assumption on B ,
≤ det ( ( I − A T B ) [ I − B T B ] − 1 ( I − A T B ) T ) , by Lemma A2, Proposition A2, and the assumptions,
= det ( I − A T B ) 2 [ det ( [ I − B T B ] ) ] − 1 ;
det ( [ I − A T A ] [ I − B T B ] ) = det ( [ I − A T A ] ) det ( [ I − B T B ] ) ≤ det ( [ I − A T B ] ) 2 . □
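A Monte Carlo sanity check of the inequality of Lemma A1 is sketched below (Python/NumPy, illustrative only); strict contractions A and B are generated so that the assumptions 0 ≤ I − A T A , 0 < I − B T B and rank ( B ) = n hold.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_contraction(n, rng):
    """A random n x n matrix with spectral norm < 1, so that I - M^T M > 0."""
    M = rng.standard_normal((n, n))
    return 0.9 * M / np.linalg.norm(M, 2)

# Check det([I - A^T A][I - B^T B]) <= det(I - A^T B)^2 over random strict contractions.
n, I = 4, np.eye(4)
ok = True
for _ in range(1000):
    A, B = random_contraction(n, rng), random_contraction(n, rng)
    lhs = np.linalg.det((I - A.T @ A) @ (I - B.T @ B))
    rhs = np.linalg.det(I - A.T @ B) ** 2
    ok &= lhs <= rhs + 1e-12   # small tolerance for floating-point round-off
print(ok)  # expected: True
```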
Another preliminary result is needed.
Proposition A3.
Consider the matrix Q X = Q X T ∈ R n × n of Proposition 2 and the matrix D ∈ R n × n of Definition 1. Thus, both Q X > 0 and D > 0 . Then,
D ≤ Q X − 1 ≤ D − 1 ⟺ D ≤ Q X ≤ D − 1 ,
D < Q X − 1 < D − 1 ⟺ D < Q X < D − 1 .
Proof. 
For completeness, the proof is given in [21] (Proposition A.6). □
Proposition A4.
Consider the matrices defined in Definition 1. Thus, D ∈ R n × n is a diagonal matrix satisfying 0 < D , and the matrix Q X ∈ R n × n satisfies Q X = Q X T and 0 < D ≤ Q X ≤ D − 1 . Then,
det ( [ I − D 1 / 2 Q X − 1 D 1 / 2 ] [ I − D 1 / 2 Q X D 1 / 2 ] ) ≤ det ( [ I − D ] 2 ) , ∀ Q X ,
det ( [ I − D 1 / 2 Q X − 1 D 1 / 2 ] [ I − D 1 / 2 Q X D 1 / 2 ] ) < det ( [ I − D ] 2 ) , if Q X ≠ I .
Proof. 
(1) If Q X < D − 1 then det ( D − 1 − Q X ) > 0 . Consider the case in which Q X satisfies Q X ≤ D − 1 but not Q X < D − 1 . Then, det ( D − 1 − Q X ) = 0 . Hence,
det ( I − D 1 / 2 Q X D 1 / 2 ) = det ( D 1 / 2 [ D − 1 − Q X ] D 1 / 2 ) = det ( D 1 / 2 ) det ( D − 1 − Q X ) det ( D 1 / 2 ) = 0 .
Then, 0 < D < I implies that det ( [ I − D ] 2 ) > 0 , hence that the inequality (A11) holds.
(2) If D < Q X then det ( Q X − D ) > 0 . If D ≤ Q X but not D < Q X , then by Proposition A3, Q X − 1 ≤ D − 1 but not Q X − 1 < D − 1 . Then, det ( D − 1 − Q X − 1 ) = 0 . Hence,
det ( I − D 1 / 2 Q X − 1 D 1 / 2 ) = det ( D 1 / 2 [ D − 1 − Q X − 1 ] D 1 / 2 ) = det ( D 1 / 2 ) det ( D − 1 − Q X − 1 ) det ( D 1 / 2 ) = 0 .
In this case, the inequality (A11) also holds.
(3) Then consider the case in which D < Q X < D − 1 . Lemma A1 will be used to prove the result. Define, therefore,
A = Q X − 1 / 2 D 1 / 2 , B = Q X 1 / 2 D 1 / 2 .
First, it is proven that the assumptions of the lemma are satisfied. Note that 0 < Q X implies that rank ( Q X ) = n . This and the fact that rank ( D ) = n imply that rank ( B ) = rank ( Q X 1 / 2 D 1 / 2 ) = n . Further note that
I − A T A = I − D 1 / 2 Q X − 1 D 1 / 2 ;
0 < D ≤ Q X ≤ D − 1 , by assumption,
⟹ 0 < D ≤ Q X − 1 ≤ D − 1 , by Proposition A3,
⟹ 0 < D 2 ≤ D 1 / 2 Q X − 1 D 1 / 2 ≤ I ⟹ 0 ≤ I − D 1 / 2 Q X − 1 D 1 / 2 = I − A T A ;
0 < D < Q X < D − 1 , by the case considered,
⟹ 0 < D 2 < D 1 / 2 Q X D 1 / 2 < I ⟹ 0 < I − D 1 / 2 Q X D 1 / 2 = I − B T B ;
I − A T B = I − D 1 / 2 Q X − 1 / 2 Q X 1 / 2 D 1 / 2 = I − D .
From Lemma A1 it then follows that
det ( [ I − D 1 / 2 Q X − 1 D 1 / 2 ] [ I − D 1 / 2 Q X D 1 / 2 ] ) ≤ det ( [ I − D ] 2 ) .
(4) Then suppose that, in addition, Q X ≠ I . Then
A − B = Q X − 1 / 2 D 1 / 2 − Q X 1 / 2 D 1 / 2 = Q X − 1 / 2 [ I − Q X ] D 1 / 2 ≠ 0 , using that Q X ≠ I ;
0 < ( A − B ) [ I − B T B ] − 1 ( A − B ) T , because 0 < I − B T B and A − B ≠ 0 ;
det ( I − A T A ) < det ( ( A − B ) [ I − B T B ] − 1 ( A − B ) T ) + det ( I − A T A )
≤ det ( ( I − A T B ) [ I − B T B ] − 1 ( I − A T B ) T ) , by Lemma A2 and Proposition A2 ;
det ( ( I − A T A ) ( I − B T B ) ) = det ( I − A T A ) det ( I − B T B ) < [ det ( I − A T B ) ] 2 , by Lemma A1 and its proof ;
det ( [ I − D 1 / 2 Q X − 1 D 1 / 2 ] [ I − D 1 / 2 Q X D 1 / 2 ] ) < det ( [ I − D ] 2 ) , by substitution of A , B . □
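The strict inequality of Proposition A4 can likewise be checked numerically. In the sketch below (Python/NumPy, illustrative and not from the original text), a symmetric Q X with D ≤ Q X ≤ D − 1 is constructed by a small symmetric perturbation of the identity; this is a sufficient, but not necessary, way of staying inside the admissible set of Definition 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
d = rng.uniform(0.2, 0.8, size=n)          # diagonal entries of D, in (0, 1)
D = np.diag(d)
Dh = np.diag(np.sqrt(d))                   # D^{1/2}

# A symmetric Q_X with D <= Q_X <= D^{-1}: perturb the identity by a small symmetric
# matrix whose spectral norm is below min(lambda_min(I - D), lambda_min(D^{-1} - I)).
S = rng.standard_normal((n, n)); S = (S + S.T) / 2
eps = 0.9 * (1 - d.max()) / np.linalg.norm(S, 2)
QX = np.eye(n) + eps * S                   # QX != I, so the strict inequality applies

lhs = np.linalg.det((np.eye(n) - Dh @ np.linalg.inv(QX) @ Dh) @ (np.eye(n) - Dh @ QX @ Dh))
rhs = np.linalg.det((np.eye(n) - D) @ (np.eye(n) - D))
print(lhs < rhs)   # expected: True
```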

References

  1. Gray, R.M.; Wyner, A.D. Source coding for a simple network. Bell Syst. Tech. J. 1974, 53, 1681–1721.
  2. Wyner, A.D. The common information of two dependent random variables. IEEE Trans. Inf. Theory 1975, 21, 163–179.
  3. Witsenhausen, H.S. Values and bounds for common information of two discrete variables. SIAM J. Appl. Math. 1976, 31, 313–333.
  4. Witsenhausen, H.S. On sequences of pairs of dependent random variables. SIAM J. Appl. Math. 1975, 28, 100–113.
  5. Gács, P.; Körner, J. Common information is much less than mutual information. Probl. Control Inf. Theory 1973, 2, 149–162.
  6. Benammar, M.; Zaidi, A. Rate-distortion region of a Gray–Wyner model with side information. Entropy 2018, 20, 2.
  7. Benammar, M.; Zaidi, A. Rate-distortion region of a Gray–Wyner problem with side information. In Proceedings of the IEEE International Symposium on Information Theory (ISIT 2017), Aachen, Germany, 25–30 June 2017; pp. 106–110.
  8. Benammar, M.; Zaidi, A. Rate-distortion function for a Heegard–Berger problem with two sources and degraded reconstruction sets. IEEE Trans. Inf. Theory 2016, 62, 5080–5092.
  9. Benammar, M.; Zaidi, A. Rate-distortion function for a Heegard–Berger problem with common reconstruction constraint. In Proceedings of the IEEE Information Theory Workshop (ITW 2015), Jeju Island, Korea, 11–15 October 2015.
  10. Heegard, C.; Berger, T. Rate distortion when side information may be absent. IEEE Trans. Inf. Theory 1985, 31, 727–734.
  11. Cuff, P.W.; Permuter, H.H.; Cover, T.M. Coordination capacity. IEEE Trans. Inf. Theory 2010, 56, 4181–4206.
  12. Viswanatha, K.B.; Akyol, E.; Rose, K. The lossy common information of correlated sources. IEEE Trans. Inf. Theory 2014, 60, 3238–3253.
  13. Xu, G.; Liu, W.; Chen, B. A lossy source coding interpretation of Wyner's common information. IEEE Trans. Inf. Theory 2016, 62, 754–768.
  14. Xiao, J.-J.; Luo, Z.-Q. Compression of correlated Gaussian sources under individual distortion criteria. In Proceedings of the 43rd Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, USA, 28–30 September 2005; pp. 438–447.
  15. Satpathy, S.; Cuff, P. Gaussian secure source coding and Wyner's common information. In Proceedings of the IEEE International Symposium on Information Theory (ISIT 2015), Hong Kong, China, 14–19 July 2015; pp. 116–120.
  16. Veld, G.J.O.; Gastpar, M.C. Total correlation of Gaussian vector sources on the Gray–Wyner network. In Proceedings of the 54th Annual Allerton Conference on Communication, Control and Computing (Allerton), Monticello, IL, USA, 27–30 September 2016; pp. 385–392.
  17. Sula, E.; Gastpar, M. Relaxed Wyner's common information. arXiv 2019, arXiv:1912.07083.
  18. Hotelling, H. Relation between two sets of variates. Biometrika 1936, 28, 321–377.
  19. Gelfand, I.M.; Yaglom, A.M. Calculation of the amount of information about a random function contained in another such function. Am. Math. Soc. Transl. 1959, 2, 199–246.
  20. van Schuppen, J.H. Common, correlated, and private information in control of decentralized systems. In Coordination Control of Distributed Systems; van Schuppen, J.H., Villa, T., Eds.; Number 456 in Lecture Notes in Control and Information Sciences; Springer International Publishing: Cham, Switzerland, 2015; pp. 215–222.
  21. Charalambous, C.D.; van Schuppen, J.H. A new approach to lossy network compression of a tuple of correlated multivariate Gaussian RVs. arXiv 2019, arXiv:1905.12695.
  22. van Putten, C.; van Schuppen, J.H. The weak and strong Gaussian probabilistic realization problem. J. Multivar. Anal. 1983, 13, 118–137.
  23. Noble, B. Applied Linear Algebra; Prentice-Hall: Englewood Cliffs, NJ, USA, 1969.
  24. Stylianou, E.; Charalambous, C.D.; Charalambous, T. Joint rate distortion function of a tuple of correlated multivariate Gaussian sources with individual fidelity criteria. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT 2021), Melbourne, Australia, 12–20 July 2021; pp. 2167–2172.
  25. Gkagkos, M.; Charalambous, C.D. Structural properties of optimal test channels for distributed source coding with decoder side information for multivariate Gaussian sources with square-error fidelity. arXiv 2020, arXiv:2011.10941.
  26. Gkagkos, M.; Charalambous, C.D. Structural properties of test channels of the RDF for Gaussian multivariate distributed sources. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT 2021), Melbourne, Australia, 12–20 July 2021; pp. 2631–2636.
  27. Charalambous, C.D.; van Schuppen, J.H. Characterization of conditional independence and weak realizations of multivariate Gaussian random variables: Applications to networks. In Proceedings of the IEEE International Symposium on Information Theory (ISIT 2020), Los Angeles, CA, USA, 21–26 June 2020.
  28. Gray, R.M. A new class of lower bounds to information rates of stationary sources via conditional rate-distortion functions. IEEE Trans. Inf. Theory 1973, 19, 480–489.
  29. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 1991.
  30. Gallager, R.G. Information Theory and Reliable Communication; John Wiley & Sons: New York, NY, USA, 1968.
  31. Yaglom, A.M.; Yaglom, I.M. Probability and Information; D. Reidel Publishing Company: Dordrecht, The Netherlands, 1983.
  32. Marshall, A.W.; Olkin, I. Inequalities: Theory of Majorization and Its Applications; Academic Press: New York, NY, USA, 1979.
  33. Hua, L.K. Inequalities involving determinants. Acta Math. Sin. 1955, 5, 463–470. (In Chinese; English summary).
  34. Zhang, F. Positive semidefinite matrices. In Matrix Theory: Basic Results and Techniques; Springer Science+Business Media: New York, NY, USA, 2011; pp. 199–252.
Figure 1. The Gray and Wyner source coding for a simple network [1]; ( Y 1 , i , Y 2 , i ) ∼ P Y 1 , Y 2 , i = 1 , … , N .
Figure 2. Weak stochastic realization of ( Y 1 , i , Y 2 , i ) ∼ P Y 1 , Y 2 , i = 1 , … , N and ( Y ^ 1 , i , Y ^ 2 , i ) , i = 1 , … , N at the encoder and decoder with respect to the common and private random variables ( W N , Z 1 N , Z 2 N ) , ( W N , Z ^ 1 N , Z ^ 2 N ) .
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
